# File I/O

In all of the exercises you have completed so far, any data that you needed was provided to you directly as Python objects, for example:

```python
concentration_data = [0.001, 0.002, 0.004, 0.006, 0.008, 0.010, 0.020]
```

This is not really representative of a typical data analysis workflow. Most of the time, if you want to analyse some data, you will already have collected that data and stored it in some kind of **file**, perhaps a `csv` file. It therefore seems prudent to learn how we can **read** files into Python and **write** files back out of Python: **file I/O** (input/output).

## The general case: reading in files

The most general way to read data from a file in Python is to use the built-in `open` function. Let's look at a simple example: reading in a file that contains some simple text. We're going to look at [example.txt](), which looks like this:

```none
EXAMPLE TEXT FILE

Here is some text.
Here is some more text.

Here is even more text.
```

Here's how we can read in this file in Python:

```python
with open('example.txt', 'r') as stream:
    lines = stream.readlines()

for line in lines:
    print(line, end='')
```

```none
EXAMPLE TEXT FILE

Here is some text.
Here is some more text.

Here is even more text.
```


To try this example yourself ...

Let's take this example in sections. First up, we have the `open` function:

```python
with open('example.txt', 'r') as stream:
```

Here we provide two arguments: the path to the file we want to open (`example.txt`) and what we would like to do with that file (`'r'` for **read**).

As you will have already noticed, we have also used a new keyword, `with`, together with `as`. The `as` part works much as it does in `import` statements such as:

```python
import numpy as np
```

We are effectively "nicknaming" the output of the `open` function and calling it `stream` instead. You could of course call it something else; here we use `stream` as shorthand for a **file stream**: a stream of data read from a file.

We end the first line with a colon `:` much like function definitions and loops, after which we **indent** all of the code which needs to access `stream`. 
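Beyond the nicknaming, the `with` statement also closes the file for us once the indented block has finished. The sketch below illustrates this under a few assumptions: it writes a small scratch file (the name `scratch.txt` and the temporary directory are hypothetical, chosen so that the example does not depend on `example.txt`), and it compares the `with` form against a roughly equivalent version in which we close the file ourselves.

```python
import os
import tempfile

# Hypothetical scratch file so this sketch does not depend on example.txt.
path = os.path.join(tempfile.mkdtemp(), 'scratch.txt')
with open(path, 'w') as stream:
    stream.write('Here is some text.\n')

# Using `with`: the file is closed for us when the indented block ends.
with open(path, 'r') as stream:
    first_line = stream.readline()
closed_after_with = stream.closed

# Roughly equivalent code without `with`: we have to close the file ourselves.
stream = open(path, 'r')
try:
    same_line = stream.readline()
finally:
    stream.close()

print(closed_after_with)        # True
print(first_line == same_line)  # True
```

In practice the `with` form is usually preferred, as there is no `close` call to forget.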

```python
    lines = stream.readlines()
```

The next line actually reads the content of the file. The `open` function returns a file object which gives us access to the data in the file, but not in a form that is immediately useful. By calling the `readlines` method, we get a `list` of strings, one for each line of the file. The remaining code simply prints these lines for our inspection:

```python
for line in lines:
    print(line, end='')
```

We have used the `end` keyword argument here to stop `print` from adding an extra blank line after each line of output: every string in `lines` already ends with a newline, and by default `print` adds another one after whatever it is given.
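`readlines` is not the only way to get at the contents of a file. As a minimal sketch (again using a hypothetical scratch file rather than `example.txt`), the `read` method returns the whole file as one string, and looping over the file object itself yields one line at a time. Here `rstrip('\n')` is used to remove the trailing newline from each line, which is an alternative to passing `end=''` to `print`.

```python
import os
import tempfile

# Hypothetical scratch file standing in for example.txt.
path = os.path.join(tempfile.mkdtemp(), 'scratch.txt')
with open(path, 'w') as stream:
    stream.write('EXAMPLE TEXT FILE\n\nHere is some text.\n')

# read() returns the entire file as a single string.
with open(path, 'r') as stream:
    whole_file = stream.read()

# Looping over the file object gives one line at a time;
# rstrip('\n') removes the newline at the end of each line.
with open(path, 'r') as stream:
    stripped_lines = [line.rstrip('\n') for line in stream]

print(repr(whole_file))
print(stripped_lines)
```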

## Writing files

Now that we've read in our simple text file, let's make some modifications to it and write it back out again.

We now have the contents of the text file available to us in the form of a `list` of strings:

```python
lines
```

```none
['EXAMPLE TEXT FILE\n',
 '\n',
 'Here is some text.\n',
 'Here is some more text.\n',
 '\n',
 'Here is even more text.\n']
```

Note that each `\n` is a **newline character** which will actually **become** a newline when passed to the `print` function:

```python
print('Line 1\nLine 2')
```

```none
Line 1
Line 2
```


Let's make some changes to `lines`, starting by removing the last two elements:

```python
lines = lines[:-2]

lines
```

```none
['EXAMPLE TEXT FILE\n',
 '\n',
 'Here is some text.\n',
 'Here is some more text.\n']
```

Now let's add a new line:

```python
lines.append('This new text was added in Python!')

lines
```

```none
['EXAMPLE TEXT FILE\n',
 '\n',
 'Here is some text.\n',
 'Here is some more text.\n',
 'This new text was added in Python!']
```

And finally, to **write** our `lines` to a new **file**:

```python
with open('modified_example.txt', 'w') as stream:
    for line in lines:
        stream.write(line)
```

If you followed along with this entire example, you should see that a new file `modified_example.txt` has now been created in the same directory as your Jupyter notebook - take a look.

As you can see in the code above, writing a file in Python looks much like reading a file: we use the `open` function for both. The difference is that here we specify that we want to **write** a file by passing `'w'` as the second argument, and we use the `write` method rather than the `readlines` method. The `write` method takes a single string as an argument, which is why we have used a `for` loop to write each line to the file in turn. We could also have combined all of the lines into a single string beforehand and passed this to the `write` method; either way works just fine.
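The "single string" alternative mentioned above might look something like the sketch below. It assumes a shortened version of `lines` and writes to hypothetical files in a temporary directory. It also shows the `writelines` method of file objects, which accepts a list of strings directly. Neither approach adds newline characters for us, so each string needs to carry its own `\n`.

```python
import os
import tempfile

# A shortened stand-in for the `lines` list used in the text.
lines = ['EXAMPLE TEXT FILE\n', '\n', 'Here is some text.\n']

# Hypothetical output locations in a temporary directory.
folder = tempfile.mkdtemp()
joined_path = os.path.join(folder, 'joined.txt')
writelines_path = os.path.join(folder, 'writelines.txt')

# Option 1: combine the lines into one string, then call write once.
with open(joined_path, 'w') as stream:
    stream.write(''.join(lines))

# Option 2: hand the whole list to writelines.
with open(writelines_path, 'w') as stream:
    stream.writelines(lines)

# Read both files back to check that they contain the same text.
with open(joined_path, 'r') as stream:
    joined_content = stream.read()
with open(writelines_path, 'r') as stream:
    writelines_content = stream.read()

print(joined_content == writelines_content)  # True
```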

## Reading in scientific data with `numpy`

What we have just been through is the most general case: how to read in **any** file, regardless of what type of data is contained within. For our purposes, we are primarily interested in scientific data: numbers that have been collected during some series of experiments. There are many Python packages that can be used to read such data; here we're going to rely on one that we've already been introduced to: `numpy`.

Time for another example file, this time some experimental kinetic data looking at the concentration dependence of the rate of a reaction:

```none
# Concentration / M | Rate
0.001 0.005
0.050 0.270
0.100 0.530
0.150 0.790
0.200 1.060
0.250 1.320
0.300 1.580
0.350 1.850
0.400 2.110
0.450 2.370
0.500 2.630
0.550 2.900
0.600 3.160
0.650 3.420
0.700 3.690
0.750 3.950
0.800 4.210
0.850 4.470
0.900 4.740
0.950 4.740
1.000 5.000
```

Follow this [link]() to download this file.

This data is formatted as simple text in a table of sorts, with values on each line being separated by whitespace. Simple tabular data can be read from files like this using the `loadtxt` function:

```python
import numpy as np

data = np.loadtxt('kinetic_data.dat')

print(data)
type(data)
```

```none
[[1.00e-03 5.00e-03]
 [5.00e-02 2.70e-01]
 [1.00e-01 5.30e-01]
 [1.50e-01 7.90e-01]
 [2.00e-01 1.06e+00]
 [2.50e-01 1.32e+00]
 [3.00e-01 1.58e+00]
 [3.50e-01 1.85e+00]
 [4.00e-01 2.11e+00]
 [4.50e-01 2.37e+00]
 [5.00e-01 2.63e+00]
 [5.50e-01 2.90e+00]
 [6.00e-01 3.16e+00]
 [6.50e-01 3.42e+00]
 [7.00e-01 3.69e+00]
 [7.50e-01 3.95e+00]
 [8.00e-01 4.21e+00]
 [8.50e-01 4.47e+00]
 [9.00e-01 4.74e+00]
 [9.50e-01 4.74e+00]
 [1.00e+00 5.00e+00]]

numpy.ndarray
```

We end up with a `numpy` **array** containing all of the data in the file. We can tell just from counting square brackets that this array is **two-dimensional**, in other words it's an array of arrays: a matrix.
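Two details of `loadtxt` are worth a quick illustration. By default it treats lines beginning with `#` as comments, which is why the header line of the file does not cause an error, and its `delimiter` keyword argument lets us read values separated by something other than whitespace, such as the commas in a `csv` file. The sketch below is an assumption-laden stand-in: rather than reading `kinetic_data.dat`, it wraps the first two rows of the data in `io.StringIO` objects so that it runs without any files, and the comma-separated version of the data is hypothetical.

```python
import io

import numpy as np

# The first two rows of the kinetic data, held in memory instead of a file.
whitespace_text = "# Concentration / M | Rate\n0.001 0.005\n0.050 0.270\n"

# A hypothetical comma-separated version of the same rows.
csv_text = "# Concentration / M, Rate\n0.001,0.005\n0.050,0.270\n"

# Lines starting with '#' are skipped as comments by default.
from_whitespace = np.loadtxt(io.StringIO(whitespace_text))

# For comma-separated values we need to say what the delimiter is.
from_csv = np.loadtxt(io.StringIO(csv_text), delimiter=',')

print(from_whitespace.shape)                       # (2, 2)
print(np.array_equal(from_whitespace, from_csv))   # True
```

The `shape` attribute confirms what counting square brackets suggests: the array has two dimensions, here two rows and two columns.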

```python
data[0]
```

```none
array([0.001, 0.005])
```

As you can see from the code above, the first row of `data` is `[0.001, 0.005]`, which is indeed the first row of data in the original file. This makes sense, but in all likelihood what we actually want is all of the concentration values in one array and all of the rate values in another. We can achieve this by **transposing** the array:

```python
transposed_data = data.T

print(transposed_data)
```

```none
[[1.00e-03 5.00e-02 1.00e-01 1.50e-01 2.00e-01 2.50e-01 3.00e-01 3.50e-01
  4.00e-01 4.50e-01 5.00e-01 5.50e-01 6.00e-01 6.50e-01 7.00e-01 7.50e-01
  8.00e-01 8.50e-01 9.00e-01 9.50e-01 1.00e+00]
 [5.00e-03 2.70e-01 5.30e-01 7.90e-01 1.06e+00 1.32e+00 1.58e+00 1.85e+00
  2.11e+00 2.37e+00 2.63e+00 2.90e+00 3.16e+00 3.42e+00 3.69e+00 3.95e+00
  4.21e+00 4.47e+00 4.74e+00 4.74e+00 5.00e+00]]
```


Now we have an array that contains all the same data as before, but the **rows** are now **columns** and vice versa. This allows us to very easily assign the concentration and rate values to separate variables:

```python
concentration, rate = transposed_data

print(concentration)
print(rate)
```

```none
[0.001 0.05  0.1   0.15  0.2   0.25  0.3   0.35  0.4   0.45  0.5   0.55
 0.6   0.65  0.7   0.75  0.8   0.85  0.9   0.95  1.   ]
[0.005 0.27  0.53  0.79  1.06  1.32  1.58  1.85  2.11  2.37  2.63  2.9
 3.16  3.42  3.69  3.95  4.21  4.47  4.74  4.74  5.   ]
```

We could also achieve this by changing how we originally read in the file:

```python
concentration, rate = np.loadtxt('kinetic_data.dat', unpack=True)

print(concentration)
print(rate)
```

```none
[0.001 0.05  0.1   0.15  0.2   0.25  0.3   0.35  0.4   0.45  0.5   0.55
 0.6   0.65  0.7   0.75  0.8   0.85  0.9   0.95  1.   ]
[0.005 0.27  0.53  0.79  1.06  1.32  1.58  1.85  2.11  2.37  2.63  2.9
 3.16  3.42  3.69  3.95  4.21  4.47  4.74  4.74  5.   ]
```


Here we have added the `unpack` keyword argument, which automatically transposes the output array for us. We then use **multiple assignment** to immediately split the data into separate variables.
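For completeness, `numpy` can also write tabular data back out to a text file using its `savetxt` function, which is not covered in the text above. The sketch below is only an illustration under stated assumptions: it uses the first three rows of the data typed in by hand, a hypothetical output file name (`kinetic_copy.dat`) in a temporary directory, and a format string of `'%.3f'` to write three decimal places. The header line is written with a leading `#`, so `loadtxt` skips it when the file is read back in.

```python
import os
import tempfile

import numpy as np

# The first three rows of the kinetic data, typed in by hand.
concentration = np.array([0.001, 0.050, 0.100])
rate = np.array([0.005, 0.270, 0.530])

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'kinetic_copy.dat')

# column_stack places the two 1D arrays side by side as columns.
table = np.column_stack([concentration, rate])
np.savetxt(path, table, fmt='%.3f', header='Concentration / M | Rate')

# Read the file back in to check that the round trip worked.
concentration_back, rate_back = np.loadtxt(path, unpack=True)

print(np.allclose(concentration, concentration_back))  # True
print(np.allclose(rate, rate_back))                    # True
```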