# Reading data from files

Sometimes we need to read data from files. In general, these will be either text files or binary files. Text files are easy to read; binary files are harder, because you need to know how the bytes are laid out.

Let's start with reading some tops from a file.

<div class="alert alert-success">
<b>Exercise</b>:
<ul>
<li>Write a <code>for</code> loop to read the lines of the file one by one, adding key: value pairs to a dictionary as you go.</li>
<li><a title="Skip lines that look like comments. Use str.split() to break each line at the comma, and float() to convert strings to numbers."><b>Hints</b></a></li>
</ul>
</div>
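
The hints can be tried on a single line before writing the loop. This is only a minimal sketch: the sample lines are invented for illustration, and it assumes that comment lines start with `#`, which may not match the real file.

```python
# Hypothetical sample lines; the real file's contents may differ.
comment = "# Formation, depth\n"
record = "Example Formation,1234.5\n"

# Assumption: comment lines begin with '#'.
is_comment = comment.startswith('#')

# Split the record at the comma, then convert the depth to a number.
name, depth_text = record.strip().split(',')
depth = float(depth_text)

print(is_comment, name, depth)
```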

In [None]:
tops = {}
# `data` is assumed to be an iterable of lines, such as an open file.
for line in data:

    # Your code here!


Add this dictionary to `utils.py` by typing `tops = `, followed by this dict.

## Intro to Python students: stop here for now

----

## Read using NumPy

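We can use `np.loadtxt()` for numeric data. It cannot convert the formation names to numbers, so here we use `usecols` to read only the depth column.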

In [None]:
import numpy as np
np.loadtxt('../data/L-30_tops.txt', skiprows=1, usecols=[1], delimiter=',')

Or there's [`np.genfromtxt()`](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.io.genfromtxt.html), which copes better with missing values &mdash; try running it on `'../data/B-41_tops.txt'`.

In [None]:
np.genfromtxt('../data/L-30_tops.txt', skip_header=1, delimiter=',')
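
To see what coping with missing values looks like in practice, here is a small sketch on an invented, in-memory table rather than the real tops files. The column names and values are made up for illustration.

```python
import io
import numpy as np

# Invented sample with a missing value in the last row.
text = "top_a,top_b\n100.0,200.0\n150.0,\n"

# genfromtxt() fills the empty field with nan.
arr = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(arr)

# loadtxt() is stricter; here we just report whether it raised an error.
try:
    np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)
    loadtxt_ok = True
except ValueError:
    loadtxt_ok = False
print(loadtxt_ok)
```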

Both functions have a useful keyword argument, `unpack`, which you should set to `True` to get the columns back as separate vectors.
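
As a quick illustration of `unpack`, the sketch below reads an invented two-column table from an in-memory string, first as a single 2D array and then as separate column vectors.

```python
import io
import numpy as np

# Invented two-column numeric data.
text = "1.0,10.0\n2.0,20.0\n3.0,30.0\n"

# Without unpack, we get one 2D array with shape (rows, columns).
table = np.loadtxt(io.StringIO(text), delimiter=',')

# With unpack=True the result is transposed, so each column can be
# assigned to its own variable.
x, y = np.loadtxt(io.StringIO(text), delimiter=',', unpack=True)

print(table.shape)
print(x, y)
```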

Note that both functions can also read gzip-compressed files (with a `.gz` extension) directly.
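
A minimal sketch of reading a compressed file: it writes a tiny gzip file to a temporary directory and then loads it with `np.loadtxt()`. The file name and its contents are hypothetical.

```python
import gzip
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the .gz extension is what signals gzip.
    path = os.path.join(tmp, 'example.txt.gz')

    # Write a small gzip-compressed text file.
    with gzip.open(path, 'wt') as f:
        f.write("1.0,10.0\n2.0,20.0\n")

    # Read it back directly, without decompressing it first.
    zipped = np.loadtxt(path, delimiter=',')

print(zipped)
```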

## Read using the built-in `csv` module

In [None]:
import csv

with open('../data/L-30_tops.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

In [None]:
import csv

with open('../data/L-30_tops.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['Formation name'], row['Depth [m]'])
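
Note that the `csv` readers return every field as a string. The sketch below builds a dictionary of tops from a `DictReader`, converting the depths with `float()`. The column names are the ones used above, but the rows are invented and are read from an in-memory string rather than the real file.

```python
import csv
import io

# Invented rows, using the same column names as the cell above.
text = "Formation name,Depth [m]\nExample A,100.5\nExample B,250.0\n"

tops_from_csv = {}
for row in csv.DictReader(io.StringIO(text)):
    # Every field arrives as a string, so convert the depth explicitly.
    tops_from_csv[row['Formation name']] = float(row['Depth [m]'])

print(tops_from_csv)
```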

## Read using pandas

In [None]:
import pandas as pd

df = pd.read_csv('../data/L-30_tops.csv')

In [None]:
df

In [None]:
import pandas as pd

df = pd.read_csv('../data/L-30_tops.txt', skiprows=1, names=['Formation', 'Depth'])

In [None]:
df

In [None]:
df['Formation'] = df['Formation'].str.title()
df.head()

In [None]:
df.to_csv('../data/L-30_tops_improved.csv', index=False)  # Leave out the row index column.
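
By default, `DataFrame.to_csv()` also writes the row index as an unnamed first column; passing `index=False` leaves it out. The sketch below compares the two header lines using an invented DataFrame, not the real tops data.

```python
import pandas as pd

# Invented data with the same column names as above.
example = pd.DataFrame({'Formation': ['Example A', 'Example B'],
                        'Depth': [100.5, 250.0]})

# With no path, to_csv() returns the CSV text as a string.
with_index = example.to_csv()
without_index = example.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```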

<div class="alert alert-success">
<b>Exercise</b>:
<ul>
<li>Read the data from B-41_tops.txt.</li>
<li>Write a function that will load data from either of these files.</li>
<li>Load the data into pandas.</li>
<li>Write a new CSV file with the cleaned data.</li>
</ul>
</div>