# Reading space-separated files

In one of the radiation lectures, you are asked to read `.txt` files that start with a metadata header, followed by the data in space-separated columns. The file looks like [this](https://github.com/fmaussion/intro_to_programming/blob/master/book/cookbook/SCIA_GSFC_NO2.txt).

Let's read it with pandas.

## Read the metadata

This has to be done in pure Python, line by line:

In [None]:
df_meta = {}
with open('SCIA_GSFC_NO2.txt') as file:
    for i, line in enumerate(file):
        line = line.rstrip()
        print(line)
        if line.startswith('Column'):
            # Split at the first colon only, in case the description contains one
            k, v = line.split(':', 1)
            df_meta[k.strip()] = v.strip()
        # Stop once we are past the header (roughly the first 30 lines)
        if i > 30:
            break

In [None]:
df_meta

At this point it would be best to rename the columns to more descriptive variable names. The exact line at which the data starts could also be inferred programmatically. This exercise is left to the reader.
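
As a starting point for the second part of that exercise, here is a minimal sketch of how the first data line could be located. It rests on an assumption that should be checked against the real file: header lines start with text (or are empty), while data lines start with a digit, such as a date. The function name `find_data_start` and the miniature `sample` are made up for illustration.

```python
def find_data_start(lines):
    """Return the 0-based index of the first line that looks like data.

    Assumption: header lines start with text (or are empty), while data
    lines start with a digit (for example a date).
    """
    for i, line in enumerate(lines):
        if line.lstrip()[:1].isdigit():
            return i
    raise ValueError('No data line found')


# Hypothetical miniature file, only for illustration
sample = [
    'Some description',
    'Column 1: Time',
    'Column 2: NO2 column',
    '',
    '2002-08-02 1.5',
    '2002-08-03 1.7',
]
n_header = find_data_start(sample)
print(n_header)
```

Because the index is 0-based, it is also the number of lines before the data, so it could be passed as `skiprows` to `pd.read_csv`. An open file can be passed to the function directly, since iterating over a file yields its lines.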

## Read the data

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('SCIA_GSFC_NO2.txt',
                 header=None,  # There is no proper header in the file
                 sep=' ',  # The separator is spaces
                 skiprows=25,  # The first rows are not organized (25 could be fetched automatically)
                 index_col=0,  # The first column is the time index
                 parse_dates=True,  # Parse the time automatically
                )
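
If the columns of a file turn out to be separated by a varying number of spaces, `sep=' '` would produce spurious empty columns. A possible alternative, shown here on made-up data rather than on the lecture file, is to use the regular expression `r'\s+'` as the separator. The names `text` and `df_demo` are only for this illustration.

```python
import io

import pandas as pd

# Hypothetical data rows: a date, then two values, with uneven spacing
text = (
    '2002-08-02  1.5 10\n'
    '2002-08-03 1.7   11\n'
)

df_demo = pd.read_csv(io.StringIO(text),
                      header=None,  # No header line in this sample
                      sep=r'\s+',  # One or more whitespace characters
                      index_col=0,  # The first column is the time index
                      parse_dates=True,  # Parse the time automatically
                      )
print(df_demo)
```

Whether this is needed depends on the file at hand: for data separated by exactly one space, `sep=' '` works as is.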

In [None]:
# Give "nicer" names to columns
df.index.name = 'Time (UTC)'
# Skip the first metadata key: it describes the time column, which is now the index
df.columns = list(df_meta.keys())[1:]

In [None]:
df

In [None]:
df['Column 2'].plot();