# Working with tables

## Object creation

First, import the required packages:
* pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for Python
* numpy: the fundamental package for scientific computing with Python
* matplotlib: a 2D plotting library that produces publication-quality figures
* os: standard-library module used to manipulate paths and filenames

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

Create a `DataFrame` by passing a NumPy array:

In [None]:
df = pd.DataFrame(np.random.randn(12, 4), columns=list('ABCD'))
df

In [None]:
df.B

In [None]:
df['B']
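
Both forms return the same column as a `Series`. As a small sketch (the frame `demo` and its column names are invented for illustration), attribute access only works when the column name is a valid Python identifier, whereas bracket access works for any name:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration
demo = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["B", "my col"])

same = demo.B.equals(demo["B"])   # attribute and bracket access agree
spaced = demo["my col"]           # a name containing a space needs brackets
print(same)
print(spaced.tolist())
```

Bracket access is usually the safer default in scripts, since it also avoids clashes with existing `DataFrame` attributes.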

## Viewing data

Show the table head:

In [None]:
df.head()

The `describe()` method shows a quick statistical summary of your data:

In [None]:
df.describe()
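
To see what the summary contains, here is a minimal sketch on a hypothetical frame whose values were chosen so the statistics can be checked by hand. The result is itself a `DataFrame`, so individual statistics can be picked out with `loc`:

```python
import pandas as pd

# Hypothetical data chosen so the statistics are easy to verify by hand
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 60.0]})

summary = demo.describe()
print(summary.index.tolist())      # names of the statistics (rows)
print(summary.loc["mean", "A"])    # mean of column A
print(summary.loc["max", "B"])     # maximum of column B
```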

## Creating a DataFrame from an input file: FALL3D example

In the folder `pts_files` you will find a set of FALL3D output files
containing information about the evolution of the variables at some tracked 
points.

This information is written to a separate output file for each point 
specified in `inputfilename.pts`.

These files are generated only when the record `TRACKPOINTS` in the 
input `filename.inp` is set to `YES`. 

Look at the file `etna-2015.CATANIA.res` and open it using the `read_csv` function:

In [None]:
path     = "2-Tables"
fname    = "etna-2015.CATANIA.res"
pts_file = os.path.join(path,fname)

In [None]:
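date_format = "%d%b%Y_%H:%M"
columns     = ['load',
               'total',
               'pm05',
               'pm10',
               'pm20',
              ]

# `names` lists only the data columns, so pandas uses the leading
# date column as the index
df = pd.read_csv(pts_file,
                 skiprows = 7,
                 sep      = r"\s+",
                 names    = columns,
                )

# Parse the index labels into a DatetimeIndex
# (pd.datetime no longer exists in recent pandas versions)
df.index = pd.to_datetime(df.index, format=date_format)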

In [None]:
df.head()
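
The same kind of read can be tried without the real file. The sketch below builds an in-memory sample whose layout is an assumption made for illustration (seven header lines, then a timestamp followed by five numeric columns with made-up values) and parses the dates with `pd.to_datetime` after reading:

```python
import io
import pandas as pd

# Hypothetical stand-in for a tracking-point file (layout and values invented)
header = "".join(f"header line {i}\n" for i in range(7))
rows = (
    "03Dec2015_09:00 0.10 1.0e-06 2.0e-07 5.0e-07 8.0e-07\n"
    "03Dec2015_10:00 0.20 2.0e-06 4.0e-07 1.0e-06 1.6e-06\n"
)
columns = ["load", "total", "pm05", "pm10", "pm20"]

# Five names for six fields: the leading field becomes the index
sample = pd.read_csv(io.StringIO(header + rows),
                     skiprows=7, sep=r"\s+", names=columns)
sample.index = pd.to_datetime(sample.index, format="%d%b%Y_%H:%M")
print(sample)
```

Once the index is a `DatetimeIndex`, time-based selection and resampling become available.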

In [None]:
df.plot(y='load')
df.plot(y='total')
df.plot(y=['pm05','pm10','pm20'])

In [None]:
plt.show()

## Selecting data

You can select rows by integer position:

In [None]:
df.iloc[0:3]

Use `iloc` for purely integer-location-based indexing, i.e. selection by position.
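
As a brief sketch on a hypothetical hourly series (names and values are invented), `iloc` follows the usual Python slicing rules: the end position is excluded and negative positions count from the end:

```python
import pandas as pd

# Hypothetical hourly series, used only for illustration
idx = pd.date_range("2015-12-03 09:00", periods=4, freq="h")
demo = pd.DataFrame({"load": [0.1, 0.2, 0.3, 0.4]}, index=idx)

first_two = demo.iloc[0:2]      # rows at positions 0 and 1 (end excluded)
last_value = demo.iloc[-1, 0]   # last row, first column
print(first_two)
print(last_value)
```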

In [None]:
df.index

Additionally, you can select rows using a label:

In [None]:
df.loc['2015-12-3 09']

In [None]:
df.loc['2015-12-3 09','load']

Use `loc` to access a group of rows and columns by label(s) or with a boolean array:

In [None]:
# Convert grams/m3 to micro-grams/m3
df.loc[:,df.columns != 'load'] *= 1E6
df.tail()
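
The two uses of `loc` can be seen side by side in the following sketch, which assumes a small invented frame. A label slice includes both endpoints, and a boolean array over the columns restricts an in-place operation to the columns marked `True`:

```python
import pandas as pd

# Hypothetical frame with made-up values
idx = pd.date_range("2015-12-03 09:00", periods=3, freq="h")
demo = pd.DataFrame(
    {"load": [0.1, 0.2, 0.3],
     "total": [1e-6, 2e-6, 3e-6],
     "pm10": [5e-7, 1e-6, 2e-6]},
    index=idx,
)

# Label-based slice: unlike iloc, both endpoints are included
window = demo.loc["2015-12-03 09:00":"2015-12-03 10:00", "load"]

# Boolean array over the columns: True for every column except 'load'
mask = demo.columns != "load"
demo.loc[:, mask] *= 1e6
print(window.tolist())
print(demo)
```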

Select values using boolean indexing:

In [None]:
df[df<1E-6]

In [None]:
df[df<1E-6] = 0.0
df.head()
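
To make the behaviour of the two cells above explicit, here is a minimal sketch on invented data. Indexing with a boolean `DataFrame` keeps the shape and fills the unselected entries with `NaN`, while assigning through the same mask changes only the selected entries:

```python
import pandas as pd

# Hypothetical values, two of which fall below the threshold
demo = pd.DataFrame({"total": [5e-7, 2.0, 3.0], "pm10": [1.0, 1e-9, 4.0]})

small = demo < 1e-6       # boolean DataFrame of the same shape
selected = demo[small]    # entries where the mask is False become NaN
demo[small] = 0.0         # assign only where the mask is True
print(selected)
print(demo)
```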

In pandas, the most common way to group by time is the `.resample()` method. 
A call such as `df.resample('h')` returns a resampler object to which we can 
apply aggregation functions (e.g., `mean`, `count`, `sum`, ...).

You can use the `resample` method to create new time series, for example hourly means:

In [None]:
df.resample('h').mean()
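
The following sketch illustrates the idea on a hypothetical series sampled every 30 minutes (values invented). The resampler only defines the hourly bins; each aggregation then yields one row per bin:

```python
import pandas as pd

# Hypothetical series sampled every 30 minutes
idx = pd.date_range("2015-12-03 09:00", periods=4, freq="30min")
demo = pd.DataFrame({"load": [1.0, 3.0, 5.0, 7.0]}, index=idx)

resampler = demo.resample("h")   # hourly bins; nothing is computed yet
hourly_mean = resampler.mean()   # one row per hour
hourly_max = resampler.max()
print(hourly_mean)
print(hourly_max)
```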

## Opening FALL3D data from multiple locations

In [None]:
import glob 

path      = "2-Tables"
fnames    = os.path.join(path,"*res")
pts_files = glob.glob(fnames)

In [None]:
pts_files

In [None]:
date_format = "%d%b%Y_%H:%M"
# The city name is the second dot-separated field of the file name
cities      = [os.path.basename(s).split('.')[1] for s in pts_files]
columns     = ['load',
               'total',
               'pm05',
               'pm10',
               'pm20',
              ]

def read_pts(fname):
    """Read one tracking-point file and parse its index as datetimes."""
    table = pd.read_csv(fname,
                        skiprows = 7,
                        sep      = r"\s+",
                        names    = columns,
                       )
    table.index = pd.to_datetime(table.index, format=date_format)
    return table

df = pd.concat((read_pts(f) for f in pts_files),
               keys=cities,
               names=["city","datetime"],
              )


The resulting `DataFrame` has a MultiIndex (hierarchical index):

In [None]:
df
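
How `keys` and `names` build the hierarchical index can be seen in a small sketch with two hypothetical per-city tables (the values are made up): each key labels the rows of one input table in the outer level, and `names` labels the two index levels.

```python
import pandas as pd

idx = pd.date_range("2015-12-03 09:00", periods=2, freq="h")
# Hypothetical per-city tables with made-up values
tables = {
    "CATANIA": pd.DataFrame({"load": [0.1, 0.2]}, index=idx),
    "MESSINA": pd.DataFrame({"load": [0.3, 0.4]}, index=idx),
}

combined = pd.concat(list(tables.values()),
                     keys=list(tables.keys()),
                     names=["city", "datetime"])
print(list(combined.index.names))
print(combined.loc["MESSINA"])   # selecting on the outer level drops it
```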

In [None]:
# Convert grams/m3 to micro-grams/m3
df.loc[:,df.columns != 'load'] *= 1E6
df.head()

In [None]:
df.loc[('ATHENS','2015-12-03 13'),:]

In [None]:
df.groupby("city").max()

In [None]:
df.loc['MESSINA'].plot(y = ["total", "pm20"])
df.loc['MESSINA'].plot(y = "load")
plt.show()

We can remove the multilevel indexing by moving the `city` level back into a regular column:

In [None]:
df.reset_index(level=0,inplace=True)

In [None]:
df
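
As an illustrative sketch on an invented two-city table, `groupby` accepts the name of an index level, and `reset_index(level=0)` turns the outer level into a column while keeping the datetime index:

```python
import pandas as pd

# Hypothetical two-level index: city x datetime
idx = pd.MultiIndex.from_product(
    [["CATANIA", "MESSINA"],
     pd.date_range("2015-12-03 09:00", periods=2, freq="h")],
    names=["city", "datetime"],
)
demo = pd.DataFrame({"load": [0.1, 0.2, 0.3, 0.4]}, index=idx)

by_city_max = demo.groupby("city").max()   # group by an index level name
flat = demo.reset_index(level=0)           # 'city' becomes a regular column
print(by_city_max)
print(flat.columns.tolist(), flat.index.name)
```

After the reset, `groupby('city')` still works because pandas resolves the name against the columns as well as the index levels.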

In [None]:
df.groupby('city').plot()
plt.show()

## Matplotlib, figures, axes, and subplots

We use matplotlib's `pyplot` interface to create figures, and within each figure one or more axes objects can be created. These axes objects are then used for most plotting actions.

First, import `pyplot` and create a figure:
```python
import matplotlib.pyplot as plt 

fig, ax = plt.subplots()
```

`fig` is the figure and represents your entire graphic. Each individual plot inside it is called a subplot or axes and is represented by `ax`.

For example, the following creates one figure with two side-by-side axes for each city:

In [None]:
for (name,dataframe) in df.groupby('city'):
    fig, (ax1,ax2) = plt.subplots(ncols=2,figsize=(15,5))
    dataframe.plot(y='total', title=name, ax=ax1)
    dataframe.plot(y='load',  title=name, ax=ax2)
    ax1.set_ylabel(r'Concentration [$\mu g/m^3$]')
    ax2.set_ylabel(r'Deposit load [$g/m^2$]')

plt.show()    