# Pandas notebook

## Getting started with Pandas

Start by loading the `pandas` library (with the alias `pd`), then load the dataset `airfoil2.csv` using the `read_csv()` function; call the resulting dataframe `df`.
Use the `head()` method to show what `df` looks like.

Note that the `read_csv()` function is very flexible and can accommodate all sorts of files.
You will look at this in much more detail in module 2.
For now, we're giving you a nicely formatted dataset that directly works well with Pandas.
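
To give a rough idea of that flexibility, here is a minimal sketch that reads semicolon-separated text containing a comment line. The data are made up for illustration and are not the airfoil dataset; `sep` and `comment` are standard `read_csv()` parameters.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with a comment line (not the airfoil data)
raw = io.StringIO(
    "# made-up measurements\n"
    "freq;angle\n"
    "800;0.0\n"
    "1000;1.5\n"
)

# `sep` sets the delimiter; `comment` makes the parser ignore text after a '#'
demo = pd.read_csv(raw, sep=";", comment="#")
print(demo)
```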

In [0]:
import pandas as pd
df = pd.read_csv("data/airfoil2.csv")
df.head()


### Retrieving some basic information

Now that you have a DataFrame object `df`, you can explore the kind of information that is stored alongside the actual data. Using TAB completion, you can get an idea of the methods and attributes that are available.

Examples of useful attributes:

* `shape` stores the dimensions of the data frame
* `columns` stores the names of the columns 
* `index` stores the names of the rows; by default, pandas uses a range from 0 to the number of rows minus one
* `dtypes` stores the `dtype` of each column

Show all of these and check that they match what you expected from the output of `head()` used earlier.

In [0]:
print(">> Shape attr:\n{}".format(df.shape))
print("\n>> Columns:\n{}".format(df.columns))
print("\n>> Index:\n{}".format(df.index))
print("\n>> Dtypes:\n{}".format(df.dtypes))
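
To check your reading of these attributes against known values, here is a small sketch on a made-up dataframe (the name `toy` and its contents are assumptions for illustration only):

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

print(toy.shape)          # (3, 2): number of rows, number of columns
print(list(toy.columns))  # ['a', 'b']
print(toy.index)          # a default range index running from 0 to 2
print(toy.dtypes)         # an integer dtype for 'a', float64 for 'b'
```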


## Accessing elements in a dataframe

Let's get the 11th value of Frequency using several different approaches:

1. retrieve the series and then access the 11th value (index 10)
1. using `loc`
1. (bonus) using `iloc`

In [0]:
print("1. {}".format(df['Frequency [Hz]'][10]))
print("2. {}".format(df.loc[10, 'Frequency [Hz]']))
print("3. {}".format(df.iloc[10, 0]))
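
With the default index, row labels and row positions coincide, which hides the difference between `loc` (label-based) and `iloc` (position-based). The following sketch uses made-up data with string labels to make the distinction visible; the names `s` and `toy` are hypothetical.

```python
import pandas as pd

# Made-up series with string labels, so labels and positions differ
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(s.loc["y"])  # label-based lookup: 20
print(s.iloc[1])   # position-based lookup: 20

# For a dataframe, loc takes (row label, column label)
# and iloc takes (row position, column position)
toy = pd.DataFrame({"f": [800, 1000, 1250]}, index=["x", "y", "z"])
print(toy.loc["z", "f"])  # 1250
print(toy.iloc[2, 0])     # 1250
```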


### Using loc for fancy selections

Using `loc[]`, can you retrieve the sub-dataframe with all the columns whose name has strictly more than 15 characters? Call this `df2`. Using `to_csv()`, write it out as a tab-separated (not comma-separated) file called `airfoil2_2.dat`.

(Open the file in an editor to check that it matches what you expect.)

In [0]:
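# keep only the columns whose name is longer than 15 characters
df2 = df.loc[:, [e for e in df.columns if len(e) > 15]]
# write a tab-separated file (the row index is included by default)
df2.to_csv("airfoil2_2.dat", sep='\t')
# last expression in the cell, so the notebook displays it
df2.head()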
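
As an alternative to the list comprehension, `loc` also accepts a boolean mask over the column labels. The sketch below uses a made-up dataframe with hypothetical column names and writes to an in-memory buffer rather than a file; `index=False` is an optional `to_csv()` argument that leaves out the row labels.

```python
import io

import pandas as pd

# Made-up frame; the column names are hypothetical
toy = pd.DataFrame({"short": [1, 2], "a_rather_long_column_name": [3, 4]})

# Boolean mask: True where the column name has more than 15 characters
mask = toy.columns.str.len() > 15
sub = toy.loc[:, mask]

# Write to an in-memory buffer instead of a file; index=False drops the row labels
buf = io.StringIO()
sub.to_csv(buf, sep="\t", index=False)
print(buf.getvalue())
```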


### Working with a pd.Series

Retrieve the series corresponding to the sound pressure from the dataframe, then display:

* the name of the series
* the shape attribute of the series (does it correspond to what you expected?)
* the mean and the median
* the mean of the squared values

In [0]:
sp = df['Sound pressure [dB]']
print("Name: {}".format(sp.name))
print("Shape: {}".format(sp.shape))
print("Mean: {0:.2f}, Median: {1:.2f}".format(sp.mean(), sp.median()))
print("Mean of sq. vals: {0:.2f}".format((sp**2).mean()))
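
To see what these quantities look like on numbers you can check by hand, here is a small sketch on a made-up series (the values and the name `demo` are assumptions, not taken from the airfoil data). Note that the mean of the squared values is generally not the same as the square of the mean.

```python
import pandas as pd

# Made-up values, not taken from the airfoil data
s = pd.Series([1.0, 2.0, 6.0], name="demo")

print(s.shape)          # (3,): a one-element tuple, since a Series is one-dimensional
print(s.mean())         # 3.0
print(s.median())       # 2.0
print((s ** 2).mean())  # (1 + 4 + 36) / 3, roughly 13.67
print(s.mean() ** 2)    # 9.0: the square of the mean, a different quantity
```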
