# Pandas notebook

## Getting started with Pandas

Start by loading the `pandas` library (with alias `pd`), then load the dataset `airfoil2.csv` using the `read_csv()` function; call the resulting dataframe `df`.
Use the `head()` method to see what `df` looks like.

Note that `read_csv()` is very flexible and can accommodate all sorts of files.
You will cover this in much more detail in module 2.
For now, we're giving you a nicely formatted dataset that directly works well with Pandas.
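
To give a feel for that flexibility, here is a minimal sketch that reads a small semicolon-separated table from a string instead of a file. The data, the column names `freq` and `angle`, and the variable `toy` are made up for illustration; only `sep` and `comment` are actual `read_csv()` parameters.

```python
import io
import pandas as pd

# Hypothetical toy data (not the airfoil file): semicolon-separated,
# with a leading comment line that should be ignored.
raw = """# toy measurements
freq;angle
800;0.0
1000;1.5
"""

# sep sets the field delimiter; comment skips lines starting with '#'
toy = pd.read_csv(io.StringIO(raw), sep=";", comment="#")
print(toy.head())
```

The same call works with a file path in place of the `io.StringIO` object.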

In [5]:
# import the library
import pandas as pd
# load the dataframe, use "head" to have a look
df = pd.read_csv('data/airfoil.csv')
df.head()

### Retrieving some basic information

Now that you have a DataFrame object `df`, you can explore the kind of information that is stored (beyond the actual data). Using TAB completion, you can get an idea of all the methods and attributes that you may want to use.

Examples of useful attributes

* `shape` stores the dimensions of the dataframe
* `columns` stores the names of the columns
* `index` stores the row labels; by default pandas uses a range from 0 to the number of rows minus one
* `dtypes` stores the `dtype` of each column

Display all of these and check that they match what you expected from the output of `head` used earlier.

In [17]:
# add your code here to explore the metadata of df
print('Dimension {}'.format(df.shape))
print('Columns {}'.format(df.columns))
print('Row names {}'.format(df.index))
print('Type {}'.format(df.dtypes))

Dimension (1503, 6)
Columns Index(['Frequency [Hz]', 'Angle [deg]', 'Chord length [m]',
       'FS velocity [m/s]', 'SSD thickness [m]', 'Sound pressure [dB]'],
      dtype='object')
Row names RangeIndex(start=0, stop=1503, step=1)
Type Frequency [Hz]           int64
Angle [deg]            float64
Chord length [m]       float64
FS velocity [m/s]      float64
SSD thickness [m]      float64
Sound pressure [dB]    float64
dtype: object
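
The airfoil data uses the default row labels, so it may help to see the same attributes on a frame whose index is set explicitly. The following is only a sketch on a made-up dataframe; `toy`, its column names and its row labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frame with explicit row labels instead of the default range
toy = pd.DataFrame(
    {"freq": [800, 1000, 1250], "level": [126.2, 125.2, 125.9]},
    index=["a", "b", "c"],
)

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # column names
print(list(toy.index))    # row labels, here 'a', 'b', 'c'
print(toy.dtypes)         # one dtype per column
```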


## Accessing elements in a dataframe

Let's get the 11th value of Frequency (the row at index 10) using several different approaches:

1. retrieve the series and then access the value at index 10
1. using `loc`
1. (bonus) using `iloc`

In [41]:
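# add your code here
ser = df['Frequency [Hz]']
print('series', ser[10])
print('loc', df.loc[10, 'Frequency [Hz]'])
# in IPython/Jupyter, a trailing "?" shows the documentation for loc
df.loc?
print('iloc', df.iloc[10, 0])

series 8000
loc 8000
iloc 8000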
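
With the default index, labels and positions coincide, which hides the difference between `loc` and `iloc`. As a rough sketch on a made-up frame (the names `toy`, `by_label`, `by_position` and `via_series` are assumptions), the three approaches look like this when the row labels differ from the positions:

```python
import pandas as pd

# Hypothetical toy frame: row labels 10, 20, 30 sit at positions 0, 1, 2
toy = pd.DataFrame({"freq": [800, 1000, 1250]}, index=[10, 20, 30])

by_label = toy.loc[20, "freq"]      # loc: row *labelled* 20, column named 'freq'
by_position = toy.iloc[1, 0]        # iloc: second row, first column
via_series = toy["freq"].iloc[1]    # retrieve the series, then index by position

print(by_label, by_position, via_series)
```

All three expressions return the same element; only the way of addressing it changes.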


### Using loc for fancy selections

Using `loc`, can you retrieve the sub-dataframe with all the columns whose names have strictly more than 15 characters? Call this `df2`. Using `to_csv`, write it out as a tab-separated file (not comma-separated) and call the file `airfoil2_2.dat`.

(Open the file in an editor to check it matches what you expect).

In [50]:
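# add your code here
# rows labelled 0 to 5 (loc slices include the end label),
# columns whose names are longer than 15 characters
df2 = df.loc[0:5, [c for c in df.columns if len(c) > 15]]
# write df2 as a tab-separated file
df2.to_csv('airfoil2_2.dat', sep='\t')
# all columns, but only the rows where the frequency exceeds 800 Hz
df3 = df.loc[df['Frequency [Hz]'] > 800, :]
print(df2)
print(df3)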

   Chord length [m]  FS velocity [m/s]  SSD thickness [m]  Sound pressure [dB]
0            0.3048               71.3           0.002663              126.201
1            0.3048               71.3           0.002663              125.201
2            0.3048               71.3           0.002663              125.951
3            0.3048               71.3           0.002663              127.591
4            0.3048               71.3           0.002663              127.461
5            0.3048               71.3           0.002663              125.571
      Frequency [Hz]  Angle [deg]  Chord length [m]  FS velocity [m/s]  \
1               1000          0.0            0.3048               71.3   
2               1250          0.0            0.3048               71.3   
3               1600          0.0            0.3048               71.3   
4               2000          0.0            0.3048               71.3   
5               2500          0.0            0.3048               71.3   
6  
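
To keep every row rather than a slice, `:` can be used as the row selector. Here is a minimal sketch on a made-up frame that selects the long-named columns and writes them tab-separated into an in-memory buffer, so the result can be checked without opening an editor. The names `toy`, `toy2`, `long_cols` and `buffer` are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical toy frame with one short and two long column names
toy = pd.DataFrame(
    {
        "f": [800, 1000],
        "chord length [m]": [0.3048, 0.3048],
        "sound pressure [dB]": [126.201, 125.201],
    }
)

# ':' keeps every row; the list keeps columns with more than 15 characters
long_cols = [c for c in toy.columns if len(c) > 15]
toy2 = toy.loc[:, long_cols]

# sep='\t' gives tab-separated output; index=False drops the row labels
buffer = io.StringIO()
toy2.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```

Passing a file name such as `'airfoil2_2.dat'` instead of `buffer` writes the same content to disk.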

### Working with a pd.Series

Retrieve the series corresponding to the sound pressure from the dataframe, then display:

* the name of the series
* the `shape` attribute of the series (does it correspond to what you expected?)
* the mean and the median
* the mean of the squared values

In [56]:
# add your code here
sd = df['Sound pressure [dB]']
print(sd.name)



Sound pressure [dB]
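
The remaining items can be approached along the following lines. This is a sketch on a small made-up series, not on the airfoil data; `toy`, `level` and `ser` are assumed names.

```python
import pandas as pd

# Hypothetical toy frame with a single numeric column
toy = pd.DataFrame({"level": [1.0, 2.0, 3.0, 6.0]})
ser = toy["level"]

print(ser.name)                  # the column label the series came from
print(ser.shape)                 # one-element tuple: a series is one-dimensional
print(ser.mean(), ser.median())  # 3.0 2.5
print((ser ** 2).mean())         # square element-wise, then average: 12.5
```

Note that `shape` is a tuple with a single entry, since a series has rows but no columns.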
