# Introduction to the Pandas module

## Scope
This notebook presents some key functions for working with tabular data using the pandas module (https://pandas.pydata.org/).

Many examples and extensive documentation for this module are available online:

http://pandas.pydata.org/pandas-docs/stable/10min.html

http://www.python-simple.com/python-pandas/panda-intro.php

In [1]:
# Setup
%load_ext autoreload
%matplotlib notebook
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

### Load data and create a dataframe from a CSV file

More explanation can be found here: https://chrisalbon.com/python/data_wrangling/pandas_dataframe_importing_csv/

In [2]:
df = pd.read_csv("./_DATA/Note_csv.csv", delimiter=";")
df

Unnamed: 0,section,groupe,name,ET,CC
0,MM,A,ami,14.50,11.75
1,MM,A,joyce,8.50,11.50
2,MM,C,lola,9.50,13.25
3,MM,B,irma,7.50,6.00
4,IAI,D,florence,14.50,13.25
...,...,...,...,...,...
90,MM,A,james,13.75,12.75
91,IAI,D,richard,15.25,7.00
92,MM,A,caprice,18.25,15.00
93,IAI,D,al,12.50,9.75
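
To make the loading step reproducible without the data file, here is a minimal sketch that parses the same kind of semicolon-separated text from memory. The `csv_text` string is a hypothetical stand-in built from the first five rows displayed above; the empty first header field is an assumption suggested by the `Unnamed: 0` column, not something confirmed by the original file.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./_DATA/Note_csv.csv (first five displayed rows).
# Assumption: the header starts with an empty field, hence "Unnamed: 0".
csv_text = """;section;groupe;name;ET;CC
0;MM;A;ami;14.50;11.75
1;MM;A;joyce;8.50;11.50
2;MM;C;lola;9.50;13.25
3;MM;B;irma;7.50;6.00
4;IAI;D;florence;14.50;13.25
"""

# sep=";" and delimiter=";" are two names for the same option
sample = pd.read_csv(io.StringIO(csv_text), sep=";")
print(sample.columns.tolist())

# index_col=0 uses the unnamed first column as the row index instead
indexed = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0)
print(indexed.columns.tolist())
```

Passing `index_col=0` is one way to avoid carrying the `Unnamed: 0` column around; the rest of this notebook keeps the default behaviour.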


### Display the dataframe

In [3]:
# replace missing values with 0.0
df = df.fillna(0.0)
# return the beginning of the dataframe
df.head(10)

Unnamed: 0,section,groupe,name,ET,CC
0,MM,A,ami,14.5,11.75
1,MM,A,joyce,8.5,11.5
2,MM,C,lola,9.5,13.25
3,MM,B,irma,7.5,6.0
4,IAI,D,florence,14.5,13.25
5,MM,B,vi,11.0,7.5
6,MM,B,brian,14.0,16.25
7,MM,B,antoinette,14.5,17.0
8,IAI,D,fred,9.5,11.5
9,IAI,D,gaston,12.25,5.75
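
The cell above replaces missing values with `0.0` before displaying the dataframe. The sketch below, on a hypothetical two-student table (the names `s1` and `s2` and the missing value are invented for illustration), shows how to count missing values first and how the replacement affects a later mean.

```python
import numpy as np
import pandas as pd

# Hypothetical table: student s2 has no CC grade
grades = pd.DataFrame(
    {"name": ["s1", "s2"], "ET": [14.5, 8.5], "CC": [11.75, np.nan]}
)

missing_per_column = grades.isna().sum()  # number of missing values per column
print(missing_per_column)

filled = grades.fillna(0.0)  # returns a new dataframe, grades is unchanged
print(filled.CC.mean())      # the 0.0 now counts as a real grade
print(grades.CC.mean())      # NaN values are skipped by default
```

Whether a missing grade should count as zero is a grading decision rather than a pandas one; `fillna` only makes that choice explicit.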


In [4]:
# return the end of the dataframe
df.tail(10)

Unnamed: 0,section,groupe,name,ET,CC
85,MM,A,vin,11.0,13.0
86,MM,A,jeunesse,12.0,10.5
87,MM,A,victoire,11.75,12.0
88,MM,B,joseph,8.0,10.0
89,MM,A,félix,13.0,14.5
90,MM,A,james,13.75,12.75
91,IAI,D,richard,15.25,7.0
92,MM,A,caprice,18.25,15.0
93,IAI,D,al,12.5,9.75
94,MM,B,constance,3.0,7.0


### Selecting data in a dataframe

In [None]:
# get data from index 2
df.loc[2]

In [None]:
# get name from index 2
df.name[2]

In [None]:
# Slicing also works

df.name[2:6]
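
The selections above rely on the row labels being the integers 0, 1, 2, ... The following sketch, using the first five names and ET values displayed earlier, contrasts label-based access (`.loc`) with position-based access (`.iloc` and plain slices); the variable names are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "name": ["ami", "joyce", "lola", "irma", "florence"],
        "ET": [14.5, 8.5, 9.5, 7.5, 14.5],
    }
)

# .loc is label-based: row label 2, column "name"
print(sample.loc[2, "name"])

# A plain slice is positional and excludes the end
plain = sample["name"][2:4].tolist()
# A .loc slice is label-based and includes the end
by_label = sample.loc[2:4, "name"].tolist()
print(plain, by_label)

# After sorting, labels and positions no longer coincide
by_et = sample.sort_values("ET")
first_by_position = by_et.iloc[0]["name"]  # lowest ET grade
row_with_label_0 = by_et.loc[0, "name"]    # still the row labelled 0
print(first_by_position, row_with_label_0)
```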

### Get one column of the dataframe

In [None]:
df.groupe

### Get the number of students in each group

In [None]:
df.groupe.value_counts()

### Get the proportion of students in each group

In [None]:
df.groupe.value_counts(normalize=True)
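
A detail worth knowing before plotting: `value_counts` sorts its result by decreasing count, not by label. A small sketch on the `groupe` values of the first ten rows displayed earlier:

```python
import pandas as pd

# groupe column of the first ten rows shown by df.head(10)
groupe = pd.Series(["A", "A", "C", "B", "D", "B", "B", "B", "D", "D"], name="groupe")

counts = groupe.value_counts()                # sorted by decreasing count
shares = groupe.value_counts(normalize=True)  # same order, values sum to 1

print(counts.index.tolist())   # most frequent group first
print(shares.sort_index())     # alphabetical order: A, B, C, D
```

Calling `sort_index()` gives a stable alphabetical order, which is convenient whenever labels or colors are listed by hand.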

### Display the proportion of students in each group




***Using the plot function of pandas:***

The visualization options of pandas are described here: http://pandas.pydata.org/pandas-docs/version/0.18/visualization.html

In [None]:
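fig = plt.figure()
# value_counts() sorts by count, so sort_index() is needed to get the groups
# in alphabetical order; the wedge labels are then taken from the index
df.groupe.value_counts(normalize=True).sort_index().plot.pie(
    colors=["r", "g", "b", "y"], autopct="%.1f%%"
)
plt.show()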

***Using the plot function of matplotlib:***

In [None]:
# sort_index() orders the groups alphabetically so that values and labels match
shares = df.groupe.value_counts(normalize=True).sort_index()
val = shares.values
explode = (0.5, 0, 0.2, 0)  # offset of each wedge; assumes four groups
labels = shares.index
fig1, ax1 = plt.subplots()
ax1.pie(
    val, explode=explode, labels=labels, autopct="%1.1f%%", shadow=True, startangle=90
)
ax1.axis("equal")  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

### Get the list of students whose ET and CC grades are both above 14

In [None]:
df[(df.ET > 14.0) & (df.CC > 14.0)]
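
The expression inside the brackets is a boolean mask: one `True`/`False` value per row. As a sketch on the first ten rows displayed earlier, the same pattern selects students by group or by grade; the variable names are illustrative.

```python
import pandas as pd

# First ten rows shown by df.head(10)
sample = pd.DataFrame(
    {
        "groupe": ["A", "A", "C", "B", "D", "B", "B", "B", "D", "D"],
        "name": ["ami", "joyce", "lola", "irma", "florence",
                 "vi", "brian", "antoinette", "fred", "gaston"],
        "ET": [14.5, 8.5, 9.5, 7.5, 14.5, 11.0, 14.0, 14.5, 9.5, 12.25],
        "CC": [11.75, 11.5, 13.25, 6.0, 13.25, 7.5, 16.25, 17.0, 11.5, 5.75],
    }
)

in_group_a = sample.groupe == "A"                          # one boolean per row
both_above_14 = (sample.ET > 14.0) & (sample.CC > 14.0)    # & combines masks

group_a_names = sample[in_group_a]["name"].tolist()
top_names = sample[both_above_14]["name"].tolist()
print(group_a_names, top_names)

# query() expresses the same filter as a string
top_by_query = sample.query("ET > 14.0 and CC > 14.0")["name"].tolist()
```

The parentheses around each comparison are required because `&` binds more tightly than `>` in Python.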

### Make calculations on the data

In [None]:
df.ET.mean()  # the mean of the ET grade over all students

In [None]:
df.ET[df.groupe == "B"].mean()  # the mean of the ET grade over students from group B

In [None]:
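# numeric_only=True skips the text columns, which recent pandas versions refuse to average
df.groupby(["groupe"]).mean(numeric_only=True)  # compute the mean of each grade for each group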

In [None]:
df.groupby(["section"]).mean(numeric_only=True)  # compute the mean of each grade for each section
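
As a worked check of what `groupby` computes, the sketch below applies it to the first ten rows displayed earlier and restricts the aggregation to the two grade columns. Selecting the columns explicitly is one option among several and is not necessarily what the original notebook intended.

```python
import pandas as pd

# First ten rows shown by df.head(10)
sample = pd.DataFrame(
    {
        "section": ["MM", "MM", "MM", "MM", "IAI", "MM", "MM", "MM", "IAI", "IAI"],
        "groupe": ["A", "A", "C", "B", "D", "B", "B", "B", "D", "D"],
        "ET": [14.5, 8.5, 9.5, 7.5, 14.5, 11.0, 14.0, 14.5, 9.5, 12.25],
        "CC": [11.75, 11.5, 13.25, 6.0, 13.25, 7.5, 16.25, 17.0, 11.5, 5.75],
    }
)

# Mean of each grade column per group
group_means = sample.groupby("groupe")[["ET", "CC"]].mean()
print(group_means)

# Several statistics at once for a single column, per section
section_stats = sample.groupby("section")["ET"].agg(["mean", "count"])
print(section_stats)
```

For group B, the ET mean is (7.5 + 11.0 + 14.0 + 14.5) / 4 = 11.75, which matches the `B` row of `group_means`.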

### Display the grades with a histogram plot


In [None]:
# CC grades
fig = plt.figure()
df.CC.plot.hist(alpha=0.5, bins=np.arange(1, 20))
plt.show()

In [None]:
# ET grades
fig = plt.figure()
df.ET.plot.hist(alpha=0.5, bins=np.arange(1, 20))
plt.show()

In [None]:
fig = plt.figure()
# plot only the grade columns; otherwise the "Unnamed: 0" counter is plotted too
df[["ET", "CC"]].plot.hist(alpha=0.5, bins=np.arange(1, 20))
plt.show()
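
The `bins` argument lists the bin edges, so `np.arange(1, 20)` covers the interval from 1 to 19 only. The sketch below uses `np.histogram` on a few hypothetical grades to show that values outside the edges are silently dropped; the 0 to 20 grading scale used for the wider edges is an assumption.

```python
import numpy as np

# Hypothetical grades, including a 0.0 (e.g. a filled missing value) and a 19.5
grades = np.array([0.0, 3.0, 7.5, 14.5, 18.25, 19.5])

edges = np.arange(1, 20)                     # edges 1, 2, ..., 19 -> 18 bins
counts, _ = np.histogram(grades, bins=edges)
print(len(counts), counts.sum())             # 0.0 and 19.5 are not counted

wide_edges = np.arange(0, 21)                # assumed 0-20 scale -> 20 bins
wide_counts, _ = np.histogram(grades, bins=wide_edges)
print(len(wide_counts), wide_counts.sum())   # every grade is counted
```

This matters here because the dataframe was filled with `0.0` for missing values, and those zeros fall outside edges that start at 1.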

## Let's compute the weighted mean of both grades


### First we need to add a new column to the dataframe

In [None]:
df["FinalNote"] = 0.0  # add a column filled with 0.0
df

In [None]:
df.head()

### Let's compute the mean

In [None]:
df["FinalNote"] = 0.7 * df.ET + 0.3 * df.CC
# weighted mean computed row by row: ET counts for 70 % and CC for 30 %

In [None]:
df.head()
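
As a quick check of the formula, the first student (ET 14.5, CC 11.75) gets 0.7 × 14.5 + 0.3 × 11.75 = 10.15 + 3.525 = 13.675. The sketch below reproduces this on the first three rows and compares it with `np.average`, which is an alternative way to write the same weighted mean, not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# ET and CC grades of the first three rows shown earlier
sample = pd.DataFrame({"ET": [14.5, 8.5, 9.5], "CC": [11.75, 11.5, 13.25]})

# Column arithmetic is applied row by row
sample["FinalNote"] = 0.7 * sample.ET + 0.3 * sample.CC
print(sample)

# Same result with an explicit weighted average over the columns (axis=1)
alt = np.average(sample[["ET", "CC"]].to_numpy(), axis=1, weights=[0.7, 0.3])
print(alt)
```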

In [None]:
fig = plt.figure()
df.FinalNote.plot.hist(alpha=0.5, bins=np.arange(1, 20))
plt.show()

## What is the overall mean?

In [None]:
df.FinalNote.mean()
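
Because the final grade is a linear combination of ET and CC, its overall mean can also be obtained from the two column means with the same weights. A small sketch on the first four rows displayed earlier, as a sanity check rather than a required step:

```python
import pandas as pd

# ET and CC grades of the first four rows shown earlier
sample = pd.DataFrame(
    {"ET": [14.5, 8.5, 9.5, 7.5], "CC": [11.75, 11.5, 13.25, 6.0]}
)

final = 0.7 * sample.ET + 0.3 * sample.CC

overall = final.mean()                                        # mean of the weighted grades
from_means = 0.7 * sample.ET.mean() + 0.3 * sample.CC.mean()  # weighted mean of the means
print(overall, from_means)
```

Here the ET mean is 10.0 and the CC mean is 10.625, so both expressions give 0.7 × 10.0 + 0.3 × 10.625 = 10.1875 up to floating-point rounding.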