# Python in practice: part 3 - Pandas

This notebook provides resources for practising Python.

For each method, make sure you understand how it works. You can look things up online (official documentation and forums).
Feel free to modify the cells to test configurations other than the ones given here.

If you want to know which methods are available on a particular object, use the ``dir(obj)`` function.
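For instance, here is a minimal sketch that keeps only the public (non-underscore) names returned by `dir`; the toy frame `df` is hypothetical and only serves as something to inspect:

```python
import pandas as pd

# Toy DataFrame, only used to inspect its attributes
df = pd.DataFrame({"a": [1, 2, 3]})

# dir() returns every attribute name; keep only the public ones
public = [name for name in dir(df) if not name.startswith("_")]

print("mean" in public, "groupby" in public)  # True True
```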

In [None]:
# Usual import
import pandas as pd

?pd.DataFrame

### **Create a DataFrame object**

In [None]:
l = []
type(l)

In [None]:
t = ()
type(t)

In [None]:
# Option 1
data = [('Mary', 968),
        ('Jessica', 155),
        ('Bob', 578),
        ('John', 403),
        ('Mel', 199)]
columns = ['Names', 'Births']

df = pd.DataFrame(data=data, columns=columns)
df

In [None]:
data = {}
type(data)

In [None]:
# Option 2
data = {
    'Names': ['Mary', 'Jessica', 'Bob', 'John', 'Mel'],
    'Births': [968, 155, 578, 403, 199]
}

df = pd.DataFrame(data=data)
df
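As a sanity check, the sketch below builds the same small table in three ways (list of tuples, dict of lists, and a list of dicts, the last one not shown above) and compares the results with `DataFrame.equals`:

```python
import pandas as pd

rows = [('Mary', 968), ('Jessica', 155), ('Bob', 578), ('John', 403), ('Mel', 199)]

# Option 1: list of tuples + column names
df1 = pd.DataFrame(data=rows, columns=['Names', 'Births'])

# Option 2: dict of lists, one list per column
df2 = pd.DataFrame({'Names': [r[0] for r in rows], 'Births': [r[1] for r in rows]})

# Another common form: list of dicts, one dict per row
df3 = pd.DataFrame([{'Names': n, 'Births': b} for n, b in rows])

print(df1.equals(df2), df1.equals(df3))
```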

In [None]:
# Print the type of the columns
df.dtypes

In [None]:
# Select one or more columns

# Option 1, the output is a Series
#print(df.Births, end="\n\n")

# Option 2, the output is a Series
#print(df["Births"], end="\n\n")

# Option 3, the output is a Series
#print(df.loc[:, "Births"], end="\n\n")

# Option 4, with a list of column labels, the output is a DataFrame
df.loc[:, ["Births", "Names"]]
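The difference between a single label and a list of labels can be checked directly. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Names": ["Mary", "Jessica"], "Births": [968, 155]})

one_col = df["Births"]          # a single label gives a Series
one_col_frame = df[["Births"]]  # a list of labels gives a DataFrame, even with one label

print(type(one_col).__name__, type(one_col_frame).__name__)  # Series DataFrame
print(one_col_frame.shape)  # (2, 1)
```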

In [None]:
# Basic operations on DataFrame objects
df.Births
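A few typical operations on a column, sketched on the births table: a reduction, element-wise arithmetic that creates a new column (the `Share` column is introduced here for illustration), and filtering with a boolean mask:

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["Mary", "Jessica", "Bob", "John", "Mel"],
    "Births": [968, 155, 578, 403, 199],
})

total = df.Births.sum()            # reduction over the column
df["Share"] = df.Births / total    # element-wise division creates a new column
above_400 = df[df.Births > 400]    # boolean mask keeps the matching rows

print(total)                       # 2303
print(above_400.Names.tolist())    # ['Mary', 'Bob', 'John']
```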

### **Read an external file (CSV, XLS)**

In [None]:
?pd.read_csv

In [None]:
weather = pd.read_csv("weather.csv")
weather

In [None]:
weather = pd.read_csv("weather.csv", sep=';')
weather
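Why the `sep` argument matters can be reproduced without the file: the sketch below parses a small in-memory, semicolon-separated text (the columns and values are made up, not taken from `weather.csv`). With the default comma separator, every line ends up in a single column:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content
text = "date;temp_max;temp_min\n2012-01-01;12.8;5.0\n2012-01-02;10.6;2.8\n"

bad = pd.read_csv(io.StringIO(text))             # default sep=","
good = pd.read_csv(io.StringIO(text), sep=";")   # correct separator

print(bad.shape, good.shape)  # (2, 1) (2, 3)
```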

In [None]:
weather.dtypes

In [None]:
# Read an Excel file
?pd.read_excel

### **Describe the data**

In [None]:
sample = weather.head(5)

In [None]:
sample.index = ["row0", "row1", "row2", "row3", "row4"]

In [None]:
sample

In [None]:
sample.set_index("date")
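Note that `set_index` returns a new DataFrame and leaves the original untouched unless you reassign the result. A small sketch with hypothetical `date` and `temp_max` values:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2012-01-01", "2012-01-02"], "temp_max": [12.8, 10.6]})

indexed = df.set_index("date")   # new object; df is not modified

print(df.columns.tolist())                     # ['date', 'temp_max']
print(indexed.loc["2012-01-02", "temp_max"])   # 10.6
```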

In [None]:
weather.describe()
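By default `describe` only summarises numeric columns; calling it on a text column gives counts and the most frequent value instead. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "temp_max": [10.0, 12.0, 14.0],
    "weather": ["rain", "sun", "sun"],
})

num_summary = df.describe()               # numeric columns only
text_summary = df["weather"].describe()   # count, unique, top, freq

print(num_summary.loc["mean", "temp_max"])          # 12.0
print(text_summary["top"], text_summary["freq"])    # sun 2
```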

### **Select data**

In [None]:
sample = weather.loc[[1, 3], :]
sample

*Exercise:*
* *What is the result of `sample.loc[0]`?*
* *How to get the first row of the data frame `sample`?*
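To experiment with the difference between label-based (`loc`) and position-based (`iloc`) selection, here is a sketch on a hypothetical frame with the default index 0 to 4:

```python
import pandas as pd

df = pd.DataFrame({"temp_max": [12.8, 10.6, 11.7, 12.2, 8.9]})  # default index 0..4
sample = df.loc[[1, 3], :]   # loc selects by label: keeps labels 1 and 3

label_missing = 0 not in sample.index
try:
    sample.loc[0]            # label 0 is no longer in the index
except KeyError:
    print("KeyError: label 0 not found")

first = sample.iloc[0]       # iloc selects by position: the first row (label 1)
print(first["temp_max"])     # 10.6
```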

In [None]:
# iloc selects by position: iloc[1] is the second row (label 3), the first row is sample.iloc[0]
sample.iloc[1]

### **Sort the data**

In [None]:
weather

In [None]:
weather.sort_values(by="temp_max", ascending=True, inplace=True)

In [None]:
weather
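Without `inplace=True`, `sort_values` returns a sorted copy; it also accepts several keys with one sort direction per key. A sketch on hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"month": ["Jan", "Jan", "Feb"], "temp_max": [5.0, 8.0, 3.0]})

# Descending order, returned as a new DataFrame
desc = df.sort_values(by="temp_max", ascending=False)

# Sort by month first, then by decreasing temp_max within each month
multi = df.sort_values(by=["month", "temp_max"], ascending=[True, False])

print(desc.temp_max.tolist())   # [8.0, 5.0, 3.0]
print(multi.temp_max.tolist())  # [3.0, 8.0, 5.0]
print(df.temp_max.tolist())     # [5.0, 8.0, 3.0], original unchanged
```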

### **Statistical functions**

In [None]:
weather.mean?

In [None]:
# Compute the mean value of each numeric column (text columns are skipped)
weather.mean(axis=0, numeric_only=True)

In [None]:
weather.mean(axis=1, numeric_only=True)

*Exercise:*

* *Compute the mean value of each row*
* *Compute the median, maximum and minimum of each row and each column, using the `median`, `max` and `min` methods*
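The `axis` argument decides the direction of the reduction: `axis=0` gives one value per column, `axis=1` one value per row. In recent pandas versions, text columns make these reductions fail unless `numeric_only=True` is passed. A sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({
    "weather": ["rain", "sun"],
    "temp_max": [10.0, 14.0],
    "temp_min": [2.0, 6.0],
})

col_means = df.mean(axis=0, numeric_only=True)  # one value per column
row_means = df.mean(axis=1, numeric_only=True)  # one value per row

print(col_means.tolist())  # [12.0, 4.0]
print(row_means.tolist())  # [6.0, 10.0]
```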

### **Group and combine**

In [None]:
weather

In [None]:
# Create a "groupby" object, i.e. split
gb = weather.groupby("month")

# Compute the mean values for each group, i.e. combine
weather_by_months = gb.mean(numeric_only=True)
weather_by_months

In [None]:
gb.count()

In [None]:
gb.min()

In [None]:
# Or using another combine function f

def f(group):
    # Median of the numeric columns of one group
    return group.median(numeric_only=True)

gb.apply(f)
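Several combine functions can also be applied at once to a selected column with `agg`. A sketch of the split-apply-combine pattern on hypothetical monthly data:

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "January", "February"],
    "temp_max": [5.0, 9.0, 3.0],
    "weather": ["rain", "sun", "snow"],
})

gb = df.groupby("month")                                  # split
means = gb["temp_max"].mean()                             # one statistic per group
summary = gb["temp_max"].agg(["min", "max", "median"])    # several statistics per group

print(means["January"])                 # 7.0
print(summary.loc["January", "max"])    # 9.0
```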

*Exercise:*
* *Compute the number of days for each category of the variable `weather`*
* *Use the `groupby` and `count` methods*

In [None]:
gb = weather.groupby("weather")

In [None]:
gb.count()
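Note that `count` counts the non-missing values of each column, whereas `size` counts the rows of each group; `value_counts` on the column gives the same per-category totals as `size`. A sketch with a hypothetical missing value:

```python
import pandas as pd

df = pd.DataFrame({
    "weather": ["rain", "sun", "sun", "fog", "sun"],
    "temp_max": [8.0, 12.0, float("nan"), 9.0, 15.0],
})

counts = df.groupby("weather").count()   # non-missing values per column
sizes = df.groupby("weather").size()     # number of rows per group
vc = df["weather"].value_counts()        # same totals as size()

print(counts.loc["sun", "temp_max"], sizes["sun"], vc["sun"])  # 2 3 3
```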

### **Sort the indexes of a DataFrame object**

In [None]:
?weather_by_months.sort_index

In [None]:
weather_by_months

In [None]:
weather_by_months.sort_index()

*Well, this is not really what we want: the months are sorted alphabetically, not chronologically...*

In [None]:
# Here, just select the rows in the right (chronological) order
ordered_months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
                  'August', 'September', 'October', 'November', 'December']
weather_by_months.loc[ordered_months]
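Another option, sketched below on hypothetical values, is to keep `sort_index` but give it a `key` function (available since pandas 1.1) that maps each month name to its position in the year:

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
month_number = {name: i for i, name in enumerate(months)}

# Hypothetical monthly means, indexed by month name
by_month = pd.DataFrame({"temp_max": [13.0, 7.0, 10.0]},
                        index=["March", "January", "February"])

# sort_index(key=...) sorts by the mapped values instead of the labels
chronological = by_month.sort_index(key=lambda idx: idx.map(month_number))
print(chronological.index.tolist())  # ['January', 'February', 'March']
```

Unlike `.loc[ordered_months]`, this does not raise an error when some months are missing from the index.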

### **Concatenate two data frames**

Suppose you want to compute `temp_max - temp_min` and add it as a new column to the data frame `weather_by_months` using the `pd.concat` function.

In [None]:
# delta is a Series
delta = weather_by_months.temp_max - weather_by_months.temp_min
delta

In [None]:
?pd.concat

In [None]:
# With the default axis=0, delta is appended below as extra rows: not what we want
pd.concat([weather_by_months, delta])

In [None]:
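# With axis=1, delta is aligned on the index and added as a new column;
# naming the Series gives that column a meaningful label
pd.concat([weather_by_months, delta.rename("delta")], axis=1)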

In [None]:
# A simple solution for that specific case would be
weather_by_months["delta"] = weather_by_months.temp_max - weather_by_months.temp_min
weather_by_months
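Both approaches give the same table when the Series is named. The sketch below checks this on a hypothetical two-month frame, using `assign`, which returns a new DataFrame instead of modifying the original:

```python
import pandas as pd

wbm = pd.DataFrame({"temp_max": [5.0, 8.0], "temp_min": [1.0, 2.0]},
                   index=["January", "February"])

delta = (wbm.temp_max - wbm.temp_min).rename("delta")

via_concat = pd.concat([wbm, delta], axis=1)                     # align on index, add column
via_assign = wbm.assign(delta=wbm.temp_max - wbm.temp_min)       # returns a new DataFrame

print(via_concat.equals(via_assign))  # True
```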

### **Reshaping data: pivot, melt**

In [None]:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
df

In [None]:
df2 = df.pivot(index="foo", columns="bar", values="baz")
df2

In [None]:
# Without id_vars, the foo index is dropped: only the bar and value columns remain
pd.melt(df2)

In [None]:
# Turn the foo index back into a column and keep it as an identifier variable
pd.melt(df2.reset_index(), id_vars=["foo"])
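`pivot` and `melt` are roughly inverse operations. The sketch below goes from long to wide and back, then checks that the `(foo, bar, baz)` triples are recovered (the `zoo` column is left out, since `pivot` with `values="baz"` does not keep it):

```python
import pandas as pd

df = pd.DataFrame({"foo": ["one", "one", "one", "two", "two", "two"],
                   "bar": ["A", "B", "C", "A", "B", "C"],
                   "baz": [1, 2, 3, 4, 5, 6]})

# Long -> wide: one row per foo, one column per bar
wide = df.pivot(index="foo", columns="bar", values="baz")

# Wide -> long: foo becomes a column again and is kept as identifier
long = wide.reset_index().melt(id_vars="foo", var_name="bar", value_name="baz")
long = long.sort_values(["foo", "bar"]).reset_index(drop=True)

print(long)
```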