<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="../media/ensmp-25-alpha.png" /></span>
</div>

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
import pandas as pd
import numpy as np

# pandas DataFrame plots in Matplotlib
   - boxplot, histogram, barchart...
   - plotting pairs of columns

### boxplot

we prepare a **DataFrame** of **adults** with random **age**, **gender**, **height** and **weight**

In [None]:
N = 40  # number of rows
df = pd.DataFrame({'age': np.random.randint(18, 99, size=N),
                   'gender': np.random.choice(['F', 'M'], size=N),
                   'height': np.random.randint(140, 189, size=N) / 100,
                   'weight': np.random.randint(350, 890, size=N) / 10})
df.head(3)

### drawing the boxplot

In [None]:
# both columns have a similar range of values, so they can share one axis
df.boxplot(['age', 'weight']);

In [None]:
df.boxplot(['height']);

we add some **outliers**

In [None]:
df.loc[0, 'height'] = 2.5
df.loc[1, 'height'] = 2.6
df.loc[2, 'height'] = 0.8
df.loc[3, 'height'] = 0.6

In [None]:
df.boxplot(['height']);
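
the points drawn beyond the whiskers can also be listed by hand. A minimal sketch, assuming matplotlib's default whisker rule (1.5 times the interquartile range) and a small made-up `heights` Series rather than the random data above:

```python
import pandas as pd

# hypothetical sample: plausible heights plus the four outlier values added above
heights = pd.Series([1.62, 1.75, 1.58, 1.80, 1.69, 1.71, 1.66,
                     2.5, 2.6, 0.8, 0.6])

# quartiles and interquartile range
q1 = heights.quantile(0.25)
q3 = heights.quantile(0.75)
iqr = q3 - q1

# values outside [q1 - 1.5*iqr, q3 + 1.5*iqr] are drawn as individual points
low = q1 - 1.5 * iqr
high = q3 + 1.5 * iqr
outliers = heights[(heights < low) | (heights > high)]
print(sorted(outliers.tolist()))
```

on this sample the four extreme values are the only ones outside the whisker limits.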

### histogram

In [None]:
df.hist();

In [None]:
df.hist(['height'],
        grid=False,
        bins=20);
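
`bins=20` splits the range of values into 20 intervals of equal width and counts the values that fall in each one. A sketch of that counting with `numpy.histogram`, on made-up heights drawn with a seeded generator (the seed and the variable names are assumptions, not part of the notebook):

```python
import numpy as np

# hypothetical data: 40 heights in the same range as the 'height' column above
rng = np.random.default_rng(0)
heights = rng.integers(140, 189, size=40) / 100

# counts: number of values per interval, edges: the interval boundaries
counts, edges = np.histogram(heights, bins=20)
print(counts.sum())   # every value lands in exactly one interval
print(len(edges))     # 20 intervals need 21 boundaries
```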

### barchart

In [None]:
# animals with their speed and lifespan
df = pd.DataFrame({'speed': [0.1, 17.5, 40, 48, 52, 69, 88],
                   'lifespan': [2, 8, 70, 1.5, 25, 12, 28]},
                  index=['snail', 'pig', 'elephant',
                         'rabbit', 'giraffe', 'coyote', 'horse'])

In [None]:
df.plot.barh();


In [None]:
# x: column used to label the bars, y: column giving the bar lengths
ax = df.plot.barh(x='lifespan',
                  y='speed')
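
a bar chart is often easier to read when the bars are ordered. One possible sketch, which rebuilds the same table under the hypothetical name `animals`, sorts it by lifespan and draws that single column while keeping the animal names (the index) as bar labels:

```python
import pandas as pd
import matplotlib.pyplot as plt

# same values as the animals table above, under a hypothetical name
animals = pd.DataFrame({'speed': [0.1, 17.5, 40, 48, 52, 69, 88],
                        'lifespan': [2, 8, 70, 1.5, 25, 12, 28]},
                       index=['snail', 'pig', 'elephant',
                              'rabbit', 'giraffe', 'coyote', 'horse'])

# sort the rows, then draw one horizontal bar per animal
ordered = animals.sort_values('lifespan')
fig, ax = plt.subplots()
ordered.plot.barh(y='lifespan', ax=ax, legend=False)
ax.set_xlabel('lifespan')
```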

### plotting a **dataframe**

   - the **dataset** contains the French nuclear and renewable **electricity production** between 1970 and 2011

In [None]:
# location of the file on your disk
filename = 'france-prod-elec-enr-nucl.csv'

# using pandas to read the data 
# and turn it into a dataframe
df = pd.read_csv(filename, index_col=0)


In [None]:
# a quick glimpse
df.head()

a *pandas.DataFrame* object has a **method** to **plot**
   - the **method** is **built on** **matplotlib.pyplot.plot**

In [None]:
# column names are used as legend labels, one per curve
df.plot();
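
the same mechanics can be tried without the CSV file. A minimal sketch on a tiny table whose values are invented for illustration (they are not real production figures, and the names `demo`, `nuclear` and `renewable` are assumptions): the index goes on the x axis and each column becomes one line labelled with its name.

```python
import pandas as pd
import matplotlib.pyplot as plt

# made-up values, only here to show how DataFrame.plot labels the curves
demo = pd.DataFrame({'nuclear': [10, 60, 300, 420],
                     'renewable': [55, 70, 60, 65]},
                    index=[1970, 1980, 1990, 2000])

fig, ax = plt.subplots()
demo.plot(ax=ax)

# one line per column, each labelled with the column name
labels = [line.get_label() for line in ax.get_lines()]
print(labels)
```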

   - a *pandas.Series* object also has a **method** to **plot**
   - and again, it is **built on** **matplotlib.pyplot.plot**

In [None]:
# this time there is no legend, but it is still quite convenient
(df['prod brute elec nucleaire'] + df['prod brute elec primaire renouv']).plot();
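
the sum of two columns is a Series without a name, which is why no label shows up. A possible sketch, again on invented values and hypothetical names, that names the Series with `rename` and asks for a legend:

```python
import pandas as pd
import matplotlib.pyplot as plt

# made-up values, not real production figures
demo = pd.DataFrame({'nuclear': [10, 60, 300, 420],
                     'renewable': [55, 70, 60, 65]},
                    index=[1970, 1980, 1990, 2000])

# adding two differently named columns gives an unnamed Series; name it
total = (demo['nuclear'] + demo['renewable']).rename('total')

fig, ax = plt.subplots()
total.plot(ax=ax, legend=True)   # the Series name becomes the legend label
```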

### plotting pairs of columns

- **2D plots** can show **interesting** information **thanks to**: **colors**, **shapes**, ...

   - we use the well-known **iris flowers** dataset
   - it describes three types of **iris**: **virginica**, **versicolor** and **setosa**
   - by the **length** and the **width** of their **sepals** and **petals**
   - there are $50$ irises of each type

In [None]:
# load the dataset; the next cells give a rough overview of it
df = pd.read_csv('iris123.csv')

In [None]:
df.head(2)

In [None]:
df.dtypes

we plot one column as a function of another
   - this way, a simple plot can give you a lot of information on your data

In [None]:
df.plot.scatter(x='sepal length', y='sepal width');
# all points have the same color! this is not very informative...

   - it will be more **informative** if you add the **type**!
   - with the `c` parameter of the **scatter** function

In [None]:
plt.scatter(df['sepal length'], df['sepal width'], c=df['type']);  # parameter c: one color per type
# this way it is far more informative!
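
passing a column to `c` works when it holds numbers, which the `df.dtypes` cell lets you check. If the type were stored as species names instead, the names would first have to be turned into numbers, since they are not valid color specifications. One possible sketch, on a hypothetical six-row table named `flowers`:

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical extract with the type stored as text
flowers = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'type': ['setosa', 'setosa', 'versicolor',
             'versicolor', 'virginica', 'virginica'],
})

# one integer code per distinct name (categories are sorted alphabetically)
codes = flowers['type'].astype('category').cat.codes

fig, ax = plt.subplots()
points = ax.scatter(flowers['sepal length'], flowers['sepal width'], c=codes)
```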
 

   - we can **use** some other column to set the **size** of the **markers**, with the `s` parameter

In [None]:
plt.scatter(df['sepal length'], df['sepal width'],
            c=df['type'],
            s=df['petal width'] * 50);  # parameter s: marker size
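
in matplotlib, `s` is the marker area in points squared, so the factor 50 is only a visual scaling chosen to make the differences visible. A small sketch on a hypothetical six-row table named `flowers`, checking that each marker receives the size computed from its row:

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical extract, values chosen for illustration
flowers = pd.DataFrame({
    'sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# marker areas: a wider petal gives a bigger marker
sizes = flowers['petal width'] * 50

fig, ax = plt.subplots()
points = ax.scatter(flowers['sepal length'], flowers['sepal width'], s=sizes)
print(points.get_sizes())   # the areas actually stored on the plot
```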