# Day 8 Class Exercises: Matplotlib

These class exercises combine what we've learned with NumPy, Pandas and Matplotlib. You will need to use the data wrangling skills you have learned to make the plots.

Additionally, these class exercises introduce a few new things. When new knowledge is introduced, you'll see the icon shown on the right:
<span style="float:right; margin-left:10px; clear:both;">![Task](../media/new_knowledge.png)</span>

## Get Started
Import the NumPy, Pandas and Matplotlib packages, and enable the Jupyter notebook Matplotlib magic.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Exercise 1. Load and clean the data for plotting

Import the Real Minimum Wages dataset from https://raw.githubusercontent.com/QuantEcon/lecture-source-py/master/source/_static/lecture_specific/pandas_panel/realwage.csv

In [3]:
minwages = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/lecture-source-py/master/source/_static/lecture_specific/pandas_panel/realwage.csv')
print(minwages.shape)
minwages.head()

(1408, 6)


Unnamed: 0.1,Unnamed: 0,Time,Country,Series,Pay period,value
0,0,2006-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17132.443
1,1,2007-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18100.918
2,2,2008-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17747.406
3,3,2009-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18580.139
4,4,2010-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18755.832


Clean the data by performing the following:
1. Add a new column containing just the year
2. Drop rows with missing values
3. Keep only rows in the series "In 2015 constant prices at 2015 USD PPPs"
4. Keep only rows where the pay period is 'Annual'
5. Drop unwanted columns: 'Unnamed: 0', 'Time' and 'Series'
6. Rename the 'value' column as 'Salary'
7. Reset the indexes

In [4]:
minwages['Year'] = pd.to_datetime(minwages['Time']).dt.year
minwages.dropna(inplace=True)
minwages = minwages[minwages['Series'] == "In 2015 constant prices at 2015 USD PPPs"]
minwages = minwages[minwages['Pay period'] == "Annual"]
print(minwages.shape)
minwages.drop(['Unnamed: 0', 'Time', 'Series'], inplace=True, axis=1)
minwages.rename({'value' : 'Salary'}, inplace=True, axis=1)
minwages.reset_index(drop=True, inplace=True)
minwages.head()

(335, 7)


Unnamed: 0,Country,Pay period,Salary,Year
0,Ireland,Annual,17132.443,2006
1,Ireland,Annual,18100.918,2007
2,Ireland,Annual,17747.406,2008
3,Ireland,Annual,18580.139,2009
4,Ireland,Annual,18755.832,2010


## Exercise 2. Add a quartile group column

Do the following:

1. Find the quartiles for the minimum annual salary.
2. Add a new column to the dataframe named `Group` that contains the values QG1, QG2, QG3 and QG4, representing the quartile group (QG) to which the row belongs. Rows with a value between 0 and the first quartile get the value QG1, rows between the 1st and 2nd quartile get the value QG2, etc.

In [5]:
q1 = minwages['Salary'].quantile(q=0.25)
q2 = minwages['Salary'].quantile(q=0.5)
q3 = minwages['Salary'].quantile(q=0.75)
q4 = minwages['Salary'].quantile(q=1)
print(q1, q2, q3, q4)

6952.0789 11442.348999999998 16778.676999999996 23401.492000000002


In [6]:
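# Use an object-dtype Series so the string labels can be assigned without a dtype conflict
group = pd.Series("", index=minwages.index, dtype=object)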

In [7]:
group[(minwages['Salary'] > 0) & (minwages['Salary'] <= q1)] = 'QG1'
group[(minwages['Salary'] > q1) & (minwages['Salary'] <= q2)] = 'QG2'
group[(minwages['Salary'] > q2) & (minwages['Salary'] <= q3)] = 'QG3'
group[(minwages['Salary'] > q3) & (minwages['Salary'] <= q4)] = 'QG4'
group.unique()
minwages['Group'] = group
minwages.head()

Unnamed: 0,Country,Pay period,Salary,Year,Group
0,Ireland,Annual,17132.443,2006,QG4
1,Ireland,Annual,18100.918,2007,QG4
2,Ireland,Annual,17747.406,2008,QG4
3,Ireland,Annual,18580.139,2009,QG4
4,Ireland,Annual,18755.832,2010,QG4
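
As an alternative to computing each quartile and masking by hand, Pandas provides `pd.qcut`, which bins values into equal-sized quantile groups in one step. The sketch below uses a small made-up `demo` dataframe (not the real dataset) and assumes the same `QG1` to `QG4` labels. Note that `pd.qcut` returns a categorical column rather than plain strings.

```python
import pandas as pd

# Made-up salaries for illustration only (not taken from the real dataset).
demo = pd.DataFrame({"Salary": [1000.0, 2000.0, 3000.0, 4000.0,
                                5000.0, 6000.0, 7000.0, 8000.0]})

# q=4 cuts at the 25th, 50th and 75th percentiles; labels name the four bins.
demo["Group"] = pd.qcut(demo["Salary"], q=4,
                        labels=["QG1", "QG2", "QG3", "QG4"])

# Count how many rows landed in each quartile group.
print(demo["Group"].value_counts().sort_index())
```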


## Exercise 3. Create a boxplot

Create a graph using a single axis that shows the boxplots of the four groups. Use the Matplotlib [boxplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html) function. This will allow us to see if we properly separated rows by quartiles. It will also allow us to see the spread of the data in each quartile. Be sure to label the x-axis tick marks with the proper quartile group name.
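
One possible shape for this plot is sketched below. It runs on a synthetic `demo` dataframe (random salaries, not the real data) so the mechanics are visible without giving away the result. The tick labels are set with `set_xticklabels`, which is one of several ways to label the boxes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the cleaned dataframe (illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Salary": rng.uniform(1000, 20000, size=80)})
labels = ["QG1", "QG2", "QG3", "QG4"]
demo["Group"] = pd.qcut(demo["Salary"], q=4, labels=labels)

# boxplot accepts a list of arrays: one array of salaries per group.
data = [demo.loc[demo["Group"] == g, "Salary"].to_numpy() for g in labels]

fig, ax = plt.subplots()
ax.boxplot(data)

# Boxes are drawn at x positions 1..N, so place the group names there.
ax.set_xticks(list(range(1, len(labels) + 1)))
ax.set_xticklabels(labels)
ax.set_ylabel("Salary")
```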

## Exercise 4. Create a Scatterplot

Create a single scatterplot to explore whether the salaries in quartile group 1 and quartile group 4 are correlated across years. In other words, are the salaries changing in similar ways in both groups as time progresses?

**Hints:** 
- We must wrangle our dataframe to build this plot
- Be sure to add the x and y axis labels. 
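
A minimal sketch of the wrangling step follows. It assumes that each group is summarized by its mean salary per year, which is one reasonable choice and not the only one. The `demo` dataframe is made up and only mirrors the column names of the cleaned data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up dataframe with the same columns as the cleaned data.
demo = pd.DataFrame({
    "Year":   [2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007],
    "Group":  ["QG1", "QG1", "QG4", "QG4", "QG1", "QG1", "QG4", "QG4"],
    "Salary": [1000.0, 3000.0, 17000.0, 19000.0,
               2000.0, 4000.0, 18000.0, 20000.0],
})

# Mean salary per (Year, Group), then move the groups into columns
# so that each row is one year and each column is one quartile group.
wide = demo.groupby(["Year", "Group"])["Salary"].mean().unstack("Group")

fig, ax = plt.subplots()
ax.scatter(wide["QG1"], wide["QG4"])
ax.set_xlabel("Mean salary, QG1")
ax.set_ylabel("Mean salary, QG4")
```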

Recreate the plot above, but set a different color per year and size the points to be larger for later years and smaller for earlier years.
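
The `scatter` function takes a `c` argument for per-point colors and an `s` argument for per-point marker sizes. The sketch below uses made-up yearly values, and the linear size scaling from 20 to 200 is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up yearly values, for illustration only.
years = np.array([2006, 2007, 2008, 2009, 2010])
qg1 = np.array([2000.0, 2100.0, 2050.0, 2200.0, 2300.0])
qg4 = np.array([18000.0, 18500.0, 18300.0, 19000.0, 19400.0])

# Marker area grows linearly from 20 (first year) to 200 (last year).
sizes = 20 + 180 * (years - years.min()) / (years.max() - years.min())

fig, ax = plt.subplots()
# c=years maps each year to a color through the colormap.
points = ax.scatter(qg1, qg4, c=years, s=sizes, cmap="viridis")
fig.colorbar(points, ax=ax, label="Year")
ax.set_xlabel("QG1")
ax.set_ylabel("QG4")
```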

## Exercise 5. Create a grid of scatterplots

Now, let's see the pairwise scatterplot of each quartile group with every other group. Create a 4x4 grid of subplots. Each row and each column of the grid represents one of the 4 groups, and each plot is the scatterplot of the corresponding pair of groups. You can skip the plots on the diagonal, as these would always compare a quartile group with itself.

<span style="float:right; margin-left:10px; clear:both;">![Task](../media/new_knowledge.png)</span>

Use the following code to ensure that the plot is large enough to see detail:

```python
plt.rcParams["figure.figsize"] = (12, 12)
```
The code above sets the size of the image in inches (i.e. 12 x 12 inches). Also, because the x-axis and y-axis labels will be repeated, we only need to set them on the first column and the last row. You can set the y-axis labels on the first column by using the `set` function and providing the `ylabel` argument. For example:
```python
axes[0, 0].set(ylabel="QG1")
```

You can do the same for the x-axis on the bottom row using the same style:
```python
axes[3, 0].set(xlabel="QG1")
```
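
Putting these pieces together, the loop structure might look like the sketch below. The `values` dictionary holds random made-up numbers per group, and `figsize` is passed directly to `plt.subplots` here instead of through `rcParams`; both approaches set the figure size.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["QG1", "QG2", "QG3", "QG4"]

# Made-up values per group (for example, one per year); illustration only.
rng = np.random.default_rng(0)
values = {g: rng.uniform(1000, 20000, size=10) for g in labels}

fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, row_group in enumerate(labels):
    for j, col_group in enumerate(labels):
        ax = axes[i, j]
        if i != j:
            # Off-diagonal: column group on x, row group on y.
            ax.scatter(values[col_group], values[row_group])
        if j == 0:
            # Label the y-axis only on the first column.
            ax.set(ylabel=row_group)
        if i == len(labels) - 1:
            # Label the x-axis only on the bottom row.
            ax.set(xlabel=col_group)
```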

Do you see any correlation between any of the groups?  If so, why do you suspect this is?

## Exercise 6. Create a grid of line plots

Now, let's create a line graph of changes over time for each quartile group. Let's use a 2x2 subplot grid with each subplot showing a different group.
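
A rough sketch of the layout is shown below. It assumes a wide table with one row per year and one column per group, filled here with made-up numbers, and it assumes the mean salary per year is the quantity being plotted.

```python
import pandas as pd
import matplotlib.pyplot as plt

labels = ["QG1", "QG2", "QG3", "QG4"]

# Made-up yearly values per group, indexed by year (illustration only).
wide = pd.DataFrame(
    {
        "QG1": [2000.0, 2100.0, 2050.0],
        "QG2": [9000.0, 9200.0, 9100.0],
        "QG3": [14000.0, 14300.0, 14500.0],
        "QG4": [18000.0, 18500.0, 18300.0],
    },
    index=[2006, 2007, 2008],
)

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)

# axes.flat walks the 2x2 grid row by row, pairing each subplot with a group.
for ax, group in zip(axes.flat, labels):
    ax.plot(wide.index, wide[group])
    ax.set(title=group, ylabel="Mean salary")
```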