<img src=images/gdd-logo.png width=300px align=right>

# Plotting in Pandas

A great feature of Pandas is that it makes it really easy to create plots and visualisations directly from dataframes.

In this notebook we will cover:
- [Pandas Plots](#1)
    - [<mark> Exercise: line </mark>](#line)
    - [<mark> Exercise: bar </mark>](#bar)
- [Using data from a file](#data)
    - [<mark>Exercise: Customize the plot</mark>](#ex-data)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

<a id='1' ></a>
## Pandas Plots

We use the [`.plot()`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.plot.html) method on a DataFrame to create a plot, e.g.

```python
dataframe.plot(x= , y= , kind= , ax= , ...)
```

Here the `ax=` parameter refers to the `Axes` we have already created using `fig, ax = plt.subplots()`.
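
As a minimal sketch of this pattern, the cell below creates the figure and axes first and then hands the axes to pandas. The DataFrame `toy_df` and its column names are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, used only to illustrate the pattern.
toy_df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 15, 30]})

# Create the figure and axes first ...
fig, ax = plt.subplots()

# ... then tell pandas to draw onto those axes.
returned_ax = toy_df.plot(x='x', y='y', kind='line', ax=ax)

# pandas drew onto `ax`, so we can keep customising it with matplotlib.
ax.set_title('Toy line plot')
```

Because pandas draws onto the axes we passed in, any later matplotlib call on `ax` (titles, labels, gridlines) applies to the same plot.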
    
The `.plot()` method takes several optional parameters, most notably `kind`, which accepts eleven different string values and determines which kind of plot you'll create:

1. "area" is for area plots.
2. "bar" is for vertical bar charts.
3. "barh" is for horizontal bar charts.
4. "box" is for box plots.
5. "hexbin" is for hexbin plots.
6. "hist" is for histograms.
7. "kde" is for kernel density estimate charts.
8. "density" is an alias for "kde".
9. "line" is for line graphs.
10. "pie" is for pie charts.
11. "scatter" is for scatter plots.
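
Each of these kinds is also available as a method on the `.plot` accessor, so `kind='bar'` and `.plot.bar()` should give the same chart. The sketch below compares the two side by side; `toy_df` and its columns are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy_df = pd.DataFrame({'label': ['a', 'b', 'c'], 'value': [3, 1, 2]})

# Two axes next to each other so both spellings can be compared.
fig, (left_ax, right_ax) = plt.subplots(ncols=2)

# Spelling 1: pass the plot type as a string via `kind=`.
toy_df.plot(x='label', y='value', kind='bar', ax=left_ax)

# Spelling 2: call the matching method on the `.plot` accessor.
toy_df.plot.bar(x='label', y='value', ax=right_ax)
```

Which spelling to use is a matter of taste; this notebook sticks to `kind=`.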

We'll demonstrate a few of these on some toy datasets.

### Scatter

In [None]:
#Dataset tracking unemployment rate against the stock index price.

Data = {'Unemployment_Rate': [6.1,5.8,5.7,5.7,5.8,5.6,5.5,5.3,5.2,5.2],
        'Stock_Index_Price': [1500,1520,1525,1523,1515,1540,1545,1560,1555,1565]
       }
  
stock_price_df = pd.DataFrame(Data)
stock_price_df

In [None]:
fig, ax = plt.subplots()

stock_price_df.plot(x = 'Stock_Index_Price', y = 'Unemployment_Rate', kind='scatter', ax=ax)

We can suppress the excess output (the text representation of the returned `Axes` object) by ending the line with a semi-colon. This can also be done by assigning the result to a dummy variable.

<a id='line' ></a>
### <mark> Exercise: Line </mark>

Produce a line chart for the following dataset tracking yearly unemployment rate. Customise it with a `color` argument (for the basic colors you only need to specify the first letter, e.g. `'r'` for red).

In [None]:
Data = {'Year': [1920,1930,1940,1950,1960,1970,1980,1990,2000,2010],
        'Unemployment_Rate': [9.8,12,8,7.2,6.9,7,6.5,6.2,5.5,6.3]
       }
unemployment_df = pd.DataFrame(Data)
unemployment_df

In [None]:
fig, ax = plt.subplots()

unemployment_df.plot()

<a id='bar' ></a>
### <mark> Exercise: Bar </mark>

Make a bar chart with the data below. See if you can add a title.

In [None]:
#Dataset for countries GDP per Capita.
Data = {'Country': ['USA','Canada','Germany','UK','France'],
        'GDP_Per_Capita': [45000,42000,52000,49000,47000]
       }
  
gdp_df = pd.DataFrame(Data)
gdp_df

### Pie

In [None]:
#Dataset tracking tasks completed
Data = {'Tasks': [300,500,700]}
tasks_df = pd.DataFrame(Data, index = ['Tasks Pending',
                                 'Tasks Ongoing',
                                 'Tasks Completed']
                 )
tasks_df

In [None]:
tasks_df.plot(y='Tasks', kind='pie', figsize=(5, 5), startangle=90);

<a id='data' ></a>
## Using data from a file

Let's take a look at the top 10 directors according to number of TV Shows and Movies on US Netflix.

We have the csv file `netflix-top10-directors.csv` in the folder `data/`. When this folder is saved on your computer in the same location as your notebook file, you can use the relative path as shown below.

In [98]:
pd.read_csv('data/netflix-top10-directors.csv')

Unnamed: 0,director,Movie,TV Show,total
0,Rajiv Chilaka,19.0,,19.0
1,"Raúl Campos, Jan Suter",18.0,,18.0
2,Suhas Kadav,16.0,,16.0
3,Marcus Raboy,15.0,1.0,16.0
4,Jay Karas,14.0,,14.0
5,Cathy Garcia-Molina,13.0,,13.0
6,Jay Chapman,12.0,,12.0
7,Youssef Chahine,12.0,,12.0
8,Martin Scorsese,12.0,,12.0
9,Steven Spielberg,11.0,,11.0


Let's re-read this data, but this time let's specify an index column. What happens to that column?

In [None]:
netflix_directors = pd.read_csv('data/netflix-top10-directors.csv', index_col='director')
netflix_directors

The index is a special type of column. For example, you can use it to slice your data (amongst other cool things).

In [None]:
netflix_directors.loc['Jay Chapman': 'Steven Spielberg']
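
One detail worth knowing: unlike ordinary Python slices, a label slice with `.loc` includes both endpoints. The small sketch below shows this on a made-up DataFrame (the names and numbers are hypothetical).

```python
import pandas as pd

# Hypothetical toy data with a string index, used only for illustration.
toy_df = pd.DataFrame(
    {'total': [5, 3, 8, 1]},
    index=['Ann', 'Bob', 'Cat', 'Dan'],
)

# Label-based slicing keeps the start label AND the end label.
subset = toy_df.loc['Bob':'Cat']
print(subset)
```

Here `subset` contains the rows for both `'Bob'` and `'Cat'`.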

But more interestingly for us, when you call `.plot()` the index will automatically become the `x-axis`. Therefore you don't need to specify the `x=` parameter.

In [None]:
fig, ax = plt.subplots()

netflix_directors.plot(y = 'total', ax=ax)

Cool! Does this chart make sense as a line chart? Probably not, as there's no continuity as we move from director to director. Let's instead make a bar plot.

In [None]:
fig, ax = plt.subplots()

netflix_directors.plot(y = 'total', kind='bar', ax=ax)

However, let's say you wanted to see when the director made a **TV Show** rather than a **Movie**. We'd probably want to create a bar chart with an extra dimension - a stacked bar chart. 

You can create a stacked bar chart by specifying two columns for `y` and using the parameter `stacked=True`.

In [None]:
fig, ax = plt.subplots()

netflix_directors.plot(y=['Movie', 'TV Show'], kind='bar', stacked=True, ax=ax)

<a id='ex-data' ></a>
### <mark>Exercise: Customize the plot</mark>

Using what you learned previously make some changes to the plot. You can choose some of your own customizations or you can follow the list below. 

*Notes: The more stars, the harder the customization. Use Google to find answers where we haven't seen the code*

- ★ Add a title
- ★ Add a label to the `y-axis`
- ★★ Change the color from default (blue & orange) to something else
- ★★ Make the chart bigger (using the parameter `figsize=`)
- ★★★ Change the `yticks` so that they are only showing integers and add gridlines
- ★★★ Add a horizontal line to represent the mean of the values

Add customizations to the plot below.

In [None]:
fig, ax = plt.subplots() 

netflix_directors.plot(y=['Movie', 'TV Show'], 
                       kind='bar', stacked=True, 
                       ax=ax)

**Answers**: Uncomment and run the following code to see a solution

In [None]:
# %load answers/ex-netflix-customizations.py

<img src='images/conclusion.png' width=300px align=right>

# Conclusion

In this notebook, we looked at how to plot data with matplotlib using the pandas method `.plot()`.

Carrying over what you learned from the matplotlib notebook, you also set up your figure and axes before creating the plot. This isn't always necessary; however, if you want your code to be readable, robust and scalable, this is the best way of doing things.
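
To make the difference concrete, here is a minimal sketch on a made-up DataFrame (`toy_df` is hypothetical). Without `ax=`, pandas creates the axes itself and returns them. With `ax=`, we decide exactly where the plot goes, for example the right-hand panel of a two-panel figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy_df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 1, 2]})

# Without `ax=`: pandas sets up the axes for us and returns them.
auto_ax = toy_df.plot(x='x', y='y')
auto_fig = auto_ax.get_figure()

# With `ax=`: we create the layout and choose which panel to draw on.
fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(8, 3))
toy_df.plot(x='x', y='y', ax=ax_right)
```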

Up next we will look at an alternative to matplotlib. While matplotlib is the main plotting library, other libraries can provide benefits by being more data-centric and producing out-of-the-box plots with interactive elements.