# Plotting with Pandas

## Notebook Outline

* <a href='#introtoplotting'>Introduction To Plotting</a>
* <a href='#lineplots'>Line Plots</a>
* <a href='#histograms'>Histograms</a>
* <a href='#barcharts'>Bar Charts</a>

<a name='introtoplotting'></a>
# Introduction to Plotting
Mastering plotting in Python is, unfortunately, not easy: there are multiple ways to do just about everything, and fine-tuning can be complex.  The most commonly used Python plotting library is matplotlib.  It gives you _a lot_ of control over plots, but that control comes at the price of complexity.

Seaborn is a very popular Python plotting library whose default plots are more aesthetically pleasing than matplotlib's, and it provides easier access to some functionality. It is built on top of matplotlib (which is one reason why we will cover matplotlib below).  If you are interested, I recommend that you check out seaborn here: <https://seaborn.pydata.org/>

There is a great online gallery of matplotlib graphs, with examples and code, here: <https://python-graph-gallery.com/>. It is a great resource for continuing to learn about data visualization in Python.

We are going to cover matplotlib below because many other libraries, like seaborn, are built on top of it, and I believe you need some understanding of matplotlib to successfully plot data with Python.

Remember that there are multiple ways to create these plots. I am showing you the methods that I usually use, which I think are a good balance between simplicity and control.

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


<a name='lineplots'></a>
# Line Plots
Let's look at a typical line plot.  We will use the fuel price data set from a previous week for this plot.  First, let's load in the data and take a look at it using head() and info().

In [None]:
filepath = os.path.join(os.getcwd(), 'data', 'AAA_Fuel_Prices.csv')
fuelPrices = pd.read_csv(filepath, parse_dates=[0])

In [None]:
fuelPrices.head(10)

In [None]:
fuelPrices.info()

#### We'd like to plot the price for regular grade gasoline in the US.
Let's first grab that data from the dataframe.

In [None]:
USRegular = fuelPrices.loc[(fuelPrices['County']=='US') &
                           (fuelPrices['Fuel'] == 'Gasoline - Regular'), :]
USRegular.head()
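
The filter above combines two conditions with `&`. As a minimal sketch of how that works, here is the same pattern on a tiny made-up dataframe (the rows, prices, and the 'Diesel' label are invented for illustration and are not taken from the AAA file). Each comparison produces a boolean Series, and `&` keeps only the rows where both are True.

```python
import pandas as pd

# Tiny made-up dataframe with the same column names as the fuel price data.
demo = pd.DataFrame({
    'County': ['US', 'US', 'Honolulu'],
    'Fuel': ['Gasoline - Regular', 'Diesel', 'Gasoline - Regular'],
    'Price': [2.50, 2.90, 3.40],
})

# Each comparison returns a boolean Series with one entry per row.
is_us = demo['County'] == 'US'
is_regular = demo['Fuel'] == 'Gasoline - Regular'

# & combines the two masks row by row; loc keeps the rows that are True in both.
subset = demo.loc[is_us & is_regular, :]
print(subset)
```

Naming the masks first is optional. It just makes the condition easier to read than one long expression inside loc.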

#### Now, let's plot the data: 

#### Introducing plt.subplots()

The first thing we want to do is create our figure and axis.  The figure is the entire canvas that the plot is created on.  The axis (matplotlib calls this object an Axes) is just one area of the canvas that plots particular data.  So, if you have 4 subplots on one figure, you would have 4 axes. If you just have one plot on your figure, then you have one axis.

We create the figure and axis (or axes) using the subplots() function in the pyplot sub-module of the matplotlib library. The first two arguments are the number of rows and the number of columns of the axes layout that we want. Let's see some examples of this.

In [None]:
fig, axs = plt.subplots(2, 2)
print(type(fig))
print(type(axs))
print(axs.shape)
print(type(axs[0, 0]))

In [None]:
fig, axs = plt.subplots(3, 1)
print(axs.shape)
print(type(axs[1]))

In [None]:
fig, axs = plt.subplots(1, 1)
print(type(fig))
print(type(axs))
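
Notice that the type of the second return value depends on the layout: a grid gives a NumPy array of axes, while a 1 x 1 layout gives a single axes object. If you would rather always get an array, one option is the `squeeze=False` argument of `plt.subplots()`. The sketch below compares the two; the variable names are just for illustration.

```python
import matplotlib.pyplot as plt

# Default behaviour: a 1 x 1 layout returns a single Axes object, not an array.
fig_single, ax_single = plt.subplots(1, 1)
print(type(ax_single))

# With squeeze=False the axes always come back as a 2-D NumPy array,
# so the same [row, column] indexing works for any layout.
fig_grid, axs_grid = plt.subplots(1, 1, squeeze=False)
print(axs_grid.shape)
print(type(axs_grid[0, 0]))
```

This can be handy when the number of rows and columns is computed in code and might turn out to be 1.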

### In Class Exercise:
In the cell below, please use plt.subplots() to create different arrangements of plots.

#### Introducing the plot() method of a dataframe (or series)
Now that we have created our figure and axes, we are ready to plot our data.  Just select the column of data you'd like to plot (in this case the 'Price' column) and call the plot() method. Also, pass the axis that you'd like the data plotted on to the ax argument.

In [None]:
USRegular.head()

In [None]:
fig, axs = plt.subplots(1, 1)
USRegular['Price'].plot(ax=axs)

#### Introducing the figsize argument to set the figure size
Set figsize to a tuple of (width, height) in inches. The first entry controls the width and the second controls the height.

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
USRegular['Price'].plot(ax=axs)
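
If you ever forget which entry of figsize is which, you can ask the figure itself. This is a small check rather than something you need in every notebook: `get_size_inches()` reports the figure size as width first, then height.

```python
import matplotlib.pyplot as plt

# Request a figure that is 15 inches wide and 5 inches tall.
fig, ax = plt.subplots(1, 1, figsize=(15, 5))

# get_size_inches() returns the size in the same (width, height) order.
width, height = fig.get_size_inches()
print(width, height)
```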

#### Introducing the set_xlabel(), set_ylabel(), and set_title() methods

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
USRegular['Price'].plot(ax=axs)
axs.set_xlabel('Row Number')
axs.set_ylabel('Price ($)', color='green', fontsize=15)
axs.set_title('Fuel Price In The US', fontsize=20, color='blue')

#### Dealing with datetime indices in plots

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
USRegular.plot(ax=axs, x='Month_of_Price', y='Price')
axs.set_title('Fuel Price')
axs.set_xlabel('Date')
axs.set_ylabel('Price ($)')
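
Passing the date column to the x argument is one way to get dates on the x-axis. Another option, sketched below on made-up monthly prices (not the AAA data), is to move the date column into the index with set_index(). When a Series has a datetime index, plot() uses it for the x-axis automatically.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up monthly prices, using the same column names as the fuel price data.
demo = pd.DataFrame({
    'Month_of_Price': pd.date_range('2020-01-01', periods=6, freq='MS'),
    'Price': [2.60, 2.45, 2.20, 1.95, 2.05, 2.20],
})

# Move the date column into the index; plot() then uses it for the x-axis.
demo_by_date = demo.set_index('Month_of_Price')

fig, ax = plt.subplots(1, 1, figsize=(15, 5))
demo_by_date['Price'].plot(ax=ax)
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
```

Which approach to use is mostly a matter of taste. Setting the index is convenient if you plan to slice the data by date later.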

### In Class Exercise
Add a cell below and create a plot of fuel prices. Add your own labels and title.

#### Plotting multiple lines of price data, and using the 'label' argument:

In [None]:
fuelPrices['Fuel'].unique()

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))

for fuelType in fuelPrices['Fuel'].unique():
    fuelPrices.loc[(fuelPrices['Fuel'] == fuelType) &
                   (fuelPrices['County'] == 'US'), :].plot(ax=axs, y='Price',
                                                           x='Month_of_Price',
                                                          label=fuelType)

axs.set_title('Fuel Price')
axs.set_xlabel('Date')
axs.set_ylabel('Price ($)')
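
The loop above is the approach I usually use. As an alternative sketch (a different technique, shown here on made-up data rather than the AAA file), you can reshape the data with pivot() so that each fuel type becomes its own column. A single call to plot() then draws one line per column and labels each line with the column name.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up long-format data: one row per (month, fuel type) pair.
demo = pd.DataFrame({
    'Month_of_Price': list(pd.date_range('2020-01-01', periods=3, freq='MS')) * 2,
    'Fuel': ['Gasoline - Regular'] * 3 + ['Gasoline - Premium'] * 3,
    'Price': [2.60, 2.45, 2.20, 3.20, 3.05, 2.85],
})

# Reshape to wide format: dates in the index, one column per fuel type.
wide = demo.pivot(index='Month_of_Price', columns='Fuel', values='Price')

# One plot() call draws a line per column, with the legend filled in for us.
fig, ax = plt.subplots(1, 1, figsize=(15, 5))
wide.plot(ax=ax)
ax.set_title('Fuel Price')
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
```

Note that pivot() requires each (date, fuel type) pair to appear only once, so the data would need to be filtered to a single county first.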

<a name='histograms'></a>
# Plotting Histograms

Note that we use the plot() method again, but this time we pass 'hist' to the 'kind' argument.  Valid values of the kind argument include:
* 'line' for line plots (the default)
* 'bar' or 'barh' for bar plots
* 'hist' for histogram
* 'box' for boxplot
* 'kde' or 'density' for density plots
* 'area' for area plots
* 'scatter' for scatter plots
* 'hexbin' for hexagonal bin plots
* 'pie' for pie plots
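
To make the kind argument concrete, here is a small sketch that draws the same made-up Series three ways, side by side. The numbers are invented for illustration. It also shows the accessor form, `plot.box()`, which is another way of writing `plot(kind='box')`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A short made-up Series of prices.
prices = pd.Series([3.1, 3.3, 3.2, 3.6, 3.4, 3.9, 3.5], name='Price')

fig, axs = plt.subplots(1, 3, figsize=(15, 4))

prices.plot(ax=axs[0])                # no kind given, so we get a line plot
prices.plot(kind='hist', ax=axs[1])   # histogram of the same values
prices.plot.box(ax=axs[2])            # accessor form of kind='box'

# Title each panel with the kind that produced it.
for ax, title in zip(axs, ['line', 'hist', 'box']):
    ax.set_title(title)
```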

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
fuelPrices.loc[(fuelPrices['County']=='Honolulu') &
               (fuelPrices['Fuel']=='Gasoline - Premium'), 'Price'].plot(kind='hist', ax=axs)

#### Introducing the bins argument:

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
fuelPrices.loc[(fuelPrices['County']=='Honolulu') &
               (fuelPrices['Fuel']=='Gasoline - Premium'), 'Price'].plot(kind='hist', bins=20, ax=axs)
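
To see what the bins argument is doing, it can help to compute a histogram by hand. The sketch below uses NumPy's histogram() function on a few made-up values (not the Honolulu prices): the range of the data is split into equal-width bins, and each bin counts the values that fall inside it.

```python
import numpy as np

# A few made-up prices.
values = np.array([3.0, 3.1, 3.1, 3.4, 3.5, 3.9, 4.0])

# Split the range 3.0 to 4.0 into 4 equal-width bins and count the values in each.
counts, edges = np.histogram(values, bins=4)

print(counts)   # number of values in each bin
print(edges)    # bin boundaries; there is always one more edge than bins
```

More bins give narrower bars and more detail, but with a small data set many of the bins end up empty or hold a single value.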

#### Introducing alpha and plotting multiple histograms:

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
for fuelType in ['Gasoline - Regular', 'Gasoline - Premium']:
    fuelPrices.loc[(fuelPrices['County']=='Honolulu') &
                   (fuelPrices['Fuel']==fuelType),
                   'Price'].plot(kind='hist', bins=20, ax=axs, alpha=0.3, label=fuelType)
# Add the legend once, after both histograms have been drawn
axs.legend()

### In Class Exercise
Add a cell below and create a histogram of fuel prices. Add your own labels and title.

<a name='barcharts'></a>
# Bar Charts

For bar charts, let's use our labor sheet data. First, we load the data:

In [None]:
filepath = os.path.join(os.getcwd(), 'data', 'LaborSheetData.csv')
laborSheetData = pd.read_csv(filepath, parse_dates=[[2, 3], 13])
laborSheetData.head(2)

#### Plotting the mean hourly sales per store:

In [None]:
meanHourlySales = laborSheetData.groupby('Store')['Sales'].mean()
print(meanHourlySales)
fig, axs = plt.subplots(1, 1, figsize=(15, 5))
meanHourlySales.plot(kind='bar', ax=axs)
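
Bar charts are often easier to read when the bars are sorted. As a minimal sketch, here is the same groupby pattern on a made-up dataframe (the store names and sales figures are invented, not taken from the labor sheet data), with sort_values() added before plotting.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sales records for three hypothetical stores.
demo = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sales': [100.0, 140.0, 90.0, 70.0, 200.0, 220.0],
})

# Mean sales per store, sorted so that the tallest bar comes first.
mean_sales = demo.groupby('Store')['Sales'].mean().sort_values(ascending=False)
print(mean_sales)

fig, ax = plt.subplots(1, 1, figsize=(15, 5))
mean_sales.plot(kind='bar', ax=ax)
ax.set_ylabel('Mean sales')
```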

#### Create plots of sales per manager, per store:

First let's look at an example of zip. I discuss this in the lecture.

In [None]:
list(zip([1, 2, 3], ['a', 'b', 'c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

In [None]:
for A, B in zip([1, 2, 3], ['a', 'b', 'c']):
    print(A, B)
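
One detail about zip that matters for plotting: it stops as soon as the shortest input runs out. The short sketch below shows this with two lists of different lengths.

```python
numbers = [1, 2, 3]
letters = ['a', 'b']

# zip pairs items up by position and stops when the shorter list is exhausted,
# so the 3 is silently left out.
pairs = list(zip(numbers, letters))
print(pairs)   # [(1, 'a'), (2, 'b')]
```

So if we zip a list of stores with a list of axes, we get one pair per store as long as there are at least as many axes as stores.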

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(15, 8))
print(axs)
print(axs.flatten())

for store, ax in zip(laborSheetData['Store'].unique(), axs.flatten()):
    
    # laborSheetData.loc[laborSheetData['Store']==store, :].groupby('Manager')['Sales'].mean().plot(ax=ax)
    storeSubset = laborSheetData.loc[laborSheetData['Store']==store, :]
    managerHourlyMeanSales = storeSubset.groupby('Manager')['Sales'].mean()
    managerHourlyMeanSales.plot(ax=ax, kind='bar')
    ax.set_title(store)

fig.tight_layout()
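
The pattern above works cleanly when the number of stores matches the number of axes. As a sketch of what you might do when it does not, the example below uses three hypothetical store names on a 2 x 2 grid and hides the axes that zip never reached.

```python
import matplotlib.pyplot as plt

# Three hypothetical store names, but four axes in the grid.
stores = ['North', 'South', 'East']

fig, axs = plt.subplots(2, 2, figsize=(15, 8))

# flatten() turns the 2 x 2 array of axes into a 1-D array of length 4.
flat_axes = axs.flatten()
print(axs.shape, flat_axes.shape)

for store, ax in zip(stores, flat_axes):
    ax.set_title(store)

# zip stopped after three pairs, so the fourth axes is still empty; hide it.
for ax in flat_axes[len(stores):]:
    ax.set_visible(False)

fig.tight_layout()
```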

### In Class Exercise
Add a cell below and create a bar plot of data from the laborSheetData dataframe.

## Questions or Comments About This Notebook?
Feel free to contact me via my LinkedIn: https://www.linkedin.com/in/william-j-henry <br>
You can also email me at will@henryanalytics.com <br>