
# Matplotlib

`%matplotlib inline` is a **magic** command.<br>
It tells the notebook to embed matplotlib charts directly below the cell that creates them.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import pandas as pd

# Load some data into a DataFrame to demonstrate plotting with
df = pd.read_excel(io="https://neueda.conygre.com/pydata/plotting_data.xls", sheet_name='tips', index_col='ID')

print(df.shape)
df.head()

# Pandas Plotting Functions

Pandas comes with its own built-in plotting functions.<br>
The general format is usually
- `DataFrame.plot.PLOT_TYPE` <br>
- `Series.plot.PLOT_TYPE`
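
As a minimal sketch of the two call patterns above: the small `demo` frame below is hypothetical stand-in data (its column names mirror the tips sheet), and the `Agg` backend is only assumed so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import pandas as pd

# Hypothetical stand-in data; column names mirror the tips sheet used in this notebook
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

# Series.plot.PLOT_TYPE: plot a single column
series_ax = demo["total_bill"].plot.hist(bins=3)

# DataFrame.plot.PLOT_TYPE: column names are passed as x and y
frame_ax = demo.plot.scatter(x="total_bill", y="tip")

# Both calls return a matplotlib Axes, so the usual Axes methods apply
frame_ax.set_title("Tip vs total bill")
```

Because each call returns an `Axes`, titles and labels can be set in the same way as in the matplotlib examples later in this notebook.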

## Histograms

In [None]:
# a series
df['total_bill'].plot.hist()

In [None]:
# an entire DataFrame
df.plot.hist()

## Scatterplot

In [None]:
df.plot.scatter(x='total_bill', y='tip')

## Hexbin

Use the `gridsize` argument to set the number of hexagons in the x-direction; a smaller value gives larger bins

In [None]:
df.plot.hexbin(x='total_bill', y='tip', gridsize=10)

## Box plot


In [None]:
df.plot.box()

# Examples using matplotlib directly


In [None]:
import matplotlib.pyplot as plt

df.head()

## Histograms

In [None]:
x_data = df['total_bill']

hist_plot = plt.figure()

ax1 = hist_plot.add_subplot(1,1,1)

ax1.hist(x = x_data, bins=10)

ax1.set_title('Histogram of Total Bill')
ax1.set_xlabel('Total Bill')
ax1.set_ylabel('Frequency')

hist_plot.show()

## Scatterplot

In [None]:
scatter_plot = plt.figure()

x_data = df['total_bill']
y_data = df['tip']

ax1 = scatter_plot.add_subplot(1,1,1)

ax1.scatter(x = x_data, y = y_data)
ax1.set_title('Scatterplot of Total Bill vs Tips')
ax1.set_xlabel('Total Bill')
ax1.set_ylabel('Tips')

scatter_plot.show()

## Boxplots

Used when a discrete variable is plotted against a continuous variable

In [None]:
box_plot = plt.figure()

female_tips = df[df['sex'] == 'Female']['tip']
male_tips = df[df['sex'] == 'Male']['tip']

ax1 = box_plot.add_subplot(1,1,1)
ax1.boxplot(x = [female_tips, male_tips], labels=['Female', 'Male'])
ax1.set_title('Boxplot of Tips by Gender')
ax1.set_xlabel('Gender')
ax1.set_ylabel('Tips')

box_plot.show()

# Seaborn

In [None]:
import seaborn as sns

## Histograms

In [None]:
# this subplots function returns 2 values as a tuple
# a figure
# a subplot added to the figure

# todo

In [None]:
# Same thing but without the density
# supply a kde argument and set it to False

# todo

## Density Plot

**Kernel Density Estimation**

In [None]:
den_plot, ax1 = plt.subplots()

ax1 = sns.distplot(a = df['total_bill'], hist=False)
ax1.set_title('Total Bill Density')
ax1.set_xlabel('Total Bill')
ax1.set_ylabel('Unit Probability')

plt.show()

## Rug Plots

- 1-dimensional representation of a variable's distribution
- usually used with other plots to enhance visualization

In [None]:
hist_den_rug_plot, ax1 = plt.subplots()

ax1 = sns.distplot(a = df['total_bill'], rug=True)
ax1.set_title('Total Bill Histogram with Density and Rug Plot')
ax1.set_xlabel('Total Bill')

plt.show()

## Bar Plots

In [None]:
count_plot, ax1 = plt.subplots()

ax1 = sns.countplot(x='day', data=df)
ax1.set_title('Count of Days')
ax1.set_xlabel('Day of Week')
ax1.set_ylabel('Frequency')

plt.show()

## Scatterplot

- There is no function named `scatter` in `seaborn`; use `regplot` instead (seaborn 0.9 and later also provide `scatterplot`).
- `regplot` plots a scatterplot **and** fits a regression line
- use `fit_reg=False` to turn the regression line off

In [None]:
# regplot - total bill vs tip
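
A minimal sketch of `regplot` with and without the fitted line, as described in the list above. The `demo` frame is hypothetical stand-in data with the same column names as the tips sheet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; column names mirror the tips sheet
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

reg_fig, (ax_fit, ax_points) = plt.subplots(1, 2, figsize=(10, 4))

# Default behaviour: scatter points plus a fitted regression line
sns.regplot(x="total_bill", y="tip", data=demo, ax=ax_fit)
ax_fit.set_title("Scatterplot with Regression Line")

# fit_reg=False leaves only the scatter points
sns.regplot(x="total_bill", y="tip", data=demo, fit_reg=False, ax=ax_points)
ax_points.set_title("Scatterplot Only")
```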

An alternative is to use `lmplot`<br>
`lmplot` calls `regplot`<br>
`lmplot` creates figures, `regplot` creates axes<br>

In [None]:
fig = sns.lmplot(x='total_bill', y='tip', data=df)

plt.show()

Or use `jointplot`<br>
`jointplot` creates a scatter that includes a univariate plot on each axis<br>
`jointplot` does not return axes, so no need to create a figure<br>
`jointplot` creates a `JointGrid` object

In [None]:
joint_grid = sns.jointplot(x='total_bill', y='tip', data=df)
joint_grid.set_axis_labels(xlabel='Total Bill', ylabel='Tip')
joint_grid.fig.suptitle(t='Joint Plot of Total Bill and Tip', fontsize=10, y=1.03)

## Hexbins

- Group the points on a scatter plot into larger bins.
- In the same way that a `histogram` bins one variable to create bars,
  `hexbin` bins two variables to create hexagons.

In [None]:
hexbin = sns.jointplot(x="total_bill", y='tip', data=df, kind='hex')
hexbin.set_axis_labels(xlabel='Total Bill', ylabel='Tip')
hexbin.fig.suptitle(t='Hexbin Joint Plot of Total Bill and Tip', fontsize=10, y=1.03)

## 2D Density Plots

- similar to `sns.kdeplot`
- Create a density plot across a bivariate (2 variables)
- Can show just the bivariate
- or show the individual univariates

In [None]:
# Just the bivariate
kde, ax1 = plt.subplots()

ax1 = sns.kdeplot(x=df['total_bill'], y=df['tip'], fill=True) # toggle fill True/False (older seaborn used data=, data2= and shade=)
ax1.set_title('Kernel Density Plot of Total Bill & Tip')
ax1.set_xlabel("Total Bill")
ax1.set_ylabel('Tip')

plt.show()

In [None]:
# Include the univariates
kde_joint = sns.jointplot(x='total_bill', y='tip', data=df, kind='kde')

## Bar Plots
Default is to calculate the `mean`<br>
Use the `estimator` parameter to pass in any function<br>

In [None]:
bar, ax1 = plt.subplots()

ax1 = sns.barplot(x='time', y='total_bill', data=df)
ax1.set_title('Bar plot of average total bill for time of day')
ax1.set_xlabel('Time of day')
ax1.set_ylabel('Average total bill')

plt.show()
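
The `estimator` parameter mentioned above can be sketched as follows, here with `np.median` in place of the default mean. The `demo` frame is hypothetical stand-in data, and the `groupby` line is only a cross-check of what the bars should show.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; column names mirror the tips sheet
demo = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
    "total_bill": [10.0, 12.0, 20.0, 15.0, 25.0, 50.0],
})

med_fig, ax_med = plt.subplots()

# estimator accepts a function that reduces each group to a single number
sns.barplot(x="time", y="total_bill", data=demo, estimator=np.median, ax=ax_med)
ax_med.set_title("Bar plot of median total bill for time of day")
ax_med.set_xlabel("Time of day")
ax_med.set_ylabel("Median total bill")

# Cross-check with pandas: the per-group medians the bars are expected to show
medians = demo.groupby("time")["total_bill"].median()
print(medians)
```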

## Box Plots

- Used to show multiple statistics at once
- e.g. quartiles, max, min, outliers etc.

In [None]:
box, ax1 = plt.subplots()

ax1 = sns.boxplot(x='time', y='total_bill', data=df)
ax1.set_title('Box plot of total bill by time of day')
ax1.set_xlabel('Time of day')
ax1.set_ylabel('Total bill')

plt.show()

## Violin Plot

Like a box plot, but also shows the distribution of the data

In [None]:
# todo: time vs total bill
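
One possible sketch of the violin plot for time vs total bill, following the same pattern as the box plot cell. The `demo` frame is hypothetical stand-in data, not the real tips sheet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; column names mirror the tips sheet
demo = pd.DataFrame({
    "time": ["Lunch"] * 5 + ["Dinner"] * 5,
    "total_bill": [10.0, 12.5, 14.0, 18.0, 20.0, 15.0, 22.0, 25.0, 31.0, 48.0],
})

violin_fig, ax_violin = plt.subplots()

# Same x/y/data arguments as sns.boxplot; the violin outline is a density estimate
sns.violinplot(x="time", y="total_bill", data=demo, ax=ax_violin)
ax_violin.set_title("Violin plot of total bill by time of day")
ax_violin.set_xlabel("Time of day")
ax_violin.set_ylabel("Total bill")
```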

## Pairwise Plots

- `pairplot` visualizes **ALL** pairwise relationships 
- Creates lots of redundant information
- Use `PairGrid` and manually assign plots for the top half and bottom half

In [None]:
pair_grid = sns.PairGrid(data = df)

pair_grid = pair_grid.map_upper(sns.regplot) # or use plt.scatter
pair_grid = pair_grid.map_lower(sns.kdeplot)
pair_grid = pair_grid.map_diag(sns.distplot, rug=True)

plt.show()