# Data Visualization in Python

## Histograms in Python: Pandas, Seaborn, Matplotlib, Plotly and Plotnine

Libraries covered:
1. pandas.DataFrame.plot()
2. Matplotlib
3. Seaborn
4. Plotly
5. Plotnine

In [None]:
import pandas as pd
import numpy as np

import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *

pd.options.display.max_columns = 500

# Read the data and keep info of 9 states
df = pd.read_csv("https://raw.githubusercontent.com/martinbel/datasets/master/unemployment.csv")
keep_states = ['SC', 'CA', 'FL', 'NY', 'WI', 'WA', 'NJ', 'IL', 'TX']
df = df.query('state == @keep_states')

# show the top 3 rows of each state (limited to the first 9 rows)
df.groupby("state").head(3).head(9)

# 1. Pandas Plotting method

This is generally what I try first when doing EDA or sharing static plots.

In [None]:
# Change the matplotlib default style to seaborn
# (the 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6)
plt.style.use('seaborn-v0_8')
# plt.style.use('ggplot') gives a style similar to R's ggplot2

# I use this a lot for quick plots
df.unemployment.hist(bins=30);

In [None]:
# Quite good for how simple it is
df.hist(column='unemployment', by='state', bins=20);
plt.tight_layout()
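
The `bins` argument sets how many equal-width intervals the value range is split into. Below is a minimal sketch of the counting a histogram performs, using `numpy.histogram` on synthetic data. The `rates` array is made up for illustration and is not the unemployment dataset.

```python
import numpy as np

# Synthetic values standing in for an unemployment column (assumption, not real data)
rng = np.random.default_rng(0)
rates = rng.normal(loc=6.0, scale=2.0, size=1000)

# Split the observed range into 30 equal-width bins and count the values in each
counts, edges = np.histogram(rates, bins=30)

# 30 bins are delimited by 31 edges, and every value lands in exactly one bin
bin_width = edges[1] - edges[0]
print(len(counts), len(edges), counts.sum())
```

Fewer bins smooth out the shape, while more bins show more detail and more noise.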

# 2. Matplotlib

The most flexible option, but it involves writing a lot more code!

In [None]:
group_values = list(df.state.unique())
group_values

In [None]:
# set number of columns in the plot
ncols = 3

# calculate number of rows in the plot
nrows = len(group_values) // ncols + (len(group_values) % ncols > 0)

# Define the plot 
plt.figure(figsize = (9, 9))
plt.subplots_adjust(hspace=0.25)
plt.suptitle("Unemployment Rate by State", fontsize=16, y=0.95)

for n, col in enumerate(group_values):
    # add a new subplot at each iteration using nrows and ncols
    ax = plt.subplot(nrows, ncols, n + 1)
    
    # Filter the dataframe data for each state
    df_temp = df.query("state == @col")
    df_temp.unemployment.hist(ax=ax, bins=30)
    
    # Add a vertical line at the median
    median_x = df_temp.unemployment.median()
    _ = ax.vlines(x=[median_x], ymin=0, ymax=70, colors=['r'])
    
    # Annotate the line (it marks the median, not the mean)
    ax.text(median_x, 70, 'Median')

    # chart formatting
    ax.set_title(col)
    ax.set_xlabel("")
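
The row count used above is a ceiling division: enough rows to hold every group at `ncols` plots per row. Here is a small sketch of that arithmetic. The helper `grid_shape` is a hypothetical name introduced for illustration and is not part of the original notebook.

```python
def grid_shape(n_items, ncols):
    """Return (nrows, ncols) for a grid that fits n_items subplots."""
    # Equivalent to n_items // ncols + (n_items % ncols > 0)
    nrows = -(-n_items // ncols)  # ceiling division on integers
    return nrows, ncols

print(grid_shape(9, 3))   # 9 states, 3 per row -> (3, 3)
print(grid_shape(10, 3))  # one extra group needs a fourth row -> (4, 3)
```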

# 3. Seaborn

Great trade-off between simplicity and advanced features. 

- Allows independent axes in subplots.
- Makes it easy to add a kernel density estimate.
- Uses a similar API for all plots, which makes it very intuitive.

In [None]:
# The data is already in long format
df.groupby("state").head(3).head(9)

In [None]:
sns.set(style='darkgrid')

g = sns.FacetGrid(df, 
                  col='state',                # facet col variable
                  col_wrap=3,                 # define nbr of subplots per row
                  sharex=False, sharey=False   # Define which axes are shared
                 )
g.map(sns.histplot, 
      'unemployment', 
      kde=True,
      binwidth=0.5             # Width of each bin
     )

# Resize the FacetGrid's own figure (plt.figure would create a new, empty one)
g.figure.set_size_inches(7, 7)

In [None]:
# Not exactly a histogram, but another way to represent the distribution
# This is a kernel density plot
sns.kdeplot(data=df, x='unemployment', hue='state')
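
For intuition, a kernel density estimate places a smooth kernel on every observation and averages them. The sketch below uses a Gaussian kernel with a hand-picked bandwidth on synthetic data. The function name `gaussian_kde_sketch` and the bandwidth value are assumptions for illustration; seaborn chooses its bandwidth automatically, so this does not reproduce `sns.kdeplot` exactly.

```python
import numpy as np

def gaussian_kde_sketch(values, grid, bandwidth):
    """Average of Gaussian bumps centred on each observation."""
    values = np.asarray(values, dtype=float)
    # Standardised distance from every grid point to every observation
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(1)
sample = rng.normal(loc=6.0, scale=2.0, size=500)  # synthetic, not the dataset
grid = np.linspace(-10, 22, 801)
density = gaussian_kde_sketch(sample, grid, bandwidth=0.5)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual observations more closely.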

# 4. Plotly Express

Also a good trade-off between simplicity and advanced features.

- The main advantage of Plotly is that it produces interactive plots.
- This is very useful if you are developing interactive applications.

In [None]:
px.histogram(df, 
             x='unemployment', 
             #color='state', 
             facet_col='state', 
             facet_col_wrap=3,
             histnorm='probability',
             nbins=50, 
             width=800, height=800
            )
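
With `histnorm='probability'`, each bar shows the share of observations in that bin rather than a raw count, so the bar heights of one histogram sum to 1. Here is a rough numpy sketch of that normalization on synthetic data. Plotly picks its own bin edges, so the bins here are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
rates = rng.normal(loc=6.0, scale=2.0, size=400)  # synthetic, not the dataset

counts, edges = np.histogram(rates, bins=50)

# 'probability' normalization: divide each bin count by the number of observations
probabilities = counts / counts.sum()
print(probabilities.sum())
```

This makes facets with different numbers of observations comparable on the same y scale.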

# 5. Plotnine

Plotnine is a ggplot2 port for Python. It's a declarative library where you build a plot by adding layers.

It's easy to use and intuitive, similar to writing a recipe.

In this case, it makes it easy to control the scales of each subplot.

In [None]:
(ggplot(df, aes(x='unemployment')) + 
 geom_histogram() +
 facet_wrap("~ state", scales='free_y') +
 theme(figure_size=(8, 8)) +
 xlab("Unemployment") +
 ggtitle("Histogram of Unemployment by State")
)