# Module 2 Python Practice

We will use supplementary Python notebooks to show the Python equivalents of the visualizations we implement in R with ggplot2.


One of the most useful visualization libraries in Python is `seaborn`. Let's see how we can use it for our visualizations, along with `plotnine`, a Python implementation of ggplot2.

**Run the following code cells and study the outputs to understand how they work.** 

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotnine 
from plotnine.data import mtcars
from plotnine import *

In [None]:
mtcars.head()

In [None]:
# Let's use the cars data to visualize some aspects of the data set.
# Pick some variables
# Make a pair plot to show any visible correlation; the plot_kws argument (tried in the next cell) changes color and alpha transparency
sns.pairplot(mtcars[['mpg','disp','cyl','hp','drat','wt']])

### Try also the following options to play with the output: 

In [None]:
#sns.pairplot(mtcars[['mpg','disp','cyl','hp','drat','wt']], palette = 'Accent', hue="cyl")
#sns.pairplot(mtcars[['mpg','disp','cyl','hp','drat','wt']], palette = 'Accent', hue="cyl", diag_kind="hist")
#sns.pairplot(mtcars[['mpg','disp','cyl','hp','drat','wt']], plot_kws={'color':'maroon','alpha': 0.8})

In [None]:
# Drop the non-numeric 'name' column
data = mtcars.drop(['name'], axis=1)
# Let's compute all the correlations and look at them 
data.corr()

In [None]:
np.abs(np.array(data.corr()))
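
A full matrix of absolute correlations is hard to scan by eye. One option, sketched here under assumptions, is to list only the strongest pairs. The helper name `top_correlations` and the `toy` DataFrame are hypothetical stand-ins for illustration; in the notebook you would pass `data` instead.

```python
import numpy as np
import pandas as pd

def top_correlations(df, n=5):
    """Return the n variable pairs with the largest absolute correlation."""
    corr = df.corr()
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by magnitude but keep the sign in the returned values
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data (hypothetical, not from mtcars): b is an exact multiple of a
toy = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.0, 6.0, 8.0, 10.0],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})
top = top_correlations(toy, n=2)
print(top)
```

The result is a Series indexed by variable pairs, so the sign of each correlation is still visible next to its rank.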

**Raw numbers are not very useful to look at, so let's visualize the correlations as ellipses, as the `ellipse` library does in R.**


In [None]:
# This represents correlations as ellipses: the slope shows the sign and
# the thickness shows the strength (a thinner ellipse means a stronger correlation).

# We are not aware of a ready-made function for plotting correlations as ellipses
# in the libraries used here, so we implement one using matplotlib.

from matplotlib.collections import EllipseCollection

def plot_corr_ellipses(data, ax=None, **kwargs):

    M = np.array(data)
    if M.ndim != 2:
        raise ValueError('data must be a 2D array')
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

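    # Newer Matplotlib releases name this keyword offset_transform (formerly transOffset);
    # on an older release, switch back to transOffset=ax.transData
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)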
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(15,13))
m = plot_corr_ellipses(data.corr(), ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
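
The mapping from a correlation value to an ellipse shape can be checked on its own, without drawing anything. The sketch below repeats the three formulas from `plot_corr_ellipses` for a few hand-picked values; `ellipse_params` is a hypothetical helper name, not part of the notebook or of matplotlib.

```python
import numpy as np

def ellipse_params(r):
    """Width, height and tilt angle for each correlation value in r."""
    r = np.asarray(r, dtype=float)
    width = np.ones_like(r)       # major axis is always 1
    height = 1 - np.abs(r)        # stronger correlation gives a thinner ellipse
    angle = 45 * np.sign(r)       # tilt follows the sign of the correlation
    return width, height, angle

# Hand-picked example values: strong negative, none, moderate positive
w, h, a = ellipse_params([-0.9, 0.0, 0.5])
for r, hi, ai in zip([-0.9, 0.0, 0.5], h, a):
    print(f"r={r:+.1f}  height={hi:.2f}  angle={ai:+.0f}")
```

A correlation of 0 gives a circle (height equal to width), while values near 1 or -1 collapse the ellipse toward a line.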

**Let's look at the plotnine version of some of the R practice notebook examples.**


In [None]:
# get a small sample from diamonds data set 
from plotnine.data import diamonds

dsamp = diamonds.sample(n=1000, random_state=1)
dsamp.head()
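
Passing `random_state` to `sample` fixes the random draw, so the same rows come back on every run. A minimal sketch of that behavior, using a small hypothetical DataFrame in place of `diamonds`:

```python
import pandas as pd

# Toy frame standing in for the diamonds data (values are made up)
toy = pd.DataFrame({'carat': [0.2, 0.3, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 1.8, 2.0]})

# Two draws with the same seed select the same rows in the same order
s1 = toy.sample(n=3, random_state=1)
s2 = toy.sample(n=3, random_state=1)
print(s1.equals(s2))
```

Without `random_state`, each run may pick different rows, and the plots built from the sample would change between runs.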

In [None]:
# plot carat vs price and encode 'cut' variable with color

plotnine.options.figure_size = (8, 8)
# default color palette 
gp = ggplot(dsamp, aes(x='carat', y='price', color='cut')) + geom_point()
gp

In [None]:
gp + scale_color_brewer()


In [None]:
# This might be better if we want to emphasize the ideal cut 
gp + scale_color_brewer(type="seq", palette=3)

In [None]:
# Another good choice
gp + scale_color_brewer(type='seq',palette='Oranges')


In [None]:
# We can also assign colors manually using their hexadecimal codes 
gp + scale_color_manual(values=["#0000FF", "#009F00", "#56B4E9", "#009E73", "#FFFFFF"])


# not a very good color scheme

In [None]:
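# Let's create a histogram of the carat variable and fill each bar by its count.
# after_stat('count') is the current spelling of the older 'stat(count)' form.
gp2 = ggplot(data=dsamp, mapping=aes(x='carat')) + geom_histogram(binwidth=0.5, mapping=aes(fill=after_stat('count')))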
gp2
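
The fill in this histogram is mapped to a value computed by the statistic: the number of observations in each 0.5-wide bin. A rough sketch of that computation with NumPy is below. The carat values are made up for illustration, and plotnine may place its bin edges differently, so the counts for the real data need not line up with these edges.

```python
import numpy as np

# Hypothetical carat values, not taken from the diamonds data
carat = np.array([0.23, 0.31, 0.7, 0.9, 1.01, 1.2, 1.5, 2.0])

# Bin edges 0.0, 0.5, ..., 2.5 give bins of width 0.5
edges = np.arange(0, 3.0, 0.5)
counts, _ = np.histogram(carat, bins=edges)

for left, c in zip(edges[:-1], counts):
    print(f"[{left:.1f}, {left + 0.5:.1f}): {c}")
```

Each count becomes both the bar height and the value that the fill scale colors.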

In [None]:
gp2 + scale_fill_gradient(name="count", low="blue", high="red")

In [None]:
# Convert cyl from numeric to string so it is treated as a discrete variable.
# Note that this modifies mtcars in place.
mtcars['cyl'] = mtcars['cyl'].astype(str)
gp3 = ggplot(data=mtcars, mapping=aes(x='wt', y='mpg', color='cyl')) + geom_point()
gp3
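
Converting `cyl` to strings changes `mtcars` itself, which affects any numeric cells that are rerun afterwards. An alternative, sketched here on a small hypothetical frame, is to work on a copy and use a pandas categorical, which keeps an explicit level order. As far as we know, plotnine also treats categorical columns as discrete, but verify this against your installed version.

```python
import pandas as pd

# Toy frame standing in for mtcars (values are illustrative)
toy = pd.DataFrame({'cyl': [6, 4, 8, 4], 'mpg': [21.0, 22.8, 18.7, 24.4]})

# Work on a copy so the original numeric column stays untouched
plot_df = toy.copy()
plot_df['cyl'] = pd.Categorical(plot_df['cyl'], categories=[4, 6, 8], ordered=True)

print(toy['cyl'].dtype)       # still an integer dtype
print(plot_df['cyl'].dtype)   # category
```

The copy can then be passed to `ggplot` in place of the modified `mtcars`.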

In [None]:
gp3 + scale_color_brewer(palette="Reds")