<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/agods/nyp_ago_logo.png" width='300'/>

Welcome to the lab! Before we get started, here are a few pointers on using this notebook.

1. The notebook is composed of cells; cells can contain code which you can run, or they can hold text and/or images which are there for you to read.

2. You can execute code cells by clicking the ```Run``` icon in the menu, or via the following keyboard shortcuts: ```Shift-Enter``` (run and advance) or ```Ctrl-Enter``` (run and stay in the current cell).

3. To interrupt cell execution, click the ```Stop``` button on the toolbar, or navigate to the ```Kernel``` menu and select ```Interrupt```.
    

# Seaborn

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.  

It operates on complete datasets and performs the necessary *semantic mapping* and *statistical aggregation* to produce informative plots. Its dataset-oriented, declarative API allows you to focus on what the different elements of your plots mean, rather than on the details of how to draw them.

Seaborn plotting functions are either “axes-level” or “figure-level”. An axes-level function plots data onto a single `matplotlib.pyplot.Axes` object, while a figure-level function interfaces with matplotlib through a seaborn object, usually a `FacetGrid`, that manages the figure and allows plotting data onto multiple `matplotlib.pyplot.Axes` as subplots. Each figure-level function offers a unitary interface to its various axes-level functions. The organization looks a bit like this:

<img src="images/sns_function_overview.png"/>


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## A Quick Example

The following is an example of plotting using the Seaborn library. 

In [None]:
data = pd.read_csv("datasets/salary.csv", index_col=0)[:1000]
data.head()

In [None]:
sns.set_theme(style="ticks")
sns.relplot(x="Salary", y="Age", 
            hue="Education", style="Education", 
            col="Gender", 
            data=data)

Notice how we provided only the names of the variables (e.g. *Salary*, *Age*, *Education*, *Gender*, which are the corresponding column names in the dataframe) and their roles in the plot (e.g. *x*- or *y*-axis values, *hue*, *col*). Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.
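
To get a feel for the bookkeeping that seaborn takes over, here is a rough sketch of a comparable faceted scatter plot written directly in Matplotlib. It is not seaborn's actual implementation, and the dataframe `toy` is a hypothetical stand-in for the salary data with made-up values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the salary data; all values are made up
toy = pd.DataFrame({
    "Salary":    [3000, 4200, 5100, 3900, 6100, 4800],
    "Age":       [25, 31, 40, 28, 45, 36],
    "Education": ["Diploma", "Degree", "Degree", "Diploma", "Masters", "Masters"],
    "Gender":    ["F", "M", "F", "M", "F", "M"],
})

# One column of subplots per gender (the role of col="Gender")
genders = sorted(toy["Gender"].unique())
fig, axes = plt.subplots(1, len(genders), sharex=True, sharey=True, squeeze=False)

for ax, gender in zip(axes[0], genders):
    subset = toy[toy["Gender"] == gender]
    # One scatter call per education level (the role of hue="Education").
    # Colors stay consistent here only because every level appears in
    # both subsets; seaborn guarantees this in general.
    for education, rows in subset.groupby("Education"):
        ax.scatter(rows["Salary"], rows["Age"], label=education)
    ax.set_title(f"Gender = {gender}")
    ax.set_xlabel("Salary")
    ax.set_ylabel("Age")

axes[0, -1].legend(title="Education")
```

The `relplot()` call replaces the loops, the subplot layout, the axis labels, and the legend with a single declarative statement.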

## Controlling Figure Aesthetics

In Seaborn, we can use customized themes and a high-level interface for controlling the appearance of Matplotlib figures. 

Below is the default look of a plot made with Matplotlib.

In [None]:
# Undo the earlier sns.set_theme() call so that Matplotlib's own defaults apply
sns.reset_orig()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

To switch to the Seaborn defaults, simply call the `set_theme()` function:

In [None]:
sns.set_theme()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()
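
Under the hood, `set_theme()` works by updating Matplotlib's global `rcParams`, which is why it also affects plots drawn with plain Matplotlib calls. The sketch below inspects two of those parameters before and after the call; the choice of `axes.facecolor` and `axes.grid` is just for illustration.

```python
import matplotlib as mpl
import seaborn as sns

# Start from Matplotlib's original settings
sns.reset_orig()
before = (mpl.rcParams["axes.facecolor"], mpl.rcParams["axes.grid"])

# Apply the Seaborn defaults
sns.set_theme()
after = (mpl.rcParams["axes.facecolor"], mpl.rcParams["axes.grid"])

print("before:", before)
print("after: ", after)
```

`set_theme()` also accepts arguments such as `style`, `context`, `palette`, and `font_scale`, so several aspects of the look can be set in one call.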

## Seaborn Figure Styles

To control the plot style, Seaborn provides two functions: `set_style()` and `axes_style()`.

[set_style()](https://seaborn.pydata.org/generated/seaborn.set_style.html) sets the aesthetic style of the plots.

In [None]:
sns.set_style("whitegrid")
# sns.set_theme(style='whitegrid')
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

[axes_style()](https://seaborn.pydata.org/generated/seaborn.axes_style.html) returns a parameter dictionary for the aesthetic style of the plots. The function can be used in a `with` statement to temporarily change the style parameters.

In [None]:
sns.set_theme()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
with sns.axes_style('darkgrid'):
    plt.plot(x1, label='Group A')
    plt.plot(x2, label='Group B')
plt.legend()
plt.show()
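
The temporary nature of `axes_style()` is easiest to see with two subplots side by side. The following is a small sketch under the assumption that the global style is the `set_theme()` default; note that style parameters take effect when an Axes is created, so the Axes itself must be created inside the `with` block.

```python
import matplotlib.pyplot as plt
import seaborn as sns

x1 = [10, 20, 5, 40, 8]

sns.set_theme()  # global style for everything outside the with block

# axes_style() returns a dictionary of rc parameters for the named style
ticks_params = sns.axes_style("ticks")
print(ticks_params["axes.facecolor"])

fig = plt.figure(figsize=(8, 3))

# Only the Axes created inside the with block picks up the "ticks" style
with sns.axes_style("ticks"):
    ax_left = fig.add_subplot(1, 2, 1)

# Created outside the block, so it uses the global style again
ax_right = fig.add_subplot(1, 2, 2)

ax_left.plot(x1)
ax_left.set_title("ticks (temporary)")
ax_right.plot(x1)
ax_right.set_title("global style")
```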

## Removing Axes Spines

Sometimes it is desirable to remove the top and right axes spines from a plot. The `despine()` function does this:

In [None]:
sns.set_style("white")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
sns.despine()
plt.legend()
plt.show()
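
`despine()` also takes a few optional arguments. The sketch below uses `offset` and `trim`, which move the remaining spines away from the data and shorten them to the range of the ticks; the specific values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("ticks")

fig, ax = plt.subplots()
ax.plot([10, 20, 5, 40, 8], label="Group A")
ax.legend()

# offset: distance (in points) to move the remaining spines away from the axes
# trim=True: limit the remaining spines to the smallest and largest major ticks
sns.despine(ax=ax, offset=10, trim=True)
```

Passing `left=True` or `bottom=True` removes those spines as well.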

## Controlling the Scale of Plot Elements

A separate set of parameters controls the scale of plot elements. This is a handy way to use the same code to create plots that are suited for use in contexts where larger or smaller plots are necessary. To control the context, two functions can be used: `set_context()` and `plotting_context()`.

[set_context()](https://seaborn.pydata.org/generated/seaborn.set_context.html) sets the plotting context parameters. This does not change the overall style of the plot but affects things such as the size of the labels and lines. The base context is *notebook*, and the other contexts are *paper*, *talk*, and *poster*, which are versions of the notebook parameters scaled by 0.8, 1.5, and 2, respectively, in current seaborn releases.

In [None]:
sns.set_context("poster")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()
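
`plotting_context()` is the counterpart of `axes_style()` for scaling: it returns the parameter dictionary and can be used in a `with` statement to change the context temporarily. A minimal sketch follows; the `font_scale=1.2` value is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()  # resets the context to "notebook"

# Compare the parameter dictionaries of two contexts
notebook_params = sns.plotting_context("notebook")
poster_params = sns.plotting_context("poster")
print(notebook_params["font.size"], poster_params["font.size"])

# Temporary context: only the figure built inside the block is scaled
with sns.plotting_context("talk", font_scale=1.2):
    fig, ax = plt.subplots()
    ax.plot([10, 20, 5, 40, 8], label="Group A")
    ax.set_title("Drawn in the talk context")
    ax.legend()
```

After the `with` block ends, the global context is back to *notebook*, so later plots are unaffected.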

## Color Palettes

There are three general groups of color palettes, namely *categorical*, *sequential*, and *diverging*, which we will break down in the following sections.                                      

### Categorical Color Palettes

Categorical palettes (or qualitative color palettes) are best suited for distinguishing categorical data that does not have an inherent ordering. Some examples where it is suitable to use categorical color palettes are line charts showing stock trends for different companies, and a bar chart with subcategories; basically, any time you want to group your data. 

There are six variations of the default categorical palette in Seaborn, namely deep, muted, bright, pastel, dark, and colorblind:

<img src='images/categorical_color_palette.png' width=60%/>

In [None]:
sns.palplot(sns.color_palette("pastel"))
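
Besides displaying a palette, you can inspect it directly, since `color_palette()` returns a list-like object of RGB tuples. The short sketch below requests the full pastel palette and a three-color version of it.

```python
import seaborn as sns

# Full palette: a list-like object of (r, g, b) tuples with values in [0, 1]
pastel = sns.color_palette("pastel")

# Request a specific number of colors with the second argument
first_three = sns.color_palette("pastel", 3)

print(len(pastel))
print(first_three.as_hex())
```

Any of these palette names can also be passed to the `palette` parameter of a seaborn plotting function.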

### Sequential Color Palettes

Sequential color palettes are appropriate for data that ranges from low to high values, or vice versa. It is recommended to use light colors for low values and dark ones for high values. Some examples of sequential data are absolute temperature, weight, height, or the number of students in a class.

One family of sequential color palettes that Seaborn offers is the cubehelix palettes. They have a linear increase or decrease in brightness and some variation in hue, meaning that even when converted to black and white, the information is preserved.

In [None]:
sns.cubehelix_palette(as_cmap=True)

The two main parameters you will change are `start` (a value between 0 and 3) and `rot`, the number of rotations (an arbitrary value, but usually between -1 and 1).

In [None]:
sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True)
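
To check the brightness behaviour numerically, you can ask for discrete colors instead of a colormap and compute an approximate brightness for each. This is only a sketch: the `luminance()` helper is hypothetical and uses the common Rec. 601 weights as an assumption about how to estimate perceived brightness.

```python
import seaborn as sns

def luminance(rgb):
    """Approximate perceived brightness of an RGB triple (Rec. 601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

# Without as_cmap=True, the function returns a list of discrete colors
palette = sns.cubehelix_palette(n_colors=8, start=.5, rot=-.5)

# Print one brightness value per color, from the first to the last
for color in palette:
    print(f"{luminance(color):.2f}")
```

With the default settings the palette runs from light to dark, so the printed values should fall steadily.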

To create a custom sequential palette that starts at a light or dark desaturated color and ends at a specified color, use `light_palette()` or `dark_palette()`.

In [None]:
sns.light_palette("magenta", as_cmap=True)

In [None]:
sns.dark_palette("magenta", as_cmap=True)

By default, the palette functions return a list of colors. If you want a colormap object instead, for example to use with a heatmap, pass `as_cmap=True`, as demonstrated in the following example:

In [None]:
x = np.arange(25).reshape(5, 5)
ax = sns.heatmap(x, cmap=sns.cubehelix_palette(as_cmap=True))

### Diverging Color Palettes

Diverging color palettes are used for data that has a well-defined midpoint. An emphasis is placed on both high and low values. For example, if you are plotting population *changes* for a particular region relative to some baseline population, it is best to use diverging colormaps to show the relative increase and decrease in the population.

Seaborn provides a few diverging palettes, e.g. `vlag` and `icefire`. Matplotlib also provides a few, e.g. `Spectral` and `coolwarm`.

In [None]:
sns.color_palette('coolwarm', as_cmap=True)
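
When a diverging palette is used in a heatmap, it helps to tell seaborn where the midpoint of the data lies. The sketch below uses a small made-up array of population changes (the numbers are hypothetical) and the `center` parameter of `heatmap()`.

```python
import numpy as np
import seaborn as sns

# Hypothetical population changes (in percent) relative to a baseline
changes = np.array([[-8.0, -3.0, 0.0],
                    [ 1.5,  4.0, 9.0]])

# center=0 pins the neutral middle of the palette to "no change",
# even though the data range is not symmetric around zero
ax = sns.heatmap(changes, cmap="vlag", center=0, annot=True)
```

Without `center`, the neutral color would sit at the middle of the data range rather than at zero.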

You can also create your own custom colormap for diverging data. The `diverging_palette()` function makes diverging palettes using the husl color system. You pass it two hues (in degrees) and, optionally, the lightness and saturation values for the extremes.

You can refer to the hue wheel used here: 

<img src="images/color_wheel.png" width=60% />

In [None]:
sns.diverging_palette(120, 300, s=60, as_cmap=True)
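
The optional arguments can be explored by requesting a discrete palette instead of a colormap. The following sketch reuses the two hues from the cell above; the values chosen for `s`, `l`, and `n` are arbitrary, and `center="dark"` switches the midpoint from light to dark.

```python
import seaborn as sns

# s and l set the saturation and lightness of the two extremes,
# n is the number of discrete colors, and center="dark" makes the
# midpoint dark instead of the default light
palette = sns.diverging_palette(120, 300, s=80, l=55, n=7, center="dark")

print(palette.as_hex())
```

The middle entry of the returned list is the dark midpoint, with the two hues fading toward it from either end.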