<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> Data Visualization using Seaborn

**Source**: https://seaborn.pydata.org/tutorial.html
<br><br>

- Enables more complex statistical plots (e.g., joint, pair, violin and faceted plots) with less code than plain matplotlib


- Uses matplotlib as a base
    - some syntax differs from matplotlib, while some is identical
    - keyword arguments that control Seaborn (and the underlying matplotlib) features, for example:
        - markers=["D", "o"]
        - sizes=(10, 125)
        - edgecolor=".2"
        - linewidth=.5
        - alpha=.75

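A minimal sketch of these keyword arguments in a single call, using a small synthetic DataFrame (the columns `x`, `y`, `group` and `weight` are hypothetical). `markers` and `sizes` are Seaborn's semantic mappings (one marker per `style` level, a min/max marker area for the `size` variable), while `edgecolor`, `linewidth` and `alpha` are forwarded to matplotlib's `scatter`. The function returns an ordinary matplotlib `Axes`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data, only for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=40),
    "y": rng.normal(size=40),
    "group": np.repeat(["A", "B"], 20),     # two categories -> two marker shapes
    "weight": rng.uniform(1, 10, size=40),  # numeric -> mapped onto marker area
})

ax = sns.scatterplot(
    data=df, x="x", y="y",
    style="group", markers=["D", "o"],        # Seaborn: one marker per style level
    size="weight", sizes=(10, 125),           # Seaborn: min/max marker area
    edgecolor=".2", linewidth=.5, alpha=.75,  # forwarded to matplotlib's scatter
)
ax.set_title("Seaborn and matplotlib keyword arguments together")
print(type(ax))
```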
### For citing Seaborn:
    
Waskom, M. L., (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021, https://doi.org/10.21105/joss.03021
<br><br>    
    
**BibTeX entry**:
@article{Waskom2021,<br>
    doi = {10.21105/joss.03021},<br>
    url = {https://doi.org/10.21105/joss.03021},<br>
    year = {2021},<br>
    publisher = {The Open Journal},<br>
    volume = {6},<br>
    number = {60},<br>
    pages = {3021},<br>
    author = {Michael L. Waskom},<br>
    title = {seaborn: statistical data visualization},<br>
    journal = {Journal of Open Source Software}<br>
 }

<hr style="border:2px solid gray"></hr>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

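## Seaborn built-in styles
- Set a style with `sns.set(style="ticks")` (`sns.set` is an alias of `sns.set_theme` in newer Seaborn versions)
    - available styles: `darkgrid`, `whitegrid`, `dark`, `white` and `ticks`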

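Each style name corresponds to a set of matplotlib rc parameters. A small sketch that inspects two of them with `sns.axes_style` (not used elsewhere in this notebook), and also uses it as a context manager so that a style applies only to the figure created inside the `with` block:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# sns.axes_style(name) returns the rc parameters that define a style
grids = {}
for name in styles:
    params = sns.axes_style(name)
    grids[name] = params["axes.grid"]
    print(f"{name:10s} facecolor={params['axes.facecolor']!s:8s} grid={params['axes.grid']}")

# As a context manager, the style is applied only inside the with-block
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```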
In [None]:
sns.set(style="ticks")

## `scatterplot` for Scatter Plotting

`seaborn.scatterplot(*, x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha=None, x_jitter=None, y_jitter=None, legend='auto', ax=None, **kwargs)`

- https://seaborn.pydata.org/generated/seaborn.scatterplot.html#seaborn.scatterplot

- There are often several ways to make the same plot, as demonstrated below


- Example: plotting x, y data (i.e., 2-D plots)
    - `scatterplot`
    - `relplot` with `kind='scatter'`
    - `relplot` with `kind='line'`


- Notice how the source of the data is now given with `data=` (here the `tips` pandas DataFrame), and `x` and `y` refer to its column names.

In [None]:
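## sns.load_dataset("tips") would download the example dataset from Seaborn's online repository;
## here a local copy, tips.csv, is read instead
# tips = sns.load_dataset("tips")

tips = pd.read_csv('tips.csv')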

print(tips)

In [None]:
sns.scatterplot(x="total_bill", y="tip", data=tips, s=150)
plt.show()

In [None]:
## add the row index as a column named 'index' so that it can be plotted like any other column
tips = tips.reset_index()
tips

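`reset_index()` returns a new DataFrame in which the old row index has become a regular column (named `index` for a default index), which is why the result is assigned back to `tips`. A tiny illustration on a made-up table; note that re-running such a cell finds an existing `index` column, so pandas names the next one `level_0`:

```python
import pandas as pd

# Made-up three-row table with the default index 0, 1, 2
df = pd.DataFrame({"total_bill": [12.50, 8.00, 20.25]})

out = df.reset_index()  # a NEW DataFrame; df itself is unchanged
print(out)

# Resetting a second time: "index" already exists, so the new column is "level_0"
again = out.reset_index()
print(list(again.columns))
```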
## `relplot` for Relational Plot

`seaborn.relplot(*, x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='auto', kind='scatter', height=5, aspect=1, facet_kws=None, units=None, **kwargs)`

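- Plots the relationship between two variables; `kind` selects a scatter plot (`'scatter'`, the default) or a line plot (`'line'`)

- https://seaborn.pydata.org/generated/seaborn.relplot.html

- `height`: height of each facet, in inches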

### Line Plot

- `kind='line'`

In [None]:
sns.relplot(x='index', y='total_bill', data=tips, kind='line', height=8)
plt.show()

### Scatter Plot

- `kind='scatter'`

In [None]:
sns.relplot(x="total_bill", y="tip", data=tips, kind='scatter', s=150, height=8)
plt.show()

Color-code the points by smoker status (Yes/No) with `hue`.

In [None]:
sns.relplot(x="total_bill", y="tip", data=tips, kind='scatter', s=150, height=8,
            hue="smoker")
plt.show()

Color-code the points by party size. Because `size` is numeric, Seaborn maps it onto a continuous (sequential) color scale.

In [None]:
sns.relplot(x="total_bill", y="tip", kind='scatter', data=tips, s=150, height=8,
            hue="size")
plt.show()

Now also distinguish the customer's sex using different markers (`style`).

In [None]:
sns.relplot(x="total_bill", y="tip", kind='scatter', data=tips, s=150, height=8,
            hue="smoker",
            style="sex")
plt.show()

Create multiple plots (facets), one for each unique value of a column:
- in separate columns: col="column_name"
- in separate rows: row="column_name"

Example columns to facet by: "day", "size"

In [None]:
sns.relplot(x="total_bill", y="tip", kind='scatter', data=tips, s=150,
            hue="time",
            col="day")
plt.show()

In [None]:
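## relplot creates its own figure, so no plt.figure() call is needed
## col_wrap=2 wraps the four "day" facets onto two rows
sns.relplot(x="total_bill", y="tip", kind='scatter', data=tips, s=150,
            hue="time",
            col="day",
            col_wrap=2)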
plt.show()

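`col` creates one facet per unique value, and `col_wrap` only reflows those facets onto several rows. A sketch checking the grid shapes on hypothetical data; with `col_wrap`, `g.axes` is a flat 1-D array instead of a 2-D (rows, columns) array:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with four "day" categories
rng = np.random.default_rng(3)
days = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 40),
    "tip": rng.uniform(1, 8, 40),
    "day": np.tile(days, 10),
})

g_row = sns.relplot(data=df, x="total_bill", y="tip", col="day")               # 1 x 4 grid
g_wrap = sns.relplot(data=df, x="total_bill", y="tip", col="day", col_wrap=2)  # 2 x 2 layout
print(g_row.axes.shape, g_wrap.axes.shape)
```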
## Additional examples of plots that are harder to make with matplotlib alone

### Jointplots

- a bivariate (joint) plot with the marginal distribution of each variable (histograms by default) along the top and right axes


- https://seaborn.pydata.org/generated/seaborn.jointplot.html


**Kind**
- scatter
- reg (scatter plot with a linear regression line and its 95% confidence interval)
- resid (residuals of a linear regression)
- kde (bivariate and marginal kernel density estimates)
- hex (bivariate histogram with hexagonal bins)

In [None]:
sns.jointplot(x="tip", y="total_bill", data=tips,
              kind="scatter")
plt.show()

Add a linear regression line and kernel density fits to the marginal histograms

In [None]:
## linear regression with 95% confidence interval
sns.jointplot(x="tip", y="total_bill", data=tips, 
              kind="reg")
plt.show()

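The straight line drawn by `kind="reg"` is an ordinary least-squares fit of y on x. As a sketch on hypothetical data, the same slope and intercept can be computed with NumPy's `np.polyfit` (degree 1) and checked against the closed-form OLS formulas. The shaded 95% band, which Seaborn estimates by bootstrapping, is not reproduced here.

```python
import numpy as np

# Hypothetical data: a linear trend plus noise
rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 50)                 # e.g. tip
y = 6.0 * x + 2.0 + rng.normal(0, 2, 50)   # e.g. total_bill

# A degree-1 polynomial fit is the ordinary least-squares line
slope, intercept = np.polyfit(x, y, 1)

# Closed-form OLS estimates for comparison
slope_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_cf = y.mean() - slope_cf * x.mean()

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```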
#### Kernel density estimation for smoothing data

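- estimates the underlying (population) probability density from a finite sample by placing a smooth kernel (e.g., a Gaussian) on each data point and averaging them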

- https://en.wikipedia.org/wiki/Kernel_density_estimation

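A minimal one-dimensional sketch of the idea, written from scratch with NumPy rather than taken from Seaborn: a Gaussian KDE evaluates $\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$ with a standard normal kernel $K$ and bandwidth $h$. The bandwidth here follows Scott's rule, $h = \hat{\sigma}\, n^{-1/5}$, which is also the basis of `kdeplot`'s default (`bw_method="scott"`, scaled by `bw_adjust`). The sample is synthetic; the check is that the estimate integrates to about 1.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: h = sigma * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance for every (grid point, sample) pair
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum of kernels, scaled so that the estimate integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

# Hypothetical sample from a normal distribution (mean 3, sd 1)
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.0, size=200)

grid = np.linspace(-2.0, 8.0, 1001)
density = gaussian_kde_1d(data, grid)

# Trapezoidal rule: the area under a density estimate should be ~1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area = {area:.4f}, peak near x = {grid[np.argmax(density)]:.2f}")
```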
In [None]:
sns.jointplot(x="tip", y="total_bill", data=tips,
              kind="kde", space=0.1, color='r')
plt.show()

In [None]:
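## finer control over the number of contour levels; fill=True fills the area between them
## (levels and fill replace the older n_levels and shade arguments)
## use either cmap='Blues' (a colormap) or color='r' (a single color)
sns.kdeplot(data=tips, x="tip", y="total_bill", levels=100, cmap='Blues', fill=True, thresh=0)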
plt.show()

***
## Categorical plots
For plots whose data fall into categories (as opposed to exploring the relationship between two numeric variables, e.g. with scatter plots).

In [None]:
## bar plot: bar height = mean total_bill per day; error bar = 95% confidence interval (Seaborn's defaults)
sns.catplot(x="day", y="total_bill", hue="smoker",
            kind="bar", data=tips)
plt.show()

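By default each bar's height is the mean of `y` within its category (the `estimator`), and the line on each bar is an error bar around that mean. A sketch on hypothetical data that checks the bar heights against a pandas `groupby` mean, using the axes-level `sns.barplot` so that the bars can be read back from `ax.patches`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 25 bills for each of four days
rng = np.random.default_rng(6)
order = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day": np.tile(order, 25),
    "total_bill": rng.uniform(5, 50, 100),
})

ax = sns.barplot(data=df, x="day", y="total_bill", order=order)
heights = [bar.get_height() for bar in ax.patches]             # one rectangle per day
means = df.groupby("day")["total_bill"].mean().reindex(order)  # same order as the bars
print(pd.DataFrame({"bar height": heights, "groupby mean": means.values}, index=order))
```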
#### Violin Plots

- https://en.wikipedia.org/wiki/Violin_plot


- https://seaborn.pydata.org/generated/seaborn.violinplot.html

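A violin is a mirrored kernel density estimate of each category's distribution. With the default `inner="box"`, a miniature box plot inside each violin marks the median and the interquartile range (IQR). Those summary values can be computed directly with pandas; a sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical bills for two days
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "day": np.repeat(["Sat", "Sun"], 50),
    "total_bill": np.concatenate([rng.normal(20, 5, 50), rng.normal(22, 8, 50)]),
})

# First quartile, median and third quartile for each category
summary = df.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()
summary["IQR"] = summary[0.75] - summary[0.25]
print(summary.round(2))
```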
In [None]:
sns.violinplot(x="day", y="total_bill", data=tips,
               palette="Set3")
plt.show()

In [None]:
sns.violinplot(x="day", y="total_bill", data=tips,
               palette="Set3",
               hue="smoker", split=True)
plt.show()

In [None]:
## plot the total bill and smoker status as a function of day
## note: three variables shown in a 2-D plot

sns.catplot(x="day", y="total_bill", data=tips,
            palette="Set1",
            hue="smoker",
            kind="swarm")
plt.show()

#### stripplot

- A scatter plot where one variable is categorical
- Often combined with violin or box plots to show the individual observations on top of the distribution summary

- https://seaborn.pydata.org/generated/seaborn.stripplot.html

In [None]:
## draw the violins first so that the strip-plot points are drawn on top of them
sns.violinplot(x="day", y="total_bill", data=tips, palette="Set3")

sns.stripplot(x="day", y="total_bill", data=tips, jitter=0.2)

plt.show()

***
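## Subplots
Axes-level functions such as `scatterplot`, `boxplot` and `violinplot` accept an `ax=` argument, so they can draw into matplotlib subplots. Figure-level functions (`relplot`, `catplot`, `jointplot`, `pairplot`) create their own figure and cannot be placed into existing subplots this way.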

In [None]:
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(6, 4))

sns.boxplot(x="day", y="tip", data=tips,
            ax=axes[1])

sns.scatterplot(x="total_bill", y="tip", data=tips,
                hue="day",
                ax=axes[0])

plt.show()

## Pairwise associations of the data, with distributions

- Univariate distribution of the column data along the diagonal axes


- A good way to get an overview of possible relationships within the data

In [None]:
sns.pairplot(data=tips, hue="day", kind="scatter")
plt.show()

In [None]:
sns.pairplot(data=tips, hue="day", kind="scatter", corner=True)
plt.show()

In [None]:
sns.pairplot(data=tips, hue="day", kind="reg", corner=True)
plt.show()

***
### Color Palettes in Seaborn
https://seaborn.pydata.org/tutorial/color_palettes.html#palette-tutorial

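**Possible values** (matplotlib colormap names, plus Seaborn's own colormaps such as `rocket`, `mako`, `icefire` and `vlag`; a `_r` suffix reverses a palette):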

Accent, Accent_r,

Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r,

CMRmap, CMRmap_r,

Dark2, Dark2_r,

GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r,

OrRd, OrRd_r, Oranges, Oranges_r,

PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r,

RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r,

Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r,

Wistia, Wistia_r,

YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r,

afmhot, afmhot_r, autumn, autumn_r,

binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r,

cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r,

gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r,

hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r,

magma, magma_r, mako, mako_r,

nipy_spectral, nipy_spectral_r,

ocean, ocean_r,

pink, pink_r, plasma, plasma_r, prism, prism_r,

rainbow, rainbow_r, rocket, rocket_r,

seismic, seismic_r, spring, spring_r, summer, summer_r,

tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r,

viridis, viridis_r, vlag, vlag_r,

winter, winter_r

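Any of these names can be passed to `sns.color_palette`, optionally with the number of colors to sample. A sketch checking that the `_r` suffix gives the same colors in reverse order for `Blues` (a tolerance is used because the two colormaps are sampled from discrete lookup tables, so matching colors can differ very slightly):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import seaborn as sns

blues = np.array(sns.color_palette("Blues", 5))      # 5 RGB colors, light -> dark
blues_r = np.array(sns.color_palette("Blues_r", 5))  # same colors, dark -> light
print(sns.color_palette("Blues", 5).as_hex())
```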
---
#### Qualitative

- Used with categorical data (e.g. days of the week, smoker/non-smoker)


- For example:
    - pastel, deep, dark, colorblind
    - Paired, Set2

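Qualitative palettes contain a fixed set of distinct colors (10 for Seaborn's `pastel`). Requesting more colors than that makes `color_palette` cycle through them again, so colors repeat and categories become ambiguous. A sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import seaborn as sns

base = sns.color_palette("pastel")      # the palette's native size
more = sns.color_palette("pastel", 12)  # more colors than the palette has
print(len(base), len(more))
print("distinct colors among the 12 requested:", len(set(more)))
```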
In [None]:
len(sns.color_palette("pastel"))

In [None]:
sns.palplot(sns.color_palette("pastel", 10))

In [None]:
palette = "deep"
sns.palplot(sns.color_palette(palette, len(sns.color_palette(palette))))

In [None]:
palette = "dark"
sns.palplot(sns.color_palette(palette, len(sns.color_palette(palette))))

In [None]:
palette = "colorblind"
sns.palplot(sns.color_palette(palette, len(sns.color_palette(palette))))

In [None]:
palette = "Set2"
sns.palplot(sns.color_palette(palette, len(sns.color_palette(palette))))

In [None]:
palette = "Paired"
sns.palplot(sns.color_palette(palette, len(sns.color_palette(palette))))

#### Sequential

- Use for data that range from low (less interesting) to high (more interesting) values, or vice versa (e.g. elevation)


- Examples
    - Blues
    - cubehelix
    - GnBu_d

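For continuous data, a sequential palette is often needed as a matplotlib colormap rather than a list of discrete colors; in recent Seaborn versions `sns.color_palette(..., as_cmap=True)` returns one. A sketch that also checks the light-to-dark ordering of `Blues`, using the sum of the RGB channels as a rough lightness measure (`pcolormesh` and the 5 × 5 array are only for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

cmap = sns.color_palette("Blues", as_cmap=True)   # a matplotlib Colormap
colors = np.array(sns.color_palette("Blues", 6))  # 6 discrete RGB colors

# Rough lightness: should decrease from the first (light) to the last (dark) color
lightness = colors.sum(axis=1)
print(np.round(lightness, 2))

# Use the colormap for continuous values, e.g. a 2-D array
data = np.arange(25).reshape(5, 5)
fig, ax = plt.subplots()
mesh = ax.pcolormesh(data, cmap=cmap)
fig.colorbar(mesh, ax=ax)
```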

In [None]:
sns.palplot(sns.color_palette("Blues", 20))

In [None]:
## reversed
sns.palplot(sns.color_palette("Blues_r", 20))

In [None]:
sns.palplot(sns.color_palette("GnBu", 20))

In [None]:
## dark
sns.palplot(sns.color_palette("GnBu_d", 20))

In [None]:
sns.palplot(sns.color_palette("cubehelix", 20))

#### Diverging

- Use when both the low and the high values are of interest and the middle (e.g. zero) is a neutral reference point, e.g. data normalized from -1 to +1

- Examples
    - Spectral
    - BrBG
    - RdBu
    

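A typical case is a correlation matrix, whose entries lie between -1 and +1. A sketch on hypothetical data that uses `sns.heatmap` (not used elsewhere in this notebook) with the diverging `RdBu_r` colormap, pins the color scale to [-1, 1], and centers it on 0 so that zero correlation maps to the neutral middle color:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with positive, negative and no correlation
rng = np.random.default_rng(9)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=100),   # positively correlated with a
    "c": -a + rng.normal(scale=0.5, size=100),  # negatively correlated with a
    "d": rng.normal(size=100),                  # roughly uncorrelated
})

corr = df.corr()  # Pearson correlations, all within [-1, 1]
ax = sns.heatmap(corr, cmap="RdBu_r", vmin=-1, vmax=1, center=0,
                 annot=True, fmt=".2f", square=True)
print(corr.round(2))
```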
In [None]:
sns.palplot(sns.color_palette("Spectral", 20))

In [None]:
sns.palplot(sns.color_palette("BrBG", 20))

In [None]:
sns.palplot(sns.color_palette("RdBu", 20))

In [None]:
## a qualitative palette ('Set2') applied to a plot; try palette='dark' to compare
sns.pairplot(data=tips, hue="day", kind="scatter", palette='Set2', corner=True)

plt.show()

---

#### Additional examples for other packages:

https://colab.research.google.com/notebooks/charts.ipynb#scrollTo=Xn0jLwr8evoR