# matplotlib and seaborn

In [None]:
# Install the libraries used in this notebook
!pip install numpy pandas matplotlib seaborn

## Comparison
**matplotlib** is the standard Python plotting library. **seaborn** is built on top of matplotlib: it provides a higher-level interface, offers additional styles, and works nicely with pandas DataFrames.

Take a look at how the two libraries render a standard histogram:

In [None]:
import numpy as np
x = np.random.randn(100)
x[:10]

### Matplotlib
![Matplotlib logo](https://matplotlib.org/_static/logo2_compressed.svg)

In [None]:
import matplotlib.pyplot as plt
plt.hist(x)
plt.show()

### Seaborn
![Seaborn logo](https://seaborn.pydata.org/_static/logo-wide-lightbg.svg)

In [None]:
import seaborn as sns
sns.histplot(x)
plt.show()

## Visualization 101

### Data Types
Choosing the right plot type also depends on the type of data to visualize.

#### Quantitative Data
- **Continuous**: time, height, weight, temperature, speed, volume, price, power consumption $\rightarrow$ **measurable**
- **Discrete**: units sold, shoe size, quantities (e.g. number of languages spoken/emails received/workers/orders) $\rightarrow$ **countable**

#### Qualitative Data
- **Categorical/Nominal**: gender, hair color, country, languages, dog breed $\rightarrow$ cannot be arranged in any particular order
- **Ordinal**: rankings, school grades, income categories, pain scale $\rightarrow$ can be arranged in order

![Qualitative vs. Quantitative Data](https://i2.wp.com/intellspot.com/wp-content/uploads/2018/03/qualitative-and-quantitative-data-a-short-infographic.png?resize=680%2C363)
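The four data types can also be made explicit in pandas. The following is only a minimal sketch: the `people` table and all of its columns and values are made up for illustration and are not part of the automobile data used later.

```python
import pandas as pd

# Hypothetical toy data (not taken from the automobile data set)
people = pd.DataFrame({
    'height_cm': [172.5, 160.1, 181.3, 168.0],           # quantitative, continuous
    'emails_received': [12, 0, 7, 31],                   # quantitative, discrete
    'hair_color': ['brown', 'black', 'blond', 'brown'],  # qualitative, nominal
    'pain_level': ['mild', 'severe', 'none', 'mild'],    # qualitative, ordinal
})

# Nominal data: categories without any order
people['hair_color'] = people['hair_color'].astype('category')

# Ordinal data: categories with an explicit order
people['pain_level'] = pd.Categorical(
    people['pain_level'],
    categories=['none', 'mild', 'severe'],
    ordered=True,
)

print(people.dtypes)

# Ordering operations are only meaningful for the ordinal column
print(people['pain_level'].max())
worst_first = people.sort_values('pain_level', ascending=False)
print(worst_first)
```

Declaring the order explicitly lets sorting follow the order of the scale instead of the alphabet.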

### Visual Encodings
How can we map from data to display elements?

#### Position
A data point's coordinates on the x- and the y-axis indicate its **position**. It's the most common visual encoding, but it can only encode two variables (at most three, in a 3D plot).

#### Retinal Variables
- Size
- Color (Hue, Saturation)
- Shape
- Texture
- Orientation
- Line (Width, Type)

![Visual Encodings](https://images.squarespace-cdn.com/content/v1/5879aba73a0411c5027d77c9/1529320974623-SMYWQDSSE8VDAHOIK85R/ke17ZwdGBToddI8pDm48kH23KVWagbNOYpajbj_MQLNZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIJtg7yny0RBSV5PxpX1XPrwAROGqRUCBAuccPtaePpQsKMshLAGzx4R3EDFOm1kBS/image-asset.jpeg?format=1000w)

![Visual Encodings](https://clauswilke.com/dataviz/aesthetic_mapping_files/figure-html/common-aesthetics-1.png)
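As a rough illustration of how position and retinal variables combine, the sketch below encodes four variables in one scatter plot. The `toy` DataFrame and its column names are invented for this example; only the `hue`, `style`, and `size` arguments of `sns.scatterplot()` are the point here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical random data, for illustration only
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'x': rng.normal(size=30),
    'y': rng.normal(size=30),
    'weight': rng.uniform(1, 10, size=30),
    'group': rng.choice(['a', 'b', 'c'], size=30),
})

ax = sns.scatterplot(
    data=toy,
    x='x', y='y',    # position encodes two quantitative variables
    size='weight',   # marker size encodes a third quantitative variable
    hue='group',     # color encodes a categorical variable ...
    style='group',   # ... and marker shape encodes the same variable redundantly
)
plt.show()
```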

## Parts of a Figure
Most of the following terminology is straightforward, but note the difference between ``Figure``, ``Axes``, and ``Axis``. The ``Figure`` is the final image and contains one or more ``Axes``. An ``Axes`` is one actual plot. An ``Axis`` is something different: it is a single dimension of a plot (e.g. the x-axis) and defines the limits, ranges, and ticks.

![Anatomy of a Figure](https://matplotlib.org/_images/anatomy1.png)

Source: https://matplotlib.org/stable/gallery/showcase/anatomy.html
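The relationship between the three terms can be checked directly in code. This is a small sketch with arbitrary example values, using matplotlib only:

```python
import matplotlib.pyplot as plt

# One Figure that contains two Axes (two individual plots)
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

left.plot([0, 1, 2], [0, 1, 4])
right.plot([0, 1, 2], [4, 1, 0])

# Each Axes owns an x-Axis and a y-Axis, which control limits and ticks
left.set_xlim(0, 2)
left.xaxis.set_ticks([0, 1, 2])

print(type(fig).__name__)         # Figure
print(len(fig.axes))              # number of Axes in the Figure
print(type(left.xaxis).__name__)  # XAxis

plt.show()
```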

## An Example

![Car Types](https://www.smartmotorist.com/wp-content/uploads/2019/01/types-of-cars-e1560223698227.png)

### Importing Libraries and Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# use matplotlib in jupyter notebook
%matplotlib inline  
import seaborn as sns

df = pd.read_csv('https://raw.githubusercontent.com/ADSLab-Salzburg/DataAnalysiswithPython/main/data/Automobile_data.csv', index_col=0)
df.head()

### Data Types
Which of the columns contain qualitative (categorical, ordinal) or quantitative (continuous or discrete) data?

### Seaborn's Plotting Functions
There are two ways to create a plot in seaborn:
1. (recommended) Pass a ``DataFrame`` to the ``data`` argument, while passing column names to the axes arguments, ``x`` and ``y``.
2. Directly pass in ``Series`` of data to the axes arguments.
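Both calling conventions can be compared side by side. The sketch below uses a tiny made-up `cars` table (hypothetical values, not the real data set) so that it runs on its own:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical values, for illustration only
cars = pd.DataFrame({
    'price': [10000, 15000, 20000, 30000],
    'horsepower': [70, 100, 120, 200],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 1. (recommended) DataFrame plus column names
sns.scatterplot(data=cars, x='price', y='horsepower', ax=ax1)

# 2. Series passed in directly
sns.scatterplot(x=cars['price'], y=cars['horsepower'], ax=ax2)

plt.show()
```

Both variants produce the same plot. In the second one, seaborn derives the axis labels from the names of the ``Series``.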


### Scatter Plot
Seaborn provides a [``scatterplot()``](http://seaborn.pydata.org/generated/seaborn.scatterplot.html#seaborn.scatterplot) function.

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df)
plt.show()

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style')  # color by body style
plt.show()

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style',
                style='body-style')  # change marker style according to body style
plt.show()

### Customization with Matplotlib
seaborn is a high-level interface to matplotlib. Some low-level functionality is still handled in matplotlib only:

#### Setting axis limits

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style', style='body-style')
plt.xlim(0, 50000)
plt.ylim(0, 300)
plt.show()

#### Defining a title

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style', style='body-style')
plt.xlim(0, 50000)
plt.ylim(0, 300)

plt.title('Correlation of Horsepower and Price for Different Vehicle Types')

plt.show()

#### Legend position

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style', style='body-style')
plt.xlim(0, 50000)
plt.ylim(0, 300)
plt.title('Correlation of Horsepower and Price for Different Vehicle Types')

plt.legend(loc='lower right')  # vs. 'best'

plt.show()

#### x-Label rotation

In [None]:
sns.scatterplot(x='price', y='horsepower', data=df,
                hue='body-style', style='body-style')
plt.xlim(0, 50000)
plt.ylim(0, 300)
plt.title('Correlation of Horsepower and Price for Different Vehicle Types')
plt.legend(loc='lower right')

plt.xticks(rotation=45)  # rotate x-labels by 45 degrees

plt.show()
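The customizations above all go through the `plt` state machine. Since seaborn's axes-level functions return the matplotlib ``Axes`` they draw on, the same steps can presumably also be written in an object-oriented way. The sketch below uses a small hypothetical `cars` table with the same column names as the automobile data:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical values, for illustration only
cars = pd.DataFrame({
    'price': [7000, 9000, 16000, 21000, 35000, 41000],
    'horsepower': [68, 82, 110, 150, 180, 260],
    'body-style': ['hatchback', 'hatchback', 'sedan', 'sedan', 'convertible', 'convertible'],
})

# scatterplot() returns the Axes it has drawn on
ax = sns.scatterplot(x='price', y='horsepower', data=cars,
                     hue='body-style', style='body-style')

ax.set_xlim(0, 50000)                       # instead of plt.xlim(...)
ax.set_ylim(0, 300)                         # instead of plt.ylim(...)
ax.set_title('Horsepower vs. Price')        # instead of plt.title(...)
ax.legend(loc='lower right')                # instead of plt.legend(...)
ax.tick_params(axis='x', labelrotation=45)  # instead of plt.xticks(rotation=45)

plt.show()
```

This style becomes more useful as soon as a figure contains several ``Axes``, because every call states explicitly which plot it modifies.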

### Seaborn themes
Another advantage of seaborn is that it comes with decent style themes right out of the box. The theme applied by default when calling `sns.set_theme()` is 'darkgrid'. We'll now use 'whitegrid' instead to plot the distribution of selected car companies' prices.

In [None]:
# Choose data
df_company = df.loc[df['company'].str.contains("volkswagen|audi|jaguar|chevrolet|toyota|alfa-romero"), :]

# Set theme
sns.set_style('whitegrid')

# Boxplot
sns.boxplot(x='company', y='price', data=df_company)
plt.xticks(rotation=45)
plt.title('Companies\' Price Distribution')
plt.show()
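`sns.set_style()` changes the style for all following plots. If a style should apply to a single plot only, `sns.axes_style()` can be used as a context manager instead. A minimal sketch with arbitrary example values:

```python
import matplotlib.pyplot as plt
import seaborn as sns

values = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary example values

# The style is only active inside the with-block;
# the Axes has to be created inside it to pick up the style
with sns.axes_style('whitegrid'):
    fig, ax = plt.subplots()
    sns.boxplot(x=values, ax=ax)
    ax.set_title('Temporary whitegrid style')

plt.show()
```

The built-in styles are 'darkgrid', 'whitegrid', 'dark', 'white', and 'ticks'.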

### Seaborn color palettes
Different countries (and their car manufacturers) are historically associated with typical colors. We can define color palettes on our own by creating an ordered list of hex values (in the same order as the categories on the x-axis). We also change the plot type to a violin plot, as this type also displays the data distribution instead of only the summary statistics:


In [None]:
# Define color palette
car_colors = ['#FF0000',  # red for Italian cars
              '#C0C0C0',  # silver for German cars
              '#000099',  # imperial blue for American cars
              '#005500',  # green for British cars
              '#C0C0C0',
              '#333333',  # black for Japanese cars (not too black as we still want to see some information)
              '#C0C0C0']
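# Violin plot; scale='count' makes the width of each violin reflect its number of observations
# (seaborn >= 0.13 calls this parameter density_norm)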
sns.violinplot(x='company', y='price', data=df_company, palette=car_colors, scale='count')
plt.xticks(rotation=45)
plt.title('Companies\' Price Distribution')
plt.show()
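An ordered list of colors silently breaks when the set or order of categories changes. As an alternative sketch, the palette can be given as a dictionary that maps each category to its color. The `toy` data and its prices below are invented for illustration, and the colors are assigned via `hue` with `dodge=False` so that the violins stay centered:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical prices, for illustration only
toy = pd.DataFrame({
    'company': ['audi'] * 4 + ['jaguar'] * 4 + ['toyota'] * 4,
    'price': [14000, 17000, 15000, 19000,
              32000, 35000, 36000, 34000,
              5500, 6500, 7000, 8000],
})

# Explicit mapping from category to color
color_by_company = {
    'audi': '#C0C0C0',    # silver for German cars
    'jaguar': '#005500',  # green for British cars
    'toyota': '#333333',  # black for Japanese cars
}
order = sorted(color_by_company)

ax = sns.violinplot(data=toy, x='company', y='price', order=order,
                    hue='company', palette=color_by_company, dodge=False)

# Depending on the seaborn version, a redundant legend may have been added
if ax.get_legend() is not None:
    ax.get_legend().remove()

plt.show()
```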

As we can't see too much here, we change the plot type once again to ``swarmplot`` to see the individual data points:

In [None]:
# Swarm plot
sns.swarmplot(x='company', y='price', data=df_company, palette=car_colors)
plt.xticks(rotation=45)
plt.title('Companies\' Price Distribution')
plt.show()

### Combining Plots
We can plot several plots in the same axes to overlay the information. 

In [None]:
# Optional: Set figure size with matplotlib
plt.figure(figsize=(10,6))
# Violin plot
sns.violinplot(x='company', y='price', data=df_company, palette=car_colors, scale='count')
# Swarm plot
sns.swarmplot(x='company', y='price', data=df_company, color='w')
plt.xticks(rotation=45)
plt.title('Companies\' Price Distribution')
plt.show()

### Creating Subplots
We can also create several ``Axes`` in one ``Figure``.

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16,8), sharey=True)
sns.violinplot(x='company', y='price', data=df_company, palette=car_colors, scale='count', ax=ax1)
sns.swarmplot(x='company', y='price', data=df_company, ax=ax2)

ax1.set_title('Companies\' Price Distribution (Violinplot)')
ax2.set_title('Companies\' Price Distribution (Swarmplot)')
ax2.set_ylabel("")
plt.show()

## Aside: The Boxplot

![Boxplot infographic](https://matplotlib.org/stable/_images/boxplot_explanation.png)

Source: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html

In [None]:
df_hatchback = pd.DataFrame(df.loc[df['body-style'].str.contains('hatchback'), 'price'])
sns.boxplot(data=df_hatchback)
sns.swarmplot(data=df_hatchback, color='k', alpha=.7)
plt.xticks([0], ['hatchback'])
plt.title('Hatchback Prices')
plt.show()
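The elements of a boxplot can also be computed by hand. The following sketch assumes the common convention that whiskers extend to the most extreme data points within 1.5 times the interquartile range (matplotlib's default, `whis=1.5`). The numbers are arbitrary example values.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # arbitrary example values

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Data points beyond these fences are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print('box:', q1, median, q3)
print('whiskers:', whisker_low, whisker_high)
print('outliers:', outliers)
```

Here the value 100 lies above the upper fence and would therefore be drawn as an individual outlier point.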

### How to use the plots externally
Plots are usually not only needed inside notebooks and scripts. Saving images by hand from the pop-up window cannot be automated easily. By using `plt.savefig(...)` (instead of `plt.show()`), we can specify the location to store the file.

__Hint:__ this can be combined with Git and LaTeX in a very cool way!

In [None]:
plt.figure(figsize=(10,6))
sns.violinplot(x='company', y='price', data=df_company, palette=car_colors, scale='count')
sns.swarmplot(x='company', y='price', data=df_company, color='w')
plt.xticks(rotation=45)
plt.title('Companies\' Price Distribution')

# instead of plt.show()
plt.savefig('sample_figure.png')

But we don't want to keep that file. Let's clean up. ;-)

In [None]:
import os

os.remove('sample_figure.png')
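`savefig` accepts a few options that are often useful for reports. The sketch below writes a high-resolution PNG and a vector PDF into a temporary directory (so nothing needs to be cleaned up by hand). The file names and the plotted values are arbitrary.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title('Sample Figure')

out_dir = tempfile.mkdtemp()  # temporary output directory
png_path = os.path.join(out_dir, 'sample_figure.png')
pdf_path = os.path.join(out_dir, 'sample_figure.pdf')

# dpi controls the resolution of raster formats such as PNG;
# bbox_inches='tight' trims surplus white space around the plot
fig.savefig(png_path, dpi=300, bbox_inches='tight')

# The file format is inferred from the file extension; PDF is a vector format
fig.savefig(pdf_path, bbox_inches='tight')

plt.close(fig)
print(os.listdir(out_dir))
```

Vector formats such as PDF scale without loss of quality, which makes them a good fit for LaTeX documents.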

## Do you know XKCD?
![XKCD](https://imgs.xkcd.com/comics/python_environment.png)

If you want to make a graph like that, google for `matplotlib` and `xkcd`.
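As a starting point, matplotlib ships an `xkcd()` function that can be used as a context manager. The following is a small sketch with made-up data; if the comic-style fonts are not installed, matplotlib falls back to other fonts and prints warnings.

```python
import matplotlib.pyplot as plt

# Everything created inside the with-block is drawn in a hand-drawn style
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3, 4], [0, 1, 1, 3, 8])  # made-up data
    ax.set_title('Number of Python Environments')
    ax.set_xlabel('Years of Experience')
    plt.show()
```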

## Matplotlib, seaborn, pandas?
If confused about how and when to use which framework, the following thoughts can guide you:
1. Use pandas plotting utilities to get a first idea of what your data looks like.
2. Use seaborn for advanced styling and more complex graph types.
3. Use the underlying matplotlib functionality to customize your plot.

## Wrap-up Exercise
Browse the [seaborn API reference](http://seaborn.pydata.org/api.html) or the [example gallery](https://seaborn.pydata.org/examples/index.html) and come up with your own visualization of the automobile data. Store it on your disk and upload it to Quiz 4. No two students' graphs should look the same! I am curious about your visualizations.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Inspiration/Further Reading
- [Blog post about Python visualization tools](https://blog.magrathealabs.com/choosing-one-of-many-python-visualization-tools-7eb36fa5855f)
- [matplotlib usage FAQ](https://matplotlib.org/faq/usage_faq.html)
- [Data visualization blog post](https://erikcianci.com/blog/data-visualization-fundamentals#data-types)
- [Python seaborn tutorial](https://elitedatascience.com/python-seaborn-tutorial)