# Plotting with Python

__Plotting is an essential component of data analysis__. It helps you understand complex data sets and helps others understand them too. However, the Python visualization world can be a frustrating place: there are many different options, and choosing the right one is a challenge.

## The Python Visualization Landscape


![Python Visualization Landscape](_img/landscape.png)
 
 
Source: [Nicolas P. Rougier](https://github.com/rougier/python-visualization-landscape): An adaptation of the Python Visualization Landscape slide from [Jake VanderPlas](http://vanderplas.com/)'s keynote at PyCon 2017 ([slides](https://speakerdeck.com/jakevdp/pythons-visualization-landscape-pycon-2017), [video](https://www.youtube.com/watch?v=FytuB8nFHPQ)).


Online you can find many posts and reports that compare and assess different Python libraries (e.g. the blog post by [Dan Saber](https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair/) or by [Tim Hopper](http://pythonplot.com/)). These are valuable sources of information if you are not yet sure which library best fits your needs.

In this section we introduce two plotting libraries.


### matplotlib

[matplotlib](https://matplotlib.org/) is probably the best-known 2D plotting library for Python. It produces publication-quality figures in a variety of formats and interactive environments across platforms. However, its complex syntax is a common source of frustration.


### Seaborn

[Seaborn](http://seaborn.pydata.org/index.html) is a library for making attractive and informative statistical graphics in Python. It is built on top of [matplotlib](https://matplotlib.org/) and it supports [numpy](http://www.numpy.org/) and [pandas](http://pandas.pydata.org/) data structures and statistical routines from [scipy](https://scipy.org/) and [statsmodels](http://www.statsmodels.org/stable/index.html).

_If matplotlib tries to make easy things easy and hard things possible, seaborn tries to make a well-defined set of hard things easy too_ ([Quote by Michael Waskom](http://seaborn.pydata.org/introduction.html)).

## Data generation

For the sake of simplicity we create a sine curve and a cosine curve for plotting.

In [None]:
import numpy as np

X = np.linspace(start=-2 * np.pi, stop=2 * np.pi, num=300)
C = np.cos(X)
S = np.sin(X)

***
## Matplotlib

matplotlib, created by [John D. Hunter](https://en.wikipedia.org/wiki/John_D._Hunter), is probably the single most used Python package for 2D-graphics. It provides both a very quick way to visualize data from Python and publication-quality figures in many formats. 

matplotlib's `pyplot` module provides a convenient interface to the matplotlib object-oriented plotting library. It is modeled closely after [MATLAB](https://de.wikipedia.org/wiki/Matlab). 

We import the matplotlib library and matplotlib's `pyplot` module using the canonical commands

    import matplotlib as mpl
    import matplotlib.pyplot as plt

Further, when working with Jupyter notebooks we call the magic function

    %matplotlib inline

to make sure the figures are created within the notebook (actually the figures are shown directly in the cell output).

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt

%matplotlib inline

It is worth noting that __matplotlib has two interfaces__. The first is a __state-based interface__ modeled on MATLAB. The second is an __object-oriented interface__. As a result, there are often several solutions to the same problem that look similar but are not the same, which can confuse new users who look for solutions to their tasks on [stack overflow](https://stackoverflow.com/) or via Google search. _(Note that in this section we use the object-oriented interface.)_
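To make the difference between the two interfaces concrete, here is a minimal sketch that draws the same curve twice. The variable names (`fig_state`, `fig_oo`, `ax_oo`) and the titles are illustrative choices, not part of the tutorial's data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# State-based (pyplot) interface: functions act on the "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("State-based interface")
fig_state = plt.gcf()  # handle to the figure that pyplot is currently tracking

# Object-oriented interface: methods are called on explicit Figure/Axes objects
fig_oo, ax_oo = plt.subplots()
ax_oo.plot(x, np.sin(x))
ax_oo.set_title("Object-oriented interface")
```

Both variants produce an equivalent plot. The object-oriented form keeps an explicit handle to each `Axes`, which tends to be easier to follow once a figure contains several plots.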

Yet another source of confusion is that matplotlib is well integrated into other Python libraries, such as [pandas](http://pandas.pydata.org/), [seaborn](http://seaborn.pydata.org/index.html), and [xarray](http://xarray.pydata.org/en/stable/), among others. Hence, it is not always clear when to use pure matplotlib and when to use a tool that is built on top of matplotlib.

> **Be aware that there is almost always more than one way to build a visualization. It is therefore challenging for new or infrequent users to find the right path.**

###  Creating a `Figure` and an `Axes` object 

> With respect to matplotlib terminology it is important to understand that the `Figure` is the final image that may contain one or more axes. The `Axes` represent an individual plot.    

To create a `Figure` object we call

    plt.figure()

However, a more convenient way to create a `Figure` object and an `Axes` object at once is to call

    plt.subplots() 
    
Note that the function call returns a `Figure` object and an `Axes` object (or an array of `Axes` objects).

In [None]:
fig, ax = plt.subplots()

In [None]:
print(type(fig))
print(type(ax))
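The following sketch illustrates the two possible return shapes of `plt.subplots()` and the relation between a `Figure` and the `Axes` it contains. The names `fig_single`, `ax_single`, `fig_grid` and `axs_grid` are chosen here for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# A single Axes: plt.subplots() returns the Figure and one Axes object
fig_single, ax_single = plt.subplots()

# A 2x3 grid: the second return value is a NumPy array of Axes objects
fig_grid, axs_grid = plt.subplots(nrows=2, ncols=3)

print(type(axs_grid).__name__, axs_grid.shape)   # array type and grid shape
print(len(fig_single.axes), len(fig_grid.axes))  # number of Axes in each Figure
```

Individual plots in the grid are then addressed by row and column, for example `axs_grid[0, 2]`.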

Now we can use the `Axes` object `ax` to add data for plotting. To do so, we simply call the `plot` method on the `Axes` object. Note that `plot` expects an `x` and a `y` argument.

### Plot some data

In [None]:
fig, ax = plt.subplots()
ax.plot(X, C)

To add more data on the same `Axes` object we call the `plot` function sequentially.

In [None]:
fig, ax = plt.subplots()
ax.plot(X, C)
ax.plot(X, S)

### Customizing the plot

There are many ways to provide different plotting styles. Look up the [documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html) for details.

For the sake of this tutorial we color the cosine curve green and use a dashed line with a line width of 3.5 for the sine curve.

In [None]:
fig, ax = plt.subplots()
ax.plot(X, C, color="green")
ax.plot(X, S, linestyle="--", linewidth=3.5)
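Style properties can be passed either as keyword arguments or as a short format string. The sketch below, which uses its own illustrative data and variable names, draws one curve each way with the same color, marker and line style.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2 * np.pi, 2 * np.pi, 300)

fig, ax = plt.subplots()

# Keyword arguments spell out each style property explicitly
(line_kw,) = ax.plot(x, np.cos(x), color="red", linestyle="--",
                     marker="o", markevery=30)

# A format string packs color (r), marker (o) and line style (--) into one argument
(line_fmt,) = ax.plot(x, np.sin(x), "ro--", markevery=30)
```

Keyword arguments are more verbose but usually easier to read and to search for in the documentation.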

> ### __Challenge: Customizing the plot__ 
The goal of this challenge is to customize our plot in various ways. We are going to 
* add a title to the plot
* add a legend to the plot
* set a limit for the x-axis
* set x and y ticks
* set tick labels for the x axis
* add a grid
* adjust the figure size
* save the figure to disk

Our final plot will look like this:

<img src="_img/my_awesome_mpl_figure.png" width="800">


In order to further design our plot we need to get familiar with the matplotlib terminology. The graphic below from the [matplotlib faq](https://matplotlib.org/faq/usage_faq.html) helps in understanding the different terminology of a plot.


<img src="_img/mpl_anatomy.png" width="600">

**Our baseline plot**

_Hint: Copy the code and paste it in the cells below to build your graphs sequentially. You can also use the `fig` statement at the end of a cell to display the figure_

In [None]:
fig, ax = plt.subplots()
ax.plot(X, C, color="green", label='cosine')
ax.plot(X, S, linestyle="--", linewidth=3.5, label='sine')

**Add a title to the plot**

_Hint: Use the_ `AXES.set_title` _method. Provide a title string and the font size (e.g. `size=16`)_


In [None]:
### your code here ...

**Add a legend to the plot**

_Hint: Use the_ `AXES.legend` _method. Provide the labels and the font size (e.g._ `fontsize=12`_)._ 


In [None]:
### your code here ...

**Set a limit for the x-axis to $-4$ to $4$**

_Hint: Use the_ `AXES.set_xlim` _method._

In [None]:
### your code here ...

**Set the x-ticks to $-\pi, 0, \pi$ and the y-ticks to $-1,0,1$**

_Hint: Use the_ `AXES.set_xticks` _and_ `AXES.set_yticks` _methods. The actual value of $\pi$ is available in the_ `math` _and_ `numpy` _modules_


In [None]:
### your code here ...

**Set tick labels for the x axis to $-\pi, 0, +\pi$**

_Hint: Use the_ `AXES.set_xticklabels` _method. Write Greek letters using LaTeX notation:_ `'$-\pi$', '$0$', '$+\pi$'`_. For better readability scale the text size and add the_ `size=12` _argument._


In [None]:
### your code here ...

**Add a grid to the figure**

_Hint: Use the_ `AXES.grid` _method._

In [None]:
### your code here ...

**Save the figure to disk**

_Hint: Use the_ `FIGURE.savefig` _method. Save the figure as .png file in the folder_ `"../figures/YOUR_FIGURE_NAME.png"`_. Add the_ `dpi=300` _argument to enhance the quality._

In [None]:
### your code here ...

_If for whatever reason you did not make it, uncomment the line below to load a solution._

In [None]:
# %load ../src/_solutions/mpl_plot_customization.py
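For reference, the hints above can be combined roughly as follows. This is only one possible sketch and not necessarily identical to the provided solution file. The figure size, the title text and the output location (a file in the system's temporary directory instead of `../figures/`) are assumptions made so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(start=-2 * np.pi, stop=2 * np.pi, num=300)
C = np.cos(X)
S = np.sin(X)

# Figure size in inches (assumed values)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(X, C, color="green", label="cosine")
ax.plot(X, S, linestyle="--", linewidth=3.5, label="sine")

ax.set_title("My awesome plot", size=16)  # placeholder title text
ax.legend(fontsize=12)                    # uses the labels passed to plot()
ax.set_xlim(-4, 4)
ax.set_xticks([-np.pi, 0, np.pi])
ax.set_yticks([-1, 0, 1])
ax.set_xticklabels([r"$-\pi$", r"$0$", r"$+\pi$"], size=12)
ax.grid()

# Hypothetical output path; adapt it to your own figures folder
out_path = Path(tempfile.gettempdir()) / "my_awesome_mpl_figure.png"
fig.savefig(out_path, dpi=300)
```

Note the raw strings (`r"..."`) for the tick labels, which keep the backslash in `\pi` from being interpreted as an escape sequence.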

### Annotations, points, lines

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# x-coordinates
t = np.arange(0, 20, 0.001)

# y-coordinates
u = 325 * np.sin(2 * np.pi * 50 * t / 1000.0)

# Figure and Axes
fig, ax = plt.subplots()
ax.plot(t, u, 'b', lw=2, label="Curve")

# Line
ax.plot([0, 20], [230, 255], 'r--', label="Line")

# Point
ax.plot(5, 325, 'ro', label="Point")

# Annotation: the arrow points at xy, the text is placed at xytext
ax.annotate('Here is an important point!', xy=(5, 325), xytext=(5, 200),
            arrowprops=dict(facecolor='black', shrink=0.05),
            fontsize=10, color='k')

ax.set_title("Example curve")
ax.set_ylabel("y_label")
ax.set_xlabel("x_label")
ax.legend(loc='best')
ax.grid(color='g', ls='dashed', lw=0.5)

plt.show()

### Multiple plots

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)
y4 = np.exp(x)

# Figure and Axes
fig, axs = plt.subplots(2, 2, figsize=(10, 8))

# Plot 1
axs[0, 0].plot(x, y1)
axs[0, 0].set_title('Plot 1')

# Plot 2
axs[0, 1].plot(x, y2)
axs[0, 1].set_title('Plot 2')

# Plot 3
axs[1, 0].plot(x, y3)
axs[1, 0].set_title('Plot 3')

# Plot 4
axs[1, 1].plot(x, y4)
axs[1, 1].set_title('Plot 4')


plt.tight_layout()
plt.show()

> **Challenge:** Arrange the plots for the following functions in a 3x2 grid.

In [None]:
# Data
x = np.linspace(0, 2 * np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)
y4 = np.exp(-x) * np.sin(2 * x)
y5 = np.log(x + 1)
y6 = np.sqrt(x)

In [None]:
### your code here...

In [None]:
# %load ../src/_solutions/multiple_plots.py

### Assigning `Axes` objects with a `for` loop

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.linspace(0, 2 * np.pi, 100)
y_data = [
    np.sin(x),
    np.cos(x),
    np.tan(x),
    np.exp(-x) * np.sin(2 * x),
    np.log(x + 1),
    np.sqrt(x)
]

# rows, columns
nrows, ncols = 3, 2

# Subplots
fig, axs = plt.subplots(nrows, ncols, figsize=(6, 8))


for i in range(nrows):
    for j in range(ncols):
        index = i * ncols + j  # compute the index used to access y_data
        axs[i, j].plot(x, y_data[index])
        axs[i, j].set_title(f'Plot {index + 1}')


plt.tight_layout()
plt.show()
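A more compact variant of the nested loop iterates over `axs.flat`, which walks through the grid row by row. This is a sketch of an alternative, assuming the same 3x2 layout and the same six functions as above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y_data = [
    np.sin(x),
    np.cos(x),
    np.tan(x),
    np.exp(-x) * np.sin(2 * x),
    np.log(x + 1),
    np.sqrt(x),
]

fig, axs = plt.subplots(3, 2, figsize=(6, 8))

# axs.flat yields the Axes in row-major order, so no manual index arithmetic is needed
for index, (ax, y) in enumerate(zip(axs.flat, y_data), start=1):
    ax.plot(x, y)
    ax.set_title(f"Plot {index}")

fig.tight_layout()
```

Because `zip` stops at the shorter sequence, this form also works when there are fewer data series than `Axes` in the grid.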


### Individual plot titles

In [None]:
titles = {1:'This is the first plot',
          2:'This is the second plot',
          3:'This is the third plot',
          4:'This is the fourth plot',
          5:'This is the fifth plot',
          6:'This is the sixth plot'}

In [None]:
titles[6]

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# data
x = np.linspace(-2 * np.pi, 2 * np.pi, 10000)
y_data = [
    np.sin(x),
    np.cos(x),
    x**2,
    np.exp(x),
    1-np.cos(x)**2,
    1-np.sin(x)**2
]


nrows, ncols = 3, 2


fig, axs = plt.subplots(nrows, ncols, figsize=(10, 10))


for i in range(nrows):
    for j in range(ncols):
        index = i * ncols + j
        ax = axs[i, j]
        ax.plot(x, y_data[index])
        ax.set_title(f'{titles[index + 1]}')

        
        

plt.tight_layout()
plt.show()


> __Final note__: The matplotlib library is extremely powerful. For more information and inspiration visit the [gallery site](https://matplotlib.org/gallery/index.html) of the matplotlib documentation or check out the great tutorial by [Nicolas P. Rougier](https://github.com/rougier/matplotlib-tutorial) or the blog post by [Chris Moffitt](http://pbpython.com/effective-matplotlib.html), which were inspiring sources for this section.

***

## Seaborn

[Seaborn](https://seaborn.pydata.org/) is a Python visualization library based on matplotlib. It provides a high-level interface for drawing **attractive statistical graphics**.

If seaborn and pandas are not yet installed, they can be installed with conda:

    conda install seaborn
    conda install pandas

In [None]:
import seaborn as sns

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

### Lineplots in seaborn

#### Data Generation

In [None]:
import numpy as np

X = np.linspace(start=-2 * np.pi, stop=2 * np.pi, num=300)
C = np.cos(X)
S = np.sin(X)

In [None]:
import pandas as pd

# Avoid the name `dict`, which would shadow the Python built-in
data_dict = {'x': X, 'y': C, 'z': S}

data = pd.DataFrame(data_dict)
data

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set the style before plotting so that it applies to the new axes
sns.set_style("ticks")

# Create the line plot
sns.lineplot(x='x', y='y', data=data, color='green', linewidth=0.75)
sns.lineplot(x='x', y='z', data=data, color='blue', linestyle='--', linewidth=2.5)

# Format the ticks
plt.grid(True)
plt.xticks([-np.pi, 0, np.pi], [r'$-\pi$', '0', r'$\pi$'])
plt.yticks([-1, 0, 1])

plt.title('My awesome Plot in seaborn', size=20)

plt.show()

### Load sample datasets in `seaborn`

In [None]:
print(sns.get_dataset_names())

### fmri-dataset

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('fmri')

In [None]:
df.head(1)

In [None]:
sns.lineplot(x='timepoint'
             , y='signal'
             , data=df
             #, errorbar = None
            )

plt.show()


### car_crashes-dataset

In [None]:
crash_df = sns.load_dataset('car_crashes')

crash_df.head()

### Distribution Plots

In [None]:
_ = sns.displot(crash_df['not_distracted'])

### Joint Plot

In [None]:
sns.jointplot(x = 'speeding', y = 'alcohol', data = crash_df, kind = 'reg')

### KDE Plots

In [None]:
sns.kdeplot(crash_df['alcohol'])

### Boxplot

In [None]:
tips_df = sns.load_dataset('tips')

In [None]:
sns.boxplot(x='day', y='total_bill', data = tips_df)

In [None]:
sns.boxplot(x='day', y='total_bill', data = tips_df, hue = 'sex')

### Violin Plot

In [None]:
sns.violinplot(x = 'day', y = 'total_bill', data = tips_df, hue = 'sex')

In [None]:
sns.violinplot(x = 'day', y = 'total_bill', data = tips_df, hue = 'sex', split = True)

In [None]:
from matplotlib import colormaps
list(colormaps)

In [None]:
sns.violinplot(x = 'day', y = 'total_bill', data = tips_df, hue = 'sex', split = True, palette = 'Dark2_r')

### Pairplot

In [None]:
sns.pairplot(crash_df
             #,hue = 'alcohol'
            )

### tips-dataset

In [None]:
tips_df = sns.load_dataset('tips')

In [None]:
tips_df.head(1)

In [None]:
sns.pairplot(tips_df, hue = 'sex')

### Multiple Plots - `FacetGrid`

In [None]:
g = sns.FacetGrid(data=tips_df, col="sex")
g.map(plt.scatter, "tip", "total_bill", alpha=.7);

### Iris Dataset

In [None]:
import seaborn as sns

# Use seaborn graphic settings
sns.set()

# Load the example Iris flower dataset
iris = sns.load_dataset("iris")
#iris

In [None]:
# Plot sepal width as a function of sepal length, with one regression line per species
g = sns.lmplot(x="sepal_length", y="sepal_width", hue="species",
               truncate=True, height=8, data=iris)

# Use more informative axis labels than are provided by default
# (the Iris measurements are given in centimeters)
g.set_axis_labels("Sepal length (cm)", "Sepal width (cm)");

> **Challenge:** 

a) Create a pairplot of the Iris dataset and color it by species
 
b) Create a FacetGrid of scatterplots for `sepal_length` and `sepal_width`

In [None]:
### a) your code here... 

In [None]:
# %load ../src/_solutions/sns_pairplot.py

In [None]:
### b) your code here... 

In [None]:
# %load ../src/_solutions/sns_pairplot2.py

### Layout

In [None]:
import matplotlib.pyplot as plt

# background
sns.set_style('white')

# font size and formatting
sns.set_context('talk', font_scale=1.4)

# size of figure
plt.figure(figsize=(8, 4))

sns.histplot(tips_df)

# remove the top, right and bottom spines
# (despine acts on existing axes, so it is called after plotting)
sns.despine(bottom=True)
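To get a feeling for what `sns.set_context` changes, the sketch below inspects the parameters of seaborn's named contexts with `sns.plotting_context`, which returns them as a dictionary without applying them. The selection of keys printed here is an illustrative choice.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]
font_sizes = {}

for name in contexts:
    params = sns.plotting_context(name)
    font_sizes[name] = params["font.size"]
    print(f"{name:<9} font.size={params['font.size']:.1f}  "
          f"lines.linewidth={params['lines.linewidth']:.2f}")

# font_scale multiplies the font-related sizes of the chosen context
talk_scaled = sns.plotting_context("talk", font_scale=1.4)
print("talk, font_scale=1.4:", talk_scaled["font.size"])
```

The contexts scale fonts and line widths from small (`paper`) to large (`poster`), so the same plot can be reused for a report, a notebook or a slide.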

### Choosing a palette

*    deep
*    muted
*    bright
*    pastel
*    dark
*    colorblind
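The names above are seaborn's built-in variations of the default matplotlib color cycle. As a small sketch, the loop below looks up each palette by name and prints its first color as a hex code. The variable names are illustrative.

```python
import seaborn as sns

palette_names = ["deep", "muted", "bright", "pastel", "dark", "colorblind"]
first_colors = {}

for name in palette_names:
    palette = sns.color_palette(name)      # list-like object of RGB tuples
    first_colors[name] = palette.as_hex()[0]
    print(f"{name:<10} {len(palette)} colors, first color: {first_colors[name]}")
```

Any of these names can be passed to the `palette` argument of seaborn's plotting functions.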

In [None]:
sns.color_palette('dark')

In [None]:
# color palette
sns.set_context('paper', font_scale=1.4)
plt.grid(False)
_ = sns.barplot(x = 'sex', y = 'total_bill', data= tips_df, hue = 'sex', palette = 'deep')

***