<img src="../figures/HeaDS_logo_large_withTitle.png" width="300">

<img src="../figures/tsunami_logo.png" width="600">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Center-for-Health-Data-Science/PythonTsunami/blob/intro/Visualizations/Matplotlib.ipynb)

# Matplotlib

*Prepared by Henry Webel at [NNF CPR](https://www.cpr.ku.dk/staff/rasmussen-group/?pure=en/persons/662319)  [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40Henrywebel)](https://twitter.com/henrywebel)
  - Pre-requisites: Python Intro, NumPy, minimal Pandas

## Colab: Saving the notebook in Drive, TOC and updating one library
Save a copy in your drive if you want to save your changes: `File` -> `Save a copy in Drive`


![Save Colab Notebook in Google Drive](../figures/colab_save_in_drive.png)

or 

![Save Colab Notebook in Google Drive](../figures/colab_save_in_drive_2.png)


**Table of Contents in Colab**
> Allows easier navigation

![Table of content in Colab](../figures/colab_toc.png)

In [None]:
import plotly
if tuple(int(x) for x in plotly.__version__.split('.')) < (4,14):
  !pip install plotly --upgrade
  exit()

If the code in the `if`-clause runs, you will need to restart your runtime by pressing `RESTART RUNTIME`.

![Restart Colab runtime after installation](../figures/colab_restart_runtime_after_install.png)

## Tutorial selection
- [cheat sheets](https://github.com/rougier/matplotlib-cheatsheet)
- [SciPy 2019](https://github.com/story645/mpl_tutorial)
- [Usage Guide](https://matplotlib.org/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py)
- [The lifecycle of a plot](https://matplotlib.org/tutorials/introductory/lifecycle.html)
- [All Tutorials from Matplotlib](https://matplotlib.org/tutorials/index.html)


## Objectives

1. Matplotlib's two [APIs](https://matplotlib.org/3.1.1/api/index.html#usage-patterns): the pyplot API vs. the object-oriented API
2. Distinction between figure, axes and axis
3. labels, ticks, legends, annotations
4. Some use-cases
5. No confusion when you search for help on [stackoverflow](https://stackoverflow.com/questions/tagged/matplotlib)

> The goal is not to introduce most parts of the API, but to make it accessible

## [Matplotlib](https://matplotlib.org/3.1.1/index.html)

- versatile set of instructions for plotting figures
- widely used by third party libraries
- supports many [backends](https://matplotlib.org/tutorials/introductory/usage.html#backends) (application, machine or operating system specific)

In [None]:
import matplotlib

matplotlib.__version__

In [None]:
matplotlib.get_backend()

## Definitions

- A *figure* in common language would probably not suggest the (painful?) naming used in the _application programming interface_ (API)
  - A *figure* can contain multiple **ax*e*s** (subplots), each of which in a 2D plot has two [**ax*e*s**](https://dictionary.cambridge.org/dictionary/english/axis) (which in singular are the **x-axis** and **y-axis**)
  - axes and axis are sometimes hard to distinguish when spoken ([check pronunciation](https://dictionary.cambridge.org/dictionary/english/axis))

![Figure, Axes and Axis](../figures/matplotlib/fig_axes_axis.png)
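The distinction can also be checked in code. The following is a minimal sketch, assuming only that Matplotlib is installed; the variable names (`fig_demo`, `axes_demo`) are chosen for illustration and are not used elsewhere in this notebook.

```python
from matplotlib.figure import Figure

fig_demo = Figure()
# one figure holding two axes (subplots) side by side
axes_demo = fig_demo.subplots(nrows=1, ncols=2)

n_axes = len(fig_demo.axes)  # the figure keeps a list of its axes
first_ax = axes_demo[0]
x_axis_type = type(first_ax.xaxis).__name__  # each axes owns an x-axis object ...
y_axis_type = type(first_ax.yaxis).__name__  # ... and a y-axis object
print(n_axes, x_axis_type, y_axis_type)
```

So "axes" names one subplot of the figure, while `xaxis` and `yaxis` are the two axis objects inside that subplot.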

## Matplotlib API

- `pyplot` global functionality mimics `matlab`-plotting functionality.
- recommendation by matplotlib: Use object-oriented plotting, see [usage guide](https://matplotlib.org/3.1.1/tutorials/introductory/usage.html#figure)

> An **application programming interface (API)** is a computing interface which defines interactions between multiple software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc. It can also provide extension mechanisms so that users can extend existing functionality in various ways and to varying degrees. An API can be entirely custom, specific to a component, or it can be designed based on an industry-standard to ensure interoperability. Through information hiding, APIs enable modular programming, which allows users to use the interface independently of the implementation. ([Wikipedia](https://en.wikipedia.org/wiki/API))
>  
>Loosely defined, API describes everything an application programmer needs to know about piece of code to know how to use it. ([wiki.python.org](https://wiki.python.org/moin/API#:~:text=API%20is%20a%20shortcut%20for,know%20how%20to%20use%20it.))
>
> [5min Video on APIs](https://www.youtube.com/watch?v=GZvSYJDk-us&t=471s)

### Example

1. `pyplot` plotting
2. Object-oriented plotting

    
> We do not plot data just yet, but only create the objects needed for plotting

### Pyplot-API

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.subplot()

> `matplotlib.pyplot` or `plt` commands always apply to the "current" figure or axes. 
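What "current" means can be sketched with `plt.gcf()` (get current figure) and `plt.gca()` (get current axes). This is an illustrative example, not part of the original notebook; the names `fig_a`, `fig_b` and `ax_a` are made up for it.

```python
import matplotlib.pyplot as plt

fig_a = plt.figure()
ax_a = plt.subplot()
fig_b = plt.figure()  # a newly created figure becomes the "current" one

current_is_b = plt.gcf() is fig_b
plt.figure(fig_a.number)  # make the first figure current again
current_is_a = plt.gcf() is fig_a
current_ax_is_a = plt.gca() is ax_a
plt.close("all")  # tidy up the figures created for this demonstration
```

Every `plt.` command issued between these switches would act on whichever figure is current at that moment, which is why longer scripts are easier to follow with explicit `Figure` and `Axes` objects.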

### Object-Oriented API

In [None]:
import matplotlib.figure  # import the submodule explicitly so that `matplotlib.figure.Figure` resolves

fig = matplotlib.figure.Figure()
ax = fig.add_subplot(1, 1, 1)
fig

> methods of objects (instances of a class) are used  
> Most methods have an equivalent function in `plt` omitting a "set_" or "get_" prefix, i.e. `get_

- `add_` prefix for `subplot` in the example here
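As a small sketch of this naming pattern (an illustration with made-up labels): `ax.set_title` and `ax.set_xlabel` correspond to `plt.title` and `plt.xlabel`, where the pyplot functions act on the current axes.

```python
import matplotlib.pyplot as plt

fig_demo, ax_demo = plt.subplots()

# object-oriented API: methods of the Axes instance, with "set_" / "get_" prefix
ax_demo.set_title("title via method")
ax_demo.set_xlabel("x label via method")
title_before = ax_demo.get_title()

# pyplot API: same effect on the *current* axes, without the prefix
plt.title("title via pyplot")
plt.xlabel("x label via pyplot")
title_after = ax_demo.get_title()
plt.close(fig_demo)
```

Both calls change the same `Axes` object here, because `ax_demo` is the current axes when the `plt` functions run.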

### Mixing up both

> In my impression, this is the most common usage  
> Later both functions from `matplotlib.pyplot` as well as methods of `Axes` and `Figure` objects will be used.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

> I use [`matplotlib.pyplot.subplots`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) most of the time  
> a `Figure` offers the same functionality as a method: [`matplotlib.figure.Figure.subplots`](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.subplots)
- Idea: use a utility function to get a set of object instances.
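A short sketch of what the utility function hands back, assuming a 2x3 grid chosen only for illustration: with more than one subplot, the second return value is a NumPy array of `Axes` objects; with a single subplot it is the `Axes` itself.

```python
import matplotlib.pyplot as plt

fig_grid, axes_grid = plt.subplots(nrows=2, ncols=3)
grid_shape = axes_grid.shape  # array of Axes, indexed as [row, column]

# .flat iterates over all axes row by row
for i, ax_i in enumerate(axes_grid.flat):
    ax_i.set_title(f"subplot {i}")

fig_single, ax_single = plt.subplots()  # one subplot: a single Axes, no array
plt.close("all")
```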

In [None]:
# plt.subplots?  # see later section

## Anatomy of a figure
[code](https://matplotlib.org/3.1.1/gallery/showcase/anatomy.html)

> Reference and sneak peek at what you can do

![[Matplotlib Anatomy of a Figure](https://matplotlib.org/3.1.1/gallery/showcase/anatomy.html)](https://matplotlib.org/3.1.1/_images/anatomy.png)

### Exercise and Playground: The plot without annotations

Can be used both as reference and exercise
1. Try to turn some code on and off to see what changes
2. How to do this using the `matplotlib.pyplot` API? (In the next cell we only use the Object-Oriented matplotlib API for now!)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import AutoMinorLocator, FuncFormatter, MultipleLocator

# Numpy part - please skip
np.random.seed(19680801)

X = np.linspace(0.5, 3.5, 100)
Y1 = 3 + np.cos(X)
Y2 = 1 + np.cos(1 + X / 0.75) / 2
Y3 = np.random.uniform(Y1, Y2, len(X))

# Y1 is drawn in blue and Y2 in red in the plot below
data = {'X': X, 'blue_line': Y1, 'red_line': Y2, 'circles': Y3}
data = pd.DataFrame(data)
data.head()

In [None]:
# Matplotlib part
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1, aspect=1)


def minor_tick(x, pos):
    if not x % 1.0:
        return ""
    return "%.2f" % x


ax.xaxis.set_major_locator(MultipleLocator(1.000))
ax.xaxis.set_minor_locator(AutoMinorLocator(4))
ax.yaxis.set_major_locator(MultipleLocator(1.000))
ax.yaxis.set_minor_locator(AutoMinorLocator(4))
ax.xaxis.set_minor_formatter(FuncFormatter(minor_tick))

ax.set_xlim(0, 4)
ax.set_ylim(0, 4)

ax.tick_params(which="major", width=1.0)
ax.tick_params(which="major", length=10)
ax.tick_params(which="minor", width=1.0, labelsize=10)
ax.tick_params(which="minor", length=5, labelsize=10, labelcolor="0.25")

ax.grid(linestyle="--", linewidth=0.5, color=".25", zorder=-10)

ax.plot(X, Y1, c=(0.25, 0.25, 1.00), lw=2, label="Blue signal", zorder=10)
ax.plot(X, Y2, c=(1.00, 0.25, 0.25), lw=2, label="Red signal")
ax.plot(X, Y3, linewidth=0, marker="o",
        markerfacecolor="w", markeredgecolor="k")

ax.set_title("Anatomy of a figure", fontsize=20, verticalalignment="bottom")
ax.set_xlabel("X axis label")
ax.set_ylabel("Y axis label")

_ = ax.legend()

How to do this using the matplotlib.pyplot API?

In [None]:
# plt.xlim() # Get or set the x limits of the current axes.
# plt.xticks() # equivalent for current Axes ax.xaxis (check function signature) - both get and set possible!
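One possible partial answer, as a hedged sketch rather than the reference solution: the same kind of figure built only with `matplotlib.pyplot` functions. It covers a single line and a few settings; the minor ticks and formatters of the full example are left out on purpose.

```python
import matplotlib.pyplot as plt
import numpy as np

x_demo = np.linspace(0.5, 3.5, 100)

fig_plt = plt.figure(figsize=(4, 4))
plt.plot(x_demo, 3 + np.cos(x_demo), label="Blue signal")
plt.xlim(0, 4)  # counterpart of ax.set_xlim
plt.ylim(0, 4)  # counterpart of ax.set_ylim
plt.xticks(np.arange(0, 5))  # sets the major tick positions on the x-axis
plt.grid(linestyle="--", linewidth=0.5)
plt.title("Anatomy of a figure")
plt.xlabel("X axis label")
plt.legend()

ax_plt = plt.gca()  # the Axes that all calls above acted on
```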

## Plotting Ecosystem

Libraries using matplotlib
- [seaborn](https://seaborn.pydata.org/)
- pandas `.plot`- method ([guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html), 
    [method-doc](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#plotting), 
    [plotting-sublibrary](https://pandas.pydata.org/pandas-docs/stable/reference/plotting.html))
    
> Claim: You will hardly use matplotlib directly, when your data is in a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)  
> If you program in pure numpy this would be different.
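A minimal sketch of why the two worlds still connect, using a toy `DataFrame` invented for this illustration: the pandas `.plot` method returns a Matplotlib `Axes`, so everything from the object-oriented API can be applied afterwards.

```python
import matplotlib.axes
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

ax_demo = df_demo.plot()  # pandas creates the figure and axes for you
is_axes = isinstance(ax_demo, matplotlib.axes.Axes)

# fine-tuning continues with plain Matplotlib methods
ax_demo.set_ylabel("value")
n_lines = len(ax_demo.get_lines())  # one line per column
```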
    

In [None]:
import seaborn as sns

sns.__version__

In [None]:
import pandas as pd

pd.__version__

In [None]:
# pd.plotting?

## Example using matplotlib, seaborn and pandas plotting together

### Data

The data are accumulated COVID-19 death counts for four Nordic countries. They were extracted from a dataset that is loaded in a later section of this notebook.

In [None]:
# some accumulated COVID-19 death counts for four countries
some_counts = {
    'Denmark': {'20-07-31': 615,
                '20-08-31': 624,
                '20-09-30': 650,
                '20-10-31': 721,
                '20-11-30': 837,
                '20-12-31': 1298,
                '21-01-31': 2126,
                '21-02-28': 2362,
                '21-03-31': 2420,
                '21-04-30': 2450},
    'Finland': {'20-07-31': 329,
                '20-08-31': 336,
                '20-09-30': 344,
                '20-10-31': 358,
                '20-11-30': 399,
                '20-12-31': 561,
                '21-01-31': 671,
                '21-02-28': 742,
                '21-03-31': 844,
                '21-04-30': 885},
    'Norway': {'20-07-31': 255,
               '20-08-31': 264,
               '20-09-30': 274,
               '20-10-31': 282,
               '20-11-30': 332,
               '20-12-31': 436,
               '21-01-31': 564,
               '21-02-28': 622,
               '21-03-31': 673,
               '21-04-30': 707},
    'Sweden': {'20-07-31': 5743,
               '20-08-31': 5821,
               '20-09-30': 5893,
               '20-10-31': 5938,
               '20-11-30': 6681,
               '20-12-31': 8727,
               '21-01-31': 11591,
               '21-02-28': 12826,
               '21-03-31': 13465,
               '21-04-30': 13761}
}

### Figure with 4 subplots

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
axes

### Matplotlib bar plot

- check [`axes` API](https://matplotlib.org/3.1.0/api/axes_api.html) documentation


In [None]:
x = list(range(len(some_counts["Denmark"].keys())))
x

In [None]:
y = list(some_counts["Denmark"].values())
y

In [None]:
ax = axes[0, 0]
ax.bar(x, y)

In [None]:
fig

### Bar plot using Pandas

In [None]:
df = pd.DataFrame(some_counts)
df

In [None]:
axes[0, 1].clear()
df.plot(kind="bar", ax=axes[0, 1], rot=45)

In [None]:
fig

#### Side Note: Matplotlib also works with `pandas.DataFrames`

> Don't use it. Use Pandas directly!

In [None]:
axes[1, 0].bar(
    x=np.arange(len(df)) - 0.2, height="Denmark", width=0.2,
    data=df
)
axes[1, 0].bar(
    x=np.arange(len(df)), height="Norway", width=0.2,
    data=df
)
axes[1, 0].bar(
    x=np.arange(len(df)) + 0.2, height="Finland", width=0.2,
    data=df
)
display(fig)
axes[1, 0].clear()

In [None]:
axes[1, 0].clear()
fig

### Line plot in Seaborn

- seaborn summarizes the data 

In [None]:
ax = axes[1,1]
ax.clear()
ax = sns.lineplot(data=df, ax=ax)
fig

In [None]:
_ = ax.set_xticks(ticks=ax.get_xticks())
_ = ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
fig

In [None]:
# sns.pairplot(data=df)

### Exercise: Plot something in the last subplot

- You could look at `df.diff()` and plot the counts for each month
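As a hint, here is a sketch of what `diff()` does to accumulated counts. It uses only the first three months of two countries from the data above, copied into a small stand-alone `DataFrame` named `accumulated` (a name made up for this example).

```python
import pandas as pd

accumulated = pd.DataFrame(
    {"Denmark": [615, 624, 650], "Norway": [255, 264, 274]},
    index=["20-07-31", "20-08-31", "20-09-30"],
)

# difference to the previous row: accumulated totals -> counts per month
# the first row has no predecessor and is dropped
monthly = accumulated.diff().dropna()

ax_monthly = monthly.plot(kind="bar", rot=45)
ax_monthly.set_ylabel("deaths per month")
```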

In [None]:
print(axes)
fig

## Digression: Interactive plots with plotly

> There is an [extensive guide](https://towardsdatascience.com/visualization-with-plotly-express-comprehensive-guide-eb5ee4b50b57) available, adapted to be [run conveniently in colab](https://colab.research.google.com/github/Center-for-Health-Data-Science/PythonTsunami/blob/intro/Visualizations/PlotlyExpress_ComprehensiveGuide.ipynb)

- `plotly.express` offers a convenient API to create plots, very much like the one from pandas.

In [None]:
import plotly.express as px
px.line(df)

In [None]:
fig_px = px.line(df,            
        title="Range Slider")
fig_px.update_xaxes(rangeslider_visible=True)

- As explained in the reference article, plotly most of the time expects data in the so-called long format.
- The original data was in long format, so you wouldn't need the following data manipulations if you started from it.

In [None]:
df_long = df.unstack()
df_long.index.rename(['Country', 'Date'], inplace=True)
df_long = df_long.to_frame(name='deaths')
df_long = df_long.reset_index()
df_long.head()
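An alternative route from wide to long format is `DataFrame.melt`. The sketch below uses a two-country, two-date excerpt of the counts above in a stand-alone `DataFrame` called `wide` (a name chosen for this example); it is meant as an illustration, not as a replacement for the cell above.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "Denmark": {"20-07-31": 615, "20-08-31": 624},
        "Norway": {"20-07-31": 255, "20-08-31": 264},
    }
)
wide.index.name = "Date"

# one row per (Date, Country) pair instead of one column per country
long = wide.reset_index().melt(
    id_vars="Date", var_name="Country", value_name="deaths"
)
print(long)
```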

In [None]:
fig_px = px.line(df_long, x='Date', y='deaths', color='Country')
fig_px

In [None]:
fig_px.update_xaxes(rangeslider_visible=True)

- For the rest, explore the [linked guide in colab](https://colab.research.google.com/github/Center-for-Health-Data-Science/PythonTsunami/blob/intro/Visualizations/PlotlyExpress_ComprehensiveGuide.ipynb)
    - you can for example plot from pandas per default using plotly
    ```python
    import pandas as pd
    pd.options.plotting.backend = "plotly" # set plotly as plotting backend
    pd.options.plotting.backend = "matplotlib" # revert back
    ```
- An often mentioned alternative is [bokeh](https://docs.bokeh.org/en/latest/docs/first_steps.html)
- there is even more...

See a Python plotting tools overview from [this blog-article](https://pbpython.com/python-vis-flowchart.html)

![Python plotting tools overview](Python-plot-tools.png)

### Exercise: Plot the data in a bar plot

Have a look at the [documentation](https://plotly.com/python/plotly-express/). It can be run interactively using the launch [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/plotly/plotly.py/doc-prod?filepath=doc%2Fpython%2Fplotly-express.md) tag on top right corner of the website.

## Extras: Some spotlights

> Collection of options for figure layouts (tbc)

**Back to Matplotlib**

### Save a plot to disk (drive)

Oftentimes you want to save a figure (containing one or more subplots) to disk. This can be done using the `matplotlib.figure.Figure.savefig` method.

Let's save the last section's `Figure` instance called `fig`.

In [None]:
fig.suptitle(
    'Example using matplotlib, seaborn and pandas plotting interfaces in one figure', y=1.02)
fig.tight_layout()
fig

You can explicitly specify the figure format or just add the respective file extension to the file name.

In [None]:
fig.savefig(fname='my_first_saved_figure.pdf')
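The explicit variant can be sketched as follows. To avoid writing files, this illustration saves into an in-memory buffer, where no file extension exists and the `format` argument is therefore required; `fig_demo` and the chosen `dpi` are assumptions made for the example.

```python
import io

from matplotlib.figure import Figure

fig_demo = Figure(figsize=(4, 3))
ax_demo = fig_demo.add_subplot(1, 1, 1)
ax_demo.plot([0, 1, 2], [0, 1, 4])

buffer = io.BytesIO()
# format is given explicitly; dpi and bbox_inches are optional refinements
fig_demo.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
png_bytes = buffer.getvalue()
```

With a file name such as `'figure.png'` instead of the buffer, the `format` argument could be dropped and the extension would decide.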

To see which formats your backend ("canvas") on your machine supports, run:

In [None]:
fig.canvas.get_supported_filetypes()

In **colab** you can mount your drive to save figures to your personal drive folder

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

### `matplotlib.pyplot.subplots`

- if sharing of the x-axis or y-axis is turned on, the axis (limits, ticks and scale) is shared between subplots (axes).
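A small sketch of the effect, with illustrative names: once axes are shared, changing the limits on one subplot changes them on the other as well.

```python
from matplotlib.figure import Figure

fig_shared = Figure()
ax_left, ax_right = fig_shared.subplots(nrows=1, ncols=2, sharex=True, sharey=True)

ax_left.set_xlim(0, 10)  # only the left subplot is touched ...
right_xlim = ax_right.get_xlim()  # ... but the right one follows

ax_right.set_ylim(-1, 1)
left_ylim = ax_left.get_ylim()
```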

In [None]:
fig, axes = plt.subplots(
    nrows=1, ncols=1, sharex=False, sharey=False, gridspec_kw=None, figsize=(10, 10)
)

In [None]:
# import matplotlib.pyplot as plt
# import numpy as np
fig, axes = plt.subplots(
    nrows=2, ncols=2, sharex=True, sharey=True, gridspec_kw=None, figsize=(10, 10)
)
for _rows in axes:
    for _ax in _rows:
        _ax.hist(np.random.randn(500), bins=50, color="k", alpha=0.5)
plt.subplots_adjust(
    left=None, bottom=None, right=None, top=None, wspace=None, hspace=None
)  # gridspec_kw={'wspace': 0}

### `matplotlib.gridspec.GridSpec`

In [None]:
_image = np.arange(7*7).reshape((7, 7))

fig = plt.figure(figsize=(12, 8))
gs = matplotlib.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])
ax1 = fig.add_subplot(gs[0, 0])
ax1.set_title("one", fontsize=14)
ax1.imshow(_image)  # plot the 1st image
ax1.axis("off")
ax2 = fig.add_subplot(gs[0, 1])
ax2.set_title("Output", fontsize=14)
ax2.imshow(_image)  # plot the output for the 1st image
ax2.axis("off")
plt.show()

#### Different sized subplots

- see also tutorial on [grid specifications](https://matplotlib.org/3.1.1/tutorials/intermediate/gridspec.html)

In [None]:
fig = plt.figure()
# define axes [left, bottom, width, height] as fractions of figure width and height.
frames = [[0.04, 0.08, 0.22, 0.90], [0.4, 0.08, 0.63, 0.90]]
axes = (fig.add_axes(frames[0], frame_on=False),
        fig.add_axes(frames[1], frame_on=True))

In [None]:
fig, axes = plt.subplots(
    ncols=2, gridspec_kw={"width_ratios": [5, 1], "wspace": 0.2}, figsize=(10, 4)
)

In [None]:
_ = axes[1].axis("off")
fig

In [None]:
fig.clear()  # fig.clf()
fig

In [None]:
# import matplotlib.pyplot as plt
from matplotlib import table

fig, axes = plt.subplots(
    ncols=2, gridspec_kw={"width_ratios": [5, 1], "wspace": 0.2}, figsize=(10, 4)
)

_counts_summed = pd.Series({0: 513823, 1: 219778, 2: 65228})
_counts_summed.name = "frequency"

ax = axes[0]
_ = _counts_summed.plot(kind="bar", ax=ax)
ax.set_xlabel("peptides from n miscleavages")
ax.set_ylabel("frequency")

ax = axes[1]
ax.axis("off")
_ = pd.plotting.table(
    ax=ax, data=_counts_summed, loc="best", colWidths=[1], edges="open"
)
_ = fig.suptitle("Peptides frequencies")

### Reusing code

- define a function which takes as first argument an `axes` object: `myfunc(ax, ...)`
- use only the object-oriented matplotlib (OOM) API

> Reference: [Coding Style](https://matplotlib.org/3.1.1/tutorials/introductory/usage.html#coding-styles) section in Usage Guide
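The pattern in its smallest form might look like the sketch below. The helper `plot_sine` and its parameters are hypothetical and only serve to show the `myfunc(ax, ...)` signature: the function draws on the axes it receives and forwards extra keyword arguments to `ax.plot`.

```python
import numpy as np
from matplotlib.figure import Figure


def plot_sine(ax, frequency, **plot_kwargs):
    """Draw sin(frequency * x) on the given axes and return the line."""
    x = np.linspace(0, 2 * np.pi, 200)
    (line,) = ax.plot(x, np.sin(frequency * x), **plot_kwargs)
    ax.set_title(f"frequency = {frequency}")
    return line


fig_demo = Figure(figsize=(8, 3))
axes_demo = fig_demo.subplots(nrows=1, ncols=2)
# the same function fills both subplots
lines = [
    plot_sine(ax, frequency, color="k")
    for ax, frequency in zip(axes_demo, (1, 2))
]
```

Because the function never calls `plt`, it does not depend on which figure happens to be current and can be reused in any layout.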

In [None]:
dict_results = {
    ("precision", "mean"): {"model_1": 0.84698, "model_2": 0.78923},
    ("precision", "std"): {"model_1": 0.05484, "model_2": 0.06584},
    ("recall", "mean"): {"model_1": 0.94122, "model_2": 0.2425},
    ("recall", "std"): {"model_1": 0.078694, "model_2": 0.08599},
}
df_results = pd.DataFrame(dict_results)
df_results

In [None]:
df_results.to_dict()

In [None]:
def plot_performance(ax, result, metric, title, _process_index=None):
    """Plot mean and standard deviation (std) of metrics.

    Parameters
    ----------
    ax : matplotlib.axes.Axes
        Axes to draw on.
    result : pandas.DataFrame
        results. Rows are models. Each metric has a mean and stddev in a MultiIndex
        columns object of the type ('metric', ('mean', 'std'))
    metric : str
        The metric to select from the columns of the `result` DataFrame.
    title : str     
        Title of the axes
    _process_index : function, optional
        Function to process model names, by default None

    Returns
    -------
    matplotlib.axes.Axes
        Reference to the `ax` that was passed in
    """
    df = result
    df = df.sort_values(by=[(metric, 'mean')])
    colors = np.where(["1" in row for row in df.index], "darkred", "white")
    if _process_index is not None:
        df.index = _process_index(df.index)
    y = df.index
    width = df[(metric, "mean")]
    xerr = df[(metric, "std")]
    ax.set_xlim(0, 1.1)
    ax.tick_params(labelsize=15)
    ax.barh(
        y=y,
        width=width,
        xerr=xerr,
        capsize=4,
        color=colors,
        height=0.6,
        edgecolor="black",
    )

    metric_name = " ".join(metric.split("_")).capitalize()
    if metric == "f1":
        metric_name += " score"
    ax.set_title("{}\n{}".format(title, metric_name), fontsize=15)
    return ax

In [None]:
fig, axes = plt.subplots(1, 2, sharey=True)
ax = axes[1]
_ = plot_performance(
    ax=ax, result=df_results, metric="precision", title="Precision with stddev"
)

#### Exercise
- add the missing plot
- Plot confidence interval (not the stderr)
- Change to pandas plotting

### Reading the API specifications

In my opinion, Matplotlib's function signatures (argument descriptions) are not easy to read.
  - some arguments are positional **only** (which allows for flexible argument specification)
  - derived classes "hide" a lot of options in `**kwargs` (read: keyword arguments)
  
Let's have a look at [`matplotlib.axes.Axes.plot`](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot) or [`matplotlib.figure.Figure`](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure)
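Both points show up in a single call to `Axes.plot`, sketched here with made-up data: the x values, y values and the format string `"r--"` are passed positionally, while `linewidth` and `marker` travel through `**kwargs` to the resulting `Line2D` object.

```python
from matplotlib.figure import Figure

fig_demo = Figure()
ax_demo = fig_demo.add_subplot(1, 1, 1)

# positional: x, y, format string; keyword arguments: Line2D properties
(line,) = ax_demo.plot([0, 1, 2], [0, 1, 4], "r--", linewidth=3, marker="o")

line_properties = (
    line.get_color(),
    line.get_linestyle(),
    line.get_linewidth(),
    line.get_marker(),
)
print(line_properties)
```

The keyword arguments accepted this way are documented with `matplotlib.lines.Line2D` rather than in the signature of `plot` itself.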

In [None]:
fig, ax = plt.subplots()
# ax.plot?

## Gallery
- Galleries: [matplotlib](https://matplotlib.org/gallery), [seaborn](https://seaborn.pydata.org/examples/index.html), [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html), [plotly](https://plotly.com/python/), [python graph gallery](https://www.python-graph-gallery.com/)
- check out some examples: [XKCD](https://matplotlib.org/3.1.1/gallery/showcase/xkcd.html#sphx-glr-gallery-showcase-xkcd-py)


## Covid19 data

Let's plot some Covid19 aggregates. 

- [COVID-19 repository with Johns Hopkins University data](https://github.com/datasets/covid-19)
- [European Centre for Disease Prevention and Control (ECDC): Testing data](https://www.ecdc.europa.eu/en/publications-data/covid-19-testing) and [ECDC daily numbers worldwide](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide)
- [EU open data](https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data) (not checked)

### Johns Hopkins University data
- [COVID-19 repository with Johns Hopkins University data](https://github.com/datasets/covid-19)

In [None]:
URL_COUNTRIES_AGG = "https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv"
URL_REFERENCE = "https://raw.githubusercontent.com/datasets/covid-19/main/data/reference.csv"

data_covid19 = pd.read_csv(URL_COUNTRIES_AGG, index_col="Date")
data_covid19.index = pd.to_datetime(data_covid19.index)
data_covid19_reference = pd.read_csv(URL_REFERENCE)

Data is aggregated over time, giving totals on a given day.

In [None]:
data_covid19.sort_index().tail(15)

In [None]:
mask = data_covid19.Country.unique()
mask = data_covid19_reference.Combined_Key.isin(mask)

population = data_covid19_reference.loc[mask, ["Combined_Key", "Population"]]
population = population.set_index("Combined_Key").dropna()
population.index.name = "Country"
population.sort_values(by="Population", ascending=False).head(10).T / 1_000_000

#### Example data for combining Matplotlib, pandas and seaborn plotting above

In [None]:
mask_nordics = data_covid19.Country.isin(["Denmark", "Sweden", "Norway", "Finland"])
nordics = data_covid19[mask_nordics]
nordics = nordics.pivot(columns="Country", values="Deaths")
nordics.tail()

In [None]:
nordics_monthly = nordics.resample("1M").last()
nordics_monthly.index = nordics_monthly.index.astype("string") 
nordics_monthly.tail(10).to_dict()

### European Centre for Disease Prevention and Control (ECDC)

- [European Centre for Disease Prevention and Control (ECDC): Testing data](https://www.ecdc.europa.eu/en/publications-data/covid-19-testing) and [ECDC daily numbers worldwide](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide)

In [None]:
url_ecdc_weekly_cases = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
url_ecdc_weekly_testing = "https://opendata.ecdc.europa.eu/covid19/testing/csv"

ecdc_weekly_cases = pd.read_csv(
    url_ecdc_weekly_cases, parse_dates=True, infer_datetime_format=True)
ecdc_weekly_testing = pd.read_csv(url_ecdc_weekly_testing)

In [None]:
ecdc_weekly_cases.head()

In [None]:
ecdc_weekly_cases.dateRep = pd.to_datetime(ecdc_weekly_cases.dateRep)
ecdc_weekly_cases.set_index('dateRep').head()  # not persistent

In [None]:
ecdc_weekly_testing.tail()

### Open Exercise

- select some countries, select one or more timepoints and plot some descriptive statistics.
- just get into plotting, don't think too much about the data (do it, but don't get stuck!)

In [None]:
# your plots

## Color maps and Styles

- [Color maps](https://matplotlib.org/3.3.1/tutorials/colors/colormaps.html#sphx-glr-tutorials-colors-colormaps-py), abbreviated `cmap`, map numeric values (of a certain range) to colors.
- [style sheets](https://matplotlib.org/3.3.1/gallery/style_sheets/style_sheets_reference.html#sphx-glr-gallery-style-sheets-style-sheets-reference-py) define several aspects at once
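Both ideas in a short sketch, using the built-in `viridis` colormap and `ggplot` style as examples: a colormap is callable and turns numbers between 0 and 1 into RGBA colors, and a style sheet can be applied temporarily with a context manager.

```python
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap("viridis")
rgba_mid = cmap(0.5)  # one value -> one (red, green, blue, alpha) tuple
rgba_many = cmap(np.linspace(0, 1, 5))  # five values -> array with five colors

# the style only applies to figures created inside the with-block
with plt.style.context("ggplot"):
    fig_styled, ax_styled = plt.subplots()
    for i, color in enumerate(rgba_many):
        ax_styled.plot([0, 1], [i, i], color=color)
plt.close(fig_styled)
```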

## Scientific Figures - Case Study

### Plot some (calibration) curves

In [None]:
x = np.linspace(0, 2, 100)

plt.plot(x, x, label="linear")
plt.plot(x, x ** 2, label="quadratic")
plt.plot(x, x ** 3, label="cubic")

plt.xlabel("x label")
plt.ylabel("y label")

plt.title("Simple Measurement Plots")

plt.legend()

plt.show()

### Plot error bars on bar plot 

Let's plot some performance metrics

1. Use seaborn on original data
2. Calculate the errors yourself and add them to the mean

### Seaborn

- [docs](https://seaborn.pydata.org/)

In [None]:
# Load your own data and adapt it to the example from the section on plotting from matplotlib, seaborn and pandas in one figure
# sns.barplot(data=None, ax=None)

### Calculate error bars yourself

In [None]:
x = np.arange(1, 11)
y1 = np.array([4, 5, 6, 7, 20, 33, 44, 55, 66, 77])
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 7))

yerr = y1 * np.random.random(len(y1))

ax.errorbar(x, y1, yerr=yerr, marker='o', capsize=5, ecolor='black')
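The cell above draws random error bars. With real replicate measurements, the errors could be derived from the data, as in the following sketch. The scores are simulated here (labeled as such in the code), and the choice of a normal-approximation 95% interval (1.96 times the standard error) is an assumption made for the illustration.

```python
import numpy as np
from matplotlib.figure import Figure

# simulated data: 10 repeated runs (rows) for 2 models (columns)
rng = np.random.default_rng(0)
scores = rng.normal(loc=[0.8, 0.6], scale=0.05, size=(10, 2))

means = scores.mean(axis=0)
# standard error of the mean: sample standard deviation / sqrt(n)
sem = scores.std(axis=0, ddof=1) / np.sqrt(scores.shape[0])

fig_err = Figure(figsize=(4, 3))
ax_err = fig_err.add_subplot(1, 1, 1)
bars = ax_err.bar(["model_1", "model_2"], means, yerr=1.96 * sem, capsize=4)
ax_err.set_ylabel("score (mean with 95% interval)")
```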

In [None]:
y1_series = pd.Series(y1, index=range(1, 11))
y1_series.plot()  # reproduce the above plot

### Plot receiver operating characteristic (ROC) curves with their area under the curve (AUC) averages

- [scikit-learn example](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py)

In [None]:
two_curves_tpr_fnr_threshold = [
    (np.array([0., 0., 0., 0., 0.03125, 0.03125, 0.0625,
               0.0625, 0.09375, 0.09375, 0.25, 0.25, 0.3125, 0.3125,
               0.34375, 0.34375, 0.40625, 0.4375, 0.5, 0.5, 0.53125,
               0.5625, 0.71875, 0.78125, 0.96875, 1.]),
     np.array([0., 0.05, 0.1, 0.475, 0.475, 0.525, 0.525, 0.55, 0.575,
               0.675, 0.675, 0.75, 0.75, 0.8, 0.8, 0.825, 0.825, 0.85,
               0.875, 0.9, 0.9, 0.925, 0.925, 0.925, 0.925, 1.]),
     np.array([1.99999331, 0.99999331, 0.99998854, 0.91223666, 0.84963339,
               0.8353687, 0.68986971, 0.65838198, 0.63379411, 0.52059319,
               0.44464182, 0.42263138, 0.4052373, 0.39234362, 0.38807793,
               0.37538563, 0.37119168, 0.36701706, 0.3628623, 0.35872791,
               0.35052229, 0.34240416, 0.30324951, 0.29947508, 0.25636921,
               0.17784229])
     ),
    (np.array([0., 0., 0., 0., 0.,
               0.03333333, 0.03333333, 0.06666667, 0.06666667, 0.06666667,
               0.1, 0.1, 0.13333333, 0.13333333, 0.16666667,
               0.26666667, 0.26666667, 0.33333333, 0.4, 0.43333333,
               0.43333333, 0.53333333, 0.63333333, 0.63333333, 0.7,
               0.73333333, 0.8, 0.8, 1., 1.]),
     np.array([0., 0.02380952, 0.28571429, 0.33333333, 0.42857143,
               0.45238095, 0.47619048, 0.47619048, 0.54761905, 0.5952381,
               0.61904762, 0.64285714, 0.64285714, 0.69047619, 0.71428571,
               0.71428571, 0.76190476, 0.76190476, 0.76190476, 0.78571429,
               0.88095238, 0.88095238, 0.88095238, 0.9047619, 0.9047619,
               0.9047619, 0.9047619, 0.92857143, 0.92857143, 1.]),
     np.array([1.99998948, 0.99998948, 0.97677558, 0.96747665, 0.63481054,
               0.6267451, 0.61451484, 0.61040487, 0.55146471, 0.54717785,
               0.52565108, 0.52133169, 0.51700907, 0.50835687, 0.50402855,
               0.49537078, 0.48671579, 0.43513235, 0.4181907, 0.41398366,
               0.38902387, 0.36463261, 0.3606304, 0.33703732, 0.33317917,
               0.3255299, 0.32173955, 0.31422917, 0.26785577, 0.17872516])
     )
]

In [None]:
from sklearn.metrics import roc_curve, auc


def plot_roc_curve(ax, runs_roc_auc_scores, endpoint=""):
    tprs = []
    base_fpr = np.linspace(0, 1, 101)
    roc_aucs = []
    for fpr, tpr, threshold in runs_roc_auc_scores:
        roc_auc = auc(fpr, tpr)
        roc_aucs.append(roc_auc)

        ax.plot(fpr, tpr, "royalblue", alpha=0.05)

        tpr = np.interp(base_fpr, fpr, tpr)
        tpr[0] = 0.0
        tprs.append(tpr)

    tprs = np.array(tprs)
    mean_tprs = tprs.mean(axis=0)
    std = tprs.std(axis=0)

    tprs_upper = mean_tprs + std
    tprs_lower = mean_tprs - std

    mean_rocauc = np.mean(roc_aucs).round(2)
    sd_rocauc = np.std(roc_aucs).round(2)
    se_rocauc = sd_rocauc / np.sqrt(len(roc_aucs))

    CI = (mean_rocauc - 1.96 * se_rocauc, mean_rocauc + 1.96 * se_rocauc)

    ax.plot(
        base_fpr,
        mean_tprs,
        color="royalblue",
        label="Mean ROC\n(AUC = {}±{})".format(mean_rocauc, sd_rocauc),
    )
    ax.fill_between(
        base_fpr, tprs_lower, tprs_upper, color="grey", alpha=0.4, label="±1 std. dev"
    )

    ax.plot([0, 1], [0, 1], "r--")
    ax.set_xlim([-0.01, 1.02])
    ax.set_ylim([-0.01, 1.02])
    ax.set_ylabel("True Positive Rate", fontsize=15)
    ax.set_xlabel("False Positive Rate", fontsize=15)
    ax.tick_params(labelsize=15)
    ax.legend(fontsize=12)
    ax.set_title("{}\nAUCs".format(endpoint), fontsize=15)
    # print("95% CI:{}".format(CI))
    return ax

In [None]:
fig, ax = plt.subplots()
_ = plot_roc_curve(
    ax=ax, runs_roc_auc_scores=two_curves_tpr_fnr_threshold, endpoint='Target')

## Remarks
- matplotlib is powerful and thus maybe frightening in the beginning
- many issues and technical details are still a mystery to me
- this notebook is not a reference, please browse the official documentation for latest news
    - links to examples are (hopefully) provided as hyperlinks