![matplotlib](https://matplotlib.org/_static/logo2.png)

(image: matplotlib.org)

## Workshop: Matplotlib and Data Visualization

In this workshop, we will cover using Matplotlib to create data visualizations.

By now, you have already seen Matplotlib in action in the NumPy and Pandas workshops. This workshop serves as a more structured introduction to Matplotlib.

Specifically, we'll be focusing on `matplotlib.pyplot`.

### Cheatsheet

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf

### References

1. https://matplotlib.org/api/pyplot_summary.html
2. https://scipy-cookbook.readthedocs.io/items/idx_matplotlib_simple_plotting.html


### Installation
Windows: Start Button -> "Anaconda Prompt"

Ubuntu / macOS: conda should be in your path

Activate the environment:

```
conda activate module1
```

Matplotlib should already be installed. If not, install it:
```
conda install matplotlib
```
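
To confirm the installation worked, a quick check like the following can be run from a Python prompt or a notebook cell in the activated environment. This is only a minimal sketch; the version printed depends on your environment.

```python
import matplotlib

# Print the installed version; an ImportError on the line above
# would mean matplotlib is not available in this environment
print(matplotlib.__version__)
```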

### Average Daily Polyclinic Attendances for Selected Diseases

We will be practicing `matplotlib` concepts on this dataset.

### Download Instructions

1. Go to https://data.gov.sg/dataset/average-daily-polyclinic-attendances-selected-diseases
2. Click on the Download button
3. Unzip and extract the .csv file. Note the path for use below

Note: on Windows you may wish to rename the unzipped folder to something shorter.

In [3]:
import matplotlib
matplotlib?

In [2]:
import matplotlib.pyplot as plt
plt?

### Read the data

We'll use `pandas.read_csv` to read the data.

In [23]:
import pandas as pd

# Use pandas to read the CSV file into a pandas.DataFrame,
# setting the 0th column as the index

df = pd.read_csv('D:/tmp/polyclinic-attendance/average-daily-polyclinic-attendances-for-selected-diseases.csv',
                index_col=0)
df.head(5)

| epi_week | disease | no._of_cases |
|---|---|---|
| 2012-W01 | Acute Upper Respiratory Tract infections | 2932 |
| 2012-W01 | Acute Conjunctivitis | 120 |
| 2012-W01 | Acute Diarrhoea | 491 |
| 2012-W01 | Chickenpox | 18 |
| 2012-W02 | Acute Upper Respiratory Tract infections | 3189 |


In [11]:
df.index

Index(['2012-W01', '2012-W01', '2012-W01', '2012-W01', '2012-W02', '2012-W02',
       '2012-W02', '2012-W02', '2012-W03', '2012-W03',
       ...
       '2017-W50', '2017-W50', '2017-W51', '2017-W51', '2017-W51', '2017-W51',
       '2017-W52', '2017-W52', '2017-W52', '2017-W52'],
      dtype='object', name='epi_week', length=1252)

Hmm, the index has `dtype='object'`: the `yyyy-WNN` week strings were read as plain text, so this date format isn't recognized.

We'll need to supply a custom date parser.

In [22]:
from datetime import datetime

# create the parser
def parse_date(date):
    """Parses a yyyy-WNN date string
    Args:
        date: a date string in the yyyy-WNN format
    Returns:
        A datetime.datetime for the Sunday that ends week NN
    """
    # https://stackoverflow.com/questions/17087314/get-date-from-week-number
    # '-0' appends the weekday (%w: 0 = Sunday); %W counts weeks starting on Monday
    return datetime.strptime(date + '-0', '%Y-W%W-%w')

def parse_dates(dates):
    """Parses a list of dates
    Args:
        dates: a list of date strings in the yyyy-WNN format
    Returns:
        A list of datetime.datetime
    """
    return [parse_date(d) for d in dates]

# test the parser
parse_dates(['2012-W01', '2012-W52'])

[datetime.datetime(2012, 1, 8, 0, 0), datetime.datetime(2012, 12, 30, 0, 0)]

### Re-read the CSV with custom date parser
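
One way to do this, sketched below, is to read the CSV as before and then convert the index using the same `%Y-W%W-%w` format as `parse_date`. The sketch assumes an in-memory copy (via `io.StringIO`) of the five rows shown in the `df.head(5)` output in place of the downloaded file, and takes the column names `epi_week`, `disease`, and `no._of_cases` from that output. Substitute your own CSV path when running it on the full dataset.

```python
import io
from datetime import datetime

import pandas as pd

# Assumption: a small in-memory sample standing in for the downloaded CSV.
# The rows are the ones shown in the df.head(5) output earlier.
sample_csv = io.StringIO(
    "epi_week,disease,no._of_cases\n"
    "2012-W01,Acute Upper Respiratory Tract infections,2932\n"
    "2012-W01,Acute Conjunctivitis,120\n"
    "2012-W01,Acute Diarrhoea,491\n"
    "2012-W01,Chickenpox,18\n"
    "2012-W02,Acute Upper Respiratory Tract infections,3189\n"
)

def parse_date(date):
    """Parse a yyyy-WNN string into the Sunday that ends that week."""
    return datetime.strptime(date + '-0', '%Y-W%W-%w')

# Read the CSV with the week strings as the index
df = pd.read_csv(sample_csv, index_col=0)

# Replace the string index with a DatetimeIndex built by the parser
df.index = pd.DatetimeIndex([parse_date(d) for d in df.index],
                            name='epi_week')

# The index dtype should now be a datetime type rather than object
print(df.index.dtype)
```

Converting the index after reading, rather than passing a parser to `read_csv`, is a choice made here to keep the sketch independent of the pandas version in use.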

Let's try to plot the total number of cases over time.

### Plot workflow

Here's a basic workflow for creating a plot.

```
import matplotlib.pyplot as plt

# placeholder data so the example runs; substitute your own
x1, y1 = [1, 2, 3], [1, 4, 9]
x2, y2 = [1, 2, 3], [9, 4, 1]

# create subplots lined up as 1 row and 2 columns
# figsize is the (width, height) of the figure in inches
# ax1 and ax2 are the axes for each of the subplots
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
                               figsize=(20, 10))

ax1.plot(x1, y1)
ax1.set(title='The left plot',
        ylabel='the y-axis',
        xlabel='the x-axis')

ax2.plot(x2, y2)
ax2.set(title='The right plot',
        ylabel='the y-axis',
        xlabel='the x-axis')
plt.show()
```
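
As a sketch of how this workflow might apply to the polyclinic data, the example below totals the cases per week and plots them on a single axes. It uses only the five sample rows from the `df.head(5)` output rather than the full file, so the resulting line has just two points. The `Agg` backend line is an assumption that lets the script run without a display; it can be dropped in a notebook.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: in-memory sample standing in for the downloaded CSV
sample_csv = io.StringIO(
    "epi_week,disease,no._of_cases\n"
    "2012-W01,Acute Upper Respiratory Tract infections,2932\n"
    "2012-W01,Acute Conjunctivitis,120\n"
    "2012-W01,Acute Diarrhoea,491\n"
    "2012-W01,Chickenpox,18\n"
    "2012-W02,Acute Upper Respiratory Tract infections,3189\n"
)
df = pd.read_csv(sample_csv, index_col=0)

# Convert the yyyy-WNN strings to dates (the Sunday ending each week)
df.index = pd.DatetimeIndex(
    [datetime.strptime(d + '-0', '%Y-W%W-%w') for d in df.index],
    name='epi_week')

# Sum the cases across all diseases for each week
weekly_total = df.groupby(level='epi_week')['no._of_cases'].sum()

# Same workflow as above, with a single subplot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(weekly_total.index, weekly_total.values, marker='o')
ax.set(title='Total cases per week (sample rows only)',
       xlabel='week ending',
       ylabel='no. of cases')
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
plt.show()
```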