# **Lecture 9: Data Visualization with Matplotlib**

## Introduction to Data Visualization

### Why Data Visualization?
- Facilitates understanding of complex data.
- Enables identification of patterns, trends, and outliers.
- Aids in communicating findings effectively.

### Introduction to Matplotlib
- A comprehensive library for creating static, animated, and interactive visualizations in Python.
- Designed to work well with NumPy and pandas data structures.
- Widely used for its flexibility and ease of use.
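
As a minimal sketch of the NumPy and pandas point above, the snippet below plots a NumPy array and a pandas Series on the same Axes. The data and the names `t` and `df` are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a NumPy array and a small DataFrame built from it
t = np.linspace(0, 2 * np.pi, 50)
df = pd.DataFrame({"t": t, "sine": np.sin(t)})

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.plot(t, np.cos(t), label="NumPy array")            # arrays are accepted directly
ax.plot(df["t"], df["sine"], label="pandas Series")   # so are Series
ax.legend()
```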

## Getting Started with Matplotlib

### Installation
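Install Matplotlib with pip. The leading `!` runs the command in a shell from a notebook cell (for example in Colab or Jupyter); in a terminal, omit it: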
```python
!pip install matplotlib
```
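
One quick way to check that the installation worked is to import the package and print its version. This is only a sanity check; the version you see depends on your environment.

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the active environment
print(matplotlib.__version__)
```
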
### A typical matplotlib figure
![](https://matplotlib.org/stable/_images/anatomy.png)

### Figure

The **whole** figure. The Figure keeps track of all the child Axes, a group of
'special' Artists (titles, figure legends, colorbars, etc.), and even nested
subfigures.

The easiest way to create a new Figure is with pyplot:

```python
import matplotlib.pyplot as plt

fig = plt.figure()  # an empty figure with no Axes
fig, ax = plt.subplots()  # a figure with a single Axes
fig, axs = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes
# a figure with one axes on the left, and two on the right:
fig, axs = plt.subplot_mosaic([['left', 'right_top'],
                              ['left', 'right_bottom']])
```

It is often convenient to create the Axes together with the Figure, but you
can also manually add Axes later on.
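
As a small sketch of adding Axes after the Figure exists, the example below uses `Figure.add_subplot` (grid placement) and `Figure.add_axes` (explicit position). The variable names and the position values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 3))      # start with an empty Figure

# add_subplot places an Axes on a regular grid: 1 row, 2 columns, position 1
ax_left = fig.add_subplot(1, 2, 1)
ax_left.set_title("add_subplot(1, 2, 1)")

# add_axes takes (left, bottom, width, height) as fractions of the figure size
ax_right = fig.add_axes([0.6, 0.2, 0.35, 0.6])
ax_right.set_title("add_axes")

print(len(fig.axes))  # number of Axes now attached to the Figure
```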

### Axes

An Axes is an Artist attached to a Figure that contains a region for
plotting data, and usually includes two (or three in the case of 3D)
Axis objects (be aware of the difference between **Axes** and **Axis**)
that provide ticks and tick labels to provide scales for the data in the
Axes. Each Axes also has a title (set via `set_title`), an x-label (set via
`set_xlabel`), and a y-label (set via `set_ylabel`).

The Axes class and its member functions are the primary
entry point to working with the OOP interface, and have most of the
plotting methods defined on them (e.g. `ax.plot()` uses the `plot` method).
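
To make the Axes/Axis distinction concrete, here is a short sketch with made-up data. It sets the title and labels on one Axes and then inspects the two Axis objects that the Axes owns.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.plot([1, 2, 3], [1, 4, 9])       # a plotting method defined on the Axes
ax.set_title("Squares")
ax.set_xlabel("n")
ax.set_ylabel("n squared")

# One Axes (the plotting region) owns two Axis objects (the x and y scales)
print(type(ax.xaxis).__name__)      # XAxis
print(type(ax.yaxis).__name__)      # YAxis
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
```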

### Axis

These objects set the scale and limits and generate ticks (the marks
on the Axis) and ticklabels (strings labeling the ticks).
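
The following sketch, using invented data and labels, shows the three things an Axis controls: limits, tick positions, and tick labels. It sets them through convenience methods on the Axes and reads the tick positions back from the Axis object.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.plot([0, 1, 2, 3], [10, 20, 15, 30])

ax.set_xlim(0, 3)                               # limits of the x Axis
ax.set_xticks([0, 1, 2, 3])                     # tick positions
ax.set_xticklabels(["Q1", "Q2", "Q3", "Q4"])    # strings shown at those ticks
ax.set_ylim(0, 40)
ax.set_yticks([0, 20, 40])

# The tick positions can be read back from the Axis object itself
print(ax.xaxis.get_ticklocs())
```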

### Artist

Basically, everything visible on the Figure is an Artist.
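
A quick way to see this is to check a few objects against the `Artist` base class in `matplotlib.artist`. The sketch below uses made-up data and also shows that an Artist can be restyled after it has been created.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots(figsize=(5, 2.7))
(line,) = ax.plot([1, 2, 3], [2, 4, 8], label="data")  # a Line2D Artist
title = ax.set_title("Artists")                        # a Text Artist
legend = ax.legend()                                   # a Legend Artist

# The Figure, the Axes, and the objects drawn on them all derive from Artist
for obj in (fig, ax, line, title, legend, ax.xaxis):
    print(type(obj).__name__, isinstance(obj, Artist))

# Artists can be restyled after creation through their setter methods
line.set_color("tab:red")
line.set_linewidth(3)
```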

## **Basic Plotting with Matplotlib**

### Line Plots

Importing Matplotlib and creating a simple line plot:
```python
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating a line plot
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
```

or, equivalently, with explicit Figure and Axes objects:

```python
fig, ax = plt.subplots(figsize=(5, 2.7))
ax.plot(x, y)
ax.set_title('Simple Line Plot')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
```

### **Coding styles: The explicit and the implicit interfaces**

As noted above, there are essentially two ways to use Matplotlib:

- Explicitly create Figures and Axes, and call methods on them (the
  "object-oriented (OO) style").
- Rely on pyplot to implicitly create and manage the Figures and Axes, and
  use pyplot functions for plotting.

In general, we suggest using the OO style, particularly for
complicated plots, and functions and scripts that are intended to be reused
as part of a larger project. However, the pyplot style can be very convenient
for quick interactive work.
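
One reason the OO style suits reusable code is that a plotting function can take the Axes as an argument and draw on whichever one it is given. The sketch below illustrates this; `plot_series` is a hypothetical helper written for this example, not a Matplotlib function.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_series(ax, x, y, title):
    """Hypothetical helper: draw one titled line plot on the given Axes."""
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel("x")
    return ax

x = np.linspace(0, 2 * np.pi, 100)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 2.7))
plot_series(ax1, x, np.sin(x), "sin(x)")   # same function, different Axes
plot_series(ax2, x, np.cos(x), "cos(x)")
fig.tight_layout()
```

Because the function never touches pyplot's implicit "current" Axes, it can be reused for any panel of any Figure.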

### Bar Charts
Comparing quantities among different categories.
```python
categories = ['A', 'B', 'C', 'D']
values = [10, 23, 6, 18]

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.bar(categories, values)
ax.set_title('Sample Bar Chart')
ax.set_xlabel('Category')
ax.set_ylabel('Value')
```

### Scatter Plots
Visualizing relationships between two variables.
```python
import numpy as np

# Random data generation
x = np.random.rand(50)
y = np.random.rand(50)

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.scatter(x, y)
ax.set_title('Random Scatter Plot')
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.show()
```

### Histograms
Understanding the distribution of a dataset.
```python
data = np.random.randn(1000)

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.hist(data, bins=30)
ax.set_title('Histogram')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
```

## Customizing Plots

### Enhancing Visual Appeal
Customizing colors, markers, and line styles.
```python
x = np.linspace(0, 10, 100)
y = np.cos(x)

fig, ax = plt.subplots(figsize=(5, 2.7))
ax.plot(x, y, '-r', marker='o', label='cos(X)')
ax.set_title('Customized Plot')
ax.set_xlabel('X')
ax.set_ylabel('cos(X)')
ax.legend()
```

### Adding Annotations
Highlighting key insights.
```python
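fig, ax = plt.subplots(figsize=(5, 2.7))  # new figure; x and y = cos(x) from above
ax.plot(x, y, '-g')
ax.set_ylim(-1.5, 2)  # leave room above the curve for the label
ax.set_title('Plot with Annotation')
ax.set_xlabel('X')
ax.set_ylabel('Y')
# cos(x) has a local maximum at x = 2*pi (about 6.28)
ax.annotate('Local Max', xy=(6.28, 1), xytext=(7, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05))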
```

## **Example**
Plot the time series of COVID cases in WI
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from google.colab import drive
drive.mount('/content/drive')

filedir = '/content/drive/MyDrive/Teaching/FWE458_Spring2024/Lec9/'
fname = filedir + "COVID19-Historical-V2-ST.csv"

# get data for plots
DF = pd.read_csv(fname)

x = DF.Date
y1 = DF.POS_NEW_CP
y2 = DF.POS_7DAYAVG_CP

fig, ax = plt.subplots(figsize=(15, 3))

ax.bar(x, y1, label='new cases')
ax.set_xlabel("Date")
ax.set_ylabel("# of new cases")
ax.set_title("COVID cases in Wisconsin")
ax.plot(x, y2, 'r-*', linewidth = 3, label='7-day average')

start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 40))
ax.tick_params(axis='x', labelrotation = 45)
ax.legend()

```
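
The example above treats the dates as strings and spaces the ticks by hand. A possible alternative, sketched here on synthetic data because the CSV is not bundled with these notes, is to convert the dates to real datetimes and let `matplotlib.dates` place the ticks. The daily values and the monthly tick spacing are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic stand-in for the CSV: one value per day for 200 days
dates = pd.date_range("2021-01-01", periods=200, freq="D")
rng = np.random.default_rng(0)
new_cases = rng.poisson(lam=100, size=len(dates))
avg7 = pd.Series(new_cases).rolling(7).mean()   # first 6 values are NaN

fig, ax = plt.subplots(figsize=(15, 3))
ax.bar(dates, new_cases, label="new cases")
ax.plot(dates, avg7, "r-", linewidth=3, label="7-day average")

# One tick per month, formatted as YYYY-MM
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", labelrotation=45)
ax.legend()
```

With the real data, the same idea would start from `pd.to_datetime(DF.Date)` in place of the synthetic `dates`.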

Relationship between new deaths and new cases
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# get data for plots
DF = pd.read_csv(fname)

x = DF.POS_7DAYAVG_CP
y = DF.DTH_7DAYAVG_CP
xtime = pd.to_datetime(DF.Date)
year = xtime.dt.year
fig, ax = plt.subplots(figsize=(15, 3))
ax.scatter(x, y, s=None, c=year)
ax.set_xlabel("# new cases")
ax.set_ylabel("# of new deaths")
ax.set_title("COVID cases vs deaths in Wisconsin")

# Fit and draw a separate least-squares line for each year
for yr in (2020, 2021, 2022):
    m, b = np.polyfit(x[year == yr], y[year == yr], 1)
    ax.plot(x, m*x + b)
```

Change line styles, set axis limits, and switch the y-axis to a log scale
```python
# get data for plots
DF = pd.read_csv(fname)

x = DF.Date
y1 = DF.POS_7DAYAVG_CP
y2 = DF.NEG_7DAYAVG

fig, ax = plt.subplots(figsize=(10, 2.7))
ax.plot(x, y1, label='Positive 7-day average')  # Plot the first line
ax.plot(x, y2, label='Negative 7-day average')  # Plot the second line

#ax.plot(x, y1, 'k:', label='Positive 7-day average', linewidth=3)  # Plot the first line
#ax.plot(x, y2, 'r-o', label='Negative 7-day average', markersize=3)  # Plot the second line

ax.set_xlabel('Date')  # Add an x-label to the axes.
ax.set_ylabel('# of cases')  # Add a y-label to the axes.
ax.set_title("Test results of COVID in WI")  # Add a title to the axes.
ax.legend();  # Add a legend.

# set xlim
#start, end = ax.get_xlim()
#ax.set_xlim(end-100, end);
##ax.set_xlim(1, 10);

# change y axis to log scale
#ax.set_yscale('log')

start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 40))
ax.tick_params(axis='x', labelrotation = 45)
```
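
Since the options above are commented out and depend on the CSV, here is a self-contained sketch of the same three ideas (line styles, x-limits, log-scale y-axis) on synthetic data. The two series are invented and only chosen so that they differ by orders of magnitude.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data: two series on very different scales
days = np.arange(1, 201)
positive = 50 + 40 * np.sin(days / 15) ** 2
negative = 5000 + 30 * days

fig, ax = plt.subplots(figsize=(10, 2.7))
ax.plot(days, positive, "k:", linewidth=3, label="positive")     # black dotted line
ax.plot(days, negative, "r-o", markersize=3, label="negative")   # red line with circles

ax.set_xlim(100, 200)      # show only the last 100 days
ax.set_yscale("log")       # a log scale keeps both series readable
ax.set_xlabel("Day")
ax.set_ylabel("# of cases")
ax.legend()
```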

Subplots

```python
x = DF.Date
y1 = DF.POS_ASN_CP
y2 = DF.POS_BLK_CP
y3 = DF.POS_WHT_CP
y4 = DF.POS_MLTOTH_CP


fig, axs = plt.subplots(1, 4, figsize=(10, 8), sharex=True, sharey=True)

# One panel per data column; all panels share the x and y axes
series = {'POS_ASN_CP': y1, 'POS_BLK_CP': y2,
          'POS_WHT_CP': y3, 'POS_MLTOTH_CP': y4}

for ax, (name, values) in zip(axs, series.items()):
    ax.plot(x, values, 'k-', label=name, linewidth=3)  # Plot this panel's line
    ax.set_xlabel('Date')  # Add an x-label to the axes.
    ax.set_title(name)  # Add a title to the axes.
    start, end = ax.get_xlim()
    ax.xaxis.set_ticks(np.arange(start, end, 80))  # one tick every 80 rows
    ax.tick_params(axis='x', labelrotation=45)

for ax in axs[:3]:  # the last panel is left without a y-label
    ax.set_ylabel('# of cases')  # Add a y-label to the axes.

fig.tight_layout()
```

```python
fig, axs = plt.subplot_mosaic([['upleft', 'right'],
                               ['lowleft', 'right']])
axs['upleft'].set_title('upleft')
axs['lowleft'].set_title('lowleft')
axs['right'].set_title('right');
axs['right'].plot(x,y)
fig.tight_layout()
```