# Lab 5B - Plotting with `matplotlib`
*Day 5 - August 5, 2024*

*I School Python Bootcamp*

*Author: Lauren Chambers*

`matplotlib` is one of the most widely used libraries for data visualization in Python. It provides a comprehensive and flexible framework for creating a wide variety of static, animated, and interactive plots and charts. Whether you are analyzing scientific data, exploring large datasets, or simply looking to present your results in a visually appealing manner, `matplotlib` has the tools you need. 

In this notebook, we will explore the fundamentals of `matplotlib`, including how to create basic plots, customize their appearance, and combine multiple plots into a single figure.

Throughout, we'll use a sample dataset describing passengers on the Titanic. First, we need to import our libraries and load the data.

In [None]:
import csv
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
titanic_data = []

with open('titanic.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        titanic_data.append(row)

titanic_data
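
Each row from `csv.reader` is a plain list, so the cells below select columns by position (for example, `row[9]` for the fare), which means remembering the column order. As a sketch of an alternative, the standard library's `csv.DictReader` uses the header row as keys so columns can be looked up by name. The helper name `fares_from_csv` is hypothetical, and the column name `"Fare"` is an assumption about this file's header; check the first row printed above.

```python
import csv

def fares_from_csv(lines):
    """Return the "Fare" column as floats, looked up by header name.

    `lines` can be an open file or any iterable of CSV text lines.
    The column name "Fare" is an assumption about the file's header row.
    """
    # DictReader treats the first line as the header, so each row is a dict
    reader = csv.DictReader(lines)
    return [float(row["Fare"]) for row in reader]

# Usage (assuming titanic.csv has a "Fare" column):
# with open("titanic.csv", newline="") as f:
#     fares = fares_from_csv(f)
```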

...That raw list is hard to read, so let's cheat a bit and use `pandas` to get a more intuitive view of our dataset:

In [None]:
pd.read_csv("titanic.csv")

## Basics of `matplotlib`
Okay, now that we have a better handle on our data, we can get plotting.

Just like with `drawsvg`, the initialization code for `matplotlib` doesn't display a visible plot on its own:

In [None]:
# Initialize
plt.figure()

# Display
plt.show()
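
The figure looks empty because a new `Figure` starts with no Axes, the region where data actually gets drawn; calling a plotting function such as `plt.plot()` adds one. A minimal check, using placeholder points:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
axes_before = len(fig.axes)      # a brand-new Figure holds no Axes

plt.plot([0, 1, 2], [0, 1, 4])   # pyplot adds an Axes to the current figure
axes_after = len(fig.axes)

print(axes_before, axes_after)   # prints: 0 1
plt.close(fig)                   # discard the figure without displaying it
```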

Let's draw a simple line plot of fare versus passenger ID:

In [None]:
# Initialize
plt.figure()

# Process data
passenger_ids = [int(row[0]) for row in titanic_data[1:]]
fares = [float(row[9]) for row in titanic_data[1:]]

# Draw plot and axis labels
plt.plot(passenger_ids, fares)
plt.ylabel("Fare ($USD)")
plt.xlabel("Passenger ID")

# Display
plt.show()
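
The `plt.*` functions used here always act on the "current" figure and Axes. `matplotlib` also has an object-oriented style, used in the Subplots section below, where you call methods on an Axes object directly; `plt.xlabel()` corresponds to `ax.set_xlabel()`, for instance. A sketch of the same kind of line plot in that style, with made-up placeholder values in place of the Titanic columns:

```python
import matplotlib.pyplot as plt

# Made-up placeholder values standing in for passenger IDs and fares
passenger_ids = [1, 2, 3, 4, 5]
fares = [10, 70, 8, 50, 8]

# plt.subplots() returns the Figure and a single Axes object
fig, ax = plt.subplots()
ax.plot(passenger_ids, fares)
ax.set_ylabel("Fare ($USD)")     # Axes method matching plt.ylabel
ax.set_xlabel("Passenger ID")    # Axes method matching plt.xlabel
plt.show()
```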

That's all well and good, but, perhaps unsurprisingly, there doesn't seem to be any interesting relationship between passenger ID and the fare paid to board the ship. Let's try a different plot type to find something more interesting.

In [None]:
# Initialize
plt.figure()

# Process data
fares = [float(row[9]) for row in titanic_data[1:]]

# Draw plot and axis labels
plt.hist(fares, bins=20)
plt.title("Histogram of fares on the Titanic")
plt.ylabel("Number of Passengers")
plt.xlabel("Fare ($USD)")

# Display
plt.show()
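
`plt.hist()` also returns what it computed: the count in each bin, the bin edges, and the drawn bars. Inspecting these can help when choosing `bins`. A small sketch with made-up placeholder fares:

```python
import matplotlib.pyplot as plt

# Made-up placeholder fares, for illustration only
fares = [5, 7, 8, 8, 12, 15, 30, 52, 80, 150]

plt.figure()
# plt.hist returns (counts per bin, bin edges, drawn bars)
counts, bin_edges, bars = plt.hist(fares, bins=5)
plt.show()

print("passengers counted:", counts.sum())     # each value falls in exactly one bin
print("number of bin edges:", len(bin_edges))  # always bins + 1
```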

What about a scatter plot that shows ages versus fare?

In [None]:
# Initialize
plt.figure()

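# Process data, skipping passengers whose age is missing (an empty string)
# instead of plotting them at age 0
rows_with_age = [row for row in titanic_data[1:] if row[5] != ""]
fares = [float(row[9]) for row in rows_with_age]
ages = [float(row[5]) for row in rows_with_age]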

# Draw plot and axis labels
plt.scatter(ages, fares)
plt.ylabel("Fare ($USD)")
plt.xlabel("Age of Passengers (years)")

# Display
plt.show()
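
Many passengers paid similar low fares, so markers can pile up on top of one another. One common tweak, sketched here with made-up placeholder values, is to make markers semi-transparent with `alpha` (0 is invisible, 1 is opaque) and smaller with `s` (marker area in points squared):

```python
import matplotlib.pyplot as plt

# Made-up placeholder ages and fares, for illustration only
ages = [5, 18, 25, 25, 30, 42, 60]
fares = [20, 8, 8, 9, 60, 30, 100]

plt.figure()
# Overlapping semi-transparent markers show up darker where points cluster
points = plt.scatter(ages, fares, alpha=0.4, s=15)
plt.ylabel("Fare ($USD)")
plt.xlabel("Age of Passengers (years)")
plt.show()
```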

Let's also investigate survival rates by sex. That requires calculating some aggregate statistics first:

In [None]:
# Create dict with survival aggregate statistics
survival_by_sex = {"male": {"survived": 0, "died": 0},
                   "female": {"survived": 0, "died": 0}}

for row in titanic_data[1:]:
    sex = row[4]
    survived = bool(int(row[1]))
    if survived:
        survived_key = "survived"
    else: 
        survived_key = "died"
    survival_by_sex[sex][survived_key] += 1

survival_by_sex
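
The nested dictionary above has to list every key before counting. As an alternative sketch, `collections.Counter` from the standard library returns 0 for keys it hasn't seen, so no setup is needed. The function name `count_survival_by_sex` is hypothetical, and it assumes the same column positions as the cell above (`row[1]` is `"1"` or `"0"` for survival, `row[4]` is the sex):

```python
from collections import Counter

def count_survival_by_sex(rows):
    """Count (sex, outcome) pairs from Titanic-style rows.

    Assumes row[1] is "1"/"0" for survived and row[4] is the sex,
    matching the column positions used in the cell above.
    """
    counts = Counter()
    for row in rows:
        outcome = "survived" if row[1] == "1" else "died"
        counts[(row[4], outcome)] += 1
    return counts

# Usage with the data loaded earlier (skipping the header row):
# counts = count_survival_by_sex(titanic_data[1:])
# counts[("female", "survived")]
```

Keying on a tuple keeps the counts flat; the trade-off is that you look up `counts[("male", "died")]` rather than `survival_by_sex["male"]["died"]`.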

In [None]:
# Initialize
plt.figure()

# Draw plot and axis labels
width = 0.4  # the width of the bars
plt.bar([0 - width/2, 1 - width/2], survival_by_sex["male"].values(), label="Male", width=width)
plt.bar([0 + width/2, 1 + width/2], survival_by_sex["female"].values(), label="Female", width=width)
plt.ylabel("Number of passengers")
plt.title("Survival rates by sex")
plt.xticks([0, 1], labels=["survived", "died"])
plt.ylim(0, 600)
plt.legend()

# Display
plt.show()
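
The bars sit side by side because each group is shifted by half a bar width to either side of its tick. A sketch generalizing that offset to any number of groups with `numpy` (the group names and counts below are placeholders, not the real Titanic numbers):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder counts per category (not the real Titanic numbers)
groups = {"Male": [10, 40], "Female": [30, 8]}
tick_labels = ["survived", "died"]

x = np.arange(len(tick_labels))   # one slot per category: 0, 1
n_groups = len(groups)
width = 0.8 / n_groups            # groups share 80% of each slot

plt.figure()
containers = []
for i, (label, heights) in enumerate(groups.items()):
    # Offset each group so the bars are centred around the tick
    offset = (i - (n_groups - 1) / 2) * width
    containers.append(plt.bar(x + offset, heights, width=width, label=label))
plt.xticks(x, labels=tick_labels)
plt.ylabel("Number of passengers")
plt.legend()
plt.show()
```

With two groups this gives a width of 0.4 and offsets of -0.2 and +0.2, the same layout as the cell above; adding a third entry to `groups` re-spaces the bars automatically.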

If we want to save any of these figures to a file, we can use the `savefig()` function. Note that `plt.savefig()` must be called before `plt.show()`. Do you know why?
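
A minimal sketch of saving a figure; the filename `fares_histogram.png` and the `dpi` and `bbox_inches` settings are example choices, and the output format is inferred from the file extension:

```python
import matplotlib.pyplot as plt

plt.figure()
plt.hist([5, 7, 8, 12, 30, 52, 80], bins=5)   # made-up placeholder data
plt.title("Histogram of fares")

# Save before showing; the .png extension selects the PNG format
plt.savefig("fares_histogram.png", dpi=150, bbox_inches="tight")
plt.show()
```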

## Subplots

Let's break down the data a bit more. Can we use subplots to split the data by passenger class?

In [None]:
survival_by_sex_and_class = {c: {"male": {"survived": 0, "died": 0},
                                 "female": {"survived": 0, "died": 0}}
                             for c in [1, 2, 3]}

for row in titanic_data[1:]:
    class_level = int(row[2])
    sex = row[4]
    survived = bool(int(row[1]))
    if survived:
        survived_key = "survived"
    else: 
        survived_key = "died"
    survival_by_sex_and_class[class_level][sex][survived_key] += 1

survival_by_sex_and_class

In [None]:
# Initialize
fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, sharey=True)

# Draw plot and axis labels
width = 0.4  # the width of the bars
for c, ax in zip([1, 2, 3], [ax1, ax2, ax3]):
    ax.bar([0 - width/2, 1 - width/2], survival_by_sex_and_class[c]["male"].values(), label="Male", width=width)
    ax.bar([0 + width/2, 1 + width/2], survival_by_sex_and_class[c]["female"].values(), label="Female", width=width)
    ax.set_xticks([0, 1], labels=["survived", "died"])
    ax.set_title(f"Class {c}")

ax1.set_ylabel("Number of passengers")
plt.suptitle("Survival rates by sex and class")
ax3.legend(loc="center left", bbox_to_anchor=(1, 0.5))

# Display
plt.show()
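
`plt.subplots(nrows=1, ncols=3)` returns the Figure plus a NumPy array of three Axes, which the cell above unpacks into `ax1, ax2, ax3`. With `sharey=True` the panels share a single y-axis, which is why only `ax1` needs a y-label; changing the limits on one panel changes them on all of them. A small check on empty panels:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True)
print(axes.shape)            # a 1-D array holding the three Axes

axes[0].set_ylim(0, 600)     # set the limits on the first panel only...
print(axes[2].get_ylim())    # ...and the third panel reports the same limits
plt.close(fig)
```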

# Exercises

## Exercise 1

Create a simple line plot from scratch. Plot the following data points:

```python
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]
```

Add labels for the x-axis and y-axis, and add a title to the plot.

## Exercise 2

Create a bar plot from scratch. Plot the following data points:

```python
categories = ['A', 'B', 'C', 'D', 'E']
values = [5, 7, 3, 8, 4]
```

Add labels for the x-axis and y-axis, and add a title to the plot.

## Exercise 3

Re-create your bar plot from Exercise 2, but change the color of the bars, and change the font style for the titles and axis labels.

Hint: try passing `fontfamily="serif"` to `plt.title()`, `plt.xlabel()`, and `plt.ylabel()`.

## Exercise 4 

Create a figure with two vertically stacked subplots. The top subplot should be your plot from Exercise 1, and the bottom subplot should be your plot from Exercise 3.