
# Matplotlib: Zero to Hero 🎯

A self-contained, hands-on notebook to learn **Matplotlib** from scratch.

**Rules for this course notebook:**
- We use **Matplotlib** only (no seaborn).
- Each chart gets its **own figure** (no subplots).
- We **do not set explicit colors**; let Matplotlib choose defaults.
- Every section includes small, runnable examples.


## 0) Setup

In [None]:

# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# For reproducibility
rng = np.random.default_rng(42)

# In recent Jupyter versions, plots render inline by default; uncomment the magic below if they don't.
# %matplotlib inline



## 1) Your First Plot

**Goal:** Create a simple line plot.

**Concepts introduced:**
- `plt.figure()` creates a new figure (canvas).
- `plt.plot(x, y)` draws a line.
- `plt.title`, `plt.xlabel`, `plt.ylabel` add a title and axis labels.
- `plt.show()` renders the figure.


In [None]:

x = np.linspace(0, 10, 50)
y = np.sin(x)

plt.figure()
plt.plot(x, y)
plt.title("Sine Wave")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()
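
`plt.plot` also accepts a short format string (for example `"o--"`: circle markers joined by a dashed line) or the equivalent `marker=` and `linestyle=` keywords. Here is a minimal sketch that changes the line style without choosing a color, which keeps to this notebook's rules. The second line is shifted down only so that both are visible.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)
y = np.sin(x)

plt.figure()
# Format string: "o" = circle markers, "--" = dashed line (no color code given)
line_fmt, = plt.plot(x, y, "o--")
# The same look spelled out with keywords, shifted down by 0.5
line_kw, = plt.plot(x, y - 0.5, marker="o", linestyle="--")
plt.title("Markers and Line Styles")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```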



## 2) Scatter Plot

**When to use:** Show the relationship between two numeric variables.

**Key function:** `plt.scatter(x, y)`


In [None]:

x = rng.normal(loc=50, scale=10, size=200)
y = 2.5 * x + rng.normal(scale=15, size=200)

plt.figure()
plt.scatter(x, y)
plt.title("Height vs. Weight (synthetic)")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.show()
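
A scatter plot often pairs well with a fitted trend line. The sketch below assumes a straight-line relationship. It uses `np.polyfit` with `deg=1`, which returns the least-squares slope and intercept (highest power first), and draws the fitted line over the same kind of synthetic data. The block reseeds its own generator, so its numbers differ from the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=200)
y = 2.5 * x + rng.normal(scale=15, size=200)

# Least-squares fit of y ≈ slope * x + intercept (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, deg=1)
x_line = np.linspace(x.min(), x.max(), 100)

plt.figure()
plt.scatter(x, y, label="data")
plt.plot(x_line, slope * x_line + intercept,
         label=f"fit: y = {slope:.2f}x + {intercept:.1f}")
plt.title("Height vs. Weight with Linear Fit (synthetic)")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.legend()
plt.show()
```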



## 3) Bar Chart

**When to use:** Compare categories or discrete groups.

**Key function:** `plt.bar(categories, values)`


In [None]:

categories = ["A", "B", "C", "D", "E"]
values = rng.integers(10, 100, size=len(categories))

plt.figure()
plt.bar(categories, values)
plt.title("Category Counts")
plt.xlabel("Category")
plt.ylabel("Count")
plt.show()
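
To print each bar's value on the chart, you can pass the container returned by `plt.bar` to `plt.bar_label` (available since Matplotlib 3.4). Here is a minimal sketch using the same kind of random counts:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
categories = ["A", "B", "C", "D", "E"]
values = rng.integers(10, 100, size=len(categories))

plt.figure()
bars = plt.bar(categories, values)   # BarContainer with one rectangle per bar
value_labels = plt.bar_label(bars)   # one Text label at the end of each bar
plt.title("Category Counts (labeled)")
plt.xlabel("Category")
plt.ylabel("Count")
plt.show()
```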



## 4) Horizontal Bar Chart

**Why:** Better for long category labels or when ordering by value.

**Key function:** `plt.barh(categories, values)`


In [None]:

labels = ["Algorithm", "Data Cleaning", "Visualization", "Modeling", "Deployment"]
hours = rng.integers(5, 40, size=len(labels))

plt.figure()
plt.barh(labels, hours)
plt.title("Project Effort by Task")
plt.xlabel("Hours")
plt.ylabel("Task")
plt.show()
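
Ordering by value takes one extra step. `plt.barh` draws the first category at the bottom, so sorting in ascending order puts the largest bar on top. A minimal sketch with `np.argsort`:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
labels = ["Algorithm", "Data Cleaning", "Visualization", "Modeling", "Deployment"]
hours = rng.integers(5, 40, size=len(labels))

# Indices that sort hours ascending; reuse them to reorder the labels too
order = np.argsort(hours)
sorted_labels = [labels[i] for i in order]
sorted_hours = hours[order]

plt.figure()
plt.barh(sorted_labels, sorted_hours)   # largest value ends up at the top
plt.title("Project Effort by Task (sorted)")
plt.xlabel("Hours")
plt.ylabel("Task")
plt.show()
```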



## 5) Histogram

**When to use:** Visualize the **distribution** of a numeric variable.

**Key function:** `plt.hist(data, bins=...)`


In [None]:

data = rng.normal(loc=70, scale=10, size=1000)

plt.figure()
plt.hist(data, bins=30)
plt.title("Histogram of Test Scores")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.show()
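
`plt.hist` also returns the bar heights, the bin edges, and the drawn patches, so you can inspect the numbers behind the picture. With `density=True`, the heights are rescaled so that the total bar area (height × bin width) is 1. This puts histograms of different sample sizes on a common scale. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(loc=70, scale=10, size=1000)

plt.figure()
# Returns (bar heights, bin edges, patches); 30 bins means 31 edges
density, edges, _ = plt.hist(data, bins=30, density=True)
plt.title("Histogram of Test Scores (density)")
plt.xlabel("Score")
plt.ylabel("Density")
plt.show()

# Sum of height * width over all bins; should be 1 up to rounding
print(np.sum(density * np.diff(edges)))
```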



## 6) Empirical CDF (Alternative to Density)

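We skip seaborn's KDE plots here. A simple alternative that needs no smoothing is the **empirical CDF** (ECDF): for each value, the fraction of observations less than or equal to it.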


In [None]:

data = np.sort(rng.normal(size=500))
ecdf = np.arange(1, len(data)+1) / len(data)

plt.figure()
plt.plot(data, ecdf)
plt.title("Empirical CDF")
plt.xlabel("Value")
plt.ylabel("ECDF")
plt.show()
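
Formally, the ECDF of $n$ observations is $\hat{F}(t) = \frac{1}{n}\,\#\{i : x_i \le t\}$. It is a staircase that jumps by $1/n$ at each data point. `plt.plot` connects the points with sloped segments. `plt.step(..., where="post")` draws the exact right-continuous staircase instead. A minimal sketch that also evaluates $\hat{F}(0)$ directly:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = np.sort(rng.normal(size=500))
n = len(data)
ecdf = np.arange(1, n + 1) / n   # fraction of points <= each sorted value

plt.figure()
# where="post" holds each level until the next data point
plt.step(data, ecdf, where="post")
plt.title("Empirical CDF (step)")
plt.xlabel("Value")
plt.ylabel("ECDF")
plt.show()

# F_hat(0): share of observations at or below zero (near 0.5 for a standard normal)
fraction_at_or_below_zero = np.mean(data <= 0.0)
print(fraction_at_or_below_zero)
```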



## 7) Box Plot

**When to use:** Summarize a distribution (median, quartiles, whiskers, outliers), especially when comparing several groups side by side.

**Key function:** `plt.boxplot(data)`


In [None]:

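samples = [rng.normal(loc=m, scale=1.2, size=200) for m in [0, 1, 2, 1.5]]
group_names = ["Group A", "Group B", "Group C", "Group D"]
plt.figure()
plt.boxplot(samples)
# Boxes sit at x = 1..n, so label them with xticks. This works on any Matplotlib
# version (boxplot's `labels=` was deprecated in favor of `tick_labels=` in 3.9).
plt.xticks(range(1, len(group_names) + 1), group_names)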
plt.title("Box Plots by Group")
plt.xlabel("Group")
plt.ylabel("Value")
plt.show()
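
Each part of a box can be computed directly. By default, `plt.boxplot` draws the box from Q1 to Q3 with a line at the median. Its whiskers reach the most extreme points within 1.5 × IQR of the box (`whis=1.5`), and anything beyond is drawn as an outlier. The sketch below computes these statistics with NumPy for one synthetic group:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=0, scale=1.2, size=200)

# Box: Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR; whiskers stop at the most extreme points inside them
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = sample[(sample >= lower_fence) & (sample <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
n_outliers = int(np.sum((sample < lower_fence) | (sample > upper_fence)))

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}; outliers: {n_outliers}")
```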



## 8) Violin Plot (Optional)

**When to use:** Visualize the full shape of each distribution, not just its summary statistics.

**Key function:** `plt.violinplot(dataset)`


In [None]:

dataset = [rng.normal(loc=mu, scale=0.7, size=300) for mu in [0.0, 1.0, 2.0]]
plt.figure()
plt.violinplot(dataset, showmeans=True, showextrema=True, showmedians=True)
plt.title("Violin Plot Example")
plt.xlabel("Group Index")
plt.ylabel("Value")
plt.show()



## 9) Multiple Lines (One Chart per Figure)

We avoid subplots, so each chart gets its own figure.


In [None]:

x = np.linspace(0, 2*np.pi, 200)

plt.figure()
plt.plot(x, np.sin(x))
plt.title("sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

plt.figure()
plt.plot(x, np.cos(x))
plt.title("cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
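
When many charts share the same layout, a loop avoids copy-pasting each figure. Here is a minimal sketch that maps chart titles to NumPy functions. The dictionary `functions` is just an illustrative structure. The loop still creates a fresh figure per chart.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
# Map each chart title to the function it plots
functions = {"sin(x)": np.sin, "cos(x)": np.cos}

titles = []
for name, func in functions.items():
    plt.figure()               # a new figure for every chart
    plt.plot(x, func(x))
    plt.title(name)
    plt.xlabel("x")
    plt.ylabel("y")
    titles.append(plt.gca().get_title())
    plt.show()
```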



## 10) Legends, Labels, and Titles

**Key functions:**
- `plt.title("...")`
- `plt.xlabel("...")`, `plt.ylabel("...")`
- `plt.legend()`, called after passing `label=...` to your plotting calls.


In [None]:

x = np.linspace(0, 10, 100)
y1 = np.log1p(x)
y2 = np.sqrt(x)

plt.figure()
plt.plot(x, y1, label="log(1+x)")
plt.plot(x, y2, label="sqrt(x)")
plt.title("Two Functions")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
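
`plt.legend` also takes placement and heading options. `loc` pins the legend to a position (the default, `"best"`, searches for a spot that covers the least data), and `title` adds a heading above the entries. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

plt.figure()
plt.plot(x, np.log1p(x), label="log(1+x)")
plt.plot(x, np.sqrt(x), label="sqrt(x)")
plt.title("Two Functions (legend options)")
plt.xlabel("x")
plt.ylabel("y")
# Fixed corner plus a heading for the legend box
leg = plt.legend(loc="upper left", title="Functions")
plt.show()

legend_texts = [t.get_text() for t in leg.get_texts()]
print(legend_texts)
```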



## 11) Ticks and Tick Labels

**Why:** Improve readability and focus.

**Key functions:** `plt.xticks(...)`, `plt.yticks(...)`


In [None]:

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

plt.figure()
plt.plot(x, y)
plt.title("Custom Ticks")
plt.xlabel("x (radians)")
plt.ylabel("sin(x)")
plt.xticks([0, np.pi/2, np.pi, 3*np.pi/2, 2*np.pi],
           ["0", "π/2", "π", "3π/2", "2π"])
plt.yticks([-1, 0, 1])
plt.show()
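
Listing tick positions by hand works for a few ticks. Matplotlib's `ticker` module can generate them instead: `MultipleLocator` places a tick at every multiple of a step, and `FuncFormatter` turns each value into a label. The helper `pi_formatter` below is a hypothetical example written for this sketch and assumes non-negative tick values:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def pi_formatter(value, pos):
    """Label a (non-negative) tick value as a multiple of π/2."""
    n = int(np.round(value / (np.pi / 2)))  # number of half-π steps from 0
    if n == 0:
        return "0"
    if n % 2 == 0:                           # whole multiples of π
        k = n // 2
        return "π" if k == 1 else f"{k}π"
    return "π/2" if n == 1 else f"{n}π/2"


x = np.linspace(0, 2 * np.pi, 100)

plt.figure()
plt.plot(x, np.sin(x))
plt.title("Ticks from a Locator and Formatter")
plt.xlabel("x (radians)")
plt.ylabel("sin(x)")
ax = plt.gca()
ax.xaxis.set_major_locator(MultipleLocator(np.pi / 2))    # a tick every π/2
ax.xaxis.set_major_formatter(FuncFormatter(pi_formatter))  # labels in units of π
plt.show()
```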



## 12) Grid Lines

**Key function:** `plt.grid(True)`


In [None]:

x = np.linspace(0, 10, 100)
y = x**2

plt.figure()
plt.plot(x, y)
plt.title("With Grid")
plt.xlabel("x")
plt.ylabel("x^2")
plt.grid(True)
plt.show()
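
`plt.grid` accepts `which=` (`"major"`, `"minor"`, or `"both"`) and line properties such as `linestyle` and `alpha`. Minor grid lines only appear once minor ticks are turned on. A minimal sketch that styles the two levels differently without setting any color:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = x**2

plt.figure()
plt.plot(x, y)
plt.title("Major and Minor Grid")
plt.xlabel("x")
plt.ylabel("x^2")
plt.minorticks_on()                                       # add unlabeled minor ticks
plt.grid(True, which="major", linestyle="-")             # solid lines at major ticks
plt.grid(True, which="minor", linestyle=":", alpha=0.5)  # faint dotted minor lines
minor_x = plt.gca().xaxis.get_minorticklocs()
plt.show()
```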



## 13) Annotations and Text

**Key functions:** `plt.text(x, y, "msg")`, `plt.annotate("...", xy=(...), xytext=(...))`


In [None]:

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.figure()
plt.plot(x, y)
plt.title("Annotations")
plt.xlabel("x")
plt.ylabel("sin(x)")

# Add text and arrow
plt.text(2, 0.9, "Peak ahead")
plt.annotate("Local max", xy=(np.pi/2, 1.0),
             xytext=(2.5, 0.5),
             arrowprops=dict(arrowstyle="->"))
plt.show()
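
Hard-coded coordinates break when the data changes, so it is often safer to compute where the annotation goes. Here is a minimal sketch using `np.argmax`. It returns the index of the largest sampled value (the first one in case of ties), so on a coarse grid the arrow points at the highest sample, not necessarily the exact analytic peak.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Locate the highest sampled point instead of typing its coordinates
i_max = np.argmax(y)
x_peak, y_peak = x[i_max], y[i_max]

plt.figure()
plt.plot(x, y)
plt.title("Annotating a Computed Maximum")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.annotate(f"max ≈ {y_peak:.3f} at x ≈ {x_peak:.2f}",
             xy=(x_peak, y_peak),
             xytext=(x_peak - 4, 0.5),   # place the label left of the peak
             arrowprops=dict(arrowstyle="->"))
plt.ylim(-1.2, 1.3)   # leave headroom above the curve
plt.show()
```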



## 14) Error Bars

**When to use:** Show measurement uncertainty.

**Key function:** `plt.errorbar(x, y, yerr=...)`


In [None]:

x = np.arange(1, 11)
y = 2 * x + rng.normal(scale=1.0, size=len(x))
y_err = rng.uniform(0.2, 1.0, size=len(x))

plt.figure()
plt.errorbar(x, y, yerr=y_err, fmt="o-")
plt.title("Error Bars")
plt.xlabel("Measurement #")
plt.ylabel("Value")
plt.show()
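
Uncertainty is not always symmetric. `yerr` also accepts a 2 × N array: the first row is the distance below each point and the second row the distance above, and all values must be non-negative. `capsize` adds caps at the whisker ends (length in points). A minimal sketch with made-up error sizes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.arange(1, 11)
y = 2 * x + rng.normal(scale=1.0, size=len(x))

# Row 0 = distance below each point, row 1 = distance above (both >= 0)
lower = rng.uniform(0.2, 0.6, size=len(x))
upper = rng.uniform(0.5, 1.5, size=len(x))
asym_err = np.vstack([lower, upper])

plt.figure()
plt.errorbar(x, y, yerr=asym_err, fmt="o", capsize=3)
plt.title("Asymmetric Error Bars")
plt.xlabel("Measurement #")
plt.ylabel("Value")
plt.show()
```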



## 15) Images and Heatmaps

**Key function:** `plt.imshow(array)`, with `plt.colorbar()` to show how values map to colors.


In [None]:

# Create a synthetic 2D field
grid = rng.normal(size=(30, 30))

plt.figure()
im = plt.imshow(grid, origin="lower")
plt.title("Heatmap via imshow")
plt.xlabel("X index")
plt.ylabel("Y index")
plt.colorbar(im, label="Intensity")
plt.show()
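
By default, `imshow` labels the axes with array indices. When the array samples a function on a grid, `extent=(left, right, bottom, top)` labels the axes in data units instead. Note that `extent` sets the outer edges of the image, so pixel centers sit half a cell inside. A minimal sketch with a Gaussian bump:

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate a smooth function on a regular (x, y) grid
xs = np.linspace(-3, 3, 60)
ys = np.linspace(-2, 2, 40)
X, Y = np.meshgrid(xs, ys)        # shape (40, 60): rows follow y, columns follow x
Z = np.exp(-(X**2 + Y**2) / 2)

plt.figure()
# origin="lower" puts row 0 at the bottom; extent maps the image to data coordinates
im = plt.imshow(Z, origin="lower", extent=(xs[0], xs[-1], ys[0], ys[-1]))
plt.title("Heatmap with Data Coordinates")
plt.xlabel("x")
plt.ylabel("y")
plt.colorbar(im, label="exp(-(x² + y²)/2)")
plt.show()
```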



## 16) Scatter with a Colorbar (Colormap)

**Why:** Map a third variable to color and add a colorbar legend.


In [None]:

x = rng.uniform(-3, 3, size=300)
y = rng.uniform(-3, 3, size=300)
z = x**2 + y**2

plt.figure()
sc = plt.scatter(x, y, c=z)
plt.title("Scatter with Colorbar")
plt.xlabel("x")
plt.ylabel("y")
plt.colorbar(sc, label="x^2 + y^2")
plt.show()



## 17) Time Series

**Tip:** Use pandas for date handling; Matplotlib plots the result.


In [None]:

dates = pd.date_range("2025-01-01", periods=60, freq="D")
values = np.cumsum(rng.normal(scale=1.0, size=len(dates))) + 10

ts = pd.Series(values, index=dates)

plt.figure()
plt.plot(ts.index, ts.values)
plt.title("Time Series Example")
plt.xlabel("Date")
plt.ylabel("Value")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
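
Two common refinements for time series are a rolling mean to show the trend and explicit date ticks. The sketch below uses pandas' `rolling(window=7).mean()`, whose first six values are NaN because there is not yet a full week of history. It also uses `matplotlib.dates` to place a tick every Monday with a compact label. The 7-day window and Monday ticks are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

rng = np.random.default_rng(42)
dates = pd.date_range("2025-01-01", periods=60, freq="D")
ts = pd.Series(np.cumsum(rng.normal(scale=1.0, size=len(dates))) + 10, index=dates)

# 7-day rolling mean; NaN until a full window is available
ts_smooth = ts.rolling(window=7).mean()

plt.figure()
plt.plot(ts.index, ts.values, label="daily")
plt.plot(ts_smooth.index, ts_smooth.values, label="7-day mean")  # NaNs are skipped
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))  # every Monday
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))              # month + day
plt.gcf().autofmt_xdate()   # rotate and right-align the date labels
plt.title("Time Series with Rolling Mean")
plt.xlabel("Date")
plt.ylabel("Value")
plt.legend()
plt.show()
```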



## 18) Stylesheets

Matplotlib includes built-in styles. We won't force a specific color, but you can try different looks:

**Usage:** `plt.style.use("ggplot")` or `plt.style.use("default")`


In [None]:

# Try a style, then reset to default
plt.style.use("ggplot")

x = np.linspace(0, 10, 200)
y = np.sinc(x - 5)

plt.figure()
plt.plot(x, y)
plt.title("Stylesheet Demo (ggplot)")
plt.xlabel("x")
plt.ylabel("sinc(x-5)")
plt.show()

# Reset
plt.style.use("default")
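
`plt.style.use` changes settings for the rest of the session, which is why the cell above resets to `"default"` afterwards. `plt.style.context` applies a style only inside a `with` block and restores the previous settings on exit. `plt.style.available` lists the built-in style names. A minimal sketch that checks the global settings are left untouched:

```python
import numpy as np
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)   # names of the built-in styles
before = plt.rcParams["axes.facecolor"]

# The style applies only inside the with-block
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    x = np.linspace(0, 10, 200)
    plt.figure()
    plt.plot(x, np.sinc(x - 5))
    plt.title("Stylesheet Demo (context manager)")
    plt.xlabel("x")
    plt.ylabel("sinc(x-5)")
    plt.show()

after = plt.rcParams["axes.facecolor"]   # back to the previous value
```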



## 19) Saving Figures

**Key function:** `plt.savefig("filename.png", dpi=300, bbox_inches="tight")`

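> Call `savefig` **before** `plt.show()`. Once a figure has been shown it is usually closed (in scripts when you close the window, in notebooks after the cell's output is displayed), so a later `plt.savefig` call saves a new, empty figure instead.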


In [None]:

x = np.linspace(0, 4*np.pi, 400)
y = np.sin(x) / (1 + 0.1*x)

plt.figure()
plt.plot(x, y)
plt.title("Save Me")
plt.xlabel("x")
plt.ylabel("sin(x)/(1+0.1x)")
# Save to the current working directory (adjust the path as needed)
plt.savefig("mpl_saved_example.png", dpi=300, bbox_inches="tight")
plt.show()

print("Saved figure to mpl_saved_example.png")
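
The file extension selects the output format. PNG is a raster format, so `dpi` sets its resolution, while SVG and PDF are vector formats that stay sharp at any zoom. The sketch below saves one figure in all three formats into a folder and then closes it to free memory. The folder name `figures` and the file stem `save_me` are hypothetical choices for this example.

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

out_dir = Path("figures")                    # hypothetical output folder
out_dir.mkdir(parents=True, exist_ok=True)

x = np.linspace(0, 4 * np.pi, 400)
y = np.sin(x) / (1 + 0.1 * x)

plt.figure()
plt.plot(x, y)
plt.title("Save Me")
plt.xlabel("x")
plt.ylabel("sin(x)/(1+0.1x)")

# Same figure, three formats: the extension picks the file type
for ext in ("png", "svg", "pdf"):
    plt.savefig(out_dir / f"save_me.{ext}", dpi=300, bbox_inches="tight")
plt.close()   # release the figure once it is saved

saved = sorted(p.name for p in out_dir.glob("save_me.*"))
print(saved)
```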



## 20) Common Pitfalls & Best Practices

- Always create a **new figure** for each plot with `plt.figure()`.
- Label axes and add a title—your future self will thank you.
- Use `plt.tight_layout()` when labels overlap.
- Avoid setting colors unless you must; the default color cycle gives each line or group its own color and keeps charts visually consistent.
- For reproducibility in random plots, set a random seed.
- Prefer vectorized operations (NumPy) over Python loops for speed.



## 21) Mini Project: From CSV to Insight

**Goal:** Load a CSV, clean it, visualize key relationships, and save at least one figure.

Steps:
1. Load a CSV (you can replace with your own file).
2. Inspect with `head`, `describe`, `info`, `isna`.
3. Pick 1–2 interesting relationships to visualize (scatter, bar, hist).
4. Add titles, labels, and legends where appropriate.
5. Save at least one plot to disk.


In [None]:

from io import StringIO

csv = StringIO("""
city,month,temp,visitors
Alpha,Jan,5,120
Alpha,Feb,8,160
Alpha,Mar,12,220
Beta,Jan,-2,80
Beta,Feb,1,110
Beta,Mar,6,150
Gamma,Jan,10,200
Gamma,Feb,12,230
Gamma,Mar,15,300
""")
df = pd.read_csv(csv)

# Inspect
print(df.head())
df.info()                 # column dtypes and non-null counts (prints directly)
print(df.describe())      # summarizes numeric columns by default
print(df.isna().sum())

# Plot: temperature vs visitors
plt.figure()
plt.scatter(df["temp"], df["visitors"])
plt.title("Visitors vs Temperature")
plt.xlabel("Temperature (°C)")
plt.ylabel("Visitors")
plt.show()

# Save a histogram of visitors
plt.figure()
plt.hist(df["visitors"], bins=5)
plt.title("Distribution of Visitors")
plt.xlabel("Visitors")
plt.ylabel("Frequency")
plt.savefig("visitors_hist.png", dpi=300, bbox_inches="tight")  # saved in the working directory
plt.show()

print("Saved histogram to visitors_hist.png")
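
A natural next step is to break the data down by group. This minimal sketch reuses the same inline CSV. Calling `plt.scatter` once per city (via `groupby`) gives each city its own default color and legend entry, and a second figure compares average visitors per city.

```python
from io import StringIO

import pandas as pd
import matplotlib.pyplot as plt

csv = StringIO("""city,month,temp,visitors
Alpha,Jan,5,120
Alpha,Feb,8,160
Alpha,Mar,12,220
Beta,Jan,-2,80
Beta,Feb,1,110
Beta,Mar,6,150
Gamma,Jan,10,200
Gamma,Feb,12,230
Gamma,Mar,15,300
""")
df = pd.read_csv(csv)

# One scatter call per city: separate default colors and legend entries
plt.figure()
for city, group in df.groupby("city"):
    plt.scatter(group["temp"], group["visitors"], label=city)
plt.title("Visitors vs Temperature by City")
plt.xlabel("Temperature (°C)")
plt.ylabel("Visitors")
plt.legend(title="City")
plt.show()

# Average monthly visitors per city, in its own figure
mean_visitors = df.groupby("city")["visitors"].mean()
plt.figure()
plt.bar(mean_visitors.index, mean_visitors.values)
plt.title("Average Monthly Visitors by City")
plt.xlabel("City")
plt.ylabel("Mean visitors")
plt.show()

print(mean_visitors.round(2))
```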



## 22) Challenge Exercises

1. Create a scatter plot where marker size depends on a third variable (e.g., income) and add a colorbar for a fourth variable (e.g., population density).
2. Build a line chart from a noisy signal and add annotations for peaks.
3. Load a real CSV from a project/course and produce at least three distinct plots (scatter, bar/hist, time series). Save all figures.
4. Recreate one of your favorite plots using only Matplotlib (no seaborn).
