# Class 3: Data Visualization with Matplotlib

**Objective**: Learn how to create and customize visualizations (line plots, scatter plots, histograms) using Matplotlib to explore data.

**Topics**:
- What is Matplotlib?
- Line plots with `plt.plot()`
- Scatter plots with `plt.scatter()`
- Histograms with `plt.hist()`
- Customizing plots (labels, titles, colors, legends)
- Brief intro to subplots

This notebook includes explanations, examples, and exercises to help you visualize data. We'll also advance our Iris dataset mini-project with a scatter plot. Run the code and try the exercises!

## 1. What is Matplotlib?

Matplotlib is a Python library for creating visualizations like plots and charts. It's widely used for exploring datasets and presenting findings, both key skills in AI and data science.

Let’s import Matplotlib (plus NumPy and pandas for data). Run the cell below:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Ensure plots display in the notebook
%matplotlib inline
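Matplotlib supports two drawing styles, and this notebook uses both. The `plt.*` functions draw on the "current" figure, while the subplots section uses the object-oriented style, where you create `fig, ax` explicitly and call methods on `ax`. Here is a minimal sketch that draws the same curve both ways. The variable names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi, 50)

# Style 1: pyplot functions act on the "current" figure and axes
plt.figure()
plt.plot(t, np.sin(t))
plt.title('pyplot style')
pyplot_ax = plt.gca()  # grab the axes pyplot just drew on
plt.show()

# Style 2: object-oriented, create the figure and axes explicitly
fig, ax = plt.subplots()
ax.plot(t, np.sin(t))
ax.set_title('object-oriented style')  # note: ax.set_title, not ax.title
plt.show()
```

Both produce the same picture. The object-oriented style becomes handy once a figure has more than one set of axes.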

## 2. Line Plots

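Line plots connect consecutive points with straight lines, in the order the points are given. This makes them good for showing trends, such as a function's values or a measurement over time.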

### Example 1: Plotting a Sine Wave

In [None]:
# Create data for a sine wave
x = np.linspace(0, 10, 100)  # 100 points from 0 to 10
y = np.sin(x)

# Plot
plt.plot(x, y, label='sin(x)', color='blue')
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.grid(True)
plt.show()

**Quick Check**: What does `label='sin(x)'` do? (Hint: Look at the legend!)
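To see what `label` does, the sketch below draws two curves on the same axes. Each `label` string becomes one legend entry, and a different `linestyle` keeps the curves distinguishable even in black and white. Starting with `plt.figure()` makes sure the curves go on a fresh figure.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure()
# Each label= becomes one entry in the legend
plt.plot(x, np.sin(x), label='sin(x)', color='blue')
plt.plot(x, np.cos(x), label='cos(x)', color='red', linestyle='--')
plt.title('Sine and Cosine')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()  # collects the labels from the plotted lines
plt.grid(True)

ax = plt.gca()  # keep a handle on the axes so we can inspect it
plt.show()
```

If you remove a `label=`, that curve simply disappears from the legend.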

## 3. Scatter Plots

Scatter plots draw each observation as a separate, unconnected point, which makes them useful for seeing how two variables relate.

### Example 2: Scatter Plot of Random Data

In [None]:
# Generate random data
x = np.random.rand(50)  # 50 random numbers between 0 and 1
y = np.random.rand(50)

# Scatter plot
plt.scatter(x, y, color='red', label='Random Points', alpha=0.6)
plt.title('Random Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

**Note**: `alpha=0.6` makes points slightly transparent to see overlaps.
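Transparency matters most when many points overlap. This sketch draws 500 normally distributed points with `alpha=0.3`, so dense regions look darker where faint points pile up. It uses NumPy's `default_rng` generator with a fixed seed, an assumption added so the plot looks the same on every run.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed: same "random" points every run
xs = rng.normal(size=500)
ys = rng.normal(size=500)

plt.figure()
# Faint points: where many overlap, the color builds up and looks darker
points = plt.scatter(xs, ys, color='red', alpha=0.3, label='500 points')
plt.title('Overlapping Points with alpha=0.3')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
```

Try `alpha=1.0` to compare: the dense center becomes a solid blob.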

## 4. Histograms

Histograms show the distribution of a single variable. The values are grouped into ranges (bins), and each bar's height counts how many values fall into that range.

### Example 3: Histogram of Random Data

In [None]:
# Generate random data (normal distribution)
data = np.random.randn(1000)  # 1000 samples from a standard normal distribution (mean 0, std 1)

# Histogram
plt.hist(data, bins=30, color='green', edgecolor='black', alpha=0.7)
plt.title('Histogram of Random Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

**Quick Check**: What does `bins=30` control? (Answer: the number of equal-width bins, and therefore bars, that the data range is split into.)
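`plt.hist` also returns what it drew, which makes the effect of `bins` easy to check. In this sketch, `counts` holds the height of each bar and `edges` holds the bin boundaries. 30 bins therefore give 30 counts and 31 edges, and the counts add up to the number of data points. The seeded generator is an assumption added for reproducibility.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)  # 1000 samples from a standard normal distribution

plt.figure()
# hist returns (bar heights, bin edges, bar artists)
counts, edges, patches = plt.hist(data, bins=30, color='green', edgecolor='black', alpha=0.7)
plt.title('Histogram with 30 Bins')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

print(len(counts), len(edges))  # prints: 30 31 (30 bars need 31 edges)
print(int(counts.sum()))        # prints: 1000 (every value lands in exactly one bin)
```

Fewer bins smooth the shape. More bins show finer detail but look noisier with small datasets.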

## 5. Customizing Plots

You can set labels, titles, colors, marker shapes, and legends to make plots clear and professional.

### Example 4: Customized Scatter Plot

In [None]:
# Two sets of points
x1, y1 = np.random.rand(20), np.random.rand(20)
x2, y2 = np.random.rand(20) + 1, np.random.rand(20) + 1

# Scatter plot with two groups
plt.scatter(x1, y1, color='blue', label='Group 1', marker='o')
plt.scatter(x2, y2, color='orange', label='Group 2', marker='^')
plt.title('Two Groups of Points')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()
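With more than two groups, you can loop over style settings instead of repeating the same `plt.scatter` call. The sketch below uses a hypothetical `groups` list, where each entry holds a label, color, marker, and an offset that shifts that group's random points.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical groups: (label, color, marker, offset added to the random points)
groups = [
    ('Group 1', 'blue', 'o', 0.0),
    ('Group 2', 'orange', '^', 1.0),
    ('Group 3', 'green', 's', 2.0),
]

plt.figure()
for label, color, marker, offset in groups:
    gx = rng.random(20) + offset  # 20 points in [offset, offset + 1)
    gy = rng.random(20) + offset
    plt.scatter(gx, gy, color=color, marker=marker, label=label)

plt.title('Three Groups of Points')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
ax = plt.gca()
plt.show()
```

Adding a fourth group is now a one-line change to the list.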

## 6. Subplots (Brief Intro)

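Subplots let you show multiple plots side by side in one figure. `plt.subplots()` returns the figure and its axes (one per panel). You then draw on each panel with axes methods such as `ax1.plot()` and `ax1.set_title()`.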

### Example 5: Line and Histogram in Subplots

In [None]:
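# x was overwritten with random values in Example 2, so recreate evenly spaced values
x = np.linspace(0, 10, 100)

# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))  # 1 row, 2 columns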

# Line plot on first subplot
ax1.plot(x, np.cos(x), color='purple')
ax1.set_title('Cosine Wave')
ax1.set_xlabel('x')
ax1.set_ylabel('cos(x)')

# Histogram on second subplot
ax2.hist(np.random.randn(500), bins=20, color='gray')
ax2.set_title('Random Histogram')
ax2.set_xlabel('Value')
ax2.set_ylabel('Frequency')

plt.tight_layout()  # Adjust spacing
plt.show()
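With more than one row, `plt.subplots` returns a 2-D NumPy array of axes, indexed as `axes[row, col]`. Here is a minimal sketch of a 2x2 grid, where each panel gets a different plot type from this notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # axes is a 2x2 array of Axes

axes[0, 0].plot(x, np.sin(x))                       # top left
axes[0, 0].set_title('sin(x)')

axes[0, 1].plot(x, np.cos(x), color='purple')       # top right
axes[0, 1].set_title('cos(x)')

axes[1, 0].scatter(x, np.sin(x) * np.cos(x), s=10)  # bottom left
axes[1, 0].set_title('sin(x) * cos(x)')

rng = np.random.default_rng(seed=0)
axes[1, 1].hist(rng.standard_normal(500), bins=20, color='gray')  # bottom right
axes[1, 1].set_title('Random Histogram')

plt.tight_layout()  # keep titles and labels from overlapping
plt.show()
```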

## Exercises

Now it’s your turn! Complete the exercises below to practice Matplotlib. Write your code in the provided cells and run them to see your plots.

**Exercise 1**: Create a line plot of `x` vs. `x**2` for `x` from 0 up to (but not including) 10 in steps of 0.1 (use `np.arange(0, 10, 0.1)`). Add a title and labels.

In [None]:
# Your code here



**Exercise 2**: Make a histogram of 100 random numbers from a normal distribution (`np.random.randn(100)`). Use 15 bins and a blue color.

In [None]:
# Your code here



**Exercise 3**: Create a scatter plot with two sets of 30 random points: one set in green circles, the other in red triangles. Add a legend.

In [None]:
# Your code here



**Exercise 4**: Using the Iris DataFrame (loaded below), create a histogram of `petal length (cm)`. Add labels and a title.

In [None]:
# Load Iris dataset (use your iris.csv or sklearn)
from sklearn.datasets import load_iris
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris['species'] = iris.target_names[iris.target]

# Your code here



## Mini-Project Progress

For our mini-project, we’re visualizing the Iris dataset. In Class 2, you selected petal length and width. Now, let’s create a scatter plot of `petal length (cm)` vs. `petal width (cm)`.

**Task**: Using `df_iris`, plot petal length vs. petal width. Add a title, labels, and a grid. Save the plot as `iris_scatter.png`.

In [None]:
# Example solution (try writing your own version first!)
plt.scatter(df_iris['petal length (cm)'], df_iris['petal width (cm)'], color='purple', alpha=0.6)
plt.title('Iris: Petal Length vs. Petal Width')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.grid(True)
plt.savefig('iris_scatter.png')
plt.show()
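`savefig` accepts options that are useful when sharing plots. `dpi` sets the resolution, and `bbox_inches='tight'` trims extra white space. Call it before `plt.show()`. In a notebook, `show` finishes and closes the current figure, so a `savefig` placed after it typically writes a blank image.

This sketch saves into a temporary folder, an assumption made so it doesn't overwrite your `iris_scatter.png`. The file name is hypothetical. It then checks that a PNG file was written.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

out_dir = tempfile.mkdtemp()  # throwaway folder for this demo
out_path = os.path.join(out_dir, 'demo_scatter.png')  # hypothetical file name

fig, ax = plt.subplots()
ax.scatter(np.arange(10), np.arange(10) ** 2, color='purple', alpha=0.6)
ax.set_title('Saved Figure Demo')
ax.grid(True)

# Save BEFORE plt.show(): 150 dots per inch, trimmed margins
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.show()

print(os.path.exists(out_path), os.path.getsize(out_path) > 0)  # prints: True True
```

`fig.savefig(...)` and `plt.savefig(...)` both work here. The first targets a specific figure, and the second targets the current one.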

**Think Ahead**: Could coloring points by species make this plot more informative? We’ll try that in Class 4!

**Optional Challenge**: Create a figure with two subplots, each a histogram: one for `sepal length (cm)` and one for `sepal width (cm)`. (Hint: Use `plt.subplots()`.)

In [None]:
# Optional: Try it here



## Wrap-Up

Great job! You’ve learned how to:
- Create line plots, scatter plots, and histograms with Matplotlib.
- Customize plots with titles, labels, colors, and legends.
- Use subplots to show multiple visualizations.
- Visualize Iris dataset features for our mini-project.

Save this notebook and your `iris_scatter.png`. Share your plots if asked. Next class, we'll read and write data files and finish the mini-project!