# Matplotlib Plot Types

Beyond line and scatter plots, Matplotlib offers many chart types for different data visualization needs. This notebook covers four of the most commonly used: bar charts, histograms, pie charts, and box plots.

## Learning Objectives

By the end of this notebook, you will be able to:

1. Create bar charts (vertical and horizontal)
2. Create histograms to visualize distributions
3. Create pie charts for proportional data
4. Create box plots for statistical summaries
5. Choose the right plot type for your data

In [None]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

---

## 1. Bar Charts

Bar charts compare a numeric value across discrete categories. Use them when one axis is categorical (names, groups, time periods) rather than continuous.

In [None]:
# Basic vertical bar chart
categories = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
values = [35, 28, 22, 15, 10]

fig, ax = plt.subplots(figsize=(8, 5))

ax.bar(categories, values)

ax.set_title('Programming Language Popularity')
ax.set_xlabel('Language')
ax.set_ylabel('Percentage')

plt.show()

In [None]:
# Customizing bar colors
colors = ['#3776ab', '#f7df1e', '#b07219', '#00599C', '#00ADD8']

fig, ax = plt.subplots(figsize=(8, 5))

bars = ax.bar(categories, values, color=colors, edgecolor='black', linewidth=1.2)

ax.set_title('Programming Language Popularity')
ax.set_xlabel('Language')
ax.set_ylabel('Percentage')

plt.show()

In [None]:
# Adding value labels on bars
fig, ax = plt.subplots(figsize=(8, 5))

bars = ax.bar(categories, values, color='steelblue')

# Add value labels on top of bars
for bar, value in zip(bars, values):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2, height + 0.5,
            f'{value}%', ha='center', va='bottom', fontsize=10)

ax.set_title('Programming Language Popularity')
ax.set_ylabel('Percentage')
ax.set_ylim(0, 42)  # Make room for labels

plt.show()

In [None]:
# Horizontal bar chart (good for long category names)
fig, ax = plt.subplots(figsize=(8, 5))

ax.barh(categories, values, color='coral')

ax.set_title('Programming Language Popularity')
ax.set_xlabel('Percentage')

plt.show()

In [None]:
# Grouped bar chart (comparing multiple series)
categories = ['Q1', 'Q2', 'Q3', 'Q4']
product_a = [20, 35, 30, 35]
product_b = [25, 32, 34, 20]
product_c = [15, 25, 28, 30]

x = np.arange(len(categories))  # Label locations
width = 0.25  # Width of bars

fig, ax = plt.subplots(figsize=(10, 6))

bars1 = ax.bar(x - width, product_a, width, label='Product A', color='steelblue')
bars2 = ax.bar(x, product_b, width, label='Product B', color='coral')
bars3 = ax.bar(x + width, product_c, width, label='Product C', color='seagreen')

ax.set_xlabel('Quarter')
ax.set_ylabel('Sales (thousands)')
ax.set_title('Quarterly Sales by Product')
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()

plt.show()
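
The `x - width`, `x`, `x + width` positions in the grouped chart above generalize to any number of series. The following is a minimal sketch rather than part of the original notebook: `n_series` and `total_width` are illustrative names, and it assumes the cluster of bars should be centered on each tick.

```python
import numpy as np

n_groups = 4        # e.g. four quarters
n_series = 3        # e.g. three products
total_width = 0.75  # assumed share of each tick slot covered by bars
width = total_width / n_series

x = np.arange(n_groups)  # tick locations, one per group

# Offsets that center the cluster of bars on each tick
offsets = (np.arange(n_series) - (n_series - 1) / 2) * width

for i, offset in enumerate(offsets):
    # Pass x + offset as the first argument to ax.bar() for series i
    print(f'series {i}: bar centers at {x + offset}')
```

With three series and a bar width of 0.25 this reproduces the offsets used in the cell above.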

In [None]:
# Stacked bar chart
fig, ax = plt.subplots(figsize=(10, 6))

ax.bar(categories, product_a, label='Product A', color='steelblue')
ax.bar(categories, product_b, bottom=product_a, label='Product B', color='coral')
ax.bar(categories, product_c, bottom=np.array(product_a) + np.array(product_b), 
       label='Product C', color='seagreen')

ax.set_xlabel('Quarter')
ax.set_ylabel('Total Sales (thousands)')
ax.set_title('Quarterly Sales (Stacked)')
ax.legend()

plt.show()
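
In the stacked chart above, each layer's `bottom` is the running total of the layers beneath it. As a rough sketch using the same three product lists, `np.cumsum` can compute every `bottom` at once, which may be convenient when there are more than a few layers. The names `series`, `tops`, and `bottoms` are introduced here for illustration.

```python
import numpy as np

product_a = [20, 35, 30, 35]
product_b = [25, 32, 34, 20]
product_c = [15, 25, 28, 30]

# One row per layer, one column per quarter
series = np.array([product_a, product_b, product_c])

tops = np.cumsum(series, axis=0)  # running total up to and including each layer
bottoms = tops - series           # running total of the layers below

for name, bottom in zip(['A', 'B', 'C'], bottoms):
    # Pass this array as bottom= when drawing the corresponding layer
    print(f'Product {name} starts at {bottom}')
```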

---

## 2. Histograms

Histograms show the distribution of continuous data by grouping values into bins.
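
To make "grouping values into bins" concrete, here is a small sketch that uses `np.histogram` to compute the bin edges and counts without drawing anything. The `values` array is a hypothetical hand-picked sample, not data from this notebook.

```python
import numpy as np

# Hypothetical sample, chosen so the counts are easy to check by hand
values = np.array([1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])

# Split the range from min to max into 4 equal-width bins
counts, edges = np.histogram(values, bins=4)

print('bin edges:', edges)   # 4 bins need 5 edges
print('counts:   ', counts)  # number of values that fall in each bin
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.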

In [None]:
# Generate random data
np.random.seed(42)
data = np.random.randn(1000)  # 1000 samples from standard normal

fig, ax = plt.subplots(figsize=(8, 5))

ax.hist(data, bins=30)

ax.set_title('Distribution of Random Data')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')

plt.show()

In [None]:
# Customizing histogram appearance
fig, ax = plt.subplots(figsize=(8, 5))

ax.hist(data, bins=30, color='skyblue', edgecolor='navy', alpha=0.7)

ax.set_title('Distribution of Random Data')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')

plt.show()

In [None]:
# Density histogram (normalized)
fig, ax = plt.subplots(figsize=(8, 5))

ax.hist(data, bins=30, density=True, color='lightgreen', edgecolor='darkgreen')

ax.set_title('Probability Density')
ax.set_xlabel('Value')
ax.set_ylabel('Density')

plt.show()
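
With `density=True`, bar heights are rescaled so that the total area of the bars equals 1, instead of the heights summing to the number of samples. The sketch below checks this numerically with `np.histogram`; `manual_density` is a name introduced here for illustration.

```python
import numpy as np

np.random.seed(42)
sample = np.random.randn(1000)

counts, edges = np.histogram(sample, bins=30)
density, _ = np.histogram(sample, bins=30, density=True)

widths = np.diff(edges)  # width of each bin

# Density = count / (total count * bin width)
manual_density = counts / (counts.sum() * widths)

print('matches density=True:', np.allclose(density, manual_density))
print('total area:', np.sum(density * widths))
```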

In [None]:
# Comparing multiple distributions
np.random.seed(42)
data1 = np.random.normal(0, 1, 1000)   # Mean 0, std 1
data2 = np.random.normal(2, 1.5, 1000)  # Mean 2, std 1.5

fig, ax = plt.subplots(figsize=(10, 5))

ax.hist(data1, bins=30, alpha=0.5, label='Distribution 1 (mean=0)')
ax.hist(data2, bins=30, alpha=0.5, label='Distribution 2 (mean=2)')

ax.set_title('Comparing Two Distributions')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()

plt.show()

In [None]:
# Different histogram types
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Default (bar)
axes[0].hist(data, bins=30, histtype='bar')
axes[0].set_title('histtype="bar"')

# Step
axes[1].hist(data, bins=30, histtype='step', linewidth=2)
axes[1].set_title('histtype="step"')

# Stepfilled
axes[2].hist(data, bins=30, histtype='stepfilled', alpha=0.5)
axes[2].set_title('histtype="stepfilled"')

plt.tight_layout()
plt.show()

In [None]:
# Cumulative histogram
fig, ax = plt.subplots(figsize=(8, 5))

ax.hist(data, bins=50, cumulative=True, color='purple', alpha=0.7)

ax.set_title('Cumulative Distribution')
ax.set_xlabel('Value')
ax.set_ylabel('Cumulative Frequency')

plt.show()

---

## 3. Pie Charts

Pie charts show proportions of a whole. Use them sparingly: bar charts are often clearer, because lengths are easier to compare than angles, especially when slices are similar in size.
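
By default, `ax.pie()` sizes each wedge as its value divided by the sum of all values, so the inputs do not need to add up to 100. The following plain-Python sketch shows that arithmetic; the `votes` dictionary is invented for illustration.

```python
# Hypothetical raw counts that do not sum to 100
votes = {'Tabs': 3, 'Spaces': 5, 'Both': 2}

total = sum(votes.values())

# Fraction of the whole, and the matching wedge angle in degrees
fractions = {name: count / total for name, count in votes.items()}
angles = {name: fraction * 360 for name, fraction in fractions.items()}

for name in votes:
    print(f'{name}: {fractions[name]:.1%} of the pie, {angles[name]:.0f} degrees')
```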

In [None]:
# Basic pie chart
sizes = [35, 25, 20, 15, 5]
labels = ['Python', 'JavaScript', 'Java', 'C++', 'Other']

fig, ax = plt.subplots(figsize=(8, 8))

ax.pie(sizes, labels=labels)
ax.set_title('Programming Language Usage')

plt.show()

In [None]:
# Pie chart with percentages
fig, ax = plt.subplots(figsize=(8, 8))

ax.pie(sizes, labels=labels, autopct='%1.1f%%')
ax.set_title('Programming Language Usage')

plt.show()

In [None]:
# Customized pie chart
colors = ['#3776ab', '#f7df1e', '#b07219', '#00599C', 'gray']  # Named colors take no '#'
explode = (0.05, 0, 0, 0, 0)  # Explode first slice

fig, ax = plt.subplots(figsize=(8, 8))

wedges, texts, autotexts = ax.pie(
    sizes, 
    labels=labels, 
    autopct='%1.1f%%',
    colors=colors,
    explode=explode,
    shadow=True,
    startangle=90  # Start from top
)

# Make percentage text bold
for autotext in autotexts:
    autotext.set_fontweight('bold')

ax.set_title('Programming Language Usage')

plt.show()

In [None]:
# Donut chart (pie with hole)
fig, ax = plt.subplots(figsize=(8, 8))

wedges, texts, autotexts = ax.pie(
    sizes, 
    labels=labels, 
    autopct='%1.1f%%',
    pctdistance=0.75,
    wedgeprops=dict(width=0.5)  # This creates the hole
)

# Add center text
ax.text(0, 0, 'Languages', ha='center', va='center', fontsize=14, fontweight='bold')

ax.set_title('Programming Language Usage')

plt.show()

---

## 4. Box Plots

Box plots (box-and-whisker plots) show the statistical summary of data: median, quartiles, and outliers.

In [None]:
# Basic box plot
np.random.seed(42)
data = np.random.randn(100)

fig, ax = plt.subplots(figsize=(8, 5))

ax.boxplot(data)

ax.set_title('Box Plot')
ax.set_ylabel('Value')

plt.show()

**Box Plot Components:**
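- **Box**: Spans the interquartile range (IQR), from the first quartile (Q1) to the third quartile (Q3), covering the middle 50% of the data
- **Line inside box**: Median (50th percentile)
- **Whiskers**: Extend to the most extreme data points that lie within 1.5 * IQR of the box edges (the default, controlled by the `whis` parameter)
- **Outliers**: Points beyond the whiskers, drawn as individual markers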
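
The components listed above can be computed by hand. This is a minimal sketch assuming the default 1.5 * IQR rule and NumPy's default percentile interpolation; `sample` is a hypothetical array with one obvious outlier, and `lower_fence` / `upper_fence` are names introduced here.

```python
import numpy as np

# Hypothetical small sample with one obvious outlier (40)
sample = np.array([2, 4, 5, 6, 7, 8, 9, 10, 12, 40])

q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = sample[(sample >= lower_fence) & (sample <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(f'Q1={q1}, median={median}, Q3={q3}, IQR={iqr}')
print(f'whiskers: {whisker_low} to {whisker_high}')
print(f'outliers: {outliers}')
```

Note that the whiskers end at actual data values, not at the fences themselves.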

In [None]:
# Multiple box plots (comparing groups)
np.random.seed(42)
group_a = np.random.normal(50, 10, 100)
group_b = np.random.normal(55, 15, 100)
group_c = np.random.normal(45, 8, 100)

fig, ax = plt.subplots(figsize=(8, 5))

# On Matplotlib 3.9+, prefer tick_labels=...; labels= is deprecated there
ax.boxplot([group_a, group_b, group_c], labels=['Group A', 'Group B', 'Group C'])

ax.set_title('Comparison of Three Groups')
ax.set_ylabel('Value')

plt.show()

In [None]:
# Horizontal box plot
fig, ax = plt.subplots(figsize=(8, 5))

ax.boxplot([group_a, group_b, group_c], 
           labels=['Group A', 'Group B', 'Group C'],
           vert=False)  # Horizontal

ax.set_title('Horizontal Box Plot')
ax.set_xlabel('Value')

plt.show()

In [None]:
# Customized box plot
fig, ax = plt.subplots(figsize=(10, 6))

bp = ax.boxplot(
    [group_a, group_b, group_c],
    labels=['Group A', 'Group B', 'Group C'],
    patch_artist=True,  # Fill boxes with color
    notch=True,  # Add notch at median
    showmeans=True,  # Show mean marker
    meanline=True  # Show mean as line
)

# Color each box differently
colors = ['lightblue', 'lightgreen', 'lightyellow']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_title('Customized Box Plot')
ax.set_ylabel('Value')
ax.grid(True, alpha=0.3)

plt.show()

In [None]:
# Box plot with individual points
fig, ax = plt.subplots(figsize=(10, 6))

bp = ax.boxplot(
    [group_a, group_b, group_c],
    labels=['Group A', 'Group B', 'Group C'],
    patch_artist=True
)

# Add jittered points (loop variable named `group` to avoid overwriting `data`)
for i, group in enumerate([group_a, group_b, group_c], 1):
    x = np.random.normal(i, 0.04, len(group))  # Small horizontal jitter around box position i
    ax.scatter(x, group, alpha=0.3, s=10, color='gray')

ax.set_title('Box Plot with Individual Data Points')
ax.set_ylabel('Value')

plt.show()

---

## 5. Choosing the Right Plot Type

| Data Type | Best Plot |
|-----------|----------|
| Categorical comparison | Bar chart |
| Distribution of continuous data | Histogram |
| Parts of a whole | Pie/Donut chart |
| Statistical summary | Box plot |
| Trend over time | Line plot |
| Relationship between variables | Scatter plot |

In [None]:
# Example: Same data, different visualizations
np.random.seed(42)
exam_scores = np.random.normal(75, 10, 200)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram - shows distribution shape
axes[0].hist(exam_scores, bins=20, color='steelblue', edgecolor='white')
axes[0].set_title('Histogram: Distribution Shape')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Frequency')

# Box plot - shows statistical summary
axes[1].boxplot(exam_scores)  # Vertical orientation is the default
axes[1].set_title('Box Plot: Statistical Summary')
axes[1].set_ylabel('Score')

plt.tight_layout()
plt.show()

---

## Exercises

### Exercise 1: Sales Bar Chart

Create a grouped bar chart showing monthly sales for two years:
- Months: Jan, Feb, Mar, Apr, May, Jun
- Year 2023: [150, 180, 200, 220, 250, 280]
- Year 2024: [160, 200, 240, 260, 290, 320]

Add appropriate labels, title, and legend.

In [None]:
# Your code here


### Exercise 2: Age Distribution Histogram

Generate 500 random ages from a normal distribution with mean 35 and standard deviation 12. Create a histogram with:
- 25 bins
- Density normalization
- A vertical line at the mean
- Appropriate title and labels

In [None]:
# Your code here


### Exercise 3: Budget Pie Chart

Create a pie chart showing a monthly budget:
- Housing: 35%
- Food: 20%
- Transportation: 15%
- Entertainment: 10%
- Savings: 15%
- Other: 5%

Explode the "Savings" slice and add percentages.

In [None]:
# Your code here


### Exercise 4: Test Scores Box Plot

Generate test scores for 4 classes (50 students each):
- Class A: mean=78, std=8
- Class B: mean=72, std=12
- Class C: mean=85, std=5
- Class D: mean=70, std=15

Create a box plot comparing all classes with:
- Different colored boxes
- Notches showing the confidence interval around the median
- Grid lines

In [None]:
# Your code here


### Exercise 5: Combined Visualization

Create a figure with 2x2 subplots showing the same dataset (1000 random samples from a normal distribution with mean=100, std=15) as:
1. Histogram
2. Box plot
3. Bar chart of counts in ranges: <85, 85-100, 100-115, >115 (count a value that falls exactly on a boundary in the higher range)
4. Pie chart of the same ranges

In [None]:
# Your code here


---

## Solutions

<details>
<summary>Click to reveal Exercise 1 solution</summary>

```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
year_2023 = [150, 180, 200, 220, 250, 280]
year_2024 = [160, 200, 240, 260, 290, 320]

x = np.arange(len(months))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))

ax.bar(x - width/2, year_2023, width, label='2023', color='steelblue')
ax.bar(x + width/2, year_2024, width, label='2024', color='coral')

ax.set_xlabel('Month')
ax.set_ylabel('Sales (thousands)')
ax.set_title('Monthly Sales Comparison')
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.show()
```

</details>

<details>
<summary>Click to reveal Exercise 2 solution</summary>

```python
np.random.seed(42)
ages = np.random.normal(35, 12, 500)
mean_age = np.mean(ages)

fig, ax = plt.subplots(figsize=(10, 6))

ax.hist(ages, bins=25, density=True, color='lightblue', edgecolor='navy', alpha=0.7)
ax.axvline(mean_age, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_age:.1f}')

ax.set_title('Age Distribution')
ax.set_xlabel('Age')
ax.set_ylabel('Density')
ax.legend()

plt.show()
```

</details>

<details>
<summary>Click to reveal Exercise 3 solution</summary>

```python
categories = ['Housing', 'Food', 'Transportation', 'Entertainment', 'Savings', 'Other']
sizes = [35, 20, 15, 10, 15, 5]
explode = (0, 0, 0, 0, 0.1, 0)  # Explode Savings
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#ff99cc', '#c2c2f0']

fig, ax = plt.subplots(figsize=(8, 8))

ax.pie(sizes, labels=categories, autopct='%1.1f%%', 
       explode=explode, colors=colors, shadow=True, startangle=90)

ax.set_title('Monthly Budget Breakdown')

plt.show()
```

</details>

<details>
<summary>Click to reveal Exercise 4 solution</summary>

```python
np.random.seed(42)
class_a = np.random.normal(78, 8, 50)
class_b = np.random.normal(72, 12, 50)
class_c = np.random.normal(85, 5, 50)
class_d = np.random.normal(70, 15, 50)

fig, ax = plt.subplots(figsize=(10, 6))

bp = ax.boxplot([class_a, class_b, class_c, class_d],
                labels=['Class A', 'Class B', 'Class C', 'Class D'],
                patch_artist=True,
                notch=True)

colors = ['lightblue', 'lightgreen', 'lightyellow', 'lightpink']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_title('Test Scores by Class')
ax.set_ylabel('Score')
ax.grid(True, alpha=0.3)

plt.show()
```

</details>

<details>
<summary>Click to reveal Exercise 5 solution</summary>

```python
np.random.seed(42)
data = np.random.normal(100, 15, 1000)

# Calculate counts for ranges
ranges = ['<85', '85-100', '100-115', '>115']
counts = [
    np.sum(data < 85),
    np.sum((data >= 85) & (data < 100)),
    np.sum((data >= 100) & (data < 115)),
    np.sum(data >= 115)
]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Histogram
axes[0, 0].hist(data, bins=30, color='steelblue', edgecolor='white')
axes[0, 0].set_title('Histogram')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

# Box plot
axes[0, 1].boxplot(data)
axes[0, 1].set_title('Box Plot')
axes[0, 1].set_ylabel('Value')

# Bar chart
axes[1, 0].bar(ranges, counts, color=['coral', 'lightblue', 'lightgreen', 'gold'])
axes[1, 0].set_title('Bar Chart by Range')
axes[1, 0].set_xlabel('Range')
axes[1, 0].set_ylabel('Count')

# Pie chart
axes[1, 1].pie(counts, labels=ranges, autopct='%1.1f%%', 
               colors=['coral', 'lightblue', 'lightgreen', 'gold'])
axes[1, 1].set_title('Pie Chart by Range')

plt.tight_layout()
plt.show()
```

</details>

---

## Summary

In this notebook, you learned:

- **Bar charts**: `ax.bar()` for vertical, `ax.barh()` for horizontal, grouped and stacked variants
- **Histograms**: `ax.hist()` for distributions, with density normalization and various styles
- **Pie charts**: `ax.pie()` for proportions, with customization options
- **Box plots**: `ax.boxplot()` for statistical summaries, comparing groups
- **Choosing plots**: Match the visualization to your data type

---

## Next Steps

Continue to [03_customization.ipynb](03_customization.ipynb) to learn advanced styling, annotations, and themes.