# Matplotlib - Part 2: Histograms and Distributions

This notebook covers visualizing distributions with histograms, box plots, and violin plots, as well as showing proportions with pie charts.

**Topics covered:**
- Histograms
- Box plots
- Violin plots
- Density-normalized and 2D histograms
- Pie charts

**Problems:** 17 (Easy: 1-6, Medium: 7-12, Hard: 13-17)

In [None]:
# ============================================
# SETUP - Run this cell first!
# ============================================
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sys
sys.path.insert(0, '..')
from utils.checker import check

%matplotlib inline
np.random.seed(42)
print("Setup complete!")

---
## Problem 1: Simple Histogram
**Difficulty:** Easy

### Concept
Histograms visualize the distribution of continuous data by grouping values into bins and counting how many values fall into each bin. They reveal the shape of a distribution at a glance: where it is centered, how spread out it is, whether it is skewed, and whether it has more than one peak.
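A minimal sketch of what the counts and edges mean, using NumPy's `np.histogram`, which `ax.hist` relies on for its binning. With the default of 10 bins and no `range`, the edges run from the data's minimum to its maximum, so every value is counted exactly once. The local generator and its seed are arbitrary choices so the sketch does not disturb the notebook's global seed.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, local to this sketch
data = rng.normal(size=1000)

counts, edges = np.histogram(data)  # default: 10 equal-width bins

print(len(counts), len(edges))      # 10 11 -> one more edge than bins
print(counts.sum() == data.size)    # True -> every value lands in a bin
```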

### Syntax
```python
n, bins, patches = ax.hist(data)  # Returns counts, bin edges, and patches
```

### Example
```python
data = np.random.randn(500)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data)
```

### Task
Create a histogram of 1000 random normal values. The data is already generated for you.
Store the returned values in `n` (counts), `bins` (bin edges), and `patches` (bar artists).

### Expected Properties
- `n` should be an array of frequencies
- `bins` should be an array of bin edges

In [None]:
# Your solution:
data = np.random.randn(1000)

n, bins, patches = None, None, None

In [None]:
# Verification
check.is_not_none(n, "P1a: Histogram counts returned")
check.is_not_none(bins, "P1b: Bin edges returned")

---
## Problem 2: Histogram with Specified Bins
**Difficulty:** Easy

### Concept
The number of bins controls histogram granularity. More bins show finer detail but can be noisy; fewer bins smooth the distribution but may hide features.
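When `bins` is an integer, the bins have equal width, so `bins=20` should produce 21 edges spaced `(max - min) / 20` apart. A quick sketch with NumPy (the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

counts, edges = np.histogram(data, bins=20)
num_bins = len(edges) - 1                      # 21 edges bound 20 bins
widths = np.diff(edges)                        # width of each bin
expected_width = (data.max() - data.min()) / 20

print(num_bins, np.allclose(widths, expected_width))  # 20 True
```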

### Syntax
```python
n, bins, patches = ax.hist(data, bins=20)  # Specify number of bins
```

### Example
```python
data = np.random.randn(500)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=15)
```

### Task
Create a histogram with exactly 20 bins.
Compute the number of bins from the returned edge array as `len(bins) - 1` and store it in `num_bins`.

### Expected Properties
- `num_bins` should be 20
- `bins` array length should be 21 (edges for 20 bins)

In [None]:
# Your solution:
data = np.random.randn(1000)

num_bins = None

In [None]:
# Verification
check.is_type(num_bins, int, "P2: Type check")
check.is_true(num_bins == 20, "P2: Correct bin count", "Should have exactly 20 bins")

---
## Problem 3: Simple Box Plot
**Difficulty:** Easy

### Concept
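Box plots (box-and-whisker plots) summarize a distribution with a few statistics. The box spans the first quartile (Q1) to the third quartile (Q3), with a line at the median. By default, Matplotlib draws the whiskers to the most extreme data points that lie within 1.5 × IQR (the interquartile range, Q3 − Q1) of the box, and plots any points beyond that individually as outliers ("fliers"). The whiskers therefore reach the minimum and maximum only when there are no outliers. Box plots are excellent for comparing distributions and spotting outliers.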
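The sketch below recomputes the numbers a default box plot is drawn from, assuming linearly interpolated percentiles (NumPy's default) and the default whisker reach `whis=1.5`. `box_stats` is a hypothetical helper written for illustration, not part of Matplotlib.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Hypothetical helper: the statistics a default box plot is drawn from."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr    # points below this are outliers
    high_fence = q3 + whis * iqr   # points above this are outliers
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outside = values[(values < low_fence) | (values > high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),    # whiskers stop at the most extreme
        "whisker_high": inside.max(),   # points that are NOT outliers
        "fliers": outside,
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["q1"], stats["median"], stats["q3"])                     # 3.0 5.0 7.0
print(stats["whisker_low"], stats["whisker_high"], stats["fliers"])  # 1.0 8.0 [100.]
```

Here 100 lies far beyond Q3 + 1.5 × IQR = 13, so the upper whisker stops at 8 and 100 is drawn as a flier.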

### Syntax
```python
bp = ax.boxplot(data)  # Returns a dict of artist lists: 'boxes', 'medians', 'whiskers', 'caps', 'fliers', 'means'
```

### Example
```python
data = np.random.randn(100)
fig, ax = plt.subplots()
bp = ax.boxplot(data)
```

### Task
Create a box plot of random data. The data is already generated for you.
Store the box plot dictionary in `bp`.

### Expected Properties
- `bp` should be a dictionary containing box plot elements

In [None]:
# Your solution:
data = np.random.randn(100)

bp = None

In [None]:
# Verification
check.is_not_none(bp, "P3: Box plot created")

---
## Problem 4: Simple Pie Chart
**Difficulty:** Easy

### Concept
Pie charts show the proportions of a whole. Each slice's angle is proportional to its share of the total, so a value that makes up 30% of the sum gets 30% of the circle (108°). They work best with a small number of categories.
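The slice angles can be worked out by hand: each slice covers `360° × size / sum(sizes)`. A small sketch using the task's data (plain NumPy, no plotting):

```python
import numpy as np

sizes = np.array([30, 25, 20, 15, 10], dtype=float)

fractions = sizes / sizes.sum()   # share of the whole for each slice
angles = 360 * fractions          # degrees of the circle each slice covers

print(angles)  # 108, 90, 72, 54 and 36 degrees
```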

### Syntax
```python
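wedges, texts = ax.pie(sizes, labels=labels)  # without autopct: returns (wedges, texts)
# With autopct, a third list of percentage labels is returned:
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%')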
```

### Example
```python
sizes = [30, 40, 20, 10]
labels = ['A', 'B', 'C', 'D']
fig, ax = plt.subplots()
wedges, texts = ax.pie(sizes, labels=labels)
```

### Task
Create a pie chart showing market share using the provided data.
Unpack the return value so that `wedges` holds the wedge patches (one per slice).

### Expected Properties
- `wedges` should not be None
- Should contain wedge patches for each data point

In [None]:
# Your solution:
sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']

wedges = None

In [None]:
# Verification
check.is_not_none(wedges, "P4: Pie chart created")

---
## Problem 5: Histogram with Different Color
**Difficulty:** Easy

### Concept
Colors in histograms can highlight specific distributions or match a color scheme. Use the `color` parameter to set the bar color.
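Named colors are converted to RGBA tuples with components between 0 and 1, which is what a patch reports back. `matplotlib.colors.to_rgba` shows the conversion directly. Note the difference between the CSS colors `'green'` (`#008000`, a mid green) and `'lime'` (`#00FF00`, pure green):

```python
from matplotlib.colors import to_rgba

green = to_rgba("green")   # CSS "green" = #008000
lime = to_rgba("lime")     # CSS "lime"  = #00FF00

print(green)  # green channel is 128/255, about 0.502
print(lime)   # (0.0, 1.0, 0.0, 1.0)
```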

### Syntax
```python
n, bins, patches = ax.hist(data, color='green')
```

### Example
```python
data = np.random.randn(500)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, color='blue')
```

### Task
Create a green histogram using the provided data.
Get the RGBA face color of the first patch with `patches[0].get_facecolor()` and store its green component (index 1) in `green_value`.

### Expected Properties
- Histogram bars should be green
- Green component of color should be approximately 0.5 (the named color `'green'` is `#008000`, so its green channel is 128/255 ≈ 0.502)

In [None]:
# Your solution:
data = np.random.randn(500)

green_value = None

In [None]:
# Verification
check.is_not_none(green_value, "P5: Color extracted")
check.is_true(green_value > 0.4, "P5: Green component", "Should be green color")

---
## Problem 6: Histogram with Edge Color
**Difficulty:** Easy

### Concept
Edge colors outline each histogram bar, making individual bins easier to tell apart. This matters because histogram bars touch each other by default, so without edges, adjacent bars of the same color blend into a single shape.
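Colors come back from Matplotlib as RGBA tuples, and the alpha channel of an opaque color is 1.0. The sketch below shows why a "black means the sum is close to 0" check must add up only the R, G, and B components:

```python
from matplotlib.colors import to_rgba

edge = to_rgba("black")   # (0.0, 0.0, 0.0, 1.0): RGB plus an alpha of 1.0

rgb_sum = sum(edge[:3])   # only R, G, B -> 0.0 for black
rgba_sum = sum(edge)      # includes alpha -> 1.0, which would fail a "< 0.1" check

print(rgb_sum, rgba_sum)  # 0.0 1.0
```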

### Syntax
```python
n, bins, patches = ax.hist(data, edgecolor='black')
```

### Example
```python
data = np.random.randn(500)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, edgecolor='white', linewidth=1.5)
```

### Task
Create a histogram with black edges on the bars.
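Get the edge color of the first patch with `patches[0].get_edgecolor()`, which returns an RGBA tuple, and store the sum of its R, G, and B components (excluding alpha) in `edge_sum`. Black has an RGB sum of 0.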

### Expected Properties
- Bars should have black outlines
- Edge color RGB sum should be close to 0

In [None]:
# Your solution:
data = np.random.randn(500)

edge_sum = None

In [None]:
# Verification
check.is_not_none(edge_sum, "P6: Edge color extracted")
check.is_true(edge_sum < 0.1, "P6: Black edge color", "Edge color should be black (sum close to 0)")

---
## Problem 7: Multiple Box Plots
**Difficulty:** Medium

### Concept
Comparing multiple distributions side-by-side with box plots reveals differences in central tendency, spread, and outliers across groups.
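The dictionary returned by `ax.boxplot` holds lists of artists, and with several datasets each list grows per box. A quick check (the three groups are made up for illustration): there is one box, one median line, and one flier line per dataset, but two whiskers and two caps per box.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, 100), rng.normal(1, 1, 100), rng.normal(-1, 1, 100)]

fig, ax = plt.subplots()
bp = ax.boxplot(groups)

# Count the artists of each kind
counts = {key: len(bp[key]) for key in ("boxes", "medians", "whiskers", "caps", "fliers")}
print(counts)  # {'boxes': 3, 'medians': 3, 'whiskers': 6, 'caps': 6, 'fliers': 3}

plt.close(fig)  # this sketch only inspects the artists, so skip displaying the figure
```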

### Syntax
```python
bp = ax.boxplot([data1, data2, data3])  # Pass list of datasets
```

### Example
```python
data1 = np.random.randn(100)
data2 = np.random.randn(100) + 2
fig, ax = plt.subplots()
bp = ax.boxplot([data1, data2])
```

### Task
Create box plots for three datasets side by side. The data is already generated for you.
Store the box plot dictionary in `bp`.

### Expected Properties
- Should show 3 box plots side by side
- Each box plot represents one dataset

In [None]:
# Your solution:
data1 = np.random.randn(100)
data2 = np.random.randn(100) + 1
data3 = np.random.randn(100) - 1

bp = None

In [None]:
# Verification
check.is_not_none(bp, "P7: Multiple box plots created")

---
## Problem 8: Normalized Histogram (Density)
**Difficulty:** Medium

### Concept
Density histograms rescale the bar heights so that the total area of the bars equals 1: each height is `count / (total_count * bin_width)`. This puts the histogram on the same scale as a probability density function and makes distributions with different sample sizes directly comparable.
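The normalization can be done by hand and compared against NumPy's `density=True` option, which is the same normalization `ax.hist(..., density=True)` applies. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

counts, edges = np.histogram(data, bins=30)
widths = np.diff(edges)

# Density by hand: count / (total count * bin width)
density_manual = counts / (counts.sum() * widths)

# NumPy's built-in density option
density_numpy, _ = np.histogram(data, bins=30, density=True)

area = np.sum(density_manual * widths)
print(np.allclose(density_manual, density_numpy), round(area, 6))  # True 1.0
```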

### Syntax
```python
n, bins, patches = ax.hist(data, density=True)
```

### Example
```python
data = np.random.randn(1000)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=30, density=True)
```

### Task
Create a normalized histogram (density=True) so the total area sums to 1.
Calculate the total area as the sum of `n * bin_width`, where the bin widths are `np.diff(bins)`, and store it in `area`.

### Expected Properties
- Total area should be approximately 1.0
- Y-axis represents probability density, not counts

In [None]:
# Your solution:
data = np.random.randn(1000)

area = None

In [None]:
# Verification
check.is_not_none(area, "P8: Area calculated")
check.is_true(abs(area - 1.0) < 0.01, "P8: Area sums to 1", "Area should be approximately 1.0")

---
## Problem 9: Overlapping Histograms
**Difficulty:** Medium

### Concept
Overlapping histograms with transparency allow direct comparison of two distributions. The `alpha` parameter controls transparency (0=transparent, 1=opaque).
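When two datasets are histogrammed in separate calls, each call picks its own bin edges from its own data range, so the bars may not line up. One common remedy (a suggestion beyond this problem's task) is to compute shared edges from the combined data with `np.histogram_bin_edges` and pass them to both calls. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
data1 = rng.normal(0, 1, 500)
data2 = rng.normal(2, 1, 500)

# 30 equal-width bins spanning the range of BOTH datasets
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)

counts1, _ = np.histogram(data1, bins=shared_edges)
counts2, _ = np.histogram(data2, bins=shared_edges)

# In the notebook, the plotting calls would then be:
#   ax.hist(data1, bins=shared_edges, alpha=0.5, label='Dataset 1')
#   ax.hist(data2, bins=shared_edges, alpha=0.5, label='Dataset 2')
print(len(shared_edges), counts1.sum(), counts2.sum())  # 31 500 500
```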

### Syntax
```python
ax.hist(data1, alpha=0.5, label='Dataset 1')
ax.hist(data2, alpha=0.5, label='Dataset 2')
```

### Example
```python
data1 = np.random.randn(500)
data2 = np.random.randn(500) + 1
fig, ax = plt.subplots()
ax.hist(data1, alpha=0.5, label='Group A')
ax.hist(data2, alpha=0.5, label='Group B')
ax.legend()
```

### Task
Create two overlapping histograms with alpha=0.5 transparency.
Store the returns from both histograms in `n1, bins1, patches1` and `n2, bins2, patches2`.

### Expected Properties
- Both histograms should be visible
- Overlapping regions should show both colors blended

In [None]:
# Your solution:
data1 = np.random.randn(500)
data2 = np.random.randn(500) + 2

n1, bins1, patches1 = None, None, None
n2, bins2, patches2 = None, None, None

In [None]:
# Verification
check.is_not_none(n1, "P9a: First histogram created")
check.is_not_none(n2, "P9b: Second histogram created")

---
## Problem 10: Pie Chart with Explode
**Difficulty:** Medium

### Concept
"Exploding" a pie slice pulls it away from the center, drawing attention to that category. The `explode` parameter takes one value per slice: the fraction of the radius by which that slice is offset (0 means no offset).
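Geometrically, an exploded slice is shifted outward along the line that bisects its angle. The sketch below computes that shift for each slice, assuming Matplotlib's defaults: `startangle=0` (the first slice starts on the positive x-axis), counterclockwise order, and `radius=1`, so an explode value of 0.1 moves a slice 0.1 units.

```python
import math

sizes = [30, 25, 20, 15, 10]
explode = (0.1, 0, 0, 0, 0)

total = sum(sizes)
start = 0.0          # running start position, as a fraction of the full circle
offsets = []
for size, expl in zip(sizes, explode):
    frac = size / total
    mid_angle = 2 * math.pi * (start + frac / 2)   # bisector of this slice, in radians
    offsets.append((expl * math.cos(mid_angle), expl * math.sin(mid_angle)))
    start += frac

dx, dy = offsets[0]
print(round(dx, 3), round(dy, 3))  # 0.059 0.081 -> pushed out along 54 degrees
```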

### Syntax
```python
explode = (0.1, 0, 0, 0)  # Explode first slice
wedges, texts = ax.pie(sizes, labels=labels, explode=explode)
```

### Example
```python
sizes = [30, 40, 30]
labels = ['A', 'B', 'C']
explode = (0, 0.1, 0)  # Explode second slice
fig, ax = plt.subplots()
wedges, texts = ax.pie(sizes, labels=labels, explode=explode)
```

### Task
Create a pie chart with the first slice exploded/separated.
Use the provided explode tuple. Store the wedges in `wedges`.

### Expected Properties
- First slice should be separated from the pie
- Other slices should remain together

In [None]:
# Your solution:
sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']
explode = (0.1, 0, 0, 0, 0)

wedges = None

In [None]:
# Verification
check.is_not_none(wedges, "P10: Exploded pie chart created")

---
## Problem 11: Box Plot with Notches
**Difficulty:** Medium

### Concept
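Notches in box plots show a confidence interval around the median. By default, Matplotlib computes it as `median ± 1.57 × IQR / √n`, an approximate 95% interval. If the notches of two boxes do not overlap, that is rough evidence that their medians differ; overlapping notches do not prove that the medians are equal.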
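To see where a notch comes from, the sketch below recomputes it with the formula Matplotlib uses by default (when `bootstrap` is not set): `median ± 1.57 × IQR / √n`. `notch_interval` and `notches_overlap` are hypothetical helpers for illustration.

```python
import numpy as np


def notch_interval(values):
    """Hypothetical helper: median +/- 1.57 * IQR / sqrt(n)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(values.size)
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """Hypothetical helper: True if the two notch intervals overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


# Median 5, IQR 4, n = 9 -> half-width 1.57 * 4 / 3, about 2.09
low, high = notch_interval(range(1, 10))
print(low, high)

rng = np.random.default_rng(0)
print(notches_overlap(rng.normal(0, 1, 100), rng.normal(1, 1, 100)))
```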

### Syntax
```python
bp = ax.boxplot(data, notch=True)
```

### Example
```python
data = [np.random.randn(100), np.random.randn(100) + 1]
fig, ax = plt.subplots()
bp = ax.boxplot(data, notch=True)
```

### Task
Create box plots with notches showing confidence intervals around the median.
Use the provided data and set notch=True.

### Expected Properties
- Box plots should have notches (indentations) around the median
- Notches represent approximate 95% confidence intervals for the median

In [None]:
# Your solution:
data = [np.random.randn(100), np.random.randn(100) + 1]

bp = None

In [None]:
# Verification
check.is_not_none(bp, "P11: Notched box plot created")

---
## Problem 12: Horizontal Box Plot
**Difficulty:** Medium

### Concept
Horizontal box plots are useful when category labels are long or when you want the values to run along the x-axis. Set `vert=False` to make them horizontal. (Matplotlib 3.10 and later also accept `orientation='horizontal'`, which is intended to replace `vert`.)
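One way to confirm the orientation: in a horizontal box plot the median is drawn as a vertical line, so both x-coordinates of the median line equal the median value. A small check with made-up data (newer Matplotlib versions may emit a warning about `vert` but still honor it):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = [rng.normal(0, 1, 100), rng.normal(0, 2, 100)]

fig, ax = plt.subplots()
bp = ax.boxplot(data, vert=False)

median_line = bp["medians"][0]          # median line of the first dataset
x_coords = median_line.get_xdata()      # both equal the median value
y_coords = median_line.get_ydata()      # span the box's thickness around y = 1

print(np.allclose(x_coords, np.median(data[0])), y_coords[0] != y_coords[1])  # True True
plt.close(fig)
```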

### Syntax
```python
bp = ax.boxplot(data, vert=False)  # vert=False for horizontal
```

### Example
```python
data = [np.random.randn(100), np.random.randn(100) * 2]
fig, ax = plt.subplots()
bp = ax.boxplot(data, vert=False)
```

### Task
Create a horizontal box plot by setting vert=False.
Use the provided data.

### Expected Properties
- Box plots should be horizontal (not vertical)
- Data spread should be shown along the x-axis

In [None]:
# Your solution:
data = [np.random.randn(100), np.random.randn(100) * 2]

bp = None

In [None]:
# Verification
check.is_not_none(bp, "P12: Horizontal box plot created")

---
## Problem 13: Violin Plot
**Difficulty:** Hard

### Concept
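Violin plots show a kernel density estimate (KDE) of each dataset, mirrored around a central axis, so you see the full shape of the distribution, including multiple peaks that a box plot would hide. Wider sections mean values are more concentrated in that range. Matplotlib's `violinplot` does not draw an inner box: by default it marks the minimum and maximum (`showextrema=True`), and it can also mark means and medians with `showmeans=True` and `showmedians=True`.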
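The violin outline is a kernel density estimate. As a rough sketch of the idea (not Matplotlib's exact code), the function below places a Gaussian bump on every sample and averages them. The bandwidth uses Scott's rule, `std * n ** (-1/5)` in one dimension, which is the rule behind `violinplot`'s default `bw_method='scott'`. `gaussian_kde_1d` is a hypothetical helper.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: Gaussian KDE of `samples`, evaluated at `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)      # Scott's rule (1-D)
    z = (grid[:, None] - samples[None, :]) / bandwidth   # distance to every sample
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)     # standard normal kernel
    return bumps.sum(axis=1) / (n * bandwidth)           # average, rescaled to a density


rng = np.random.default_rng(0)
samples = rng.normal(size=100)
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(samples, grid)

# A density should integrate to about 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```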

### Syntax
```python
vp = ax.violinplot(data)  # Returns dictionary with violin plot elements
```

### Example
```python
data = [np.random.randn(100), np.random.randn(100) + 2]
fig, ax = plt.subplots()
vp = ax.violinplot(data)
```

### Task
Create a violin plot showing distribution shapes for three datasets.
The data is already generated for you. Store the violin plot dictionary in `vp`.

### Expected Properties
- `vp` should be a dictionary containing violin plot elements
- Should show distribution shapes for each dataset

In [None]:
# Your solution:
data = [np.random.randn(100), np.random.randn(100) + 2, np.random.randn(100) * 2]

vp = None

In [None]:
# Verification
check.is_not_none(vp, "P13: Violin plot created")

---
## Problem 14: Histogram with Custom Bin Edges
**Difficulty:** Hard

### Concept
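Custom bin edges give precise control over bin boundaries, which is useful when bins should line up with meaningful values or cover a specific range. Two details matter: values outside the outermost edges are not counted at all, and every bin is half-open `[left, right)` except the last, which also includes its right edge. With unequal widths, wider bins naturally collect more values, so consider `density=True` to compare bins fairly.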
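A tiny hand-checkable sketch of how custom edges treat boundary values: values outside the outermost edges are dropped, and the last bin is closed on the right. It uses `np.histogram` (the binning `ax.hist` relies on) and made-up data chosen to hit the boundaries.

```python
import numpy as np

data = np.array([-5.0, -2.5, -0.5, 0.5, 0.5, 2.9, 3.0, 4.0])
custom_bins = [-3, -2, -1, 0, 1, 2, 3]

counts, edges = np.histogram(data, bins=custom_bins)
print(counts)        # [1 0 1 2 0 2]
print(counts.sum())  # 6: -5.0 and 4.0 fall outside [-3, 3] and are dropped

# 3.0 is counted because the last bin [2, 3] is closed on the right
```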

### Syntax
```python
custom_bins = [0, 1, 2, 5, 10]  # Irregular bin widths
n, bins, patches = ax.hist(data, bins=custom_bins)
```

### Example
```python
data = np.random.randn(1000)
edges = [-4, -2, -1, 0, 1, 2, 4]  # irregular widths: the outer bins are wider
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=edges)
```

### Task
Create a histogram with custom bin edges: [-3, -2, -1, 0, 1, 2, 3].
The `custom_bins` list is already defined; pass it as `bins`. The verification cell checks that the returned `bins` array matches it in both length and values.

### Expected Properties
- `bins` array should have 7 elements
- `bins` should match the custom_bins list

In [None]:
# Your solution:
data = np.random.randn(1000)
custom_bins = [-3, -2, -1, 0, 1, 2, 3]

n, bins, patches = None, None, None

In [None]:
# Verification
check.is_not_none(bins, "P14a: Bins created")
check.is_true(len(bins) == 7, "P14b: Correct number of bin edges", "Should have 7 bin edges")
check.is_true(all(bins == custom_bins), "P14c: Bins match custom", "Bins should match custom_bins list")

---
## Problem 15: 2D Histogram
**Difficulty:** Hard

### Concept
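2D histograms divide the plane into a grid of rectangular cells and count how many points fall into each cell, displaying the counts as a color-coded heatmap. They are useful for visualizing the joint distribution of two variables and for finding regions where points are concentrated, especially when a scatter plot would suffer from overplotting.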
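`ax.hist2d` bins the points with NumPy's `np.histogram2d`. One detail worth knowing: the counts array is indexed `[x_bin, y_bin]`, so rows correspond to x, not y. That is why plotting raw counts with `imshow` needs a transpose (`counts.T`), while `hist2d` handles it for you. A hand-checkable example with made-up points:

```python
import numpy as np

x = np.array([0.1, 0.2, 0.9, 0.8, 0.6])
y = np.array([0.1, 0.9, 0.1, 0.2, 0.7])

# 2 x 2 grid over the unit square
counts, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

print(counts)
# [[1. 1.]    row 0: x in [0, 0.5) -> one point with low y, one with high y
#  [2. 1.]]   row 1: x in [0.5, 1] -> two points with low y, one with high y
```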

### Syntax
```python
h = ax.hist2d(x, y, bins=20)  # Returns (counts, xedges, yedges, image)
plt.colorbar(h[3])  # h[3] is the image object
```

### Example
```python
x = np.random.randn(1000)
y = np.random.randn(1000)
fig, ax = plt.subplots()
h = ax.hist2d(x, y, bins=30)
plt.colorbar(h[3])
```

### Task
Create a 2D histogram (heatmap of point density) for the provided x and y data.
Add a colorbar to show the density scale. Store the result in `h`.

### Expected Properties
- `h` should be a 4-tuple `(counts, xedges, yedges, image)`
- Should show a 2D heatmap with colorbar

In [None]:
# Your solution:
x = np.random.randn(1000)
y = np.random.randn(1000)

h = None

In [None]:
# Verification
check.is_not_none(h, "P15: 2D histogram created")

---
## Problem 16: Pie Chart with Percentages
**Difficulty:** Hard

### Concept
Adding percentage labels to pie charts makes it easier to read exact proportions. The `autopct` parameter formats and displays these values automatically.

### Syntax
```python
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%')
```

The format string:
- `%1.1f` = floating-point number with 1 digit after the decimal point (the leading `1` is a minimum field width)
- `%%` = literal percent sign
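Matplotlib applies an `autopct` format string to each slice's percentage with Python's `%` operator, so you can preview the labels in plain Python. `autopct` also accepts a function that receives the percentage and returns a string; `whole_percent` below is a hypothetical example of such a function.

```python
sizes = [35, 30, 20, 15]
total = sum(sizes)
percentages = [100 * s / total for s in sizes]

# String form, as in autopct='%1.1f%%'
labels_str = ['%1.1f%%' % pct for pct in percentages]


# Callable form, as in autopct=whole_percent
def whole_percent(pct):
    return f'{pct:.0f}%'


labels_fn = [whole_percent(pct) for pct in percentages]

print(labels_str)  # ['35.0%', '30.0%', '20.0%', '15.0%']
print(labels_fn)   # ['35%', '30%', '20%', '15%']
```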

### Example
```python
sizes = [40, 30, 20, 10]
labels = ['A', 'B', 'C', 'D']
fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.0f%%')
```

### Task
Create a pie chart that displays percentage values on each slice using `autopct='%1.1f%%'`.
Unpack the return value into `wedges`, `texts`, and `autotexts`.

### Expected Properties
- `autotexts` should not be None
- Each slice should display its percentage

In [None]:
# Your solution:
sizes = [35, 30, 20, 15]
labels = ['Python', 'Java', 'C++', 'Other']

wedges, texts, autotexts = None, None, None

In [None]:
# Verification
check.is_not_none(autotexts, "P16: Pie chart with percentages created")

---
## Problem 17: Step Histogram
**Difficulty:** Hard

### Concept
Step histograms show only the outline of the distribution without filled bars. They're useful for overlaying multiple distributions clearly or when you want to emphasize the distribution shape.

### Syntax
```python
n, bins, patches = ax.hist(data, histtype='step')
```

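Histogram types:
- `'bar'`: traditional filled bars (default)
- `'barstacked'`: bars for multiple datasets stacked on top of each other
- `'step'`: outline only, no fill
- `'stepfilled'`: the same outline, filled as a single shape with no internal bar edges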
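`histtype` only changes how the bins are drawn; the counts and edges are computed the same way. A quick check with made-up data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)

fig, ax = plt.subplots()
n_bar, edges_bar, _ = ax.hist(data, bins=20)                                 # filled bars
n_step, edges_step, _ = ax.hist(data, bins=20, histtype='step', linewidth=2)  # outline only

print(np.array_equal(n_bar, n_step), np.array_equal(edges_bar, edges_step))  # True True
plt.close(fig)
```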

### Example
```python
data = np.random.randn(500)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=20, histtype='step', linewidth=2)
```

### Task
Create a step histogram (outline only, no fill) using `histtype='step'`.
Store the return values in `n`, `bins`, and `patches`.

### Expected Properties
- Histogram should show only outlines (no filled bars)
- `n` should contain the frequency counts

In [None]:
# Your solution:
data = np.random.randn(500)

n, bins, patches = None, None, None

In [None]:
# Verification
check.is_not_none(n, "P17: Step histogram created")

---
## Summary

Run this cell to see your overall progress on this notebook.

In [None]:
check.summary()