# Scatter Plots in Matplotlib

## Overview

**Scatter plots** are essential for visualizing relationships between two continuous variables. They reveal patterns, correlations, clusters, and outliers in your data.

```
Scatter Plot = Points on 2D plane showing (x, y) relationships
```

### What We'll Learn

**1. Basic Scatter Plots** 🔵
- Creating scatter plots
- Markers and customization
- Multiple scatter series

**2. Marker Customization** ⭐
- Marker types and sizes
- Edge colors and widths
- Transparency effects

**3. Color Mapping** 🎨
- Continuous color scales
- Categorical colors
- Colorbars
- Custom colormaps

**4. Bubble Charts** ⚪
- Variable-sized markers
- Encoding a third variable as marker size
- Size scaling techniques

**5. Correlations** 📈
- Trend lines
- Linear regression
- Correlation coefficients
- Confidence intervals

**6. Categorical Scatter** 📊
- Group comparisons
- Strip plots
- Swarm plots
- Jitter techniques

**7. Advanced Techniques** 🚀
- Density scatter plots
- Error bars
- Marginal distributions
- 3D scatter plots

### Why Master Scatter Plots?

```
✓ Visualize correlations
✓ Identify patterns and clusters
✓ Detect outliers
✓ Compare groups
✓ Show 3+ dimensions with color/size
✓ Essential for data exploration
```

### Common Use Cases

- **Science**: Relationship between variables
- **Finance**: Stock price correlations
- **ML/AI**: Feature relationships, clustering
- **Marketing**: Customer segmentation
- **Healthcare**: Patient data analysis
- **Social Sciences**: Survey data visualization

### Learning Objectives

By the end of this notebook, you will:
1. ✅ Create publication-quality scatter plots
2. ✅ Customize markers effectively
3. ✅ Use color to encode additional dimensions
4. ✅ Create bubble charts
5. ✅ Add trend lines and statistics
6. ✅ Visualize categorical data
7. ✅ Apply advanced scatter techniques

Let's explore scatter plots! 🚀

In [None]:
# Standard imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm
from matplotlib.colors import Normalize
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Display settings
%matplotlib inline

# Set random seed
np.random.seed(42)

print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print("\n✅ Setup complete!")
print("\n📌 Note: scatter() is the main function for scatter plots.")
print("   Use it to visualize relationships between variables.")

## 1. Basic Scatter Plots

### Creating Scatter Plots

```python
# Basic syntax
ax.scatter(x, y)

# With parameters
ax.scatter(x, y,
          s=50,              # Size
          c='blue',          # Color
          marker='o',        # Marker style
          alpha=0.8,         # Transparency
          edgecolors='black', # Edge color
          linewidths=1,      # Edge width
          label='Data')      # Legend label
```

### scatter() vs plot()

```python
# scatter() - Better for:
# • Individual data points
# • Color/size mapping per point
# • Encoding extra variables (note: one marker style per call)
ax.scatter(x, y, c=colors, s=sizes)

# plot() with markers - Better for:
# • Connected data points
# • Uniform marker styling
# • Simpler plots and large datasets (faster when all markers are identical)
ax.plot(x, y, 'o')
```
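The distinction shows up in what each call returns. The sketch below uses synthetic data (all variable names are illustrative): `scatter()` returns a single `PathCollection` that stores one size and one color per point, while `plot()` returns a list of `Line2D` objects whose markers all share one style.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # synthetic data, for illustration only
x = rng.normal(size=50)
y = rng.normal(size=50)
sizes = rng.uniform(20, 200, size=50)   # one marker area per point
values = rng.uniform(0, 1, size=50)     # one color value per point

fig, (ax_scatter, ax_plot) = plt.subplots(1, 2, figsize=(10, 4))

# scatter(): a PathCollection holding per-point sizes and colors
points = ax_scatter.scatter(x, y, s=sizes, c=values, cmap='viridis')
ax_scatter.set_title('scatter(): per-point size and color')

# plot(): a list of Line2D objects; every marker shares one style
(line,) = ax_plot.plot(x, y, 'o', markersize=6, color='steelblue')
ax_plot.set_title('plot(): uniform markers')

print(type(points).__name__, len(points.get_sizes()))  # PathCollection 50
print(type(line).__name__, line.get_markersize())      # Line2D 6.0
```

If every point looks the same, `plot()` is usually the lighter choice; reach for `scatter()` once size or color carries information.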

### Key Parameters

```python
s       # Marker size (scalar or array)
c       # Color (single color or array)
marker  # Marker style ('o', 's', '^', etc.)
alpha   # Transparency (0-1)
cmap    # Colormap for color mapping
vmin    # Minimum color scale value
vmax    # Maximum color scale value
edgecolors  # Edge/border color
linewidths  # Edge width
label   # Legend label
```

### Marker Sizes

```python
# Size in points squared
s=10    # Very small
s=50    # Small (the default is 36, i.e. rcParams['lines.markersize'] ** 2)
s=100   # Medium
s=200   # Large
s=500   # Very large

# Array of sizes (bubble chart)
sizes = [20, 50, 100, 200, 500]
ax.scatter(x, y, s=sizes)
```
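Because `s` is an area in points², doubling a marker's width takes four times the `s` value. The sketch below converts a target width to `s`; `diameter_to_s` is a hypothetical helper written for this example, not a Matplotlib function.

```python
import numpy as np
import matplotlib.pyplot as plt

def diameter_to_s(diameter_pt):
    """Convert a marker width in points to scatter()'s `s` (points squared)."""
    return np.asarray(diameter_pt, dtype=float) ** 2

diameters = np.array([5, 10, 20])        # each marker twice as wide as the last
s_values = diameter_to_s(diameters)      # 25, 100, 400: area grows 4x per step

fig, ax = plt.subplots(figsize=(6, 2))
ax.scatter([0, 1, 2], [0, 0, 0], s=s_values)
ax.set_xlim(-0.5, 2.5)
ax.set_yticks([])
```

The same convention links `scatter()` to `plot()`: `s=36` corresponds to `markersize=6`, which is the default marker size.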

### Multiple Scatter Series

```python
# Method 1: Separate calls
ax.scatter(x1, y1, label='Group 1')
ax.scatter(x2, y2, label='Group 2')

# Method 2: Color mapping
colors = ['red' if g == 0 else 'blue' for g in groups]
ax.scatter(x, y, c=colors)
```

### Marker Styles

```python
'o'   # Circle (most common)
's'   # Square
'^'   # Triangle up
'v'   # Triangle down
'D'   # Diamond
'*'   # Star
'+'   # Plus
'x'   # X
'.'   # Point
'p'   # Pentagon
'h'   # Hexagon
```

### Edge Customization

```python
# No edge
ax.scatter(x, y, edgecolors='none')

# Black edge
ax.scatter(x, y, edgecolors='black', linewidths=1)

# White edge (pop effect)
ax.scatter(x, y, c='red', edgecolors='white', linewidths=2)

# Same as face color
ax.scatter(x, y, edgecolors='face')
```

### Best Practices

```
✓ Use s=50-100 for typical plots
✓ Add alpha=0.6-0.8 for overlapping points
✓ Use edgecolors for definition
✓ Keep marker styles consistent within groups
✗ Don't make markers too large (> 500)
✗ Don't mix too many marker styles
```

In [None]:
print("=== BASIC SCATTER PLOTS ===\n")

# Generate sample data
np.random.seed(42)
n = 100
x = np.random.randn(n)
y = 2 * x + np.random.randn(n) * 0.5

# Example 1: Basic scatter plot
print("Example 1: Basic Scatter Plot")

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Simple scatter
axes[0, 0].scatter(x, y)
axes[0, 0].set_title('Basic Scatter', fontweight='bold', fontsize=12)
axes[0, 0].grid(True, alpha=0.3)

# Custom color and size
axes[0, 1].scatter(x, y, s=100, c='coral', alpha=0.7)
axes[0, 1].set_title('Larger, Colored, Transparent', fontweight='bold', fontsize=12)
axes[0, 1].grid(True, alpha=0.3)

# With edges
axes[1, 0].scatter(x, y, s=80, c='steelblue', 
                   edgecolors='black', linewidths=1, alpha=0.8)
axes[1, 0].set_title('With Black Edges', fontweight='bold', fontsize=12)
axes[1, 0].grid(True, alpha=0.3)

# Different marker
axes[1, 1].scatter(x, y, s=100, c='green', marker='^', 
                   edgecolors='darkgreen', linewidths=1.5, alpha=0.7)
axes[1, 1].set_title('Triangle Markers', fontweight='bold', fontsize=12)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 2: Multiple groups
print("\n" + "="*70)
print("Example 2: Multiple Groups")
print("="*70)

fig, ax = plt.subplots(figsize=(10, 8))

# Generate three groups
n_points = 50
group1_x = np.random.randn(n_points) + 0
group1_y = np.random.randn(n_points) + 0

group2_x = np.random.randn(n_points) + 3
group2_y = np.random.randn(n_points) + 3

group3_x = np.random.randn(n_points) + 6
group3_y = np.random.randn(n_points) + 1

# Plot each group
ax.scatter(group1_x, group1_y, s=80, c='#E64B35', 
          edgecolors='black', linewidths=0.5, alpha=0.7, label='Group A')
ax.scatter(group2_x, group2_y, s=80, c='#4DBBD5', 
          edgecolors='black', linewidths=0.5, alpha=0.7, label='Group B')
ax.scatter(group3_x, group3_y, s=80, c='#00A087', 
          edgecolors='black', linewidths=0.5, alpha=0.7, label='Group C')

ax.set_title('Scatter Plot with Multiple Groups', fontsize=16, fontweight='bold')
ax.set_xlabel('Feature 1', fontsize=12)
ax.set_ylabel('Feature 2', fontsize=12)
ax.legend(fontsize=11, loc='upper left')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 3: Marker styles comparison
print("\n" + "="*70)
print("Example 3: Different Marker Styles")
print("="*70)

fig, ax = plt.subplots(figsize=(12, 8))

markers = ['o', 's', '^', 'v', 'D', '*', 'p', 'h']
marker_names = ['Circle', 'Square', 'Triangle Up', 'Triangle Down',
               'Diamond', 'Star', 'Pentagon', 'Hexagon']
colors = plt.cm.Set3(np.linspace(0, 1, len(markers)))

for i, (marker, name, color) in enumerate(zip(markers, marker_names, colors)):
    x_offset = (i % 4) * 3
    y_offset = (i // 4) * 3
    
    ax.scatter([x_offset], [y_offset], s=500, marker=marker, 
              c=[color], edgecolors='black', linewidths=2, alpha=0.8)
    ax.text(x_offset, y_offset - 0.8, name, ha='center', fontsize=11, fontweight='bold')

ax.set_xlim(-1, 10)
ax.set_ylim(-2, 5)
ax.set_title('Marker Style Gallery', fontsize=16, fontweight='bold')
ax.axis('off')

plt.tight_layout()
plt.show()

print("\n💡 Tips:")
print("   • scatter() allows per-point color/size control")
print("   • Use alpha for overlapping points")
print("   • Add edges for better definition")
print("   • Keep marker sizes reasonable (50-200)")

## 2. Color Mapping

### Continuous Color Mapping

Color points based on a continuous variable (3rd dimension):

```python
# Basic color mapping
scatter = ax.scatter(x, y, c=z, cmap='viridis')
plt.colorbar(scatter, ax=ax, label='Z values')
```

### Common Colormaps

```python
# Perceptually uniform sequential
'viridis'   # Default, perceptually uniform
'plasma'    # Purple to yellow
'inferno'   # Black to yellow
'magma'     # Black to pale yellow
'cividis'   # Colorblind-friendly

# Sequential (single hue)
'Blues', 'Greens', 'Reds', 'Oranges'

# Sequential (multi-hue)
'YlOrRd'    # Yellow-Orange-Red
'YlGnBu'    # Yellow-Green-Blue

# Diverging (two hues from center)
'RdBu'      # Red-Blue
'RdYlGn'    # Red-Yellow-Green
'coolwarm'  # Cool-Warm
'seismic'   # Blue-White-Red

# Qualitative (distinct colors)
'tab10', 'tab20'  # Tableau colors
'Set1', 'Set2', 'Set3'
'Paired', 'Accent'
```

### Color Scale Control

```python
# Set color limits
ax.scatter(x, y, c=z, vmin=0, vmax=100, cmap='viridis')

# Normalize manually
from matplotlib.colors import Normalize
norm = Normalize(vmin=z.min(), vmax=z.max())
ax.scatter(x, y, c=z, norm=norm, cmap='viridis')
```
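With a diverging colormap, the neutral color is only meaningful if it lines up with the data's midpoint. One way to pin it is `TwoSlopeNorm` from `matplotlib.colors` (available in Matplotlib 3.2 and later). This is a sketch on synthetic data with a deliberately lopsided range.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = rng.uniform(-2, 8, size=200)         # asymmetric range around zero

# Map z = 0 to the middle of the colormap even though |vmin| != |vmax|
norm = TwoSlopeNorm(vmin=z.min(), vcenter=0.0, vmax=z.max())

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=z, cmap='RdBu_r', norm=norm)
fig.colorbar(sc, ax=ax, label='z (centered at 0)')
```

Pass either `norm` or `vmin`/`vmax` to `scatter()`, not both; the limits belong inside the norm object when one is used.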

### Colorbar Customization

```python
scatter = ax.scatter(x, y, c=z, cmap='viridis')

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)

# Customize colorbar
cbar.set_label('Temperature (°C)', fontsize=12)
cbar.ax.tick_params(labelsize=10)
```

### Categorical Colors

```python
# Method 1: Manual colors
colors = ['red' if c == 'A' else 'blue' for c in categories]
ax.scatter(x, y, c=colors)

# Method 2: Numeric categories
categories_numeric = [0, 1, 0, 1, 2, 2, ...]
ax.scatter(x, y, c=categories_numeric, cmap='tab10')

# Method 3: Pandas categorical
colors = df['category'].map({'A': 'red', 'B': 'blue', 'C': 'green'})
ax.scatter(x, y, c=colors)
```
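With numeric category codes, a single `scatter()` call draws every category, so the legend has to be built from the codes. A minimal sketch using `PathCollection.legend_elements()` (Matplotlib 3.1 and later); the category names and data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
labels = ['A', 'B', 'C']                      # hypothetical category names
codes = rng.integers(0, len(labels), size=90) # one integer code per point
x = rng.normal(loc=codes * 2.0, scale=0.6)    # shift each group so they separate
y = rng.normal(size=90)

fig, ax = plt.subplots()
# vmin/vmax fixed to 0..9 so code k picks the k-th color of 'tab10'
sc = ax.scatter(x, y, c=codes, cmap='tab10', vmin=0, vmax=9)

# legend_elements() returns one handle per unique value in `c`, in sorted order
handles, _ = sc.legend_elements()
ax.legend(handles, labels, title='Category')
```

Plotting each category in its own `scatter()` call with `label=` gives the same legend with less bookkeeping; the single-call form is mainly useful when the codes are already in an array.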

### Reverse Colormap

```python
# Add '_r' to reverse
ax.scatter(x, y, c=z, cmap='viridis_r')
```

### Custom Colormap

```python
from matplotlib.colors import LinearSegmentedColormap

# Define colors
colors = ['blue', 'white', 'red']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('custom', colors, N=n_bins)

ax.scatter(x, y, c=z, cmap=cmap)
```

### Best Practices

```
✓ Use 'viridis' or 'plasma' as defaults
✓ Use diverging colormaps for data with natural center
✓ Add colorbar for continuous scales
✓ Use 'cividis' for colorblind accessibility
✗ Don't use 'jet' (not perceptually uniform)
✗ Don't use rainbow colors for sequential data
```

In [None]:
print("=== COLOR MAPPING ===\n")

# Generate data
np.random.seed(42)
n = 200
x = np.random.randn(n)
y = np.random.randn(n)
z = x**2 + y**2  # Color by distance from origin

# Example 1: Different colormaps
print("Example 1: Popular Colormaps")

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
colormaps = ['viridis', 'plasma', 'coolwarm', 'RdYlBu_r']
titles = ['Viridis (Default)', 'Plasma', 'Coolwarm (Diverging)', 'RdYlBu Reversed']

for ax, cmap, title in zip(axes.flat, colormaps, titles):
    scatter = ax.scatter(x, y, c=z, s=50, cmap=cmap, alpha=0.7, edgecolors='black', linewidths=0.5)
    ax.set_title(title, fontweight='bold', fontsize=12)
    plt.colorbar(scatter, ax=ax, label='Distance²')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 2: Continuous color mapping with statistics
print("\n" + "="*70)
print("Example 2: Temperature vs Humidity (Color = Air Quality Index)")
print("="*70)

fig, ax = plt.subplots(figsize=(12, 8))

# Generate realistic data
temperature = np.random.normal(25, 5, 150)
humidity = np.random.normal(60, 15, 150)
aqi = 50 + 2*temperature + 0.5*humidity + np.random.randn(150)*10  # Air Quality Index

scatter = ax.scatter(temperature, humidity, c=aqi, s=80, 
                    cmap='RdYlGn_r', alpha=0.7, 
                    edgecolors='black', linewidths=0.5)

# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Air Quality Index (AQI)', fontsize=12, fontweight='bold')
cbar.ax.tick_params(labelsize=10)

ax.set_title('Temperature vs Humidity\nColored by Air Quality Index', 
            fontsize=16, fontweight='bold')
ax.set_xlabel('Temperature (°C)', fontsize=12)
ax.set_ylabel('Humidity (%)', fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 3: Categorical colors
print("\n" + "="*70)
print("Example 3: Categorical Color Mapping")
print("="*70)

fig, ax = plt.subplots(figsize=(10, 8))

# Generate synthetic categorical data (random points labeled with iris species names)
categories = np.random.choice(['Setosa', 'Versicolor', 'Virginica'], 150)
x = np.random.randn(150)
y = np.random.randn(150)

# Color map
color_map = {'Setosa': '#E64B35', 'Versicolor': '#4DBBD5', 'Virginica': '#00A087'}
colors = [color_map[cat] for cat in categories]

# Plot each category
for category in color_map.keys():
    mask = categories == category
    ax.scatter(x[mask], y[mask], c=color_map[category], s=80,
              label=category, alpha=0.7, edgecolors='black', linewidths=0.5)

ax.set_title('Categorical Scatter Plot (Iris Species)', fontsize=16, fontweight='bold')
ax.set_xlabel('Sepal Length', fontsize=12)
ax.set_ylabel('Sepal Width', fontsize=12)
ax.legend(title='Species', fontsize=11, title_fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Tips:")
print("   • Use 'viridis' or 'plasma' as default colormaps")
print("   • Use diverging colormaps for data with natural center")
print("   • Always add colorbar for continuous color scales")
print("   • Use 'cividis' for colorblind-friendly plots")

## 3. Bubble Charts

**Bubble charts** are scatter plots where marker size represents a third variable, allowing visualization of 3-4 dimensions simultaneously.

### Basic Bubble Chart

```python
# Size represents third variable
ax.scatter(x, y, s=sizes)

# With color as 4th dimension
ax.scatter(x, y, s=sizes, c=colors, cmap='viridis')
```

### Size Scaling

```python
# Direct sizing (area in points²)
sizes = [20, 50, 100, 200, 500]

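# Scale from data
sizes = population / 10000  # Scale down large values
sizes = (values - values.min()) / (values.max() - values.min()) * 500 + 20  # Min-max: 20 to 520

# Square-root scaling (compresses skewed data).
# Note: s is already an area, so s proportional to values keeps area proportional to the data.
sizes = np.sqrt(values) * 10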
```
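The min-max formula can be wrapped in a small helper so the same mapping is reused for the data and for legend entries. `scale_to_area` is a hypothetical name, and the 20 to 500 points² range is an arbitrary choice for this sketch.

```python
import numpy as np

def scale_to_area(values, min_area=20.0, max_area=500.0):
    """Linearly map values onto marker areas in [min_area, max_area] (points^2)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:                       # all values equal: use the midpoint area
        return np.full(values.shape, (min_area + max_area) / 2)
    return (values - values.min()) / span * (max_area - min_area) + min_area

population = np.array([1e5, 5e5, 2e6, 1e7])   # illustrative values only
sizes = scale_to_area(population)
print(sizes.round(1))
```

The result can be passed straight to `ax.scatter(x, y, s=sizes)`. The guard for a zero span avoids a division by zero when every value is identical.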

### Size Legend

```python
# Create size legend
for size in [50, 100, 200, 500]:
    ax.scatter([], [], s=size, c='gray', 
              label=f'{size}', alpha=0.6)
ax.legend(title='Size', loc='upper left')
```
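As an alternative to plotting empty placeholder series, `PathCollection.legend_elements(prop='sizes')` (Matplotlib 3.1 and later) can generate legend handles from the sizes that were actually plotted. The data here are synthetic, and `num=4` is only a requested number of entries; Matplotlib may choose a nearby count with round values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=40)
y = rng.uniform(0, 10, size=40)
sizes = rng.uniform(50, 500, size=40)    # marker areas in points^2

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c='gray', alpha=0.6)

# Ask for roughly four representative sizes for the legend
handles, labels = sc.legend_elements(prop='sizes', num=4,
                                     color='gray', alpha=0.6)
ax.legend(handles, labels, title='Size', loc='upper left')
```

The labels show marker areas, not the underlying data values. If the sizes were scaled from data, the `func` argument of `legend_elements()` can apply the inverse of that scaling so the legend reads in data units.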

### 4D Visualization

```python
# x, y = position
# size = 3rd dimension
# color = 4th dimension
scatter = ax.scatter(x, y, s=sizes, c=colors, 
                    cmap='viridis', alpha=0.6)
plt.colorbar(scatter, ax=ax, label='4th Dimension')
```

### Best Practices

```
✓ Use size for important quantitative variable
✓ Scale sizes appropriately (not too large/small)
✓ Add transparency for overlapping bubbles
✓ Include size legend
✓ Use edges for better definition
✗ Don't use size for too many discrete levels
✗ Don't make bubbles too large (overlap issue)
```

In [None]:
print("=== BUBBLE CHARTS ===\n")

# Example 1: Simple bubble chart
print("Example 1: GDP vs Life Expectancy (Bubble = Population)")

fig, ax = plt.subplots(figsize=(12, 8))

# Generate realistic data
np.random.seed(42)
n_countries = 50
gdp = np.random.lognormal(10, 1, n_countries)  # GDP per capita
life_exp = 30 + 4.5 * np.log(gdp) + np.random.randn(n_countries) * 3  # centered near 75 years
population = np.random.lognormal(15, 2, n_countries)  # Population

# Scale population for bubble sizes
sizes = (population - population.min()) / (population.max() - population.min()) * 1000 + 50

scatter = ax.scatter(gdp, life_exp, s=sizes, 
                    c='steelblue', alpha=0.5,
                    edgecolors='darkblue', linewidths=1.5)

ax.set_title('GDP per Capita vs Life Expectancy\nBubble size = Population', 
            fontsize=16, fontweight='bold')
ax.set_xlabel('GDP per Capita ($)', fontsize=12)
ax.set_ylabel('Life Expectancy (years)', fontsize=12)
ax.grid(True, alpha=0.3)

# Add size legend
pop_sizes = [population.min(), population.mean(), population.max()]
size_labels = ['Small', 'Medium', 'Large']
for pop, label in zip(pop_sizes, size_labels):
    size = (pop - population.min()) / (population.max() - population.min()) * 1000 + 50
    ax.scatter([], [], s=size, c='steelblue', alpha=0.5, 
              edgecolors='darkblue', linewidths=1.5, label=label)
ax.legend(title='Population', loc='lower right', fontsize=10)

plt.tight_layout()
plt.show()

# Example 2: 4D visualization
print("\n" + "="*70)
print("Example 2: 4D Bubble Chart (x, y, size, color)")
print("="*70)

fig, ax = plt.subplots(figsize=(12, 8))

# Generate 4 dimensions
n = 100
x = np.random.randn(n) * 10 + 50  # Feature 1
y = np.random.randn(n) * 15 + 100  # Feature 2
sizes = np.random.rand(n) * 500 + 50  # Feature 3
colors = np.random.rand(n)  # Feature 4

scatter = ax.scatter(x, y, s=sizes, c=colors, 
                    cmap='viridis', alpha=0.6,
                    edgecolors='black', linewidths=0.5)

# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Feature 4 (Color)', fontsize=12, fontweight='bold')

ax.set_title('4D Bubble Chart\nPosition (x, y) + Size + Color', 
            fontsize=16, fontweight='bold')
ax.set_xlabel('Feature 1', fontsize=12)
ax.set_ylabel('Feature 2', fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 3: Categorical bubble chart
print("\n" + "="*70)
print("Example 3: Sales Performance by Region")
print("="*70)

fig, ax = plt.subplots(figsize=(12, 8))

# Data for different regions
regions = ['North', 'South', 'East', 'West']
colors_map = {'North': '#E64B35', 'South': '#4DBBD5', 
             'East': '#00A087', 'West': '#F39B7F'}

for region in regions:
    n = 15
    revenue = np.random.lognormal(12, 0.5, n)
    profit_margin = np.random.normal(0.15, 0.05, n)
    market_share = np.random.rand(n) * 1000 + 100
    
    ax.scatter(revenue, profit_margin, s=market_share,
              c=colors_map[region], label=region, alpha=0.6,
              edgecolors='black', linewidths=1)

ax.set_title('Sales Performance by Region\nBubble size = Market Share', 
            fontsize=16, fontweight='bold')
ax.set_xlabel('Revenue ($)', fontsize=12)
ax.set_ylabel('Profit Margin', fontsize=12)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Region', fontsize=11, title_fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Tips:")
print("   • Scale bubble sizes appropriately")
print("   • Use transparency for overlapping bubbles")
print("   • Add size legend for interpretation")
print("   • Combine size + color for 4D visualization")

## 4. Correlations & Trend Lines

### Linear Regression Line

```python
from scipy import stats

# Calculate regression
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Create line
line_x = np.array([x.min(), x.max()])
line_y = slope * line_x + intercept

# Plot
ax.plot(line_x, line_y, 'r--', linewidth=2, label=f'R² = {r_value**2:.3f}')
```

### Polynomial Fit

```python
# Fit polynomial
coeffs = np.polyfit(x, y, deg=2)  # degree 2 (quadratic)
poly = np.poly1d(coeffs)

# Create smooth line
x_line = np.linspace(x.min(), x.max(), 100)
y_line = poly(x_line)

ax.plot(x_line, y_line, 'r--', linewidth=2)
```

### Confidence Intervals

```python
from scipy import stats

# Confidence band for the fitted line (the mean response).
# A prediction interval for new points would use np.sqrt(1 + 1/n + ...) instead.
def confidence_band(x, y, x_pred, confidence=0.95):
    n = len(x)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    
    # Predicted y
    y_pred = slope * x_pred + intercept
    
    # Residual standard error
    residuals = y - (slope * x + intercept)
    s_res = np.sqrt(np.sum(residuals**2) / (n - 2))
    
    # Margin
    t = stats.t.ppf((1 + confidence) / 2, n - 2)
    margin = t * s_res * np.sqrt(1/n + (x_pred - x.mean())**2 / np.sum((x - x.mean())**2))
    
    return y_pred - margin, y_pred + margin

# Plot confidence band
lower, upper = confidence_band(x, y, x_line)
ax.fill_between(x_line, lower, upper, alpha=0.2, label='95% CI')
```
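Two different bands are often drawn around a regression line: a confidence band for the fitted mean, and a wider prediction band for individual new observations, which adds a 1 under the square root. The sketch below computes both under the usual ordinary-least-squares assumptions; `regression_bands` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def regression_bands(x, y, x_new, confidence=0.95):
    """Return (fit, confidence half-width, prediction half-width) at x_new."""
    x, y, x_new = map(np.asarray, (x, y, x_new))
    n = len(x)
    fit = stats.linregress(x, y)
    y_new = fit.slope * x_new + fit.intercept

    residuals = y - (fit.slope * x + fit.intercept)
    s_res = np.sqrt(np.sum(residuals**2) / (n - 2))      # residual std. error
    t = stats.t.ppf((1 + confidence) / 2, n - 2)
    leverage = 1 / n + (x_new - x.mean())**2 / np.sum((x - x.mean())**2)

    ci = t * s_res * np.sqrt(leverage)        # uncertainty of the mean response
    pi = t * s_res * np.sqrt(1 + leverage)    # uncertainty of a new observation
    return y_new, ci, pi

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=80)
y = 2 * x + 5 + rng.normal(scale=2, size=80)
x_line = np.linspace(x.min(), x.max(), 50)
y_fit, ci, pi = regression_bands(x, y, x_line)

fig, ax = plt.subplots()
ax.scatter(x, y, s=30, alpha=0.6)
ax.plot(x_line, y_fit, 'r--', label='Linear fit')
ax.fill_between(x_line, y_fit - pi, y_fit + pi, alpha=0.15, label='95% prediction')
ax.fill_between(x_line, y_fit - ci, y_fit + ci, alpha=0.35, label='95% confidence')
ax.legend()
```

Roughly 95% of the points should fall inside the prediction band, whereas the confidence band only describes where the true line is likely to lie.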

### Correlation Coefficient

```python
# Pearson correlation
corr = np.corrcoef(x, y)[0, 1]
print(f'Correlation: {corr:.3f}')

# With p-value
corr, p_value = stats.pearsonr(x, y)
print(f'Correlation: {corr:.3f}, p-value: {p_value:.4f}')

# Spearman (rank) correlation
corr, p_value = stats.spearmanr(x, y)
```
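The two coefficients answer different questions: Pearson measures linear association, Spearman measures monotonic association. A small sketch on synthetic data, where `y` is a strictly increasing but curved function of `x`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(0, 5, size=100)
y = np.exp(x)                     # strictly increasing but strongly curved

pearson_r, _ = stats.pearsonr(x, y)
spearman_r, _ = stats.spearmanr(x, y)

print(f'Pearson:  {pearson_r:.3f}')    # below 1: the relationship is not linear
print(f'Spearman: {spearman_r:.3f}')   # 1.000: the ranks agree perfectly
```

A large gap between the two is a hint to try a transformation or a non-linear fit before drawing a straight trend line.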

### Annotating Statistics

```python
# Add statistics to plot
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

text = f'y = {slope:.2f}x + {intercept:.2f}\n'
text += f'R² = {r_value**2:.3f}\n'
text += f'p = {p_value:.2e}'

ax.text(0.05, 0.95, text, transform=ax.transAxes,
       fontsize=12, verticalalignment='top',
       bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
```

### Multiple Regression Lines

```python
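# One regression line per group
for group in np.unique(categories):
    mask = categories == group
    x_g, y_g = x[mask], y[mask]
    
    slope, intercept, r, p, se = stats.linregress(x_g, y_g)
    line_x = np.array([x_g.min(), x_g.max()])  # stay within this group's data range
    line_y = slope * line_x + intercept
    
    ax.plot(line_x, line_y, '--', linewidth=2, label=f'{group} (R² = {r**2:.2f})')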
```

### Best Practices

```
✓ Always visualize data before fitting
✓ Report R² and p-value
✓ Use confidence intervals
✓ Check for outliers
✓ Consider non-linear relationships
✗ Don't extrapolate beyond data range
✗ Don't assume causation from correlation
```

In [None]:
print("=== CORRELATIONS & TREND LINES ===\n")

# Example 1: Linear regression with confidence interval
print("Example 1: Linear Regression with Confidence Interval")

fig, ax = plt.subplots(figsize=(12, 8))

# Generate data with linear relationship
np.random.seed(42)
n = 100
x = np.random.uniform(0, 10, n)
y = 2 * x + 5 + np.random.randn(n) * 2

# Scatter plot
ax.scatter(x, y, s=60, alpha=0.6, edgecolors='black', linewidths=0.5)

# Linear regression
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Regression line
x_line = np.linspace(x.min(), x.max(), 100)
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r--', linewidth=2.5, label='Linear Fit')

# Confidence interval
residuals = y - (slope * x + intercept)
s_res = np.sqrt(np.sum(residuals**2) / (n - 2))
t = stats.t.ppf(0.975, n - 2)  # 95% confidence
margin = t * s_res * np.sqrt(1/n + (x_line - x.mean())**2 / np.sum((x - x.mean())**2))
ax.fill_between(x_line, y_line - margin, y_line + margin, 
                alpha=0.2, color='red', label='95% CI')

# Add statistics
text = f'y = {slope:.2f}x + {intercept:.2f}\n'
text += f'R² = {r_value**2:.3f}\n'
text += f'p = {p_value:.2e}'
ax.text(0.05, 0.95, text, transform=ax.transAxes,
       fontsize=12, verticalalignment='top',
       bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7))

ax.set_title('Linear Regression with 95% Confidence Interval', 
            fontsize=16, fontweight='bold')
ax.set_xlabel('X Variable', fontsize=12)
ax.set_ylabel('Y Variable', fontsize=12)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 2: Multiple correlations
print("\n" + "="*70)
print("Example 2: Correlation Strength Comparison")
print("="*70)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Different correlation strengths
correlations = [0.95, 0.7, 0.3, 0.05]
titles = ['Strong Positive (r=0.95)', 'Moderate Positive (r=0.70)', 
         'Weak Positive (r=0.30)', 'No Correlation (r=0.05)']

for ax, corr, title in zip(axes.flat, correlations, titles):
    # Generate correlated data
    x = np.random.randn(100)
    y = corr * x + np.sqrt(1 - corr**2) * np.random.randn(100)
    
    # Scatter
    ax.scatter(x, y, s=50, alpha=0.6, edgecolors='black', linewidths=0.5)
    
    # Regression line
    slope, intercept, r, p, se = stats.linregress(x, y)
    x_line = np.array([x.min(), x.max()])
    y_line = slope * x_line + intercept
    ax.plot(x_line, y_line, 'r--', linewidth=2, label=f'R² = {r**2:.3f}')
    
    ax.set_title(title, fontweight='bold', fontsize=11)
    ax.legend(fontsize=10)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 3: Polynomial fit
print("\n" + "="*70)
print("Example 3: Polynomial vs Linear Fit")
print("="*70)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Generate non-linear data
x = np.linspace(-3, 3, 100)
y = x**2 - 2*x + 1 + np.random.randn(100) * 2

# Linear fit
axes[0].scatter(x, y, s=50, alpha=0.6, edgecolors='black', linewidths=0.5)
slope, intercept, r, p, se = stats.linregress(x, y)
y_lin = slope * x + intercept
axes[0].plot(x, y_lin, 'r--', linewidth=2.5, label=f'Linear (R²={r**2:.3f})')
axes[0].set_title('Linear Fit (Poor)', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Polynomial fit
axes[1].scatter(x, y, s=50, alpha=0.6, edgecolors='black', linewidths=0.5)
coeffs = np.polyfit(x, y, 2)
poly = np.poly1d(coeffs)
y_poly = poly(x)
ss_res = np.sum((y - y_poly)**2)
ss_tot = np.sum((y - y.mean())**2)
r_squared = 1 - (ss_res / ss_tot)
axes[1].plot(x, y_poly, 'g--', linewidth=2.5, label=f'Quadratic (R²={r_squared:.3f})')
axes[1].set_title('Polynomial Fit (Better)', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Tips:")
print("   • Always visualize data before fitting")
print("   • Report R² and p-value")
print("   • Use confidence intervals")
print("   • Consider non-linear relationships")

## 5. Advanced Scatter Techniques

### Density Scatter Plots

For large datasets with overlapping points:

```python
# Method 1: Hexbin
hb = ax.hexbin(x, y, gridsize=30, cmap='Blues')
plt.colorbar(hb, ax=ax, label='Count')  # pass the mappable: ax.hexbin does not register it with pyplot

# Method 2: 2D histogram
h = ax.hist2d(x, y, bins=50, cmap='Blues')
plt.colorbar(h[3], ax=ax, label='Count')

# Method 3: Color each point by its kernel density estimate
from scipy.stats import gaussian_kde
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)
ax.scatter(x, y, c=z, s=20, cmap='viridis')
```
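When points are colored by a kernel density estimate, drawing order matters: sorting by density puts the densest points on top so they are not hidden by sparse ones. This is a sketch on synthetic data. Note that `gaussian_kde` evaluates every point against every other, so it slows down for very large samples.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
x = rng.normal(size=2000)
y = 0.6 * x + rng.normal(scale=0.8, size=2000)

# Estimate the point density at each (x, y) location
xy = np.vstack([x, y])
density = gaussian_kde(xy)(xy)

# Sort so the densest points are plotted last (on top)
order = density.argsort()
x_s, y_s, d_s = x[order], y[order], density[order]

fig, ax = plt.subplots()
sc = ax.scatter(x_s, y_s, c=d_s, s=15, cmap='viridis')
fig.colorbar(sc, ax=ax, label='Estimated density')
```

For tens of thousands of points or more, `hexbin` or `hist2d` is usually the more practical choice.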

### Marginal Distributions

```python
# Using GridSpec
import matplotlib.gridspec as gridspec

gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:, :-1])
ax_xhist = plt.subplot(gs[0, :-1], sharex=ax_main)
ax_yhist = plt.subplot(gs[1:, -1], sharey=ax_main)

# Main scatter
ax_main.scatter(x, y)

# Marginal histograms
ax_xhist.hist(x, bins=30)
ax_yhist.hist(y, bins=30, orientation='horizontal')
```

### Error Bars

```python
# Scatter with error bars
ax.errorbar(x, y, xerr=x_err, yerr=y_err,
           fmt='o', markersize=6,
           ecolor='gray', capsize=3, capthick=1)
```
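A runnable version with made-up measurements is sketched below. It assumes the error values are one standard deviation; `x_err` and `y_err` are illustrative arrays, not values from a real experiment.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = np.arange(1, 9, dtype=float)
y = 1.5 * x + rng.normal(scale=1.0, size=x.size)
x_err = np.full(x.size, 0.2)                 # constant, symmetric x uncertainty
y_err = rng.uniform(0.5, 1.5, size=x.size)   # per-point y uncertainty

fig, ax = plt.subplots()
container = ax.errorbar(x, y, xerr=x_err, yerr=y_err,
                        fmt='o', markersize=6,
                        ecolor='gray', capsize=3, capthick=1,
                        label='Measurement ± 1σ')
ax.legend()
```

For asymmetric errors, `xerr` and `yerr` also accept an array of shape `(2, N)`, with the lower errors in the first row and the upper errors in the second.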

### Jitter for Categorical Data

```python
# Add random jitter to avoid overlap
categories = [0, 1, 2, 0, 1, 2, ...]
jitter = 0.1
x_jittered = categories + np.random.uniform(-jitter, jitter, len(categories))
ax.scatter(x_jittered, values)
```
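The jitter step can be wrapped in a small seeded helper so the plot looks the same on every run. `add_jitter` is a hypothetical name; the sketch assumes categories are encoded as integers spaced one unit apart, so a width below 0.5 keeps neighboring groups from overlapping.

```python
import numpy as np

def add_jitter(codes, width=0.1, seed=None):
    """Return category codes shifted by uniform noise in [-width, width]."""
    rng = np.random.default_rng(seed)
    codes = np.asarray(codes, dtype=float)
    return codes + rng.uniform(-width, width, size=codes.shape)

codes = np.repeat([0, 1, 2], 20)             # 20 points in each of 3 categories
x_jittered = add_jitter(codes, width=0.1, seed=8)
```

Only the categorical axis is jittered; the measured values stay untouched, so the vertical positions remain exact.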

### 3D Scatter Plots

```python
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; only required for Matplotlib < 3.2

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(x, y, z, c=colors, marker='o', s=50)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
```

### Animated Scatter

```python
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
scatter = ax.scatter([], [])

def update(frame):
    # Update scatter data
    scatter.set_offsets(data[frame])
    return scatter,

anim = FuncAnimation(fig, update, frames=n_frames, interval=50)
```
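A self-contained version needs the frame data and fixed axis limits, since an empty scatter does not autoscale. The sketch below animates a synthetic random walk; `data`, `n_frames`, and `n_points` are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(9)
n_frames, n_points = 60, 30
# Random walk: data[frame] is an (n_points, 2) array of x, y positions
data = np.cumsum(rng.normal(scale=0.1, size=(n_frames, n_points, 2)), axis=0)

fig, ax = plt.subplots()
ax.set_xlim(data[..., 0].min() - 0.5, data[..., 0].max() + 0.5)
ax.set_ylim(data[..., 1].min() - 0.5, data[..., 1].max() + 0.5)
scatter = ax.scatter(data[0, :, 0], data[0, :, 1], s=40)

def update(frame):
    # set_offsets expects an (N, 2) array of positions
    scatter.set_offsets(data[frame])
    return (scatter,)

anim = FuncAnimation(fig, update, frames=n_frames, interval=50, blit=True)
```

Keep a reference to `anim` for as long as the animation should run, otherwise it may be garbage-collected. In a notebook, `HTML(anim.to_jshtml())` with `from IPython.display import HTML` is one way to display it inline.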

### Connected Scatter (Path)

```python
# Show trajectory
ax.plot(x, y, 'o-', linewidth=1, markersize=6, alpha=0.6)

# Or with arrows
for i in range(len(x)-1):
    ax.annotate('', xy=(x[i+1], y[i+1]), xytext=(x[i], y[i]),
               arrowprops=dict(arrowstyle='->', lw=1))
```

### Best Practices

```
✓ Use hexbin/hist2d for large datasets (> 1000 points)
✓ Add marginal distributions for context
✓ Use jitter for categorical/discrete data
✓ Consider 3D for 3 continuous variables
✗ Don't use 3D unnecessarily (harder to read)
✗ Don't overplot without transparency
```

In [None]:
print("=== ADVANCED SCATTER TECHNIQUES ===\n")

# Example 1: Density scatter (hexbin)
print("Example 1: Density Scatter Plot (Large Dataset)")

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Generate large dataset
np.random.seed(42)
n = 10000
x = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], n).T

# Regular scatter (hard to see density)
axes[0].scatter(x[0], x[1], s=5, alpha=0.3)
axes[0].set_title('Regular Scatter (10,000 points)\nHard to see density', 
                 fontweight='bold', fontsize=12)
axes[0].grid(True, alpha=0.3)

# Hexbin (shows density)
hb = axes[1].hexbin(x[0], x[1], gridsize=30, cmap='YlOrRd', mincnt=1)
axes[1].set_title('Hexbin Plot\nClearly shows density', 
                 fontweight='bold', fontsize=12)
plt.colorbar(hb, ax=axes[1], label='Count')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example 2: Scatter with marginal distributions
print("\n" + "="*70)
print("Example 2: Scatter with Marginal Histograms")
print("="*70)

import matplotlib.gridspec as gridspec

# Create figure
fig = plt.figure(figsize=(10, 10))
gs = gridspec.GridSpec(3, 3, hspace=0.05, wspace=0.05)

ax_main = fig.add_subplot(gs[1:, :-1])
ax_xhist = fig.add_subplot(gs[0, :-1], sharex=ax_main)
ax_yhist = fig.add_subplot(gs[1:, -1], sharey=ax_main)

# Generate data
x = np.random.randn(500)
y = 0.5 * x + np.random.randn(500) * 0.5

# Main scatter
ax_main.scatter(x, y, s=30, alpha=0.5, edgecolors='black', linewidths=0.3)
ax_main.set_xlabel('X Variable', fontsize=12)
ax_main.set_ylabel('Y Variable', fontsize=12)
ax_main.grid(True, alpha=0.3)

# Marginal histograms
ax_xhist.hist(x, bins=30, color='steelblue', alpha=0.7)
ax_xhist.set_ylabel('Frequency', fontsize=10)
ax_xhist.tick_params(labelbottom=False)
ax_xhist.grid(True, alpha=0.3, axis='y')

ax_yhist.hist(y, bins=30, orientation='horizontal', color='coral', alpha=0.7)
ax_yhist.set_xlabel('Frequency', fontsize=10)
ax_yhist.tick_params(labelleft=False)
ax_yhist.grid(True, alpha=0.3, axis='x')

fig.suptitle('Scatter with Marginal Distributions', 
            fontsize=16, fontweight='bold', y=0.98)

plt.show()

# Example 3: 3D scatter plot
print("\n" + "="*70)
print("Example 3: 3D Scatter Plot")
print("="*70)

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; only required for Matplotlib < 3.2

fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(111, projection='3d')

# Generate 3D data
n = 300
x = np.random.randn(n)
y = np.random.randn(n)
z = x**2 + y**2 + np.random.randn(n) * 0.5
colors = z

scatter = ax.scatter(x, y, z, c=colors, cmap='viridis', 
                    s=50, alpha=0.6, edgecolors='black', linewidths=0.5)

ax.set_xlabel('X Variable', fontsize=12)
ax.set_ylabel('Y Variable', fontsize=12)
ax.set_zlabel('Z Variable', fontsize=12)
ax.set_title('3D Scatter Plot', fontsize=16, fontweight='bold', pad=20)

# Colorbar
cbar = plt.colorbar(scatter, ax=ax, pad=0.1, shrink=0.8)
cbar.set_label('Color Scale', fontsize=10)

plt.tight_layout()
plt.show()

# Example 4: Jittered categorical scatter
print("\n" + "="*70)
print("Example 4: Categorical Scatter with Jitter")
print("="*70)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Generate categorical data
categories = np.random.choice([0, 1, 2], 150)
values = categories * 10 + np.random.randn(150) * 3

# Without jitter (overlapping)
axes[0].scatter(categories, values, s=50, alpha=0.6)
axes[0].set_title('Without Jitter\n(Points overlap)', 
                 fontweight='bold', fontsize=12)
axes[0].set_xlabel('Category')
axes[0].set_ylabel('Value')
axes[0].set_xticks([0, 1, 2])
axes[0].set_xticklabels(['Group A', 'Group B', 'Group C'])
axes[0].grid(True, alpha=0.3, axis='y')

# With jitter (spread out)
jitter = 0.15
x_jittered = categories + np.random.uniform(-jitter, jitter, len(categories))
axes[1].scatter(x_jittered, values, s=50, alpha=0.6)
axes[1].set_title('With Jitter\n(Better visibility)', 
                 fontweight='bold', fontsize=12)
axes[1].set_xlabel('Category')
axes[1].set_ylabel('Value')
axes[1].set_xticks([0, 1, 2])
axes[1].set_xticklabels(['Group A', 'Group B', 'Group C'])
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n💡 Tips:")
print("   • Use hexbin for large datasets (>1000 points)")
print("   • Add marginal distributions for context")
print("   • Use jitter for categorical data")
print("   • 3D plots are cool but harder to interpret")

## Practice Exercises

### Beginner Level

**1. Basic Scatter**
```
Create a scatter plot with:
  • 100 random points
  • Size = 80
  • Color = 'steelblue'
  • alpha = 0.7 (70% opaque)
  • Black edges
```

**2. Multiple Groups**
```
Create scatter plot with 3 groups:
  • Different colors for each
  • Add legend
  • Use consistent marker sizes
```

**3. Marker Styles**
```
Compare different markers:
  • Circle, square, triangle
  • Same color, different shapes
  • Add legend for each
```

**4. Color Mapping**
```
Create scatter with continuous color:
  • Use 'viridis' colormap
  • Add colorbar
  • Label the colorbar
```

**5. Edge Customization**
```
Create scatter with:
  • White edges (pop effect)
  • 2pt edge width
  • Colored face
```

### Intermediate Level

**6. Simple Bubble Chart**
```
Create bubble chart where:
  • x, y = position
  • Size varies (50-500)
  • Add size legend
  • Use transparency
```

**7. Linear Regression**
```
Create scatter with trend line:
  • Calculate slope, intercept
  • Plot regression line
  • Show R² value
  • Add equation as text
```

**8. Categorical Colors**
```
Visualize iris dataset:
  • 3 species, different colors
  • Sepal length vs width
  • Add legend
  • Use professional colors
```

**9. Color Scale Control**
```
Create scatter with:
  • Custom vmin, vmax
  • Diverging colormap
  • Center at 0
  • Add colorbar
```

**10. Overlapping Points**
```
Handle 1000+ overlapping points:
  • Use alpha = 0.3
  • Small markers (s=20)
  • Or use hexbin
```

### Advanced Level

**11. 4D Visualization**
```
Create bubble chart with:
  • x, y = position
  • Size = 3rd dimension
  • Color = 4th dimension
  • Add both colorbar and size legend
```

**12. Confidence Interval**
```
Create regression plot with:
  • Scatter points
  • Regression line
  • 95% confidence interval (shaded)
  • Statistics in text box
```

**13. Marginal Distributions**
```
Create plot with:
  • Central scatter
  • Top histogram (x distribution)
  • Right histogram (y distribution)
  • Aligned axes
```

**14. Density Scatter**
```
For 10,000 points:
  • Compare regular scatter vs hexbin
  • Side-by-side plots
  • Show density advantage
```

**15. Scatter Plot Matrix**
```
Create scatter plot matrix:
  • 3-4 variables
  • All pairwise comparisons
  • Diagonal = histograms
  • Color by category
```

### Challenge Problems

**16. Automatic Outlier Annotations**
```
Create scatter with:
  • Automatically annotate outliers
  • Mark points > 2 std from mean
  • Add arrows to outliers
  • Label with coordinates
```

**17. Time Series Scatter**
```
Show trajectory over time:
  • Connected scatter points
  • Color gradient by time
  • Arrows showing direction
  • Mark start/end
```

**18. Multi-Panel Analysis**
```
Create 2×2 subplot:
  • Raw scatter
  • With regression line
  • Residuals plot
  • QQ plot
```

**19. Custom Colormap**
```
Design custom colormap:
  • Corporate colors
  • Apply to scatter
  • Test colorblind-safe
  • Add professional colorbar
```

**20. Complete Analysis Dashboard**
```
Build comprehensive scatter analysis:
  • Multiple scatter plots
  • Different views of same data
  • Statistics annotations
  • Professional styling
  • Consistent color scheme
```

## Quick Reference Card

### Basic Scatter

```python
# Simple
ax.scatter(x, y)

# Customized
ax.scatter(x, y,
          s=100,              # Size
          c='blue',           # Color
          marker='o',         # Marker
          alpha=0.7,          # Transparency
          edgecolors='black', # Edge color
          linewidths=1,       # Edge width
          label='Data')       # Legend
```

### Color Mapping

```python
# Continuous
scatter = ax.scatter(x, y, c=z, cmap='viridis')
plt.colorbar(scatter, label='Z values')

# With limits
ax.scatter(x, y, c=z, vmin=0, vmax=100, cmap='viridis')

# Categorical
colors = ['red' if c == 'A' else 'blue' for c in categories]
ax.scatter(x, y, c=colors)
```
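
When the color values straddle zero, a diverging colormap is easier to read if its neutral midpoint sits exactly at 0. The following is a minimal sketch, assuming Matplotlib 3.2 or later (where `TwoSlopeNorm` is available). The data and variable names are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Synthetic signed values (illustrative only): mostly positive, some negative
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = x - 0.5 * y + 1.0

# Pin the colormap midpoint to 0 even though the range is asymmetric.
# TwoSlopeNorm requires vmin < vcenter < vmax.
norm = TwoSlopeNorm(vmin=z.min(), vcenter=0.0, vmax=z.max())

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=z, cmap='RdBu_r', norm=norm, s=60)
fig.colorbar(sc, ax=ax, label='Z values (centered at 0)')
```

Pass either `norm` or `vmin`/`vmax` to `scatter`, not both; the norm already carries the limits.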

### Colormaps

```python
# Sequential
'viridis', 'plasma', 'inferno', 'magma', 'cividis'
'Blues', 'Greens', 'Reds', 'YlOrRd'

# Diverging
'RdBu', 'RdYlGn', 'coolwarm', 'seismic'

# Qualitative
'tab10', 'tab20', 'Set1', 'Set2', 'Paired'

# Reverse
'viridis_r', 'RdBu_r'
```
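
If none of the built-in names fit, a colormap can also be built from a short list of colors. The sketch below uses `LinearSegmentedColormap.from_list`; the hex codes and the name `'brand'` are hypothetical placeholders, not recommended values.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical brand colors, listed from low to high values
brand_colors = ["#0b3c5d", "#f2f2f2", "#d9822b"]
brand_cmap = LinearSegmentedColormap.from_list("brand", brand_colors)

# A colormap is callable: it maps numbers in [0, 1] to RGBA tuples
low, high = brand_cmap(0.0), brand_cmap(1.0)

# .reversed() returns a new colormap, similar to appending '_r' to a built-in name
brand_cmap_r = brand_cmap.reversed()
```

The resulting object can be passed anywhere a colormap name is accepted, for example `ax.scatter(x, y, c=z, cmap=brand_cmap)`.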

### Bubble Charts

```python
# Variable sizes
sizes = [50, 100, 200, 500, 1000]  # One entry per point (here: 5 points)
ax.scatter(x, y, s=sizes)

# 4D visualization
scatter = ax.scatter(x, y, s=sizes, c=colors, 
                    cmap='viridis', alpha=0.6)
plt.colorbar(scatter)
```
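
Raw data values rarely fall in a range that works directly as marker sizes. Because `s` is the marker area in points², one common approach is to rescale the values linearly into a readable range such as 50 to 500. A minimal sketch with synthetic data follows; `population` is a hypothetical variable name.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = rng.uniform(0, 10, 40)
population = rng.lognormal(mean=10, sigma=1, size=40)  # hypothetical raw values

# Rescale raw values linearly into the marker-area range [s_min, s_max]
s_min, s_max = 50, 500
sizes = np.interp(population,
                  (population.min(), population.max()),
                  (s_min, s_max))

fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes, alpha=0.6, edgecolors='black', linewidths=0.5)
```

With this mapping the bubble area, not the diameter, grows in proportion to the data, which is usually what readers compare by eye.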

### Linear Regression

```python
from scipy import stats

# Calculate
slope, intercept, r, p, se = stats.linregress(x, y)

# Plot line
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r--', lw=2, 
       label=f'R² = {r**2:.3f}')
```
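
A shaded confidence band for the fitted line can be computed from the residuals. The sketch below uses NumPy only and synthetic data. It uses 1.96 as a large-sample approximation of the critical value, which is an assumption; for small samples, `scipy.stats.t.ppf(0.975, n - 2)` gives the exact t quantile.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 80)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=80)  # synthetic data

n = len(x)
slope, intercept = np.polyfit(x, y, deg=1)

# Residual standard error with n - 2 degrees of freedom
residuals = y - (slope * x + intercept)
s_err = np.sqrt(np.sum(residuals**2) / (n - 2))

# Standard error of the fitted mean at each grid point
x_line = np.linspace(x.min(), x.max(), 100)
y_line = slope * x_line + intercept
sxx = np.sum((x - x.mean())**2)
se_mean = s_err * np.sqrt(1 / n + (x_line - x.mean())**2 / sxx)

t_crit = 1.96  # normal approximation (assumption); see note above
fig, ax = plt.subplots()
ax.scatter(x, y, s=40, alpha=0.6)
ax.plot(x_line, y_line, 'r--', lw=2, label='Fit')
ax.fill_between(x_line,
                y_line - t_crit * se_mean,
                y_line + t_crit * se_mean,
                color='red', alpha=0.2, label='~95% CI of the mean')
ax.legend()
```

The band is narrowest near the mean of `x` and widens toward the edges, which is one reason extrapolating beyond the data range is risky.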

### Polynomial Fit

```python
# Fit
coeffs = np.polyfit(x, y, deg=2)
poly = np.poly1d(coeffs)

# Plot
x_line = np.linspace(x.min(), x.max(), 100)
y_line = poly(x_line)
ax.plot(x_line, y_line, 'r--', lw=2)
```

### Density Plots

```python
# Hexbin (best for large data)
hb = ax.hexbin(x, y, gridsize=30, cmap='Blues')
plt.colorbar(hb, label='Count')

# 2D histogram
h = ax.hist2d(x, y, bins=50, cmap='Blues')
plt.colorbar(h[3])
```

### Jitter

```python
# Add random offset
jitter = 0.1
x_jittered = x + np.random.uniform(
    -jitter, jitter, len(x))
ax.scatter(x_jittered, y)
```

### 3D Scatter

```python
from mpl_toolkits.mplot3d import Axes3D  # Only required on Matplotlib < 3.2

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c=colors, s=50)
```

### Markers

```python
'o'  # Circle (default)
's'  # Square
'^'  # Triangle up
'v'  # Triangle down
'D'  # Diamond
'*'  # Star
'+'  # Plus
'x'  # X
'p'  # Pentagon
'h'  # Hexagon
```

### Size Guidelines

```python
s=10    # Very small
s=50    # Small
s=100   # Medium (the default is 36, i.e. rcParams['lines.markersize'] ** 2)
s=200   # Large
s=500   # Very large
```

### Complete Template

```python
from scipy import stats

# Assumes x, y, sizes, and colors are equal-length numeric arrays
fig, ax = plt.subplots(figsize=(10, 8))

# Scatter with color and size
scatter = ax.scatter(x, y,
                    s=sizes,
                    c=colors,
                    cmap='viridis',
                    alpha=0.7,
                    edgecolors='black',
                    linewidths=0.5)

# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Color Scale', fontsize=12)

# Regression line
slope, intercept, r, p, se = stats.linregress(x, y)
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r--', lw=2, 
       label=f'R² = {r**2:.3f}')

# Labels and styling
ax.set_title('Scatter Plot', fontsize=16, fontweight='bold')
ax.set_xlabel('X Variable', fontsize=12)
ax.set_ylabel('Y Variable', fontsize=12)
ax.legend(loc='best', fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

### Best Practices

```
✓ Use s=50-100 for typical plots
✓ Add alpha=0.6-0.8 for overlapping
✓ Use edgecolors for definition
✓ Add colorbar for color mapping
✓ Use 'viridis' as default colormap
✓ Include regression line for correlation
✗ Don't make markers too large
✗ Don't use 'jet' colormap
✗ Don't forget to label colorbars
```

## Summary

### What We Learned 🎓

**1. Basic Scatter Plots**
- Creating scatter plots with `ax.scatter()`
- Marker customization (size, style, edges)
- Multiple scatter series
- Transparency and overlapping

**2. Marker Customization**
- All marker types and styles
- Size control (scalar and arrays)
- Edge colors and widths
- Alpha transparency

**3. Color Mapping**
- Continuous color scales
- Categorical colors
- Colormaps (viridis, plasma, diverging)
- Colorbar customization
- Color scale control (vmin, vmax)

**4. Bubble Charts**
- Variable-sized markers
- Size scaling techniques
- 4D visualization (x, y, size, color)
- Size legends

**5. Correlations**
- Linear regression lines
- Polynomial fits
- Confidence intervals
- R² and p-values
- Statistics annotations

**6. Advanced Techniques**
- Density scatter (hexbin, hist2d)
- Marginal distributions
- Jitter for categorical data
- 3D scatter plots
- Error bars

---

### Key Takeaways 💡

**Best Practices:**

```
✓ Use scatter() for individual data points
✓ Add transparency (alpha=0.6-0.8) for overlap
✓ Use edgecolors for better definition
✓ Add colorbar for continuous color scales
✓ Use 'viridis' or 'plasma' as default
✓ Scale bubble sizes appropriately
✓ Include regression lines for correlations
✓ Use hexbin for large datasets (> 1000)
✓ Add jitter for categorical data
✓ Report R² and p-value for trends
```

**Common Mistakes:**

```
✗ Markers too large (s > 500)
✗ No transparency with overlapping points
✗ Using 'jet' colormap (not perceptually uniform)
✗ Missing colorbar labels
✗ No size legend for bubble charts
✗ Extrapolating beyond data range
✗ Assuming causation from correlation
```

---

### Use Case Guide

**Basic Relationship:**
```python
ax.scatter(x, y, s=60, alpha=0.7, edgecolors='black')
```

**Correlation Analysis:**
```python
ax.scatter(x, y, s=60, alpha=0.7)
# Add regression line
slope, intercept, r, p, se = stats.linregress(x, y)
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r--', label=f'R²={r**2:.3f}')
```

**3D Data (color = 3rd dimension):**
```python
scatter = ax.scatter(x, y, c=z, cmap='viridis', s=60)
plt.colorbar(scatter, label='Z')
```

**4D Data (size + color):**
```python
scatter = ax.scatter(x, y, s=sizes, c=colors, 
                    cmap='viridis', alpha=0.6)
plt.colorbar(scatter, label='Color')
# Add size legend
```
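
One way to build the size legend is `legend_elements(prop='sizes')` on the object returned by `scatter`, available in Matplotlib 3.1 and later. The sketch below uses synthetic data; the number of legend entries (`num=4`) and the legend placement are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x, y = rng.uniform(0, 10, (2, 30))
sizes = rng.uniform(50, 500, 30)
colors = rng.uniform(0, 1, 30)

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, s=sizes, c=colors, cmap='viridis', alpha=0.6)
fig.colorbar(scatter, ax=ax, label='Color')

# Build proxy handles for a few representative marker sizes
handles, labels = scatter.legend_elements(prop='sizes', num=4, alpha=0.6)
ax.legend(handles, labels, title='Size', loc='upper right', labelspacing=1.2)
```

The labels show the values passed to `s`. If the sizes were rescaled from raw data, the legend reflects the rescaled numbers unless you relabel the entries yourself.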

**Group Comparison:**
```python
for group in groups:
    mask = categories == group
    ax.scatter(x[mask], y[mask], label=group, s=60)
ax.legend()
```

**Large Dataset (> 1000 points):**
```python
hb = ax.hexbin(x, y, gridsize=30, cmap='Blues')
plt.colorbar(hb, label='Count')
```

**Categorical Data:**
```python
jitter = 0.1
x_jittered = categories + np.random.uniform(
    -jitter, jitter, len(categories))
ax.scatter(x_jittered, values)
```

---

### Colormap Selection Guide

**Sequential Data (low to high):**
- Use: `'viridis'`, `'plasma'`, `'Blues'`, `'YlOrRd'`
- Example: Temperature, population, sales

**Diverging Data (two directions from center):**
- Use: `'RdBu'`, `'RdYlGn'`, `'coolwarm'`
- Example: Profit/loss, correlation, change

**Categorical Data (distinct groups):**
- Use: `'tab10'`, `'Set1'`, `'Paired'`
- Example: Species, regions, categories

**Colorblind-Safe:**
- Use: `'cividis'`, `'viridis'`
- Also check that the plot stays readable in grayscale
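
A rough grayscale check can be scripted by converting a colormap to brightness and testing whether it rises or falls steadily. The sketch below uses the Rec. 601 luma weights as an approximation of perceived brightness. It is not a substitute for a perceptual color space or a color-vision-deficiency simulator, and the helper names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def grayscale_lightness(cmap_name, n=256):
    """Approximate brightness along a colormap using Rec. 601 luma weights."""
    rgb = plt.get_cmap(cmap_name)(np.linspace(0, 1, n))[:, :3]
    return rgb @ np.array([0.299, 0.587, 0.114])

def is_monotonic(values):
    """Return True if the values only rise or only fall."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))

# Print, for each colormap, whether its brightness is monotonic
for name in ['viridis', 'cividis', 'jet']:
    print(name, is_monotonic(grayscale_lightness(name)))
```

A colormap whose brightness rises and then falls, as `'jet'` does, maps different data values to the same shade of gray when printed in black and white.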

---

### Next Steps 🚀

You've mastered scatter plots! Next notebooks:

1. **05_bar_charts.ipynb** - Bar charts and variations
2. **06_histograms.ipynb** - Distribution visualization
3. **07_heatmaps.ipynb** - Heatmaps and correlation matrices
4. **13_3d_plots.ipynb** - Advanced 3D visualization

---

### Resources 📚

- **Scatter docs**: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
- **Colormaps**: https://matplotlib.org/stable/tutorials/colors/colormaps.html
- **Colorblind**: https://davidmathlogic.com/colorblind/
- **SciPy stats**: https://docs.scipy.org/doc/scipy/reference/stats.html

---

**Congratulations! You've mastered scatter plots! 🎉**

Practice with real datasets to solidify your skills!