# 116: Data Visualization Mastery

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** visualization principles: data-ink ratio, visual encoding, perception
- **Master** Matplotlib: subplots, styling, customization, publication-quality plots
- **Build** statistical visualizations with Seaborn: distributions, relationships, categorical
- **Create** interactive plots with Plotly: 3D, animations, dashboards
- **Apply** visualization best practices: choosing chart types, color theory, accessibility
- **Design** post-silicon dashboards for wafer maps, parametric trends, yield analysis

## 📚 What is Data Visualization?

**Data visualization** transforms data into graphical representations that reveal patterns, trends, and insights faster than raw numbers. Effective visualization is both **science** (accurate encoding) and **art** (aesthetic design).

**Core principles:**
- **Visual Encoding**: Map data to visual channels (position, color, size, shape)
- **Data-Ink Ratio**: Maximize the share of ink that encodes data (minimize chartjunk)
- **Preattentive Processing**: Brain detects patterns in <200ms (color, size, orientation)
- **Perception Hierarchy**: Position > Length > Angle > Area > Color

**Why Data Visualization?**
- ✅ **Pattern Recognition**: Trends, clusters, and outliers stand out in a chart far faster than in a table of the same numbers
- ✅ **Communication**: Stakeholders grasp insights in seconds instead of the minutes a table takes
- ✅ **Exploration**: Interactive plots reveal unexpected correlations
- ✅ **Decision Support**: Executives act faster with clear trend visualizations

## 🏭 Post-Silicon Validation Use Cases

**Wafer Map Visualization**
- Input: Spatial die data (die_x, die_y, bin, parametric values) for 50K dies/wafer
- Visualization: Heatmap colored by bin/parameter, overlay fab defect annotations
- Output: Identify edge effects, radial patterns, cluster defects → root cause (lithography, contamination)
- Value: Reduce debug time 50%, communicate fab issues visually to process engineers

**Parametric Trend Dashboard**
- Input: Daily Vdd/Idd/freq averages per lot for 12 months (3K data points)
- Visualization: Multi-line chart with control limits (±3σ), anomaly highlights, seasonal decomposition
- Output: Detect drift before yield impact, trigger investigation when trends cross limits
- Value: Early warning system, prevent yield excursions worth $500K-$2M per event
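The control-limit logic behind such a dashboard fits in a few lines. This is a minimal sketch, assuming the limits are estimated from a baseline window (the first 60 days here, an illustrative choice) and that daily averages arrive as a pandas Series; `flag_out_of_control` is a hypothetical helper name and the drift is synthetic.

```python
import numpy as np
import pandas as pd

def flag_out_of_control(series: pd.Series, baseline_len: int = 60, k: float = 3.0) -> pd.DataFrame:
    """Estimate mean ± k·sigma limits from a baseline window and flag excursions.

    Hypothetical helper for illustration; baseline_len and k are assumptions.
    """
    baseline = series.iloc[:baseline_len]
    center = baseline.mean()
    sigma = baseline.std(ddof=1)
    ucl, lcl = center + k * sigma, center - k * sigma
    return pd.DataFrame({
        "value": series,
        "ucl": ucl,
        "lcl": lcl,
        "out_of_control": (series > ucl) | (series < lcl),
    })

# Synthetic daily Vdd averages: stable for 200 days, then a slow upward drift
rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=365, freq="D")
vdd = 1.05 + rng.normal(0, 0.002, 365)
vdd[200:] += np.linspace(0, 0.02, 165)  # injected drift (illustrative)
result = flag_out_of_control(pd.Series(vdd, index=days))
first_alarm = result.index[result["out_of_control"]].min()
print(f"First out-of-control day: {first_alarm.date()}")
```

Estimating the limits from a stable baseline, rather than from the whole year, keeps the drift itself from inflating sigma and hiding the excursion.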

**Yield Analysis Funnel**
- Input: Stage-by-stage yield (wafer fab → sort → final test → burn-in) for 100 lots
- Visualization: Sankey diagram showing loss at each stage, Pareto chart of failure bins
- Output: Identify dominant failure mode, quantify yield impact by test stage
- Value: Focus engineering resources on highest-impact issues
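Both views reduce to simple arithmetic on counts. A minimal sketch with made-up stage counts and failure-bin tallies (all numbers are illustrative, not real data): stage yield is `out / in` per stage, cumulative yield is stage output over the devices that entered the first stage, and the Pareto ordering sorts bins by count and accumulates percentages.

```python
import pandas as pd

# Illustrative device counts per stage (hypothetical numbers);
# each stage's output feeds the next stage's input
stages = pd.DataFrame({
    "stage": ["Wafer fab", "Sort", "Final test", "Burn-in"],
    "devices_in": [10000, 9500, 8550, 8208],
    "devices_out": [9500, 8550, 8208, 8126],
})
stages["stage_yield_pct"] = stages["devices_out"] / stages["devices_in"] * 100
stages["cumulative_yield_pct"] = stages["devices_out"] / stages["devices_in"].iloc[0] * 100
print(stages.round(1))

# Pareto ordering of failure bins: sort by count, then accumulate percentages
fail_bins = pd.Series({"Fail_Idd": 420, "Fail_Vdd": 260, "Fail_Freq": 150,
                       "Fail_Leakage": 60, "Other": 30})
pareto = fail_bins.sort_values(ascending=False).to_frame("count")
pareto["cum_pct"] = pareto["count"].cumsum() / pareto["count"].sum() * 100
print(pareto.round(1))
```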

**Multi-Parameter Correlation Matrix**
- Input: 20 parametric tests × 10K devices (Vdd, Idd, freq, power, timing, etc.)
- Visualization: Correlation heatmap, pairplot with distributions, parallel coordinates
- Output: Find redundant tests (high correlation), identify composite failure signatures
- Value: Test time reduction 15-30%, improved test coverage
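Finding redundant-test candidates from a correlation matrix can be sketched as follows. The 0.95 threshold, the synthetic columns, and the `redundant_pairs` helper are assumptions for illustration; a high correlation only flags a pair for engineering review and does not by itself prove that a test can be dropped.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def redundant_pairs(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """List test pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # combinations() visits each unordered pair once and skips the diagonal
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)
            if corr.loc[a, b] > threshold]
    out = pd.DataFrame(rows, columns=["test_a", "test_b", "abs_corr"])
    return out.sort_values("abs_corr", ascending=False, ignore_index=True)

rng = np.random.default_rng(1)
n = 10_000
vdd = rng.normal(1.05, 0.015, n)
tests = pd.DataFrame({
    "vdd": vdd,
    "vdd_retest": vdd + rng.normal(0, 0.001, n),            # near-duplicate measurement
    "idd": 48 + 25 * (vdd - 1.05) + rng.normal(0, 2, n),     # weakly correlated
    "freq": rng.normal(2350, 30, n),                         # independent
})
print(redundant_pairs(tests))
```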

## 🔄 Data Visualization Workflow

```mermaid
graph LR
    A[Define Question] --> B[Understand Data]
    B --> C{Data Type?}
    C -->|Continuous| D[Histogram, Boxplot, Violin]
    C -->|Categorical| E[Bar, Pie, Treemap]
    C -->|Relationship| F[Scatter, Line, Heatmap]
    C -->|Time Series| G[Line, Area, Candlestick]
    C -->|Spatial| H[Wafer Map, Choropleth]
    D --> I[Choose Visual Encoding]
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J[Create Initial Plot]
    J --> K[Refine: Labels, Colors, Legends]
    K --> L{Audience?}
    L -->|Technical| M[Detailed, Annotations]
    L -->|Executive| N[Simplified, Key Message]
    M --> O[Add Interactivity]
    N --> O
    O --> P[Test Accessibility]
    P --> Q[Publish/Present]
    
    style A fill:#e1f5ff
    style Q fill:#e1ffe1
    style K fill:#fffacd
```

## 📊 Learning Path Context

**Prerequisites:**
- 010: Linear Regression (scatter plots, residuals)
- 114: Time Series Forecasting (temporal visualizations)

**Next Steps:**
- 117: Streamlit App Development (interactive dashboards)
- 120: Advanced Dashboard Design (real-time visualizations)

---

Let's make data beautiful! 🎨

## 1. Setup & Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.gridspec import GridSpec
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Interactive visualization
try:
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    print("✅ Plotly loaded successfully!")
except ImportError:
    print("⚠️ Plotly not installed. Installing now...")
    import subprocess
    import sys
    # Install into the interpreter running this kernel, not whatever `pip` is first on PATH
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'plotly', 'kaleido'])
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    print("✅ Plotly installed and loaded!")

# Matplotlib style settings
plt.style.use('seaborn-v0_8-darkgrid')  # Modern seaborn style
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10

# Seaborn settings
sns.set_palette('husl')  # Evenly spaced hues with uniform lightness (HUSL color space)

# Random seed
np.random.seed(42)

print(f"Matplotlib: {plt.matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"\n🎨 Visualization libraries ready!")

## 2. Matplotlib Mastery: Wafer Map Visualization

**Purpose:** Create publication-quality wafer maps showing spatial die patterns.

**Key Points:**
- **Subplot Grid**: Complex layouts with GridSpec (flexible row/column spans)
- **Color Mapping**: Custom colormaps for categorical bins (pass/fail/bins)
- **Annotations**: Add text, arrows, circles to highlight defect regions
- **Styling**: Control every element (spines, ticks, labels, legend)

**Why This Matters:** Wafer maps are critical for post-silicon debug: visual patterns (edge effects, clusters, radial rings) guide root cause analysis, and Matplotlib's fine-grained control produces publication-quality plots for presentations and reports.
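The simulation in this section lays dies on a full square grid, whereas a real wafer is round and grid positions beyond the usable radius hold no die. A minimal sketch of masking a die grid to a circular wafer, assuming coordinates in die pitches from the grid corner and a radius that just fits the grid; `circular_die_mask` is a hypothetical helper:

```python
from typing import Optional

import numpy as np

def circular_die_mask(n_per_side: int, radius: Optional[float] = None) -> np.ndarray:
    """Boolean (n, n) mask marking grid positions whose centers fall inside the wafer."""
    if radius is None:
        radius = n_per_side / 2  # assumption: the wafer just fits the grid
    center = (n_per_side - 1) / 2  # geometric center of die indices 0..n-1
    y, x = np.mgrid[0:n_per_side, 0:n_per_side]
    return (x - center) ** 2 + (y - center) ** 2 <= radius ** 2

mask = circular_die_mask(20)
print(f"{mask.sum()} of {mask.size} grid positions lie on the wafer")

# Positions off the wafer become NaN, which imshow draws with the colormap's
# "bad" color (transparent by default), so the map shows a round wafer
values = np.random.default_rng(2).normal(50, 3, mask.shape)
wafer_values = np.where(mask, values, np.nan)
```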

In [None]:
# Simulate wafer die data (300mm wafer, 20x20 grid)
np.random.seed(100)
wafer_size = 20  # Dies per side
n_dies = wafer_size * wafer_size

# Create grid coordinates
die_x, die_y = np.meshgrid(np.arange(wafer_size), np.arange(wafer_size))
die_x = die_x.flatten()
die_y = die_y.flatten()

# Center of wafer
center_x, center_y = wafer_size / 2, wafer_size / 2

# Calculate distance from center (for radial effects)
distances = np.sqrt((die_x - center_x)**2 + (die_y - center_y)**2)
max_distance = np.sqrt(2 * (wafer_size/2)**2)

# Simulate bin categories (higher failure at edges)
# 0 = Pass, 1 = Fail_Vdd, 2 = Fail_Idd, 3 = Fail_Freq
edge_effect = (distances / max_distance) ** 2  # Higher at edges
base_failure_rate = 0.05
failure_prob = base_failure_rate + 0.15 * edge_effect

bins = np.zeros(n_dies, dtype=int)
for i in range(n_dies):
    if np.random.random() < failure_prob[i]:
        bins[i] = np.random.choice([1, 2, 3])  # Random failure type
    # Add cluster defect (contamination at specific location)
    if 12 <= die_x[i] <= 15 and 5 <= die_y[i] <= 8:
        if np.random.random() < 0.6:
            bins[i] = 2  # Idd failures in cluster

# Create dataframe
df_wafer = pd.DataFrame({
    'die_x': die_x,
    'die_y': die_y,
    'bin': bins,
    'vdd': np.random.normal(1.05, 0.01, n_dies) + 0.02 * (bins == 1),  # Vdd higher for Vdd fails
    'idd': np.random.normal(50, 3, n_dies) + 10 * (bins == 2),  # Idd higher for Idd fails
})

print("Wafer Map Data:")
print("=" * 60)
print(f"Total dies: {len(df_wafer)}")
print(f"Pass: {(bins == 0).sum()} ({(bins == 0).sum()/len(bins)*100:.1f}%)")
print(f"Fail_Vdd: {(bins == 1).sum()} ({(bins == 1).sum()/len(bins)*100:.1f}%)")
print(f"Fail_Idd: {(bins == 2).sum()} ({(bins == 2).sum()/len(bins)*100:.1f}%)")
print(f"Fail_Freq: {(bins == 3).sum()} ({(bins == 3).sum()/len(bins)*100:.1f}%)")

# Create sophisticated wafer map visualization
fig = plt.figure(figsize=(16, 12))
gs = GridSpec(2, 2, figure=fig, hspace=0.3, wspace=0.3)

# 1. Bin map (categorical)
ax1 = fig.add_subplot(gs[0, 0])
bin_colors = {0: '#2ecc71', 1: '#e74c3c', 2: '#f39c12', 3: '#9b59b6'}
colors = [bin_colors[b] for b in bins]
scatter1 = ax1.scatter(die_x, die_y, c=colors, s=100, marker='s', edgecolors='black', linewidths=0.5)

# Add cluster annotation
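# Cluster spans dies x=12..15, y=5..8; each square marker is centered on its integer
# coordinate, so the outline runs from 11.5 to 15.5 (width/height = 4)
rect = mpatches.Rectangle((11.5, 4.5), 4, 4, linewidth=2, edgecolor='red', 
                          facecolor='none', linestyle='--')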
ax1.add_patch(rect)
ax1.annotate('Contamination\nCluster', xy=(13.5, 6.5), xytext=(8, 3),
            fontsize=10, color='red', weight='bold',
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

# Custom legend
legend_elements = [
    mpatches.Patch(facecolor='#2ecc71', edgecolor='black', label='Pass'),
    mpatches.Patch(facecolor='#e74c3c', edgecolor='black', label='Fail_Vdd'),
    mpatches.Patch(facecolor='#f39c12', edgecolor='black', label='Fail_Idd'),
    mpatches.Patch(facecolor='#9b59b6', edgecolor='black', label='Fail_Freq')
]
ax1.legend(handles=legend_elements, loc='upper right', framealpha=0.9)
ax1.set_xlabel('Die X', fontsize=12, weight='bold')
ax1.set_ylabel('Die Y', fontsize=12, weight='bold')
ax1.set_title('Wafer Bin Map (300mm, 20×20 Grid)', fontsize=14, weight='bold', pad=15)
ax1.grid(alpha=0.2)
ax1.set_aspect('equal')

# 2. Vdd heatmap (continuous)
ax2 = fig.add_subplot(gs[0, 1])
vdd_grid = df_wafer.pivot(index='die_y', columns='die_x', values='vdd')
im2 = ax2.imshow(vdd_grid, cmap='RdYlGn_r', aspect='auto', origin='lower')
cbar2 = plt.colorbar(im2, ax=ax2, label='Vdd (V)')
ax2.set_xlabel('Die X', fontsize=12, weight='bold')
ax2.set_ylabel('Die Y', fontsize=12, weight='bold')
ax2.set_title('Vdd Heatmap (Continuous)', fontsize=14, weight='bold', pad=15)

# 3. Idd heatmap with contours
ax3 = fig.add_subplot(gs[1, 0])
idd_grid = df_wafer.pivot(index='die_y', columns='die_x', values='idd')
# Default origin puts Z[0, 0] at (0, 0): matches die coordinates and the contour lines below
im3 = ax3.contourf(idd_grid.values, levels=15, cmap='viridis')
contours = ax3.contour(idd_grid.values, levels=5, colors='white', linewidths=0.8, alpha=0.6)
ax3.clabel(contours, inline=True, fontsize=8, fmt='%.1f')
cbar3 = plt.colorbar(im3, ax=ax3, label='Idd (mA)')
ax3.set_xlabel('Die X', fontsize=12, weight='bold')
ax3.set_ylabel('Die Y', fontsize=12, weight='bold')
ax3.set_title('Idd Contour Map', fontsize=14, weight='bold', pad=15)

# 4. Yield by radial distance (edge effect analysis)
ax4 = fig.add_subplot(gs[1, 1])
distance_bins = np.linspace(0, max_distance, 8)
distance_labels = [f'{d:.1f}' for d in distance_bins[:-1]]
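# include_lowest=True keeps the center die (distance exactly 0) in the first bin
df_wafer['distance_bin'] = pd.cut(distances, bins=distance_bins, labels=distance_labels,
                                  include_lowest=True)
yield_by_distance = df_wafer.groupby('distance_bin', observed=False)['bin'].apply(lambda x: (x == 0).mean() * 100)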

bars = ax4.bar(range(len(yield_by_distance)), yield_by_distance.values, 
               color='steelblue', edgecolor='black', linewidth=1.5, alpha=0.8)
ax4.axhline(80, color='red', linestyle='--', linewidth=2, label='Spec (80%)')
ax4.set_xlabel('Distance from Center (dies)', fontsize=12, weight='bold')
ax4.set_ylabel('Yield (%)', fontsize=12, weight='bold')
ax4.set_title('Edge Effect: Yield vs Radial Distance', fontsize=14, weight='bold', pad=15)
ax4.set_xticks(range(len(yield_by_distance)))
ax4.set_xticklabels(yield_by_distance.index, rotation=45)
ax4.legend()
ax4.grid(axis='y', alpha=0.3)
ax4.set_ylim(0, 100)

# Add value labels on bars
for i, bar in enumerate(bars):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height + 2,
            f'{height:.1f}%', ha='center', va='bottom', fontsize=9, weight='bold')

plt.suptitle('Post-Silicon Wafer Map Analysis Dashboard', 
             fontsize=16, weight='bold', y=0.995)

plt.show()

print(f"\n💡 Key Insights:")
print(f"   Edge effect visible: yield drops from {yield_by_distance.iloc[0]:.1f}% (center) to {yield_by_distance.iloc[-1]:.1f}% (edge)")
print(f"   Contamination cluster at (12-15, 5-8) shows concentrated Idd failures")
print(f"   Vdd heatmap shows spatial correlation with bin failures")
print(f"   Matplotlib GridSpec enables complex dashboard layouts")

## 3. Seaborn Statistical Visualizations

**Purpose:** Create statistical plots for distributions, relationships, and categorical data.

**Key Points:**
- **Distribution Plots**: Histograms with KDE, violin plots, box plots
- **Relationship Plots**: Scatter with regression, pairplot, correlation heatmap
- **Categorical Plots**: Bar, count, point plots with error bars
- **FacetGrid**: Small multiples for conditional distributions

**Why This Matters:** Seaborn handles statistical aesthetics automatically (confidence intervals, regression lines). Perfect for exploratory data analysis and parametric correlations.

In [None]:
# Simulate device parametric test data
np.random.seed(200)
n_devices = 500

# Create data with correlations
vdd_data = np.random.normal(1.05, 0.015, n_devices)
idd_data = 48 + 25 * (vdd_data - 1.05) + np.random.normal(0, 2, n_devices)  # Correlated with Vdd
freq_data = 2350 + 150 * (vdd_data - 1.05) + np.random.normal(0, 30, n_devices)
power_data = vdd_data * idd_data + np.random.normal(0, 1, n_devices)
temp_data = np.random.choice(['25C', '85C', '125C'], n_devices)
lot_data = np.random.choice(['Lot_A', 'Lot_B', 'Lot_C'], n_devices)

df_params = pd.DataFrame({
    'vdd': vdd_data,
    'idd': idd_data,
    'freq': freq_data,
    'power': power_data,
    'temperature': temp_data,
    'lot': lot_data,
    'pass_fail': (vdd_data > 1.02) & (vdd_data < 1.08) & (idd_data < 60)
})

print("Parametric Test Data:")
print("=" * 60)
print(df_params.describe())
print(f"\nPass rate: {df_params['pass_fail'].sum() / len(df_params) * 100:.1f}%")

# Create Seaborn statistical visualizations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# 1. Distribution with KDE
sns.histplot(data=df_params, x='vdd', hue='pass_fail', kde=True, stat='density',
             palette={True: 'green', False: 'red'}, alpha=0.6, ax=axes[0, 0])
axes[0, 0].axvline(1.02, color='blue', linestyle='--', label='Lower Spec')
axes[0, 0].axvline(1.08, color='blue', linestyle='--', label='Upper Spec')
axes[0, 0].set_xlabel('Vdd (V)', fontsize=11, weight='bold')
axes[0, 0].set_ylabel('Density', fontsize=11, weight='bold')
axes[0, 0].set_title('Vdd Distribution by Pass/Fail', fontsize=12, weight='bold')
axes[0, 0].legend()

# 2. Violin plot (distribution across categories)
sns.violinplot(data=df_params, x='temperature', y='idd', hue='lot', 
               split=False, ax=axes[0, 1], palette='Set2')
axes[0, 1].set_xlabel('Temperature', fontsize=11, weight='bold')
axes[0, 1].set_ylabel('Idd (mA)', fontsize=11, weight='bold')
axes[0, 1].set_title('Idd Distribution by Temperature & Lot', fontsize=12, weight='bold')

# 3. Box plot with individual points
sns.boxplot(data=df_params, x='lot', y='freq', ax=axes[0, 2], palette='pastel')
sns.stripplot(data=df_params, x='lot', y='freq', ax=axes[0, 2], 
              color='black', alpha=0.3, size=3)
axes[0, 2].set_xlabel('Lot', fontsize=11, weight='bold')
axes[0, 2].set_ylabel('Frequency (MHz)', fontsize=11, weight='bold')
axes[0, 2].set_title('Frequency by Lot (with outliers)', fontsize=12, weight='bold')

# 4. Scatter with regression line
sns.regplot(data=df_params, x='vdd', y='idd', ax=axes[1, 0],
            scatter_kws={'alpha': 0.5, 's': 30}, 
            line_kws={'color': 'red', 'linewidth': 2})
axes[1, 0].set_xlabel('Vdd (V)', fontsize=11, weight='bold')
axes[1, 0].set_ylabel('Idd (mA)', fontsize=11, weight='bold')
axes[1, 0].set_title('Vdd vs Idd (with linear fit)', fontsize=12, weight='bold')

# Calculate correlation
corr_vdd_idd = df_params['vdd'].corr(df_params['idd'])
axes[1, 0].text(0.05, 0.95, f'Correlation: {corr_vdd_idd:.3f}',
               transform=axes[1, 0].transAxes, fontsize=10,
               verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# 5. Correlation heatmap
corr_matrix = df_params[['vdd', 'idd', 'freq', 'power']].corr()
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='coolwarm', 
            center=0, square=True, linewidths=1, ax=axes[1, 1],
            cbar_kws={'label': 'Correlation'})
axes[1, 1].set_title('Parametric Correlation Matrix', fontsize=12, weight='bold')

# 6. Count plot with percentages
count_data = df_params.groupby(['lot', 'pass_fail']).size().reset_index(name='count')
total_by_lot = count_data.groupby('lot')['count'].transform('sum')
count_data['percentage'] = count_data['count'] / total_by_lot * 100

sns.barplot(data=count_data, x='lot', y='percentage', hue='pass_fail',
            palette={True: 'green', False: 'red'}, ax=axes[1, 2])
axes[1, 2].set_xlabel('Lot', fontsize=11, weight='bold')
axes[1, 2].set_ylabel('Percentage (%)', fontsize=11, weight='bold')
axes[1, 2].set_title('Pass/Fail Rate by Lot', fontsize=12, weight='bold')
axes[1, 2].legend(title='Pass/Fail', labels=['Fail', 'Pass'])

# Add percentage labels
for container in axes[1, 2].containers:
    axes[1, 2].bar_label(container, fmt='%.1f%%', fontsize=9)

plt.suptitle('Seaborn Statistical Visualization Suite', fontsize=16, weight='bold', y=0.995)
plt.tight_layout()
plt.show()

print(f"\n💡 Seaborn Advantages:")
print(f"   Automatic statistical aesthetics (confidence intervals, regression)")
print(f"   Curated palettes, including perceptually uniform sequential maps (rocket, mako)")
print(f"   Correlation heatmap shows Vdd-Idd: {corr_vdd_idd:.3f}, Vdd-Freq: {df_params['vdd'].corr(df_params['freq']):.3f}")
print(f"   Violin plots reveal bimodal distributions better than box plots")
print(f"   FacetGrid (not shown) enables conditional plots across categories")

## 4. Plotly Interactive Visualizations

**Purpose:** Create interactive plots with hover info, zoom, pan, and animations.

**Key Points:**
- **Interactivity**: Hover tooltips, click events, zoom/pan, export to PNG
- **3D Plots**: Scatter, surface, mesh for multi-dimensional data
- **Animations**: Time series evolution, parameter sweeps
- **Dashboards**: Subplots with shared axes, dropdowns, sliders

**Why This Matters:** Static plots don't capture all insights. Interactive exploration reveals hidden patterns. Executives love dashboards they can explore. Crucial for presenting complex parametric relationships.

In [None]:
# Create interactive Plotly visualizations
print("Creating interactive Plotly visualizations...")
print("=" * 60)

# 1. Interactive 3D scatter (Vdd, Idd, Freq)
fig_3d = go.Figure(data=[go.Scatter3d(
    x=df_params['vdd'],
    y=df_params['idd'],
    z=df_params['freq'],
    mode='markers',
    marker=dict(
        size=5,
        color=df_params['power'],
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title="Power (mW)"),
        line=dict(width=0.5, color='white')
    ),
    text=[f"Lot: {lot}<br>Temp: {temp}<br>Vdd: {vdd:.3f}V<br>Idd: {idd:.1f}mA<br>Freq: {freq:.0f}MHz<br>Power: {pwr:.2f}mW"
          for lot, temp, vdd, idd, freq, pwr in zip(df_params['lot'], df_params['temperature'],
                                                      df_params['vdd'], df_params['idd'],
                                                      df_params['freq'], df_params['power'])],
    hovertemplate='%{text}<extra></extra>'
)])

fig_3d.update_layout(
    title='Interactive 3D Parametric Space (Vdd, Idd, Freq)',
    scene=dict(
        xaxis_title='Vdd (V)',
        yaxis_title='Idd (mA)',
        zaxis_title='Frequency (MHz)',
        camera=dict(eye=dict(x=1.5, y=1.5, z=1.3))
    ),
    width=900,
    height=700,
    font=dict(size=11)
)

fig_3d.show()
print("✅ 3D scatter plot created (rotate, zoom, hover for details)")

# 2. Interactive time series with range selector
dates = pd.date_range(start='2024-01-01', periods=365, freq='D')
yield_trend = 75 + 15 / (1 + np.exp(-0.02 * (np.arange(365) - 100)))  # Logistic growth
yield_trend += 2 * np.sin(2 * np.pi * np.arange(365) / 7)  # Weekly seasonality
yield_trend += np.random.normal(0, 1.2, 365)  # Noise

df_trend = pd.DataFrame({
    'date': dates,
    'yield': yield_trend,
    'ucl': yield_trend.mean() + 3 * yield_trend.std(),
    'lcl': yield_trend.mean() - 3 * yield_trend.std(),
    'target': 85
})

fig_ts = go.Figure()

# Add yield line
fig_ts.add_trace(go.Scatter(
    x=df_trend['date'],
    y=df_trend['yield'],
    mode='lines',
    name='Daily Yield',
    line=dict(color='blue', width=2),
    hovertemplate='Date: %{x}<br>Yield: %{y:.2f}%<extra></extra>'
))

# Add control limits
fig_ts.add_trace(go.Scatter(
    x=df_trend['date'],
    y=df_trend['ucl'],
    mode='lines',
    name='UCL (+3σ)',
    line=dict(color='red', width=1, dash='dash'),
    hoverinfo='skip'
))

fig_ts.add_trace(go.Scatter(
    x=df_trend['date'],
    y=df_trend['lcl'],
    mode='lines',
    name='LCL (-3σ)',
    line=dict(color='red', width=1, dash='dash'),
    fill='tonexty',
    fillcolor='rgba(255,0,0,0.1)',
    hoverinfo='skip'
))

# Add target line
fig_ts.add_trace(go.Scatter(
    x=df_trend['date'],
    y=df_trend['target'],
    mode='lines',
    name='Target (85%)',
    line=dict(color='green', width=2, dash='dot'),
    hoverinfo='skip'
))

fig_ts.update_layout(
    title='Interactive Yield Trend Dashboard (2024)',
    xaxis_title='Date',
    yaxis_title='Yield (%)',
    hovermode='x unified',
    width=1000,
    height=500,
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1, label="1m", step="month", stepmode="backward"),
                dict(count=3, label="3m", step="month", stepmode="backward"),
                dict(count=6, label="6m", step="month", stepmode="backward"),
                dict(step="all", label="All")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    )
)

fig_ts.show()
print("✅ Time series with range selector created (select time range, zoom)")

# 3. Interactive correlation heatmap
fig_heatmap = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.index,
    colorscale='RdBu',
    zmid=0,
    text=corr_matrix.values,
    texttemplate='%{text:.3f}',
    textfont=dict(size=12),
    hovertemplate='%{x} vs %{y}<br>Correlation: %{z:.3f}<extra></extra>'
))

fig_heatmap.update_layout(
    title='Interactive Parametric Correlation Heatmap',
    width=600,
    height=550,
    xaxis_title='Parameter',
    yaxis_title='Parameter'
)

fig_heatmap.show()
print("✅ Interactive heatmap created (hover for exact values)")

print(f"\n💡 Plotly Advantages:")
print(f"   Interactive: Zoom, pan, hover for details, export to PNG")
print(f"   3D visualization: Explore multi-parameter relationships")
print(f"   Range selector: Focus on specific time periods")
print(f"   Professional dashboards without web development")
print(f"   Integrates with Dash/Streamlit for web apps")

## 🚀 Real-World Project Templates

Build production visualization systems:

### 1️⃣ **Post-Silicon Wafer Map Analyzer**
- **Objective**: Interactive wafer map dashboard for 500 wafers/day with pattern detection  
- **Data**: Spatial die data (50K dies/wafer), parametric values, bin classifications  
- **Success Metric**: Reduce debug time 40%, detect 95% of spatial defect patterns  
- **Visualizations**: Heatmaps (bins, parameters), cluster annotations, statistical overlays  
- **Tech Stack**: Python, Plotly Dash, PostgreSQL, wafer pattern ML (clustering)

### 2️⃣ **Real-Time Manufacturing Dashboard**
- **Objective**: Live production monitoring for executives (updated every 5 minutes)  
- **Data**: Yield, test time, defect density streaming from 20 test cells  
- **Success Metric**: <2 sec page load, 99.9% uptime, alert on anomalies within 10 min  
- **Visualizations**: KPI cards, time series trends, Pareto charts, control charts  
- **Tech Stack**: Streamlit/Dash, Kafka streaming, InfluxDB, Docker deployment

### 3️⃣ **Customer Analytics Dashboard (E-Commerce)**
- **Objective**: Product manager dashboard for user behavior insights  
- **Data**: 10M events/day (clicks, purchases, cart adds), 50K SKUs  
- **Success Metric**: Enable self-service analytics, reduce analyst tickets 60%  
- **Visualizations**: Funnel charts, cohort retention, geographic heatmaps, conversion trends  
- **Tech Stack**: Looker/Tableau/PowerBI, BigQuery, dbt, reverse ETL

### 4️⃣ **Financial Trading Dashboard**
- **Objective**: Real-time portfolio monitoring for 100 traders  
- **Data**: Tick-level market data (1M ticks/sec), portfolio positions, P&L  
- **Success Metric**: <50ms chart updates, display 20 symbols simultaneously  
- **Visualizations**: Candlestick charts, volume profiles, order book depth, correlation matrices  
- **Tech Stack**: React + D3.js, WebSockets, Redis, TimescaleDB

### 5️⃣ **Healthcare Patient Dashboard**
- **Objective**: ICU monitoring dashboard for 50-bed unit  
- **Data**: Vitals (HR, BP, SpO2) every 1 min, lab results, medications  
- **Success Metric**: Alert on deterioration 2 hours earlier, reduce alarm fatigue 30%  
- **Visualizations**: Multi-patient grid, vital sign trends, early warning scores, alert timelines  
- **Tech Stack**: Python, Plotly Dash, FHIR integration, ML anomaly detection

### 6️⃣ **Marketing Campaign Performance Tracker**
- **Objective**: CMO dashboard for $10M ad spend across 50 campaigns  
- **Data**: Daily spend, impressions, clicks, conversions by channel/campaign  
- **Success Metric**: Identify underperforming campaigns within 24 hours, ROI visibility  
- **Visualizations**: Waterfall charts (spend → conversions), time series by channel, budget burn  
- **Tech Stack**: Google Data Studio, BigQuery, GA4 integration, automated reporting

### 7️⃣ **Supply Chain Visibility Platform**
- **Objective**: End-to-end shipment tracking for 10K shipments/month  
- **Data**: GPS coordinates, ETA, delays, inventory levels, supplier performance  
- **Success Metric**: Predict delays 48 hours in advance, reduce inventory 20%  
- **Visualizations**: Map visualizations (routes, hotspots), Gantt charts, network graphs  
- **Tech Stack**: Plotly + Mapbox, IoT sensors, PostgreSQL/PostGIS, ML forecasting

### 8️⃣ **Scientific Publication Visualizations**
- **Objective**: Publication-quality figures for Nature/Science papers  
- **Data**: Experimental results (biology, physics, ML benchmarks)  
- **Success Metric**: Meet journal guidelines, reproducible with scripts  
- **Visualizations**: Multi-panel figures, error bars, statistical annotations, color-blind friendly  
- **Tech Stack**: Matplotlib + seaborn, LaTeX integration, SVG export, version control

## 🎯 Key Takeaways

### What is Data Visualization?

Transforming data into graphical representations that communicate insights faster and more effectively than raw numbers. Balance between **accuracy** (correct data encoding) and **aesthetics** (engaging design).

### Visualization Principles

**Visual Encoding Hierarchy** (most to least effective):
1. **Position** (scatter plots, bar charts) - most accurate perception
2. **Length** (bar charts better than pie charts)
3. **Angle** (pie slices - harder to compare)
4. **Area** (bubble charts - non-linear perception)
5. **Color** (heatmaps - categorical vs continuous)
6. **Texture/Shape** (least effective, use sparingly)

**Data-Ink Ratio** (Tufte's principle):
- Formula: $\text{Data-Ink Ratio} = \frac{\text{Ink used for data}}{\text{Total ink}}$
- **Maximize**: Remove chartjunk (3D effects, excessive gridlines, decorations)
- **Example**: Replace 3D pie chart with simple bar chart (higher ratio)
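In Matplotlib, raising the data-ink ratio mostly means removing non-data elements. A minimal sketch of a hypothetical `declutter` helper that hides the top and right spines, drops tick marks, and softens gridlines; which elements to remove is a judgment call rather than a fixed rule, and the bar heights are illustrative.

```python
import matplotlib.pyplot as plt

def declutter(ax):
    """Strip non-data ink from an Axes (hypothetical helper)."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)           # borders that carry no data
    ax.grid(axis="y", color="0.9", linewidth=0.5)    # faint horizontal guides only
    ax.grid(axis="x", visible=False)
    ax.tick_params(length=0)                         # tick marks duplicate the gridlines
    ax.set_axisbelow(True)                           # draw gridlines behind the data
    return ax

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(["Pass", "Fail_Vdd", "Fail_Idd", "Fail_Freq"], [320, 25, 40, 15], color="0.6")
declutter(ax)
ax.set_ylabel("Dies")
```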

**Preattentive Processing:**
- Brain detects differences in <200ms without conscious effort
- **Works**: Color, size, orientation, motion
- **Doesn't Work**: Length ratios, complex patterns
- **Use**: Highlight anomalies with color (not just text labels)
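To let color do the preattentive work, draw the bulk of the data in a muted gray and reserve one saturated color for the points that matter. A minimal sketch, assuming synthetic Idd readings with three injected anomalies and an illustrative 60 mA limit:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
idd = rng.normal(50, 2, 200)
idd[[37, 112, 168]] += 20            # three injected anomalies (synthetic)
limit = 60.0                          # illustrative spec limit (mA)
is_anomaly = idd > limit

fig, ax = plt.subplots(figsize=(8, 3))
# Gray for context, one strong color for the exceptions
ax.scatter(np.flatnonzero(~is_anomaly), idd[~is_anomaly], color="0.7", s=12, label="In spec")
ax.scatter(np.flatnonzero(is_anomaly), idd[is_anomaly], color="crimson", s=30,
           label=f"> {limit:.0f} mA")
ax.axhline(limit, color="crimson", linestyle="--", linewidth=1)
ax.set_xlabel("Device index")
ax.set_ylabel("Idd (mA)")
ax.legend(frameon=False)
```

Gray versus crimson differs in lightness as well as hue, so the highlight survives red-green color vision deficiency.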

### Chart Type Selection

| **Data Type** | **Relationship** | **Best Chart** | **Avoid** |
|--------------|------------------|----------------|-----------|
| **Continuous** | Distribution | Histogram, Violin, Box | Pie chart |
| **Continuous** | Time series | Line, Area | Bar chart (discrete) |
| **Continuous** | Correlation | Scatter, Heatmap | Separate plots |
| **Categorical** | Comparison | Bar (horizontal if labels long) | Pie (>5 categories) |
| **Categorical** | Proportion | Stacked bar, Treemap | Multiple pies |
| **Spatial** | Geographic | Choropleth, Point map | Table |
| **Hierarchical** | Part-to-whole | Treemap, Sunburst | Nested pies |
| **Flow** | Process | Sankey, Alluvial | Flowchart boxes |

### Matplotlib Mastery

**Figure Hierarchy:**
```
Figure (container)
 └─ Axes (subplot, the actual plot)
     ├─ Title, labels, legend
     ├─ Spines (borders)
     ├─ Ticks, tick labels
     └─ Artists (lines, patches, text)
```
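The same hierarchy is reachable from code, which is handy for styling every subplot after the fact. A short sketch using standard Matplotlib accessors (`fig.axes`, `ax.spines`, `ax.lines`, `ax.patches`) on a toy figure:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot([0, 1, 2], [1, 3, 2], label="Vdd trend")
axes[1].bar(["A", "B"], [3, 5])

for ax in fig.axes:                        # Figure -> Axes
    ax.spines["top"].set_visible(False)    # Axes -> Spines
    ax.tick_params(labelsize=9)            # Axes -> Ticks, tick labels
    # Axes -> Artists: Line2D objects in ax.lines, bar Rectangles in ax.patches
    print(ax.get_title() or "<untitled>", "| lines:", len(ax.lines), "| patches:", len(ax.patches))
```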

**GridSpec Layouts:**
```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(12, 8))
gs = GridSpec(3, 3, figure=fig)
ax1 = fig.add_subplot(gs[0, :])     # Top row, all columns
ax2 = fig.add_subplot(gs[1:, 0])    # Bottom 2 rows, left column
ax3 = fig.add_subplot(gs[1:, 1:])   # Bottom 2 rows, right 2 columns
```

**Styling Best Practices:**
- **Colors**: Use perceptually uniform palettes (Viridis, Plasma, not Rainbow)
- **Fonts**: Sans-serif for screen (Arial, Helvetica), Serif for print (Times)
- **Line Width**: 1.5-2 for main data, 0.5-1 for gridlines
- **Alpha**: 0.3-0.5 for overlapping points, 0.8-1.0 for solid
- **DPI**: 100 for screen, 300+ for publication
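Resolution and format are chosen at save time, not at plot time. A minimal sketch writing the same figure as a 300-DPI PNG and as an SVG; the file names, the temporary directory, and the data are assumptions, and the 3.5-inch width (about one journal column) should be checked against the target journal's guidelines.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3.5, 2.5))  # inches; roughly one journal column wide
ax.plot([1, 2, 3, 4], [92.1, 93.4, 91.8, 94.0], linewidth=1.5, marker="o")  # illustrative data
ax.set_xlabel("Week")
ax.set_ylabel("Yield (%)")

out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "yield_trend.png"
svg_path = out_dir / "yield_trend.svg"
fig.savefig(png_path, dpi=300, bbox_inches="tight")  # raster: dpi sets pixel density
fig.savefig(svg_path, bbox_inches="tight")           # vector: scales without pixelation
print(png_path, svg_path)
```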

### Seaborn Statistical Aesthetics

**Automatic Features:**
- **Confidence Intervals**: `regplot()` shows 95% CI by default
- **Statistical Aggregation**: `barplot()` shows mean + error bars
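- **Categorical Ordering**: `catplot()` and other categorical functions keep levels in order of appearance (sorted if numeric, category order for a pandas `Categorical`); pass `order=` to control it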
- **Color Palettes**: Context-aware (categorical, sequential, diverging)

**Key Functions:**
- **Distribution**: `histplot()`, `kdeplot()`, `violinplot()`
- **Relationship**: `scatterplot()`, `lineplot()`, `regplot()`
- **Categorical**: `barplot()`, `boxplot()`, `pointplot()`
- **Matrix**: `heatmap()`, `clustermap()`
- **Multi-plot**: `pairplot()`, `FacetGrid()`

### Plotly Interactivity

**Interactive Features:**
- **Hover**: Tooltips with custom HTML (`hovertemplate`)
- **Zoom/Pan**: Box select, lasso select for filtering
- **Click Events**: Trigger Python callbacks (requires `go.FigureWidget` in Jupyter or a Dash app)
- **Export**: Save as PNG, SVG (publication-ready)
- **Animations**: Frame-by-frame progression (time series evolution)

**Dashboard Components:**
- **Range Selector**: Quick time range selection (1m, 3m, 6m, YTD)
- **Range Slider**: Fine-grained zoom control
- **Dropdowns**: Switch between metrics/datasets
- **Sliders**: Vary parameters dynamically
- **Subplots**: Multiple charts with shared axes

**3D Visualizations:**
- **When to Use**: 3+ dimensions, spatial data, surface plots
- **When to Avoid**: Data that fits in 2D (3D adds occlusion and perspective distortion), static presentations (a fixed viewpoint hides points and prints poorly)
- **Best Practice**: Provide 2D projections alongside 3D

### Color Theory

**Color Spaces:**
- **RGB**: Red, Green, Blue (additive, screens)
- **CMYK**: Cyan, Magenta, Yellow, Black (subtractive, print)
- **HSL**: Hue, Saturation, Lightness (intuitive for designers)

**Palette Types:**
- **Sequential**: One color gradient (light → dark) for continuous data
- **Diverging**: Two colors meeting at neutral (red ← white → blue) for + vs -
- **Categorical**: Distinct colors for categories (max 8-10 distinguishable)
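Seaborn exposes all three palette types through `color_palette`. A short sketch; the palette names are examples of each type, not the only choices:

```python
import seaborn as sns

sequential = sns.color_palette("viridis", 6)               # low → high, one direction
diverging = sns.color_palette("vlag", 7)                   # negative ← neutral → positive
categorical = sns.color_palette("colorblind", 4)           # distinct hues, no implied order
diverging_cmap = sns.color_palette("vlag", as_cmap=True)   # continuous colormap object

for name, pal in [("sequential", sequential), ("diverging", diverging),
                  ("categorical", categorical)]:
    print(name, pal.as_hex())
```

For a correlation heatmap, the diverging colormap pairs naturally with `sns.heatmap(..., cmap=diverging_cmap, center=0)` so that zero lands on the neutral color.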

**Accessibility:**
- **Colorblind-Friendly**: Avoid red-green combinations (8% of males affected)
- **Recommended**: Viridis, Plasma, Cividis (designed for colorblind)
- **Test**: Use colorblind simulator (Coblis, Color Oracle)
- **Backup**: Use patterns/shapes in addition to color
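In Seaborn, redundant encoding means mapping the same variable to both `hue` and `style`, so each category differs in marker shape as well as color. A minimal sketch with synthetic data and the built-in `colorblind` palette:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "vdd": rng.normal(1.05, 0.015, 150),
    "idd": rng.normal(50, 3, 150),
    "lot": np.repeat(["Lot_A", "Lot_B", "Lot_C"], 50),
})

fig, ax = plt.subplots(figsize=(6, 4))
# Same variable on hue AND style: lots stay distinguishable in grayscale or with CVD
sns.scatterplot(data=df, x="vdd", y="idd", hue="lot", style="lot",
                palette="colorblind", s=50, ax=ax)
ax.set_xlabel("Vdd (V)")
ax.set_ylabel("Idd (mA)")
```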

### Post-Silicon Applications

**Wafer Maps:**
- **Bin Maps**: Categorical colors (pass/fail/bins), annotate defect clusters
- **Parametric Maps**: Heatmaps for Vdd/Idd/freq, identify spatial correlations
- **Edge Effect**: Radial analysis (yield vs distance from center)
- **Pattern Detection**: Visual clustering guides ML feature engineering
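The radial analysis can be sketched with NumPy: compute each die's distance from the wafer center, split the radius into equal-width rings, and take the pass rate per ring. The coordinate convention (center at the origin), the ring count, and the simulated pass probabilities are assumptions for illustration.

```python
import numpy as np

def radial_yield(die_x, die_y, passed, n_rings=5):
    """Pass rate per concentric ring; ring 0 is the wafer center.

    Assumes die coordinates are centered at (0, 0); shift them first if your
    map uses a corner origin. `passed` is a boolean array (True = pass).
    """
    r = np.hypot(die_x, die_y)
    edges = np.linspace(0, r.max(), n_rings + 1)
    # Digitizing against the interior edges gives ring indices 0..n_rings-1
    ring = np.digitize(r, edges[1:-1])
    return np.array([passed[ring == k].mean() if np.any(ring == k) else np.nan
                     for k in range(n_rings)])

# Hypothetical wafer: dies inside a radius-15 circle; edge dies fail more often
xs, ys = np.meshgrid(np.arange(-15, 16), np.arange(-15, 16))
on_wafer = np.hypot(xs, ys) <= 15
die_x, die_y = xs[on_wafer], ys[on_wafer]
rng = np.random.default_rng(5)
p_pass = np.where(np.hypot(die_x, die_y) > 12, 0.70, 0.95)
passed = rng.random(die_x.size) < p_pass

ring_yield = radial_yield(die_x, die_y, passed)
```

Plotting `ring_yield` against ring index makes an edge effect show up as a drop in the outermost rings.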

**Parametric Trends:**
- **Control Charts**: ±3σ limits, CUSUM for drift detection
- **Multi-Parameter**: Pairplot to find correlated failures
- **Time Series**: Seasonal decomposition, anomaly highlighting
- **Correlation Heatmap**: Identify redundant tests
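A sketch of both control-chart techniques, assuming the standard Shewhart ±3σ chart and the two-sided tabular CUSUM with reference value k = 0.5σ and decision interval h = 5σ (common textbook defaults, not values taken from this notebook). The simulated stream shifts by one sigma halfway through: a small drift that ±3σ limits rarely catch quickly, but that CUSUM is designed to accumulate.

```python
import numpy as np

def shewhart_limits(baseline):
    """Center line and ±3σ control limits estimated from in-control baseline data."""
    mu = float(np.mean(baseline))
    sigma = float(np.std(baseline, ddof=1))
    return mu - 3 * sigma, mu, mu + 3 * sigma

def tabular_cusum(x, mu0, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; k (reference value) and h (decision interval) in sigma units.

    Returns the upper and lower cumulative sums and the indices where either exceeds h*sigma.
    """
    x = np.asarray(x, dtype=float)
    c_plus = np.zeros_like(x)
    c_minus = np.zeros_like(x)
    prev_p = prev_m = 0.0
    for i, xi in enumerate(x):
        prev_p = max(0.0, xi - (mu0 + k * sigma) + prev_p)  # accumulates upward drift
        prev_m = max(0.0, (mu0 - k * sigma) - xi + prev_m)  # accumulates downward drift
        c_plus[i], c_minus[i] = prev_p, prev_m
    alarms = np.flatnonzero((c_plus > h * sigma) | (c_minus > h * sigma))
    return c_plus, c_minus, alarms

# Hypothetical Vdd readings: in-control baseline, then a +1σ shift starting at index 50
rng = np.random.default_rng(6)
baseline = rng.normal(1.000, 0.010, 100)
stream = np.concatenate([rng.normal(1.000, 0.010, 50), rng.normal(1.010, 0.010, 50)])

lcl, center, ucl = shewhart_limits(baseline)
sigma_hat = float(np.std(baseline, ddof=1))
shewhart_alarms = np.flatnonzero((stream < lcl) | (stream > ucl))
_, _, cusum_alarms = tabular_cusum(stream, center, sigma_hat)
```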

**Yield Analysis:**
- **Funnel Charts**: Stage-by-stage loss (wafer fab → sort → final test)
- **Pareto Charts**: 80/20 rule (a small share of failure modes, often ~20%, drives ~80% of losses)
- **Sankey Diagrams**: Flow of devices through test stages
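The numbers behind a Pareto chart are a sort plus a cumulative sum. The sketch below, using hypothetical failure-mode counts, also extracts the "vital few" modes that together reach 80% of the total. Plotting the counts as bars with the cumulative share as a line gives the usual chart.

```python
def pareto_table(failure_counts):
    """Sort failure modes by count (descending) and attach each one's cumulative share."""
    total = sum(failure_counts.values())
    rows, running = [], 0
    for mode, count in sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((mode, count, running / total))
    return rows

def vital_few(rows, threshold=0.8):
    """Smallest prefix of the sorted modes whose cumulative share reaches `threshold`."""
    selected = []
    for mode, _, cum_share in rows:
        selected.append(mode)
        if cum_share >= threshold:
            break
    return selected

# Hypothetical final-test failure counts
counts = {"Idd leakage": 520, "Fmax": 310, "Vmin": 90, "Contact": 45, "Scan": 25, "Other": 10}
rows = pareto_table(counts)
top_modes = vital_few(rows)
```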

### Common Pitfalls

- ❌ **3D Pie Charts**: Never use (impossible to read accurately)
- ❌ **Dual Y-Axes**: Misleading (scales arbitrary), use separate plots
- ❌ **Rainbow Colormap**: Not perceptually uniform, creates false boundaries
- ❌ **Truncated Y-Axis**: Exaggerates differences, especially in bar charts (OK if labeled clearly)
- ❌ **Too Many Colors**: >8 categories → group into "Other"
- ❌ **Default Excel Colors**: Harsh, not colorblind-friendly
- ❌ **Chart Clutter**: Unneeded gridlines, borders, and shadows add noise
- ❌ **Missing Context**: Always label axes, units, sample size
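The "group into Other" fix for too many colors takes a few lines of standard-library Python. Here `n=7` leaves room for seven named categories plus "Other" within an eight-color palette. The bin labels are hypothetical.

```python
from collections import Counter

def top_n_with_other(labels, n=7, other="Other"):
    """Keep the n most frequent categories and fold the rest into one `other` bucket."""
    counts = Counter(labels)
    top = [cat for cat, _ in counts.most_common(n)]  # ties keep first-seen order
    result = {cat: counts[cat] for cat in top}
    rest = sum(c for cat, c in counts.items() if cat not in result)
    if rest:
        result[other] = rest
    return result

# Hypothetical bin labels: three common bins plus 20 rare ones
bins = ["Pass"] * 900 + ["Bin 7"] * 40 + ["Bin 12"] * 25 + [f"Bin {i}" for i in range(20, 40)]
grouped = top_n_with_other(bins, n=7)
```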

### Performance Optimization

**Large Datasets (>100K points):**
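- **Matplotlib**: Use `rasterized=True` on scatter plots so PDF/SVG exports embed the points as one bitmap while axes and text stay vector
- **Plotly**: Use `go.Scattergl` instead of `go.Scatter` (WebGL acceleration); in Plotly Express, pass `render_mode="webgl"`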
- **Aggregation**: Bin data before plotting (hexbin, 2D histogram)
- **Sampling**: Show representative subset (statistical sampling)
- **Datashader**: Library specifically for billions of points
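A sketch of the Matplotlib and aggregation options side by side, assuming 200K synthetic points. The scatter is rasterized so an SVG or PDF export stays small, and `hexbin` aggregates the same points into hexagonal bins before drawing. Variable names and data are hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
vdd = rng.normal(1.0, 0.02, n)
idd = 50 * vdd + rng.normal(0, 0.5, n)  # hypothetical correlated parametrics

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Option 1: rasterized scatter; vector exports embed one bitmap instead of 200K markers
ax1.scatter(vdd, idd, s=1, alpha=0.2, rasterized=True)
ax1.set_title("Rasterized scatter")

# Option 2: hexagonal binning; drawing cost scales with the number of bins, not points
hb = ax2.hexbin(vdd, idd, gridsize=60, bins="log", mincnt=1, cmap="viridis")
fig.colorbar(hb, ax=ax2, label="count (log scale)")
ax2.set_title("Hexbin density")

buf = io.BytesIO()
fig.savefig(buf, format="svg", dpi=150)  # dpi sets the resolution of the rasterized part
```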

**Web Dashboards:**
- **Lazy Loading**: Load charts on scroll/interaction
- **Caching**: Cache expensive computations (Redis, backend)
- **Server-Side Rendering**: Generate images on backend (Kaleido for Plotly)
- **Responsive**: Mobile-friendly layouts (CSS grid, flexbox)
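For a single-process dashboard, the standard library's `functools.lru_cache` is the simplest form of caching; multi-worker deployments usually need a shared cache such as Redis instead. The function below is hypothetical and sleeps to simulate a slow query.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=256)
def wafer_summary(lot_id: str, wafer_id: int) -> tuple:
    """Hypothetical expensive aggregation standing in for a slow database query.

    Arguments must be hashable. Returning a tuple (immutable) keeps callers from
    mutating the cached value.
    """
    time.sleep(0.2)  # simulate query latency
    return (lot_id, wafer_id, "summary placeholder")

start = time.perf_counter()
wafer_summary("LOT001", 3)   # miss: runs the slow body
first = time.perf_counter() - start

start = time.perf_counter()
wafer_summary("LOT001", 3)   # hit: served from the cache
second = time.perf_counter() - start
```

`wafer_summary.cache_info()` reports hits and misses, and `wafer_summary.cache_clear()` invalidates the cache when new data arrives.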

### Tool Ecosystem

**Python Libraries:**
- **Matplotlib**: Low-level control, publication-quality, customization
- **Seaborn**: Statistical plots, beautiful defaults, less code
- **Plotly**: Interactive, 3D, dashboards, web-ready
- **Bokeh**: Interactive, large datasets, server-side processing
- **Altair**: Declarative (Vega-Lite), grammar of graphics
- **HoloViews**: Multi-backend (Matplotlib, Bokeh, Plotly)

**Dashboard Frameworks:**
- **Streamlit**: Fastest prototyping, Python-only, limited customization
- **Dash (Plotly)**: Full control, React-based, production-ready
- **Panel (HoloViz)**: Multi-backend, Jupyter integration
- **Voila**: Jupyter notebooks as dashboards

**BI Tools:**
- **Tableau**: Drag-drop, enterprise, expensive
- **Power BI**: Microsoft stack, affordable
- **Looker**: SQL-based, governed data model
- **Metabase**: Open-source, self-serve analytics

### Best Practices

1. **Know Your Audience**: Technical (detailed, axes) vs Executive (simplified, KPIs)
2. **Start Simple**: A plain bar chart usually beats a complex 3D visualization
3. **Iterate**: Initial draft → refine labels → add context → remove clutter
4. **Accessibility**: Colorblind-friendly, screen reader compatible, high contrast
5. **Reproducible**: Save code + data, version control visualizations
6. **Context**: Always include sample size, date range, units
7. **Annotations**: Highlight key insights (don't make viewer hunt)
8. **Consistency**: Same colors/fonts across dashboard

### Next Steps
- **Notebook 117**: Streamlit App Development (build interactive ML apps)
- **Notebook 120**: Advanced Dashboard Design (real-time Dash apps)
- **Advanced**: D3.js (custom web visualizations), Grafana (monitoring dashboards)

---

**Remember**: *"The greatest value of a picture is when it forces us to notice what we never expected to see."* - John Tukey 📊

## 🎯 Key Takeaways

### When to Use Advanced Data Visualization
- **Exploratory analysis**: Discover patterns in high-dimensional data (wafer maps, parametric correlations)
- **Stakeholder communication**: Present insights to non-technical audiences (executive yield dashboards)
- **Debugging models**: Visualize residuals, feature importance, decision boundaries
- **Real-time monitoring**: Live dashboards for production metrics (test floor throughput, equipment health)
- **Spatial data**: Geographic/wafer position visualization (defect clustering, regional yield variations)

### Limitations
- **Information overload**: Too many charts confuse instead of clarify (as a rule of thumb, 5-7 charts max per page)
- **Misleading visuals**: Poor axis scaling, truncated axes, and cherry-picked data distort reality
- **Interactivity cost**: Complex dashboards load slowly, and users tend to abandon pages that take more than ~3s
- **Accessibility**: Color blindness affects 8% of males (use colorblind-safe palettes)

### Alternatives
- **Static tables**: For precise numbers, small datasets (<20 rows)
- **Summary statistics**: Mean, median, SD when full distribution not needed
- **Textual reports**: For detailed narratives, qualitative insights
- **Simple charts**: Bar/line charts often sufficient (avoid complexity for complexity's sake)

### Best Practices
- **Choose right chart type**: Line (time-series), bar (categorical comparison), scatter (correlation), heatmap (2D patterns)
- **Maximize data-ink ratio**: Remove chartjunk (unnecessary gridlines, 3D effects, decorations)
- **Clear labels**: Axis titles, units, legends without acronyms (spell out "Parts Per Million" not "PPM")
- **Colorblind-safe palettes**: Viridis, plasma, colorbrewer2 (avoid red-green)
- **Interactive with purpose**: Tooltips for detail, filters for exploration, not decoration
- **Tell a story**: Title states conclusion ("Test Time Reduced 30%"), not just topic ("Test Time Analysis")

## 📊 Diagnostic Checks Summary

### Implementation Checklist
✅ **Matplotlib/Seaborn Fundamentals**
- Figure/axes architecture: `fig, ax = plt.subplots()` for control
- Styling: `sns.set_theme()`, custom rcParams for publication quality
- Color palettes: Viridis, plasma for continuous; Set2, Paired for categorical
- Annotations: `ax.annotate()` for key insights, threshold lines
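A sketch combining these checklist items: explicit `fig, ax` handles, a few `rcParams` overrides, a threshold line via `axhline`, and an `annotate` call pointing at an excursion. The lot data, spec limit, and rcParams values are hypothetical choices, not a publication standard.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Illustrative publication-style defaults
plt.rcParams.update({
    "figure.dpi": 150,
    "font.size": 10,
    "axes.spines.top": False,
    "axes.spines.right": False,
})

rng = np.random.default_rng(8)
lots = np.arange(1, 31)
leakage = rng.normal(2.0, 0.3, lots.size)  # hypothetical leakage (µA) per lot
leakage[21] = 3.4                          # injected excursion at lot 22
spec_limit = 3.0

fig, ax = plt.subplots(figsize=(7, 3.5))   # explicit Figure/Axes handles
ax.plot(lots, leakage, marker="o", color="tab:blue")
ax.axhline(spec_limit, color="tab:red", linestyle="--", label=f"Spec limit ({spec_limit} µA)")
worst = int(np.argmax(leakage))
ax.annotate("Excursion", xy=(lots[worst], leakage[worst]),
            xytext=(lots[worst] - 8, leakage[worst] - 0.1),
            arrowprops=dict(arrowstyle="->"))
# Title states the conclusion rather than the topic
ax.set(xlabel="Lot", ylabel="Leakage (µA)", title="Lot 22 exceeds the leakage spec")
ax.legend(loc="lower right")
```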

✅ **Interactive Dashboards (Plotly/Dash)**
- Plotly Express: Quick interactive charts with hover, zoom, pan
- Dash layouts: Multi-page apps, callbacks for user interactions
- Performance: WebGL traces (`go.Scattergl`) for >10K points, client-side callbacks to avoid server round trips
- Deployment: Heroku, AWS Elastic Beanstalk for production dashboards

✅ **Wafer Map Visualization**
- Heatmaps: `sns.heatmap()` for die-level yield patterns
- Scatter plots: X/Y coordinates colored by test result/bin
- Contour plots: Spatial interpolation for continuous parameters (temperature gradients)
- Annotations: Highlight defect clusters, quadrant statistics
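A sketch of a parametric wafer map with quadrant statistics, using Matplotlib's `imshow` on a square die grid where off-wafer positions are NaN and render blank. The 41×41 grid, the radial Vth gradient, and the noise level are hypothetical; `sns.heatmap` on the same array is an alternative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

n = 41                 # hypothetical 41x41 die grid
c = (n - 1) / 2        # grid index of the wafer center
yy, xx = np.mgrid[0:n, 0:n]
r = np.hypot(xx - c, yy - c)
rng = np.random.default_rng(9)

# Synthetic Vth: radial gradient plus noise; dies outside the wafer circle set to NaN
vth = 0.45 + 0.02 * (r / c) ** 2 + rng.normal(0, 0.003, (n, n))
vth[r > c] = np.nan

fig, ax = plt.subplots(figsize=(5.5, 5))
im = ax.imshow(vth, origin="lower", cmap="viridis")  # NaN cells are left blank
fig.colorbar(im, ax=ax, label="Vth (V)")
ax.set(xlabel="die_x", ylabel="die_y", title="Vth rises toward the wafer edge")

# Quadrant means as a numeric companion to the map
quadrants = {
    "Q1 (+x, +y)": (xx > c) & (yy > c),
    "Q2 (-x, +y)": (xx < c) & (yy > c),
    "Q3 (-x, -y)": (xx < c) & (yy < c),
    "Q4 (+x, -y)": (xx > c) & (yy < c),
}
quad_means = {name: float(np.nanmean(vth[mask])) for name, mask in quadrants.items()}
ax.text(0.02, 0.02, "\n".join(f"{k}: {v:.3f} V" for k, v in quad_means.items()),
        transform=ax.transAxes, fontsize=7, va="bottom")
```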

✅ **Time-Series Dashboards**
- Line charts: Yield trends over time with confidence bands
- Faceted plots: Multi-site comparison (small multiples by fab)
- Annotations: Process change markers, excursion events
- Real-time updates: Streaming data with Plotly/Dash
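A sketch of a yield trend with a confidence band and a process-change marker. The band uses the normal approximation to a binomial proportion (about 95% at ±1.96 standard errors); the daily counts and the yield drop are simulated.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
days = pd.date_range("2024-03-01", periods=60, freq="D")
tested = rng.integers(800, 1200, days.size)                  # hypothetical dies tested per day
true_yield = np.where(np.arange(days.size) < 35, 0.92, 0.88)  # drop after a process change
passed = rng.binomial(tested, true_yield)

p = passed / tested
se = np.sqrt(p * (1 - p) / tested)           # standard error of a proportion
lower, upper = p - 1.96 * se, p + 1.96 * se  # ~95% confidence band

fig, ax = plt.subplots(figsize=(9, 3.5))
ax.plot(days, 100 * p, color="tab:blue", label="Daily yield")
ax.fill_between(days, 100 * lower, 100 * upper, color="tab:blue", alpha=0.2, label="95% CI")
ax.axvline(days[35], color="gray", linestyle="--")
ax.annotate("Process change", xy=(days[35], 100 * upper.max()), xytext=(5, 0),
            textcoords="offset points", va="top")
ax.set(ylabel="Yield (%)", title="Yield dropped after the process change")
ax.legend(loc="lower left")
```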

### Quality Metrics
- **Load time**: <2s for dashboards with <1000 data points
- **Readability**: Clear titles, axis labels, legends without jargon
- **Accessibility**: WCAG AA contrast ratios (4.5:1 for text, 3:1 for chart graphics), colorblind-safe palettes
- **Actionability**: Every chart answers specific question or supports decision
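The contrast requirement can be checked in code. This sketch follows the WCAG 2.x definitions of relative luminance and contrast ratio. For example, `#777777` text on white falls just short of 4.5:1, while `#767676` passes.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    channels = []
    for i in (0, 2, 4):
        c = int(h[i:i + 2], 16) / 255
        # Undo the sRGB transfer curve to get linear light
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def passes_aa_text(fg: str, bg: str, large_text: bool = False) -> bool:
    """WCAG AA for text: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```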

### Post-Silicon Validation Applications
**1. Wafer Map Defect Clustering Visualization**
- Visual: Heatmap of die-level yield (X/Y coordinates, color = pass/fail)
- Insights: Identify systematic patterns (edge effects, radial gradients, cluster defects)
- Interactive: Click die to see parametric test results, filter by lot/wafer
- Business value: Faster root cause analysis (visual pattern recognition 5-10x faster than tables)

**2. Parametric Test Correlation Matrix**
- Visual: Correlation heatmap for 50+ test parameters (Vdd, Idd, frequency, power)
- Insights: Identify multicollinearity, redundant tests, causal relationships
- Interactive: Hover for exact correlation coefficient, click to see scatter plot
- Business value: Test program optimization (eliminate redundant tests, reduce test time 15-25%)
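A sketch of the redundancy screen behind such a matrix: compute pairwise Pearson correlations and list pairs above a threshold. The 0.95 threshold and the simulated test columns are assumptions for illustration; `sns.heatmap(tests.corr(), vmin=-1, vmax=1, cmap="vlag")` would draw the matrix itself.

```python
import numpy as np
import pandas as pd

def redundant_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """(test_a, test_b, r) for column pairs with |Pearson r| >= threshold, strongest first.

    A high |r| only flags a candidate; it does not by itself prove a test is redundant.
    """
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle: skip self and mirrored pairs
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Hypothetical parametric results; idd_85c is built as a near-rescaled copy of idd_25c
rng = np.random.default_rng(11)
n = 1000
idd_25c = rng.normal(10, 1, n)
tests = pd.DataFrame({
    "idd_25c": idd_25c,
    "idd_85c": 1.8 * idd_25c + rng.normal(0, 0.1, n),
    "fmax": rng.normal(3.2, 0.05, n),
    "vmin": rng.normal(0.62, 0.01, n),
})
candidates = redundant_pairs(tests)
```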

**3. Real-Time Yield Dashboard**
- Visual: Line charts (yield% over time), bar charts (by product/site), KPI cards (current yield, target, variance)
- Insights: Detect yield excursions in real-time (<15min lag), compare sites/products
- Interactive: Date range selector, product/site filters, drill-down to lot-level details
- Business value: Faster response to excursions (2-4hr vs. daily batch reports), $5M-$15M/year reduced scrap

### Business ROI Estimation

**Scenario 1: Medium-Volume Semiconductor Fab (100K wafers/year)**
- Interactive yield dashboards: Faster decision-making (4hr → 30min response) = **$3M/year** reduced excursion impact
- Wafer map visualization: 50% faster root cause (visual patterns vs. statistical analysis) = **$2.5M/year** engineering time savings
- Parametric correlation analysis: Identify redundant tests, reduce test time 20% = **$3M/year**
- **Total estimated benefit: $8.5M/year** (cost: $100K Plotly Dash deployment + $50K training; net $8.35M/year)

**Scenario 2: High-Volume Automotive Semiconductor (500K wafers/year)**
- Executive dashboards: Real-time visibility for VPs/directors = **$5M/year** strategic decision acceleration
- Multi-site comparison dashboards: Identify best-practice fabs = **$12M/year** yield gap closure
- Customer-facing quality dashboards: Transparent PPM reporting = **$8M/year** customer satisfaction (reduced audits)
- **Total estimated benefit: $25M/year** (cost: $500K enterprise dashboard platform + $200K team; net $24.3M/year)

**Scenario 3: Advanced Node R&D Fab (<10K wafers/year)**
- Experimental data visualization: Interactive parameter sweeps = **$4M/year** faster learning
- Design-test correlation plots: Link design choices to yield outcomes = **$6M/year** design optimization
- Publication-quality plots: Faster paper/presentation creation = **$1M/year** researcher productivity
- **Total estimated benefit: $11M/year** (cost: $150K visualization tools + $100K training; net $10.75M/year)
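The scenario totals are simple sums. Recomputing them makes the arithmetic checkable and also yields the conventional ROI multiple (net benefit divided by cost). A sketch using only the figures stated in the three scenarios:

```python
# Figures (USD/year) copied from the three scenarios; they are the notebook's
# illustrative estimates, not measured results.
scenarios = {
    "Medium-volume fab":      {"benefits": [3.0e6, 2.5e6, 3.0e6],  "costs": [100e3, 50e3]},
    "High-volume automotive": {"benefits": [5.0e6, 12.0e6, 8.0e6], "costs": [500e3, 200e3]},
    "Advanced-node R&D":      {"benefits": [4.0e6, 6.0e6, 1.0e6],  "costs": [150e3, 100e3]},
}

def summarize(s):
    """Total benefit, total cost, net benefit, and ROI multiple (net / cost)."""
    benefit, cost = sum(s["benefits"]), sum(s["costs"])
    net = benefit - cost
    return benefit, cost, net, net / cost

for name, s in scenarios.items():
    benefit, cost, net, roi = summarize(s)
    print(f"{name}: benefit ${benefit / 1e6:.2f}M, cost ${cost / 1e3:.0f}K, "
          f"net ${net / 1e6:.2f}M, ROI {roi:.1f}x")
```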

---

## 🎓 Mastery Achievement

**You now have production-grade expertise in:**
- ✅ Creating publication-quality static visualizations with Matplotlib and Seaborn
- ✅ Building interactive dashboards with Plotly and Dash for real-time monitoring
- ✅ Visualizing wafer maps and spatial semiconductor data (die-level heatmaps, defect clustering)
- ✅ Designing colorblind-accessible, high data-ink ratio charts following best practices
- ✅ Deploying production dashboards for yield monitoring, parametric analysis, and executive reporting

**Next Steps:**
- **Advanced Interactivity**: Dash callbacks for complex multi-page apps, real-time streaming data
- **3D Visualization**: Volume rendering for multi-layer semiconductor defect analysis
- **Geospatial Analysis**: Folium, GeoPandas for fab location analysis, supply chain mapping