# 1.1: Python for Engineers with Wafer Map Data Examples

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Apply Python fundamentals (variables, data structures, functions) to semiconductor manufacturing data
- Work with wafer map data using pandas and numpy
- Create professional visualizations for semiconductor data
- Build reusable functions for wafer analysis

## 📊 Semiconductor Context

In semiconductor manufacturing, we work with various types of data:
- **Wafer Maps**: Visual representations of die locations and their pass/fail status
- **Process Parameters**: Temperature, pressure, time, flow rates
- **Metrology Data**: Critical dimensions, overlay, thickness measurements
- **Yield Data**: Good die count, yield percentage, defect density

This notebook focuses on wafer map analysis using Python.

## 🔧 Environment Setup

In [None]:
# Import essential libraries for semiconductor data analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')  # Keeps notebook output tidy, but hides every warning

# Set random seed for reproducibility
np.random.seed(42)

print("✅ Environment setup complete!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 📈 Python Fundamentals for Engineers

### Variables and Data Types

In [None]:
# Semiconductor process parameters (common data types)
wafer_size = 300  # mm (integer)
temperature = 850.5  # Celsius (float)
process_name = "FEOL_Lithography"  # (string)
is_production = True  # (boolean)
recipe_params = [1200, 850, 45.2, 0.95]  # (list)

print(f"Wafer Size: {wafer_size} mm (type: {type(wafer_size)})")
print(f"Temperature: {temperature}°C (type: {type(temperature)})")
print(f"Process: {process_name} (type: {type(process_name)})")
print(f"Production: {is_production} (type: {type(is_production)})")
print(f"Recipe Parameters: {recipe_params} (type: {type(recipe_params)})")

### Working with Lists and Dictionaries

In [None]:
# Process step data structure
process_steps = [
    "Oxidation",
    "Photolithography", 
    "Etching",
    "Ion Implantation",
    "Metallization"
]

# Process parameters dictionary
process_params = {
    "temperature": 850.5,
    "pressure": 0.75,
    "time": 120,
    "gas_flow": 250,
    "rf_power": 1500
}

print("Process Steps:")
for i, step in enumerate(process_steps, 1):
    print(f"  {i}. {step}")

print("\nProcess Parameters:")
for param, value in process_params.items():
    print(f"  {param}: {value}")

### Functions for Reusable Code

In [None]:
def calculate_yield(good_die, total_die):
    """
    Calculate wafer yield percentage.
    
    Parameters:
    good_die (int): Number of good/passing die
    total_die (int): Total number of die on wafer
    
    Returns:
    float: Yield percentage
    """
    if total_die == 0:
        return 0.0
    
    yield_pct = (good_die / total_die) * 100
    return round(yield_pct, 2)

def calculate_defect_density(defects, area_cm2):
    """
    Calculate defect density (defects per cm²).
    
    Parameters:
    defects (int): Number of defects
    area_cm2 (float): Area in cm²
    
    Returns:
    float: Defect density
    """
    if area_cm2 == 0:
        return 0.0
    
    density = defects / area_cm2
    return round(density, 4)

# Test the functions
wafer_yield = calculate_yield(850, 1000)
defect_dens = calculate_defect_density(25, 706.9)  # 300 mm wafer area: π × 15² ≈ 706.9 cm²

print(f"Wafer Yield: {wafer_yield}%")
print(f"Defect Density: {defect_dens} defects/cm²")
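The functions above return `0.0` when the denominator is zero, which keeps a batch loop running but can also hide bad input. As an alternative sketch (the name `calculate_yield_strict` is hypothetical and not part of this notebook), impossible die counts can raise an error instead:

```python
def calculate_yield_strict(good_die, total_die):
    """Yield percentage that rejects impossible die counts (illustrative variant)."""
    if total_die <= 0:
        raise ValueError("total_die must be positive")
    if not 0 <= good_die <= total_die:
        raise ValueError("good_die must be between 0 and total_die")
    return round(good_die / total_die * 100, 2)


print(f"Wafer Yield: {calculate_yield_strict(850, 1000)}%")

# More good die than total die is a data error, so it is reported rather than returned
try:
    calculate_yield_strict(1200, 1000)
except ValueError as err:
    print(f"Rejected: {err}")
```

Which behavior is preferable depends on the pipeline: returning a default suits exploratory work, while raising makes data-entry problems visible early.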

## 🔢 NumPy for Numerical Analysis

### Creating and Manipulating Arrays

In [None]:
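# Create synthetic wafer map data (2D array)
# Note: wafer_size is reused here as the die-grid dimension (earlier it held the 300 mm diameter)
wafer_size = 50  # 50x50 die grid
wafer_map = np.random.choice([0, 1], size=(wafer_size, wafer_size), p=[0.15, 0.85])
# 0 = fail, 1 = pass; each die passes with probability 0.85, so the yield comes out near 85%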

print(f"Wafer map shape: {wafer_map.shape}")
print(f"Total die: {wafer_map.size}")
print(f"Good die: {np.sum(wafer_map)}")
print(f"Failed die: {np.sum(wafer_map == 0)}")
print(f"Yield: {np.mean(wafer_map) * 100:.2f}%")
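The grid above is square, whereas a physical wafer is round, so the corner positions of the grid would hold no die. One way to represent this, sketched here under the assumption that positions outside the inscribed circle are simply excluded, is a boolean mask (the names `on_wafer`, `grid_size`, and `rng` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)  # local generator with an illustrative seed
grid_size = 50
wafer_map = rng.choice([0, 1], size=(grid_size, grid_size), p=[0.15, 0.85])

# Distance of every grid position from the geometric center of the grid
center = (grid_size - 1) / 2
y, x = np.ogrid[:grid_size, :grid_size]
radius = np.sqrt((x - center) ** 2 + (y - center) ** 2)

# True for positions inside the inscribed circle, False for the corners
on_wafer = radius <= grid_size / 2

# Boolean indexing returns a 1D array of the die that lie on the wafer
valid_die = wafer_map[on_wafer]
print(f"Die on wafer: {valid_die.size} of {wafer_map.size} grid positions")
print(f"Yield over valid die: {valid_die.mean() * 100:.2f}%")
```

Computing statistics over `wafer_map[on_wafer]` rather than the full array keeps the empty corners from counting toward the yield.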

In [None]:
# Process parameter arrays
temperatures = np.array([848.2, 850.1, 851.8, 849.5, 850.7, 852.1])
pressures = np.array([0.74, 0.75, 0.76, 0.75, 0.74, 0.77])
yields = np.array([84.2, 86.1, 87.5, 85.8, 84.9, 88.2])

# Statistical analysis
print("Temperature Statistics:")
print(f"  Mean: {np.mean(temperatures):.2f}°C")
# np.std uses the population formula (ddof=0) by default
print(f"  Std Dev: {np.std(temperatures):.2f}°C")
print(f"  Range: {np.ptp(temperatures):.2f}°C")

print("\nYield Statistics:")
print(f"  Mean: {np.mean(yields):.2f}%")
print(f"  Median: {np.median(yields):.2f}%")
print(f"  Min: {np.min(yields):.2f}%")
print(f"  Max: {np.max(yields):.2f}%")
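One detail worth knowing about the standard deviation above: `np.std` divides by N by default (`ddof=0`), while pandas divides by N - 1 (`ddof=1`), so the two libraries report different values for the same six temperatures. A short sketch of the difference:

```python
import numpy as np
import pandas as pd

temperatures = np.array([848.2, 850.1, 851.8, 849.5, 850.7, 852.1])

pop_std = np.std(temperatures)               # ddof=0: divides by N
sample_std = np.std(temperatures, ddof=1)    # ddof=1: divides by N - 1
pandas_std = pd.Series(temperatures).std()   # pandas defaults to ddof=1

print(f"Population std (ddof=0): {pop_std:.3f}°C")
print(f"Sample std (ddof=1):     {sample_std:.3f}°C")
print(f"pandas Series.std():     {pandas_std:.3f}°C")
```

For a handful of runs sampled from a longer-running process, the sample form (`ddof=1`) is usually the more appropriate estimate.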

### Array Operations and Broadcasting

In [None]:
# Calculate temperature deviation from target
target_temp = 850.0
temp_deviation = temperatures - target_temp

# Check each run against the spec window (an in-spec check, not a Cp/Cpk capability index)
spec_limit = 2.0  # ±2°C tolerance
within_spec = np.abs(temp_deviation) <= spec_limit

print("Temperature Deviations from Target:")
for i, (temp, dev, spec) in enumerate(zip(temperatures, temp_deviation, within_spec)):
    status = "✅ PASS" if spec else "❌ FAIL"
    print(f"  Run {i+1}: {temp:.1f}°C (Δ{dev:+.1f}°C) {status}")

print(f"\nRuns within spec: {np.sum(within_spec)}/{len(within_spec)}")
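The subtraction above broadcasts a scalar across a 1D array. Broadcasting also applies between arrays of different shapes. As an illustrative sketch (the `runs` matrix simply stacks the three arrays used earlier), per-column statistics of shape `(3,)` can be applied to a `(6, 3)` matrix without a loop:

```python
import numpy as np

# Rows are runs; columns are temperature (°C), pressure, and yield (%)
runs = np.column_stack([
    [848.2, 850.1, 851.8, 849.5, 850.7, 852.1],
    [0.74, 0.75, 0.76, 0.75, 0.74, 0.77],
    [84.2, 86.1, 87.5, 85.8, 84.9, 88.2],
])

col_mean = runs.mean(axis=0)          # shape (3,): one mean per column
col_std = runs.std(axis=0, ddof=1)    # shape (3,): one sample std per column

# (6, 3) combined with (3,): the column statistics are stretched across all rows
z_scores = (runs - col_mean) / col_std

print(f"runs shape: {runs.shape}, col_mean shape: {col_mean.shape}")
print(np.round(z_scores, 2))
```

Standardizing this way puts parameters with very different units on a common scale, which is convenient when comparing their run-to-run variation.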

## 📊 Pandas for Structured Data

### Creating and Working with DataFrames

In [None]:
# Create a semiconductor process data DataFrame
process_data = pd.DataFrame({
    'lot_id': ['LOT001', 'LOT002', 'LOT003', 'LOT004', 'LOT005', 'LOT006'],
    'wafer_id': ['W01', 'W02', 'W03', 'W04', 'W05', 'W06'],
    'temperature': temperatures,
    'pressure': pressures,
    'yield_pct': yields,
    'process_step': ['LITHO', 'ETCH', 'LITHO', 'ETCH', 'LITHO', 'ETCH'],
    'tool_id': ['T001', 'T002', 'T001', 'T002', 'T003', 'T002']
})

print("Process Data Summary:")
print(process_data)
print(f"\nDataFrame shape: {process_data.shape}")
print(f"Data types:\n{process_data.dtypes}")

### Data Analysis and Filtering

In [None]:
# Basic statistics
print("Numerical Data Statistics:")
print(process_data.describe())

# Filtering data
high_yield = process_data[process_data['yield_pct'] > 86]
litho_steps = process_data[process_data['process_step'] == 'LITHO']

print("\nHigh Yield Wafers (>86%):")
print(high_yield[['wafer_id', 'yield_pct', 'temperature']])

print("\nLithography Steps:")
print(litho_steps[['wafer_id', 'yield_pct', 'tool_id']])
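The filters above each use a single condition. Conditions can also be combined. In the sketch below the 84.5% threshold is an arbitrary illustration, and the DataFrame is a trimmed copy of `process_data` so the example runs on its own:

```python
import pandas as pd

process_data = pd.DataFrame({
    'wafer_id': ['W01', 'W02', 'W03', 'W04', 'W05', 'W06'],
    'temperature': [848.2, 850.1, 851.8, 849.5, 850.7, 852.1],
    'yield_pct': [84.2, 86.1, 87.5, 85.8, 84.9, 88.2],
    'process_step': ['LITHO', 'ETCH', 'LITHO', 'ETCH', 'LITHO', 'ETCH'],
})

# Each condition needs its own parentheses; combine with & (and), | (or), ~ (not)
mask = (process_data['process_step'] == 'LITHO') & (process_data['yield_pct'] > 84.5)
litho_high = process_data.loc[mask, ['wafer_id', 'yield_pct']]

# The same filter expressed with DataFrame.query
litho_high_q = process_data.query("process_step == 'LITHO' and yield_pct > 84.5")

print(litho_high)
```

Using Python's `and`/`or` keywords on Series raises an error, which is why the element-wise operators `&` and `|` are needed in the mask form.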

### Grouping and Aggregation

In [None]:
# Group by process step
step_summary = process_data.groupby('process_step').agg({
    'yield_pct': ['mean', 'std', 'min', 'max'],
    'temperature': ['mean', 'std'],
    'wafer_id': 'count'
})

print("Summary by Process Step:")
print(step_summary)

# Group by tool
tool_summary = process_data.groupby('tool_id')['yield_pct'].agg(['mean', 'count'])
print("\nSummary by Tool:")
print(tool_summary)
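Passing a dictionary of lists to `agg`, as in the step summary above, produces two-level column labels such as `('yield_pct', 'mean')`. When flat column names are easier to work with, named aggregation is one option. The output names below (`mean_yield` and so on) are illustrative choices:

```python
import pandas as pd

process_data = pd.DataFrame({
    'wafer_id': ['W01', 'W02', 'W03', 'W04', 'W05', 'W06'],
    'yield_pct': [84.2, 86.1, 87.5, 85.8, 84.9, 88.2],
    'process_step': ['LITHO', 'ETCH', 'LITHO', 'ETCH', 'LITHO', 'ETCH'],
})

# Each keyword is an output column: name=(source column, aggregation)
step_summary_flat = process_data.groupby('process_step').agg(
    mean_yield=('yield_pct', 'mean'),
    min_yield=('yield_pct', 'min'),
    max_yield=('yield_pct', 'max'),
    wafer_count=('wafer_id', 'count'),
)

print(step_summary_flat)
```

The result can then be indexed directly, for example `step_summary_flat.loc['ETCH', 'mean_yield']`.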

## 📈 Data Visualization

### Wafer Map Visualization

In [None]:
# Create wafer map plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Wafer map heatmap
im1 = ax1.imshow(wafer_map, cmap='RdYlGn', aspect='equal')
ax1.set_title('Wafer Map (Green=Pass, Red=Fail)', fontsize=12, fontweight='bold')
ax1.set_xlabel('Die X Position')
ax1.set_ylabel('Die Y Position')
plt.colorbar(im1, ax=ax1, label='Pass/Fail')

# Yield by radial position (grid dimensions are read from the array itself)
n_rows, n_cols = wafer_map.shape
center = n_rows // 2
y, x = np.ogrid[:n_rows, :n_cols]
radial_distance = np.sqrt((x - center)**2 + (y - center)**2)

# Bin by radial zones
max_radius = center
zones = np.linspace(0, max_radius, 6)
zone_yields = []

for i in range(len(zones)-1):
    mask = (radial_distance >= zones[i]) & (radial_distance < zones[i+1])
    if np.any(mask):
        zone_yield = np.mean(wafer_map[mask]) * 100
        zone_yields.append(zone_yield)
    else:
        zone_yields.append(np.nan)  # No die in this zone; NaN leaves a gap instead of plotting 0% yield

zone_centers = [(zones[i] + zones[i+1])/2 for i in range(len(zones)-1)]
ax2.plot(zone_centers, zone_yields, 'bo-', linewidth=2, markersize=8)
ax2.set_title('Yield vs Radial Position', fontsize=12, fontweight='bold')
ax2.set_xlabel('Radial Distance from Center')
ax2.set_ylabel('Yield (%)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
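The radial zone loop above can also be written without an explicit Python loop. The sketch below is one possible vectorized form using `np.digitize` and `np.bincount`; it regenerates a synthetic map with its own illustrative seed so that it runs independently, and uses the same five zone edges:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
grid_size = 50
wafer_map = rng.choice([0, 1], size=(grid_size, grid_size), p=[0.15, 0.85])

center = grid_size // 2
y, x = np.ogrid[:grid_size, :grid_size]
radial_distance = np.sqrt((x - center) ** 2 + (y - center) ** 2)

zones = np.linspace(0, center, 6)   # 6 edges define 5 radial zones
n_zones = len(zones) - 1

# digitize gives 1..5 for zones[i-1] <= r < zones[i]; subtract 1 for 0-based zones
zone_index = np.digitize(radial_distance, zones) - 1
inside = zone_index < n_zones       # drop die at or beyond the outermost edge

# Count all die and passing die per zone in one pass each
die_per_zone = np.bincount(zone_index[inside], minlength=n_zones)
good_per_zone = np.bincount(zone_index[inside], weights=wafer_map[inside], minlength=n_zones)
zone_yields = good_per_zone / die_per_zone * 100

for i, (count, zone_yield) in enumerate(zip(die_per_zone, zone_yields)):
    print(f"Zone {i}: {count} die, yield {zone_yield:.2f}%")
```

Printing the die count per zone alongside the yield is useful because the inner zones contain far fewer die, so their yield estimates are noisier.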

### Process Parameter Visualization

In [None]:
# Create process control charts
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

# Temperature control chart
ax1.plot(range(1, len(temperatures)+1), temperatures, 'bo-', linewidth=2)
ax1.axhline(y=850, color='g', linestyle='--', label='Target')
ax1.axhline(y=852, color='r', linestyle='--', label='USL')
ax1.axhline(y=848, color='r', linestyle='--', label='LSL')
ax1.set_title('Temperature Control Chart', fontweight='bold')
ax1.set_xlabel('Run Number')
ax1.set_ylabel('Temperature (°C)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Yield vs Temperature scatter
ax2.scatter(temperatures, yields, c='blue', s=100, alpha=0.7)
ax2.set_title('Yield vs Temperature', fontweight='bold')
ax2.set_xlabel('Temperature (°C)')
ax2.set_ylabel('Yield (%)')
ax2.grid(True, alpha=0.3)

# Yield distribution
ax3.hist(yields, bins=5, alpha=0.7, color='skyblue', edgecolor='black')
ax3.set_title('Yield Distribution', fontweight='bold')
ax3.set_xlabel('Yield (%)')
ax3.set_ylabel('Frequency')
ax3.grid(True, alpha=0.3)

# Box plot by process step
process_data.boxplot(column='yield_pct', by='process_step', ax=ax4)
fig.suptitle('')  # Remove the "Boxplot grouped by ..." figure title that pandas adds
ax4.set_title('Yield by Process Step', fontweight='bold')
ax4.set_xlabel('Process Step')
ax4.set_ylabel('Yield (%)')

plt.tight_layout()
plt.show()
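The temperature chart above draws specification limits (USL/LSL), which come from the ±2°C tolerance. In statistical process control, a control chart usually also shows control limits derived from the process data itself. A minimal sketch for an individuals chart follows, assuming the common moving-range method, where 2.66 is the standard constant 3/d2 with d2 = 1.128 for ranges of two consecutive points. Six runs are far too few to set real limits, so treat this as an illustration of the arithmetic only:

```python
import numpy as np

temperatures = np.array([848.2, 850.1, 851.8, 849.5, 850.7, 852.1])

center_line = temperatures.mean()

# Moving range: absolute difference between consecutive runs
moving_range = np.abs(np.diff(temperatures))
mr_bar = moving_range.mean()

# Individuals-chart control limits: mean ± 2.66 × average moving range
ucl = center_line + 2.66 * mr_bar
lcl = center_line - 2.66 * mr_bar

out_of_control = (temperatures > ucl) | (temperatures < lcl)

print(f"Center line: {center_line:.2f}°C")
print(f"UCL: {ucl:.2f}°C, LCL: {lcl:.2f}°C")
print(f"Points outside control limits: {int(out_of_control.sum())}")
```

Control limits describe what the process actually does, while specification limits describe what is required, so the two sets of lines answer different questions.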

## 🔍 Advanced Wafer Analysis Functions

In [None]:
def analyze_wafer_patterns(wafer_map):
    """
    Analyze patterns in wafer map data.
    
    Parameters:
    wafer_map (numpy.ndarray): 2D array of pass/fail data
    
    Returns:
    dict: Analysis results
    """
    results = {}
    
    # Basic statistics
    total_die = wafer_map.size
    good_die = np.sum(wafer_map)
    results['total_die'] = total_die
    results['good_die'] = good_die
    results['yield_pct'] = (good_die / total_die) * 100
    
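    # Edge vs center analysis
    n_rows, n_cols = wafer_map.shape
    center_row, center_col = n_rows // 2, n_cols // 2
    edge_width = 5
    
    # Center region: half-width scales with map size (10 die for a 50x50 map)
    half = max(1, min(n_rows, n_cols) // 5)
    center_region = wafer_map[center_row-half:center_row+half,
                              center_col-half:center_col+half]
    results['center_yield'] = np.mean(center_region) * 100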
    
    # Edge region
    edge_mask = np.zeros_like(wafer_map, dtype=bool)
    edge_mask[:edge_width, :] = True
    edge_mask[-edge_width:, :] = True
    edge_mask[:, :edge_width] = True
    edge_mask[:, -edge_width:] = True
    
    results['edge_yield'] = np.mean(wafer_map[edge_mask]) * 100
    
    # Quadrant analysis
    h, w = wafer_map.shape
    q1 = wafer_map[:h//2, :w//2]  # Top-left
    q2 = wafer_map[:h//2, w//2:]  # Top-right
    q3 = wafer_map[h//2:, :w//2]  # Bottom-left
    q4 = wafer_map[h//2:, w//2:]  # Bottom-right
    
    results['quadrant_yields'] = {
        'Q1': np.mean(q1) * 100,
        'Q2': np.mean(q2) * 100,
        'Q3': np.mean(q3) * 100,
        'Q4': np.mean(q4) * 100
    }
    
    return results

# Analyze our wafer map
analysis = analyze_wafer_patterns(wafer_map)

print("🔍 Wafer Analysis Results:")
print(f"Overall Yield: {analysis['yield_pct']:.2f}%")
print(f"Center Yield: {analysis['center_yield']:.2f}%")
print(f"Edge Yield: {analysis['edge_yield']:.2f}%")
print("\nQuadrant Yields:")
for quad, yield_val in analysis['quadrant_yields'].items():
    print(f"  {quad}: {yield_val:.2f}%")
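The random map analyzed above uses the same pass probability everywhere, so the center, edge, and quadrant yields all land near 85% and there is no spatial pattern to find. To exercise this kind of analysis, a map with a known pattern is helpful. The sketch below generates one with edge degradation. The probabilities 0.95 and 0.60, the seed, and the name `edge_loss_map` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)  # illustrative seed
grid_size = 50

# Normalized radius: 0 at the center, 1 at the edge (corners are capped at 1)
center = (grid_size - 1) / 2
y, x = np.ogrid[:grid_size, :grid_size]
r_norm = np.sqrt((x - center) ** 2 + (y - center) ** 2) / center
r_norm = np.clip(r_norm, 0, 1)

# Pass probability falls linearly from 0.95 at the center to 0.60 at the edge
pass_prob = 0.95 - 0.35 * r_norm
edge_loss_map = (rng.random((grid_size, grid_size)) < pass_prob).astype(int)

# Same regions as the analysis above: central 20x20 block and a 5-die border
center_yield = edge_loss_map[15:35, 15:35].mean() * 100
border = np.ones_like(edge_loss_map, dtype=bool)
border[5:-5, 5:-5] = False
edge_yield = edge_loss_map[border].mean() * 100

print(f"Center yield: {center_yield:.2f}%")
print(f"Edge yield:   {edge_yield:.2f}%")
```

Passing `edge_loss_map` to `analyze_wafer_patterns` should then report a center yield clearly above the edge yield, which gives a simple sanity check for the function.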

## 💡 Key Takeaways

### Python Fundamentals for Semiconductor Engineering:
1. **Variables and Data Types**: Essential for storing process parameters
2. **Functions**: Create reusable analysis tools
3. **Data Structures**: Lists and dictionaries for organizing data

### NumPy for Numerical Computing:
1. **Arrays**: Efficient storage and manipulation of numerical data
2. **Mathematical Operations**: Statistical analysis and calculations
3. **Broadcasting**: Vectorized operations for performance

### Pandas for Data Analysis:
1. **DataFrames**: Structured data with labels and metadata
2. **Filtering and Grouping**: Extract insights from process data
3. **Aggregation**: Summary statistics by categories

### Visualization Best Practices:
1. **Wafer Maps**: Use heatmaps for spatial data
2. **Control Charts**: Monitor process parameters over time
3. **Scatter Plots**: Explore relationships between variables
4. **Distributions**: Understand data spread and patterns

## 🎯 Practice Exercises

### Exercise 1: Process Parameter Analysis
Create a function that calculates process capability indices (Cp, Cpk) for a given parameter.
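
As a starting point, the standard definitions are Cp = (USL - LSL) / (6σ) and Cpk = min(USL - μ, μ - LSL) / (3σ). A minimal sketch, assuming the sample standard deviation is used for σ (the function name `process_capability` is hypothetical):

```python
import numpy as np

def process_capability(values, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper spec limits."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std(ddof=1)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("Capability is undefined when all values are identical")
    cp = (usl - lsl) / (6 * sigma)                # potential capability, ignores centering
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)   # penalizes an off-center mean
    return cp, cpk

temperatures = [848.2, 850.1, 851.8, 849.5, 850.7, 852.1]
cp, cpk = process_capability(temperatures, lsl=848.0, usl=852.0)
print(f"Cp: {cp:.3f}, Cpk: {cpk:.3f}")
```

Extensions left for the exercise include one-sided specifications and a check that enough data points are available for the estimate to be meaningful.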

### Exercise 2: Wafer Map Pattern Detection
Extend the `analyze_wafer_patterns` function to detect specific failure patterns:
- Ring patterns
- Cluster patterns
- Systematic patterns

### Exercise 3: Yield Prediction
Create a simple model to predict yield based on process parameters using the data we've created.

### Exercise 4: Automated Reporting
Build a function that generates an automated wafer analysis report with:
- Summary statistics
- Visualizations
- Pass/fail criteria
- Recommendations

## 🔗 Next Steps

1. **Practice**: Complete the exercises above
2. **Explore**: Try the production script `1.1-wafer-analysis.py`
3. **Deep Dive**: Read the technical document `1.1-python-fundamentals.md`
4. **Reference**: Keep the quick reference `1.1-python-quick-ref.md` handy
5. **Continue**: Move to notebook `1.2-statistical-foundations.ipynb`

---

**Congratulations!** You've completed the first step in your machine learning journey for semiconductor engineering. You now have the Python fundamentals needed to analyze semiconductor data effectively.