# Die Defect Segmentation Tutorial

This tutorial demonstrates how to build a production-ready computer vision system for die defect segmentation in semiconductor manufacturing using classical image processing and machine learning techniques.

## Business Context

Die defect segmentation is crucial for:
- **Quality Control**: Precise identification and classification of defect types
- **Yield Analysis**: Understanding defect patterns across wafers
- **Process Optimization**: Linking defects to specific manufacturing steps
- **Automated Inspection**: Reducing manual inspection time and errors

## Learning Objectives

By the end of this tutorial, you will:
1. Understand die segmentation challenges in semiconductor manufacturing
2. Build computer vision pipelines for defect detection
3. Apply classical image processing and ML techniques
4. Evaluate segmentation performance with manufacturing metrics
5. Deploy models through a standardized CLI

## Setup and Imports

In [None]:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from skimage import measure, filters, morphology, segmentation
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

# Import our die segmentation pipeline
from pipeline import (
    DieSegmentationPipeline,
    generate_synthetic_wafer_image,
    load_dataset
)

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("tab10")
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Synthetic Wafer Image Generation

Let's generate synthetic wafer images, one per defect pattern, to use as training and evaluation data.
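
The `generate_synthetic_wafer_image` function comes from the tutorial's `pipeline` module and its internals are not shown here. As a rough idea of what such a generator might do, the sketch below draws a circular wafer, darkens a defect region, and adds Gaussian noise. The name `make_toy_wafer` and every constant in it are assumptions made for illustration, not the pipeline's implementation.

```python
import numpy as np

def make_toy_wafer(size=(256, 256), defect_type="center",
                   defect_intensity=0.3, noise_level=0.05, seed=0):
    """Hypothetical stand-in for a synthetic wafer generator (illustration only)."""
    rng = np.random.default_rng(seed)
    h, w = size
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = np.hypot(yy - cy, xx - cx)           # distance of each pixel from the centre
    radius = 0.45 * min(h, w)

    wafer = r <= radius                      # circular wafer region
    image = np.where(wafer, 0.7, 0.1)        # bright wafer on a dark background

    if defect_type == "center":
        mask = r <= 0.15 * radius
    elif defect_type == "edge":
        mask = (r >= 0.85 * radius) & wafer & (xx > cx)
    elif defect_type == "scratch":
        mask = (np.abs((yy - cy) - (xx - cx)) <= 2) & wafer   # thin diagonal band
    elif defect_type == "random":
        mask = (rng.random(size) < 0.01) & wafer              # scattered single pixels
    elif defect_type == "none":
        mask = np.zeros(size, dtype=bool)
    else:
        raise ValueError(f"unknown defect_type: {defect_type!r}")

    image = image - defect_intensity * mask          # defects appear darker
    image = image + rng.normal(0.0, noise_level, size)
    return {"image": np.clip(image, 0.0, 1.0), "mask": mask}

toy = make_toy_wafer(defect_type="scratch", seed=1)
print(toy["image"].shape, toy["mask"].dtype, int(toy["mask"].sum()))
```

Returning the image together with its boolean mask mirrors the `'image'` and `'mask'` keys that the next cell reads from the real generator.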

In [None]:
# Generate synthetic wafer images
print("Generating synthetic wafer images with defects...")

# Generate multiple images with different defect patterns
images_data = []
defect_types = ['center', 'edge', 'scratch', 'random', 'none']

for i, defect_type in enumerate(defect_types):
    print(f"Generating {defect_type} defect pattern...")
    
    image_data = generate_synthetic_wafer_image(
        size=(256, 256),
        defect_type=defect_type,
        defect_intensity=0.3,
        noise_level=0.05,
        seed=42 + i
    )
    
    images_data.append({
        'image': image_data['image'],
        'mask': image_data['mask'],
        'defect_type': defect_type,
        'defect_area': np.sum(image_data['mask'])
    })

print(f"Generated {len(images_data)} wafer images")
print(f"Image size: {images_data[0]['image'].shape}")
print(f"Defect types: {[img['defect_type'] for img in images_data]}")

In [None]:
# Visualize synthetic wafer images
fig, axes = plt.subplots(3, 5, figsize=(20, 12))

for i, img_data in enumerate(images_data):
    # Original image
    ax1 = axes[0, i]
    ax1.imshow(img_data['image'], cmap='gray')
    ax1.set_title(f'{img_data["defect_type"].title()} - Original')
    ax1.axis('off')
    
    # Ground truth mask
    ax2 = axes[1, i]
    ax2.imshow(img_data['mask'], cmap='Reds', alpha=0.8)
    ax2.set_title(f'Ground Truth Mask\nDefect Area: {img_data["defect_area"]} px')
    ax2.axis('off')
    
    # Overlay
    ax3 = axes[2, i]
    ax3.imshow(img_data['image'], cmap='gray')
    ax3.imshow(img_data['mask'], cmap='Reds', alpha=0.4)
    ax3.set_title('Overlay')
    ax3.axis('off')

plt.tight_layout()
plt.show()

# Analyze defect characteristics
print("\n=== Defect Pattern Analysis ===")
for img_data in images_data:
    defect_percentage = (img_data['defect_area'] / img_data['image'].size) * 100
    print(f"{img_data['defect_type'].title():>10}: {img_data['defect_area']:>6} pixels ({defect_percentage:>4.1f}%)")

## 2. Die Segmentation Pipeline Training

Let's build and train segmentation models using different approaches.
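
The four `method` values are handled inside `DieSegmentationPipeline`, whose code is not shown in this tutorial. For intuition about the simplest family, here is a minimal thresholding sketch. It uses a global Otsu threshold followed by a morphological opening, which is a simpler stand-in and not necessarily what the pipeline's `'threshold'` method does. The function name `threshold_segment` and the toy image are assumptions, and the sketch assumes defects are darker than their surroundings.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters

def threshold_segment(image, opening_size=3):
    """Segment dark defects with a global Otsu threshold plus opening (sketch)."""
    t = filters.threshold_otsu(image)
    mask = image < t                                  # assumes defects are darker
    structure = np.ones((opening_size, opening_size), dtype=bool)
    return ndi.binary_opening(mask, structure=structure)  # drop isolated pixels

# Toy image: uniform bright field with one dark 20x20 defect and mild noise
rng = np.random.default_rng(0)
toy_image = 0.7 + rng.normal(0.0, 0.02, (128, 128))
toy_truth = np.zeros((128, 128), dtype=bool)
toy_truth[50:70, 60:80] = True
toy_image[toy_truth] -= 0.3

toy_pred = threshold_segment(toy_image)
overlap = np.logical_and(toy_pred, toy_truth).sum()
toy_dice = 2.0 * overlap / (toy_pred.sum() + toy_truth.sum())
print(f"Dice on the toy image: {toy_dice:.3f}")
```

A baseline like this needs no training data, which is why the training loop below treats `fit` as optional for some methods.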

In [None]:
# Prepare training data
X_images = np.array([img_data['image'] for img_data in images_data])
y_masks = np.array([img_data['mask'] for img_data in images_data])

print(f"Training images shape: {X_images.shape}")
print(f"Training masks shape: {y_masks.shape}")

# Test different segmentation methods
methods_to_test = [
    ('threshold', 'Adaptive Thresholding'),
    ('watershed', 'Watershed Segmentation'),
    ('kmeans', 'K-means Clustering'),
    ('random_forest', 'Random Forest Classification')
]

method_results = []

print("\n=== Training Segmentation Models ===")
for method, method_desc in methods_to_test:
    print(f"\nTraining {method_desc}...")
    
    # Create and train pipeline
    pipeline = DieSegmentationPipeline(
        method=method,
        preprocessing='adaptive_histogram'
    )
    
    # Fit the model. Unsupervised methods may not implement training,
    # so report the reason and continue to evaluation.
    try:
        pipeline.fit(X_images, y_masks)
        print("  ✓ Model trained successfully")
    except Exception as e:
        print(f"  ℹ Fit skipped or failed: {e}")
    
    # Evaluate on the same images used for fitting, so these scores are optimistic
    metrics = pipeline.evaluate(X_images, y_masks)
    
    method_results.append({
        'Method': method_desc,
        'Dice_Score': metrics['dice_score'],
        'IoU': metrics['iou'],
        'Pixel_Accuracy': metrics['pixel_accuracy'],
        'Precision': metrics['precision'],
        'Recall': metrics['recall'],
        'Defect_Detection_Rate': metrics['defect_detection_rate']
    })
    
    print(f"  Dice Score: {metrics['dice_score']:.3f}")
    print(f"  IoU: {metrics['iou']:.3f}")
    print(f"  Defect Detection Rate: {metrics['defect_detection_rate']:.1%}")

results_df = pd.DataFrame(method_results)
print("\n=== Method Comparison Results ===")
print(results_df.round(3))

In [None]:
# Visualize method comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Dice Score comparison
ax1 = axes[0, 0]
results_df.set_index('Method')['Dice_Score'].plot(kind='bar', ax=ax1, color='skyblue')
ax1.set_title('Dice Score Comparison')
ax1.set_ylabel('Dice Score')
ax1.tick_params(axis='x', rotation=45)
ax1.grid(True, alpha=0.3)

# IoU comparison
ax2 = axes[0, 1]
results_df.set_index('Method')['IoU'].plot(kind='bar', ax=ax2, color='lightgreen')
ax2.set_title('IoU (Intersection over Union) Comparison')
ax2.set_ylabel('IoU Score')
ax2.tick_params(axis='x', rotation=45)
ax2.grid(True, alpha=0.3)

# Precision vs Recall
ax3 = axes[1, 0]
ax3.scatter(results_df['Precision'], results_df['Recall'], s=100, alpha=0.7)
for i, method in enumerate(results_df['Method']):
    ax3.annotate(method, (results_df.iloc[i]['Precision'], results_df.iloc[i]['Recall']),
                xytext=(5, 5), textcoords='offset points', fontsize=8)
ax3.set_xlabel('Precision')
ax3.set_ylabel('Recall')
ax3.set_title('Precision vs Recall')
ax3.grid(True, alpha=0.3)

# Defect Detection Rate
ax4 = axes[1, 1]
results_df.set_index('Method')['Defect_Detection_Rate'].plot(kind='bar', ax=ax4, color='coral')
ax4.set_title('Defect Detection Rate')
ax4.set_ylabel('Detection Rate')
ax4.tick_params(axis='x', rotation=45)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find best performing method
best_method_idx = results_df['Dice_Score'].idxmax()
best_method = results_df.loc[best_method_idx]
print(f"\n=== Best Performing Method ===")
print(f"Method: {best_method['Method']}")
print(f"Dice Score: {best_method['Dice_Score']:.3f}")
print(f"IoU: {best_method['IoU']:.3f}")
print(f"Defect Detection Rate: {best_method['Defect_Detection_Rate']:.1%}")

## 3. Detailed Segmentation Analysis

Let's analyze segmentation results in detail for the best performing method.
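
The per-image metrics computed in the next cell follow the standard definitions, with $P$ the predicted mask and $T$ the ground truth: Dice is $2|P \cap T| / (|P| + |T|)$ and IoU is $|P \cap T| / |P \cup T|$. The helper below is a small sketch of those formulas (the name `dice_iou` is illustrative). It adopts the same convention as the cell, scoring 1.0 when both masks are empty.

```python
import numpy as np

def dice_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks; both empty counts as a perfect match."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    union = total - inter
    dice = 2.0 * inter / total if total > 0 else 1.0
    iou = inter / union if union > 0 else 1.0
    return float(dice), float(iou)

truth = np.zeros((100, 100), dtype=bool)
truth[40:50, 40:50] = True          # 100 defect pixels (1% of the image)
pred = np.zeros_like(truth)
pred[45:55, 40:50] = True           # same size, shifted down by 5 rows

dice, iou = dice_iou(pred, truth)
pixel_acc = float((pred == truth).mean())
print(f"Dice={dice:.3f} IoU={iou:.3f} pixel accuracy={pixel_acc:.3f}")
# prints: Dice=0.500 IoU=0.333 pixel accuracy=0.990
```

Only half of the defect is found, yet pixel accuracy still reads 99% because the background dominates. Dice and IoU ignore true negatives, so they expose the miss.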

In [None]:
# Random forest is hard-coded for the detailed analysis; if the comparison above
# ranks another method higher, pass that method name instead
best_pipeline = DieSegmentationPipeline(
    method='random_forest',
    preprocessing='adaptive_histogram'
)

# Train and get predictions
best_pipeline.fit(X_images, y_masks)
predictions = best_pipeline.predict(X_images)

print(f"Generated {len(predictions)} segmentation predictions")
print(f"Prediction shape: {predictions[0].shape}")

# Calculate detailed metrics for each image
detailed_results = []
for i, (pred, true_mask) in enumerate(zip(predictions, y_masks)):
    # Calculate individual image metrics
    intersection = np.logical_and(pred, true_mask)
    union = np.logical_or(pred, true_mask)
    
    dice = 2.0 * np.sum(intersection) / (np.sum(pred) + np.sum(true_mask)) if (np.sum(pred) + np.sum(true_mask)) > 0 else 1.0
    iou = np.sum(intersection) / np.sum(union) if np.sum(union) > 0 else 1.0
    pixel_acc = np.sum(pred == true_mask) / pred.size
    
    detailed_results.append({
        'Image': i,
        'Defect_Type': images_data[i]['defect_type'],
        'Dice_Score': dice,
        'IoU': iou,
        'Pixel_Accuracy': pixel_acc,
        'True_Defect_Area': np.sum(true_mask),
        'Pred_Defect_Area': np.sum(pred),
        'Area_Error': abs(np.sum(pred) - np.sum(true_mask))
    })

detailed_df = pd.DataFrame(detailed_results)
print("\n=== Per-Image Segmentation Results ===")
print(detailed_df.round(3))

In [None]:
# Visualize segmentation results
fig, axes = plt.subplots(4, 5, figsize=(20, 16))

for i, (img_data, pred) in enumerate(zip(images_data, predictions)):
    # Original image
    ax1 = axes[0, i]
    ax1.imshow(img_data['image'], cmap='gray')
    ax1.set_title(f'{img_data["defect_type"].title()} - Original')
    ax1.axis('off')
    
    # Ground truth
    ax2 = axes[1, i]
    ax2.imshow(img_data['mask'], cmap='Reds')
    ax2.set_title(f'Ground Truth\nArea: {np.sum(img_data["mask"])} px')
    ax2.axis('off')
    
    # Prediction
    ax3 = axes[2, i]
    ax3.imshow(pred, cmap='Blues')
    ax3.set_title(f'Prediction\nArea: {np.sum(pred)} px')
    ax3.axis('off')
    
    # Comparison (Green=True Positive, Red=False Positive, Blue=False Negative)
    ax4 = axes[3, i]
    comparison = np.zeros((*pred.shape, 3))
    
    # Cast to bool so that ~ is a logical NOT even if the masks are stored as integers
    pred_b = np.asarray(pred, dtype=bool)
    mask_b = np.asarray(img_data['mask'], dtype=bool)
    
    # True Positives (Green)
    tp = pred_b & mask_b
    comparison[tp] = [0, 1, 0]
    
    # False Positives (Red)
    fp = pred_b & ~mask_b
    comparison[fp] = [1, 0, 0]
    
    # False Negatives (Blue)
    fn = ~pred_b & mask_b
    comparison[fn] = [0, 0, 1]
    
    ax4.imshow(comparison)
    dice_score = detailed_df.iloc[i]['Dice_Score']
    ax4.set_title(f'Comparison\nDice: {dice_score:.3f}')
    ax4.axis('off')

# Add legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='lime', label='True Positive'),  # 'lime' is pure (0, 1, 0), matching the overlay
    Patch(facecolor='red', label='False Positive'),
    Patch(facecolor='blue', label='False Negative')
]
fig.legend(handles=legend_elements, loc='upper right', bbox_to_anchor=(0.98, 0.98))

plt.tight_layout()
plt.show()

## 4. Manufacturing Quality Metrics

Analyze segmentation performance from a manufacturing quality perspective.
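
Pixel-level scores do not say whether each individual defect was found at all. The `defect_detection_rate` returned by `pipeline.evaluate` is not defined in this tutorial, so the sketch below shows one plausible object-level reading: the fraction of connected ground-truth defect regions that the prediction overlaps. The function name `object_detection_rate` and the `min_overlap` parameter are assumptions.

```python
import numpy as np
from skimage import measure

def object_detection_rate(pred, truth, min_overlap=1):
    """Fraction of ground-truth defect regions hit by at least min_overlap predicted pixels."""
    pred = np.asarray(pred, dtype=bool)
    labels, n_regions = measure.label(np.asarray(truth, dtype=bool),
                                      return_num=True, connectivity=2)
    if n_regions == 0:
        return None  # nothing to detect in a defect-free image
    hits = sum(
        int(np.count_nonzero(pred[labels == k]) >= min_overlap)
        for k in range(1, n_regions + 1)
    )
    return hits / n_regions

truth = np.zeros((64, 64), dtype=bool)
truth[5:10, 5:10] = True      # defect A
truth[40:50, 40:45] = True    # defect B
pred = np.zeros_like(truth)
pred[6:9, 6:9] = True         # overlaps defect A only

print(object_detection_rate(pred, truth))  # prints 0.5
```

Raising `min_overlap` makes the criterion stricter, which may suit processes where a barely touched defect should not count as detected.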

In [None]:
# Analyze performance by defect type (with one image per type, the std columns are NaN)
performance_by_type = detailed_df.groupby('Defect_Type').agg({
    'Dice_Score': ['mean', 'std'],
    'IoU': ['mean', 'std'],
    'Pixel_Accuracy': ['mean', 'std'],
    'Area_Error': ['mean', 'std']
}).round(3)

print("=== Performance by Defect Type ===")
print(performance_by_type)

# Manufacturing tolerance analysis
area_tolerances = [0.05, 0.10, 0.15, 0.20]  # 5%, 10%, 15%, 20% tolerance
tolerance_analysis = []

print("\n=== Manufacturing Tolerance Analysis ===")
for tolerance in area_tolerances:
    within_tolerance = 0
    for _, row in detailed_df.iterrows():
        if row['True_Defect_Area'] > 0:  # Only consider images with defects
            relative_error = abs(row['Pred_Defect_Area'] - row['True_Defect_Area']) / row['True_Defect_Area']
            if relative_error <= tolerance:
                within_tolerance += 1
    
    # Count only images with defects
    images_with_defects = sum(1 for row in detailed_df.itertuples() if row.True_Defect_Area > 0)
    tolerance_rate = within_tolerance / images_with_defects if images_with_defects > 0 else 0
    
    tolerance_analysis.append({
        'Tolerance': f'{tolerance:.0%}',
        'Within_Tolerance': within_tolerance,
        'Total_Defective': images_with_defects,
        'Success_Rate': tolerance_rate
    })
    
    print(f"±{tolerance:.0%} tolerance: {within_tolerance}/{images_with_defects} ({tolerance_rate:.1%}) within spec")

tolerance_df = pd.DataFrame(tolerance_analysis)
print("\n=== Tolerance Analysis Summary ===")
print(tolerance_df)

In [None]:
# Visualize manufacturing quality analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Performance by defect type
ax1 = axes[0, 0]
type_names = detailed_df['Defect_Type'].unique()  # separate name avoids overwriting the defect_types list
dice_by_type = [detailed_df[detailed_df['Defect_Type'] == dt]['Dice_Score'].mean() for dt in type_names]
ax1.bar(type_names, dice_by_type, color='skyblue', alpha=0.7)
ax1.set_title('Segmentation Performance by Defect Type')
ax1.set_ylabel('Average Dice Score')
ax1.tick_params(axis='x', rotation=45)
ax1.grid(True, alpha=0.3)

# Area prediction accuracy
ax2 = axes[0, 1]
ax2.scatter(detailed_df['True_Defect_Area'], detailed_df['Pred_Defect_Area'], alpha=0.7)
max_area = max(detailed_df['True_Defect_Area'].max(), detailed_df['Pred_Defect_Area'].max())
ax2.plot([0, max_area], [0, max_area], 'r--', alpha=0.7, label='Perfect Prediction')
ax2.set_xlabel('True Defect Area (pixels)')
ax2.set_ylabel('Predicted Defect Area (pixels)')
ax2.set_title('Defect Area Prediction Accuracy')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Tolerance success rates
ax3 = axes[1, 0]
ax3.plot(range(len(tolerance_df)), tolerance_df['Success_Rate'], 'o-', linewidth=2, markersize=8)
ax3.set_xticks(range(len(tolerance_df)))
ax3.set_xticklabels(tolerance_df['Tolerance'])
ax3.set_xlabel('Manufacturing Tolerance')
ax3.set_ylabel('Success Rate')
ax3.set_title('Tolerance vs Success Rate')
ax3.axhline(y=0.8, color='green', linestyle='--', alpha=0.7, label='80% Target')
ax3.axhline(y=0.9, color='orange', linestyle='--', alpha=0.7, label='90% Target')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Error distribution
ax4 = axes[1, 1]
relative_errors = []
for _, row in detailed_df.iterrows():
    if row['True_Defect_Area'] > 0:
        rel_error = abs(row['Pred_Defect_Area'] - row['True_Defect_Area']) / row['True_Defect_Area']
        relative_errors.append(rel_error)

ax4.hist(relative_errors, bins=10, color='coral', alpha=0.7, edgecolor='black')
ax4.axvline(np.mean(relative_errors), color='red', linestyle='--', 
           label=f'Mean: {np.mean(relative_errors):.1%}')
ax4.set_xlabel('Relative Area Error')
ax4.set_ylabel('Frequency')
ax4.set_title('Distribution of Area Prediction Errors')
ax4.legend()
ax4.grid(True, alpha=0.3)

# Format x-axis as percentages
ax4.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0%}'))

plt.tight_layout()
plt.show()

print(f"\n=== Manufacturing Quality Summary ===")
avg_dice = detailed_df['Dice_Score'].mean()
avg_area_error = np.mean(relative_errors) if relative_errors else 0
print(f"Average Dice Score: {avg_dice:.3f}")
print(f"Average Area Error: {avg_area_error:.1%}")
print(f"Best Defect Type: {detailed_df.groupby('Defect_Type')['Dice_Score'].mean().idxmax()}")
print(f"Most Challenging: {detailed_df.groupby('Defect_Type')['Dice_Score'].mean().idxmin()}")

## 5. Production Deployment and CLI Usage

This section saves the trained pipeline, reloads it, and shows how to run it through the command-line interface.
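
The commands printed later in this section use three subcommands (`train`, `evaluate`, `predict`) with a handful of flags. The actual argument parser in `pipeline.py` is not shown, so the following is only a sketch of how `argparse` could be wired to accept those same flags. The function name `build_parser`, the choice of which flags are required, and the default for `--method` are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser accepting the subcommands and flags used in this tutorial."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="fit a segmentation model")
    train.add_argument("--dataset", required=True)
    train.add_argument("--method", default="random_forest")
    train.add_argument("--save", required=True)

    evaluate = sub.add_parser("evaluate", help="score a saved model")
    evaluate.add_argument("--model-path", required=True)
    evaluate.add_argument("--dataset", required=True)

    predict = sub.add_parser("predict", help="segment one image")
    predict.add_argument("--model-path", required=True)
    predict.add_argument("--input-image", required=True)
    predict.add_argument("--output-mask", required=True)
    return parser

# Parse an explicit argument list instead of sys.argv so this runs inside a notebook
args = build_parser().parse_args(
    ["predict", "--model-path", "model.joblib",
     "--input-image", "wafer_image.png", "--output-mask", "predicted_mask.png"]
)
print(args.command, args.model_path, args.input_image, args.output_mask)
```

Note that `argparse` converts dashes in flag names to underscores, so `--model-path` is read back as `args.model_path`.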

In [None]:
# Save the production model
model_path = Path('production_die_segmentation_model.joblib')

# Create production pipeline with best settings
production_pipeline = DieSegmentationPipeline(
    method='random_forest',
    preprocessing='adaptive_histogram',
    n_estimators=100
)

# Train on all available data
production_pipeline.fit(X_images, y_masks)

# Save the model
production_pipeline.save(model_path)
print(f"Production segmentation model saved to: {model_path}")

# Test model loading
loaded_pipeline = DieSegmentationPipeline.load(model_path)
print(f"Model loaded successfully")
print(f"Method: {loaded_pipeline.method}")
print(f"Preprocessing: {loaded_pipeline.preprocessing}")

# Evaluate the reloaded model to confirm the save/load round trip
prod_metrics = loaded_pipeline.evaluate(X_images, y_masks)
print(f"\nProduction Model Performance:")
print(f"  Dice Score: {prod_metrics['dice_score']:.3f}")
print(f"  IoU: {prod_metrics['iou']:.3f}")
print(f"  Pixel Accuracy: {prod_metrics['pixel_accuracy']:.3f}")
print(f"  Defect Detection Rate: {prod_metrics['defect_detection_rate']:.1%}")

In [None]:
# Demonstrate CLI usage examples
print("=== CLI Usage Examples ===")
print("\nTo train a new segmentation model:")
print("python pipeline.py train --dataset synthetic_wafer --method random_forest --save model.joblib")

print("\nTo evaluate an existing model:")
print("python pipeline.py evaluate --model-path model.joblib --dataset synthetic_wafer")

print("\nTo make segmentation predictions:")
print("python pipeline.py predict --model-path model.joblib --input-image wafer_image.png --output-mask predicted_mask.png")

# Simulate real-time processing
print("\n=== Live Segmentation Examples ===")
for i in range(3):
    test_image = X_images[i:i+1]
    prediction = production_pipeline.predict(test_image)[0]
    ground_truth = y_masks[i]
    
    # Calculate metrics
    intersection = np.logical_and(prediction, ground_truth)
    dice = 2.0 * np.sum(intersection) / (np.sum(prediction) + np.sum(ground_truth)) if (np.sum(prediction) + np.sum(ground_truth)) > 0 else 1.0
    
    defect_area = np.sum(prediction)
    defect_percentage = (defect_area / prediction.size) * 100
    
    print(f"\nWafer {i+1} ({images_data[i]['defect_type']})")
    print(f"  Predicted Defect Area: {defect_area} pixels ({defect_percentage:.1f}%)")
    print(f"  Segmentation Quality (Dice): {dice:.3f}")
    print(f"  Classification: {'DEFECTIVE' if defect_area > 100 else 'GOOD'}")
    
    if defect_area > 100:
        print(f"  🔧 Recommended Action: Detailed inspection required")
    else:
        print(f"  ✅ Status: PASS - No significant defects detected")

# Calculate overall system performance and time the batch prediction
import time

start = time.perf_counter()
all_predictions = production_pipeline.predict(X_images)
ms_per_image = (time.perf_counter() - start) * 1000 / len(X_images)

total_defective = sum(1 for pred in all_predictions if np.sum(pred) > 100)
detection_accuracy = sum(1 for i, pred in enumerate(all_predictions) 
                        if (np.sum(pred) > 100) == (images_data[i]['defect_type'] != 'none')) / len(all_predictions)

print("\n=== Overall System Performance ===")
print(f"Total wafers processed: {len(X_images)}")
print(f"Detected as defective: {total_defective}")
print(f"Detection accuracy: {detection_accuracy:.1%}")
print(f"Average processing time: {ms_per_image:.1f} ms per image (measured on this machine)")

## 6. Key Takeaways

### Manufacturing Insights
1. **Defect Pattern Recognition**: Different defect types require tailored segmentation approaches
2. **Area Quantification**: Accurate defect area measurement is crucial for yield impact assessment
3. **Quality Thresholds**: Manufacturing tolerances should be set based on process requirements
4. **Inspection Automation**: Computer vision can significantly reduce manual inspection time

### Technical Insights
1. **Method Selection**: Random Forest classification provides a good balance of accuracy and interpretability
2. **Preprocessing Impact**: Adaptive histogram equalization improves contrast and segmentation quality
3. **Feature Engineering**: Texture features (GLCM) and morphological operations enhance detection
4. **Evaluation Metrics**: Dice score and IoU are more meaningful than pixel accuracy for sparse defects

### Production Deployment
1. **Real-time Processing**: Inference time per image must fit the production line's cycle time, so measure it on the target hardware
2. **Quality Integration**: JSON output compatible with MES quality systems
3. **Model Persistence**: Complete pipeline including preprocessing and post-processing
4. **Scalability**: Standardized interface supports various imaging equipment
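
The JSON format the pipeline emits is not shown in this tutorial. As a sketch of what a per-wafer result record might look like, the example below serializes a predicted mask into a small JSON document. All field names and the `result_record` helper are hypothetical, and the 100-pixel threshold simply mirrors the pass/fail rule used in the live segmentation examples.

```python
import json
import numpy as np

def result_record(wafer_id, pred_mask, area_threshold=100):
    """Build a JSON-serializable summary of one segmentation result (hypothetical schema)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    area = int(pred_mask.sum())     # cast NumPy integers so json can serialize them
    return {
        "wafer_id": wafer_id,
        "defect_area_px": area,
        "defect_fraction": area / pred_mask.size,
        "classification": "DEFECTIVE" if area > area_threshold else "GOOD",
    }

mask = np.zeros((256, 256), dtype=bool)
mask[100:120, 100:120] = True       # 400 predicted defect pixels
payload = json.dumps(result_record("W-0001", mask), indent=2)
print(payload)
```

Converting NumPy scalars to plain Python numbers before calling `json.dumps` avoids the `TypeError` that the standard encoder raises for types such as `numpy.int64`.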

## Next Steps

To extend this die segmentation system:
1. **Deep Learning Integration**: U-Net and other CNN architectures for complex patterns
2. **Multi-class Segmentation**: Classify different defect types simultaneously
3. **3D Analysis**: Process wafer height maps and multi-spectral images
4. **Active Learning**: Continuously improve models with production feedback
5. **Edge Deployment**: Optimize for real-time processing on inspection stations
6. **Defect Classification**: Add defect type classification after segmentation

In [None]:
# Clean up
if model_path.exists():
    model_path.unlink()
    print("Cleaned up temporary model file")

print("\n🎉 Die Defect Segmentation Tutorial completed successfully!")
print("You now have the knowledge to build production-ready computer vision systems for semiconductor inspection.")