# Computer Vision Semiconductor Analysis - Interactive Notebook

This notebook provides an interactive exploration of the computer vision pipeline for semiconductor analysis using MLX GPU acceleration.

## Features Covered:
1. Synthetic image generation with geometric patterns
2. GPU-accelerated image processing
3. Advanced filtering techniques
4. Feature extraction and classification
5. Performance evaluation and visualization

In [None]:
# Import required libraries
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
from tqdm.notebook import tqdm
import pandas as pd

# Import our custom modules
from main import SemiconImageProcessor
from image_utils import MLXImageUtils, AdvancedFilters
from config import *

# Set up plotting
plt.style.use('seaborn-v0_8')
%matplotlib inline

print("Environment setup complete!")

## 1. Initialize the Image Processor

Let's start by initializing our main image processing class with a specific random seed for reproducibility.

In [None]:
# Initialize the processor
processor = SemiconImageProcessor(seed=42)
print(f"Initialized processor with image size: {processor.image_size}")
print(f"Will generate {processor.num_images} images")

## 2. Generate Synthetic Images

Generate a dataset of synthetic grayscale images containing random rectangles with varying properties.

In [None]:
# Generate synthetic images
images, labels = processor.generate_synthetic_images()

print(f"Generated {len(images)} images")
print(f"Image shape: {images.shape}")
print(f"Label distribution: {np.bincount(np.array(labels))}")
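
The internals of `generate_synthetic_images()` are not shown in this notebook. As a rough sketch of how such a dataset could be produced, the NumPy snippet below draws one hollow rectangle per image and derives a label that matches the three class names used in the next section (0 = neither large nor thick, 1 = one of the two, 2 = both). The function name, canvas size, and both thresholds are assumptions made for illustration, not values taken from `SemiconImageProcessor`.

```python
import numpy as np

def make_rectangle_image(rng, size=64, large_area=0.25, thick_border=4):
    """Draw one hollow rectangle on a black canvas and derive a 3-way label.

    All defaults here are illustrative assumptions.
    """
    img = np.zeros((size, size), dtype=np.uint8)
    # Random width/height, then a top-left corner that keeps the box in bounds
    w, h = rng.integers(size // 4, size - 8, size=2)
    x0 = rng.integers(0, size - w)
    y0 = rng.integers(0, size - h)
    border = int(rng.integers(1, 8))
    intensity = int(rng.integers(128, 256))
    # Fill the whole box, then clear its interior so only the border remains
    img[y0:y0 + h, x0:x0 + w] = intensity
    img[y0 + border:y0 + h - border, x0 + border:x0 + w - border] = 0
    is_large = (w * h) / (size * size) >= large_area
    is_thick = border >= thick_border
    return img, int(is_large) + int(is_thick)

# Build a small demo batch with a seeded generator for reproducibility
demo_rng = np.random.default_rng(42)
demo_pairs = [make_rectangle_image(demo_rng) for _ in range(16)]
demo_images = np.stack([img for img, _ in demo_pairs])
demo_labels = np.array([label for _, label in demo_pairs])
```

Summing the two boolean flags is one simple way to get the 0/1/2 labelling; the real generator may use different rules.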

## 3. Visualize Sample Images

Let's look at some sample images to understand what we've generated.

In [None]:
# Visualize sample images
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.flatten()

# Convert MLX arrays to numpy for visualization
images_np = np.array(images, dtype=np.uint8)
labels_np = np.array(labels)

# Sample 8 random images (seeded so the same grid appears on every run)
rng = np.random.default_rng(42)
sample_indices = rng.choice(len(images_np), 8, replace=False)
label_names = ['No Special Features', 'Large or Thick', 'Large and Thick']

for i, idx in enumerate(sample_indices):
    ax = axes[i]
    ax.imshow(images_np[idx], cmap='gray')
    ax.set_title(f'Image {idx}\nLabel: {label_names[labels_np[idx]]}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## 4. Apply Gaussian Blur

Apply a 7×7 Gaussian blur (σ = 1.5) to every image and compare the results with the originals.

In [None]:
# Apply Gaussian blur
blurred_images = processor.apply_gaussian_blur(kernel_size=7, sigma=1.5)

print(f"Applied Gaussian blur to {len(blurred_images)} images")

In [None]:
# Compare original vs blurred images
fig, axes = plt.subplots(2, 6, figsize=(18, 6))

blurred_np = np.array(blurred_images, dtype=np.uint8)
sample_indices = np.random.choice(len(images_np), 6, replace=False)

for i, idx in enumerate(sample_indices):
    # Original image
    axes[0, i].imshow(images_np[idx], cmap='gray')
    axes[0, i].set_title(f'Original {idx}')
    axes[0, i].axis('off')
    
    # Blurred image
    axes[1, i].imshow(blurred_np[idx], cmap='gray')
    axes[1, i].set_title(f'Blurred {idx}')
    axes[1, i].axis('off')

plt.tight_layout()
plt.show()
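
For intuition about what `kernel_size` and `sigma` control, here is a minimal NumPy sketch of a separable Gaussian blur. It is not the MLX implementation behind `apply_gaussian_blur`; the helper names and the edge-replicated padding are assumptions.

```python
import numpy as np

def gaussian_kernel_1d(kernel_size=7, sigma=1.5):
    """Sampled, normalized 1-D Gaussian; kernel_size is expected to be odd."""
    half = kernel_size // 2
    x = np.arange(-half, half + 1, dtype=np.float32)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, kernel_size=7, sigma=1.5):
    """Separable blur: filter along rows, then along columns."""
    k = gaussian_kernel_1d(kernel_size, sigma)
    half = kernel_size // 2
    # Replicate border pixels so the output keeps the input shape
    padded = np.pad(image.astype(np.float32), half, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# Sanity check: blurring a constant image should leave it unchanged
flat = np.full((10, 12), 100, dtype=np.uint8)
blurred_flat = gaussian_blur(flat)
```

Because the kernel sums to 1, overall brightness is preserved; a larger `sigma` spreads the weights more evenly and produces a stronger blur.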

## 5. Advanced Image Filtering

Apply Sobel edge detection, Laplacian sharpening, and adaptive thresholding to a single test image.

In [None]:
# Select a test image for advanced filtering
test_idx = 42
test_image = images_np[test_idx]

# Initialize advanced filters
filters = AdvancedFilters()

# Apply different filters
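grad_x, grad_y = filters.sobel_edge_detection(test_image)
# Cast to float before squaring so integer gradients cannot overflow
grad_x = np.asarray(grad_x, dtype=np.float32)
grad_y = np.asarray(grad_y, dtype=np.float32)
edge_magnitude = np.hypot(grad_x, grad_y)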
sharpened = filters.laplacian_sharpening(test_image, strength=0.8)
threshold = filters.adaptive_threshold(test_image, block_size=11)

# Visualize results
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(test_image, cmap='gray')
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(edge_magnitude, cmap='gray')
axes[0, 1].set_title('Edge Detection (Sobel)')
axes[0, 1].axis('off')

axes[0, 2].imshow(sharpened, cmap='gray')
axes[0, 2].set_title('Laplacian Sharpening')
axes[0, 2].axis('off')

axes[1, 0].imshow(threshold, cmap='gray')
axes[1, 0].set_title('Adaptive Threshold')
axes[1, 0].axis('off')

axes[1, 1].imshow(grad_x, cmap='gray')
axes[1, 1].set_title('Gradient X')
axes[1, 1].axis('off')

axes[1, 2].imshow(grad_y, cmap='gray')
axes[1, 2].set_title('Gradient Y')
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()
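
How `AdvancedFilters.sobel_edge_detection` is implemented is not shown here. A plain NumPy sketch of 3×3 Sobel filtering, assuming edge-replicated padding and float output, looks roughly like this. Working in floating point matters because squaring 8-bit integer gradients would wrap around.

```python
import numpy as np

def sobel_gradients(image):
    """Return (grad_x, grad_y) as float32 arrays with the same shape as image."""
    p = np.pad(image.astype(np.float32), 1, mode="edge")
    # Horizontal gradient: weighted right column minus weighted left column
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: weighted bottom row minus weighted top row
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

# A vertical step edge: dark left half, bright right half
step = np.zeros((8, 8), dtype=np.uint8)
step[:, 4:] = 255
step_gx, step_gy = sobel_gradients(step)
step_magnitude = np.hypot(step_gx, step_gy)
```

On this step image only the horizontal gradient responds, and only in the two columns on either side of the edge.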

## 6. Feature Extraction

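Extract a 16-dimensional feature vector per image: intensity statistics (mean, standard deviation, min, max, quartiles), responses to edge, blur, and sharpening kernels, and strong-edge and bright-pixel measures.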

In [None]:
# Extract features
print("Extracting features...")
features = processor.extract_features(images)

print(f"Extracted features shape: {features.shape}")
print(f"Number of features per image: {features.shape[1]}")

# Convert to numpy for analysis
features_np = np.array(features)

# Show feature statistics
feature_names = [
    'mean', 'std', 'max', 'min',
    'edge_x_mean', 'edge_x_std',
    'edge_y_mean', 'edge_y_std',
    'blur_mean', 'blur_std',
    'sharp_mean', 'sharp_std',
    'strong_edges', 'bright_pixels',
    'q25', 'q75'
]

# Create feature statistics dataframe
feature_stats = pd.DataFrame({
    'Feature': feature_names,
    'Mean': np.mean(features_np, axis=0),
    'Std': np.std(features_np, axis=0),
    'Min': np.min(features_np, axis=0),
    'Max': np.max(features_np, axis=0)
})

print("\nFeature Statistics:")
print(feature_stats)
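
To make the statistical part of the feature vector concrete, the sketch below computes a handful of the listed quantities with NumPy. It is only an illustration: the function name and the brightness threshold of 200 are assumptions, and whether the processor's `bright_pixels` feature is a fraction or a raw count is not stated in this notebook.

```python
import numpy as np

def basic_statistics(image, bright_threshold=200):
    """Per-image intensity statistics (bright_threshold is an assumed value)."""
    px = image.astype(np.float32)
    q25, q75 = np.percentile(px, [25, 75])
    return np.array([
        px.mean(), px.std(), px.max(), px.min(),
        (px > bright_threshold).mean(),  # fraction of bright pixels
        q25, q75,
    ], dtype=np.float32)

# Half-dark, half-bright test image
half = np.zeros((4, 4), dtype=np.uint8)
half[:, 2:] = 255
half_features = basic_statistics(half)

# Stacking one vector per image yields an (n_images, n_features) matrix
feature_matrix = np.stack([basic_statistics(im) for im in (half, 255 - half)])
```

Kernel-based features such as `edge_x_mean` would follow the same pattern, applied to a filtered image instead of the raw pixels.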

## 7. Train Classifier

Train a Random Forest classifier on the extracted features, holding out 20% of the images for testing.

In [None]:
# Train classifier
print("Training classifier...")
results = processor.train_classifier(features, labels, test_size=0.2)

print("\nClassifier Performance:")
print(f"Accuracy: {results['accuracy']:.4f}")
print(f"Precision: {results['precision']:.4f}")
print(f"Recall: {results['recall']:.4f}")

print("\nDetailed Classification Report:")
print(results['classification_report'])
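
The body of `train_classifier` is not part of this notebook. Under the assumption that it wraps scikit-learn, a minimal equivalent could look like the sketch below. The function name, the stratified split, the 100 trees, and the weighted averaging of precision and recall are all illustrative choices, not confirmed details of `SemiconImageProcessor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

def train_and_score(X, y, test_size=0.2, seed=42):
    """Hold out a stratified test split, fit a forest, and collect metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return {
        "classifier": clf,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
        "X_test": X_test, "y_test": y_test, "y_pred": y_pred,
    }

# Toy data: three well-separated Gaussian clusters with four features each
toy_rng = np.random.default_rng(0)
toy_y = np.repeat([0, 1, 2], 50)
toy_X = toy_rng.normal(size=(150, 4)) + toy_y[:, None] * 3.0
toy_results = train_and_score(toy_X, toy_y)
```

Stratifying the split keeps the class proportions similar in the training and test sets, which helps when one class is rare.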

## 8. Feature Importance Analysis

Analyze which features are most important for classification.

In [None]:
# Get feature importance from the trained classifier
classifier = results['classifier']
feature_importance = classifier.feature_importances_

# Create feature importance dataframe
importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': feature_importance
}).sort_values('Importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(12, 8))
plt.barh(importance_df['Feature'], importance_df['Importance'])
plt.xlabel('Feature Importance')
plt.title('Feature Importance for Rectangle Classification')
plt.tight_layout()
plt.show()

print("Top 5 Most Important Features:")
print(importance_df.head())

## 9. Confusion Matrix Analysis

Analyze the confusion matrix to understand classification performance.

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

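# Create confusion matrix; pass the label order explicitly so every class
# gets a row and column even if it is missing from the test split
cm = confusion_matrix(results['y_test'], results['y_pred'],
                      labels=list(range(len(label_names))))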

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_names,
            yticklabels=label_names)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title('Confusion Matrix')
plt.show()

# Per-class recall: correct predictions divided by the number of true samples in each class
class_recall = cm.diagonal() / cm.sum(axis=1)
for i, rec in enumerate(class_recall):
    print(f"Class {i} ({label_names[i]}) Recall: {rec:.4f}")
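
The diagonal-over-row-sum calculation can be checked by hand on a tiny example. The NumPy sketch below builds a confusion matrix from label lists and guards against classes that have no true samples; both helper names are hypothetical and independent of scikit-learn.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=3):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly when the same (row, col) pair repeats
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_recall(cm):
    """Diagonal over row sums; classes with no true samples yield NaN."""
    support = cm.sum(axis=1).astype(np.float64)
    recall = np.full(cm.shape[0], np.nan)
    np.divide(cm.diagonal(), support, out=recall, where=support > 0)
    return recall

toy_cm = confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
toy_recall = per_class_recall(toy_cm)
```

Here class 0 has a recall of 1/2, class 1 has 2/3, and class 2 is reported as NaN because it never occurs among the true labels.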

## 10. Performance Analysis by Image Properties

Compare how four key features are distributed across the three classes in the test set to see which features separate the classes.

In [None]:
# Analyze feature distributions by class
# (convert to NumPy so the boolean masks below work even if these are lists)
test_features = np.asarray(results['X_test'])
test_labels = np.asarray(results['y_test'])

# Create feature analysis plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot distributions for key features
key_features = [0, 4, 12, 13]  # mean, edge_x_mean, strong_edges, bright_pixels
key_names = ['Image Mean', 'Edge X Mean', 'Strong Edges', 'Bright Pixels']

for i, (feat_idx, feat_name) in enumerate(zip(key_features, key_names)):
    ax = axes[i//2, i%2]
    
    for class_label in range(3):
        class_mask = test_labels == class_label
        class_features = test_features[class_mask, feat_idx]
        
        ax.hist(class_features, alpha=0.7, label=label_names[class_label], bins=20)
    
    ax.set_xlabel(feat_name)
    ax.set_ylabel('Frequency')
    ax.set_title(f'{feat_name} Distribution by Class')
    ax.legend()

plt.tight_layout()
plt.show()

## 11. Save Results and Data

Save all generated data and results for future use.

In [None]:
# Save evaluation results
csv_path = processor.evaluate_performance(results)
print(f"Evaluation results saved to: {csv_path}")

# Save all data
processor.save_data()
print("All data saved successfully!")

# Create a summary report
summary = {
    'Total Images': len(images),
    'Image Size': processor.image_size,
    'Classes': len(np.unique(labels_np)),
    'Features': features.shape[1],
    'Test Accuracy': results['accuracy'],
    'Test Precision': results['precision'],
    'Test Recall': results['recall']
}

print("\n" + "="*50)
print("EXPERIMENT SUMMARY")
print("="*50)
for key, value in summary.items():
    print(f"{key}: {value}")
print("="*50)

## 12. Interactive Parameter Exploration

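Vary the Gaussian kernel size and σ to see how the blur strength changes. This cell calls OpenCV's `cv2.GaussianBlur` directly rather than going through the processor.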

In [None]:
# Try different blur parameters
blur_params = [(3, 0.5), (5, 1.0), (7, 1.5), (9, 2.0)]
test_img_idx = 50
test_img = images_np[test_img_idx]

fig, axes = plt.subplots(1, len(blur_params) + 1, figsize=(20, 4))

# Original image
axes[0].imshow(test_img, cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')

# Apply different blur parameters
for i, (kernel_size, sigma) in enumerate(blur_params):
    blurred = cv2.GaussianBlur(test_img, (kernel_size, kernel_size), sigma)
    axes[i+1].imshow(blurred, cmap='gray')
    axes[i+1].set_title(f'Blur: k={kernel_size}, σ={sigma}')
    axes[i+1].axis('off')

plt.tight_layout()
plt.show()

## Conclusion

This notebook demonstrated a complete computer vision pipeline for semiconductor analysis including:

1. ✅ **Synthetic Data Generation**: Created 512 grayscale images with geometric patterns
2. ✅ **GPU Acceleration**: Leveraged MLX for efficient processing on Apple Silicon
3. ✅ **Advanced Filtering**: Applied Gaussian blur, Sobel edge detection, Laplacian sharpening, and adaptive thresholding
4. ✅ **Feature Extraction**: Computed 16 features per image from intensity statistics and edge, blur, and sharpening responses
5. ✅ **Classification**: Trained a Random Forest classifier on the extracted features
6. ✅ **Evaluation**: Reported accuracy, precision, recall, a classification report, and a confusion matrix
7. ✅ **Visualization**: Plotted sample images, filter outputs, feature importances, and per-class feature distributions
8. ✅ **Data Management**: Automated saving and organization of results

The pipeline classifies rectangles by size and border thickness. The metrics reported in Section 7 and the confusion matrix in Section 9 show how well combining traditional computer vision techniques with GPU acceleration works on this synthetic dataset.