# Lab 4, Module 4: Real-World Medical Classification - Breast Cancer Dataset

## From XOR to Cancer Diagnosis: Binary Classification with High-Dimensional Data

**Learning Objectives:**
- Apply neural networks to a real medical diagnosis problem
- Understand binary classification with many features (30 vs. 4 in the Penguins dataset)
- Experiment with model architecture (layers and units)
- Learn that simpler models can work well (don't always need huge networks)
- Understand confusion matrices and medical prediction trade-offs

**Estimated time:** 15-20 minutes

---

## Section 1: Introduction - From Penguins to Medical Diagnosis

**Your ML journey so far:**
- Module 0-2: Hand-built gradient descent with XOR (2 inputs, binary output)
- Module 3: Penguin species (4 inputs, 3 classes)
- Module 4 (now): Breast cancer diagnosis (30 inputs, binary output)

**Today's challenge: High-dimensional medical data**

### The Wisconsin Breast Cancer Dataset

This dataset contains measurements from **569 breast tumor cell samples**. Each sample is classified as:
- **Benign (1)**: Not cancerous
- **Malignant (0)**: Cancerous

Each sample has **30 numeric features** computed from cell nucleus images:
- Radius, texture, perimeter, area, smoothness, compactness, etc.
- Mean, standard error, and "worst" (largest) values for each measurement

**The Task:** Given these 30 measurements, predict whether the tumor is benign or malignant.

**Connection to XOR:**
- XOR: 2 inputs → binary output (0 or 1)
- Breast Cancer: 30 inputs → binary output (benign or malignant)
- Same type of problem, but with **15 times more features**!

**Why This Matters:**
- Real medical diagnosis with high stakes
- Predictions help doctors make treatment decisions
- False positives vs. false negatives have different consequences
- ML accuracy is rarely 100% - we need to understand errors

---

## Section 2: Setup and Package Check

First, let's make sure we have all the packages we need.

In [None]:
# Package check and installation
import sys

print("Checking packages...")
print(f"Python version: {sys.version}")

# Check/install required packages
required_packages = [
    ('numpy', 'numpy'),
    ('matplotlib', 'matplotlib'),
    ('sklearn', 'scikit-learn'),
    ('tensorflow', 'tensorflow')
]

missing_packages = []
for module_name, package_name in required_packages:
    try:
        __import__(module_name)
        print(f"✓ {module_name} is installed")
    except ImportError:
        print(f"✗ {module_name} not found")
        missing_packages.append(package_name)

if missing_packages:
    print(f"\nInstalling missing packages: {', '.join(missing_packages)}")
    import subprocess
    for package in missing_packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
    print("✓ All packages installed!")
else:
    print("\n✓ All required packages are already installed!")

Now import all the libraries we'll use:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print("✓ All imports successful!")

---

## Section 3: Load and Explore the Dataset

Let's load the breast cancer dataset and see what we're working with.

In [None]:
# Load Breast Cancer dataset from scikit-learn
cancer = load_breast_cancer()
X = cancer.data  # Features: 30 measurements per tumor
y = cancer.target  # Labels: 0=Malignant, 1=Benign

print("="*70)
print("BREAST CANCER DATASET OVERVIEW")
print("="*70)
print(f"\nDataset shape:")
print(f"  Features (X): {X.shape} - {X.shape[0]} tumors, {X.shape[1]} measurements each")
print(f"  Labels (y):   {y.shape} - {y.shape[0]} diagnoses")

print(f"\nFeature names (first 10 of {len(cancer.feature_names)}):")
for i, name in enumerate(cancer.feature_names[:10]):
    print(f"  {i+1:2d}. {name}")
print("  ... and 20 more features")

print(f"\nTarget classes:")
benign_count = np.sum(y == 1)
malignant_count = np.sum(y == 0)
print(f"  Class 0 (Malignant): {malignant_count} samples ({malignant_count/len(y):.1%})")
print(f"  Class 1 (Benign):    {benign_count} samples ({benign_count/len(y):.1%})")

print(f"\nFeature value ranges (first 3 features):")
for i in range(3):
    print(f"  {cancer.feature_names[i]:30s}: min={X[:, i].min():7.2f}, max={X[:, i].max():7.2f}")
print("  → Wide range! Scaling will be important.")
print("="*70)

### Key Observations

**High-dimensional data:**
- 30 features (vs. 4 for Penguins, 2 for XOR!)
- Hard to visualize all dimensions at once
- Neural networks can find patterns in high-dimensional spaces

**Class balance:**
- Roughly 63% benign, 37% malignant
- Reasonably balanced (not heavily skewed)

**Feature scales:**
- Different features have wildly different ranges
- StandardScaler will normalize everything to mean=0, std=1

---

## Section 4: Train/Test Split and Feature Scaling

**Why split the data?**
- **Training set (80%):** Used to teach the network
- **Test set (20%):** Used to evaluate how well it learned (never seen during training)

**Why scale features?**
- Features have vastly different ranges (e.g., radius: 6-28, area: 143-2500)
- Neural networks learn better when all features are on similar scales
- StandardScaler transforms each feature to mean=0, std=1
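
To make the transformation concrete, here is a minimal sketch of what standardization computes. It assumes a tiny made-up feature matrix (the numbers are illustrative, not taken from the dataset) and follows the same fit-on-train, apply-to-both pattern used in the next cell:

```python
import numpy as np

# Hypothetical toy data: 2 features on very different scales
X_train_toy = np.array([[10.0, 400.0],
                        [14.0, 600.0],
                        [18.0, 1100.0]])
X_test_toy = np.array([[12.0, 500.0]])

# "Fit": learn each feature's mean and std from the TRAINING data only
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "Transform": apply the same mu and sigma to both sets
X_train_z = (X_train_toy - mu) / sigma
X_test_z = (X_test_toy - mu) / sigma

print("Train mean per feature:", X_train_z.mean(axis=0))
print("Train std per feature: ", X_train_z.std(axis=0))
print("Scaled test sample:    ", X_test_z)
```

The test sample is scaled with the training statistics, so its values will generally not have mean 0 and std 1. That is expected: the test set must not influence the preprocessing.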

In [None]:
# Split into train (80%) and test (20%) sets
# stratify=y ensures each class is proportionally represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Data split:")
print(f"  Training set:   {X_train.shape[0]} samples")
print(f"  Test set:       {X_test.shape[0]} samples")

# Scale features to mean=0, std=1
# IMPORTANT: Fit scaler on training data only, then apply to both train and test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("\nFeature scaling (first feature):")
print(f"  Before scaling - mean: {X_train[:, 0].mean():.2f}, std: {X_train[:, 0].std():.2f}")
print(f"  After scaling  - mean: {X_train_scaled[:, 0].mean():.2f}, std: {X_train_scaled[:, 0].std():.2f}")
print("\n✓ Data prepared for training!")

---

## Section 5: Baseline Linear Model (No Hidden Layer)

**Connection to Module 2:**
- Remember XOR couldn't be solved with a linear model (no hidden layer)
- But cancer diagnosis might be different!
- In **30-dimensional space**, even a linear (straight) boundary might separate the classes well

**Model architecture:**
- Input: 30 features
- Output: 1 unit with sigmoid activation (outputs probability 0-1)
- NO hidden layers - just a direct linear transformation

This is essentially **logistic regression** - a classic ML baseline!
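
As a rough illustration of what this single-unit model computes for one tumor, the sketch below does the forward pass by hand in NumPy. The weights and inputs are random placeholders (assumptions for illustration only); a trained model would have learned its 30 weights and 1 bias from the data.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=30)              # one (already scaled) sample, placeholder values
w = rng.normal(scale=0.1, size=30)   # 30 weights, placeholder values
b = 0.0                              # 1 bias

z = np.dot(w, x) + b                 # weighted sum: the "linear" part
p = sigmoid(z)                       # probability of class 1
label = int(p > 0.5)                 # threshold at 0.5 to get a 0/1 prediction

print(f"z = {z:.3f}, p = {p:.3f}, predicted class = {label}")
```

With scikit-learn's encoding for this dataset, class 1 is benign, so `p` here would be read as the predicted probability that the tumor is benign.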

In [None]:
# Build baseline linear model (no hidden layer)
baseline_model = Sequential([
    Dense(1, activation='sigmoid', input_dim=30, name='output')
], name='Baseline_Linear_Model')

# Compile model
# - Adam optimizer: smart gradient descent with momentum
# - binary_crossentropy: loss function for binary classification
# - accuracy: % of correct predictions
baseline_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Show model architecture
print("="*70)
print("BASELINE LINEAR MODEL ARCHITECTURE")
print("="*70)
baseline_model.summary()
print("\n💡 This model has NO hidden layers - just 30 inputs → 1 output")
print("   Total trainable parameters: (30 features × 1 output) + 1 bias = 31")

### Train the Baseline Model

In [None]:
# Train the model
print("Training baseline linear model...\n")

history_baseline = baseline_model.fit(
    X_train_scaled, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,  # Use 20% of training data for validation
    verbose=0  # Suppress detailed output
)

print("✓ Training complete!")

### Evaluate and Analyze Results

In [None]:
# Evaluate on test set
test_loss_baseline, test_accuracy_baseline = baseline_model.evaluate(X_test_scaled, y_test, verbose=0)

# Get predictions for confusion matrix
y_pred_baseline = (baseline_model.predict(X_test_scaled, verbose=0) > 0.5).astype(int).flatten()

# Compute confusion matrix
cm_baseline = confusion_matrix(y_test, y_pred_baseline)

print("="*70)
print("BASELINE LINEAR MODEL RESULTS")
print("="*70)
print(f"Test Accuracy:  {test_accuracy_baseline:.1%}")
print(f"Test Loss:      {test_loss_baseline:.4f}")
print("\nConfusion Matrix:")
print("")
print("                    Predicted")
print("                Malignant  Benign")
print(f"Actual Malignant    {cm_baseline[0,0]:4d}      {cm_baseline[0,1]:4d}")
print(f"       Benign       {cm_baseline[1,0]:4d}      {cm_baseline[1,1]:4d}")
print("")
print("Interpretation (malignant = the 'positive' finding, as in medical screening):")
print(f"  - True Positives (correctly identified malignant):  {cm_baseline[0,0]}")
print(f"  - False Positives (benign predicted as malignant):  {cm_baseline[1,0]}")
print(f"  - False Negatives (malignant predicted as benign):  {cm_baseline[0,1]}")
print(f"  - True Negatives (correctly identified benign):     {cm_baseline[1,1]}")
print("="*70)

if test_accuracy_baseline >= 0.95:
    print("\n✅ Excellent! Linear model achieves high accuracy on cancer data.")
    print("   Why? Even a straight boundary in 30D space can separate classes well!")
elif test_accuracy_baseline >= 0.90:
    print("\n✓ Good baseline performance!")
    print("  Let's see if hidden layers can improve this.")
else:
    print("\n⚠️ Linear model struggles.")
    print("  We'll likely need hidden layers for better performance.")

### Visualize Training Progress

In [None]:
# Plot training curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), dpi=100)

# Accuracy plot
ax1.plot(history_baseline.history['accuracy'], label='Training', linewidth=2)
ax1.plot(history_baseline.history['val_accuracy'], label='Validation', linewidth=2)
ax1.set_xlabel('Epoch', fontsize=12, fontweight='bold')
ax1.set_ylabel('Accuracy', fontsize=12, fontweight='bold')
ax1.set_title('Baseline Linear Model: Accuracy', fontsize=13, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Loss plot
ax2.plot(history_baseline.history['loss'], label='Training', linewidth=2)
ax2.plot(history_baseline.history['val_loss'], label='Validation', linewidth=2)
ax2.set_xlabel('Epoch', fontsize=12, fontweight='bold')
ax2.set_ylabel('Loss', fontsize=12, fontweight='bold')
ax2.set_title('Baseline Linear Model: Loss', fontsize=13, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 What to look for:")
print("   - Accuracy increases and stabilizes")
print("   - Training and validation curves are close (no major overfitting)")
print("   - Loss decreases smoothly")

### Confusion Matrix Heatmap Visualization

In [None]:
# Create confusion matrix heatmap
fig, ax = plt.subplots(figsize=(8, 6), dpi=100)

# Plot heatmap
im = ax.imshow(cm_baseline, cmap='Blues', aspect='auto')

# Add colorbar
cbar = ax.figure.colorbar(im, ax=ax)
cbar.ax.set_ylabel('Count', rotation=-90, va="bottom", fontsize=11)

# Set ticks and labels
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['Malignant', 'Benign'], fontsize=11)
ax.set_yticklabels(['Malignant', 'Benign'], fontsize=11)

# Labels
ax.set_xlabel('Predicted Label', fontsize=12, fontweight='bold')
ax.set_ylabel('True Label', fontsize=12, fontweight='bold')
ax.set_title('Baseline Model: Confusion Matrix', fontsize=13, fontweight='bold')

# Add text annotations
for i in range(2):
    for j in range(2):
        text = ax.text(j, i, cm_baseline[i, j],
                      ha="center", va="center", color="black", fontsize=20, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n💡 Understanding the confusion matrix:")
print("   - Diagonal (top-left to bottom-right): Correct predictions")
print("   - Off-diagonal: Errors")
print("   - Top-right (False Negative): Dangerous! Missed cancer diagnosis")
print("   - Bottom-left (False Positive): Concerning! Unnecessary worry/treatment")

---

## Section 6: Experiment with Model Architecture

**Now it's your turn to experiment!**

Try different architectures by changing the variables below:
- `num_hidden_layers`: How many hidden layers? (Try 0, 1, 2)
- `units_per_layer`: How many neurons in each hidden layer? (Try 8, 16, 32, 64)

**Questions to explore:**
- Does adding hidden layers improve accuracy?
- Do more neurons always help?
- At what point do you see diminishing returns?
- Can you beat the baseline linear model?
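
Before running the experiments, it can help to see how quickly the number of trainable weights grows with each choice. The sketch below counts parameters for a fully connected network with 30 inputs and 1 output; `count_dense_params` is a hypothetical helper written for this illustration, not part of Keras.

```python
def count_dense_params(n_inputs, hidden_layers, units, n_outputs=1):
    """Count weights + biases in a fully connected network.

    Each Dense layer has (fan_in * fan_out) weights plus fan_out biases.
    """
    sizes = [n_inputs] + [units] * hidden_layers + [n_outputs]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))

# A few of the configurations suggested above
for layers, units in [(0, 0), (1, 8), (1, 16), (1, 64), (2, 32)]:
    n = count_dense_params(30, layers, units)
    print(f"{layers} hidden layer(s), {units:2d} units each: {n:5d} parameters")
```

These counts should match the "Total params" line that `model.summary()` prints for the same settings. Keep in mind that the training set has only a few hundred samples, so the larger configurations have several times more parameters than training examples.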

In [None]:
# ========== ADJUSTABLE PARAMETERS - CHANGE THESE! ==========
num_hidden_layers = 1   # Try: 0, 1, 2, 3
units_per_layer = 16    # Try: 8, 16, 32, 64
# ===========================================================

# Build model programmatically based on parameters
model = Sequential(name=f'Model_{num_hidden_layers}L_{units_per_layer}U')

# Add hidden layers
for i in range(num_hidden_layers):
    if i == 0:
        # First hidden layer needs input_dim specified
        model.add(Dense(units_per_layer, activation='relu', input_dim=30, name=f'hidden_{i+1}'))
    else:
        model.add(Dense(units_per_layer, activation='relu', name=f'hidden_{i+1}'))

# Add output layer
if num_hidden_layers == 0:
    # If no hidden layers, need to specify input_dim on output layer
    model.add(Dense(1, activation='sigmoid', input_dim=30, name='output'))
else:
    model.add(Dense(1, activation='sigmoid', name='output'))

# Compile model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Show architecture
print("="*70)
print(f"MODEL ARCHITECTURE: {num_hidden_layers} Hidden Layers, {units_per_layer} Units Each")
print("="*70)
model.summary()

if num_hidden_layers == 0:
    print("\n💡 No hidden layers - this is the same as the baseline!")
else:
    total_params = sum([np.prod(w.shape) for w in model.trainable_weights])
    print(f"\n💡 This model has {num_hidden_layers} hidden layer(s) with {units_per_layer} neurons each")
    print(f"   Total parameters: {total_params}")

### Train Your Custom Model

In [None]:
# Train the model
print(f"Training model with {num_hidden_layers} hidden layers, {units_per_layer} units each...\n")

history_custom = model.fit(
    X_train_scaled, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

print("✓ Training complete!")

### Evaluate and Compare

In [None]:
# Evaluate on test set
test_loss_custom, test_accuracy_custom = model.evaluate(X_test_scaled, y_test, verbose=0)

# Get predictions and confusion matrix
y_pred_custom = (model.predict(X_test_scaled, verbose=0) > 0.5).astype(int).flatten()
cm_custom = confusion_matrix(y_test, y_pred_custom)

print("="*70)
print("COMPARISON: BASELINE vs YOUR CUSTOM MODEL")
print("="*70)
print(f"Baseline (no hidden layer):              {test_accuracy_baseline:.1%}")
print(f"Your model ({num_hidden_layers} layers, {units_per_layer:2d} units/layer): {test_accuracy_custom:.1%}")
print("="*70)

improvement = test_accuracy_custom - test_accuracy_baseline
if improvement > 0.02:
    print(f"\n✅ Your model is better! Improvement: +{improvement:.1%}")
elif improvement > 0:
    print(f"\n✓ Slight improvement: +{improvement:.1%}")
elif improvement > -0.02:
    print(f"\n≈ Similar performance: {improvement:+.1%}")
    print("   The baseline was already very good!")
else:
    print(f"\n⚠️ Your model is worse: {improvement:+.1%}")
    print("   Sometimes simpler is better!")

print("\nYour Model Confusion Matrix:")
print("")
print("                    Predicted")
print("                Malignant  Benign")
print(f"Actual Malignant    {cm_custom[0,0]:4d}      {cm_custom[0,1]:4d}")
print(f"       Benign       {cm_custom[1,0]:4d}      {cm_custom[1,1]:4d}")
print("")

# Compare false negatives (most critical in medical diagnosis)
fn_baseline = cm_baseline[0, 1]
fn_custom = cm_custom[0, 1]
print(f"False Negatives (missed cancers):")
print(f"  Baseline: {fn_baseline}")
print(f"  Your model: {fn_custom}")
if fn_custom < fn_baseline:
    print(f"  ✅ Your model missed fewer cancers! ({fn_baseline - fn_custom} fewer)")
elif fn_custom == fn_baseline:
    print(f"  = Same number of missed cancers")
else:
    print(f"  ⚠️ Your model missed more cancers ({fn_custom - fn_baseline} more)")

### Visualize Custom Model Performance

In [None]:
# Plot training curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), dpi=100)

# Accuracy plot
ax1.plot(history_custom.history['accuracy'], label='Training', linewidth=2)
ax1.plot(history_custom.history['val_accuracy'], label='Validation', linewidth=2)
ax1.set_xlabel('Epoch', fontsize=12, fontweight='bold')
ax1.set_ylabel('Accuracy', fontsize=12, fontweight='bold')
ax1.set_title(f'Custom Model ({num_hidden_layers}L, {units_per_layer}U): Accuracy', fontsize=13, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Loss plot
ax2.plot(history_custom.history['loss'], label='Training', linewidth=2)
ax2.plot(history_custom.history['val_loss'], label='Validation', linewidth=2)
ax2.set_xlabel('Epoch', fontsize=12, fontweight='bold')
ax2.set_ylabel('Loss', fontsize=12, fontweight='bold')
ax2.set_title(f'Custom Model ({num_hidden_layers}L, {units_per_layer}U): Loss', fontsize=13, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Signs of overfitting:")
print("   - Training accuracy much higher than validation")
print("   - Validation loss increases while training loss decreases")
print("   - Large gap between training and validation curves")

### Custom Model Confusion Matrix Heatmap

In [None]:
# Create confusion matrix heatmap for custom model
fig, ax = plt.subplots(figsize=(8, 6), dpi=100)

# Plot heatmap
im = ax.imshow(cm_custom, cmap='Blues', aspect='auto')

# Add colorbar
cbar = ax.figure.colorbar(im, ax=ax)
cbar.ax.set_ylabel('Count', rotation=-90, va="bottom", fontsize=11)

# Set ticks and labels
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['Malignant', 'Benign'], fontsize=11)
ax.set_yticklabels(['Malignant', 'Benign'], fontsize=11)

# Labels
ax.set_xlabel('Predicted Label', fontsize=12, fontweight='bold')
ax.set_ylabel('True Label', fontsize=12, fontweight='bold')
ax.set_title(f'Custom Model ({num_hidden_layers}L, {units_per_layer}U): Confusion Matrix', 
            fontsize=13, fontweight='bold')

# Add text annotations
for i in range(2):
    for j in range(2):
        text = ax.text(j, i, cm_custom[i, j],
                      ha="center", va="center", color="black", fontsize=20, fontweight='bold')

plt.tight_layout()
plt.show()

---

## Section 7: Understanding Variability - Running Multiple Experiments

### Why Run Multiple Experiments?

**Remember from Module 3:** Neural networks are stochastic!
- Random weight initialization leads to different results each run
- Professional ML practice: run multiple times, report mean ± std
- This is especially important when comparing models

**Let's apply this to medical diagnosis:**
- Is the baseline model consistently good?
- Do hidden layers reliably improve performance?
- How much do false negatives (missed cancers) vary?
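
As a small illustration of "mean ± std" reporting, the sketch below summarizes five made-up accuracy values (hypothetical numbers, not results from this lab). It also shows one detail worth knowing: `np.std` defaults to the population formula (`ddof=0`), while the sample standard deviation (`ddof=1`) is slightly larger and is often preferred when only a handful of runs are available.

```python
import numpy as np

# Hypothetical test accuracies from 5 independent training runs
accs = np.array([0.965, 0.974, 0.956, 0.974, 0.965])

mean = accs.mean()
std_pop = accs.std()           # ddof=0: divides by n (NumPy's default)
std_sample = accs.std(ddof=1)  # ddof=1: divides by n-1

print(f"Mean accuracy:         {mean:.1%}")
print(f"Std (population, n):   {std_pop:.2%}")
print(f"Std (sample, n-1):     {std_sample:.2%}")
print(f"Report as:             {mean:.1%} ± {std_sample:.1%} (n={len(accs)})")
```

Either convention is defensible as long as the report says which one was used and how many runs it is based on.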

---

### Experiment: Baseline Model with Multiple Runs

Let's run the baseline linear model **5 times** and analyze the variability.

In [None]:
# Run baseline model multiple times
num_runs = 5
num_epochs = 100

print("="*70)
print(f"RUNNING BASELINE LINEAR MODEL {num_runs} TIMES")
print("="*70)
print("\nWhy? To see how stable the model is across different initializations!\n")

# Store results
baseline_accuracies = []
baseline_losses = []
baseline_false_negatives = []  # Missed cancers - most critical!
baseline_histories = []

for run in range(num_runs):
    print(f"Run {run+1}/{num_runs}...", end=" ")
    
    # Create NEW model for fresh initialization
    model = Sequential([
        Dense(1, activation='sigmoid', input_dim=30, name='output')
    ], name=f'Baseline_Run_{run+1}')
    
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    # Train
    history = model.fit(
        X_train_scaled, y_train,
        epochs=num_epochs,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    test_loss, test_acc = model.evaluate(X_test_scaled, y_test, verbose=0)
    
    # Get predictions and confusion matrix
    y_pred = (model.predict(X_test_scaled, verbose=0) > 0.5).astype(int).flatten()
    cm = confusion_matrix(y_test, y_pred)
    false_negatives = cm[0, 1]  # Malignant predicted as benign
    
    # Store
    baseline_accuracies.append(test_acc)
    baseline_losses.append(test_loss)
    baseline_false_negatives.append(false_negatives)
    baseline_histories.append(history)
    
    print(f"Acc: {test_acc:.1%}, Loss: {test_loss:.4f}, FN: {false_negatives}")

print("\n" + "="*70)
print("BASELINE MODEL - STATISTICAL SUMMARY")
print("="*70)
print(f"Test Accuracy:        {np.mean(baseline_accuracies):.1%} ± {np.std(baseline_accuracies):.1%}")
print(f"Test Loss:            {np.mean(baseline_losses):.4f} ± {np.std(baseline_losses):.4f}")
print(f"False Negatives:      {np.mean(baseline_false_negatives):.1f} ± {np.std(baseline_false_negatives):.2f}")
print(f"  (Missed cancers - MOST CRITICAL METRIC!)")
print(f"\nAccuracy Range:       {np.min(baseline_accuracies):.1%} to {np.max(baseline_accuracies):.1%}")
print(f"FN Range:             {int(np.min(baseline_false_negatives))} to {int(np.max(baseline_false_negatives))}")
print("="*70)

if np.std(baseline_accuracies) < 0.02:
    print("\n✅ Very consistent results - stable baseline!")
elif np.std(baseline_accuracies) < 0.05:
    print("\n✓ Reasonably consistent results")
else:
    print("\n⚠️ High variability - results depend on initialization")

### Experiment: Custom Model with Multiple Runs

Now let's run your custom model (with hidden layers) multiple times.

In [None]:
# Adjustable parameters - CHANGE THESE!
num_hidden_layers_exp = 1   # Try: 0, 1, 2
units_per_layer_exp = 16    # Try: 8, 16, 32

print("="*70)
print(f"RUNNING CUSTOM MODEL ({num_hidden_layers_exp} layers, {units_per_layer_exp} units) {num_runs} TIMES")
print("="*70)
print()

# Store results
custom_accuracies = []
custom_losses = []
custom_false_negatives = []
custom_histories = []

for run in range(num_runs):
    print(f"Run {run+1}/{num_runs}...", end=" ")
    
    # Build model
    model = Sequential(name=f'Custom_Run_{run+1}')
    
    # Add hidden layers
    for i in range(num_hidden_layers_exp):
        if i == 0:
            model.add(Dense(units_per_layer_exp, activation='relu', input_dim=30, name=f'hidden_{i+1}'))
        else:
            model.add(Dense(units_per_layer_exp, activation='relu', name=f'hidden_{i+1}'))
    
    # Output layer
    if num_hidden_layers_exp == 0:
        model.add(Dense(1, activation='sigmoid', input_dim=30, name='output'))
    else:
        model.add(Dense(1, activation='sigmoid', name='output'))
    
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    # Train
    history = model.fit(
        X_train_scaled, y_train,
        epochs=num_epochs,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    test_loss, test_acc = model.evaluate(X_test_scaled, y_test, verbose=0)
    
    # Confusion matrix
    y_pred = (model.predict(X_test_scaled, verbose=0) > 0.5).astype(int).flatten()
    cm = confusion_matrix(y_test, y_pred)
    false_negatives = cm[0, 1]
    
    # Store
    custom_accuracies.append(test_acc)
    custom_losses.append(test_loss)
    custom_false_negatives.append(false_negatives)
    custom_histories.append(history)
    
    print(f"Acc: {test_acc:.1%}, Loss: {test_loss:.4f}, FN: {false_negatives}")

print("\n" + "="*70)
print(f"CUSTOM MODEL ({num_hidden_layers_exp} layers, {units_per_layer_exp} units) - STATISTICAL SUMMARY")
print("="*70)
print(f"Test Accuracy:        {np.mean(custom_accuracies):.1%} ± {np.std(custom_accuracies):.1%}")
print(f"Test Loss:            {np.mean(custom_losses):.4f} ± {np.std(custom_losses):.4f}")
print(f"False Negatives:      {np.mean(custom_false_negatives):.1f} ± {np.std(custom_false_negatives):.2f}")
print(f"\nAccuracy Range:       {np.min(custom_accuracies):.1%} to {np.max(custom_accuracies):.1%}")
print(f"FN Range:             {int(np.min(custom_false_negatives))} to {int(np.max(custom_false_negatives))}")
print("="*70)

### Statistical Comparison: Baseline vs Custom Model

In [None]:
# Statistical comparison
print("="*70)
print("STATISTICAL COMPARISON: BASELINE vs CUSTOM MODEL")
print("="*70)

print(f"\nBaseline (no hidden layers):")
print(f"  Accuracy:         {np.mean(baseline_accuracies):.1%} ± {np.std(baseline_accuracies):.1%}")
print(f"  False Negatives:  {np.mean(baseline_false_negatives):.1f} ± {np.std(baseline_false_negatives):.2f}")

print(f"\nCustom Model ({num_hidden_layers_exp} layers, {units_per_layer_exp} units):")
print(f"  Accuracy:         {np.mean(custom_accuracies):.1%} ± {np.std(custom_accuracies):.1%}")
print(f"  False Negatives:  {np.mean(custom_false_negatives):.1f} ± {np.std(custom_false_negatives):.2f}")

acc_improvement = np.mean(custom_accuracies) - np.mean(baseline_accuracies)
fn_improvement = np.mean(baseline_false_negatives) - np.mean(custom_false_negatives)

print(f"\nMean Accuracy Improvement:  {acc_improvement:+.1%}")
print(f"Mean FN Reduction:          {fn_improvement:+.1f} (positive = fewer missed cancers)")
print("="*70)

# Box plots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6), dpi=100)

# Accuracy box plot
box_data_acc = [baseline_accuracies, custom_accuracies]
labels = ['Baseline\n(0 layers)', f'Custom\n({num_hidden_layers_exp}L, {units_per_layer_exp}U)']

bp1 = ax1.boxplot(box_data_acc, labels=labels, patch_artist=True, showmeans=True, meanline=True)
for patch, color in zip(bp1['boxes'], ['lightblue', 'lightgreen']):
    patch.set_facecolor(color)

ax1.set_ylabel('Test Accuracy', fontsize=12, fontweight='bold')
ax1.set_title(f'Accuracy Distribution ({num_runs} runs)', fontsize=13, fontweight='bold')
ax1.grid(True, alpha=0.3, axis='y')

# Add points
for i, data in enumerate(box_data_acc, 1):
    x = np.random.normal(i, 0.04, size=len(data))
    ax1.scatter(x, data, alpha=0.6, s=60, c='red', edgecolors='black', linewidths=1)

# False Negatives box plot
box_data_fn = [baseline_false_negatives, custom_false_negatives]

bp2 = ax2.boxplot(box_data_fn, labels=labels, patch_artist=True, showmeans=True, meanline=True)
for patch, color in zip(bp2['boxes'], ['lightcoral', 'lightgreen']):
    patch.set_facecolor(color)

ax2.set_ylabel('False Negatives (Missed Cancers)', fontsize=12, fontweight='bold')
ax2.set_title(f'False Negatives Distribution ({num_runs} runs)', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')

# Add points
for i, data in enumerate(box_data_fn, 1):
    x = np.random.normal(i, 0.04, size=len(data))
    ax2.scatter(x, data, alpha=0.6, s=60, c='red', edgecolors='black', linewidths=1)

plt.tight_layout()
plt.show()

print("\n💡 Key Questions:")
print("   - Is the custom model CONSISTENTLY better?")
print("   - Do the boxes overlap? (If yes, improvement may not be reliable)")
print("   - Which metric matters more: accuracy or false negatives?")
print("   - Would you trust this model for medical diagnosis?")

### Visualize All Training Runs

In [None]:
# Plot all training curves together
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10), dpi=100)

# Baseline accuracy
for i, history in enumerate(baseline_histories):
    ax1.plot(history.history['val_accuracy'], alpha=0.6, linewidth=2, label=f'Run {i+1}')
ax1.set_xlabel('Epoch', fontsize=11, fontweight='bold')
ax1.set_ylabel('Validation Accuracy', fontsize=11, fontweight='bold')
ax1.set_title('Baseline: Validation Accuracy', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)

# Baseline loss
for i, history in enumerate(baseline_histories):
    ax2.plot(history.history['val_loss'], alpha=0.6, linewidth=2, label=f'Run {i+1}')
ax2.set_xlabel('Epoch', fontsize=11, fontweight='bold')
ax2.set_ylabel('Validation Loss', fontsize=11, fontweight='bold')
ax2.set_title('Baseline: Validation Loss', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Custom accuracy
for i, history in enumerate(custom_histories):
    ax3.plot(history.history['val_accuracy'], alpha=0.6, linewidth=2, label=f'Run {i+1}')
ax3.set_xlabel('Epoch', fontsize=11, fontweight='bold')
ax3.set_ylabel('Validation Accuracy', fontsize=11, fontweight='bold')
ax3.set_title(f'Custom ({num_hidden_layers_exp}L, {units_per_layer_exp}U): Validation Accuracy', 
             fontsize=12, fontweight='bold')
ax3.legend(fontsize=9)
ax3.grid(True, alpha=0.3)

# Custom loss
for i, history in enumerate(custom_histories):
    ax4.plot(history.history['val_loss'], alpha=0.6, linewidth=2, label=f'Run {i+1}')
ax4.set_xlabel('Epoch', fontsize=11, fontweight='bold')
ax4.set_ylabel('Validation Loss', fontsize=11, fontweight='bold')
ax4.set_title(f'Custom ({num_hidden_layers_exp}L, {units_per_layer_exp}U): Validation Loss', 
             fontsize=12, fontweight='bold')
ax4.legend(fontsize=9)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 What to observe:")
print("   - Do all runs converge to similar final values?")
print("   - Are there any outlier runs?")
print("   - Does one model show more variability than the other?")

### Key Insights: Stochasticity in Medical ML

**What you should have learned:**

1. **Results vary even on the same data!**
   - Random initialization affects final performance
   - Some runs may miss more cancers than others
   - This is why clinical ML systems need extensive validation

2. **Statistical reporting is essential in medicine**
   - Reporting "97% accuracy" from one run is misleading
   - "97.2% ± 0.8%" gives a true picture of reliability
   - Variability in false negatives is critical - lives are at stake!

3. **Model comparison requires statistics**
   - If the boxes overlap significantly, the "improvement" may not be real
   - Need statistical tests (t-test, etc.) to confirm differences
   - In medicine, reproducibility is paramount

4. **Simpler models can be more stable**
   - Baseline model may have lower variance than complex models
   - Trade-off: slightly lower mean accuracy but more consistent
   - Consistency matters in clinical deployment

**Medical Ethics Connection:**
- If your model misses 2-4 cancers depending on initialization, that's a problem!
- Real medical ML systems:
  - Train on much larger datasets (thousands to millions of samples)
  - Use ensemble methods (combine multiple models)
  - Undergo extensive clinical trials
  - Are validated on diverse patient populations
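
To see why a handful of missed cancers matters more than the headline accuracy suggests, the sketch below converts a confusion matrix into sensitivity and specificity. The matrix is hypothetical (made-up counts), laid out the way this notebook prints it: rows are the actual class, columns the predicted class, with index 0 = malignant and index 1 = benign.

```python
import numpy as np

# Hypothetical confusion matrix (NOT real results from this lab)
#                  predicted malignant, predicted benign
cm_toy = np.array([[40, 2],    # actual malignant
                   [1, 71]])   # actual benign

caught, missed = cm_toy[0]          # malignant cases: detected vs. missed
false_alarms, cleared = cm_toy[1]   # benign cases: flagged vs. correctly cleared

sensitivity = caught / (caught + missed)           # share of cancers detected
specificity = cleared / (cleared + false_alarms)   # share of benign cases cleared
accuracy = (caught + cleared) / cm_toy.sum()

print(f"Accuracy:    {accuracy:.1%}")
print(f"Sensitivity: {sensitivity:.1%}  ({missed} missed cancers)")
print(f"Specificity: {specificity:.1%}  ({false_alarms} false alarms)")
```

In this made-up example the accuracy is above 97%, yet roughly 1 in 20 cancers is still missed, which is the number a clinician would care about most.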

**Professional Practice:**
- Always run at least 5-10 times (ideally more)
- Report mean ± std for all metrics
- Show distributions (box plots, violin plots)
- Consider worst-case runs, not just average
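
One way to go beyond eyeballing box plots is Welch's t-test, which compares two sets of runs without assuming equal variances. The sketch below computes the t statistic and degrees of freedom by hand with NumPy on hypothetical accuracy values; `welch_t` is an illustrative helper, not a library function. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.

```python
import numpy as np

# Hypothetical accuracies from 5 runs of each model (NOT real results)
baseline = np.array([0.965, 0.974, 0.965, 0.956, 0.974])
custom = np.array([0.974, 0.982, 0.965, 0.974, 0.956])

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = a.var(ddof=1) / len(a)   # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)   # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(custom, baseline)
print(f"Mean difference: {custom.mean() - baseline.mean():+.2%}")
print(f"Welch t = {t_stat:.2f} with about {df:.1f} degrees of freedom")
```

A |t| well below 2 suggests the difference is within run-to-run noise. Note that all runs here share the same test set, so the test only addresses variability from initialization and training, not from the choice of patients.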

---


## Section 8: Record Your Experiments

**Instructions:** Run the experiment cells above multiple times with different values for `num_hidden_layers` and `units_per_layer`. Record your results in the table below (on your answer sheet)!

### My Experiment Results

| Hidden Layers | Units per Layer | Test Accuracy | False Negatives |
|--------------|-----------------|---------------|----------------|
| 0            | N/A             | ____%        | ____           |
| 1            | 8               | ____%        | ____           |
| 1            | 16              | ____%        | ____           |
| 1            | 32              | ____%        | ____           |
| 1            | 64              | ____%        | ____           |
| 2            | 16              | ____%        | ____           |
| 2            | 32              | ____%        | ____           |

**Tip:** Copy this table to your answer sheet and fill it in as you experiment!

---

## Section 9: Connection to Earlier Labs

Let's connect what you just did to the gradient descent work from Modules 0-2!

### The Journey: From Manual to Automatic

**Module 0 (Dimension Lifting):**
- You learned that XOR can't be separated by a single straight line in 2D, but CAN be separated by a plane in 3D
- Adding x₃ = x₁ × x₂ "lifted" the problem to a solvable space
- **Connection now:** Hidden layers do this automatically! Each hidden neuron creates a new "dimension"
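
As a quick reminder of the lifting idea, the sketch below checks that one linear rule classifies all four XOR points once the product feature is added. The weights (1, 1, -2) and the 0.5 threshold are one choice that happens to work, picked for illustration; they are not necessarily the values used in Module 0.

```python
# XOR truth table: (x1, x2) -> label
xor_points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def lifted_linear_rule(x1, x2):
    """Linear rule in the lifted space (x1, x2, x3) with x3 = x1 * x2."""
    x3 = x1 * x2                    # the extra, hand-built dimension
    score = x1 + x2 - 2 * x3        # one weight per dimension: 1, 1, -2
    return 1 if score > 0.5 else 0  # a single threshold separates the classes

predictions = [lifted_linear_rule(x1, x2) for (x1, x2), _ in xor_points]
labels = [label for _, label in xor_points]
print(predictions == labels)  # True
```

A hidden layer learns features that play the role of `x3` instead of having them built by hand.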

**Module 1 (Manual Weight Adjustment):**
- You manually adjusted 9 weights to solve XOR
- Saw how changing weights affects decision boundaries
- **Connection now:** Keras `.fit()` adjusts hundreds of weights automatically!

**Module 2 (Gradient Descent):**
- You watched gradient descent automatically find good weights
- Saw momentum speed up convergence
- Experimented with learning rates
- **Connection now:**
  - `.fit()` uses **backpropagation** to compute gradients (same idea!)
  - **Adam optimizer** combines momentum with a separate, adaptive step size for each weight
  - All the math you learned still applies!
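
To see how Adam relates to the momentum you used in Module 2, here is a minimal sketch of the Adam update rule on a one-parameter toy loss. The loss f(w) = (w - 3)² and the learning rate of 0.1 are assumptions chosen for the demo. Keras applies the same kind of update to every weight, with its own default settings.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w = 0.0
m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8  # demo settings (assumed)

for t in range(1, 501):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g      # momentum-like average of gradients
    v = beta2 * v + (1 - beta2) * g * g  # average of squared gradients
    m_hat = m / (1 - beta1 ** t)         # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled by recent gradient size

print(f"w after 500 steps: {w:.3f}")
```

The `m` line is the momentum idea from Module 2. Dividing by `sqrt(v_hat)` is the "adaptive" part: weights with consistently large gradients take smaller steps.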

**Module 3 (Penguin Species):**
- Applied these ideas to real data (4 features, 3 classes)
- Saw that simple models can work well
- **Connection now:** Breast cancer has 30 features instead of 4, but same principles!

### What's The Same?
1. **Goal:** Find weights that minimize prediction error
2. **Method:** Gradient descent (computing how to change weights)
3. **Architecture:** Input → Hidden layers → Output
4. **Learning:** Iterative improvement through many epochs

### What's Different?
1. **Scale:** 30 inputs instead of 2, potentially hundreds of weights instead of 9
2. **Automation:** No manual weight adjustment, no manual gradient calculation
3. **Data:** Real medical measurements instead of abstract patterns
4. **Stakes:** Real diagnosis implications instead of academic exercise
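
To put numbers on the "Scale" point, this sketch counts trainable parameters for a few architectures. It assumes fully connected layers with one bias per unit and a single output unit. `count_parameters` is a hypothetical helper, not a Keras function.

```python
def count_parameters(num_inputs, num_hidden_layers, units_per_layer, num_outputs=1):
    """Count weights and biases in a fully connected network."""
    layer_sizes = [num_inputs] + [units_per_layer] * num_hidden_layers + [num_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus one bias per unit
    return total

for layers, units in [(0, 0), (1, 8), (1, 16), (2, 16), (2, 32)]:
    n = count_parameters(30, layers, units)
    print(f"{layers} hidden layer(s), {units} units each: {n} parameters")
```

With 30 inputs, the baseline has 31 parameters and a single hidden layer of 8 units already has 257. For comparison, `count_parameters(2, 1, 2)` gives 9 for a small XOR network with two hidden units.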

### The Big Picture

```
Module 0: Learned WHY hidden layers help (dimension lifting)
         ↓
Module 1: Saw WHAT weights do (manual adjustment)
         ↓
Module 2: Learned HOW to find weights automatically (gradient descent)
         ↓
Module 3: Applied to REAL data (penguin species)
         ↓
Module 4: Applied to HIGH-STAKES real data (cancer diagnosis)
         ↓
SAME FUNDAMENTAL CONCEPTS, INCREASING COMPLEXITY AND REAL-WORLD RELEVANCE!
```

**The power of abstraction:** The gradient descent you learned on tiny XOR works the same way when training models with millions of parameters!

---

## Section 10: Key Takeaways

### 1. High-Dimensional Data Can Be Surprisingly Easy
- 30 features sounds complex, but linear models can work well!
- In high-dimensional space, classes are often more separable
- "Curse of dimensionality" is real, but so is "blessing of dimensionality"

### 2. Simpler Models Often Suffice
- Baseline linear model likely achieved 94-97% accuracy
- Adding hidden layers may give only small improvements
- Don't assume you need a huge network for every problem!
- **Occam's Razor:** Prefer simpler models when they work

### 3. Medical ML Has Ethical Implications
- **False Positive:** Predicting cancer when there isn't any
  - Consequence: Unnecessary stress, additional tests, maybe biopsy
- **False Negative:** Missing actual cancer
  - Consequence: Delayed treatment, disease progression, worse outcomes
- Which is worse? **Context matters!** Doctors use ML as one tool among many.

### 4. Accuracy Isn't Everything
- 97% sounds great, but that's still ~3 errors per 100 patients
- Confusion matrix shows WHERE errors happen
- In medicine, the TYPE of error matters as much as the rate
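
The sketch below shows how accuracy and the confusion matrix can tell different stories. It uses the label convention from this lab (0 = benign, 1 = malignant) and a tiny made-up set of predictions. `confusion_counts` is a hypothetical helper written for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

# Tiny made-up example (illustrative only, not lab results)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of actual cancers that were caught
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Here accuracy is 0.80, but recall is only 0.75: one of the four cancers was missed.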

### 5. Diminishing Returns with Complexity
- Going from 0 to 1 hidden layer: may help a lot (or a little)
- Going from 1 to 2 layers: often minimal improvement
- More parameters = longer training, more overfitting risk
- **Principle:** Add complexity only when justified by performance gain

### 6. Real ML Is About Trade-offs
- Accuracy vs. interpretability (can we explain predictions?)
- Complexity vs. training time
- False positives vs. false negatives
- Model performance vs. computational cost
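
The false positive vs. false negative trade-off can be seen by moving the decision threshold on the model's predicted probability. This is a sketch with made-up probabilities, assuming the model outputs the probability of malignancy. `errors_at_threshold` is a hypothetical helper.

```python
def errors_at_threshold(y_true, probs, threshold):
    """Return (false_positives, false_negatives) when predicting 1 if prob >= threshold."""
    fp = sum(1 for t, p in zip(y_true, probs) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, probs) if t == 1 and p < threshold)
    return fp, fn

# Made-up predicted probabilities of malignancy (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.05, 0.20, 0.35, 0.60, 0.30, 0.55, 0.80, 0.95]

for threshold in (0.7, 0.5, 0.25):
    fp, fn = errors_at_threshold(y_true, probs, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold catches more cancers at the cost of more false alarms. Where to set it is a clinical decision, not a purely technical one.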

---

## Reflection Questions

Answer these on your answer sheet:

**Q1.** How did the baseline linear model perform on breast cancer data? Were you surprised? Why or why not?

**Q2.** Did adding hidden layers significantly improve accuracy? At what architecture did you see diminishing returns?

**Q3.** Looking at your confusion matrices, did you reduce false negatives (missed cancers) with more complex models? Is there a trade-off with false positives?

**Q4.** In medical diagnosis, which error is more concerning: false positive (predicting cancer when there isn't any) or false negative (missing actual cancer)? Explain your reasoning.

**Q5.** Compare Module 3 (Penguins) and Module 4 (Breast Cancer):
   - Which dataset benefited more from hidden layers?
   - Why might this be? (Think about feature count and class separability)

**Q6.** Reflect on your journey from Module 0 to now:
   - What's the connection between manually lifting XOR to 3D (Module 0) and hidden layers in Keras?
   - How does `.fit()` relate to the gradient descent you saw in Module 2?
   - What's the SAME between 2-feature XOR and 30-feature cancer diagnosis?

**Q7.** Given that a simple linear model achieves ~95% accuracy, why might doctors still want a more complex model? Why might they prefer the simpler one?

---

## Next Steps

1. **Experiment extensively** - try many combinations of layers and units
2. **Record your results** - fill in the experiment table on your answer sheet
3. **Analyze patterns** - when does complexity help? When doesn't it?
4. **Answer reflection questions** - think deeply about what you learned
5. **Return to the LMS** - submit your answer sheet

---

## Congratulations!

You've completed the journey from hand-built gradient descent with abstract XOR patterns to real-world medical ML with TensorFlow/Keras!

**You now understand:**
- ✅ Why hidden layers help (dimension lifting)
- ✅ How gradient descent finds weights automatically
- ✅ How to build, train, and evaluate neural networks
- ✅ How to apply ML to real-world problems
- ✅ How to interpret results and understand trade-offs
- ✅ The ethical implications of ML in high-stakes domains

**The principles you learned apply to:**
- Image recognition (computer vision)
- Natural language processing (text understanding)
- Time series forecasting (stock prices, weather)
- Recommender systems (Netflix, Spotify)
- And countless other applications!

**Great work! 🎉**

---