<a href="https://colab.research.google.com/github/sprince0031/ICT-Python-ML/blob/main/Week%205/Notebooks/week5_solutions.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python and ML Foundations: Session 5
## Perceptrons, MLPs & Neural Networks - Solutions

Welcome to the session 5 solutions notebook! This notebook contains complete solutions to all the challenges from week 5.

## Utility code
The code cells below contain the common imports and sample data used in the exercises. Make sure to run them before starting!

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report

sns.set_style('whitegrid')
np.random.seed(42)

Run the cell below to load the Breast Cancer Wisconsin dataset directly from scikit-learn into a pandas DataFrame.

In [None]:
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset
data = load_breast_cancer()

# Create a DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Map target values to meaningful labels
df['diagnosis'] = df['target'].map({0: 'malignant', 1: 'benign'})

print("Dataset shape:", df.shape)
print("\nFirst few rows:")
df.head()

---
# Video Challenges
## About the Breast Cancer Wisconsin Dataset

This dataset contains features computed from digitized images of fine needle aspirate (FNA) of breast masses. The features describe characteristics of the cell nuclei present in the images. This is a binary classification problem where the goal is to predict whether a tumor is **malignant** (cancerous) or **benign** (non-cancerous).

**Dataset characteristics:**
- **569 samples** (212 malignant, 357 benign)
- **30 numerical features** including:
  - radius, texture, perimeter, area, smoothness
  - compactness, concavity, concave points, symmetry, fractal dimension
  - Each feature has mean, standard error, and "worst" (largest) values
- **Target**: 0 = malignant, 1 = benign
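
The figures above can be checked directly against the dataset object. This is a minimal sketch, assuming scikit-learn is installed; the names `bunch`, `class_counts`, and `groups` are illustrative and not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer()

# Class counts indexed by label: position 0 = malignant, position 1 = benign
class_counts = np.bincount(bunch.target).tolist()
target_names = [str(name) for name in bunch.target_names]
print(dict(zip(target_names, class_counts)))

# Group the 30 feature names by statistic (mean, standard error, worst)
names = [str(n) for n in bunch.feature_names]
groups = {
    "mean": [n for n in names if n.startswith("mean ")],
    "error": [n for n in names if n.endswith(" error")],
    "worst": [n for n in names if n.startswith("worst ")],
}
print({key: len(cols) for key, cols in groups.items()})
```

Each of the ten measurements appears three times, once per statistic, which is where the 30 columns come from.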

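**Real-world context:**
This is a classic medical diagnosis problem where accurate classification can help doctors make informed decisions about treatment. Treating **malignant** as the positive class (the usual clinical convention):
- **False Negatives** (predicting benign when the tumor is malignant) are the most dangerous error, because a cancer goes untreated
- **False Positives** (predicting malignant when the tumor is benign) cause unnecessary stress and procedures
- We need to carefully balance precision and recall

Note that the dataset encodes malignant as `0` and benign as `1`, so scikit-learn's metric functions treat **benign** as the positive class by default. Keep this in mind when reading the precision, recall, and confusion-matrix cells later in this notebook.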
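
To make the two error types concrete, here is a small worked example of precision and recall computed from raw counts. The counts are hypothetical (they are not results from this notebook), and `precision_recall` is an illustrative helper, not a scikit-learn function.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for whichever class is counted as positive."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical screening result with malignant as the positive class:
# 60 malignant tumors caught, 4 missed, 6 benign tumors flagged by mistake.
precision, recall = precision_recall(tp=60, fp=6, fn=4)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Recall drops with every missed malignant tumor (false negative), while precision drops with every false alarm (false positive).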

Let's explore this dataset and build increasingly sophisticated models to solve this important classification task!

---
## Video 1: Perceptron & MLPs

### Challenge: Build Your First Neural Network Classifier

In this challenge, you'll build and compare two classifiers for breast cancer detection:
1. A simple **Perceptron** model
2. A **Multi-Layer Perceptron (MLP)** with hidden layers
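
Before using scikit-learn's `Perceptron`, it may help to see the classic learning rule in a few lines. This is a simplified pure-Python sketch on the AND gate (a linearly separable toy problem); `train_perceptron` and `predict` are illustrative helpers, and scikit-learn's implementation differs in details such as shuffling and stopping criteria.

```python
def predict(w, b, x):
    """Threshold unit: output 1 when the weighted sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: update weights only on misclassified samples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            pred = predict(w, b, x)
            if pred != target:
                step = lr * (target - pred)  # +lr or -lr
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
w, b = train_perceptron(inputs, and_labels)
and_preds = [predict(w, b, x) for x in inputs]
print("weights:", w, "bias:", b, "predictions:", and_preds)
```

Because AND is linearly separable, the rule finds a separating line after a handful of passes.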

In [None]:
# Step 1: Select features and target
# We'll use all 30 features for comprehensive analysis
X = df[data.feature_names]
y = df['target']

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"\nTarget distribution:")
print(y.value_counts())

In [None]:
# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

In [None]:
# Step 3: Scale the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Features scaled successfully!")
print(f"Mean of scaled training data: {X_train_scaled.mean():.4f}")
print(f"Std of scaled training data: {X_train_scaled.std():.4f}")
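
Why `fit_transform` on the training set but only `transform` on the test set? The sketch below reproduces the standardization formula z = (x - mean) / std by hand with hypothetical feature values. The statistics are learned from the training split only and then reused, which is the behavior `StandardScaler` provides.

```python
from statistics import mean, pstdev

train = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical feature values
test = [11.0, 20.0]

# Fit: learn the statistics from the training split only
mu = mean(train)
sigma = pstdev(train)  # population std (ddof=0), which StandardScaler also uses

# Transform: apply the same training statistics to both splits
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

print(f"train mean={mean(train_scaled):.4f}, std={pstdev(train_scaled):.4f}")
print(f"test  mean={mean(test_scaled):.4f}")
```

The scaled test data does not need to have mean 0 and std 1. That is expected, and it avoids leaking test-set information into preprocessing.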

In [None]:
# Step 4: Train a Perceptron model
perceptron = Perceptron(max_iter=1000, random_state=42)
perceptron.fit(X_train_scaled, y_train)

# Make predictions
y_pred_perceptron = perceptron.predict(X_test_scaled)

# Calculate accuracy
acc_perceptron = accuracy_score(y_test, y_pred_perceptron)
print(f"Perceptron Accuracy: {acc_perceptron:.4f}")
print(f"Number of iterations: {perceptron.n_iter_}")

In [None]:
# Step 5: Train an MLP model
mlp = MLPClassifier(hidden_layer_sizes=(20, 10), activation='relu', 
                    max_iter=1000, random_state=42)
mlp.fit(X_train_scaled, y_train)

# Make predictions
y_pred_mlp = mlp.predict(X_test_scaled)

# Calculate accuracy
acc_mlp = accuracy_score(y_test, y_pred_mlp)
print(f"MLP Accuracy: {acc_mlp:.4f}")
print(f"Number of layers: {mlp.n_layers_}")
print(f"Number of iterations: {mlp.n_iter_}")

In [None]:
# Step 6: Compare the two models
print("Model Comparison:")
print(f"  Perceptron Accuracy: {acc_perceptron:.4f}")
print(f"  MLP Accuracy:        {acc_mlp:.4f}")
print(f"\nAccuracy difference (MLP - Perceptron): {(acc_mlp - acc_perceptron) * 100:.2f} percentage points")

# Visualize comparison
plt.figure(figsize=(10, 6))
models = ['Perceptron', 'MLP']
accuracies = [acc_perceptron, acc_mlp]
bars = plt.bar(models, accuracies, color=['#3498db', '#2ecc71'])
plt.ylabel('Accuracy')
plt.title('Model Comparison: Perceptron vs MLP')
plt.ylim([0.9, 1.0])

# Add value labels on bars
for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
            f'{acc:.4f}',
            ha='center', va='bottom')

plt.show()

print("\nAnalysis:")
print("If the MLP scores higher than the Perceptron, likely reasons are:")
print("1. It can learn non-linear patterns through hidden layers")
print("2. Multiple layers allow for more complex feature combinations")
print("3. The ReLU activation introduces non-linearity")
print("With a test set this small, a gap of one or two samples is not meaningful.")
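
The point about hidden layers and non-linearity can be illustrated with XOR, a problem no single linear threshold unit can solve. The sketch below uses a two-unit ReLU hidden layer with hand-picked weights (they are not learned, and `tiny_mlp` is a hypothetical helper) to show that one hidden layer is enough.

```python
def relu(z):
    return float(max(0.0, z))

def tiny_mlp(x1, x2):
    """Two hidden ReLU units with hand-picked weights, then a linear output."""
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    return h1 - 2.0 * h2      # cancels the (1, 1) case

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [tiny_mlp(a, b) for a, b in inputs]
print(xor_outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Without the ReLU, the two layers would collapse into a single linear function and the (1, 1) case could not be cancelled.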

---
## Video 2: MLPs 2 & Advanced Metrics

### Challenge: Evaluate with Advanced Metrics

Accuracy alone doesn't tell the whole story, especially in medical diagnosis where the cost of different types of errors varies significantly.
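
A quick calculation shows why. Using the class counts quoted earlier in this notebook, consider a useless "model" that labels every tumor benign:

```python
n_malignant, n_benign = 212, 357  # class counts from the dataset description

# Predict "benign" for every sample
accuracy = n_benign / (n_malignant + n_benign)
malignant_recall = 0 / n_malignant  # it never flags a malignant case

print(f"accuracy={accuracy:.3f}, malignant recall={malignant_recall:.1f}")
```

This baseline is right about 63% of the time while missing every single cancer, which is why we look at precision and recall as well.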

In [None]:
# Step 1: Prepare the data (reusing from Video 1)
# Data is already prepared: X_train_scaled, X_test_scaled, y_train, y_test
print("Data already prepared from Video 1")
print(f"Training samples: {X_train_scaled.shape[0]}")
print(f"Test samples: {X_test_scaled.shape[0]}")

In [None]:
# Step 2: Train an MLP classifier with a deeper architecture
mlp_deep = MLPClassifier(hidden_layer_sizes=(50, 25), activation='relu',
                         max_iter=1000, random_state=42)
mlp_deep.fit(X_train_scaled, y_train)

# Make predictions
y_pred = mlp_deep.predict(X_test_scaled)

print(f"Model trained successfully!")
print(f"Architecture: {mlp_deep.hidden_layer_sizes}")
print(f"Training iterations: {mlp_deep.n_iter_}")

In [None]:
# Step 3: Calculate and print multiple metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\n" + "="*50)
print("MODEL PERFORMANCE METRICS")
print("="*50)
print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}  (Of all predicted benign, how many are correct?)")
print(f"Recall:    {recall:.4f}  (Of all actual benign cases, how many did we find?)")
print(f"F1-Score:  {f1:.4f}  (Harmonic mean of precision and recall)")
print("="*50)

# Visualize metrics
plt.figure(figsize=(10, 6))
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
values = [accuracy, precision, recall, f1]
colors = ['#3498db', '#2ecc71', '#e74c3c', '#f39c12']
bars = plt.bar(metrics, values, color=colors)
plt.ylabel('Score')
plt.title('Comprehensive Model Evaluation')
plt.ylim([0.9, 1.0])

for bar, val in zip(bars, values):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
            f'{val:.4f}',
            ha='center', va='bottom')

plt.show()
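
The precision and recall above describe the **benign** class, because `pos_label` defaults to `1`. To score the malignant class instead, pass `pos_label=0`. A minimal sketch with hypothetical labels (not results from this notebook):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 0 = malignant, 1 = benign
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

recall_benign = recall_score(y_true, y_hat)  # default pos_label=1
recall_malignant = recall_score(y_true, y_hat, pos_label=0)
precision_malignant = precision_score(y_true, y_hat, pos_label=0)

print(f"Recall (benign):       {recall_benign:.3f}")
print(f"Recall (malignant):    {recall_malignant:.3f}")
print(f"Precision (malignant): {precision_malignant:.3f}")
```

In this toy example, 3 of the 4 malignant cases are caught, so malignant recall is 0.75 even though the benign recall is a different number.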

In [None]:
# Step 4: Create and visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Malignant (0)', 'Benign (1)'],
            yticklabels=['Malignant (0)', 'Benign (1)'],
            cbar_kws={'label': 'Count'})
plt.title('Confusion Matrix - Breast Cancer Classification', fontsize=14, fontweight='bold')
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.show()

print("\n" + "="*50)
print("CONFUSION MATRIX BREAKDOWN")
print("="*50)
# sklearn convention: rows are true labels, columns are predicted labels.
# With 0 = malignant and 1 = benign, the default "positive" class is benign (1).
print(f"True Negatives (TN):  {cm[0, 0]:3d}  - Malignant correctly identified as malignant")
print(f"False Positives (FP): {cm[0, 1]:3d}  - Malignant predicted as benign (DANGEROUS: missed cancer)")
print(f"False Negatives (FN): {cm[1, 0]:3d}  - Benign predicted as malignant (unnecessary concern)")
print(f"True Positives (TP):  {cm[1, 1]:3d}  - Benign correctly identified as benign")
print("="*50)

# Recall for the malignant class (label 0): share of actual malignant cases caught
recall_malignant = recall_score(y_test, y_pred, pos_label=0)

print("\nMedical Context Analysis:")
print(f"- Missed malignant tumors ({cm[0, 1]}): life-threatening")
print(f"- Benign tumors flagged as malignant ({cm[1, 0]}): unnecessary stress and procedures")
print("- In medical diagnosis, minimizing missed malignant cases is critical")
print(f"- Malignant recall ({recall_malignant:.4f}) is the share of malignant cases we caught")
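
Reading the right cell of the confusion matrix is easy to get wrong when the class of interest is labeled `0`. The sketch below uses hypothetical labels to show scikit-learn's layout (rows are true labels, columns are predicted labels) and how the `labels` argument can reorder the matrix so that malignant becomes the positive class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = malignant, 1 = benign
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

cm_demo = confusion_matrix(y_true, y_hat)  # rows = true, columns = predicted
missed_malignant = int(cm_demo[0, 1])  # true malignant, predicted benign
false_alarms = int(cm_demo[1, 0])      # true benign, predicted malignant

# Listing benign first makes malignant the last ("positive") label,
# so ravel() yields tn, fp, fn, tp from the malignant point of view.
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_hat, labels=[1, 0]).ravel())

print(cm_demo)
print(f"missed malignant={missed_malignant}, false alarms={false_alarms}")
print(f"malignant-positive view: tn={tn}, fp={fp}, fn={fn}, tp={tp}")
```

After reordering, `fn` counts the missed cancers, matching the clinical meaning of "false negative".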

In [None]:
# Step 5: Generate a detailed classification report
print("\n" + "="*70)
print("DETAILED CLASSIFICATION REPORT")
print("="*70)
print(classification_report(y_test, y_pred, 
                          target_names=['Malignant (0)', 'Benign (1)']))
print("="*70)

print("\nKey Insights:")
print("1. Support: Number of samples in each class")
print("2. For malignant cases (class 0):")
print("   - High precision means low false alarms")
print("   - High recall means we're catching most malignant cases")
print("3. For benign cases (class 1):")
print("   - High precision means confident benign predictions")
print("   - High recall means we're correctly identifying benign cases")
print("4. In this medical context, we want:")
print("   - Very high recall for malignant (class 0) - don't miss cancers")
print("   - Good precision overall - minimize false alarms")

---
## Video 3: Neural Networks Deep Dive

### Challenge: Optimize Your Neural Network

Now that you understand how neural networks learn, it's time to build an optimized classifier by experimenting with different hyperparameters.

In [None]:
# Step 1: Prepare the data (already done)
print("Using the same prepared dataset from previous sections")
print(f"Features: {X_train_scaled.shape[1]}")
print(f"Training samples: {X_train_scaled.shape[0]}")
print(f"Test samples: {X_test_scaled.shape[0]}")

In [None]:
# Step 2: Experiment with different architectures
architectures = [
    (30,),
    (50, 25),
    (100, 50, 25),
    (50, 30, 20, 10)
]

print("\n" + "="*70)
print("ARCHITECTURE COMPARISON")
print("="*70)

arch_results = []
for arch in architectures:
    mlp = MLPClassifier(hidden_layer_sizes=arch, activation='relu',
                       max_iter=1000, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    y_pred = mlp.predict(X_test_scaled)
    acc = accuracy_score(y_test, y_pred)
    arch_results.append({'architecture': arch, 'accuracy': acc})
    print(f"Architecture {str(arch):20s}: Accuracy = {acc:.4f}")

best_arch = max(arch_results, key=lambda x: x['accuracy'])
print(f"\nBest Architecture: {best_arch['architecture']} with accuracy {best_arch['accuracy']:.4f}")
print("="*70)
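
The loop above picks the best architecture by test accuracy, which makes the final test score optimistic. One common alternative is to compare candidates with cross-validation on the training split only. This is a sketch under that assumption, not the method used in this notebook; names such as `X_tr`, `pipe`, and `cv_means` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all)

cv_means = {}
for arch in [(30,), (50, 25)]:
    # The pipeline refits the scaler inside every fold, so validation folds
    # never influence the scaling statistics
    pipe = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=arch, max_iter=1000, random_state=42))
    scores = cross_val_score(pipe, X_tr, y_tr, cv=5)
    cv_means[arch] = float(scores.mean())
    print(f"{str(arch):10s} CV accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")

best = max(cv_means, key=cv_means.get)
print(f"Selected by cross-validation: {best}")
```

The test split (`X_te`, `y_te`) stays untouched until a single final evaluation of the selected model.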

In [None]:
# Step 3: Compare different activation functions
activations = ['relu', 'tanh', 'logistic']

print("\n" + "="*70)
print("ACTIVATION FUNCTION COMPARISON")
print("="*70)

activation_results = []
for activation in activations:
    mlp = MLPClassifier(hidden_layer_sizes=(50, 25), activation=activation,
                       max_iter=1000, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    y_pred = mlp.predict(X_test_scaled)
    acc = accuracy_score(y_test, y_pred)
    activation_results.append(acc)
    print(f"Activation: {activation:10s} - Accuracy: {acc:.4f}")

best_activation_idx = np.argmax(activation_results)
print(f"\nBest Activation: {activations[best_activation_idx]} with accuracy {activation_results[best_activation_idx]:.4f}")
print("="*70)
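
For reference, the three activation names compared above correspond to the functions below. This is a plain-Python sketch of the formulas, written only to show how each one transforms its input; scikit-learn applies vectorized versions internally.

```python
import math

def relu(z):
    """max(0, z): passes positives through, zeroes out negatives."""
    return max(0.0, z)

def logistic(z):
    """Sigmoid: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# math.tanh squashes any input into the range (-1, 1)
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.3f}  tanh={math.tanh(z):+.3f}  logistic={logistic(z):.3f}")
```

ReLU is unbounded for positive inputs, while tanh and logistic saturate for large inputs of either sign.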

In [None]:
# Step 4: Experiment with different solvers
solvers = ['adam', 'sgd', 'lbfgs']

print("\n" + "="*70)
print("SOLVER COMPARISON")
print("="*70)

solver_results = []
for solver in solvers:
    mlp = MLPClassifier(hidden_layer_sizes=(50, 25), activation='relu',
                       solver=solver, max_iter=1000, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    y_pred = mlp.predict(X_test_scaled)
    acc = accuracy_score(y_test, y_pred)
    solver_results.append(acc)
    print(f"Solver: {solver:8s} - Accuracy: {acc:.4f}, Iterations: {mlp.n_iter_}")

best_solver_idx = np.argmax(solver_results)
print(f"\nBest Solver: {solvers[best_solver_idx]} with accuracy {solver_results[best_solver_idx]:.4f}")
print("="*70)

In [None]:
# Step 5: Test different regularization strengths
alphas = [0.0001, 0.001, 0.01, 0.1, 1.0]

print("\n" + "="*70)
print("REGULARIZATION (ALPHA) COMPARISON")
print("="*70)

for alpha in alphas:
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50, 25), activation='relu',
                       alpha=alpha, max_iter=1000, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    
    train_acc = mlp.score(X_train_scaled, y_train)
    test_acc = mlp.score(X_test_scaled, y_test)
    gap = train_acc - test_acc
    
    print(f"Alpha {alpha:7.4f}: Train={train_acc:.4f}, Test={test_acc:.4f}, Gap={gap:.4f}")

print("\nAnalysis:")
print("- Small gap indicates good generalization (less overfitting)")
print("- Too much regularization can cause underfitting (lower train and test accuracy)")
print("- The best alpha is dataset-dependent: pick it from the table above")
print("="*70)
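
What does `alpha` actually do? It weights an L2 penalty on the network's weights. The sketch below shows a generic penalized gradient step in plain Python. It illustrates the idea only: `sgd_step` is a hypothetical helper, and scikit-learn's exact scaling of the penalty term may differ.

```python
def sgd_step(weights, data_grad, alpha, lr=0.1):
    """One gradient step on: data_loss + (alpha / 2) * sum(w ** 2)."""
    # The penalty's gradient is alpha * w, which pulls every weight toward zero
    return [w - lr * (g + alpha * w) for w, g in zip(weights, data_grad)]

weights = [3.0, -2.0]
no_data_signal = [0.0, 0.0]  # zero data gradient isolates the penalty's effect

for alpha in (0.0001, 0.1, 1.0):
    shrunk = sgd_step(weights, no_data_signal, alpha)
    print(f"alpha={alpha}: {[round(w, 4) for w in shrunk]}")
```

Larger `alpha` shrinks the weights faster, which limits how sharply the decision boundary can bend. That curbs overfitting, and it causes underfitting when pushed too far.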

In [None]:
# Step 6: Build the final optimized model
print("\n" + "="*70)
print("BUILDING OPTIMIZED MODEL")
print("="*70)

mlp_optimized = MLPClassifier(
    hidden_layer_sizes=(100, 50, 25),  # Deep architecture for complex patterns
    activation='relu',                  # Compare with the Step 3 results
    solver='adam',                      # Adaptive learning rate
    alpha=0.001,                       # Moderate regularization
    batch_size=32,                     # Mini-batches of 32 samples per update
    learning_rate_init=0.001,          # Moderate learning rate
    max_iter=1000,
    early_stopping=True,               # Prevent overfitting
    validation_fraction=0.1,           # 10% for validation
    random_state=42,
    verbose=False
)

mlp_optimized.fit(X_train_scaled, y_train)

print(f"Training completed in {mlp_optimized.n_iter_} iterations")
print(f"Final training loss: {mlp_optimized.loss_:.6f}")
print("\nModel Configuration:")
print(f"  Architecture: {mlp_optimized.hidden_layer_sizes}")
print(f"  Activation: {mlp_optimized.activation}")
print(f"  Solver: {mlp_optimized.solver}")
print(f"  Alpha (L2): {mlp_optimized.alpha}")
print(f"  Batch size: {mlp_optimized.batch_size}")
print(f"  Learning rate: {mlp_optimized.learning_rate_init}")
print("="*70)
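
With `early_stopping=True`, training halts once the validation score stops improving. The sketch below is a simplified patience rule in plain Python with hypothetical validation scores. `early_stop_epoch` is an illustrative helper and does not reproduce scikit-learn's exact logic (which is controlled by `n_iter_no_change` and `tol`).

```python
def early_stop_epoch(val_scores, patience=3, tol=1e-4):
    """Return (epoch, best_score) at which training would stop."""
    best = float("-inf")
    stale = 0  # consecutive epochs without a meaningful improvement
    for epoch, score in enumerate(val_scores):
        if score > best + tol:
            best, stale = score, 0
        else:
            stale += 1
        if stale >= patience:
            return epoch, best
    return len(val_scores) - 1, best  # ran out of epochs without stopping

# Hypothetical validation accuracy per epoch
scores = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.96]
stop = early_stop_epoch(scores)
print(f"Stopped at epoch {stop[0]} with best validation score {stop[1]}")
```

Training stops at epoch 5, so the later 0.96 is never seen. That is the trade-off of a short patience window.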

In [None]:
# Step 7: Comprehensive evaluation of the optimized model
y_pred_optimized = mlp_optimized.predict(X_test_scaled)

# Calculate all metrics
accuracy_opt = accuracy_score(y_test, y_pred_optimized)
precision_opt = precision_score(y_test, y_pred_optimized)
recall_opt = recall_score(y_test, y_pred_optimized)
f1_opt = f1_score(y_test, y_pred_optimized)

print("\n" + "="*70)
print("OPTIMIZED MODEL - FINAL EVALUATION")
print("="*70)
print(f"Accuracy:  {accuracy_opt:.4f}")
print(f"Precision: {precision_opt:.4f}")
print(f"Recall:    {recall_opt:.4f}")
print(f"F1-Score:  {f1_opt:.4f}")
print("="*70)

# Confusion matrix
cm_opt = confusion_matrix(y_test, y_pred_optimized)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Confusion Matrix
sns.heatmap(cm_opt, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['Malignant', 'Benign'],
            yticklabels=['Malignant', 'Benign'])
axes[0].set_title('Confusion Matrix - Optimized Model', fontsize=14, fontweight='bold')
axes[0].set_ylabel('True Label', fontsize=12)
axes[0].set_xlabel('Predicted Label', fontsize=12)

# Plot 2: Metrics Comparison
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
values = [accuracy_opt, precision_opt, recall_opt, f1_opt]
colors = ['#3498db', '#2ecc71', '#e74c3c', '#f39c12']
bars = axes[1].bar(metrics, values, color=colors)
axes[1].set_ylabel('Score', fontsize=12)
axes[1].set_title('All Metrics - Optimized Model', fontsize=14, fontweight='bold')
axes[1].set_ylim([0.90, 1.0])

for bar, val in zip(bars, values):
    height = bar.get_height()
    axes[1].text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.4f}',
                ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.show()

# Classification report
print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred_optimized, 
                          target_names=['Malignant', 'Benign']))

print("\n" + "="*70)
print("CONFUSION MATRIX ANALYSIS")
print("="*70)
# Rows are true labels, columns are predicted labels; positive class = benign (1)
print(f"True Negatives:  {cm_opt[0, 0]:3d} - Malignant correctly identified")
print(f"False Positives: {cm_opt[0, 1]:3d} - Malignant predicted as benign (CRITICAL!)")
print(f"False Negatives: {cm_opt[1, 0]:3d} - Benign predicted as malignant")
print(f"True Positives:  {cm_opt[1, 1]:3d} - Benign correctly identified")
print("="*70)
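
If missed malignant cases are still too frequent, one option is to lower the decision threshold on the predicted probability of malignancy rather than retrain. The sketch below uses hypothetical probabilities and an illustrative helper, `malignant_metrics`. In this notebook the probabilities could come from `mlp_optimized.predict_proba(X_test_scaled)[:, 0]`, assuming column 0 corresponds to class `0` (malignant) as listed in `classes_`.

```python
def malignant_metrics(p_malignant, is_malignant, threshold):
    """Precision and recall for the malignant class at a given threshold."""
    flagged = [p >= threshold for p in p_malignant]
    tp = sum(f and m for f, m in zip(flagged, is_malignant))
    fp = sum(f and not m for f, m in zip(flagged, is_malignant))
    fn = sum(m and not f for f, m in zip(flagged, is_malignant))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted P(malignant) and ground truth for 8 tumors
p_malignant = [0.95, 0.80, 0.45, 0.30, 0.40, 0.20, 0.10, 0.05]
is_malignant = [True, True, True, True, False, False, False, False]

for threshold in (0.5, 0.25):
    precision, recall = malignant_metrics(p_malignant, is_malignant, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Lowering the threshold from 0.5 to 0.25 catches every malignant case in this toy data at the cost of one extra false alarm.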

---
## Summary and Insights

### Model Evolution:
1. **Perceptron**: Simple linear classifier; a good baseline, but limited to linear decision boundaries
2. **Basic MLP**: Hidden layers enable non-linear pattern learning, which can improve on the Perceptron (compare the Video 1 accuracies)
3. **Optimized MLP**: Hyperparameters chosen from the Video 3 experiments; check its metrics against the basic MLP rather than assuming it is best

### Medical Context Considerations:
- **False Negatives** (missing malignant tumors) are life-threatening
- **False Positives** cause unnecessary stress and procedures
- High **Recall** on the malignant class is critical; because malignant is encoded as `0`, compute it with `pos_label=0`
- Good **Precision** minimizes false alarms
- **F1-Score** provides a balanced view of both

### Key Hyperparameter Findings:
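1. **Architecture**: Deeper networks (3-4 hidden layers) can perform better, but on a dataset of 569 samples the differences are often small; compare the accuracies printed in Step 2
2. **Activation**: ReLU is a strong default; check Step 3 to see whether it actually beats tanh and logistic (sigmoid) on this split
3. **Solver**: Adam is scikit-learn's default and usually converges reliably; `lbfgs` can be competitive on small datasets like this one
4. **Regularization**: Alpha ~0.001 was used for the final model; use the train/test gap from Step 5 to confirm it balances overfitting prevention and performance
5. **Early Stopping**: Helps prevent overfitting by monitoring the score on a held-out validation fraction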

### Real-World Deployment Considerations:
1. Need larger, more diverse dataset for production
2. Should implement cross-validation for robust evaluation, since the hyperparameters above were compared using test-set accuracy
3. Require clinical validation and regulatory approval
4. Must establish monitoring system for model performance
5. Should use ensemble methods for critical decisions
6. Need explainability features for doctor interpretation

This notebook shows how an MLP can be applied to a medical diagnosis task. High accuracy is only part of the picture: what matters most here is the trade-off between precision and recall for the malignant class, where a missed cancer is far more costly than a false alarm.