<a href="https://colab.research.google.com/github/sprince0031/ICT-Python-ML/blob/main/Week%205/Notebooks/week5_reference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python & ML Foundations: Session 5 - Reference
## Perceptrons, MLPs & Advanced Metrics

Welcome to the Session 5 reference notebook! This week, we explore:
- Perceptrons and their limitations (AND, OR, XOR problems)
- Multi-layer perceptrons for regression and classification
- Advanced evaluation metrics for imbalanced datasets

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report, mean_squared_error, r2_score
from sklearn.datasets import load_diabetes, make_classification

sns.set_style('whitegrid')
np.random.seed(42)

---
## Video 1: Perceptrons and MLPs

### 1.1 - Perceptron: AND Operation

A perceptron can learn linearly separable functions like AND.
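
To make "learn" concrete, here is a minimal sketch of the classic perceptron update rule in plain NumPy. It is an illustration only, not scikit-learn's `Perceptron` implementation; the helper name `train_perceptron`, the learning rate, and the epoch cap are assumptions made for this example.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron rule: adjust weights only on misclassified samples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = int(np.dot(w, xi) + b > 0)   # step activation
            update = lr * (target - pred)        # 0 when the prediction is right
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_demo = np.array([0, 0, 0, 1])  # AND targets

w_demo, b_demo = train_perceptron(X_demo, y_demo)
preds_demo = (X_demo @ w_demo + b_demo > 0).astype(int)
print("weights:", w_demo, "bias:", b_demo)
print("predictions:", preds_demo)
```

Because AND is linearly separable, the loop stops after a handful of passes with every input classified correctly.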

In [None]:
# AND operation dataset
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

print("AND Truth Table:")
print("Input | Output")
print("------|-------")
for i in range(len(X_and)):
    print(f"{X_and[i]} | {y_and[i]}")

In [None]:
# Train perceptron on AND
perceptron_and = Perceptron(max_iter=1000, random_state=42)
perceptron_and.fit(X_and, y_and)

# Test
y_pred_and = perceptron_and.predict(X_and)
print("\nAND Perceptron Predictions:")
print("Input | Actual | Predicted")
print("------|--------|----------")
for i in range(len(X_and)):
    print(f"{X_and[i]} | {y_and[i]:6d} | {y_pred_and[i]:9d}")

print(f"\nAccuracy: {accuracy_score(y_and, y_pred_and):.2f}")

### 1.2 - Perceptron: OR Operation

Similarly, a perceptron can learn the OR operation.
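
One way to see why both gates are learnable: a single step unit with the same weights computes AND or OR depending only on its bias. The sketch below uses hand-picked, hypothetical parameters (weights of 1.0, biases of -1.5 and -0.5); a trained perceptron will generally land on different values that draw an equivalent boundary.

```python
def step_unit(x1, x2, w1, w2, bias):
    """Single perceptron unit: output 1 when the weighted sum exceeds 0."""
    return int(w1 * x1 + w2 * x2 + bias > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Same weights, different bias (values chosen by hand for illustration)
and_out = [step_unit(a, b, 1.0, 1.0, -1.5) for a, b in inputs]
or_out = [step_unit(a, b, 1.0, 1.0, -0.5) for a, b in inputs]

print("AND:", and_out)
print("OR: ", or_out)
```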

In [None]:
# OR operation dataset
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])

print("OR Truth Table:")
print("Input | Output")
print("------|-------")
for i in range(len(X_or)):
    print(f"{X_or[i]} | {y_or[i]}")

In [None]:
# Train perceptron on OR
perceptron_or = Perceptron(max_iter=1000, random_state=42)
perceptron_or.fit(X_or, y_or)

# Test
y_pred_or = perceptron_or.predict(X_or)
print("\nOR Perceptron Predictions:")
print("Input | Actual | Predicted")
print("------|--------|----------")
for i in range(len(X_or)):
    print(f"{X_or[i]} | {y_or[i]:6d} | {y_pred_or[i]:9d}")

print(f"\nAccuracy: {accuracy_score(y_or, y_pred_or):.2f}")

### 1.3 - Perceptron Limitation: XOR Problem

The XOR problem is NOT linearly separable, so a single perceptron cannot solve it.
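
The reason is short. With a step unit that outputs 1 when `w1*x1 + w2*x2 + b > 0`, XOR needs `b <= 0`, `w1 + b > 0`, `w2 + b > 0`, and `w1 + w2 + b <= 0`. Adding the two middle inequalities gives `w1 + w2 + 2b > 0`, so `w1 + w2 + b > -b >= 0`, which contradicts the last condition. The sketch below illustrates this with a brute-force search over a coarse grid of parameters; the grid range and step are arbitrary choices for the example, and the search is a demonstration rather than a proof.

```python
from itertools import product

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}

# Candidate weights and biases from -2.0 to 2.0 in steps of 0.5
grid = [i * 0.5 for i in range(-4, 5)]

def count_separators(y):
    """Count grid points (w1, w2, b) whose step unit reproduces y exactly."""
    hits = 0
    for w1, w2, b in product(grid, repeat=3):
        preds = [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in inputs]
        hits += preds == y
    return hits

separators = {name: count_separators(y) for name, y in targets.items()}
for name, n in separators.items():
    print(f"{name}: {n} separating parameter settings found")
```

AND and OR each have several working settings on this grid, while XOR has none.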

In [None]:
# XOR operation dataset
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

print("XOR Truth Table:")
print("Input | Output")
print("------|-------")
for i in range(len(X_xor)):
    print(f"{X_xor[i]} | {y_xor[i]}")

In [None]:
# Try to train perceptron on XOR (it will fail!)
perceptron_xor = Perceptron(max_iter=1000, random_state=42)
perceptron_xor.fit(X_xor, y_xor)

# Test
y_pred_xor = perceptron_xor.predict(X_xor)
print("\nXOR Perceptron Predictions (FAILS):")
print("Input | Actual | Predicted")
print("------|--------|----------")
for i in range(len(X_xor)):
    print(f"{X_xor[i]} | {y_xor[i]:6d} | {y_pred_xor[i]:9d}")

print(f"\nAccuracy: {accuracy_score(y_xor, y_pred_xor):.2f}")
print("\n⚠️ The perceptron CANNOT solve XOR because it's not linearly separable!")

### 1.4 - MLP Solves XOR Problem

A multi-layer perceptron with hidden layers CAN solve the XOR problem!
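
To see why a hidden layer is enough, here is a hand-built network with two ReLU hidden units that computes XOR exactly. The weights are chosen by hand for illustration; they are not what `MLPClassifier` learns, and the function name `xor_net` is hypothetical.

```python
def relu(z):
    return max(0.0, z)

def xor_net(x1, x2):
    """Two hidden ReLU units followed by a linear output."""
    h1 = relu(x1 + x2)          # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    return h1 - 2.0 * h2        # cancels the (1, 1) case

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [xor_net(a, b) for a, b in xor_inputs]
print(xor_outputs)
```

The second hidden unit detects the "both on" case, and the output layer subtracts it out, which a single linear boundary cannot do.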

In [None]:
# MLP with one hidden layer to solve XOR
mlp_xor = MLPClassifier(hidden_layer_sizes=(4,), activation='relu', 
                        max_iter=5000, random_state=42)
mlp_xor.fit(X_xor, y_xor)

# Test
y_pred_mlp_xor = mlp_xor.predict(X_xor)
acc_mlp_xor = accuracy_score(y_xor, y_pred_mlp_xor)
print("\nXOR MLP Predictions:")
print("Input | Actual | Predicted")
print("------|--------|----------")
for i in range(len(X_xor)):
    print(f"{X_xor[i]} | {y_xor[i]:6d} | {y_pred_mlp_xor[i]:9d}")

print(f"\nAccuracy: {acc_mlp_xor:.2f}")
if acc_mlp_xor == 1.0:
    print("\n✅ The MLP successfully solves XOR with hidden layers!")
else:
    print("\n⚠️ This run did not fit XOR perfectly; try a different random_state or more hidden units.")

---
## Video 2: MLPs for Regression and Classification

### 2.1 - Load Real-World Dataset

We'll use the Diabetes dataset for regression and classification tasks.

In [None]:
# Load diabetes dataset
diabetes = load_diabetes()
X_diabetes = diabetes.data
y_diabetes = diabetes.target

print("Diabetes Dataset:")
print(f"  Samples: {X_diabetes.shape[0]}")
print(f"  Features: {X_diabetes.shape[1]}")
print(f"  Target range: {y_diabetes.min():.2f} to {y_diabetes.max():.2f}")

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_diabetes, y_diabetes, test_size=0.2, random_state=42
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\nTrain set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")

### 2.2 - MLP Regressor WITHOUT ReLU (Identity Activation)

Without non-linear activation, the MLP behaves like linear regression, limiting its ability to model non-linear relationships.
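
The sketch below shows why: two stacked linear layers collapse algebraically into one, since `W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)`. The layer sizes and random weights are arbitrary assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with random (hypothetical) weights: 3 inputs -> 5 hidden -> 1 output
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)

X_demo = rng.normal(size=(4, 3))  # four sample inputs

# Forward pass through both layers with identity activation
two_layer = (X_demo @ W1.T + b1) @ W2.T + b2

# Equivalent single linear layer
W_eff = W2 @ W1
b_eff = W2 @ b1 + b2
one_layer = X_demo @ W_eff.T + b_eff

print("Same outputs:", np.allclose(two_layer, one_layer))
```

The same argument extends to any number of identity-activated layers, so extra depth adds parameters but no expressive power.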

In [None]:
# MLP with identity (linear) activation
mlp_identity = MLPRegressor(hidden_layer_sizes=(10, 10), activation='identity',
                            max_iter=1000, random_state=42)
mlp_identity.fit(X_train_scaled, y_train)

# Predictions
y_pred_identity = mlp_identity.predict(X_test_scaled)

# Evaluate
mse_identity = mean_squared_error(y_test, y_pred_identity)
r2_identity = r2_score(y_test, y_pred_identity)

print("MLP Regressor with Identity (Linear) Activation:")
print(f"  MSE: {mse_identity:.2f}")
print(f"  R² Score: {r2_identity:.4f}")
print("\n⚠️ Without non-linearity, the network can only fit linear relationships, however many layers it has!")

### 2.3 - MLP Regressor WITH ReLU Activation

Adding ReLU activation enables the MLP to model non-linear relationships.
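
As a small illustration, two ReLU units are enough to represent the absolute-value function, which no single linear function can match. The helper names below are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def abs_via_relu(x):
    """|x| built from two ReLU units: one for each side of zero."""
    return relu(x) + relu(-x)

xs = [-2.0, -0.5, 0.0, 1.5]
ys = [abs_via_relu(x) for x in xs]
print(ys)
```

Each ReLU unit contributes one "kink", so a layer of them lets the network assemble piecewise-linear curves.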

In [None]:
# MLP with ReLU activation
mlp_relu = MLPRegressor(hidden_layer_sizes=(10, 10), activation='relu',
                        max_iter=1000, random_state=42)
mlp_relu.fit(X_train_scaled, y_train)

# Predictions
y_pred_relu = mlp_relu.predict(X_test_scaled)

# Evaluate
mse_relu = mean_squared_error(y_test, y_pred_relu)
r2_relu = r2_score(y_test, y_pred_relu)

print("MLP Regressor with ReLU Activation:")
print(f"  MSE: {mse_relu:.2f}")
print(f"  R² Score: {r2_relu:.4f}")
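if r2_relu > r2_identity:
    print("\n✅ ReLU activation improved R² over the identity model on this split!")
else:
    print("\n⚠️ ReLU did not beat the identity model on this split; non-linearity helps only when the data calls for it.")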

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Identity activation
axes[0].scatter(y_test, y_pred_identity, alpha=0.6)
axes[0].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[0].set_xlabel('Actual Values')
axes[0].set_ylabel('Predicted Values')
axes[0].set_title(f'Identity Activation\nR² = {r2_identity:.4f}')

# Plot 2: ReLU activation
axes[1].scatter(y_test, y_pred_relu, alpha=0.6, color='green')
axes[1].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[1].set_xlabel('Actual Values')
axes[1].set_ylabel('Predicted Values')
axes[1].set_title(f'ReLU Activation\nR² = {r2_relu:.4f}')

plt.tight_layout()
plt.show()

print(f"\nR² change with ReLU (ReLU - identity): {(r2_relu - r2_identity):+.4f}")

### 2.4 - MLP Classifier with Sigmoid Activation

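Now let's use the same dataset for classification by creating binary classes. Note that `activation='logistic'` sets the sigmoid as the hidden-layer activation; for binary targets, `MLPClassifier` always uses a logistic output unit, whichever hidden activation is chosen.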
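
For reference, here is a minimal sketch of the sigmoid (logistic) function and how a probability becomes a class label. The example scores and the 0.5 threshold are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

scores = [-3.0, -0.5, 0.0, 0.5, 3.0]     # hypothetical raw network outputs
probs = [sigmoid(z) for z in scores]
labels = [int(p > 0.5) for p in probs]    # threshold at 0.5

for z, p, label in zip(scores, probs, labels):
    print(f"score {z:+.1f} -> probability {p:.3f} -> class {label}")
```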

In [None]:
# Create binary classification problem from diabetes data
# Classes: 0 = low progression, 1 = high progression
median_value = np.median(y_diabetes)
y_diabetes_binary = (y_diabetes > median_value).astype(int)

print(f"Binary classification threshold: {median_value:.2f}")
print(f"Class distribution:")
print(f"  Class 0 (low): {np.sum(y_diabetes_binary == 0)} samples")
print(f"  Class 1 (high): {np.sum(y_diabetes_binary == 1)} samples")

In [None]:
# Split for classification
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_diabetes, y_diabetes_binary, test_size=0.2, random_state=42
)

# Scale
scaler_c = StandardScaler()
X_train_c_scaled = scaler_c.fit_transform(X_train_c)
X_test_c_scaled = scaler_c.transform(X_test_c)

In [None]:
# MLP Classifier with sigmoid (logistic) activation in the hidden layers
mlp_classifier = MLPClassifier(hidden_layer_sizes=(10, 10), activation='logistic',
                               max_iter=1000, random_state=42)
mlp_classifier.fit(X_train_c_scaled, y_train_c)

# Predictions
y_pred_classifier = mlp_classifier.predict(X_test_c_scaled)

# Evaluate
accuracy = accuracy_score(y_test_c, y_pred_classifier)

print("MLP Classifier with Sigmoid Activation:")
print(f"  Accuracy: {accuracy:.4f}")
print("\nNote: the sigmoid output unit turns the network's score into a class probability.")

---
## Video 3: Advanced Metrics for Imbalanced Datasets

### 3.1 - Create Imbalanced Dataset

Let's create an imbalanced dataset to demonstrate why accuracy alone is insufficient.
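
A quick sketch of the problem, using a made-up 90/10 label list rather than the dataset created below: a "model" that always predicts the majority class scores high accuracy while finding none of the minority class.

```python
# Hypothetical labels: 90 negatives, 10 positives
y_true = [0] * 90 + [1] * 10
y_pred = [0] * len(y_true)   # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = positives_found / sum(y_true)

print(f"Accuracy: {accuracy:.2f}")
print(f"Recall on the minority class: {recall:.2f}")
```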

In [None]:
# Create imbalanced dataset (90% class 0, 10% class 1)
X_imb, y_imb = make_classification(
    n_samples=1000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=2, weights=[0.9, 0.1],
    random_state=42
)

print("Imbalanced Dataset:")
unique, counts = np.unique(y_imb, return_counts=True)
for cls, count in zip(unique, counts):
    print(f"  Class {cls}: {count:4d} samples ({count/len(y_imb)*100:5.1f}%)")

In [None]:
# Visualize class imbalance
plt.figure(figsize=(8, 5))
plt.bar(['Class 0\n(Majority)', 'Class 1\n(Minority)'], counts, color=['skyblue', 'salmon'])
plt.ylabel('Number of Samples')
plt.title('Class Distribution - Imbalanced Dataset')
plt.ylim(0, max(counts) * 1.1)

for i, count in enumerate(counts):
    plt.text(i, count + 20, f'{count} ({count/len(y_imb)*100:.1f}%)', 
             ha='center', fontweight='bold')

plt.show()

print("\n⚠️ With such imbalance, a naive classifier could reach roughly 90% accuracy")
print("   by simply predicting all samples as Class 0!")

### 3.2 - Train Model and Show Confusion Matrix

Let's train a classifier and examine the confusion matrix to understand True Positives, True Negatives, False Positives, and False Negatives.
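
Before using `confusion_matrix`, it can help to count the four outcomes by hand. The sketch below does this for a tiny made-up label list; the helper name `confusion_counts` is hypothetical, and class 1 is treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp

y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```

These four numbers fill the 2x2 matrix row by row: `[[TN, FP], [FN, TP]]`, with rows as actual classes and columns as predicted classes.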

In [None]:
# Split and scale
X_train_imb, X_test_imb, y_train_imb, y_test_imb = train_test_split(
    X_imb, y_imb, test_size=0.3, random_state=42, stratify=y_imb
)

scaler_imb = StandardScaler()
X_train_imb_scaled = scaler_imb.fit_transform(X_train_imb)
X_test_imb_scaled = scaler_imb.transform(X_test_imb)

# Train classifier
mlp_imb = MLPClassifier(hidden_layer_sizes=(20, 10), activation='relu',
                        max_iter=1000, random_state=42)
mlp_imb.fit(X_train_imb_scaled, y_train_imb)

# Predictions
y_pred_imb = mlp_imb.predict(X_test_imb_scaled)

print("Model trained on imbalanced data")

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test_imb, y_pred_imb)

# Label each cell with its role and count; rows are actual classes, columns are predictions
cell_names = [['TN', 'FP'], ['FN', 'TP']]
cell_labels = np.array([[f'{cell_names[i][j]} = {cm[i, j]}' for j in range(2)]
                        for i in range(2)])

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=cell_labels, fmt='', cmap='Blues', cbar=False,
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
plt.title('Confusion Matrix', fontsize=16, fontweight='bold')
plt.ylabel('Actual Class', fontsize=12)
plt.xlabel('Predicted Class', fontsize=12)

plt.show()

print("\nConfusion Matrix Components:")
print(f"  True Negatives (TN):  {cm[0, 0]} - Correctly predicted as Class 0")
print(f"  False Positives (FP): {cm[0, 1]} - Incorrectly predicted as Class 1")
print(f"  False Negatives (FN): {cm[1, 0]} - Incorrectly predicted as Class 0")
print(f"  True Positives (TP):  {cm[1, 1]} - Correctly predicted as Class 1")

### 3.3 - Calculate Metrics Manually

Let's manually calculate precision, recall, and F1-score to understand what they mean.
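
As a worked example with hypothetical counts (not the output of the model above), suppose TP = 15, FP = 5, FN = 15, and TN = 265:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{15 + 265}{300} \approx 0.933
$$

$$
\text{Precision} = \frac{TP}{TP + FP} = \frac{15}{20} = 0.75
\qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{15}{30} = 0.50
$$

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot 0.75 \cdot 0.50}{0.75 + 0.50} = 0.60
$$

Accuracy looks strong at about 93%, yet the model finds only half of the actual positives, and the F1-score of 0.60 reflects that.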

In [None]:
# Extract values from confusion matrix
TN = cm[0, 0]
FP = cm[0, 1]
FN = cm[1, 0]
TP = cm[1, 1]

# Manual calculations
accuracy_manual = (TP + TN) / (TP + TN + FP + FN)
precision_manual = TP / (TP + FP) if (TP + FP) > 0 else 0
recall_manual = TP / (TP + FN) if (TP + FN) > 0 else 0
pr_sum = precision_manual + recall_manual
f1_manual = 2 * (precision_manual * recall_manual) / pr_sum if pr_sum > 0 else 0

print("="*60)
print("MANUAL METRIC CALCULATIONS")
print("="*60)

print("\n1. ACCURACY:")
print(f"   Formula: (TP + TN) / (TP + TN + FP + FN)")
print(f"   Calculation: ({TP} + {TN}) / ({TP} + {TN} + {FP} + {FN})")
print(f"   Result: {accuracy_manual:.4f}")

print("\n2. PRECISION:")
print(f"   Formula: TP / (TP + FP)")
print(f"   Meaning: Of all predicted positives, how many were correct?")
print(f"   Calculation: {TP} / ({TP} + {FP})")
print(f"   Result: {precision_manual:.4f}")

print("\n3. RECALL (Sensitivity):")
print(f"   Formula: TP / (TP + FN)")
print(f"   Meaning: Of all actual positives, how many did we find?")
print(f"   Calculation: {TP} / ({TP} + {FN})")
print(f"   Result: {recall_manual:.4f}")

print("\n4. F1-SCORE:")
print(f"   Formula: 2 * (Precision * Recall) / (Precision + Recall)")
print(f"   Meaning: Harmonic mean of precision and recall")
print(f"   Calculation: 2 * ({precision_manual:.4f} * {recall_manual:.4f}) / ({precision_manual:.4f} + {recall_manual:.4f})")
print(f"   Result: {f1_manual:.4f}")

print("\n" + "="*60)

In [None]:
# Verify with sklearn functions
accuracy_sklearn = accuracy_score(y_test_imb, y_pred_imb)
precision_sklearn = precision_score(y_test_imb, y_pred_imb)
recall_sklearn = recall_score(y_test_imb, y_pred_imb)
f1_sklearn = f1_score(y_test_imb, y_pred_imb)

print("\nVerification (sklearn vs manual):")
print(f"  Accuracy:  {accuracy_sklearn:.4f} vs {accuracy_manual:.4f}")
print(f"  Precision: {precision_sklearn:.4f} vs {precision_manual:.4f}")
print(f"  Recall:    {recall_sklearn:.4f} vs {recall_manual:.4f}")
print(f"  F1-Score:  {f1_sklearn:.4f} vs {f1_manual:.4f}")
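assert np.allclose(
    [accuracy_sklearn, precision_sklearn, recall_sklearn, f1_sklearn],
    [accuracy_manual, precision_manual, recall_manual, f1_manual],
)
print("\n✅ Manual calculations match sklearn!")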

### 3.4 - Classification Report

The classification report provides a comprehensive view of all metrics.
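
The report also includes `macro avg` and `weighted avg` rows. The sketch below shows how the two differ under imbalance, using made-up per-class F1 scores and supports rather than values from this notebook's model.

```python
# Hypothetical per-class results: (F1-score, support)
per_class = {
    "Class 0 (Majority)": (0.95, 270),
    "Class 1 (Minority)": (0.60, 30),
}

# Macro average: every class counts equally
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support
total_support = sum(n for _, n in per_class.values())
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"Macro F1:    {macro_f1:.3f}")
print(f"Weighted F1: {weighted_f1:.3f}")
```

The weighted average is pulled toward the majority class, so the macro average (or the minority-class row itself) is usually the more honest summary for imbalanced data.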

In [None]:
print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test_imb, y_pred_imb, 
                          target_names=['Class 0 (Majority)', 'Class 1 (Minority)']))
print("="*60)

print("\nKey Takeaways:")
print("  - Accuracy alone can be misleading with imbalanced data")
print("  - Precision penalizes false positives: TP / (TP + FP)")
print("  - Recall penalizes false negatives: TP / (TP + FN)")
print("  - F1-score balances both precision and recall")
print("  - For imbalanced data, focus on minority class metrics!")

---
## Summary

### Video 1: Perceptrons and MLPs
- Perceptrons can solve AND and OR (linearly separable)
- Perceptrons CANNOT solve XOR (not linearly separable)
- MLPs with hidden layers solve XOR successfully

### Video 2: MLPs for Regression and Classification
- MLP without non-linear activation = limited to linear relationships
- ReLU activation enables modeling of non-linear patterns
- A sigmoid (logistic) output turns the network's score into a class probability for binary classification

### Video 3: Advanced Metrics
- Accuracy can be misleading with imbalanced data
- Confusion matrix shows TP, TN, FP, FN
- Precision: Of predicted positives, how many correct?
- Recall: Of actual positives, how many found?
- F1-score: Harmonic mean of precision and recall
- Classification report provides comprehensive metrics