<a href="https://colab.research.google.com/github/sprince0031/ICT-Python-ML/blob/main/Week%205/Notebooks/week5_reference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python & ML Foundations: Session 5
## Perceptrons, MLPs & Neural Networks

Welcome to the session 5 reference notebook! This week, we dive into the foundations of neural networks. We'll start with the simplest neural network unit, the perceptron, build up to multi-layer perceptrons (MLPs), explore advanced evaluation metrics, and look at how deep neural networks learn.

**Libraries for this week:**
- `numpy`: For numerical operations.
- `pandas`: For data manipulation.
- `matplotlib` & `seaborn`: For visualization.
- `scikit-learn`: For building and evaluating models.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.datasets import make_classification, make_moons

sns.set_style('whitegrid')
np.random.seed(42)

---

## Video 1: Perceptron & MLPs

### 1.1 - What is a Perceptron?

The perceptron is the simplest form of a neural network—a single neuron that makes binary classifications. It takes multiple inputs, multiplies them by weights, adds a bias, and passes the result through an activation function.

**Key components:**
- Inputs: Features from our data
- Weights: Learned parameters that determine feature importance
- Bias: A constant that shifts the decision boundary
- Activation function: Determines the output (step function for perceptron)
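
As a minimal sketch of the computation described above, the following NumPy snippet evaluates a single perceptron by hand. The weights, bias, and inputs are made-up illustrative values (not learned ones), and `perceptron_output` is a hypothetical helper name used only here.

```python
import numpy as np

def perceptron_output(x, weights, bias):
    """Weighted sum of the inputs plus bias, passed through a step function."""
    z = np.dot(weights, x) + bias   # weighted sum of inputs, shifted by the bias
    return int(z > 0)               # step activation: 1 if z > 0, otherwise 0

# Illustrative (not learned) parameters for a 2-feature input
demo_weights = np.array([0.8, -0.5])
demo_bias = -0.1

out_a = perceptron_output(np.array([1.0, 0.5]), demo_weights, demo_bias)  # z = 0.45
out_b = perceptron_output(np.array([0.0, 1.0]), demo_weights, demo_bias)  # z = -0.6
print(out_a, out_b)
```

Changing the bias moves the threshold at which the output flips, which is what "shifting the decision boundary" means in practice.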

In [None]:
# Create a simple linearly separable dataset
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, 
                          n_informative=2, n_clusters_per_class=1, 
                          flip_y=0, random_state=42)

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Class 0', alpha=0.6)
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Class 1', alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification Dataset')
plt.legend()
plt.show()

In [None]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features (important for neural networks)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
# Train a perceptron
perceptron = Perceptron(max_iter=1000, random_state=42)
perceptron.fit(X_train_scaled, y_train)

# Make predictions
y_pred = perceptron.predict(X_test_scaled)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Perceptron Accuracy: {accuracy:.4f}")
print(f"Weights: {perceptron.coef_}")
print(f"Bias: {perceptron.intercept_}")
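
To connect the printed weights and bias to the decision rule, here is a small self-contained check. It uses a fresh toy dataset, with `_demo` names chosen here so the variables above are not overwritten. It is a sketch rather than part of the main workflow, and it checks that a fitted `Perceptron` predicts class 1 exactly when the weighted sum plus bias is positive.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X_demo, y_demo = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                     n_informative=2, n_clusters_per_class=1,
                                     flip_y=0, random_state=0)
clf_demo = Perceptron(max_iter=1000, random_state=0).fit(X_demo, y_demo)

# Manual decision rule: z = X @ w + b, predict class 1 when z > 0
z_manual = (X_demo @ clf_demo.coef_.T + clf_demo.intercept_).ravel()
manual_pred = (z_manual > 0).astype(int)

print("Manual rule matches predict():",
      np.array_equal(manual_pred, clf_demo.predict(X_demo)))
```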

### 1.2 - Multi-Layer Perceptron (MLP)

While a single perceptron can only learn linear decision boundaries, an MLP can learn complex, non-linear patterns by stacking multiple layers of neurons.

**Architecture:**
- Input layer: Receives the features
- Hidden layer(s): Process the information
- Output layer: Makes predictions

Each layer applies weights, biases, and activation functions to transform the data.
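
The layer-by-layer transformation can be sketched directly in NumPy. The example below fits a small `MLPClassifier` on its own toy data and then recomputes the forward pass by hand from the fitted `coefs_` and `intercepts_`. The `_fp` variable names and the `manual_forward` helper are illustrative. The sketch assumes ReLU hidden layers and a binary target, for which scikit-learn uses a single logistic output unit.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X_fp, y_fp = make_moons(n_samples=200, noise=0.15, random_state=0)
mlp_fp = MLPClassifier(hidden_layer_sizes=(10,), activation='relu',
                       max_iter=2000, random_state=0).fit(X_fp, y_fp)

def manual_forward(X, coefs, intercepts):
    """Forward pass: ReLU in hidden layers, sigmoid at the binary output."""
    a = X
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        a = np.maximum(0, a @ W + b)        # hidden layer: weights, bias, ReLU
    z = a @ coefs[-1] + intercepts[-1]      # output layer pre-activation
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid gives P(class 1)

p_manual = manual_forward(X_fp, mlp_fp.coefs_, mlp_fp.intercepts_).ravel()
p_sklearn = mlp_fp.predict_proba(X_fp)[:, 1]
print("Manual forward pass matches predict_proba:", np.allclose(p_manual, p_sklearn))
```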

In [None]:
# Create a non-linear dataset (moons)
X_moons, y_moons = make_moons(n_samples=300, noise=0.15, random_state=42)

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X_moons[y_moons == 0, 0], X_moons[y_moons == 0, 1], c='blue', label='Class 0', alpha=0.6)
plt.scatter(X_moons[y_moons == 1, 0], X_moons[y_moons == 1, 1], c='red', label='Class 1', alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Non-Linear Classification Dataset (Moons)')
plt.legend()
plt.show()

In [None]:
# Prepare the data
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_moons, y_moons, test_size=0.3, random_state=42)

# Scale features
scaler_m = StandardScaler()
X_train_m_scaled = scaler_m.fit_transform(X_train_m)
X_test_m_scaled = scaler_m.transform(X_test_m)

In [None]:
# Create a simple MLP with one hidden layer
mlp_simple = MLPClassifier(hidden_layer_sizes=(10,), activation='relu', 
                          max_iter=1000, random_state=42)
mlp_simple.fit(X_train_m_scaled, y_train_m)

# Predictions
y_pred_mlp = mlp_simple.predict(X_test_m_scaled)

# Evaluate
accuracy_mlp = accuracy_score(y_test_m, y_pred_mlp)
print(f"Simple MLP Accuracy: {accuracy_mlp:.4f}")
print(f"Number of layers (including input and output): {mlp_simple.n_layers_}")
print(f"Number of iterations: {mlp_simple.n_iter_}")

In [None]:
# Compare perceptron vs MLP on non-linear data
perceptron_nl = Perceptron(max_iter=1000, random_state=42)
perceptron_nl.fit(X_train_m_scaled, y_train_m)
y_pred_perceptron = perceptron_nl.predict(X_test_m_scaled)

acc_perceptron = accuracy_score(y_test_m, y_pred_perceptron)
acc_mlp = accuracy_score(y_test_m, y_pred_mlp)

print(f"Perceptron Accuracy on Non-linear Data: {acc_perceptron:.4f}")
print(f"MLP Accuracy on Non-linear Data: {acc_mlp:.4f}")
print(f"\nImprovement with MLP: {(acc_mlp - acc_perceptron) * 100:.2f} percentage points")

---

## Video 2: MLPs 2 & Advanced Metrics

### 2.1 - Deeper MLPs

Adding more hidden layers and neurons allows the network to learn more complex patterns. However, we need to be careful about overfitting.
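
One rough way to see why overfitting becomes a concern is to count trainable parameters. The sketch below uses a hypothetical `count_parameters` helper. It assumes 2 input features and a single output unit, as in the binary moons problem, and counts the weights and biases of a few fully connected architectures.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Total number of weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights plus fan_out biases
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))

for arch_demo in [(10,), (20, 10), (30, 20, 10), (50, 25)]:
    print(f"{str(arch_demo):14s} -> {count_parameters(2, arch_demo, 1)} parameters")
```

The moons training split has only 210 samples, so the larger architectures have several times more parameters than training points.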

In [None]:
# Let's try different architectures
architectures = [
    (10,),           # 1 hidden layer, 10 neurons
    (20, 10),        # 2 hidden layers
    (30, 20, 10),    # 3 hidden layers
    (50, 25),        # 2 hidden layers, more neurons
]

results = []

for arch in architectures:
    mlp = MLPClassifier(hidden_layer_sizes=arch, activation='relu', 
                       max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    y_pred = mlp.predict(X_test_m_scaled)
    acc = accuracy_score(y_test_m, y_pred)
    results.append({'architecture': arch, 'accuracy': acc})
    print(f"Architecture {arch}: Accuracy = {acc:.4f}")

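# Find the best architecture
# Note: picking the winner by test accuracy makes that score an optimistic estimate;
# in practice, select on a validation set or with cross-validation.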
best = max(results, key=lambda x: x['accuracy'])
print(f"\nBest Architecture: {best['architecture']} with accuracy {best['accuracy']:.4f}")
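
As a hedged alternative to comparing architectures on the test set, the sketch below scores each candidate with 3-fold cross-validation on the training split only. It rebuilds the moons data so that it is self-contained. The `_cv` names are illustrative, and the choice of 3 folds is an assumption made to keep the run short.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = make_moons(n_samples=300, noise=0.15, random_state=42)
X_train_cv, X_test_cv, y_train_cv, y_test_cv = train_test_split(
    X_cv, y_cv, test_size=0.3, random_state=42)

cv_scores = {}
for arch_cv in [(10,), (20, 10), (30, 20, 10), (50, 25)]:
    # The pipeline refits the scaler inside each fold, which avoids leakage
    model_cv = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=arch_cv, max_iter=1000, random_state=42))
    cv_scores[arch_cv] = cross_val_score(model_cv, X_train_cv, y_train_cv, cv=3).mean()
    print(f"Architecture {arch_cv}: mean CV accuracy = {cv_scores[arch_cv]:.4f}")

best_arch_cv = max(cv_scores, key=cv_scores.get)
print(f"Selected by cross-validation: {best_arch_cv}")
```

The test split is left untouched here, so it can still give an unbiased estimate for whichever architecture is selected.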

### 2.2 - Activation Functions

Activation functions introduce non-linearity into the network, allowing it to learn complex patterns.

Common activation functions:
- **ReLU** (Rectified Linear Unit): Most popular, fast to compute
- **Sigmoid**: Outputs between 0 and 1, good for binary classification output (named `logistic` in scikit-learn)
- **Tanh**: Outputs between -1 and 1, zero-centered
- **Identity**: Linear activation (rarely used in hidden layers)
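
For reference, the four functions listed above can be written in a few lines of NumPy. This is a minimal sketch for building intuition, not how scikit-learn implements them internally. The helper names and the sample input `z_act` are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)          # zero for negative inputs, identity otherwise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes inputs into (0, 1)

def identity(z):
    return z                         # no non-linearity at all

z_act = np.array([-2.0, 0.0, 2.0])
print("relu:    ", relu(z_act))
print("sigmoid: ", sigmoid(z_act))
print("tanh:    ", np.tanh(z_act))   # squashes inputs into (-1, 1), zero-centered
print("identity:", identity(z_act))
```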

In [None]:
# Compare different activation functions
activations = ['relu', 'tanh', 'logistic']

for activation in activations:
    mlp = MLPClassifier(hidden_layer_sizes=(20, 10), activation=activation,
                       max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    y_pred = mlp.predict(X_test_m_scaled)
    acc = accuracy_score(y_test_m, y_pred)
    print(f"Activation: {activation:10s} - Accuracy: {acc:.4f}")

### 2.3 - Beyond Accuracy: Advanced Evaluation Metrics

Accuracy alone doesn't tell the full story, especially with imbalanced datasets. We need to understand:

- **Precision**: Of all positive predictions, how many were correct?
- **Recall**: Of all actual positives, how many did we find?
- **F1-Score**: Harmonic mean of precision and recall
- **Confusion Matrix**: Shows all prediction outcomes
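
These definitions can be checked by hand on a tiny example. The labels below are made up for illustration (3 positives out of 10), and the `_demo` and `_manual` names are used only in this sketch. It computes each metric from the confusion-matrix counts and also shows how a model that always predicts the majority class can look acceptable on accuracy alone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_true_demo = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_manual = tp / (tp + fp)   # of all positive predictions, how many were correct
recall_manual = tp / (tp + fn)      # of all actual positives, how many were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)

print(f"Precision: {precision_manual:.4f}, Recall: {recall_manual:.4f}, "
      f"F1: {f1_manual:.4f}, Accuracy: {accuracy_manual:.4f}")

# A "model" that always predicts the majority class (0)
always_negative = np.zeros_like(y_true_demo)
baseline_accuracy = accuracy_score(y_true_demo, always_negative)
baseline_recall = recall_score(y_true_demo, always_negative)
print(f"Always-negative baseline: Accuracy = {baseline_accuracy:.2f}, "
      f"Recall = {baseline_recall:.2f}")
```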

In [None]:
# Create an imbalanced dataset
X_imb, y_imb = make_classification(n_samples=1000, n_features=20, n_informative=15,
                                   n_redundant=5, n_classes=2, weights=[0.9, 0.1],
                                   random_state=42)

print("Class distribution:")
unique, counts = np.unique(y_imb, return_counts=True)
for cls, count in zip(unique, counts):
    print(f"  Class {cls}: {count} samples ({count/len(y_imb)*100:.1f}%)")

In [None]:
# Split and scale
X_train_imb, X_test_imb, y_train_imb, y_test_imb = train_test_split(
    X_imb, y_imb, test_size=0.3, random_state=42, stratify=y_imb)

scaler_imb = StandardScaler()
X_train_imb_scaled = scaler_imb.fit_transform(X_train_imb)
X_test_imb_scaled = scaler_imb.transform(X_test_imb)

# Train MLP
mlp_imb = MLPClassifier(hidden_layer_sizes=(50, 25), activation='relu',
                        max_iter=1000, random_state=42)
mlp_imb.fit(X_train_imb_scaled, y_train_imb)

# Predictions
y_pred_imb = mlp_imb.predict(X_test_imb_scaled)

In [None]:
# Calculate various metrics
accuracy = accuracy_score(y_test_imb, y_pred_imb)
precision = precision_score(y_test_imb, y_pred_imb)
recall = recall_score(y_test_imb, y_pred_imb)
f1 = f1_score(y_test_imb, y_pred_imb)

print("Model Performance on Imbalanced Data:")
print(f"  Accuracy:  {accuracy:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1-Score:  {f1:.4f}")

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test_imb, y_pred_imb)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Class 0', 'Class 1'],
            yticklabels=['Class 0', 'Class 1'])
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

print("\nConfusion Matrix Breakdown:")
print(f"  True Negatives:  {cm[0, 0]}")
print(f"  False Positives: {cm[0, 1]}")
print(f"  False Negatives: {cm[1, 0]}")
print(f"  True Positives:  {cm[1, 1]}")

In [None]:
# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test_imb, y_pred_imb, target_names=['Class 0', 'Class 1']))

---

## Video 3: Neural Networks Deep Dive

### 3.1 - How Neural Networks Learn

Neural networks learn by gradient descent, using **backpropagation** to compute the gradients. Each training iteration has the following steps:

1. **Forward Pass**: Data flows through the network, producing predictions
2. **Loss Calculation**: Compare predictions to actual values
3. **Backward Pass**: Calculate gradients (how much each weight contributed to the error)
4. **Weight Update**: Adjust weights to reduce error
5. **Repeat**: Continue until the model converges

This process is controlled by hyperparameters like learning rate and optimizer.
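
The steps above can be sketched from scratch in NumPy. This is an illustrative toy, not scikit-learn's implementation. It assumes one hidden layer of 4 tanh units, a sigmoid output, binary cross-entropy loss, full-batch gradient descent, and the XOR dataset. All variable names are chosen for this example.

```python
import numpy as np

rng_bp = np.random.default_rng(0)

# XOR: a tiny dataset that a single perceptron cannot fit
X_bp = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_bp = np.array([[0.], [1.], [1.], [0.]])

# Parameters: 2 inputs -> 4 tanh hidden units -> 1 sigmoid output
W1 = rng_bp.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng_bp.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses_bp = []

for step in range(2000):
    # 1. Forward pass
    h = np.tanh(X_bp @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

    # 2. Loss: mean binary cross-entropy
    loss = -np.mean(y_bp * np.log(p) + (1 - y_bp) * np.log(1 - p))
    losses_bp.append(loss)

    # 3. Backward pass: chain rule from the output back towards the input
    dz2 = (p - y_bp) / len(X_bp)         # gradient at the output pre-activation
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1 = X_bp.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # 4. Weight update: step against the gradient
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"Loss at first step: {losses_bp[0]:.4f}, at final step: {losses_bp[-1]:.4f}")
```

The learning rate scales every update in step 4: too small and the loss falls slowly, too large and the updates can overshoot.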

In [None]:
# Effect of learning rate
learning_rates = [0.001, 0.01, 0.1, 0.5]

print("Effect of Learning Rate on Model Performance:\n")
for lr in learning_rates:
    mlp = MLPClassifier(hidden_layer_sizes=(20, 10), activation='relu',
                       learning_rate_init=lr, max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    y_pred = mlp.predict(X_test_m_scaled)
    acc = accuracy_score(y_test_m, y_pred)
    print(f"  Learning Rate {lr:6.3f}: Accuracy = {acc:.4f}, Iterations = {mlp.n_iter_}")

### 3.2 - Optimization Algorithms (Solvers)

Different optimization algorithms update weights in different ways:

- **SGD** (Stochastic Gradient Descent): Updates weights using the gradient from one sample or, as in scikit-learn's `MLPClassifier`, one mini-batch at a time
- **Adam**: Adaptive learning rate, combines momentum and RMSprop
- **L-BFGS**: Quasi-Newton method, good for small datasets
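
To make the difference between the first two update rules concrete, here is a minimal sketch that minimizes the one-dimensional function f(w) = 0.5 * w^2 with plain gradient descent and with the standard Adam update. The function, starting point, and helper names are illustrative. scikit-learn's solvers add details such as mini-batching and, for SGD, optional momentum. L-BFGS is not sketched here.

```python
import numpy as np

def grad(w):
    return w  # gradient of f(w) = 0.5 * w**2

def run_sgd(w, lr=0.1, steps=100):
    """Plain gradient descent: step size is proportional to the gradient."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Adam: running averages of the gradient (m) and squared gradient (v)."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # momentum-like first moment
        v = beta2 * v + (1 - beta2) * g ** 2     # RMSprop-like second moment
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_start = 5.0
print("SGD  after 1 step:", run_sgd(w_start, steps=1))
print("Adam after 1 step:", run_adam(w_start, steps=1))
print("SGD  after 200 steps:", run_sgd(w_start, steps=200))
print("Adam after 200 steps:", run_adam(w_start, steps=200))
```

The first Adam step has a size close to the learning rate regardless of the gradient's magnitude, which is the "adaptive" part of the method.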

In [None]:
# Compare different solvers
solvers = ['sgd', 'adam', 'lbfgs']

print("Comparison of Different Optimization Algorithms:\n")
for solver in solvers:
    mlp = MLPClassifier(hidden_layer_sizes=(30, 20), activation='relu',
                       solver=solver, max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    y_pred = mlp.predict(X_test_m_scaled)
    acc = accuracy_score(y_test_m, y_pred)
    print(f"  Solver: {solver:8s} - Accuracy: {acc:.4f}, Iterations: {mlp.n_iter_}")

### 3.3 - Regularization to Prevent Overfitting

As networks get deeper, they can overfit the training data. Two common countermeasures are:

- **L2 Regularization (alpha parameter)**: Penalizes large weights
- **Early Stopping**: Stops training when validation performance stops improving
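
Both ideas can be sketched generically. The helpers below (`l2_penalized_loss` and `early_stopping_epoch`) are hypothetical names, and the validation scores are made up. scikit-learn's exact scaling of the penalty and its stopping criterion (controlled by `tol` and `n_iter_no_change`) may differ in detail.

```python
import numpy as np

def l2_penalized_loss(data_loss, weight_matrices, alpha):
    """Data loss plus an L2 penalty on the weights (biases are left out here)."""
    penalty = 0.5 * alpha * sum(np.sum(W ** 2) for W in weight_matrices)
    return data_loss + penalty

def early_stopping_epoch(val_scores, patience=3, tol=1e-4):
    """Return the 1-based epoch at which training would stop."""
    best_score = -np.inf
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score + tol:
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return len(val_scores)

# Larger weights raise the penalized loss even when the data loss is unchanged
weights_reg = [np.array([[1.0, -2.0], [0.5, 0.0]])]
penalized = l2_penalized_loss(0.30, weights_reg, alpha=0.1)
print(f"Penalized loss: {penalized:.4f}")

# Made-up validation accuracies that plateau after the third epoch
val_scores_demo = [0.70, 0.80, 0.85, 0.85, 0.84, 0.85, 0.86]
stop_epoch = early_stopping_epoch(val_scores_demo, patience=3)
print(f"Training would stop after epoch {stop_epoch}")
```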

In [None]:
# Effect of regularization
alphas = [0.0001, 0.001, 0.01, 0.1, 1.0]

print("Effect of L2 Regularization (alpha parameter):\n")
for alpha in alphas:
    mlp = MLPClassifier(hidden_layer_sizes=(50, 30, 20), activation='relu',
                       alpha=alpha, max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    
    train_acc = mlp.score(X_train_m_scaled, y_train_m)
    test_acc = mlp.score(X_test_m_scaled, y_test_m)
    
    print(f"  Alpha {alpha:7.4f}: Train Acc = {train_acc:.4f}, Test Acc = {test_acc:.4f}, Gap = {train_acc - test_acc:.4f}")

### 3.4 - Batch Size and Training Dynamics

The batch size determines how many samples are used to compute each gradient update:

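- **Small batches**: Noisy gradient estimates and more updates per epoch; the noise can help escape poor local minima
- **Large batches**: More stable gradient estimates and fewer updates per epoch, but the optimizer may settle in a poor minimum
- **Mini-batch**: A moderate batch size that balances the two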
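
The link between batch size and the number of weight updates per epoch can be sketched as follows. The `iterate_minibatches` helper is illustrative, and 210 is the size of the moons training split used in this notebook (70% of 300 samples).

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover a shuffled dataset one mini-batch at a time."""
    indices = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]   # the last batch may be smaller

rng_batches = np.random.default_rng(0)
n_train_demo = 210

updates_per_epoch = {
    bs: sum(1 for _ in iterate_minibatches(n_train_demo, bs, rng_batches))
    for bs in [32, 64, 128, 200]
}
for bs, n_updates in updates_per_epoch.items():
    print(f"Batch size {bs:3d}: {n_updates} weight updates per epoch")
```

Smaller batches therefore mean more, noisier updates for the same pass over the data.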

In [None]:
# Compare different batch sizes
batch_sizes = [32, 64, 128, 200]

print("Effect of Batch Size on Training:\n")
for batch_size in batch_sizes:
    mlp = MLPClassifier(hidden_layer_sizes=(20, 10), activation='relu',
                       batch_size=batch_size, max_iter=1000, random_state=42)
    mlp.fit(X_train_m_scaled, y_train_m)
    y_pred = mlp.predict(X_test_m_scaled)
    acc = accuracy_score(y_test_m, y_pred)
    print(f"  Batch Size {batch_size:3d}: Accuracy = {acc:.4f}, Iterations = {mlp.n_iter_}")

### 3.5 - Putting It All Together: A Comprehensive Example

Let's build an optimized neural network using all the concepts we've learned.

In [None]:
# Create a more complex dataset
X_complex, y_complex = make_classification(
    n_samples=2000, n_features=30, n_informative=20,
    n_redundant=5, n_classes=3, n_clusters_per_class=2,
    random_state=42
)

print(f"Dataset shape: {X_complex.shape}")
print(f"Number of classes: {len(np.unique(y_complex))}")
print(f"Class distribution: {np.bincount(y_complex)}")

In [None]:
# Split and scale
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_complex, y_complex, test_size=0.2, random_state=42, stratify=y_complex
)

scaler_c = StandardScaler()
X_train_c_scaled = scaler_c.fit_transform(X_train_c)
X_test_c_scaled = scaler_c.transform(X_test_c)

In [None]:
# Build an optimized neural network
mlp_optimized = MLPClassifier(
    hidden_layer_sizes=(100, 50, 25),  # 3 hidden layers with decreasing size
    activation='relu',                  # ReLU activation
    solver='adam',                      # Adam optimizer
    alpha=0.001,                       # L2 regularization
    batch_size=64,                     # Mini-batch size
    learning_rate_init=0.001,          # Learning rate
    max_iter=1000,
    early_stopping=True,               # Stop when validation performance plateaus
    validation_fraction=0.1,           # Use 10% of training data for validation
    random_state=42,
    verbose=False
)

mlp_optimized.fit(X_train_c_scaled, y_train_c)

print(f"Training completed in {mlp_optimized.n_iter_} iterations")
print(f"Final training loss: {mlp_optimized.loss_:.4f}")

In [None]:
# Evaluate the optimized model
y_pred_c = mlp_optimized.predict(X_test_c_scaled)

# Multi-class metrics
accuracy = accuracy_score(y_test_c, y_pred_c)
precision = precision_score(y_test_c, y_pred_c, average='weighted')
recall = recall_score(y_test_c, y_pred_c, average='weighted')
f1 = f1_score(y_test_c, y_pred_c, average='weighted')

print("\nOptimized Model Performance:")
print(f"  Accuracy:  {accuracy:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1-Score:  {f1:.4f}")
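
The `average='weighted'` option used above computes each metric per class and then averages the results, weighting each class by its number of true samples (its support). A minimal sketch on made-up three-class labels (the `_mc` names are illustrative) shows the calculation and contrasts it with the unweighted `macro` average.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true_mc = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred_mc = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0])

per_class_precision = precision_score(y_true_mc, y_pred_mc, average=None)
support = np.bincount(y_true_mc)   # number of true samples in each class

weighted_manual = np.sum(per_class_precision * support) / support.sum()
macro_manual = per_class_precision.mean()

print("Per-class precision:", np.round(per_class_precision, 4))
print(f"Weighted average: {weighted_manual:.4f}")
print(f"Macro average:    {macro_manual:.4f}")
```

On imbalanced data the two averages can differ noticeably, because the weighted version is dominated by the larger classes.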

In [None]:
# Confusion matrix for multi-class
cm = confusion_matrix(y_test_c, y_pred_c)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=[f'Class {i}' for i in range(3)],
            yticklabels=[f'Class {i}' for i in range(3)])
plt.title('Confusion Matrix - Optimized MLP on Multi-class Dataset')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

In [None]:
# Detailed report
print("\nDetailed Classification Report:")
print(classification_report(y_test_c, y_pred_c, 
                          target_names=[f'Class {i}' for i in range(3)]))

---

## Summary

In this notebook, we covered:

**Video 1: Perceptron & MLPs**
- Single perceptrons for linearly separable data
- Multi-layer perceptrons for non-linear patterns
- The power of hidden layers

**Video 2: MLPs 2 & Advanced Metrics**
- Different network architectures and activation functions
- Beyond accuracy: precision, recall, F1-score
- Confusion matrices and classification reports
- Evaluating models on imbalanced datasets

**Video 3: Neural Networks Deep Dive**
- How neural networks learn through backpropagation
- Effect of learning rate and optimization algorithms
- Regularization techniques to prevent overfitting
- Batch size and training dynamics
- Building an optimized neural network for multi-class classification