<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Neural%20Networks/SLP%20Hands-On%20Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hands-On Lab: Single-Layer Perceptrons (SLPs)**

## **Learning Objectives**

By the end of this lab, you will be able to:
- Understand what a Single-Layer Perceptron is and its limitations
- Build SLPs for **binary classification** (Breast Cancer) and **multi-class classification** (Fashion MNIST)
- Apply proper data preprocessing (scaling, flattening)
- Evaluate models using accuracy, confusion matrices, and classification reports
- Visualize learned weights to interpret what the model has learned
- Save and load trained models for reuse

## **Table of Contents**

1. [Introduction & Prerequisites](#1-introduction--prerequisites)
2. [What is a Single-Layer Perceptron?](#2-what-is-a-single-layer-perceptron)
3. [Sigmoid vs. Softmax Activations](#3-sigmoid-vs-softmax-activations)
4. [Data Exploration & Insights](#4-data-exploration--insights)
5. [Binary Classification: Breast Cancer (`sklearn.datasets`)](#5-binary-classification-breast-cancer-sklearndatasets)
6. [Hyperparameter Tuning (Learning Rates, Epochs)](#6-hyperparameter-tuning-learning-rates-epochs)
7. [Multi-Class Classification: Fashion MNIST](#7-multi-class-classification-fashion-mnist)
8. [Saving & Loading Models (End-to-End Workflow)](#8-saving--loading-models-end-to-end-workflow)
9. [Wrap-Up & Next Steps](#9-wrap-up--next-steps)

## 1. Introduction & Prerequisites

**Goal**:  
Build a **Single-Layer Perceptron (SLP)** for **binary classification** (Breast Cancer) and **multi-class classification** (Fashion MNIST). Along the way, we will:
- Explore data for better intuition.
- Implement **hyperparameter tuning** (learning rates, epochs).
- Visualize learned **weights** (for image data).
- Perform a **detailed error analysis**.
- Discuss **handling class imbalance** conceptually.
- Demonstrate **saving & loading** models for an end-to-end ML workflow.

**Prerequisites**:
- Basic Python, NumPy, and Matplotlib.
- Familiarity with classification tasks (binary and multi-class).
- Libraries: `tensorflow`, `scikit-learn`, `matplotlib`, `numpy`.

```bash
pip install --upgrade pip
pip install tensorflow scikit-learn matplotlib numpy
```

## 2. What is a Single-Layer Perceptron?

A **Single-Layer Perceptron (SLP)** is one of the most fundamental neural network models. It consists of:
- **Input Layer**: Receives features $\{x_1, x_2, \ldots, x_n\}$.  
- **Output Layer**: Produces predictions (probabilities or logits).  
- **No Hidden Layers**: Exactly one set of weights from inputs to outputs.

**Typical SLP Setup**:
1. **Binary classification**:
   - **1 output neuron** (sigmoid activation).  
   - **Binary crossentropy** loss.
2. **Multi-class classification** with $k$ classes (e.g., $k = 10$):
   - **$k$ output neurons** (softmax activation).  
   - **(Sparse) categorical crossentropy** loss.

**Number of parameters** = $(\text{input\_dim} \times \text{output\_dim}) + \text{output\_dim}$.  
- If `input_dim` = 784 (flattened 28×28) and `output_dim` = 10, then $(784 \times 10) + 10 = 7850$ parameters.
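
The formula above can be checked with a few lines of Python. This is a minimal sketch; the helper name `slp_param_count` is hypothetical and not part of any library.

```python
def slp_param_count(input_dim: int, output_dim: int) -> int:
    """Weights (input_dim x output_dim) plus one bias per output neuron."""
    return input_dim * output_dim + output_dim

# Fashion MNIST SLP: 784 flattened pixels -> 10 classes
print(slp_param_count(784, 10))  # 7850
# Breast Cancer SLP: 30 features -> 1 sigmoid output
print(slp_param_count(30, 1))    # 31
```

These numbers should match the "Total params" line that `model.summary()` reports for the two models built later in this lab.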

## 3. Sigmoid vs. Softmax Activations

Understanding when to use each activation function is crucial:

| Activation | Formula | Output Range | Use Case |
|------------|---------|--------------|----------|
| **Sigmoid** | $\sigma(z) = \frac{1}{1 + e^{-z}}$ | $(0, 1)$ | Binary classification (1 output neuron) |
| **Softmax** | $\text{softmax}(z_j) = \frac{e^{z_j}}{\sum_{i=1}^{k} e^{z_i}}$ | $(0, 1)$, sums to 1 | Multi-class classification ($k$ output neurons) |

### Key Differences

1. **Sigmoid**: Outputs a single probability for "class 1". Use with **binary crossentropy** loss.

2. **Softmax**: Outputs probabilities for all classes that sum to 1. Use with **(sparse) categorical crossentropy** loss.
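
To make the two activations and their matching losses concrete, here is a minimal NumPy sketch. The function names and the example logits are assumptions chosen for illustration; Keras computes the same quantities internally, with extra numerical safeguards.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of logits into probabilities that sum to 1."""
    shifted = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

def binary_crossentropy(y_true, p):
    """Loss for one sample with label y_true in {0, 1} and predicted P(class 1) = p."""
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: negative log-probability of the true class index."""
    return -np.log(probs[label])

p = sigmoid(2.0)                                  # one logit -> one probability
probs = softmax(np.array([2.0, 1.0, 0.1]))        # three logits -> three probabilities

print("sigmoid(2.0):", p)
print("softmax:", probs, "sum:", probs.sum())
print("BCE (y=1):", binary_crossentropy(1, p))
print("Sparse CCE (label=0):", sparse_categorical_crossentropy(0, probs))
```

With exactly two classes, softmax reduces to a sigmoid of the logit difference, which is why a single sigmoid neuron is enough for binary classification.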

## 4. Data Exploration & Insights

Before diving into modeling, a brief **data exploration** helps us understand:
- Feature distributions.
- Potential class imbalances.
- Correlations or outliers.

In the **Breast Cancer** dataset, we might:
- Examine basic statistics of each feature (mean, std).
- Count how many “malignant” vs. “benign” samples to check if it’s balanced.

In **Fashion MNIST**, we might:
- Display a few sample images to see what the data looks like.

**Why do this?**  
1. **Better Intuition**: We know whether the dataset is balanced or skewed.  
2. **Feature Engineering**: EDA can suggest if scaling or transformations are needed.  
3. **Detecting Anomalies**: Sometimes there are outliers or missing values.

Below, we’ll incorporate minimal EDA to illustrate these points.

## 5. Binary Classification: Breast Cancer (`sklearn.datasets`)

### 5.1 Load & Quick Exploration

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
data_bc = load_breast_cancer()

X_bc = data_bc.data   # (569, 30) typically
y_bc = data_bc.target # (569,) => 0 or 1

print("Features shape:", X_bc.shape)
print("Labels shape:", y_bc.shape)
print("Feature names:", data_bc.feature_names)
print("Class distribution:\n",
      {name: count for name, count in zip(data_bc.target_names, np.bincount(y_bc))})

**What the above code does**:
- Prints the array shapes, confirming 569 samples with 30 features each.
- Counts the samples per class. In scikit-learn's encoding, label 0 is malignant and label 1 is benign.

#### Handling Class Imbalance (Conceptual)
If we find significant imbalance (say 90% benign, 10% malignant), we might:
- Use **metrics** like `precision`, `recall`, `F1-score` instead of just accuracy.  
- Adjust the **`class_weight`** argument in `model.fit()`.  
- Perform **oversampling** or **undersampling**.  

Breast Cancer is not extremely imbalanced, but it’s still important to check.
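
To see why accuracy alone can mislead under a 90/10 split like the one mentioned above, consider a minimal NumPy sketch. The labels are hypothetical, the "model" simply predicts the majority class every time, and `precision_recall_f1` is an illustrative helper (scikit-learn's `classification_report`, used later, gives the same metrics).

```python
import numpy as np

# Hypothetical labels: 90 majority-class (0) and 10 minority-class (1) samples
y_true = np.array([0] * 90 + [1] * 10)
# A useless "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return precision, recall, f1

accuracy = np.mean(y_true == y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"Accuracy: {accuracy:.2f}")   # 0.90
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

The accuracy looks respectable even though the minority class is never detected, which is exactly what recall and F1 expose.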

### 5.2 Split, Scale, and Build SLP

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(
    X_bc, y_bc,
    test_size=0.2,
    random_state=42
)

scaler_bc = StandardScaler()
X_train_bc_scaled = scaler_bc.fit_transform(X_train_bc)
X_test_bc_scaled  = scaler_bc.transform(X_test_bc)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model_bc = keras.Sequential([
    layers.Dense(1, activation='sigmoid', input_shape=(X_train_bc.shape[1],))
])
model_bc.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model_bc.summary()
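
The scaler in the cell above is fitted on the training split only and then reused on the test split, so no test-set statistics leak into training. A minimal NumPy sketch of that idea follows. It assumes synthetic stand-in arrays (`X_train` and `X_test` here are hypothetical, not the notebook's data) and uses the population standard deviation, as `StandardScaler` does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the train/test splits: 3 features on very different scales
X_train = rng.normal(loc=[10.0, 200.0, 0.05], scale=[2.0, 50.0, 0.01], size=(100, 3))
X_test = rng.normal(loc=[10.0, 200.0, 0.05], scale=[2.0, 50.0, 0.01], size=(20, 3))

# "fit": the statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": the same statistics are applied to both splits
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print("Train mean:", X_train_scaled.mean(axis=0).round(6))
print("Train std: ", X_train_scaled.std(axis=0).round(6))
print("Test mean: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training features have mean 0 and standard deviation 1 by construction. The test features are only approximately standardized, which is expected because their own statistics were never used.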

### 5.3 Train & Evaluate

In [None]:
history_bc = model_bc.fit(
    X_train_bc_scaled, y_train_bc,
    validation_split=0.2,
    epochs=10,
    batch_size=32,
    verbose=1
)

test_loss_bc, test_acc_bc = model_bc.evaluate(X_test_bc_scaled, y_test_bc)
print(f"BC Test Loss: {test_loss_bc:.4f}")
print(f"BC Test Accuracy: {test_acc_bc:.4f}")

#### Plot Learning Curves

In [None]:
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.plot(history_bc.history['loss'], label='Train Loss')
plt.plot(history_bc.history['val_loss'], label='Val Loss')
plt.title("BC - Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()

plt.subplot(1,2,2)
plt.plot(history_bc.history['accuracy'], label='Train Acc')
plt.plot(history_bc.history['val_accuracy'], label='Val Acc')
plt.title("BC - Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()

plt.show()

#### Confusion Matrix & Classification Report

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

y_pred_bc_probs = model_bc.predict(X_test_bc_scaled)
y_pred_bc = (y_pred_bc_probs > 0.5).astype(int).ravel()

cm_bc = confusion_matrix(y_test_bc, y_pred_bc)
print("Breast Cancer Confusion Matrix:\n", cm_bc)

print("Breast Cancer Classification Report:\n",
      classification_report(y_test_bc, y_pred_bc))

### 5.4 Handling Class Imbalance (Conceptual)

In [None]:
# If, for instance, we discovered a heavy imbalance in classes,
# we could try:

# model_bc.fit(
#    X_train_bc_scaled, y_train_bc,
#    class_weight={0: 2.0, 1: 1.0},  # Example weighting
#    ...
# )

> **Why?**  
> Giving **higher weight** to the minority class can help the network pay more attention to it, improving metrics like recall for that class.

Since the Breast Cancer dataset is not severely imbalanced, we might not need this, but it is **important** to know how to handle imbalance when it arises.
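
One common way to choose the weights is the "balanced" heuristic, `n_samples / (n_classes * class_count)`. The sketch below applies it to hypothetical labels with NumPy; it is one reasonable choice under stated assumptions, not the only option.

```python
import numpy as np

# Hypothetical imbalanced labels: 10 samples of class 0, 90 of class 1
y = np.array([0] * 10 + [1] * 90)

counts = np.bincount(y)                     # samples per class
n_classes = len(counts)
weights = len(y) / (n_classes * counts)     # rarer class -> larger weight

# Dictionary in the {class_index: weight} form that Keras accepts
class_weight = {cls: float(w) for cls, w in enumerate(weights)}
print(class_weight)
```

A dictionary like this can be passed as `class_weight=class_weight` to `model.fit()`. With these weights, each class contributes the same total weight to the loss.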

## 6. Hyperparameter Tuning (Learning Rates, Epochs)

**Hyperparameters** significantly influence model performance. The main ones for an SLP are:
- **Learning rate** (an optimizer argument, e.g., `Adam(learning_rate=0.001)`).
- **Number of epochs**.
- **Batch size**.

### 6.1 Example: Adjusting Learning Rate & Epochs

In [None]:
from tensorflow.keras.optimizers import Adam

# Let's try a smaller learning rate and more epochs
model_bc_tuned = keras.Sequential([
    layers.Dense(1, activation='sigmoid', input_shape=(X_train_bc.shape[1],))
])
# e.g. learning_rate=0.0005 instead of the default 0.001
model_bc_tuned.compile(
    optimizer=Adam(learning_rate=0.0005),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

history_bc_tuned = model_bc_tuned.fit(
    X_train_bc_scaled, y_train_bc,
    validation_split=0.2,
    epochs=50,  # increased
    batch_size=32,
    verbose=1
)

test_loss_tuned, test_acc_tuned = model_bc_tuned.evaluate(X_test_bc_scaled, y_test_bc)
print(f"Tuned BC Loss: {test_loss_tuned:.4f}")
print(f"Tuned BC Accuracy: {test_acc_tuned:.4f}")

**Why try smaller LR & more epochs?**  
- A smaller learning rate helps the model converge more *gradually*, potentially avoiding overshoot.  
- More epochs allow the model to refine its weights further.

If validation loss stops improving, or starts rising while training loss keeps falling, the model is overfitting and additional epochs will not help. Tuning these hyperparameters is **iterative** and often dataset-specific.
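
The trade-off between learning rate and number of steps can be illustrated on a toy problem. The sketch below uses plain gradient descent on $f(w) = w^2$, not Adam and not the notebook's model, so it only shows the general behaviour; `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the path of w."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # standard gradient descent update
        path.append(w)
    return path

for lr in (0.01, 0.1, 1.1):
    final_w = gradient_descent(lr)[-1]
    print(f"lr={lr}: w after 20 steps = {final_w:.4f}")
```

The smallest rate moves toward the minimum at $w = 0$ but is still far away after 20 steps, so it needs more steps (epochs). The middle rate gets close quickly, and the largest rate overshoots on every step and diverges.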

## 7. Multi-Class Classification: Fashion MNIST

**Objective**: Classify 28×28 grayscale images into 10 clothing categories.

### The 10 Classes

| Label | Class Name |
|-------|------------|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

A Single-Layer Perceptron will serve as our baseline model for this task.

### 7.1 Load, Inspect & Explore

In [None]:
from tensorflow.keras.datasets import fashion_mnist

(X_train_fm, y_train_fm), (X_test_fm, y_test_fm) = fashion_mnist.load_data()

print("FM Train shape:", X_train_fm.shape)   # (60000, 28, 28)
print("FM Train labels:", y_train_fm.shape)  # (60000,)
print("Unique classes:", np.unique(y_train_fm))

**Data Exploration**:
- Optional: Display some sample images.

In [None]:
import matplotlib.pyplot as plt

# Class names for Fashion MNIST
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Display the first 5 images with class names
plt.figure(figsize=(12, 3))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(X_train_fm[i], cmap='gray')
    plt.title(class_names[y_train_fm[i]])
    plt.axis('off')
plt.suptitle('Sample Images from Fashion MNIST', fontsize=12)
plt.tight_layout()
plt.show()

### 7.2 Preprocess & Build SLP

1. **Rescale** pixel values from $[0, 255]$ to $[0, 1]$.  
2. **Flatten** images to 784-dim vectors.  
3. **Output layer** = 10 neurons (softmax).


In [None]:
X_train_fm = X_train_fm / 255.0
X_test_fm  = X_test_fm / 255.0

X_train_fm_flat = X_train_fm.reshape(-1, 28*28)
X_test_fm_flat  = X_test_fm.reshape(-1, 28*28)

model_fm = keras.Sequential([
    layers.Dense(10, activation='softmax', input_shape=(784,))
])
model_fm.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model_fm.summary()

### 7.3 Train & Evaluate


In [None]:
history_fm = model_fm.fit(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    epochs=5,
    batch_size=32,
    verbose=1
)

test_loss_fm, test_acc_fm = model_fm.evaluate(X_test_fm_flat, y_test_fm)
print(f"Fashion MNIST - Test Loss: {test_loss_fm:.4f}")
print(f"Fashion MNIST - Test Accuracy: {test_acc_fm:.4f}")

#### Plot Curves

In [None]:
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.plot(history_fm.history['loss'], label='Train Loss')
plt.plot(history_fm.history['val_loss'], label='Val Loss')
plt.title("FM - Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()

plt.subplot(1,2,2)
plt.plot(history_fm.history['accuracy'], label='Train Acc')
plt.plot(history_fm.history['val_accuracy'], label='Val Acc')
plt.title("FM - Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()

plt.show()

#### Confusion Matrix & Classification Report

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

y_pred_fm_probs = model_fm.predict(X_test_fm_flat)
y_pred_fm = np.argmax(y_pred_fm_probs, axis=1)

cm_fm = confusion_matrix(y_test_fm, y_pred_fm)
print("Fashion MNIST Confusion Matrix:\n", cm_fm)
print("Fashion MNIST Classification Report:\n",
      classification_report(y_test_fm, y_pred_fm))

### 7.4 Visualizing Weights in the SLP

A practical advantage of SLPs is that we can **visualize what they've learned**!

Each of the 10 output neurons has a weight vector of length 784. By reshaping these weights into 28×28 grids, we can see what "pattern" the model associates with each class.

**Why is this useful?**
- Provides interpretability - we can see what features the model focuses on
- Helps debug if the model is learning sensible patterns

In [None]:
weights_fm = model_fm.get_weights()[0]  # shape: (784, 10)
biases_fm = model_fm.get_weights()[1]   # shape: (10,)

plt.figure(figsize=(14, 6))
for i in range(10):
    # Extract the weight vector for class i
    w_i = weights_fm[:, i]
    # Reshape to 28x28
    w_i_2d = w_i.reshape(28, 28)
    plt.subplot(2, 5, i + 1)
    plt.imshow(w_i_2d, cmap='coolwarm')
    plt.title(f"{i}: {class_names[i]}")
    plt.colorbar(shrink=0.6)
    plt.axis('off')
plt.suptitle('Learned Weights for Each Class', fontsize=14)
plt.tight_layout()
plt.show()

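**What to look for**  
- Warmer colors (`coolwarm`) mark larger weights and cooler colors smaller ones, so bright pixels in a warm region raise that class's score. Even though the model is just a simple linear map, the weight images may highlight recognizable regions of the image for a given class.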
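
Because the model is a linear map, each class score is just the input image multiplied pixel by pixel with that class's weight image, summed, plus a bias. The NumPy sketch below checks this against the matrix form used by a dense layer. The arrays `W`, `b`, and `x` are random, hypothetical stand-ins for `weights_fm`, `biases_fm`, and one flattened test image.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the trained parameters and one flattened image
W = rng.normal(scale=0.01, size=(784, 10))   # plays the role of weights_fm
b = np.zeros(10)                             # plays the role of biases_fm
x = rng.random(784)                          # plays the role of one flattened image

# Score for class i: overlay its 28x28 weight image on the input and sum
logits_manual = np.array([
    np.sum(x.reshape(28, 28) * W[:, i].reshape(28, 28)) + b[i]
    for i in range(10)
])

# The same computation as a single matrix product
logits = x @ W + b

print("Match:", np.allclose(logits_manual, logits))
print("Predicted class:", int(np.argmax(logits)))
```

Softmax preserves the ordering of the scores, so the class with the largest logit is also the class with the highest predicted probability.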

### 7.5 Error Analysis: Understanding Misclassifications

Let's examine which samples the model gets wrong. This helps us understand:
- Which classes are commonly confused
- Whether the model's mistakes make sense (e.g., confusing "Shirt" with "T-shirt")

In [None]:
incorrect_idx = np.where(y_pred_fm != y_test_fm)[0]
print(f"Number of misclassified samples: {len(incorrect_idx)} out of {len(y_test_fm)}")
print(f"Error rate: {len(incorrect_idx) / len(y_test_fm) * 100:.1f}%")

# Show some misclassified images with class names
plt.figure(figsize=(12, 4))
for i, idx in enumerate(incorrect_idx[:5]):
    plt.subplot(1, 5, i + 1)
    plt.imshow(X_test_fm[idx], cmap='gray')
    true_label = class_names[y_test_fm[idx]]
    pred_label = class_names[y_pred_fm[idx]]
    plt.title(f"True: {true_label}\nPred: {pred_label}", fontsize=9)
    plt.axis('off')
plt.suptitle('Sample Misclassified Images', fontsize=12)
plt.tight_layout()
plt.show()

**Why do this?**  
- It shows which classes the model confuses. For instance, it might confuse “shirt” with “t-shirt” or “pullover” when they look similar.
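
Beyond looking at individual images, the most frequent confusions can be read off a confusion matrix directly. The sketch below uses a small, hypothetical 3-class matrix with made-up counts; in the notebook, `cm_fm` and `class_names` could be substituted for `cm` and `names`.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted)
names = ["T-shirt/top", "Pullover", "Shirt"]
cm = np.array([
    [80,  5, 15],
    [ 4, 85, 11],
    [20, 12, 68],
])

errors = cm.copy()
np.fill_diagonal(errors, 0)               # keep only the misclassifications

# Sort off-diagonal cells from most to least frequent
flat_order = np.argsort(errors, axis=None)[::-1]
top_pairs = [np.unravel_index(i, errors.shape) for i in flat_order[:3]]

for true_idx, pred_idx in top_pairs:
    print(f"{names[true_idx]} -> {names[pred_idx]}: {errors[true_idx, pred_idx]}")
```

Each printed line reads "true class -> predicted class: count", so the first line names the confusion that is most worth investigating.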

## 8. Saving & Loading Models (End-to-End Workflow)

After training, it’s crucial to **save** the model so you don’t have to retrain every time, or so you can **deploy** or **share** it.

In [None]:
# Save the Fashion MNIST model (HDF5 is a legacy format; recent Keras versions recommend the native .keras format)
model_fm.save("slp_fashion_mnist.h5")

# Later or in another script, load it:
loaded_model_fm = tf.keras.models.load_model("slp_fashion_mnist.h5")

You can then evaluate `loaded_model_fm` on test data again to confirm it’s the same:

In [None]:
loaded_loss, loaded_acc = loaded_model_fm.evaluate(X_test_fm_flat, y_test_fm)
print(f"Loaded Model - Test Loss: {loaded_loss:.4f}, Test Acc: {loaded_acc:.4f}")

**Why do this?**  
- **Practical**: In real projects, you rarely keep your model in memory. You train it once and save it for inference later or for future fine-tuning.  
- **Collaboration**: Team members can use your saved model for inference or production deployment.

## 9. Wrap-Up & Next Steps

### What We Learned

| Topic | Key Takeaway |
|-------|--------------|
| **SLP Architecture** | Simplest neural network - just input → output with no hidden layers |
| **Binary Classification** | Use sigmoid activation + binary crossentropy loss |
| **Multi-Class Classification** | Use softmax activation + (sparse) categorical crossentropy loss |
| **Data Preprocessing** | Always scale features; flatten images for dense layers |
| **Weight Visualization** | SLPs allow direct interpretation of learned patterns |
| **Model Persistence** | Save models with `.save()`, load with `load_model()` |

### SLP Limitations

Single-Layer Perceptrons can only learn **linear decision boundaries**. This means:
- They struggle with complex patterns (e.g., CIFAR-10 images)
- Fashion MNIST accuracy tops out around 84-85%
- They cannot solve non-linearly separable problems (e.g., XOR)
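
The XOR limitation can be checked empirically. The sketch below brute-forces a coarse grid of linear classifiers of the form $w_1 x_1 + w_2 x_2 + b > 0$ (the grid range and step are arbitrary assumptions) and records the best accuracy on the four XOR points.

```python
import itertools
import numpy as np

# XOR truth table: the label is 1 when exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Try every linear classifier "w1*x1 + w2*x2 + b > 0" on a coarse grid
grid = np.linspace(-2, 2, 9)       # -2.0, -1.5, ..., 2.0
best_acc = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best_acc = max(best_acc, float(np.mean(preds == y)))

print("Best accuracy of any linear classifier on the grid:", best_acc)
```

No setting gets all four points right: the best a single linear boundary can do is three out of four (0.75). Adding a hidden layer removes this restriction.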

### Next Steps

1. **Add hidden layers** → Create a Multi-Layer Perceptron (MLP) for better accuracy
2. **Try CNNs** → Convolutional Neural Networks are designed for image data
3. **Use callbacks** → `EarlyStopping`, `ReduceLROnPlateau` for smarter training
4. **Explore regularization** → Dropout, L2 to prevent overfitting

### Practice Exercise

Now try the **SLP Try-It-Yourself Lab** to apply these concepts to:
- **IMDB** (text classification with TF-IDF)
- **CIFAR-10** (color image classification)