<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Lectures/12%20Single-Layer%20Perceptron/Solution%3A%20Try-It-Yourself%20SLP%20Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SOLUTION: Try-It-Yourself SLP Lab**

## **Contents**

1. **IMDB (Binary Classification)**  
   **1.1** Imports & Loading Data  
   **1.2** Decode Integer Sequences to Text  
   **1.3** Choose Bag-of-Words or TF-IDF  
   **1.4** Build the SLP  
   **1.5** Train & Evaluate (Training Curves, Confusion Matrix, Classification Report)  

2. **CIFAR-10 (Multi-Class Classification)**  
   **2.1** Data Loading & Flattening  
   **2.2** Build & Train SLP  
   **2.3** Evaluate (Confusion Matrix, Classification Report)  
   **2.4** Weight Visualization (reshape 32×32×3)  
   **2.5** (Optional) Hyperparameter Tuning  
   **2.6** (Optional) Saving & Loading  

> **Note**: A Single-Layer Perceptron (SLP) is **very limited** for these datasets; we do **not** expect high accuracy. The goal is to **learn fundamentals** of data loading, preprocessing, model building, training, evaluation, and basic weight interpretation. Let’s get started!


In [None]:
###########################################################
# SETUP & IMPORTS
###########################################################
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.metrics import confusion_matrix, classification_report


## **1. IMDB (Binary Classification)**

The **IMDB dataset** contains 50,000 movie reviews labeled as **positive (1)** or **negative (0)**. In this lab:

1. We **load** the IMDB data from `keras.datasets.imdb`.  
2. We **decode** each review back to text and apply a **bag-of-words or TF-IDF vectorizer** to convert it into a vector of at most `max_features` entries (10,000 here).  
3. We **build** a Single-Layer Perceptron with **1 output neuron** (sigmoid) for binary classification.  
4. We **train** the SLP and **evaluate** the results (accuracy, confusion matrix, classification report).  

> **No weight visualization** is done for IMDB, as text does not naturally reshape into a 2D/3D format the way images do.

Let’s walk through these steps now.


### 1.1 Imports & Loading Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Load the IMDB dataset from Keras (integer-encoded reviews)
(x_train_imdb, y_train_imdb), (x_test_imdb, y_test_imdb) = keras.datasets.imdb.load_data(num_words=10000)

print("Number of training samples (IMDB):", len(x_train_imdb))
print("Number of test samples (IMDB):", len(x_test_imdb))

# Quick check of class balance (0 = negative, 1 = positive)
unique, counts = np.unique(y_train_imdb, return_counts=True)
print("Train labels distribution:", dict(zip(unique, counts)))


### 1.2 Decode Integer Sequences to Text

Keras provides a mapping of words to integer indices via `get_word_index()`. However, note that the first few indices are reserved for special tokens.
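
To make the offset concrete, here is a minimal sketch with a hypothetical three-word vocabulary (`toy_word_index` and `toy_decode` are made up for illustration; the real dictionary comes from `get_word_index()`). It assumes the default `load_data` settings, where 0 is padding, 1 marks the start of a sequence, 2 marks an out-of-vocabulary word, and every real word rank is shifted by 3.

```python
# Hypothetical miniature vocabulary: word -> frequency rank (1 = most frequent).
toy_word_index = {"the": 1, "movie": 2, "great": 3}
toy_reverse = {rank: word for word, rank in toy_word_index.items()}

def toy_decode(sequence, offset=3):
    # Encoded ids are rank + offset, so we subtract the offset before the lookup.
    # Reserved ids (0, 1, 2) fall outside the vocabulary and become '?'.
    return " ".join(toy_reverse.get(i - offset, "?") for i in sequence)

# 1 = start marker, 4 = "the", 5 = "movie", 2 = unknown word, 6 = "great"
decoded = toy_decode([1, 4, 5, 2, 6])
print(decoded)  # ? the movie ? great
```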


In [None]:
# Retrieve the word index dictionary
word_index = keras.datasets.imdb.get_word_index()

# Create a reverse mapping from integer indices to words
reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_review(sequence):
    """
    Decodes a list of integer indices into a string (review).
    Indices 0, 1, 2 are reserved in the Keras IMDB encoding (padding,
    start-of-sequence, unknown word), so they decode to '?'.
    """
    # Word ranks are shifted by 3 to make room for the reserved indices 0, 1, 2
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in sequence])

# Convert each integer sequence into a raw text string
train_texts = [decode_review(seq) for seq in x_train_imdb]
test_texts  = [decode_review(seq) for seq in x_test_imdb]

print("Example decoded review:\n", train_texts[0][:300], "...")

**Why decode to text first?**  
- We can then use standard text vectorizers (bag-of-words, TF-IDF) from scikit-learn.  
- This step can be somewhat slow, but it’s more flexible and standard than a custom “multi-hot” approach.
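
For comparison, the custom multi-hot approach mentioned above could be sketched as follows. It works directly on the integer sequences, without decoding to text. The helper name `multi_hot` and the toy sequences are illustrative assumptions, not part of the lab code.

```python
import numpy as np

def multi_hot(sequences, dimension):
    # One row per review, one column per word id; 1.0 marks "word is present".
    encoded = np.zeros((len(sequences), dimension), dtype="float32")
    for row, seq in enumerate(sequences):
        encoded[row, seq] = 1.0  # repeated ids still yield a single 1.0
    return encoded

toy_sequences = [[1, 4, 4, 7], [2, 3]]
toy_encoded = multi_hot(toy_sequences, dimension=8)
print(toy_encoded)
```

On the real data this would be called as `multi_hot(x_train_imdb, dimension=10000)`, matching `num_words=10000`. Unlike TF-IDF, it discards how often a word occurs and how rare it is across reviews.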

### 1.3 Choose Bag-of-Words *or* TF-IDF

We can pick either **`CountVectorizer`** for a pure bag-of-words representation (counts of words) or **`TfidfVectorizer`** for TF-IDF weighting.
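
To see what separates the two representations, the sketch below starts from a toy count matrix (what bag-of-words produces) and applies TF-IDF weighting by hand in NumPy. It assumes the smoothed IDF formula and row-wise L2 normalization that scikit-learn uses by default; the toy counts are made up.

```python
import numpy as np

# Toy term-count matrix: 3 documents (rows) x 3 terms (columns),
# i.e. what a bag-of-words representation looks like.
counts = np.array([
    [2, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
], dtype=float)

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)  # number of documents containing each term

# Smoothed inverse document frequency (assumed scikit-learn default formula).
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0

tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalize each row

print("idf:", np.round(idf, 3))
print(np.round(tfidf, 3))
```

The first term appears in every document, so it receives the smallest IDF; the rarer terms are weighted up.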

In [None]:
# Let's pick TF-IDF for demonstration
vectorizer = TfidfVectorizer(
    max_features=10000,  # restrict to top 10k features
    stop_words='english' # optionally remove English stopwords
)

# Fit on training texts and transform
X_train_imdb_bow = vectorizer.fit_transform(train_texts)  # scipy sparse matrix
# Transform test texts
X_test_imdb_bow  = vectorizer.transform(test_texts)

print("TF-IDF train shape:", X_train_imdb_bow.shape)
print("TF-IDF test shape:", X_test_imdb_bow.shape)

# We typically need a dense representation for Keras input
X_train_imdb_bow = X_train_imdb_bow.toarray()
X_test_imdb_bow  = X_test_imdb_bow.toarray()

print("Converted to dense arrays, shape:", X_train_imdb_bow.shape)

**Note**:  
- Using `.toarray()` can be **memory-intensive** for large feature sets and data sizes. Keeping the data sparse is possible in principle, but support for sparse input to `model.fit` depends on the Keras version and backend, so we convert to dense arrays here.  
- If your system is memory-limited, consider using a smaller `max_features` or a partial training approach.
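
A quick back-of-the-envelope check shows why this matters. The sketch assumes the standard 25,000-review training split and that the vectorizer returns `float64` values (the scikit-learn default); `dense_gib` is a hypothetical helper.

```python
n_samples = 25_000     # IMDB training reviews
n_features = 10_000    # max_features passed to the vectorizer

def dense_gib(rows, cols, bytes_per_value):
    # Size of a dense rows x cols array in GiB.
    return rows * cols * bytes_per_value / 1024**3

gib_float64 = dense_gib(n_samples, n_features, 8)  # float64
gib_float32 = dense_gib(n_samples, n_features, 4)  # float32
print(f"float64: {gib_float64:.2f} GiB, float32: {gib_float32:.2f} GiB")
```

The test set needs the same amount again. Casting the sparse matrix with `.astype(np.float32)` before calling `.toarray()` is one way to halve the footprint.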

### 1.4 Build the SLP (Sigmoid for Binary Classification)

In [None]:
model_imdb = keras.Sequential([
    layers.Dense(1, activation='sigmoid', input_shape=(X_train_imdb_bow.shape[1],))
])

model_imdb.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model_imdb.summary()
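
What this single `Dense(1, activation='sigmoid')` layer computes can be written out in a few lines of NumPy. This is a sketch on random stand-in data, not the Keras implementation; the weights are set to zero (Keras initializes them randomly) so the output is easy to predict.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

rng = np.random.default_rng(0)
n_features = 10_000
X_toy = rng.random((4, n_features))          # stand-in for 4 vectorized reviews
w = np.zeros(n_features)                     # one weight per feature
b = 0.0                                      # single bias term

probs = sigmoid(X_toy @ w + b)               # shape (4,)
y_toy = np.array([0, 1, 1, 0])
loss = binary_cross_entropy(y_toy, probs)

print("trainable parameters:", w.size + 1)   # 10001
print("initial probabilities:", probs)       # all 0.5 with zero weights
print("initial loss:", round(loss, 4))       # 0.6931 = ln(2)
```

The parameter count of 10,001 should match what `model_imdb.summary()` reports when the vectorizer produces 10,000 features.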

### 1.5 Train & Evaluate

In [None]:
history_imdb = model_imdb.fit(
    X_train_imdb_bow, y_train_imdb,
    validation_split=0.2,
    epochs=5,
    batch_size=32,
    verbose=1
)

test_loss_imdb, test_acc_imdb = model_imdb.evaluate(X_test_imdb_bow, y_test_imdb, verbose=0)
print(f"\nIMDB Test Loss: {test_loss_imdb:.4f}")
print(f"IMDB Test Accuracy: {test_acc_imdb:.4f}")
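
It helps to know which samples `validation_split=0.2` actually holds out. The sketch below mimics the documented Keras behaviour, where the validation set is the last fraction of the arrays, taken before any shuffling. `split_like_keras` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def split_like_keras(n_samples, validation_split):
    # Index at which the validation tail begins (assumed Keras behaviour:
    # the last `validation_split` fraction, taken before any shuffling).
    split_at = int(n_samples * (1.0 - validation_split))
    indices = np.arange(n_samples)
    return indices[:split_at], indices[split_at:]

train_idx, val_idx = split_like_keras(25_000, 0.2)
print(len(train_idx), len(val_idx))
print(val_idx[0], val_idx[-1])
```

Because the tail is used, data that happens to be ordered by label would give a skewed validation set; shuffling beforehand avoids that.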

#### Plot Training Curves

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,4))

plt.subplot(1,2,1)
plt.plot(history_imdb.history['loss'], label='Train Loss')
plt.plot(history_imdb.history['val_loss'], label='Val Loss')
plt.title("IMDB - Loss (TF-IDF)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()

plt.subplot(1,2,2)
plt.plot(history_imdb.history['accuracy'], label='Train Accuracy')
plt.plot(history_imdb.history['val_accuracy'], label='Val Accuracy')
plt.title("IMDB - Accuracy (TF-IDF)")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()

plt.show()

#### Confusion Matrix & Classification Report

In [None]:
y_pred_imdb_probs = model_imdb.predict(X_test_imdb_bow)
y_pred_imdb = (y_pred_imdb_probs > 0.5).astype(int).ravel()

cm_imdb = confusion_matrix(y_test_imdb, y_pred_imdb)
print("\nIMDB Confusion Matrix:\n", cm_imdb)

print("\nIMDB Classification Report:\n", classification_report(y_test_imdb, y_pred_imdb))
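
The numbers in the classification report can be derived from the confusion matrix by hand. The sketch below uses a made-up 2×2 matrix (not results from this lab) in scikit-learn's layout, where rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (made-up counts, not results from this lab).
# scikit-learn layout: rows = true class, columns = predicted class.
cm_toy = np.array([
    [80, 20],   # true negatives: 80 predicted negative, 20 predicted positive
    [10, 90],   # true positives: 10 predicted negative, 90 predicted positive
])

tn, fp, fn, tp = cm_toy.ravel()

accuracy = (tp + tn) / cm_toy.sum()
precision = tp / (tp + fp)   # of reviews predicted positive, how many really are
recall = tp / (tp + fn)      # of truly positive reviews, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

These are the metrics for the positive class (label 1); the report also lists the same quantities with class 0 treated as the class of interest.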

## 2. CIFAR-10 (Multi-Class Classification) + Weight Visualization

Now, we move to **CIFAR-10**, which has 10 classes of 32×32 color images.  
1. Load & flatten each image to 3,072 features.  
2. Build an SLP with 10 outputs (softmax).  
3. Train & Evaluate.  
4. **Weight Visualization**: each of the 10 class neurons has 3,072 weights, which we reshape to (32,32,3) to see the “pattern.”

**Note**: Performance is generally low (~25–30% accuracy) for an SLP on CIFAR-10, because it’s a purely linear model with no hidden layers.


## 2.1 Load & Flatten CIFAR-10

In [None]:
###########################################################
# 2.1 LOAD & FLATTEN CIFAR-10
###########################################################
from tensorflow.keras.datasets import cifar10

(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = cifar10.load_data()

print("CIFAR Train shape:", X_train_cifar.shape)    # (50000, 32, 32, 3)
print("CIFAR Train labels shape:", y_train_cifar.shape)  # (50000, 1)
print("CIFAR Test shape:", X_test_cifar.shape)      # (10000, 32, 32, 3)
print("CIFAR Test labels shape:", y_test_cifar.shape)    # (10000, 1)

# Flatten label arrays
y_train_cifar = y_train_cifar.ravel()
y_test_cifar  = y_test_cifar.ravel()
print("Unique classes:", np.unique(y_train_cifar))

# Let's show some images
class_names = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"]
plt.figure(figsize=(10,2))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.imshow(X_train_cifar[i])
    plt.title(class_names[y_train_cifar[i]])
    plt.axis('off')
plt.show()

# SCALE & FLATTEN
X_train_cifar = X_train_cifar / 255.0
X_test_cifar  = X_test_cifar  / 255.0

num_pixels = 32*32*3  # 3072
X_train_cifar_flat = X_train_cifar.reshape(-1, num_pixels)
X_test_cifar_flat  = X_test_cifar.reshape(-1, num_pixels)

print("Train flattened shape:", X_train_cifar_flat.shape)
print("Test flattened shape:", X_test_cifar_flat.shape)
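
Flattening with `reshape` only changes how the values are indexed, not the values themselves. The sketch below uses a stand-in array (not a real image) to show where a given pixel and channel lands in the 3,072-long vector, assuming NumPy's default row-major order.

```python
import numpy as np

# Stand-in for one CIFAR-10 image: every value equals its own flat position.
toy_image = np.arange(32 * 32 * 3).reshape(32, 32, 3)

flat = toy_image.reshape(-1)            # shape (3072,)
restored = flat.reshape(32, 32, 3)      # same order back to (row, col, channel)

# In row-major (C) order the channel index varies fastest.
row, col, channel = 5, 7, 2
flat_index = (row * 32 + col) * 3 + channel

print(flat.shape, flat_index, flat[flat_index] == toy_image[row, col, channel])
```

This is why reshaping a learned weight column back to `(32, 32, 3)` lines each weight up with the pixel it multiplies.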


## 2.2 Build & Train SLP (Softmax)

In [None]:
###########################################################
# 2.2 BUILD & TRAIN SLP (SOFTMAX)
###########################################################
model_cifar = keras.Sequential([
    layers.Dense(10, activation='softmax', input_shape=(num_pixels,))
])

model_cifar.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model_cifar.summary()

history_cifar = model_cifar.fit(
    X_train_cifar_flat,
    y_train_cifar,
    validation_split=0.2,
    epochs=5,
    batch_size=64,
    verbose=1
)
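
The softmax layer and the `sparse_categorical_crossentropy` loss can be sketched in NumPy as follows. This is an illustration on random stand-in data with zero weights (Keras initializes them randomly), not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_cross_entropy(labels, probs, eps=1e-7):
    # Labels are integer class ids, so we pick out the probability of the true class.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

num_pixels, num_classes = 32 * 32 * 3, 10
rng = np.random.default_rng(0)
X_toy = rng.random((5, num_pixels))           # stand-in for 5 flattened images
W = np.zeros((num_pixels, num_classes))       # zeros chosen for a predictable output
b = np.zeros(num_classes)

probs = softmax(X_toy @ W + b)                # shape (5, 10), rows sum to 1
labels = np.array([3, 8, 0, 6, 1])
loss = sparse_categorical_cross_entropy(labels, probs)

print("trainable parameters:", W.size + b.size)        # 30730
print("loss at uniform predictions:", round(loss, 4))  # 2.3026 = ln(10)
```

A loss near ln(10) ≈ 2.30 therefore means the model is doing no better than guessing uniformly among the 10 classes, which is a useful reference point when reading the training log.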

## 2.3 Evaluate CIFAR-10 SLP

In [None]:
###########################################################
# 2.3 EVALUATE CIFAR-10 SLP
###########################################################
test_loss_cifar, test_acc_cifar = model_cifar.evaluate(X_test_cifar_flat, y_test_cifar)
print(f"\nCIFAR-10 - Test Loss: {test_loss_cifar:.4f}")
print(f"CIFAR-10 - Test Accuracy: {test_acc_cifar:.4f}")

# Plot training curves
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.plot(history_cifar.history['loss'], label='Train Loss')
plt.plot(history_cifar.history['val_loss'], label='Val Loss')
plt.title("CIFAR-10 SLP - Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()

plt.subplot(1,2,2)
plt.plot(history_cifar.history['accuracy'], label='Train Acc')
plt.plot(history_cifar.history['val_accuracy'], label='Val Acc')
plt.title("CIFAR-10 SLP - Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()

plt.show()

# Confusion Matrix & Classification Report
y_pred_cifar_probs = model_cifar.predict(X_test_cifar_flat)
y_pred_cifar = np.argmax(y_pred_cifar_probs, axis=1)

cm_cifar = confusion_matrix(y_test_cifar, y_pred_cifar)
print("CIFAR-10 Confusion Matrix:\n", cm_cifar)

print("\nCIFAR-10 Classification Report:\n",
      classification_report(y_test_cifar, y_pred_cifar, target_names=class_names))


## 2.4 Weight Visualization

Now, we visualize the weights for each of the 10 output neurons.

**Process**:
1. Extract the weight matrix: shape = (3072, 10).  
2. For each class neuron, reshape the 3072 weights into (32, 32, 3).  
3. Plot each reshaped array as an “image.”  

**Caveat**:  
- Weights can be negative and have no fixed scale, so they are not valid pixel values as-is; some normalization or offset is needed before display.  
- The pattern can be noisy because a single linear layer tries to separate complex images.

We’ll min-max normalize each weight map to the 0–1 range and plot it for demonstration.


In [None]:
# Kernel has shape (3072, 10); bias has shape (10,)
weights_cifar, biases_cifar = model_cifar.get_weights()

plt.figure(figsize=(12,6))
for i in range(10):
    # Extract weights for class i
    w_i = weights_cifar[:, i]  # shape: (3072,)
    # Reshape to (32,32,3)
    w_i_3d = w_i.reshape(32, 32, 3)

    # We'll normalize to 0..1 for better visualization
    # (This is optional, purely for display)
    min_val = w_i_3d.min()
    max_val = w_i_3d.max()
    w_i_3d_norm = (w_i_3d - min_val) / (max_val - min_val + 1e-9)

    plt.subplot(2,5,i+1)
    plt.imshow(w_i_3d_norm)
    plt.title(f"Class {i}: {class_names[i]}")
    plt.axis('off')

plt.tight_layout()
plt.show()


### Interpretation
Each “image” is a visualization of how the single-layer perceptron weights each pixel channel for a specific class.
- After the min-max normalization, brighter values in a color channel correspond to larger (more positive) weights for that class, and darker values to smaller (more negative) weights.
- Since it’s a purely linear model, these weight maps can be quite noisy and not very “intuitive,” but it’s still a neat look under the hood of a single-layer approach!
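
One limitation of min-max scaling is that a weight of zero lands at a different brightness in every panel, which makes the sign of a weight hard to read. A possible alternative, sketched here as an illustrative choice rather than the lab's method, scales by the largest magnitude so that zero always maps to mid-gray.

```python
import numpy as np

def minmax_normalize(w):
    # Same scaling as the plotting cell: smallest weight -> 0, largest -> 1.
    return (w - w.min()) / (w.max() - w.min() + 1e-9)

def symmetric_normalize(w):
    # Alternative (an illustrative choice): scale by the largest magnitude so that
    # a weight of exactly zero always maps to mid-gray (0.5).
    scale = np.abs(w).max() + 1e-9
    return 0.5 + 0.5 * w / scale

toy_weights = np.array([-0.2, 0.0, 0.1, 0.6])
print(np.round(minmax_normalize(toy_weights), 3))
print(np.round(symmetric_normalize(toy_weights), 3))
```

With the symmetric version, values above 0.5 are positive weights and values below 0.5 are negative ones, consistently across all ten class panels.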


## 2.5 (Optional) Hyperparameter Tuning (CIFAR-10)

We can similarly adjust learning rate, epochs, or batch size. The single-layer limitation means we might not see a huge jump in accuracy, but it’s still instructive to try.
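
Before comparing runs, it can be useful to count how many gradient updates each configuration performs. The sketch below does this for the baseline from section 2.2 (batch size 64, 5 epochs) and the tuned settings used in the next cell (batch size 128, 10 epochs), assuming 50,000 training images and `validation_split=0.2`. `updates` is a hypothetical helper.

```python
import math

def updates(n_train, validation_split, batch_size, epochs):
    # Samples left for training after Keras holds out the validation fraction.
    n_fit = int(n_train * (1.0 - validation_split))
    steps_per_epoch = math.ceil(n_fit / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

baseline = updates(50_000, 0.2, batch_size=64, epochs=5)
tuned = updates(50_000, 0.2, batch_size=128, epochs=10)
print("baseline (steps/epoch, total):", baseline)  # (625, 3125)
print("tuned    (steps/epoch, total):", tuned)     # (313, 3130)
```

Both runs perform almost the same number of updates, so any difference in accuracy comes mainly from the larger batches and the lower learning rate (0.0005 versus Adam's default of 0.001), not from extra optimization steps.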


In [None]:
from tensorflow.keras.optimizers import Adam

model_cifar_tuned = keras.Sequential([
    layers.Dense(10, activation='softmax', input_shape=(num_pixels,))
])
model_cifar_tuned.compile(
    optimizer=Adam(learning_rate=0.0005),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history_cifar_tuned = model_cifar_tuned.fit(
    X_train_cifar_flat, y_train_cifar,
    validation_split=0.2,
    epochs=10,
    batch_size=128,
    verbose=1
)

test_loss_ct, test_acc_ct = model_cifar_tuned.evaluate(X_test_cifar_flat, y_test_cifar)
print(f"Tuned CIFAR-10 Loss: {test_loss_ct:.4f}")
print(f"Tuned CIFAR-10 Accuracy: {test_acc_ct:.4f}")


## 2.6 (Optional) Saving & Loading (CIFAR SLP)

Finally, we can save our model for later use or analysis.


In [None]:
# Save
model_cifar.save("slp_cifar.h5")
print("SLP CIFAR model saved as slp_cifar.h5")

# Load
loaded_model_cifar = keras.models.load_model("slp_cifar.h5")
loaded_loss_cifar, loaded_acc_cifar = loaded_model_cifar.evaluate(X_test_cifar_flat, y_test_cifar, verbose=0)
print(f"Loaded CIFAR model - Test Accuracy: {loaded_acc_cifar:.4f}")


## **3. Wrap-Up**

1. **IMDB**:  
   - Binary classification with a single-layer perceptron using **TF-IDF** vectors (up to 10k features) built from the decoded reviews.  
   - Performance is moderate compared to advanced text models (embeddings, RNNs, or Transformers).

2. **CIFAR-10**:  
   - Flattened images to (32×32×3 = 3072) features.  
   - A single-layer perceptron with 10 outputs is typically insufficient for complex image data, so expect roughly 25–30% accuracy.  
   - Weight visualization shows how each class neuron is assigning positive/negative importance to each pixel channel in a purely linear way.

3. **Hyperparameter Tuning**:  
   - Adjusting learning rate, epochs, or batch size can yield small improvements.  
   - Real breakthroughs often need deeper networks or specialized architectures (e.g., CNNs for images, embeddings/RNNs for text).

4. **Saving & Loading**:  
   - Demonstrated how to persist models to `.h5` files and reload them for inference or sharing.

**Next Steps**:
- Add **hidden layers** to create a **multi-layer perceptron** and see how accuracy changes.  
- Use advanced architectures (CNNs, RNNs, or Transformers) for these tasks to get better performance.  
- Experiment with **callbacks** (EarlyStopping, etc.), different weight initializations, or regularization techniques.
