<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Neural%20Networks/Deep%20Neural%20Networks%20Hands-On%20Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hands-On Lab: Deep Neural Networks**

## **1. Introduction**

Welcome to this hands-on lab on **Deep Neural Networks (DNNs)**! In this notebook, you'll learn how to build, train, and optimize neural networks by following a systematic approach.

### **Learning Objectives**

By the end of this lab, you will be able to:
- Apply the **universal machine learning workflow** to real datasets
- Build DNNs for **binary classification** (IMDB sentiment analysis) and **multi-class classification** (Fashion MNIST)
- Understand and apply **regularization techniques** (Dropout, L2) to combat overfitting
- Use **Keras Tuner's Hyperband** for automated hyperparameter optimization
- Evaluate models using appropriate metrics (**accuracy**, **F1 score**, **confusion matrix**)

### **The Universal ML Workflow**

We'll follow these 8 steps throughout this lab:

| Step | Description | Covered In |
|------|-------------|------------|
| 1 | Define problem & gather data | Sections 2.1, 3.1 |
| 2 | Choose metrics of success | Sections 2.2, 3.1 |
| 3 | Choose evaluation protocol | Sections 2.2, 3.1 |
| 4 | Prepare data | Sections 2.3, 3.1 |
| 5 | Build a baseline model | Sections 2.4, 3.2 |
| 6 | Scale up → intentionally overfit | Sections 2.5, 3.3 |
| 7 | Regularize | Sections 2.6, 3.4 |
| 8 | Hyperparameter tuning | Sections 2.7, 3.5 |

### **Notebook Outline**

1. **Introduction & Setup**
2. **Binary Classification: IMDB Sentiment Analysis** (with TF-IDF and F1 score)
3. **Multi-Class Classification: Fashion MNIST** (10 clothing categories)
4. **Key Takeaways & Summary**
5. **Appendix: EarlyStopping Deep Dive**

Let's begin!

In [None]:
#############################################
# 1.1 SETUP & IMPORTS
#############################################

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Keras Tuner for hyperparameter search
!pip install -q -U keras-tuner
import keras_tuner as kt

from sklearn.metrics import confusion_matrix, classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

print("TensorFlow version:", tf.__version__)
print("Keras Tuner version:", kt.__version__)

# Utility function for plotting training history
def plot_history(history, title=''):
    """
    Plot training/validation loss & accuracy from a Keras history object.
    Helps visualize overfitting (gap between train and val curves).
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    ax1.plot(history.history['loss'], label='Train Loss')
    ax1.plot(history.history['val_loss'], label='Val Loss')
    ax1.set_title(f'{title} - Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.legend()

    ax2.plot(history.history['accuracy'], label='Train Accuracy')
    ax2.plot(history.history['val_accuracy'], label='Val Accuracy')
    ax2.set_title(f'{title} - Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()

    plt.tight_layout()
    plt.show()

print("\nSetup complete! Ready to begin.")

# **2. Binary Classification: IMDB Sentiment Analysis**

In this section, we'll classify movie reviews as **positive** or **negative** using the IMDB dataset.

### **Why IMDB?**
- It's a classic benchmark for **sentiment analysis**
- Contains 50,000 movie reviews (25,000 train, 25,000 test)
- Binary labels: positive (1) or negative (0)

### **Our Approach: TF-IDF Text Representation**

Instead of using raw word indices, we'll convert text to **TF-IDF (Term Frequency-Inverse Document Frequency)** vectors:

- **Term Frequency (TF)**: How often a word appears in a document
- **Inverse Document Frequency (IDF)**: Downweights common words (like "the", "is")
- **TF-IDF = TF × IDF**: Highlights words that are important to a specific document but rare overall

This gives us a fixed-size numerical vector for each review that we can feed into a neural network.
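
To make the TF × IDF product concrete, here is a minimal sketch on a hypothetical three-review corpus. It assumes the smoothed IDF variant, `idf = ln((1 + n) / (1 + df)) + 1`, which is the default documented for scikit-learn's `TfidfVectorizer`. Raw counts are used for TF, and the final L2 normalization step is left out for readability. The corpus and the helper name `tfidf_weights` are illustrative and not part of the lab.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not taken from IMDB)
corpus = [
    "the movie was great",
    "the movie was terrible",
    "the acting was great great",
]

def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per document (raw TF, smoothed IDF)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in df}
    return [{w: count * idf[w] for w, count in Counter(tokens).items()}
            for tokens in tokenized]

weights = tfidf_weights(corpus)
for doc, w in zip(corpus, weights):
    print(doc, "->", {k: round(v, 3) for k, v in w.items()})
```

Here "the" appears in every review and receives the minimum IDF of 1, while "terrible" appears in a single review and is weighted more heavily.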

---

## 2.1 Step 1: Define Problem & Load Data

In [None]:
###########################################################
# 2.1 IMDB DATA
###########################################################
num_words = 10000
(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = keras.datasets.imdb.load_data(num_words=num_words)

print("IMDB train samples:", len(x_train_raw))
print("IMDB test samples:", len(x_test_raw))

unique, counts = np.unique(y_train_raw, return_counts=True)
print("IMDB train label distribution:", dict(zip(unique, counts)))

classes = dict(zip(unique, counts))
print("Naive Baseline (Accuracy):", max([classes[0], classes[1]]) / (classes[0] + classes[1]))

## 2.2 Steps 2-3: Metrics & Evaluation Protocol

### **Metrics We'll Use**

| Metric | Why Use It? |
|--------|-------------|
| **Accuracy** | Overall correctness - easy to interpret |
| **F1 Score** | Balances precision and recall - better for imbalanced data |
| **Confusion Matrix** | Shows true/false positives and negatives |
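
As a quick illustration of how these three metrics relate, the sketch below derives them by hand with NumPy from hypothetical labels and predictions. The arrays are made up for the example; in the lab the same quantities come from scikit-learn's `f1_score` and `confusion_matrix`.

```python
import numpy as np

# Hypothetical ground truth and predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

# Rows = actual class, columns = predicted class (same layout as scikit-learn)
conf_mat = np.array([[tn, fp],
                     [fn, tp]])

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(conf_mat)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision and recall both read off the confusion matrix, and F1 is their harmonic mean, so a model cannot score well on F1 by favoring one at the expense of the other.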

### **Evaluation Protocol**

We'll use a **train/validation/test split**:
- **Training set (80% of the 25,000 training reviews)**: Used to fit the model
- **Validation set (the remaining 20% of the training reviews)**: Used to tune hyperparameters and monitor overfitting
- **Test set (25,000 reviews)**: Held out for final evaluation only

This prevents **data leakage** and gives us an honest estimate of model performance.

## 2.3 Step 4: Data Preparation (TF-IDF Encoding)

The Keras IMDB dataset comes as integer-encoded sequences. We need to:

1. **Decode** integers back to text strings
2. **Vectorize** using TF-IDF to create fixed-size feature vectors
3. **Split** into train/validation sets
4. **Compute class weights** to handle any class imbalance

In [None]:
###########################################################
# 2.3 BAG-OF-WORDS / TF-IDF FOR IMDB
###########################################################
# Retrieve word index from Keras
word_index = keras.datasets.imdb.get_word_index()

# Reverse mapping (integer -> word)
reverse_word_index = {v: k for (k, v) in word_index.items()}
# Keras shifts every word index by 3: 0 = padding, 1 = start-of-sequence,
# 2 = out-of-vocabulary, 3 = unused. Anything outside the vocabulary decodes to "?".
def decode_review(int_seq):
    return " ".join(reverse_word_index.get(i - 3, "?") for i in int_seq)

# Convert integer sequences to raw text
train_texts = [decode_review(seq) for seq in x_train_raw]
test_texts  = [decode_review(seq) for seq in x_test_raw]

# Use TfidfVectorizer or CountVectorizer
# vectorizer = CountVectorizer(max_features=num_words, stop_words='english')
vectorizer = TfidfVectorizer(max_features=num_words, stop_words='english')

X_train_sparse = vectorizer.fit_transform(train_texts)
X_test_sparse  = vectorizer.transform(test_texts)

# Convert sparse -> dense. A dense float64 matrix of 25,000 rows x ~10,000 columns
# takes about 2 GB, so cast to float32 first to halve the footprint.
X_train = X_train_sparse.astype(np.float32).toarray()
X_test  = X_test_sparse.astype(np.float32).toarray()

y_train = y_train_raw
y_test  = y_test_raw

print("TF-IDF train shape:", X_train.shape)
print("TF-IDF test shape: ", X_test.shape)

# Class weighting
unique, counts = np.unique(y_train, return_counts=True)
neg_count, pos_count = counts[0], counts[1]
total_count = neg_count + pos_count
weight_for_0 = (1.0 / neg_count) * (total_count / 2.0)
weight_for_1 = (1.0 / pos_count) * (total_count / 2.0)
class_weights = {0: weight_for_0, 1: weight_for_1}
print("IMDB class weights:", class_weights)

In [None]:
# Split out some of the training set for validation
X_train_imdb, X_val_imdb, y_train_imdb, y_val_imdb = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

print("Train samples after splitting:", X_train_imdb.shape[0])
print("Validation samples:", X_val_imdb.shape[0])

## 2.4 Step 5: Baseline Model (Single-Layer Perceptron)

Our baseline is a **Single-Layer Perceptron (SLP)** - the simplest possible neural network:
- Just one `Dense(1, sigmoid)` layer
- Learns a linear decision boundary
- Establishes a performance floor to beat

**Why start simple?** A baseline tells us if our problem is learnable and gives us a reference point for improvement.
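
The claim that a single `Dense(1, sigmoid)` layer learns a linear decision boundary can be sketched in a few lines of NumPy. The weights, bias, and inputs below are hypothetical values chosen for illustration; a trained model would learn its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a 3-feature input
w = np.array([2.0, -1.0, 0.5])
b = -0.25

def slp_predict_proba(x):
    """Forward pass of Dense(1, activation='sigmoid'): sigmoid(x . w + b)."""
    return sigmoid(x @ w + b)

X = np.array([
    [1.0, 0.0, 0.0],   # x.w + b =  1.75 -> probability above 0.5
    [0.0, 1.0, 0.0],   # x.w + b = -1.25 -> probability below 0.5
    [0.0, 0.0, 0.5],   # x.w + b =  0.00 -> exactly on the boundary
])
probs = slp_predict_proba(X)
labels = (probs > 0.5).astype(int)
print(probs.round(3), labels)
```

The predicted class flips exactly where `x . w + b = 0`, which is a hyperplane in feature space.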

In [None]:
###########################################################
# 2.4 BASELINE MODEL
###########################################################
baseline_model = keras.Sequential([
    layers.Dense(1, activation='sigmoid', input_shape=(X_train_imdb.shape[1],))
])

baseline_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

history_base = baseline_model.fit(
    X_train_imdb, y_train_imdb,
    validation_data=(X_val_imdb, y_val_imdb),
    epochs=5,
    batch_size=32,
    class_weight=class_weights,
    verbose=1
)

plot_history(history_base, "IMDB Baseline (TF-IDF)")

# Evaluate on test
test_loss_base, test_acc_base = baseline_model.evaluate(X_test, y_test, verbose=0)
print(f"Baseline Test Accuracy: {test_acc_base:.4f}")

# Let's also compute F1 on test
y_test_preds_base = (baseline_model.predict(X_test) > 0.5).astype(int).ravel()
test_f1_base = f1_score(y_test, y_test_preds_base)
print(f"Baseline Test F1 score: {test_f1_base:.4f}")

## 2.5 Step 6: Scale Up → Intentionally Overfit

Now we **add capacity** to the model by including hidden layers. The goal is to see if a more complex model can learn better patterns.

**What to expect:**
- Training accuracy should increase significantly
- Validation accuracy may plateau or even decrease (overfitting)
- The gap between train and validation curves indicates overfitting

This step confirms we have enough model capacity - if we can't overfit, our model might be too simple!
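
One way to put a number on the train/validation gap is to read it straight from the `history.history` dictionary that Keras returns. The helper below is a hypothetical utility (`summarize_overfitting` is not part of Keras), and the history values are made up to mimic an overfitting run.

```python
def summarize_overfitting(history_dict):
    """Report the epoch with the lowest val_loss and the final accuracy gap."""
    val_loss = history_dict["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = history_dict["accuracy"][-1] - history_dict["val_accuracy"][-1]
    return {
        "best_epoch": best_epoch + 1,            # 1-based, as in Keras logs
        "epochs_past_best": len(val_loss) - 1 - best_epoch,
        "final_accuracy_gap": final_gap,
    }

# Made-up curves: training keeps improving while validation turns around
fake_history = {
    "loss":         [0.60, 0.35, 0.20, 0.10, 0.05],
    "val_loss":     [0.55, 0.40, 0.38, 0.45, 0.55],
    "accuracy":     [0.70, 0.86, 0.93, 0.97, 0.99],
    "val_accuracy": [0.74, 0.84, 0.86, 0.85, 0.84],
}
summary = summarize_overfitting(fake_history)
print(summary)
```

In the lab this could be called as `summarize_overfitting(history_overfit.history)` once the overfit model has been trained.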

In [None]:
overfit_model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_imdb.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

overfit_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

history_overfit = overfit_model.fit(
    X_train_imdb, y_train_imdb,
    validation_data=(X_val_imdb, y_val_imdb),
    epochs=15,
    batch_size=512,
    class_weight=class_weights,
    verbose=1
)

plot_history(history_overfit, "IMDB Overfit (TF-IDF)")

# Evaluate on test
test_loss_over, test_acc_over = overfit_model.evaluate(X_test, y_test, verbose=0)
print(f"Overfit Model Test Accuracy: {test_acc_over:.4f}")

y_test_preds_over = (overfit_model.predict(X_test) > 0.5).astype(int).ravel()
test_f1_over = f1_score(y_test, y_test_preds_over)
print(f"Overfit Model Test F1: {test_f1_over:.4f}")

## 2.6 Step 7: Regularize (Dropout + L2)

Once we can overfit, we add **regularization** to improve generalization:

| Technique | How It Works |
|-----------|--------------|
| **Dropout** | Randomly "drops" neurons during training, preventing co-adaptation |
| **L2 Regularization** | Penalizes large weights, encouraging simpler models |

**What to expect:**
- The gap between training and validation curves should shrink
- Validation performance should improve
- Training accuracy may decrease slightly (this is okay!)
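
The two techniques in the table can be sketched with NumPy. This is an illustrative re-implementation, not Keras's internal code: `dropout_forward` applies inverted dropout (surviving units are scaled by 1 / (1 - rate) so the expected activation is unchanged), and `l2_penalty` computes the factor times the sum of squared weights, which is the term `regularizers.l2` adds to the loss.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations              # dropout is a no-op at inference time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

def l2_penalty(weights, factor):
    """Extra loss term: factor * sum(w^2)."""
    return factor * np.sum(np.square(weights))

rng = np.random.default_rng(0)
acts = np.ones((1000, 64))

dropped = dropout_forward(acts, rate=0.5, training=True, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))      # close to 0.5
print("mean activation:", dropped.mean())             # close to 1.0

W = np.array([[0.5, -1.0], [2.0, 0.0]])
print("L2 penalty:", l2_penalty(W, factor=0.001))
```

Because dropout is switched off at inference time, training accuracy (measured with dropout active) can sit below validation accuracy early in training; that is expected rather than a sign of a bug.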

In [None]:
reg_model = keras.Sequential([
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=(X_train_imdb.shape[1],)),
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

reg_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

history_reg = reg_model.fit(
    X_train_imdb, y_train_imdb,
    validation_data=(X_val_imdb, y_val_imdb),
    epochs=15,
    batch_size=512,
    class_weight=class_weights,
    verbose=1
)

plot_history(history_reg, "IMDB Regularized (TF-IDF)")

# Evaluate test
test_loss_reg, test_acc_reg = reg_model.evaluate(X_test, y_test, verbose=0)
print(f"Regularized Model Test Accuracy: {test_acc_reg:.4f}")

y_test_preds_reg = (reg_model.predict(X_test) > 0.5).astype(int).ravel()
test_f1_reg = f1_score(y_test, y_test_preds_reg)
print(f"Regularized Model Test F1: {test_f1_reg:.4f}")

## 2.7 Step 8: Hyperparameter Tuning (Hyperband)

Finally, we use **automated hyperparameter search** to find the best combination of:
- Number of layers
- Units per layer
- Dropout rate
- L2 regularization strength
- Learning rate

### **Why Hyperband?**

**Hyperband** is more efficient than random search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations

This successive-halving strategy works like early stopping applied across configurations: it saves compute time while still exploring widely.
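
The elimination idea can be sketched as a single round of successive halving, the building block that Hyperband repeats with different starting budgets. The code below is a simplified illustration, not Keras Tuner's implementation: `toy_score` is a hypothetical stand-in for "train this configuration for this many epochs and return validation accuracy".

```python
def toy_score(config, epochs):
    """Hypothetical stand-in for training: better configs improve faster."""
    return config["quality"] * (1 - 0.5 ** epochs)

def successive_halving(configs, min_epochs=1, factor=3):
    """Keep the top 1/factor of configs each round, multiplying the budget."""
    survivors, epochs, schedule = list(configs), min_epochs, []
    while True:
        ranked = sorted(survivors, key=lambda c: toy_score(c, epochs), reverse=True)
        schedule.append((len(ranked), epochs))   # (configs trained, epochs each)
        if len(ranked) == 1:
            return ranked[0], schedule
        survivors = ranked[:max(1, len(ranked) // factor)]
        epochs *= factor

configs = [{"name": f"cfg{i}", "quality": 0.5 + 0.05 * i} for i in range(9)]
best, schedule = successive_halving(configs)
print(best["name"], schedule)
```

With nine configurations and a factor of 3, this spends 9 + 9 + 9 = 27 epochs in total, versus 81 if every configuration were trained for the full 9 epochs.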

In [None]:
def build_imdb_model_tuner(hp):
    model = keras.Sequential()
    # Declare the input once instead of passing input_shape to a Dense layer
    model.add(layers.Input(shape=(X_train_imdb.shape[1],)))

    n_layers = hp.Int('n_layers', 1, 3)
    for i in range(n_layers):
        units = hp.Choice(f'units_{i}', [64, 128, 256])
        l2_factor = hp.Float(f'l2_{i}', 1e-4, 1e-2, sampling='log')
        model.add(layers.Dense(units, activation='relu', kernel_regularizer=regularizers.l2(l2_factor)))
        drop = hp.Float(f'drop_{i}', 0.0, 0.5, step=0.1)
        model.add(layers.Dropout(drop))
    model.add(layers.Dense(1, activation='sigmoid'))

    lr = hp.Float('lr', 1e-4, 1e-2, sampling='log')
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

tuner_imdb = kt.Hyperband(
    build_imdb_model_tuner,
    objective='val_accuracy',
    max_epochs=5,
    factor=3,
    directory='imdb_tfidf_hyperband',
    project_name='imdb_f1_demo'
)

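# Hyperband decides how many epochs each trial gets (up to max_epochs),
# so no epochs argument is passed here.
tuner_imdb.search(
    X_train_imdb, y_train_imdb,
    validation_data=(X_val_imdb, y_val_imdb),
    batch_size=512,
    class_weight=class_weights
)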

best_hp_imdb = tuner_imdb.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparams (IMDB):", best_hp_imdb.values)

best_model_imdb = tuner_imdb.hypermodel.build(best_hp_imdb)
history_imdb_tuned = best_model_imdb.fit(
    X_train_imdb, y_train_imdb,
    validation_data=(X_val_imdb, y_val_imdb),
    epochs=10,
    batch_size=512,
    class_weight=class_weights
)

plot_history(history_imdb_tuned, "IMDB Tuner (TF-IDF)")

# Evaluate final
test_loss_tuned, test_acc_tuned = best_model_imdb.evaluate(X_test, y_test, verbose=0)
print(f"Tuned Model Test Accuracy: {test_acc_tuned:.4f}")

y_test_preds_tuned = (best_model_imdb.predict(X_test) > 0.5).astype(int).ravel()
test_f1_tuned = f1_score(y_test, y_test_preds_tuned)
print(f"Tuned Model Test F1 Score: {test_f1_tuned:.4f}")

# **3. Multi-Class Classification: Fashion MNIST**

Now we tackle a **10-class image classification** problem using Fashion MNIST.

### **Why Fashion MNIST?**
- Drop-in replacement for classic MNIST (same size: 28×28 grayscale)
- More challenging than handwritten digits
- 10 clothing categories to classify

### **The 10 Classes**

| Label | Description |
|-------|-------------|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

We'll follow the same workflow: **Baseline → Overfit → Regularize → Tune**

---

## 3.1 Steps 1-4: Load, Explore & Prepare Data

In [None]:
#############################################
# 3.1 LOAD & PREPARE FASHION MNIST
#############################################

from tensorflow.keras.datasets import fashion_mnist

# Load dataset
(x_train_fm, y_train_fm), (x_test_fm, y_test_fm) = fashion_mnist.load_data()

# Class names for visualization
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print("Training set shape:", x_train_fm.shape)
print("Test set shape:", x_test_fm.shape)
print("Number of classes:", len(class_names))

# Visualize some sample images
plt.figure(figsize=(12, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train_fm[i], cmap='gray')
    plt.title(class_names[y_train_fm[i]])
    plt.axis('off')
plt.suptitle('Sample Images from Fashion MNIST', fontsize=14)
plt.tight_layout()
plt.show()

# Normalize pixel values to [0, 1]
x_train_fm = x_train_fm / 255.0
x_test_fm = x_test_fm / 255.0

# Flatten images: 28x28 -> 784
X_train_fm_flat = x_train_fm.reshape(-1, 28 * 28)
X_test_fm_flat = x_test_fm.reshape(-1, 28 * 28)

print("\nAfter preprocessing:")
print("Training features shape:", X_train_fm_flat.shape)
print("Test features shape:", X_test_fm_flat.shape)

## 3.2 Step 5: Baseline Model (Single-Layer Softmax)

For multi-class classification, our baseline uses:
- **10 output neurons** (one per class)
- **Softmax activation** (outputs probabilities that sum to 1)
- **Sparse categorical crossentropy** loss (for integer labels)
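
To see what the softmax output and the sparse categorical crossentropy loss compute, here is a small NumPy sketch. The logits and labels are hypothetical; Keras performs the equivalent computation (with extra numerical safeguards) inside the layer and the loss function.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true integer label)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Hypothetical raw scores for 2 samples over 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])        # integer labels, no one-hot encoding needed

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)
print(probs.round(3), "loss:", round(loss, 4))
```

The second sample has identical scores for every class, so softmax assigns each class a probability of 1/3 and its loss contribution is ln(3), the value expected from an uninformed guess over three classes.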

In [None]:
baseline_fm = keras.Sequential([
    layers.Dense(10, activation='softmax', input_shape=(784,))
])
baseline_fm.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history_fm_base = baseline_fm.fit(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    epochs=5,
    batch_size=32,
    verbose=1
)

plot_history(history_fm_base, 'FashionMNIST Baseline')
test_loss_fm_base, test_acc_fm_base = baseline_fm.evaluate(X_test_fm_flat, y_test_fm, verbose=0)
print(f"FashionMNIST Baseline Test Accuracy: {test_acc_fm_base:.4f}")

## 3.3 Step 6: Scale Up → Intentionally Overfit

Adding hidden layers with ReLU activation to increase model capacity.

In [None]:
overfit_fm = keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
overfit_fm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history_fm_over = overfit_fm.fit(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    epochs=15,
    batch_size=512,
    verbose=1
)

plot_history(history_fm_over, 'FashionMNIST Overfit')
test_loss_fm_over, test_acc_fm_over = overfit_fm.evaluate(X_test_fm_flat, y_test_fm, verbose=0)
print(f"Overfit Model Test Accuracy: {test_acc_fm_over:.4f}")

## 3.4 Step 7: Regularize (Dropout + L2)

Applying the same regularization techniques to combat overfitting.

In [None]:
reg_fm = keras.Sequential([
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=(784,)),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
reg_fm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history_fm_reg = reg_fm.fit(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    epochs=15,
    batch_size=512,
    verbose=1
)

plot_history(history_fm_reg, 'FashionMNIST Regularized')
test_loss_fm_reg, test_acc_fm_reg = reg_fm.evaluate(X_test_fm_flat, y_test_fm, verbose=0)
print(f"FashionMNIST Regularized Test Accuracy: {test_acc_fm_reg:.4f}")

## 3.5 Step 8: Hyperparameter Tuning (Hyperband)

Using Keras Tuner to find optimal hyperparameters for Fashion MNIST.

In [None]:
def build_fm_model_tuner(hp):
    model = keras.Sequential()
    model.add(layers.Input(shape=(784,)))

    n_layers = hp.Int('n_layers', 1, 3)
    for i in range(n_layers):
        units = hp.Choice(f'units_{i}', [128, 256])
        l2_factor = hp.Float(f'l2_{i}', 1e-4, 1e-2, sampling='log')
        model.add(layers.Dense(units, activation='relu', kernel_regularizer=regularizers.l2(l2_factor)))
        drop_rate = hp.Float(f'drop_{i}', 0.0, 0.5, step=0.1)
        model.add(layers.Dropout(drop_rate))

    model.add(layers.Dense(10, activation='softmax'))

    lr = hp.Float('lr', 1e-4, 1e-2, sampling='log')
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

tuner_fm = kt.Hyperband(
    build_fm_model_tuner,
    objective='val_accuracy',
    max_epochs=5,
    factor=3,
    directory='fashion_hyperband',
    project_name='fm_demo'
)

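# Hyperband decides how many epochs each trial gets (up to max_epochs),
# so no epochs argument is passed here.
tuner_fm.search(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    batch_size=512
)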

best_hp_fm = tuner_fm.get_best_hyperparameters(num_trials=1)[0]
print("Best Hyperparams (Fashion MNIST):", best_hp_fm.values)

best_model_fm = tuner_fm.hypermodel.build(best_hp_fm)
history_fm_tuned = best_model_fm.fit(
    X_train_fm_flat, y_train_fm,
    validation_split=0.2,
    epochs=10,
    batch_size=512
)

plot_history(history_fm_tuned, "FashionMNIST Tuner")
test_loss_fm_tuned, test_acc_fm_tuned = best_model_fm.evaluate(X_test_fm_flat, y_test_fm, verbose=0)
print(f"Tuned Model Test Accuracy: {test_acc_fm_tuned:.4f}")

# **4. Key Takeaways & Summary**

## What We Learned

### **The Universal ML Workflow Works!**

By following the systematic 8-step workflow, we achieved strong results on both datasets:

| Dataset | Task | Baseline | After Tuning |
|---------|------|----------|--------------|
| IMDB | Binary Classification | ~85% | ~88%+ |
| Fashion MNIST | 10-Class Classification | ~84% | ~88%+ |

### **Key Insights**

1. **Start Simple**: A baseline model establishes a performance floor and confirms the problem is learnable.

2. **Overfitting is Informative**: If you can't overfit, your model may lack capacity. The ability to overfit means you have room to regularize.

3. **Regularization is Essential**: Dropout and L2 regularization consistently improved generalization on both datasets.

4. **Automated Tuning Saves Time**: Hyperband efficiently explored the hyperparameter space, finding configurations we might not have tried manually.

5. **Metrics Matter**: F1 score complements accuracy for binary classification. On a balanced dataset like IMDB the two track each other closely, but F1 becomes more informative as classes grow imbalanced.

### **Comparing Approaches**

| Aspect | IMDB (Text) | Fashion MNIST (Images) |
|--------|-------------|------------------------|
| Input Representation | TF-IDF vectors (up to 10,000 features) | Flattened pixels (784 features) |
| Output Activation | Sigmoid (binary) | Softmax (multi-class) |
| Loss Function | Binary Crossentropy | Sparse Categorical Crossentropy |
| Key Challenge | High-dimensional sparse input | Visual pattern recognition |

## Next Steps

To further improve these models, consider:

1. **For IMDB**: Use word embeddings (Word2Vec, GloVe) or transformer-based models (BERT)
2. **For Fashion MNIST**: Use Convolutional Neural Networks (CNNs) which are designed for image data
3. **Advanced Regularization**: Try Batch Normalization, data augmentation
4. **Learning Rate Scheduling**: Use callbacks like `ReduceLROnPlateau`

# **5. Appendix: EarlyStopping Deep Dive**

This appendix explores **EarlyStopping** - a callback that automatically stops training when validation performance stops improving.

## 5.1 EarlyStopping in Action

Let's see how EarlyStopping works with Fashion MNIST:
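
A minimal sketch of the callback in use is shown below. To keep it self-contained and quick to run, it trains on random stand-in data with the same shape as flattened Fashion MNIST (784 features, 10 classes); this is an assumption for illustration, and in the lab you would pass `X_train_fm_flat` and `y_train_fm` instead. The patience of 3 and the cap of 30 epochs are arbitrary choices.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in data shaped like flattened Fashion MNIST (assumption for this sketch)
rng = np.random.default_rng(42)
X_demo = rng.random((1000, 784)).astype("float32")
y_demo = rng.integers(0, 10, size=1000)

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop once val_loss has not improved for 3 epochs, then roll back to the best weights
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

max_epochs = 30
history = model.fit(
    X_demo, y_demo,
    validation_split=0.2,
    epochs=max_epochs,
    batch_size=64,
    callbacks=[early_stop],
    verbose=0,
)

epochs_run = len(history.history["val_loss"])
best_epoch = int(np.argmin(history.history["val_loss"])) + 1
print(f"Ran {epochs_run} of {max_epochs} epochs; best val_loss at epoch {best_epoch}")
```

Because the labels here are random, validation loss typically stops improving within a few epochs and training halts well before the cap; on the real data the stopping point will differ.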

## 5.2 EarlyStopping vs. Regularization: Do We Need Both?

A common question: *"If EarlyStopping prevents overfitting, why bother with Dropout and L2?"*

### **The Short Answer: They Work Differently**

| Technique | When It Acts | How It Helps |
|-----------|--------------|--------------|
| **EarlyStopping** | At the end of each epoch | Halts training once val loss stops improving |
| **Dropout/L2** | During every training step | Shapes the learning trajectory from the start |

### **Why Use Both?**

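1. **EarlyStopping is reactive**: It halts training only *after* validation performance has already stopped improving (setting `restore_best_weights=True` rolls the model back to its best epoch)
2. **Regularization is proactive**: It discourages overfitting from the very first update rather than reacting to it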
3. **Combined effect**: Regularization improves the model's learning path, while EarlyStopping ensures we don't train too long

### **Best Practice**

Use **regularization + EarlyStopping** together for optimal results:
- Regularization finds a better solution
- EarlyStopping saves training time and prevents late-stage degradation

---

## **Congratulations!**

You've completed this hands-on lab on Deep Neural Networks. You now know how to:

- ✅ Apply the universal ML workflow systematically
- ✅ Build DNNs for binary and multi-class classification
- ✅ Use TF-IDF for text representation
- ✅ Combat overfitting with Dropout and L2 regularization
- ✅ Automate hyperparameter tuning with Keras Tuner's Hyperband
- ✅ Use EarlyStopping effectively

**Keep experimenting!** Try modifying the architectures, hyperparameters, or even applying these techniques to your own datasets.