# MNIST Deep Learning Notebook

**Goal:** Use the MNIST handwritten digits dataset to demonstrate:

- Artificial Neural Network (ANN)
- Activation Functions (ReLU, Softmax, etc.)
- Loss Functions (Categorical Crossentropy)
- Optimization Techniques (SGD, Adam, etc.)
- Regularization Techniques (L2, Dropout, Early Stopping)
- Callbacks (EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, CSVLogger, TensorBoard)
- Hyperparameter Tuning (learning rate, units, layers, dropout, batch size)

This notebook is written for **technical and non-technical learners** with **detailed comments** in each step.

In [None]:
#!/usr/bin/env python
"""MNIST Deep Learning Demo with ANN, Activations, Losses, Optimizers,
Regularization, Callbacks, and Hyperparameter Tuning.

This notebook can be run cell-by-cell for teaching and learning.
"""

import os
import datetime

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
    CSVLogger,
    TensorBoard,
)

print("TensorFlow version:", tf.__version__)

## 1. Load and Explore the MNIST Dataset

MNIST is a classic dataset of **28×28 grayscale images** of handwritten digits (0–9).

In [None]:
# Load MNIST data from Keras datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("Training data shape:", X_train.shape, y_train.shape)
print("Test data shape:", X_test.shape, y_test.shape)

# Number of classes (digits 0–9)
num_classes = len(np.unique(y_train))
print("Number of classes:", num_classes)

### Visualize Sample Digits

We plot a few sample images to understand how the raw data looks.

In [None]:
# Plot some sample images with labels
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

## 2. Data Preprocessing

- Flatten 28×28 images into 784-dimensional vectors
- Scale pixel values to `[0, 1]`
- Convert labels to **one-hot encoded vectors** for multi-class classification.
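To make these three steps concrete without downloading MNIST, here is a minimal NumPy sketch on a tiny synthetic batch. The arrays `fake_images` and `fake_labels` are made up for illustration, and `one_hot` is a hypothetical helper that mirrors what `to_categorical` does for integer labels.

```python
import numpy as np

# A tiny synthetic "dataset": 3 images of 28x28 uint8 pixels (random values)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
fake_labels = np.array([5, 0, 9])

# 1) Flatten: (n, 28, 28) -> (n, 784), converted to float32
flat = fake_images.reshape(-1, 28 * 28).astype("float32")

# 2) Scale pixel values from [0, 255] to [0, 1]
flat /= 255.0

# 3) One-hot encode: label k becomes a length-10 vector with a 1 at index k
def one_hot(labels, num_classes=10):
    """Hypothetical helper; mirrors to_categorical for integer labels."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

fake_labels_cat = one_hot(fake_labels)

print(flat.shape)           # (3, 784)
print(fake_labels_cat[0])   # 1.0 at index 5, zeros elsewhere
```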

In [None]:
# Flatten images: (n_samples, 28, 28) -> (n_samples, 784)
X_train_flat = X_train.reshape(-1, 28 * 28).astype("float32")
X_test_flat = X_test.reshape(-1, 28 * 28).astype("float32")

# Normalize pixel values to [0, 1]
X_train_flat /= 255.0
X_test_flat /= 255.0

# One-hot encode labels
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print("X_train_flat shape:", X_train_flat.shape)
print("y_train_cat shape:", y_train_cat.shape)

## 3. Utility: Plot Training History

We define a helper function to easily compare **training vs. validation loss and accuracy**.

In [None]:
def plot_history(history, title_prefix="Model"):
    """Plot training & validation loss and accuracy from a Keras History object."""
    if history is None:
        print("No history to plot.")
        return

    hist = history.history
    epochs = range(1, len(hist.get('loss', [])) + 1)

    plt.figure(figsize=(12, 4))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, hist.get('loss', []), label='Train Loss')
    if 'val_loss' in hist:
        plt.plot(epochs, hist['val_loss'], label='Val Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(f"{title_prefix} - Loss")
    plt.legend()

    # Plot accuracy (if available)
    plt.subplot(1, 2, 2)
    if 'accuracy' in hist:
        plt.plot(epochs, hist['accuracy'], label='Train Acc')
    if 'val_accuracy' in hist:
        plt.plot(epochs, hist['val_accuracy'], label='Val Acc')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title(f"{title_prefix} - Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()

## 4. Baseline ANN (No Regularization, Simple Optimizer)

Here we build a simple fully connected network:

- Input: 784 features
- Hidden layer: 128 neurons, **ReLU activation**
- Output layer: 10 neurons, **Softmax activation**
- Loss: **Categorical Crossentropy** (multi-class classification)
- Optimizer: **SGD** (Stochastic Gradient Descent)

This model helps us understand a basic ANN before adding optimizations and regularization.
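The forward pass of this baseline can be traced by hand. The NumPy sketch below assumes random weights rather than trained ones (`W1`, `b1`, `W2`, `b2` are illustrative names): it applies ReLU, then softmax, and computes categorical crossentropy against one-hot targets. It also checks the parameter count that `summary()` should report for a 784-128-10 network: (784×128 + 128) + (128×10 + 10) = 101,770.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # Keep positive values, replace negatives with 0
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Batch mean of -sum_k y_true[k] * log(y_prob[k])
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Illustrative random weights for a 784 -> 128 -> 10 network
W1 = rng.normal(0.0, 0.05, size=(784, 128))
b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.05, size=(128, 10))
b2 = np.zeros(10)

x = rng.random((4, 784))          # 4 fake flattened images with values in [0, 1)
y = np.eye(10)[[3, 1, 4, 1]]      # one-hot targets for labels 3, 1, 4, 1

hidden = relu(x @ W1 + b1)        # hidden activations, shape (4, 128)
probs = softmax(hidden @ W2 + b2) # class probabilities, shape (4, 10)
loss = categorical_crossentropy(y, probs)

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                   # 101770
print(probs.sum(axis=1))          # each row sums to ~1
print(loss)
```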

In [None]:
# Build a simple baseline ANN model
baseline_model = Sequential([
    Dense(128, activation='relu', input_shape=(28 * 28,)),  # ReLU activation in hidden layer
    Dense(num_classes, activation='softmax')                # Softmax for multi-class output
])

# Compile the model
baseline_model.compile(
    optimizer='sgd',                      # Basic optimizer
    loss='categorical_crossentropy',      # Standard loss for multi-class classification
    metrics=['accuracy']
)

baseline_model.summary()

In [None]:
# Train the baseline model (few epochs for demo)
history_baseline = baseline_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=5,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

plot_history(history_baseline, title_prefix="Baseline ANN (SGD)")

## 5. Improved ANN with Better Optimizer (Adam)

Now we switch to a more powerful optimizer:

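- **Adam** combines momentum (a running average of past gradients) with RMSProp-style per-parameter step scaling (a running average of squared gradients).
- With default settings it often converges faster than plain SGD.

We also make the network deeper and wider: two hidden layers (256 and 128 units) instead of one.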
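To see what Adam adds on top of plain SGD, here is a minimal pure-Python sketch of both update rules minimizing a toy 1-D loss f(w) = (w - 3)². The Adam constants (beta1=0.9, beta2=0.999, eps=1e-7) match the Keras defaults; the toy loss, learning rate, and step count are assumptions chosen only for illustration. This is not a benchmark: it only makes Adam's two running averages visible.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=300):
    for _ in range(steps):
        w -= lr * grad(w)                           # plain gradient step
    return w

def run_adam(w=0.0, lr=0.1, steps=300, beta1=0.9, beta2=0.999, eps=1e-7):
    m, v = 0.0, 0.0                                 # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g             # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g         # RMSProp: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled by gradient history
    return w

print(run_sgd())    # very close to 3.0
print(run_adam())   # also close to 3.0
```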

In [None]:
adam_model = Sequential([
    Dense(256, activation='relu', input_shape=(28 * 28,)),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax')
])

adam_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

adam_model.summary()

In [None]:
history_adam = adam_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=10,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

plot_history(history_adam, title_prefix="Improved ANN (Adam)")

## 6. ANN with Regularization + Callbacks

Now we combine multiple **advanced deep learning concepts**:

### Regularization Techniques
- **L2 weight decay** (penalize large weights)
- **Dropout** (randomly drop neurons during training)
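Both regularizers are short to write out. The NumPy sketch below shows (a) the penalty that `regularizers.l2(1e-4)` adds to the loss for one layer, which in Keras is `l2 * sum(w**2)` over that layer's kernel, and (b) inverted dropout as applied during training: zero a random fraction `rate` of activations and scale the survivors by `1 / (1 - rate)` so the expected activation is unchanged; at inference time dropout passes inputs through untouched. The array shapes and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(kernel, l2=1e-4):
    # Extra loss term for one layer: l2 * sum of squared weights
    return l2 * float(np.sum(kernel ** 2))

def inverted_dropout(activations, rate, rng, training=True):
    # At inference time dropout is a no-op
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate  # keep each unit with probability 1 - rate
    return activations * keep_mask / (1.0 - rate)      # rescale so the expected value is unchanged

kernel = np.full((4, 3), 0.5)          # toy 4x3 kernel, every weight 0.5
print(l2_penalty(kernel))              # ~3e-4  (1e-4 * 12 * 0.25)

acts = np.ones((1000, 512))
dropped = inverted_dropout(acts, rate=0.3, rng=rng)
print((dropped == 0).mean())           # roughly 0.3 of the units are zeroed
print(dropped.mean())                  # roughly 1.0 thanks to the rescaling
```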

### Callbacks
- `EarlyStopping` → stop when validation loss stops improving
- `ModelCheckpoint` → save the best model (lowest validation loss) to disk
- `ReduceLROnPlateau` → reduce the learning rate when validation loss plateaus
- `CSVLogger` → log training metrics to a CSV file
- `TensorBoard` → visualize training (optional)

These callbacks are standard tools for training beyond quick demos: they avoid wasted epochs, keep the best model, and leave a record of each run.
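The stopping and learning-rate rules are plain bookkeeping over the monitored metric. The pure-Python sketch below replays a hypothetical sequence of validation losses (made-up numbers, not results from this notebook) through simplified versions of the `EarlyStopping(patience=5)` and `ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-6)` logic. The function `simulate_callbacks` is an illustrative name; the real Keras callbacks also support options such as `min_delta` and `cooldown`, which this sketch ignores.

```python
def simulate_callbacks(val_losses, lr=1e-3, es_patience=5,
                       lr_factor=0.5, lr_patience=3, min_lr=1e-6):
    """Simplified EarlyStopping + ReduceLROnPlateau bookkeeping (no min_delta/cooldown)."""
    best, best_epoch = float("inf"), None
    es_wait = 0   # epochs since the last improvement (EarlyStopping)
    lr_wait = 0   # epochs since the last improvement or LR cut (ReduceLROnPlateau)
    lr_history = []

    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
            es_wait = lr_wait = 0
        else:
            es_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * lr_factor, min_lr)  # halve the LR, but never below min_lr
                lr_wait = 0
        lr_history.append(lr)
        if es_wait >= es_patience:
            return epoch, best_epoch, lr_history  # training would stop after this epoch
    return len(val_losses) - 1, best_epoch, lr_history


# Hypothetical validation losses: improve for 3 epochs, then plateau
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.16, 0.18, 0.17, 0.19]
stop_epoch, best_epoch, lrs = simulate_callbacks(losses)
print(stop_epoch, best_epoch)  # 7 2  (0-indexed epochs)
print(lrs)                     # LR halved once, at epoch 5
```

With `restore_best_weights=True`, the weights from `best_epoch` would be restored when training stops.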

In [None]:
# Directory setup for saving models and logs
output_dir = "mnist_training_outputs"
os.makedirs(output_dir, exist_ok=True)

checkpoint_path = os.path.join(output_dir, "best_model.h5")
csv_log_path = os.path.join(output_dir, "training_log.csv")
log_dir = os.path.join(output_dir, "logs_" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

print("Checkpoint path:", checkpoint_path)
print("CSV log path:", csv_log_path)
print("TensorBoard log dir:", log_dir)

In [None]:
# Build a regularized ANN model

regularized_model = Sequential([
    # First hidden layer with L2 regularization
    Dense(
        512,
        activation='relu',
        input_shape=(28 * 28,),
        kernel_regularizer=regularizers.l2(1e-4)  # L2 penalty on weights
    ),
    Dropout(0.3),  # Dropout: randomly drop 30% of neurons

    # Second hidden layer with L2 regularization
    Dense(
        256,
        activation='relu',
        kernel_regularizer=regularizers.l2(1e-4)
    ),
    Dropout(0.3),

    # Output layer (Softmax for multi-class classification)
    Dense(num_classes, activation='softmax')
])

# Compile with Adam optimizer and categorical crossentropy loss
regularized_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

regularized_model.summary()

In [None]:
# Set up callbacks

early_stopping_cb = EarlyStopping(
    monitor='val_loss',           # watch validation loss
    patience=5,                   # epochs to wait for improvement
    restore_best_weights=True,    # roll back to best weights
    verbose=1
)

model_checkpoint_cb = ModelCheckpoint(
    filepath=checkpoint_path,
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

reduce_lr_cb = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,           # reduce LR by half
    patience=3,           # after 3 epochs of no improvement
    min_lr=1e-6,
    verbose=1
)

csv_logger_cb = CSVLogger(csv_log_path)

tensorboard_cb = TensorBoard(log_dir=log_dir)

callbacks_list = [
    early_stopping_cb,
    model_checkpoint_cb,
    reduce_lr_cb,
    csv_logger_cb,
    tensorboard_cb,
]

In [None]:
# Train the regularized model with callbacks
history_reg = regularized_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=50,               # we allow up to 50 epochs, but EarlyStopping will likely cut it
    batch_size=128,
    validation_split=0.1,
    callbacks=callbacks_list,
    verbose=1
)

plot_history(history_reg, title_prefix="Regularized ANN (Adam + L2 + Dropout + Callbacks)")

## 7. Evaluate Best Model on Test Data

We now load the **best saved model** (according to validation loss) and evaluate
it on the **test set** to see how well it generalizes.

In [None]:
# Load the best model from checkpoint (optional, but recommended)
if os.path.exists(checkpoint_path):
    best_model = load_model(checkpoint_path)
    print("Loaded best model from checkpoint.")
else:
    best_model = regularized_model
    print("Checkpoint not found; using the last trained regularized model.")

# Evaluate on test data
test_loss, test_acc = best_model.evaluate(X_test_flat, y_test_cat, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")

## 8. Make Predictions and Visualize

Let's predict some digits from the test set and compare predictions with true labels.
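`predict` returns one probability vector per image, and `np.argmax(..., axis=1)` picks the most probable digit. The sketch below shows that step, plus the matching accuracy computation, on a made-up 3×10 probability array (`probs` and `true_labels` are illustrative, not model outputs).

```python
import numpy as np

# Made-up softmax outputs for 3 images (each row sums to 1)
probs = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.00, 0.00, 0.00, 0.85, 0.03, 0.02],
    [0.00, 0.00, 0.90, 0.05, 0.00, 0.00, 0.05, 0.00, 0.00, 0.00],
    [0.10, 0.60, 0.00, 0.00, 0.20, 0.00, 0.00, 0.10, 0.00, 0.00],
])
true_labels = np.array([7, 2, 4])

pred_labels = np.argmax(probs, axis=1)   # most probable class per row
confidence = probs.max(axis=1)           # probability assigned to that class
accuracy = float(np.mean(pred_labels == true_labels))

print(pred_labels)   # [7 2 1]
print(confidence)
print(accuracy)      # 2 of 3 correct
```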

In [None]:
# Predict probabilities for the first 10 test images
y_pred_probs = best_model.predict(X_test_flat[:10])
y_pred = np.argmax(y_pred_probs, axis=1)

print("Predicted labels:", y_pred)
print("True labels:     ", y_test[:10])

# Visualize the corresponding images
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[i], cmap='gray')
    plt.title(f"Pred: {y_pred[i]}\nTrue: {y_test[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

## 9. Hyperparameter Tuning

Now we add **hyperparameter tuning** to search for better training settings and, with KerasTuner, a better architecture.

We will demonstrate two approaches:

1. **Manual grid search** over learning rate and batch size.
2. **Automated search** using `KerasTuner` (if available), tuning:
   - Number of units in hidden layer(s)
   - Number of hidden layers
   - Dropout rate
   - Learning rate

> Note: For large searches, this can take time. You can reduce the number of trials/epochs for quick experiments.

### 9.1 Simple Manual Hyperparameter Search

We try a few combinations of:

- `learning_rate`: [0.01, 0.001]
- `batch_size`: [64, 128]

And track the validation accuracy.
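A grid search is the Cartesian product of the candidate values. The sketch below enumerates the same 2×2 grid with `itertools.product` and keeps the best entry by validation accuracy. Training real models would be slow here, so `fake_val_accuracy` is a hypothetical stand-in scoring function; in the notebook its role is played by training `build_simple_model` and reading `history.history['val_accuracy']`.

```python
from itertools import product

grid_lrs = [1e-2, 1e-3]
grid_batch_sizes = [64, 128]

def fake_val_accuracy(lr, bs):
    """Hypothetical stand-in for 'train a model and return validation accuracy'."""
    # Toy score that prefers lr near 1e-3 and smaller batches; NOT a real result
    return 0.99 - abs(lr - 1e-3) * 5 - bs / 100_000

toy_results = [
    {"learning_rate": lr, "batch_size": bs, "val_accuracy": fake_val_accuracy(lr, bs)}
    for lr, bs in product(grid_lrs, grid_batch_sizes)
]

best_toy = max(toy_results, key=lambda r: r["val_accuracy"])
print(len(toy_results))                                  # 4 combinations
print(best_toy["learning_rate"], best_toy["batch_size"])  # 0.001 64
```

The grid grows multiplicatively: adding a third hyperparameter with 3 values would triple the number of training runs, which is why random or model-based search is preferred for larger spaces.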

In [None]:
def build_simple_model(learning_rate=1e-3):
    """Build a simple ANN model with a configurable learning rate."""
    model = Sequential([
        Dense(256, activation='relu', input_shape=(28 * 28,)),
        Dense(128, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

learning_rates = [1e-2, 1e-3]
batch_sizes = [64, 128]

results = []

for lr in learning_rates:
    for bs in batch_sizes:
        print(f"\n=== Training with lr={lr}, batch_size={bs} ===")
        model = build_simple_model(learning_rate=lr)
        history = model.fit(
            X_train_flat,
            y_train_cat,
            epochs=5,
            batch_size=bs,
            validation_split=0.1,
            verbose=0
        )
        val_acc = history.history['val_accuracy'][-1]
        print(f"Validation accuracy: {val_acc:.4f}")
        results.append({
            'learning_rate': lr,
            'batch_size': bs,
            'val_accuracy': float(val_acc)
        })

print("\nManual search results:")
for r in results:
    print(r)

### 9.2 Hyperparameter Tuning with KerasTuner (Optional)

We now use **KerasTuner** for a more systematic search.

If KerasTuner is not installed, run in a notebook cell:

```bash
!pip install keras-tuner -q
```

Then re-run the import cell below.

In [None]:
try:
    import keras_tuner as kt
    print("KerasTuner version:", kt.__version__)
    kt_available = True
except ImportError:
    print("KerasTuner is not installed. Install it with: !pip install keras-tuner")
    kt_available = False

In [None]:
if kt_available:
    def build_hyper_model(hp):
        """Build a model for KerasTuner with multiple tunable hyperparameters."""
        model = Sequential()

        # Tune number of hidden layers: 1 to 3
        n_hidden = hp.Int('n_hidden', min_value=1, max_value=3, step=1)

        for i in range(n_hidden):
            # Tune units per layer: 64–512
            units = hp.Int(f'units_{i}', min_value=64, max_value=512, step=64)
            if i == 0:
                model.add(Dense(units, activation='relu', input_shape=(28 * 28,)))
            else:
                model.add(Dense(units, activation='relu'))

            # Tune dropout rate for each hidden layer
            dropout_rate = hp.Float(f'dropout_{i}', min_value=0.0, max_value=0.5, step=0.1)
            if dropout_rate > 0:
                model.add(Dropout(dropout_rate))

        # Output layer
        model.add(Dense(num_classes, activation='softmax'))

        # Tune learning rate for Adam optimizer
        lr = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
        opt = tf.keras.optimizers.Adam(learning_rate=lr)

        model.compile(
            optimizer=opt,
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )

        return model

    tuner_dir = 'kt_mnist_tuner'
    os.makedirs(tuner_dir, exist_ok=True)

    tuner = kt.RandomSearch(
        build_hyper_model,
        objective='val_accuracy',
        max_trials=5,          # increase for a more thorough search
        executions_per_trial=1,
        directory=tuner_dir,
        project_name='mnist_ann_tuning'
    )

    tuner.search(
        X_train_flat,
        y_train_cat,
        epochs=10,
        validation_split=0.1,
        verbose=1
    )

    best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
    print("\nBest hyperparameters found:")
    for k in best_hp.values.keys():
        print(k, ":", best_hp.get(k))

    best_hyper_model = tuner.get_best_models(num_models=1)[0]

    test_loss_hp, test_acc_hp = best_hyper_model.evaluate(X_test_flat, y_test_cat, verbose=0)
    print(f"\n[HyperTuned Model] Test Loss: {test_loss_hp:.4f}, Test Accuracy: {test_acc_hp:.4f}")
else:
    print("KerasTuner not available; skipping automated hyperparameter tuning.")

## 10. Summary of Concepts Applied

In this single notebook, we applied the following deep learning concepts on MNIST:

1. **Artificial Neural Network (ANN)**
   - Fully connected layers using `Dense`.

2. **Activation Functions**
   - Hidden layers: `ReLU`
   - Output layer: `Softmax` for multi-class classification.

3. **Loss Function**
   - `categorical_crossentropy` for multi-class classification.

4. **Optimization Techniques**
   - Baseline: `SGD`
   - Improved: `Adam` optimizer with learning rate tuning.

5. **Regularization Techniques**
   - L2 weight regularization (`kernel_regularizer=regularizers.l2(...)`)
   - Dropout layers to reduce overfitting
   - Early stopping (via callback)

6. **Callbacks**
   - `EarlyStopping` → stop training when validation loss stops improving
   - `ModelCheckpoint` → save the best model
   - `ReduceLROnPlateau` → lower learning rate when learning saturates
   - `CSVLogger` → log training history to CSV
   - `TensorBoard` → visualize training with TensorBoard

7. **Hyperparameter Tuning**
   - Manual search over learning rate & batch size
   - Automated search using `KerasTuner` (if installed) over:
     - Number of layers
     - Units per layer
     - Dropout rate
     - Learning rate

You can further extend this notebook by:
- Trying different activation functions (e.g., `tanh`, `LeakyReLU`)
- Changing optimizers (e.g., `RMSprop`, `AdamW`)
- Adding Batch Normalization
- Converting this fully connected ANN to a Convolutional Neural Network (CNN).
- Expanding the hyperparameter search space and trials for better results.

---
# 🔥 Advanced Section: Optuna Hyperparameter Optimization

The following section adds **Optuna-based hyperparameter tuning** on top of the
previous Keras-based ANN experiments. You can run it after training the earlier
models, or independently.


# MNIST ANN with Optuna Hyperparameter Optimization

This part shows how to use **Optuna** to tune hyperparameters of a Keras
Artificial Neural Network (ANN) on the **MNIST** dataset.

Hyperparameters tuned:
- Number of units in each hidden layer
- Number of hidden layers
- Dropout rate
- Learning rate
- Batch size

The code is heavily commented for teaching and self-study.

In [None]:
#!/usr/bin/env python
"""MNIST + Optuna Hyperparameter Tuning Demo."""

import os

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical

print("TensorFlow version:", tf.__version__)

## 1. Load and Preprocess MNIST

We load MNIST, flatten the images, normalize pixel values, and one-hot encode labels.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(-1, 28 * 28).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28 * 28).astype("float32") / 255.0

num_classes = len(np.unique(y_train))
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print("Train:", X_train.shape, y_train_cat.shape)
print("Test:", X_test.shape, y_test_cat.shape)

### Quick Visualization

In [None]:
plt.figure(figsize=(6,3))
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(X_train[i].reshape(28,28), cmap='gray')
    plt.title(int(np.argmax(y_train_cat[i])))
    plt.axis('off')
plt.tight_layout()
plt.show()

## 2. Install and Import Optuna

If Optuna is not installed in your environment, run the following in a notebook cell:

```bash
!pip install optuna -q
```

Then import it as below.

In [None]:
try:
    import optuna
    print("Optuna version:", optuna.__version__)
    optuna_available = True
except ImportError:
    print("Optuna is not installed. Install it with: !pip install optuna")
    optuna_available = False

## 3. Define Objective Function for Optuna

Optuna works by repeatedly calling an **objective function**.

In each trial, Optuna will:
- Suggest values for hyperparameters (units, layers, dropout, learning rate, batch size)
- Build and train a model using those hyperparameters
- Return a metric to **maximize** or **minimize** (here: validation accuracy to maximize).
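The suggest, train, return loop can be imitated without Optuna. The sketch below is plain random search, not Optuna's default TPE sampler: each trial samples the same kinds of hyperparameters (an integer layer count, units in steps of 64, a dropout rate, a learning rate drawn log-uniformly between 1e-4 and 1e-2, a categorical batch size), scores them, and keeps the best. `sample_params`, `toy_objective`, and `random_search` are hypothetical names, and `toy_objective` is a made-up stand-in for building, training, and returning validation accuracy.

```python
import math
import random

def sample_params(rng):
    """Sample one configuration from a search space like the one used with Optuna."""
    params = {"n_hidden": rng.randint(1, 3)}                        # inclusive bounds
    for i in range(params["n_hidden"]):
        params[f"units_{i}"] = rng.choice(range(64, 513, 64))       # 64, 128, ..., 512
        params[f"dropout_{i}"] = rng.choice([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
    # Log-uniform sampling: uniform in log space, then exponentiate
    params["learning_rate"] = math.exp(rng.uniform(math.log(1e-4), math.log(1e-2)))
    params["batch_size"] = rng.choice([64, 128, 256])
    return params

def toy_objective(params):
    """Hypothetical stand-in for 'build, train, and return validation accuracy'."""
    # Made-up score that prefers learning rates near 1e-3 and a wider first layer
    lr_penalty = abs(math.log10(params["learning_rate"]) + 3.0) * 0.01
    width_bonus = params["units_0"] / 512 * 0.01
    return 0.97 - lr_penalty + width_bonus

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_value, best_params = -math.inf, None
    for _ in range(n_trials):
        params = sample_params(rng)
        value = toy_objective(params)
        if value > best_value:              # maximize, like direction="maximize"
            best_value, best_params = value, params
    return best_value, best_params

rs_best_value, rs_best_params = random_search()
print(round(rs_best_value, 4), rs_best_params)
```

Log-uniform sampling matters for the learning rate because 1e-4 to 1e-3 and 1e-3 to 1e-2 then receive equal sampling probability; a plain uniform draw over [1e-4, 1e-2] would land above 1e-3 about 90% of the time.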

In [None]:
def create_model(trial):
    """Create a Keras ANN model whose hyperparameters are sampled by Optuna trial."""
    model = Sequential()

    # Number of hidden layers: 1–3
    n_hidden = trial.suggest_int("n_hidden", 1, 3)

    for i in range(n_hidden):
        # Units per layer: 64–512
        units = trial.suggest_int(f"units_{i}", 64, 512, step=64)

        if i == 0:
            # First layer needs input_shape
            model.add(Dense(units, activation="relu", input_shape=(28*28,)))
        else:
            model.add(Dense(units, activation="relu"))

        # Dropout rate 0.0–0.5
        dropout_rate = trial.suggest_float(f"dropout_{i}", 0.0, 0.5, step=0.1)
        if dropout_rate > 0:
            model.add(Dropout(dropout_rate))

    # Output layer: 10 classes, softmax
    model.add(Dense(num_classes, activation="softmax"))

    # Learning rate for Adam
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)

    model.compile(
        optimizer=optimizer,
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )

    return model


def objective(trial):
    """Optuna objective: build, train, and evaluate a model; return validation accuracy."""
    model = create_model(trial)

    # Tune batch size as well
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])

    # We use a small number of epochs for speed; increase for better performance.
    history = model.fit(
        X_train,
        y_train_cat,
        validation_split=0.1,
        epochs=5,
        batch_size=batch_size,
        verbose=0,
    )

    # Use the last validation accuracy as the objective value
    val_acc = history.history["val_accuracy"][-1]
    return val_acc

## 4. Run the Optuna Study

We now create a study to **maximize validation accuracy** and run a limited
number of trials (e.g., 10). You can increase `n_trials` for a deeper search
if you have more compute time.

In [None]:
if optuna_available:
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10, show_progress_bar=True)

    print("\nBest trial:")
    best_trial = study.best_trial
    print("  Value (val_accuracy):", best_trial.value)
    print("  Params:")
    for k, v in best_trial.params.items():
        print("    {}: {}".format(k, v))
else:
    print("Optuna not available; please install it to run the study.")

## 5. Retrain the Best-Found Model & Evaluate on the Test Set

After Optuna finishes, we rebuild the model with the best hyperparameters,
train it for more epochs (still holding out 10% of the training data for
validation), and evaluate it on the test set.

In [None]:
if optuna_available:
    best_params = study.best_trial.params
    print("\nRebuilding model using best hyperparameters...")

    # Optuna's FixedTrial replays a fixed set of parameters through the same
    # suggest_* calls, so create_model() can be reused unchanged
    fixed_trial = optuna.trial.FixedTrial(best_params)
    best_model = create_model(fixed_trial)

    best_batch_size = best_params.get("batch_size", 128)

    history_best = best_model.fit(
        X_train,
        y_train_cat,
        validation_split=0.1,
        epochs=10,
        batch_size=best_batch_size,
        verbose=1,
    )

    test_loss, test_acc = best_model.evaluate(X_test, y_test_cat, verbose=0)
    print(f"\nBest Optuna-tuned model - Test loss: {test_loss:.4f}, Test accuracy: {test_acc:.4f}")
else:
    print("Skipping best model training because Optuna is not available.")

## 6. Visualize a Few Predictions from the Tuned Model

In [None]:
if optuna_available:
    preds = best_model.predict(X_test[:10])
    y_pred = np.argmax(preds, axis=1)

    plt.figure(figsize=(8,3))
    for i in range(10):
        plt.subplot(2,5,i+1)
        plt.imshow(X_test[i].reshape(28,28), cmap='gray')
        plt.title(f"Pred: {y_pred[i]}\nTrue: {y_test[i]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
else:
    print("Install Optuna and re-run the tuning section to see predictions.")

## 7. Summary

In this notebook we:

- Loaded and preprocessed the **MNIST** dataset.
- Defined a flexible ANN architecture with:
  - Tunable number of layers
  - Tunable number of units
  - Tunable dropout rate
  - Tunable learning rate & batch size
- Used **Optuna** to perform hyperparameter optimization by maximizing
  validation accuracy.
- Retrained the best-found model and evaluated it on the test set.

You can extend this by:
- Increasing `n_trials` for better search.
- Adding L2 regularization as another hyperparameter.
- Tuning different optimizers (e.g., Adam vs. RMSprop).
- Passing callbacks such as `EarlyStopping` to `model.fit` inside `objective`,
  so each trial stops once its validation loss stops improving.