# MNIST ANN with Optuna Hyperparameter Optimization

This notebook shows how to use **Optuna** to tune hyperparameters of a Keras
Artificial Neural Network (ANN) on the **MNIST** dataset.

Hyperparameters tuned:
- Number of units in each hidden layer
- Number of hidden layers
- Dropout rate for each hidden layer
- Learning rate
- Batch size

The code is heavily commented for teaching and self-study.

In [None]:
#!/usr/bin/env python
"""MNIST + Optuna Hyperparameter Tuning Demo."""

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical

print("TensorFlow version:", tf.__version__)

## 1. Load and Preprocess MNIST

We load MNIST, flatten each 28×28 image into a 784-element vector, scale pixel values to the range [0, 1], and one-hot encode the labels.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(-1, 28 * 28).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28 * 28).astype("float32") / 255.0

num_classes = len(np.unique(y_train))
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print("Train:", X_train.shape, y_train_cat.shape)
print("Test:", X_test.shape, y_test_cat.shape)
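
To make the three preprocessing steps concrete without downloading the dataset, here is a minimal sketch on synthetic data. The random `images` array and the `labels` values are made up for illustration, and the identity-matrix indexing is a NumPy stand-in for `to_categorical`, not the Keras implementation.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 random "images" with uint8 pixels and integer labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Flatten each 28x28 image into a 784-vector and scale pixels into [0, 1].
flat = images.reshape(-1, 28 * 28).astype("float32") / 255.0

# One-hot encode by picking rows of a 10x10 identity matrix (one row per class).
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape, flat.dtype)    # (4, 784) float32
print(one_hot.shape)             # (4, 10)
print(one_hot.argmax(axis=1))    # [3 0 9 3]
```

Taking `argmax` of a one-hot row recovers the original integer label, which is what the visualization cell relies on.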

### Quick Visualization

In [None]:
plt.figure(figsize=(6,3))
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(X_train[i].reshape(28,28), cmap='gray')
    plt.title(int(np.argmax(y_train_cat[i])))
    plt.axis('off')
plt.tight_layout()
plt.show()

## 2. Install and Import Optuna

If Optuna is not installed in your environment, run the following in a notebook cell:

```bash
!pip install optuna -q
```

Then import it as below.

In [None]:
try:
    import optuna
    print("Optuna version:", optuna.__version__)
    optuna_available = True
except ImportError:
    print("Optuna is not installed. Install it with: !pip install optuna")
    optuna_available = False

## 3. Define Objective Function for Optuna

Optuna works by repeatedly calling an **objective function**.

In each trial, Optuna will:
- Suggest values for hyperparameters (units, layers, dropout, learning rate, batch size)
- Build and train a model using those hyperparameters
- Return a metric to **maximize** or **minimize** (here: validation accuracy to maximize).
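
The suggest, train, and return cycle listed above can be sketched without Optuna or Keras. In this toy version, plain random sampling stands in for Optuna's sampler (which is more sophisticated), and `toy_objective` is a hypothetical scoring function that trains nothing; both are assumptions made purely to show the shape of the loop.

```python
import math
import random


def toy_objective(params):
    """Hypothetical stand-in for 'train a model and return validation accuracy'.

    The score peaks at learning_rate=1e-3 and units=256; no model is trained.
    """
    lr_penalty = (math.log10(params["learning_rate"]) + 3.0) ** 2
    units_penalty = ((params["units"] - 256) / 256) ** 2
    return 1.0 - 0.1 * lr_penalty - 0.1 * units_penalty


def suggest(rng):
    """Sample one configuration using plain random sampling."""
    return {
        "units": rng.randrange(64, 513, 64),             # 64, 128, ..., 512
        "learning_rate": 10 ** rng.uniform(-4.0, -2.0),  # log-uniform over [1e-4, 1e-2]
    }


rng = random.Random(0)
best_value, best_params = -math.inf, None
for trial_number in range(20):
    params = suggest(rng)            # 1. suggest hyperparameters
    value = toy_objective(params)    # 2. "train" and score them
    if value > best_value:           # 3. keep the best, as direction="maximize" would
        best_value, best_params = value, params

print(best_params, round(best_value, 4))
```

Sampling the exponent uniformly, rather than the learning rate itself, spreads trials evenly across orders of magnitude; this is the same idea as passing `log=True` to `suggest_float`.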

In [None]:
def create_model(trial):
    """Create a Keras ANN model whose hyperparameters are sampled by Optuna trial."""
    model = Sequential()

    # Number of hidden layers: 1–3
    n_hidden = trial.suggest_int("n_hidden", 1, 3)

    for i in range(n_hidden):
        # Units per layer: 64–512
        units = trial.suggest_int(f"units_{i}", 64, 512, step=64)

        if i == 0:
            # First layer needs input_shape
            model.add(Dense(units, activation="relu", input_shape=(28*28,)))
        else:
            model.add(Dense(units, activation="relu"))

        # Dropout rate 0.0–0.5
        dropout_rate = trial.suggest_float(f"dropout_{i}", 0.0, 0.5, step=0.1)
        if dropout_rate > 0:
            model.add(Dropout(dropout_rate))

    # Output layer: 10 classes, softmax
    model.add(Dense(num_classes, activation="softmax"))

    # Learning rate for Adam
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)

    model.compile(
        optimizer=optimizer,
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )

    return model


def objective(trial):
    """Optuna objective: build, train, and evaluate a model; return validation accuracy."""
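    # Release Keras state left over from earlier trials so memory does not accumulate.
    tf.keras.backend.clear_session()
    model = create_model(trial)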

    # Tune batch size as well
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])

    # We use a small number of epochs for speed; increase for better performance.
    history = model.fit(
        X_train,
        y_train_cat,
        validation_split=0.1,
        epochs=5,
        batch_size=batch_size,
        verbose=0,
    )

    # Use the last validation accuracy as the objective value
    val_acc = history.history["val_accuracy"][-1]
    return val_acc
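
It helps to know which samples `validation_split=0.1` actually holds out. The Keras documentation states that the validation data is taken from the last samples of the arrays, before any shuffling. The helper below, `split_like_validation_split`, is a hypothetical NumPy approximation of that documented behavior, not Keras's own code.

```python
import numpy as np


def split_like_validation_split(x, y, validation_split):
    """Hold out the final fraction of samples, unshuffled (approximates Keras)."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Index arrays stand in for the 60,000 MNIST training images and labels.
x = np.arange(60_000)
y = np.arange(60_000) % 10

(x_fit, y_fit), (x_val, y_val) = split_like_validation_split(x, y, validation_split=0.1)
print(len(x_fit), len(x_val))    # 54000 6000
print(x_val[0], x_val[-1])       # 54000 59999
```

Because the held-out slice is the same for every trial, all trials are scored on the same 6,000 images. The best validation accuracy is still selected from many attempts, so it tends to be optimistic, which is why the final check uses the separate test set.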

## 4. Run the Optuna Study

We now create a study to **maximize validation accuracy** and run a limited
number of trials (e.g., 10). You can increase `n_trials` for a deeper search
if you have more compute time.

In [None]:
if optuna_available:
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10, show_progress_bar=True)

    print("\nBest trial:")
    best_trial = study.best_trial
    print("  Value (val_accuracy):", best_trial.value)
    print("  Params:")
    for k, v in best_trial.params.items():
        print(f"    {k}: {v}")
else:
    print("Optuna not available; please install it to run the study.")
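
To put `n_trials=10` in perspective, the sketch below counts the discrete configurations implied by the `suggest_*` ranges in `create_model()` and `objective()`. The learning rate is continuous and is left out of the count, so the true search space is larger still.

```python
# Discrete choices taken from the suggest_* calls in create_model() and objective().
units_choices = len(range(64, 513, 64))    # 64, 128, ..., 512 -> 8 values
dropout_choices = 6                        # 0.0, 0.1, ..., 0.5
batch_choices = 3                          # 64, 128, 256

# Each hidden layer gets its own (units, dropout) pair.
per_layer = units_choices * dropout_choices

# Sum over architectures with 1, 2, and 3 hidden layers.
architectures = sum(per_layer ** n_hidden for n_hidden in (1, 2, 3))
discrete_configs = architectures * batch_choices

print(per_layer)           # 48
print(architectures)       # 112944
print(discrete_configs)    # 338832
```

Ten trials therefore sample only a tiny fraction of the space, so the "best" parameters from a short study are better read as a reasonable starting point than as an optimum.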

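## 5. Retrain the Best-Found Model & Evaluate on the Test Set

After Optuna finishes, we rebuild the model with the best hyperparameters and
train it for 10 epochs instead of 5, then evaluate on the test set. The cell
still passes `validation_split=0.1`, so 10% of the training data is held out
for monitoring rather than used for fitting.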

In [None]:
if optuna_available:
    best_params = study.best_trial.params
    print("\nRebuilding model using best hyperparameters...")

    # Minimal trial-like object that replays best_params, so create_model() can be reused
    class SimpleTrial:
        def __init__(self, params):
            self.params = params

        def suggest_int(self, name, low, high, step=1):
            return self.params[name]

        def suggest_float(self, name, low, high, step=None, log=False):
            return self.params[name]

        def suggest_categorical(self, name, choices):
            return self.params[name]

    dummy_trial = SimpleTrial(best_params)
    best_model = create_model(dummy_trial)

    best_batch_size = best_params.get("batch_size", 128)

    history_best = best_model.fit(
        X_train,
        y_train_cat,
        validation_split=0.1,
        epochs=10,
        batch_size=best_batch_size,
        verbose=1,
    )

    test_loss, test_acc = best_model.evaluate(X_test, y_test_cat, verbose=0)
    print(f"\nBest Optuna-tuned model - Test loss: {test_loss:.4f}, Test accuracy: {test_acc:.4f}")
else:
    print("Skipping best model training because Optuna is not available.")
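
The replay trick works because `create_model()` only touches the trial through its `suggest_*` methods and parameter names. The sketch below shows this without Keras: `ReplayTrial` mirrors `SimpleTrial`, `describe_architecture` is a hypothetical helper that issues the same `suggest_*` calls as the layer loop, and `example_params` holds made-up values in the same shape as `study.best_trial.params`.

```python
class ReplayTrial:
    """Trial stand-in that returns stored values; mirrors SimpleTrial."""

    def __init__(self, params):
        self.params = params

    def suggest_int(self, name, low, high, step=1):
        return self.params[name]

    def suggest_float(self, name, low, high, step=None, log=False):
        return self.params[name]


def describe_architecture(trial):
    """Hypothetical helper: same suggest_* calls as the layer loop in create_model(),
    but returns (units, dropout) pairs instead of building a Keras model."""
    layers = []
    for i in range(trial.suggest_int("n_hidden", 1, 3)):
        units = trial.suggest_int(f"units_{i}", 64, 512, step=64)
        dropout = trial.suggest_float(f"dropout_{i}", 0.0, 0.5, step=0.1)
        layers.append((units, dropout))
    return layers


# Made-up values, shaped like study.best_trial.params.
example_params = {
    "n_hidden": 2,
    "units_0": 256, "dropout_0": 0.2,
    "units_1": 128, "dropout_1": 0.0,
    "learning_rate": 1e-3, "batch_size": 128,
}

print(describe_architecture(ReplayTrial(example_params)))
# [(256, 0.2), (128, 0.0)]
```

A missing name raises `KeyError`, so a mismatch between the stored parameters and the names the builder requests fails loudly. Optuna also ships `optuna.trial.FixedTrial`, which accepts a parameter dictionary and can serve the same purpose.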

## 6. Visualize a Few Predictions from the Tuned Model

In [None]:
if optuna_available:
    preds = best_model.predict(X_test[:10])
    y_pred = np.argmax(preds, axis=1)

    plt.figure(figsize=(8,3))
    for i in range(10):
        plt.subplot(2,5,i+1)
        plt.imshow(X_test[i].reshape(28,28), cmap='gray')
        plt.title(f"Pred: {y_pred[i]}\nTrue: {y_test[i]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
else:
    print("Install Optuna and re-run the tuning section to see predictions.")
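
The prediction cell turns softmax outputs into class labels with `np.argmax`. A small sketch with made-up probabilities (4 samples, 3 classes, not real model output) shows that step, plus how accuracy and the misclassified indices follow from it.

```python
import numpy as np

# Made-up softmax outputs for 4 samples over 3 classes (each row sums to 1).
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
])
true_labels = np.array([0, 2, 1, 1])

pred_labels = np.argmax(probs, axis=1)                 # most probable class per row
accuracy = float(np.mean(pred_labels == true_labels))  # fraction of correct predictions
wrong = np.flatnonzero(pred_labels != true_labels)     # indices of misclassified samples

print(pred_labels)    # [0 2 1 0]
print(accuracy)       # 0.75
print(wrong)          # [3]
```

Plotting only the indices in `wrong` is often more informative than plotting the first ten test images, since most of those will be classified correctly.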

## 7. Summary

In this notebook we:

- Loaded and preprocessed the **MNIST** dataset.
- Defined a flexible ANN architecture with:
  - Tunable number of layers
  - Tunable number of units
  - Tunable dropout rate
  - Tunable learning rate & batch size
- Used **Optuna** to perform hyperparameter optimization by maximizing
  validation accuracy.
- Retrained the best-found model and evaluated it on the test set.

You can extend this by:
- Increasing `n_trials` for a more thorough search.
- Adding L2 regularization as another hyperparameter.
- Tuning different optimizers (e.g., Adam vs. RMSprop).
- Integrating this into a larger teaching notebook that covers ANN basics,
  activations, loss functions, regularization, callbacks, and Optuna-based tuning.