# MNIST Fashion Dataset CNN with TensorFlow

Jackson Eshbaugh &bull; CS 424 &bull; Spring 2025

**Abstract**: This notebook trains and tunes a convolutional neural network on Fashion-MNIST using `KerasTuner` with TensorBoard logging, early stopping, and checkpoint saving.

In [None]:
!pip install keras_tuner

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import keras_tuner as kt
from sklearn.model_selection import train_test_split
import datetime
import os

print("GPU available:", tf.config.list_physical_devices('GPU'))

## Selection: Run Pretrained Model or Tune Model Again?

**If you run the pretrained model, ensure**:

1. `model.weights.h5` is uploaded into the same folder as this file,
2. `hyperparams.json` is uploaded into the same folder as this file, and
3. `logs/trial_06/06/execution0` is uploaded into the same folder as this file.

All of these can be found within the GitHub repository.
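
As a rough illustration of what the loading cell expects, the sketch below writes and reads back a `hyperparams.json` containing the five keys used later in this notebook. The values are hypothetical placeholders, not the tuned values from the repository, and `REQUIRED_KEYS` and `check_hyperparams` are illustrative names that do not appear elsewhere in the notebook.

```python
import json
import os
import tempfile

# Keys that the pretrained-model cell reads from the JSON file.
REQUIRED_KEYS = {"conv1_filters", "conv2_filters", "dense_units",
                 "dropout_rate", "learning_rate"}

def check_hyperparams(path):
    """Load a hyperparameter file and fail early if a key is missing."""
    with open(path, "r") as f:
        hp = json.load(f)
    missing = REQUIRED_KEYS - set(hp)
    if missing:
        raise KeyError(f"Missing hyperparameters: {sorted(missing)}")
    return hp

# Hypothetical values, for illustration only.
example = {"conv1_filters": 32, "conv2_filters": 64, "dense_units": 128,
           "dropout_rate": 0.3, "learning_rate": 0.001}

# Round-trip through a temporary file so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hyperparams.json")
    with open(path, "w") as f:
        json.dump(example, f)
    loaded = check_hyperparams(path)

print(loaded["dense_units"])
```

A check like this fails with a clear message if the uploaded file is incomplete, instead of raising a `KeyError` halfway through building the model.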

In [None]:
run_tuning = "Load Pretrained Model"  #@param ["Run Hyperparameter Tuning", "Load Pretrained Model"]
run_tuning = run_tuning == "Run Hyperparameter Tuning"

## 1. Import the Dataset and Preprocess the Data

- Normalize pixel values to range [0, 1]
- Add a channel dimension (Keras `Conv2D` layers expect inputs of shape `(height, width, channels)`)
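
The two steps above can be sketched on a tiny example. It uses plain Python lists and a hypothetical 2x2 "image" instead of the real 28x28 arrays, purely to show what happens to the values and to the shape.

```python
# A hypothetical 2x2 grayscale "image" with raw pixel values in [0, 255].
image = [[0, 51],
         [204, 255]]

# Step 1: scale every pixel into [0, 1].
normalized = [[pixel / 255.0 for pixel in row] for row in image]

# Step 2: wrap each pixel in a length-1 list, adding a trailing channel axis.
with_channel = [[[pixel] for pixel in row] for row in normalized]

def shape(nested):
    """Return the shape of a regularly nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

print(shape(image))         # (2, 2)
print(shape(with_channel))  # (2, 2, 1)
```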

In [None]:
# Load the Fashion MNIST dataset

fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_pre, y_train_pre), (X_test, y_test) = fashion_mnist.load_data()

# Define class names

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
              'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Print the shape of the training images and labels

print("Training images shape:", X_train_pre.shape)
print("Training labels shape:", y_train_pre.shape)
print("Test images shape:", X_test.shape)
print("Test labels shape:", y_test.shape)

# Print the shape of a single image (no channel dimension yet)

print("First training image shape:", X_train_pre[0].shape)

# Normalize pixel values to [0, 1] and reshape for CNN

X_train_pre = X_train_pre.astype('float32') / 255.0
X_test  = X_test.astype('float32') / 255.0
X_train_pre = X_train_pre[..., tf.newaxis]  # shape is updated: (num_samples, 28, 28, 1)
X_test  = X_test[..., tf.newaxis]

# Get validation set

X_train, X_val, y_train, y_val = train_test_split(
    X_train_pre, y_train_pre, test_size=0.1, random_state=42
)

# Print the shape of a single image again; it now includes the channel dimension

print("First training image shape:", X_train[0].shape)

# Sizes

print(f"Train: {len(X_train)}, Val: {len(X_val)}, Test: {len(X_test)}")

## Just Run the Model

Below, you can load the pretrained model and evaluate it by selecting "Load Pretrained Model" from the dropdown above.

In [None]:
if not run_tuning:

  fashion_mnist = tf.keras.datasets.fashion_mnist
  (_, _), (X_test, y_test) = fashion_mnist.load_data()

  X_test = X_test.astype("float32") / 255.0
  X_test = X_test[..., tf.newaxis]  # Add channel dimension


  # File names for the pretrained weights and the saved hyperparameters
  # (keras and layers are already imported at the top of the notebook)
  import json
  filename = 'model.weights.h5'
  hyperparams_file = 'hyperparams.json'

  with open(hyperparams_file, 'r') as f:
      best_hp = json.load(f)

  model = keras.Sequential([
      keras.Input(shape=(28, 28, 1)),

      layers.Conv2D(best_hp['conv1_filters'], kernel_size=3, activation='relu', padding='same'),
      layers.MaxPooling2D(),

      layers.Conv2D(best_hp['conv2_filters'], kernel_size=3, activation='relu', padding='same'),
      layers.MaxPooling2D(),

      layers.Flatten(),
      layers.Dense(best_hp['dense_units'], activation='relu'),
      layers.Dropout(best_hp['dropout_rate']),
      layers.Dense(10, activation='softmax')
  ])

  model.compile(
      optimizer=keras.optimizers.Adam(learning_rate=best_hp['learning_rate']),
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy']
  )

  model.load_weights(filename)  # Load pretrained weights

  test_loss, test_acc = model.evaluate(X_test, y_test)
  print(f"Test accuracy: {test_acc:.4f}")

In [None]:
if not run_tuning:
  %load_ext tensorboard
  %tensorboard --logdir logs/

## 2. Define the Model

We propose a model built from 2D convolutional layers, which preserve the spatial structure of the input and are well suited to image processing. A dropout layer before the final prediction helps control overfitting. Each max-pooling layer halves the height and width of the feature maps, while the second convolutional layer can use more filters than the first. Shrinking the spatial resolution while increasing the number of filters lets the model learn richer features from the images.
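
To make the shape bookkeeping concrete, here is a small sketch in plain Python that traces the feature-map size through the two convolution and pooling blocks and counts trainable parameters. It assumes 3x3 kernels with `padding='same'` and the default 2x2 max pooling. `count_params` is an illustrative helper, and the filter and unit counts passed to it are hypothetical picks from the search space.

```python
def count_params(filters_1, filters_2, dense_units, size=28, channels=1,
                 kernel=3, num_classes=10):
    """Trace shapes and count weights plus biases for the two-block CNN."""
    # 'same' padding keeps height and width; each conv layer has
    # kernel*kernel*in_channels weights per filter, plus one bias per filter.
    conv1 = (kernel * kernel * channels + 1) * filters_1
    size //= 2                      # first 2x2 max pool: 28 -> 14
    conv2 = (kernel * kernel * filters_1 + 1) * filters_2
    size //= 2                      # second 2x2 max pool: 14 -> 7
    flat = size * size * filters_2  # length of the flattened vector
    dense = (flat + 1) * dense_units
    output = (dense_units + 1) * num_classes
    return flat, conv1 + conv2 + dense + output

flat, total = count_params(filters_1=32, filters_2=64, dense_units=128)
print(flat, total)  # 3136 421642
```

In this configuration most of the parameters sit in the first dense layer, which is why reducing the spatial size before `Flatten` matters.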

We minimize the sparse categorical crossentropy loss (the labels are integer class indices rather than one-hot vectors) using the Adam optimizer.
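
For reference, the `sparse_categorical_crossentropy` loss for one example is the negative log of the probability the model assigns to the true class, and the batch loss is the mean over examples. A minimal sketch in plain Python follows; the probabilities are made up, and this is not the Keras implementation, which also takes care of numerical edge cases.

```python
import math

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[true class]) over a batch of integer labels."""
    losses = [-math.log(p[label]) for label, p in zip(labels, probs)]
    return sum(losses) / len(losses)

# Two made-up softmax outputs over three classes.
probs = [[0.1, 0.7, 0.2],
         [0.8, 0.1, 0.1]]
labels = [1, 0]  # integer class indices, not one-hot vectors

loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```

The loss approaches zero as the probability assigned to the correct class approaches one.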

In [None]:
def build_model(hp):

  # Get the hyperparameters we're tuning.

  filters_1 = hp.Choice('conv1_filters', [32, 64])
  filters_2 = hp.Choice('conv2_filters', [64, 128])
  dense_units = hp.Int('dense_units', 64, 256, step=64)
  dropout_rate = hp.Float('dropout_rate', 0.3, 0.7, step=0.1)
  lr = hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')

  model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),

    layers.Conv2D(filters_1, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPooling2D(),

    layers.Conv2D(filters_2, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPooling2D(),

    layers.Flatten(),
    layers.Dense(dense_units, activation='relu'),
    layers.Dropout(dropout_rate),
    layers.Dense(10, activation='softmax')
  ])

  model.compile(
      optimizer=keras.optimizers.Adam(learning_rate=lr),  # reuse the value sampled above
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy']
  )

  return model

## 3. Hyperparameter Tuning

We now tune the model using **Bayesian Optimization**. To support trial-specific logging and checkpointing, we define a custom subclass of `BayesianOptimization` whose `run_trial` method assigns **unique callbacks** to each individual trial: `ModelCheckpoint` saves the best weights, `EarlyStopping` halts training once validation accuracy stops improving, and `TensorBoard` logs the learning curves. This ensures that the best model weights and learning curves are saved separately for each hyperparameter configuration.

Within the `build_model` function, we tune the following five hyperparameters:

1. **`conv1_filters`** – The number of filters in the first convolutional layer. This controls how much low-level detail (e.g., edges, corners) the model can extract from input images.

2. **`conv2_filters`** – The number of filters in the second convolutional layer. These deeper filters typically learn more complex patterns such as textures and shapes.

3. **`dense_units`** – The number of neurons in the dense (fully connected) layer before the output. This layer helps the model combine features to make final predictions, so tuning its size affects model capacity.

4. **`dropout_rate`** – The probability of "dropping" a neuron during training. Dropout is a form of regularization that helps prevent overfitting, especially on relatively small datasets like Fashion-MNIST.

5. **`learning_rate`** – The size of the step taken by the optimizer during each parameter update. This is a critical parameter for controlling convergence speed and training stability.
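
To get a feel for the size of this search space, the sketch below enumerates the discrete choices and draws a few learning rates log-uniformly. It assumes the `hp.Int` and `hp.Float` ranges include their upper bounds. The sampling function only illustrates what `sampling='log'` means; it is not how the Bayesian optimizer actually proposes trials.

```python
import itertools
import math
import random

conv1_filters = [32, 64]
conv2_filters = [64, 128]
dense_units = list(range(64, 256 + 1, 64))                   # 64, 128, 192, 256
dropout_rates = [round(0.3 + 0.1 * i, 1) for i in range(5)]  # 0.3 through 0.7

# Every combination of the four discrete hyperparameters.
grid = list(itertools.product(conv1_filters, conv2_filters,
                              dense_units, dropout_rates))
print(len(grid))  # 80

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the draws are repeatable
learning_rates = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(5)]
print(learning_rates)
```

With 80 discrete combinations and a continuous learning rate, `max_trials=10` explores only a small part of the space, which is one reason to prefer a guided search over a grid search here.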

In [None]:
class CallbackTuner(kt.BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        # The parent run_trial builds the model itself, so we only need
        # the trial ID here to give each trial its own file paths.
        trial_id = trial.trial_id

        callbacks = [
            keras.callbacks.ModelCheckpoint(
                filepath=f"checkpoints/trial_{trial_id}_best.weights.h5",
                monitor='val_accuracy',
                save_best_only=True,
                save_weights_only=True,
                verbose=1
            ),
            keras.callbacks.EarlyStopping(
                monitor='val_accuracy',
                patience=3,
                restore_best_weights=True
            ),
            keras.callbacks.TensorBoard(
                log_dir=f"logs/trial_{trial_id}",
                histogram_freq=1
            )
        ]

        return super().run_trial(trial, *args, callbacks=callbacks, **kwargs)
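
The early-stopping behavior configured above can be illustrated with a simplified simulation. This is a sketch of patience-based stopping on a made-up validation-accuracy history, not the Keras implementation, and `simulate_early_stopping` is a hypothetical helper.

```python
def simulate_early_stopping(val_accuracies, patience=3):
    """Return (stopped_epoch, best_epoch) using 0-based epoch indices.

    stopped_epoch is None if training runs through every epoch.
    """
    best = float("-inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up history: accuracy peaks at epoch 2, then stalls for three epochs.
history = [0.88, 0.90, 0.91, 0.905, 0.909, 0.908, 0.92]
print(simulate_early_stopping(history))  # (5, 2)
```

With `restore_best_weights=True`, the weights from the best epoch (epoch 2 in this example) are the ones kept, even though training ran for three more epochs. The later value of 0.92 is never reached, which is the usual trade-off of a small patience.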

In [None]:
if run_tuning:
  tuner = CallbackTuner(
      hypermodel=build_model,
      objective='val_accuracy',
      max_trials=10,
      directory='tuner_dir',
      project_name='fashion_mnist'
  )

  tuner.search(
      X_train, y_train,
      epochs=20,
      validation_data=(X_val, y_val),
  )

In [None]:
if run_tuning:
  import json

  best_hp = tuner.get_best_hyperparameters(1)[0]

  with open("hyperparams.json", "w") as f:  # same file name the pretrained-model cell reads
      json.dump(best_hp.values, f)

The TensorBoard dashboard for these trials can be viewed below.

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs/

## 4. Evaluation

Now that we've tuned the hyperparameters, we rebuild the best model, load its checkpointed weights, and evaluate it on the test set.

In [None]:
def build_best_model(hp):
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(hp.get('conv1_filters'), 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Conv2D(hp.get('conv2_filters'), 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(hp.get('dense_units'), activation='relu'),
        layers.Dropout(hp.get('dropout_rate')),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.get('learning_rate')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

In [None]:
if run_tuning:
  best_hp = tuner.get_best_hyperparameters(1)[0]
  best_trial_id = tuner.oracle.get_best_trials(1)[0].trial_id

  model = build_best_model(best_hp)
  weights_path = f"checkpoints/trial_{best_trial_id}_best.weights.h5"
  model.load_weights(weights_path)
  print("Best trial ID:", best_trial_id)  # a bare expression inside an if block is not displayed

Trial 6 performed the best. Let's view its learning curves using TensorBoard.

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs/trial_06

In [None]:
if run_tuning:
  test_loss, test_acc = model.evaluate(X_test, y_test)
  print(f"Test accuracy: {test_acc:.4f}")

## 5. Conclusion

The best model achieved a test accuracy of $ 92.52\% $, and its validation accuracy plateaued at around $ 92.91\% $. Although it didn't reach the $ 98\% $ we were (informally) aiming for, the model classifies the Fashion-MNIST dataset well.

In a second iteration, I would consider:

- adding a third convolutional block,
- using batch normalization,
- increasing the number of epochs (maintaining early stopping), and
- expanding the search space
  - tuning batch size or kernel size, or even running more trials

However, the current model generalizes quite well and meets the core objectives of the assignment: hyperparameter tuning, checkpointing, early stopping, and reproducible evaluation.