
# Fashion MNIST - TFDS Gray-Scaled Image Example

This notebook demonstrates the **Universal ML Workflow** for multi-class classification on grayscale images using TensorFlow Datasets.

## Learning Objectives

By the end of this notebook, you will be able to:
- Work with grayscale images (no color conversion needed)
- Handle 10-class image classification
- Understand the Fashion MNIST benchmark dataset
- Apply the Universal ML Workflow to a standard benchmark

---

## Dataset Overview

| Attribute | Description |
|-----------|-------------|
| **Source** | [TensorFlow Datasets - fashion_mnist](https://www.tensorflow.org/datasets/catalog/fashion_mnist) |
| **Problem Type** | Multi-Class Classification (10 classes) |
| **Classes** | T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot |
| **Data Balance** | Perfectly Balanced (7000 per class) |
| **Image Size** | 28x28 grayscale |
| **Preprocessing** | Resize to 16x16 -> Flatten (256 features) |

---

## 1. Defining the Problem and Assembling a Dataset

**Problem:** Classify grayscale images of fashion items into 10 categories.

**Why Fashion MNIST?** A drop-in replacement for classic MNIST, but more challenging and realistic. While digit recognition (MNIST) is essentially "solved," fashion item classification presents more subtle differences between classes.

## 2. Choosing a Measure of Success

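Because the classes are balanced, standard accuracy is the primary metric. Top-N accuracy (is the true class among the N highest-scoring predictions?) is reported as a secondary metric for the 10-class problem.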
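
To make the difference between the two metrics concrete, here is a minimal, dependency-free sketch on a toy batch. The function name `top_n_accuracy` and the probabilities are made up for illustration and are not values from this notebook.

```python
def top_n_accuracy(true_labels, probabilities, n):
    """Fraction of samples whose true label is among the n highest-scoring classes."""
    hits = 0
    for label, probs in zip(true_labels, probabilities):
        # Class indices sorted by descending probability; keep the first n.
        ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
        hits += label in ranked[:n]
    return hits / len(true_labels)


# Toy data (illustrative only): 3 samples, 4 classes.
true_labels = [0, 2, 3]
probabilities = [
    [0.70, 0.10, 0.10, 0.10],  # true class ranked 1st
    [0.50, 0.10, 0.30, 0.10],  # true class ranked 2nd
    [0.40, 0.30, 0.20, 0.10],  # true class ranked 4th
]

top1 = top_n_accuracy(true_labels, probabilities, n=1)  # same as standard accuracy
top2 = top_n_accuracy(true_labels, probabilities, n=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-1 accuracy only credits the first sample, while top-2 also credits the second, whose true class was the runner-up.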

## 3. Deciding on an Evaluation Protocol

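Hold-out evaluation: 20% of the data is reserved as a test set, and a validation set of the same size is split from the remaining training data for model comparison and hyperparameter tuning.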
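
The notebook relies on scikit-learn's `train_test_split` with `stratify` for its splits. As a rough illustration of what a stratified hold-out split does, the sketch below uses only the standard library; `stratified_holdout` and the toy labels are hypothetical stand-ins, not part of the notebook.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction, seed=0):
    """Split sample indices into (train, test) while keeping per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Toy labels (illustrative only): 3 classes with 10 samples each.
toy_labels = [c for c in range(3) for _ in range(10)]

# Step 1: hold out 20% of all samples as the test set.
rest_idx, test_idx = stratified_holdout(toy_labels, test_fraction=0.20)

# Step 2: take a validation set of the same size from the remainder
# (6 of the remaining 24 samples, i.e. a fraction of 0.25).
rest_labels = [toy_labels[i] for i in rest_idx]
train_pos, val_pos = stratified_holdout(rest_labels, test_fraction=0.25)
train_idx = [rest_idx[i] for i in train_pos]
val_idx = [rest_idx[i] for i in val_pos]

print(len(train_idx), len(val_idx), len(test_idx))
```

Each class contributes the same number of samples to every split, so accuracy on the validation and test sets is not skewed by class frequency.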

## 4. Preparing Your Data

### 4.1 Import Libraries and Load Dataset

Note: Fashion MNIST images are already grayscale (no color conversion needed).

In [None]:
import os
import pandas as pd
import numpy as np

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, ConfusionMatrixDisplay

from skimage.transform import resize

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
import tensorflow_datasets as tfds
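# keras.utils.np_utils may be unavailable in newer Keras releases;
# aliasing tensorflow.keras.utils keeps np_utils.to_categorical working
from tensorflow.keras import utils as np_utils
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout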
from tensorflow.keras.optimizers import RMSprop

# Keras Tuner for hyperparameter search
!pip install -q -U keras-tuner
import keras_tuner as kt

import itertools
import matplotlib.pyplot as plt

SEED = 204

tf.random.set_seed(SEED)
np.random.seed(SEED)

import warnings
warnings.filterwarnings('ignore')

In [3]:
DATASET = 'fashion_mnist'
RESIZE = (16, 16)
GRAY_SCALE = True

In [5]:
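# load every split (train + test) as a single dataset
ds = tfds.load(DATASET, split='all', shuffle_files=True)

images, labels = [], []
for entry in ds:
    # drop the trailing channel axis: (28, 28, 1) -> (28, 28)
    image = entry['image'].numpy()[:, :, 0]
    label = entry['label'].numpy()

    # preserve_range=True keeps pixel values in [0, 255]; by default skimage
    # rescales uint8 input to [0, 1], so the later division by 255 would
    # shrink the pixels a second time
    image = resize(image, RESIZE, anti_aliasing=True, preserve_range=True)

    images.append(image)
    labels.append(label)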

In [55]:
# list of images to numpy array
X = np.array(images)

# flatten 2D image array to 1D array
X = X.reshape( (X.shape[0], -1) )

In [56]:
label_encoder = LabelEncoder()
label_encoder.fit(labels)

# label -> one-hot encoded vector
y = keras.utils.to_categorical(label_encoder.transform(labels))

In [57]:
TEST_SIZE = 0.20

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, stratify=labels, 
                                                    random_state=SEED, shuffle=True)

In [58]:
# normalise pixels from [0, 255] to [0, 1]
X_train, X_test = X_train/255., X_test/255.

In [59]:
VALIDATION_SIZE = X_test.shape[0]

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, 
                                                 test_size=VALIDATION_SIZE, stratify=y_train.argmax(axis=1),
                                                 shuffle=True, random_state=SEED)

## 5. Developing a Model That Does Better Than a Baseline

10 balanced classes → 10% baseline accuracy.

In [60]:
# number of samples in each class
counts = np.sum(y, axis=0)

# the dataset is perfectly balanced: every class has 7000 samples
counts

array([7000., 7000., 7000., 7000., 7000., 7000., 7000., 7000., 7000.,
       7000.], dtype=float32)

In [61]:
baseline = max(counts) / sum(counts)

baseline

0.1

In [None]:
INPUT_DIMENSION = X_train.shape[1]
OUTPUT_CLASSES = y_train.shape[1]

OPTIMIZER = 'RMSprop'
LOSS_FUNC = 'categorical_crossentropy'
METRICS = ['categorical_accuracy', 
           tf.keras.metrics.Precision(name='precision'), 
           tf.keras.metrics.Recall(name='recall'),
           tf.keras.metrics.AUC(name='auc', multi_label=True)]

batch_size = 128
EPOCHS = 400

In [None]:
learning_rate = 0.25

# SLP: Single Layer Perceptron (no hidden layers)
slp_model = Sequential([
    Dense(OUTPUT_CLASSES, activation='softmax', input_shape=(INPUT_DIMENSION,))
], name='Single_Layer_Perceptron')

slp_model.compile(
    optimizer=RMSprop(learning_rate=learning_rate),
    loss=LOSS_FUNC,
    metrics=METRICS
)

slp_model.summary()

In [None]:
history_slp = slp_model.fit(X_train, y_train, 
                            batch_size=batch_size, epochs=EPOCHS, 
                            validation_data=(X_val, y_val), 
                            verbose=0)
slp_val_score = slp_model.evaluate(X_val, y_val, verbose=0)[1:]  # Skip loss, get metrics

In [None]:
print('Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(slp_val_score[0], baseline))
print('Precision (Validation): {:.2f}'.format(slp_val_score[1]))
print('Recall (Validation): {:.2f}'.format(slp_val_score[2]))
print('AUC (Validation): {:.2f}'.format(slp_val_score[3]))

In [None]:
def plot_training_history(history, monitors=('loss', 'auc')):
    """Plot training and validation curves for two monitored quantities."""
    # one subplot per monitored quantity
    fig, axs = plt.subplots(1, 2, sharex='all', figsize=(15, 5))

    for ax, monitor in zip(axs.flat, monitors):
        # history keys follow the metric names given at compile time (e.g. 'auc')
        train_values = history.history[monitor]
        val_values = history.history['val_' + monitor]

        label = monitor.capitalize() if monitor == 'loss' else monitor
        epochs = range(1, len(train_values) + 1)

        ax.plot(epochs, train_values, 'b.', label=label)
        ax.plot(epochs, val_values, 'r.', label='Validation ' + label)
        ax.set_xlim([0, len(train_values)])
        ax.set_title('Training and Validation ' + label)
        ax.set_xlabel('Epochs')
        ax.set_ylabel(label)
        ax.legend()
        ax.grid()

    plt.show()

In [None]:
plot_training_history(history_slp, monitors=['loss', 'auc'])

## 6. Scaling Up: Developing a Model That Overfits

Adding hidden layers to capture fashion item patterns.
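
Model capacity can be estimated by counting trainable parameters before any training takes place. The sketch below is a hedged illustration: the helper `dense_parameter_count` is hypothetical, and it assumes plain fully connected layers with biases on the 256 flattened input features.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


INPUT_FEATURES = 16 * 16  # flattened 16x16 image
CLASSES = 10

slp_params = dense_parameter_count([INPUT_FEATURES, CLASSES])
mlp_params = dense_parameter_count([INPUT_FEATURES, 8, CLASSES])
print(slp_params, mlp_params)  # 2570 2146
```

Under these assumptions, a single 8-neuron hidden layer has fewer parameters than the single-layer perceptron, because the hidden layer acts as a bottleneck. If the goal is a model that clearly overfits, wider or deeper hidden layers may be needed.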

In [None]:
learning_rate = 0.01

# MLP: Multi-Layer Perceptron (1 hidden layer with 8 neurons)
mlp_model = Sequential([
    Dense(8, activation='relu', input_shape=(INPUT_DIMENSION,)),
    Dense(OUTPUT_CLASSES, activation='softmax')
], name='Multi_Layer_Perceptron')

mlp_model.compile(
    optimizer=RMSprop(learning_rate=learning_rate),
    loss=LOSS_FUNC,
    metrics=METRICS
)

mlp_model.summary()

In [None]:
history_mlp = mlp_model.fit(X_train, y_train, 
                            batch_size=batch_size, epochs=EPOCHS, 
                            validation_data=(X_val, y_val), 
                            verbose=0)
mlp_val_score = mlp_model.evaluate(X_val, y_val, verbose=0)[1:]  # Skip loss, get metrics

In [None]:
plot_training_history(history_mlp, monitors=['loss', 'auc'])

In [None]:
print('Accuracy (Validation): {:.2f} (baseline={:.2f})'.format(mlp_val_score[0], baseline))
print('Precision (Validation): {:.2f}'.format(mlp_val_score[1]))
print('Recall (Validation): {:.2f}'.format(mlp_val_score[2]))
print('AUC (Validation): {:.2f}'.format(mlp_val_score[3]))

## 7. Regularizing Your Model and Tuning Hyperparameters

Using **Hyperband** for efficient hyperparameter tuning with a frozen architecture.

### Why Hyperband?

**Hyperband** is more efficient than grid search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations
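
The three steps above describe successive halving, the idea at the core of Hyperband. The sketch below is a simplified, dependency-free illustration of a single bracket, not KerasTuner's exact schedule. The function name `successive_halving_schedule` and the starting values (27 configurations, 1 epoch) are assumptions chosen for readability.

```python
def successive_halving_schedule(n_configs, min_epochs, max_epochs, factor=3):
    """Return the rungs of one bracket as (configs_alive, epochs_per_config)."""
    rungs = []
    n, epochs = n_configs, min_epochs
    while True:
        rungs.append((n, min(epochs, max_epochs)))
        if epochs >= max_epochs or n == 1:
            break
        n = max(1, n // factor)  # keep roughly the best 1/factor of the configs
        epochs *= factor         # and give the survivors more epochs
    return rungs


rungs = successive_halving_schedule(n_configs=27, min_epochs=1,
                                    max_epochs=20, factor=3)

# Cost counted as if every rung retrains from scratch (a pessimistic simplification).
halving_budget = sum(n * epochs for n, epochs in rungs)
full_budget = 27 * 20  # training every configuration for all 20 epochs

print(rungs)
print(halving_budget, full_budget)
```

With these assumed values the bracket costs 101 epochs in total, compared with 540 epochs for training all 27 configurations to completion.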

In [None]:
# Hyperband Model Builder for Multi-Class Classification
def build_model_hyperband(hp):
    """
    Build Fashion MNIST model with FROZEN architecture (2 layers: 32 -> 16 neurons).
    Only tunes regularization (Dropout) and learning rate.
    """
    model = keras.Sequential()
    model.add(layers.Input(shape=(INPUT_DIMENSION,)))

    # Fixed architecture: 2 hidden layers with 32 and 16 neurons
    # Layer 1: 32 neurons
    model.add(layers.Dense(32, activation='relu'))
    drop_0 = hp.Float('drop_0', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_0))

    # Layer 2: 16 neurons
    model.add(layers.Dense(16, activation='relu'))
    drop_1 = hp.Float('drop_1', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_1))

    # Output layer for multi-class classification
    model.add(layers.Dense(OUTPUT_CLASSES, activation='softmax'))

    lr = hp.Float('lr', 1e-4, 1e-2, sampling='log')
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss=LOSS_FUNC,
        metrics=METRICS
    )
    return model

In [None]:
# Configure Hyperband tuner
tuner = kt.Hyperband(
    build_model_hyperband,
    objective='val_categorical_accuracy',
    max_epochs=20,
    factor=3,
    directory='fashion_mnist_hyperband',
    project_name='fashion_mnist_tuning'
)

# Run Hyperband search
tuner.search(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=20,
    batch_size=batch_size
)

In [None]:
# Get best hyperparameters and build best model
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters:")
print(f"  Dropout Layer 1: {best_hp.get('drop_0')}")
print(f"  Dropout Layer 2: {best_hp.get('drop_1')}")
print(f"  Learning Rate: {best_hp.get('lr')}")

opt_model = tuner.hypermodel.build(best_hp)
opt_model.summary()

In [None]:
# Train the best model
history_opt = opt_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=batch_size,
    verbose=1
)

opt_val_score = opt_model.evaluate(X_val, y_val, verbose=0)[1:]  # Skip loss, get metrics

In [None]:
preds = opt_model.predict(X_test, verbose=0)

print('Accuracy (Test): {:.2f} (baseline={:.2f})'.format(accuracy_score(y_test.argmax(axis=1), 
                                                                      preds.argmax(axis=1)), baseline))
print('Precision (Test): {:.2f}'.format(precision_score(y_test.argmax(axis=1), 
                                                        preds.argmax(axis=1),
                                                        average='macro')))
print('Recall (Test): {:.2f}'.format(recall_score(y_test.argmax(axis=1), 
                                                  preds.argmax(axis=1),
                                                  average='macro')))
print('AUC (Test): {:.2f}'.format(roc_auc_score(y_test, 
                                                preds,
                                                multi_class='ovo',
                                                average='macro')))
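
The `average='macro'` argument used above computes each metric per class and then takes the unweighted mean, so every class counts equally. A minimal sketch of that calculation follows, using only the standard library. The helper `macro_precision_recall`, the toy labels, and the zero-score convention for empty classes are assumptions for illustration.

```python
def macro_precision_recall(true_labels, predicted_labels, n_classes):
    """Unweighted mean of per-class precision and recall."""
    precisions, recalls = [], []
    for c in range(n_classes):
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(t == c and p == c for t, p in pairs)
        predicted_c = sum(p == c for _, p in pairs)
        actual_c = sum(t == c for t, _ in pairs)
        # Assumed convention: a class with no predictions (or no samples) scores 0.
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    return sum(precisions) / n_classes, sum(recalls) / n_classes


# Toy labels (illustrative only), 3 classes.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]

macro_precision, macro_recall = macro_precision_recall(y_true_toy, y_pred_toy, n_classes=3)
print(f"macro precision: {macro_precision:.3f}, macro recall: {macro_recall:.3f}")
```

On a balanced dataset such as this one, macro and sample-weighted averages should be close, so macro averaging is a reasonable default here.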

In [None]:
disp = ConfusionMatrixDisplay( confusion_matrix(y_test.argmax(axis=1), preds.argmax(axis=1)),
                               display_labels=np.sort(np.unique(labels)) )
_ = disp.plot()

In [None]:
def top_n_accuracy_score(y, preds, n) :
    scores = []
    for j in range(preds.shape[0]) :
        score = 1 if y[j,:].argmax() in preds[j,:].argsort()[-n:] else 0
        
        scores.append( score )
            
    return sum(scores) / len(scores)

In [None]:
TOP_N = 3
within = top_n_accuracy_score(y_test, preds, TOP_N)

In [136]:
print('Within Top N Accuracy (Test Data): {:.2f}'.format(within))

Within Top N Accuracy (Test Data): 0.98

---

## 8. Key Takeaways

1. **Grayscale images simplify preprocessing**: no color conversion is needed.
2. **Balanced classes** make standard accuracy a suitable primary metric.
3. **Top-N accuracy** is a useful secondary metric for multi-class evaluation (was the correct class among the top 3 predictions?).
4. **Fashion MNIST** is harder than digit MNIST, but the same workflow applies.


---

## Appendix: Making the Code More Modular

The following helper functions encapsulate model building and training for reuse across projects.

In [None]:
def build_classification_model(input_dimension, output_classes, hidden_layers=0, hidden_neurons=32, 
                               activation='relu', dropout=None, optimizer='rmsprop', 
                               loss='categorical_crossentropy', metrics=['accuracy'], name=None):
    """
    Build a classification model with configurable architecture.
    
    Parameters:
    -----------
    input_dimension : int
        Number of input features
    output_classes : int
        Number of output classes
    hidden_layers : int
        Number of hidden layers (0 for SLP, >0 for MLP)
    hidden_neurons : int or list
        Neurons per hidden layer (int applies to all, list for per-layer)
    activation : str
        Activation function for hidden layers
    dropout : float or None
        Dropout rate (None = no dropout)
    optimizer : str or optimizer instance
        Optimizer for training
    loss : str
        Loss function
    metrics : list
        Metrics to track
    name : str
        Model name
    
    Returns:
    --------
    Compiled Keras Sequential model
    """
    model = Sequential(name=name)
    
    # Handle neurons as int or list
    if isinstance(hidden_neurons, int):
        neurons_list = [hidden_neurons] * hidden_layers
    else:
        neurons_list = hidden_neurons
    
    # Add hidden layers
    for i, neurons in enumerate(neurons_list):
        if i == 0:
            model.add(Dense(neurons, activation=activation, input_shape=(input_dimension,)))
        else:
            model.add(Dense(neurons, activation=activation))
        
        if dropout is not None:
            model.add(Dropout(dropout))
    
    # Output layer (declares the input shape itself only when no hidden layer was added)
    if not neurons_list:
        model.add(Dense(output_classes, activation='softmax', input_shape=(input_dimension,)))
    else:
        model.add(Dense(output_classes, activation='softmax'))
    
    model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    
    return model


def train_model(model, X_train, y_train, X_val, y_val, batch_size=32, epochs=100, 
                callbacks=None, verbose=0):
    """
    Train a model and return training history and validation scores.
    
    Parameters:
    -----------
    model : Keras model
        Compiled model to train
    X_train, y_train : arrays
        Training data
    X_val, y_val : arrays
        Validation data
    batch_size : int
        Batch size for training
    epochs : int
        Number of training epochs
    callbacks : list or None
        Keras callbacks
    verbose : int
        Verbosity level
    
    Returns:
    --------
    dict with 'history' and 'val_score'
    """
    history = model.fit(
        X_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
        validation_data=(X_val, y_val),
        callbacks=callbacks,
        verbose=verbose
    )
    
    val_score = model.evaluate(X_val, y_val, verbose=0)[1:]  # Skip loss, get metrics
    
    return {
        'history': history,
        'val_score': val_score
    }


# Example usage (commented out):
# ----------------------------------
# # Build an SLP (no hidden layers)
# slp = build_classification_model(
#     input_dimension=256, 
#     output_classes=10,
#     hidden_layers=0,
#     optimizer=RMSprop(learning_rate=0.25),
#     loss='categorical_crossentropy',
#     metrics=METRICS,
#     name='SLP_Model'
# )
#
# # Build an MLP (with hidden layers)
# mlp = build_classification_model(
#     input_dimension=256,
#     output_classes=10,
#     hidden_layers=2,
#     hidden_neurons=[32, 16],
#     activation='relu',
#     dropout=0.2,
#     optimizer='adam',
#     loss='categorical_crossentropy',
#     metrics=METRICS,
#     name='MLP_Model'
# )
#
# # Train the model
# result = train_model(mlp, X_train, y_train, X_val, y_val, 
#                      batch_size=128, epochs=100)
# print(f"Validation Accuracy: {result['val_score'][0]:.2f}")