## Exercise 01
In this exercise, you will build a custom model for image classification on the Fashion MNIST dataset and train it with a custom training loop that uses a different optimizer, each with its own learning rate, for the upper layers and the lower layers.
For example, you can use the Keras Sequential model [1] to build your custom model. Note that the Sequential model is suitable only for a plain stack of layers in which each layer has exactly one input tensor and one output tensor [1]. Alternatively, you can use the Keras Functional API [2], which allows more flexible models than the Sequential model [1]. Some hints are as follows:
1. Use only five epochs and a batch size of 32.
2. Use only the softmax and ReLU activation functions.
3. Use SGD as the lower optimizer with a learning rate of 1e-4 and Nadam as the upper optimizer with a learning rate of 1e-3.
4. Use the Nadam optimizer [3] from Keras and sparse categorical cross-entropy as the loss function.
5. Display the mean training loss and the mean accuracy over each epoch (updated at each iteration). Also display the validation loss and accuracy at the end of each epoch.

In [10]:
import sys
import sklearn
import tensorflow as tf
from tensorflow import keras 
import tensorflow_addons as tfa #for tfa to work, a compatible version of tensorflow has to be installed: check https://github.com/tensorflow/addons
import numpy as np
import os

#to make this notebook’s output stable across runs
np.random.seed(42) 
tf.random.set_seed(42)

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data() 
X_train_full = X_train_full.astype(np.float32) / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test.astype(np.float32) / 255.

In [13]:
#define sequential model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(64, activation='softmax'),
  tf.keras.layers.Dense(10)
])

#define optimizers used in different layers of the model
#legacy optimizers are used because they run more efficiently on M1/M2 Macs, as suggested by this warning:
#WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.
optimizers = [
    tf.keras.optimizers.legacy.SGD(learning_rate=1e-4), #lower optimizer -> close to input
    tf.keras.optimizers.legacy.Adam(learning_rate=1e-2) #upper optimizer -> close to output
]

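#assign optimizers to the layers
#note: model.layers[0] is the Flatten layer, which has no trainable weights, so all Dense layers are updated by the second optimizer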
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

#compile with SparseCategoricalCrossentropy as the loss function and accuracy as the metric (reported later for every epoch)
model.compile(optimizer=optimizer, loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
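
The `from_logits=True` argument matters here because the last `Dense(10)` layer has no activation, so the loss has to apply softmax itself. As a rough illustration of what that computation amounts to, here is a NumPy sketch. The function name is hypothetical, and this is not Keras's actual implementation; it assumes the per-sample losses are averaged over the batch.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean of -log(softmax(logits)[label]) over the batch (illustrative sketch)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: shifted logits minus the log of the normalising sum.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked.mean())


# With all-zero logits every class is equally likely, so the loss is log(10).
uniform_loss = sparse_categorical_crossentropy_from_logits([3, 7], np.zeros((2, 10)))
```

If the last layer used a softmax activation instead, the loss would be given probabilities and `from_logits` would have to stay at its default of `False`.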


In [16]:
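#fit the model with batch size 32 (also the Keras default, see https://keras.io/api/models/model_training_apis/)
#validation_data makes Keras report validation loss and accuracy at the end of each epoch
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_valid, y_valid))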

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x2a1a4f3a0>

In [15]:
#check accuracy of model on the test data
test_loss, test_acc = model.evaluate(X_test,  y_test, verbose=2)

print('\nTest accuracy:', test_acc)

313/313 - 1s - loss: 1.0123 - accuracy: 0.5223 - 767ms/epoch - 2ms/step

Test accuracy: 0.5223000049591064
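
The cells above rely on `model.fit` and `tfa.optimizers.MultiOptimizer`, whereas the exercise statement asks for a custom training loop. The following is a minimal sketch of such a loop, not a reference solution. It makes several assumptions: the helper names `build_model` and `train` are hypothetical, the layer sizes mirror the model above but with ReLU in both hidden layers and softmax on the output, "lower layers" is taken to mean the first `Dense` layer only, and the non-legacy `tf.keras.optimizers.SGD` and `tf.keras.optimizers.Nadam` classes are used. The loop runs eagerly for simplicity, so it is slower than `model.fit`.

```python
import numpy as np
import tensorflow as tf


def build_model():
    """ReLU hidden layers with a softmax output (layer sizes are an assumption)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


def train(model, X_train, y_train, X_valid, y_valid, epochs=5, batch_size=32):
    """Custom loop: SGD updates the first Dense layer, Nadam updates the others."""
    dense = [l for l in model.layers if isinstance(l, tf.keras.layers.Dense)]
    lower_vars = list(dense[0].trainable_weights)
    upper_vars = [w for layer in dense[1:] for w in layer.trainable_weights]
    lower_opt = tf.keras.optimizers.SGD(learning_rate=1e-4)
    upper_opt = tf.keras.optimizers.Nadam(learning_rate=1e-3)
    # The model ends in softmax, so the loss receives probabilities, not logits.
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
                .shuffle(len(X_train)).batch(batch_size))
    valid_ds = tf.data.Dataset.from_tensor_slices((X_valid, y_valid)).batch(batch_size)

    history = []
    for epoch in range(1, epochs + 1):
        mean_loss = tf.keras.metrics.Mean()
        mean_acc = tf.keras.metrics.SparseCategoricalAccuracy()
        for step, (xb, yb) in enumerate(train_ds, start=1):
            with tf.GradientTape() as tape:
                probs = model(xb, training=True)
                loss = loss_fn(yb, probs)
            # One backward pass, then each optimizer gets its own slice of gradients.
            grads = tape.gradient(loss, lower_vars + upper_vars)
            n = len(lower_vars)
            lower_opt.apply_gradients(zip(grads[:n], lower_vars))
            upper_opt.apply_gradients(zip(grads[n:], upper_vars))
            # Running means over the epoch so far, refreshed at every iteration.
            mean_loss.update_state(loss)
            mean_acc.update_state(yb, probs)
            print(f"\rEpoch {epoch}/{epochs} - step {step}"
                  f" - loss: {float(mean_loss.result()):.4f}"
                  f" - accuracy: {float(mean_acc.result()):.4f}", end="")

        # Validation pass at the end of the epoch (mean of per-batch losses).
        val_loss = tf.keras.metrics.Mean()
        val_acc = tf.keras.metrics.SparseCategoricalAccuracy()
        for xb, yb in valid_ds:
            probs = model(xb, training=False)
            val_loss.update_state(loss_fn(yb, probs))
            val_acc.update_state(yb, probs)
        print(f" - val_loss: {float(val_loss.result()):.4f}"
              f" - val_accuracy: {float(val_acc.result()):.4f}")
        history.append({
            "loss": float(mean_loss.result()),
            "accuracy": float(mean_acc.result()),
            "val_loss": float(val_loss.result()),
            "val_accuracy": float(val_acc.result()),
        })
    return history
```

With the arrays loaded earlier in the notebook, a call would look like `history = train(build_model(), X_train, y_train, X_valid, y_valid)`. Keeping two explicit variable lists makes it easy to check which optimizer updates which weights, which is harder to see when the split is made by layer index.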
