# Lab 2 - introducing Practical Modelling in Keras
  <a target="_blank" href="https://colab.research.google.com/github/andrew-nash/CS6421-labs-2025/blob/main/CS6421_Lab_02.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt

## Tensors (https://www.tensorflow.org/guide/basics)

The basic data structure in TensorFlow is `tf.Tensor`, which is very similar to NumPy's array type (`np.ndarray`).

In [None]:
# An immutable Tensor
x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
# A mutable Tensor
vx = tf.Variable([[1., 2., 3.],
                 [4., 5., 6.]])


In [None]:
x

In [None]:
vx
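
The following is a short sketch of the difference between the two. The update values are arbitrary. A `tf.Variable` can be changed in place with methods such as `assign` and `assign_add`. A `tf.constant` rejects item assignment, so arithmetic on it returns a new tensor instead.

```python
import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
vx = tf.Variable([[1., 2., 3.],
                  [4., 5., 6.]])

# A tf.Variable can be updated in place
vx.assign_add(tf.ones_like(vx))  # add 1 to every entry
vx[0, 0].assign(100.0)           # assign to a single entry via slicing

# A tf.constant cannot be modified in place...
constant_is_immutable = False
try:
    x[0, 0] = 100.0
except TypeError:
    constant_is_immutable = True  # tensors do not support item assignment

# ...but operations on it return new tensors
x_plus_one = x + 1.0

print(vx.numpy())
print(constant_is_immutable)
```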

## Mathematical operations

These can be performed in much the same way as in NumPy.

In [None]:
x = tf.constant(1.75, dtype=tf.float32)
x
x*2

In [None]:
tf.exp(x)

In [None]:
A = tf.constant([[1,2,3],[4,5,6]])
B = tf.constant([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

C=tf.matmul(A,B)
C

In [None]:
C.shape
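
The sketch below cross-checks `tf.matmul` against NumPy. It also shows that the `@` operator works on tensors. `tf.matmul` expects both inputs to share a dtype, so mixed integer/float inputs should be cast explicitly with `tf.cast`. The names `C_at`, `C_np` and `C_float` are only illustrative.

```python
import numpy as np
import tensorflow as tf

A = tf.constant([[1, 2, 3], [4, 5, 6]])
B = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

C = tf.matmul(A, B)                     # (2, 3) x (3, 4) -> (2, 4)
C_at = A @ B                            # the @ operator performs the same product
C_np = np.matmul(A.numpy(), B.numpy())  # the same product computed in NumPy

# Both inputs must share a dtype; cast explicitly to get a float result
C_float = tf.matmul(tf.cast(A, tf.float32), tf.cast(B, tf.float32))

print(C.shape, C.dtype)
print(np.array_equal(C.numpy(), C_np))
```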

## Auto-differentiation

One of the most important differences from NumPy is TensorFlow's ability to automatically differentiate user-defined functions.

In [None]:
def f(x):
  y = x**2 + 2*x - 5
  return y

In [None]:
x = tf.Variable(2.0)

with tf.GradientTape() as tape:
  y = f(x)

g_x = tape.gradient(y, x)
g_x
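
As a sanity check, we can compare the tape's result with the derivative worked out by hand. For $f(x) = x^2 + 2x - 5$, we have $f'(x) = 2x + 2$, so $f'(2) = 6$. The sketch also shows that `tf.constant` inputs are not tracked automatically and must be registered with `tape.watch`. The helper `f_prime` is hypothetical and exists only for the comparison.

```python
import tensorflow as tf

def f(x):
    return x**2 + 2*x - 5

def f_prime(x):
    # Derivative computed by hand: d/dx (x^2 + 2x - 5) = 2x + 2
    return 2*x + 2

# Variables are watched by the tape automatically
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = f(x)
g_x = tape.gradient(y, x)

# Constants must be watched explicitly, otherwise the gradient is None
x_const = tf.constant(2.0)
with tf.GradientTape() as tape:
    tape.watch(x_const)
    y_const = f(x_const)
g_const = tape.gradient(y_const, x_const)

print(g_x.numpy(), g_const.numpy(), f_prime(2.0))
```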

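This also works when `x` is a vector. Because `y` is then a vector too, `tape.gradient` returns the gradient of the sum of the elements of `y` with respect to each element of `x`. Since `f2` acts element-wise, each entry of the result is simply $5 + 2e^{x_i}$.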

In [None]:
def f2(x):
  # y = 5*x + 2*exp(x)
  A = tf.constant(5.0)
  B = tf.constant(2.0)
  y = tf.add(tf.multiply(x,A), tf.multiply(B, tf.exp(x)))
  return y

In [None]:
x = tf.Variable([1.0,2.0,3.0,4.0,5.0])

with tf.GradientTape() as tape:
  y = f2(x)

g_x = tape.gradient(y, x)
g_x
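
The sketch below checks the tape's output against the hand-derived expression $5 + 2e^{x_i}$. It rewrites `f2` with the `*` and `+` operators, which are equivalent to `tf.multiply` and `tf.add`.

```python
import numpy as np
import tensorflow as tf

def f2(x):
    # y = 5*x + 2*exp(x), applied element-wise
    return 5.0 * x + 2.0 * tf.exp(x)

x = tf.Variable([1.0, 2.0, 3.0, 4.0, 5.0])
with tf.GradientTape() as tape:
    y = f2(x)
g_x = tape.gradient(y, x)  # d(sum(y))/dx_i for each element x_i

expected = 5.0 + 2.0 * np.exp(x.numpy())
print(np.allclose(g_x.numpy(), expected))
```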

These are the tools that will allow us to implement our own layers, activation functions and loss functions to add to Keras models.

# Data Loading And Cleaning

For this lab, we will use a pre-loaded dataset from TensorFlow - the MNIST (Modified NIST) dataset. This is a set of 70,000 28x28 greyscale images, with associated labels, of handwritten digits (0-9).

In this case, TensorFlow has already split the dataset into 60,000 images for training and a separate 10,000 for evaluation.

In [None]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values from the integer range [0, 255] to floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

**QUESTION** - why bother dividing the data by 255?

We will revisit this later in the lab.

In [None]:
print("Train shape", x_train.shape)
print("Test shape",  x_test.shape)

In [None]:
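plt.imshow(x_train[0], cmap="gray")  # greyscale colormap instead of the default false-colour map
plt.title(f"Label: {y_train[0]}")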


In [None]:
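plt.imshow(x_test[0], cmap="gray")  # greyscale colormap instead of the default false-colour map
plt.title(f"Label: {y_test[0]}")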


**EXERCISE**: Currently, `x_train` and `x_test` are arrays of square 28x28 images. Today we are not going to use image-specific architectures. Instead, we will pass each input as a vector of length 784 (28 * 28 = 784) rather than as a 28x28 image.

Use NumPy to reshape `x_train` and `x_test` accordingly.


In [None]:
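x_train_clean = None  # TODO: reshape x_train from (60000, 28, 28) to (60000, 784)
x_test_clean = None   # TODO: reshape x_test from (10000, 28, 28) to (10000, 784)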

Now let's consider our output labels.

In [None]:
y_train.shape, y_test.shape

In [None]:
y_train

`y_train` and `y_test` are simply arrays of numeric labels from 0-9.

Is this categorical or quantitative data? Is there a meaningful ordering between the values?

Therefore, is this the best encoding for the model to predict directly?

### One-hot Encoding

One-hot encoding is a more model-friendly way of representing categorical data. It does not impose an artificial ordering or distance between the categories.

The idea is, if there are $K$ possible values of our data, we represent each value as a vector of length $K$. One value of this vector is 1, while the rest remain 0.

E.g., if we want to encode a person's blood group (one of AB+, AB-, A+, A-, B+, B-, O+, O-), we could do the following:

Blood group | Encoded vector
---- | ----
AB+ | [1,0,0,0,0,0,0,0]
AB- | [0,1,0,0,0,0,0,0]
A+ | [0,0,1,0,0,0,0,0]
A- | [0,0,0,1,0,0,0,0]
B+ | [0,0,0,0,1,0,0,0]
B- | [0,0,0,0,0,1,0,0]
O+ | [0,0,0,0,0,0,1,0]
O- | [0,0,0,0,0,0,0,1]

Performing this transformation in TensorFlow is simple, using `tf.one_hot`.

In our case, because our labels are the integers 0-9, each raw label can be used directly as the index of the '1' in its one-hot vector.

That is:

Label | Encoded vector
---- | ----
0 | [1,0,0,0,0,0,0,0,0,0]
1 | [0,1,0,0,0,0,0,0,0,0]
2 | [0,0,1,0,0,0,0,0,0,0]
3 | [0,0,0,1,0,0,0,0,0,0]
4 | [0,0,0,0,1,0,0,0,0,0]
5 | [0,0,0,0,0,1,0,0,0,0]
6 | [0,0,0,0,0,0,1,0,0,0]
7 | [0,0,0,0,0,0,0,1,0,0]
8 | [0,0,0,0,0,0,0,0,1,0]
9 | [0,0,0,0,0,0,0,0,0,1]

In [None]:
y_train_clean = tf.one_hot(indices=y_train, depth=10)
y_test_clean = tf.one_hot(indices=y_test, depth=10)

It is important to note that if your labels are not already the integers $0, 1, 2, \dots, K-1$, you will need an extra step. That step maps each category to such an integer before `tf.one_hot` can produce the one-hot vectors.
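
The following is a minimal sketch of that extra step, using the blood-group example from above. The raw labels `raw_labels` are hypothetical sample data. A plain Python dictionary maps each category to its index, and `tf.one_hot` then does the encoding. This is only one of several possible ways to do the mapping.

```python
import tensorflow as tf

# Categories in the same order as the blood-group table above
blood_groups = ["AB+", "AB-", "A+", "A-", "B+", "B-", "O+", "O-"]
raw_labels = ["O+", "A-", "AB+", "O+"]  # hypothetical string labels

# Step 1: map each category to an integer index 0..K-1
to_index = {group: i for i, group in enumerate(blood_groups)}
indices = [to_index[label] for label in raw_labels]

# Step 2: one-hot encode the integer indices
encoded = tf.one_hot(indices=indices, depth=len(blood_groups))
print(indices)
print(encoded.numpy())
```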

## Model Design Process

We will now step through the model design process described in the lectures so far:

1. Define Model
2. Compile Model
3. Train Model
4. Save Model (optional)
5. Save Best Weights (optional)
6. Evaluate Model
7. Predict using saved Model

### Define Model

In Keras, there are two main methods of defining a model - the Sequential API or the Functional API.

The Functional API is more flexible, at the price of extra code complexity. For now, we will focus on the Sequential API. We will introduce the Functional API once we move on to more complex models.

Sequential models are defined in a straightforward, layer-by-layer manner.

It is possible to implement your own type of layer, or use a pre-defined one from Keras' comprehensive list at https://www.tensorflow.org/api_docs/python/tf/keras/layers

It is critical to define the input and output shapes correctly. In this case, our inputs are vectors of length 784 and our outputs are vectors of length 10.

In [None]:
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(10))

**Pay attention** to the trailing comma after 784. `shape` must be a tuple, and in Python `(784)` is just the integer 784, whereas `(784,)` is a one-element tuple.
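
A quick plain-Python check of this point:

```python
# Parentheses alone do not create a tuple: (784) is just the integer 784
not_a_tuple = (784)
a_tuple = (784,)

print(type(not_a_tuple))  # <class 'int'>
print(type(a_tuple))      # <class 'tuple'>
print(len(a_tuple))       # 1
```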

In [None]:
model.summary()

This is the simplest possible model - with no activation function, and just one weight matrix and bias vector.
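
The parameter count reported by `model.summary()` can be reproduced by hand. The weight matrix has one entry per (input, output) pair, giving 784 * 10 = 7840 weights, plus one bias per output unit, giving 10 biases. The sketch below rebuilds the same model to compare against `count_params()`.

```python
import tensorflow as tf

n_inputs, n_outputs = 784, 10
weights = n_inputs * n_outputs  # 7840 entries in the weight matrix
biases = n_outputs              # 10 entries in the bias vector
print(weights + biases)         # 7850

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(10))
print(model.count_params())     # should match the hand-computed total
```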

### Compile Model

This is where we specify our training hyper-parameters: the optimizer, the loss function and the metrics to track.

If you wish, you can define your own version of any of these using TensorFlow operations. Keras also includes libraries of loss functions, metrics and optimizers at:

https://www.tensorflow.org/api_docs/python/tf/keras/losses

https://www.tensorflow.org/api_docs/python/tf/keras/metrics

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers

In [None]:
model.compile(
    optimizer= "SGD",
    loss = "mean_squared_error",
    metrics = ["accuracy"]
)
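
The following is a sketch of defining a loss with TensorFlow operations and passing it to `compile`. The name `my_mean_squared_error` is hypothetical, and the function simply re-implements mean squared error for illustration. The optimizer is passed as an object so that its learning rate is explicit; 0.01 is the default for SGD.

```python
import tensorflow as tf

def my_mean_squared_error(y_true, y_pred):
    # One loss value per sample (mean over the last axis);
    # Keras then averages these values over the batch.
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(10))

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=my_mean_squared_error,
    metrics=["accuracy"],
)
```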

### Train Model

In [None]:
x_test_clean.shape

In [None]:
model.fit(
    x_train_clean,
    y_train_clean,
    epochs=5,
    batch_size = 128,
    validation_data = (x_test_clean,y_test_clean)
)

### Save Model

It is critically important to pick meaningful and unique names for each model you train, and to keep track of which saved model corresponds to which training run.

In [None]:
model.save("simple_model_def_op_no_act.keras")
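
Step 5 of the process, saving the best weights, is not shown above. The sketch below uses Keras's `ModelCheckpoint` callback to keep only the weights from the epoch with the lowest validation loss. Random stand-in data replaces `x_train_clean`/`y_train_clean` so the block runs on its own, and the file name `simple_model_best.weights.h5` is only an example. Recent Keras versions require weight-only files to end in `.weights.h5`.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data; in the lab, use x_train_clean and y_train_clean
rng = np.random.default_rng(0)
x_demo = rng.random((256, 784)).astype("float32")
y_demo = np.eye(10, dtype="float32")[rng.integers(0, 10, size=256)]

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(10))
model.compile(optimizer="SGD", loss="mean_squared_error", metrics=["accuracy"])

# Overwrite the file only when val_loss improves on the best seen so far
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="simple_model_best.weights.h5",
    monitor="val_loss",
    save_best_only=True,
    save_weights_only=True,
)

model.fit(x_demo, y_demo, epochs=3, batch_size=64,
          validation_split=0.2, callbacks=[checkpoint], verbose=0)

# Restore the best weights into a model with the same architecture
model.load_weights("simple_model_best.weights.h5")
```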

### Evaluate Model

In [None]:
model = tf.keras.models.load_model("simple_model_def_op_no_act.keras")
model.evaluate(x_test_clean,y_test_clean)

### Predict Using Saved Model

**EXERCISE**: Get the first image out of `x_test_clean`. Make sure its shape is (1, 784) and not just (784,), because the model expects a leading batch dimension.

In [None]:
datapoint = x_test_clean[...]  # TODO: replace ... so that datapoint has shape (1, 784)
model(datapoint)
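
The model returns one score per digit, with shape (1, 10). The predicted digit is the index of the largest score along the last axis. The sketch below uses made-up scores in place of a real `model(datapoint)` output. With the real model, the same idea is `np.argmax(model(datapoint).numpy(), axis=-1)`.

```python
import numpy as np

# Hypothetical output for a single image: shape (1, 10), one score per digit
scores = np.array([[0.1, -0.3, 0.05, 0.2, 0.0, 0.15, -0.1, 0.9, 0.3, 0.4]])

# Index of the highest score = predicted digit
predicted_digit = np.argmax(scores, axis=-1)
print(predicted_digit)  # [7]
```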

## Model Architecture Improvements

Clearly, this simple model could use some improvement.

We can add:

1. Additional layers
2. Activation Functions
3. Regularization and normalization layers (Dropout and BatchNorm)



In [None]:
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(128, activation="sigmoid"))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10, activation="softmax"))

In [None]:
model.summary()
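
The summary now reports some non-trainable parameters. The arithmetic below, rebuilt on the same architecture, shows where they come from. `BatchNormalization` learns a scale (gamma) and an offset (beta) per unit, and also keeps a moving mean and a moving variance per unit. Those two statistics are updated during training but not by gradient descent. `Dropout` has no parameters.

```python
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(128, activation="sigmoid"))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10, activation="softmax"))

dense_1 = 784 * 128 + 128      # 100480 weights and biases
dropout = 0                    # Dropout has no parameters
bn_trainable = 2 * 128         # gamma and beta: 256
bn_non_trainable = 2 * 128     # moving mean and moving variance: 256
dense_2 = 128 * 10 + 10        # 1290

total = dense_1 + dropout + bn_trainable + bn_non_trainable + dense_2
print(total, model.count_params())  # both 102282
```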

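We can also choose a more appropriate optimizer and loss function for this particular case: Adam, and categorical cross-entropy. Categorical cross-entropy pairs naturally with a softmax output for multi-class classification.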

In [None]:
model.compile(
    optimizer= "Adam",
    loss = "categorical_crossentropy",
    metrics = ["accuracy"]
)
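
For a one-hot target $y$ and predicted probabilities $p$, categorical cross-entropy is $-\sum_k y_k \log p_k$. Only the term for the true class survives. The sketch below computes it by hand for one made-up prediction and compares the result with Keras's `categorical_crossentropy`.

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[0.0, 1.0, 0.0]], dtype="float32")  # true class is 1
y_pred = np.array([[0.2, 0.7, 0.1]], dtype="float32")  # made-up softmax output

# By hand: only the true-class term remains, giving -log(0.7), about 0.3567
manual = -np.sum(y_true * np.log(y_pred), axis=-1)

keras_value = tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy()
print(manual, keras_value)
```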

In [None]:
model.fit(
    x_train_clean,
    y_train_clean,
    epochs=5,
    batch_size = 128,
    validation_data = (x_test_clean,y_test_clean)
)

# Hyper-parameter Optimization

For simplicity, we will use the Keras tuner to partially automate this process (https://www.tensorflow.org/tutorials/keras/keras_tuner)

## Model Builder Function

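The first step is to define a model-building function. It receives a `hp` object, uses it to choose values for the hyper-parameters of interest, and returns a compiled model. The tuner then trains each candidate model and records its validation metrics.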

We will then search over these arguments to find their optimal values.

In [None]:
!pip install -U keras-tuner

In [None]:
%load_ext tensorboard

In [None]:
import keras_tuner as kt

In [None]:
def build_two_layer_model(hp):
  model = tf.keras.models.Sequential()

  model.add(tf.keras.layers.Input(shape=(784,)))

  # Activation function shared by both hidden layers, chosen by the tuner
  act = hp.Choice("Activation", ["relu", "sigmoid"])

  # With sampling='log', step=2 acts as a multiplier: candidate sizes are 64, 128, 256 and 512
  l1_size = hp.Int("Layer_1_size", min_value=64, max_value=512, step=2, sampling='log')
  model.add(tf.keras.layers.Dense(l1_size, activation=act))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.BatchNormalization())

  # With sampling='log', step=2 acts as a multiplier: candidate sizes are 64, 128, 256 and 512
  l2_size = hp.Int("Layer_2_size", min_value=64, max_value=512, step=2, sampling='log')
  model.add(tf.keras.layers.Dense(l2_size, activation=act))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.BatchNormalization())


  model.add(tf.keras.layers.Dense(10, activation="softmax"))

  model.compile(
    optimizer= tf.keras.optimizers.Adam(learning_rate=0.001),
    loss = "categorical_crossentropy",
    metrics = ["accuracy"]
  )

  return model
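
With `sampling='log'`, the `step` of `hp.Int` is treated as a multiplier rather than an additive step. The pure-Python sketch below is not keras_tuner's own code. It simply lists the candidate sizes implied by `min_value=64, max_value=512, step=2`. The helper name `log_step_candidates` is hypothetical.

```python
def log_step_candidates(min_value, max_value, step):
    # Start at min_value and multiply by step until max_value is exceeded
    values = []
    value = min_value
    while value <= max_value:
        values.append(value)
        value *= step
    return values

print(log_step_candidates(64, 512, 2))  # [64, 128, 256, 512]
```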


In [None]:
tuner = kt.RandomSearch(build_two_layer_model,
                        objective='val_accuracy',
                        max_trials=20,
                        seed=42,
                        overwrite=True,
                        directory="./hyp_searches/",
                        project_name="two_layer_size_search")

In [None]:
tuner.search_space_summary()

When tuning hyper-parameters, it is good practice not to use the test dataset at all. Instead, we hold out part of the training data as a validation set for the search, and evaluate on the test dataset only once the optimization is finished.

In [None]:
%tensorboard --logdir "./hyp_searches/two_layer_size_search"

In [None]:
tuner.search(
    x_train_clean,
    y_train_clean,
    validation_split = 0.2,  # hold out the last 20% of the training data for validation
    epochs=5,
    callbacks=[tf.keras.callbacks.TensorBoard("./hyp_searches/two_layer_size_search/tb_logs")]
)

**IMPORTANT**: If you are using Google Colab, your results are not saved permanently. Once your runtime times out, everything will be discarded.

Download any files or details you don't want to lose immediately.

Below are the highest scoring hyper-parameters. Be sure to closely examine the TensorBoard metrics for a better understanding of performance.

In [None]:
best_model = tuner.get_best_models()[0]

In [None]:
best_model.summary()

In [None]:
tuner.get_best_hyperparameters()[0].values
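
Following the advice above, the test set should be used only once the search is finished. The sketch below shows one way to do this. It rebuilds a fresh model from the best hyper-parameters with `tuner.hypermodel.build`, retrains it on the full training set, and evaluates it on the test set. The function name `retrain_and_evaluate` and its default `epochs`/`batch_size` are assumptions chosen to mirror the earlier `fit` calls.

```python
import tensorflow as tf

def retrain_and_evaluate(tuner, x_train, y_train, x_test, y_test,
                         epochs=5, batch_size=128):
    """Rebuild the best configuration found by `tuner`, retrain it on the
    full training set and evaluate it once on the held-out test set."""
    best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
    model = tuner.hypermodel.build(best_hp)  # fresh, untrained model
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    return model.evaluate(x_test, y_test, verbose=0)  # [loss, accuracy]
```

In the notebook, this could be called as `retrain_and_evaluate(tuner, x_train_clean, y_train_clean, x_test_clean, y_test_clean)`.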

# Optional (But Recommended) Practice - Not Graded

This practice exercise is not graded, but working through it will prove useful for completing Assignment 1.

Using keras_tuner (methods such as `hp.Int()`, `hp.Choice()` and `hp.Float()` are documented at https://keras.io/keras_tuner/api/hyperparameters/), try to optimize the remaining hyper-parameters of the model architecture above. Examples include the dropout rate and the learning rate, which are currently fixed at 0.2 and 0.001.

You do not have to perform all of the optimization in one single search.

Refer to Lecture L06 as a guide for what to target.