In [1]:
# from tensorflow import keras
import os
import pandas as pd
import numpy as np
import sklearn
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow import keras
import keras

# Task 1: Learn the basics of Keras API for TensorFlow.

Start with reading the section “Implementing MLPs with Keras” from Chapter 10 of Geron’s textbook
(pages 295-320). Then install TensorFlow 2.0+ and experiment with the code included in this section
(Brightspace General information → Instructions). Additionally, study the official documentation https:
//keras.io/ and get an idea of the numerous options offered by Keras (layers, loss functions, metrics,
optimizers, activations, initializers, regularizers). Don’t get overwhelmed with the number of options – you
will frequently return to this site in the coming months.

Check out this official repository with many examples of Keras implementations of various sorts of
deep neural networks here. We recommend cloning this repository and trying to get some of these
examples running on your system (or Colab/DeepNote). In particular, experiment with mnist mlp.py
and mnist cnn.py scripts which show you how to build simple neural networks for the MNIST dataset
(useful for the next task).

Next, take the two well-known datasets: Fashion MNIST (introduced in Ch 10, p. 298) and CIFAR-10.
The first dataset contains 2D (grayscale) images of size 28x28, split into 10 categories; 60,000 images
for training and 10,000 for testing, while the latter contains 32x32x3 RGB images (50,000/10,000
train/test). Apply two reference networks on the fashion MNIST dataset. 

Experiment with both networks, trying various options: initializations, activations, optimizers (and
their hyperparameters), regularizations (L1, L2, Dropout, no Dropout). You may also experiment
with changing the architecture of both networks: adding/removing layers, number of convolutional
filters, their sizes, etc. For optimizing your hyperparameters you should use a validation set (10% of
the training set). After you have found the best-performing hyperparameter sets, take the 3 best ones
and train new models on the CIFAR-10 dataset to see whether your performance gains translate to a
different dataset. Provide your thoughts on these results in the report (e.g. what are the main reasons
for the difference in performance?).

The purpose of this task is NOT to get the highest accuracy. Instead, you are supposed to gain some
practical experience with tuning networks, running multiple experiments (with the help of some scripting),
and, very importantly, documenting your findings and understanding reasonable ranges for your hyperparameters. In your report, you have to provide a concise description of your experiments, results, and conclusions.
The quality of your work (code and report) is more important than the quantity or the accuracy you’ve
achieved.



## (a) A multi-layer perceptron described in detail in Ch. 10, pp. 299-307

In [None]:
# We import the fashion MNIST dataset.
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

X_test = X_test/255.0
X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [22]:
model = keras.models.Sequential()
model.add(keras.layers.Input((28,28)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(300, activation='elu'))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation="softmax"))


In [23]:
model.summary()

## (b) After lecture 6/7: a convolutional neural network described in Ch. 14, p. 447.

# Task 2: Develop a “Tell-the-time” network.

## 1

The problem of correctly telling the time can be formulated either as a multi-class classification
problem (for example, with 12x60=720 classes representing each minute label) or a regression problem
(for example, predicting the number of minutes after 12 o’clock). Therefore, your goal is to come up
with different representations for the labels of your data adapt the output layer of your neural network
and see how it impacts the training time and performance. No matter which architecture and loss
function you will use when reporting results also provide “common sense” accuracy: the absolute value
of the time difference between the predicted and the actual time (e.g., the “common sense” difference
between “predicted” 11:55 and the “target” 0:05 is just 10 minutes and not 11 hours and 50 minutes!).
Minimizing this “common sense” error measure is the main objective of this assignment! Notice that
it is a common situation in Machine Learning: we often train models using one error measure (e.g.,
cross-entropy loss) while the actual performance measure that we are interested in is different, e.g., the
accuracy (the percentage of correctly classified cases).

The dataset can be downloaded from here and it consists of 18000 grayscale images (18000x150x150
or 18000x75x75) contained in ‘images.npy’. The labels for each sample are represented by two integers
(18000x2, ‘labels.npy’ file), that correspond to the hour and minute displayed by the clock. You can see
that each image is rendered from a different angle and rotation and they might contain light reflections from
within the scene making this a non-trivial problem. For your experiments, we suggest splitting your data
into 80/10/10% splits for training/validation and test sets respectively. Remember to shuffle your dataset
as the sample files are ordered. We suggest using the smaller dataset for your initial tests and runs (75x75
images) and then reporting your results on the larger (150x150) datase

### (a) Classification

Treat this as a n-class classification problem. We suggest starting out with a
smaller number of categories e.g. grouping all the samples that are between [3 : 00 − 3 : 30] into
a single category (results in 24 categories in total), and trying to train a CNN model. Once you
have found a working architecture, increase the number of categories by using smaller intervals
for grouping samples to increase the ’common sense accuracy’. Can you train a network using
all 720 different labels? What problems does such a label representation have?

2.20.0
TensorFlow detected 1 GPU(s):
  - /physical_device:GPU:0


#### load data

In [3]:
data_folder = "A1_data_75"
images_path = os.path.join(data_folder, "images.npy")
images = np.load(images_path)
labels_path = os.path.join(data_folder, "labels.npy")
labels = np.load(labels_path)


(a) Classification - treat this as a n-class classification problem. We suggest starting out with a
smaller number of categories e.g. grouping all the samples that are between [3 : 00 −3 : 30] into
a single category (results in 24 categories in total), and trying to train a CNN model. Once you
have found a working architecture, increase the number of categories by using smaller intervals
for grouping samples to increase the ’common sense accuracy’. Can you train a network using
all 720 different labels? What problems does such a label representation have?

## Task a: classification
We will start with deviding labels into 24 categories, one for each 30 minute

In [None]:

print(labels)
def get_cat_labels(labels):
    new_labels = []
    for label in labels:
        label = label[0]* 2 + int(label[1] >= 30)
        new_labels.append(label)
    return np.array(new_labels)
labels = get_cat_labels(labels)
print(labels)


[[ 0  0]
 [ 0  0]
 [ 0  0]
 ...
 [11 59]
 [11 59]
 [11 59]]
[ 0  0  0 ... 23 23 23]


We then split the data into training, validation, and test sets. The sklearn train_test_split method shuffles the data by default

In [None]:
X_train_full, X_test,y_train_full, y_test = train_test_split(
    images, labels, test_size=0.1, random_state=35
)
X_train, X_valid,y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=1/9, random_state=35
) # 1/9 x 0.9 = 0.1.


X_train shape: (14400, 75, 75)
y_train shape: (14400,)
X_valid shape: (1800, 75, 75)
y_valid shape: (1800,)
X_test shape: (1800, 75, 75)
y_test shape: (1800,)


we define a common sense loss. This will calculate how far of the prediction was

In [9]:
def common_sense_loss(y_true, y_pred):
    """
    """
    y_pred_class = tf.argmax(y_pred, axis=1)
    y_true_float = tf.cast(tf.squeeze(y_true), dtype=tf.float32)
    y_pred_float = tf.cast(y_pred_class, dtype=tf.float32)
    diff = tf.abs(y_true_float - y_pred_float)
    cyclical_diff = tf.minimum(diff, 12.0 - diff)
    print(cyclical_diff)
    return tf.reduce_mean(cyclical_diff)


Our model for 24 class classification. We use a scheduler to lower the learning rate when we plateau



In [None]:

lr_scheduler = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,          # halve the learning rate if there is no improvement
    patience=3,          # Wait 2 epochs with no improvement before reducing
    min_lr=1e-6          # Set a minimum learning rate at 1e-6
)
early_stopper = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=6,          # Wait 6 epochs for improvement before stopping
    restore_best_weights=True  # Automatically restore the weights from the best epoch
)
model = keras.models.Sequential([
    keras.Input(shape=(75, 75, 1)),
    # Block 1
    keras.layers.Conv2D(32, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D((2,2)),

    # Block 2
    keras.layers.Conv2D(64, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2),

    # Block 3
    keras.layers.Conv2D(128, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2), # Output shape: (9, 9, 128)

    # Block 4
    keras.layers.Conv2D(256, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2), # Output shape: (4, 4, 256)

    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="leaky_relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation="leaky_relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(24, activation="softmax")
])
model.compile(loss='sparse_categorical_crossentropy',
optimizer=keras.optimizers.Adam(learning_rate=0.001),
metrics=[common_sense_loss,"Accuracy"
        #   tf.keras.metrics.Precision(), tf.keras.metrics.Recall()
          ],
)


In [None]:
model.fit(
    X_train, y_train,
    epochs=10,
    validation_data=(X_valid, y_valid),
    callbacks=[lr_scheduler, early_stopper]
    )
#evaluate the model on the test set
test_loss,test_csl, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
#base:0.8420000076293945
#leaky: 0.8525000214576721
#leaky + L2regularization: 0.8472999930381775
#leaky + batch normalization: 0.8978000283241272

(print(tf.__version__))
#0.9711111187934875
# metrics=["accuracy"])

Epoch 1/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 13ms/step - Accuracy: 0.9581 - common_sense_loss: 0.0559 - loss: 0.1244 - val_Accuracy: 0.9578 - val_common_sense_loss: 0.0351 - val_loss: 0.1369 - learning_rate: 3.1250e-05
Epoch 2/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 13ms/step - Accuracy: 0.9596 - common_sense_loss: 0.0474 - loss: 0.1199 - val_Accuracy: 0.9594 - val_common_sense_loss: 0.0389 - val_loss: 0.1259 - learning_rate: 3.1250e-05
Epoch 3/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 13ms/step - Accuracy: 0.9624 - common_sense_loss: 0.0453 - loss: 0.1182 - val_Accuracy: 0.9517 - val_common_sense_loss: 0.0515 - val_loss: 0.1607 - learning_rate: 3.1250e-05
Epoch 4/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 12ms/step - Accuracy: 0.9633 - common_sense_loss: 0.0463 - loss: 0.1117 - val_Accuracy: 0.9589 - val_common_sense_loss: 0.0417 - val_loss: 0.1299 - learning_rate: 3.1250e

We now make a class for every 10 minutes

In [None]:
labels = np.load(labels_path)

def get_cat_labels_10(labels):
    new_labels = []
    dct = {}
    for label in labels:
        old = label
        label = label[0]* 6 + int((label[1])/10)
        new_labels.append(label)
        dct[str(old)] = label
    print(dct)
    return np.array(new_labels)
labels = get_cat_labels_10(labels)


{'[0 0]': np.int64(0), '[0 1]': np.int64(0), '[0 2]': np.int64(0), '[0 3]': np.int64(0), '[0 4]': np.int64(0), '[0 5]': np.int64(0), '[0 6]': np.int64(0), '[0 7]': np.int64(0), '[0 8]': np.int64(0), '[0 9]': np.int64(0), '[ 0 10]': np.int64(1), '[ 0 11]': np.int64(1), '[ 0 12]': np.int64(1), '[ 0 13]': np.int64(1), '[ 0 14]': np.int64(1), '[ 0 15]': np.int64(1), '[ 0 16]': np.int64(1), '[ 0 17]': np.int64(1), '[ 0 18]': np.int64(1), '[ 0 19]': np.int64(1), '[ 0 20]': np.int64(2), '[ 0 21]': np.int64(2), '[ 0 22]': np.int64(2), '[ 0 23]': np.int64(2), '[ 0 24]': np.int64(2), '[ 0 25]': np.int64(2), '[ 0 26]': np.int64(2), '[ 0 27]': np.int64(2), '[ 0 28]': np.int64(2), '[ 0 29]': np.int64(2), '[ 0 30]': np.int64(3), '[ 0 31]': np.int64(3), '[ 0 32]': np.int64(3), '[ 0 33]': np.int64(3), '[ 0 34]': np.int64(3), '[ 0 35]': np.int64(3), '[ 0 36]': np.int64(3), '[ 0 37]': np.int64(3), '[ 0 38]': np.int64(3), '[ 0 39]': np.int64(3), '[ 0 40]': np.int64(4), '[ 0 41]': np.int64(4), '[ 0 42]': 

In [42]:
X_train_full, X_test,y_train_full, y_test = train_test_split(
    images, labels, test_size=0.1, random_state=35
)
X_train, X_valid,y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=1/9, random_state=35
) # 1/9 x 0.9 = 0.1. train test split shuffles by default


In [None]:
def common_sense_loss(y_true, y_pred):
    """
    """
    y_pred_class = tf.argmax(y_pred, axis=1)
    y_true_float = tf.cast(tf.squeeze(y_true), dtype=tf.float32)
    y_pred_float = tf.cast(y_pred_class, dtype=tf.float32)
    diff = tf.abs(y_true_float - y_pred_float)
    cyclical_diff = tf.minimum(diff, 12.0 - diff)
    print(cyclical_diff)
    return tf.reduce_mean(cyclical_diff)

In [47]:
max_pool = keras.layers.MaxPool2D(pool_size=2)
lr_scheduler = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,          # halce the learning rate if no improvement
    patience=2,          # Wait 2 epochs with no improvement before reducing
    min_lr=1e-6          # Set a minimum learning rate at 1e-6
)
early_stopper = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,          # Wait 5 epochs for improvement before stopping
    restore_best_weights=True  # Automatically restore the model weights from the best epoch
)
# avg_pool = keras.layers.AveragePooling2D(pool_size=2)
model = keras.models.Sequential([
    keras.Input(shape=(75, 75, 1)),
    # Block 1
    keras.layers.Conv2D(32, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D((2,2)), # Output shape: (37, 37, 32)

    # Block 2
    keras.layers.Conv2D(64, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2), # Output shape: (18, 18, 64)

    # Block 3
    keras.layers.Conv2D(128, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(128, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2), # Output shape: (9, 9, 128)

    # Block 4
    keras.layers.Conv2D(256, (3,3), activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(2), # Output shape: (4, 4, 256)

    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="leaky_relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation="leaky_relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(72, activation="softmax")
])
model.compile(loss='sparse_categorical_crossentropy',
optimizer=keras.optimizers.Adam(learning_rate=0.001),
metrics=[common_sense_loss,"Accuracy"
        #   tf.keras.metrics.Precision(), tf.keras.metrics.Recall()
          ],
)


In [None]:
model.fit(
    X_train, y_train,
    epochs=10,
    validation_data=(X_valid, y_valid),
    callbacks=[lr_scheduler, early_stopper]
    )
#evaluate the model on the test set
test_loss,test_csl, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
#base:0.8420000076293945
#leaky: 0.8525000214576721
#leaky + L2regularization: 0.8472999930381775
#leaky + batch normalization: 0.8978000283241272

(print(tf.__version__))
#0.9194444417953491
# metrics=["accuracy"])

Epoch 1/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 8ms/step - Accuracy: 0.9309 - common_sense_loss: 0.0900 - loss: 0.2031 - val_Accuracy: 0.9156 - val_common_sense_loss: 0.1091 - val_loss: 0.2537 - learning_rate: 3.1250e-05
Epoch 2/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 8ms/step - Accuracy: 0.9324 - common_sense_loss: 0.0492 - loss: 0.1923 - val_Accuracy: 0.9122 - val_common_sense_loss: 0.0735 - val_loss: 0.2550 - learning_rate: 3.1250e-05
Epoch 3/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 9ms/step - Accuracy: 0.9317 - common_sense_loss: 0.0530 - loss: 0.1989 - val_Accuracy: 0.9100 - val_common_sense_loss: 0.0707 - val_loss: 0.2669 - learning_rate: 3.1250e-05
Epoch 4/10
[1m450/450[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 9ms/step - Accuracy: 0.9408 - common_sense_loss: 0.0096 - loss: 0.1797 - val_Accuracy: 0.9133 - val_common_sense_loss: 0.0768 - val_loss: 0.2498 - learning_rate: 1.5625e-05


In [84]:
labels = np.load(labels_path)
def get_cat_labels_10(labels):
    new_labels = []
    dct = {}
    for label in labels:
        old = label
        label = label[0]* 60 + int((label[1]))
        new_labels.append(label)
        dct[str(old)] = label
    print(dct)
    return np.array(new_labels)
labels = get_cat_labels_10(labels)

{'[0 0]': np.int64(0), '[0 1]': np.int64(1), '[0 2]': np.int64(2), '[0 3]': np.int64(3), '[0 4]': np.int64(4), '[0 5]': np.int64(5), '[0 6]': np.int64(6), '[0 7]': np.int64(7), '[0 8]': np.int64(8), '[0 9]': np.int64(9), '[ 0 10]': np.int64(10), '[ 0 11]': np.int64(11), '[ 0 12]': np.int64(12), '[ 0 13]': np.int64(13), '[ 0 14]': np.int64(14), '[ 0 15]': np.int64(15), '[ 0 16]': np.int64(16), '[ 0 17]': np.int64(17), '[ 0 18]': np.int64(18), '[ 0 19]': np.int64(19), '[ 0 20]': np.int64(20), '[ 0 21]': np.int64(21), '[ 0 22]': np.int64(22), '[ 0 23]': np.int64(23), '[ 0 24]': np.int64(24), '[ 0 25]': np.int64(25), '[ 0 26]': np.int64(26), '[ 0 27]': np.int64(27), '[ 0 28]': np.int64(28), '[ 0 29]': np.int64(29), '[ 0 30]': np.int64(30), '[ 0 31]': np.int64(31), '[ 0 32]': np.int64(32), '[ 0 33]': np.int64(33), '[ 0 34]': np.int64(34), '[ 0 35]': np.int64(35), '[ 0 36]': np.int64(36), '[ 0 37]': np.int64(37), '[ 0 38]': np.int64(38), '[ 0 39]': np.int64(39), '[ 0 40]': np.int64(40), '[ 0

In [87]:
X_train_full, X_test,y_train_full, y_test = train_test_split(
    images, labels, test_size=0.1, random_state=35
)
X_train, X_valid,y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=1/9, random_state=35
) # 1/9 x 0.9 = 0.1. train test split shuffles by default
print(y_train)


[ 57 587 472 ... 223 268 344]


In [None]:
def common_sense_loss(y_true, y_pred):
    """
    """
    y_pred_class = tf.argmax(y_pred, axis=1)
    y_true_float = tf.cast(tf.squeeze(y_true), dtype=tf.float32)
    y_pred_float = tf.cast(y_pred_class, dtype=tf.float32)
    diff = tf.abs(y_true_float - y_pred_float)
    cyclical_diff = tf.minimum(diff, 720.0 - diff)
    print(cyclical_diff)
    return tf.reduce_mean(cyclical_diff)


### (b) Regression

try to build a network that predicts the time using a single output node in the
following format: [”03 : 00” → y = 3.0]; [”05 : 30” → y = 5.5], where categorical labels of hours
and minutes get transformed to a single continuous value. What kind of loss function would you
need to use for such a task? What kind of problems does such a representation have?

### (c) Multi-head models

### (d) Label transformation

## 2
Use the knowledge gained by working with other datasets in the previous parts of this assignment to
optimize your final models and decrease the error of telling the time as much as possible (common
sense error of below 10 minutes is achievable using relatively simple CNN architectures). You should
also compare the different ways of representing your labels and different neural network output layer
combinations. You should use an 80:20% ratio for the train/test sets respectively. Document your
experiments and findings in the report.