## Chapter 12 - Custom Models and Training with TensorFlow

In [1]:
import time
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

### Question 12 - Creating a Custom Layer That Performs Layer Normalization

I subclass Keras's Layer class and define the build and call methods, which Keras invokes from the layer's `__call__` method.

In [2]:
class MyNormLayer(keras.layers.Layer):
    def __init__(self , **kwargs):
        super().__init__(**kwargs)
        
    def build(self,input_shape):
        self.alpha = self.add_weight(name = "alpha" , shape=input_shape[-1:] , dtype=tf.float32,
                                    initializer=keras.initializers.Ones)
        self.beta = self.add_weight(name = "beta" , shape=input_shape[-1:] , dtype=tf.float32,
                                   initializer=keras.initializers.Zeros)
        super().build(input_shape)
        
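    def call(self,X):
        # Per-instance mean and (population) variance over the last axis
        mu , var = tf.nn.moments(X , axes=-1 , keepdims=True)
        sigma = tf.sqrt(var)
        # 0.001 is a small smoothing term that avoids division by zero
        return (self.alpha*(X-mu)/(sigma + 0.001)) + self.beta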
    
    def compute_output_shape(self,input_shape):
        return input_shape
    
    def get_config(self):
        base_config = super().get_config()
        return base_config
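
To sanity-check the arithmetic in call, here is a minimal NumPy sketch of the same formula. The function name layer_norm and the toy input are illustrative assumptions, not part of the exercise; np.std is used with its default population form so that it matches the variance returned by tf.nn.moments.

```python
import numpy as np

def layer_norm(X, alpha, beta, eps=0.001):
    # Hypothetical helper mirroring MyNormLayer.call with NumPy
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)  # population std (ddof=0)
    return alpha * (X - mu) / (sigma + eps) + beta

# Two rows that differ only by a scale factor of 10
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
alpha = np.ones(X.shape[-1])   # same starting value as the "alpha" weight
beta = np.zeros(X.shape[-1])   # same starting value as the "beta" weight

normalized = layer_norm(X, alpha, beta)
print(normalized)
```

Because each row is normalized independently, the two rows come out almost identical even though their scales differ; the small remaining gap comes from the 0.001 term in the denominator.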

### Question 13 - Training a Model Using Custom Training Loop on Fashion MNIST Dataset

For this question, I create three custom training loops:

- The first is written entirely inside the loop
- The second uses helper functions decorated with @tf.function
- The third subclasses keras.Model and overrides the train_step method of the class

Each training method also evaluates on the validation data.

The purpose of writing these three loops is to compare the training time between the versions.

Getting the Fashion MNIST data

In [3]:
(X_train , y_train) , (X_test , y_test) = keras.datasets.fashion_mnist.load_data()

Scale the pixel values to the [0, 1] range, split the data into training and validation sets, and wrap each set in a tf.data.Dataset object with a batch size of 32

In [4]:
X_train = X_train.astype("float32") / 255 
X_test =  X_test.astype("float32") / 255
X_train , X_val , y_train , y_val = train_test_split(X_train , y_train , train_size=0.8)
training_data = tf.data.Dataset.from_tensor_slices((X_train , y_train)).batch(32)
validation_data = tf.data.Dataset.from_tensor_slices((X_val , y_val)).batch(32)
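
As a rough sketch of what this split means for the loops that follow, the number of batches per epoch can be worked out by hand. It assumes the standard 60,000-image Fashion MNIST training set and the 0.8 split used above; the variable names are illustrative.

```python
import math

n_total = 60_000      # size of the Fashion MNIST training set
batch_size = 32
epochs = 10

n_train = n_total * 8 // 10   # 80% kept for training
n_val = n_total - n_train     # remaining 20% for validation

train_steps = math.ceil(n_train / batch_size)  # batches per training epoch
val_steps = math.ceil(n_val / batch_size)      # batches per validation pass
total_updates = train_steps * epochs           # gradient updates over the run

print(train_steps, val_steps, total_updates)
```

Every one of those training batches costs one forward pass, one backward pass, and one optimizer update, which is why per-step overhead matters for the timing comparison.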

A function that builds a basic model, with a flattening layer at the start, followed by dense and dropout layers

In [5]:
def get_model():
    inputs = keras.Input(shape=(28,28))
    flatten = keras.layers.Flatten()(inputs)
    x = keras.layers.Dense(512 , activation = "relu")(flatten)
    x = keras.layers.Dropout(0.5)(x)
    x = keras.layers.Dense(100 , activation = "relu")(x)
    x = keras.layers.Dropout(0.2)(x)
    output = keras.layers.Dense(10 , activation = "softmax")(x)
    model = keras.Model(inputs , output)
    return model

Defining the model configuration: a loss function to compute gradients from, an optimizer to update the model weights,
a metric to track, and a Mean metric that keeps a running average of the loss value

In [6]:
loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.Adam()
metrics = [keras.metrics.SparseCategoricalAccuracy()]
loss_tracking_metric = keras.metrics.Mean()
epochs = 10
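
The reason the metric states get reset in the loops is that these metrics accumulate across calls. The pure-Python sketch below imitates that behaviour for scalar inputs; RunningMean is a hypothetical stand-in written for illustration, not the actual keras.metrics.Mean implementation.

```python
class RunningMean:
    """Toy stand-in for a streaming mean metric (scalar inputs only)."""

    def __init__(self):
        self.reset_states()

    def reset_states(self):
        # Forget everything accumulated so far
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        # Fold one more batch loss into the running totals
        self.total += float(value)
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0

tracker = RunningMean()
for batch_loss in [0.9, 0.7, 0.5]:   # made-up batch losses
    tracker.update_state(batch_loss)
epoch_mean = tracker.result()        # average over the three batches

tracker.reset_states()               # start of the validation pass
tracker.update_state(0.4)
after_reset = tracker.result()       # only reflects values seen since the reset
```

Without the reset, the validation loss would be averaged together with the training batches that came before it.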

First custom training loop - at the start of each epoch I reset the state of the metrics and of the loss tracker. Then, for each batch in the training data, I feed the inputs forward through the model and calculate the loss, compute the gradients of the loss with respect to each trainable weight in the model, and let the optimizer apply them to update the weights.
After going through all the training data, the logs dict holds the training value of each metric. I then reset the metric states again, compute the same metrics on the validation data, and add them to the logs dict under a val_ prefix. Finally, I print the results at the end of each epoch.

In [7]:
model = get_model()

start_time = time.time()
for epoch in range(epochs):
    for metric in metrics:
        metric.reset_states()
    loss_tracking_metric.reset_states()
    
    for inputs_batch , labels_batch in training_data:
        with tf.GradientTape() as tape:
            predictions = model(inputs_batch , training = True)
            loss = loss_fn(labels_batch , predictions)
        gradients = tape.gradient(loss , model.trainable_weights)
        optimizer.apply_gradients(zip(gradients , model.trainable_weights))
    
        logs = {}
        for metric in metrics:
            metric.update_state(labels_batch , predictions)
            logs[metric.name] = metric.result()
        loss_tracking_metric.update_state(loss)
        logs["loss"] = loss_tracking_metric.result()
        
    for metric in metrics:
        metric.reset_states()
    loss_tracking_metric.reset_states()
        
    for val_inputs_batch , val_label_batch in validation_data:
        predictions = model(val_inputs_batch , training = False)
        loss = loss_fn(val_label_batch , predictions)
        for metric in metrics:
            metric.update_state(val_label_batch , predictions)
            logs["val_" + metric.name] = metric.result()
        loss_tracking_metric.update_state(loss)
        logs["val_loss"] = loss_tracking_metric.result()
            
        
    print(f"Results of epoch {epoch + 1}:")
    for key , value in logs.items():
        print(f" => {key} : {value:.4f}")
        
end_time = time.time()
print("Time taken for training :" , end_time - start_time)

Results of epoch 1:
 => sparse_categorical_accuracy : 0.7696
 => loss : 0.6330
 => val_sparse_categorical_accuracy : 0.8130
 => val_loss : 0.4775
Results of epoch 2:
 => sparse_categorical_accuracy : 0.8256
 => loss : 0.4772
 => val_sparse_categorical_accuracy : 0.8349
 => val_loss : 0.4315
Results of epoch 3:
 => sparse_categorical_accuracy : 0.8376
 => loss : 0.4415
 => val_sparse_categorical_accuracy : 0.8528
 => val_loss : 0.3897
Results of epoch 4:
 => sparse_categorical_accuracy : 0.8463
 => loss : 0.4177
 => val_sparse_categorical_accuracy : 0.8492
 => val_loss : 0.3976
Results of epoch 5:
 => sparse_categorical_accuracy : 0.8517
 => loss : 0.4031
 => val_sparse_categorical_accuracy : 0.8678
 => val_loss : 0.3632
Results of epoch 6:
 => sparse_categorical_accuracy : 0.8587
 => loss : 0.3914
 => val_sparse_categorical_accuracy : 0.8698
 => val_loss : 0.3608
Results of epoch 7:
 => sparse_categorical_accuracy : 0.8617
 => loss : 0.3763
 => val_sparse_categorical_accuracy : 0.8643


The first training loop took a whopping __2 minutes!__
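
The heart of each iteration in that loop is the record, differentiate, update pattern. Here is a minimal sketch of it on a single scalar weight, assuming a toy loss of w squared and a hand-written gradient descent step in place of the Adam optimizer used above.

```python
import tensorflow as tf

w = tf.Variable(3.0)      # single toy "weight"
learning_rate = 0.1       # illustrative value

with tf.GradientTape() as tape:
    # Operations on w are recorded while the tape is open
    loss = w ** 2         # toy loss, so d(loss)/dw = 2 * w

# Differentiate the recorded loss with respect to the weight
grads = tape.gradient(loss, [w])

# Manual gradient descent step, standing in for optimizer.apply_gradients
w.assign_sub(learning_rate * grads[0])

grad_value = float(grads[0].numpy())
new_w = float(w.numpy())
print(grad_value, new_w)
```

The real loop does the same thing with the model's full list of trainable weights and lets the optimizer decide how to turn gradients into updates.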

Defining functions for the second custom training loop: one function resets the states of the metrics, and another updates the logs dict differently depending on whether we are in the training stage or the validation stage

In [8]:
def metrics_resetting():
    for metric in  metrics:
        metric.reset_states()
    loss_tracking_metric.reset_states()

In [9]:
def updating_logs(labels , preds, loss , is_training):
    if is_training:
        for metric in metrics:
            metric.update_state(labels , preds)
            logs[metric.name] = metric.result()
        loss_tracking_metric.update_state(loss)
        logs["loss"] = loss_tracking_metric.result()
        
    else:
        for metric in metrics:
            metric.update_state(labels , preds)
            logs["val_" + metric.name] = metric.result()
        loss_tracking_metric.update_state(loss)
        logs["val_loss"] = loss_tracking_metric.result()

Defining the training step and decorating it with @tf.function so that it is compiled into a computational graph

In [10]:
@tf.function
def training_iteration(inputs,labels):
    with tf.GradientTape() as tape:
        predictions = model(inputs , training = True)
        loss = loss_fn(labels , predictions)
    gradients = tape.gradient(loss , model.trainable_weights)
    optimizer.apply_gradients(zip(gradients , model.trainable_weights))
    return predictions , loss

Doing the same with the validation stage

In [11]:
@tf.function
def validation_iteration(inputs,labels):
    predictions = model(inputs , training = False)
    loss = loss_fn(labels , predictions)
    return predictions , loss

Second custom training loop - the same as the first one, but using the predefined @tf.function steps

In [12]:
model = get_model()

start_time = time.time()

for epoch in range(epochs):
    metrics_resetting()
    logs = {}
    
    for inputs_batch , labels_batch in training_data:
        predictions , loss = training_iteration(inputs_batch , labels_batch)
        updating_logs(labels_batch , predictions , loss , True)
        
    metrics_resetting()
        
    for val_inputs_batch , val_labels_batch in validation_data:
        predictions , loss = validation_iteration(val_inputs_batch , val_labels_batch)    
        updating_logs(val_labels_batch , predictions , loss , False)
        
    print(f"Results of epoch {epoch + 1}:")
    for key , value in logs.items():
        print(f" => {key} : {value:.4f}")
        
end_time = time.time()
print("Time taken for training :" , end_time - start_time)

Results of epoch 1:
 => sparse_categorical_accuracy : 0.7654
 => loss : 0.6395
 => val_sparse_categorical_accuracy : 0.8256
 => val_loss : 0.4515
Results of epoch 2:
 => sparse_categorical_accuracy : 0.8263
 => loss : 0.4788
 => val_sparse_categorical_accuracy : 0.8477
 => val_loss : 0.4133
Results of epoch 3:
 => sparse_categorical_accuracy : 0.8369
 => loss : 0.4457
 => val_sparse_categorical_accuracy : 0.8574
 => val_loss : 0.3907
Results of epoch 4:
 => sparse_categorical_accuracy : 0.8449
 => loss : 0.4198
 => val_sparse_categorical_accuracy : 0.8670
 => val_loss : 0.3718
Results of epoch 5:
 => sparse_categorical_accuracy : 0.8509
 => loss : 0.4075
 => val_sparse_categorical_accuracy : 0.8647
 => val_loss : 0.3683
Results of epoch 6:
 => sparse_categorical_accuracy : 0.8573
 => loss : 0.3911
 => val_sparse_categorical_accuracy : 0.8683
 => val_loss : 0.3527
Results of epoch 7:
 => sparse_categorical_accuracy : 0.8615
 => loss : 0.3793
 => val_sparse_categorical_accuracy : 0.8663


We can see the power of @tf.function here in the second training loop: just by wrapping the training step in a @tf.function, the training time drops by more than half, to __56 seconds!__
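
The same effect can be probed in isolation with a small sketch that runs one Python function both eagerly and as a graph. The function dense_stack, the tensor shapes, and the call count are all illustrative assumptions; the measured gap depends heavily on hardware and on how much work each call does, so no particular speedup is implied.

```python
import time
import tensorflow as tf

def dense_stack(x, w):
    # Three matmul + ReLU steps, standing in for a small forward pass
    for _ in range(3):
        x = tf.nn.relu(tf.matmul(x, w))
    return tf.reduce_sum(x)

# Same Python function, traced into a graph on its first call
graph_dense_stack = tf.function(dense_stack)

x = tf.random.uniform((32, 64), seed=0)
w = tf.random.uniform((64, 64), seed=1) * 0.03

eager_result = float(dense_stack(x, w).numpy())
graph_result = float(graph_dense_stack(x, w).numpy())  # includes tracing

def time_calls(fn, n_calls=200):
    # Wall-clock time for repeated calls on the same inputs
    start = time.perf_counter()
    for _ in range(n_calls):
        fn(x, w)
    return time.perf_counter() - start

eager_seconds = time_calls(dense_stack)
graph_seconds = time_calls(graph_dense_stack)
print(f"eager: {eager_seconds:.4f}s, graph: {graph_seconds:.4f}s")
```

Both versions compute the same value; only the execution path differs. The first graph call is made before timing so that the one-off tracing cost is not counted.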

Third custom training loop - subclassing the keras.Model class and overriding its train_step method

In [13]:
class MyCustomModel(keras.Model):
    def train_step(self, data):
        inputs, targets = data
        with tf.GradientTape() as tape:
            predictions = self(inputs, training=True)
            # Loss configured in compile()
            loss = self.compiled_loss(targets, predictions)
        # Use the model's own weights and optimizer instead of the globals
        gradients = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_weights))
        self.compiled_metrics.update_state(targets, predictions)
        return {m.name: m.result() for m in self.metrics}

In [14]:
inputs = keras.Input(shape=(28,28))
flatten = keras.layers.Flatten()(inputs)
x = keras.layers.Dense(512 , activation = "relu")(flatten)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(100 , activation = "relu")(x)
x = keras.layers.Dropout(0.2)(x)
output = keras.layers.Dense(10 , activation = "softmax")(x)
model = MyCustomModel(inputs , output)

In [15]:
model.compile(optimizer=optimizer ,
             loss = loss_fn,
             metrics= metrics)
model.fit(X_train , y_train , batch_size=32 , epochs=10 ,
         validation_data=(X_val , y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x2832a42a100>

The third training loop is not even a competition: only about __30 seconds__ to train!