
# Exercises to experiment with TensorFlow 2/Keras

This notebook works through some of the exercises from Hands-On Machine Learning (2nd edition).

1. Implement a custom layer for layer normalisation

2. Train a model using a custom training loop

3. Use the `tf.data.Dataset` functionality for preprocessing and then create a basic binary classifier. 

That should do for now. First, let's check that we're using TensorFlow 2.

In [0]:
import numpy as np
import tensorflow as tf

# Keep this expression last in the cell so the notebook displays the version.
tf.__version__

In [0]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

# Implementing a custom layer for Layer Normalisation

First, what is layer normalisation (LN)? Rather than normalising each feature across the batch, as Batch Normalisation (BN) does, LN normalises each instance across its features dimension.
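
To make the difference in axes concrete, here is a minimal NumPy sketch. The helper names `layer_norm` and `batch_norm` and the toy array are assumptions made for illustration; the sketch leaves out the trainable scale and offset, and the BN version only shows the training-time statistics (no moving averages).

```python
import numpy as np

def layer_norm(X, eps=0.001):
    # Statistics are computed per instance (row), over the features axis.
    mean = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mean) / np.sqrt(var + eps)

def batch_norm(X, eps=0.001):
    # Statistics are computed per feature (column), over the batch axis.
    mean = X.mean(axis=0, keepdims=True)
    var = X.var(axis=0, keepdims=True)
    return (X - mean) / np.sqrt(var + eps)

# Toy batch: 2 instances with 3 features each (hypothetical values).
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 60.0]])

ln = layer_norm(X)
bn = batch_norm(X)

print("LN row means:   ", ln.mean(axis=1))  # approximately 0 for every instance
print("BN column means:", bn.mean(axis=0))  # approximately 0 for every feature
```

Because LN only uses statistics from a single instance, its output for one row does not depend on which other rows happen to be in the batch.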

In [0]:
from tensorflow import keras

In [0]:
class LayerNormalisation(keras.layers.Layer):
  # Custom layer that should give the same output as the
  # keras.layers.LayerNormalization layer (we test this in the next cell).
  def __init__(self, eps = 0.001, **kwargs):
    super().__init__(**kwargs)
    self.eps = eps

  def build(self, batch_input_shape):
    # This method defines the two trainable weights by calling add_weight:
    # a scale (alpha) and an offset (beta), each with one value per feature.
    self.alpha = self.add_weight(name = "alpha", 
                                 shape = batch_input_shape[-1:], 
                                 initializer = "ones")
    self.beta = self.add_weight(name = "beta",
                                shape = batch_input_shape[-1:],
                                initializer = "zeros")
    # We need to call the super's build method AT THE END(!)
    super().build(batch_input_shape)

  def call(self, X):
    # Compute the mean and variance of each instance's features (the last
    # axis) with tf.nn.moments(), then standardise, scale and shift.
    mean, variance = tf.nn.moments(X, axes = -1, keepdims=True)
    return self.alpha * (X - mean)/(tf.sqrt(variance+self.eps)) + self.beta

  # We need two more methods to complete a layer: `compute_output_shape` and `get_config`

  # Because this layer does not change the shape, we just return the
  # same shape as the batch_input_shape
  def compute_output_shape(self, batch_input_shape):
    return batch_input_shape

  # Any additional configs that we have added to the base class
  def get_config(self):
    base_config = super().get_config()
    return {**base_config, "eps" : self.eps}

In [0]:
# Testing our new layer 
X = X_train.astype(np.float32)

custom_layer_norm = LayerNormalisation()
keras_layer_norm = keras.layers.LayerNormalization()

tf.reduce_mean(keras.losses.mean_absolute_error(
    keras_layer_norm(X), custom_layer_norm(X)))


# Train a model using a custom training loop

Generally, we should not *need* to do this very often, given the extensive functionality now provided by TensorFlow. 

A custom training loop can be useful when we want to use different optimisers for different parts of the network. The book refers to the Wide & Deep architecture as an example.

First, let's look at an example from the book, and then move on to our own example. 

In [0]:
l2_reg = keras.regularizers.l2(0.05)
#Define our model
model = keras.models.Sequential([
                                 keras.layers.Dense(30, activation = "elu",
                                                    kernel_initializer = "he_normal",
                                                    kernel_regularizer = l2_reg),
                                 keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

def random_batch(X, y, batch_size = 32):
  #Create a random batch for each step
  idx = np.random.randint(len(X), size = batch_size)
  return X[idx], y[idx]

def print_status_bar(iteration, total, loss, metrics=None):
  # Define a function for the status bar that is printed after each step.
  metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result()) 
                      for m in [loss] + (metrics or [])])
  end = " " if iteration < total else "\n"
  print("\r{}/{} - ".format(iteration, total)+metrics, end = end)


# Set up some basic variables
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate = 0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

# And now we setup the basic training loop...

#Create a loop for each epoch
for epoch in range(1, n_epochs+1):
  print("Epoch {}/{}".format(epoch, n_epochs))
  #Create a loop for each step within the epoch
  for step in range(1, n_steps +1):
    #Sample a random batch
    X_batch, y_batch = random_batch(X_train_scaled, y_train)
    #Use Gradient Tape
    with tf.GradientTape() as tape:
      #Make a prediction using the model function
      y_pred = model(X_batch, training = True)
      #Compute the loss
      main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
      # Include the regularisation losses 
      loss = tf.add_n([main_loss] + model.losses)
    # Then we ask the tape to calculate the gradient of the loss with respect
    # to each trainable variable.
    gradients = tape.gradient(loss, model.trainable_variables)
    #Pass these gradients to the optimizer to perform gradient descent.
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    #-->Weight constraints should be included here <--
    
    #Update the metrics
    mean_loss(loss)
    for metric in metrics:
      metric(y_batch, y_pred)
    #Display the metrics in the status bar.
    print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
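  #Finish the status bar line, then reset the metrics at the end of each
  #epoch so that they are running means over the epoch.
  print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
  for metric in [mean_loss] + metrics:
    metric.reset_states()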



In [0]:
# Let's create our own training loop
# We want to display the epoch, iteration, mean training loss and mean accuracy over each epoch, updated at each iteration
# We want the validation loss and accuracy at the end of each epoch
# Finally, try and use different optimisers with different learning rates for the upper layers and lower layers.

# Data setup - we create a validation data set for the validation loss and accuracy metrics

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test.astype(np.float32) / 255.
print(X_train.shape)
l2_reg = keras.regularizers.l2(0.05)
#Define our model
model = keras.models.Sequential([
                                 keras.layers.Flatten(),
                                 keras.layers.Dense(100, activation = "relu",
                                                    kernel_initializer = "he_normal",
                                                    kernel_regularizer = l2_reg),
                                 keras.layers.Dense(10, 
                                                    activation = "softmax", 
                                                    kernel_regularizer=l2_reg)
])

def random_batch(X, y, batch_size = 32):
  #Create a random batch for each step
  idx = np.random.randint(len(X), size = batch_size)
  return X[idx], y[idx]

def print_status_bar(iteration, total, loss, metrics=None):
  # Define a function for the status bar that is printed after each step.
  metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result()) 
                      for m in [loss] + (metrics or [])])
  end = " " if iteration < total else "\n"
  print("\r{}/{} - ".format(iteration, total)+metrics, end = end)


# Set up some basic variables
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate = 0.01)
optimizer_low = keras.optimizers.Adagrad(learning_rate = 0.001)
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

# We'll create a function to output the required validation metrics

def print_validation_metrics(X_val, y_val):
  #Run the model over the validation dataset
  y_pred = model(X_val, training = False)
  val_loss = np.mean(loss_fn(y_val, y_pred))
  val_acc = np.mean(keras.metrics.sparse_categorical_accuracy(
      tf.constant(y_val, dtype = np.float32),y_pred))
  print("Validation metrics: loss = {}; accuracy = {}".format(val_loss, val_acc))

# And now we setup the basic training loop...

#Create a loop for each epoch
for epoch in range(1, n_epochs+1):
  print("Epoch {}/{}".format(epoch, n_epochs))
  #Create a loop for each step within the epoch
  for step in range(1, n_steps +1):
    #Sample a random batch
    X_batch, y_batch = random_batch(X_train, y_train)
    #Use Gradient Tape
    with tf.GradientTape() as tape:
      #Make a prediction using the model function
      y_pred = model(X_batch, training = True)
      #Compute the loss
      main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
      # Include the regularisation losses 
      loss = tf.add_n([main_loss] + model.losses)
    # Then we ask the tape to calculate the gradient of the loss with respect
    # to each trainable variable.
    gradients = tape.gradient(loss, model.trainable_variables)
    #Pass these gradients to the optimizer to perform gradient descent.
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    #-->Weight constraints should be included here <--
    
    #Update the metrics
    mean_loss(loss)
    for metric in metrics:
      metric(y_batch, y_pred)
    #Display the metrics in the status bar.
    print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
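  #Finish the status bar line, then reset the metrics for the next epoch
  #so that they are running means over each epoch.
  print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
  for metric in [mean_loss] + metrics:
    metric.reset_states()

  # Here we add a print step for the validation loss and accuracy
  print_validation_metrics(X_valid, y_valid)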


# Optimisers for different levels of the network

Now we want to look at how to use a different optimiser for different parts of the network, similar to what we saw in the Wide & Deep paper. 

This got a little more complicated for me, so I had a look through the solution. Using the Sequential API, we can create our network in two pieces.
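
A minimal sketch of the two-piece idea might look like the following. It is not the book's solution verbatim: the synthetic random data, the small layer sizes, and the names `lower_layers`, `upper_layers`, `optimizer_up` and `train_step` are all assumptions made to keep the example short and self-contained. The key detail is the persistent gradient tape, which allows `tape.gradient()` to be called once for each piece.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Synthetic stand-in data (an assumption, not the notebook's dataset):
# 64 instances, 8 features, 3 classes.
rng = np.random.default_rng(42)
X = rng.normal(size=(64, 8)).astype(np.float32)
y = rng.integers(0, 3, size=64)

# Build the network in two pieces, then chain them into one model.
lower_layers = keras.models.Sequential([
    keras.layers.Dense(16, activation="relu"),
])
upper_layers = keras.models.Sequential([
    keras.layers.Dense(3, activation="softmax"),
])
model = keras.models.Sequential([lower_layers, upper_layers])

# One optimiser per piece, each with its own learning rate.
optimizer_low = keras.optimizers.Adagrad(learning_rate=0.001)
optimizer_up = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.sparse_categorical_crossentropy

def train_step(X_batch, y_batch):
    # persistent=True lets us call tape.gradient() more than once.
    with tf.GradientTape(persistent=True) as tape:
        y_pred = model(X_batch, training=True)
        loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
    # Each optimiser only ever sees the variables of its own piece.
    for layers, optimizer in ((lower_layers, optimizer_low),
                              (upper_layers, optimizer_up)):
        gradients = tape.gradient(loss, layers.trainable_variables)
        optimizer.apply_gradients(zip(gradients, layers.trainable_variables))
    del tape  # release the resources held by the persistent tape
    return loss

loss = train_step(X[:32], y[:32])
print("loss after one step:", float(loss))
```

In a full training loop, `train_step` would replace the single `tape.gradient` / `apply_gradients` pair inside the inner loop, and the metric updates would stay as they are.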