<a href="https://colab.research.google.com/github/jackjameswillis/BSc-Final-Year-Project/blob/main/Cascade_Correlation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We begin by importing libraries.

In [1]:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np

Unlike AdaNet, Cascade Correlation has no widely used, publicly available library implementation. The bulk of this notebook is therefore the construction of the algorithm itself.

We define a seed for shuffling the dataset.

In [2]:
seed = 201338 

We download the MNIST dataset through Keras.

In [3]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data(path='mnist.npz')

We now shuffle the dataset once. Ideally we would shuffle more regularly, for example every time we produce an input function, as in our AdaNet implementation. For Cascade Correlation, however, the order of the data items must stay consistent across every input function we build, which is easier to guarantee with a single shuffle. The generator is re-created from the same seed before each call, with the aim of applying the same permutation to the features and the labels.

In [4]:
rng = np.random.default_rng(seed=seed)

rng.shuffle(x_train)

rng = np.random.default_rng(seed=seed)

rng.shuffle(y_train)
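
The cell above depends on two identically seeded generators producing the same permutation for both arrays. As a minimal sketch of an alternative, one permutation can be drawn and used to index both arrays, which keeps each image paired with its label by construction. The names `toy_x`, `toy_y` and `perm` are illustrative stand-ins, not part of the notebook, and the resulting order is not claimed to match the order produced by the cell above.

```python
import numpy as np

seed = 201338  # same seed value as the notebook

# Toy stand-ins for x_train and y_train (hypothetical data, not MNIST).
toy_x = np.arange(12).reshape(6, 2)  # six "images" of two pixels each
toy_y = toy_x[:, 0] // 2             # label i belongs to row i

# Draw a single permutation and index both arrays with it.
perm = np.random.default_rng(seed=seed).permutation(len(toy_x))
shuffled_x = toy_x[perm]
shuffled_y = toy_y[perm]

# Every shuffled row still carries its original label.
pairs_aligned = bool(np.all(shuffled_x[:, 0] // 2 == shuffled_y))
print(pairs_aligned)
```

Indexing returns shuffled copies rather than shuffling in place, so the original arrays are left untouched.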

Cascade Correlation changes its input data as the network grows: each new hidden unit adds a feature, and hidden units are trained on different labels from the top layer. Our input function generator therefore needs to change both the features and the labels depending on how the input function will be used.

In [5]:
# The key for the input features. We have one set of inputs,
# these being the flattened images, so we provide one key.
feat_key = "x"

# We use this generator function to produce the dataset later.
def generator(X, Y):

  def _gen():

    for x, y in zip(X, Y):

      yield tf.reshape(x, [len(x)]), y

  return _gen

# We map this function onto elements of the dataset 
# to make it compatible with the expected format of 
# input data.
def featurise(x, y):
  
  features = {feat_key: x}

  return features, y

# This input function returns another function that returns a tensorflow dataset
# which is used to feed data into models. We have parameters on this function
# generator that allow us to indicate and manipulate features and labels.
def input_fn(partition, batch_size, repeat=False, alter_x=False, x_alterations=False, alter_y=False, y_alterations=False):

  if partition == 'train':

    x, y = (x_train, y_train)

  elif partition == 'debug':

    x, y = (x_train[:1], y_train[:1])
    
  else:

    x, y = (x_test, y_test)

  # In this section we check to see if data needs to be altered. In the case 
  # where we want to alter the labels, we simply assign the given alterations
  # to be the labels. This facilitates the alternating training between
  # regression in the hidden layers and classification in the output layer.
  if alter_y: 

    y = y_alterations

  # In the case where we want to alter the features, we append the given
  # alterations onto each pre-existing feature set. This facilitates the
  # increasing input size for each neuron added to the model. Aside from
  # alterations, this code also scales pixel values to [0, 1] and flattens
  # each image.
  _x = []

  # If we want to alter features and have alterations:  
  if alter_x and len(x_alterations) != 0:

    for i in range(len(x)):

      _x.append(x[i]/255)

      _x[i] = np.concatenate((_x[i].flatten(),x_alterations[i]))
    
    x = np.asarray(_x)
      
  else:

    for i in range(len(x)):

      _x.append(x[i]/255)

      _x[i] = _x[i].flatten()
      
    x = np.asarray(_x)

  # This is the function returned, which generates the 
  # dataset from the preprocessed data.
  def _input_fn():
    
    # Make sure that our generator is producing a dataset of the correct data
    # types depending on alterations made to the labels.

    y_type = tf.int32 if not alter_y else tf.float32

    # The feature shape is written as an explicit one-element tuple.
    dataset = tf.data.Dataset.from_generator(generator(x, y), (tf.float32, y_type), ((len(x[0]),), ()))

    # If we are training, we want training to continue
    # after we reach the final batch of the dataset. We
    # achieve this by 'repeating' the dataset.
    if repeat: dataset = dataset.repeat()

    # Apply the feature function to the data and batch according
    # to the batch_size parameter.
    dataset = dataset.map(featurise).batch(batch_size)

    # Return the dataset ready to train on.
    return dataset

  return _input_fn
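
The feature preparation inside `input_fn` can also be pictured as a single array operation. The following is a minimal NumPy sketch, assuming the hidden-unit outputs arrive as one row per image; `build_features`, `toy_images` and `toy_hidden` are hypothetical names used only for illustration.

```python
import numpy as np

def build_features(images, hidden_outputs=None):
    """Scale pixels to [0, 1], flatten each image, then append hidden-unit outputs."""
    flat = np.asarray(images, dtype=np.float32).reshape(len(images), -1) / 255.0
    if hidden_outputs is None or len(hidden_outputs) == 0:
        return flat
    extra = np.asarray(hidden_outputs, dtype=np.float32).reshape(len(images), -1)
    # Each hidden unit contributes one extra column per example.
    return np.concatenate([flat, extra], axis=1)

toy_images = np.full((3, 28, 28), 255, dtype=np.uint8)  # three all-white images
toy_hidden = [[0.5, -1.0], [0.1, 0.2], [0.0, 0.3]]      # outputs of two hidden units

plain = build_features(toy_images)
widened = build_features(toy_images, toy_hidden)
print(plain.shape, widened.shape)  # (3, 784) (3, 786)
```

With two hidden units the feature width grows from 784 to 786, which matches the `input_dimension + len(self.layers)` shape used for the feature columns later in the notebook.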

The implemented loss function needs an error for each output on each input. We obtain these by converting each output label into a one-hot binary encoding.

In [6]:
# This function is useful for calculating error in classification by converting
# class labels to a one-hot binary representation.
def binary_representation(Y):

  return np.asarray([[1 if i==y else 0 for i in range(10)] for y in Y])
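
The same one-hot encoding can be produced without a Python loop by indexing an identity matrix. This is a small sketch under the assumption that labels are integers in the range 0 to 9; `one_hot` is an illustrative name and is not used elsewhere in the notebook.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label with a single 1 in the column of that label."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.int64)[labels]

encoded = one_hot([3, 0, 9])
print(encoded[0])  # [0 0 0 1 0 0 0 0 0 0]
```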

The Cascade Correlation algorithm comprises a number of key steps:
* Train a top model that is connected to every input and every hidden unit.
* Get the errors of the top model.
* Train a new hidden unit, whose inputs are the original input features and the outputs of all previous hidden units, to maximise the correlation between its output and the error of the top model.
* Repeat.

Our code needs to facilitate this. Our input function generator allows us to provide the different types of data required for each step in training. We now need to construct the architecture and feed the data into it. 

We note here that the implemented loss function (self.loss_fn) is not used in training. This is due to a simplifying idea that can be found in the presentations of the author of the original paper introducing this algorithm. We will go into detail about this in the report.
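
For reference, the quantity a candidate hidden unit maximises in the original description of the algorithm is the sum, over outputs, of the magnitude of the covariance between the unit's output and the residual error at that output. The following is a minimal NumPy sketch of that score on made-up numbers. `correlation_score` and the toy arrays are illustrative assumptions, and this is not the procedure the notebook actually trains with.

```python
import numpy as np

def correlation_score(unit_outputs, errors):
    """S = sum_o | sum_p (V_p - mean(V)) * (E_po - mean(E_o)) |."""
    v = np.asarray(unit_outputs, dtype=np.float64)  # shape (patterns,)
    e = np.asarray(errors, dtype=np.float64)        # shape (patterns, outputs)
    v_centred = v - v.mean()
    e_centred = e - e.mean(axis=0)
    per_output = v_centred @ e_centred              # shape (outputs,)
    return float(np.abs(per_output).sum())

toy_errors = np.array([[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0], [0.0, 0.0]])
tracking_unit = np.array([1.0, 0.0, -1.0, 0.0])  # follows the first error column
constant_unit = np.array([0.5, 0.5, 0.5, 0.5])   # carries no information

score_tracking = correlation_score(tracking_unit, toy_errors)
score_constant = correlation_score(constant_unit, toy_errors)
print(score_tracking, score_constant)  # 4.0 0.0
```

A unit whose output tracks the errors scores highly, while a constant unit scores zero, which is why the score can serve as a training signal for new units.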

In [17]:
class CassCor:

  # Each hidden unit effectively corresponds to a new hidden layer.
  layers = []

  # The top layer is treated independently of the rest since it has a unique
  # function.
  top_layer = None

  # These variables are used to construct estimators.
  top_head = None

  top_feature_columns = None

  input_dimension = None

  # When we instantiate this class, we want to initially produce only the top layer.
  def __init__(self, top_head, input_dimension):

    self.top_head = top_head

    self.top_feature_columns = [tf.feature_column.numeric_column('x', shape=(input_dimension))]

    self.top_layer = tf.estimator.DNNEstimator(
        
        head=self.top_head,

        hidden_units=[],

        feature_columns=self.top_feature_columns,

        activation_fn=tf.nn.sigmoid,

        optimizer='SGD')

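    self.input_dimension = input_dimension

    # Give each instance its own list of hidden units; the class-level list
    # above would otherwise be shared between every CassCor instance.
    self.layers = []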

  # The loss function used in training the hidden layer units, as described above.
  def loss_fn(self, labels, logits):

    residual_value = logits - tf.math.reduce_mean(logits)

    return -tf.abs(residual_value*labels)

  # The feedforward of this architecture requires us to produce a new dataset
  # for each layer, making the process of producing output from this architecture
  # significantly more involved.
  def feedforward_layers(self, partition, batch_size):

    # The list of features to be added through the x_alterations param of the input
    # function generator.
    additions = []

    # A placeholder for the output of the current layer. It is never
    # assigned below, so the first returned value is always None.
    y = None

    # For each hidden unit.
    for l in self.layers:

      # Generate the input function.
      current_input_fn = input_fn(
          
          partition=partition,

          batch_size=batch_size,

          alter_x=True,

          x_alterations=additions)

      # Get its predictions
      logits = list(l.predict(input_fn=current_input_fn))

      # If there are already hidden unit predictions, add the new logits to the list.
      if additions:
        
        additions = [additions[i] + list(logits[i]['predictions']) for i in range(len(logits))]

      # If there are no hidden unit logits yet, create the appropriate structure
      # for the new logits.
      else:

        additions = [list(logits[i]['predictions']) for i in range(len(logits))]

    return y, additions
  
  # Hidden units are trained in accordance with the previously described loss function,
  # which requires the generation of error data.
  def errors(self, partition, batch_size):

    # Generate the required feature additions for the top layer.
    _, additions = self.feedforward_layers(
        
        partition=partition,
        
        batch_size=batch_size)
    
    # Generate an input function used to get top layer output.
    predict_input_fn = input_fn(
        
        partition=partition,

        batch_size=batch_size,

        alter_x=True,

        x_alterations=additions)
    
    # Generate predictions of the top layer on the generated input function.
    predictions = self.top_layer.predict(
        
        input_fn=predict_input_fn)
    
    # Get the classification for each input.
    predictions = [prediction['class_ids'] for prediction in predictions]
    
    # Get the appropriate labels for calculating error.
    if partition == 'train':

      Y = y_train

    else:

      Y = y_test

    # Produce the binary representation of these labels.
    binary_Y = binary_representation(Y)
    
    # A store for errors calculated through this process.
    error = []

    # For each prediction, each label and each binary label.
    for prediction, y, b_y in zip(predictions, Y, binary_Y):
      
      # If the prediction is correct.
      if b_y[prediction[0]]:
        
        # There is no error on any output.
        error.append([0 for i in range(len(b_y))])

      # Otherwise the error of the predicted label is 1 and the correct label is
      # -1.
      else:

        temp = [0 for i in range(len(b_y))]

        temp[prediction[0]], temp[y] = 1, -1 

        error.append(temp)
    
    error = np.asarray(error)

    # Use the error to calculate the mean error.
    mean_error = np.sum(error, axis=0)/len(error)

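    # For each input, return the sum over outputs of the residual errors.
    # Note: each row of error holds either all zeros or one +1 and one -1, so
    # every row (and mean_error) sums to zero and these sums come out as ~0.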
    return tf.math.reduce_sum(error - mean_error, axis=1)
  
  # This function simply re-trains the top layer, which involves producing a 
  # new classifier. The top layer is then evaluated on a testing input function.
  def train_test_top(self, train_input_fn, test_input_fn, steps):

    self.top_feature_columns = [tf.feature_column.numeric_column(key=feat_key, shape=[self.input_dimension + len(self.layers)])]

    self.top_layer = tf.estimator.DNNEstimator(
        
        head=self.top_head,

        hidden_units=[],

        feature_columns=self.top_feature_columns,

        activation_fn=tf.nn.relu,

        optimizer='SGD')

    self.top_layer.train(
        
        input_fn=train_input_fn,

        max_steps=steps)

    results = self.top_layer.evaluate(
        
        input_fn=test_input_fn,
        
        steps=None)
    
    print("Accuracy:", results["accuracy"])

    print("Loss:", results["average_loss"])

  # This function trains a new layer. It begins by defining a regressor to the
  # required specification, appends the regressor onto the layers of the model,
  # and then trains it on the provided training input function.
  def train_layer(self, train_fn, loss_fn, steps):

    new_layer_feature_columns = self.top_feature_columns

    new_layer_head = tf.estimator.RegressionHead(label_dimension=1, loss_fn=loss_fn)

    new_layer = tf.estimator.DNNEstimator(
        
        head=new_layer_head,

        hidden_units=[],

        feature_columns=new_layer_feature_columns,

        activation_fn=tf.nn.relu,
        
        optimizer=tf.keras.optimizers.SGD)

    self.layers.append(new_layer)

    early_stopping = tf.estimator.experimental.stop_if_no_decrease_hook(
        
        self.layers[-1],

        metric_name='loss',

        max_steps_without_decrease=500,

        min_steps=100)
    
    self.layers[-1].train(
        
        input_fn=train_fn,

        max_steps=steps,
        
        hooks=[early_stopping])
  
  # This function collects the other training functions together into one
  # coherent procedure. Until a total number of training steps has been met,
  # the top layer of the model is trained, followed by a new layer being trained.
  def train_test(self, batch_size, total_steps, steps_per_unit, hidden_loss_fn):

    # Each iteration trains the top layer and then one new hidden unit, so it
    # uses two training periods of steps_per_unit steps. We therefore advance
    # the step counter by two training periods at a time.
    for i in range(0,total_steps,2*steps_per_unit):

      print('----------------------------------------------------------------')

      print('Generate Feature Additions')

      print('----------------------------------------------------------------')

      # Generate additional features on training data.
      _, train_additions = self.feedforward_layers(
          
          partition='train',
          
          batch_size=batch_size)
      
      # Generate a new training input function based on those features.
      train_input_fn = input_fn(
          
          partition='train', 
          
          batch_size=batch_size, 
          
          repeat=True,

          alter_x=True, 
          
          x_alterations=train_additions)
      
      # Generate additional features on testing data.
      _, test_additions = self.feedforward_layers(
          
          partition='test',
          
          batch_size=batch_size)

      # Generate a new testing input function based on those features.
      test_input_fn = input_fn(
          
          partition='test', 
          
          batch_size=batch_size, 

          alter_x=True,
          
          x_alterations=test_additions)
      
      print('----------------------------------------------------------------')

      print('Top Layer Training')

      print('----------------------------------------------------------------')
      
      # Train and test the top layer.
      self.train_test_top(
          
          train_input_fn=train_input_fn, 
          
          test_input_fn=test_input_fn, 
          
          steps=steps_per_unit)
      
      print('----------------------------------------------------------------')

      print('Hidden Layer Training')

      print('----------------------------------------------------------------')
      
      # Calculate the residual errors used as targets for training the new layer.
      errors = self.errors(
          
          partition='train',
          
          batch_size=batch_size)
      
      # Produce a new training input function for the new layer based on the
      # previously generated training additions and the generated errors.
      hidden_input_fn = input_fn(
          
          partition='train',

          batch_size=batch_size,

          repeat=True,

          alter_x=True,

          x_alterations=train_additions,

          alter_y=True,
          
          y_alterations=errors)
      
      # Generate and train the new layer.
      self.train_layer(
          
          train_fn=hidden_input_fn,

          loss_fn=hidden_loss_fn,

          steps=steps_per_unit)

    # We finish training by retraining the top layer in the same way as previously.
    # Generate additional features on training data.
    _, train_additions = self.feedforward_layers(
          
        partition='train',
          
        batch_size=batch_size)
      
    # Generate a new training input function based on those features.
    train_input_fn = input_fn(
          
        partition='train', 
          
        batch_size=batch_size, 
          
        repeat=True,

        alter_x=True, 
          
        x_alterations=train_additions)
      
    # Generate additional features on testing data.
    _, test_additions = self.feedforward_layers(
          
        partition='test',
          
        batch_size=batch_size)

    # Generate a new testing input function based on those features.
    test_input_fn = input_fn(
          
        partition='test', 
          
        batch_size=batch_size, 

        alter_x=True,
          
        x_alterations=test_additions)
      
    print('----------------------------------------------------------------')

    print('Final Top Training')

    print('----------------------------------------------------------------')
      
    # Train and test the top layer.
    self.train_test_top(
          
        train_input_fn=train_input_fn, 
          
        test_input_fn=test_input_fn, 
          
        steps=steps_per_unit)
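
The target computation in `errors` can be checked in isolation. Below is a small NumPy sketch of the same construction on hypothetical toy predictions: a misclassified input gets +1 at the predicted class and -1 at the true class, and the per-output mean is then subtracted. `residual_errors` is an illustrative name and not a method of the class.

```python
import numpy as np

def residual_errors(predicted, actual, num_classes=10):
    """Per-output errors (+1 predicted class, -1 true class) minus their per-output mean."""
    predicted = np.asarray(predicted, dtype=np.int64)
    actual = np.asarray(actual, dtype=np.int64)
    error = np.zeros((len(actual), num_classes))
    wrong = predicted != actual
    rows = np.arange(len(actual))[wrong]
    error[rows, predicted[wrong]] = 1.0   # output that fired incorrectly
    error[rows, actual[wrong]] = -1.0     # output that should have fired
    return error - error.mean(axis=0)

# Four toy inputs over three classes; only the third is misclassified.
residual = residual_errors(predicted=[1, 2, 2, 0], actual=[1, 2, 0, 0], num_classes=3)
targets = residual.sum(axis=1)  # the per-input sum returned by errors
print(residual[2])
print(targets)
```

On this toy input the per-output residual matrix is informative, but the per-input sums are all zero, because the +1 and -1 in each misclassified row cancel and the column means cancel in the same way. Inspecting `residual` rather than `targets` may therefore be the more useful check when debugging the hidden-unit labels.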

Now we instantiate the class and train our model.

In [18]:
top_head = tf.estimator.MultiClassHead(n_classes=10)

In [19]:
model = CassCor(
    
    top_head=top_head,
    
    input_dimension=28*28)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpsgdakc7q', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [20]:
model.train_test(
    
    batch_size=64,

    total_steps=5000*20,

    steps_per_unit=5000,
    
    hidden_loss_fn=None)
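
The arguments above determine how many hidden units the loop in `train_test` adds. As a quick sketch of that arithmetic (`cascade_schedule` is a hypothetical helper, not part of the class), the loop range can be counted directly:

```python
def cascade_schedule(total_steps, steps_per_unit):
    """Count the training phases implied by range(0, total_steps, 2 * steps_per_unit)."""
    iterations = len(range(0, total_steps, 2 * steps_per_unit))
    return {
        # One new hidden unit is trained per loop iteration.
        "hidden_units": iterations,
        # The top layer is trained once per iteration, plus once at the end.
        "top_layer_trainings": iterations + 1,
        # Upper bound only: hidden-unit training may stop early.
        "max_training_steps": (2 * iterations + 1) * steps_per_unit,
    }

schedule = cascade_schedule(total_steps=5000 * 20, steps_per_unit=5000)
print(schedule)
```

With `total_steps=5000*20` and `steps_per_unit=5000`, this gives 10 hidden units and 11 top-layer trainings, for at most 105000 training steps.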

----------------------------------------------------------------
Generate Feature Additions
----------------------------------------------------------------
----------------------------------------------------------------
Top Layer Training
----------------------------------------------------------------
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpqflk2rdr', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_se