
# Improving Neural Network Training in TensorFlow

This notebook shows how to improve the training of the basic neural network from the previous notebook.

![two layer MNIST network](https://i.imgur.com/5LOj7mtl.png)

Let's start with the code for the basic neural network we built in TensorFlow last time. We are going to "refactor", or reorganize, the code a little. We'll put all the code that loads the MNIST training data into one function, ```load_training_data```.

In [0]:
%tensorflow_version 2.x
import tensorflow as tf

# Load the MNIST data set. The training set and test set are split
# automatically. We pre-process the data to "normalize" the pixels (make all 
# the pixel values between 0 and 1) and reshape the data into 2-D tensors of
# shape 1 x 784.
def load_training_data():
  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

  # convert the integer pixel values to floats
  x_train = x_train / 255.0
  x_test = x_test / 255.0

  # reshape the images to be 2-D tensors of 1 x 784 pixels
  x_train = x_train.reshape([-1, 1, 784])
  x_test = x_test.reshape([-1, 1, 784])

  return (x_train, y_train), (x_test, y_test)
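
The two pre-processing steps can also be checked by hand on a single image. The following is a minimal sketch in plain Python, not part of the notebook's pipeline; the `image` grid is made-up data standing in for one 28 x 28 MNIST digit.

```python
# Hypothetical stand-in for one MNIST image: a 28 x 28 grid of integer pixels.
image = [[(row * 28 + col) % 256 for col in range(28)] for row in range(28)]

# Normalize: scale each pixel from the 0..255 range into the 0..1 range.
normalized = [[pixel / 255.0 for pixel in row] for row in image]

# Flatten: join the 28 rows of 28 pixels into a single row of 784 values.
flat = [pixel for row in normalized for pixel in row]

print(len(flat))             # 784
print(min(flat), max(flat))  # 0.0 1.0
```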

The code that defines the neural network stays the same for now.  Remember from last time:  the network is a simple function that takes the inputs X as its argument.  It computes the input to each neuron by multiplying the inputs by the weights and adding the biases.  Then it applies the "softmax" activation function to those values to compute the output of each of the 10 neurons.

In [0]:
# weights W[784, 10]   784=28*28
W = tf.Variable(tf.zeros([784, 10]))

# biases b[10]
b = tf.Variable(tf.zeros([10]))

# The neural network
def neural_network(X):
  Inputs = tf.matmul(X, W) + b
  Y = tf.nn.softmax(Inputs)
  return Y
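
To see what softmax does to the neuron inputs, here is a small sketch in plain Python. The helper name `softmax` and the sample values are assumptions for illustration; the notebook itself relies on `tf.nn.softmax`.

```python
import math

def softmax(values):
    # Subtract the largest value first so math.exp cannot overflow.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    # Dividing by the total makes the outputs add up to 1.
    return [e / total for e in exps]

# Hypothetical neuron inputs for a 3-class problem.
outputs = softmax([2.0, 1.0, 0.1])
print(outputs)       # the largest input gets the largest share
print(sum(outputs))  # very close to 1.0

# With all-zero weights and biases every neuron input is 0,
# so each of the 10 outputs gets the same share.
print(softmax([0.0] * 10)[0])  # 0.1
```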

The loss function is the same sum of squares function we have seen before.

In [0]:
# The loss function
def loss(Y, Y_l):
  return tf.reduce_sum(tf.square(Y - Y_l))
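
The `loss` function above subtracts the label value directly from the network output. A common variant, which is not what the code in this notebook does, first turns the label into a "one-hot" list with a 1 at the correct digit and 0 elsewhere. A minimal sketch of that variant, assuming integer labels and the hypothetical helper names `one_hot` and `squared_error`:

```python
def one_hot(label, num_classes=10):
    # 1.0 at the position of the correct class, 0.0 everywhere else.
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def squared_error(outputs, label):
    # Sum of squared differences between the outputs and the one-hot target.
    target = one_hot(label, len(outputs))
    return sum((y - t) ** 2 for y, t in zip(outputs, target))

# An undecided network: every digit gets probability 0.1.
uniform = [0.1] * 10
print(one_hot(3))
print(squared_error(uniform, 3))     # about 0.9

# A perfect prediction has zero loss.
print(squared_error(one_hot(3), 3))  # 0.0
```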

Here is the training loop.  We'll put that into a function as well, to make it easier to call repeatedly.  The function takes the hyperparameters ```num_epochs```, ```learning_rate```, and ```eval_steps``` as arguments, each with a default value.  We've also modified it to plot the average loss every ```eval_steps``` training steps if the ```plot``` flag is set to True; otherwise it prints the average loss.

In [0]:
import matplotlib.pyplot as plt

def nn_training_loop(num_epochs = 1, 
                     learning_rate = 0.001, 
                     eval_steps = 500,
                     plot = False):

  (x_train, y_train), (x_test, y_test) = load_training_data()

  counter = 0
  total_loss = 0.0
  avg_losses = []

  # the outer training loop:  repeat for num_epochs
  for e in range(num_epochs):

      # the inner training loop: train on one image and label from the data set
      for image, label in zip(x_train, y_train):

          # use this to count how many training loops we've executed
          counter += 1

          # convert the image and label to float tensors (they are plain
          # inputs, so they do not need to be trainable Variables)
          X = tf.cast(image, tf.float32)
          Y_l = tf.cast(label, tf.float32)

          # we wrap this 'with' statement around the next two lines, to tell 
          # TensorFlow to auto-compute the gradients
          with tf.GradientTape() as tape:
              # get the output of the neural net
              Y = neural_network(X)

              # compute the loss function 
              current_loss = loss(Y, Y_l)

          # compute the gradients of the loss function with respect to the
          # weights and biases
          dW, db = tape.gradient(current_loss, [W, b])

          # update the weights and biases. Remember we need to reverse the sign; 
          # i.e. we want to add the negative of the gradient (times the 
          # learning rate) to the weights and biases.
          W.assign(W - learning_rate * dW)
          b.assign(b - learning_rate * db)

          # every eval_steps, print the average loss since the last eval
          total_loss += current_loss.numpy()
          if (counter % eval_steps == 0):
              avg_loss = total_loss / eval_steps
              avg_losses.append(avg_loss)
              if plot:
                plt.plot(avg_losses)
                plt.show()
              else:
                print("avg_loss = ", avg_loss)
              total_loss = 0.0
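
The update rule used in the loop, subtracting the gradient times the learning rate, can be tried on a problem small enough to check by hand. The sketch below is an illustration only: it assumes a made-up one-parameter loss, (w - 3)^2, and a hypothetical helper named `gradient_descent`.

```python
def gradient_descent(start, learning_rate, steps):
    w = start
    for _ in range(steps):
        # The derivative of the loss (w - 3)^2 with respect to w.
        grad = 2.0 * (w - 3.0)
        # Step against the gradient, scaled by the learning rate.
        w = w - learning_rate * grad
    return w

# One step from w = 0: the gradient is -6, so w moves up by 0.1 * 6.
print(gradient_descent(start=0.0, learning_rate=0.1, steps=1))

# After many steps w settles close to 3, where the loss is smallest.
print(gradient_descent(start=0.0, learning_rate=0.1, steps=50))
```

A smaller learning rate takes smaller steps, so it needs more of them to get equally close to the minimum.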

Now call the function to execute the loop.

In [0]:
nn_training_loop()

Note again that the loss does not seem to go down.  We need to make some improvements to this neural network before it can learn to classify the MNIST digits!

## Improvements to training
We are going to try these things to improve the training of this neural network:

1.   Use **Stochastic Gradient Descent**:  train on batches of images instead of one image at a time.
2.   Use a **better loss function** (the "cross entropy" loss).
3.   Add some **"hidden layers"**.



First let's create a new function that splits the training data into batches.


In [0]:
def batch_training_data(batch_size=100):
  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

  # flatten each image into a row of 784 pixels, and scale the pixel values
  # to floats between 0 and 1
  x_train = x_train.reshape([-1, 784])
  x_train = tf.convert_to_tensor(x_train, dtype=tf.float32) / 255.
  x_test = x_test.reshape([-1, 784])
  x_test = tf.convert_to_tensor(x_test, dtype=tf.float32) / 255.

  # convert the labels to floats
  y_train = tf.convert_to_tensor(y_train, dtype=tf.float32)
  y_test = tf.convert_to_tensor(y_test, dtype=tf.float32)

  # create a TensorFlow dataset for train and test sets
  train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
  test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

  # shuffle the training set (using a buffer of 10,000 examples) and group
  # it into batches of batch_size examples
  train_dataset = train_dataset.shuffle(10000).batch(batch_size)

  return train_dataset, test_dataset
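
The idea behind shuffling and batching can be sketched without TensorFlow. The helper `make_batches` below is hypothetical and only illustrates the concept; unlike `Dataset.shuffle(10000)`, which shuffles through a fixed-size buffer, it shuffles the whole list at once.

```python
import random

def make_batches(examples, batch_size, seed=0):
    # Copy the examples and put them in a random order.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Cut the shuffled list into consecutive chunks of batch_size items.
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

batches = make_batches(range(10), batch_size=4)
print([len(batch) for batch in batches])  # [4, 4, 2]
```

The last batch is smaller when the number of examples is not a multiple of the batch size. MNIST has 60,000 training images, so a batch size of 100 gives 600 full batches.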

In [24]:
batch_training_data()

(<BatchDataset shapes: ((None, 784), (None,)), types: (tf.float32, tf.float32)>,
 <TensorSliceDataset shapes: ((784,), ()), types: (tf.float32, tf.float32)>)

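To measure how well the trained network does, we use this function. It runs the network on the test set and reports the percentage of images that are classified correctly.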

In [0]:
def test_results(test_data):

  test_metric = tf.keras.metrics.SparseCategoricalAccuracy()
  test_data = test_data.batch(1000)
  for test_images, test_labels in test_data:
    predictions = neural_network(test_images)
    test_metric.update_state(test_labels, predictions)

  print("Accuracy = ", test_metric.result().numpy() * 100)
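
The accuracy metric counts how often the neuron with the largest output matches the label. Here is a minimal sketch of that calculation in plain Python; the `accuracy` helper and the sample predictions are made up for illustration, and the notebook itself uses `tf.keras.metrics.SparseCategoricalAccuracy`.

```python
def accuracy(predictions, labels):
    correct = 0
    for probs, label in zip(predictions, labels):
        # The predicted class is the position of the largest output.
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Hypothetical outputs for four images in a 3-class problem.
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4],
               [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 1]

print(accuracy(predictions, labels) * 100)  # 75.0
```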

In [0]:
import matplotlib.pyplot as plt

def nn_training_loop_with_batches(num_epochs = 1, 
                                  learning_rate = 0.001, 
                                  eval_steps = 500, 
                                  batch_size = 100,
                                  plot = False):

  train_data, test_data = batch_training_data(batch_size)

  counter = 0
  total_loss = 0.0
  avg_losses = []

  # the outer training loop:  repeat for num_epochs
  for e in range(num_epochs):

      # the inner training loop: train on one batch of images and labels
      for X, Y_l in train_data:

          # use this to count how many training steps we've executed
          counter += 1

          # we wrap this 'with' statement around the next two lines, to tell 
          # TensorFlow to auto-compute the gradients
          with tf.GradientTape() as tape:
              # get the output of the neural net
              Y = neural_network(X)

              # compute the loss function 
              current_loss = loss(Y, Y_l)

          # compute the gradients of the loss function with respect to the
          # weights and biases
          dW, db = tape.gradient(current_loss, [W, b])

          # update the weights and biases. Remember we need to reverse the sign; 
          # i.e. we want to add the negative of the gradient (times the 
          # learning rate) to the weights and biases.
          W.assign(W - learning_rate * dW)
          b.assign(b - learning_rate * db)

          # every eval_steps, print or plot the average loss since the last eval
          # (reduce_mean also handles a loss that returns one value per image)
          total_loss += tf.reduce_mean(current_loss).numpy()
          if counter % eval_steps == 0:
              avg_loss = total_loss / eval_steps
              avg_losses.append(avg_loss)
              if plot:
                plt.plot(avg_losses)
                plt.show()
              else:
                print("avg_loss = ", avg_loss)
              print("counter = ", counter)
              total_loss = 0.0

  # after training, measure the accuracy on the test set
  test_results(test_data)

In [29]:
nn_training_loop_with_batches()

InvalidArgumentError: ignored

With the sum-of-squares loss, this call fails.  For a batch of 100 images, X has shape (100, 784) and the network output Y has shape (100, 10), but the labels Y_l have shape (100,).  The shapes of Y and Y_l do not match, so the loss function cannot subtract one from the other.

In [0]:
_, test_data = batch_training_data()
test_results(test_data)

### A better loss function

For a classification problem, a better loss function than "squared error" is "cross entropy", which measures the difference between two probability distributions.  In this case, the outputs of the 10 neurons form one probability distribution, and the label represents another: it puts all of its probability on the correct digit.

In [0]:
# The loss function
def loss(Y, Y_l):
  return tf.keras.losses.sparse_categorical_crossentropy(Y_l, Y)
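
For a single image, the cross entropy against an integer label reduces to the negative logarithm of the probability the network gave to the correct class. A hand-computed sketch follows, with a hypothetical helper name and made-up probabilities. It leaves out the batching and numerical safeguards a library implementation would add.

```python
import math

def sparse_cross_entropy(probs, label):
    # Only the probability assigned to the correct class matters.
    return -math.log(probs[label])

# The correct class is 1 in both cases.
confident_right = [0.05, 0.9, 0.05]
confident_wrong = [0.9, 0.05, 0.05]

print(sparse_cross_entropy(confident_right, 1))  # about 0.105
print(sparse_cross_entropy(confident_wrong, 1))  # about 3.0
```

The loss is small when the network is confident and right, and grows quickly when it is confident and wrong.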

In [0]:
nn_training_loop_with_batches()

### Hidden layers

The last improvement is to add two "hidden layers" of 20 neurons each between the inputs and the 10 output neurons.  The hidden layers use the "ReLU" activation function, and the final layer still uses softmax.

In [0]:
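# The weights start as small random values instead of zeros. With all-zero
# weights every hidden neuron outputs zero, the weight gradients are zero,
# and only the final biases would ever change during training.

# weights/biases for first layer
W1 = tf.Variable(tf.random.normal([784, 20], stddev=0.1))
b1 = tf.Variable(tf.zeros([20]))

# weights/biases for second layer
W2 = tf.Variable(tf.random.normal([20, 20], stddev=0.1))
b2 = tf.Variable(tf.zeros([20]))

# weights/biases for final layer
W3 = tf.Variable(tf.random.normal([20, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))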

# The neural network
def neural_network(X):
  input1 = tf.matmul(X, W1) + b1
  output1 = tf.nn.relu(input1)

  input2 = tf.matmul(output1, W2) + b2
  output2 = tf.nn.relu(input2)

  input3 = tf.matmul(output2, W3) + b3
  Y = tf.nn.softmax(input3)
  return Y
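
A single hidden layer can be worked through by hand to see what ReLU does, and why the starting weights matter. This is a minimal sketch in plain Python with a hypothetical `dense_relu` helper and made-up numbers (3 inputs, 4 neurons); it is not the notebook's network.

```python
def dense_relu(inputs, weights, biases):
    # weights[i][j] connects input i to neuron j.
    outputs = []
    for j in range(len(biases)):
        total = biases[j] + sum(x * weights[i][j] for i, x in enumerate(inputs))
        # ReLU keeps positive values and replaces negative values with 0.
        outputs.append(max(0.0, total))
    return outputs

inputs = [0.5, 1.0, 0.25]
biases = [0.0, 0.0, 0.0, 0.0]

# With all-zero weights every neuron outputs 0, whatever the input is.
zero_weights = [[0.0] * 4 for _ in inputs]
print(dense_relu(inputs, zero_weights, biases))  # [0.0, 0.0, 0.0, 0.0]

# With small non-zero weights the neurons respond differently.
# The third neuron's input is negative, so ReLU sets its output to 0.
small_weights = [[0.1, -0.2, 0.3, 0.05],
                 [0.2, 0.1, -0.3, 0.0],
                 [-0.4, 0.4, 0.2, 0.1]]
print(dense_relu(inputs, small_weights, biases))
```

Because a layer of zero weights passes nothing forward, small non-zero starting values are the usual choice for hidden layers.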

In [0]:
import matplotlib.pyplot as plt

def multi_layer_nn_training_loop(num_epochs = 1, 
                                 learning_rate = 0.001, 
                                 eval_steps = 500, 
                                 batch_size = 100,
                                 plot = False):

  train_data, test_data = batch_training_data(batch_size)

  counter = 0
  total_loss = 0.0
  avg_losses = []

  # the outer training loop:  repeat for num_epochs
  for e in range(num_epochs):

      # the inner training loop: train on one batch of images and labels
      for X, Y_l in train_data:

          # use this to count how many training loops we've executed
          counter += 1

          # we wrap this 'with' statement around the next two lines, to tell 
          # TensorFlow to auto-compute the gradients
          with tf.GradientTape() as tape:
              # get the output of the neural net
              Y = neural_network(X)

              # compute the loss function 
              current_loss = loss(Y, Y_l)

          # compute the gradients of the loss function with respect to the
          # weights and biases
          dW1, db1, dW2, db2, dW3, db3 = tape.gradient(current_loss, [W1, b1, W2, b2, W3, b3])

          # update the weights and biases. Remember we need to reverse the sign; 
          # i.e. we want to add the negative of the gradient (times the 
          # learning rate) to the weights and biases.
          W1.assign(W1 - learning_rate * dW1)
          b1.assign(b1 - learning_rate * db1)

          W2.assign(W2 - learning_rate * dW2)
          b2.assign(b2 - learning_rate * db2)

          W3.assign(W3 - learning_rate * dW3)
          b3.assign(b3 - learning_rate * db3)

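          # every eval_steps, print or plot the average loss since the last eval
          # (the cross entropy loss returns one value per image, so average
          # over the batch to get a single number)
          total_loss += tf.reduce_mean(current_loss).numpy()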
          if (counter % eval_steps == 0):
              avg_loss = total_loss / eval_steps
              avg_losses.append(avg_loss)
              if (plot):
                plt.plot(avg_losses)
                plt.show()
              else:
                print("avg_loss = ", avg_loss)
              print("counter = ", counter)
              total_loss = 0.0

  test_results(test_data)

In [0]:
multi_layer_nn_training_loop()