<a href="https://colab.research.google.com/github/lukmanr/codenext/blob/master/Improving_Neural_Network_Training_in_TensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Improving Neural Network Training in TensorFlow

This notebook shows you how to improve training for the basic Neural Network we saw in the previous notebook. 

![two layer MNIST network](https://i.imgur.com/5LOj7mtl.png)

Let's start with the code from the basic neural network in TensorFlow that we wrote last time.  We are going to "refactor", or reorganize, the code a little.  We'll put all the code that loads the MNIST training data into one function, ```load_training_data```.

In [0]:
%tensorflow_version 2.x
import tensorflow as tf

# weights W[784, 10]   784=28*28
W = tf.Variable(tf.zeros([784, 10]))

# biases b[10]
b = tf.Variable(tf.zeros([10]))

# Load the MNIST data set. The training set and test set are split
# automatically. We pre-process the data to "normalize" the pixels (make all 
# the pixel values between 0 and 1) and reshape the data into 2-D tensors of
# shape 1 x 784.
def load_training_data():
  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

  # convert the integer pixel values to floats
  x_train = x_train / 255.0
  x_test = x_test / 255.0

  # reshape the images to be 2-D tensors of 1 x 784 pixels
  x_train = x_train.reshape([-1, 1, 784])
  x_test = x_test.reshape([-1, 1, 784])

  # one-hot encode the labels (for example 3 -> [0,0,0,1,0,0,0,0,0,0]) so they
  # can be compared with the 10 network outputs in the loss function
  y_train = tf.one_hot(y_train, 10).numpy()
  y_test = tf.one_hot(y_test, 10).numpy()

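  return (x_train, y_train), (x_test, y_test)

# load the data once, so the training loop further down can use it
(x_train, y_train), (x_test, y_test) = load_training_data()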
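
To see what the normalizing and reshaping steps do without downloading MNIST, here is a small sketch using NumPy. The array `fake_images` is made up for illustration (3 random images instead of the real 60,000), so treat it as a stand-in for the real data, not as part of the notebook's pipeline.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 random 28 x 28 images with integer
# pixel values between 0 and 255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# normalize: convert the integer pixel values to floats between 0 and 1
normalized = fake_images / 255.0

# reshape each 28 x 28 image into a 1 x 784 row (784 = 28 * 28)
reshaped = normalized.reshape([-1, 1, 784])

print(reshaped.shape)                        # (3, 1, 784)
print(reshaped.min() >= 0.0, reshaped.max() <= 1.0)
```

The `-1` in the reshape tells NumPy to work out the number of images by itself, which is why the same line works for both the training set and the test set.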

The code that defines the neural network will be the same, for now.  Remember from last time:  the network is a simple function that takes the inputs X as its argument.  It computes the input to each neuron by multiplying the inputs by the weights and adding the biases.  Then it applies the "softmax" activation function to those values, to compute the output for each of the 10 neurons.

In [0]:
# The neural network
def neural_network(X):
  Inputs = tf.matmul(X, W) + b
  Y = tf.nn.softmax(Inputs)
  return Y

The softmax function forces the outputs to add up to 1, and it makes the high outputs higher and the low outputs lower, which helps the network "make a choice" between the 10 different digits. The function returns the outputs, a tensor of 10 elements.
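
To make the softmax behavior concrete, here is a plain-Python sketch of the standard softmax formula (the exponential of each value, divided by the sum of all the exponentials). The helper `softmax` and the three `neuron_inputs` values are made up for illustration; in the notebook itself, `tf.nn.softmax` does this work for us.

```python
import math

def softmax(values):
    # subtracting the largest value first keeps exp() from overflowing;
    # it does not change the result
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# made-up inputs to three neurons
neuron_inputs = [1.0, 2.0, 4.0]
outputs = softmax(neuron_inputs)

print([round(o, 3) for o in outputs])   # [0.042, 0.114, 0.844]
print(round(sum(outputs), 6))           # 1.0
```

The largest input, 4.0, is only 4/7 (about 0.57) of the total of the raw inputs, but it gets about 0.84 of the softmax output, while the smaller inputs shrink.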


The loss function is the same sum-of-squares function we have seen before: it adds up the squared differences between the network outputs Y and the labels Y_l.

In [0]:
# The loss function
def loss(Y, Y_l):
  return tf.reduce_sum(tf.square(Y - Y_l))
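
As a quick check of the arithmetic, here is a plain-Python sketch of the same sum of squares. It assumes the label is one-hot encoded (a 1 in the position of the correct digit and 0 everywhere else), and the helper `sum_of_squares` and the digit 3 are made up for illustration. The outputs are all 0.1 because that is what softmax produces when every weight and bias is still zero.

```python
def sum_of_squares(outputs, targets):
    # add up the squared difference between each output and its target
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

# with all-zero weights and biases, softmax gives each digit the same score
outputs = [0.1] * 10

# hypothetical one-hot label for an image of the digit 3
targets = [0.0] * 10
targets[3] = 1.0

example_loss = sum_of_squares(outputs, targets)
print(round(example_loss, 4))   # 0.9
```

Nine of the outputs are off by 0.1 and one is off by 0.9, so the loss is 9 * 0.01 + 0.81 = 0.9.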

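Here is the training loop.  For each image, it computes the network output and the loss, asks TensorFlow for the gradients, and then adjusts the weights and biases in the direction that lowers the loss.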

In [0]:
num_epochs = 1
learning_rate = 0.001

counter = 0
eval_steps = 500
total_loss = 0.0

# the outer training loop:  repeat for num_epochs
for e in range(num_epochs):

    # the inner training loop: train on one image and label from the data set
    for image, label in zip(x_train, y_train):

        # use this to count how many training loops we've executed
        counter += 1

        # convert the image and label to tensors. These are constants, not
        # variables, because we only train W and b, never the inputs.
        X = tf.constant(image, dtype=tf.float32)
        Y_l = tf.constant(label, dtype=tf.float32)

        # we wrap this 'with' statement around the next two lines, to tell 
        # TensorFlow to auto-compute the gradients
        with tf.GradientTape() as tape:
            # now get the output of the neural net
            Y = neural_network(X)

            # compute the loss function 
            current_loss = loss(Y, Y_l)

        # compute the gradients of the loss function with respect to the
        # weights and biases
        dW, db = tape.gradient(current_loss, [W, b])

        # update the weights and biases. Remember we need to reverse the sign;
        # i.e. we want to add the negative of the gradient (times the
        # learning rate) to the weights and biases.
        W.assign(W - learning_rate * dW)
        b.assign(b - learning_rate * db)

        # every eval_steps, print the average loss since the last eval
        total_loss += current_loss.numpy()
        if (counter % eval_steps == 0):
            avg_loss = total_loss / eval_steps
            print(avg_loss)
            total_loss = 0.0
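
The update rule in the loop can be hard to picture with 7,840 weights, so here is a plain-Python sketch of the same idea with a single weight. Everything in it is made up for illustration: one training example (`x`, `y`), a one-weight "network" `w * x`, and a gradient worked out by hand instead of by `tf.GradientTape`.

```python
# one made-up training example: input x and the target output y
x, y = 2.0, 6.0

# a single weight, starting at zero like W in the notebook
w = 0.0
learning_rate = 0.05

losses = []
for step in range(20):
    prediction = w * x
    loss_value = (prediction - y) ** 2
    losses.append(loss_value)

    # derivative of (w * x - y)^2 with respect to w, worked out by hand
    dw = 2.0 * (prediction - y) * x

    # same update as in the training loop: subtract the gradient
    # times the learning rate
    w = w - learning_rate * dw

print(losses[0], losses[-1])
print(w)
```

The loss starts at 36.0 and shrinks at every step, and `w` ends up very close to 3, the value that makes `w * x` equal to `y`. The notebook's training loop does the same thing for every weight and bias at once.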