# Gradient Training - Basic ML with TensorFlow

The notebook below shows a basic example of neural network training using TensorFlow. In this exercise, you'll work on a model that predicts digits from images. To do so, we will use the MNIST database.

This notebook is based on [this repository example](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/2_BasicModels/logistic_regression.py).

---------------------

## Step 1. Prepare your environment

To run this notebook you need to have:
 - TensorFlow 1.x (either CPU or GPU version; the code uses `tf.Session` and placeholders),
 - NumPy,
 - Matplotlib.

**Google Colaboratory** has already installed these packages for us, so we don't have to run any commands or install anything manually.

In [0]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

## Step 2. Prepare dataset

In this step we will prepare our MNIST dataset for training.

**Don't worry!** The image preprocessing part will be described later in our course, when we talk more about Computer Vision.

The preprocessing below makes sure that:
  - images are zero-centered and normalized,
  - images are flattened to a vector of 784 floats (28*28),
  - labels are in the one-hot format.

In [0]:
# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Compute the mean and standard deviation over all training pixels (two scalars)
mean_image = np.mean(x_train)
std_image = np.std(x_train)

# Normalize images by subtracting the mean and dividing by the standard deviation
# (the test set reuses the statistics computed on the training set)
x_train = (x_train - mean_image) / std_image
x_test = (x_test - mean_image) / std_image

# Change shape of images from 2D image 28x28 to 1D vector of 784 values
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

# Convert labels from numbers to one-hot vectors
# (a single temporary session is used and closed afterwards)
with tf.Session() as one_hot_sess:
  y_train = one_hot_sess.run(tf.one_hot(y_train, 10))
  y_test = one_hot_sess.run(tf.one_hot(y_test, 10))

In [0]:
assert x_train.shape == (60000, 784), 'Invalid training images shape! Please double check your code.'
assert y_train.shape == (60000, 10), 'Invalid training labels shape! Please double check your code.'
assert x_test.shape == (10000, 784), 'Invalid test images shape! Please double check your code.'
assert y_test.shape == (10000, 10), 'Invalid test labels shape! Please double check your code.'
assert abs(np.mean(x_train)) <= 1e-2, 'Mean pixel values in train dataset should be close to 0. Current value: {}'.format(np.mean(x_train))
assert abs(1.0 - np.std(x_train)) <= 1e-2, 'Standard deviation of pixel values in train dataset should be close to 1. Current value: {}'.format(np.std(x_train))
assert abs(np.mean(x_test)) <= 1e-2, 'Mean pixel values in test dataset should be close to 0. Current value: {}'.format(np.mean(x_test))
assert abs(1.0 - np.std(x_test)) <= 1e-2, 'Standard deviation of pixel values in test dataset should be close to 1. Current value: {}'.format(np.std(x_test))
print('Everything is fine!')
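
The three preprocessing steps above can also be sketched in plain NumPy on a tiny made-up dataset. This is only an illustration: the `images` and `labels` arrays are hypothetical toy data, and one-hot encoding through `np.eye` is an alternative to the `tf.one_hot` call used in the notebook, not a replacement for it.

```python
import numpy as np

# Hypothetical toy data: 4 "images" of 2x2 pixels with integer labels 0..2.
images = np.array([
    [[0, 255], [255, 0]],
    [[255, 255], [0, 0]],
    [[0, 0], [0, 255]],
    [[255, 0], [255, 255]],
], dtype=np.float64)
labels = np.array([0, 2, 1, 2])

# Normalize with one scalar mean and one scalar standard deviation.
mean = images.mean()
std = images.std()
normalized = (images - mean) / std

# Flatten each 2x2 image into a vector of 4 values.
flat = normalized.reshape(normalized.shape[0], -1)

# One-hot encode labels by picking rows of an identity matrix.
num_classes = 3
one_hot = np.eye(num_classes)[labels]

print(flat.shape)     # (4, 4)
print(one_hot.shape)  # (4, 3)
print(one_hot[1])     # [0. 0. 1.]
```

After normalization the pixel values have a mean close to 0 and a standard deviation close to 1, which is exactly what the assertions in the cell above check for MNIST.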

Now, let's visualize an example entry in the training dataset!


In [0]:
plt.figure()
plt.imshow(x_train[0].reshape(28, 28))
plt.show()
print('Label as one hot:', y_train[0])
print('Label as number:', np.argmax(y_train[0]))

## Step 3. Define hyperparameters

In machine learning, a **hyperparameter** is a parameter whose value is set before the learning process begins!

In [0]:
learning_rate = 0.1
training_epochs = 25
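
To get a feeling for why the learning rate matters, here is a minimal sketch in plain Python. It assumes a toy problem that is not part of this notebook: minimizing `f(w) = w**2` with gradient descent. The `gradient_descent` helper is hypothetical and only illustrates how the same number of steps behaves with different learning rates.

```python
def gradient_descent(learning_rate, steps, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w             # derivative of w**2
        w -= learning_rate * grad  # gradient descent update
    return w

small = gradient_descent(learning_rate=0.01, steps=25)
large = gradient_descent(learning_rate=0.1, steps=25)
too_large = gradient_descent(learning_rate=1.1, steps=25)

print('lr=0.01:', small)
print('lr=0.1 :', large)
print('lr=1.1 :', too_large)
```

With the small learning rate the value moves towards the minimum at 0 only slowly, with the larger one it gets much closer in the same 25 steps, and with a learning rate that is too large the updates overshoot and the value grows instead of shrinking.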

## Step 4. Prepare your graph - a neural network model

For the purpose of this training, you'll prepare a simple one-layer neural network.

Below you can find a representation of our single neural network layer as a TensorFlow graph:

![Simple model graph](https://i.imgur.com/N8MproZ.png)

Below, you need to define:
  - `x`: a placeholder for the network input,
  - `W`: a variable for the weight tensor, initialized with a normal distribution,
  - `b`: a variable for the bias tensor, initialized with zeros,
  - `model`: the computation graph of the layer.

In [0]:
# Inputs
x = ...

# Set model weights
W = ...
b = ...

# Construct model
model = ...

In [0]:
assert x.shape[1] == 784, 'Your neural network needs to get an input of 784 values (flattened 28x28 image).'
assert W.shape[0] == 784, 'Your weight Tensor\'s first dimension should be equal to 784.'
assert W.shape[1] == 10, 'Your weight Tensor\'s second dimension should be equal to 10.'
assert b.shape[0] == 10, 'Your bias Tensor\'s dimension should be equal to 10.'
print('Everything is fine!')
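
As a rough illustration of what a single layer with these shapes computes, here is a NumPy sketch of a forward pass. It assumes the layer is `softmax(x W + b)`, as in the logistic regression example this notebook is based on; the `softmax` helper, the random `batch` and the `0.01` scale are hypothetical choices, and the exercise itself should still be solved with TensorFlow placeholders and variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

batch = rng.normal(size=(5, 784))            # 5 hypothetical flattened images
W = rng.normal(scale=0.01, size=(784, 10))   # weights drawn from a normal distribution
b = np.zeros(10)                             # biases initialized with zeros

probs = softmax(batch @ W + b)
print(probs.shape)        # (5, 10)
print(probs.sum(axis=1))  # each row sums to (approximately) 1
```

Each row of `probs` holds 10 non-negative values that sum to 1, one per digit class.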

## Step 5. Prepare your optimizer

Our model is ready for training. Before we start, we need to prepare a loss function and an SGD optimizer. You will also need an initializer for all variables and a Session object to run the computations in this exercise.

Below you can find a representation of our loss function as a TensorFlow graph:

![Graph for loss function](https://i.imgur.com/Echiwjp.png)

Below, you need to define:
  - `y`: a placeholder for the output label (needed to compute the loss),
  - `loss`: the computation graph of the loss function,
  - `optimizer`: which should be SGD (`tf.train.GradientDescentOptimizer`).

In [0]:
# Minimize error using cross entropy
y = ...
loss = ...

# Gradient Descent
optimizer = ...

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Prepare global TF Session object
sess = tf.Session()

In [0]:
assert 'GradientDescent' in optimizer.name, 'Change your optimizer to SGD!'
assert y.shape[1] == 10, 'Your neural network needs to output 10 values (possible 10 classes).'
print('Everything is fine!')
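
To make the roles of the loss and the optimizer more concrete, the sketch below performs one gradient descent step for a softmax layer with cross-entropy loss in NumPy. It is an illustration under assumptions, not the TensorFlow solution: the helpers `softmax`, `cross_entropy` and `sgd_step` and the toy data are hypothetical, and the gradient is written by hand, whereas TensorFlow derives it automatically.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets, eps=1e-12):
    # Mean over the batch of -sum(target * log(prob)).
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))

def sgd_step(W, b, inputs, targets, learning_rate):
    probs = softmax(inputs @ W + b)
    # For softmax + cross entropy, the gradient w.r.t. the logits
    # is (probs - targets), averaged over the batch.
    grad_logits = (probs - targets) / inputs.shape[0]
    W_new = W - learning_rate * (inputs.T @ grad_logits)
    b_new = b - learning_rate * grad_logits.sum(axis=0)
    return W_new, b_new

rng = np.random.default_rng(0)
inputs = rng.normal(size=(8, 4))                  # 8 toy examples, 4 features
targets = np.eye(3)[rng.integers(0, 3, size=8)]   # one-hot labels, 3 classes
W = np.zeros((4, 3))
b = np.zeros(3)

loss_before = cross_entropy(softmax(inputs @ W + b), targets)
W, b = sgd_step(W, b, inputs, targets, learning_rate=0.1)
loss_after = cross_entropy(softmax(inputs @ W + b), targets)
print('loss before:', loss_before)
print('loss after :', loss_after)
```

With zero weights every class gets probability 1/3, so the starting loss equals `log(3)`; a single small step along the negative gradient lowers it.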

## Step 6. Let's start the training!

It's high time to train our model!

What you need to do is:
  - initialize your variables,
  - pass the whole training dataset through the model multiple times (once per epoch),
  - pass the whole test dataset through the model to see how well it works.

In [0]:
# Run the initializer
...

# Training cycle
for epoch in range(training_epochs):
  # Pass whole training dataset through our model
  _, total_loss = sess.run([...], feed_dict={
    x: ...,
    y: ...,
  })

  # Display logs per epoch step
  print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(total_loss))

print('Training Finished!')

# Test model
correct_prediction = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", sess.run(accuracy, {x: x_test, y: y_test}))
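
The accuracy computation at the end of the cell above compares the index of the largest model output with the index of the 1 in the one-hot label. A small NumPy sketch with made-up values (the `outputs` and `labels` arrays are hypothetical) shows the same idea step by step:

```python
import numpy as np

# Hypothetical model outputs (one row of class scores per example)
# and the matching one-hot labels.
outputs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
labels = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

predicted = np.argmax(outputs, axis=1)  # [1 0 2 1]
expected = np.argmax(labels, axis=1)    # [1 0 1 2]

# Fraction of examples where the predicted class equals the expected one.
accuracy = np.mean(predicted == expected)
print(accuracy)  # 0.5
```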

Now, let's visualize our network's prediction on an example test image.

In [0]:
plt.figure()
plt.imshow(x_test[0].reshape(28, 28))
plt.show()

model_input = np.expand_dims(x_test[0], 0)  # Add batch dimension!
model_output = sess.run(model, {x: model_input})
print('Model responded:', model_output[0])
print('It is:', np.argmax(model_output[0]))

## Step 7. Batch training

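Instead of passing the whole dataset at once, it is usually better to train the model on smaller parts of it, called batches (or mini-batches). Each epoch then performs many weight updates instead of a single one, and each step needs far less memory.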

What you need to do is:
  - initialize your variables,
  - pass the whole training dataset through the model multiple times, but this time in multiple smaller parts (named "batches"),
  - pass the whole test dataset through the model to see how well it works (you can also pass it in batches).

In [0]:
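# Define new hyperparameter and tweak learning rate
# Note: if the optimizer in Step 5 was built from the Python float `learning_rate`,
# it captured the old value (0.1); re-create the optimizer for 0.01 to take effect.
batch_size = 32
learning_rate = 0.01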

# Run the initializer
sess.run(init)

# Training cycle
number_of_batches = x_train.shape[0] // batch_size
for epoch in range(training_epochs):
  total_loss = 0.0

  # Loop over all batches
  for i in range(number_of_batches):
    batch_xs = x_train[...:...]
    batch_ys = y_train[...:...]
    _, batch_loss = sess.run([...], feed_dict={
        x: batch_xs,
        y: batch_ys,
    })
    total_loss += batch_loss

  # Display logs per epoch step
  mean_loss = total_loss / number_of_batches
  print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(mean_loss))

print("Optimization Finished!")

# Test model
correct_prediction = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", sess.run(accuracy, {x: x_test, y: y_test}))
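
The general idea of walking through a dataset in consecutive batches can be sketched in NumPy as follows. The `iterate_batches` helper and the toy arrays are hypothetical and only show the indexing pattern; like the loop above, this sketch drops a trailing partial batch when the dataset size is not divisible by the batch size (for MNIST, 60000 / 32 = 1875, so nothing is dropped).

```python
import numpy as np

def iterate_batches(inputs, targets, batch_size):
    """Yield consecutive (inputs, targets) slices; a trailing partial batch is dropped."""
    number_of_batches = inputs.shape[0] // batch_size
    for i in range(number_of_batches):
        start = i * batch_size
        end = start + batch_size
        yield inputs[start:end], targets[start:end]

data = np.arange(10).reshape(10, 1)  # 10 toy examples with 1 feature each
targets = np.arange(10)

batches = list(iterate_batches(data, targets, batch_size=4))
print(len(batches))   # 2
print(batches[1][1])  # [4 5 6 7]
```

A common refinement, not used in this notebook, is to shuffle the examples before every epoch so that the batches differ from epoch to epoch.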

Now, let's visualize our network's prediction on an example test image.

In [0]:
plt.figure()
plt.imshow(x_test[0].reshape(28, 28))
plt.show()

model_input = np.expand_dims(x_test[0], 0)  # Add batch dimension!
model_output = sess.run(model, {x: model_input})
print('Model responded:', model_output[0])
print('It is:', np.argmax(model_output[0]))

## Summary

That's all! I hope that you've learned something useful today :)

**Your homework:**
  - experiment with different learning rates and batch sizes,
  - check what happens if you change the way you initialize the layer's weights,
  - extend your model by adding another fully-connected layer,
  - split the training dataset into a training set and a validation set, so that you can detect overfitting.
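
For the last homework item, one possible way to split a dataset is sketched below in NumPy. The `train_validation_split` helper, its `validation_fraction` and `seed` parameters, and the toy arrays are all hypothetical; treat it as a starting point under these assumptions rather than the expected solution.

```python
import numpy as np

def train_validation_split(inputs, targets, validation_fraction=0.1, seed=0):
    """Shuffle the example indices and split them into training and validation parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(inputs.shape[0])
    n_val = int(inputs.shape[0] * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (inputs[train_idx], targets[train_idx]), (inputs[val_idx], targets[val_idx])

inputs = np.arange(100).reshape(50, 2)  # 50 toy examples with 2 features each
targets = np.arange(50)

(train_x, train_y), (val_x, val_y) = train_validation_split(inputs, targets, 0.2)
print(train_x.shape, val_x.shape)  # (40, 2) (10, 2)
```

The model is then trained only on the training part, and the loss on the validation part is tracked after every epoch: if it starts to rise while the training loss keeps falling, the model is overfitting.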

**Where to get more information?**
  - Great introduction & visualization of Neural Networks by [3Blue1Brown on YouTube](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi),
  - TensorFlow in 5 minutes by [Siraj Raval on YouTube](https://www.youtube.com/watch?v=2FmcHiLCwTU),
  - Short videos about Neural Networks by [Welch Labs on YouTube](https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU),
  - Part I: Applied Math and Machine Learning Basics of [Deep Learning Book](http://www.deeplearningbook.org),
  - A Visual and Interactive Guide to the Basics of Neural Networks by Jay Alammar ([blog post](http://jalammar.github.io/visual-interactive-guide-basics-neural-networks/)),
  - [CS229 Course](http://cs229.stanford.edu) from Stanford (lectures can be found on YouTube),
  - [CS231n Course](http://cs231n.stanford.edu) from Stanford (lectures can be found on YouTube).