# Introduction to Neural Networks
Pre-requisite: TensorFlow basics

## Training a DNN with TensorFlow's High-Level API

TensorFlow has a nice lightweight high level API (mostly tf.keras) that we can use to construct and train our neural networks. Here we're going to use the MNIST dataset to predict handwritten digits.



In [0]:
import tensorflow as tf
import numpy as np

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "ann"

def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id + ".png")
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

In [0]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

In [15]:
feature_cols = [tf.feature_column.numeric_column("X", shape=[28 * 28])]
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=10, feature_columns=feature_cols)
input_fn = tf.estimator.inputs.numpy_input_fn(x={"X": X_train}, y=y_train, num_epochs=40, batch_size=50, shuffle=True)
dnn_clf.train(input_fn=input_fn)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmprzcuxfvn', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff74a5a1240>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
To construct input pipelines, use the `tf.

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7ff74a5a1048>

The above code gets the number of feature columns = 784 (28*28), then we construct our DNN to have 2 hidden layers , one with a width of 300 and the second with a width of 100. The next step defines our training function such that we shuffle our input data with mini-batches of 50 and 40 epochs. We have 44000 steps here because we have 55000 training samples (we initially had 60000 but put 5000 in the validation set) and we divided them to mini-batches of 50 with 40 epochs.

In [16]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={"X": X_test}, y=y_test, shuffle=False)
eval_results = dnn_clf.evaluate(input_fn=test_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-15T19:31:23Z
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/tmprzcuxfvn/model.ckpt-44000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-15-19:31:24
INFO:tensorflow:Saving dict for global step 44000: accuracy = 0.9802, average_loss = 0.09942919, global_step = 44000, loss = 12.585974
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 44000: /tmp/tmprzcuxfvn/model.ckpt-44000


In [17]:
eval_results

{'accuracy': 0.9802,
 'average_loss': 0.09942919,
 'global_step': 44000,
 'loss': 12.585974}

In the above code, we define our test input function we can see these to be like callbacks for how to pass our inputs and we evaluate on that test set.

We can see that we have a 98.02% accuracy on our test set. Not bad!

The below code shows us the prediction of the first sample in our test set. `dnn_clf.predict` returns a generator which we convert to a list and get the prediction of the first sample.

It predicts it to be 7 and we can see that the probability that the digit is 7 is 1 and the rest is very low so our model is quite confident with this prediction and if we take a look at `logits` those are the values that our output returns.

In [21]:
y_pred_iter = dnn_clf.predict(input_fn=test_input_fn)
y_pred = list(y_pred_iter)
y_pred[0]

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmprzcuxfvn/model.ckpt-44000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


{'class_ids': array([7]),
 'classes': array([b'7'], dtype=object),
 'logits': array([ -5.5540705,   2.6465282,   3.0625932,   7.0229154,  -6.6496286,
         -4.618923 , -13.450632 ,  24.919262 ,  -3.8851683,   3.2711911],
       dtype=float32),
 'probabilities': array([5.8290842e-14, 2.1236096e-10, 3.2193598e-10, 1.6893329e-08,
        1.9489715e-14, 1.4850095e-13, 2.1685327e-17, 1.0000000e+00,
        3.0931076e-13, 3.9660858e-10], dtype=float32)}

## Training a DNN Using Plain TensorFlow
As opposed to what we saw above using tf.keras and the high-level APIs.

## Construction Phase

In [0]:
n_inputs = 28*28  # MNIST input layer
n_hidden1 = 300   # 1st hidden layer
n_hidden2 = 100   # 2nd hidden layer
n_outputs = 10    # output layer (number of digits)

In [0]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

In [0]:
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name): # set namescope for each layer just to have a nice graph in TensorBoard
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs + n_neurons) # standard deviation of gaussian distribution
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev) #initialize our weights to a random gaussian distribution
        W = tf.Variable(init, name="kernel") # often named kernel
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = tf.matmul(X, W) + b
        if activation is not None: # apply activation function if it exists
            return activation(Z)
        else:
            return Z

In [0]:
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, name="hidden1", activation=tf.nn.relu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu)
    logits = neuron_layer(hidden2, n_outputs, name="outputs") # output layer before going into the softmax computation layer

These are our layers, now we need t define the cost function.

In [0]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

`tf.nn.sparse_softmax_cross_entropy_with_logits` which computes the cross entropy based on "logits" and it expects labels in the form of integers ranging from 0 to 9 this will give us a 1D tensor containing the cross entrpy for each instance then we call `reduce_mean` to get the mean cross entropy over al instances. 

`tf.nn.sparse_softmax_cross_entropy_with_logits` is equivalent to applying the softmax function and then computing the cross entropy but it is more efficient and it properly takes care of corner cases: when logits are large, floating point rounding errors may cause the softmax output to be equal to 0 or 1, and in the case of log(0) it actually computes a log(eps) where eps is a small number. to avoid the negative infinity.

That's why we didn't just apply the normal softmax activation function. Additionally, there is another function called `softmax_cross_entropy_with_logits()` which takes labels in the form of one-hot vectors instead of ints from 0 to the number of classes  - 1 that `tf.nn.sparse_softmax_cross_entropy_with_logits` takes.



In [0]:
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

In [0]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

For evaluating our model, we just use the accuracy measure. We compute the prediction for each instance and return a boolean if it is the same as the target class. So `correct` would be a list of booleans and then we computer the mean of the entire list to determine our model's accuracy.

In [0]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

## Execution Phase

In [0]:
n_epochs = 40
batch_size = 50

In [0]:
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X)) #shuffle our samples
    n_batches = len(X) // batch_size 
    for batch_idx in np.array_split(rnd_idx, batch_size): # create batches of batch_size from shuffled samples 
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

In [68]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_val = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)

    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Batch accuracy: 0.57454544 Val accuracy: 0.5662
1 Batch accuracy: 0.72727275 Val accuracy: 0.7342
2 Batch accuracy: 0.7845455 Val accuracy: 0.7984
3 Batch accuracy: 0.8381818 Val accuracy: 0.8302
4 Batch accuracy: 0.8527273 Val accuracy: 0.8496
5 Batch accuracy: 0.84636366 Val accuracy: 0.8622
6 Batch accuracy: 0.8636364 Val accuracy: 0.8702
7 Batch accuracy: 0.8709091 Val accuracy: 0.877
8 Batch accuracy: 0.8881818 Val accuracy: 0.8842
9 Batch accuracy: 0.8809091 Val accuracy: 0.8864
10 Batch accuracy: 0.88272727 Val accuracy: 0.8904
11 Batch accuracy: 0.91 Val accuracy: 0.8954
12 Batch accuracy: 0.8872727 Val accuracy: 0.8984
13 Batch accuracy: 0.89454544 Val accuracy: 0.9016
14 Batch accuracy: 0.8963636 Val accuracy: 0.9044
15 Batch accuracy: 0.9 Val accuracy: 0.906
16 Batch accuracy: 0.90636367 Val accuracy: 0.9074
17 Batch accuracy: 0.8972727 Val accuracy: 0.9106
18 Batch accuracy: 0.8909091 Val accuracy: 0.9126
19 Batch accuracy: 0.89 Val accuracy: 0.9136
20 Batch accuracy: 0.9

In [74]:
with tf.Session() as sess:
    saver.restore(sess, save_path)
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


In [46]:
print("Predicted classes:", y_pred)
print("Actual classes:   ", y_test[:20])

Predicted classes: [7 2 1 0 4 1 4 9 6 9 0 6 9 0 1 5 9 7 3 4]
Actual classes:    [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]


TensorFlow offers us a function to define our fully connected layers & softmax layers so we don't need to define our function `neuron_layer`. Example:

In [48]:
# with tf.name_scope("dnn"):
#     hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
#                               activation=tf.nn.relu)
#     hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
#                               activation=tf.nn.relu)
#     logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
#     y_proba = tf.nn.softmax(logits)

Instructions for updating:
Use keras.layers.dense instead.
