# Exercise: Introduction to TensorFlow
---

This week's exercise will walk you through the basics of TensorFlow. The goal is for you to become familiar with:
- How to work with a computational graph
- The TensorFlow graph
- How to run a graph using tf.Session
- Defining various nodes such as variables, constants and placeholders
- Defining various operations such as add, matmul, etc.
- The structure of a neural network implementation in TensorFlow

Links:
- [Task1: Computational graph](#Task1)
- [Task2: Math operations in TensorFlow](#Task2)
- [Task3: Play with cifar10 in TensorFlow](#Task3)



Software version:
- Python 3.6
- Tensorflow 1.4.0


<a id='Task1'></a>
### Task1: Computational graph
---

Here you should compute the forward and the backward pass of the given computational graph. 

The graph represents the sigmoid function:     $\sigma(\vec{x}, \vec{w}) = \frac{1}{1 + \exp(-[w_0x_0+w_1x_1+w_2])}$

The green values are the input values which should be propagated forward, and the red values are the values you should propagate backward (the gradients). You should compute the values for the forward and backward pass for all nodes as we did in the lecture. 

![title](images/sigmoid_graph.png)
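
As a rough numeric sketch of the forward and backward pass described above, the snippet below walks through the sigmoid graph node by node in plain Python. The input values are hypothetical and are not necessarily the green values in the figure, so substitute your own numbers when checking your answer. The intermediate names (`p0`, `s`, `d_s`, ...) are also just illustrative.

```python
import math

# Hypothetical inputs (assumption: NOT necessarily the values in the figure).
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# Forward pass, one node at a time.
p0 = w0 * x0          # multiply node
p1 = w1 * x1          # multiply node
s = p0 + p1 + w2      # add nodes
n = -s                # negate node
e = math.exp(n)       # exp node
d = 1.0 + e           # "+1" node
out = 1.0 / d         # reciprocal node, equals sigma(x, w)

# Backward pass: chain rule from the output back to the inputs.
d_out = 1.0
d_d = d_out * (-1.0 / d ** 2)     # derivative of 1/d is -1/d^2
d_e = d_d * 1.0                   # derivative of 1 + e is 1
d_n = d_e * e                     # derivative of exp(n) is exp(n)
d_s = d_n * -1.0                  # derivative of -s is -1
d_w2 = d_s                        # an add node passes the gradient through
d_w0, d_x0 = d_s * x0, d_s * w0   # a multiply node scales by the other input
d_w1, d_x1 = d_s * x1, d_s * w1

print(out, d_w0, d_x0, d_w1, d_x1, d_w2)
```

A useful sanity check is that the gradient arriving at the sum node, `d_s`, equals `out * (1 - out)`, the well-known derivative of the sigmoid.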


---
Simple example from the lecture:

<img src="images/simple_graph_example.png" alt="Drawing" style="width: 600px;"/>



In [1]:
# To do
# Draw the computational graph of the sigmoid function and fill in the values for the forward and backward pass.

<a id='Task2'></a>
### Task2: Math operations in TensorFlow
---
In this exercise you will implement mathematical operations in TensorFlow.

In [2]:
import tensorflow as tf
import numpy as np



---

In [3]:
def math1(mat):
    """
    Square each value in mat separately
    """
    res = tf.square(mat)
    return res

In [4]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
res = math1(mat)
with tf.Session() as sess:
    #print(sess.run(mat))    
    print(sess.run(res))
    

[[  0.   1.   4.   9.]
 [ 16.  25.  36.  49.]
 [ 64.  81. 100. 121.]
 [144. 169. 196. 225.]]


Expected output
```python
[[   0.    1.    4.    9.]
 [  16.   25.   36.   49.]
 [  64.   81.  100.  121.]
 [ 144.  169.  196.  225.]]
```

---

In [26]:
def math2(mat):
    """
    Return the sum of all the values in mat
    """
    res=tf.reduce_sum(mat)
    return res

In [27]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
res = math2(mat)
with tf.Session() as sess:
    #print(sess.run(mat))    
    print(sess.run(res))

120.0


Expected output

120

---

In [32]:
def math3(mat):
    """
    Return the sum of each row in mat
    """
    res=tf.reduce_sum(mat,1)
    return res

In [33]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
res = math3(mat)
with tf.Session() as sess:
    #print(sess.run(mat))
    print(sess.run(res))


[ 6. 22. 38. 54.]


Expected output

[ 6 22 38 54]
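
The `axis` argument of `tf.reduce_sum` follows the same convention as NumPy's `sum`, so the difference between the last two tasks can be sketched in NumPy. This is only an illustration of the axis semantics, not the TensorFlow solution itself.

```python
import numpy as np

mat = np.arange(16, dtype=np.float32).reshape(4, 4)

total = mat.sum()             # no axis: sum over every element
row_sums = mat.sum(axis=1)    # axis=1 collapses the columns: one sum per row
col_sums = mat.sum(axis=0)    # axis=0 collapses the rows: one sum per column

print(total, row_sums, col_sums)
```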

---

In [34]:
def math4(mat):
    """
    Return a scaled version of mat, so that it sums to 1
    """
    res = tf.divide(mat, math2(mat))
    return res

In [35]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
res = math4(mat)
with tf.Session() as sess:
    #print(sess.run(mat))    
    print(sess.run(res))


[[0.         0.00833333 0.01666667 0.025     ]
 [0.03333334 0.04166667 0.05       0.05833333]
 [0.06666667 0.075      0.08333334 0.09166667]
 [0.1        0.10833333 0.11666667 0.125     ]]


Expected output
```
[[ 0.          0.00833333  0.01666667  0.025     ]
 [ 0.03333333  0.04166667  0.05        0.05833333]
 [ 0.06666667  0.075       0.08333333  0.09166667]
 [ 0.1         0.10833333  0.11666667  0.125     ]]
 ```
 ---   

In [76]:
def math5(mat, v):
    """
    Element-wise multiply each column in mat with vector v
    """
    # Reshape v from (4,) to (4, 1) so that it broadcasts along each column of mat
    res = tf.multiply(mat, tf.expand_dims(v, axis=1))
    return res

In [77]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
vec = tf.convert_to_tensor(value=np.arange(4),dtype=tf.float32)
res = math5(mat,vec)
with tf.Session() as sess:
    #print(sess.run(mat))    
    #print(sess.run(vec))
    print(sess.run(res))
    

[[ 0.  0.  0.  0.]
 [ 4.  5.  6.  7.]
 [16. 18. 20. 22.]
 [36. 39. 42. 45.]]


Expected output
```
[[ 0  0  0  0]
 [ 4  5  6  7]
 [16 18 20 22]
 [36 39 42 45]]
 ```
 --- 

In [None]:
def math6(mat, v):
    """
    Element-wise multiply each row in mat with vector v
    """
    res=None
    return res


In [None]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
vec = tf.convert_to_tensor(value=np.arange(4),dtype=tf.float32)
res = math6(mat,vec)
with tf.Session() as sess:
    #print(sess.run(mat))    
    #print(sess.run(vec))
    print(sess.run(res))

Expected output
```
[[ 0  1  4  9]
 [ 0  5 12 21]
 [ 0  9 20 33]
 [ 0 13 28 45]]
 ```
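
TensorFlow's element-wise operations use NumPy-style broadcasting, so the difference between multiplying "each column" and "each row" with a vector comes down to the shape of the vector. The NumPy sketch below illustrates the idea; the names `per_row` and `per_col` are only illustrative.

```python
import numpy as np

mat = np.arange(16, dtype=np.float32).reshape(4, 4)
v = np.arange(4, dtype=np.float32)

# v has shape (4,): it lines up with the last axis of mat,
# so every ROW of mat is multiplied element-wise with v.
per_row = mat * v

# v[:, None] has shape (4, 1): it lines up with the first axis of mat,
# so every COLUMN of mat is multiplied element-wise with v.
per_col = mat * v[:, None]

print(per_row)
print(per_col)
```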
 ---

In [None]:
def math7(mat, v):
    """
    Matrix multiply matrix mat with vector v
    """
    # Hint: check the dimensions of v
    res=None
    return res

In [None]:
tf.reset_default_graph()
mat = tf.convert_to_tensor(value=np.arange(16).reshape((4, 4)),dtype=tf.float32)
vec = tf.convert_to_tensor(value=np.arange(4),dtype=tf.float32)
res = math7(mat,vec)
with tf.Session() as sess:
    #print(sess.run(mat))    
    #print(sess.run(vec))
    print(sess.run(res))
    

Expected output

[14 38 62 86]
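
Regarding the hint: `tf.matmul` expects both arguments to have at least two dimensions, so a vector of shape (4,) has to be turned into a (4, 1) column first. The shape handling can be sketched in NumPy as follows. This is an illustration of the shapes involved, under the assumption that the result should be flattened back to shape (4,).

```python
import numpy as np

mat = np.arange(16, dtype=np.float32).reshape(4, 4)
v = np.arange(4, dtype=np.float32)

col = v.reshape(4, 1)         # (4,)   -> (4, 1) column vector
prod = np.matmul(mat, col)    # (4, 4) x (4, 1) -> (4, 1)
res = prod.reshape(4)         # (4, 1) -> (4,) to match the expected output

print(res)
```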


<a id='Task3'></a>
### Task3: Play with cifar10 in TensorFlow

---

This exercise does not require coding. We want you to study the code and try to understand how a neural network is built from the ground up in TensorFlow. Before you can start, you need to download the cifar10 dataset. This can be done by running <b>"get_datasets.sh"</b> within the <b>"dataset"</b> folder, or by downloading the dataset (CIFAR-10 python version) from: https://www.cs.toronto.edu/~kriz/cifar.html

The cifar10 dataset has 10 classes: [airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck].  
The training set consists of 5000 images and the test set consists of 500 images. The images are of size [32,32,3].





- <b>"%matplotlib inline"</b> is used to plot figures within the jupyter notebook
- <b>"tf.reset_default_graph()"</b> is added to clear the TensorFlow graph from any previous nodes/operations.

In [None]:
import tensorflow as tf
from utils import dataClass
import numpy as np
import matplotlib.pyplot as plt
import time

%matplotlib inline
tf.reset_default_graph()

---
The following cell creates an instance of the class dataClass. The instance <b>"myData"</b> loads the cifar10 images. Every image is flattened to an array of size $[32\cdot32\cdot3+1, 1] = [3073, 1]$. The "+1" is a row of ones to accommodate the bias. The dataClass has two useful functions:
- next_training_batch(batch_size)
- get_test_data()
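
To illustrate the flattening and the extra "+1" feature, here is a small NumPy sketch. It is not the actual `dataClass` implementation: the random batch, the one-image-per-row layout and the position of the ones (appended last) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical batch of 5 random "images" of size [32, 32, 3].
images = rng.rand(5, 32, 32, 3)

flat = images.reshape(images.shape[0], -1)     # shape (5, 3072)
ones = np.ones((flat.shape[0], 1))             # shape (5, 1)
features = np.hstack([flat, ones])             # shape (5, 3073)

# With a constant 1 as the last feature, the last row of W acts as a bias.
W = rng.randn(features.shape[1], 4)
with_bias_trick = features.dot(W)
explicit_bias = flat.dot(W[:-1]) + W[-1]

print(features.shape, np.allclose(with_bias_trick, explicit_bias))
```

This is why the network later in the notebook only needs a weight matrix per layer and no separate bias variable for the input layer.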


To be able to feed the training and test data into the TensorFlow graph, we define <b>"tf.placeholder"</b> nodes for the data and the corresponding labels (one-hot format). The <b>"global_step"</b> variable will be used to count the number of training iterations.
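
As a reminder of what the one-hot label format looks like, here is a minimal NumPy sketch. The three example labels are made up, and how `dataClass` builds its labels internally is not shown here.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 0, 9])          # hypothetical integer class labels

# Row i of the identity matrix is the one-hot vector for class i.
onehot = np.eye(num_classes, dtype=np.int32)[labels]

# argmax along axis 1 recovers the integer labels again.
recovered = onehot.argmax(axis=1)

print(onehot)
print(recovered)
```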


In [None]:
#Load cifar10 data
cifar10_dir = 'datasets/cifar-10-batches-py'
myData      = dataClass.dataClass(cifar10_dir)

#Define placeholders for being able to feed data to the tensorflow graph
data          = tf.placeholder(shape=(None, myData.numbOfFeatures), dtype=tf.float32, name='data')
labels_onehot = tf.placeholder(shape=(None, myData.numbOfClasses),  dtype=tf.int32,   name='labels_onehot')
global_step   = tf.Variable(initial_value=0, trainable=False, name='global_step')

---
Let's define the structure of the TensorFlow graph. The number and size of the hidden layers can be modified by changing the <b>"hiddenLayerSizes"</b> list. By default, the network is a fully connected (dense) neural network with two hidden layers of size 1024 and 265. We encourage you to play with the network configuration. For example, a fully connected neural network with 3 hidden layers can be defined by editing <b>"hiddenLayerSizes"</b> to:

hiddenLayerSizes = [myData.numbOfFeatures, 1024, 265, 128, myData.numbOfClasses]

You can also try to change the initialization of the weights and the activation function. 


In [None]:
# N  -> number of training samples in a minibatch
# D1 -> number of input features
# D2 -> number of output features
# C  -> number of classes

# - W: An array of shape (D1, D2) containing weights.
# - data: An array of shape (N, D1) containing a minibatch of data.
# - labels_onehot: An array of shape (N, C) containing training labels

# Let's define a fully connected neural network
hiddenLayerSizes = [myData.numbOfFeatures, 1024, 265, myData.numbOfClasses]
a = data
for ii in range(len(hiddenLayerSizes)-1):
    layerName = 'layer%s' % ii
    with tf.variable_scope(layerName):
        ny = hiddenLayerSizes[ii]
        nx = hiddenLayerSizes[ii+1]
        W  = tf.get_variable(name='W', shape=(ny, nx), initializer=tf.contrib.layers.xavier_initializer())
        z  = tf.matmul(a, W, name='matmul')
        a  = tf.tanh(z, name='activation_function')
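
To see what the loop above computes, the same forward pass can be sketched in NumPy. The helper names `xavier_uniform` and `forward` are hypothetical, and the uniform Xavier limit $\sqrt{6/(n_{in}+n_{out})}$ is an assumption about what the TensorFlow initializer does by default.

```python
import numpy as np

def xavier_uniform(rng, fan_in, fan_out):
    # Assumed Xavier/Glorot uniform rule: U(-limit, limit).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def forward(x, layer_sizes, rng):
    # One weight matrix per pair of consecutive layer sizes,
    # followed by tanh, mirroring the TensorFlow loop.
    a = x
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = xavier_uniform(rng, fan_in, fan_out)
        a = np.tanh(a.dot(W))
    return a

rng = np.random.RandomState(0)
layer_sizes = [3073, 1024, 265, 10]     # input features, hidden layers, classes
x = rng.rand(8, layer_sizes[0])         # hypothetical minibatch of 8 samples
out = forward(x, layer_sizes, rng)
print(out.shape)
```

Note that, as in the TensorFlow loop, tanh is applied after every layer including the last one, so the values passed on as logits lie between -1 and 1.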

---
The loss is computed using the built-in TensorFlow function <b>"tf.losses.softmax_cross_entropy"</b>, which calculates the softmax cross-entropy loss. If you want to improve the generalization of the network, you could try adding a regularization loss.

In [None]:
#Define your loss function
logits  = a
loss    = tf.losses.softmax_cross_entropy(onehot_labels=labels_onehot, logits=logits)
regloss = 0
losses  = loss + regloss 
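
For intuition, the softmax cross-entropy loss and one possible regularization term can be sketched in NumPy. This is not the TensorFlow implementation: averaging over the batch is an assumption about the default reduction, and `l2_penalty` with its `reg_strength` parameter is a hypothetical helper showing L2 regularization as one option for `regloss`.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the correct class and average over the batch.
    return -(onehot * log_probs).sum(axis=1).mean()

def l2_penalty(weights, reg_strength):
    # Sum of squared weights over all weight matrices, scaled by reg_strength.
    return reg_strength * sum(np.sum(W ** 2) for W in weights)

logits = np.zeros((4, 10))                      # uniform prediction over 10 classes
onehot = np.eye(10)[[0, 3, 5, 9]]               # hypothetical labels
loss = softmax_cross_entropy(logits, onehot)    # equals log(10) for uniform predictions

weights = [np.ones((2, 3)), np.ones((3, 1))]    # toy weight matrices
regloss = l2_penalty(weights, reg_strength=1e-3)
losses = loss + regloss
print(loss, regloss, losses)
```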

---
We define a gradient descent optimizer. We pass in the loss (losses) we want to minimize and the list of variables (weights) we want to minimize it with respect to. The minimizer returns an operation, which we call <b>"train_op"</b>. Every time we want to perform a gradient descent step, we run <b>"train_op"</b> in the tf.Session.

The <b>"global_step"</b> variable is passed into the minimizer and is incremented for every gradient descent step.

We would like you to play with the learning_rate. The default is <b>"learning_rate=0.05"</b>.

In [None]:
#Define an optimizer
all_variables = tf.trainable_variables()
optimizer     = tf.train.GradientDescentOptimizer(learning_rate=0.05)
train_op      = optimizer.minimize(losses, global_step=global_step, var_list=all_variables)
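
What a single run of "train_op" does can be sketched on a toy problem in plain Python. The quadratic loss $(w-3)^2$ is a made-up example and the variables here are ordinary Python numbers, not TensorFlow variables.

```python
learning_rate = 0.05
w = 0.0            # the "trainable variable"
global_step = 0    # counts the number of gradient descent steps

for _ in range(200):
    grad = 2.0 * (w - 3.0)          # gradient of the toy loss (w - 3)^2
    w = w - learning_rate * grad    # gradient descent update
    global_step += 1                # incremented once per step

print(w, global_step)
```

With this learning rate the distance to the minimum at `w = 3` shrinks by a factor 0.9 per step. A much larger learning rate (above 1.0 for this toy loss) would make the updates diverge, which is one reason the learning rate is worth experimenting with.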

---
The accuracy measure <b>"accuracy"</b> is calculated as the fraction of samples whose predicted class matches the label. Other possible measures include recall, precision and F1 score.

In [None]:
#Calculate the accuracy
estimated_class = tf.argmax(logits, axis=1)
labels          = tf.argmax(labels_onehot, axis=1)
accuracy        = tf.reduce_mean(tf.cast(tf.equal(estimated_class, labels), tf.float32), name='accuracy')
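
The other measures mentioned above can be sketched in NumPy for a single class. The function `precision_recall_f1` and the toy predictions are hypothetical and only meant to show how the measures relate to accuracy.

```python
import numpy as np

def precision_recall_f1(predicted, actual, positive_class):
    # Count true positives, false positives and false negatives for one class.
    tp = np.sum((predicted == positive_class) & (actual == positive_class))
    fp = np.sum((predicted == positive_class) & (actual != positive_class))
    fn = np.sum((predicted != positive_class) & (actual == positive_class))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

predicted = np.array([0, 1, 1, 2, 1, 0])   # hypothetical predicted classes
actual = np.array([0, 1, 2, 2, 1, 1])      # hypothetical true classes

acc = np.mean(predicted == actual)         # same idea as the accuracy node
p, r, f = precision_recall_f1(predicted, actual, positive_class=1)
print(acc, p, r, f)
```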

---
Here is where the action takes place! The cell creates a <b>"tf.Session"</b> and trains the neural network by running the <b>"train_op"</b>. Note how we use the two placeholders <b>"data"</b> and <b>"labels_onehot"</b> to feed the graph with new training images/labels. If training takes a long time, try reducing <b>"numbOfTrainingSteps"</b>.

You can try to play with the hyperparameters:
- numbOfTrainingSteps
- batch_size 





In [None]:
# Hyperparameters
numbOfTrainingSteps = 10000
batch_size          = 500

#Log train loss/accuracy and test loss/accuracy
train_loss     = np.zeros(numbOfTrainingSteps)
train_accuracy = np.zeros(numbOfTrainingSteps)
test_loss     = []
test_accuracy = []
test_inds     = []


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) 
    timeZero = time.time()
    for ii in range(numbOfTrainingSteps):
        npData, npLabels_onehot   = myData.next_training_batch(batch_size)
        loss_val, accuracy_val, _ = sess.run([loss, accuracy, train_op],
                                             feed_dict={data: npData, labels_onehot: npLabels_onehot})
        train_loss[ii]         = loss_val
        train_accuracy[ii]     = accuracy_val

        #Block is printing accuracy, loss and ETA.
        if ii % 50 == 0:
            currentTime = time.time()-timeZero
            secPerIter  = currentTime/(ii+1)
            remTime     = (numbOfTrainingSteps - ii)*secPerIter
            remMin      = int(remTime/60)
            remSec      = remTime%60
            print('Accuracy=%f | loss=%f | ETA: min=%d, sec=%d' % (train_accuracy[ii], train_loss[ii], 
                                                               remMin, remSec))
        
        #Block is calculating test accuracy and loss
        #(train_op is deliberately not run here, so the test data is never used for training)
        if ii % 100 == 0:
            npData, npLabels_onehot = myData.get_test_data()
            loss_val, accuracy_val = sess.run([loss, accuracy],
                                              feed_dict={data: npData, labels_onehot: npLabels_onehot})
            test_loss.append(loss_val)
            test_accuracy.append(accuracy_val)
            test_inds.append(ii)

---
We plot the loss and the accuracy as a function of gradient descent steps to monitor the training.


In [None]:
# Plot the training accuracy and the training loss
#plt.figure()
plt.figure(figsize=(18, 16), dpi= 80, facecolor='w', edgecolor='k')
ax = plt.subplot(2, 1, 1)
# plt.subplots_adjust(hspace=2)
ax.plot(train_loss, 'b', label='train_loss')
ax.plot(np.array(test_inds), np.array(test_loss), 'r', label='test_loss')
ax.grid()
plt.title('Loss', fontsize=20)
plt.ylabel('Loss', fontsize=18)
plt.xlabel('Iterations', fontsize=18)
ax.legend(loc='upper right', fontsize=15)

ax = plt.subplot(2, 1, 2)
plt.subplots_adjust(hspace=0.7)
ax.plot(train_accuracy*100, 'b', label='train_accuracy')
ax.plot(np.array(test_inds), np.array(test_accuracy)*100, 'r', label='test_accuracy')
ax.grid()
plt.title('Accuracy', fontsize=20)
plt.ylabel('Accuracy [%]', fontsize=18)
plt.xlabel('Iterations', fontsize=18)
ax.legend(loc='lower right', fontsize=15)