## X. TensorFlow
### X.1 Up and running with TensorFlow

TensorFlow uses computational graphs at its core. Because a graph can be split into chunks that run in parallel across multiple CPUs, GPUs, or servers, scalability is one of its main advantages. In addition, many high-level APIs have been built on top of TensorFlow, Keras being one example.

<img src="tensorflow_graphs.PNG" width="400" height="150"><br><br>

The following code creates the graph represented in Figure 9-1:

In [2]:
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf

In [4]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y+y+2



This code only creates the graph; it does not perform any computation yet. To actually evaluate the graph, we must open a TensorFlow session like so:

In [5]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

42


Calling sess.run() for every node and closing the session manually is a bit cumbersome. A better way is to use a with statement, which sets the session as the default session inside the block and automatically closes it once everything is done.

In [7]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

In [8]:
result

42

**Managing Graphs**

Any node you create is automatically added to the default graph. However, if you wanted to have multiple independent graphs, you could create a new Graph temporarily like so:

In [9]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    
print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


**Lifecycle of a Node Value**

When evaluating a node, TensorFlow automatically determines the set of nodes it depends on and evaluates those first. For example:

In [10]:
w = tf.constant(3)
x = w+2
y = x+5
z = x*3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


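TensorFlow knows that it needs to calculate w and x first in order to get y.<br>

IMPORTANT: TensorFlow does not store node values between graph runs (only variable values are kept by the session). To evaluate z, it calculates w and x again. To avoid the duplicate work, evaluate both nodes in a single graph run, e.g. with sess.run([y, z]).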
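The recomputation described here can be made visible with a toy graph in plain Python. This is only a sketch of the idea, not how TensorFlow is implemented: the `Node` class, its `n_evals` counter, and the `cache` argument are hypothetical names introduced for illustration.

```python
class Node:
    """A toy graph node: stores an operation and its inputs, computes nothing until evaluated."""
    def __init__(self, name, fn, *inputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.n_evals = 0  # how many times this node has been computed

    def eval(self, cache=None):
        # With a cache (one shared "graph run"), each node is computed at most once.
        if cache is not None and self.name in cache:
            return cache[self.name]
        self.n_evals += 1
        value = self.fn(*(node.eval(cache) for node in self.inputs))
        if cache is not None:
            cache[self.name] = value
        return value


# Building the graph: nothing is computed here.
w = Node('w', lambda: 3)
x = Node('x', lambda a: a + 2, w)
y = Node('y', lambda a: a + 5, x)
z = Node('z', lambda a: a * 3, x)

# Two separate evaluations: x (and w) are computed twice.
y_val, z_val = y.eval(), z.eval()
evals_separate = x.n_evals

# One shared run: x is computed only once more and then reused for z.
shared = {}
y_val2, z_val2 = y.eval(shared), z.eval(shared)
evals_shared = x.n_evals - evals_separate

print(y_val, z_val)      # 10 15
print(evals_separate)    # 2
print(evals_shared)      # 1
```

The shared cache plays the role of a single graph run in which several nodes are fetched together.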

**Linear Regression with TensorFlow**

A TensorFlow operation (op for short) can take any number of inputs and produce any number of outputs. Constants and variables take no inputs and are called source ops. The inputs and outputs are multidimensional arrays called *tensors*. In the Python API, tensors are represented by NumPy arrays.<br>

Again, in TensorFlow you first build the graph, which is then executed. Defining nodes while building the graph does not execute any computations.<br>

Let's use linear regression as an example. I am using the Normal Equation (the closed-form solution of OLS) on sklearn's California housing dataset:

In [20]:
import numpy as np
from sklearn import datasets

housing = datasets.fetch_california_housing()
m,n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
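For comparison, the same Normal Equation, $\hat{\theta} = (X^T X)^{-1} X^T y$, can be written in plain NumPy. The sketch below uses small synthetic data rather than the housing dataset; the coefficients in `true_theta` and the sample size are assumptions chosen only so that the result is easy to check.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data: y = 4 + 3*x1 - 2*x2 (no noise)
m = 200
features = rng.normal(size=(m, 2))
true_theta = np.array([[4.0], [3.0], [-2.0]])

X = np.c_[np.ones((m, 1)), features]   # add the bias column, as in the cell above
y = X @ true_theta

# Normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

print(theta.ravel())
```

Since the data contains no noise, the recovered `theta` should match `true_theta` up to floating-point error.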

### X.2 Implementing Gradient Descent

Instead of using the Normal Equation, which does not scale well to many features, let's use Batch Gradient Descent. I will first compute the gradients manually and then use TensorFlow's autodiff feature, which computes them automatically.

**Manually Computing the Gradients**

A few explanatory comments:
- random_uniform() creates a node in the graph that will generate a tensor containing random values, given its shape and value range (used here to initialize the weights)
- assign() creates a node that will assign a new value to a variable

In [26]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1], -1.0, 1.0),name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta-learning_rate*gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE =', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

Epoch 0 MSE = 1669473.5
Epoch 100 MSE = nan
Epoch 200 MSE = nan
Epoch 300 MSE = nan
Epoch 400 MSE = nan
Epoch 500 MSE = nan
Epoch 600 MSE = nan
Epoch 700 MSE = nan
Epoch 800 MSE = nan
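Epoch 900 MSE = nan

IMPORTANT: The MSE turns into nan because the features are not scaled. Some columns of the housing data have values in the thousands, so with a learning rate of 0.01 the updates overshoot and Gradient Descent diverges. Standardize the features before training (e.g., with sklearn's StandardScaler).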
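The update rule from the cell above can be reproduced in plain NumPy. This sketch runs on hypothetical synthetic data, not the housing data, and it standardizes the features first; that preprocessing step is an assumption added here because Gradient Descent with a fixed learning rate is sensitive to feature scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with two features on very different scales
m = 500
raw = np.c_[rng.normal(0, 1, m), rng.normal(1000, 300, m)]
target = (2.0 + 0.5 * raw[:, 0] + 0.01 * raw[:, 1]).reshape(-1, 1)

# Standardize each feature (zero mean, unit variance), then add the bias column
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)
X = np.c_[np.ones((m, 1)), scaled]

n_epochs = 1000
learning_rate = 0.01
theta = rng.uniform(-1.0, 1.0, size=(X.shape[1], 1))

mse_history = []
for epoch in range(n_epochs):
    error = X @ theta - target
    mse_history.append(float(np.mean(error ** 2)))
    gradients = 2 / m * X.T @ error            # same formula as in the graph above
    theta = theta - learning_rate * gradients  # one Batch Gradient Descent step

print('first MSE:', mse_history[0], 'last MSE:', mse_history[-1])
```

With scaled inputs the MSE stays finite and shrinks from epoch to epoch.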


**Using autodiff**

For linear regression, it is reasonably easy to derive the gradients of the cost function by hand. But doing the same for a deep neural network with many layers would be tedious and error-prone. Luckily, TensorFlow's autodiff feature does this for us. Just replace *gradients = 2/m * tf.matmul(tf.transpose(X), error)* (for linear regression) with *gradients = tf.gradients(mse, [theta])[0]*

The gradients() function takes an op (like mse) and a list of variables (in this case just theta). It then creates a list of ops, one per variable, that compute the gradients of the op with respect to each variable.<br>
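To see what such a gradient op should return for the MSE, the hand-derived formula can be checked against a numerical approximation. Note that the sketch below uses central finite differences, which is not the reverse-mode autodiff that TensorFlow applies; it is only a sanity check on random, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 50, 3
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, n))]
y = rng.normal(size=(m, 1))
theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))

def mse(theta):
    return float(np.mean((X @ theta - y) ** 2))

# Hand-derived gradient of the MSE with respect to theta
analytic = 2 / m * X.T @ (X @ theta - y)

# Numerical approximation: central finite differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[i, 0] = eps
    numeric[i, 0] = (mse(theta + step) - mse(theta - step)) / (2 * eps)

# Largest absolute difference between the two gradients
print(np.max(np.abs(analytic - numeric)))
```

Finite differences need two cost evaluations per parameter, which is one reason autodiff is preferred for models with many parameters.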

**Using an optimizer**

It gets even easier. TensorFlow also provides pre-built optimizers. So, instead of defining *gradients* and *training_op* as in the previous code, you could also just use:<br>
*optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)*<br>
*training_op = optimizer.minimize(mse)*

**Feeding Data to the Training Algorithm**

Instead of feeding the whole training set at every step, we could also use mini-batches. TensorFlow provides *placeholder()* nodes for this purpose. A placeholder performs no computation; it just outputs the data you feed it at runtime. To create one, call placeholder() and specify the output tensor's data type. Optionally, you can also specify its shape, where None means "any size" along that dimension. Here is a simple example:

In [27]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A+5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1,2,3]]})
    B_val_2 = B.eval(feed_dict={A: [[4,5,6],[7,8,9]]})
    
print(B_val_1)
print(B_val_2)

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]


IMPORTANT: The array fed to A must always have three columns, but it can have any number of rows (the None dimension)!

Let's apply mini-batches to the linear regression defined earlier:

In [29]:
data = datasets.fetch_california_housing()
m,n = data.data.shape
print("Number of datapoints: {}, Number of features: {}".format(m,n))

Number of datapoints: 20640, Number of features: 8


In [30]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data.data)

data_with_bias = np.c_[np.ones((m,1)),scaled_data]
data_target = np.array(data.target.reshape(-1,1))

In [32]:
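## First, change definition of X and y 
## to make them placeholder nodes
X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

## Rebuild the rest of the graph on top of the placeholders, since
## the earlier theta, mse and training_op still use the constant X and y
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
mse = tf.reduce_mean(tf.square(y_pred - y), name='mse')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

batch_size = 100
n_batches = int(np.ceil(m/batch_size))

def fetch_batch(epoch, batch_index, batch_size, X_full, y_full):
    ## Slice out rows [starting_index, starting_index+batch_size)
    starting_index = batch_index*batch_size
    X_batch = X_full[starting_index:starting_index+batch_size,:]
    y_batch = y_full[starting_index:starting_index+batch_size,:]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    
    ## n_epochs = 1000 means 1000 * 207 training steps; lower it for a quick test
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size, data_with_bias, data_target)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            
    best_theta = theta.eval()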

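A batch-slicing helper can be checked in isolation with NumPy before it is wired into a training loop. The sketch below is a simplified, hypothetical variant without the `epoch` argument, run on toy data whose size is chosen so that the last batch is smaller than the others.

```python
import numpy as np

def fetch_batch(batch_index, batch_size, X_full, y_full):
    """Return the rows that belong to batch number `batch_index`."""
    start = batch_index * batch_size
    end = start + batch_size          # slicing past the end is safe in NumPy
    return X_full[start:end, :], y_full[start:end, :]

# Hypothetical toy data: 250 rows, so the last batch holds only 50 rows
m, batch_size = 250, 100
X_full = np.arange(m * 2, dtype=np.float32).reshape(m, 2)
y_full = np.arange(m, dtype=np.float32).reshape(m, 1)

n_batches = int(np.ceil(m / batch_size))
batch_sizes = [fetch_batch(i, batch_size, X_full, y_full)[0].shape[0]
               for i in range(n_batches)]

print(n_batches)      # 3
print(batch_sizes)    # [100, 100, 50]
```

The batch sizes add up to `m`, so every row is visited exactly once per epoch.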

### X.3 Saving and Restoring Models

After training a model, or even at regular intervals during training, you might want to save the model parameters to disk so that you can reuse them or compare them to other models later on. TensorFlow offers an easy way to do so: create a *Saver* node at the end of the construction phase (after all variable nodes are created) and call its *save()* method whenever you want to save the model:

In [None]:
# [...]
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')

# [...]
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(sess, 'C:/Users/wehrm/OneDrive/Documents/UChicago/test_model.ckpt')
            
        sess.run(training_op)
        
    best_theta = theta.eval()
    save_path = saver.save(sess, 'C:/Users/wehrm/OneDrive/Documents/UChicago/test_model_final.ckpt')
    

Restoring works much like saving. Create the Saver in the same way, but instead of running the init node at the beginning of the execution phase, call the Saver's *restore()* method.

In [None]:
with tf.Session() as sess:
    saver.restore(sess, 'C:/Users/wehrm/OneDrive/Documents/UChicago/test_model_final.ckpt')
    
    ## [...] Use model for prediction/comparison etc.

By default, all variables are saved and restored under their own names. However, you can specify which variables to save and which names to use, like so (here only theta is saved, under the name 'weights'):

In [None]:
saver = tf.train.Saver({'weights': theta})

### X.4 Visualizing the Graph and Training Curves Using TensorBoard

TensorBoard offers a great way to visualize different aspects of your model (like learning curve etc.). To do so, the code needs to be tweaked in such a way that it writes training stats to a log file, which TensorBoard then will read from.<br>

IMPORTANT: Use a different log directory for every run, as otherwise TensorBoard will merge the stats from different runs and the visualizations become hard to read. Do so by including a timestamp in the log directory name.

In [None]:
## This part should be at the beginning of your code
from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)

## This part should be at the very end of the construction phase
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
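The naming scheme for the log directory can be tried out without TensorFlow. In this sketch, `make_logdir` is a hypothetical helper (not part of any library), and the fixed timestamps are made up so that the output is reproducible.

```python
from datetime import datetime, timezone

def make_logdir(root_logdir='tf_logs', now=None):
    """Build a log directory name of the form <root>/run-<YYYYmmddHHMMSS>/."""
    if now is None:
        now = datetime.now(timezone.utc)
    return '{}/run-{}/'.format(root_logdir, now.strftime('%Y%m%d%H%M%S'))

# Two runs started one second apart end up in different directories
run_a = make_logdir(now=datetime(2019, 5, 1, 12, 0, 0, tzinfo=timezone.utc))
run_b = make_logdir(now=datetime(2019, 5, 1, 12, 0, 1, tzinfo=timezone.utc))

print(run_a)   # tf_logs/run-20190501120000/
print(run_b)   # tf_logs/run-20190501120001/
```

Because the timestamp has a resolution of one second, two runs started within the same second would still share a directory.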

The log files should then be updated every few batches (NOT every training step, since that would slow down training considerably!).

In [None]:
## [...]

for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size, data_with_bias, data_target)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
        step = epoch*n_batches+batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
## [...]

## Finally, close the FileWriter()
file_writer.close()
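The step passed to add_summary() is simply the number of batches processed so far. The short sketch below lists the steps at which a summary would be written, using small, made-up values for the number of epochs and batches.

```python
# Hypothetical small values so the result is easy to read
n_epochs, n_batches, log_every = 2, 25, 10

logged_steps = []
for epoch in range(n_epochs):
    for batch_index in range(n_batches):
        if batch_index % log_every == 0:
            # Global step = number of batches processed before this one
            step = epoch * n_batches + batch_index
            logged_steps.append(step)

print(logged_steps)   # [0, 10, 20, 25, 35, 45]
```

Since the counter restarts with every epoch, the logged steps are not evenly spaced unless n_batches is a multiple of the logging interval.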

**Name Scopes**

Name scopes let you group related nodes. This is especially handy when working with neural networks, whose graphs can easily contain thousands of nodes. Applying a name scope to the given example:

In [None]:
## defining name scope for mse related nodes
with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')

The name of each op defined within the scope is now prefixed with 'loss/'. In TensorBoard, mse and error now appear inside the loss namespace.

**Modularity**

Suppose you want to create a graph that adds the outputs of two *Rectified Linear Units* (ReLU). Remember, a ReLU computes a linear function of its inputs and outputs the result if it is positive, and 0 otherwise.<br>

The following code creates two ReLUs and adds them up. However, it is very repetitive.

In [None]:
n_features = 3 
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z2, 0., name="relu2")

output = tf.add(relu1, relu2, name="output") 

It is much smarter to define a relu() function once and call it whenever you need another ReLU. Also, use a name scope within the function, so that the nodes of each ReLU are grouped together. TensorFlow keeps the scope names unique by appending a suffix (relu, relu_1, relu_2, ...).

In [None]:
def relu(X):
    with tf.name_scope('relu'):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X, w), b, name='z')
        return tf.maximum(z, 0., name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')
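The same modular structure can be sketched in NumPy to check shapes and values without building a graph. This is an illustrative analogue only: it computes values eagerly, and the input matrix and random seed are made up.

```python
import numpy as np

rng = np.random.default_rng(7)

def relu(X):
    """One ReLU unit: random weights, zero bias, max(X.w + b, 0)."""
    w = rng.normal(size=(X.shape[1], 1))
    b = 0.0
    z = X @ w + b
    return np.maximum(z, 0.0)

# Hypothetical input: 2 instances with 3 features each
X = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.5, 2.0]])

relus = [relu(X) for _ in range(5)]
output = np.sum(relus, axis=0)     # element-wise sum of the five outputs

print(output.shape)   # (2, 1)
```

Each call draws its own weights, just as each call of the graph version creates its own variables.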