In [83]:
import tensorflow as tf
import os

## Creating a First TensorFlow Graph

In our first example, we create a TensorFlow graph representing the formula:

$$ f(x,y) = x^2y + y + 2 $$

where x = 3 and y = 4.

In [3]:
x = tf.Variable(3, name = "x")
y = tf.Variable(4, name = "y")
f = x*x*y + y + 2

Note that the above code does not perform any computation; it only creates a computation graph.
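To make the build-then-run split concrete, here is a toy deferred-evaluation graph in plain Python. The `Node`, `as_node` and `run` names are hypothetical and only imitate the idea; this is not how TensorFlow is implemented.

```python
# Toy illustration only: these names are hypothetical, not TensorFlow's API.
class Node:
    """A graph node that stores an operation and its inputs, but no result."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))


def as_node(obj):
    """Wrap plain numbers as constant nodes."""
    return obj if isinstance(obj, Node) else Node("const", value=obj)


def run(node):
    """Evaluate a node by recursively evaluating its inputs (the 'session' step)."""
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == "add" else left * right


x = as_node(3)
y = as_node(4)
f = x * x * y + y + 2   # builds nodes only; nothing is computed here
result = run(f)         # the computation happens now
print(result)           # prints 42
```

Building `f` only links nodes together; the arithmetic happens inside `run`, which plays the role that a session plays in TensorFlow.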

To evaluate the graph, we need to open a TensorFlow session and use it to initialize variables and evaluate f. Remember to close the session at the end to free up resources.

In [5]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

42


A more concise way to write the above code is as follows:

In [6]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Inside the with block the session is set as the default session, so we no longer need to type sess.run every time we initialize a variable or evaluate a node. Moreover, the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, we can further simplify the code using the global_variables_initializer() function.

In [7]:
init = tf.global_variables_initializer() #prepare an init node

with tf.Session() as sess:
    init.run() #actually initialize all the variables
    result = f.eval()

To avoid the with block altogether and keep the code short, we can create an InteractiveSession, which sets itself as the default session when it is created. We do have to close the session manually, however.

In [8]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)
sess.close()

42


## Managing Graphs

In [12]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

Here, we have added a new node x1 to the default graph.

If we want to manage graphs independently of each other, we can do so by creating a new Graph and temporarily making it the default graph inside a with block:

In [15]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


## Lifecycle of a Node Value

TensorFlow automatically determines the order in which nodes must be evaluated when some nodes depend on the results of others. For example:

In [16]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x + 3

In [17]:
with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
8


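Note that TensorFlow does not reuse the result of the previous evaluation of w and x: the two eval() calls are separate graph runs, so w and x are evaluated twice.

All node values are dropped between graph runs. Variable values are the exception: the session keeps them across graph runs, from the moment their initializer runs until the session is closed.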
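The cost of separate graph runs can be illustrated with a small plain-Python model that counts how often the shared node x is computed. The functions below are stand-ins for graph nodes, not TensorFlow code. In TensorFlow 1.x the single-run variant corresponds to fetching both nodes at once, as in `y_val, z_val = sess.run([y, z])`.

```python
# Toy model of graph runs; the names here are illustrative, not TensorFlow API.
eval_counts = {"x": 0}

def w():
    return 3

def x():
    eval_counts["x"] += 1      # track how often x is computed
    return w() + 2

def y():
    return x() + 5

def z():
    return x() + 3

# Two separate runs: x is computed once per run, so twice in total.
y_val, z_val = y(), z()
separate_runs = eval_counts["x"]

# One run that shares the intermediate value: x is computed once.
eval_counts["x"] = 0

def run_once():
    x_val = x()                # evaluated a single time for this run
    return x_val + 5, x_val + 3

y_val2, z_val2 = run_once()
single_run = eval_counts["x"]
print(y_val, z_val, separate_runs, single_run)   # prints 10 8 2 1
```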

## Linear Regression with TensorFlow

The Normal Equation for linear regression:

$\theta = (X^T X)^{-1} X^T y$

In [18]:
import numpy as np
from sklearn.datasets import fetch_california_housing

In [23]:
housing = fetch_california_housing()
#print(housing)
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)), housing.data]
#print(housing_data_plus_bias)
#Building the computation graph
X = tf.constant(housing_data_plus_bias, dtype = tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT,X)), XT), y)


In [27]:
#Evaluate the computation graph
with tf.Session() as sess:
    theta_value = theta.eval()
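As a cross-check outside TensorFlow, the same Normal Equation can be sketched in NumPy. The data below is synthetic (a line with intercept 4 and slope 3 plus a little noise), chosen only so the recovered parameters are easy to recognize; it is not the housing dataset.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, synthetic data (assumption)
m = 100
x_feature = rng.uniform(0, 2, size=(m, 1))
y_target = 4 + 3 * x_feature + rng.normal(0, 0.1, size=(m, 1))

X_b = np.c_[np.ones((m, 1)), x_feature]   # add the bias column, as in the notebook
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_target

# np.linalg.lstsq solves the same least-squares problem without an explicit inverse
theta_lstsq, *_ = np.linalg.lstsq(X_b, y_target, rcond=None)
print(theta_hat.ravel(), theta_lstsq.ravel())
```

Both results should land close to the intercept and slope used to generate the data. The explicit inverse mirrors the formula, while `lstsq` is usually the numerically safer choice.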

## Implementing Gradient Descent

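Standardize the features before running gradient descent; otherwise features on very different scales can make the updates diverge. scikit-learn's StandardScaler is one option. The run below feeds the unscaled housing data, and its MSE diverges to nan after the first report.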
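A minimal sketch of standardization with plain NumPy, assuming two made-up features on very different scales. Each column is shifted to zero mean and divided by its standard deviation, which is similar to what StandardScaler does with its default settings. The bias column is added after scaling so that it stays equal to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
# Two synthetic features on very different scales (hypothetical data)
features = np.c_[rng.normal(5, 2, size=500), rng.normal(3000, 900, size=500)]

mean = features.mean(axis=0)
std = features.std(axis=0)
scaled = (features - mean) / std        # zero mean, unit variance per column

scaled_plus_bias = np.c_[np.ones((len(scaled), 1)), scaled]  # add bias after scaling
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```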

### Manually Computing Gradients

We start off by manually implementing batch gradient descent:

$\theta^{(\text{next step})} = \theta - \eta \nabla_{\theta}\,\mathrm{MSE}(\theta)$

In [39]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_plus_bias, dtype = tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")
#random initialization of weights at the start between -1 and +1
theta = tf.Variable(tf.random_uniform([n+1,1], -1.0, 1.0), name = "theta")
y_pred = tf.matmul(X, theta, name = "predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name = "mse")
gradients = 2/m*tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate*gradients)

init = tf.global_variables_initializer()
#Evaluate gradient descent
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE=", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()


Epoch 0 MSE= 1008841.6
Epoch 100 MSE= nan
Epoch 200 MSE= nan
Epoch 300 MSE= nan
Epoch 400 MSE= nan
Epoch 500 MSE= nan
Epoch 600 MSE= nan
Epoch 700 MSE= nan
Epoch 800 MSE= nan
Epoch 900 MSE= nan


## Using Autodiff

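Here TensorFlow automatically derives the gradient of an op with respect to its inputs, so we do not have to work it out by hand. That matters for functions like the one below, whose derivative would be tedious and error-prone to compute manually.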


In [40]:
def my_func(a,b):
    z = 0
    for i in range(100):
        z = a*np.cos(z+i) + z*np.sin(b-i)
    return z

In [43]:
#Compute the gradients with 1 line of code.

gradients = tf.gradients(mse, [theta])[0]

The gradients() function takes an op (here mse) and a list of variables (here just theta) and creates a list of ops, one per variable, that compute the gradients of the op with respect to each variable using reverse-mode autodiff. Reverse-mode autodiff computes the partial derivatives of an output with respect to all of its inputs in one forward and one backward pass through the graph.
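The idea behind reverse-mode autodiff can be sketched in a few lines of plain Python. The `Node` class and `backward` function below are hypothetical teaching code, not TensorFlow internals; they handle only scalar addition and multiplication. The example reuses f(x, y) = x²y + y + 2 from the start of the notebook, whose partial derivatives are 2xy and x² + 1.

```python
class Node:
    """A scalar value that remembers which nodes it was computed from."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # pairs of (input node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])


def backward(output):
    """Propagate d(output)/d(node) from the output back to every input."""
    order, seen = [], set()

    def visit(node):                # depth-first post-order: inputs before consumers
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):    # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local   # chain rule, summed over all paths


x = Node(3.0)
y = Node(4.0)
f = x * x * y + y + 2               # forward pass computes the value
backward(f)                         # one backward pass gives both partial derivatives
print(f.value, x.grad, y.grad)      # prints 42.0 24.0 10.0
```

A single backward pass yields the derivative with respect to every input, which is why this mode suits functions with many inputs and few outputs, such as a loss over many parameters.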

## Using an Optimizer

In [47]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate)
training_op = optimizer.minimize(mse)

## Feeding Data to the Training Algorithm

We will now implement mini-batch gradient descent. The simplest way to do this is to use placeholder nodes. These nodes don't perform any computation; they just output the data you tell them to output at run time.

To create a placeholder node, call the placeholder() function and specify the output tensor's data type. In the following example, we create placeholder node A, and also node B = A + 5.

When we evaluate B, we pass a feed_dict to the eval() method that specifies the value of A.

In [52]:
A = tf.placeholder(tf.float32, shape = (None, 3)) #None for dimension implies any size
#In A we can have any number of rows but 3 columns
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict = {A: [[1,2,3]]})
    B_val_2 = B.eval(feed_dict = {A: [[4,5,6], [7,8,9], [10,11,12]]})
    
print(B_val_1)
print(B_val_2)

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]
 [15. 16. 17.]]


To implement mini-batch gradient descent we need to replace X and y with placeholder nodes:

In [53]:
X = tf.placeholder(tf.float32, shape = (None, n+1), name = "X")
y = tf.placeholder(tf.float32, shape = (None, 1), name = "y")

Then define the batch size and compute the total number of batches:

In [54]:
batch_size = 100
n_batches = int(np.ceil(m/batch_size))

In [57]:
#?np.ceil
#np.ceil refers to the ceiling of each element in 'x'
#Example
#>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
#>>> np.ceil(a)
#array([-1., -1., -0.,  1.,  2.,  2.,  2.])

Now fetch mini-batches one by one, then provide the values of X and y via the feed_dict parameter when evaluating a node that depends on either of them.

In [60]:
#def fetch_batch(epoch, batch_index, batch_size):
    #load data from disk
#    return X_batch, y_batch

#with tf.Session() as sess:
#    sess.run(init)
    
#    for epoch in range(n_epochs):
#        for batch_index in range(n_batches):
#            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
#            sess.run(training_op, feed_dict = {X: X_batch, y:y_batch})
            
#    best_theta = theta.eval()
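The commented cell leaves fetch_batch unimplemented. One possible version, sketched in NumPy on made-up arrays, is shown here. Unlike the stub above, this hypothetical variant takes the data arrays as arguments, and it reshuffles the rows once per epoch so that every row is used exactly once. The last batch is smaller when the row count is not a multiple of the batch size.

```python
import numpy as np

def fetch_batch(X_data, y_data, epoch, batch_index, batch_size):
    """Return one mini-batch; the row order is reshuffled for each epoch."""
    m = X_data.shape[0]
    order = np.random.default_rng(epoch).permutation(m)   # same order within an epoch
    rows = order[batch_index * batch_size:(batch_index + 1) * batch_size]
    return X_data[rows], y_data[rows]

# Made-up data: 250 rows, 3 features
m, n_features = 250, 3
X_data = np.arange(m * n_features, dtype=np.float32).reshape(m, n_features)
y_data = np.arange(m, dtype=np.float32).reshape(-1, 1)

batch_size = 100
n_batches = int(np.ceil(m / batch_size))    # 250 rows -> 3 batches

batch_sizes = []
seen = []
for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(X_data, y_data, 0, batch_index, batch_size)
    batch_sizes.append(len(X_batch))
    seen.extend(y_batch.ravel().tolist())
print(n_batches, batch_sizes)               # prints 3 [100, 100, 50]
```

In the TensorFlow loop, each returned pair would be passed through feed_dict to the placeholder nodes.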

## Saving and Restoring Models

After training a model, we want to save its parameters to disk so we can come back to them later, use them in another program, or compare models. We would also like to save checkpoints at regular intervals so that a crash does not force us to restart training from scratch.

We do this by creating a Saver node at the end of the construction phase.

In [111]:
os.chdir('/Users/siddharth/Desktop/TensorFlow/tmp/')

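theta = tf.Variable(tf.random_uniform([n+1,1], -1.0,1.0), name = "theta")
#Note: training_op was built from the earlier theta, so it does not update this new variable.
#Rebuild y_pred, mse and training_op from this theta before training on it.
init = tf.global_variables_initializer()
#saver node
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0: #checkpoint every 100 epochs
            save_path = saver.save(sess, "my_model.ckpt")
        sess.run(training_op) #run a training step every epoch, not only at checkpoints
    best_theta = theta.eval()
    #print(best_theta)
    save_path = saver.save(sess, "my_model_final.ckpt")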


#For some reason the damn saver.save refused to take the full path.
#Had to import os and then redefine the path.

[[ 0.9681423 ]
 [ 0.4930575 ]
 [ 0.4602003 ]
 [-0.09579325]
 [ 0.39398646]
 [ 0.3718593 ]
 [ 0.03154683]
 [-0.8800242 ]
 [ 0.7732265 ]]


In [103]:
os.chdir('/Users/siddharth/Desktop/TensorFlow')

Restoring a model is just as easy. At the beginning of the execution phase, instead of initializing variables using the init node, you call the restore() method of the saver object.

#### (Needs better code. Book not working)

In [110]:
#os.chdir('/Users/siddharth/Desktop/TensorFlow/tmp/')

#with tf.Session() as sess:
#    #restore() expects the checkpoint prefix, not the .data-00000-of-00001 shard file
#    saver.restore(sess, "my_model_final.ckpt")
#    for epoch in range(n_epochs):
#        if epoch % 100 == 0: #checkpoint every 100 epochs
#            save_path = saver.save(sess, "my_model.ckpt")
#        sess.run(training_op)
#    best_theta = theta.eval()
#    save_path = saver.save(sess, "my_model_final.ckpt")
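The save-then-restore round trip can be illustrated without TensorFlow. The sketch below uses NumPy's .npy format as a stand-in for a checkpoint; it is not the Saver file format, and the file name is hypothetical. It also builds an explicit path with os.path.join rather than changing the working directory.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=(9, 1))     # stand-in for trained parameters

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build an explicit path instead of relying on os.chdir()
    path = os.path.join(tmp_dir, "theta_final.npy")   # hypothetical file name
    np.save(path, theta)                         # "save" step
    restored_theta = np.load(path)               # "restore" step, e.g. in a later program

print(np.array_equal(theta, restored_theta))    # prints True
```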


## Visualizing the Graph and Training Curves Using TensorBoard (Needs More Work)

We need to use a different log directory every time we run the model; otherwise TensorBoard will merge statistics from different runs. For this purpose we include a timestamp in the directory name.

In [112]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run.{}".format(root_logdir, now)
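To see what directory name the timestamp produces, here is a small sketch with a fixed, arbitrary moment in time. The `make_logdir` helper is hypothetical; the format string uses the standard strftime directives for year, month, day, hour, minute and second.

```python
from datetime import datetime

def make_logdir(root_logdir, moment):
    """Build a run directory name such as tf_logs/run.20180501134509."""
    stamp = moment.strftime("%Y%m%d%H%M%S")   # year, month, day, hour, minute, second
    return "{}/run.{}".format(root_logdir, stamp)

fixed_moment = datetime(2018, 5, 1, 13, 45, 9)   # arbitrary example timestamp
example_logdir = make_logdir("tf_logs", fixed_moment)
print(example_logdir)                            # prints tf_logs/run.20180501134509
```

Because the stamp runs from the largest unit to the smallest, sorting the directory names alphabetically also sorts the runs chronologically.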

In [114]:
theta = tf.Variable(tf.random_uniform([n+1,1], -1.0,1.0), name = "theta")
init = tf.global_variables_initializer()
#Need to add the following 2 lines at the end of our construction phase.
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0: #log the MSE every 100 epochs
            summary_str = mse_summary.eval()
            file_writer.add_summary(summary_str, epoch)
        sess.run(training_op)
    best_theta = theta.eval()
    #print(best_theta)

file_writer.close()


[[-0.14194155]
 [ 0.178478  ]
 [-0.3267207 ]
 [ 0.557111  ]
 [ 0.2079308 ]
 [-0.03617215]
 [ 0.6804452 ]
 [ 0.6969156 ]
 [ 0.6307759 ]]


The first line creates a node in the graph that evaluates the MSE value and writes it to a TensorBoard-compatible binary log string called a summary.

The second line creates a FileWriter that you use to write summaries to log files in the log directory.

---------------------(Will Come Back Later)-----------------

## Modularity

Suppose we want to create a graph that uses multiple ReLU components. 

Equation of a ReLU:

$h_{w,b}(X) = \max(X \cdot w + b, 0)$
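A direct NumPy reading of this equation, on a tiny made-up batch, can look like the following. The `relu_unit` name and the input values are illustrative only.

```python
import numpy as np

def relu_unit(X, w, b):
    """Compute max(X.w + b, 0) element-wise for a batch of rows."""
    return np.maximum(X @ w + b, 0.0)

X_demo = np.array([[1.0, 2.0, 3.0],
                   [-1.0, -2.0, -3.0]])     # two rows, three features
w_demo = np.array([[0.5], [0.5], [0.5]])
b_demo = 1.0
h = relu_unit(X_demo, w_demo, b_demo)
print(h)
```

The first row gives 0.5 * 6 + 1 = 4, which passes through unchanged. The second gives -3 + 1 = -2, which is clipped to 0.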

In [116]:
n_features = 3
X = tf.placeholder(tf.float32, shape = (None, n_features), name = "x")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name = "weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name = "weights2")
b1 = tf.Variable(0.0, name = "bias1")
b2 = tf.Variable(0.0, name = "bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name = "z1")
z2 = tf.add(tf.matmul(X, w2), b2, name = "z2")

relu1 = tf.maximum(z1, 0., name = "relu1")
relu2 = tf.maximum(z2, 0., name = "relu2")

output = tf.add(relu1, relu2, name = "output")

    

The code above is repetitive. TensorFlow allows you to stay DRY (Don't Repeat Yourself). 

Create a function to build a ReLU. The following code creates five ReLUs and outputs their sum (note that add_n() creates an operation that will compute the sum of a list of tensors):

In [117]:
def relu(X):
    w_shape = (int(X.get_shape()[1]),1)
    w = tf.Variable(tf.random_normal(w_shape), name = "weights")
    b = tf.Variable(0.0, name = "bias")
    z = tf.add(tf.matmul(X,w), b, name = 'z')
    return tf.maximum(z, 0., name = "relu")

n_features = 3
X = tf.placeholder(tf.float32, shape = (None, n_features), name = "x")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name = "output")

Note that when you create a node, TensorFlow checks whether its name already exists; if it does, it appends an underscore followed by an index to make the name unique (for example weights, weights_1, weights_2).
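The naming pattern can be mimicked in plain Python. The `NameRegistry` class below is a hypothetical helper written for illustration; it is not TensorFlow code and only reproduces the underscore-plus-index pattern described above.

```python
# Hypothetical helper that mimics the naming pattern; it is not TensorFlow code.
class NameRegistry:
    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        """Return name unchanged the first time, then name_1, name_2, ..."""
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)

registry = NameRegistry()
names = [registry.unique_name("weights") for _ in range(3)]
print(names)   # prints ['weights', 'weights_1', 'weights_2']
```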

## Sharing Variables

If you want to share a variable between different parts of a graph, one simple option is to create it first, then pass it as a parameter to the functions that need it.

For example, suppose you want to control the ReLU threshold (currently hard-coded to 0) using a shared threshold variable for all ReLUs. You can create that variable first and then pass it to the relu() function:

In [119]:
def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]),1)
        w = tf.Variable(tf.random_normal(w_shape), name = "weights")
        b = tf.Variable(0.0, name = "bias")
        z = tf.add(tf.matmul(X,w), b, name = 'z')
        return tf.maximum(z, threshold, name = "max")

threshold = tf.Variable(0.0, name = "threshold")
X = tf.placeholder(tf.float32, shape = (None, n_features), name = "X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name = "output")


In [118]:
#?tf.name_scope

In the above code we created threshold outside the relu() function and passed it in. We can also create it inside the function, on the first call, by storing it as an attribute of the function itself.

In [122]:
def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            #create the shared threshold only on the first call
            relu.threshold = tf.Variable(0.0, name = "threshold")
        #weights, bias and z are built on every call
        w_shape = (int(X.get_shape()[1]),1)
        w = tf.Variable(tf.random_normal(w_shape), name = "weights")
        b = tf.Variable(0.0, name = "bias")
        z = tf.add(tf.matmul(X,w), b, name = 'z')
        return tf.maximum(z, relu.threshold, name = "max")
            

In [123]:
#?hasattr
#Return whether the object has an attribute with the given name.
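The function-attribute trick works in plain Python too, which makes it easy to test in isolation. In this sketch, `make_unit` and its dictionary payload are hypothetical; the point is only that the shared object is created on the first call and reused on every later call.

```python
def make_unit(scale):
    # Create the shared object only on the first call, then reuse it
    if not hasattr(make_unit, "shared"):
        make_unit.shared = {"threshold": 0.0}
    return {"scale": scale, "threshold_ref": make_unit.shared}

units = [make_unit(s) for s in range(5)]
make_unit.shared["threshold"] = 0.5      # one change is visible to every unit
all_shared = all(u["threshold_ref"] is make_unit.shared for u in units)
print(all_shared, units[0]["threshold_ref"]["threshold"])   # prints True 0.5
```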