# TensorFlow Tutorial

Before starting the coding assignment for unit 8, you should get familiar with TensorFlow, an open source software library for numerical computation that is particularly well suited to, and fine-tuned for, large-scale machine learning. The basic principle is that you first define a computation graph in Python; TensorFlow then takes that graph and runs it efficiently using optimized C++ code.

## Download the tensorflow package

If you are using Anaconda, first activate your environment with:

```source activate env_name```

and then install TensorFlow:

```conda install -c conda-forge tensorflow```

This command installs a CPU version on your machine.

If you are not using Anaconda, you can install TensorFlow with pip instead:

```pip install tensorflow```

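This installs the latest version of TensorFlow. Note that the latest releases are 2.x, where the 1.x API used in this tutorial (`tf.Session`, `tf.placeholder`) is only available under `tf.compat.v1`, so you may prefer to pin a 1.x release, for example `pip install tensorflow==1.14.0`.

For this tutorial we are using Python 3.6 and TensorFlow 1.14.0.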

In [1]:
import sys
import tensorflow as tf
import numpy as np
print(tf.__version__)

1.14.0



## Creating And Running a Graph

Our first goal is to define a computation graph (computation_graph.png) in TensorFlow and trigger the computation. Each node in the graph is called an operation, and each edge represents the flow of data between operations. A node can either operate on tensors (addition, subtraction, multiplication, etc.) or generate a tensor (a constant or a variable). Each node takes zero or more tensors as inputs and produces a tensor as output.

In [2]:
x = tf.Variable(3, name = "x")
y = tf.Variable(4, name = "y")
two = tf.constant(2)

op1 = tf.multiply(x, x)
op2 = tf.multiply(x, op1)
op3 = tf.add(y, two)
op4 = tf.add(op2, op3)

These operations are added to the default graph, because no tf.Graph() was specified explicitly; we will talk about custom graphs later.

Once you have defined your operations, you can start a session and execute the graph.

In [3]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = op4.eval()

You initialize the variables in the graph and trigger the computation by evaluating the last operation. Since op4 depends on op2 and op3, TensorFlow first evaluates op2 and op3, and so on recursively until it reaches the leaf nodes, which are the variables and the constant defined above.
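
The recursive evaluation described above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not how TensorFlow is implemented internally; the `ToyNode` class and the `*_node` names are hypothetical and exist only for this example.

```python
import operator


class ToyNode:
    """A toy graph node: either a leaf holding a value or an operation on other nodes."""

    def __init__(self, name, fn=None, inputs=(), value=None):
        self.name = name
        self.fn = fn          # function applied to the evaluated inputs (None for leaves)
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # stored value for leaf nodes

    def eval(self):
        # Leaf nodes (variables, constants) simply return their stored value.
        if self.fn is None:
            return self.value
        # Operation nodes evaluate their inputs first, then apply their function.
        return self.fn(*(node.eval() for node in self.inputs))


x_node = ToyNode("x", value=3)
y_node = ToyNode("y", value=4)
two_node = ToyNode("two", value=2)

op1_node = ToyNode("op1", fn=operator.mul, inputs=(x_node, x_node))      # x * x
op2_node = ToyNode("op2", fn=operator.mul, inputs=(x_node, op1_node))    # x * x * x
op3_node = ToyNode("op3", fn=operator.add, inputs=(y_node, two_node))    # y + 2
op4_node = ToyNode("op4", fn=operator.add, inputs=(op2_node, op3_node))  # x^3 + y + 2

toy_result = op4_node.eval()
print(toy_result)  # 27 + 6 = 33
```

Evaluating `op4_node` pulls in exactly the nodes it depends on, which mirrors why only `op4.eval()` has to be called in the session.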

In [4]:
result

33

## Managing the Graph 

In [5]:
def reset_graph(seed=42):
    """Clear the default graph and reseed TensorFlow and NumPy for reproducible runs."""
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

In [6]:
reset_graph()

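You can create your own graphs and run them in sessions. Anything created inside a `with graph.as_default():` block is added to that graph only, so in the output below the variable counts are 2 inside each block and 0 for the global default graph: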


In [7]:
graph1 = tf.Graph()

with graph1.as_default():
    x = np.random.rand(100).astype(np.float32)
    target = x * 0.3 - 0.23
    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    pred = W * x + b
    loss = tf.reduce_mean(tf.square(target - pred))
    print('num of trainable variables = %d' % len(tf.trainable_variables()))
    print('num of global variables = %d' % len(tf.global_variables()))
    print('graph1=', graph1)
    print('get default graph in current session = ', tf.get_default_graph())
    
print("*"*100)
print('num of trainable variables = %d' % len(tf.trainable_variables()))
print('num of global variables = %d' % len(tf.global_variables()))
print('global default graph = ' , tf.get_default_graph())
print('get default graph in current session = ', tf.get_default_graph())

graph2 = tf.Graph()
with graph2.as_default():
    x = np.random.rand(100).astype(np.float32)
    target = x * 0.4 - 0.73
    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    pred = W * x + b
    loss = tf.reduce_mean(tf.square(target - pred))
    print("*"*100)
    print('num of trainable variables = %d' % len(tf.trainable_variables()))
    print('num of global variables = %d' % len(tf.global_variables()))
    print('graph2 = ', graph2)
    print('get default graph in current session = ', tf.get_default_graph())


num of trainable variables = 2
num of global variables = 2
graph1= <tensorflow.python.framework.ops.Graph object at 0x7ff5a4bd1550>
get default graph in current session =  <tensorflow.python.framework.ops.Graph object at 0x7ff5a4bd1550>
****************************************************************************************************
num of trainable variables = 0
num of global variables = 0
global default graph =  <tensorflow.python.framework.ops.Graph object at 0x7ff5a4b53828>
get default graph in current session =  <tensorflow.python.framework.ops.Graph object at 0x7ff5a4b53828>
****************************************************************************************************
num of trainable variables = 2
num of global variables = 2
graph2 =  <tensorflow.python.framework.ops.Graph object at 0x7ff5a4bd1518>
get default graph in current session =  <tensorflow.python.framework.ops.Graph object at 0x7ff5a4bd1518>


## Practice: Create a Graph with TensorFlow

Now it's your turn to define a computation graph in TensorFlow (cross_entropy.png). (NOTE: use placeholders to define the inputs instead of tf.Variable.)

![cross_entropy](img/cross_entropy.png)

In [31]:
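# TODO :: define the cross entropy computation graph in tensorflow; expect 10-15 lines of code
# (Requirement : create your own graph with tf.Graph and run your graph;
# use placeholder to define variable instead of tf.Variable)

graph3 = tf.Graph()
with graph3.as_default():

    pos1 = tf.constant(1, dtype=tf.float32)
    neg1 = tf.constant(-1, dtype=tf.float32)
    y = tf.placeholder(tf.float32, name="y")  # true label (0 or 1)
    p = tf.placeholder(tf.float32, name="p")  # predicted probability, 0 < p < 1

    # cross entropy = -(y * log(p) + (1 - y) * log(1 - p))
    part1 = tf.multiply(y, tf.log(p))
    part2 = tf.multiply(tf.subtract(pos1, y), tf.log(tf.subtract(pos1, p)))
    cross_entropy = tf.multiply(neg1, tf.add(part1, part2), name="cross_entropy")

# Run the graph in a session bound to graph3, feeding example values to the placeholders.
with tf.Session(graph=graph3) as sess:
    print(sess.run(cross_entropy, feed_dict={y: 1.0, p: 0.8}))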
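
As a cross-check that does not depend on TensorFlow, the same expression can be written in NumPy. This is a minimal sketch, assuming the figure shows the binary cross-entropy -(y log p + (1 - y) log(1 - p)), which is what the graph code computes; the function name `binary_cross_entropy` is hypothetical.

```python
import numpy as np


def binary_cross_entropy(y, p):
    """Element-wise binary cross-entropy: -(y*log(p) + (1-y)*log(1-p))."""
    y = np.asarray(y, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


# For y = 1 only the first term survives, so the loss equals -log(p).
bce_example = binary_cross_entropy(1.0, 0.8)
print(bce_example, -np.log(0.8))
```

Feeding the same `y` and `p` values to the TensorFlow graph should give matching numbers, up to float32 rounding.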

## Linear Regression

### Using the Normal Equation

In [8]:
import numpy as np
from sklearn.datasets import fetch_california_housing

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)

# TODO :: write down the normal equation; for more detail, you can refer to http://mlwiki.org/index.php/Normal_Equation
# hint : you may want to use tf.transpose, tf.matrix_inverse and tf.matmul
# theta = (X^T X)^-1 X^T y
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

In [9]:
theta_value

array([[-3.6894890e+01],
       [ 4.3661433e-01],
       [ 9.4453208e-03],
       [-1.0704148e-01],
       [ 6.4345831e-01],
       [-3.9632569e-06],
       [-3.7880042e-03],
       [-4.2093179e-01],
       [-4.3400639e-01]], dtype=float32)

In [10]:
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
# TODO :: implement the same normal equation with numpy
# hint : you may want to use np.linalg.inv
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
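
The TensorFlow result above was computed in float32 and the NumPy result in float64, which is the likely reason the two differ slightly in the third or fourth significant digit. Explicitly inverting X^T X is also sensitive to rounding in general. The sketch below, on synthetic data with hypothetical names (`X_demo`, `y_demo`, `true_theta`), compares the explicit normal equation with `np.linalg.lstsq`, which solves the same least-squares problem without forming the inverse.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic regression problem: a bias column plus three features.
X_demo = np.c_[np.ones((200, 1)), rng.rand(200, 3)]
true_theta = np.array([[1.0], [2.0], [-3.0], [0.5]])
y_demo = X_demo.dot(true_theta) + 0.01 * rng.randn(200, 1)

# Normal equation: theta = (X^T X)^-1 X^T y
theta_normal = np.linalg.inv(X_demo.T.dot(X_demo)).dot(X_demo.T).dot(y_demo)

# Least-squares solver: same minimizer, no explicit inverse.
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))

# Repeating the normal equation in float32 shows the effect of lower precision.
X32, y32 = X_demo.astype(np.float32), y_demo.astype(np.float32)
theta_normal32 = np.linalg.inv(X32.T.dot(X32)).dot(X32.T).dot(y32)
print(np.abs(theta_normal32 - theta_normal).max())
```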


### Compare with Scikit-Learn

In [11]:
from sklearn.linear_model import LinearRegression
# TODO :: define the linear regression model and fit the training data; the model name should be lin_reg
lin_reg = LinearRegression().fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]


### Using Batch Gradient Descent

Gradient Descent converges much faster when the features are on similar scales, so we scale the feature vectors first. We could do this using TF, but let's just use Scikit-Learn for now.

In [12]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [13]:

print(scaled_housing_data_plus_bias.mean(axis=0))
print(scaled_housing_data_plus_bias.mean(axis=1))
print(scaled_housing_data_plus_bias.mean())
print(scaled_housing_data_plus_bias.shape)

[ 1.00000000e+00  6.60969987e-17  5.50808322e-18  6.60969987e-17
 -1.06030602e-16 -1.10161664e-17  3.44255201e-18 -1.07958431e-15
 -8.52651283e-15]
[ 0.38915536  0.36424355  0.5116157  ... -0.06612179 -0.06360587
  0.01359031]
0.11111111111111005
(20640, 9)
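
The numbers printed above follow directly from what standardization does: each feature column ends up with mean 0 (up to rounding), the bias column keeps mean 1, and so the mean over all 9 columns is about 1/9. A minimal NumPy sketch of the same transformation on synthetic data (the names `raw`, `scaled` and `scaled_plus_bias` are hypothetical):

```python
import numpy as np

rng = np.random.RandomState(42)

# Eight synthetic features on very different scales.
raw = rng.rand(1000, 8) * np.array([1, 10, 100, 1, 5, 50, 2, 20])

# Standardize each column: subtract its mean and divide by its population
# standard deviation (ddof=0), which is also what StandardScaler uses.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Prepend the bias column of ones, as in the cell above.
scaled_plus_bias = np.c_[np.ones((len(scaled), 1)), scaled]

print(np.abs(scaled.mean(axis=0)).max())  # very close to 0
print(scaled.std(axis=0))                 # all close to 1
print(scaled_plus_bias.mean())            # close to 1/9
```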


In [14]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.71450055
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473


In [15]:
best_theta

array([[ 2.0685523 ],
       [ 0.8874027 ],
       [ 0.14401656],
       [-0.34770885],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.66145283],
       [-0.6375278 ]], dtype=float32)
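
The update used above is plain batch gradient descent on the mean squared error: the gradient is 2/m * X^T (X theta - y), and each step moves theta by `learning_rate` times that gradient in the opposite direction. The sketch below repeats the same loop in NumPy on synthetic data and compares it with the exact least-squares solution; all names ending in `_gd` or `_demo` are hypothetical, and the data are an assumption made for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
m_demo, n_demo = 500, 3

# Features drawn from a standard normal are already roughly standardized.
X_gd = np.c_[np.ones((m_demo, 1)), rng.randn(m_demo, n_demo)]
y_gd = X_gd.dot(np.array([[2.0], [0.5], [-1.0], [0.25]])) + 0.1 * rng.randn(m_demo, 1)

eta = 0.01
theta_gd = rng.uniform(-1.0, 1.0, size=(n_demo + 1, 1))

for epoch in range(1000):
    err_gd = X_gd.dot(theta_gd) - y_gd               # prediction error
    grad_gd = 2.0 / m_demo * X_gd.T.dot(err_gd)      # gradient of the MSE
    theta_gd = theta_gd - eta * grad_gd              # gradient descent step

# Exact minimizer of the same MSE, for comparison.
theta_exact, *_ = np.linalg.lstsq(X_gd, y_gd, rcond=None)
print(np.abs(theta_gd - theta_exact).max())
```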

### Using a GradientDescentOptimizer

In [16]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [17]:
# TODO :: define the GradientDescentOptimizer and call minimize on the optimizer, the result should be named as training_op; you can refer to the tf documentation : https://www.tensorflow.org/versions/r1.14/api_docs/python/tf/train/GradientDescentOptimizer
training_op = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)
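
With `minimize`, TensorFlow derives the gradient of `mse` with respect to `theta` automatically, so the hand-written gradient from the previous section is no longer needed. As a sanity check that is independent of TensorFlow, the hand-written formula can be compared against central finite differences in NumPy. This is a sketch on synthetic data; the helper names (`mse_loss`, `mse_gradient`, `numerical_gradient`) are hypothetical.

```python
import numpy as np


def mse_loss(theta, X, y):
    """Mean squared error of the linear model X.dot(theta)."""
    residual = X.dot(theta) - y
    return float(np.mean(residual ** 2))


def mse_gradient(theta, X, y):
    """Analytic gradient: 2/m * X^T (X theta - y)."""
    return 2.0 / len(X) * X.T.dot(X.dot(theta) - y)


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite-difference approximation of the MSE gradient."""
    grad = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[i, 0] = eps
        grad[i, 0] = (mse_loss(theta + step, X, y) - mse_loss(theta - step, X, y)) / (2 * eps)
    return grad


rng = np.random.RandomState(0)
X_chk = np.c_[np.ones((50, 1)), rng.randn(50, 2)]
y_chk = rng.randn(50, 1)
theta_chk = rng.randn(3, 1)

analytic = mse_gradient(theta_chk, X_chk, y_chk)
numeric = numerical_gradient(theta_chk, X_chk, y_chk)
print(np.allclose(analytic, numeric, atol=1e-5))
```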


In [18]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
Best theta:
[[ 2.0685525 ]
 [ 0.8874027 ]
 [ 0.14401658]
 [-0.34770882]
 [ 0.36178368]
 [ 0.00393811]
 [-0.04269556]
 [-0.6614528 ]
 [-0.6375277 ]]


In [19]:
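# TODO :: repeat the same procedure, this time using the MomentumOptimizer; you can refer to the tensorflow documentation : https://www.tensorflow.org/versions/r1.14/api_docs/python/tf/train/MomentumOptimizer
global_step = tf.Variable(0, trainable=False, name="global_step")  # incremented once per training step
momentum_optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.01)
training_op = momentum_optimizer.minimize(mse, global_step=global_step)
training_op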


<tf.Operation 'Momentum' type=AssignAdd>
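
For intuition, the update rule documented for `tf.train.MomentumOptimizer` (without Nesterov momentum) is `accumulation = momentum * accumulation + gradient` followed by `variable -= learning_rate * accumulation`. The sketch below applies that rule in NumPy to a synthetic, noise-free linear regression problem. The data, the `*_mom` names and the momentum value of 0.9 (a commonly used setting) are assumptions made for this example.

```python
import numpy as np

rng = np.random.RandomState(1)

# Noise-free targets, so the exact minimizer is the coefficient vector below.
X_mom = np.c_[np.ones((300, 1)), rng.randn(300, 2)]
y_mom = X_mom.dot(np.array([[1.0], [-2.0], [0.5]]))

eta, momentum = 0.01, 0.9
theta_mom = np.zeros((3, 1))
accumulation = np.zeros_like(theta_mom)

for step in range(500):
    grad_mom = 2.0 / len(X_mom) * X_mom.T.dot(X_mom.dot(theta_mom) - y_mom)
    accumulation = momentum * accumulation + grad_mom  # running sum of past gradients
    theta_mom = theta_mom - eta * accumulation         # step along the accumulated direction

print(theta_mom.ravel())
```

Because past gradients keep contributing to each step, momentum typically needs fewer iterations than plain gradient descent at the same learning rate.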

## Saving and restoring a model 



In [20]:
reset_graph()

n_epochs = 1000                                                                       
learning_rate = 0.01                                                                  

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")            
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")            
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")                                     
error = y_pred - y                                                                   
mse = tf.reduce_mean(tf.square(error), name="mse")                                    
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)            
training_op = optimizer.minimize(mse)                                                 

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473


In [21]:
best_theta

array([[ 2.0685525 ],
       [ 0.8874027 ],
       [ 0.14401658],
       [-0.34770882],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.6614528 ],
       [-0.6375277 ]], dtype=float32)

In [22]:

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from /tmp/my_model_final.ckpt


In [23]:
np.allclose(best_theta, best_theta_restored)

True

By default the saver also saves the graph structure itself in a second file with the extension .meta. You can use the function tf.train.import_meta_graph() to restore the graph structure. This function loads the graph into the default graph and returns a Saver that can then be used to restore the graph state (i.e., the variable values):

In [25]:
reset_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") 

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  
    best_theta_restored = theta.eval() 

INFO:tensorflow:Restoring parameters from /tmp/my_model_final.ckpt


In [26]:
np.allclose(best_theta, best_theta_restored)

True


This means that you can import a pretrained model without having to have the corresponding Python code to build the graph. This is very handy when you keep tweaking and saving your model: you can load a previously saved model without having to search for the version of the code that built it.

## Using TensorBoard


In [27]:
from IPython.display import clear_output, Image, display, HTML

# Widen the notebook container so the graph view has enough room.
display(HTML("<style>.container { width:100% !important; }</style>"))

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                # tensor_content is a bytes field, so it must be assigned bytes on Python 3.
                tensor.tensor_content = b"<stripped %d bytes>" % size
    return strip_def

def show_graph(graph_def=None, width=1200, height=800, max_const_size=32, ungroup_gradients=False):
    """Visualize TensorFlow graph."""
    if graph_def is None:
        graph_def = tf.get_default_graph().as_graph_def()
    # Accept a tf.Graph as well as a GraphDef.
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    data = str(strip_def)
    if ungroup_gradients:
        data = data.replace('"gradients/', '"b_')
        #print(data)
    code = """
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(data), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:{}px;height:{}px;border:0" srcdoc="{}"></iframe>
    """.format(width, height, code.replace('"', '&quot;'))
    display(HTML(iframe))


In [28]:
g = tf.Graph()
with g.as_default():
    X = tf.placeholder(tf.float32, name = "x")
    W1 = tf.placeholder(tf.float32, name = "W1")
    b1 = tf.placeholder(tf.float32, name = "b1")
    
    a1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    
    W2 = tf.placeholder(tf.float32, name = "W2")
    b2 = tf.placeholder(tf.float32, name = "b2")
    
    a2 = tf.nn.relu(tf.matmul(a1, W2) + b2)
    
    W3 = tf.placeholder(tf.float32, name = "W3")
    b3 = tf.placeholder(tf.float32, name = "b3")
    
    y_hat = tf.matmul(a2, W3) + b3
    
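# Write the graph definition to the "logs" directory so TensorBoard can display it
# (for example with: tensorboard --logdir logs).
tf.summary.FileWriter("logs", g).close()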

In [29]:
show_graph(g)