# Up and Running with Tensorflow

## Steps to Install Tensorflow

1. `pip install tensorflow`<br>**You can stop here if you don't have a GPU.**<br>
If you have an NVIDIA GPU, run `pip install tensorflow-gpu` instead. CUDA 9.0 requires NVIDIA graphics driver 384, so make sure it is installed first:<br> `sudo apt-get install nvidia-384`<br> Then download the CUDA 9.0 run file:<br>
`wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run`

1. Make the run file executable and extract its components into your home directory:<br>
`chmod +x cuda_9.0.176_384.81_linux-run`<br>
`./cuda_9.0.176_384.81_linux-run --extract=$HOME`

1. This unpacks three installers into your home directory: `NVIDIA-Linux-x86_64-384.81.run` (the NVIDIA driver, which we ignore because driver 384 is already installed), `cuda-linux.9.0.176-22781540.run` (the CUDA 9.0 installer), and `cuda-samples.9.0.176-22781540-linux.run` (the CUDA 9.0 samples).<br>
Run the CUDA 9.0 installer from your home directory: `sudo ./cuda-linux.9.0.176-22781540.run`

1. Add CUDA and the NVIDIA driver libraries to the `LD_LIBRARY_PATH` environment variable. I set it in my `.bashrc` with a line like this: `export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/nvidia-384:$LD_LIBRARY_PATH`

1. Install cuDNN 7.0 for CUDA 9.0 (the `.deb` package is available from the NVIDIA developer website): `sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb`

Side note: To upgrade Tensorflow you can run `pip3 install --upgrade tensorflow-gpu`

In [None]:
import tensorflow as tf

In [None]:
tf.__version__

## Creating your first graph and running it in a session

In [None]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

The code above only creates the computation graph; it does not execute anything. To evaluate it, we need to open a session and initialize our variables. The session takes care of placing the operations onto devices such as CPUs or GPUs and running them.

In [None]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close() #frees up resources

Repeating `sess.run()` is tedious. Thankfully, there's a better way!

In [None]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Calling `x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`, and `f.eval()` is equivalent to `tf.get_default_session().run(f)`. Inside the `with` block, `sess` is the default session. What's also nice is that the session is closed automatically once the block has ended (smart pointers, anybody?).

Instead of initializing every variable one by one, we can use `global_variables_initializer()`. It does not initialize anything immediately; it creates a single node that initializes all variables when it is run:

In [None]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

We have two types of sessions: regular sessions (`tf.Session`) and interactive sessions (`tf.InteractiveSession`). The difference is that a regular session does not set itself as the default session when it is created, whereas an interactive session does. So with an interactive session we don't need a `with` block, but we still need to close the session manually.

Unlike PyTorch, for example, Tensorflow has two phases: the construction phase, where we build our computation graph, and the execution phase, where we actually run the computations (usually to train our model).

Any node we create is automatically added to the default graph.

In [None]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

What if we want to manage multiple independent graphs? We can create a new `Graph` and temporarily make it the default graph inside a `with` block:

In [None]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    
x2.graph is graph

In [None]:
x2.graph is tf.get_default_graph()

In [None]:
tf.reset_default_graph() # clears the default graph, e.g. when re-running notebook cells would otherwise keep adding duplicate nodes to it

Tensorflow automatically detects dependencies and evaluates them first. In the example below, evaluating `y` first evaluates `w`, then `x`, then `y`. Evaluating `z` then runs the graph again, so `w` and `x` are evaluated twice: the results of the first run are not reused.

All node values are dropped between graph runs, except variable values, which are maintained by the session. A variable starts its life when its initializer is run, and it ends when the session is closed.

In [None]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval()) #10
    print(z.eval()) #15

To evaluate `y` and `z` efficiently, evaluating `w` and `x` only once, ask for both in a single graph run:

In [None]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y,z])
    print(y_val)
    print(z_val)
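To make this behaviour concrete without Tensorflow, here is a toy pure-Python sketch. The `Node` class, the `run()` helper and the `eval_counts` counter are hypothetical, not Tensorflow APIs; the sketch only mimics the two rules above: dependencies are evaluated first, and intermediate values are forgotten between runs.

```python
from collections import Counter

eval_counts = Counter()  # how many times each node has been evaluated


class Node:
    """Toy graph node (hypothetical, not a Tensorflow class)."""

    def __init__(self, name, fn, *inputs):
        self.name = name      # label used when counting evaluations
        self.fn = fn          # computes this node's value from its inputs' values
        self.inputs = inputs  # nodes this node depends on


def run(fetches):
    """Evaluate the requested nodes in one 'graph run'.

    Values are cached only for the duration of this call, so a shared
    dependency is computed once per run and forgotten afterwards.
    """
    cache = {}

    def evaluate(node):
        if node not in cache:
            args = [evaluate(dep) for dep in node.inputs]  # dependencies first
            eval_counts[node.name] += 1
            cache[node] = node.fn(*args)
        return cache[node]

    return [evaluate(node) for node in fetches]


# Same graph as above: w = 3, x = w + 2, y = x + 5, z = x * 3
w_node = Node("w", lambda: 3)
x_node = Node("x", lambda w_val: w_val + 2, w_node)
y_node = Node("y", lambda x_val: x_val + 5, x_node)
z_node = Node("z", lambda x_val: x_val * 3, x_node)

# Two separate runs, like y.eval() then z.eval(): w and x are evaluated twice
print(run([y_node]), run([z_node]))        # [10] [15]
print(eval_counts["w"], eval_counts["x"])  # 2 2

# One run fetching both, like sess.run([y, z]): w and x are evaluated once
eval_counts.clear()
print(run([y_node, z_node]))               # [10, 15]
print(eval_counts["w"], eval_counts["x"])  # 1 1
```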

In single-process Tensorflow, multiple sessions do not share any state, even if they reuse the same graph (each session has its own copy of every variable). In distributed Tensorflow, variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.

## Linear Regression on Tensorflow

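Tensorflow operations (ops) can take any number of inputs and produce any number of outputs. For example, addition and multiplication take two inputs and produce one output, while constants and variables take no input at all; these are called source ops.

The inputs and outputs of ops are multidimensional arrays called tensors. Just like NumPy arrays, tensors have a type and a shape; in the Python API they are represented by NumPy ndarrays (n-dimensional arrays).

The following example solves linear regression in closed form with the Normal Equation (no gradient descent involved): $\hat{\boldsymbol{\theta}} = (X^\intercal X)^{-1} X^\intercal y$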

In [None]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing() # fetch the California housing dataset from scikit-learn

m,n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)), housing.data] # prepend a column of ones (the bias input x0 = 1)
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT,X)), XT), y) # Normal Equation: (X^T X)^-1 X^T y

with tf.Session() as sess:
    theta_value = theta.eval()
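As a sanity check that doesn't need Tensorflow, the same Normal Equation can be computed in NumPy. This sketch uses small synthetic data (the sample size, true parameters and noise level are made up for illustration) and compares the result with `np.linalg.lstsq`, NumPy's least-squares solver, which should agree.

```python
import numpy as np

rng = np.random.default_rng(42)
m_toy, n_toy = 200, 3
X_raw = rng.normal(size=(m_toy, n_toy))                   # synthetic features
true_theta = np.array([[4.0], [1.5], [-2.0], [0.5]])      # bias term, then 3 weights
X_b = np.c_[np.ones((m_toy, 1)), X_raw]                   # prepend the bias input x0 = 1
y_toy = X_b @ true_theta + 0.1 * rng.normal(size=(m_toy, 1))  # targets with a little noise

# Normal Equation: theta_hat = (X^T X)^-1 X^T y
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_toy

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_b, y_toy, rcond=None)
print(theta_hat.ravel())                    # close to [4.0, 1.5, -2.0, 0.5]
print(np.allclose(theta_hat, theta_lstsq))  # True
```

Explicitly inverting $X^\intercal X$ mirrors the formula; for larger or ill-conditioned problems a least-squares solver is the more robust choice.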

## Implementing Gradient Descent

We're going to implement batch gradient descent manually instead of using the Normal Equation.
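For reference, these are the quantities the code below computes, where $m$ is the number of instances, $\eta$ is `learning_rate`, and $X\boldsymbol{\theta} - y$ is the `error` node:

$$
\begin{aligned}
\text{MSE}(\boldsymbol{\theta}) &= \frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol{\theta}^\intercal \mathbf{x}^{(i)} - y^{(i)}\right)^2 \\
\nabla_{\boldsymbol{\theta}}\,\text{MSE}(\boldsymbol{\theta}) &= \frac{2}{m}\, X^\intercal \left(X\boldsymbol{\theta} - y\right) \\
\boldsymbol{\theta} &\leftarrow \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\,\text{MSE}(\boldsymbol{\theta})
\end{aligned}
$$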

In [None]:
from sklearn.preprocessing import StandardScaler

n_epochs = 1000
learning_rate = 0.01

standard_scalar = StandardScaler()

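# Don't forget to scale your inputs before GD. Add the bias column *after* scaling:
# standardizing a constant column of ones would turn it into zeros.
scaled_housing_data = standard_scalar.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m,1)), scaled_housing_data]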

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1], -1.0, 1.0), name='theta') #creates a node that will generate a tensor containing random values  given its shape and value range 
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error) # can be replaced with gradients = tf.gradients(mse,[theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients) #creates a node that will assign a new value to a variable

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch  in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval()) # Printing MSE every 100 iterations
        sess.run(training_op)
        
    best_theta = theta.eval()
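The same batch gradient descent loop can be sketched in plain NumPy, which makes it easy to check against the closed-form solution. This is a minimal sketch on synthetic, noise-free data (made up for illustration); the function name `batch_gradient_descent` is hypothetical. Note that only the features are standardized; the bias column of ones is added afterwards.

```python
import numpy as np


def batch_gradient_descent(X_b, y_vec, learning_rate=0.01, n_epochs=1000, seed=42):
    """Batch gradient descent for linear regression with an MSE loss.

    X_b must already contain the bias column; y_vec has shape (m, 1).
    """
    m_rows, n_cols = X_b.shape
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-1.0, 1.0, size=(n_cols, 1))  # random init in [-1, 1)
    for _ in range(n_epochs):
        error = X_b @ theta - y_vec                    # predictions minus targets
        gradients = 2 / m_rows * X_b.T @ error         # gradient of the MSE
        theta = theta - learning_rate * gradients      # same update as training_op
    return theta


rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0)  # standardize the features only
X_demo = np.c_[np.ones((500, 1)), features]                           # then add the bias column
y_demo = X_demo @ np.array([[2.0], [3.0], [-1.0]])                    # noise-free targets

theta_gd = batch_gradient_descent(X_demo, y_demo, learning_rate=0.1, n_epochs=500)
theta_closed_form = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
print(np.allclose(theta_gd, theta_closed_form, atol=1e-4))  # True
```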

In [None]:
print(mse)
gradients = tf.gradients(mse, [theta])[0]
print(gradients)

## Methods of Computing Gradients

1. Manual Differentiation
1. Symbolic Differentiation
1. Numerical Differentiation
1. Forward-Mode Autodiff
1. Reverse-Mode Autodiff: used by Tensorflow; it is efficient for functions with many inputs and few outputs, like neural networks (many parameters, a single loss)

<a href="https://github.com/ageron/handson-ml/blob/master/extra_autodiff.ipynb">Check out this link to understand more.</a>
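To make the difference between the last two methods more concrete, here is a minimal forward-mode autodiff sketch using dual numbers, applied to the function `f` from the first graph ($x^2y + y + 2$) at $x=3$, $y=4$. The `Dual` class and the helper names are hypothetical, not part of Tensorflow. Forward mode needs one pass per input to get all the partial derivatives, whereas reverse mode needs one pass per output, which is why reverse mode wins when there are many inputs (parameters) and a single output (the loss). A central-difference numerical estimate is included for comparison.

```python
class Dual:
    """Dual number value + deriv * eps, with eps**2 == 0 (hypothetical helper)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def f_py(x, y):
    return x * x * y + y + 2  # same function as the first graph


def forward_mode_partials(func, x, y):
    # One forward pass per input: seed that input's derivative with 1.
    df_dx = func(Dual(x, 1.0), Dual(y, 0.0)).deriv
    df_dy = func(Dual(x, 0.0), Dual(y, 1.0)).deriv
    return df_dx, df_dy


def numerical_partials(func, x, y, h=1e-6):
    # Central differences: only approximate, and two evaluations per input.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


print(forward_mode_partials(f_py, 3.0, 4.0))  # (24.0, 10.0): df/dx = 2xy, df/dy = x^2 + 1
print(numerical_partials(f_py, 3.0, 4.0))     # approximately (24.0, 10.0)
```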

## Optimizers

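Instead of computing the gradients ourselves, we can let an optimizer compute them and perform the update step. Replace the `gradients` and `training_op` definitions in the gradient descent code with this: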

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
#optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(mse)
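To see what the commented-out `MomentumOptimizer` line changes, here is a NumPy sketch of its update rule as given in the `tf.train.MomentumOptimizer` documentation (`accumulation = momentum * accumulation + gradient`, then `variable -= learning_rate * accumulation`); with `momentum=0` it reduces to plain gradient descent. The quadratic test function and the step count are made up for illustration.

```python
import numpy as np


def minimize(grad_fn, theta0, learning_rate=0.01, momentum=0.0, n_steps=500):
    """Gradient descent with optional momentum; momentum=0.0 gives plain GD."""
    theta = np.array(theta0, dtype=float)
    accumulation = np.zeros_like(theta)
    for _ in range(n_steps):
        accumulation = momentum * accumulation + grad_fn(theta)
        theta = theta - learning_rate * accumulation
    return theta


# Hypothetical elongated quadratic: f(theta) = 0.5 * (100 * theta0**2 + theta1**2)
curvatures = np.array([100.0, 1.0])


def grad_f(theta):
    return curvatures * theta  # gradient of f


theta_plain = minimize(grad_f, [1.0, 1.0], momentum=0.0)
theta_momentum = minimize(grad_f, [1.0, 1.0], momentum=0.9)
print(np.abs(theta_plain).max())     # about 0.0066: the flat direction converges slowly
print(np.abs(theta_momentum).max())  # much smaller
```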

## Feeding data dynamically

So far, our nodes have had values that are fully defined at construction time and are simply computed during the execution phase. But what if we want to feed data to the training algorithm at run time? Mini-batch gradient descent, for example, needs to replace X and y with a new mini-batch at every iteration. For this we use placeholder nodes: they don't compute anything, they just output the data we feed them at run time, and they raise an exception if we evaluate something that depends on them without feeding a value. We can also specify a shape, which Tensorflow then enforces on the fed data.

In [None]:
A = tf.placeholder(tf.float32, shape=(None,3)) #enforcing A to be 2D with any number of rows and 3 columns
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1,2,3]]}) # the placeholder's value must be fed through feed_dict
    B_val_2 = B.eval(feed_dict={A: [[4,5,6],[7,8,9]]}) # any tensor can be fed, not just placeholders: the fed value is used instead of evaluating its op
    
print(B_val_1)
print(B_val_2)

Here's an unfinished sketch of mini-batch gradient descent (`fetch_batch()` is left unimplemented):

In [None]:
# X = tf.placeholder(tf.float32, shape=(None,n+1), name='X')
# y = tf.placeholder(tf.float32, shape=(None,1), name='y')

# batch_size = 100
# n_batches = int(np.ceil(m / batch_size))

# def fetch_batch(epoch, batch_index, batch_size):
#     [...] # load data from disk
#     return X_batch, y_batch

# with tf.Session() as sess:
#     sess.run(init)
    
#     for epoch in range(n_epochs):
#         for batch_index in range(n_batches):
#             X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
#             sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            
#     best_theta = theta.eval()
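Since `fetch_batch()` is left unimplemented above, here is a self-contained NumPy sketch of the same loop structure. It uses a hypothetical `fetch_batch_in_memory()` that samples a random mini-batch from arrays already in memory (instead of loading from disk) and a plain NumPy update in place of `sess.run(training_op, feed_dict=...)`. The synthetic data and hyperparameters are made up for illustration.

```python
import numpy as np


def fetch_batch_in_memory(X_all, y_all, epoch, batch_index, batch_size, n_batches):
    """Hypothetical stand-in for fetch_batch(): a random mini-batch from in-memory data."""
    rng = np.random.default_rng(epoch * n_batches + batch_index)  # reproducible per batch
    indices = rng.integers(0, len(X_all), size=batch_size)        # sampled with replacement
    return X_all[indices], y_all[indices]


def minibatch_gradient_descent(X_all, y_all, learning_rate=0.05, n_epochs=50, batch_size=100):
    m_rows, n_cols = X_all.shape
    n_batches = int(np.ceil(m_rows / batch_size))
    theta = np.random.default_rng(42).uniform(-1.0, 1.0, size=(n_cols, 1))
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch_in_memory(
                X_all, y_all, epoch, batch_index, batch_size, n_batches)
            error = X_batch @ theta - y_batch
            gradients = 2 / len(X_batch) * X_batch.T @ error  # MSE gradient on this batch only
            theta = theta - learning_rate * gradients
    return theta


rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 2))
X_demo = np.c_[np.ones((1000, 1)), features]        # bias column + 2 features
y_demo = X_demo @ np.array([[2.0], [3.0], [-1.0]])  # noise-free targets
theta_mb = minibatch_gradient_descent(X_demo, y_demo)
print(theta_mb.ravel())  # close to [2.0, 3.0, -1.0]
```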

## Saving and Loading the Model

Computers crash, and we don't want to start training over. Thankfully, Tensorflow lets us save our model regularly during training, so if there's a crash we can reload the latest checkpoint and continue from there. To do this, we create a `Saver` node at the end of the construction phase and call its `save()` method in the execution phase whenever we want to save the model.

In [None]:
# [...]
# theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0))
# [...]
# init = tf.global_variables_initializer()
# saver = tf.train.Saver() # adding a node at the end

# with tf.Session() as sess:
#     sess.run(init)
    
#     for epoch in range(n_epochs):
#         if epoch % 100 == 0:
#             save_path = saver.save(sess, "/tmp/my_model.ckpt") #pass in the session and file path
        
#         sess.run(training_op)
        
#     best_theta = theta.eval()
#     save_path = saver.save(sess, "/tmp/my_model_final.ckpt")
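The save-and-resume pattern itself doesn't depend on Tensorflow. Here is a toy sketch of it that uses NumPy's `np.savez`/`np.load` instead of `tf.train.Saver` (a swapped-in technique, for illustration only): a one-parameter model is trained by gradient descent, its state is saved every 100 epochs, a crash is simulated, and training resumes from the last checkpoint. The function name, the file names and the `crash_at` parameter are hypothetical.

```python
import os
import tempfile

import numpy as np


def train_with_checkpoints(checkpoint_path, n_epochs=1000, save_every=100, crash_at=None):
    """Minimize (theta - 3)**2 by gradient descent, checkpointing as we go."""
    if os.path.exists(checkpoint_path):
        with np.load(checkpoint_path) as state:                   # like saver.restore(...)
            theta, start_epoch = float(state["theta"]), int(state["epoch"])
    else:
        theta, start_epoch = 0.0, 0                                 # like running init
    for epoch in range(start_epoch, n_epochs):
        if epoch % save_every == 0:
            np.savez(checkpoint_path, theta=theta, epoch=epoch)     # like saver.save(...)
        if epoch == crash_at:
            raise RuntimeError("simulated crash")
        theta -= 0.01 * 2 * (theta - 3.0)                           # one gradient descent step
    return theta


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_model_ckpt.npz")  # ends in .npz, so np.savez keeps the name
    try:
        train_with_checkpoints(path, crash_at=550)
    except RuntimeError:
        pass                                          # last checkpoint was saved at epoch 500
    resumed = train_with_checkpoints(path)            # continues from epoch 500
    uninterrupted = train_with_checkpoints(os.path.join(tmp_dir, "fresh_ckpt.npz"))

print(resumed, uninterrupted)  # identical: resuming loses no progress
```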

Loading is similar: we create a `Saver` node at the end of the construction phase and call its `restore()` method at the beginning of the execution phase, instead of running the `init` node:

In [None]:
# with tf.Session() as sess:
#     saver.restore(sess, "/tmp/my_model_final.ckpt")

In [None]:
# saver = tf.train.Saver({"weights": theta}) # saves only theta, under the name "weights"

By default, the `save()` method also saves the structure of the graph in a second file with a `.meta` extension. We can load it with `tf.train.import_meta_graph()`, which adds the saved graph to the default graph and returns a `Saver` instance that can be used to restore the graph's state (i.e., its variable values):

In [None]:
# saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")

# with tf.Session() as sess:
#     saver.restore(sess, "/tmp/my_model_final.ckpt")
#     [...]

## Visualization using TensorBoard

To visualize the graph and the training curves using TensorBoard, we need to tweak a couple of things.
First, we write the MSE to a log directory that is unique to each run (here, named after a UTC timestamp); otherwise TensorBoard will merge the stats of different runs and mess up the visualizations.

In [None]:
from datetime import datetime
from sklearn.preprocessing import StandardScaler

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}".format(root_logdir, now)

n_epochs = 1000
learning_rate = 0.01

standard_scalar = StandardScaler()

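# Don't forget to scale your inputs before GD. Add the bias column *after* scaling:
# standardizing a constant column of ones would turn it into zeros.
scaled_housing_data = standard_scalar.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m,1)), scaled_housing_data]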

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1], -1.0, 1.0), name='theta') #creates a node that will generate a tensor containing random values  given its shape and value range 
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error) # can be replaced with gradients = tf.gradients(mse,[theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients) #creates a node that will assign a new value to a variable

mse_summary = tf.summary.scalar('MSE', mse) # node that evaluates the MSE and outputs it as a binary log string (a summary) that TensorBoard can read
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph()) # used to write summaries to binary log files called events file

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch  in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval()) # Printing MSE every 100 iterations
        if epoch % 10 == 0:
            summary_str = mse_summary.eval()
            file_writer.add_summary(summary_str, epoch)
        sess.run(training_op)
        

    best_theta = theta.eval()
    file_writer.close()

We can now visualize our graph and the MSE curve in TensorBoard. Start it with `tensorboard --logdir tf_logs/`, then open the URL it prints (http://localhost:6006 by default).

We can group related nodes into name scopes to reduce clutter in TensorBoard:

In [None]:
# with tf.name_scope("loss") as scope:
#     error = y_pred - y
#     mse = tf.reduce_mean(tf.square(error), name='mse')
    
# print(error.op.name) # loss/sub (error is created with the - operator, so its op is named "sub")
# print(mse.op.name) #loss/mse

## Modularity

If we wanted to create a graph that adds the outputs of several rectified linear units (ReLUs), writing each node individually would be tedious and require a lot of copying and pasting. Fortunately, we can stay DRY (Don't Repeat Yourself) by writing a function that builds a ReLU:

In [None]:
def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name='weights')
    b = tf.Variable(0.0, name='bias')
    z = tf.add(tf.matmul(X,w), b, name='z')
    return tf.maximum(z, 0, name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')
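For intuition about what this graph computes, here is a NumPy sketch of the same idea (the names `relu_unit`, `X_demo` and `output_demo` are hypothetical, and NumPy evaluates immediately instead of building a graph): each call creates its own weights, and the five ReLU outputs are summed, as `tf.add_n` does.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu_unit(X, rng):
    """One ReLU unit with its own weights, mirroring relu() above in NumPy."""
    w = rng.normal(size=(X.shape[1], 1))  # like tf.random_normal(w_shape)
    b = 0.0                               # like tf.Variable(0.0, name='bias')
    z = X @ w + b
    return np.maximum(z, 0)               # like tf.maximum(z, 0)


n_features = 3
X_demo = rng.normal(size=(4, n_features))                  # 4 hypothetical samples
relu_outputs = [relu_unit(X_demo, rng) for _ in range(5)]  # five independent units
output_demo = np.sum(relu_outputs, axis=0)                 # like tf.add_n(relus)
print(output_demo.shape)  # (4, 1)
```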

When Tensorflow creates a node, it checks whether its name already exists; if it does, it appends an underscore followed by an index to make the name unique, e.g. weights_1, bias_1, weights_2, bias_2, ... You can check them in TensorBoard.

Name scopes get unique names the same way: if we wrap the body of `relu()` in a name scope, Tensorflow appends _1, _2, ... to the scope name on each new call, so each ReLU's nodes are grouped under their own scope.

In [None]:
# def relu(X):
#     with tf.name_scope("relu"):
#         [...]
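The renaming rule can be sketched in a few lines of plain Python. This is a simplified illustration of the behaviour described above, not Tensorflow's actual implementation; `make_unique_name` is a hypothetical helper.

```python
def make_unique_name(name, used_names):
    """Return name, or name_1, name_2, ... if it is already taken (simplified)."""
    unique = name
    index = 0
    while unique in used_names:
        index += 1
        unique = f"{name}_{index}"
    used_names.add(unique)
    return unique


node_names = set()
print([make_unique_name(base, node_names)
       for _ in range(3) for base in ("weights", "bias")])
# ['weights', 'bias', 'weights_1', 'bias_1', 'weights_2', 'bias_2']

scope_names = set()
print([make_unique_name("relu", scope_names) for _ in range(3)])
# ['relu', 'relu_1', 'relu_2']
```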

## Sharing Variables

If we wanted to share a variable between multiple components, like the threshold of our ReLUs, we could simply create it once and pass it to the function as an argument:

In [None]:
def relu(X, threshold):
    with tf.name_scope("relu"):
        [...]
        return tf.maximum(z, threshold, 'max')

threshold = tf.Variable(0.0, name='threshold')
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X, threshold) for i in range(5)] # pass the shared threshold to every ReLU
output = tf.add_n(relus, name='output')

If you wanted to share more parameters, you could put them in a dictionary and pass it to the function, but that still adds clutter. You could also create a `ReLU` class with a `threshold` member variable.<br> Yet another option is to set the shared variable as an attribute of the `relu()` function upon the first call, like so:

In [None]:
def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name='threshold')
        [...]
        return tf.maximum(z, relu.threshold, name='max')
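Here is the same function-attribute pattern in plain Python, with a hypothetical `FakeVariable` class standing in for `tf.Variable` so that we can count how many variables get created: only the first call creates the threshold, and all later calls reuse it.

```python
class FakeVariable:
    """Hypothetical stand-in for tf.Variable that counts its instances."""
    created = 0

    def __init__(self, value, name):
        FakeVariable.created += 1
        self.value = value
        self.name = name


def relu_sketch(values):
    # Create the shared threshold on the first call only, then reuse it.
    if not hasattr(relu_sketch, "threshold"):
        relu_sketch.threshold = FakeVariable(0.0, name="threshold")
    return [max(v, relu_sketch.threshold.value) for v in values]


outputs = [relu_sketch([-1.0, 2.0]) for _ in range(5)]
print(outputs[0])            # [0.0, 2.0]
print(FakeVariable.created)  # 1: all five calls share one threshold
```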

Tensorflow offers another option: the `get_variable()` function creates the shared variable if it does not exist yet, or reuses it if it already exists. Whether it creates or reuses is controlled by an attribute of the current `variable_scope()`, not `name_scope()`, a distinction that often causes confusion.

In [None]:
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))

If the variable has already been created by an earlier call to `get_variable()`, the code above raises an exception. This behavior prevents reusing variables by mistake. To reuse a variable, we must explicitly set the scope's `reuse` attribute to `True`, in which case we don't have to specify the shape or the initializer:

In [None]:
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

The above code will fetch the existing "**relu/threshold**" variable, or raise an exception if it does not exist or if it was not created using `get_variable()`.<br>Alternatively, we can set the `reuse` attribute to True inside the block by calling the scope's `reuse_variables()` method.

In [None]:
with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

Here's how it all fits together:

In [None]:
def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold") # getting the already existing reusable variable
        [...]
        return tf.maximum(z, threshold, name='max')

X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
with tf.variable_scope("relu") : #create the variable
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0)) #shape=() means scalar
    
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

Variables created with `get_variable()` are always named using the name of their `variable_scope` as a prefix (e.g. "**relu/threshold**"). For all other nodes, including variables created with `tf.Variable()`, the variable scope acts like a new name scope. In particular, if a name scope with an identical name was already created, a suffix is added to make the name unique. For example, all the nodes created in the code above, except the threshold variable, have names prefixed with `relu_1/` to `relu_5/`.

It doesn't seem right that we have all our variables defined inside the `relu` function while the threshold is defined outside it. To fix this, the following code creates the threshold variable within the `relu()` function upon the first call, then reuses it in subsequent calls. Now the `relu()` function does not have to worry about name scopes or variable sharing: it just calls `get_variable()`, which will create or reuse the `threshold` variable and doesn't need to know which is the case. The rest of the code calls `relu()` five times, making sure to set `reuse=None` on the first call.

In [None]:
def relu(X):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
    [...]
    return tf.maximum(z, threshold, name='max')

X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus=[]
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1 or None)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name='output')

Here's another thing that you will probably get mad at me for: since Tensorflow 1.4, you can set `reuse=tf.AUTO_REUSE`, which returns the variable if it already exists and creates it otherwise.
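To summarize the create-or-reuse rules, here is a toy, pure-Python model of them. It is a simplified sketch, not Tensorflow code: `VariableStore` and the `AUTO_REUSE` sentinel are hypothetical stand-ins for Tensorflow's variable store and `tf.AUTO_REUSE`, and it ignores details such as nested scopes inheriting their parent's `reuse` flag.

```python
AUTO_REUSE = object()  # hypothetical stand-in for tf.AUTO_REUSE


class VariableStore:
    """Toy model of get_variable()'s create-or-reuse rules."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, scope, name, reuse=None, initial_value=None):
        full_name = f"{scope}/{name}"  # prefixed with the variable scope's name
        exists = full_name in self._variables
        if reuse is AUTO_REUSE:        # create if missing, otherwise reuse
            self._variables.setdefault(full_name, initial_value)
        elif reuse:                    # reuse=True: it must already exist
            if not exists:
                raise ValueError(f"Variable {full_name} does not exist")
        else:                          # reuse=None: it must not exist yet
            if exists:
                raise ValueError(f"Variable {full_name} already exists")
            self._variables[full_name] = initial_value
        return full_name, self._variables[full_name]


store = VariableStore()
print(store.get_variable("relu", "threshold", initial_value=0.0))  # ('relu/threshold', 0.0)
print(store.get_variable("relu", "threshold", reuse=True))         # ('relu/threshold', 0.0)
print(store.get_variable("relu", "threshold", reuse=AUTO_REUSE))   # ('relu/threshold', 0.0)
try:
    store.get_variable("relu", "threshold")  # reuse not set: creating it twice fails
except ValueError as err:
    print(err)  # Variable relu/threshold already exists
```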

## Conclusion
This concludes the Up and Running with Tensorflow notebook. Be on the lookout for more advanced topics!