# Simple Linear Regression using `tensorflow`

Now that we've coded up our linear regression solution by building a computational graph with `theano`, we'll do the same with `tensorflow`. From a high level, we'll do exactly what we did before: build a computational graph, and then take advantage of the automatic differentiation that `tensorflow` offers.

## Computational Graphs for Simple Linear Regression 

As a reference, the computational graphs that we used to visualize the forward and backward propagation steps in solving our simple linear regression problem with gradient descent are as follows: 

### Forward Propagation 

<img src="imgs/custom/simp_linear_comp_graph_forprop.png" width="400">

### Backward Propagation 

<img src="imgs/custom/simp_linear_comp_graph_backprop.png" width="400">
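For reference alongside the figures, the same forward and backward passes can be written as equations. This is a sketch assuming the cost is the sum of half the squared errors over the $n$ observations, which is what `tf.gradients` effectively differentiates in the code below; if the graphs use a mean instead, each sum picks up a factor of $1/n$.

$$
\begin{aligned}
\hat{y}_i &= \beta_0 + \beta_1 x_i \\
e_i &= y_i - \hat{y}_i \\
E &= \sum_{i=1}^{n} \tfrac{1}{2} e_i^2 \\
\frac{\partial E}{\partial \beta_0} &= \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_0} = -\sum_{i=1}^{n} e_i \\
\frac{\partial E}{\partial \beta_1} &= \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_1} = -\sum_{i=1}^{n} e_i x_i \\
\beta_j &\leftarrow \beta_j - \alpha \frac{\partial E}{\partial \beta_j}, \qquad \alpha = 0.1
\end{aligned}
$$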

### Building a computational graph with `tensorflow`

In the same manner as we coded up our computational graph using `theano`, let's look at how to code this up one piece at a time with `tensorflow`, discussing along the way how it works.

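```python
import numpy as np
# TensorFlow 1.x graph API (tf.placeholder, tf.Session, ...). Under TensorFlow 2.x,
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()` provides it.
import tensorflow as tf
from datasets.general import gen_simple_linear  # this project's synthetic-data helper
```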

```python
def get_tensorflow_graph():
    # 1. Define placeholders for the inputs; the actual data is supplied at run time.
    xs = tf.placeholder(tf.float32, name='xs')
    ys = tf.placeholder(tf.float32, name='ys')
    # 2. Define randomly initialized floats for beta_0 and beta_1 (float32, to match xs and ys).
    beta_0 = tf.Variable(np.random.random(), dtype=tf.float32, name='beta_0')
    beta_1 = tf.Variable(np.random.random(), dtype=tf.float32, name='beta_1')

    # 3. Define the equation that generates predictions.
    yhats = beta_0 + xs * beta_1
    # 4. Define the equation that generates our errors.
    es = ys - yhats
    # 5. Define the aggregate cost: the sum of half the squared errors.
    E = tf.reduce_sum(0.5 * (es ** 2))
    # 6. Take advantage of `tensorflow`'s automatic differentiation to obtain
    #    dE/dbeta_0 and dE/dbeta_1 (each a scalar).
    d_beta_0, d_beta_1 = tf.gradients(E, [beta_0, beta_1])

    # 7. Perform the update step (learning rate 0.1), and use `assign` to actually
    #    update the values.
    new_beta_0 = beta_0 - 0.1 * d_beta_0
    new_beta_1 = beta_1 - 0.1 * d_beta_1
    beta_0_update = tf.assign(beta_0, new_beta_0)
    beta_1_update = tf.assign(beta_1, new_beta_1)

    # Unfortunately, to have this working as a function, we have to return all of these
    # individual pieces.
    return beta_0, beta_1, beta_0_update, beta_1_update, xs, ys
```

From a high level, `get_tensorflow_graph` returns the steps necessary to perform one iteration of our gradient descent procedure. It differs from `get_theano_graph`, though, in that it returns individual pieces rather than a callable function. The steps still more or less line up: steps `1-5` define the **forward pass**, step `6` defines the **backward pass**, and step `7` sets us up to perform an iteration of the gradient descent procedure and update our coefficients accordingly.
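To see what steps `3` through `7` compute, here is a minimal pure-Python sketch of one gradient descent iteration with the derivatives written out by hand, i.e. the quantities `tf.gradients` produces for us. It assumes the cost is the sum of half the squared errors, matching the graph's effective cost, and the same learning rate of 0.1; the function names `cost`, `grads`, and `gradient_step` are illustrative, not part of any library.

```python
def cost(beta_0, beta_1, xs, ys):
    """Step 5: aggregate cost E = sum over observations of 0.5 * e_i**2."""
    return sum(0.5 * (y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))


def grads(beta_0, beta_1, xs, ys):
    """Step 6 by hand: the partial derivatives that tf.gradients would produce."""
    # Steps 3-4: predictions and errors e_i = y_i - (beta_0 + beta_1 * x_i).
    es = [y - (beta_0 + beta_1 * x) for x, y in zip(xs, ys)]
    d_beta_0 = -sum(es)                               # dE/dbeta_0 = -sum(e_i)
    d_beta_1 = -sum(e * x for e, x in zip(es, xs))    # dE/dbeta_1 = -sum(e_i * x_i)
    return d_beta_0, d_beta_1


def gradient_step(beta_0, beta_1, xs, ys, lr=0.1):
    """Step 7: one gradient descent update of both coefficients."""
    d_beta_0, d_beta_1 = grads(beta_0, beta_1, xs, ys)
    return beta_0 - lr * d_beta_0, beta_1 - lr * d_beta_1


# Usage: noise-free data generated with beta_0 = 1 and beta_1 = 2.
xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
b0, b1 = 0.0, 0.0
for _ in range(1000):
    b0, b1 = gradient_step(b0, b1, xs, ys)
print(b0, b1)  # approaches 1.0 and 2.0
```

Comparing `grads` against finite differences of `cost` is a cheap way to check a hand-derived gradient; automatic differentiation removes the need to derive it at all.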

Let's walk through each of the individual pieces being returned...

1. `beta_0` and `beta_1` are returned simply so we can look at their initial values before any training begins. When run in a session below, these will hold the random values that `beta_0` and `beta_1` were initialized with in step `2`. Note that both `beta_0` and `beta_1` were initialized as [Variable objects](https://www.tensorflow.org/versions/r0.9/get_started/basic_usage.html#variables), which is what allows their values to be shared and updated across iterations.

2. `beta_0_update` and `beta_1_update` are the operations that perform one iteration of the gradient descent procedure. When they are run in a session below, `tensorflow` runs every step needed to compute them, which in effect is every step of the forward and backward propagation. The [use of assign](https://www.tensorflow.org/versions/r0.9/api_docs/python/state_ops.html#Variable) in step `7` is what actually updates the values stored in `beta_0` and `beta_1`; `beta_0_update` and `beta_1_update` are the handles we use to refer to these updates and later tell `tensorflow` to perform them.

3. Finally, `xs` and `ys` are placeholders for our data, and are returned so that we can tell `tensorflow` exactly what part of our graph our real data should line up with. 
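Items 1 and 2 turn on the difference between *describing* an update and *performing* it. The toy below mimics that distinction in plain Python. `ToyVariable` and `toy_assign` are hypothetical stand-ins for `tf.Variable` and `tf.assign`, not their actual implementation, and the fixed step of 0.25 stands in for the gradient term.

```python
class ToyVariable:
    """Hypothetical stand-in for tf.Variable: a container for mutable state."""

    def __init__(self, value):
        self.value = value


def toy_assign(var, compute_new_value):
    """Hypothetical stand-in for tf.assign.

    Returns an "op" (a zero-argument function). Nothing changes when the op is
    built; only calling it computes the new value and stores it in `var`.
    """
    def op():
        var.value = compute_new_value()
        return var.value  # like sess.run on an assign op, return the new value

    return op


beta_0 = ToyVariable(0.5)
# Describe the update beta_0 <- beta_0 - 0.25 (a fixed step, for illustration).
beta_0_update = toy_assign(beta_0, lambda: beta_0.value - 0.25)
print(beta_0.value)     # 0.5: building the update did not change the variable
print(beta_0_update())  # 0.25: running the op performs the update
print(beta_0_update())  # 0.0: the state persists, so each run builds on the last
```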

Now, let's write the code that runs a `tensorflow` session to learn the values for `beta_0` and `beta_1`. 

```python
# Randomly generate a beta_0, beta_1, and number of observations, used to generate
# fake data to fit. We need a minimum of 2 obs.
true_beta_0, true_beta_1, n_obs = np.random.randint(2, 10, size=3)
print('Actual beta_0: {}'.format(true_beta_0))
print('Actual beta_1: {}\n'.format(true_beta_1))

# Generate the tensorflow graph. This is in a function so that we can run this cell multiple
# times and obtain different randomly generated values for `beta_0` and `beta_1`.
beta_0, beta_1, beta_0_update, beta_1_update, xs, ys = get_tensorflow_graph()

# Generate the data that follows a linear relationship specified
# by true_beta_0 and true_beta_1.
x, y = gen_simple_linear(true_beta_0, true_beta_1, n_obs)

# Define the initialization operation (superseded by `tf.variables_initializer`
# in later 1.x releases).
init = tf.initialize_variables([beta_0, beta_1])
with tf.Session() as sess:
    sess.run(init)  # Perform the actual initialization operation.

    # Obtain the initial values of beta_0, beta_1 to print.
    init_beta_0, init_beta_1 = sess.run([beta_0, beta_1])
    print("Initial tensorflow value for beta_0: {}".format(init_beta_0))
    print("Initial tensorflow value for beta_1: {}\n".format(init_beta_1))

    # Perform iterations (forward & backward prop.) over the tensorflow graph. Each run of
    # the update ops returns the updated values; new names keep the `tf.Variable` handles
    # `beta_0` and `beta_1` from being overwritten.
    for step in range(5000):
        final_beta_0, final_beta_1 = sess.run([beta_0_update, beta_1_update],
                                              feed_dict={xs: x, ys: y})
    print("Final tensorflow value for beta_0: {}".format(final_beta_0))
    print("Final tensorflow value for beta_1: {}\n".format(final_beta_1))
```

```
Actual beta_0: 9
Actual beta_1: 5

Initial tensorflow value for beta_0: 0.7671453952789307
Initial tensorflow value for beta_1: 0.41680485010147095

Final tensorflow value for beta_0: 8.999984741210938
Final tensorflow value for beta_1: 5.000018119812012
```
In comparing our `tensorflow` implementation to either our `numpy` or `theano` implementation, the biggest difference is that our `tensorflow` implementation is run via a `session` object. A [session object](https://www.tensorflow.org/versions/r0.9/api_docs/python/client.html#session-management) allows us to encapsulate all of the calculations and implementation details of our graph (or any graph) into a single environment. This turns out to be useful when we want to run multiple graphs, each of which might have its own specialized environment.

After creating the `session` object, our first step is to [initialize any variables](https://www.tensorflow.org/versions/r0.9/how_tos/variables/index.html#initialization) that we're going to use in the graph. These are all of the `tf.Variable` objects that we have created in our graph, which for us are just `beta_0` and `beta_1`. If we have so many variables that we don't want to type them all out, there is also a `tf.initialize_all_variables` function that we can call. (In later 1.x releases, `tf.initialize_variables` and `tf.initialize_all_variables` were superseded by `tf.variables_initializer` and `tf.global_variables_initializer`.)

Once we have created a `session` object and initialized all of our variables, we can run parts of our graph by calling `session.run` and passing in the piece of the graph that we want calculated. For example, when we want to view the initial values for `beta_0` and `beta_1`, we run: 

```
sess.run([beta_0, beta_1])
``` 

To perform one update of `beta_0` and `beta_1` and view the resulting values (i.e. their values **after** one update), we run the following:

```
sess.run([beta_0_update, beta_1_update], feed_dict={xs : x, ys : y})
```

When these `run` commands are executed, `tensorflow` runs only the steps necessary to compute what is asked for. For the first call above, where we ask for `beta_0` and `beta_1`, nothing beyond reading the variables is needed; their values were set when we ran the initialization operation, using the random initial values from step `2` of `get_tensorflow_graph`. For the second call, where we ask for `beta_0_update` and `beta_1_update`, this is every step in the graph: everything in the forward propagation steps (`1-5`), everything in the backward propagation step (`6`), as well as the update itself (`7`). Note, too, that `xs` and `ys` are needed in the second call, so we pass them in via the `feed_dict` argument. The keys of `feed_dict` are the placeholder objects themselves (the `xs` and `ys` returned by `get_tensorflow_graph`), and the values are the data to use for those placeholders.
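The pruning described above, where `run` computes only what the fetched pieces depend on, can be mimicked with a toy evaluator. This is a hypothetical sketch, not how `tensorflow` is implemented: `Node`, `run`, and the `log` list are invented for illustration, and the variable nodes here simply return fixed values rather than holding state.

```python
class Node:
    """Hypothetical toy graph node: `fn` applied to the values of `inputs`.

    A node with fn=None plays the role of a placeholder.
    """

    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = inputs


def run(fetch, feed_dict, log):
    """Evaluate `fetch`, visiting only the nodes it depends on.

    `feed_dict` maps placeholder Node objects (not their names) to data, and
    the name of every node actually computed is appended to `log`.
    """
    cache = {}

    def evaluate(node):
        if node in feed_dict:      # fed placeholders are looked up by object
            return feed_dict[node]
        if node.fn is None:        # an unfed placeholder cannot be computed
            raise ValueError(f"placeholder {node.name!r} must be fed")
        if node not in cache:
            args = [evaluate(n) for n in node.inputs]
            cache[node] = node.fn(*args)
            log.append(node.name)
        return cache[node]

    return evaluate(fetch)


# A fragment of the regression graph: yhats = beta_0 + xs * beta_1.
xs = Node("xs")
beta_0 = Node("beta_0", lambda: 0.5)
beta_1 = Node("beta_1", lambda: 2.0)
yhats = Node("yhats", lambda b0, b1, x: [b0 + b1 * xi for xi in x],
             (beta_0, beta_1, xs))

log = []
print(run(beta_0, {}, log), log)               # 0.5 ['beta_0']
log = []
print(run(yhats, {xs: [1.0, 2.0]}, log), log)  # [2.5, 4.5] ['beta_0', 'beta_1', 'yhats']
```

Fetching `beta_0` touches a single node and needs no data, while fetching `yhats` walks its inputs and fails unless `xs` is fed, mirroring the two `sess.run` calls above.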

Finally, running that second call in a loop (as we do above) solves our linear regression with this graph built in `tensorflow`, and we obtain the coefficient values that we expect.
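Because the loop minimizes the sum of squared errors, the values it converges to should match the closed-form least-squares solution for simple linear regression, which gives an independent check on the learned coefficients. The sketch below computes that solution in plain Python; `ols_fit` is an illustrative name. To apply it to the notebook's data you could call `ols_fit(list(x), list(y))`, assuming `gen_simple_linear` returns 1-D arrays, which the source does not state.

```python
def ols_fit(xs, ys):
    """Closed-form least-squares estimates for y = beta_0 + beta_1 * x.

    Needs at least two distinct x values; otherwise s_xx is zero.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


# Noise-free data generated with beta_0 = 9 and beta_1 = 5.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [9.0 + 5.0 * x for x in xs]
print(ols_fit(xs, ys))  # (9.0, 5.0)
```

With noisy data the least-squares estimates will differ somewhat from the true coefficients, but gradient descent on this cost should still converge to the same values as `ols_fit`.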

Now, we'll move on to coding this up with `keras`. 