# Multiple Linear Regression using `tensorflow`

Just as with `theano`, we'll use `tensorflow` to build up a computational graph, and then take advantage of its automatic differentiation to learn the coefficients for our multiple linear regression problem.

## Computational Graphs for Multiple Linear Regression 

Let's keep the computational graph visuals around as a reference: 

### Forward Propagation

<img src="imgs/custom/mult_linear_comp_graph_condensed_forprop.png" width="300" />

### Backward Propagation

<img src="imgs/custom/mult_linear_comp_graph_condensed_backprop.png" width="400" />
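For reference, the forward and backward graphs above correspond to the matrix equations below. This is a sketch that assumes $X$ is the $n \times 4$ matrix fed to `xs`, $y$ is the $n \times 1$ vector fed to `ys`, $\beta$ is the $4 \times 1$ `betas`, and $\alpha$ is the learning rate (`0.1` in `get_tensorflow_graph`). Here $E$ is half the mean of the squared errors, matching `E` in the code, and the gradient line is the quantity that the optimizer in step `6` derives for us automatically.

$$
\begin{aligned}
\hat{y} &= X\beta \\
e &= y - \hat{y} \\
E &= \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2} e_i^2 \\
\frac{\partial E}{\partial \beta} &= -\frac{1}{n} X^\top e \\
\beta &\leftarrow \beta - \alpha \frac{\partial E}{\partial \beta}
\end{aligned}
$$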

### Building a computational graph with `tensorflow`

Just as with `theano`, the biggest difference between the `numpy` implementation and the `tensorflow` implementation of our multiple linear regression problem is that we'll be able to use the automatic differentiation that `tensorflow` offers. 

Let's take a look...

```python
import numpy as np
import tensorflow as tf
from datasets.general import gen_multiple_linear
```

```python
def get_tensorflow_graph():
    # 1. Define placeholder matrices for inputs.
    xs = tf.placeholder(tf.float64, name='xs')
    ys = tf.placeholder(tf.float64, name='ys')
    # 2. Define randomly initialized floats for our betas.
    betas = tf.Variable(np.random.random(size=(4, 1)), name='betas')

    # 3. Define the equation that generates predictions.
    yhats = tf.matmul(xs, betas)
    # 4. Define the equation that generates our errors.
    es = ys - yhats
    # 5. Define the aggregate cost (half the mean of the squared errors).
    E = tf.reduce_mean(0.5 * (es ** 2))
    # 6. Take advantage of `tensorflow`'s optimizer to automate differentiation
    #    as well as the update step.
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train = optimizer.minimize(E)

    return betas, train, xs, ys
```
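To make concrete what a single run of `train` computes, here is a NumPy-only sketch of the same forward and backward pass with the gradient written out by hand. It assumes `x` has shape `(n_obs, 4)` and `y` has shape `(n_obs, 1)` (so that `y - yhats` does not broadcast into an `(n_obs, n_obs)` matrix). The helper name `gradient_descent_step` is hypothetical; `tensorflow`'s optimizer derives the equivalent gradient automatically rather than running anything like this.

```python
import numpy as np

def gradient_descent_step(x, y, betas, learning_rate=0.1):
    """Hypothetical helper: one forward + backward pass, mirroring steps 1-6.

    x: (n_obs, n_features) inputs, y: (n_obs, 1) targets,
    betas: (n_features, 1) coefficients.
    Returns the updated betas and the cost evaluated at the *incoming* betas.
    """
    n_obs = x.shape[0]
    # Forward pass (steps 3-5): predictions, errors, half mean squared error.
    yhats = x @ betas
    es = y - yhats
    cost = np.mean(0.5 * es ** 2)
    # Backward pass (step 6): dE/dbetas = -(1/n) * X^T e, then the update.
    grad = -(x.T @ es) / n_obs
    new_betas = betas - learning_rate * grad
    return new_betas, cost
```

Calling this function repeatedly in a loop plays the same role as repeatedly calling `sess.run(train, feed_dict={xs : x, ys : y})`.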

Our multiple linear regression solution with `tensorflow` differs from our `theano` solution in the same way it did for simple linear regression. `get_tensorflow_graph` still returns the steps needed to perform **one** iteration of our gradient descent procedure, but it returns them as separate pieces of the graph rather than as a callable function. Ultimately, though, we still have a **forward pass** (steps `1-5`) and a **backward pass** (step `6`). When comparing our multiple and simple linear regression solutions with `tensorflow`, you'll notice a difference at step `6`, which we walk through below.

In terms of the individual pieces being returned: 

1. The `betas` variable is returned so that we can view our original coefficient values before any training begins (which we do by asking for `betas` back from `sess.run` before training). Note that `betas` is created as a [Variable object](https://www.tensorflow.org/versions/r0.9/get_started/basic_usage.html#variables); this is what allows our coefficient values to be updated and to persist across iterations.
2. `train` holds the magic of our computational graph, and it is where this solution differs from how we solved simple linear regression with `tensorflow`. There, we were explicit about our backward propagation steps and coded up each individual equation for updating our coefficients. Here, we instead hand the quantity that we want minimized (half the **mean squared error**, `E`) to a [tensorflow Optimizer](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html), which provides a simple interface for performing gradient descent.

   When we call [minimize](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#processing-gradients-before-applying-them) on this `Optimizer`, it adds both the calculation of the derivatives and the coefficient updates to our computational graph for us. We have the option to specify which variables the `Optimizer` takes derivatives with respect to (via a `var_list` argument); by default, it uses every trainable `tf.Variable` in the graph. For us, this is just `betas`.

   In the end, when we run `train` with a `Session` object, `tensorflow` runs every step that `train` depends on. That turns out to be every step of our graph, so each run of `train` performs one full iteration of our gradient descent procedure.

3. Finally, `xs` and `ys` are placeholders for our data, and are returned so that we can tell `tensorflow` exactly which part of our graph the real data should be fed into.
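Since `minimize` derives the gradient of `E` for us, it can be reassuring to check numerically that the derivative being built is the one we expect, $-\frac{1}{n}X^\top e$ with $e = y - X\beta$. The sketch below compares that analytic formula against a central finite-difference estimate in NumPy. It assumes `x` has shape `(n_obs, 4)` and `y` has shape `(n_obs, 1)`; the helper names are hypothetical, and `tensorflow` itself computes gradients by automatic differentiation, not by finite differences.

```python
import numpy as np

def cost(x, y, betas):
    # Half mean squared error, matching E in get_tensorflow_graph.
    es = y - x @ betas
    return np.mean(0.5 * es ** 2)

def analytic_grad(x, y, betas):
    # dE/dbetas = -(1/n) * X^T (y - X betas)
    es = y - x @ betas
    return -(x.T @ es) / x.shape[0]

def numeric_grad(x, y, betas, eps=1e-6):
    # Central finite differences, perturbing one coefficient at a time.
    grad = np.zeros_like(betas)
    for i in range(betas.shape[0]):
        step = np.zeros_like(betas)
        step[i, 0] = eps
        grad[i, 0] = (cost(x, y, betas + step) - cost(x, y, betas - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.hstack([np.ones((6, 1)), rng.random((6, 3))])  # intercept column + 3 features
y = rng.random((6, 1))
betas = rng.random((4, 1))
print(np.allclose(analytic_grad(x, y, betas), numeric_grad(x, y, betas), atol=1e-6))
```

Because `E` is quadratic in the betas, the central difference matches the analytic gradient up to floating-point rounding.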

Next, we'll use a `Session` object, as we did for simple linear regression, to perform gradient descent and learn the true values of those `betas`.

```python
# Randomly generate our betas and number of observations, used to generate
# fake data to fit. We need at least 4 observations to determine 4 coefficients.
true_betas_array = np.random.randint(2, 10, size=4)
n_obs = np.random.randint(4, 10)
for idx, beta in enumerate(true_betas_array):
    print("Actual beta_{}: {}".format(idx, beta))
print('\n')

# Generate the tensorflow graph. This is in a function so that we can run this cell multiple
# times and obtain different randomly generated initial values for all four `betas`.
betas, train, xs, ys = get_tensorflow_graph()

# Generate the data that follows a linear relationship specified by `true_betas_array`.
x, y = gen_multiple_linear(true_betas_array, n_obs)

# Define the initialization operation.
init = tf.initialize_variables([betas])
with tf.Session() as sess:
    sess.run(init)  # Perform the actual initialization operation.

    # Obtain the initial values of the betas to print.
    init_betas = sess.run(betas)
    for idx, beta in enumerate(init_betas):
        print("Initial tensorflow value for beta_{}: {}".format(idx, beta[0]))
    print('\n')

    # Perform iterations (forward & backward prop.) over the tensorflow graph.
    for step in range(50000):
        sess.run(train, feed_dict={xs : x, ys : y})
    final_betas = sess.run(betas)
    for idx, beta in enumerate(final_betas):
        print("Final tensorflow value for beta_{}: {}".format(idx, beta[0]))
    print('\n')
```

```
Actual beta_0: 9
Actual beta_1: 7
Actual beta_2: 7
Actual beta_3: 8


Initial tensorflow value for beta_0: 0.8212104981741077
Initial tensorflow value for beta_1: 0.9557979154511299
Initial tensorflow value for beta_2: 0.8314367227003494
Initial tensorflow value for beta_3: 0.36951472164145927


Final tensorflow value for beta_0: 9.009126740805684
Final tensorflow value for beta_1: 6.999201626811398
Final tensorflow value for beta_2: 6.986510618380879
Final tensorflow value for beta_3: 8.000442391404647
```

Just as with our simple linear regression implementation using `tensorflow`, we run our gradient descent procedure for our multiple linear regression problem via a `Session` object. This [Session object](https://www.tensorflow.org/versions/r0.9/api_docs/python/client.html#session-management) encapsulates the environment in which the calculations of our graph (or any graph) are carried out, including the current values of its variables. This is especially helpful when we want to run multiple graphs, each of which might need its own environment.

Using a `Session` object to run our computational graph for multiple linear regression looks similar to running our computational graph for simple linear regression. After creating the `Session` object, our first step is to [initialize any variables](https://www.tensorflow.org/versions/r0.9/how_tos/variables/index.html#initialization) that the graph uses; these are all of the `tf.Variable` objects we created in our graph, which for us is just the `betas` variable. If we have so many variables that we don't want to list them all, we can instead call the `tf.initialize_all_variables` function.

Once we have created a `Session` object and initialized all of our variables, we can run parts of our graph by calling `Session.run` and passing in the piece of the graph that we want calculated. When we want to view the initial values for our `betas` variable, for example, we run the following **before** any updates have been performed (i.e. before we have run `train` through the `Session` object):

```
sess.run(betas)
``` 

If we want to view the values for `betas` **after** they have been updated once, we run the following: 

```
sess.run(train, feed_dict={xs : x, ys : y})
sess.run(betas)
```

When these `run` commands are executed, `tensorflow` runs only the steps needed to compute what is asked for. For the first call above, where we ask for `betas`, this is just reading the variable created in step `2` of `get_tensorflow_graph`, which already holds its initialized value. For the second call, where we ask to run `train`, this is every step in the graph: everything in the forward propagation steps (`1-5`), and everything in the backward propagation and update step (`6`). Note, too, that `xs` and `ys` are needed in the call to run `train`, and we pass these in via the `feed_dict` argument. In `feed_dict`, the keys are the placeholder objects themselves (here referenced by the Python variables `xs` and `ys`), and the values are the data that will be used for those placeholders. Because running `train` doesn't return our `betas` (running an update operation returns `None`), we ask for them separately using `sess.run(betas)`.
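To illustrate the "run only what is needed" behavior of `sess.run` and the role of `feed_dict`, here is a toy, pure-Python graph evaluator. It is hypothetical and greatly simplified: it is not `tensorflow`'s API, it works on scalars, and it does not cache nodes shared by several paths. Asking for a node evaluates just that node's dependencies, and placeholders must be supplied in a dictionary keyed by the node objects themselves.

```python
class Node:
    """Hypothetical, minimal stand-in for a graph node (not tensorflow's API)."""
    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn          # None marks a placeholder that must be fed
        self.inputs = inputs

def run(node, feed_dict, trace):
    """Evaluate `node`, recursively computing only the nodes it depends on."""
    if node in feed_dict:                 # fed values are looked up by node object
        return feed_dict[node]
    if node.fn is None:
        raise ValueError(f"placeholder {node.name!r} must be fed")
    args = [run(inp, feed_dict, trace) for inp in node.inputs]
    trace.append(node.name)               # record which nodes were actually computed
    return node.fn(*args)

# A scalar version of steps 1-5: y_hat = x * beta, err = y - y_hat, loss = 0.5 * err**2
x = Node("x")
y = Node("y")
beta = Node("beta", fn=lambda: 3.0)       # constant standing in for the `betas` variable
y_hat = Node("y_hat", fn=lambda a, b: a * b, inputs=(x, beta))
err = Node("err", fn=lambda t, p: t - p, inputs=(y, y_hat))
loss = Node("loss", fn=lambda e: 0.5 * e ** 2, inputs=(err,))

trace = []
print(run(beta, {}, trace), trace)        # only `beta` is computed; nothing needs feeding
trace = []
print(run(loss, {x: 2.0, y: 7.0}, trace), trace)
```

Asking for `beta` touches only `beta`, while asking for `loss` walks back through `err` and `y_hat` and needs both placeholders fed, mirroring the difference between `sess.run(betas)` and `sess.run(train, feed_dict={xs : x, ys : y})`.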

Finally, if we run the first part of the second call (i.e. `train`) in a loop, as we do above, we can solve our linear regression with this `tensorflow` graph and obtain the coefficient values that we expect.
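As a sanity check that gradient descent lands on the coefficient values we expect, we can compare it with the closed-form least-squares solution from `np.linalg.lstsq`. The sketch below does this in plain NumPy. It uses a hypothetical stand-in for `gen_multiple_linear` (an intercept column plus three standard normal features, with noise-free targets), since that helper's exact behavior isn't shown here, and it runs 5000 steps rather than 50000 because this well-conditioned toy data converges quickly.

```python
import numpy as np

def fit_gradient_descent(x, y, learning_rate=0.1, n_steps=5000, seed=0):
    """Plain-NumPy gradient descent on E = mean(0.5 * (y - x @ betas) ** 2)."""
    rng = np.random.default_rng(seed)
    betas = rng.random((x.shape[1], 1))   # random start, like np.random.random in the graph
    n_obs = x.shape[0]
    for _ in range(n_steps):
        es = y - x @ betas
        betas -= learning_rate * (-(x.T @ es) / n_obs)   # step against dE/dbetas
    return betas

# Hypothetical stand-in for gen_multiple_linear: an intercept column plus 3 features,
# with noise-free targets built from known coefficients.
rng = np.random.default_rng(42)
true_betas = np.array([[9.0], [7.0], [7.0], [8.0]])
x = np.hstack([np.ones((50, 1)), rng.standard_normal((50, 3))])
y = x @ true_betas

gd_betas = fit_gradient_descent(x, y)
lstsq_betas, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.round(gd_betas.ravel(), 4))
print(np.round(lstsq_betas.ravel(), 4))
```

On noise-free data both approaches should recover the true coefficients; with noisy data, gradient descent should instead converge toward the least-squares estimate rather than the exact true values.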

Now, we'll move on to coding this up with `keras`. 