# W266:  TensorFlow for Pedestrians - Towards A Neural Net in 3 Simple Steps

This short TensorFlow starter notebook covers some very basic steps of TensorFlow. It is intended as an introduction to the TensorFlow tutorial **"W266: TensorFlow Tutorial"**.

The document aims to illustrate basic concepts of the framework step by step using the simplest of all examples: linear regression. After a brief general intro, we will first try to learn $w$ and $b$ for the model
$$y = w x + b$$
where two data points $x$ with their corresponding labels $y$ are given. We do this first using gradient descent with manually calculated gradients, and then let a TensorFlow optimizer do the work for us.

We will then extend the problem to a simple neural net, adding  a hidden layer with one dimension.

While these problems are very simple, the concepts and many of the commands used are very similar to what you will see in *"W266: TensorFlow Tutorial"* and in some of the homework sets.

The structure of the notebook is as follows:


<a name="TOC">
### Discussion Topics


#### [1) Introduction:  Simple TensorFlow Calculations with Scalars and Tensors](#Intro)
* [Basics](#Basics)   
* [Simple Tensor Manipulations](#Tensors)   


#### [2) Most Basic Linear Regression in TensorFlow: $y = wx + b$](#LR)
* [Parallelizing calculations over a set of data points: Placeholders & Feed Dictionary](#Feeds)   
* [Training... with manually calculated gradients](#Manual)  
* [Training... with a TensorFlow Optimizer](#Optimizer)    
* [Predictions & Model Save/Restore](#Preds)    
 

#### [3) Slightly Fancier: A Simple Neural Net](#NN)

#### [Appendix A: Visualization with Tensorboard](#TB)

#### [Appendix B: A Better way to Manage Variables: Name Scopes & get_variable](#Vars)

Useful References:

**1) [Getting Started with TensorFlow](https://www.tensorflow.org/get_started/get_started)**    
**2) "Hands-On Machine Learning with Scikit-Learn and TensorFlow", Aurélien Géron**


<a name="Intro">
## 1) Introduction:  Simple TensorFlow Calculations with Scalars and Tensors
[Back to Table of Contents](#TOC)


<a name="Basics">
### Basics
[Back to Table of Contents](#TOC)



To get started, with an eye on the Linear Regression Model we want to train in the next section, let us define three scalars $x, w, b$ and see whether we can properly calculate $ y = w * x + b$. To get used to some TensorFlow commands, we define y in two equivalent ways. 

In [1]:
import tensorflow as tf

from datetime import datetime

There are 2 basic phases: 1) define the graph that TensorFlow will compute, and 2) actually run it.

First, we define a very simple graph:

In [2]:
tf.reset_default_graph()           # all calculations are represented as graphs. 
                                   # This command resets the default graph.

x = tf.constant(3, name = "x")     # x is a constant. It will be fixed 

w = tf.Variable(1, name = "w")     # w and b are variables, i.e. they could be updated. 
b = tf.Variable(1, name = "b")

y_1 = w * x  + b
y_2 = tf.add(w * x, b)             # identical to previous statement

Here we have defined $x$ as a *constant* and $w$ and $b$ as *variables*. Constants are just what the name implies, while variables can be changed during execution. So the latter are the natural choice for weights that we (later) want to learn. (We will later use a different, and better, way to define variables.)

Now, we run the session to compute what we are interested in:

In [3]:
init = tf.global_variables_initializer()


with tf.Session() as sess:
    sess.run(init)
    result_1 = sess.run(y_1)
    result_2 = y_2.eval()     # equivalent to first format
    

print('result_1: ', result_1)
print('result_2: ', result_2)


result_1:  4
result_2:  4


Nice. The first TensorFlow calculations are working!

The above method is quite inefficient as the whole graph is calculated twice, once for each evaluation. Can we do this more efficiently? We certainly can, by passing a list of the operations to be evaluated to a single sess.run call:

In [4]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    result_1, result_2 = sess.run([y_1, y_2])

print('result_1: ', result_1)
print('result_2: ', result_2)

result_1:  4
result_2:  4


That is better.
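To see why a single run with a list of fetches saves work, here is a minimal sketch in plain Python (no TensorFlow needed). The `Node` class and the `run` function are hypothetical stand-ins for a graph and a session, not TensorFlow's actual implementation. The only point is that, within one run, a shared intermediate result needs to be computed just once.

```python
class Node:
    """Hypothetical graph node: a function plus the nodes it depends on."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
        self.calls = 0                      # counts how often the node is computed


def run(fetches):
    """Evaluate all fetches in one pass; each node is computed at most once."""
    cache = {}

    def evaluate(node):
        if node not in cache:
            node.calls += 1
            cache[node] = node.fn(*[evaluate(i) for i in node.inputs])
        return cache[node]

    return [evaluate(f) for f in fetches]


def build_graph():
    """Phase 1: define y_1 and y_2 on top of a shared product w * x."""
    x, w, b = Node(lambda: 3), Node(lambda: 1), Node(lambda: 1)
    wx = Node(lambda p, q: p * q, w, x)
    y_1 = Node(lambda p, q: p + q, wx, b)
    y_2 = Node(lambda p, q: p + q, wx, b)
    return wx, y_1, y_2


# Phase 2, variant A: two separate runs compute the shared product twice.
wx, y_1, y_2 = build_graph()
separate = run([y_1]) + run([y_2])
calls_separate = wx.calls

# Phase 2, variant B: one run with a list of fetches computes it once.
wx, y_1, y_2 = build_graph()
together = run([y_1, y_2])
calls_together = wx.calls

print(separate, calls_separate)   # [4, 4] 2
print(together, calls_together)   # [4, 4] 1
```

Both variants return the same values; only the amount of repeated work differs.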

So much for scalars. TensorFlow, as the name suggests, generally deals with tensors. Therefore, it is useful to look at some basic tensor manipulations and calculations.

<a name="Tensors">
### Simple Tensor Manipulations
[Back to Table of Contents](#TOC)



Roughly speaking, a tensor is simply an order-$n$ generalization of a matrix: a scalar has order 0, a vector order 1, and a matrix order 2.

We start with the simplest tensor: a vector.

In [5]:
import numpy as np

tf.reset_default_graph()

v1 = tf.constant([1,2,3])
v2 = tf.constant([4,5,6])
v2

<tf.Tensor 'Const_1:0' shape=(3,) dtype=int32>

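So $v_2$ (like $v_1$) has one index, i.e. its shape is (3,).

Useful operations are the dot product of two vectors and the element-wise square. Note that tf.square squares each element separately; it does not compute the dot product of a vector with itself: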

In [6]:
prod = tf.tensordot(v1, v2, axes=1)
v1_square = tf.square(v1)

In [7]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    v1_calc, v2_calc, prod_calc, v1_square_calc = sess.run([v1,v2,prod,v1_square])
    
print('v1: ', v1_calc)
print('v2: ', v2_calc)
print('prod: ', prod_calc)
print('v1_square: ', v1_square_calc)

v1:  [1 2 3]
v2:  [4 5 6]
prod:  32
v1_square:  [1 4 9]
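The difference between a dot product and an element-wise square is easy to verify by hand. The following plain-Python sketch (the helper names `dot` and `square` are ours, chosen for illustration) reproduces the values above and adds the dot product of `v1` with itself for comparison.

```python
v1 = [1, 2, 3]
v2 = [4, 5, 6]


def dot(u, v):
    """Sum of element-wise products; the result is a single number."""
    return sum(a * b for a, b in zip(u, v))


def square(v):
    """Element-wise square; the result has the same length as the input."""
    return [a * a for a in v]


prod = dot(v1, v2)
v1_square = square(v1)
v1_dot_v1 = dot(v1, v1)

print(prod)        # 32
print(v1_square)   # [1, 4, 9]
print(v1_dot_v1)   # 14
```

The dot product of `v1` with itself is the sum of the entries of the element-wise square.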


NumPy inputs work as well and are commonly used:

In [8]:
v3 = tf.constant(np.asarray([1,4,6]))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    v = sess.run(v3)
    
print('v3: ', v)


v3:  [1 4 6]


These are all the expected values.

Next, we continue with matrices - transpose, matrix multiplication, and the trace:

In [9]:
tf.reset_default_graph()

a = tf.constant([[1,2,3],[4,5,6]])
b = tf.transpose(a)
c = tf.matmul(a, tf.transpose(a))   # note the transpose to match matrix indices
d = tf.trace(c)
e = tf.square(c)

print('Format of a: ', a)
print('Format of b: ', b)
print('Format of c: ', c)
print('Format of d: ', d)
print('Format of e: ', e)

Format of a:  Tensor("Const:0", shape=(2, 3), dtype=int32)
Format of b:  Tensor("transpose:0", shape=(3, 2), dtype=int32)
Format of c:  Tensor("MatMul:0", shape=(2, 2), dtype=int32)
Format of d:  Tensor("Trace:0", shape=(), dtype=int32)
Format of e:  Tensor("Square:0", shape=(2, 2), dtype=int32)


Note that we could not define *tf.matmul(a,a)* because the inner dimensions need to match: an $m\times n$ matrix can only be multiplied by an $n\times s$ matrix.

The statements above again merely define the objects; no calculations are made yet. We can, however, already see the shapes of the objects in question, and those are correct.

The code below, which should look familiar by now, calculates the values of the objects:

In [10]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    a_calc, b_calc, c_calc, d_calc, e_calc = sess.run([a,b,c,d,e])
    
print('Original matrix: a ', a_calc)
print('\nTranspose: b= ', b_calc)
print('\nm x m^t:  c= ', c_calc)
print('\nTrace of c: d= ', d_calc)
print('\nPoint-wise square of c: e= ', e_calc)

Original matrix: a  [[1 2 3]
 [4 5 6]]

Transpose: b=  [[1 4]
 [2 5]
 [3 6]]

m x m^t:  c=  [[14 32]
 [32 77]]

Trace of c: d=  91

Point-wise square of c: e=  [[ 196 1024]
 [1024 5929]]


These are the expected results. 
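As a cross-check, the same numbers can be reproduced with a few lines of plain Python. This is only a sketch for verifying the arithmetic: the helper functions below are our own, not TensorFlow calls.

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    """Multiply an (m x n) matrix by an (n x s) matrix."""
    if len(p[0]) != len(q):
        raise ValueError("inner dimensions do not match: %d vs %d" % (len(p[0]), len(q)))
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


a = [[1, 2, 3], [4, 5, 6]]
c = matmul(a, transpose(a))
d = trace(c)

print(c)   # [[14, 32], [32, 77]]
print(d)   # 91

# a is 2 x 3, so a times a is not defined: the inner dimensions are 3 and 2.
try:
    matmul(a, a)
    shape_error = False
except ValueError:
    shape_error = True
print(shape_error)   # True
```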

Next, let us look at re-shaping of tensors:

In [14]:
print(a)                          # will show original shape (2,3)
print(tf.reshape(a, [3,2]))       # will show new shape (3,2)           
print(tf.reshape(a, [-1,6]))      # '-1' is replaced with the proper dimension. This will show shape (1,6)
print(tf.reshape(a, [3, 1, -1]))  # and this will result in a tensor of shape (3,1,2)

Tensor("Const:0", shape=(2, 3), dtype=int32)
Tensor("Reshape_1:0", shape=(3, 2), dtype=int32)
Tensor("Reshape_2:0", shape=(1, 6), dtype=int32)
Tensor("Reshape_3:0", shape=(3, 1, 2), dtype=int32)


In [15]:
v = tf.reshape(a, [3,2])
print(v)

Tensor("Reshape_4:0", shape=(3, 2), dtype=int32)
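How the '-1' is resolved can be sketched in plain Python. The function `resolve_shape` below is a hypothetical helper that mimics the rule (the product of all dimensions must equal the number of elements); it is not TensorFlow's implementation. The second helper illustrates that a reshape keeps the elements in row-major order, so reshaping a (2,3) matrix to (3,2) is not the same as transposing it.

```python
def resolve_shape(num_elements, shape):
    """Replace a single -1 in `shape` so that the product equals num_elements."""
    shape = list(shape)
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the missing dimension")
        shape[shape.index(-1)] = num_elements // known
    elif known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return shape


def reshape_2d(flat, rows, cols):
    """Row-major reshape: fill each row left to right from the flat data."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


print(resolve_shape(6, [3, 2]))       # [3, 2]
print(resolve_shape(6, [-1, 6]))      # [1, 6]
print(resolve_shape(6, [3, 1, -1]))   # [3, 1, 2]

flat = [1, 2, 3, 4, 5, 6]             # the entries of a, row by row
reshaped = reshape_2d(flat, 3, 2)
print(reshaped)                       # [[1, 2], [3, 4], [5, 6]]
```

Compare the last result with the transpose of `a` computed earlier, [[1, 4], [2, 5], [3, 6]].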


Lastly, out of interest (this will not be used in this tutorial), how does tf.matmul work with tensors? Here is the answer from the TensorFlow documentation:

>output[..., i, j] = sum_k (a[..., i, k] * b[..., k, j]), for all indices i, j.

Let's test that:

In [17]:
s = tf.reshape(a, [1, 3, 1, 2])
t = tf.reshape(a, [1, 3, 2, 1])
print(s)
print(t)

Tensor("Reshape_5:0", shape=(1, 3, 1, 2), dtype=int32)
Tensor("Reshape_6:0", shape=(1, 3, 2, 1), dtype=int32)


In [18]:
tf.matmul(s,t)

<tf.Tensor 'MatMul_1:0' shape=(1, 3, 1, 1) dtype=int32>

Very good, this matches the formula. Both factors need to have an identical index structure for the first $n-2$ indices (here (1, 3)), and the last index of the first factor needs to match the second-to-last index of the second factor (here 2).

We now have the foundations to define and train a very simple linear regression model.
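The formula from the documentation can also be checked numerically. The sketch below uses plain nested lists and our own helper functions; for brevity it drops the leading dimension of size 1, so the shapes are (3, 1, 2) and (3, 2, 1) rather than (1, 3, 1, 2) and (1, 3, 2, 1).

```python
def matmul_2d(p, q):
    """output[i][j] = sum_k p[i][k] * q[k][j] for ordinary matrices."""
    if len(p[0]) != len(q):
        raise ValueError("inner dimensions do not match")
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def batched_matmul(s, t):
    """Apply matmul_2d to matching pairs along the leading (batch) index."""
    if len(s) != len(t):
        raise ValueError("leading dimensions do not match")
    return [matmul_2d(p, q) for p, q in zip(s, t)]


# The entries 1..6 of a, split into three 1x2 and three 2x1 matrices.
s = [[[1, 2]], [[3, 4]], [[5, 6]]]            # shape (3, 1, 2)
t = [[[1], [2]], [[3], [4]], [[5], [6]]]      # shape (3, 2, 1)

out = batched_matmul(s, t)
print(out)   # [[[5]], [[25]], [[61]]]
```

The result has shape (3, 1, 1): one 1x1 product per pair of matrices, in line with the shape (1, 3, 1, 1) reported above.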

<a name="LR">
## 2) Most Basic Linear Regression in TensorFlow: $y = wx + b$
[Back to Table of Contents](#TOC)


With some basics out of the way, we can now proceed with a very simple Linear Regression.

<a name="Feeds">
### Parallelizing calculations over a set of data points: Placeholders & Feed Dictionary
[Back to Table of Contents](#TOC)



First, we need to address how best to deal with the data in training and test sets.
In practice, we will have many different data points, and hard-coding their values at the outset is not practical. We may also want to use the data only in batches. How can we first define $x$ 'in general', and then feed in the actual values later at run time?

The solution is to replace the definition of $x$ with a **placeholder** and to define a **feed dictionary** that contains the actual values. The placeholder only specifies the shape and type of the data.

What is the correct shape for our $x$-placeholder? For the type of problem at hand (linear regression), it is (*batch size*) x *(number of features)*. Since at this point the batch size is unknown, the corresponding dimension is set to *None* in the placeholder. 

The number of features in our simple case is 1 and so we choose $X$ to be a *(batch size) x 1* matrix. We also interpret $w$ and $b$ as 1x1 matrices: 

In [19]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")    # a placeholder is defined

w = tf.Variable([[1.0]],name = "w")
b = tf.Variable([[1.0]], name = "b")

Y = X * w + b            # element-wise form; works here because there is only one feature

Y = tf.matmul(X, w) + b  # overrides the line above: equivalent for the 1-feature base case,
                         # but also appropriate for more than one feature

Note that in the more general case with more than one feature, the rows of $X$ would correspond to the individual data points and the columns would capture the features. Also, $w$ would need to be defined as a column vector in this case; otherwise a tf.transpose would need to be used.

Next, we test whether the numbers work out:

In [20]:
feedDict = {X:[[3], [5]]}

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    result = sess.run([Y], feed_dict=feedDict)
print(result)

[array([[ 4.],
       [ 6.]], dtype=float32)]


Again, these are obviously the correct values and we can start the training process.
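To make the *(batch size) x (number of features)* layout concrete, here is a small plain-Python sketch of the same model. The function `linear_model` and the two-feature data are made up for illustration; only the first call corresponds to the values fed in above.

```python
def linear_model(X, w, b):
    """Y[i] = sum_j X[i][j] * w[j] + b, with one output row per row of X."""
    return [[sum(x_ij * w_j for x_ij, w_j in zip(row, w)) + b] for row in X]


# The single-feature case from above: two data points, w = 1, b = 1.
Y_one = linear_model([[3.0], [5.0]], [1.0], 1.0)
print(Y_one)   # [[4.0], [6.0]]

# A hypothetical case with two features and a batch of three data points.
X_two = [[3.0, 1.0],
         [5.0, 2.0],
         [7.0, 0.0]]
Y_two = linear_model(X_two, [1.0, 2.0], 1.0)
print(Y_two)   # [[6.0], [10.0], [8.0]]
```

The batch size only determines the number of rows; the weights depend solely on the number of features. This is why the first placeholder dimension can be left as *None*.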

<a name="Manual">
### Training... with manually calculated gradients
[Back to Table of Contents](#TOC)



That was not hard. But we had to know and define $w$ and $b$. In general, we would have a training set with $x$ and known labels $y$, and we want to learn the weights/parameters of the model, in this case $w$ and $b$. How can that be done in TensorFlow? 

We need to introduce an iterative learning procedure in which the trainable variables are updated at each step. We begin by defining the gradient descent updates for $w$ and $b$, using gradients of the cost function $C$ that are easy to calculate by hand. Briefly recalling the gradient descent update procedure from W207:

$$ w \rightarrow w - \alpha \frac{\partial C}{\partial w} ; \\    
b \rightarrow b - \alpha \frac{\partial C}{\partial b} $$

where $\alpha$ is the learning rate. With 
$$C = \frac{1}{2}\sum_i (y^i - y^i_{pred})^2 = \frac{1}{2}\sum_i (y^i - w x^i - b)^2 $$ we find:

$$ \frac{\partial C}{\partial w} = -\sum_i (y^i - w x^i - b) x^i = \sum_i (y_{pred}^i - y^i) x^i ; \\  
\frac{\partial C}{\partial b} = -\sum_i (y^i - w x^i - b) = \sum_i (y_{pred}^i - y^i) $$
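A quick way to gain confidence in hand-derived gradients is to compare them with finite differences. The sketch below does this in plain Python for the two data points used in this notebook; the evaluation point $(w, b) = (0.5, 0.25)$ and the step size are arbitrary choices for illustration.

```python
xs = [3.0, 5.0]
ys = [4.0, 6.0]


def cost(w, b):
    """C = 1/2 * sum_i (y_i - w * x_i - b)^2."""
    return 0.5 * sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def analytic_grads(w, b):
    """Gradients from the formulas above, written with error = y_pred - y."""
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs))
    grad_b = sum(errors)
    return grad_w, grad_b


def numeric_grads(w, b, eps=1e-6):
    """Central finite differences as an independent check."""
    grad_w = (cost(w + eps, b) - cost(w - eps, b)) / (2 * eps)
    grad_b = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
    return grad_w, grad_b


w0, b0 = 0.5, 0.25
exact = analytic_grads(w0, b0)
approx = numeric_grads(w0, b0)

print(exact)    # (-23.0, -5.5)
print(approx)   # agrees with the exact values up to rounding error
```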


Now we have everything. Let's start to implement:

**a) Define placeholders for the training set, initialize the weights, and define the model:**

In [21]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")
Y = tf.placeholder(tf.float32, shape=(None,1), name = "y")          # we will feed in 'labels' as well

w = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w")
b = tf.Variable(tf.random_uniform(shape=(1,)), name = "b")

Y_pred = tf.matmul(X,tf.transpose(w)) + b  # This is the generalizable way to write y = w*x + b to allow 
                                           # for more features.

**b) Define error and cost function:**

In [22]:
error = Y_pred - Y
mse = 0.5 * tf.reduce_mean(tf.square(error), name = "mse")  

**c) Define learning rate, gradients, and gradient descent training steps:**

In [23]:
learning_rate = 0.01

grad_w =  tf.matmul(tf.transpose(X) , error)
grad_b =  tf.reduce_sum(error) 

train_w = tf.assign(w,w - learning_rate * grad_w)
train_b = tf.assign(b, b - learning_rate * grad_b)

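tf.assign tells TensorFlow to set the variable given as its first argument to the value given as its second argument when the operation is executed. Note that grad_w and grad_b are the gradients of the summed cost $C$ above, whereas mse averages over the batch. The two differ by a factor of the batch size, and mse is used here only for reporting.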

**d) Define the Feed Dictionary. Include the training y labels as well:**

In [24]:
feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

**e) Train! (Don't forget to compute the training operation...)**

In [25]:
reportStep = 500

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        w_c, b_c, _, _, y_result, mse_result = \
            sess.run([w, b, train_w, train_b, Y_pred, mse], feed_dict=feedDict)  # note the training steps for w and b!
    
        if (epoch ) % reportStep == 0:
            print('Epoch: ', epoch+1)
            print('\terror: ',mse_result)
            print('\tcurrent w: ', w_c)
            print('\tcurrent b', b_c)
    print('\nFinal values:')
    print('\tepochs: ',epoch+1)
    print('\terror: ',mse_result)
    print('\tw: ', w_c)
    print('\tb: ', b_c)
    print('\tPredicted y: ', y_result)
    print('\tActual y: ', [[4],[6]])

Epoch:  1
	error:  7.32071
	current w:  [[ 0.54205137]]
	current b [ 0.42031857]
Epoch:  501
	error:  0.0019286
	current w:  [[ 1.0603776]]
	current b [ 0.74423653]
Epoch:  1001
	error:  0.000632294
	current w:  [[ 1.03457141]]
	current b [ 0.85355324]
Epoch:  1501
	error:  0.000207307
	current w:  [[ 1.01979518]]
	current b [ 0.91614646]
Epoch:  2001
	error:  6.79674e-05
	current w:  [[ 1.01133466]]
	current b [ 0.9519859]
Epoch:  2501
	error:  2.22838e-05
	current w:  [[ 1.00649011]]
	current b [ 0.9725076]
Epoch:  3001
	error:  7.30544e-06
	current w:  [[ 1.00371623]]
	current b [ 0.98425788]
Epoch:  3501
	error:  2.39583e-06
	current w:  [[ 1.00212812]]
	current b [ 0.99098557]
Epoch:  4001
	error:  7.85369e-07
	current w:  [[ 1.00121868]]
	current b [ 0.994838]
Epoch:  4501
	error:  2.57505e-07
	current w:  [[ 1.00069785]]
	current b [ 0.99704373]

Final values:
	epochs:  5000
	error:  8.47195e-08
	w:  [[ 1.00040007]]
	b:  [ 0.99830472]
	Predicted y:  [[ 3.99950457]
 [ 6.00030565]]
	Actual y:  [[4], [6]]

Great. So that is how TensorFlow learns. But calculating derivatives manually is obviously not a sustainable approach.
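For comparison, the same update rule can be written as an ordinary Python loop without any graph or session. This is a sketch under one assumption: the weights start at fixed values (0.5 each) instead of random ones, so the numbers will not match the run above exactly.

```python
xs = [3.0, 5.0]
ys = [4.0, 6.0]

learning_rate = 0.01
w, b = 0.5, 0.5          # fixed starting values (assumption; the notebook uses random ones)

for epoch in range(5000):
    # error = y_pred - y for every data point
    errors = [w * x + b - y for x, y in zip(xs, ys)]

    # gradients of C = 1/2 * sum(error^2) with respect to w and b
    grad_w = sum(e * x for e, x in zip(errors, xs))
    grad_b = sum(errors)

    # simultaneous update of both parameters, as in the assign steps
    w, b = w - learning_rate * grad_w, b - learning_rate * grad_b

final_cost = 0.5 * sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))
print('w:', w)
print('b:', b)
print('cost:', final_cost)
```

As in the TensorFlow run, $w$ and $b$ both approach 1, with $b$ converging more slowly than $w$.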

<a name="Optimizer">
### Training... with a TensorFlow Optimizer
[Back to Table of Contents](#TOC)


Gradient descent is so common that it would be surprising if TensorFlow did not offer a way to calculate the derivatives and define the training procedure for us. Indeed, **optimizers** do exactly that.

The procedure is essentially the same as above, except that we change the learning steps to use TensorFlow's Gradient Descent optimizer.

**Repeat steps a) and b) from above:**

In [26]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")
Y = tf.placeholder(tf.float32, shape=(None,1), name = "y")

w = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w")
b = tf.Variable(tf.random_uniform(shape=(1,)), name = "b")

Y_pred = tf.matmul(X,tf.transpose(w)) + b

error = Y_pred - Y
mse = 0.5 * tf.reduce_mean(tf.square(error), name = "mse")


Instead of using the manually calculated gradients and an explicit assign step that updates $w$ and $b$ at every iteration, we leave this work to TensorFlow: we choose an optimizer and define a single training step, which replaces both the gradient calculation and the weight/bias updates:

In [27]:
learning_rate = 0.01

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_all = optimizer.minimize(mse)

Note that there are multiple optimizers to choose from. We used the simple gradient descent one.

From here on, the steps are exactly as before in steps d) and e):

In [28]:
feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

reportStep = 500

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        w_c, b_c,_,  y_result, mse_result = sess.run([w,b,train_all, Y_pred, mse],feed_dict=feedDict)
    
        if (epoch ) % reportStep == 0:
            print('Epoch: ', epoch+1)
            print('\terror: ',mse_result)
            print('\tcurrent w: ', w_c)
            print('\tcurrent b', b_c)
    print('\nFinal values:')
    print('\tepochs: ',epoch+1)
    print('\terror: ',mse_result)
    print('\tw: ', w_c)
    print('\tb: ', b_c)
    print('\tPredicted y: ', y_result)
    print('\tActual y: ', [[4],[6]])

Epoch:  1
	error:  0.312416
	current w:  [[ 0.98956263]]
	current b [ 0.38830844]
Epoch:  501
	error:  0.00561621
	current w:  [[ 1.10309064]]
	current b [ 0.56330132]
Epoch:  1001
	error:  0.00321629
	current w:  [[ 1.07801414]]
	current b [ 0.66952699]
Epoch:  1501
	error:  0.00184189
	current w:  [[ 1.05903757]]
	current b [ 0.74991304]
Epoch:  2001
	error:  0.00105482
	current w:  [[ 1.04467738]]
	current b [ 0.81074411]
Epoch:  2501
	error:  0.000604077
	current w:  [[ 1.03381002]]
	current b [ 0.85677898]
Epoch:  3001
	error:  0.00034595
	current w:  [[ 1.02558613]]
	current b [ 0.89161563]
Epoch:  3501
	error:  0.000198122
	current w:  [[ 1.01936281]]
	current b [ 0.91797793]
Epoch:  4001
	error:  0.000113468
	current w:  [[ 1.01465321]]
	current b [ 0.93792802]
Epoch:  4501
	error:  6.49831e-05
	current w:  [[ 1.01108944]]
	current b [ 0.95302516]

Final values:
	epochs:  5000
	error:  3.7261e-05
	w:  [[ 1.00839698]]
	b:  [ 0.96443021]
	Predicted y:  [[ 3.9896152]
 [ 6.0064187]]
	Actual y:  [[4], [6]]

Simple! In various shapes or forms, this structure will be used in the class many times.
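One detail is worth a closer look: the optimizer minimizes mse, which uses tf.reduce_mean, whereas the manual gradients earlier were those of a summed cost. With a batch of size $n$, the averaged gradients are smaller by a factor of $n$. The plain-Python sketch below compares the two under the assumption of identical, fixed starting values (0.5 each); the `train` helper is our own. This scaling may be one reason why the optimizer run above is further from convergence after 5000 steps, although the random initialization plays a role as well.

```python
xs = [3.0, 5.0]
ys = [4.0, 6.0]


def train(scale, steps=5000, learning_rate=0.01, w=0.5, b=0.5):
    """Gradient descent on scale * 1/2 * sum(error^2)."""
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = scale * sum(e * x for e, x in zip(errors, xs))
        grad_b = scale * sum(errors)
        w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    return w, b


w_sum, b_sum = train(scale=1.0)               # summed cost (manual version)
w_mean, b_mean = train(scale=1.0 / len(xs))   # averaged cost (0.5 * mean)

print('sum :', w_sum, b_sum)
print('mean:', w_mean, b_mean)
```

Both variants head towards $w = 1$ and $b = 1$; the averaged version simply takes smaller steps at the same learning rate.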

<a name="Preds">
### Predictions & Model Save/Restore 
[Back to Table of Contents](#TOC)

Now that we have trained a model, we also want to make predictions. There are multiple ways to do this, and we will discuss two here.

First off, one can simply define a second feed dictionary for the test values for $X$, and feed that into a new calculation at the very end within the same session: 


In [29]:
feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

feedDict_test = {X:[[7], [9]]}                              # feed dictionary for test values

reportStep = 500

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        w_c, b_c,_,  y_result, mse_result = sess.run([w,b,train_all, Y_pred, mse],feed_dict=feedDict)
    

    print('\nFinal model values:')

    print('\tw: ', w_c)
    print('\tb: ', b_c)

    
    y_test = sess.run(Y_pred,feed_dict=feedDict_test)         # calculation of test Y
    print('')
    print('Predicted for [[7],[9]]: ', y_test)
    print('Expected for [[7],[9]]: [[8], [10]]')
    print('Manually calculated from y = wx +b:')
    print('\t%s, %s' % (7*w_c + b_c, 9 * w_c +b_c))



Final model values:
	w:  [[ 1.00521314]]
	b:  [ 0.97791725]

Predicted for [[7],[9]]:  [[  8.01440907]
 [ 10.02483559]]
Expected for [[7],[9]]: [[8], [10]]
Manually calculated from y = wx +b:
	[[ 8.01440907]], [[ 10.02483559]]


So this simple approach does calculate the predictions, but we had to stay within the same session.

A better way is to save and restore the model: 

In [30]:
feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

reportStep = 500

saver  = tf.train.Saver()                                # Define a 'saver'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        w_c, b_c,_,  y_result, mse_result = sess.run([w,b,train_all, Y_pred, mse],feed_dict=feedDict)
    

    print('\nFinal model values:')

    print('\tw: ', w_c)
    print('\tb: ', b_c)
    
    print('Saving model...')
    save = saver.save(sess, 'modelSave/model_1.ckpt')      # Save model parameters to file
    print('Done!')
    print(' ')

    
    
feedDict_test = {X:[[7], [9]]} 

with tf.Session() as sess:                                # This is a new session!
    
    print('Restoring model in new session...')
    saver.restore(sess, 'modelSave/model_1.ckpt')         # Restore the weights
    
    y_test = sess.run(Y_pred,feed_dict=feedDict_test)
    print('')
    print('Predicted for [[7],[9]]: ', y_test)
    print('Expected for [[7],[9]]: [[8], [10]]')
    print('Manually calculated from y = wx +b:')
    print('\t%s, %s' % (7*w_c + b_c, 9 * w_c +b_c))



Final model values:
	w:  [[ 1.01206577]]
	b:  [ 0.9488886]
Saving model...
Done!
 
Restoring model in new session...
INFO:tensorflow:Restoring parameters from modelSave/model_1.ckpt

Predicted for [[7],[9]]:  [[  8.03334904]
 [ 10.05748081]]
Expected for [[7],[9]]: [[8], [10]]
Manually calculated from y = wx +b:
	[[ 8.03334904]], [[ 10.05748081]]
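The idea behind save and restore can be illustrated without TensorFlow: write the learned parameters to a file, then read them back in a separate step and predict. The sketch below is only an analogy. It stores hypothetical parameter values as JSON in a temporary directory, which is not the checkpoint format that tf.train.Saver uses.

```python
import json
import os
import tempfile


def save_params(params, path):
    """Write the parameter dictionary to a file."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path):
    """Read the parameter dictionary back from a file."""
    with open(path) as f:
        return json.load(f)


def predict(params, xs):
    """y = w * x + b for every x."""
    return [params["w"] * x + params["b"] for x in xs]


trained = {"w": 1.0, "b": 1.0}       # hypothetical values standing in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_1.json")
    save_params(trained, path)       # end of the 'training session'
    restored = load_params(path)     # start of a new 'session'

predictions = predict(restored, [7.0, 9.0])
print(predictions)   # [8.0, 10.0]
```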


<a name="NN">
## 3) Slightly Fancier: A Simple Neural Net
[Back to Table of Contents](#TOC)


We can now apply what we learned and define a first neural network. The architecture is very simple: we have one input neuron, a hidden layer of dimension 1 (plus bias), and then the 1-d output Y. The logic is exactly as above; we just need to incorporate the hidden layer.

As before, we define placeholders for $X$ and $Y$ and define $w_1$ and $b_1$ (we will now have a second set of $w$/$b$):

In [31]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")  
Y = tf.placeholder(tf.float32, shape=(None,1), name = "y")    

w_1 = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w_1")    
b_1 = tf.Variable(tf.random_uniform(shape = (1,)),name = "b_1")    

Next, we define the hidden layer L. This step is very similar to the definition of what used to be Y_pred, except that we also add an activation function, *relu* in this case.

Alternatively, we could choose to rewrite: 

(tf.matmul(X, w_1) +  b_1) $\rightarrow$ (tf.nn.bias_add(tf.matmul(X, w_1), b_1))

This notation is better aligned with adding a bias term in more general neural nets, but here it is equivalent.

In [32]:
L = tf.matmul(X, w_1) + b_1     # a better notation would be:  L = tf.nn.bias_add(tf.matmul(X, w_1), b_1)
L = tf.nn.relu(L)

In [33]:
w_2 = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w_2")  # define weight for hidden layer -> output
b_2 = tf.Variable(tf.random_uniform(shape = (1,)),name = "b_2")  # define bias for hidden layer -> output   
              
Y_pred = tf.matmul(L, w_2) + b_2                                 # a better notation would be 
                                                                 # Y_pred = tf.nn.bias_add(tf.matmul(L, w_2), b_2) 

error = Y_pred - Y                                               # This is now a (batch size) x 1 matrix
mse = tf.reduce_mean(tf.square(error), name = "mse")          


learning_rate = 0.01

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_all = optimizer.minimize(mse)


In [34]:

reportStep = 500

init = tf.global_variables_initializer()

feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        _, y_result, mse_result = sess.run([train_all,  Y_pred, mse],feed_dict=feedDict)
    
        if (epoch ) % reportStep == 0:
            print('Epoch: ', epoch+1)
            print('\terror: ',mse_result)
            
    w_1_calc, w_2_calc, b_1_calc,b_2_calc,y_pred_calc,mse_calc = \
        sess.run([w_1, w_2, b_1, b_2,  Y_pred, mse],feed_dict=feedDict)
    
    print('\nFinal values:')
    print('\tepochs: ',epoch+1)
    print('\terror: ',mse_result)
    print('\tPredicted Y: ', y_result)
    print('\tActual y: ', [[4],[6]])
    print('\tw_1: ',w_1_calc)
    print('\tb_1: ',b_1_calc)
    print('\tw_2: ',w_2_calc)
    print('\tb_2: ',b_2_calc)


Epoch:  1
	error:  14.9238
Epoch:  501
	error:  0.000153542
Epoch:  1001
	error:  2.07415e-05
Epoch:  1501
	error:  2.79496e-06
Epoch:  2001
	error:  3.76679e-07
Epoch:  2501
	error:  5.0781e-08
Epoch:  3001
	error:  6.79267e-09
Epoch:  3501
	error:  9.01423e-10
Epoch:  4001
	error:  1.36652e-10
Epoch:  4501
	error:  3.69482e-11

Final values:
	epochs:  5000
	error:  2.10321e-11
	Predicted Y:  [[ 4.00000525]
 [ 5.99999619]]
	Actual y:  [[4], [6]]
	w_1:  [[ 1.05612111]]
	b_1:  [ 0.38839042]
	w_2:  [[ 0.94685668]]
	b_2:  [ 0.63226956]


Needless to say, with only two data points we are heavily overfitting here, but that is not the point of the exercise.
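The forward pass of this small network is easy to replay by hand. The plain-Python sketch below plugs the final weights printed above into the two layers; the helper names are ours. It also shows that, because the relu is active for both inputs, the network collapses to a straight line with slope $w_2 w_1$ and intercept $w_2 b_1 + b_2$.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def forward(x, w_1, b_1, w_2, b_2):
    """One input, one hidden unit with relu, one output."""
    hidden = relu(w_1 * x + b_1)
    return w_2 * hidden + b_2


# Final values reported by the training run above
params = dict(w_1=1.05612111, b_1=0.38839042, w_2=0.94685668, b_2=0.63226956)

pred_3 = forward(3.0, **params)
pred_5 = forward(5.0, **params)
print(pred_3, pred_5)     # close to 4 and 6

# Effective linear model while the hidden unit is active
slope = params["w_2"] * params["w_1"]
intercept = params["w_2"] * params["b_1"] + params["b_2"]
print(slope, intercept)   # both close to 1
```

So although the individual weights differ from 1, the network has effectively learned the same line $y = x + 1$ as the linear regression.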

This concludes the main part of this brief introduction to TensorFlow. In the appendices we cover a few useful topics that help with building, training, and testing TensorFlow models.  

<a name="TB">
## Appendix A) Visualization with Tensorboard
[Back to Table of Contents](#TOC)

TensorBoard is a great way to visualize i) the graph, and ii) the values of various quantities, such as weights or the loss function, over time. This makes it a good tool for debugging and optimizing the model. (But don't write summaries too often, as this will impact performance.)

Key steps include:

1) define a log directory to which the values are written    
2) define a file writer (similar to "myFile = open(file, 'w')" in plain Python)    
3) define 'summaries' for the parameters to track      
4) at various intervals, calculate the summaries and write them to the file   

After starting TensorBoard (tensorboard --logdir \[logdir\]), the visualizations can then (by default) be found at localhost:6006
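Two small pieces of bookkeeping are involved: a fresh log directory per run, and a rule for how often summaries are written. The sketch below shows both in plain Python. The helper names `make_logdir` and `logged_epochs` are made up for illustration, and the fixed timestamp is only there to make the output reproducible.

```python
from datetime import datetime, timezone


def make_logdir(root_logdir="tf_logs", now=None):
    """Build a run-specific log directory name from a UTC timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    return "{}/run-{}/".format(root_logdir, now.strftime("%Y%m%d%H%M%S"))


def logged_epochs(num_epochs, tb_step):
    """Epochs at which a summary is written (every tb_step-th epoch)."""
    return [epoch for epoch in range(num_epochs) if epoch % tb_step == 0]


logdir = make_logdir(now=datetime(2018, 1, 2, 3, 4, 5))
print(logdir)   # tf_logs/run-20180102030405/

epochs = logged_epochs(num_epochs=5000, tb_step=100)
print(len(epochs), epochs[0], epochs[-1])   # 50 0 4900
```

Using a timestamp in the directory name keeps the curves of different runs apart, and writing only every 100th epoch keeps the log small.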

In [35]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")  
Y = tf.placeholder(tf.float32, shape=(None,1), name = "y")    

w = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w")    
b = tf.Variable(tf.random_uniform(shape = (1,)),name = "b")    

Y_pred = tf.matmul(X, w) + b  


error = Y_pred - Y                                               
mse = tf.reduce_mean(tf.square(error), name = "mse")  


file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())  # define a writer


learning_rate = 0.01

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_all = optimizer.minimize(mse)


mse_summary = tf.summary.scalar('MSE', mse)                  # define the summary scalars you want to track
w_summary = tf.summary.scalar('w_1',  tf.reduce_mean(w))
b_summary = tf.summary.scalar('b_1', tf.reduce_mean(b))
merged = tf.summary.merge_all()                               # merge them for simplicity

reportStep = 500
tb_step = 100

init = tf.global_variables_initializer()

feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        _, y_result, mse_result = sess.run([train_all,  Y_pred, mse],feed_dict=feedDict)
    
        if (epoch ) % reportStep == 0:
            print('Epoch: ', epoch+1)
            print('\terror: ',mse_result)
        if (epoch) % tb_step == 0:                                       # don't write at every step!
            
            summary_merged = sess.run(merged, feed_dict=feedDict)        # evaluate your merged scalars 
            file_writer.add_summary(summary_merged, epoch)               # write them to your directory
            
    w_calc, b_calc,mse_calc = \
        sess.run([w, b, mse],feed_dict=feedDict)
    
    print('\nFinal values:')
    print('\tepochs: ',epoch+1)
    print('\terror: ',mse_result)
    print('\tPredicted Y: ', y_result)
    print('\tActual y: ', [[4],[6]])
    print('\tw: ',w_calc)
    print('\tb: ',b_calc)
    
    file_writer.close()


Epoch:  1
	error:  16.3699
Epoch:  501
	error:  0.000506673
Epoch:  1001
	error:  0.000166116
Epoch:  1501
	error:  5.44654e-05
Epoch:  2001
	error:  1.78582e-05
Epoch:  2501
	error:  5.85509e-06
Epoch:  3001
	error:  1.9199e-06
Epoch:  3501
	error:  6.29899e-07
Epoch:  4001
	error:  2.06716e-07
Epoch:  4501
	error:  6.78303e-08

Final values:
	epochs:  5000
	error:  2.21917e-08
	Predicted Y:  [[ 3.99982071]
 [ 6.00011063]]
	Actual y:  [[4], [6]]
	w:  [[ 1.00014508]]
	b:  [ 0.99938571]


<a name="Vars">
## Appendix B: A Better way to Manage Variables: Name Scopes & get_variable
[Back to Table of Contents](#TOC)

In deep networks, keeping track of the names of individual nodes and layers can quickly become unmanageable. A good way to handle this is through **name scopes**. Also, tf.get_variable is a better way to define variables, as it allows (with suitable options) for variable sharing across layers/components:  

In [36]:
now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape  = (None,1), name = "x")  
Y = tf.placeholder(tf.float32, shape=(None,1), name = "y")    

#w = tf.Variable(tf.random_uniform(shape =(1,1)),name = "w")            
#b = tf.Variable(tf.random_uniform(shape = (1,)),name = "b")  

with tf.name_scope("LM") as scope_1:
    w = tf.get_variable("w", (1,1), initializer=tf.random_normal_initializer())
    b = tf.get_variable("b", (1,), initializer=tf.random_normal_initializer())
    
    Y_pred = tf.matmul(X, w) + b    # a better notation would be:  Y_pred = tf.nn.bias_add(tf.matmul(X, w), b)


with tf.name_scope("loss") as scope:                               # Adding the name scope
    error = Y_pred - Y                                               
    mse = tf.reduce_mean(tf.square(error), name = "mse")  


file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())


learning_rate = 0.01

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_all = optimizer.minimize(mse)


mse_summary = tf.summary.scalar('MSE', mse)
w_summary = tf.summary.scalar('w_1',  tf.reduce_mean(w))
b_summary = tf.summary.scalar('b_1', tf.reduce_mean(b))
merged = tf.summary.merge_all()

reportStep = 500
tb_step = 100

init = tf.global_variables_initializer()

feedDict = {X:[[3], [5]], Y:[[4],[6]]} 

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(5000):
        _, y_result, mse_result = sess.run([train_all,  Y_pred, mse],feed_dict=feedDict)
    
        if (epoch) % tb_step == 0:
            
            summary_merged = sess.run(merged, feed_dict=feedDict)
            file_writer.add_summary(summary_merged, epoch)
            
    w_calc, b_calc,mse_calc = \
        sess.run([w, b, mse],feed_dict=feedDict)
    
    print('\nFinal values:')
    print('\tepochs: ',epoch+1)
    print('\terror: ',mse_result)
    print('\tPredicted Y: ', y_result)
    print('\tActual y: ', [[4],[6]])
    print('\tw: ',w_calc)
    print('\tb: ',b_calc)
    
    file_writer.close()



Final values:
	epochs:  5000
	error:  4.15357e-07
	Predicted Y:  [[ 3.99922442]
 [ 6.00047874]]
	Actual y:  [[4], [6]]
	w:  [[ 1.00062644]]
	b:  [ 0.99734581]


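So what is the effect? TensorFlow adds the prefix 'loss/' (or 'LM/') to the names of the operations defined inside the respective name scope. (Variables created with tf.get_variable are not affected by tf.name_scope; prefixing those requires tf.variable_scope.)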

In [37]:
print(error)
print(mse)

Tensor("loss/sub:0", shape=(?, 1), dtype=float32)
Tensor("loss/mse:0", shape=(), dtype=float32)


In [38]:
print(Y_pred)

Tensor("LM/add:0", shape=(?, 1), dtype=float32)
