# Goal of this workshop
The goal of this workshop is to give you a thorough understanding of TensorFlow and how you can use it to effectively build and train neural network models.

We will not be running big models since we have limited time and hardware, but hopefully this will give you the skills and tools to go home later and confidently start running different experiments.

Let's start by importing some standard libraries:

In [1]:
import sys, os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.set_printoptions(precision=4)

In [2]:
tf.reset_default_graph()

In [3]:
x = tf.placeholder(tf.float32, shape=[], name="x")
y = x + 2

session = tf.Session()

result = session.run(y, feed_dict={x: 3})
print(result)

5.0


Since both y and x are nodes in the graph, we can "ask" for both of them in the output of session.run (this does not run the graph twice!):

In [4]:
result = session.run([y, x], feed_dict={x: 5})
print("y = {}".format(result[0]))
print("x = {}".format(result[1]))

y = 7.0
x = 5.0


Now let's say we want a linear model: given an input $x$, the output is $y = x^T \theta$. Let's put that into a function called _linear_, which receives an input $x$ and parameters $\theta$ and returns $y$. Make the input $x$ a 3-dimensional vector and define $\theta$ as a constant with tf.constant(value).

Hint: remember that $x^T \theta = \sum_i x_i \cdot \theta_i$.

In [5]:
def linear(x, theta):
    return tf.reduce_sum(tf.multiply(theta, x), axis=0)

x = tf.placeholder(tf.float32, shape=[3], name="x")
theta = tf.constant(np.array([1.,1.,1.], dtype=np.float32), name="theta")
y = linear(x, theta)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([1.,2.,3.])})
print("y = {}".format(result))

y = 6.0
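
The identity from the hint can be checked outside of TensorFlow. The following is a minimal NumPy sketch, not part of the workshop code: it assumes the same `x` and `theta` values as the cell above and compares the multiply-then-sum form with a plain inner product.

```python
import numpy as np

# Same values as the TensorFlow cell above.
x = np.array([1., 2., 3.], dtype=np.float32)
theta = np.array([1., 1., 1.], dtype=np.float32)

# Elementwise product followed by a sum, mirroring tf.multiply + tf.reduce_sum.
y_sum = np.sum(x * theta, axis=0)

# The same quantity written directly as an inner product.
y_dot = np.dot(x, theta)

print(y_sum, y_dot)  # 6.0 6.0
```

Both forms give the same scalar, which is why `linear` can be written either way for a single input vector.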


OK, so at this point we have a function that takes one input and returns its output according to a simple linear model. However, we usually want to pass many inputs at the same time (this is what we call a batch) and get all the outputs at once. So let's change our placeholder so that it accepts 4 input vectors (let's make x a 4x3 placeholder). Remember that the output should now have 4 elements, one for each input vector.

In [6]:
def linear(x, theta):
    return tf.reduce_sum(tf.multiply(x, theta), axis=1)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.constant(np.array([1.,1.,1.], dtype=np.float32), name="theta")
y = linear(x, theta)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])})
print("y = {}".format(result))

y = [  6.  15.  24.  33.]
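
The batched version relies on broadcasting: the length-3 `theta` is multiplied against every row of the 4x3 input, and the sum then runs over `axis=1` instead of `axis=0`. Here is a small NumPy sketch of the same shapes, assuming NumPy's broadcasting behaves like TensorFlow's for this case (it does for these shapes):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
             dtype=np.float32)                     # shape (4, 3): a batch of 4 inputs
theta = np.array([1., 1., 1.], dtype=np.float32)   # shape (3,)

# theta is broadcast across the 4 rows, so the product keeps shape (4, 3).
prod = x * theta

# Summing over axis=1 collapses the feature dimension and leaves one value per input.
y = prod.sum(axis=1)

print(prod.shape)  # (4, 3)
print(y)           # one output per row: 6, 15, 24, 33
```

Summing over `axis=0` here would instead add up the rows and return 3 values, one per feature, which is not what we want.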


Furthermore, we also want the outputs to have more than one dimension, so let's make our outputs 2-dimensional. This means that $\theta$ is no longer a vector but an input_dim x out_dim matrix, so 3x2 in this case. Go ahead and change $\theta$ accordingly and check that the results are correct. Your $y$ should now be 4x2.

Also, using tf.reduce_sum and tf.multiply is ugly and won't work in this case, so let's use what we should have been using in the first place: tf.matmul, which multiplies two matrices, which is what we want!

In [7]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.constant(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])})
print("y = {}".format(result))

y = [[  6.  12.]
 [ 15.  30.]
 [ 24.  48.]
 [ 33.  66.]]
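
To see why the multiply-and-sum approach stops working once $\theta$ is a matrix, here is a NumPy sketch with the same shapes. It is an illustration only; the flag name `broadcast_ok` is made up for this example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
             dtype=np.float32)                                   # shape (4, 3)
theta = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float32)     # shape (3, 2)

# Matrix product: (4, 3) @ (3, 2) -> (4, 2), one 2-dimensional output per input.
y = x @ theta
print(y.shape)  # (4, 2)

# Elementwise multiplication needs compatible trailing dimensions.
# Here they are 3 and 2, so broadcasting fails.
try:
    x * theta
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```

The matrix product handles the batch dimension and the output dimension in one operation, which is what `tf.matmul` does in the cell above.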


As you can see we are heading towards a linear regression implementation here. We have a linear model, our inputs, outputs, and parameters. However, we need a few more things before we can make it learn.

First of all we need targets, right? For linear regression we have our inputs, but we also have a set of target outputs, to which the model's outputs $y$ must be close. So let's create another placeholder which will receive the target outputs and compute the squared difference between that and the model's output for each item in the batch. __I will provide some data here__ Remember target outputs are supposed to be the same shape as the model's outputs, so that we can compute their difference. Also, change session.run so that it returns the cost and not the outputs y.


In [8]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.constant(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_sum(tf.square(y-target), axis=1)

session = tf.Session()

feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run(cost, feed_dict=feed_dict)
print("cost = {}".format(result))

cost = [   45.     281.25   720.    1361.25]
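
The per-example cost printed above can be reproduced by hand. This NumPy sketch assumes the same inputs, targets and $\theta$ as the cell above and computes the squared difference summed over the 2 output dimensions:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
theta = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)
target = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float64)

y = x @ theta                  # model outputs, shape (4, 2)
diff = y - target              # shape (4, 2), same shape as the targets

# Sum of squared differences over the output dimension: one cost per example.
per_example_cost = np.sum(diff ** 2, axis=1)

print(per_example_cost)  # 45, 281.25, 720, 1361.25
```

For the first example, the output is (6, 12) and the target is (3, 6), so the cost is $3^2 + 6^2 = 45$.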


Another thing we are missing is that the parameters $\theta$ that we want to optimize are defined as a constant in the graph, so we can't change them! What we want is to define $\theta$ as a variable instead: tf.Variable.
One thing to bear in mind is that, unlike a constant, a Variable node has no value before it is initialized (it's like being empty). So we have to run session.run(theta.initializer) before running operations that depend on it.

In [9]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_sum(tf.square(y-target), axis=1)

session = tf.Session()
session.run(theta.initializer)

feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run(cost, feed_dict=feed_dict)
print("cost = {}".format(result))

cost = [   45.     281.25   720.    1361.25]


Great! Now we have inputs, outputs, targets, parameters and the cost. What else do we need? We need to know how to change $\theta$ in order to make the cost smaller. 
__Explain what the cost is here, and how to perform gradient descent__

In [10]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradient = tf.gradients(cost, theta)

session = tf.Session()
session.run(theta.initializer)

In [11]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run([cost, gradient], feed_dict=feed_dict)
print("cost = {}".format(result[0]))
print("gradient = {}".format(result[1]))

cost = 601.875
gradient = [array([[ 141. ,  282. ],
       [ 160.5,  321. ],
       [ 180. ,  360. ]], dtype=float32)]
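
The values returned by tf.gradients can be cross-checked against the closed-form gradient of this cost. For a batch of $N$ inputs stacked in a matrix $X$ with targets $T$, the cost is $\frac{1}{N} \sum_n \| x_n^T \theta - t_n \|^2$ and its gradient is $\frac{2}{N} X^T (X \theta - T)$. The NumPy sketch below assumes that formula and also compares it against a finite-difference estimate; the helper names `cost_fn` and `numeric_grad` are made up for this example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
theta = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)
target = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float64)

def cost_fn(theta):
    # Mean over the batch of the summed squared differences.
    return np.mean(np.sum((x @ theta - target) ** 2, axis=1))

# Closed-form gradient: (2 / N) * X^T (X theta - T).
analytic_grad = (2.0 / len(x)) * (x.T @ (x @ theta - target))

# Central finite differences, one parameter at a time.
eps = 1e-4
numeric_grad = np.zeros_like(theta)
for i in range(theta.shape[0]):
    for j in range(theta.shape[1]):
        step = np.zeros_like(theta)
        step[i, j] = eps
        numeric_grad[i, j] = (cost_fn(theta + step) - cost_fn(theta - step)) / (2 * eps)

print(cost_fn(theta))   # 601.875
print(analytic_grad)    # rows: (141, 282), (160.5, 321), (180, 360)
```

Both estimates agree with the TensorFlow output, which is a useful sanity check whenever a gradient looks suspicious.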


Now we have to update our variable $\theta$. Variables are special: their value does not depend on the input, and when we change it we want the new value to persist across subsequent session.run calls, so there are special methods to update them. According to gradient descent, we want to perform $\theta \leftarrow \theta - \alpha \nabla_\theta \text{cost}$, where $\alpha$ is a small step size (0.001 in the cell below).
We can change the value using tf.assign.

In [12]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradient = tf.gradients(cost, [theta])

update_op = tf.assign(theta, theta - 0.001 * gradient[0])

session = tf.Session()
session.run(theta.initializer)

In [13]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run([cost, gradient, update_op], feed_dict=feed_dict)
print("cost = {}".format(result[0]))
print("gradient = {}".format(result[1]))

cost = 601.875
gradient = [array([[ 141. ,  282. ],
       [ 160.5,  321. ],
       [ 180. ,  360. ]], dtype=float32)]


Since we don't want to press enter every time, let's put this session.run cell into a for loop and run it for, say, 20 iterations, printing out the cost (you don't need to print the gradient).

In [27]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradient = tf.gradients(cost, [theta])

update_op = tf.assign(theta, theta - gradient[0])

session = tf.Session()
session.run(theta.initializer)

In [28]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}

for i in range(20):
    result = session.run([cost, gradient, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 601.8750
cost = 62856404.0000
cost = 6564532453376.0000
cost = 685579859656704000.0000
cost = 71599879696999390380032.0000
cost = 7477674298894095899139506176.0000
cost = 780945653460402238085653860450304.0000
cost = 81559605215794493457106262924076253184.0000
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = nan
cost = nan
cost = nan


Oh no, our cost is going up! Does anyone know why that is happening?
__lengthy discussion about learning rate here__
We need to use a smaller learning rate, so go ahead and multiply the gradient by a small number until you find one that makes the cost go to zero. (settle for 0.001)
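
One way to make the effect of the step size concrete is to run the same updates in NumPy with two different learning rates. The sketch below is an illustration under assumptions, not the workshop's implementation: it uses the closed-form gradient $\frac{2}{N} X^T (X \theta - T)$, works in float64 rather than float32, and the helper name `run_gd` is made up. It also prints the standard stability limit for gradient descent on a quadratic cost, $2 / \lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the Hessian $\frac{2}{N} X^T X$.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
target = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float64)
theta0 = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)

def run_gd(lr, steps=20):
    """Run plain gradient descent and return the cost before each update."""
    theta = theta0.copy()
    costs = []
    for _ in range(steps):
        err = x @ theta - target
        costs.append(np.mean(np.sum(err ** 2, axis=1)))
        grad = (2.0 / len(x)) * (x.T @ err)
        theta = theta - lr * grad
    return costs

diverging = run_gd(lr=1.0)      # no scaling of the gradient
converging = run_gd(lr=0.001)   # the value settled on in the text

# Largest step size for which the updates do not blow up on this quadratic cost.
hessian = (2.0 / len(x)) * (x.T @ x)
max_stable_lr = 2.0 / np.linalg.eigvalsh(hessian).max()

print(diverging[:3])
print(converging[:3])
print(max_stable_lr)  # roughly 0.006
```

With a step size of 1 every update overshoots the minimum by more than the previous error, so the cost grows at each iteration. Anything below the printed limit shrinks the error instead, which is why 0.001 works here.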

In [29]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = linear(x, theta)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradient = tf.gradients(cost, [theta])

update_op = tf.assign(theta, theta - 0.001*gradient[0])

session = tf.Session()
session.run(theta.initializer)

In [30]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}

for i in range(20):
    result = session.run([cost, gradient, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 601.8750
cost = 274.9148
cost = 125.5756
cost = 57.3649
cost = 26.2097
cost = 11.9795
cost = 5.4799
cost = 2.5111
cost = 1.1551
cost = 0.5358
cost = 0.2529
cost = 0.1237
cost = 0.0646
cost = 0.0376
cost = 0.0253
cost = 0.0196
cost = 0.0171
cost = 0.0159
cost = 0.0153
cost = 0.0150


So there it is, you just implemented your own linear regression in TensorFlow! Pat yourselves on the back and let's get right into using this to build neural networks.

First of all, we need a better dataset, this one is not even real! So to start off, we are going to use the Iris petal dataset __describe dataset__
This is a classification problem, and right now what we are doing is regression, so before we start using our actual dataset let's see how we can change our code to perform classification.
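
For classification, the linear outputs are usually turned into class probabilities with a softmax, $\text{softmax}(z)_j = e^{z_j} / \sum_k e^{z_k}$, which is what tf.nn.softmax computes along the chosen axis. The NumPy sketch below assumes that standard definition; the helper name `softmax` and the example scores are chosen for illustration (the scores match the first two outputs of the earlier linear model).

```python
import numpy as np

def softmax(logits, axis=1):
    # Subtracting the row maximum does not change the result
    # but avoids overflow in the exponential.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Linear outputs for the inputs [1, 2, 3] and [4, 5, 6] with the initial theta.
logits = np.array([[6., 12.], [15., 30.]])
probs = softmax(logits)

print(probs)              # each row is a probability distribution over 2 classes
print(probs.sum(axis=1))  # each row sums to 1
```

Note that with scores this far apart, almost all of the probability mass lands on the second class, so the outputs are already close to 0 and 1 before any training.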

In [6]:
def linear(x, theta):
    return tf.matmul(x, theta)

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
theta = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="theta")
y = tf.nn.softmax(linear(x, theta), 1)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradient = tf.gradients(cost, [theta])

update_op = tf.assign(theta, theta - 0.001*gradient[0])

session = tf.Session()
session.run(theta.initializer)

In [13]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[0,1], [1, 0], [1, 0], [1, 0]])
}

for i in range(20):
    result = session.run([cost, gradient, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))
    print(result[1])

cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]
cost = 1.5000
[array([[  4.8748e-06,  -4.7874e-06],
       [  1.0668e-05,  -1.0558e-05],
       [  1.6460e-05,  -1.6329e-05]], dtype=float32)]