# Goal of this workshop
The goal of this workshop is to give you a thorough understanding of TensorFlow and how you can use it to effectively build and train neural network models.

We will not be running big models since we have limited time and hardware, but hopefully this will give you the skills and tools to go home later and confidently start running different experiments.

Let's start by importing some standard libraries:

In [1]:
import sys, os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.set_printoptions(precision=4)

__take a moment to review numpy and talk about how operations are element-wise, etc__
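
As a quick refresher for the NumPy review, here is a minimal sketch showing that arithmetic operators act element-wise and that a vector is broadcast across the rows of a matrix. The arrays a, b and M are made-up examples, not part of the workshop material.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

elementwise_sum = a + b       # adds matching entries
elementwise_prod = a * b      # multiplies matching entries (NOT a dot product)
dot_product = np.dot(a, b)    # sum of the element-wise products

M = np.arange(6, dtype=np.float64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
broadcast = M * a             # a is applied to every row of M

print(elementwise_sum)
print(elementwise_prod)
print(dot_product)
print(broadcast)
```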

In [2]:
tf.reset_default_graph()

In [3]:
x = tf.placeholder(tf.float32, shape=[], name="x")
y = x + 2

session = tf.Session()

result = session.run(y, feed_dict={x: 3})
print(result)

5.0


Since both y and x are nodes in the graph, we can "ask" for both of them in the output of session.run (this does not run the graph twice!):

In [4]:
result = session.run([y, x], feed_dict={x: 5})
print("y = {}".format(result[0]))
print("x = {}".format(result[1]))

y = 7.0
x = 5.0


Now let's say we want a linear model: given some input $x$, the output is $y = x^T w + b$. Let's put that into a function called _linear_, which receives an input $x$, weights $w$ and a bias $b$, and returns $y$. Make the input $x$ a 3-dimensional vector, and define $w$ and $b$ as constants using tf.constant(value).

Given info:
    remember that $x^T w = \sum_i x_i \cdot w_i$
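
To see the identity above in action outside of TensorFlow, here is a small NumPy sketch. It is only an illustration and assumes the same example values as the cell below (x = [1, 2, 3], all weights 1, zero bias). The three ways of computing $x^T w$ should agree.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example input vector
w = np.array([1.0, 1.0, 1.0])   # example weights
b = 0.0                         # example bias

explicit_sum = sum(x_i * w_i for x_i, w_i in zip(x, w))  # sum_i x_i * w_i
reduced = np.sum(w * x)                                   # element-wise multiply, then sum
dotted = np.dot(x, w)                                     # x^T w

y = reduced + b
print(explicit_sum, reduced, dotted, y)
```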

In [5]:
def linear(inp, weights, bias):
    return tf.reduce_sum(tf.multiply(weights, inp)) + bias

x = tf.placeholder(tf.float32, shape=[3], name="x")
w = tf.constant(np.array([1.,1.,1.], dtype=np.float32), name="w")
b = tf.constant(0.0, name="b")
y = linear(x, w, b)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([1.,2.,3.])})
print("y = {}".format(result))

y = 6.0


Ok, so at this point we have a function that can take one input and return its output according to a simple linear model. However, we usually want to pass many inputs at the same time (this is what we call a batch) and get all the outputs. So let's change our placeholder so that it accepts 4 input vectors (let's make x a 4x3 placeholder). Remember we now want our output to have 4 elements, one for each input vector.
__depict this batch multiplication, and show that we need to use axis=1 in the reduce_sum__
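
The role of axis=1 can be sketched in plain NumPy, which follows the same axis convention as tf.reduce_sum. This is an illustration only, using the batch values from the cell below; the variable names are chosen for this sketch.

```python
import numpy as np

# Batch of 4 input vectors, one per row
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float32)
w = np.array([1.0, 1.0, 1.0], dtype=np.float32)

products = w * X                  # w is broadcast over the rows: shape (4, 3)
total = products.sum()            # no axis: everything collapses into one scalar
per_input = products.sum(axis=1)  # axis=1: sum within each row, shape (4,)

print(products.shape)
print(total)
print(per_input)
```

Without the axis argument the four inputs would be mixed into a single number, which is not what we want for a batch.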

In [6]:
def linear(inp, weights, bias):
    return tf.reduce_sum(tf.multiply(weights, inp), axis=1) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.constant(np.array([1.,1.,1.], dtype=np.float32), name="w")
b = tf.constant(0.0, name="b")
y = linear(x, w, b)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])})
print("y = {}".format(result))

y = [  6.  15.  24.  33.]


Furthermore, we also want the outputs to have more than one dimension, so let's make our outputs 2-dimensional. This means that $w$ is no longer a vector but an input_dim x out_dim matrix (3x2 in this case), and $b$ is a vector of length 2. So go ahead and change $w$ and $b$ accordingly and check that the results are correct. Your $y$ should now have shape 4x2.

Also, using tf.reduce_sum and tf.multiply is ugly and won't work in this case, so let's use what we should have been using in the first place: tf.matmul, which multiplies two matrices, which is what we want!
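
As a rough NumPy sketch of the same shapes (np.matmul follows the same rule as tf.matmul for 2-D inputs), the following shows that each output column is what the old multiply-and-sum would give for one column of the weight matrix. The values match the cell below; the names col0 and col1 are only for this illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float32)  # (4, 3)
W = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float32)                          # (3, 2)
b = np.zeros(2, dtype=np.float32)                                                 # (2,)

Y = np.matmul(X, W) + b  # (4, 3) x (3, 2) -> (4, 2); b is broadcast over the rows

# The old approach, applied once per column of W
col0 = (W[:, 0] * X).sum(axis=1)
col1 = (W[:, 1] * X).sum(axis=1)

print(Y.shape)
print(np.allclose(Y, np.stack([col0, col1], axis=1)))
```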

In [7]:
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.constant(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.constant(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

session = tf.Session()

result = session.run(y, feed_dict={x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])})
print("y = {}".format(result))

y = [[  6.  12.]
 [ 15.  30.]
 [ 24.  48.]
 [ 33.  66.]]


As you can see we are heading towards a linear regression implementation here. We have a linear model, our inputs, outputs, and parameters. However, we need a few more things before we can make it learn.

First of all we need targets, right? For linear regression we have our inputs, but we also have a set of target outputs, to which the model's outputs $y$ must be close. So let's create another placeholder which will receive the target outputs and compute the squared difference between that and the model's output for each item in the batch. __I will provide some data here__ Remember target outputs are supposed to be the same shape as the model's outputs, so that we can compute their difference. Also, change session.run so that it returns the cost and not the outputs y.
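
The per-item squared difference can be checked by hand with a short NumPy sketch. It assumes the model outputs from the previous step and the target values used in the cell below; it is an illustration, not a replacement for the TensorFlow graph.

```python
import numpy as np

# Model outputs from the previous step, one row per batch item
Y = np.array([[6, 12], [15, 30], [24, 48], [33, 66]], dtype=np.float32)
# Targets must have the same shape as the outputs
T = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float32)

assert Y.shape == T.shape
squared_diff = np.square(Y - T)           # element-wise, shape (4, 2)
cost_per_item = squared_diff.sum(axis=1)  # one cost per batch item, shape (4,)

print(cost_per_item)
```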


In [8]:
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.constant(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.constant(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_sum(tf.square(y-target), axis=1)

session = tf.Session()

feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run(cost, feed_dict=feed_dict)
print("cost = {}".format(result))

cost = [   45.     281.25   720.    1361.25]


Another thing we are missing is that the parameters $w$ and $b$, which we want to optimize, are defined as constants in the graph, so we can't change them! What we want is to define $w$ and $b$ as variables instead: tf.Variable.
One thing to bear in mind is that, unlike a constant, a Variable node has no value before it is initialized (it's like being empty). So we have to run each variable's initializer, e.g. session.run(w.initializer), before running operations that depend on it.

In [9]:
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_sum(tf.square(y-target), axis=1)

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run(cost, feed_dict=feed_dict)
print("cost = {}".format(result))

cost = [   45.     281.25   720.    1361.25]


Great! Now we have inputs, outputs, targets, parameters and the cost. What else do we need? We need to know how to change our parameters $\theta$ (that is, $w$ and $b$) in order to make the cost smaller.
__Explain what the cost is here, and how to perform gradient descent__

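Let's start by computing the gradient of the cost with respect to all our parameters, which are $w$ and $b$. TensorFlow provides a way to access all trainable variables by calling tf.trainable_variables(), and that list is what we pass to tf.gradients. Note that the cost is now averaged over the batch with tf.reduce_mean, so that it is a single scalar.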
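
For this particular cost the gradients can also be derived by hand, which gives a way to sanity-check what tf.gradients returns. The sketch below is a NumPy illustration under that assumption: it uses the closed-form expressions $\frac{2}{N} X^T (Y - T)$ for $w$ and $\frac{2}{N} \sum_n (Y_n - T_n)$ for $b$, derived for the batch-averaged squared error, and a finite-difference check on one entry. It is not how TensorFlow computes gradients internally (that is done by automatic differentiation).

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
T = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float64)
W = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)
b = np.zeros(2)

def cost_fn(W, b):
    """Squared error summed over outputs, averaged over the batch."""
    Y = X @ W + b
    return np.mean(np.sum((Y - T) ** 2, axis=1))

N = X.shape[0]
residual = X @ W + b - T                   # shape (4, 2)
grad_W = (2.0 / N) * X.T @ residual        # shape (3, 2), same as W
grad_b = (2.0 / N) * residual.sum(axis=0)  # shape (2,), same as b

# Finite-difference check of the top-left entry of grad_W
eps = 1e-6
W_plus = W.copy()
W_plus[0, 0] += eps
numeric = (cost_fn(W_plus, b) - cost_fn(W, b)) / eps

print(cost_fn(W, b))
print(grad_W)
print(grad_b)
print(numeric)
```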

In [2]:
tf.reset_default_graph()
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradients = tf.gradients(cost, tf.trainable_variables())

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [3]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run([cost, gradients], feed_dict=feed_dict)
print("cost = {}".format(result[0]))
print("gradients = {}".format(result[1]))

cost = 601.875
gradients = [array([[ 141. ,  282. ],
       [ 160.5,  321. ],
       [ 180. ,  360. ]], dtype=float32), array([ 19.5,  39. ], dtype=float32)]


Now we have to update our parameters. Variables are special: their value does not depend on the input, and when we change it we want the new value to persist across subsequent session.run calls, so there is a special method to update them. According to gradient descent we want to perform __formula__
We can change the value of a variable using tf.assign. To change a single variable we would call tf.assign(var, new_value); since we want to update all the variables, let's use it like this instead:
[tf.assign(var, var - grad) for var, grad in zip(tf.trainable_variables(), gradients)]
Just copy it into your code; in the cell below the list is wrapped in tf.group so that a single update_op runs all the assignments.
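
For reference, the gradient descent step mentioned above is commonly written as follows. The symbol $\eta$ for the step size is an assumption of this sketch: it stands for whatever factor multiplies grad inside tf.assign, and equals 1 when no factor is written.

$$\theta \leftarrow \theta - \eta \, \nabla_\theta \, \text{cost}(\theta), \qquad \theta \in \{w, b\}$$

Each tf.assign call applies this step to one variable, using that variable's entry in the gradients list.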

In [9]:
tf.reset_default_graph()

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [11]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}
result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
print("cost = {}".format(result[0]))

cost = 63474308.0


Since we don't want to press enter every time, let's put this session.run cell into a for loop and run it for, say, 20 iterations, printing out the cost (you don't need to print the gradient).

In [12]:
tf.reset_default_graph()
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [13]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}

for i in range(20):
    result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 601.8750
cost = 63474308.0000
cost = 6694441582592.0000
cost = 706042320805429248.0000
cost = 74464124024010752131072.0000
cost = 7853504425729086972177154048.0000
cost = 828285176903947277824201001533440.0000
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = inf
cost = nan
cost = nan
cost = nan


Oh no, our cost is going up! Does anyone know why that is happening?
__lengthy discussion about learning rate here__
We need to use a smaller learning rate, so go ahead and multiply the gradient by a small number until you find one that makes the cost go to zero. (settle for 0.001)
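
The effect of the step size can be reproduced outside TensorFlow with a small NumPy sketch. It assumes the same data and initial parameters as the cells above and uses hand-derived gradients of the batch-averaged squared error; the function name run_gradient_descent is made up for this illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
T = np.array([[3, 6], [7.5, 15], [12, 24], [16.5, 33]], dtype=np.float64)

def run_gradient_descent(learning_rate, steps):
    """Return the cost measured before each of `steps` updates."""
    W = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)
    b = np.zeros(2)
    costs = []
    for _ in range(steps):
        residual = X @ W + b - T
        costs.append(np.mean(np.sum(residual ** 2, axis=1)))
        grad_W = (2.0 / len(X)) * X.T @ residual
        grad_b = (2.0 / len(X)) * residual.sum(axis=0)
        W -= learning_rate * grad_W  # step against the gradient
        b -= learning_rate * grad_b
    return costs

diverging = run_gradient_descent(learning_rate=1.0, steps=5)
converging = run_gradient_descent(learning_rate=0.001, steps=20)

print(["{:.4g}".format(c) for c in diverging])
print(["{:.4f}".format(c) for c in converging])
```

With a step of 1.0 every update overshoots the minimum by more than the previous error, so the cost grows at each iteration; with 0.001 the steps are small enough for the cost to shrink.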

In [14]:
tf.reset_default_graph()
def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = linear(x, w, b)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-target), axis=1))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 0.001 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [15]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[3,6], [7.5, 15], [12, 24], [16.5, 33]])
}

for i in range(20):
    result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 601.8750
cost = 273.6334
cost = 124.4133
cost = 56.5772
cost = 25.7386
cost = 11.7193
cost = 5.3459
cost = 2.4486
cost = 1.1314
cost = 0.5325
cost = 0.2602
cost = 0.1364
cost = 0.0801
cost = 0.0544
cost = 0.0427
cost = 0.0373
cost = 0.0348
cost = 0.0337
cost = 0.0331
cost = 0.0328


So there it is, you just implemented your own linear regression in TensorFlow! Pat yourselves on the back and let's get right into using this to build neural networks.

First of all, we need a better dataset: this one is not even real! So to start off, we are going to use the Iris petal dataset __describe dataset__
This is a classification problem, and right now what we are doing is regression, so before we start using our actual dataset let's take a moment to talk about classification and logistic regression.

__lengthy discussion of logistic regression and softmax__

Ok, now that we understand how logistic regression works, let's change our code to perform classification. First, change your target outputs to these __provide targets__ and create a negative log-likelihood cost function __as formula shown__.
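
Before wiring this into the graph, it may help to see softmax and the cost on a tiny example. The NumPy sketch below mirrors the cost expression used in the cell that follows, including the small 1e-7 term that keeps log away from zero. The logits and targets here are hypothetical values chosen for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum does not change the result but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def negloglikelihood_cost(out, target_out, eps=1e-7):
    # One cost value per batch item
    return -np.sum(target_out * np.log(out + eps)
                   + (1.0 - target_out) * np.log(1.0 - out + eps), axis=1)

logits = np.array([[2.0, 0.0], [0.0, 2.0]])   # hypothetical scores for two items
targets = np.array([[1.0, 0.0], [1.0, 0.0]])  # both items belong to class 0

probs = softmax(logits)
costs = negloglikelihood_cost(probs, targets)

print(probs)
print(costs)
```

The first item puts most of its probability on the correct class and gets a small cost; the second puts it on the wrong class and gets a larger one.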

In [77]:
tf.reset_default_graph()

def negloglikelihood_cost(out, target_out):
    return -tf.reduce_sum(target_out * tf.log(out + 1e-7) + (1.0 - target_out) * tf.log(1.0 - out + 1e-7), axis=1)

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32), name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = tf.nn.softmax(linear(x, w, b), 1)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(negloglikelihood_cost(y, target))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 0.001 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [78]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]]),
    target: np.array([[0,1], [0, 1], [1, 0], [1, 0]])
}

for i in range(20):
    result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192
cost = 16.1192


Ok, our cost is not going down at all: it stays at 16.1192 for all 20 iterations. This highlights a very important aspect of training neural networks: initialization and the scale of the values matter a lot!
__explain that weights should be small and inputs should be centered, or at least scaled__
So let's instead initialize the weight matrix with values 10 times smaller and also divide the input vectors by 12 (let's assume 12 is the maximum value), and see how things work now.
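
To make the effect of scale concrete, here is a NumPy sketch that evaluates the initial softmax outputs and cost for the raw setup and for the scaled one. It assumes the same data, targets and cost expression as the surrounding cells; the helper name mean_cost is made up for this illustration, and the bias is left out because it starts at zero.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cost(X, W, T, eps=1e-7):
    out = softmax(X @ W)
    per_item = -np.sum(T * np.log(out + eps) + (1.0 - T) * np.log(1.0 - out + eps), axis=1)
    return out, per_item.mean()

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=np.float64)
W = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.float64)
T = np.array([[0, 1], [0, 1], [1, 0], [1, 0]], dtype=np.float64)

out_raw, cost_raw = mean_cost(X, W, T)                      # logits as large as 66
out_scaled, cost_scaled = mean_cost(X / 12.0, W / 10.0, T)  # logits below 1

print(np.round(out_raw, 4), cost_raw)
print(np.round(out_scaled, 4), cost_scaled)
```

With the raw values the softmax outputs are pinned at almost exactly 0 and 1, where the function is flat, so the gradients are tiny and a small step barely moves the parameters. After scaling, the outputs start near 0.5 and the cost can respond to updates.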

In [79]:
tf.reset_default_graph()

def negloglikelihood_cost(out, target_out):
    return -tf.reduce_sum(target_out * tf.log(out + 1e-7) + (1.0 - target_out) * tf.log(1.0 - out + 1e-7), axis=1)

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32)/10, name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = tf.nn.softmax(linear(x, w, b), 1)
target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(negloglikelihood_cost(y, target))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 0.001 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [80]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])/12.,
    target: np.array([[0,1], [0, 1], [1, 0], [1, 0]])
}

for i in range(20):
    result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 1.4696
cost = 1.4691
cost = 1.4685
cost = 1.4679
cost = 1.4674
cost = 1.4668
cost = 1.4662
cost = 1.4657
cost = 1.4651
cost = 1.4645
cost = 1.4640
cost = 1.4634
cost = 1.4629
cost = 1.4623
cost = 1.4618
cost = 1.4612
cost = 1.4607
cost = 1.4601
cost = 1.4596
cost = 1.4590


Ok, now the cost is much lower, even before we start training, and that's great! However, it is still going down rather slowly. Any idea why that might be? __audience input__ Yes, now our learning rate is too small! As you see, the learning rate has to be balanced against the typical magnitude of our weights and outputs, and this sometimes takes some magic to set, although there are smarter methods to set it, as we will see later.
Set your learning rate to 1.0 and run again.

In [81]:
tf.reset_default_graph()

def negloglikelihood_cost(out, target_out):
    return -tf.reduce_sum(target_out * tf.log(out + 1e-7) + (1.0 - target_out) * tf.log(1.0 - out + 1e-7), axis=1)

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32)/10, name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = tf.nn.softmax(linear(x, w, b), 1)
target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(negloglikelihood_cost(y, target))
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 1.0 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [82]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])/12.,
    target: np.array([[0,1], [0, 1], [0, 1], [1, 0]])
}

for i in range(20):
    result = session.run([cost, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}".format(result[0]))

cost = 1.3696
cost = 1.2155
cost = 1.0832
cost = 0.9888
cost = 0.9106
cost = 0.8478
cost = 0.7952
cost = 0.7510
cost = 0.7133
cost = 0.6806
cost = 0.6521
cost = 0.6268
cost = 0.6043
cost = 0.5839
cost = 0.5655
cost = 0.5487
cost = 0.5333
cost = 0.5191
cost = 0.5059
cost = 0.4937


Great, now our cost is decreasing steadily. 
Let's just add another metric to measure the progress of our training. Since now we have classes, we can calculate the accuracy, which is easier to interpret than that weird negative loglikelihood quantity.
Just copy the following function to your code, and put it in the fetch list alongside the cost and the rest (and print it).
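
The idea behind the accuracy metric can be sketched in NumPy: take the most probable class for each item, compare it with the class marked in the one-hot target, and report the percentage of matches. The predictions and targets below are hypothetical values for illustration.

```python
import numpy as np

def accuracy(out, target_out):
    predicted = np.argmax(out, axis=1)      # most probable class per item
    actual = np.argmax(target_out, axis=1)  # position of the 1 in each one-hot target
    return np.mean(predicted == actual) * 100

out = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]])  # hypothetical predictions
target_out = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])            # hypothetical one-hot targets

print(accuracy(out, target_out))
```

Three of the four predictions match their targets here, so the sketch reports 75.0.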

In [83]:
tf.reset_default_graph()

def negloglikelihood_cost(out, target_out):
    return -tf.reduce_sum(target_out * tf.log(out + 1e-7) + (1.0 - target_out) * tf.log(1.0 - out + 1e-7), axis=1)

def accuracy(out, target_out):
    correct = tf.count_nonzero(tf.cast(tf.equal(tf.argmax(out, axis=1), tf.argmax(target_out, axis=1)), tf.float32))
    total = out.get_shape()[0]
    return correct/total * 100

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32)/10, name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = tf.nn.softmax(linear(x, w, b), 1)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(negloglikelihood_cost(y, target))
acc = accuracy(y, target)
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 1.0 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [84]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])/12.,
    target: np.array([[0,1], [0, 1], [1, 0], [1, 0]])
}

for i in range(20):
    result = session.run([cost, acc, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}, acc = {}%".format(result[0], result[1]))

cost = 1.4696, acc = 50.0%
cost = 1.2767, acc = 50.0%
cost = 1.0922, acc = 100.0%
cost = 0.9705, acc = 75.0%
cost = 0.8672, acc = 100.0%
cost = 0.7912, acc = 75.0%
cost = 0.7302, acc = 100.0%
cost = 0.6809, acc = 100.0%
cost = 0.6395, acc = 100.0%
cost = 0.6042, acc = 100.0%
cost = 0.5735, acc = 100.0%
cost = 0.5465, acc = 100.0%
cost = 0.5227, acc = 100.0%
cost = 0.5014, acc = 100.0%
cost = 0.4823, acc = 100.0%
cost = 0.4650, acc = 100.0%
cost = 0.4492, acc = 100.0%
cost = 0.4348, acc = 100.0%
cost = 0.4216, acc = 100.0%
cost = 0.4094, acc = 100.0%


Now feel free to take a few minutes to run more iterations and see that the cost goes down, and to mess with the learning rate to see how that affects the speed of convergence. __wait 5 minutes and ask for questions from the audience__

Ok, now let's make a tiny change to our dataset: swap the second and third targets, so that the targets become [0,1], [1,0], [0,1], [1,0].

In [83]:
tf.reset_default_graph()

def negloglikelihood_cost(out, target_out):
    return -tf.reduce_sum(target_out * tf.log(out + 1e-7) + (1.0 - target_out) * tf.log(1.0 - out + 1e-7), axis=1)

def accuracy(out, target_out):
    correct = tf.count_nonzero(tf.cast(tf.equal(tf.argmax(out, axis=1), tf.argmax(target_out, axis=1)), tf.float32))
    total = out.get_shape()[0]
    return correct/total * 100

def linear(inp, weights, bias):
    return tf.matmul(inp, weights) + bias

x = tf.placeholder(tf.float32, shape=[4, 3], name="x")
w = tf.Variable(np.array([[1,2],[1,2],[1,2]], dtype=np.float32)/10, name="w")
b = tf.Variable(np.array([0,0], dtype=np.float32), name="b")
y = tf.nn.softmax(linear(x, w, b), 1)

target = tf.placeholder(tf.float32, shape=[4, 2], name="target")

cost = tf.reduce_mean(negloglikelihood_cost(y, target))
acc = accuracy(y, target)
gradients = tf.gradients(cost, tf.trainable_variables())

update_op = tf.group(*[tf.assign(var, var - 1.0 * grad) for var, grad in zip(tf.trainable_variables(), gradients)])

session = tf.Session()
session.run(w.initializer)
session.run(b.initializer)

In [86]:
feed_dict = {
    x: np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])/12.,
    target: np.array([[0,1], [1, 0], [0, 1], [1, 0]])
}

for i in range(100):
    result = session.run([cost, acc, gradients, update_op], feed_dict=feed_dict)
    print("cost = {:.4f}, acc = {}%".format(result[0], result[1]))

cost = 1.1840, acc = 50.0%
cost = 1.1826, acc = 50.0%
cost = 1.1814, acc = 50.0%
cost = 1.1803, acc = 50.0%
cost = 1.1794, acc = 50.0%
cost = 1.1786, acc = 50.0%
cost = 1.1779, acc = 50.0%
cost = 1.1773, acc = 50.0%
cost = 1.1768, acc = 50.0%
cost = 1.1763, acc = 50.0%
cost = 1.1759, acc = 50.0%
cost = 1.1756, acc = 50.0%
cost = 1.1753, acc = 50.0%
cost = 1.1751, acc = 50.0%
cost = 1.1749, acc = 50.0%
cost = 1.1747, acc = 50.0%
cost = 1.1746, acc = 50.0%
cost = 1.1744, acc = 50.0%
cost = 1.1743, acc = 50.0%
cost = 1.1743, acc = 50.0%
cost = 1.1742, acc = 50.0%
cost = 1.1741, acc = 50.0%
cost = 1.1741, acc = 50.0%
cost = 1.1740, acc = 50.0%
cost = 1.1740, acc = 50.0%
cost = 1.1739, acc = 50.0%
cost = 1.1739, acc = 50.0%
cost = 1.1739, acc = 50.0%
cost = 1.1739, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
cost = 1.1738, acc = 50.0%
c