Hello, TensorFlow! 
 Building and training your first TensorFlow graph from the ground up

Oriole Online Tutorial by Aaron Schumacher, September 8, 2016

https://www.safaribooksonline.com/oriole/
       

Google open-sourced the TensorFlow package in November 2015.

Tensors: multi-dimensional arrays (generalizations of matrices) are the basis of the numerical computation; the focus is on weights rather than on neurons

Flows: computation is managed as a graph of nodes and edges; data moves through the graph along the edges and computation happens at the nodes

Easy to put computation onto high-performance GPUs (parallel processing, distributed computation)

TensorBoard: system for logging and visualizing what you are doing; statistics are easy to get with built-in tools (less manual work than in scikit-learn)

High-performance serving infrastructure to serve TensorFlow models

In [1]:
# Plain Python first: how names relate to the objects they point to
foo = []
bar = foo
# These are not two different lists; two names point to the same list.
# An object in Python has no name of its own.
# Variable names in Python code aren't the objects they represent;
# a name is separate from the object and simply points to it.

print('`foo == bar` evaluates to: {}'.format(foo == bar))
print('`foo is bar` evaluates to: {}'.format(foo is bar))

`foo == bar` evaluates to: True
`foo is bar` evaluates to: True


In [3]:
# You can also see that id(foo) and id(bar) are the same.
print('`id(foo)` evaluates to: {}'.format(id(foo)))
print('`id(bar)` evaluates to: {}'.format(id(bar)))

# This identity, especially with mutable data structures like lists, 
# can lead to surprising bugs when it's misunderstood.

`id(foo)` evaluates to: 4367939528
`id(bar)` evaluates to: 4367939528
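
As a small illustration of the kind of surprise mentioned above, here is a plain-Python sketch. The names `original`, `alias`, and `independent` are made up for this example and do not appear in the tutorial.

```python
# Aliasing: two names, one list.
original = [1, 2, 3]
alias = original          # no copy is made; both names point to one object
alias.append(4)           # mutating through one name...
print(original)           # ...is visible through the other name

# An explicit copy creates a second, independent list.
independent = list(original)
independent.append(5)
print(original)           # not affected by the change to the copy
print(independent)

print(alias is original, independent is original)
```

If the intent was to keep `original` untouched, the bug is the missing copy, not the `append` call.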


In [2]:
# Take a list and append it to itself.
# How should Python display this? It is an infinitely recursive data structure
# that keeps pointing to itself, so Python abbreviates the repeated part.
# Imagine this as a graph. 

foo.append(bar)
print('Now Python displays `foo` as: {}'.format(foo))

# It keeps repeating and can't show you the whole thing

Now Python displays `foo` as: [[...]]


TensorFlow separates the definition of computations from their execution 
even further by having them happen in separate places: a graph defines the
operations, but the operations only happen within a session. Graphs and 
sessions are created independently. A graph is like a blueprint and a 
session is like a construction site.

Back to our plain Python example, recall that foo and bar refer to the same list. By appending bar into foo, we've put a list inside itself. You could think of this structure as a graph with one node, pointing to itself. Nesting lists is one way to represent a graph structure like a TensorFlow computation graph.
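
One way to check the "one node pointing to itself" picture is to follow the edge and confirm that it always leads back to the same object. The sketch below uses made-up names (`node`, `a`, `b`) and only plain Python lists; it is not how TensorFlow stores its graphs.

```python
node = []
node.append(node)  # the list now contains itself

# Following the single edge always leads back to the same object.
print(node[0] is node)
print(node[0][0][0] is node)
print(len(node))   # still just one element

# A two-node cycle built the same way: each list holds a reference to the other.
a = []
b = [a]
a.append(b)
print(a[0] is b, b[0] is a)
```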

In [4]:
import tensorflow as tf

At this point TensorFlow has already started managing a lot of state for 
us. There's already an implicit default graph, for example. Internally, 
the default graph lives in the _default_graph_stack, but we don't have 
access to that directly. We use tf.get_default_graph().

In [5]:
# Assign name so that we can refer to it.
graph = tf.get_default_graph()

The nodes of the TensorFlow graph are called “operations” or “ops”. We can 
see what operations are in the graph with graph.get_operations().

In [6]:
graph.get_operations()
# Nothing has happened because we did not put anything into the graph.

[]

Currently, there isn't anything in the graph. We’ll need to put everything 
we want TensorFlow to compute into that graph. Let's start with a simple 
constant input value of one.

In [7]:
# Assign name so that we can refer to it.
input_value = tf.constant(1.0)

In [8]:
operations = graph.get_operations()
print('`operations` is now: {}'.format(operations))

`operations` is now: [<tensorflow.python.framework.ops.Operation object at 0x10ffdae80>]


TensorFlow uses protocol buffers internally. (Protocol buffers are sort of 
like a Google-strength JSON.) Printing the node_def for the constant operation above shows what's in TensorFlow's protocol buffer representation for the number one.

In [9]:
operations[0].node_def
# 32-bit float data type (DT_FLOAT), shown in protocol buffer text format (JSON-like, but not JSON)

name: "Const"
op: "Const"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "value"
  value {
    tensor {
      dtype: DT_FLOAT
      tensor_shape {
      }
      float_val: 1.0
    }
  }
}

People new to TensorFlow sometimes wonder why there's all this fuss about 
making “TensorFlow versions” of things. Why can't we just use a normal 
Python variable without also defining a TensorFlow object? One of the 
TensorFlow tutorials has an explanation:

To do efficient numerical computing in Python, we typically use libraries 
like NumPy that do expensive operations such as matrix multiplication 
outside Python, using highly efficient code implemented in another 
language. Unfortunately, there can still be a lot of overhead from 
switching back to Python every operation. This overhead is especially bad 
if you want to run computations on GPUs or in a distributed manner, where 
there can be a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things 
a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.

TensorFlow can do a lot of great things, but it can only work with what's 
been explicitly given to it. This is true even for a single constant.

In [10]:
# If we inspect our input_value, we see it is a constant 32-bit float tensor 
# of no dimension: just one number.

input_value

# It doesn't tell you the value because you did not ask it to evaluate it

<tf.Tensor 'Const:0' shape=() dtype=float32>

Note that this doesn't tell us what that number is. To evaluate input_value and get a numerical value out, we need to create a “session” where graph operations can be evaluated and then explicitly ask to evaluate or “run” input_value. (The session picks up the default graph by default.)

In [11]:
# A session is a place where computation can happen
# Graph is like a blueprint
sess = tf.Session()  # session is like the construction site where you put things together
print('`input_value` evaluates to: {}'.format(sess.run(input_value)))

`input_value` evaluates to: 1.0


TensorFlow is managing its own space of things—the computational graph—
and it has its own method of evaluation.
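
To make the split between defining a computation and evaluating it more concrete, here is a toy sketch in plain Python. The classes `Constant` and `Multiply` and the function `run` are hypothetical names invented for this illustration; they are not TensorFlow APIs, and everything here runs inside Python rather than outside it.

```python
# Toy illustration only: none of these names come from TensorFlow.

class Constant:
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Multiply:
    def __init__(self, left, right):
        # Store references to the input nodes; nothing is computed yet.
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() * self.right.evaluate()


def run(node):
    """Evaluate a node on demand, loosely like asking a session to run it."""
    return node.evaluate()


a = Constant(0.8)
b = Constant(1.0)
product = Multiply(a, b)   # definition only: the "blueprint"

result = run(product)      # execution happens here: the "construction site"
print(result)
```

Printing `product` itself would show only an object, much as inspecting a tensor shows its description and not its value.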

Now that we have a session with a simple graph, let's build a neuron with 
just one parameter, or weight. Often, even simple neurons also have a bias 
term and a non-identity activation function, but we'll leave these out.

The neuron's weight isn't going to be constant; we expect it to change in 
order to learn, based on the “true” input and output we use for training. 
The weight will be a TensorFlow variable. We'll give that variable a 
starting value of 0.8.

In [12]:
# The Simplest TensorFlow Neuron
weight = tf.Variable(0.8)

You might expect that adding a variable would add one operation to the 
graph, but in fact that one line adds four operations.

In [13]:
for op in graph.get_operations(): 
    print(op.name)

Const
Variable/initial_value
Variable
Variable/Assign
Variable/read


In [14]:
output_value = weight * input_value

Now there are six operations in the graph, and the last one is that multiplication.

In [15]:
op = graph.get_operations()[-1]
print('The `op.name` is: {}'.format(op.name))

The `op.name` is: mul


In [16]:
for op_input in op.inputs: 
    print(op_input)

Tensor("Variable/read:0", shape=(), dtype=float32)
Tensor("Const:0", shape=(), dtype=float32)


This shows how the multiplication operation tracks where its inputs come 
from: they come from other operations in the graph. To understand a whole 
graph, following references this way quickly becomes tedious for humans. 
TensorBoard graph visualization is designed to help.

How do we find out what the product is? We have to “run” the output_value 
operation. But that operation depends on a variable: weight. We told TensorFlow that the initial value of weight should be 0.8, but the value hasn't yet been set in the current session. The tf.initialize_all_variables() function generates an operation which will initialize all our variables (in this case just one) and then we can run that operation.

In [17]:
init = tf.initialize_all_variables()
sess.run(init)

The result of tf.initialize_all_variables() will include initializers for 
all the variables currently in the graph, so if you add more variables 
you'll want to use tf.initialize_all_variables() again; a stale init 
wouldn't include the new variables. Calling tf.initialize_all_variables()
many times won't hurt anything in the examples here, but it does add to 
the graph each time it is called.

In [18]:
# Now we're ready to run the output_value operation.
sess.run(output_value)

# Why isn't it exactly 0.8? Where do the extra digits come from?
# TensorFlow cares about types.
# Recall that this is 0.8 * 1.0 with 32-bit floats, and 32-bit floats can't represent 0.8 exactly;
# 0.80000001 is as close as they can get.

0.80000001
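
The rounding can be reproduced without TensorFlow. This sketch uses Python's standard `struct` module to squeeze 0.8 into 32 bits and read it back; the variable name `as_float32` is made up for the example.

```python
import struct

# Round-trip 0.8 through a 32-bit float, then read the result back
# as an ordinary Python float (which is 64-bit).
as_float32 = struct.unpack('f', struct.pack('f', 0.8))[0]

print(as_float32)                     # slightly above 0.8
print('{:.8f}'.format(as_float32))    # the same value shown with 8 decimal places
print(as_float32 == 0.8)              # the 32-bit value is not the 64-bit 0.8
```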

In [None]:
# See your graph in TensorBoard


When using the default graph, you may want to clear it out from time to 
time rather than just keep adding more and more things to it. Here we 
reset the graph and start a new session using that new empty graph.


In [19]:
tf.reset_default_graph()
sess = tf.Session()

Up to this point, the graph has been simple, but it would already be nice 
to see it represented in a diagram. We'll use TensorBoard to generate that 
diagram. TensorBoard reads the name field that is stored inside each 
operation (quite distinct from Python variable names). We can use these 
TensorFlow names and switch to more conventional Python variable names. 
Using tf.mul here is equivalent to our earlier use of just * for 
multiplication, but it lets us set the name for the operation.

In [21]:
x = tf.constant(1.0, name='input')
w = tf.Variable(0.8, name='weight')
y = tf.mul(w, x, name='output')

TensorBoard works by looking at a directory of output created from TensorFlow sessions. Since TensorFlow will keep appending to an existing directory of output, which can lead to confusion in interpreting results, you'll usually want to ensure you're starting from an empty directory. If you're running locally, you may want to run rm -rf log_simple_graph at a shell prompt (not in the Python interpreter) or in a Jupyter notebook with a leading exclamation point, like !rm -rf log_simple_graph. In this Oriole, we'll just show the relevant visualizations from TensorBoard.

The log output for TensorBoard can be written with a SummaryWriter, and if 
we do nothing aside from creating one with a graph, it will just write out 
that graph.

In [23]:
# The first argument when creating the SummaryWriter is an output directory name, 
# which will be created if it doesn't exist.
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)

If you're running locally, you can start up TensorBoard at the command line by running tensorboard --logdir=log_simple_graph. This runs a local web app on port 6006. (“6006” is “goog” upside-down.) After startup, open http://localhost:6006/#graphs in a web browser to see the interface. You'll see a diagram of the graph you created in TensorFlow, as in Figure 3.

Now that we’ve built our neuron, how does it learn? We set up an input value of 1.0. Let's say the correct output value is zero. That is, we have a very simple “training set” of just one example with one feature, which has the value one, and one label, which is zero. We want the neuron to learn the function taking one to zero.

Currently, the system takes the input one and returns 0.8, which is not 
correct. We need a way to measure how wrong the system is. We'll call 
that measure of wrongness the “loss” and give our system the goal of 
minimizing the loss. If the loss can be negative, then minimizing it 
could be silly, so let's make the loss the square of the difference 
between the current output and the desired output.
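
A quick plain-Python sketch of why the square helps. The function names `signed_difference` and `squared_loss` are made up for this illustration. With a signed difference, "minimizing" would reward outputs that are as negative as possible; the squared version is smallest exactly when the output matches the target.

```python
def signed_difference(output, target):
    return output - target

def squared_loss(output, target):
    return (output - target) ** 2

target = 0.0
for output in [0.8, 0.0, -0.8, -100.0]:
    print(output,
          signed_difference(output, target),
          squared_loss(output, target))
```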

In [24]:
y_ = tf.constant(0.0)
loss = (y - y_)**2  # squared loss

So far, nothing in the graph does any learning. For that, we need an 
optimizer. We'll use a gradient descent optimizer so that we can update 
the weight based on the derivative of the loss. The optimizer takes a 
learning rate to moderate the size of the updates, which we'll set at 
0.025.

In [25]:
optim = tf.train.GradientDescentOptimizer(learning_rate=0.025)

The optimizer is remarkably clever. It can automatically work out and apply the appropriate gradients through a whole network, carrying out the backward step for learning.

Let's see what the gradient looks like for our simple example.

In [26]:
grads_and_vars = optim.compute_gradients(loss)
sess.run(tf.initialize_all_variables())
print('The gradient is: {}'.format(sess.run(grads_and_vars[0][0])))

The gradient is: 1.600000023841858


Why is the value of the gradient 1.6? Our loss is error squared, and the 
derivative of that is two times the error. Currently the system says 0.8 
instead of 0, so the error is 0.8, and two times 0.8 is 1.6. It's working!
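
The same number can be checked by hand in plain Python. This sketch assumes the neuron is y = w * x with x = 1.0 and a target of 0.0, as in the text; the helper names `loss_for_weight` and `numerical_gradient` are made up for the example. It compares the chain-rule result with a finite-difference estimate.

```python
def loss_for_weight(w, x=1.0, target=0.0):
    """Squared loss of the one-weight neuron y = w * x."""
    return (w * x - target) ** 2

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.8
analytic = 2 * (w * 1.0 - 0.0) * 1.0   # chain rule: 2 * error * x
estimate = numerical_gradient(loss_for_weight, w)
print(analytic, estimate)              # both are close to 1.6
```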

For more complex systems, it will be very nice indeed that TensorFlow 
calculates and then applies these gradients for us automatically.

Let's apply the gradient, finishing the backpropagation.

In [27]:
sess.run(optim.apply_gradients(grads_and_vars))
print('The new value of the weight is: {}'.format(sess.run(w)))

The new value of the weight is: 0.7599999904632568


The weight decreased by 0.04 because the optimizer subtracted the gradient 
times the learning rate, 1.6 * 0.025, pushing the weight in the right 
direction.
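
Written out as plain arithmetic, the single update looks like this. It is a sketch of the standard gradient descent rule with the values from the text, not a view into the optimizer's internals.

```python
learning_rate = 0.025
weight = 0.8

gradient = 2 * (weight * 1.0 - 0.0) * 1.0   # 1.6, matching the gradient printed earlier
update = learning_rate * gradient            # about 0.04
new_weight = weight - update                 # about 0.76
print(update, new_weight)
```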

Instead of hand-holding the optimizer like this, we can make one operation 
that calculates and applies the gradients: the train_step.

In [28]:
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)

for i in range(100):
    sess.run(train_step)

print('The neuron now outputs: {}'.format(sess.run(y)))

The neuron now outputs: 0.004499601200222969


After running the training step many times, the weight and the output value 
are now very close to zero. The neuron has learned!
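
The whole training run can be mimicked in plain Python as a sanity check. The sketch assumes 101 updates in total (the one manual update applied earlier plus the 100 runs of the training step) and uses 64-bit floats, so the digits differ slightly from TensorFlow's 32-bit results. The names `history` and `closed_form` are made up for the example.

```python
learning_rate = 0.025
x, target = 1.0, 0.0

w = 0.8
history = []
for step in range(101):       # 1 manual update plus 100 training steps
    history.append(w * x)     # the neuron's output before this update
    gradient = 2 * (w * x - target) * x
    w -= learning_rate * gradient

print(history[:4])            # close to 0.8, 0.76, 0.722, 0.6859
print(w * x)                  # close to 0.0045

# With x = 1 and target = 0, each update multiplies the weight
# by 1 - 2 * 0.025 = 0.95, so the result has a closed form.
closed_form = 0.8 * 0.95 ** 101
print(closed_form)
```

The constant factor of 0.95 per step is why the output shrinks smoothly toward zero without ever overshooting it.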

TRAINING DIAGNOSTICS IN TENSORBOARD

We may be interested in what's happening during training. Say we want to 
follow what our system is predicting at every training step. We could 
print from inside the training loop.

In [29]:
sess.run(tf.initialize_all_variables())

for i in range(100):
    print('before step {}, y is {}'.format(i, sess.run(y)))
    sess.run(train_step)

before step 0, y is 0.800000011920929
before step 1, y is 0.7599999904632568
before step 2, y is 0.722000002861023
before step 3, y is 0.6858999729156494
before step 4, y is 0.651604950428009
before step 5, y is 0.6190246939659119
before step 6, y is 0.5880734324455261
before step 7, y is 0.5586697459220886
before step 8, y is 0.5307362675666809
before step 9, y is 0.5041994452476501
before step 10, y is 0.47898948192596436
before step 11, y is 0.45504000782966614
before step 12, y is 0.4322880208492279
before step 13, y is 0.4106736183166504
before step 14, y is 0.39013993740081787
before step 15, y is 0.37063294649124146
before step 16, y is 0.35210129618644714
before step 17, y is 0.33449622988700867
before step 18, y is 0.31777140498161316
before step 19, y is 0.3018828332424164
before step 20, y is 0.2867887020111084
before step 21, y is 0.272449254989624
before step 22, y is 0.2588267922401428
before step 23, y is 0.2458854466676712
before step 24, y is 0.23359116911888123
...

This works, but there are some problems. It's hard to understand a list of 
numbers; a plot would be better. Even with only one value to monitor, 
there's too much output to read, and we're likely to want to monitor many 
things. It would be nice to record everything in some organized way.

Luckily, the same system that we used earlier to visualize the graph also 
has just the mechanisms we need.

We instrument the computation graph by adding operations that summarize 
its state. Here, we'll create an operation that reports the current value 
of y, the neuron's current output.

In [30]:
summary_y = tf.scalar_summary('output', y)

When you run a summary operation, it returns a string of protocol buffer 
text that can be written to a log directory with a SummaryWriter.

Again, if running locally you'll want to ensure that you start with an 
empty output directory by running rm -rf log_simple_stat.

In [31]:
summary_writer = tf.train.SummaryWriter('log_simple_stat')

sess.run(tf.initialize_all_variables())

for i in range(100):
    summary_str = sess.run(summary_y)
    summary_writer.add_summary(summary_str, i)
    sess.run(train_step)

Now TensorBoard can visualize the logged results. If running locally, 
you would run tensorboard --logdir=log_simple_stat and then go to 
http://localhost:6006/#events to see the interface. 
       

FLOWING ONWARD

To freshen things up once again, we can get a new, empty default graph.

In [32]:
tf.reset_default_graph()

If running locally, this would also be a good time to clear out the output 
directory: rm -rf log_simple_stats.

Here's a final version of the code. It's fairly minimal, with every part 
showing useful (and understandable) TensorFlow functionality.

In [33]:
import tensorflow as tf

x = tf.constant(1.0, name='input')
w = tf.Variable(0.8, name='weight')
y = tf.mul(w, x, name='output')
y_ = tf.constant(0.0, name='correct_value')
loss = tf.pow(y - y_, 2, name='loss')
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)

for value in [x, w, y, y_, loss]:
    tf.scalar_summary(value.op.name, value)

summaries = tf.merge_all_summaries()

sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_stats', sess.graph)

sess.run(tf.initialize_all_variables())
for i in range(100):
    summary_writer.add_summary(sess.run(summaries), i)
    sess.run(train_step)

Running locally, you could start up TensorBoard with 
tensorboard --logdir=log_simple_stats and then go to 
http://localhost:6006/#events to see the interface. You would see plots 
for all five instrumented scalars.

The example we just ran through is even simpler than the ones that 
inspired it in Michael Nielsen's Neural Networks and Deep Learning. 
For myself, seeing details like these helps with understanding and 
building more complex systems that use and extend from simple building 
blocks. Part of the beauty of TensorFlow is how flexibly you can build 
complex systems from simpler components.