
<h1>Why TensorFlow?</h1>

<h2>Python's limitations</h2>
<p style="font-size:20px">Python is what's known as a dynamically typed programming language. This means that while your program is running, it stores in memory (alongside each variable) information about a variable's type. When you run x + y in Python, for example, Python's runtime looks up x's type, y's type, then figures out how to add those two things together, or throws an error if it doesn't know how. (For example, try adding a list and a string.) This is called a runtime type check.
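
<p style="font-size:20px">As a small illustration of the runtime type check described above, the sketch below uses only built-in types. The variable name failed is introduced here purely for the example.

```python
# Python decides how to perform "+" only when the line actually runs,
# by inspecting the types of both operands.
print(1 + 2)            # int + int: numeric addition
print("ab" + "cd")      # str + str: concatenation
print([1, 2] + [3])     # list + list: concatenation

# A list and a string have no shared "+" behaviour, so the runtime
# type check fails and Python raises TypeError.
try:
    [1, 2] + "three"
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)
```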

<p style="font-size:20px">This automatic bookkeeping makes Python a pleasure to use, but also leads to inefficiencies. If we store a long list of numbers, for instance, we must allocate memory not just for the data itself but for each number's metadata (type information). If we then want to sum the list with a loop, we need to do a type check for every addition we perform. This overhead makes pure Python too slow and too memory-hungry for processing large datasets.
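
<p style="font-size:20px">The per-object overhead can be made visible with sys.getsizeof. This is a rough sketch: the exact byte counts depend on the Python build, and the 8-byte figure assumes the raw number would be stored as a 64-bit integer.

```python
import sys

# Size in bytes of one Python int object, including its type pointer
# and reference count (the exact figure depends on the interpreter build).
int_object_size = sys.getsizeof(12345)

# Assumption for comparison: the raw value stored as a 64-bit integer.
raw_value_size = 8

print("bytes per Python int object:", int_object_size)
print("bytes needed for the raw 64-bit value:", raw_value_size)

numbers = list(range(1000))
# The list itself stores one reference per element; the int objects
# live elsewhere in memory and are counted separately here.
list_size = sys.getsizeof(numbers)
total = list_size + sum(sys.getsizeof(n) for n in numbers)
print("approximate bytes for a 1000-element list of ints:", total)
```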

<h2>Numpy for fast arithmetic</h2>
<p style="font-size:20px">That's where numpy comes in. In pure Python, a list is a Python object that holds other Python objects, each with its own metadata. The numpy package, on the other hand, exposes a new kind of list, called a numpy array, that holds values all of the same type. Since a numpy array must hold values only of one type, we can store the metadata once for the whole array, instead of separately for each element.

<p style="font-size:20px">Furthermore, since numpy array elements are all of one type, they are all guaranteed to be the same size in memory, which allows us to store them more compactly and access them more quickly. (In pure Python, if you stored all elements "next to each other" in memory, it would be costly to calculate, say, where in memory the 100th item started, as this would depend on the size of every previous object. So Python actually stores the elements wherever there is room in memory, and the list keeps an "index" of the memory location of each element. This means that to sum 100 elements, Python needs to look in the index 100 times and follow each reference to a different place in RAM to find the numbers you want to add. Numpy just stores the 100 items in a row, and since they're all the same size, the location of the 5th or 100th or 1000th item can be calculated pretty much instantly.)
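
<p style="font-size:20px">A minimal sketch of this fixed-size layout, assuming numpy is installed. The choice of int64 and the index 100 are arbitrary; the point is that an element's byte offset is plain arithmetic.

```python
import numpy as np

arr = np.arange(1000, dtype=np.int64)

print(arr.dtype)      # one type description shared by every element
print(arr.itemsize)   # bytes per element
print(arr.nbytes)     # bytes used by all the data
print(arr.strides)    # bytes to step over to reach the next element

# Because every element has the same size, the byte offset of element i
# is simply i * itemsize; no per-element lookup table is needed.
i = 100
offset = i * arr.itemsize
value = np.frombuffer(arr.tobytes(), dtype=np.int64, count=1, offset=offset)[0]
print(value == arr[i])
```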

<p style="font-size:20px">In addition to this compact array type, numpy provides a number of operations, implemented in C, that manipulate these arrays, taking advantage of their compact representation. Arrays can be multidimensional, so when we talk about operations on arrays, that includes what you might think of as matrix operations (like fast matrix multiplication) too.
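
<p style="font-size:20px">The following sketch contrasts an interpreted loop with a single numpy call, and shows one matrix operation. The array contents are made up for illustration.

```python
import numpy as np

data = np.arange(1, 101, dtype=np.float64)    # 1.0, 2.0, ..., 100.0

# Loop style: one interpreted addition (with its type checks) per element.
loop_total = 0.0
for value in data:
    loop_total += value

# numpy style: a single call; the loop runs inside compiled code.
fast_total = data.sum()

# Multidimensional arrays support matrix operations as well.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
product = a @ b     # matrix multiplication; this b swaps the columns of a

print(loop_total, fast_total)
print(product)
```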

<p style="font-size:20px">numpy is wonderful, and enables Python programmers to work efficiently with vast amounts of data. It is the foundation of higher-level packages like pandas and scipy. But numpy's design choices make certain tradeoffs; tensorflow makes a different set of choices and accordingly has a different set of tradeoffs.

<h2>Downsides of numpy</h2>
<p style="font-size:20px">Even though single numpy operations are blazing fast, composing numpy operations can be slower, because between operations control goes back and forth between Python's interpreter and numpy's C code. Inspecting numpy values (with print, say, or by logging them to files) also incurs a cost, because the numpy value must be converted to a corresponding pure Python type that can interact with other Python code.

<p style="font-size:20px">The second downside of numpy really applies only to deep learning applications. The classic algorithm for training a deep model is (stochastic) gradient descent using backpropagation, which requires taking lots of derivatives. Before TensorFlow and other similar libraries, programmers manually (i.e., using pen and paper) did the calculus, deriving the symbolic gradient of the function to be minimized, then writing special code to take partial derivatives at an arbitrary input point. This is mechanical work that a computer should be able to do automatically. But numpy's structure does not provide an easy way of computing these derivatives automatically. Why? Automatically computing the derivative of some formula requires having some representation of that formula in memory. But when you run numpy operations, they simply execute and return their results; no trace is left of the steps used to get from first input to final output. There is no easy way to go back and compute derivatives later on in a program.
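
<p style="font-size:20px">To make the "pen and paper" workflow concrete, here is a sketch with a one-parameter model. The functions loss and loss_gradient and the data are hypothetical; the gradient formula was derived by hand, and numpy can only help to check it numerically.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def loss(w):
    """Squared error of the one-parameter model y ~ w * x."""
    return np.sum((w * x - y) ** 2)

def loss_gradient(w):
    """d(loss)/dw, derived by hand: sum of 2 * (w*x - y) * x."""
    return np.sum(2.0 * (w * x - y) * x)

w = 0.5
analytic = loss_gradient(w)

# numpy keeps no record of how loss(w) was computed, so the only generic
# check available is a finite-difference approximation.
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(analytic, numeric)
```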

<h2>TensorFlow</h2>
<p style="font-size:20px">TensorFlow solves both these problems with the idea of a computation graph. Unlike in numpy, where data goes back and forth between Python and C for each operation, in TensorFlow, the programmer creates, in pure Python, an object-oriented representation of the entire computation she wishes to perform. This representation is called a "graph," in the graph theory sense: the nodes in the graph are operations, and the edges represent the data flowing from one operation to the next. Building the graph is like writing down a formula. No data is actually being processed. As such, all of TensorFlow's graph-building functions are lightweight and fast, simply creating a description of computation in memory. Once the graph is complete, it is sent to TensorFlow's low-level execution engine. That engine, written (like much of numpy) in C and C++, performs all the requested operations at once, then returns any values of interest (as specified by the user) back to the Python world.

<p style="font-size:20px">Because an entire graph of computation is processed at once, there is much less shuttling back and forth between one representation and another. And because the computation graph is essentially an in-memory record of every step used to compute each value in your program, TensorFlow is able to do the necessary calculus automatically, computing gradients based on the structure of that graph.
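
<p style="font-size:20px">The idea can be sketched in a few lines of pure Python. This is a toy, not TensorFlow's implementation: the names Node, const, add, mul, run and gradient are hypothetical, and the derivative is computed by a simple recursive rule rather than the reverse-mode algorithm TensorFlow uses. It only shows that a stored graph makes both deferred execution and automatic differentiation possible.

```python
class Node:
    """One node of a toy computation graph (not TensorFlow's implementation)."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op            # "const", "add" or "mul"
        self.inputs = inputs    # upstream nodes (the incoming edges)
        self.value = value      # only used by "const" nodes


def const(value):
    return Node("const", value=value)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))


def run(node):
    """Execute the graph: nothing is computed until this is called."""
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == "add" else left * right


def gradient(node, wrt):
    """Derivative of `node` with respect to `wrt`, read off the graph structure."""
    if node is wrt:
        return 1.0
    if node.op == "const":
        return 0.0
    a, b = node.inputs
    if node.op == "add":                     # d(a + b) = da + db
        return gradient(a, wrt) + gradient(b, wrt)
    # product rule: d(a * b) = da * b + a * db
    return gradient(a, wrt) * run(b) + run(a) * gradient(b, wrt)


# Build the graph for f = x * x + 3 * x. No arithmetic happens here.
x = const(2.0)
f = add(mul(x, x), mul(const(3.0), x))

print(run(f))            # value of f at x = 2
print(gradient(f, x))    # df/dx = 2x + 3 at x = 2
```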

<h1>TensorFlow Default Graph</h1>

<p style="font-size:20px">Let's get started by importing TensorFlow.

In [1]:
import tensorflow as tf

<p style="font-size:20px">We can access the default graph explicitly using tf.get_default_graph:

In [2]:
g = tf.get_default_graph()
g

<tensorflow.python.framework.ops.Graph at 0x7f38097c5320>

<p style="font-size:20px">It is currently empty. We can check this by listing the operations (nodes) in the graph:

In [3]:
g.get_operations()

[]

<p style="font-size:20px">Let's start adding some operations to g. An operation is a node of the computation graph. It contains only some light metadata, like "I am an addition operation, and my inputs come from these two other operations." Although Python Operation objects don't actually do anything, we usually think of an operation in terms of what it will cause the execution engine to do after the graph has been completely built and is handed over to TensorFlow to run.

<p style="font-size:20px">Every operation takes in some number of inputs (0 or more), and produces 0 or more outputs. Its outputs can become the inputs to other operations. Executing an operation can also have side effects, like printing something to the console, recording data to a file, or modifying a variable in memory. Again, all this computation happens after the graph has been completely built. The Operation object itself is simply a description of computation that will take place.

<p style="font-size:20px">Perhaps the simplest operation we can create is constant. It has zero inputs and one output. When we create a constant operation, we define what that constant output will be (this is stored as metadata on the Operation object we create). TensorFlow's tf.constant function creates a constant operation and adds it to the default graph:

<h1>Constants</h1>
<p style="font-size:20px">A constant is created with the <b>tf.constant()</b> function.

In [4]:
x = tf.constant([1,2,3,4])

<p style="font-size:20px">The most important thing to understand is that this code does not store the values in "x". Instead, "x" is a Tensor that refers to the output of a constant operation. If we print "x", we see a Tensor named "Const:0" with shape (4,) and dtype int32, not the values themselves.

In [5]:
print(x)

Tensor("Const:0", shape=(4,), dtype=int32)


<p style="font-size:20px">Now let's check the operation in the graph again.

In [6]:
g.get_operations()

[<tf.Operation 'Const' type=Const>]

<p style="font-size:20px">g now has a Const operation! Note that tf.constant affected the graph g, even though we didn't explicitly say we wanted the constant operation to be added to g. It is possible to add operations to a specific, non-default graph, but most of the time, we add directly to the default graph, using functions like <b>tf.constant</b>. In fact, we generally don't even call get_default_graph to give g a name; we just use it implicitly.
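
<p style="font-size:20px">For completeness, here is a sketch of adding an operation to a specific, non-default graph. The import line is an assumption: it targets TensorFlow builds that ship the tensorflow.compat.v1 module, so that the TF 1.x calls used in this notebook are available; on a plain TF 1.x setup, import tensorflow as tf should serve the same purpose. The names other_graph and c are made up for the example.

```python
import tensorflow.compat.v1 as tf   # assumption: TF build that ships compat.v1
tf.disable_eager_execution()        # graph mode, as in the rest of this notebook

default_graph = tf.get_default_graph()
ops_before = len(default_graph.get_operations())

# Create a separate graph and make it the default only inside the with-block.
other_graph = tf.Graph()
with other_graph.as_default():
    c = tf.constant([1, 2, 3], name="c_in_other_graph")

print(c.graph is other_graph)          # the constant belongs to other_graph
print(other_graph.get_operations())    # lists the new Const operation

# The default graph did not receive any new operations.
ops_after = len(default_graph.get_operations())
print(ops_before == ops_after)
```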

<p style="font-size:20px">Let's examine the constant operation we created. We can use the <b>inputs</b> and <b>outputs</b> attributes of the operation to confirm that there really are zero inputs and one output.

In [7]:
const_operation = g.get_operations()[0]
len(const_operation.inputs),len(const_operation.outputs)

(0, 1)

<p style="font-size:20px">The inputs and outputs of an operation are of type <b>Tensor</b>:

In [8]:
const_tensor = const_operation.outputs[0]
const_tensor

<tf.Tensor 'Const:0' shape=(4,) dtype=int32>

<p style="font-size:20px">A <b>Tensor</b> is a lightweight Python object that represents a piece of data flowing along the edges of our graph. That data can be a scalar, a vector, a matrix, or a higher-dimensional array. The Tensor's data is not actually stored inside the Tensor object, and in fact does not exist in Python at all. A Tensor is just a lightweight way to reference a piece of data that will be computed by TensorFlow's execution engine.

<p style="font-size:20px">Now let's create a second constant $y$:

In [9]:
y = tf.constant([5,6,7,8])
print(y)

Tensor("Const_1:0", shape=(4,), dtype=int32)


<p style="font-size:20px">Let's check the operations in the graph again:

In [10]:
g.get_operations()

[<tf.Operation 'Const' type=Const>, <tf.Operation 'Const_1' type=Const>]

<p style="font-size:20px">Now there are two operations in the graph. TensorFlow has named them <b>'Const'</b> and <b>'Const_1'</b>, but you can also name them yourself by passing a name keyword argument to <b>tf.constant</b>. For example:

In [11]:
z = tf.constant([9,10,11,12],name='z')
g.get_operations()

[<tf.Operation 'Const' type=Const>,
 <tf.Operation 'Const_1' type=Const>,
 <tf.Operation 'z' type=Const>]

<h1>Session</h1>
<p style="font-size:20px">To see the values of $x$, $y$ and $z$, we need to execute the computation graph by creating a TensorFlow Session. A Session is the backbone of a TensorFlow program: building the graph only describes the computation, and nothing is actually computed until the session runs it. When we call its run method, the session evaluates the operations needed to produce the requested tensors and returns their values:

In [26]:
sess = tf.Session()
print(sess.run(x))
print(sess.run(y))
print(sess.run(z))
sess.close()

[1 2 3 4]
[5 6 7 8]
[ 9 10 11 12]


<p style="font-size:20px">We create a session $sess$ with tf.Session(). We then call sess.run($x$) to compute the value of $x$ and pass the result to print() to display it. Finally, we close the session with sess.close() to release its resources.
<p style="font-size:20px">Note that there is no memoization (caching of results) between calls. Each time you call <b>sess.run</b>, everything is computed anew. Because of this, if you want to fetch more than one tensor, it's more efficient to fetch them all in one go, by passing a list to <b>sess.run</b>. You can also pass a dictionary, a tuple, a named tuple, or nested combinations of these data structures.

In [27]:
sess = tf.Session()
print(sess.run([x,y,z]))#pass a list
print(sess.run({'x':x,'y':y,'z':z}))#pass a dictionary
sess.close()

[array([1, 2, 3, 4], dtype=int32), array([5, 6, 7, 8], dtype=int32), array([ 9, 10, 11, 12], dtype=int32)]
{'x': array([1, 2, 3, 4], dtype=int32), 'y': array([5, 6, 7, 8], dtype=int32), 'z': array([ 9, 10, 11, 12], dtype=int32)}
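
<p style="font-size:20px">The lack of caching between calls is easiest to see with a random tensor. The sketch below is an illustration under assumptions: it imports tensorflow.compat.v1 (assumed to be available in the installed TensorFlow) and builds its nodes in a separate graph, demo_graph, so that the default graph used in this notebook is left untouched. The names r, demo_graph and demo_sess are hypothetical.

```python
import tensorflow.compat.v1 as tf   # assumption: TF build that ships compat.v1
tf.disable_eager_execution()

demo_graph = tf.Graph()             # separate graph; the default graph is not modified
with demo_graph.as_default():
    r = tf.random_uniform(())       # a scalar that is re-sampled on every execution

with tf.Session(graph=demo_graph) as demo_sess:
    first = demo_sess.run(r)        # one execution of the graph
    second = demo_sess.run(r)       # a second, independent execution
    a, b = demo_sess.run([r, r])    # one execution; both fetches see the same value

print(first, second)   # almost certainly two different numbers
print(a, b)            # the same number twice
```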


<p style="font-size:20px">Note: So far we have had to close the session by hand every time. Python's "with" statement makes our lives easier by closing the Session automatically when the block ends:

In [None]:
with tf.Session() as sess:
    # the session is closed automatically on leaving the block
    print(sess.run([x,y,z]))

<h2>Math Operations</h2>
<p style="font-size:20px">Now we want to add $x$ and $y$.

<p style="font-size:20px">We use tf.add(), a TensorFlow math operation, to add them together, and store the resulting tensor in the Python variable $result$.

In [28]:
result = tf.add(x,y)

In [29]:
print(result)

Tensor("Add:0", shape=(4,), dtype=int32)


<p style="font-size:20px">Checking the graph's operations again, we see a new <b>'Add'</b> operation:

In [31]:
g.get_operations()

[<tf.Operation 'Const' type=Const>,
 <tf.Operation 'Const_1' type=Const>,
 <tf.Operation 'z' type=Const>,
 <tf.Operation 'Add' type=Add>]

<p style="font-size:20px">As before, we need a TensorFlow Session to perform the computation and get the result:

In [7]:
sess = tf.Session()
print(sess.run(result))
sess.close()

[ 6  8 10 12]


<p style="font-size:20px">Now, what happens if we want to change the value of $x$ and recompute the sum? Let's re-assign $x$ and update <b>result</b>:

In [42]:
x = tf.constant([-1,-2,-3,-4])
result = tf.add(x,y)
sess = tf.Session()
print(sess.run(result))
sess.close()

[4 4 4 4]


<p style="font-size:20px">We see the output is correct. However, let's check the operations in the graph:

In [43]:
g.get_operations()

[<tf.Operation 'Const' type=Const>,
 <tf.Operation 'Const_1' type=Const>,
 <tf.Operation 'z' type=Const>,
 <tf.Operation 'Add' type=Add>,
 <tf.Operation 'Const_2' type=Const>,
 <tf.Operation 'Const_3' type=Const>,
 <tf.Operation 'Add_1' type=Add>]

<p style="font-size:20px">We see there are two addition operations, <b>'Add'</b> and <b>'Add_1'</b>. The first is the one we defined at the beginning; its inputs are the original constants:

In [44]:
add_operation = g.get_operations()[3]  # first Add operation (index 3 in the listing above)

In [47]:
print(add_operation.inputs[0])
print(add_operation.inputs[1])
print(add_operation.outputs[0])

Tensor("Const:0", shape=(4,), dtype=int32)
Tensor("Const_1:0", shape=(4,), dtype=int32)
Tensor("Add:0", shape=(4,), dtype=int32)


<p style="font-size:20px">The second addition operation is the new one:

In [49]:
add_operation_2 = g.get_operations()[6]
print(add_operation_2.inputs[0])
print(add_operation_2.inputs[1])

Tensor("Const_3:0", shape=(4,), dtype=int32)
Tensor("Const_1:0", shape=(4,), dtype=int32)


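<p style="font-size:20px">Every time we change the value of $x$ and update result, we add more operations to the graph: a new constant and a new add operation. (The listing above contains both 'Const_2' and 'Const_3', presumably because the cell that redefines $x$ was executed more than once; every execution adds nodes.) The old operations are never removed, so the graph becomes very messy when we want to change $x$ frequently. Fortunately, TensorFlow placeholders solve this problem.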

<h1>Placeholders</h1>
<p style="font-size:20px">A placeholder is a stand-in for a value that we will supply at a later time. A placeholder operation, just like a constant, has 0 inputs and 1 output. However, instead of fixing the output value when we define the operation, we pass the placeholder's value to sess.run when executing the graph. This allows us to execute the same graph multiple times with different placeholder values.

<p style="font-size:20px">Let's reset (clean up) our default graph:

In [56]:
tf.reset_default_graph()
g = tf.get_default_graph()

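<p style="font-size:20px">Let's create a placeholder, passing in the type of the value we'd like it to hold. Here we use "float", which TensorFlow interprets as float32. The second argument is the shape; None leaves it unspecified. Like tf.constant, tf.placeholder returns a tensor.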

In [59]:
x = tf.placeholder("float",None)
print(x)

Tensor("Placeholder:0", dtype=float32)


In [60]:
y = tf.placeholder("float",None)
print(y)

Tensor("Placeholder_1:0", dtype=float32)


In [61]:
result = tf.add(x,y)

<p style="font-size:20px">Now let's check the graph's operations:

In [62]:
g.get_operations()

[<tf.Operation 'Placeholder' type=Placeholder>,
 <tf.Operation 'Placeholder_1' type=Placeholder>,
 <tf.Operation 'Add' type=Add>]

<p style="font-size:20px">Let's run the session with a feed dictionary <b>feed_dict</b> that provides the values of $x$ and $y$ to use for the addition. In sess.run(), the first argument is the <b>output</b> we want to fetch (result), and the feed_dict argument supplies the <b>inputs</b> ($x$ and $y$).

In [65]:
sess = tf.Session()
print(sess.run(result, feed_dict = {x:[1,2,3,4],y:[5,6,7,8]}))
print(sess.run(result, feed_dict = {x:[-1,-2,-3,-4],y:[5,6,7,8]}))
print(sess.run(result, feed_dict = {x:[1,2,3,4],y:[-5,-6,-7,-8]}))
sess.close()

[ 6.  8. 10. 12.]
[4. 4. 4. 4.]
[-4. -4. -4. -4.]


<p style="font-size:20px">Now, if we check the graph's operations again, we see that running the graph with different inputs did not add any operations:

In [66]:
g.get_operations()

[<tf.Operation 'Placeholder' type=Placeholder>,
 <tf.Operation 'Placeholder_1' type=Placeholder>,
 <tf.Operation 'Add' type=Add>]
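
<p style="font-size:20px">A placeholder has no value of its own, so running a tensor that depends on it without feeding it fails. The sketch below illustrates this under assumptions: it imports tensorflow.compat.v1 (assumed to be available in the installed TensorFlow) and works in a separate graph so that the listing above is not affected. The names p, doubled, demo_graph and demo_sess are hypothetical.

```python
import tensorflow.compat.v1 as tf   # assumption: TF build that ships compat.v1
tf.disable_eager_execution()

demo_graph = tf.Graph()             # separate graph; the default graph is not modified
with demo_graph.as_default():
    p = tf.placeholder(tf.float32, shape=None, name="p")
    doubled = tf.multiply(p, 2.0)

with tf.Session(graph=demo_graph) as demo_sess:
    # Feeding a value for p works as usual.
    fed = demo_sess.run(doubled, feed_dict={p: [1.0, 2.0]})
    try:
        demo_sess.run(doubled)      # no value supplied for p
        missing_feed_error = None
    except tf.errors.InvalidArgumentError as err:
        missing_feed_error = err

print(fed)
print(type(missing_feed_error).__name__)
```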

<h1>tf.Variable</h1>

<p style="font-size:20px">Like constants and placeholders, variable operations take 0 inputs and produce 1 output; the big difference is that a variable is mutable and persistent across runs of your graph (within a session). Whereas a constant's value is fixed when creating the graph, and a placeholder's value is set anew each time you call sess.run, a variable's value is set or changed while the graph is running, by side-effect-ful "assign" operations, and remembered even after sess.run is finished. (That memory is in the Session object, which manages stateful components like variables. Calling sess.close is necessary to free that memory.)

<p style="font-size:20px">In general, we use tf.Variable to hold and update parameters (such as the weights and biases of a neural network). A tf.Variable must be explicitly initialized and can be saved to disk during and after training. You can later restore the saved values to exercise or analyze your model.

<p style="font-size:20px">Let's look at an example. Suppose we want to initialize a 2x2 weight matrix $W$. We can do the following:

In [17]:
W = tf.Variable(tf.random_uniform((2,2)),name='W')

In [19]:
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

In [20]:
sess.run(W)

array([[0.9528363 , 0.13091397],
       [0.06521881, 0.74127674]], dtype=float32)

In [21]:
sess.close()
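
<p style="font-size:20px">The persistence of a variable across runs, and the fact that its state lives in the session, can be sketched with a simple counter. This is an illustration under assumptions: it imports tensorflow.compat.v1 (assumed to be available in the installed TensorFlow) and uses a separate graph. The names counter, increment, demo_init, demo_graph, demo_sess and fresh_sess are hypothetical.

```python
import tensorflow.compat.v1 as tf   # assumption: TF build that ships compat.v1
tf.disable_eager_execution()

demo_graph = tf.Graph()             # separate graph; the default graph is not modified
with demo_graph.as_default():
    counter = tf.Variable(0, name="counter")
    increment = counter.assign_add(1)            # side-effecting "assign" operation
    demo_init = tf.global_variables_initializer()

with tf.Session(graph=demo_graph) as demo_sess:
    demo_sess.run(demo_init)             # variables must be initialized first
    demo_sess.run(increment)
    demo_sess.run(increment)
    after_two = demo_sess.run(counter)   # the value survived both earlier runs

with tf.Session(graph=demo_graph) as fresh_sess:
    fresh_sess.run(demo_init)            # a new session starts from the initial value
    after_reset = fresh_sess.run(counter)

print(after_two, after_reset)
```

<p style="font-size:20px">The second session does not see the increments made in the first one, which matches the description of the Session as the owner of variable state.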