# Overview

TensorFlow is a software library for numerical computation using data flow graphs, a programming paradigm that models a program as a directed graph of the data flowing between operations.
TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural network research, but the system is general enough to be applicable in a wide variety of other domains as well. It was open sourced under the Apache 2.0 License in November 2015. TensorFlow's flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

TensorFlow provides multiple APIs. The lowest-level API, TensorFlow Core, provides you with complete programming control. We recommend TensorFlow Core for machine learning researchers and others who require fine levels of control over their models. The higher-level APIs are built on top of TensorFlow Core and are typically easier to learn and use.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

### What is a tensor?

The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor's rank is its number of dimensions. Here are some examples of tensors:

* 3                                 # a rank 0 tensor; a scalar with shape []
* [1., 2., 3.]                      # a rank 1 tensor; a vector with shape [3]
* [[1., 2., 3.], [4., 5., 6.]]      # a rank 2 tensor; a matrix with shape [2, 3]
* [[[1., 2., 3.]], [[7., 8., 9.]]]  # a rank 3 tensor with shape [2, 1, 3]
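TensorFlow uses the same rank and shape conventions as NumPy arrays, so the examples above can be checked with a short NumPy sketch. This uses NumPy rather than TensorFlow: `np.ndim` gives the rank and `np.shape` gives the shape.

```python
import numpy as np

# The four example tensors from the list above.
examples = [
    3,
    [1., 2., 3.],
    [[1., 2., 3.], [4., 5., 6.]],
    [[[1., 2., 3.]], [[7., 8., 9.]]],
]

# rank = number of dimensions; shape = size along each dimension
ranks_and_shapes = [(np.ndim(v), list(np.shape(v))) for v in examples]

for rank, shape in ranks_and_shapes:
    print("rank:", rank, "shape:", shape)
```

This prints `rank: 0 shape: []`, `rank: 1 shape: [3]`, `rank: 2 shape: [2, 3]` and `rank: 3 shape: [2, 1, 3]`, matching the comments in the list.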

### Computational graph

A computational graph is a series of TensorFlow operations arranged into a graph of nodes. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. Each node takes zero or more tensors as inputs and produces a tensor as an output.

One type of node is a constant. Like all TensorFlow constants, it takes no inputs, and it outputs a value it stores internally. Let's build a simple computational graph. We can create two floating point Tensors node1 and node2 as follows:

In [2]:
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
print(node1, node2)

Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)


A constant's value is stored in the graph definition and replicated wherever the graph is loaded, whereas a variable is stored separately and may live on a parameter server. This matters when constants are memory-expensive: a weight matrix with millions of entries stored as a constant makes the graph slow to load every time.

### Sessions

To evaluate the nodes, we must run the computational graph within a session. A session encapsulates the control and state of the TensorFlow runtime. The following code creates a Session object and then invokes its run method to run enough of the computational graph to evaluate node1 and node2:

In [3]:
sess = tf.Session()

print(sess.run([node1, node2]))

[3.0, 4.0]


As it stands, this graph is not especially interesting because it always produces a constant result. 

### Placeholders

A graph can be parameterized to accept external inputs, known as placeholders. A placeholder is a promise to provide a value later: with the graph assembled, we (or our clients) can supply our own data whenever we execute the computation. To define a placeholder, we use:

tf.placeholder(dtype, shape=None, name=None)

dtype, shape, and name are self-explanatory. The one thing to note is what happens when you set shape=None: tensors of any shape will be accepted. shape=None makes graphs easy to construct but nightmarish to debug, because it also breaks shape inference for all following ops, and many ops stop working because they expect inputs of a certain rank. You should always define the shape of your placeholders as precisely as possible.
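To make these shape rules concrete, here is a small pure-Python sketch of how a declared shape constrains what can be fed. The helper `is_compatible` is hypothetical (it is not a TensorFlow function). It mimics the rule that `shape=None` accepts anything, while a `None` entry inside a shape such as `[None, 3]` accepts any size along that one dimension only.

```python
def is_compatible(fed_shape, declared_shape):
    """Hypothetical helper: does a value of fed_shape fit declared_shape?

    declared_shape=None means 'any shape'; a None entry inside the list
    means 'any size along this dimension'.
    """
    if declared_shape is None:
        return True  # shape=None: no constraint at all
    if len(fed_shape) != len(declared_shape):
        return False  # the rank must match
    return all(d is None or d == f for f, d in zip(fed_shape, declared_shape))


# A batch of 5 examples with 3 features fits [None, 3] ...
print(is_compatible((5, 3), [None, 3]))
# ... but a batch with 4 features, or a rank-1 vector, does not.
print(is_compatible((5, 4), [None, 3]))
print(is_compatible((3,), [None, 3]))
# shape=None accepts all of them, so mistakes only surface later.
print(is_compatible((5, 4), None))
```

A shape like `[None, 3]` is usually the sweet spot: the batch size stays flexible while the rank and feature count are still checked.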



In [4]:
a = tf.placeholder(tf.float32) 
b = tf.placeholder(tf.float32) 
adder_node = a + b  # + provides a shortcut for tf.add(a, b)

The preceding three lines are a bit like a function or a lambda in which we define two input parameters (a and b) and then an operation on them. We can evaluate this graph with multiple inputs by using the feed_dict argument to the run method to feed concrete values to the placeholders:

In [5]:
print(sess.run(adder_node, {a: 3, b: 4.5}))
print(sess.run(adder_node, {a: [1, 3], b: [2, 4]}))

7.5
[ 3.  7.]
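The define-then-run model can be hard to picture. The following pure-Python toy is not TensorFlow's implementation; `Node`, `constant`, `placeholder`, `add` and `run` are hypothetical names. It builds a graph of nodes first, then evaluates only the nodes that a fetched node depends on, taking placeholder values from a feed dictionary.

```python
class Node:
    """A graph node: an operation name, its input nodes and an optional value."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value


def constant(value):
    return Node("const", value=value)


def placeholder():
    return Node("placeholder")


def add(a, b):
    return Node("add", inputs=[a, b])


def run(fetch, feed_dict=None):
    """Evaluate `fetch` by recursively evaluating only the nodes it depends on."""
    feed_dict = feed_dict or {}
    cache = {}

    def evaluate(node):
        if node in cache:
            return cache[node]
        if node.op == "const":
            result = node.value
        elif node.op == "placeholder":
            if node not in feed_dict:
                raise ValueError("placeholder was not fed a value")
            result = feed_dict[node]  # a placeholder is a promise of a value
        elif node.op == "add":
            x, y = (evaluate(n) for n in node.inputs)
            result = x + y
        else:
            raise ValueError("unknown op: " + node.op)
        cache[node] = result
        return result

    return evaluate(fetch)


a = placeholder()
b = placeholder()
adder_node = add(a, b)  # building the graph computes nothing yet
print(run(adder_node, {a: 3, b: 4.5}))  # 7.5
```

`run` never touches nodes the fetch does not depend on, so an unrelated placeholder can stay unfed. This is the sense in which a session runs "enough of the computational graph".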



### Variables

In machine learning we will typically want a model that can take arbitrary inputs, such as the one above.
To make the model trainable, we need to be able to modify the graph to get new outputs with the same input.
Variables allow us to add trainable parameters to a graph. To declare a variable, you create an instance of the class tf.Variable with a type and initial value:

In [6]:
W = tf.Variable([.3], dtype=tf.float32) 
b = tf.Variable([-.3], dtype=tf.float32) 
x = tf.placeholder(tf.float32) 
linear_model = W*x + b

However, creating variables directly with tf.Variable is discouraged; TensorFlow recommends the wrapper tf.get_variable, which allows for easy variable sharing. With tf.get_variable, we can provide the variable’s internal name, shape, type, and initializer to give the variable its initial value. Note that when we use tf.constant as an initializer, we don’t need to provide a shape, because it is taken from the constant.

tf.get_variable(
    name,
    shape=None,
    dtype=None,
    initializer=None,
    regularizer=None,
    trainable=True,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    use_resource=None,
    custom_getter=None,
    constraint=None
)

### Closing a session

Sessions can be closed with:

In [7]:
sess.close()

### Resetting the default graph

In [8]:
tf.reset_default_graph()

In [9]:
W = tf.get_variable("weight", initializer=tf.constant(.3)) 
b = tf.get_variable("bias", initializer=tf.constant([-.3]))
x = tf.placeholder(tf.float32) 
linear_model = W*x + b

### Initialize variables

Constants are initialized when you call tf.constant, and their value can never change.
By contrast, variables are not initialized when you call tf.Variable (or tf.get_variable).
Although you can initialize individual variables or a subset of variables, the easiest way is to initialize all variables in a TensorFlow program at once with tf.global_variables_initializer().

It is important to realize that init is a handle to the TensorFlow sub-graph that initializes all the global variables. Until we call sess.run(init), the variables are uninitialized.

In [10]:
sess = tf.Session()
init = tf.global_variables_initializer() 
sess.run(init)


If you try to evaluate the variables before initializing them, you'll run into a FailedPreconditionError: Attempting to use uninitialized value.

Here are some ways to evaluate a variable:

In [20]:
# You can also get a variable’s value from tf.Variable.eval()
print(W.eval(sess))

# assign a new value with tf.Variable.assign(), then read it back
sess.run(W.assign(100))
print(W.eval(sess))

# another way to evaluate a variable
print(sess.run(W))

100.0
100.0
100.0


### Constant Ops

* tf.zeros(shape, dtype=tf.float32, name=None): creates a tensor of the given shape with all elements set to zero

* tf.zeros_like(input_tensor, dtype=None, name=None, optimize=True): creates a tensor with the same shape and type as input_tensor (unless dtype is specified), with all elements set to zero

* tf.ones(shape, dtype=tf.float32, name=None): like tf.zeros, but all elements are set to one

* tf.ones_like(input_tensor, dtype=None, name=None, optimize=True): like tf.zeros_like, but all elements are set to one

* etc
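These ops mirror NumPy's `np.zeros`, `np.zeros_like`, `np.ones` and `np.ones_like`. The NumPy sketch below shows the same behavior. One difference to keep in mind: NumPy defaults to float64, whereas tf.zeros and tf.ones default to tf.float32, as their signatures above show.

```python
import numpy as np

# zeros/ones: you give the shape (and optionally the dtype)
z = np.zeros([2, 3], dtype=np.float32)
o = np.ones([2, 3], dtype=np.float32)

# *_like: shape and dtype are taken from an existing array
template = np.array([[1, 2], [3, 4]], dtype=np.int32)
zl = np.zeros_like(template)
ol = np.ones_like(template, dtype=np.float32)  # override the dtype

print(z.shape, z.dtype)    # (2, 3) float32
print(zl.shape, zl.dtype)  # (2, 2) int32
print(ol.dtype)            # float32
```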


### Random Constants

* tf.random_normal
* tf.truncated_normal
* tf.random_uniform
* tf.random_shuffle
* tf.random_crop
* tf.multinomial
* tf.random_gamma
* tf.set_random_seed

### Math ops

Many math ops are similar to their NumPy counterparts. It is definitely worth checking the documentation to make sure an op does what you think it does.

For more details on various ops check out the API: https://www.tensorflow.org/api_docs/python/

### Feed dict for placeholders

A placeholder has no value of its own. To supply values for placeholders, we use a feed_dict, which is basically a dictionary whose keys are the placeholders and whose values are the data to feed into those placeholders.

Since x is a placeholder, we can evaluate linear_model for several values of x simultaneously as follows:

In [13]:
print(sess.run(linear_model, feed_dict={x: [1, 2, 3, 4]}))

[ 0.          0.30000001  0.60000002  0.90000004]


We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.

A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum:

In [14]:
y = tf.placeholder(tf.float32) 
squared_deltas = tf.square(linear_model - y) 
loss = tf.reduce_sum(squared_deltas) 

print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))

23.66
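The arithmetic behind that 23.66 can be checked with this NumPy sketch, which mirrors the graph step by step. W = 0.3 and b = -0.3 are the initial values defined earlier; NumPy stands in for the TensorFlow ops here.

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])


def loss(W, b):
    linear_model = W * x + b      # about [0.0, 0.3, 0.6, 0.9] for W=0.3, b=-0.3
    squared_deltas = np.square(linear_model - y)
    return np.sum(squared_deltas)  # same role as tf.reduce_sum


print(loss(0.3, -0.3))  # deltas 0, 1.3, 2.6, 3.9 -> 0 + 1.69 + 6.76 + 15.21
print(loss(-1.0, 1.0))  # the perfect parameters give zero loss
```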


We could improve this manually by reassigning W and b to their perfect values, W = -1 and b = 1, which are the optimal parameters for our model. A variable is initialized to the value provided when it is created, but it can be changed using operations like tf.assign:

In [15]:
fixW = tf.assign(W, -1.)  # W was created as a scalar by tf.get_variable above
fixb = tf.assign(b, [1.])  # b has shape [1]
sess.run([fixW, fixb]) 

print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))

0.0


We guessed the "perfect" values of W and b, but the whole point of machine learning is to find the correct model parameters automatically. We will show how to accomplish this in the next section.

### Optimizers, otherwise known as tf.train API

The train op returned by optimizer.minimize(loss) has one job: to minimize loss. To execute this op, we need to pass it into the list of fetches of tf.Session.run(). When TensorFlow executes it, it will execute the part of the graph that this op depends on. In our model, the op depends on loss, and loss depends on the inputs x and y as well as the two variables W and b.
GradientDescentOptimizer means that our update rule is gradient descent. TensorFlow does automatic differentiation for us, then updates the values of W and b to minimize the loss. Autodiff is amazing!

Blog tutorial on autodiff: http://www.columbia.edu/~ahd2125/post/2015/12/5/


By default, the optimizer trains all the trainable variables its objective function depends on. If there are variables that you do not want to train, you can set the keyword trainable=False when you declare a variable. One example of a variable you don’t want to train is the variable global_step, a common variable you will see in many TensorFlow models to keep track of how many training steps you’ve run.


A complete discussion of machine learning is out of the scope of this tutorial. However, TensorFlow provides optimizers that slowly change each variable in order to minimize the loss function. The simplest optimizer is gradient descent. It modifies each variable according to the magnitude of the derivative of loss with respect to that variable. In general, computing symbolic derivatives manually is tedious and error-prone. Consequently, TensorFlow can automatically produce derivatives given only a description of the model using the function tf.gradients. For simplicity, optimizers typically do this for you.
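For our linear model the derivatives are simple enough to write out, which shows what automatic differentiation computes for us. With loss = sum((W*x + b - y)^2), we get dloss/dW = sum(2 * (W*x + b - y) * x) and dloss/db = sum(2 * (W*x + b - y)). The NumPy sketch below (not TensorFlow) evaluates these at W = 0.3, b = -0.3 and checks them against a central finite-difference approximation.

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])


def loss(W, b):
    return np.sum(np.square(W * x + b - y))


def analytic_gradients(W, b):
    error = W * x + b - y
    dW = np.sum(2 * error * x)  # d loss / d W
    db = np.sum(2 * error)      # d loss / d b
    return float(dW), float(db)


def numeric_gradients(W, b, eps=1e-6):
    # central differences: (f(p + eps) - f(p - eps)) / (2 * eps)
    dW = (loss(W + eps, b) - loss(W - eps, b)) / (2 * eps)
    db = (loss(W, b + eps) - loss(W, b - eps)) / (2 * eps)
    return float(dW), float(db)


print(analytic_gradients(0.3, -0.3))  # approximately (52.0, 15.6)
print(numeric_gradients(0.3, -0.3))
```

Finite differences like these are a handy sanity check, but they cost two loss evaluations per parameter, which is why frameworks use automatic differentiation instead.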

You can also ask your optimizer to take gradients of specific variables, and you can modify the gradients it calculates before applying them:

In [None]:
# create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# compute the gradients for a list of variables (here W and b).
grads_and_vars = optimizer.compute_gradients(loss, [W, b])

# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example, subtract 1 from each of them.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]

# ask the optimizer to apply the subtracted gradients. This returns an op;
# the variables are only updated when the op is run with sess.run.
apply_op = optimizer.apply_gradients(subtracted_grads_and_vars)

You can also prevent certain tensors from contributing to the calculation of the derivatives with respect to a specific loss with tf.stop_gradient:

tf.stop_gradient(input, name=None)

This is very useful in situations when you want to freeze certain variables during training. Here are some examples given by TensorFlow’s official documentation:

* Adversarial training, where no backprop should happen through the adversarial example generation process.
* The EM algorithm, where the M-step should not involve backpropagation through the output of the E-step.

### Gradients
The optimizer classes automatically compute derivatives on your graph, but you can explicitly ask TensorFlow to calculate certain gradients with tf.gradients. This method constructs symbolic partial derivatives of the sum of ys with respect to each x in xs. ys and xs are each a Tensor or a list of tensors. grad_ys is a list of tensors holding the gradients received by the ys; the list must be the same length as ys.

In [None]:
tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)

Technical detail: this is especially useful when training only parts of a model. For example, we can use tf.gradients() to take the derivative G of the loss with respect to the middle layer. Then we use an optimizer to minimize the difference between the middle layer output M and M + G. This only updates the lower half of the network.

GradientDescentOptimizer is not the only update rule that TensorFlow supports. Here is the list of optimizers that TensorFlow supports, as of 1/17/2017. The names are self-explanatory. You can visit the official documentation for more details:

* tf.train.Optimizer (the base class)
* tf.train.GradientDescentOptimizer
* tf.train.AdadeltaOptimizer
* tf.train.AdagradOptimizer
* tf.train.AdagradDAOptimizer
* tf.train.MomentumOptimizer
* tf.train.AdamOptimizer
* tf.train.FtrlOptimizer
* tf.train.ProximalGradientDescentOptimizer
* tf.train.ProximalAdagradOptimizer
* tf.train.RMSPropOptimizer

An overview of gradient descent optimization algorithms: http://ruder.io/optimizing-gradient-descent/
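As one example of how these update rules differ, the documentation for tf.train.MomentumOptimizer gives its update as accumulation = momentum * accumulation + gradient, followed by variable -= learning_rate * accumulation. The NumPy sketch below applies that rule to our linear regression problem with hand-written gradients. The value momentum=0.9 is an assumption (a commonly used setting), not something taken from this tutorial.

```python
import numpy as np

x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

params = np.array([0.3, -0.3])  # [W, b], same initial values as the model
accumulation = np.zeros(2)      # one accumulator slot per variable
learning_rate, momentum = 0.01, 0.9


def gradients(params):
    W, b = params
    error = W * x_train + b - y_train
    return np.array([np.sum(2 * error * x_train), np.sum(2 * error)])


for step in range(1000):
    accumulation = momentum * accumulation + gradients(params)
    params = params - learning_rate * accumulation

print(params)  # converges to about [-1, 1]
```

The accumulator lets past gradients keep pushing in a consistent direction, which speeds progress along the shallow direction of this loss compared with plain gradient descent.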

In [17]:
optimizer = tf.train.GradientDescentOptimizer(0.01) 
train = optimizer.minimize(loss) 

sess.run(init) # reset values to incorrect defaults. 
for i in range(1000):   
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}) 
print(sess.run([W, b]))

[array([-0.9999969], dtype=float32), array([ 0.99999082], dtype=float32)]
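Each sess.run(train) applies one gradient descent step: W -= learning_rate * dloss/dW, and likewise for b. A NumPy sketch of the same loop, using hand-written gradients instead of TensorFlow's autodiff, with the same data, initial values, learning rate of 0.01 and 1000 steps:

```python
import numpy as np

x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

W, b = 0.3, -0.3  # same initial values as the TensorFlow model
learning_rate = 0.01

for step in range(1000):
    error = W * x_train + b - y_train
    dW = np.sum(2 * error * x_train)  # d loss / d W
    db = np.sum(2 * error)            # d loss / d b
    W -= learning_rate * dW
    b -= learning_rate * db

print(W, b)  # close to -1 and 1, as in the TensorFlow run
```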


Now we have done actual machine learning! Although this simple linear regression model does not require much TensorFlow core code, more complicated models and methods to feed data into your models necessitate more code. Thus, TensorFlow provides higher level abstractions for common patterns, structures, and functionality. We will learn how to use some of these abstractions in the next section.

The complete program:

In [18]:
import tensorflow as tf
# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
  sess.run(train, {x: x_train, y: y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))

W: [-0.9999969] b: [ 0.99999082] loss: 5.69997e-11


Notice that the loss is a very small number (very close to zero). If you run this program, your loss may not be exactly the same as the loss shown above, because floating-point results can differ slightly across hardware and library versions.

In [27]:
tf.reset_default_graph()

In [28]:
# NumPy is often used to load, manipulate and preprocess data. 
import numpy as np 
import tensorflow as tf 

# Declare list of features. We only have one numeric feature. There are many 
# other types of columns that are more complicated and useful. 
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])] 

# An estimator is the front end to invoke training (fitting) and evaluation 
# (inference). There are many predefined types like linear regression, 
# linear classification, and many neural network classifiers and regressors. 
# The following code provides an estimator that does linear regression. 
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns) 

# TensorFlow provides many helper methods to read and set up data sets. 
# Here we use two data sets: one for training and one for evaluation 
# We have to tell the function how many batches 
# of data (num_epochs) we want and how big each batch should be. 
x_train = np.array([1., 2., 3., 4.]) 
y_train = np.array([0., -1., -2., -3.]) 

x_eval = np.array([2., 5., 8., 1.]) 
y_eval = np.array([-1.01, -4.1, -7, 0.]) 

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_save_checkpoints_steps': None, '_model_dir': '/tmp/tmpsp8Q_x', '_save_summary_steps': 100}


In [29]:
# num_epochs=None repeats the data indefinitely; steps=1000 below bounds training
input_fn = tf.estimator.inputs.numpy_input_fn({"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True) 
train_input_fn = tf.estimator.inputs.numpy_input_fn({"x": x_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False) 
eval_input_fn = tf.estimator.inputs.numpy_input_fn({"x": x_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False) 

In [30]:
# We can invoke 1000 training steps by invoking the train method and passing the 
# training data set. 
estimator.train(input_fn=input_fn, steps=1000) 

# Here we evaluate how well our model did. 
train_metrics = estimator.evaluate(input_fn=train_input_fn) 
eval_metrics = estimator.evaluate(input_fn=eval_input_fn) 

In [22]:
print("train metrics: %r"% train_metrics) 
print("eval metrics: %r"% eval_metrics)

### Where to go from here

Go over the TensorFlow tutorials:
https://www.tensorflow.org/tutorials/

Check out TensorBoard:


Building better TensorFlow models:
https://danijar.com/structuring-your-tensorflow-models/

https://github.com/vahidk/EffectiveTensorflow
