## Introduction

This tutorial introduces the basics of TensorFlow and walks through a simple example that uses TensorFlow to classify handwritten digits from the MNIST dataset.

### What is TensorFlow?
TensorFlow is a software framework developed at Google for numerical computation using data flow graphs. It simplifies constructing, training, and deploying deep neural networks, and it has been applied to tasks such as cancer detection, language translation, and preventing blindness in people with diabetes. TensorFlow is popular because it is flexible and lets users easily create many different kinds of models. One of its most common uses is image classification.

### What is a Tensor?
The data used in computations with TensorFlow are stored in multidimensional arrays called tensors. The name TensorFlow comes from the fact that we are designing computational graphs that define how the tensors will "flow" through the system. The number of dimensions a tensor has is referred to as its rank.
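
As a rough illustration of rank, the sketch below counts the nesting depth of plain Python lists. The rank helper is hypothetical and written only for this example; it is not part of TensorFlow, and it assumes regularly shaped, non-empty nested lists.

```python
def rank(value):
    """Return the nesting depth of a regularly shaped nested list.

    Assumption for this sketch: a bare number counts as rank 0 and
    every nested list is non-empty.
    """
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # descend into the first element
    return depth


scalar = 2.0                        # rank 0: a single number
vector = [1, 2, 3, 4]               # rank 1: a 1D array
matrix = [[4.0, 4.0], [4.0, 4.0]]   # rank 2: a 2D array

ranks = [rank(scalar), rank(vector), rank(matrix)]
print(ranks)  # [0, 1, 2]
```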

## Installing TensorFlow

To install the TensorFlow library, first install pip, then run the following command:

    $ pip install --upgrade tensorflow
    
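This installs TensorFlow along with the packages it depends on. The code in this tutorial uses the TensorFlow 1.x API (for example tf.Session and tf.placeholder); in TensorFlow 2.x these calls are available under tf.compat.v1.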

In [9]:
import tensorflow as tf
import numpy as np

To validate that the installation worked, we will run a short TensorFlow program.

In [7]:
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello).decode())

Hello, TensorFlow!


The code above should print "Hello, TensorFlow!", which means we are now ready to run TensorFlow programs.

## Building a Computational Graph

### Simple Linear Model
To learn the basics of constructing a computational graph, we will start by creating and training a simple linear model.

A computational graph is made up of nodes. A node can be a constant, which takes no inputs and outputs the value it stores, as shown below.

In [16]:
a = tf.constant(2.0, tf.float32)
b = tf.constant([1, 2, 3, 4])
c = tf.constant(4.0, shape=[2, 2])

sess = tf.Session()
print(sess.run(a))
print(sess.run(b))
print(sess.run(c))

2.0
[1 2 3 4]
[[4. 4.]
 [4. 4.]]


The node a is a scalar float with a value of 2.0, b is a 1D array, and c is a 2D array with two rows and two columns. To evaluate nodes and print their values, we must run them within a session, which is created using tf.Session().

Nodes can also be operations which take in tensors and return tensors as output. The code below shows how to create a node that returns the result of dividing two tensors.

In [93]:
div_node = tf.divide(c, a)
print(sess.run(div_node))

[[2. 2.]
 [2. 2.]]


Nodes can also be placeholders, which take values that are supplied when the graph is run, as shown below.

In [94]:
d = tf.placeholder(tf.float32)
e = tf.placeholder(tf.float32)
mult_node = tf.multiply(d, e)
print(sess.run(mult_node, feed_dict = {d: 10.0, e: 5.0}))

50.0


The nodes d and e are placeholders, indicating that they will be given values later. When the graph is run, the values are passed in through the feed_dict argument, and the resulting tensor contains the product of d and e.
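
To make the define-then-run idea concrete, here is a toy graph evaluator in plain Python. It is only a sketch of the concept: the Constant, Placeholder, and Multiply classes and the run function are hypothetical names invented for this example, and TensorFlow's real implementation is far more involved.

```python
class Constant:
    """A node that always outputs the value it stores."""
    def __init__(self, value):
        self.value = value


class Placeholder:
    """A node whose value is supplied when the graph is run."""


class Multiply:
    """An operation node that multiplies the outputs of two other nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node, feed_dict=None):
    """Evaluate a node, looking up placeholder values in feed_dict."""
    feed_dict = feed_dict or {}
    if isinstance(node, Constant):
        return node.value
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Multiply):
        return run(node.left, feed_dict) * run(node.right, feed_dict)
    raise TypeError("unknown node type: %r" % (node,))


# Building the graph computes nothing yet.
d = Placeholder()
e = Placeholder()
mult_node = Multiply(d, e)

# Values only flow through the graph when it is run.
result = run(mult_node, feed_dict={d: 10.0, e: 5.0})
print(result)  # 50.0
```

The point of the design is that building mult_node and evaluating it are separate steps, which is why the same graph can be run many times with different inputs.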

Now we will create and train a linear model. First we must define the model parameters, as shown below. We use tf.Variable to create the parameters. A tf.Variable is similar to a tf.constant, but its value can be updated while the model is being trained.

In [38]:
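# pass dtype by keyword: the second positional argument of tf.Variable is trainable, not dtype
W = tf.Variable(1.0, dtype=tf.float32)
b = tf.Variable(1.0, dtype=tf.float32)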
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
model = W * x + b

The code above defines the model parameters and constructs the graph. Next, we must define the loss function and how to minimize it. We will use a squared loss and gradient descent with a step size of 0.01 as our optimization algorithm.

In [95]:
squared_loss = tf.square(model - y)
loss = tf.reduce_sum(squared_loss)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

The next step is to create example training data and feed it to the model. We will then train the model and evaluate its loss.

In [48]:
x_train = [1, 2, 3, 4]
y_train = [7, 9, 11, 13]

# must run the following line in order to initialize the variables
init = tf.global_variables_initializer()
sess.run(init)
for i in range(2000):
  sess.run(train, {x: x_train, y: y_train})

W_, b_, loss_  = sess.run([W, b, loss], {x:x_train, y:y_train})
print("W: %s b: %s loss: %s" % (W_, b_, loss_))

W: 2.0000064 b: 4.999981 loss: 2.3101165e-10


The final coefficient estimates are printed above along with the loss. The training data were generated from y = 2x + 5, so the true values of W and b are 2 and 5. As we can see, the model came very close to the correct values, and the loss is very low.
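
To see what the optimizer is doing, the same fit can be reproduced in plain Python, assuming each step moves W and b against the gradient of the summed squared loss. This is a hand-written sketch with manually derived gradients, not TensorFlow's implementation, and names such as learning_rate and grad_W are chosen for this example.

```python
x_train = [1, 2, 3, 4]
y_train = [7, 9, 11, 13]

W, b = 1.0, 1.0        # same starting values as the TensorFlow model
learning_rate = 0.01   # same step size as the optimizer above

for _ in range(2000):
    # prediction error for every training example
    errors = [W * x + b - y for x, y in zip(x_train, y_train)]
    # gradients of sum((W*x + b - y)^2) with respect to W and b
    grad_W = sum(2 * err * x for err, x in zip(errors, x_train))
    grad_b = sum(2 * err for err in errors)
    # take one gradient descent step
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

loss = sum((W * x + b - y) ** 2 for x, y in zip(x_train, y_train))
print("W: %s b: %s loss: %s" % (W, b, loss))  # W and b end up close to 2 and 5
```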

Next we are ready to work with a real example using TensorFlow to classify images.

## Example: Classifying Images

Now we are going to train a model to classify digits from the MNIST dataset, which can be found at http://yann.lecun.com/exdb/mnist/. The dataset has a training set of 60,000 examples and a test set of 10,000 examples. The digits are size-normalized and centered in fixed-size images. Each image is 28 by 28 pixels and is flattened into a vector of length 784 to make the computation easier.
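
The flattening step can be pictured with a small plain-Python sketch. The image below is a made-up stand-in rather than real MNIST data, and the sketch assumes the rows are concatenated one after another (row-major order).

```python
height, width = 28, 28

# Hypothetical stand-in image: a 28x28 grid of zeros with one bright pixel.
image = [[0.0] * width for _ in range(height)]
image[3][5] = 1.0

# Row-major flattening: concatenate the rows one after another.
flat = [pixel for row in image for pixel in row]

print(len(flat))        # 784
print(flat.index(1.0))  # 89, which is 3 * 28 + 5
```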

The first step is to download the data by running the following two lines.

In [96]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


The one_hot parameter ensures that the labels are represented as one-hot vectors. For example, to represent the label 1, every entry in the vector is 0 except the entry at index 1, which is 1. This format will make it easier for us to predict labels, as you will see later on.
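
As a minimal sketch of the encoding, the hypothetical one_hot helper below builds such a vector in plain Python. It is written for this example only and is not the function the MNIST loader uses internally.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


encoded = one_hot(1)
print(encoded)  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# The original label can be recovered as the index of the largest entry.
decoded = encoded.index(max(encoded))
print(decoded)  # 1
```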

Next, we define our model and its parameters. The model we will use for classification is softmax regression. Softmax regression suits our purposes because, for each image, it outputs one value between 0 and 1 per label, and these values sum to 1. This lets us treat the outputs as probabilities and pick the label that is most likely correct.
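
For intuition, the softmax function itself can be sketched in a few lines of plain Python. The scores below are made-up values, and subtracting the maximum score before exponentiating is a common numerical-stability trick assumed here rather than something taken from this tutorial.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Shifting by the maximum does not change the result but avoids overflow.
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three labels
print(probs)
print(sum(probs))  # very close to 1.0
```

The largest score always receives the largest probability, so picking the most likely label is the same as picking the largest score.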

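Below, we define x as a placeholder whose value we will provide later; its shape [None, length] allows any number of flattened images. W is the weight matrix, defined using tf.Variable so that its value can be modified during training. y holds the predicted probabilities, and y_train is a placeholder for the true one-hot labels.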

In [104]:
length = 28 * 28 # length of flattened image
x = tf.placeholder(tf.float32, [None, length])
W = tf.Variable(tf.ones([length, 10]))
y = tf.nn.softmax(tf.matmul(x, W))
y_train = tf.placeholder(tf.float32, [None, 10])

The next step is to define our loss function and how we are going to optimize it. We use cross-entropy as the loss and gradient descent with a step size of 0.01 as the optimizer.

In [105]:
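# reduce_sum with no axis adds up -y_train * log(y) over every example and class in the batch
cross_entropy = -1 * tf.reduce_sum(y_train * tf.log(y))
# cross_entropy is already a scalar here, so reduce_mean leaves its value unchanged
cross_entropy = tf.reduce_mean(cross_entropy)
# gradient descent with a step size of 0.01
opt = tf.train.GradientDescentOptimizer(0.01)
train = opt.minimize(cross_entropy)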
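
To show what the cross-entropy loss measures, here is a plain-Python sketch of the same sum over a tiny made-up batch. The cross_entropy function and the example probabilities are hypothetical; unlike the TensorFlow expression, the sketch skips entries where the true label is 0, which gives the same value whenever every predicted probability is positive.

```python
import math


def cross_entropy(true_labels, predicted_probs):
    """Sum of -y * log(p) over every example and every class."""
    total = 0.0
    for y_row, p_row in zip(true_labels, predicted_probs):
        for y_value, p_value in zip(y_row, p_row):
            if y_value > 0:  # zero entries contribute nothing to the sum
                total -= y_value * math.log(p_value)
    return total


# Two examples with three classes each (made-up numbers).
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
confident = [[0.05, 0.9, 0.05], [0.8, 0.1, 0.1]]  # mostly right
unsure = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]       # mostly wrong

loss_confident = cross_entropy(labels, confident)
loss_unsure = cross_entropy(labels, unsure)
print(loss_confident, loss_unsure)  # the confident predictions have the lower loss
```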

Now we are going to initialize our variables and then train the model. We will run the training step 5000 times, and each step will use a batch of 150 training examples. We train on small batches because using the whole dataset for every training step would be expensive and take much longer.

In [109]:
# initialize all variables
init = tf.global_variables_initializer()
sess.run(init)

for batch in range(5000):
  x_batch, y_batch = mnist.train.next_batch(150)
  sess.run(train, feed_dict={x: x_batch, y_train: y_batch})
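
The batching done by mnist.train.next_batch can be pictured with the plain-Python sketch below. The iterate_batches helper is hypothetical and only illustrates the general idea of shuffling the data and slicing it into fixed-size batches; the actual internals of next_batch are not shown in this tutorial and may differ.

```python
import random


def iterate_batches(examples, labels, batch_size, seed=0):
    """Yield shuffled (examples, labels) batches that cover the data once."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the sketch repeatable
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [examples[i] for i in chosen], [labels[i] for i in chosen]


# Made-up dataset of 10 one-feature examples with alternating labels.
examples = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = list(iterate_batches(examples, labels, batch_size=4))
batch_sizes = [len(x_batch) for x_batch, _ in batches]
print(batch_sizes)  # [4, 4, 2]
```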

Finally, we are going to measure how well our model classified the digits. Since the labels are one-hot vectors, we compare the index of the highest predicted probability with the index of the 1 in the true label for each image.

In [111]:
num_correct = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_train, axis=1))
num_correct = tf.cast(num_correct, tf.float32) # cast from boolean to float
acc = tf.reduce_mean(num_correct)
print(sess.run(acc, feed_dict={x: mnist.test.images, y_train: mnist.test.labels}))

0.9088


As you can see, we got an accuracy of roughly 91%. This is not a very good rate, and small changes to our model could improve it. However, this example shows how easy it is to create and train a model with TensorFlow and how flexibly it lets us define models.
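
The argmax comparison behind this accuracy figure can be sketched in plain Python on a tiny made-up batch. The argmax and accuracy helpers are hypothetical names for this example, and the probabilities are invented rather than produced by the trained model.

```python
def argmax(values):
    """Return the index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted_probs, true_labels):
    """Fraction of examples whose most probable label matches the true label."""
    matches = [argmax(p) == argmax(y) for p, y in zip(predicted_probs, true_labels)]
    return sum(matches) / len(matches)


# Four examples with three classes each (made-up numbers).
predicted = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

acc = accuracy(predicted, labels)
print(acc)  # 0.75, since three of the four predictions match
```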

## Further Resources
This tutorial is meant to introduce you to the basics. To learn more about TensorFlow and how to use it for a variety of tasks, the links below may be helpful.

1. https://www.tensorflow.org/tutorials/
2. https://github.com/tensorflow
3. http://amygdala.github.io/ml/2017/02/03/transfer_learning.html
4. https://www.tensorflow.org/tutorials/deep_cnn

### Citations
https://www.tensorflow.org/versions/r1.1/get_started/get_started

https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners

https://www.lynda.com/Google-TensorFlow-tutorials/What-TensorFlow/601800/647721-4.html

https://www.computerworlduk.com/open-source/what-is-tensorflow-how-are-businesses-using-it-3658374/

https://research.googleblog.com/2017/02/announcing-tensorflow-10.html

https://dataplatform.ibm.com/analytics/notebooks/91440c8b-0bfb-471e-b04e-235e4d9f510d/view?access_token=fb4380415a903111e26cec3bd95d8ba91a04746185c866fecde9d36643fa5585

Data from: http://yann.lecun.com/exdb/mnist/