## Getting started with TensorFlow
### Load data 
Load the MNIST data using the helper functions in TensorFlow. They download the data if it is not already present on your computer.


In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


The dataset contains images of handwritten digits and their corresponding labels. It is already split into training, validation, and test sets.

In [2]:
print("Image dimension = ", mnist.train.images.shape, "Labels dimension = ", mnist.train.labels.shape)

Image dimension =  (55000, 784) Labels dimension =  (55000, 10)
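
The second dimension of 10 in the labels comes from `one_hot=True`: each label is a vector with a 1 at the index of the digit and 0 elsewhere. A minimal plain-Python sketch of that encoding follows; the helper name `one_hot` is made up for illustration and is not part of TensorFlow's loader.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at position `digit` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec

label = one_hot(3)
print(label)
# The original digit is recovered as the index of the largest entry.
print(label.index(max(label)))
```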


### Train a basic softmax regression
Each MNIST image has 28x28 = 784 pixels, and the dataset stores each image flattened into a 784-dimensional vector, which suits softmax regression. Flattening discards the spatial structure of the image; we will exploit that structure when we use convolutional nets.
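
To make the loss of spatial structure concrete, here is a small sketch of flattening a 28x28 grid into a 784-vector. It assumes row-major order (row after row), and the helper name `flat_index` is illustrative only.

```python
ROWS, COLS = 28, 28

def flat_index(row, col):
    """Position of pixel (row, col) in the flattened vector (row-major)."""
    return row * COLS + col

# A toy 28x28 image: nested lists of zeros with one bright pixel.
image = [[0.0] * COLS for _ in range(ROWS)]
image[5][7] = 1.0

# Flatten row by row.
flat = [pixel for row in image for pixel in row]

print(len(flat))
print(flat[flat_index(5, 7)])
# Vertically adjacent pixels end up COLS positions apart in the flat vector.
print(flat_index(6, 7) - flat_index(5, 7))
```

Two pixels that touch vertically in the image sit 28 entries apart in the vector, and a model that treats every input position independently has no way of knowing they were neighbors.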

This is a classic case where softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the natural choice. Even later on, when we train more sophisticated models, the final step will be a softmax layer.

Very simply, we compute the evidence that an image $x$ belongs to class $i$ as

\begin{equation*}
\mathrm{Evidence}_i = \sum_j W_{i,j} x_j + b_i
\end{equation*}
where $W_{i,j}$ are the weights and $b_i$ are the biases, both of which we learn during the training phase.
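
As a worked illustration of the evidence formula, the sketch below uses a toy problem with 3 classes and 2 input features. The weights, biases, and input are invented numbers, not learned MNIST parameters.

```python
# Toy parameters: W[i][j] is the weight from input feature j to class i.
W = [[1.0, -1.0],
     [0.5, 0.5],
     [-1.0, 2.0]]
b = [0.0, 0.1, -0.2]
x = [2.0, 1.0]

def evidence(W, b, x):
    """Evidence_i = sum_j W[i][j] * x[j] + b[i], one value per class."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

scores = evidence(W, b, x)
print(scores)
```

Note that the bias is added once per class, outside the sum over input features. The TensorFlow code in this notebook stores `W` with shape `[784, 10]`, so it computes the same sums as `tf.matmul(x, W) + b` with the two indices swapped.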

Next, we use softmax as an activation (or link) function to convert the linear evidence into a probability distribution over the 10 classes.

Softmax is defined as
\begin{equation*}
\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
\end{equation*}
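
A minimal sketch of softmax in plain Python follows. Subtracting the maximum before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result, because the common factor cancels in the ratio.

```python
import math

def softmax(values):
    """Convert a list of real-valued scores into probabilities."""
    shift = max(values)  # stability: avoids overflow in exp for large scores
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 1.6, -0.2])
print(probs)
print(sum(probs))  # probabilities sum to 1, up to rounding
```

The largest score receives the largest probability, and every class keeps a strictly positive probability.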

#### Build the model
We build a symbolic model using TensorFlow's built-in functions. To define the inputs to the model, we use placeholder variables with specified dimensions.

In [3]:
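import tensorflow as tf

# Inputs: flattened images and one-hot labels; None allows any batch size.
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Parameters to learn, initialized to zero.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Predicted class probabilities.
y = tf.nn.softmax(tf.matmul(x, W) + b)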

#### Cost Function
To train the model, we need to define a **cost (or loss) function**, which we minimize during training to learn the weights and biases. The cost function we use here is the _cross entropy_.

\begin{equation*}
H_{\tilde{y}}(y) = -\sum_i{\tilde{y}_i\log{y_i}}
\end{equation*}
where $y$ is the vector of predicted class probabilities and $\tilde{y}$ is the true distribution (the one-hot label vector we feed in, called `y_` in the code).
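
With a one-hot $\tilde{y}$, only the term for the true class survives the sum, so the cross entropy reduces to the negative log of the probability assigned to the correct class. A small sketch with made-up probabilities follows; the `eps` guard against `log(0)` is an addition for illustration.

```python
import math

def cross_entropy(true_dist, predicted, eps=1e-12):
    """H = -sum_i true_i * log(pred_i); eps keeps log away from zero."""
    return -sum(t * math.log(p + eps) for t, p in zip(true_dist, predicted))

true_label = [0.0, 1.0, 0.0]      # one-hot: the true class is 1
confident = [0.05, 0.90, 0.05]    # most mass on the true class
unsure = [0.40, 0.30, 0.30]       # little mass on the true class

loss_confident = cross_entropy(true_label, confident)
loss_unsure = cross_entropy(true_label, unsure)
print(loss_confident)  # small loss
print(loss_unsure)     # larger loss
```

The loss is close to zero when the model is confident and correct, and it grows without bound as the probability of the true class approaches zero.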

In [20]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
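
`GradientDescentOptimizer(0.5).minimize(...)` adds operations that compute the gradient of the loss and move each variable a step of size 0.5 against it. The sketch below performs one such step by hand for a single example, using the standard result that the gradient of the softmax cross entropy with respect to the evidence is `y - y_`. It illustrates the update rule under toy assumptions (2 features, 3 classes, invented input) and is not how TensorFlow implements the optimizer internally.

```python
import math

def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """W[j][k] is the weight from feature j to class k ([features, classes])."""
    logits = [sum(x[j] * W[j][k] for j in range(len(x))) + b[k]
              for k in range(len(b))]
    return softmax(logits)

def loss(W, b, x, target):
    y = predict(W, b, x)
    return -sum(t * math.log(p) for t, p in zip(target, y))

# Toy setup with zero-initialized parameters.
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3
x = [1.0, 2.0]
target = [0.0, 0.0, 1.0]
learning_rate = 0.5

loss_before = loss(W, b, x, target)

# Gradient w.r.t. the evidence is (y - target); the chain rule gives
# dL/dW[j][k] = x[j] * delta[k] and dL/db[k] = delta[k].
y = predict(W, b, x)
delta = [p - t for p, t in zip(y, target)]
for j in range(len(x)):
    for k in range(len(b)):
        W[j][k] -= learning_rate * x[j] * delta[k]
for k in range(len(b)):
    b[k] -= learning_rate * delta[k]

loss_after = loss(W, b, x, target)
print(loss_before, loss_after)  # the loss drops after the step
```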

#### Running the model in a session

In [23]:
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

num_steps = 1000
for i in range(num_steps):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
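
The training loop runs 1000 steps of 100 examples each, that is 100,000 examples, or a little under two passes over the 55,000 training images. The sketch below shows one way to cut a dataset into shuffled mini-batches. The helper `iterate_minibatches` is hypothetical; it only approximates the behavior of `mnist.train.next_batch`, assuming a shuffled pass over the data split into fixed-size chunks.

```python
import random

def iterate_minibatches(num_examples, batch_size, seed=0):
    """Yield lists of example indices covering one shuffled pass over the data."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(num_examples=1000, batch_size=100))
print(len(batches))     # number of batches in one pass
print(len(batches[0]))  # examples per batch
```

Training on small random batches instead of the full dataset makes each step cheap while still following the gradient on average.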

#### Evaluating the model

In [24]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9177
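
The evaluation takes the most probable class for each image with `tf.argmax`, compares it with the true class, and averages the resulting 0/1 values. The same computation in plain Python, on four invented predictions (the helper names are illustrative):

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the true class."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(hits) / float(len(hits))

predictions = [[0.1, 0.7, 0.2],
               [0.8, 0.1, 0.1],
               [0.3, 0.3, 0.4],
               [0.2, 0.5, 0.3]]
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 1, 0]]

result = accuracy(predictions, labels)
print(result)  # 3 of the 4 rows match
```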


So our simple model reaches about 92% accuracy on the MNIST test set.

### TensorFlow approach
In TensorFlow, you first build your computation model (e.g. a feedforward neural network) and then train it inside a session. Much of the work, such as back-propagation and the weight updates, happens under the hood.

TensorFlow makes it quick to experiment with new models, and the same model can be trained on a CPU or a GPU without writing any extra lines of code.