## Artificial Neural Networks and TensorFlow
[Artificial neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network) (ANNs) are a family of models inspired by biological neural networks. They are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown.  

### Biological Neuron
* There are about one hundred billion neurons in the human brain.
* The brain is an extremely interconnected network of neurons.
* A neuron collects inputs through structures called dendrites. It effectively sums all of these inputs, and if the resulting value is greater than its firing threshold, the neuron fires.
* When the neuron fires, it sends an electrical impulse through its axon to its boutons. These boutons can be networked to thousands of other neurons via connections called synapses.
<img src="images/biological_neuron.png" width=50%>  

### Perceptron (or Artificial Neuron)
A typical perceptron has many inputs, and each input has its own weight. A weight can either amplify or attenuate the original input signal: for example, if the input is 1 and its weight is 0.2, the weighted input is 0.2. These weighted signals are then added together and passed into the activation function, which converts the sum into a more useful output. There are many types of activation function, but one of the simplest is the step function: it outputs 1 if its input is higher than a certain threshold and 0 otherwise.  

**Two-step process**
1. Calculate the weighted sum
2. Use the activation function to decide the output of the neuron
<img src="images/perceptron.png" width=50%>

**Types of activation functions**
1. [Step function](https://en.wikipedia.org/wiki/Step_function)  
2. [ReLU (Rectified Linear Unit)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
3. [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) 
4. [tanh](https://en.wikipedia.org/wiki/Hyperbolic_function)
5. [Softmax](https://en.wikipedia.org/wiki/Softmax_function)
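As a rough sketch in plain Python (standard `math` module only, not any particular library's implementation), the activation functions listed above can be written as follows. The function names and the default step threshold of 0 are illustrative assumptions.

```python
import math

def step(z, threshold=0.0):
    # Outputs 1 if the input exceeds the threshold, otherwise 0
    return 1 if z > threshold else 0

def relu(z):
    # Passes positive values through unchanged and clips negatives to 0
    return max(0.0, z)

def sigmoid(z):
    # Squashes a real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes a real number into the interval (-1, 1)
    return math.tanh(z)

def softmax(zs):
    # Turns a vector of scores into probabilities that sum to 1;
    # subtracting the max first avoids overflow in exp()
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), relu(z), round(sigmoid(z), 4), round(tanh(z), 4))
print([round(p, 4) for p in softmax([1.0, 2.0, 3.0])])
```

Note that softmax, unlike the others, acts on a whole vector of values at once, which is why it is typically used on the output layer of a classifier.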

In [None]:
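import numpy as np

def step_activation_function(val, threshold):
    # Fire (output 1) only if the weighted sum exceeds the threshold
    if val > threshold:
        return 1
    else:
        return 0

def weighted_sum(inp, wt):
    # Multiply each input by its weight and add up the results
    # (equivalent to np.dot(inp, wt))
    temp_sum = 0
    for i in range(len(inp)):
        temp_sum += inp[i] * wt[i]
    return temp_sum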
    

In [None]:
#### OR Gate
# Input_1 Input_2 Output
#    0       0      0
#    0       1      1
#    1       0      1
#    1       1      1

inp = [0, 0]
wt = [2.0, 2.0]
threshold = 1.0

## Step 1: Calculate the weighted sum
wt_sum = weighted_sum(inp, wt)

## Step 2: Use the activation function to decide the output of the neuron
out = step_activation_function(wt_sum, threshold)
print(out)   # 0 for the input [0, 0]
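To check that the weights `[2.0, 2.0]` and the threshold `1.0` implement the whole OR table (not just the `[0, 0]` row), a small loop can push every input pair through the same two steps. This is a self-contained sketch: the two functions are redefined compactly, and the AND threshold of `3.0` is an extra illustrative assumption showing that, with these weights, only the threshold needs to change.

```python
def step_activation_function(val, threshold):
    return 1 if val > threshold else 0

def weighted_sum(inp, wt):
    return sum(i * w for i, w in zip(inp, wt))

wt = [2.0, 2.0]
or_threshold = 1.0
and_threshold = 3.0  # illustrative: same weights, higher threshold gives AND

or_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
and_table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

or_results = {}
and_results = {}
for inp in or_table:
    s = weighted_sum(inp, wt)                                       # Step 1
    or_results[inp] = step_activation_function(s, or_threshold)    # Step 2
    and_results[inp] = step_activation_function(s, and_threshold)
    print(inp, "OR ->", or_results[inp], " AND ->", and_results[inp])
```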


### XOR gate problem
Life was simple back then! Things were going well with the single-layer perceptron. Boats were sailing, and computers were happily running in one big giant hall! Until one day, two researchers, Marvin Minsky and Seymour Papert, published work (most famously the 1969 book *Perceptrons*) that demonstrated two major problems with the single-layer perceptron:
* A single-layer perceptron cannot learn the XOR gate (or any other data that is not linearly separable).
* The "tiny GIANT computers" of the time did not have enough processing power to handle the long run times required by large neural networks.
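The XOR limitation is easy to observe. The sketch below (plain Python, with a hypothetical `perceptron` helper) searches a coarse grid of weights and thresholds for a single step-function perceptron that reproduces XOR and finds none. A grid search only illustrates the point; the underlying reason is that no straight line separates the XOR classes. It then shows that a hidden layer fixes the problem, using hand-picked weights for an OR unit and a NAND unit that feed an AND unit (one of many possible choices).

```python
import itertools

def perceptron(inp, wt, threshold):
    # Single step-function unit: fire if the weighted sum exceeds the threshold
    return 1 if sum(i * w for i, w in zip(inp, wt)) > threshold else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

# Coarse grid of weights and thresholds from -3.0 to 3.0 in steps of 0.5
grid = [k / 2.0 for k in range(-6, 7)]
single_layer_found = any(
    [perceptron(p, (w1, w2), t) for p in inputs] == xor_targets
    for w1, w2, t in itertools.product(grid, grid, grid)
)
print("single perceptron reproduces XOR:", single_layer_found)

def xor_net(inp):
    # Hidden layer: an OR unit and a NAND unit
    h_or = perceptron(inp, (1.0, 1.0), 0.5)
    h_nand = perceptron(inp, (-1.0, -1.0), -1.5)
    # Output layer: AND of the two hidden units
    return perceptron((h_or, h_nand), (1.0, 1.0), 1.5)

two_layer_outputs = [xor_net(p) for p in inputs]
print("two-layer network outputs:", two_layer_outputs)
```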

### Neural Network
<img src="images/neural_network.png" width=40%>

#### Feedforward Neural Network
A [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) is an artificial neural network in which connections between the units do not form a cycle.  
* Input Layer: Nodes of the network that accept input values. In the network below, **nodes 1, 2 and 3** are **input nodes**. They do not compute anything, but simply pass the values to the processing nodes in the next layers.
* Output Layer: These nodes provide the output. **Nodes 6 and 7** are associated with the **output variables**.
* Hidden Layer: If the neural network were a black box, these would be the layers that are not visible from outside. Their values depend on the weights of the interconnections. **Nodes 4 and 5** are **hidden nodes**.

<img src="images/feed_forward.png" width=40%>
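A forward pass through a network shaped like the one above (3 input nodes, 2 hidden nodes, 2 output nodes) might look like the sketch below. The weight values, the input vector, and the use of sigmoid activations with no bias terms are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    # weights[j][i] is the weight on the connection from input i to unit j of this layer
    return [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in weights]

# Hypothetical weights (no biases) for nodes 1-3 -> 4-5 -> 6-7
w_hidden = [[0.1, 0.2, 0.3],   # connections into node 4
            [0.4, 0.5, 0.6]]   # connections into node 5
w_output = [[0.7, 0.8],        # connections into node 6
            [0.9, 1.0]]        # connections into node 7

x = [1.0, 0.0, 1.0]                       # values placed on input nodes 1, 2, 3
hidden = layer_forward(x, w_hidden)       # input nodes only pass their values on
output = layer_forward(hidden, w_output)
print("hidden nodes 4, 5:", hidden)
print("output nodes 6, 7:", output)
```

Each layer only needs the outputs of the previous layer, which is what makes the computation "feed forward".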

#### Exercise:
<img src="images/feed_forward_example.png" width=45% align="middle">

### Training the network (Learning the weights)  
1. Randomly initialize all the weights ($w_{ij}$)
2. Repeat:  
  * Feed the network with an input x from one of the examples in the training set  
  * Compute the network’s output f(x)  
  * Update the weights $w_{ij}$ to reduce the error between f(x) and the expected output  
3. Until the error is small
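These steps can be sketched for the smallest possible network: a single sigmoid neuron learning the OR gate by gradient descent. The bias term, the squared-error loss, the learning rate of 0.5, the random seed, the epoch cap and the `0.01` stopping threshold are illustrative assumptions, not values taken from this notebook.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training set: the OR gate
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# 1. Randomly initialize the weights (plus a bias, an assumption of this sketch)
random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
learning_rate = 0.5

def predict(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def mean_squared_error():
    return sum((predict(x) - t) ** 2 for x, t in data) / len(data)

# 2. Repeat ... 3. until the error is small (with a cap on the number of epochs)
for epoch in range(10000):
    for x, t in data:
        out = predict(x)                        # compute f(x)
        delta = (out - t) * out * (1 - out)     # d(0.5*(out-t)^2) / d(pre-activation)
        w[0] -= learning_rate * delta * x[0]    # change the weights
        w[1] -= learning_rate * delta * x[1]
        b -= learning_rate * delta
    if mean_squared_error() < 0.01:
        break

predictions = [round(predict(x)) for x, _ in data]
print("epochs:", epoch + 1, "error:", round(mean_squared_error(), 4))
print("predictions:", predictions)
```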

### Backpropagation algorithm (How to update the weights)
<img src="images/backpropagation.png" width=50%>
#### Further reading on backpropagation
[Derivation of backpropagation rule](https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf)  
[Detailed step by step backpropagation explained](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
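One way to convince yourself that a backpropagation implementation is correct is a finite-difference gradient check: nudge each weight up and down by a tiny amount, measure how the error changes, and compare that slope with the gradient backpropagation computed. The sketch below does this for a hypothetical 2-3-1 sigmoid network without biases, using the error E = 0.5 * (t - o)^2, the same error whose gradient the update rule in the training code of this notebook uses. The network size, seed and input are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    # W1[i][j]: weight from input i to hidden unit j; W2[j]: hidden unit j to the output
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x)))) for j in range(len(W2))]
    o = sigmoid(sum(h[j] * W2[j] for j in range(len(W2))))
    return h, o

def loss(x, t, W1, W2):
    return 0.5 * (t - forward(x, W1, W2)[1]) ** 2

def backprop(x, t, W1, W2):
    h, o = forward(x, W1, W2)
    delta_out = -(t - o) * o * (1 - o)                 # dE/d(output pre-activation)
    dW2 = [delta_out * h[j] for j in range(len(W2))]
    # Propagate the error back through each output weight and hidden sigmoid
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(W2))]
    dW1 = [[x[i] * delta_hid[j] for j in range(len(W2))] for i in range(len(x))]
    return dW1, dW2

random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]
x, t = [1.0, 0.5], 1.0
dW1, dW2 = backprop(x, t, W1, W2)

def numeric_grad(weights, idx, eps=1e-6):
    # Central difference of the loss with respect to weights[idx]
    old = weights[idx]
    weights[idx] = old + eps
    up = loss(x, t, W1, W2)
    weights[idx] = old - eps
    down = loss(x, t, W1, W2)
    weights[idx] = old  # restore the original weight
    return (up - down) / (2 * eps)

errors = [abs(numeric_grad(W2, j) - dW2[j]) for j in range(3)]
errors += [abs(numeric_grad(W1[i], j) - dW1[i][j]) for i in range(2) for j in range(3)]
max_err = max(errors)
print("largest difference between numeric and backprop gradients:", max_err)
```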

In [30]:
# A  B  C    Output
# 0  0  0      0
# 0  0  1      0
# 0  1  0      0
# 0  1  1      1
# 1  0  0      0
# 1  0  1      1
# 1  1  0      0
# 1  1  1      1

def my_func(a,b,c):
    return (a or b) and c

In [31]:
def test_func():
    a = [0,0,0,0,1,1,1,1]
    b = [0,0,1,1,0,0,1,1]
    c = [0,1,0,1,0,1,0,1]
    # list() is needed in Python 3, where map returns a lazy iterator that cannot be indexed
    output = list(map(my_func, a, b, c))
    print('A\tB\tC\tOutput')
    for i in range(8):
        print('{}\t{}\t{}\t   {}\n'.format(a[i], b[i], c[i], output[i]))
        
test_func()

A	B	C	Output
0	0	0	   0

0	0	1	   0

0	1	0	   0

0	1	1	   1

1	0	0	   0

1	0	1	   1

1	1	0	   0

1	1	1	   1



In [32]:
import numpy as np

## Initialize parameters
learning_rate = 0.1
training_iter = 10000
num_input = 3
num_hidden = 5
num_output = 1

hidden_layer_wt_matrix = np.random.random([num_input, num_hidden])
output_layer_wt_matrix = np.random.random([num_hidden, num_output])
print("Hidden Layer weight matrix: ")
print(hidden_layer_wt_matrix)
print("\n")
print("Output Layer weight matrix: ")
print(output_layer_wt_matrix)

Hidden Layer weight matrix: 
[[ 0.21457722  0.02227932  0.55934058  0.9109393   0.38573136]
 [ 0.93374853  0.63480644  0.76543304  0.19923367  0.40042782]
 [ 0.59148007  0.97936727  0.04343354  0.57932368  0.54847296]]


Output Layer weight matrix: 
[[ 0.39631182]
 [ 0.73080643]
 [ 0.0526231 ]
 [ 0.45656168]
 [ 0.33062717]]


In [33]:
import math

def sigmoid_activation_function(val):
    return 1.0/(1.0+math.exp(-val))


def get_output(inp, PRINT_FLAG=False):
    # Hidden layer, step 1: weighted sum of the inputs
    hidden_wt_sum = np.dot(inp, hidden_layer_wt_matrix)
    if PRINT_FLAG:
        print("HIDDEN LAYER:\n\nStep 1) Weighted Sum:")
        print("Input Vector:{}\nHidden Layer Weight matrix:\n{}\nWeighted Sum:\n{}\n".format(inp, hidden_layer_wt_matrix, hidden_wt_sum))

    # Hidden layer, step 2: sigmoid activation (a list, so it can be indexed and reused)
    hidden_layer_output = [sigmoid_activation_function(v) for v in hidden_wt_sum]
    # Output layer, step 1: weighted sum of the hidden activations
    output_wt_sum = np.dot(hidden_layer_output, output_layer_wt_matrix)
    if PRINT_FLAG:
        print("Step 2) Activated Value:")
        print(hidden_layer_output)
        print("\n\nOUTPUT LAYER:\n\nStep 1) Weighted Sum:")
        print("Hidden Layer Output:{}\nOutput Layer Weight matrix:\n{}\nWeighted Sum:\n{}\n".format(hidden_layer_output, output_layer_wt_matrix, output_wt_sum))

    # Output layer, step 2: sigmoid activation
    final_output = [sigmoid_activation_function(v) for v in output_wt_sum]
    if PRINT_FLAG:
        print("Step 2) Activated Value:")
        print(final_output)
    return final_output, hidden_layer_output

In [34]:
import math

inp_mat = [[0,0,0], [0,0,1], [0,1,0], [0,1,1], [1,0,0], [1,0,1], [1,1,0], [1,1,1]]
inp = inp_mat[0]

out, hidden_layer_out = get_output(inp, True)


HIDDEN LAYER:

Step 1) Weighted Sum:
Input Vector:[0, 0, 0]
Hidden Layer Weight matrix:
[[ 0.21457722  0.02227932  0.55934058  0.9109393   0.38573136]
 [ 0.93374853  0.63480644  0.76543304  0.19923367  0.40042782]
 [ 0.59148007  0.97936727  0.04343354  0.57932368  0.54847296]]
Weighted Sum:
[ 0.  0.  0.  0.  0.]

Step 2) Activated Value:
[0.5, 0.5, 0.5, 0.5, 0.5]


OUTPUT LAYER:

Step 1) Weighted Sum:
Hidden Layer Output:[0.5, 0.5, 0.5, 0.5, 0.5]
Output Layer Weight matrix:
[[ 0.39631182]
 [ 0.73080643]
 [ 0.0526231 ]
 [ 0.45656168]
 [ 0.33062717]]
Weighted Sum:
[ 0.9834651]

Step 2) Activated Value:
[0.7277952274410893]


In [35]:
for inp in inp_mat:
    out, _ = get_output(inp)
    print('A\tB\tC\tTrue Output\tNeural Network Output\n')
    print('{}\t{}\t{}\t   {}\t\t    {}\n'.format(inp[0], inp[1], inp[2], my_func(inp[0], inp[1], inp[2]), out[0]))

A	B	C	True Output	Neural Network Output

0	0	0	   0		    0.727795227441

A	B	C	True Output	Neural Network Output

0	0	1	   0		    0.788421254005

A	B	C	True Output	Neural Network Output

0	1	0	   0		    0.776774187887

A	B	C	True Output	Neural Network Output

0	1	1	   1		    0.820928035031

A	B	C	True Output	Neural Network Output

1	0	0	   0		    0.758544421125

A	B	C	True Output	Neural Network Output

1	0	1	   1		    0.810373823804

A	B	C	True Output	Neural Network Output

1	1	0	   0		    0.801280508452

A	B	C	True Output	Neural Network Output

1	1	1	   1		    0.837253386058



In [36]:
def test_NN():
    # Mean squared error of the network over all rows of the truth table
    MSE = 0.
    for inp in inp_mat:
        out, _ = get_output(inp)
        MSE += math.pow((my_func(inp[0], inp[1], inp[2]) - out[0]), 2)
    return MSE / len(inp_mat)

def update_weights(inp, hidden_layer_out, out, PRINT_FLAG=False):
    global output_layer_wt_matrix
    global hidden_layer_wt_matrix

    # Error signal at the output unit: dE/dz for E = 0.5*(target - out)^2 with a sigmoid output
    delta_wt_output = -1*(my_func(inp[0], inp[1], inp[2]) - out[0])*out[0]*(1-out[0])

    # Output-layer gradient = delta * hidden activations, shaped (num_hidden, num_output)
    new_output_layer_wt_matrix = output_layer_wt_matrix - learning_rate*delta_wt_output*np.reshape(hidden_layer_out,(num_hidden,num_output))
    if PRINT_FLAG:
        print("Previous Output Layer Weight matrix:")
        print(output_layer_wt_matrix)
        print("\nUpdated Output Layer Weight matrix:")
        print(new_output_layer_wt_matrix)

    # Partial hidden error signal delta * h * (1 - h), shape (1, num_hidden);
    # the old output weights are multiplied in below
    delta_wt_hidden = delta_wt_output*np.multiply(hidden_layer_out, (np.ones((num_output,num_hidden)) - hidden_layer_out))

    if PRINT_FLAG:
        print("\nPrevious Hidden Layer Weight matrix:")
        print(hidden_layer_wt_matrix)

    # Hidden-layer gradient = outer(input, delta_hidden * old output weights), shape (num_input, num_hidden)
    hidden_layer_wt_matrix = hidden_layer_wt_matrix - learning_rate*np.dot(np.reshape(inp,(num_input,1)), delta_wt_hidden*np.reshape(output_layer_wt_matrix,(1,num_hidden)))

    if PRINT_FLAG:
        print("\nUpdated Hidden Layer Weight matrix:")
        print(hidden_layer_wt_matrix)

    # Only now replace the output weights, so the hidden update above used the old ones
    output_layer_wt_matrix = new_output_layer_wt_matrix

# Show a single update, using the last input row left over from the previous cell ([1, 1, 1])
out, hidden_layer_out = get_output(inp)
update_weights(inp, hidden_layer_out, out, True)

for count in range(training_iter):
    if count % 1000 == 0:
        print("Iteration {}:{}".format(count, test_NN()))
    # One epoch: an update for every row of the truth table
    for inp in inp_mat:
        out, hidden_layer_out = get_output(inp)
        update_weights(inp, hidden_layer_out, out)
        

Previous Output Layer Weight matrix:
[[ 0.39631182]
 [ 0.73080643]
 [ 0.0526231 ]
 [ 0.45656168]
 [ 0.33062717]]

Updated Output Layer Weight matrix:
[[ 0.39819824]
 [ 0.73266267]
 [ 0.05439072]
 [ 0.45843368]
 [ 0.33238263]]

Previous Hidden Layer Weight matrix:
[[ 0.21457722  0.02227932  0.55934058  0.9109393   0.38573136]
 [ 0.93374853  0.63480644  0.76543304  0.19923367  0.40042782]
 [ 0.59148007  0.97936727  0.04343354  0.57932368  0.54847296]]

Updated Hidden Layer Weight matrix:
[[ 0.21468887  0.02250037  0.55935946  0.9110725   0.38585231]
 [ 0.93386018  0.63502749  0.76545192  0.19936687  0.40054877]
 [ 0.59159172  0.97958832  0.04345241  0.57945688  0.54859391]]
Iteration 0:0.384156739086
Iteration 1000:0.0756623633159
Iteration 2000:0.01056586273
Iteration 3000:0.0041462036059
Iteration 4000:0.00245956799435
Iteration 5000:0.00171964222422
Iteration 6000:0.00131137421484
Iteration 7000:0.00105488915508
Iteration 8000:0.000879726975216
Iteration 9000:0.000752932192013


In [37]:
for inp in inp_mat:
    out, _ = get_output(inp)
    print('A\tB\tC\tTrue Output\tNeural Network Output\n')
    print('{}\t{}\t{}\t   {}\t\t    {}\n'.format(inp[0], inp[1], inp[2], my_func(inp[0], inp[1], inp[2]), out[0]))

A	B	C	True Output	Neural Network Output

0	0	0	   0		    0.00369053197933

A	B	C	True Output	Neural Network Output

0	0	1	   0		    0.037393301626

A	B	C	True Output	Neural Network Output

0	1	0	   0		    0.0233606826554

A	B	C	True Output	Neural Network Output

0	1	1	   1		    0.967749045452

A	B	C	True Output	Neural Network Output

1	0	0	   0		    0.0228911088741

A	B	C	True Output	Neural Network Output

1	0	1	   1		    0.96757724984

A	B	C	True Output	Neural Network Output

1	1	0	   0		    0.0167908680527

A	B	C	True Output	Neural Network Output

1	1	1	   1		    0.979947174203



## Brief Introduction to TensorFlow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them.  
### Overview
* Computations are expressed as graphs
* Nodes in the graph are operations (ops)
* Edges carry data; in TensorFlow, data are called tensors
* Example: a tensor representing a batch of color images has dimensions [batch size, height, width, color channels (RGB)]
* To start computation, the graph must be launched in a Session
* The Session places the graph ops onto devices, such as CPUs or GPUs
* In Python, running ops in a Session returns their output as `numpy.ndarray` objects

### Building the graph
* Start with constant ops, because they don't need any input
* Pass the output of these constant ops to other op nodes that do computation
* TensorFlow provides a default graph
* When adding ops, you can specify the graph they belong to; otherwise they are added to the default graph. Since we generally work with a single graph, this is rarely needed
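The deferred-execution idea behind graphs and sessions can be mimicked in a few lines of plain Python. This is only a toy model for intuition: `Node`, `constant`, `add`, `mul` and `run` are hypothetical names, not TensorFlow's API. Building the graph computes nothing; a value is only produced when a node is run, after the nodes it depends on (its incoming edges) have been evaluated.

```python
class Node:
    """Hypothetical graph node: an operation plus the nodes that feed it (its edges)."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

def constant(value):
    # A constant op needs no inputs, so building a graph can start from it
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, a, b)

def mul(a, b):
    return Node(lambda x, y: x * y, a, b)

def run(node, cache=None):
    # Evaluate a node by first evaluating its inputs; cache results shared by several nodes
    cache = {} if cache is None else cache
    if id(node) not in cache:
        args = [run(n, cache) for n in node.inputs]
        cache[id(node)] = node.op(*args)
    return cache[id(node)]

a = constant(3.0)
b = constant(4.0)
c = mul(add(a, b), b)   # building the graph: nothing is computed yet
print(run(c))           # 28.0
```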


In [None]:
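## Example of constant op

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
print(hello)             # prints the Tensor object (its name, shape and dtype), not its value
sess = tf.Session()
print(hello)             # still just a Tensor: creating a Session does not evaluate anything
print(sess.run(hello))   # running the op in the Session returns its value
print(tf.shape(hello))   # tf.shape is itself an op, so this prints another Tensor
sess.close()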

In [None]:
val = [tf.constant([1.]), tf.constant([2.]), tf.constant([3.])]
sum_arr = tf.add_n(val)

with tf.Session() as sess:
    print(sess.run(sum_arr))   # [6.]


In [None]:
import numpy as np

# Two 2x2 matrices stored in Variables (np.array instead of np.matrix, which NumPy no longer recommends)
matrix1 = tf.Variable(np.array([[1., 2.], [3., 4.]]))
matrix2 = tf.Variable(np.array([[1., 2.], [3., 4.]]))
product = tf.matmul(matrix1, matrix2)

sess = tf.Session()

# Variables must be initialized in the session before they can be read
matrix1.initializer.run(session=sess)
matrix2.initializer.run(session=sess)

print(sess.run(product))   # the matrix product [[7, 10], [15, 22]]
sess.close()

#### Preparing the data sets

In [39]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [40]:
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

In [41]:
# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.add(tf.matmul(layer_2, weights['out']), biases['out'])
    return out_layer
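To get a feel for the size of this model, its number of trainable parameters can be counted by hand: every fully connected layer has one weight per input-output pair plus one bias per output. The helper below is a hypothetical sketch of that arithmetic for the 784-256-256-10 layer sizes defined above.

```python
def dense_param_count(layer_sizes):
    # Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of layers
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_params = dense_param_count([784, 256, 256, 10])
print(n_params)  # 269322
```

Almost three quarters of these parameters sit in the first layer, because every one of the 784 pixels connects to every unit of the first hidden layer.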

In [42]:
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()
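For a single example, the quantity that `tf.nn.softmax_cross_entropy_with_logits` computes is the cross-entropy between the one-hot label $y$ and the softmax of the logits $z$, that is $-\sum_k y_k \log \mathrm{softmax}(z)_k$; `tf.reduce_mean` then averages it over the batch. A plain-Python sketch of the per-example formula (the function name is hypothetical) is:

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    # Numerically stable log-softmax: z_k - log(sum_j exp(z_j))
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    # Cross-entropy between the one-hot label and the softmax distribution
    return -sum(y * lp for y, lp in zip(one_hot_label, log_probs))

loss_value = softmax_cross_entropy([2.0, 1.0, 0.1], [1.0, 0.0, 0.0])
print(loss_value)
```

Computing the softmax and the log together like this is more stable than taking the log of an already computed softmax, which is one reason the loss is applied to the raw (linear) output layer of `multilayer_perceptron`.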

In [43]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch: {:04d} cost= {:.9f}".format(epoch + 1, avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch: 0001 cost= 195.142113180
Epoch: 0002 cost= 43.801992302
Epoch: 0003 cost= 27.233832005
Epoch: 0004 cost= 19.293722562
Epoch: 0005 cost= 14.084875138
Epoch: 0006 cost= 10.669653669
Epoch: 0007 cost= 8.068330675
Epoch: 0008 cost= 6.239608951
Epoch: 0009 cost= 4.636941172
Epoch: 0010 cost= 3.603453781
Epoch: 0011 cost= 2.724619403
Epoch: 0012 cost= 2.133785369
Epoch: 0013 cost= 1.578597355
Epoch: 0014 cost= 1.371632929
Epoch: 0015 cost= 1.110075789
Optimization Finished!
Accuracy: 0.9473


In [None]:
import tensorflow.contrib.learn as skflow
from sklearn import metrics

mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

classifier = skflow.TensorFlowDNNClassifier(hidden_units=[256, 256], n_classes=10)
classifier.fit(mnist.train.images, mnist.train.labels.astype(int))
score = metrics.accuracy_score(mnist.test.labels.astype(int), classifier.predict(mnist.test.images))
print("Accuracy: %f" % score)

#### Recurrent Neural Networks
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition or speech recognition.  
**Analogy to the human brain**: Try reciting the alphabet out loud from A to Z. When you are done, try it in reverse, i.e. from Z to A. You will find that A to Z is a lot easier than the reverse order, because all our lives we have practiced A to Z and not the reverse. The brain optimizes for, and remembers, the sequence. Similarly, if I ask you what the fifth letter from D is (in the usual order), it is hard to answer right away!
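The internal state that lets an RNN handle sequences can be sketched with a single scalar recurrent unit, $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$. The weights below are arbitrary assumptions and `rnn_forward` is a hypothetical helper. Feeding the same inputs in a different order gives a different final state, because each step depends on everything seen before it.

```python
import math

def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    # Hypothetical scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    h = 0.0                      # initial internal state
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

forward_order = rnn_forward([1.0, 0.0, 0.0])
reversed_order = rnn_forward([0.0, 0.0, 1.0])
print(forward_order)   # the first input still influences later states
print(reversed_order)
```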

#### Vanishing gradient problem and Exploding gradient problem
The **vanishing gradient problem** is a difficulty found in training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, each weight receives an update proportional to the gradient of the error function with respect to that weight in each iteration of training. Backpropagation computes these gradients with the chain rule, so the gradient for the "front" layers of an n-layer network is a product of n terms, each containing the derivative of one layer's activation function. Traditional activation functions have small derivatives: the derivative of the sigmoid is at most 0.25, and the derivative of tanh is at most 1 and much smaller once the unit saturates. Multiplying n of these small numbers means that the gradient (error signal) decreases exponentially with n, so the front layers train very slowly.  
When activation functions whose derivatives can take on larger values are used, one risks the related **exploding gradient problem**, in which the gradient instead grows exponentially with n.
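The exponential effect is easy to see numerically. The sketch below (hypothetical helpers, arbitrary input value) multiplies local derivatives along a chain of single units. With sigmoid units and weights fixed at 1, each factor is at most 0.25, so the gradient shrinks at least as fast as 0.25^n; with identity activations each factor is just the weight, so a weight of 1.5 makes the gradient grow as 1.5^n.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_chain_gradient(n_layers, x=0.5):
    # d(output)/d(input) for n stacked units a_k = sigmoid(a_{k-1}) (weights fixed at 1)
    a, grad = x, 1.0
    for _ in range(n_layers):
        a = sigmoid(a)
        grad *= a * (1 - a)   # chain rule: multiply by each unit's local derivative
    return grad

def linear_chain_gradient(n_layers, weight):
    # With identity activations each local derivative is the weight itself
    grad = 1.0
    for _ in range(n_layers):
        grad *= weight
    return grad

for n in (1, 5, 10, 20):
    print(n, sigmoid_chain_gradient(n), linear_chain_gradient(n, 1.5))
```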

##### [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) 


In [44]:
from tensorflow.python.ops import rnn, rnn_cell

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [45]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)  # pre-1.0 argument order; TensorFlow 1.x uses tf.split(x, n_steps, 0)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.add(tf.matmul(outputs[-1], weights['out']), biases['out'])

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()



In [46]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy on the full MNIST test set
    test_data = mnist.test.images.reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Iter 1280, Minibatch Loss= 1.682531, Training Accuracy= 0.43750
Iter 2560, Minibatch Loss= 1.505489, Training Accuracy= 0.51562
Iter 3840, Minibatch Loss= 1.084329, Training Accuracy= 0.59375
Iter 5120, Minibatch Loss= 0.842865, Training Accuracy= 0.75000
Iter 6400, Minibatch Loss= 0.690978, Training Accuracy= 0.80469
Iter 7680, Minibatch Loss= 1.056808, Training Accuracy= 0.64062
Iter 8960, Minibatch Loss= 0.712785, Training Accuracy= 0.79688
Iter 10240, Minibatch Loss= 0.607738, Training Accuracy= 0.82812
Iter 11520, Minibatch Loss= 0.352851, Training Accuracy= 0.90625
Iter 12800, Minibatch Loss= 0.544186, Training Accuracy= 0.81250
Iter 14080, Minibatch Loss= 0.473677, Training Accuracy= 0.84375
Iter 15360, Minibatch Loss= 0.282231, Training Accuracy= 0.91406
Iter 16640, Minibatch Loss= 0.356363, Training Accuracy= 0.93750
Iter 17920, Minibatch Loss= 0.281336, Training Accuracy= 0.89844
Iter 19200, Minibatch Loss= 0.257095, Training Accuracy= 0.91406
Iter 20480, Minibatch Loss= 0.18

#### RNN implementation with skflow (Need to update)

In [None]:
import tensorflow.contrib.learn as skflow

def mnist_rnn_input_op_fn(x):
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, 28])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, 28, x)  # pre-1.0 argument order; TensorFlow 1.x uses tf.split(x, 28, 0)
    return x

classifier = skflow.TensorFlowRNNClassifier(rnn_size=28, cell_type='rnn', n_classes=10, input_op_fn=mnist_rnn_input_op_fn)

#classifier = skflow.TensorFlowRNNClassifier(rnn_size=28, 
#    n_classes=10, cell_type='rnn', input_op_fn=mnist_rnn_input_op_fn,
#    num_layers=1, bidirectional=False, sequence_length=None,
#    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)
print(mnist.train.images.shape)
print(mnist.train.labels.shape)
# Treat each 28x28 image as 28 time steps of 28 pixels
train_data = mnist.train.images.reshape((-1, 28, 28))
# mnist was reloaded with one_hot=True above, so convert the labels to class indices
train_labels = np.argmax(mnist.train.labels, 1)
classifier.fit(train_data, train_labels)
print(mnist.test.images.shape)
pred = classifier.predict(mnist.test.images.reshape((-1, 28, 28)))
score = metrics.accuracy_score(np.argmax(mnist.test.labels, 1), pred)
print(score)


#### Convolutional Neural Networks

#### Deep Neural Networks