# Using Machine Learning for image recognition

Here, we use machine learning on a set of images of handwritten digits to work out which digit each image shows. Shouldn't be too bad.
***

### Import Statements
We want the TensorFlow library as well as the MNIST data. From there we can initialize our data and our training session.

In [1]:
#importing tensorflow library
import tensorflow as tf

#importing MNIST data
from tensorflow.examples.tutorials.mnist import input_data


#Creating session and input data
mnist = input_data.read_data_sets('MNIST_data', one_hot = True)
sess = tf.InteractiveSession()

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


***
## Softmax regression model
### Single Linear Layer
#### Placeholders
We make some placeholder nodes to specify the input and output of the algorithm 

    x is the input with shape of [batch_size, flattened_image_size]

which in this case is None, for an arbitrary batch size, and $28\times28$, or $784$, for the flattened image.

    y_ is the output with shape of [batch_size, 10] 
    
where 10 corresponds to the 10 possible digits. 

This shape argument is optional, but good for catching bugs resulting from inconsistent tensor shapes. 

In [2]:
x = tf.placeholder(tf.float32, shape = [None, 784])
y_ = tf.placeholder(tf.float32, shape = [None, 10])
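
To make those shapes concrete, here is a small plain-Python sketch (no TensorFlow) of what one row of x and one row of y_ look like. The helper names `flatten_image` and `one_hot` are made up for illustration, and the blank image is a stand-in for a real digit; `read_data_sets(..., one_hot = True)` already hands us the data in this form.

```python
def flatten_image(image):
    """Flatten a 2-D list of pixel rows into one long row-major list."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# A blank 28x28 "image" stands in for a real MNIST digit.
image = [[0.0] * 28 for _ in range(28)]
x_row = flatten_image(image)
y_row = one_hot(3)

print(len(x_row))  # 784
print(y_row)       # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```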

#### Variables
Now we define the weights W and biases b.

W has a shape of [784, 10] because we have 784 inputs and 10 outputs.

b is a 10-dimensional vector because there are 10 classes.

Remember, we still have to initialize all our variables before they can be used.

In [3]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

sess.run(tf.global_variables_initializer())

#### Predicted Class and Loss Function
We now need to implement the actual brains behind our model, which is essentially the regression model. In this case, it is pretty simple: we just multiply our input x by our weight matrix W and add our bias b.

Since this is a single layer, this is all we need. More often, it is much more complicated than this.

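We can also implement our loss function pretty easily: it is the cross-entropy between the true labels y_ and the softmax of our prediction y, averaged over the batch.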

In [4]:
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y))
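
For intuition, the loss for a single example can be sketched in plain Python. This is only an illustration of the math, not how TensorFlow computes it internally, and the function names `softmax` and `cross_entropy` are our own.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

# With all-zero logits (our W and b start at zero) every class gets
# probability 1/10, so the loss is log(10) whatever the label is.
label = [0.0] * 10
label[7] = 1.0
loss = cross_entropy(label, [0.0] * 10)
print(loss, math.log(10))
```

So before any training, the model is just guessing uniformly, and the loss starts out at about 2.3.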

***
#### Training the Model
We now want to optimize the computation graph we made in the last section. TensorFlow has many built-in optimization algorithms we can use. In this case, we will use steepest gradient descent with a step length of 0.5.

In [5]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

This line adds operations to compute the gradients, compute parameter updates and apply the updates to the parameters.
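
As a rough picture of what one run of the training step does, here is plain gradient descent on a one-parameter toy loss, $(w - 3)^2$. The loss, the starting point, and the step length of 0.25 are all assumptions chosen for illustration; in the notebook, TensorFlow derives the gradients of cross_entropy for us.

```python
def loss_gradient(w):
    """Gradient of the toy loss (w - 3)**2 with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.25  # illustrative step length for this toy loss
w = 0.0
for _ in range(20):
    # Step against the gradient, scaled by the learning rate.
    w = w - learning_rate * loss_gradient(w)

print(w)  # very close to 3.0, the minimum of the toy loss
```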

We then want to run train_step multiple times to train the model.


In [6]:
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict = {x:batch[0], y_: batch[1]})

This loads 100 training examples in each of the 1000 iterations.
From there, we feed the placeholders x and y_ with those training examples using feed_dict.

Note that you can replace any tensor with feed_dict, not just placeholders.
***
#### Evaluating the Model
We need to know if the model is actually working or not. To do that we can use the function

    tf.argmax
    
to determine which digit is most likely. tf.argmax returns the index of the highest entry in a tensor along an axis.

    tf.argmax(y,1) would return the digit we predicted
    tf.argmax(y_,1) would return what the digit actually is
    tf.equal can be used to check whether the two match

In [7]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_, 1))

This gives us a list of booleans. To determine what fraction is correct, we can just cast them to floating-point numbers and take the mean.

In [8]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
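
The same argmax, compare, and average pipeline can be sketched in plain Python on three made-up examples. The `argmax` helper and the numbers are ours, purely for illustration.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up logits and one-hot labels for three examples, three classes.
predictions = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
accuracy_value = sum(float(c) for c in correct) / len(correct)

print(correct)         # [True, False, True]
print(accuracy_value)  # two out of three correct
```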

We can then evaluate our accuracy on the test data and spit out the answer.

In [9]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9205


***
But this accuracy is, like, laughably bad. So, instead, we can build a small convolutional neural network, getting us to around 99.2% accuracy, which isn't extraordinarily great, but is pretty respectable.

## Multilayer Convolutional Network
### Weight Initialization

Basically, to build this network, we need a lot of weights and biases. The weights should be initialized with some small noise to break symmetry and prevent 0 gradients. Since we are using ReLU neurons, it is also advisable to give the biases a slightly positive initial value to avoid dead neurons. We are going to make two functions to do this for us.

In [10]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
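
tf.truncated_normal draws from a normal distribution but re-picks any value that lands more than two standard deviations from the mean, so no weight starts out unusually large. A minimal plain-Python sketch of that idea, using the standard library's random module and a fixed seed (both our assumptions, not what TensorFlow uses internally):

```python
import random

def truncated_normal(count, stddev=0.1, seed=0):
    """Draw `count` samples from a zero-mean normal distribution,
    redrawing any sample more than two standard deviations out."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            samples.append(value)
    return samples

weights = truncated_normal(1000)
print(min(weights), max(weights))  # both within [-0.2, 0.2]
```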

***
### Convolution and Pooling
TensorFlow is pretty good about being flexible in convolution and pooling operations. It allows us to change how we handle our boundaries and our stride size.

For our purposes, we want a stride of 1 and zero padding so that the output is the same size as the input. Furthermore, we want max pooling over 2x2 blocks.

In [11]:
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding = 'SAME')
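
To see what these settings do to the image size, here is a plain-Python sketch. With 'SAME' padding, the output length along an axis is the input length divided by the stride, rounded up. The helper names `same_output_size` and `max_pool_2x2_plain` are made up for this example, and the pooling helper only handles a single-channel image with even sides.

```python
import math

def same_output_size(input_size, stride):
    """Output length along one axis under 'SAME' padding."""
    return math.ceil(input_size / stride)

def max_pool_2x2_plain(image):
    """2x2 max pooling with stride 2 on a 2-D list with even sides."""
    return [
        [max(image[r][c], image[r][c + 1],
             image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 1, 0, 2],
         [1, 1, 3, 4]]
pooled = max_pool_2x2_plain(image)
print(pooled)                   # [[4, 8], [9, 4]]
print(same_output_size(28, 1))  # 28: convolution keeps the size
print(same_output_size(28, 2))  # 14: first pooling halves it
print(same_output_size(14, 2))  # 7: second pooling halves it again
```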

***
Now is where things get fun, because we are about to implement our
### First Convolutional Layer
This is going to be one convolution, followed by one max pooling. The convolution computes 32 features for each 5x5 patch, so its weight tensor has a shape of [5, 5, 1, 32]. The first two dimensions give the patch size, the third is the number of input channels, and the last is the number of output channels. We also need a bias vector with one entry per output channel.

In [12]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

***
To apply this layer, we need to reshape x into a 4D tensor, with the 2nd and 3rd dimensions corresponding to image width and height, and the final dimension corresponding to the number of color channels.

From there, we convolve x_image with the weight tensor, add the bias, apply the ReLU function, and max pool. The max pool reduces the image size to 14x14.

In [13]:
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1)+b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
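
To get a feel for what a single feature map is doing, here is a heavily simplified plain-Python sketch: one input channel, one output channel, stride 1, and zero padding. The function names and the tiny 3x3 inputs are made up for illustration; the real tf.nn.conv2d also sums over input channels and produces every output channel at once.

```python
def conv2d_same(image, kernel):
    """Single-channel, stride-1 sliding-window sum of products with
    zero padding, so the output is the same size as the input."""
    height, width = len(image), len(image[0])
    k = len(kernel)  # assumes a square kernel with an odd side length
    pad = k // 2
    output = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zeros.
                    if 0 <= rr < height and 0 <= cc < width:
                        total += image[rr][cc] * kernel[i][j]
            output[r][c] = total
    return output

def relu(image):
    """Replace every negative entry with zero."""
    return [[max(0.0, v) for v in row] for row in image]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
identity = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
feature_map = relu(conv2d_same(image, identity))
print(feature_map == image)  # True: this kernel copies the image
```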

*** 
Well. Now what? WE ADD A
### Second Convolutional Layer!
So we can have a truly deep network. It computes 64 features for each 5x5 patch.


In [14]:
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2)+b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

***
Is that enough? Of course not lol. 
### Densely Connected Layer
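The image is now down to 7x7, so we add a fully connected layer with 1024 neurons that processes the whole image at once. We reshape the pooled tensor into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.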

In [15]:
W_fc1 = weight_variable([7*7*64,1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

### Dropout
We apply dropout before the readout layer to reduce overfitting. We make the probability that a neuron's output is kept a placeholder, which lets us turn dropout on during training and off during testing.

In [16]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
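
tf.nn.dropout keeps each element with probability keep_prob, scales the kept elements up by 1 / keep_prob, and zeroes the rest, so the expected sum stays the same. Here is a minimal plain-Python sketch of that behavior; the `dropout` helper, the seeded `rng`, and the all-ones activations are our own illustration.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale the kept
    ones by 1 / keep_prob so the expected sum is unchanged."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, 0.5, rng)  # roughly half zeroed
test_out = dropout(activations, 1.0, rng)   # nothing dropped

print(sum(train_out) / len(train_out))  # close to 1.0
print(test_out == activations)          # True
```

This is why feeding keep_prob as 1.0 at test time is enough to switch dropout off.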

### Readout Layer
Finally, we add a readout layer, just like the single layer in the softmax regression above.

In [17]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
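
As a sanity check on the shapes we have used so far, we can count the trainable parameters in each layer with a few lines of plain Python. The helper names are made up; the layer sizes are the ones from the cells above.

```python
def conv_params(patch, in_channels, out_channels):
    """Weights plus biases for one convolutional layer."""
    return patch * patch * in_channels * out_channels + out_channels

def dense_params(inputs, outputs):
    """Weights plus biases for one fully connected layer."""
    return inputs * outputs + outputs

layer_params = {
    "conv1": conv_params(5, 1, 32),
    "conv2": conv_params(5, 32, 64),
    "fc1": dense_params(7 * 7 * 64, 1024),
    "fc2": dense_params(1024, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print("total", total_params)  # 3274634
```

Almost all of the roughly 3.27 million parameters sit in the first fully connected layer.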

### Train and Evaluate Model
We use code almost identical to that of the single-layer softmax model.

The differences are
* The gradient descent optimizer is replaced by a shiny, new Adam optimizer
* We include keep_prob in feed_dict to control the dropout rate
* We add logging every 100 iterations to the training process.

We also use tf.Session instead of tf.InteractiveSession, which better separates building the graph from running it and helps keep the code clean.
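
For the curious, the textbook Adam update can be sketched in plain Python on a one-parameter toy loss, $(w - 3)^2$. The toy loss, the starting point, and the step size of 0.1 are assumptions for illustration (the notebook uses 1e-4), and TensorFlow's own implementation differs in small details.

```python
import math

def adam_minimize(gradient, w, steps, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run the textbook Adam update on a single scalar parameter."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1.0 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)         # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=500)
print(w_final)  # ends up near 3.0, the minimum of the toy loss
```

Unlike plain gradient descent, the size of each step depends on the recent history of gradients rather than on the current gradient alone.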

In [18]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.2
step 100, training accuracy 0.78
step 200, training accuracy 0.92
step 300, training accuracy 0.86
step 400, training accuracy 0.9
step 500, training accuracy 1
step 600, training accuracy 0.96
step 700, training accuracy 0.92
step 800, training accuracy 0.92
step 900, training accuracy 1
step 1000, training accuracy 1
step 1100, training accuracy 0.96
step 1200, training accuracy 0.96
step 1300, training accuracy 0.98
step 1400, training accuracy 0.92
step 1500, training accuracy 0.96
step 1600, training accuracy 0.98
step 1700, training accuracy 0.98
step 1800, training accuracy 0.98
step 1900, training accuracy 0.94
step 2000, training accuracy 0.96
step 2100, training accuracy 0.98
step 2200, training accuracy 1
step 2300, training accuracy 1
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 1
step 2700, training accuracy 0.98
step 2800, training accuracy 0.94
step 2900, training accuracy 1
step 3000, trainin