# Machine Learning Engineer Nanodegree
## Deep Learning
## Project: Build a Digit Recognition Program

In this notebook, a template is provided for you to implement your functionality in stages, all of which are required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with **'Implementation'** in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional and will be marked with **'Optional'** in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a **'Question'** header. Carefully read each question and provide thorough answers in the following text boxes that begin with **'Answer:'**. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

>**Note:** Code and Markdown cells can be executed using the **Shift + Enter** keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.

----
## Step 1: Design and Test a Model Architecture
Design and implement a deep learning model that learns to recognize sequences of digits. Train the model using synthetic data generated by concatenating character images from [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) or [MNIST](http://yann.lecun.com/exdb/mnist/). To produce a synthetic sequence of digits for testing, you can for example limit yourself to sequences up to five digits, and use five classifiers on top of your deep network. You would have to incorporate an additional ‘blank’ character to account for shorter number sequences.
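
The blank-character idea can be sketched in a few lines. The example below pads a digit sequence to a fixed length of five. The names `BLANK`, `MAX_LEN`, and `pad_labels` are illustrative only, and using 10 as the blank class is an assumption (it happens to match the label used later in this notebook).

```python
BLANK = 10    # assumed class index for the 'blank' character
MAX_LEN = 5   # maximum sequence length considered in this project


def pad_labels(digits, max_len=MAX_LEN, blank=BLANK):
    """Return a fixed-length label list, padding short sequences with blanks."""
    if len(digits) > max_len:
        raise ValueError("sequence longer than max_len")
    return list(digits) + [blank] * (max_len - len(digits))


print(pad_labels([3, 7, 1]))        # [3, 7, 1, 10, 10]
print(pad_labels([9, 0, 2, 4, 5]))  # [9, 0, 2, 4, 5]
```

Each of the five positions then becomes its own classification target, which is what allows one classifier per position.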

There are various aspects to consider when thinking about this problem:
- Your model can be derived from a deep neural net or a convolutional network.
- You could experiment with sharing, or not sharing, the weights between the softmax classifiers.
- You can also use a recurrent network in your deep neural net to replace the classification layers and directly emit the sequence of digits one-at-a-time.
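
The difference between separate and shared classifier weights can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not the architecture used in this notebook: the sizes, the random stand-in features, and the idea of giving each position its own feature vector in the shared case are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_hidden, num_classes, num_heads = 4, 8, 11, 5


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Stand-in for the feature vector produced by the shared deep network.
features = rng.normal(size=(batch, num_hidden))

# Option 1: separate weights, one matrix per digit position.
separate_w = [rng.normal(scale=0.1, size=(num_hidden, num_classes))
              for _ in range(num_heads)]
separate_probs = [softmax(features @ w) for w in separate_w]

# Option 2: shared weights, the same matrix reused at every position.
# With identical inputs the five heads would give identical outputs, so in
# this sketch each position is assumed to see its own feature vector.
shared_w = rng.normal(scale=0.1, size=(num_hidden, num_classes))
position_features = rng.normal(size=(num_heads, batch, num_hidden))
shared_probs = [softmax(f @ shared_w) for f in position_features]

print(len(separate_probs), separate_probs[0].shape)  # 5 (4, 11)
```

Separate weights cost more parameters but let each position specialise; shared weights are cheaper and rely on the features to carry the positional information.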

Here is an example of a [published baseline model on this problem](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf). ([video](https://www.youtube.com/watch?v=vGPI_JvLoN0))

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [1]:
### Your code implementation goes here.
### Feel free to use as many code cells as needed.

In [2]:
# Importing the required modules
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
from IPython.display import display
# from PIL import Image
from scipy import ndimage
from sklearn.linear_model import LogisticRegression
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
from six.moves import range
import tensorflow as tf
import idx2numpy
from sklearn.model_selection import train_test_split
import random
from sklearn.preprocessing import OneHotEncoder

# Configure the matplotlib backend to plot inline in IPython
%matplotlib inline

In [3]:
# The MNIST dataset has been downloaded and extracted.
# The data is in the IDX binary format (unsigned bytes).
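
IDX is a small binary layout: a short header followed by the raw values. `idx2numpy` takes care of parsing it. The sketch below decodes only the header, to show where the array shapes printed in the next cell come from. `parse_idx_header` is an illustrative helper, not part of `idx2numpy`, and the header bytes are rebuilt by hand here rather than read from disk.

```python
import struct


def parse_idx_header(raw):
    """Return (dtype_code, dims) from the first bytes of an IDX file."""
    zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    # Each dimension is stored as a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return dtype_code, dims


# Header of the MNIST training images file, rebuilt by hand for illustration:
# type code 0x08 means unsigned byte, followed by 3 dimensions.
header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 60000, 28, 28)
print(parse_idx_header(header))  # (8, (60000, 28, 28))
```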

In [4]:
# Reading in the MNIST data.
train_dataset_orig = idx2numpy.convert_from_file('train-images.idx3-ubyte')
train_labels_orig = idx2numpy.convert_from_file('train-labels.idx1-ubyte')
test_dataset_orig = idx2numpy.convert_from_file('t10k-images.idx3-ubyte')
test_labels_orig = idx2numpy.convert_from_file('t10k-labels.idx1-ubyte')
print('Training set', train_dataset_orig.shape, train_labels_orig.shape)
print('Test set', test_dataset_orig.shape, test_labels_orig.shape)

Training set (60000, 28, 28) (60000,)
Test set (10000, 28, 28) (10000,)


In [5]:
# Creating sequences of numbers
def create_numbers(dataset,labels):
    # Preallocating the output array. Each sequence consumes 3 to 5 source digits,
    # so at most a third as many sequences as source images can be formed.
    # Each sequence is 5 * 28 = 140 pixels wide.
    ndataset =  np.ndarray(shape=(int(dataset.shape[0]/3),int(dataset.shape[1]),int(dataset.shape[2]*5)),dtype=np.float32)
    # Initializing labels
    nlabels = []

    i=0  # number of sequences stored so far
    w=0  # index of the next unused source digit
    # Creating a blank image
    null_pic = np.zeros(shape=(28,28))

    # Blank images are appended to create sequences of 3, 4 or 5 digits.
    while True:
        try:
            # Number of trailing blanks in this sequence: 0, 1 or 2.
            random_blanks = random.randint(0,2)
            if random_blanks == 0:
                temp = np.hstack([dataset[w],dataset[w+1],dataset[w+2],dataset[w+3],dataset[w+4]])
                temp_labels = [labels[w],labels[w+1],labels[w+2],labels[w+3],labels[w+4]]
                w += 5
            elif random_blanks == 1:
                temp = np.hstack([dataset[w],dataset[w+1],dataset[w+2],dataset[w+3],null_pic])
                # We are using 10 as the label for a blank pic
                temp_labels = [labels[w],labels[w+1],labels[w+2],labels[w+3],10]
                w += 4
            elif random_blanks == 2:
                temp = np.hstack([dataset[w],dataset[w+1],dataset[w+2],null_pic,null_pic])
                temp_labels = [labels[w],labels[w+1],labels[w+2],10,10]
                w += 3

            # Store the image at index i before incrementing i, so that
            # ndataset[k] and nlabels[k] always describe the same sequence.
            ndataset[i,:,:] = temp
            nlabels.append(temp_labels)
            i += 1
        except IndexError:
            # Raised when the source digits run out.
            print ("There are less than 5 numbers to form a 5-digit sequence.")
            break
    ndataset = ndataset[:i,:,:]
    print("Shape of dataset is {}".format(ndataset.shape))
    return ndataset,np.array(nlabels).astype(int)


train_dataset_seq,train_labels_seq = create_numbers(train_dataset_orig,train_labels_orig)
test_dataset_seq,test_labels_seq = create_numbers(test_dataset_orig,test_labels_orig)

# normalized train dataset
train_dataset_nor = train_dataset_seq/256.0 -0.5
test_dataset_nor  = test_dataset_seq/256.0 -0.5

There are less than 5 numbers to form a 5-digit sequence.
Shape of dataset is (15029, 28, 140)
There are less than 5 numbers to form a 5-digit sequence.
Shape of dataset is (2482, 28, 140)
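
Because each sequence image is built by stacking 28-pixel-wide tiles, one cheap sanity check is to confirm that a tile is all zeros exactly where its label is the blank class. The sketch below runs on a tiny hand-made array. `blank_tiles_match_labels` is a hypothetical helper, and it assumes blank tiles are exactly zero, which holds only before normalization.

```python
import numpy as np

TILE = 28   # width of one digit tile
BLANK = 10  # label used for a blank tile


def blank_tiles_match_labels(images, labels):
    """True if every tile is all zeros exactly where its label is BLANK."""
    for image, label_row in zip(images, labels):
        for pos, label in enumerate(label_row):
            tile = image[:, pos * TILE:(pos + 1) * TILE]
            if (not tile.any()) != (label == BLANK):
                return False
    return True


# Tiny example: one sequence with three non-zero "digit" tiles and two blanks
digit = np.ones((TILE, TILE), dtype=np.float32)
blank = np.zeros((TILE, TILE), dtype=np.float32)
images = np.stack([np.hstack([digit, digit, digit, blank, blank])])
labels = np.array([[4, 2, 7, 10, 10]])

print(blank_tiles_match_labels(images, labels))                      # True
print(blank_tiles_match_labels(images, np.roll(labels, 1, axis=1)))  # False
```

Running the same check on `train_dataset_seq` and `train_labels_seq` is a quick way to catch an offset between images and labels.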


In [6]:
# Reformatting the data for TensorFlow, whose conv2d expects a 4-D tensor: (samples, height, width, channels)
image_height = 28
image_width = 140
num_labels = 10 # Without including the null label
num_channels = 1 # grayscale


# Reshaping the dataset to (samples, height, width, channels) for TensorFlow
def reformat(dataset):
    dataset = dataset.reshape((-1, image_height, image_width, num_channels)).astype(np.float32)
    return dataset

# formatting the datasets
train_dataset_for, train_labels_for = reformat(train_dataset_nor),train_labels_seq
test_dataset_for,test_labels_for = reformat(test_dataset_nor),test_labels_seq

# creating a validation set from training set

train_dataset,valid_dataset,train_labels_encode,valid_labels_encode = train_test_split(train_dataset_for,train_labels_for,test_size=0.3,random_state=50)

test_dataset,test_labels_encode = test_dataset_for,test_labels_for


# One-hot encoding of the labels (each of the five label columns is encoded separately).
def oneHot(labels):
    enc = OneHotEncoder(sparse=False)
    enc.fit(labels)
    return enc.transform(labels)

train_labels = oneHot(train_labels_encode)
valid_labels = oneHot(valid_labels_encode)
test_labels = oneHot(test_labels_encode)


print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (10520, 28, 140, 1) (10520, 52)
Validation set (4509, 28, 140, 1) (4509, 52)
Test set (2482, 28, 140, 1) (2482, 52)
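
The 52 label columns follow from encoding each of the five positions separately: the first three positions only ever hold a digit (10 classes each), while the last two can also hold the blank label (11 classes each), giving 3 × 10 + 2 × 11 = 52. The NumPy sketch below reproduces that layout without scikit-learn; `one_hot_sequence` and `CLASSES_PER_POSITION` are illustrative names.

```python
import numpy as np

# Number of classes at each of the five positions: the first three always hold
# a digit (0-9); the last two may also hold the blank label 10.
CLASSES_PER_POSITION = [10, 10, 10, 11, 11]


def one_hot_sequence(label_row):
    """Concatenate one one-hot vector per position into a single row."""
    parts = []
    for label, n_classes in zip(label_row, CLASSES_PER_POSITION):
        vec = np.zeros(n_classes, dtype=np.float32)
        vec[label] = 1.0
        parts.append(vec)
    return np.concatenate(parts)


row = one_hot_sequence([4, 2, 7, 10, 10])
print(row.shape)                     # (52,)
print(np.flatnonzero(row).tolist())  # [4, 12, 27, 40, 51]
```

The column boundaries 0:10, 10:20, 20:30, 30:41 and 41:52 are the same slices used when splitting predictions and labels per digit.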


In [17]:
train_labels[0]

array([False,  True, False, False, False, False, False, False, False, False], dtype=bool)

In [7]:
# Calculating the accuracy of the model
def accuracy(predictions, labels):
    
    # Separating predictions and labels for each separate digit
    predictions_digit1 = predictions[:,0:10]
    predictions_digit2 = predictions[:,10:20]
    predictions_digit3 = predictions[:,20:30]
    predictions_digit4 = predictions[:,30:41]
    predictions_digit5 = predictions[:,41:52]
    
    labels_digit1 = labels[:,0:10]
    labels_digit2 = labels[:,10:20]
    labels_digit3 = labels[:,20:30]
    labels_digit4 = labels[:,30:41]
    labels_digit5 = labels[:,41:52]
    
    truth1 = np.argmax(predictions_digit1,axis = 1)==np.argmax(labels_digit1,axis=1)
    truth2 = np.argmax(predictions_digit2,axis=1)==np.argmax(labels_digit2,axis=1)
    truth3 = np.argmax(predictions_digit3,axis=1)==np.argmax(labels_digit3,axis=1)
    truth4 = np.argmax(predictions_digit4,axis=1)==np.argmax(labels_digit4,axis=1)
    truth5 = np.argmax(predictions_digit5,axis=1)==np.argmax(labels_digit5,axis=1)
    
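    # Percentage of sequences in which all five positions are correct.
    # The 100.0 factor also forces float division under Python 2.
    return 100.0 * np.sum(np.column_stack([truth1,truth2,truth3,truth4,truth5]).all(axis=1)) / predictions.shape[0]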
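
Sequence-level accuracy is a strict metric: a single wrong position makes the whole sequence count as wrong. The sketch below contrasts it with per-digit accuracy on two hand-made sequences. It works on already-decoded integer labels, and `per_digit_and_sequence_accuracy` is an illustrative helper rather than part of the notebook.

```python
import numpy as np


def per_digit_and_sequence_accuracy(pred_digits, true_digits):
    """Both arguments are integer arrays of shape (n_sequences, 5)."""
    correct = pred_digits == true_digits
    per_digit = 100.0 * correct.mean()                 # share of single positions that match
    per_sequence = 100.0 * correct.all(axis=1).mean()  # share of rows with all five matching
    return float(per_digit), float(per_sequence)


true_digits = np.array([[1, 2, 3, 10, 10],
                        [4, 5, 6, 7, 10]])
pred_digits = np.array([[1, 2, 3, 10, 10],   # fully correct
                        [4, 5, 6, 7, 8]])    # last position wrong

# Nine of ten positions match, but only one of two sequences is fully correct.
print(per_digit_and_sequence_accuracy(pred_digits, true_digits))
```

Tracking both numbers during training can help tell a model that has learned nothing from one that gets most digits right but rarely the whole sequence.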

In [8]:
# Convolutional network with four layers

batch_size = 16
patch_size = 5
depth = 32
num_hidden = 512

# For representing the computations as graph
graph = tf.Graph()

# Making this graph as the default graph
with graph.as_default():
    # Input data.
    
    # Placeholders for one training minibatch (fixed size: batch_size)
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_height, image_width, num_channels))
#     tf_train_labels = tf.placeholder(tf.int32, shape=(batch_size, num_labels))
    tf_train_labels = tf.placeholder(tf.int32, shape=(batch_size, 52))
    # storing validation and test datasets
    tf_valid_dataset = tf.constant(valid_dataset,dtype=tf.float32)
    tf_test_dataset = tf.constant(test_dataset,dtype=tf.float32)
     
    # Variables.
    
    # Building a four layer CNN
    
    ## FIRST LAYER
    # Initializing the first-layer weights randomly from a truncated normal distribution
    layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    # Initializing the layer1 biases to 0.5
    layer1_biases = tf.Variable(tf.constant(0.5,shape =(1,depth)))
    
    # SECOND LAYER
    layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth,depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # THIRD LAYER
    layer3_weights = tf.Variable(tf.truncated_normal([int(image_height / 4) * int(image_width / 4) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    
    # FOURTH LAYER with 5 logits
    logits1_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    logits1_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
       
    logits2_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    logits2_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
    
    logits3_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    logits3_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
    
    # null pic will add a new label of 10.
    logits4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels+1], stddev=0.1))
    logits4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels+1])) 
    
    # Null pic will add new label of 10.
    logits5_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels+1], stddev=0.1))
    logits5_biases = tf.Variable(tf.constant(1.0, shape=[num_labels+1]))


    # Model.
    def model(data):
        conv1 = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        hidden1 = tf.nn.relu(conv1 + layer1_biases)
        
        conv2 = tf.nn.conv2d(hidden1, layer2_weights, [1, 2, 2, 1], padding='SAME')
        hidden2 = tf.nn.relu(conv2 + layer2_biases)
        
        shape = hidden2.get_shape().as_list()
        reshape = tf.reshape(hidden2, [-1, shape[1] * shape[2] * shape[3]])
        hidden3 = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        
        return [tf.matmul(hidden3,logits1_weights) + logits1_biases,\
                tf.matmul(hidden3,logits2_weights) + logits2_biases,\
                tf.matmul(hidden3,logits3_weights) + logits3_biases,\
                tf.matmul(hidden3,logits4_weights) + logits4_biases,\
                tf.matmul(hidden3,logits5_weights) + logits5_biases
                ]
    
    # Training computation.
    logits1,logits2,logits3,logits4,logits5 = model(tf_train_dataset)
    

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits1,labels = tf_train_labels[:,0:10]))+\
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits2,labels = tf_train_labels[:,10:20]))+\
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits3,labels = tf_train_labels[:,20:30]))+\
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits4,labels = tf_train_labels[:,30:41]))+\
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits5,labels = tf_train_labels[:,41:52]))
    
    
    # Optimizer.
    optimizer = tf.train.AdamOptimizer(0.0005).minimize(loss)
  
    # Predictions for the training, validation, and test data
    def predictions(x):
        return  tf.concat([tf.nn.softmax(logits = x[0]),\
                tf.nn.softmax(logits = x[1]),\
                tf.nn.softmax(logits = x[2]),\
                tf.nn.softmax(logits = x[3]),\
                tf.nn.softmax(logits = x[4])],1)
    
    train_prediction = predictions(model(tf_train_dataset))
    valid_prediction = predictions(model(tf_valid_dataset))
    test_prediction = predictions(model(tf_test_dataset))
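
The size of the first fully connected layer depends on how the two stride-2 convolutions shrink the image. With `'SAME'` padding, each output dimension is the input dimension divided by the stride, rounded up. The arithmetic can be checked without TensorFlow; `same_padding_output` is an illustrative helper.

```python
import math


def same_padding_output(size, stride):
    """Output length of a 'SAME'-padded convolution along one dimension."""
    return math.ceil(size / stride)


height, width, depth = 28, 140, 32
for _ in range(2):  # two convolutions, each with stride 2
    height = same_padding_output(height, 2)
    width = same_padding_output(width, 2)

flat_size = height * width * depth
print(height, width, flat_size)  # 7 35 7840
```

This matches `int(image_height / 4) * int(image_width / 4) * depth` only because 28 and 140 divide evenly by 4. For a height of 30, two stride-2 `'SAME'` convolutions give 8 rows, whereas `int(30 / 4)` gives 7.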

In [9]:
num_steps = 10000
len_train_data = train_dataset.shape[0]

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        random_index = [random.randint(0,len_train_data-1) for x in range(batch_size)] 
        batch_data = train_dataset[random_index, :, :, :]
        batch_labels = train_labels[random_index, :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
    
# save the session
# tf.train.saver

Initialized
Minibatch loss at step 0: 118.473778
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 50: 10.902331
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 100: 12.129419
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 150: 10.986760
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 200: 11.242470
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 250: 10.419225
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 300: 10.385300
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 350: 11.115490
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 400: 9.828760
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 450: 11.307780
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch loss at step 500: 10.953825
Minibatch accuracy: 0.0%
Validation accuracy: 0.0%
Minibatch lo
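
One way to read the loss values in this log is to compare them with the loss of a model that guesses uniformly at every position. The cross-entropy of a uniform guess over k classes is ln(k), and the total loss here is the sum over the five classifiers. The short calculation below is a reference point only, not a result from the notebook.

```python
import math

# Classes per position: three digit-only heads and two heads with a blank class
classes_per_head = [10, 10, 10, 11, 11]

# Cross-entropy of a uniform prediction over k classes is ln(k);
# the total loss is the sum over the five heads.
chance_loss = sum(math.log(k) for k in classes_per_head)
print(round(chance_loss, 3))  # 11.704
```

The logged losses of roughly 10 to 12 sit close to this value, which would be consistent with a network that has not yet learned much beyond the class frequencies.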

### Question 1
_What approach did you take in coming up with a solution to this problem?_

**Answer:** 
I used MNIST data.

### Question 2
_What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.)_

**Answer:**

### Question 3
_How did you train your model? How did you generate your synthetic dataset?_ Include examples of images from the synthetic data you constructed.

**Answer:**

----
## Step 2: Train a Model on a Realistic Dataset
Once you have settled on a good architecture, you can train your model on real data. In particular, the [Street View House Numbers (SVHN)](http://ufldl.stanford.edu/housenumbers/) dataset is a good large-scale dataset collected from house numbers in Google Street View. Training on this more challenging dataset, where the digits are not neatly lined-up and have various skews, fonts and colors, likely means you have to do some hyperparameter exploration to perform well.
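
Hyperparameter exploration can start with a simple random search. The sketch below only draws candidate configurations; the training and evaluation loop is left out. The parameter names mirror variables used earlier in this notebook, but the ranges and the `sample_config` helper are assumptions made for illustration.

```python
import random


def sample_config(rng):
    """Draw one hypothetical training configuration."""
    return {
        # Learning rate sampled log-uniformly between 1e-4 and 1e-2
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "batch_size": rng.choice([16, 32, 64, 128]),
        "depth": rng.choice([16, 32, 64]),
        "num_hidden": rng.choice([128, 256, 512]),
    }


rng = random.Random(0)  # fixed seed so the search is reproducible
configs = [sample_config(rng) for _ in range(5)]
for config in configs:
    print(config)
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude, which is usually where the sensitivity lies.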

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [10]:
# Required Imports

import scipy.io
import numpy as np
import matplotlib.pyplot as plt

In [11]:
# Loading and processing datasets

# Load each .mat file once and reuse it for both the images ('X') and the labels ('y')
train_mat = scipy.io.loadmat('train_32x32.mat')
test_mat = scipy.io.loadmat('test_32x32.mat')

train_dataset = train_mat['X'].astype('float')
train_labels = train_mat['y'].astype('float')
test_dataset = test_mat['X'].astype('float')
test_labels = test_mat['y'].astype('float')

print("Shape of train dataset before formatting : ",train_dataset.shape)
print("Shape of test  dataset before formatting : ",test_dataset.shape)
print("Shape of train labels  before formatting : ",train_labels.shape)
print("Shape of test  labels  before formatting : ",test_labels.shape)

num_labels = 10
def reformat(dataset,labels):

    # Normalizing the pixel values to the range [-0.5, 0.5)
    dataset = (dataset/256) - 0.5

    # Rearranging dataset from (height, width, channels, samples) to
    # (samples, height, width, channels) to match the placeholder shape
    dataset = np.array([dataset[:,:,:,x] for x in range(dataset.shape[3])])

    # SVHN stores the digit 0 as label 10, so map it back to 0
    labels[labels == 10] = 0

    # Reshaping labels to a 1-dimensional array
    y = []
    for x in range(len(labels)):
        y.append(labels[x][0])
    labels = np.array(y)

    # One-hot encoding
    labels = (np.arange(num_labels) == labels[:,None])

    return dataset,labels
train_dataset,train_labels = reformat(train_dataset,train_labels)
test_dataset ,test_labels  = reformat(test_dataset ,test_labels)

print("Shape of train dataset after formatting : ",train_dataset.shape)
print("Shape of test  dataset after formatting : ",test_dataset.shape)
print("Shape of train labels  after formatting : ",train_labels.shape)
print("Shape of test  labels  after formatting : ",test_labels.shape)

Shape of train dataset before formatting :  (32, 32, 3, 73257)
Shape of test  dataset before formatting :  (32, 32, 3, 26032)
Shape of train labels  before formatting :  (73257, 1)
Shape of test  labels  before formatting :  (26032, 1)
Shape of train dataset after formatting :  (73257, 32, 32, 3)
Shape of test  dataset after formatting :  (26032, 32, 32, 3)
Shape of train labels  after formatting :  (73257, 10)
Shape of test  labels  after formatting :  (26032, 10)
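
The axis rearrangement and label remapping can also be written without Python loops. The sketch below checks on a small fake array that `np.transpose` gives the same result as the per-sample loop, and that labels of 10 map back to the digit 0. The fake data is made up for illustration; only the `(height, width, channels, samples)` layout is taken from the shapes printed above.

```python
import numpy as np

# Fake batch in the SVHN .mat layout: (height, width, channels, samples)
fake_images = np.zeros((32, 32, 3, 4))
fake_images[..., 2] = 1.0                        # mark sample 2 so it can be tracked
fake_labels = np.array([[1], [10], [3], [10]])   # 10 stands for the digit 0

# Loop version, as in the cell above
looped = np.array([fake_images[:, :, :, x] for x in range(fake_images.shape[3])])

# Vectorized version: move the sample axis to the front
images = np.transpose(fake_images, (3, 0, 1, 2))

# Map label 10 back to digit 0 and flatten to shape (samples,)
digits = np.where(fake_labels == 10, 0, fake_labels).ravel()

print(images.shape)                    # (4, 32, 32, 3)
print(np.array_equal(images, looped))  # True
print(digits.tolist())                 # [1, 0, 3, 0]
```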


In [12]:
# Image parameters for TensorFlow, which expects a 4-D tensor: (samples, height, width, channels)
image_size = 32
num_labels = 11 # note: the one-hot labels built above have 10 columns
num_channels = 1 # grayscale (the arrays loaded above still have 3 color channels)
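
The cell above sets `num_channels = 1`, while the arrays loaded earlier have three color channels. If the intent is to train on grayscale input, one possible conversion is a weighted sum of the channels. The sketch below uses the ITU-R BT.601 luma weights, which is an assumed choice rather than something this notebook specifies; `rgb_to_grayscale` is an illustrative helper.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels (an assumed choice)
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_grayscale(images):
    """Convert (n, height, width, 3) RGB images to (n, height, width, 1)."""
    gray = images @ LUMA_WEIGHTS    # weighted sum over the channel axis
    return gray[..., np.newaxis]    # keep an explicit channel dimension


rgb = np.random.default_rng(0).uniform(0, 255, size=(2, 32, 32, 3))
gray = rgb_to_grayscale(rgb)
print(gray.shape)  # (2, 32, 32, 1)
```

Because the weights sum to one, the output stays in the same value range as the input, so the same normalization can be applied afterwards.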

In [13]:
# Calculating the accuracy of the model
def accuracy(predictions, labels):
    # getting the percent of how many predictions were correct
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/ predictions.shape[0])

### Question 4
_Describe how you set up the training and testing data for your model. How does the model perform on a realistic dataset?_

**Answer:**

### Question 5
_What changes did you have to make, if any, to achieve "good" results? Were there any options you explored that made the results worse?_

**Answer:**

### Question 6
_What were your initial and final results with testing on a realistic dataset? Do you believe your model is doing a good enough job at classifying numbers correctly?_

**Answer:**

----
## Step 3: Test a Model on Newly-Captured Images

Take several pictures of numbers that you find around you (at least five), and run them through your classifier on your computer to produce example results. Alternatively (optionally), you can try using OpenCV / SimpleCV / Pygame to capture live images from a webcam and run those through your classifier.

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [14]:
### Your code implementation goes here.
### Feel free to use as many code cells as needed.



### Question 7
_Choose five candidate images of numbers you took from around you and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult?_

**Answer:**

### Question 8
_Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the realistic dataset?_

**Answer:**

### Optional: Question 9
_If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images._

**Answer:** Leave blank if you did not complete this part.

----
## Step 4: Explore an Improvement for a Model

There are many things you can do once you have the basic classifier in place. One example would be to also localize where the numbers are on the image. The SVHN dataset provides bounding boxes that you can use to train a localizer. Train the model with a regression loss on the coordinates of the bounding box, and then test it.
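
Two quantities are useful when working on localization: a regression loss to train on, and an overlap score to evaluate with. The sketch below shows a mean squared error on box coordinates and the intersection over union (IoU) of two boxes. The `(left, top, width, height)` box layout and both helper names are assumptions made for this example.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Overlap along each axis, clamped at zero when the boxes do not intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def l2_box_loss(pred, target):
    """Mean squared error between predicted and target box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


target = (10.0, 5.0, 20.0, 30.0)
pred = (12.0, 5.0, 20.0, 30.0)   # shifted 2 pixels to the right
print(l2_box_loss(pred, target))        # 1.0
print(round(box_iou(pred, target), 3))  # 0.818
```

The squared error is convenient to optimize, while IoU is easier to interpret when judging whether a predicted box is good enough.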

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [15]:


### Your code implementation goes here.
### Feel free to use as many code cells as needed.



### Question 10
_How well does your model localize numbers on the testing set from the realistic dataset? Do your classification results change at all with localization included?_

**Answer:**

### Question 11
_Test the localization function on the images you captured in **Step 3**. Does the model accurately calculate a bounding box for the numbers in the images you found? If you did not use a graphical interface, you may need to investigate the bounding boxes by hand._ Provide an example of the localization created on a captured image.

**Answer:**

----
## Optional Step 5: Build an Application or Program for a Model
Take your project one step further. If you're interested, look to build an Android application or even a more robust Python program that can interface with input images and display the classified numbers and even the bounding boxes. You can for example try to build an augmented reality app by overlaying your answer on the image like the [Word Lens](https://en.wikipedia.org/wiki/Word_Lens) app does.

Loading a TensorFlow model into a camera app on Android is demonstrated in the [TensorFlow Android demo app](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android), which you can simply modify.

If you decide to explore this optional route, be sure to document your interface and implementation, along with significant results you find. You can see the additional rubric items that you could be evaluated on by [following this link](https://review.udacity.com/#!/rubrics/413/view).

### Optional Implementation
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [16]:


### Your optional code implementation goes here.
### Feel free to use as many code cells as needed.



### Documentation
Provide additional documentation sufficient for detailing the implementation of the Android application or Python program for visualizing the classification of numbers in images. It should be clear how the program or application works. Demonstrations should be provided. 

_Write your documentation here._

> **Note**: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to  
**File -> Download as -> HTML (.html)**. Include the finished document along with this notebook as your submission.