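### `__future__`
`__future__` is a real module, and serves three purposes:
- To avoid confusing existing tools that analyze import statements and expect to find the modules they’re importing.
- To ensure that future statements run under releases prior to Python 2.1 at least yield runtime exceptions (the import of `__future__` fails, because there was no module of that name before 2.1).
- To document when incompatible changes were introduced, and when they will be, or were, made mandatory. (ref 'https://docs.python.org/3/library/__future__.html')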
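
As a small illustration of the "real module" point, each feature in `__future__` is an ordinary object that records when it became optional and when it became mandatory. The sketch below inspects `print_function`, the feature this notebook imports; the variable names are only illustrative.

```python
import __future__

feature = __future__.print_function
# Release in which the feature first became available as an opt-in.
optional_release = feature.getOptionalRelease()
# Release in which the feature became the default behaviour.
mandatory_release = feature.getMandatoryRelease()

print("optional since:", optional_release[:3])
print("mandatory since:", mandatory_release[:3])
```

On Python 3 the statement `from __future__ import print_function` therefore changes nothing; it only matters when the same code also has to run on Python 2.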

### `Keras`
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. It was developed with a focus on enabling fast experimentation. (ref 'https://keras.io')

#### Sequential
- The sequential API allows you to create models layer-by-layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs. (ref 'https://machinelearningmastery.com/keras-functional-api-deep-learning/')
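
The layer-by-layer idea can be sketched without Keras as plain function composition. This is a toy illustration, not how Keras implements `Sequential`; `sequential`, `double`, and `add_one` are hypothetical names.

```python
def sequential(*layers):
    """Compose layers so that each one receives the previous one's output."""
    def model(x):
        for layer in layers:
            x = layer(x)
        return x
    return model

# Hypothetical stand-in "layers": plain functions on a number.
def double(x):
    return 2 * x

def add_one(x):
    return x + 1

toy_model = sequential(double, add_one, double)
result = toy_model(3)  # ((3 * 2) + 1) * 2
print(result)
```

Because every layer has exactly one input and one output, a chain like this cannot express shared layers or multiple inputs and outputs, which is the limitation described above.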

#### Dense
- Dense is the Keras name for a fully connected (linear) layer. (ref 'https://forums.fast.ai/t/dense-vs-convolutional-vs-fully-connected-layers/191')
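
A fully connected layer computes `activation(x @ weights + bias)`. The NumPy sketch below shows that arithmetic on a tiny example; `dense_forward` and the numbers are illustrative assumptions, not Keras internals.

```python
import numpy as np

def dense_forward(x, weights, bias, activation=None):
    """Compute activation(x @ weights + bias) for a batch of row vectors."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z

def relu(z):
    """Replace negative values with zero."""
    return np.maximum(z, 0.0)

x = np.array([[1.0, -2.0, 3.0]])   # one sample with 3 features
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])   # 3 inputs -> 2 units
bias = np.array([0.5, -2.0])

dense_out = dense_forward(x, weights, bias, activation=relu)
print(dense_out)
```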

#### Dropout
- Dropout is a technique where randomly selected neurons are ignored during training. (ref 'https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/')
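
A minimal NumPy sketch of the idea, assuming the common "inverted dropout" formulation in which the surviving units are scaled up so the expected activation stays the same. `dropout_forward` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def dropout_forward(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 5))
dropped = dropout_forward(activations, rate=0.25, rng=rng)
unchanged = dropout_forward(activations, rate=0.25, rng=rng, training=False)
print(dropped)
```

Units are only dropped during training; at evaluation time the input passes through unchanged.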

#### Flatten
- Flattens the input into one dimension per sample without affecting the batch size. The purpose of its `data_format` argument is to preserve weight ordering when switching a model from one data format to another. (ref 'https://keras.io/layers/core/')
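
Flattening amounts to a reshape that keeps the batch dimension and merges the rest. The sketch below uses an arbitrary small array as a stand-in for a batch of feature maps.

```python
import numpy as np

# Stand-in batch: 2 samples, each of shape (3, 4, 5).
batch = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Keep the batch axis, merge everything else into one axis.
flattened = batch.reshape(batch.shape[0], -1)
print(flattened.shape)
```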

#### Conv2D
- This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If `use_bias` is True, a bias vector is created and added to the outputs. Finally, if `activation` is not None, it is applied to the outputs as well. (ref 'https://keras.io/layers/convolutional/')
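
The sliding-window computation can be sketched in NumPy for a single channel and a single kernel, with stride 1 and no padding. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation. `conv2d_single` is a hypothetical helper written for clarity, not speed.

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Slide `kernel` over a 2D `image` with stride 1 and no padding."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    out = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            window = image[i:i + k_rows, j:j + k_cols]
            out[i, j] = np.sum(window * kernel) + bias
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
feature_map = conv2d_single(image, kernel)
print(feature_map)
```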

#### MaxPooling2D
- Max pooling operation for spatial data. (ref 'https://keras.io/layers/pooling/#maxpooling2d') 
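
For a 2x2 pool with non-overlapping windows, max pooling keeps the largest value in each window and halves both spatial dimensions. A NumPy sketch for a single channel, with `max_pool_2x2` as a hypothetical helper:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 window."""
    rows, cols = feature_map.shape
    # Drop a trailing row or column if the size is odd.
    trimmed = feature_map[:rows - rows % 2, :cols - cols % 2]
    windows = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [3, 6, 4, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```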

#### Backend (Tensorflow)
- Keras does not handle low-level operations such as tensor products and convolutions itself. Instead, it relies on a specialized, well-optimized tensor manipulation library, which serves as the "backend engine" of Keras. (ref 'https://keras.io/backend/')


### `MNIST`
The MNIST database of handwritten digits, available from keras, has a training set of 60,000 examples, and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image. (ref 'http://yann.lecun.com/exdb/mnist/')


In [10]:
# Reference "https://www.pytorials.com/deploy-keras-model-to-production-using-flask/"

from __future__ import print_function
# simplified interface for building models
import keras
# our labeled dataset of handwritten digits (28x28 images of the numbers 0-9)
from keras.datasets import mnist
# Sequential: a linear stack of layers, which is all this simple model needs
from keras.models import Sequential
# Dense is a fully connected layer, Dropout is a regularization technique that reduces overfitting,
# and Flatten reshapes multi-dimensional feature maps into vectors for the Dense layers
from keras.layers import Dense, Dropout, Flatten
# Conv2D performs convolution over images; MaxPooling2D downsamples by keeping the strongest response in each window
from keras.layers import Conv2D, MaxPooling2D
# backend functions, used below to query the image data format
from keras import backend as K

### Gradient Descent 
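Gradient descent is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, that is, along the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. (ref 'https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html')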
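
A minimal sketch of the update rule on a one-parameter problem. The loss `f(w) = (w - 3)^2`, the learning rate, and the step count are assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(grad=lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_min)
```

The parameter ends up very close to 3, the minimum of the loss. Training a neural network applies the same rule to every weight, with the gradient computed by backpropagation.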

### Epochs
One epoch is one complete pass of the entire dataset forward and backward through the neural network.
Because the whole dataset is usually too large to process in a single step, each epoch is divided into several smaller batches. (ref 'https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9')

### Batch Size
The batch size is the total number of training examples in a single batch.
Note: batch size and number of batches are two different things.
A batch is one of the parts the dataset is divided into, because the entire dataset cannot be passed into the neural network at once. The number of batches is how many such parts there are. (ref 'https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9')
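
To make the difference concrete, the arithmetic below uses the MNIST training set size together with the `batch_size` and `epochs` values this notebook sets. It is only a back-of-the-envelope sketch.

```python
import math

num_samples = 60000   # MNIST training set size
batch_size = 128      # value used in this notebook
epochs = 12           # value used in this notebook

# Number of batches (parameter updates) needed to see every sample once.
batches_per_epoch = math.ceil(num_samples / batch_size)
# The final batch holds whatever is left over.
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs
print(batches_per_epoch, last_batch_size, total_updates)
```

So the batch size is 128, while the number of batches per epoch is 469, the last of which holds only 96 samples.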


In [11]:
# mini-batch gradient descent: 128 samples per parameter update
batch_size = 128
# 10 different classes, the digits 0-9
num_classes = 10

# Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset",
# used to separate training into distinct phases, which is useful for logging and periodic evaluation.
# When using validation_data or validation_split with the fit method of Keras models,
# evaluation will be run at the end of every epoch
# (ref http://faroit.com/keras-docs/2.0.2/getting-started/faq/)
# 12 epochs keeps the training time short
epochs = 12

# input image dimensions
# 28x28 pixel images. 
img_rows, img_cols = 28, 28

In [12]:
# Reference "https://www.pytorials.com/deploy-keras-model-to-production-using-flask/"

# the data is downloaded, shuffled and split between train and test sets (imported and formatted);
# load_data() unpacks the images and labels into the variables on the left

# The MNIST database contains 60,000 training images and 10,000 testing images taken from
# American Census Bureau employees and American high school students
# (ref https://towardsdatascience.com/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d)

# The line below separates the train and test groups and also separates the images from the labels.
# x_train and x_test contain grayscale pixel intensities (from 0 to 255),
# while y_train and y_test contain labels from 0 to 9 indicating which digit each image shows
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Commented out because it made training slower.
# Note: keras.utils.normalize L2-normalizes along the given axis; it does not simply rescale values to the range 0 to 1.
# The rescaling to the range 0 to 1 is done further down by dividing by 255.
#x_train = keras.utils.normalize(x_train, axis = 1)
#x_test = keras.utils.normalize(x_test, axis = 1)

# The backend's image data format determines where the channel axis goes.
# For 2D images, "channels_last" assumes (rows, cols, channels) while
# "channels_first" assumes (channels, rows, cols).
if K.image_data_format() == 'channels_first':
    
    # a full-color image with all 3 RGB channels will have a depth of 3.
    # Our MNIST images only have a depth of 1, but we must explicitly declare that.
    # In other words, we want to transform our dataset from having shape (n, width, height) to (n, depth, width, height)
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
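
The effect of the two branches can be checked on a small stand-in array. The sketch below assumes five blank 28x28 images instead of the real MNIST data, so it runs without downloading anything.

```python
import numpy as np

# Stand-in for a few MNIST images: shape (n, rows, cols).
images = np.zeros((5, 28, 28), dtype=np.uint8)

channels_first = images.reshape(images.shape[0], 1, 28, 28)
channels_last = images.reshape(images.shape[0], 28, 28, 1)
# np.expand_dims is an equivalent way to add the length-1 channel axis.
also_channels_last = np.expand_dims(images, axis=-1)

print(channels_first.shape, channels_last.shape)
```

Either way the pixel values are untouched; only a channel axis of length 1 is added so that `Conv2D` receives the four-dimensional input it expects.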

In [13]:
# convert the pixel values to float32 and rescale them to the range 0 to 1
x_train = x_train.astype('float32') # Converts to float
x_test = x_test.astype('float32')
x_train /= 255 # rescaling of the data from the original range so that all values are within the range of 0 and 1.
x_test /= 255 # (Normalizing)
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [14]:
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
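
A "binary class matrix" is a one-hot encoding: each label becomes a row of zeros with a single 1 at the label's index. The NumPy sketch below mimics that behaviour; `one_hot` is a hypothetical helper, not the Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 9])
encoded = one_hot(labels, num_classes=10)
print(encoded)
```

This format matches the 10-unit softmax output of the model, so the loss can compare predictions and targets element by element.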


In [15]:
# The Sequential model is a linear stack of layers.
# You can create a Sequential model by passing a list of layer instances to the constructor
model = Sequential()
# You can also simply add layers via the .add() method using Sequential
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
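
With the Keras defaults (no padding, stride 1 for convolution, stride equal to the pool size for pooling), each layer shrinks the image by a predictable amount. The sketch below traces the spatial size through this first `Conv2D` and the convolution and pooling layers added in the next cell; `valid_output_size` is a hypothetical helper.

```python
def valid_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for a convolution or pooling with no padding."""
    return (input_size - kernel_size) // stride + 1

after_conv1 = valid_output_size(28, 3)                    # first Conv2D, 3x3 kernel
after_conv2 = valid_output_size(after_conv1, 3)           # second Conv2D, 3x3 kernel
after_pool = valid_output_size(after_conv2, 2, stride=2)  # 2x2 max pooling
flattened_units = after_pool * after_pool * 64            # 64 filters in the second Conv2D
print(after_conv1, after_conv2, after_pool, flattened_units)
```

The 28x28 input becomes 26x26, then 24x24, then 12x12 after pooling, so `Flatten` hands 9216 values per image to the first `Dense` layer.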


In [16]:
# a second convolutional layer, this time with 64 filters
model.add(Conv2D(64, (3, 3), activation='relu'))
# downsample by keeping the maximum of each 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))
# randomly drop 25% of the units during training to reduce overfitting
model.add(Dropout(0.25))

# flatten the feature maps so they can feed the fully connected layers
# Flattening collapses every dimension except the batch dimension,
# giving a one-dimensional array per sample
model.add(Flatten())

# fully connected layer that combines all extracted features
model.add(Dense(128, activation='relu'))
# one more dropout layer, again to reduce overfitting
model.add(Dropout(0.5))
# softmax turns the 10 outputs into class probabilities that sum to 1
model.add(Dense(num_classes, activation='softmax'))
# Adadelta is a gradient descent variant with an adaptive learning rate; Adam and Adagrad are common alternatives
# categorical cross-entropy since we have multiple classes (10)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
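
The softmax output and the categorical cross-entropy loss can be sketched in NumPy as follows. The three-class logits are made-up numbers, and the `eps` clipping is an assumption added to avoid `log(0)`; this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

logits = np.array([[2.0, 1.0, 0.1]])   # raw scores for 3 hypothetical classes
y_true = np.array([[1.0, 0.0, 0.0]])   # one-hot target: class 0
probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The loss is small when the model assigns a high probability to the correct class and grows as that probability approaches zero.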


In [17]:
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1, # verbose = 1, will show you an animated progress bar
          validation_data=(x_test, y_test))
# evaluate on the test set and print the loss and accuracy
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.023570960051386647
Test accuracy: 0.9926


In [18]:
#Save the model
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")

Saved model to disk
