### The Data
Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single pixel value indicating its lightness or darkness, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.
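Each CSV row stores the 784 pixels of one image as a flat vector, so it helps to see how a flat position maps back to a row and column. This sketch assumes row-major order (pixel `k` sits at row `k // 28`, column `k % 28`), which is what NumPy's default `reshape` produces. The helper names are hypothetical.

```python
import numpy as np
from typing import Tuple

IMG_SIZE = 28  # each image is 28 x 28 pixels

def flat_to_rowcol(k: int) -> Tuple[int, int]:
    """Map a flat pixel index (0..783) to (row, col), assuming row-major order."""
    return divmod(k, IMG_SIZE)

def rowcol_to_flat(row: int, col: int) -> int:
    """Inverse mapping: (row, col) -> flat pixel index."""
    return row * IMG_SIZE + col

# A fake flattened image whose value at each position is its own index.
flat = np.arange(IMG_SIZE * IMG_SIZE)
image = flat.reshape(IMG_SIZE, IMG_SIZE)  # row-major (C order) by default

print(flat_to_rowcol(783))                        # (27, 27): the last pixel is bottom-right
print(image[2, 5] == flat[rowcol_to_flat(2, 5)])  # True
```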

The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel values of the associated image.
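To make the column layout concrete, a tiny synthetic frame with the same columns as train.csv (random values, not real data) can be split into a label vector and a pixel matrix. `drop(columns=...)` is used because the positional `axis` argument to `DataFrame.drop` is no longer accepted in recent pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 3 rows, a "label" column plus pixel0..pixel783.
rng = np.random.default_rng(0)
pixel_cols = [f"pixel{i}" for i in range(784)]
fake = pd.DataFrame(rng.integers(0, 256, size=(3, 784)), columns=pixel_cols)
fake.insert(0, "label", [1, 0, 4])

labels = fake["label"].to_numpy()                # shape (3,)
pixels = fake.drop(columns="label").to_numpy()   # shape (3, 784)

print(fake.shape)            # (3, 785)
print(labels, pixels.shape)
```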

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator

import matplotlib.pyplot as plt



Using TensorFlow backend.


## Load the Training and Test Data

In [2]:
train = pd.read_csv("../input/train1/train.csv")

In [3]:
test = pd.read_csv("../input/digit-recognizer/test.csv")

In [4]:
train.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [5]:
#keep the labels as a one-column DataFrame
train_labels = train[['label']]

#convert the pixels to NumPy arrays so we can use "reshape"
train_pixels = train.drop(columns='label').values
test_pixels = test.values

In [6]:
#reshape for the CNN as [samples][height][width][channels]
train_pixels = train_pixels.reshape(train_pixels.shape[0], 28, 28, 1).astype('float32')
test_pixels = test_pixels.reshape(test_pixels.shape[0], 28, 28, 1).astype('float32')

In [7]:
train_pixels.shape

(42000, 28, 28, 1)

## Preprocessing 

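### normalize pixel values to the 0-1 range

In [8]:
train_pixels = train_pixels/255
test_pixels = test_pixels/255  #the test images need the same scaling as the training images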
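Dividing by 255 is min-max scaling into [0, 1], not standardization (zero mean, unit variance). Whatever scaling is chosen has to be applied to every array the model sees, including `test_pixels` at prediction time; otherwise the network receives inputs up to 255 times larger than anything it was trained on. A minimal sketch of a shared helper (the name `scale_pixels` is hypothetical):

```python
import numpy as np

def scale_pixels(x: np.ndarray) -> np.ndarray:
    """Min-max scale raw 0-255 pixel values into [0, 1] as float32."""
    return x.astype("float32") / 255.0

raw = np.array([[0, 128, 255]], dtype=np.int64)
scaled = scale_pixels(raw)
print(scaled)        # values 0.0, about 0.502, and 1.0
print(scaled.dtype)  # float32
```

Calling the same helper on both `train_pixels` and `test_pixels` keeps the two in sync by construction.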

### one-hot encode labels 

In [9]:
train_labels.shape

(42000, 1)

In [10]:
train_labels = to_categorical(train_labels)
num_classes = train_labels.shape[1]

train_labels

array([[ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
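`to_categorical` turns each integer label into a length-10 vector with a single 1 at the label's position. For integer labels 0-9 the same layout can be produced by indexing rows of an identity matrix; this NumPy sketch illustrates the idea and is not Keras's own implementation. The labels are the first five from `train.head()`.

```python
import numpy as np

labels = np.array([1, 0, 1, 4, 0])   # first five labels from train.head()
num_classes = 10

# Row i of the 10x10 identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)   # (5, 10)
print(one_hot[3])      # 1.0 at index 4, zeros elsewhere
```

Going back from one-hot vectors to labels is just `np.argmax(one_hot, axis=1)`.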

In [11]:
train_labels.shape

(42000, 10)

## Design Neural Network

Keras Sequential model = linear stack of layers

In [12]:
model = Sequential()

#add convolutional layer
#32 kernels in this layer, each kernel is 5x5
#input_shape is (height, width, channels)
model.add(Conv2D(filters=32, kernel_size=(5, 5), input_shape=(28, 28, 1), activation='relu'))

#add pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

#adding 2nd Conv layer!
model.add(Conv2D(32, (3, 3),activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

#3rd conv
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 activation='relu'))

#add dropout layer: randomly drops 30% of activations during training to reduce overfitting
model.add(Dropout(0.3))

#flatten the 3-D feature maps into a vector so they can be processed by standard fully connected layers
model.add(Flatten())

#adds a fully connected layer with 256 neurons
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.4))

model.add(Dense(128, activation='relu'))

model.add(Dense(num_classes, activation='softmax'))
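The shapes flowing through this stack can be checked by hand. This plain-Python sketch assumes stride-1 convolutions and the default non-overlapping 2x2 pooling (which rounds down); Dropout and pooling change no parameter counts. Under those assumptions the totals should agree with what `model.summary()` reports.

```python
def conv_out(size: int, kernel: int, padding: str = "valid") -> int:
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(kernel: int, in_ch: int, filters: int) -> int:
    return (kernel * kernel * in_ch + 1) * filters  # +1 for each filter's bias

def dense_params(n_in: int, n_out: int) -> int:
    return (n_in + 1) * n_out                       # weights plus one bias per unit

s = conv_out(28, 5)          # 24  Conv2D 32 @ 5x5
s = pool_out(s)              # 12  MaxPooling2D 2x2
s = conv_out(s, 3)           # 10  Conv2D 32 @ 3x3
s = pool_out(s)              # 5   MaxPooling2D 2x2
s = conv_out(s, 3, "same")   # 5   Conv2D 64 @ 3x3, padding='same'
flat = s * s * 64            # values after Flatten (Dropout keeps shapes)

total = (conv_params(5, 1, 32) + conv_params(3, 32, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 256) + dense_params(256, 128) + dense_params(128, 10))
print(flat, total)  # 1600 472618
```

Most of the parameters (409,856) sit in the first Dense layer, which is why adding wide fully connected layers grows the model much faster than adding convolutions.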



## Compile NN

In [13]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
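With one-hot labels, categorical cross-entropy reduces to the negative log-probability the model assigns to the true class. A NumPy sketch of the per-sample formula; the clipping constant `eps` is an assumption here, added so that `log(0)` never occurs.

```python
import numpy as np

def categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Per-sample loss -sum_k y_k * log(p_k); predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # about 0.357, i.e. -log(0.7)
```

The reported training loss is the mean of this quantity over a batch, so a loss of about 0.05 corresponds to the true class typically receiving a probability around 0.95.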


In [14]:
train_labels.shape


(42000, 10)

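## Set Optimizer

Adam is selected in the `model.compile` call above via `optimizer='adam'`, with its default hyperparameters.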
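For intuition about what `optimizer='adam'` does on each batch, here is a NumPy sketch of a single step of the standard Adam update rule. The hyperparameter values are the commonly used defaults and are an assumption here (in particular `eps`), not values read from this notebook's Keras version.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, grad=np.array([4.0]), m=m, v=v, t=1)
print(theta)  # about 0.999: the first step moves the weight by roughly lr
```

Because the update divides by the root of the squared-gradient average, the step size is roughly `lr` regardless of the gradient's scale, which is one reason Adam usually works without tuning.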

## Fit Model

In [15]:
model.fit(train_pixels, train_labels, epochs=10, batch_size=200, verbose=2)


Epoch 1/10
 - 47s - loss: 0.5116 - acc: 0.8303
Epoch 2/10
 - 46s - loss: 0.1341 - acc: 0.9586
Epoch 3/10
 - 45s - loss: 0.0983 - acc: 0.9698
Epoch 4/10
 - 45s - loss: 0.0783 - acc: 0.9754
Epoch 5/10
 - 46s - loss: 0.0697 - acc: 0.9785
Epoch 6/10
 - 43s - loss: 0.0608 - acc: 0.9804
Epoch 7/10
 - 44s - loss: 0.0584 - acc: 0.9820
Epoch 8/10
 - 46s - loss: 0.0531 - acc: 0.9833
Epoch 9/10
 - 45s - loss: 0.0490 - acc: 0.9850
Epoch 10/10
 - 45s - loss: 0.0475 - acc: 0.9847


<keras.callbacks.History at 0x7fb5077e3cf8>
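The accuracies above are measured on the training data itself, so they cannot reveal overfitting. Holding out part of the training set gives a validation score to compare against. A NumPy-only sketch with a hypothetical helper; `train_test_split` (already imported) with `test_size=0.1` and `stratify=` would do the same job.

```python
import numpy as np

def holdout_split(x, y, val_fraction=0.1, seed=2):
    """Shuffle once with a fixed seed and hold out the first val_fraction of shuffled rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Stand-in arrays with the same trailing shapes as train_pixels / train_labels;
# every pixel of sample i equals i, so rows can be traced after shuffling.
x = np.repeat(np.arange(100, dtype="float32"), 784).reshape(100, 28, 28, 1)
y = np.eye(10, dtype="float32")[np.arange(100) % 10]
x_tr, x_val, y_tr, y_val = holdout_split(x, y)
print(x_tr.shape, x_val.shape)  # (90, 28, 28, 1) (10, 28, 28, 1)
```

The held-out arrays can then be passed to Keras through `model.fit(x_tr, y_tr, validation_data=(x_val, y_val), ...)`, which reports validation loss and accuracy after every epoch.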

In [16]:
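import numpy as np

#predict_classes is deprecated (removed in newer TensorFlow releases); the argmax of the softmax output gives the same classes
predictions = np.argmax(model.predict(test_pixels), axis=1)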


In [17]:
submissions=pd.DataFrame({"ImageId": list(range(1,len(predictions)+1)),
                         "Label": predictions})


In [18]:
submissions.head()


| | ImageId | Label |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 0 |
| 2 | 3 | 9 |
| 3 | 4 | 9 |
| 4 | 5 | 3 |


In [19]:
submissions.to_csv("result.csv", index=False, header=True)
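Given how much the Kaggle score swung between runs in the notes below, a cheap format check before uploading can at least rule out submission-file problems. A hypothetical helper, assuming the competition expects exactly the `ImageId` and `Label` columns with 1-based ids, as the cell above writes them; for the real file, `n_test` would be `len(test_pixels)`.

```python
import numpy as np
import pandas as pd

def check_submission(sub: pd.DataFrame, n_test: int) -> None:
    """Raise AssertionError if the frame does not look like a valid digit submission."""
    assert list(sub.columns) == ["ImageId", "Label"], sub.columns
    assert len(sub) == n_test
    assert sub["ImageId"].tolist() == list(range(1, n_test + 1))  # 1-based, in order
    assert sub["Label"].between(0, 9).all()                        # digits only

preds = np.array([2, 0, 9, 9, 3])  # first five predictions shown above
sub = pd.DataFrame({"ImageId": np.arange(1, len(preds) + 1), "Label": preds})
check_submission(sub, n_test=len(preds))
print("submission looks valid")
```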


## Data Augmentation

To reduce overfitting, we can artificially expand our handwritten digit dataset. The idea is to alter the training data with small transformations that reproduce the variations occurring when someone writes a digit.

Approaches that alter the training data in ways that change the array representation while keeping the label the same are known as data augmentation techniques. Popular augmentations include grayscale conversion, horizontal and vertical flips, random crops, color jitter, translations, and rotations. Not all of them suit digits: flips are disabled in the generator below because mirroring a digit can change or destroy its identity.
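As a concrete example of a label-preserving transformation, here is a NumPy sketch of a random translation. It is an integer-pixel approximation of the `width_shift_range` / `height_shift_range` idea, not how Keras implements it (Keras shifts by fractional amounts and interpolates); the function name is hypothetical.

```python
import numpy as np

def random_shift(img: np.ndarray, max_frac: float = 0.1, rng=None) -> np.ndarray:
    """Translate a 2-D image by up to max_frac of its size, filling vacated pixels with 0."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    max_dy, max_dx = int(h * max_frac), int(w * max_frac)   # 2 pixels for a 28x28 image
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    out = np.zeros_like(img)
    # Copy the part of the image that stays inside the frame to its shifted position.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

digit = np.zeros((28, 28), dtype="float32")
digit[10:18, 12:16] = 1.0   # a blob well away from the borders; its label would not change
shifted = random_shift(digit, rng=np.random.default_rng(0))
print(shifted.sum() == digit.sum())  # True: the blob stays fully inside the frame
```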



In [20]:
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images


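#fit() only computes the dataset statistics used by featurewise_center,
#featurewise_std_normalization and zca_whitening; with all of them off it has no effect here
datagen.fit(train_pixels)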
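Roughly, `datagen.flow` yields batches of freshly transformed images, and `fit_generator` draws `ceil(42000 / 200) = 210` of them per epoch, so the model rarely sees the exact same input twice. The simplified generator below sketches that behavior; the names are hypothetical, the noise augmentation is only a stand-in for the rotations, zooms and shifts configured above, and Keras's own iterator differs in details.

```python
import math
import numpy as np

def augmented_batches(x, y, batch_size, augment, seed=0):
    """Yield (x_batch, y_batch) forever, reshuffling on every pass over the data
    and applying augment(image, rng) to each image as its batch is built."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield np.stack([augment(img, rng) for img in x[idx]]), y[idx]

def add_noise(img, rng):
    """Hypothetical stand-in augmentation: small Gaussian noise, clipped to [0, 1]."""
    return np.clip(img + rng.normal(0.0, 0.05, size=img.shape), 0.0, 1.0)

x = np.zeros((450, 28, 28, 1), dtype="float32")
y = np.eye(10, dtype="float32")[np.arange(450) % 10]

batches = augmented_batches(x, y, batch_size=200, augment=add_noise)
xb, yb = next(batches)
print(xb.shape, yb.shape)      # (200, 28, 28, 1) (200, 10)
print(math.ceil(42000 / 200))  # 210 batches per epoch for the full training set
```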

In [21]:
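#note: this continues training the model already fitted in In [15]; it does not start from fresh weights
model.fit_generator(datagen.flow(train_pixels, train_labels, batch_size=200), epochs=10, verbose=2)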

Epoch 1/10
 - 44s - loss: 0.1895 - acc: 0.9416
Epoch 2/10
 - 43s - loss: 0.1275 - acc: 0.9605
Epoch 3/10
 - 43s - loss: 0.1170 - acc: 0.9647
Epoch 4/10
 - 44s - loss: 0.1058 - acc: 0.9675
Epoch 5/10
 - 42s - loss: 0.0983 - acc: 0.9699
Epoch 6/10
 - 40s - loss: 0.0929 - acc: 0.9710
Epoch 7/10
 - 41s - loss: 0.0872 - acc: 0.9735
Epoch 8/10
 - 41s - loss: 0.0853 - acc: 0.9739
Epoch 9/10
 - 40s - loss: 0.0848 - acc: 0.9737
Epoch 10/10
 - 41s - loss: 0.0811 - acc: 0.9752


<keras.callbacks.History at 0x7fb5053b44e0>

In [22]:
import numpy as np

predictions = np.argmax(model.predict(test_pixels), axis=1)
submissions=pd.DataFrame({"ImageId": list(range(1,len(predictions)+1)),
                         "Label": predictions})
submissions.to_csv("result.csv", index=False, header=True)


# NOTES

-Finally got the model to run. Achieved 0.78 accuracy

-will see how much accuracy improves after converting input to float
achieved 0.79 acc

-will see how much accuracy improves after standardizing input to 0-1 range
jumped to 0.99 accuracy
kaggle score of 97%

-fixed normalization divisor from 225 to 255
only 93% accuracy?

-don't think I changed anything but acc is up to 99.4%
maybe this is because I am not using a random seed
kaggle score of 8%???
kaggle score of 98 now....

-adding one more conv layer
kaggle slight increase

-adding dropout after each conv layer
another slight increase
kaggle is now at 99.1%

-adding another fully connected layer (dense 256)
kaggle score does not change



-set epochs to 30?
lowered kaggle score

-add augmented data
lowered kaggle to 96


adding more layers really increased run time
30sec to 1 min to 1.2 min per epoch
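Several notes above suspect that run-to-run differences come from not using a random seed. Fixing the seeds makes shuffles and augmentations repeat, so changes in score are easier to attribute to the change being tested. A minimal sketch seeding Python's and NumPy's global generators; with TensorFlow 2 installed, `tf.random.set_seed(seed)` would also be needed for weight initialization and dropout (left out here so the snippet runs without TensorFlow), and some GPU operations can stay nondeterministic even then.

```python
import random
import numpy as np

def set_seeds(seed: int = 42) -> None:
    """Seed Python's and NumPy's global RNGs so shuffles and augmentations repeat."""
    random.seed(seed)
    np.random.seed(seed)
    # Assumption: under TensorFlow 2, tf.random.set_seed(seed) belongs here as well.

set_seeds(42)
first = np.random.permutation(10)
set_seeds(42)
second = np.random.permutation(10)
print(np.array_equal(first, second))  # True
```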

