# Kaggle Competition: Digit Recognizer 

[Digit Recognizer Competition](https://www.kaggle.com/c/digit-recognizer)

> MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

> In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. We’ve curated a set of tutorial-style kernels which cover everything from regression to neural networks. We encourage you to experiment with different algorithms to learn first-hand what works well and how techniques compare.

## Model Hyperparameters

In [1]:
batch_size = 32
num_classes = 10
epochs = 12
img_rows, img_columns = 28, 28
input_shape = (img_rows, img_columns, 1)

## Load Data

Download the train and test datasets using the Kaggle API:

```
$ kaggle competitions download digit-recognizer 
```

In [2]:
import pandas as pd
import numpy as np

In [3]:
train_dataframe = pd.read_csv('train.csv', sep=',')
train_data = train_dataframe.values

test_dataframe = pd.read_csv('test.csv', sep=',')
test_data = test_dataframe.values

In [4]:
# Separates labels (column 0) from pixels, then holds out 20% of the rows for validation
from sklearn.model_selection import train_test_split
X_train, y_train = train_data[:, 1:], train_data[:, 0]
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
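
The split above relies on the layout of `train.csv`: the first column holds the label and the remaining 784 columns hold pixel values. The following is a minimal NumPy sketch of the same idea, not scikit-learn's implementation. `holdout_split` and `fake_table` are hypothetical names, the data is synthetic, and the shuffling will not reproduce the exact rows that `train_test_split` selects.

```python
import numpy as np

def holdout_split(table, val_fraction=0.2, seed=42):
    """Shuffle rows, then split a (label, pixels...) table into train and validation parts."""
    rng = np.random.default_rng(seed)
    n_rows = table.shape[0]
    n_val = int(np.ceil(val_fraction * n_rows))
    order = rng.permutation(n_rows)
    val_rows, train_rows = table[order[:n_val]], table[order[n_val:]]
    # Column 0 is the label; the remaining columns are pixel intensities.
    return train_rows[:, 1:], val_rows[:, 1:], train_rows[:, 0], val_rows[:, 0]

# Synthetic stand-in for train.csv: 50 rows, 1 label column + 784 pixel columns.
fake_table = np.random.default_rng(0).integers(0, 10, size=(50, 785))
X_tr, X_va, y_tr, y_va = holdout_split(fake_table)
print(X_tr.shape, X_va.shape, y_tr.shape, y_va.shape)
```

With the real 42,000-row training file, the same 80/20 arithmetic gives the 33,600 training and 8,400 validation rows used in this notebook.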

In [5]:
X_test = test_data

## Prepare Data

In [6]:
# Reshapes each flat 784-pixel row into a 28x28x1 image, giving 4-D arrays of shape (samples, rows, columns, channels)
# Since MNIST is composed of grayscale images, just one channel is needed
X_train = X_train.reshape(X_train.shape[0], img_rows, img_columns, 1)
X_val = X_val.reshape(X_val.shape[0], img_rows, img_columns, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_columns, 1)
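
To see what the reshape does to a single row, here is a small sketch on synthetic data. The array `flat` is a made-up stand-in for the pixel columns. NumPy reshapes in row-major order by default, so pixel `k` of a flat row lands at row `k // 28`, column `k % 28`.

```python
import numpy as np

# Two fake "images", each stored as a flat row of 784 values.
flat = np.arange(2 * 784).reshape(2, 784)

# Same call pattern as in the notebook: (samples, rows, columns, channels).
images = flat.reshape(flat.shape[0], 28, 28, 1)

# Pixel k of a flat row maps to (k // 28, k % 28) in the image.
k = 31
row, col = k // 28, k % 28
print(images.shape, images[0, row, col, 0] == flat[0, k])
```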

In [7]:
# Ensures arrays are float32
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')

In [8]:
# Scales pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_val = X_val / 255
X_test = X_test / 255
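
Dividing by 255 rescales pixel intensities to the range [0, 1]; it does not center them. The sketch below contrasts the two operations on a tiny made-up array. The extra centering step is shown only for comparison and is not part of this notebook's pipeline.

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype='float32')

# Rescaling: values now lie in [0, 1], but the mean is not zero.
scaled = pixels / 255

# Centering (not done in this notebook): subtract the mean to get zero-mean data.
centered = scaled - scaled.mean()

print(scaled.min(), scaled.max(), scaled.mean(), centered.mean())
```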

In [9]:
# Checks array dimensions
print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
print(y_train.shape)
print(y_val.shape)

(33600, 28, 28, 1)
(8400, 28, 28, 1)
(28000, 28, 28, 1)
(33600,)
(8400,)


In [10]:
import keras
# One-hot encodes output to get multiclass classification using softmax 
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)

Using TensorFlow backend.
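
One-hot encoding turns each integer label into a vector with a single 1 at the label's index. As an illustration of what `to_categorical` produces, here is a plain NumPy sketch; `one_hot` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (samples, num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example.shape)
print(np.argmax(example, axis=1))  # argmax recovers the original labels
```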


## Create Model

In [11]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

In [12]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(num_classes, activation='softmax'))
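
To make the architecture easier to follow, the sketch below traces the tensor shape and the number of trainable parameters through each layer by hand. It assumes the Keras defaults the cell above relies on: `'valid'` padding and stride 1 for `Conv2D`, and a pooling stride equal to `pool_size`. The helper names are hypothetical; `Dropout` and `Flatten` add no parameters.

```python
def conv2d_valid(shape, filters, kernel):
    """Output shape and parameter count for a stride-1, 'valid'-padded convolution."""
    height, width, channels = shape
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out_shape, params

def dense_params(units_in, units_out):
    """Parameter count for a fully connected layer (weights + biases)."""
    return units_in * units_out + units_out

shape = (28, 28, 1)
shape, p1 = conv2d_valid(shape, filters=32, kernel=3)   # first Conv2D
shape, p2 = conv2d_valid(shape, filters=64, kernel=3)   # second Conv2D
shape = (shape[0] // 2, shape[1] // 2, shape[2])        # MaxPooling2D with 2x2 pool
flat = shape[0] * shape[1] * shape[2]                   # Flatten
p3 = dense_params(flat, 128)                            # Dense(128)
p4 = dense_params(128, 10)                              # Dense(num_classes)
total_params = p1 + p2 + p3 + p4

print(shape, flat, total_params)
```

Most of the parameters sit in the first dense layer, which is why the pooling step before `Flatten` matters for model size.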

In [13]:
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

In [14]:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f81094199e8>

## Evaluate Model on Validation Set

In [15]:
score = model.evaluate(X_val, y_val, verbose=1)
print('Validation Loss:', score[0])
print('Validation accuracy:', score[1])

Validation Loss: 0.045616205896839714
Validation accuracy: 0.9891666666666666
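
The two numbers reported above are the mean categorical cross-entropy and the fraction of correctly classified images. The sketch below computes both with NumPy on a tiny made-up batch. It illustrates the definitions only; Keras's internal implementation (for example its clipping constant) may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum(y_true * log(y_prob)) over the batch."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Two samples, two classes: the first is predicted correctly, the second is not.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1], [0.8, 0.2]])

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)
```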


## Predict on Test Set

In [16]:
predictions = model.predict(X_test)

In [17]:
# Picks the most probable class for each image
labels = np.argmax(predictions, axis=1)
image_ids = range(1, len(labels) + 1)
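
Each row of `predictions` holds one probability per class, so the predicted digit is the index of the largest value in that row. A small sketch with made-up probabilities (three classes instead of ten, for brevity):

```python
import numpy as np

# Fake softmax output for two images and three classes.
fake_predictions = np.array([[0.1, 0.7, 0.2],
                             [0.8, 0.1, 0.1]])

# Index of the largest probability in each row.
fake_labels = np.argmax(fake_predictions, axis=1)

# Image identifiers start at 1, matching the submission format.
fake_image_ids = np.arange(1, len(fake_labels) + 1)

print(fake_labels, fake_image_ids)
```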

## Save Submission

In [18]:
df = pd.DataFrame({'ImageId': image_ids, 'Label': labels})

In [20]:
df.to_csv('submission.csv', encoding='utf-8', index=False)
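
Before uploading, it can help to sanity-check the file's structure. The sketch below is one possible check using only the standard library. `check_submission` is a hypothetical helper, and the rules it enforces (an `ImageId,Label` header, consecutive ids starting at 1, labels from 0 to 9) are assumptions drawn from how this notebook builds the file, not an official validator.

```python
import csv
import io

def check_submission(text, expected_rows):
    """Return True if the CSV text looks like a well-formed submission."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != ['ImageId', 'Label']:
        return False
    rows = list(reader)
    if len(rows) != expected_rows:
        return False
    for position, row in enumerate(rows, start=1):
        if len(row) != 2:
            return False
        image_id, label = row
        # Ids must count up from 1; labels must be digits 0 through 9.
        if int(image_id) != position or not 0 <= int(label) <= 9:
            return False
    return True

sample = "ImageId,Label\n1,2\n2,0\n3,9\n"
print(check_submission(sample, expected_rows=3))
```

For the real file, one could pass `open('submission.csv').read()` and `expected_rows=28000`, the number of test images shown earlier.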

Submit the results using the Kaggle API:

```
$ kaggle competitions submit -f submission.csv -m 'Recognizing digits with Keras and TensorFlow' digit-recognizer
```