<h1 style="text-align:center">MNIST Digit Recognition</h1>

<div style="text-align:center;"><img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" /></div>

**Context:** 
> In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images.

**About the Data:**
> Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
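
To make the pixel layout concrete, here is a minimal sketch of how one flattened 784-value row maps back onto a 28x28 grid. The row is synthetic (`flat_row` is a made-up stand-in, not data from the competition files), and row-major ordering is an assumption.

```python
import numpy as np

# Synthetic stand-in for one CSV row: 784 integer pixel values in [0, 255].
rng = np.random.default_rng(0)
flat_row = rng.integers(0, 256, size=784)

# Reshape to a 28x28 grid, assuming row-major order:
# flat index k lands at row k // 28, column k % 28.
image = flat_row.reshape(28, 28)

k = 100
row, col = divmod(k, 28)
print(image.shape)                      # (28, 28)
print(image[row, col] == flat_row[k])   # True
```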

# Imports

In [None]:
# Data Processing
import numpy as np 
import pandas as pd 

# Data Visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style='whitegrid')

# Modeling
from sklearn.model_selection import train_test_split

from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier

# `keras.utils.np_utils` is not available in recent Keras releases
from keras.utils import to_categorical

# Basic Data Analysis

In [None]:
df_train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
df_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

In [None]:
df_train.head()

In [None]:
df_test.head()

In [None]:
df_train.shape

In [None]:
df_test.shape

### NaN values

Let's check if we have NaN values in our dataframe.

In [None]:
df_train.isna().values.any()

There are no NaN values in the training data. That's great!

### Target Value: label

Let's take a look at the distribution of the target value.

In [None]:
b = sns.countplot(x='label', data=df_train)
b.set_title("label distribution", fontsize=15)
b.set_xlabel("label", fontsize=15)
b.set_ylabel("Count", fontsize=15);

As we can see, the labels are distributed relatively evenly across the ten digits.
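
One way to back up the visual impression with numbers is to compare relative class frequencies. The sketch below uses a small made-up label array (`labels` is hypothetical). The same calls should work on `df_train['label'].values`.

```python
import numpy as np

# Hypothetical labels for illustration only; not the competition data.
labels = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 4])

# Count how often each class occurs and convert to relative frequencies.
counts = np.bincount(labels)
freq = counts / counts.sum()

# Ratio of the most common class to the rarest one; 1.0 means perfectly balanced.
imbalance_ratio = counts.max() / counts.min()

print(freq)
print(f"max/min class ratio: {imbalance_ratio:.2f}")
```

A ratio close to 1 supports treating plain accuracy as a reasonable metric. A large ratio would suggest looking at per-class metrics as well.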

# Modeling

In [None]:
# Features are every column except the label (positional `axis` is rejected by pandas 2.x)
X = df_train.drop(columns=['label']).values
y = df_train['label'].values

test_x = df_test.values

In [None]:
# Greyscale normalization
X = X / 255.0
test_x = test_x / 255.0

In [None]:
# Reshape the data
X = X.reshape(-1,28,28,1)
test_x = test_x.reshape(-1,28,28,1)

y = to_categorical(y)
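
The preprocessing cells above scale the pixels to [0, 1], add a channel axis, and one-hot encode the labels. Here is a NumPy-only sketch of the same three steps on a tiny made-up batch. `np.eye(10)[labels]` is used as a stand-in for `to_categorical`, on the assumption that the labels are integers from 0 to 9. The dtype may differ from what Keras returns.

```python
import numpy as np

# Two fake flattened images (4 * 196 = 784 values each) and their labels.
pixels = np.array([[0, 128, 255, 64] * 196, [255] * 784], dtype=np.uint8)
labels = np.array([3, 7])

# 1) Scale 0..255 integers to 0..1 floats.
scaled = pixels / 255.0

# 2) Reshape to (batch, height, width, channels) as expected by Conv2D.
images = scaled.reshape(-1, 28, 28, 1)

# 3) One-hot encode: row i of the identity matrix is the encoding of class i.
one_hot = np.eye(10)[labels]

print(images.shape, one_hot.shape)
print(one_hot[0])
```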

In [None]:
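np.random.seed(42)

# Hold out 10% of the training data for validation; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)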
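
For intuition, a seeded hold-out split can be sketched with NumPy alone. `holdout_split` is a hypothetical helper written for illustration, not what `train_test_split` does internally, and it will not reproduce the same indices.

```python
import numpy as np

def holdout_split(n_samples, test_size=0.1, seed=42):
    """Return (train_idx, test_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(np.ceil(n_samples * test_size))
    return order[n_test:], order[:n_test]

train_idx, test_idx = holdout_split(1000)
print(len(train_idx), len(test_idx))

# The same seed gives the same split, so runs are comparable.
train_again, test_again = holdout_split(1000)
print(np.array_equal(test_idx, test_again))
```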

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, Flatten, BatchNormalization
# Create the model
model = Sequential()

# Earlier, simpler architecture (kept for reference, not used):
# model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
# model.add(Flatten())
# model.add(Dense(10, activation='softmax'))

model.add(Conv2D(32, kernel_size = 3, activation='relu', input_shape = (28, 28, 1)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 3, activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))


# input_shape is only needed on the first layer of the model
model.add(Conv2D(32, kernel_size = 3, activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 3, activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))


model.add(Conv2D(128, kernel_size = 4, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
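
To see why this stack ends in a small feature vector, it helps to trace the spatial size through each `Conv2D`. The sketch below is plain Python, with no Keras required. It assumes the usual rules: `valid` padding gives `(n - k) // s + 1` and `same` padding gives `ceil(n / s)`. `conv_out` is a hypothetical helper, not a Keras function.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a square convolution."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

# (kernel, stride, padding) for each Conv2D in the model above, in order.
conv_layers = [
    (3, 1, "valid"), (3, 1, "valid"), (5, 2, "same"),
    (3, 1, "valid"), (3, 1, "valid"), (5, 2, "same"),
    (4, 1, "valid"),
]

size = 28
sizes = []
for kernel, stride, padding in conv_layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)               # [26, 24, 12, 10, 8, 4, 1]
print(size * size * 128)   # 128 values reach the Flatten layer
```

Under these assumptions, the last convolution (128 filters, kernel size 4) reduces the 4x4 map to 1x1, so `Flatten` passes 128 features to the final `Dense(10)` layer.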

In [None]:
#compile model using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
#train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=14)

In [None]:
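# `predict_classes` was removed from recent Keras/TensorFlow releases;
# take the most probable class from the softmax output instead
y_pred = np.argmax(model.predict(test_x, verbose=1), axis=1)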
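
A softmax layer outputs one probability per class, so the predicted digit is the index of the largest value in each row. Here is a minimal sketch with made-up probabilities (`probs` is hypothetical and uses four classes for brevity):

```python
import numpy as np

# Each row is one sample; each column is the predicted probability of a class.
probs = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# argmax along axis=1 picks the most probable class for every row.
pred = np.argmax(probs, axis=1)
print(pred)  # [1 0 3]
```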

In [None]:
y_pred

In [None]:
sub = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')

sub['Label'] = y_pred
sub.to_csv("results_mnist_2.csv", index=False)
sub.head()

# Work in Progress
