In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Let's classify some numbers

First things first, load the data into three dataframes: 'test', 'train', and 'w' (the sample submission file). Run the code below.

In [1]:
test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
w = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')

Let's check the data out, see what we are working with.

In [1]:
#Let's look at the test data first. This is the data we will eventually make predictions on.

test.head() #Shows the first 5 rows; there are 784 columns, and the column 'label' appears to be missing.

In [1]:
train.head() # train appears to have the column 'label'

In [1]:
test.shape # The test data has 28,000 rows and 784 columns.

In [1]:
train.shape # The train data has 42,000 rows and 785 columns.

Interesting, the 'test' data has only 784 columns, while the 'train' data has 785 columns. Let's compare 'test.head()' to 'train.head()'
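Rather than eyeballing the two head() outputs, a set difference of the column names shows exactly which column is extra. The sketch below uses tiny made-up dataframes ('toy_train' and 'toy_test' are hypothetical stand-ins with 3 pixel columns instead of 784) so that it runs on its own; with the real data the same call would be train.columns.difference(test.columns).

```python
import pandas as pd

# Hypothetical stand-ins for 'train' and 'test': same layout, far fewer pixel columns
toy_train = pd.DataFrame({'label': [1, 0], 'pixel0': [0, 0], 'pixel1': [0, 255], 'pixel2': [128, 0]})
toy_test = pd.DataFrame({'pixel0': [0, 12], 'pixel1': [64, 0], 'pixel2': [0, 0]})

# Columns present in the training frame but missing from the test frame
extra_columns = toy_train.columns.difference(toy_test.columns)
print(list(extra_columns))  # ['label']
```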

In [1]:
test.head()

In [1]:
train.head()

Okay, so it appears the 'test' data is missing the column 'label'. Perhaps the dataframe 'w' (loaded from sample_submission.csv) holds the labels; let's check.

In [1]:
w.head()

In [1]:
w.shape

Interesting, so the number of rows (observations) in dataframe 'w' equals the number of rows (observations) in the 'test' dataframe. Let's look at the two columns in dataframe 'w', "ImageId" and "Label", and see what values they take using the .unique() method.

In [1]:
w['ImageId'].unique()

In [1]:
w['Label'].unique()

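Interesting, so the column 'Label' in dataframe (df) 'w' only takes the value 0. That is because sample_submission.csv is only a template showing the format Kaggle expects for a submission (one ImageId and one Label per test image); its labels are placeholders, not the true digits. So, first, let's build our model from the 'train' data set, which is the only data with real labels. First things first, we will split our 'train' dataset into a training set and a test set.

* Note: since the original 'test' df does not have the column 'label', we cannot measure accuracy on it, so at this point we will use only the 'train' data set to build and evaluate our model.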
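Since sample_submission.csv is only a template, it is most useful as a guide to the file layout Kaggle expects: one row per test image with an 'ImageId' and a predicted 'Label'. A minimal sketch, assuming (as in the template) that 'ImageId' counts from 1; 'make_submission' and 'pred_labels' are hypothetical names, and the predictions below are made up.

```python
import numpy as np
import pandas as pd

def make_submission(pred_labels):
    """Build a dataframe in the same ImageId/Label layout as sample_submission.csv.

    pred_labels: hypothetical 1-D sequence of predicted digits, in the same row order as test.csv.
    """
    pred_labels = np.asarray(pred_labels, dtype=int)
    return pd.DataFrame({
        'ImageId': np.arange(1, len(pred_labels) + 1),  # assumes ImageId starts at 1, as in the template
        'Label': pred_labels,
    })

submission = make_submission([2, 0, 9])  # made-up predictions for three images
print(submission)
# submission.to_csv('submission.csv', index=False) would write the file without the pandas index
```

Once a model is trained (such as 'model_1' later in this notebook), 'pred_labels' would typically be the argmax of its predicted probabilities on 'test', e.g. model_1.predict(test).argmax(axis=1).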

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split

X = train.drop(columns=['label'])
y = train['label']


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)

First, we split our 'train' df into features and labels. Dataframe (df) 'X' contains all columns except 'label', and 'y' is a pandas Series containing only the column 'label'.

We then split our 'X' and 'y' into 'X_train', 'X_test', 'y_train', and 'y_test'. We will use 'X_train' and 'y_train' df's to train our model. We will use 'X_test' and 'y_test' to test our model (aka see how well the model does). 
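To make the 70/30 split concrete: test_size=0.30 sends 30% of the 42,000 labelled rows to the test split and the rest to the training split, after shuffling (random_state=40 fixes the shuffle so the split is reproducible). The NumPy sketch below mirrors that idea; it is a simplified illustration, not scikit-learn's actual implementation, and the function name 'holdout_indices' is hypothetical.

```python
import math
import numpy as np

def holdout_indices(n_rows, test_size, seed):
    """Shuffle row positions, then take the first ceil(test_size * n_rows) as the test split."""
    rng = np.random.default_rng(seed)           # seeded so the split is reproducible
    order = rng.permutation(n_rows)             # random order of row positions 0..n_rows-1
    n_test = math.ceil(test_size * n_rows)      # round up, as scikit-learn does for a float test_size
    return order[n_test:], order[:n_test]       # (train positions, test positions)

train_idx, test_idx = holdout_indices(42000, 0.30, seed=40)
print(len(train_idx), len(test_idx))  # 29400 12600
```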

Below we display the first 5 rows, the shapes, and the unique labels of each split to make sure the split into X_train, X_test, y_train, and y_test worked.

In [1]:
X_train.head()

In [1]:
X_train.shape

In [1]:
y_train.shape

In [1]:
X_test.head()

In [1]:
y_train.head()

In [1]:
y_train.unique()

In [1]:
y_test.head()

In [1]:
y_test.unique()

In [1]:
X_test.shape

In [1]:
y_test.shape

So it appears the code worked. Next, let's plot bar charts of 'y_test' and 'y_train' to make sure we have an even representation of digits in both dfs.

In [1]:
spread_of_y_test = y_test.value_counts(ascending=True)

spread_of_y_test = dict(spread_of_y_test)
print(spread_of_y_test)



In [1]:
spread_of_y_train = y_train.value_counts(ascending=True)

spread_of_y_train = dict(spread_of_y_train)
print(spread_of_y_train)


In [1]:
import matplotlib.pyplot as plt

plt.bar(list(spread_of_y_test.keys()), list(spread_of_y_test.values()), color='g')
plt.title('Spread of digits 0 - 9 for Test Data')

In [1]:
plt.bar(list(spread_of_y_train.keys()), list(spread_of_y_train.values()), color='b')
plt.title('Spread of digits 0 - 9 for Train Data')

From the two bar graphs above, the spread of the digits 0-9 appears consistent between 'y_test' and 'y_train'.
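The bar charts compare raw counts, and the test split is smaller, so its bars are shorter by design. Comparing proportions makes the check more direct: value_counts(normalize=True) returns each digit's share of its split. The sketch uses small made-up label Series; with the real data you would pass 'y_train' and 'y_test'. The helper name 'max_proportion_gap' is hypothetical. (If the proportions had differed noticeably, train_test_split also accepts stratify=y to preserve the class mix in both splits.)

```python
import pandas as pd

def max_proportion_gap(labels_a, labels_b):
    """Largest absolute difference in class share between two label Series."""
    share_a = labels_a.value_counts(normalize=True)   # fraction of each digit in the first split
    share_b = labels_b.value_counts(normalize=True)
    # Align on digit; a digit missing from one split counts as a share of 0
    share_a, share_b = share_a.align(share_b, fill_value=0)
    return (share_a - share_b).abs().max()

# Hypothetical toy labels standing in for y_train and y_test
toy_y_train = pd.Series([0, 0, 1, 1, 2, 2, 3, 3])
toy_y_test = pd.Series([0, 1, 2, 3])
print(max_proportion_gap(toy_y_train, toy_y_test))  # 0.0
```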

Ok cool, so now we have the data we are going to use to "train" our model ('X_train' and 'y_train') and the data we are going to use to "test" our model ('X_test' and 'y_test').

So what are we actually trying to do in this problem set and with this code? Well, we should remember that this is a multiclass classification problem. In short, we want to use the columns in the X_train df (and likewise the X_test df) to predict whether a digit is a '0', '1', '2', '3', '4', '5', '6', '7', '8', or '9'. Since there are ten possible classes rather than two, this is multiclass classification.

Below, let's get a better sense of what we are trying to do. I have displayed two images below (one from the training split, the other from the test split).

You can see that once all the pixels are combined, they form an image that looks like a digit between 0 and 9.
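Each row of 'X_train' is a flat list of 784 pixel values. Reshaping it to 28 x 28 lays those values out row by row, so pixel number k lands at row k // 28, column k % 28. This assumes the columns run pixel0 to pixel783 in row-major order, which is how the competition describes them. A small NumPy check of that mapping:

```python
import numpy as np

flat_row = np.arange(784)            # stand-in for one image: the value at position k is just k
image = flat_row.reshape(28, 28)     # same reshape as X_train.values.reshape(-1, 28, 28), for one row

k = 30
row, col = divmod(k, 28)             # row = k // 28, col = k % 28
print(row, col, image[row, col])     # 1 2 30
```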

In [1]:
X_train_2d = X_train.values.reshape(-1,28,28) # each 784-pixel row becomes a 28 x 28 image

plt.imshow(X_train_2d[25]) # looks like the image below is the number three

print('When combining all the pixels together, it looks like we produce the number 3')


print('The corresponding value in y_train is: ' + str(y_train.iloc[25])) # .iloc selects by position, matching X_train_2d[25]


Alright cool, let's see if the same holds true for the test split. Note I will be using essentially the same code, just pulling from the test split instead of the training split.

In [1]:
X_test_2d = X_test.values.reshape(-1,28,28) #Note my code changed from 'X_train' to 'X_test'

plt.imshow(X_test_2d[24]) # looks like the image below is the number nine

print('When combining all the pixels together, it looks like we produce the number 9')


list_y_test = list(y_test)

print('The corresponding value in y_test is: ' + str(list_y_test[24]))
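Why look up labels by position (with .iloc or list(...)) rather than with plain y_train[25]? train_test_split shuffles the rows but keeps their original index labels, so y_train[25] looks up the row whose index label is 25 (which may sit anywhere in y_train, or may have landed in y_test instead), while X_train_2d[25] is the 26th row by position. A tiny made-up Series shows the difference:

```python
import pandas as pd

# Hypothetical labels whose index has been shuffled, as after train_test_split
shuffled = pd.Series([7, 3, 9], index=[2, 0, 1])

print(shuffled.iloc[0])   # 7 -> first row by position (matches row 0 of a NumPy array)
print(shuffled[0])        # 3 -> the row whose index label is 0
print(list(shuffled)[0])  # 7 -> converting to a list also indexes by position
```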

Great, so now that we can visualize what we are trying to do, what type of model should we use for digit recognition?

Let's remind ourselves on what we are trying to do:

1) We have y_train & X_train dataframes, which we will use to train our model.

2) We have y_test & X_test dataframes, which we will use to test our model.

3) We have 784 columns (features) of pixel data that, when combined, form a 28 x 28 image of a digit (0-9).

4) We want to train a model on these 784 columns to predict which digit the image shows.

5) Remember, this is a multiclass classification problem, not a binary one.
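Because there are ten classes rather than two, the model built below ends in a 10-way softmax and is trained with 'sparse_categorical_crossentropy', which takes the integer label (e.g. 3) directly instead of a one-hot vector. For a single image the loss is -log(p[label]), the negative log of the probability assigned to the correct digit. A small NumPy check, with made-up probabilities, that this equals the one-hot ("categorical") form (Keras's own implementation also clips probabilities internally, which this sketch ignores):

```python
import numpy as np

probs = np.array([0.05, 0.05, 0.1, 0.7, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01])  # hypothetical softmax output
label = 3                                                 # integer label, as stored in y_train

sparse_loss = -np.log(probs[label])                       # per-sample sparse categorical cross-entropy
one_hot = np.eye(10)[label]                               # the same label as a one-hot vector
categorical_loss = -np.sum(one_hot * np.log(probs))       # categorical cross-entropy on the one-hot form

print(round(sparse_loss, 4), round(categorical_loss, 4))  # 0.3567 0.3567
```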

### Convolutional Neural Networks (CNNs)

The basic premise behind neural networks, including Convolutional Neural Networks (CNNs), is the following. A network consists of several layers, like floors of an office building. Each layer consists of neurons, which will 'fire' (aka produce a non-zero output) if a particular threshold is met. Another way to think about it: say you are in the elevator of this 100-story office building, delivering a package. You can't remember which floor you are supposed to get off at; all you remember is that the floor number started with a '2'. Having just this information (input), you would skip floors '10', '17', '33', etc., since they do not start with a '2'. In short, each floor can be seen as a neuron, and you, being a time-constrained individual, will only get off at floors that start with a '2'. So floors (neurons) '22' and '23' will fire, while neurons (floors) '33' and '55' will not fire.

So, as one individual, you would make up one layer of our model, with '100' possible neurons that may or may not fire. Wanting to increase your accuracy and the speed at which you find the correct floors, you invite a friend to help. This friend adds another layer to our model. But before your friend starts to explore the 100-floor office building, he/she waits for radio communication from you about what you have learned.

This is what makes the model 'sequential': each person (layer) uses their neurons to pass input (information) on to the next layer of neurons (individual). Note that Model 1 below is built only from fully connected (Dense) layers, so strictly speaking it is a plain feed-forward network rather than a CNN; a CNN would add convolutional layers (such as tf.keras.layers.Conv2D) that scan small patches of the 28 x 28 image.
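The "radio communication" between layers is just the output of one layer becoming the input to the next. Each neuron takes a weighted sum of its inputs plus a bias; with ReLU (the activation Model 1 uses in its hidden layers) it outputs 0 unless that sum is positive, which is the "fire only past a threshold" idea. The NumPy sketch below pushes one made-up 784-pixel image through layers shaped like Model 1 (784, 128, 64, 32, 10) using random, untrained weights, so its "prediction" is meaningless; it only illustrates the data flow and that the final softmax produces 10 probabilities summing to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 32, 10]   # mirrors the Dense layers in Model 1

# Random (untrained) weights and zero biases for each layer
params = [(rng.normal(0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

def forward(x):
    for i, (W, b) in enumerate(params):
        z = x @ W + b                  # this layer's weighted sums, one per neuron
        is_last = i == len(params) - 1
        x = softmax(z) if is_last else np.maximum(0, z)  # ReLU for hidden layers, softmax at the end
    return x

image = rng.integers(0, 256, size=784).astype(float)  # made-up pixel values
probs = forward(image)
print(probs.shape, round(float(probs.sum()), 6), probs.argmax())
```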

# Model 1

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
import matplotlib.pyplot as plt

tf.keras.backend.clear_session()

# One input feature per pixel column (784)
input_shape = (len(X_train.columns),)

#Configure the model
model_1 = tf.keras.models.Sequential([tf.keras.Input(shape=input_shape), # declare the input size before the first layer
                                      tf.keras.layers.Flatten(), # no-op here, since each row is already flat
                                      tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                      tf.keras.layers.Dense(64, activation=tf.nn.relu),
                                      tf.keras.layers.Dense(32, activation=tf.nn.relu),
                                      tf.keras.layers.Dense(10, activation=tf.nn.softmax)]) #Softmax turns the 10 outputs into probabilities, one per digit, for multiclass classification


model_1.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model_1.fit(X_train,y_train, epochs=10)


#Remember now we need to test the model

print(" ")
print('As you can see, in our simple first model, accuracy increases with each epoch')

#Here are some important parameters for Sequential models
#1 -- 'input_shape' is how many columns (features) of information our first layer can expect. Since we are using pixels to create images of digits, we will use all
# 784 columns (features) of data. That is why the input shape is set from len(X_train.columns).
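A quick way to sanity-check the architecture is to count its trainable parameters: each Dense layer has one weight per (input, neuron) pair plus one bias per neuron, i.e. inputs * units + units. The arithmetic below should match the "Total params" line that model_1.summary() prints for this architecture (the Flatten layer has no parameters).

```python
layer_sizes = [784, 128, 64, 32, 10]   # input features followed by the units in each Dense layer

total = 0
for n_in, units in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * units + units      # weights + biases
    total += params
    print(f'Dense({units}): {params:,} parameters')
print(f'Total: {total:,}')             # Total: 111,146
```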

As we can see above, with each epoch (one full pass through the training data), our model improved at correctly identifying images of digits. Our simple model was able to climb all the way to about 97% accuracy! Crazy. However, remember, this is only our training data; we now need to see how the model performs on data it has never seen before, aka the held-out test split (X_test and y_test).
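Within an epoch, Keras updates the weights once per mini-batch. When fit() is called without batch_size it uses 32, so with the 29,400 rows in X_train (70% of 42,000) each epoch should take 919 weight updates, which is the step count Keras's progress bar should show per epoch. The arithmetic:

```python
import math

n_train_rows = 29400      # rows in X_train: 42,000 minus the 12,600 held out by test_size=0.30
batch_size = 32           # Keras's default when fit() is called without batch_size
epochs = 10

steps_per_epoch = math.ceil(n_train_rows / batch_size)  # the last, smaller batch still counts as a step
print(steps_per_epoch, steps_per_epoch * epochs)        # 919 9190
```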

In [1]:
model_1.evaluate(X_test, y_test)

Okay, so on the test split, we were able to predict the correct digit ~95% of the time. Not bad, but definitely lower than the ~97% on our training data.
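model_1.evaluate returns the loss followed by the metrics passed to compile, here [loss, accuracy]. Accuracy is simply the fraction of rows whose most probable class (the argmax of the softmax output) equals the true label. The sketch computes it by hand from made-up probabilities; with the real model, 'probs' would come from model_1.predict(X_test). The helper name 'accuracy_from_probs' is hypothetical.

```python
import numpy as np

def accuracy_from_probs(probs, true_labels):
    """Fraction of rows whose highest-probability class equals the true label."""
    predicted = np.argmax(probs, axis=1)       # most likely class for each row
    return float(np.mean(predicted == np.asarray(true_labels)))

# Hypothetical softmax outputs for 4 images over 3 classes (Model 1 would output 10 columns)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
true_labels = [0, 1, 2, 1]
print(accuracy_from_probs(probs, true_labels))  # 0.75
```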

More importantly, when you look at the two visuals below, it appears our model is...

In [1]:
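# Note: model_1 was already trained above, so this call continues from its current weights
# for 10 more epochs; validation_data makes Keras record test-split accuracy and loss each epoch.
history = model_1.fit(X_train, y_train,
                    epochs=10,
                    validation_data=(X_test, y_test))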


# Plot training & validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()