Simple Neural Network notebook using:
-------------------------------------

- TFLearn
- Feature Standardization
- One-Hot Encoding of the labels
- Softmax
- Stochastic Gradient Descent
- Cross Entropy

In [None]:
import numpy as np
import pandas as pd
import tflearn
import tensorflow as tf
# Remove regular python warnings
import warnings
warnings.filterwarnings('ignore')
# Remove TensorFlow warnings
tf.logging.set_verbosity(tf.logging.ERROR)
# Visualizations
from IPython.display import display, Math, Latex
import matplotlib.pyplot as plt
%matplotlib inline

# Data Load

I do the following:

- Load train.csv into data
- Load test.csv into test

In [None]:
data=pd.read_csv("../input/train.csv")
test=pd.read_csv("../input/test.csv")

- The labels are the first column of train.csv, so I split it into **train** (pixel values) and **labels**.
- Concatenate **train** and **test** so that standardization is done once, with the same statistics for both sets.

In [None]:
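# Column 0 holds the label; the remaining columns are pixel values
# (.ix was removed from pandas, so positional .iloc is used instead)
train = data.iloc[:, 1:]
labels = data.iloc[:, 0:1]
# Stack train and test so both are standardized with the same statistics
data = pd.concat([train, test], ignore_index=True)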

- Print the shape of each DataFrame to catch dimension mismatches before fitting the neural network.

In [None]:
print(train.shape)
print(test.shape)
print(data.shape)
print(labels.shape)

# Data Standardization

Pixel values range from 0 to 255. To speed up training, we standardize each pixel column so that it has:

 - Zero mean.
 - Unit variance.

Some pixels have the same value in every image, so their standard deviation is 0 and the division produces NaN. We replace those values with 0.

$$x'=\frac{x-\bar{x}}{\sigma}$$

In [None]:
norm_data = (data - data.mean()) / (data.std())
norm_data = norm_data.fillna(0)
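To see what the two lines above do, here is a toy sketch on a hypothetical DataFrame with three images and three pixels (made-up values, not taken from the dataset). The last column is constant, so its standard deviation is 0 and it turns into NaN before `fillna`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 images x 3 pixels; "p2" is always 0
toy = pd.DataFrame({"p0": [0, 128, 255], "p1": [10, 20, 30], "p2": [0, 0, 0]})

# Same standardization as above: subtract the column mean, divide by the column std
toy_norm = (toy - toy.mean()) / toy.std()
print(toy_norm)  # the "p2" column is NaN because 0 / 0 is undefined

# Replace the NaN values produced by the zero-std column
toy_norm = toy_norm.fillna(0)
print(toy_norm.mean().round(6))  # every column has mean 0 (up to rounding)
print(toy_norm.std())            # 1 for "p0" and "p1", 0 for the constant "p2"
```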

Labels One-Hot Encoding
----------------------

The first few labels look like this:

In [None]:
labels[0:5]

We need to convert each label into a binary vector of length 10, with a 1 at the position of the digit. So, if the label is 0, the vector is:

[1,0,0,0,0,0,0,0,0,0]

For 1:

[0,1,0,0,0,0,0,0,0,0]

and so on...

To do that, I **one-hot encode** the labels and store the result in a NumPy array.

In [None]:
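norm_labels = []
# iterrows() yields (index, row) pairs; each row holds a single label
for _, row in labels.iterrows():
    new_label = np.zeros(10)
    new_label[row.iloc[0]] = 1  # set the position of the digit to 1
    norm_labels.append(new_label)
norm_labels = np.array(norm_labels)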

In [None]:
print(labels.iloc[12:13, 0:1])
print(norm_labels[12])
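The loop above can also be written without Python-level iteration. A minimal sketch, assuming the labels are integers from 0 to 9: row k of the 10x10 identity matrix `np.eye(10)` is exactly the one-hot vector for digit k, so indexing it with the label column builds all vectors at once. `toy_labels` is a hypothetical stand-in for `labels`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `labels` DataFrame (one integer column)
toy_labels = pd.DataFrame({"label": [1, 0, 4, 9]})

# Fancy-index the identity matrix: one one-hot row per label
one_hot = np.eye(10)[toy_labels.iloc[:, 0].to_numpy()]

print(one_hot.shape)  # (4, 10)
print(one_hot[2])     # 1 at index 4, zeros elsewhere
```

Applied to the notebook's data, `np.eye(10)[labels.iloc[:, 0].to_numpy()]` should produce the same array as `norm_labels`.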

# Preparing the data for TensorFlow

The standardized data is split back into the train and test sets.

In [None]:
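# The first rows of norm_data come from train.csv (42000 images), the rest from test.csv
# (.as_matrix() was removed from pandas; .values returns the NumPy array)
n_train = labels.shape[0]
train = norm_data.values[:n_train]
test = norm_data.values[n_train:]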

TFLearn expects NumPy arrays rather than pandas DataFrames, so the DataFrames are converted to arrays. Feeding DataFrames directly causes the following error:

**IndexError: indices are out-of-bounds**

# Neural Network

It has the following characteristics:

- An input layer.
- Three hidden layers (128, 64 and 32 neurons) with ReLU activation.
- An output layer using softmax.
- Training by backpropagation with Stochastic Gradient Descent.
- Categorical cross-entropy between predictions and labels as the loss.

First, reset TensorFlow's default graph so that re-running the cells does not raise errors when the model is created again.

In [None]:
tf.reset_default_graph()

Input Layer
-----------

It has 784 neurons, one per feature: each image has 28x28 = 784 pixels. The `None` in the shape stands for the batch size, which is left unspecified.

In [None]:
net = tflearn.input_data(shape=[None, 784])
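Each 28x28 image is stored in the CSV as a single row of 784 pixel columns. A small sketch with a hypothetical random image shows that flattening row by row and reshaping back with `reshape(28, 28)` (as the prediction cell does) are inverse operations:

```python
import numpy as np

# Hypothetical 28x28 grayscale image with random pixel values 0-255
image = np.random.default_rng(0).integers(0, 256, size=(28, 28))

# Flatten row by row into the 784 features the input layer expects
row = image.reshape(-1)
print(row.shape)  # (784,)

# Reshaping back recovers the original image
restored = row.reshape(28, 28)
print(np.array_equal(image, restored))  # True
```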

Hidden layers
-------------

All hidden layers use ReLU (Rectified Linear Unit):

$$f(x)=\max(0,x)$$

It is a very simple and fast function: if x is greater than 0 it returns x, otherwise it returns 0. It is important to choose an appropriate learning_rate (usually a low value). If a large update pushes a neuron's input below 0 for every example, the neuron always outputs 0, receives no gradient, and "dies": it stays useless for the rest of training.

In [None]:
x=np.arange(-10,10,1)
y=np.maximum(x, 0)
plt.plot(x,y)
plt.xlim(-10,10)
plt.show()
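The "dying neuron" problem comes from the gradient: the derivative of ReLU is 1 for positive inputs and 0 otherwise, so a neuron whose input is negative for every example passes no gradient back and its weights stop updating. A small NumPy sketch (the function names are illustrative, and the derivative at exactly 0 is taken as 0 by convention):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(x, 0)

def relu_grad(x):
    # Derivative: 1 where x > 0, 0 elsewhere (0 at x == 0 by convention)
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # negatives are clipped to 0
print(relu_grad(x))  # no gradient flows for x <= 0

# A "dead" neuron: its input is negative for every example,
# so every gradient is 0 and gradient descent never changes its weights
dead_inputs = np.array([-4.0, -2.5, -1.0])
print(relu_grad(dead_inputs).sum())  # 0.0
```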

In [None]:
# first hidden layer: 128 neurons with ReLU
net = tflearn.fully_connected(net, 128, activation='relu')
# second hidden layer: 64 neurons
net = tflearn.fully_connected(net, 64, activation='relu')
# third hidden layer: 32 neurons (going deeper rather than wider)
net = tflearn.fully_connected(net, 32, activation='relu')

Output Layer
------------

It has 10 neurons, one for each possible digit, and uses softmax as the activation function. Softmax turns a vector of real-valued scores into a probability distribution: it highlights the largest value and suppresses values that are significantly below the maximum.

$$f(v_i) = \displaystyle\frac{e^{v_i}}{\displaystyle\sum_{j} e^{v_j}}$$

So, with the following input:

In [None]:
i=np.array([1,2,3,4,1,2,3,7])

We get the following output after applying softmax:

In [None]:
o=np.exp(i)/np.sum(np.exp(i))
o

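Since softmax produces a probability distribution, the outputs sum to 1 (up to floating-point rounding, which is why `np.isclose` is used instead of truncating with `int`).

In [None]:
np.isclose(np.sum(o), 1.0)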
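Computing `np.exp` directly overflows for large inputs (for example, `np.exp(1000)` is `inf`). A common remedy, sketched below with an illustrative function name, subtracts the maximum before exponentiating. The result is unchanged because the factor $e^{-\max_j v_j}$ cancels between numerator and denominator.

```python
import numpy as np

def stable_softmax(v):
    # Shifting by the max keeps every exponent <= 0, so np.exp cannot overflow;
    # the shift cancels in the ratio, so the probabilities are unchanged
    shifted = v - np.max(v)
    e = np.exp(shifted)
    return e / np.sum(e)

scores = np.array([1, 2, 3, 4, 1, 2, 3, 7])
print(np.allclose(stable_softmax(scores), np.exp(scores) / np.sum(np.exp(scores))))  # True

# The naive formula gives nan here (inf / inf); the stable one still works
big = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(big))
```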

In [None]:
net = tflearn.fully_connected(net, 10, activation='softmax')
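As a sanity check on the model size, the trainable parameters can be counted by hand. Assuming each fully connected layer has a weight matrix of shape (inputs, outputs) plus one bias per output (TFLearn's `fully_connected` adds a bias by default), the 784-128-64-32-10 network defined above has:

```python
# Layer widths of the network built above: input, three hidden layers, output
layer_sizes = [784, 128, 64, 32, 10]

# Each dense layer: n_in * n_out weights + n_out biases
params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(params)

print(params)  # [100480, 8256, 2080, 330]
print(total)   # 111146
```

About 90% of the parameters sit in the first hidden layer, because it connects all 784 input pixels.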

Regression with Gradient Descent
--------------------------------

In [None]:
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.01, loss='categorical_crossentropy')
model = tflearn.DNN(net)
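With `loss='categorical_crossentropy'`, the loss for one example is $-\sum_k y_k \log p_k$, where $y$ is the one-hot label and $p$ is the softmax output. Because $y$ is one-hot, only the probability assigned to the true class matters. A small worked sketch with made-up probabilities (the function name is illustrative, not TFLearn's):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # -sum_k y_k * log(p_k); eps avoids log(0)
    return -np.sum(y_true * np.log(y_pred + eps))

y = np.array([0.0, 0.0, 1.0])          # one-hot label: true class is 2
confident = np.array([0.1, 0.2, 0.7])  # hypothetical softmax outputs
wrong = np.array([0.7, 0.2, 0.1])

print(categorical_crossentropy(y, confident))  # -log(0.7), about 0.357
print(categorical_crossentropy(y, wrong))      # -log(0.1), about 2.303
```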

In [None]:
model.fit(train, norm_labels,show_metric=True,validation_set=0.1,batch_size=100, n_epoch=50)
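`optimizer='sgd'` with `batch_size=100` means the weights are updated after every mini-batch of 100 images rather than after the whole training set. The sketch below is not TFLearn's implementation: it is a minimal NumPy version of mini-batch SGD for a single softmax layer (softmax regression), with hypothetical function names and synthetic data, to illustrate the update rule. For softmax combined with cross-entropy, the gradient with respect to the layer's pre-activations simplifies to $p - y$.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_onehot):
    # Mean categorical cross-entropy over all rows
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

def sgd_softmax_regression(X, Y, lr=0.1, batch_size=100, n_epoch=20, seed=0):
    """Hypothetical helper: train W, b of one softmax layer with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    losses = []
    for _ in range(n_epoch):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], Y[idx]
            probs = softmax(xb @ W + b)
            # Gradient of the mean cross-entropy w.r.t. the logits: (p - y) / m
            grad = (probs - yb) / len(idx)
            W -= lr * xb.T @ grad  # chain rule through logits = xb @ W + b
            b -= lr * grad.sum(axis=0)
        losses.append(cross_entropy(softmax(X @ W + b), Y))  # loss after each epoch
    return W, b, losses

# Usage on synthetic data: three well-separated 2-D clusters, 100 points each
rng = np.random.default_rng(1)
centers = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
y_int = np.repeat(np.arange(3), 100)
X = centers[y_int] + rng.normal(size=(300, 2))
Y = np.eye(3)[y_int]

W, b, losses = sgd_softmax_regression(X, Y)
accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y_int)
print(losses[0], losses[-1], accuracy)
```

Shuffling every epoch keeps each mini-batch a random sample of the data, which is what makes the per-batch gradient a noisy but unbiased estimate of the full gradient.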

Predictions
-------------------------------------

Let's predict three random images from the test set and show each image with its prediction:

In [None]:
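for _ in range(3):
    ran = np.random.randint(0, test.shape[0])
    # Predict only the selected image (predict expects a batch, hence ran:ran+1)
    pred = model.predict(test[ran:ran+1])[0]
    # The predicted digit is the index of the highest softmax probability
    pred_digit = int(np.argmax(pred))
    digit = test[ran].reshape(28, 28)
    plt.imshow(digit, cmap='gray_r')
    plt.text(1, -1, "PREDICTION: {}".format(pred_digit), fontsize=20)
    plt.show()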

# Predict the test labels

In [None]:
pred = model.predict(test)
# The predicted digit is the index of the highest softmax probability in each row
predictions = np.argmax(pred, axis=1)
# Image ids start at 1
ids = np.arange(1, len(predictions) + 1)

# Build the submission file
sub = pd.DataFrame({
        "ImageId": ids,
        "Label": predictions
    })

sub.to_csv("digit_submission.csv", index=False)