# [Project 3] 10 digits classification - Neural network

## Cleaning data

We start by **loading** our data

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [10]:
image = pd.read_csv('features.txt')
labels = pd.read_csv('labels.txt')

In [32]:
labels.shape

(4999, 1)

As indicated in the assignment, the label "10" must be **replaced** with 0 in the dataset. <br>
To do this we first rename the single column to `actual_value`, then use the `replace` method.

In [11]:
labels.rename(columns = {'10' : 'actual_value'}, inplace = True)
labels.replace(10,0, inplace = True)
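
As a small illustration of the two calls above, the sketch below applies them to a toy frame. The values are made up; only the column name `'10'` mirrors the one renamed above.

```python
import pandas as pd

# Toy labels (made-up values); the column name '10' mirrors the one renamed above
toy = pd.DataFrame({'10': [10, 1, 2, 10, 9]})

toy = toy.rename(columns={'10': 'actual_value'})
toy = toy.replace(10, 0)  # every label equal to 10 becomes 0

print(toy['actual_value'].tolist())  # [0, 1, 2, 0, 9]
```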

## Splitting the data

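We use the `scikit-learn` function `train_test_split` to split the data into a **train** and a **test** set. <br>
Here the test set represents 20% of our dataset. <br>
This function automatically shuffles the data before splitting. <br>
We then one-hot encode the labels with `get_dummies`, so that each label becomes a vector with one entry per digit.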

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(image, labels,test_size=0.2)

In [13]:
y_train = pd.get_dummies(y_train.actual_value)
y_test = pd.get_dummies(y_test.actual_value)
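
To make the effect of the split and of the one-hot encoding concrete, here is a minimal sketch on toy data. The frames `toy_X` and `toy_y` are hypothetical stand-ins for the real features and labels.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real features and labels (values are made up)
toy_X = pd.DataFrame(np.arange(40).reshape(10, 4))
toy_y = pd.DataFrame({'actual_value': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]})

# 20% of the 10 rows go to the test set
Xtr, Xte, ytr, yte = train_test_split(toy_X, toy_y, test_size=0.2)

# One column per distinct label value present in the training subset
onehot = pd.get_dummies(ytr.actual_value)
print(Xtr.shape, Xte.shape, onehot.shape)
```

Each row of `onehot` contains exactly one active entry. Note that a label value absent from a subset produces no column for it, so on small datasets the encoded train and test frames may end up with different widths.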

## Implementing neural network from scratch

### Initialization

To initialize the neural network, the number of nodes for each layer (input, hidden, output) must be given.<br>
From this information, the constructor initializes the following attributes:
* A matrix of weights from the input to the hidden layer: `weight_ih`
* A matrix of weights from the hidden to the output layer: `weight_ho`
* A column matrix of biases for the hidden layer: `bias_h`
* A column matrix of biases for the output layer: `bias_o`
* A learning rate initialized to an arbitrary value: `learningRate`
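
The shapes of these matrices follow from the layer sizes. The sketch below checks them with the sizes used later in this notebook (400, 35 and 10); the variable names are local to the example.

```python
import numpy as np

nb_input, nb_hidden, nb_output = 400, 35, 10  # sizes used later in this notebook

weight_ih = np.random.rand(nb_hidden, nb_input)   # maps input -> hidden
weight_ho = np.random.rand(nb_output, nb_hidden)  # maps hidden -> output
bias_h = np.random.rand(nb_hidden, 1)
bias_o = np.random.rand(nb_output, 1)

x = np.zeros((nb_input, 1))  # one image as a column vector
hidden = weight_ih @ x + bias_h
output = weight_ho @ hidden + bias_o
print(hidden.shape, output.shape)  # (35, 1) (10, 1)
```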


### Train and score method

We have three main methods to implement:
* feedforward: a method that returns a prediction from a single input of 400 pixels
* train: a method that takes an input image and its label from the train set as arguments, since we are doing supervised learning
* score: a method that takes the input images and their labels from the test set as arguments

#### Feedforward method

The first thing to do is to implement the feedforward method. <br>
To do that we need to calculate the activations of the hidden and the output layers. <br>

$Hidden \, Layer =  \sigma \, (weight\_ih . Input \, Layer + bias\_h)$ <br>
$Output \, Layer =  \sigma \, (weight\_ho . Hidden \, Layer + bias\_o)$ <br>
> Where $\sigma$ represents the sigmoid function 

We take the output layer's activations as the return value of this method.
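
A minimal sketch of this forward pass on a tiny network is given below. The 4-3-2 layer sizes and the random values are assumptions made for the example; as in the `feedforward` method of the class, the output layer takes the hidden activations as its input.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical 4-3-2 network with made-up parameters
weight_ih = rng.random((3, 4))
weight_ho = rng.random((2, 3))
bias_h = rng.random((3, 1))
bias_o = rng.random((2, 1))

input_layer = rng.random((4, 1))  # one sample as a column vector

hidden_layer = sigmoid(weight_ih @ input_layer + bias_h)
output_layer = sigmoid(weight_ho @ hidden_layer + bias_o)

print(output_layer.shape)  # (2, 1)
```

Every entry of `output_layer` lies strictly between 0 and 1, since it is the result of a sigmoid.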

#### Train method

After that we need to implement the train method. <br>
We begin by computing the errors of the output (`output_errors`) and the hidden (`hidden_errors`) layers with these formulas:  <br>

$Output\_errors = labels - outputs$ <br>
$Hidden\_errors = weight\_ho^t . Output\_errors$

> Here the `.` represents a matrix product
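
The two error formulas can be checked by hand on small made-up matrices, as in the following sketch (2 output nodes and 3 hidden nodes are assumed).

```python
import numpy as np

labels = np.array([[0.0], [1.0]])        # made-up one-hot target
outputs = np.array([[0.8], [0.3]])       # made-up network outputs
weight_ho = np.array([[0.1, 0.2, 0.3],
                      [0.4, 0.5, 0.6]])  # shape (output, hidden) = (2, 3)

output_errors = labels - outputs             # [[-0.8], [0.7]]
hidden_errors = weight_ho.T @ output_errors  # shape (3, 1)

print(hidden_errors.ravel())  # approximately [0.20, 0.19, 0.18]
```

Each hidden node receives the output errors weighted by the connections that leave it, for instance $0.1 \times (-0.8) + 0.4 \times 0.7 = 0.20$ for the first one.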

Once we have the errors for each layer we can compute the **gradients** in order to obtain the values by which we are going to correct our **weights** (`weight_ih_deltas` and `weight_ho_deltas`) and **biases** (`hidden_gradients` and `output_gradients`). <br>
To do this we use the gradient descent update formulas.

$ Weight\_ho\_deltas  = learning\, rate \times output\_errors \, \times \, \sigma'(Output \, Layer) \, . \, Hidden \, Layer \, ^t$

$ Weight\_ih\_deltas  = learning\, rate \times hidden\_errors \, \times \, \sigma'(Hidden \, Layer) \, . \, Input \, Layer \, ^t$

$hidden\_gradients  = learning\, rate \times hidden\_errors \, \times \, \sigma'(Hidden \, Layer)$

$output\_gradients  = learning\, rate \times output\_errors \, \times \, \sigma'(Output \, Layer)$

> Here $\sigma'$ represents the derivative of the sigmoid function and $\times$ an element-wise product
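
The derivative of the sigmoid is $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. When the activation $a = \sigma(z)$ is already available, the same value can be written $a(1 - a)$. The sketch below, given as an illustration only, checks both forms against a finite-difference approximation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    # Derivative with respect to the pre-activation x
    return sigmoid(x) * (1 - sigmoid(x))

z = np.linspace(-4, 4, 9)  # pre-activations
a = sigmoid(z)             # activations

# Central finite differences as an independent reference
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(dsigmoid(z), numeric))      # True
print(np.allclose(dsigmoid(z), a * (1 - a)))  # True
```

Note that `dsigmoid` expects a pre-activation: calling it on a value that has already gone through the sigmoid evaluates $\sigma'$ at a different point.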

The final step of the train method is to **update** our weights and biases by adding the deltas and gradients we just calculated.

$weight\_ih = weight\_ih + Weight\_ih\_deltas$<br>
$weight\_ho = weight\_ho + Weight\_ho\_deltas$<br>

$bias\_h = bias\_h + hidden\_gradients$ <br>
$bias\_o = bias\_o + output\_gradients$ <br>
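
Putting the errors, the gradients and the update together, a training step can be sketched as below and repeated on one made-up sample. The 4-3-2 network, the learning rate of 0.5 and the evaluation of $\sigma'$ as $a(1 - a)$ from the activations are all assumptions of this sketch, not a copy of the class in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
weight_ih = rng.normal(size=(3, 4))  # hypothetical 4-3-2 network
weight_ho = rng.normal(size=(2, 3))
bias_h = np.zeros((3, 1))
bias_o = np.zeros((2, 1))

inputs = rng.random((4, 1))          # made-up sample
labels = np.array([[1.0], [0.0]])    # made-up one-hot label
learning_rate = 0.5

errors = []
for _ in range(200):
    # Forward pass
    hidden = sigmoid(weight_ih @ inputs + bias_h)
    outputs = sigmoid(weight_ho @ hidden + bias_o)

    # Errors of both layers
    output_errors = labels - outputs
    hidden_errors = weight_ho.T @ output_errors

    # Gradients, with sigma' written from the activations: a * (1 - a)
    output_gradients = learning_rate * output_errors * outputs * (1 - outputs)
    hidden_gradients = learning_rate * hidden_errors * hidden * (1 - hidden)

    # Update of the weights and biases
    weight_ho += output_gradients @ hidden.T
    weight_ih += hidden_gradients @ inputs.T
    bias_o += output_gradients
    bias_h += hidden_gradients

    errors.append(float((output_errors ** 2).mean()))

print(errors[0], errors[-1])
```

The mean squared error stored in `errors` should shrink as the same sample is presented again and again.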

#### Score method 

The score method computes the **accuracy** of the model on a test set. <br>
To do that we take the index of the **maximum value** of our outputs as the predicted value. <br>
Then we build an array that compares the `actual` and `predicted` values. <br>
We add a **one** to the array if the predicted value is equal to the actual one and a **zero** if not. <br>

After that we just have to return the following result:

$Accuracy = \frac{number \, of \, ones}{array \, size}$
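
The same computation can be sketched in a vectorized way on made-up outputs (4 samples and 3 classes are assumed here).

```python
import numpy as np

# Made-up outputs for 4 samples and 3 classes (one row per sample)
predictions = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
actual = np.array([1, 0, 1, 0])

predicted = predictions.argmax(axis=1)    # [1, 0, 2, 0]
hits = (predicted == actual).astype(int)  # [1, 1, 0, 1]
accuracy = hits.sum() / hits.size

print(accuracy)  # 0.75
```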

In [24]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def dsigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

In [25]:
sigmoid(1)

0.7310585786300049

In [15]:
class NeuralNetwork:
    def __init__(self, nb_input_nodes, nb_hiden_nodes, nb_output_nodes):
        
        #Setting each layer size
        self.nb_input_nodes = nb_input_nodes
        self.nb_hiden_nodes = nb_hiden_nodes
        self.nb_output_nodes = nb_output_nodes
        
        #Setting weight matrices with uniform random values in [0, 1)
        self.weight_ih = np.random.rand(self.nb_hiden_nodes, self.nb_input_nodes)
        self.weight_ho = np.random.rand(self.nb_output_nodes, self.nb_hiden_nodes)
        
        #Setting learningRate (arbitrary value)
        self.learningRate = 0.7
        
        #Setting biases with uniform random values in [0, 5)
        self.bias_h = np.random.rand(nb_hiden_nodes, 1)*5
        self.bias_o = np.random.rand(nb_output_nodes, 1)*5
        
    # input must have (nb_input_nodes, 1) as shape
    def feedforward(self, input):
        self.z_hiden_activation = np.dot(self.weight_ih, input)
        self.z_hiden_activation += self.bias_h
        self.hiden_activation = sigmoid(self.z_hiden_activation)
        
        self.z_output_activation = np.dot(self.weight_ho, self.hiden_activation)
        self.z_output_activation += self.bias_o
        self.output_activation = sigmoid(self.z_output_activation)
        
        return self.output_activation
    
    def train(self, inputs, labels):
        
        #Calculate the errors
        outputs = self.feedforward(inputs)
        output_errors = labels - outputs
        hiden_errors = np.dot(self.weight_ho.T, output_errors)
        
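        #Calculate the gradient for output layer
        #Note: dsigmoid is applied here to the activations (already passed
        #through sigmoid), not to the pre-activations z_output_activation
        gradients = dsigmoid(outputs)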
        gradients = gradients * output_errors
        gradients = gradients * self.learningRate
        
        #Calculate deltas for ho weights
        weight_ho_deltas = np.dot(gradients, self.hiden_activation.T)

        #Adjusting ho weights and bias
        self.weight_ho += weight_ho_deltas
        self.bias_o += gradients
        
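        #Calculate the gradient for hidden layer
        #Note: dsigmoid is applied here to the activations (already passed
        #through sigmoid), not to the pre-activations z_hiden_activation
        hiden_gradients = dsigmoid(self.hiden_activation)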
        hiden_gradients = hiden_gradients * hiden_errors
        hiden_gradients = hiden_gradients * self.learningRate
        
        #Calculate deltas for ih weights
        weight_ih_deltas = np.dot(hiden_gradients, inputs.T)

        #Adjusting ih weights and bias
        self.weight_ih += weight_ih_deltas
        self.bias_h += hiden_gradients
        
        return (1/output_errors.shape[0]) * ((output_errors*output_errors).sum())
    
    def score(self, X_test, y_test):
        #Visit the test samples in a random order (the order does not change the accuracy)
        random_list = np.random.permutation(y_test.shape[0])
        prediction_count = []
        for i in random_list:
            
            prediction = self.feedforward(X_test.values[i].reshape(self.nb_input_nodes,1))
            actual = y_test.values[i]
            
            #The predicted and actual digits are the indices of the largest values
            predicted_value = np.argmax(prediction)
            actual_value = np.argmax(actual)
            
            #Store 1 for a correct prediction and 0 otherwise
            prediction_count.append(1 if predicted_value == actual_value else 0)
            
            #Progress indicator
            print(str(len(prediction_count)) + ' / ' + str(y_test.shape[0]), end ='\r')
            
        #Accuracy = number of correct predictions / number of samples
        return sum(prediction_count) / len(prediction_count)

## Testing our model

The model is up! <br>
Now we'll test it on the digit classification use case. <br>
To do that we use `400` nodes for the input layer, given that we have 400 pixels in each image. <br>
We use `10` output nodes, one for each of the 10 possible values. <br>
We arbitrarily use `35` nodes for the hidden layer.

We use the `sample` function of the `random` module to shuffle our data before training our model with the whole training set.

In [30]:
import random
n = NeuralNetwork(400, 35, 10)

loop = 0

l = random.sample(range(0, y_train.shape[0]), y_train.shape[0])

for i in l:
    # i is already a shuffled index, so it is used directly
    curr_img = X_train.values[i].reshape(400,1)
    curr_label = y_train.values[i].reshape(10,1)
    MSE = n.train(curr_img, curr_label)
    print (str(loop) + '/' + str(X_train.shape[0]) + '\tMSE = ' + str(MSE), end='\r')
    loop = loop+1
    
    



3998/3999	MSE = 0.13659230947698517052
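
As a side illustration of the shuffling used above, `random.sample(range(n), n)` returns every index exactly once in a random order. The seed below is added only to make the illustration repeatable.

```python
import random

random.seed(0)  # seed added only to make this illustration repeatable
n_samples = 8
order = random.sample(range(n_samples), n_samples)

# Every index appears exactly once, in a random order
print(sorted(order) == list(range(n_samples)))  # True
```

Because `order` already contains shuffled indices, each of its elements can be used directly to pick a sample.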

We use the `score` method to compute the **accuracy** of our model 

In [37]:
n.score(X_test, y_test) * 100

1000 / 1000

84.0

In [48]:
print(classification_report(X_test,y_test))

ValueError: Classification metrics can't handle a mix of continuous-multioutput and multilabel-indicator targets
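
The error above comes from the arguments: `classification_report` expects the true labels first and the predicted labels second, not the features and the one-hot labels. A call that the function accepts could look like the sketch below, where the arrays are made up and the one-hot rows are converted back to class indices with `argmax`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Made-up one-hot rows standing in for y_test, and made-up network outputs
y_true_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred_scores = np.array([[0.9, 0.1, 0.2], [0.2, 0.6, 0.1],
                          [0.3, 0.2, 0.4], [0.6, 0.3, 0.1]])

y_true = y_true_onehot.argmax(axis=1)  # [0, 1, 2, 1]
y_pred = y_pred_scores.argmax(axis=1)  # [0, 1, 2, 0]

print(accuracy_score(y_true, y_pred))  # 0.75
report = classification_report(y_true, y_pred)
print(report)
```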

Here the resulting accuracy differs depending on the shuffle. <br>
It fluctuates between 75% and 85%, which is pretty good.
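
If repeatable numbers are wanted, one option is to fix the seeds of the random generators involved. The sketch below is only a suggestion on toy data: the `random_state` value and the seeds are arbitrary choices, not settings used in this notebook.

```python
import random
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

toy_X = pd.DataFrame(np.arange(20).reshape(10, 2))  # made-up features
toy_y = pd.Series(range(10))                        # made-up labels

# The same random_state gives the same split on every call
a_train, a_test, _, _ = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)
print(a_test.index.equals(b_test.index))  # True

# Seeds for the generators behind np.random.rand and random.sample
np.random.seed(42)
random.seed(42)
```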

## Comparing our model's accuracy to scikit-learn's

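Scikit-learn's neural network does not work the same way. <br>
Instead of updating the weights after every single sample, it computes the errors over batches of samples before each update (with `batch_size='auto'`, a batch holds min(200, n_samples) samples). <br>
The dataset is also iterated over several times.
> You can configure the maximum number of iterations with the `max_iter` parameter 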
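
As an illustration of `max_iter`, the sketch below fits a small classifier on made-up data and reads the number of iterations actually run from the `n_iter_` attribute. The data, the value `max_iter=20` and the `random_state` are assumptions of the example.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 20))        # made-up features
y = (X[:, 0] > 0.5).astype(int)  # made-up binary labels

mlp = MLPClassifier(hidden_layer_sizes=(35,), activation='logistic',
                    solver='adam', max_iter=20, random_state=0)

# Stopping at max_iter before convergence raises a warning, silenced here
with warnings.catch_warnings():
    warnings.simplefilter('ignore', ConvergenceWarning)
    mlp.fit(X, y)

print(mlp.n_iter_)  # number of passes over the data, at most max_iter
```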

In [41]:
from sklearn.neural_network import MLPClassifier
from sklearn.neural_network import MLPRegressor

In [42]:
mlp = MLPClassifier(hidden_layer_sizes=(35), activation='logistic', solver='adam', max_iter=100 )
mlp.fit(X_train,y_train)



MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
              beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=35, learning_rate='constant',
              learning_rate_init=0.001, max_iter=100, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)

In [43]:
predict_train = mlp.predict(X_train)
predict_test = mlp.predict(X_test)

In [49]:
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.metrics import precision_score


# print(classification_report(y_train,predict_train))
print(precision_score(y_train,predict_train, average='macro'))


0.9704267740341732


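Scikit-learn's neural network model gives us a macro-averaged **precision** of 97%. Note that it is computed on the training set, so it is not directly comparable with the accuracy our own model obtains on the test set. <br>
This high value could be explained first by the way this neural network trains the model, as explained above, and second by the learning rate, which is really low (`learning_rate_init=0.001`).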

In [40]:
n.score(X_test, y_test) * 100

1000 / 1000

84.0