# Introduction
It's much easier to manage something when you know how it works.<br>
The goal of this notebook is to build a neural network from scratch, without specialized libraries such as TensorFlow or PyTorch.<br>
It won't have the best score or run very fast. It will be pretty simple compared to a modern CNN, but the network itself will be written from scratch using only NumPy.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
import time
import pickle
from sklearn.model_selection import train_test_split

In [None]:
df_train = pd.read_csv('../input/digit-recognizer/train.csv')
df_test = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
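# grayscale normalization: scale pixel values from 0..255 to 0..1
# df_test has only pixel columns, so the 'label' column of df_train is left untouched
pixel_columns = df_test.columns
df_train[pixel_columns] = df_train[pixel_columns] / 255
df_test = df_test / 255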

Let's look at one of the pictures

In [None]:
def pict_from_array(numarray, width, height):
    """
    Shows plot of array image
    :param numarray: numpy array to plot
    :param width: width of picture
    :param height: height of picture
    
    :return: None, the image is displayed as a side effect
    """
    numarray = np.reshape(numarray, (height, width))  # rows first, then columns
    plt.imshow(numarray)
    plt.show()

In [None]:
pict_from_array(df_test.loc[0].to_numpy(), 28, 28)

We need to convert everything to NumPy arrays, since all our calculations will use NumPy

In [None]:
y = df_train['label'].to_numpy()
X = df_train.loc[:,'pixel0':'pixel783'].to_numpy()

In [None]:
# One-hot encode the labels: z[i] will have a 1 at index y[i] and 0 elsewhere
z = np.zeros((len(y), 10))

In [None]:
for key, value in enumerate(y):
    z[key][value] = 1
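
The loop above turns each label into a one-hot vector. As a small sketch, the same result can be produced without a loop by indexing an identity matrix. The `labels` array here is a made-up stand-in for `y`, not data from the competition.

```python
import numpy as np

# Hypothetical labels, standing in for the `y` array above
labels = np.array([3, 0, 9, 3])

# Loop version, same idea as in the notebook
one_hot_loop = np.zeros((len(labels), 10))
for row, digit in enumerate(labels):
    one_hot_loop[row][digit] = 1

# Vectorized version: row i of the identity matrix is the one-hot vector for digit i
one_hot_vectorized = np.eye(10)[labels]

print(np.array_equal(one_hot_loop, one_hot_vectorized))
```

Both versions give a matrix with one row per label and one column per class.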

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, z, test_size=0.33, random_state=13)

# Some theory
Below I'm going to write a class for the neural net, but first I'll explain a few things.<br>
I suppose you have seen pictures where one layer of neurons is converted into a layer with a different number of neurons.<br>
In our network we will convert 784 neurons to 512, then to 128, then to 64, then to 10.<br>
Let's take a look at a small example and convert 5 neurons to 3 neurons. This is done with matrix multiplication: we multiply a vector of size 5 by a matrix of shape (5, 3).

In [None]:
neurons_5 = [1,1,1,1,1] # size of 5
conversion_matrix = np.random.randn(5, 3)
neurons_3 = np.dot(neurons_5, conversion_matrix) # size of 3
neurons_3

We will need to do this several times.<br>
This is a linear calculation.<br>
Then we need to add a non-linear activation. I'll use ReLU (some people say it works pretty well), and we will apply it multiple times.<br>
<br>Then we need to train our model. I've made some notebooks to help understand how training works:
https://www.kaggle.com/konstantinsuspitsyn/gradient-descent-for-formulas-like-y-a-x-b
I'll use gradient descent with momentum (people say that it works faster than Adam for MNIST).
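
To make the "linear step, then activation, repeated" idea concrete, here is a minimal sketch that pushes one fake image through the 784 → 512 → 128 → 64 → 10 chain. The random matrices, the seed and the `sqrt(1 / n_in)` scaling are assumptions for illustration only. The class below stores its matrices transposed (outputs × inputs) and finishes with softmax, which this sketch leaves out.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

def relu(x):
    # Non-linear activation: negative values become 0, positive values pass through
    return np.maximum(0, x)

layer_sizes = [784, 512, 128, 64, 10]

# One conversion matrix per pair of neighbouring layers, shape (inputs, outputs)
matrices = [
    rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

signal = rng.random(784)  # stand-in for one normalized 28x28 image

# Hidden layers: linear calculation followed by ReLU
for matrix in matrices[:-1]:
    signal = relu(np.dot(signal, matrix))

# Output layer: linear calculation only, giving one raw score per digit
scores = np.dot(signal, matrices[-1])
print(scores.shape)  # (10,)
```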

# Class of its own

In [None]:
class NeuralNetwork:
    
    '''
    Neural network from scratch
    Self-education purpose only. 
    Use with caution! And have fun
    '''
    
    def __init__(self, neurons = [784, 512, 128, 64, 10], lr=0.001, epoch=10, \
                 weights = None, momentum = 0.9):
        '''
        Here we build our network
        
        :param neurons: list with the sizes of the neuron layers
                        neurons[0] = number of pixels = 28*28
                        neurons[-1] = number of classes = 10
                        all between are hidden layers
        :param lr: learning rate
        :param epoch: number of epochs
        :param weights: dict of weight matrices; if None, weights are initialized randomly
        :param momentum: momentum coefficient passed to sgd_with_momentum
        '''
        self.input = neurons[0]
        self.hidden_1 = neurons[1]
        self.hidden_2 = neurons[2]
        self.hidden_3 = neurons[3]
        self.output = neurons[4]
        self.lr = lr
        self.epoch = epoch
        
        if weights is None:
            self.weights = {
                # We need to turn the default 784 neurons into 512.
                # We do that by multiplying by a matrix of shape 512×784
                'w0': np.random.randn(self.hidden_1, self.input) * np.sqrt(1. / self.hidden_1),
                # Multiplying by matrix of shape 128×512
                'w1': np.random.randn(self.hidden_2, self.hidden_1) * np.sqrt(1. / self.hidden_2),
                # Multiplying by matrix of shape 64×128
                'w2': np.random.randn(self.hidden_3, self.hidden_2) * np.sqrt(1. / self.hidden_3),
                # Multiplying by matrix of shape 10×64
                'w3': np.random.randn(self.output, self.hidden_3) * np.sqrt(1. / self.output),
            }
        else:
            self.weights = weights
        
        # It will contain all current neuron calculations
        self.neurons = {}
        # Gradient for momentum
        self.prev_grad = None
        self.momentum = momentum
            
        
            
    
    def softmax(self, x, derivative = False):
        '''
        Softmax activation function
        https://en.wikipedia.org/wiki/Softmax_function
        
        Basic softmax should look like this:
        def softmax(x):
            """Compute the softmax of vector x."""
            exps = np.exp(x)
            return exps / np.sum(exps)
        But code below is more numerically stable
        
        '''
        exps = np.exp(x - x.max())
        if derivative == True:
            return exps / np.sum(exps, axis=0) * (1 - exps / np.sum(exps, axis=0))
        return exps / np.sum(exps, axis=0)
    
    def relu(self, x, derivative = False):
        '''
        RELU activation function
        https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
        I used Relu, because I've read that with this activation function,
        training process goes much faster
        
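        :param x: numpy array of pre-activation values
        :param derivative: if False, return ReLU(x); if True, return its derivative at x
        
        '''
        if derivative:
            # At x == 0 the derivative is undefined, so I set it to 0.
            # A new array is built instead of overwriting x, so stored layer values stay intact
            return np.where(x > 0, 1.0, 0.0)
        else:
            return np.maximum(0, x)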
    
    
    def forward(self, x):
        '''
        Forward pass
        
        :param x: input neurons (one flattened image)
        :return: predicted probability for each of the 10 classes
        '''
        
        # All new keys in dictionary will be l# or a# 
        # (# - based on layer number, l or a based on linear or activation layer)
        
        # Working with input
        self.neurons['l_input'] = x
        
        self.neurons['l0'] = np.dot(self.weights['w0'], self.neurons['l_input'])
        self.neurons['a0'] = self.relu(self.neurons['l0']) # activation of first layer
        
        # Create hidden layer
        self.neurons['l1'] = np.dot(self.weights['w1'], self.neurons['a0'])
        self.neurons['a1'] = self.relu(self.neurons['l1'])
        
        # Create hidden layer
        self.neurons['l2'] = np.dot(self.weights['w2'], self.neurons['a1'])
        self.neurons['a2'] = self.relu(self.neurons['l2'])
        
        # Working with output
        self.neurons['l3'] = np.dot(self.weights['w3'], self.neurons['a2'])
        self.neurons['a3'] = self.softmax(self.neurons['l3']) # probabilities of digits 0 to 9
        
        return self.neurons['a3']
    
    def backward(self, y_true, y_pred):
        '''
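        Backward pass: computes a gradient for every weight matrix
        
        :param y_true: real y (one-hot vector)
        :param y_pred: predicted y (output of forward)
        :return: dict of gradients with the same keys as self.weights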
        '''
        
        # Computation of gradients
        gradients = {}
        
        error = 2 * (y_pred - y_true) / y_pred.shape[0] * self.softmax(self.neurons['l3'], derivative=True)
        gradients['w3'] = np.outer(error, self.neurons['a2'])
        
        error = np.dot(self.weights['w3'].T, error)*self.relu(self.neurons['l2'], derivative=True)
        gradients['w2'] = np.outer(error, self.neurons['a1'])
        
        error = np.dot(self.weights['w2'].T, error)*self.relu(self.neurons['l1'], derivative=True)
        gradients['w1'] = np.outer(error, self.neurons['a0'])
        
        error = np.dot(self.weights['w1'].T, error)*self.relu(self.neurons['l0'], derivative=True)
        gradients['w0'] = np.outer(error, self.neurons['l_input'])
        
        return gradients
        
    def sgd_with_momentum(self, gradient, momentum=0.9):
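        '''
        Update the weights using the current gradient plus the previous
        gradient scaled by momentum. Only the gradient of the previous step
        is reused here, not an accumulated velocity.
        
        :param gradient: dict of gradients returned by backward
        :param momentum: weight given to the previous gradient
        '''
        
        for key, value in gradient.items():
            if self.prev_grad is None: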
                self.weights[key] -= self.lr * value
            else:
                self.weights[key] -= (self.lr * value + self.lr * self.prev_grad[key] * momentum)

        self.prev_grad = gradient.copy()
        
        
        
    def compute_accuracy(self, x_val, y_val):
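        '''
        Share of samples whose predicted class matches the true class
        
        :param x_val: validation images
        :param y_val: one-hot validation labels
        :return: accuracy between 0 and 1
        '''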
        predictions = []

        for x, y in zip(x_val, y_val):
            output = self.forward(x)
            pred = np.argmax(output)
            predictions.append(pred == np.argmax(y))
            
        return np.mean(predictions)
    
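    def train(self, x_train, y_train, x_val, y_val):
        '''
        Train sample by sample; after each epoch compute validation accuracy
        and save the weights to ./weights.pkl when accuracy improves
        
        :return: dict with the weights after the last epoch
        '''
        score = 0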
        start_time = time.time()
        for iteration in range(self.epoch):
            for x,y in zip(x_train, y_train):
                output = self.forward(x)
                gradient = self.backward(y, output)
                self.sgd_with_momentum(gradient, self.momentum)
            
            accuracy = self.compute_accuracy(x_val, y_val)

            if accuracy > score:
                with open('./weights.pkl', 'wb') as f:
                    pickle.dump(self.weights, f, pickle.HIGHEST_PROTOCOL)
                print('Accuracy improved from {:.2f}% to {:.2f}%. File with weights updated'.format(score*100, accuracy * 100))
                score = accuracy
            
            print('Epoch: {}/{}, Time Spent: {:.2f}s, Accuracy: {:.2f}%'.format(
                iteration+1, self.epoch, time.time() - start_time, accuracy * 100
            ))

        return self.weights
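
For comparison with `sgd_with_momentum` above, which reuses only the previous step's gradient, here is a minimal sketch of the textbook momentum update that keeps a running velocity. The function name `momentum_step`, the toy objective and the hyperparameter values are assumptions for illustration; this is not the update the class uses, and it is not claimed to give a better score here.

```python
import numpy as np

def momentum_step(weights, gradient, velocity, lr=0.1, momentum=0.9):
    """One classical momentum update; returns the new weights and velocity."""
    # The velocity accumulates a decaying sum of all past gradients
    velocity = momentum * velocity - lr * gradient
    return weights + velocity, velocity

# Toy problem: minimize f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = momentum_step(w, 2 * w, v)

print(w)  # should be close to the minimum at [0, 0]
```

The difference from the class method is the extra state: the velocity carries information from every earlier step, while `prev_grad` only remembers the last one.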

# Training

In [None]:
# Training from scratch
# Just uncomment the 2 lines below and do not run the Loading model section
# dnn = NeuralNetwork()
# model_weights = dnn.train(X_train, y_train, X_test, y_test)

## Loading model

In [None]:
# Loading weights (the model was trained for about 70 epochs)
with open('../input/nn-from-scratch-mnist/weights_95_8.pkl', 'rb') as f:
    weights = pickle.load(f)

In [None]:
# I'll run only one epoch to see results
dnn = NeuralNetwork(weights = weights, epoch=1)
model_weights = dnn.train(X_train, y_train, X_test, y_test)

## Predicting on test data

In [None]:
# Prepare test data for prediction
np_test = df_test.to_numpy()

In [None]:
image_id = []
label = []
# ImageId in the submission file starts at 1
for j, i in enumerate(tqdm(np_test), start=1):
    label.append(np.argmax(dnn.forward(i)))
    image_id.append(j)

In [None]:
df_test_answ = pd.DataFrame(list(zip(image_id, label)),
               columns =['ImageId', 'Label'])

In [None]:
df_test_answ.to_csv('./answer.csv', index = False)

After uploading, the score is 95.8%. It's definitely not the best and the model is not the fastest, but it was built from scratch.