# Introduction
In this notebook I will explain gradient descent and train an MLP that reaches about 96% accuracy on the breast cancer dataset from scikit-learn.<br>
I will explain the theory and the functions separately, cell by cell, and then put everything together in a class in a separate cell.

# Class Initializer
<br>

The first thing we need to pass is the **Architecture**, that is, the structure of the network. The format for this variable is a list/tuple containing the number of neurons in each layer. For example, if we want a 4-layer MLP with 10, 5, 3 and 1 neurons, we pass `[10,5,3,1]` as the value of this variable.<br>
<br>

The second argument is **Activation**. Currently only the ReLU and sigmoid functions are supported. If we have _n_ layers of neurons we need _n-1_ activation functions. If the same activation is used in every layer, you can pass the string 'sigmoid' or 'relu'. If you want a different activation function for each layer, pass a list/tuple of _n-1_ strings, each either __relu__ or __sigmoid__. <br>

So let's say you have passed `[10, 5, 3, 1]` as `architecture`. The activation will then be applied to 3 layers. If you want to apply `sigmoid` to all 3 layers, you can pass either `'sigmoid'` or `['sigmoid', 'sigmoid', 'sigmoid']`. If you want different activations for different layers, you can pass, for example, `['sigmoid','relu', 'sigmoid']`. <br>
<br>
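
To make the relation between the two arguments concrete, here is a minimal sketch. The helper name `expand_activations` is hypothetical (it is not part of the class in this notebook); it only illustrates how an architecture with _n_ entries yields _n-1_ weight layers and how a single string could be expanded to one activation name per layer.

```python
def expand_activations(architecture, activation):
    """Hypothetical helper: return one activation name per weight layer."""
    n_layers = len(architecture) - 1          # n entries -> n-1 weight layers
    if isinstance(activation, str):
        names = [activation] * n_layers       # same activation everywhere
    else:
        names = list(activation)              # one entry per layer
    names = [name.lower().strip() for name in names]
    if len(names) != n_layers:
        raise ValueError("expected {} activations, got {}".format(n_layers, len(names)))
    for name in names:
        if name not in ("relu", "sigmoid"):
            raise ValueError("unsupported activation: {}".format(name))
    return names


architecture = [10, 5, 3, 1]
# Each consecutive pair (inputs, outputs) describes one weight layer.
layer_shapes = list(zip(architecture[:-1], architecture[1:]))
print(layer_shapes)                                   # [(10, 5), (5, 3), (3, 1)]
print(expand_activations(architecture, "sigmoid"))    # ['sigmoid', 'sigmoid', 'sigmoid']
print(expand_activations(architecture, ["sigmoid", "relu", "sigmoid"]))
```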

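You can set **Learning_rate** here or while training. Note that `train` takes its own `learning_rate` argument (default `0.01`) and overwrites the value set here.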

\* **Other functions, including custom functions, will be added later.**

The next thing to keep in mind is that we need to store the outputs of each layer. Let's say the output of a layer before activation is __h1__; after passing through the activation function (ReLU, sigmoid or any other) it becomes __a1__. <br>
<br>

<img src="images/Architecture.png">

We will also store each layer's activation function and the derivative of that function in two lists named __activation__ and __prime_funcs__ respectively. <br>
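
As a rough illustration of the bookkeeping described above, the sketch below runs one layer by hand and stores its pre-activation and activation under the keys `h1` and `a1`, with the input stored as `a0`. The variable name `cache`, the random seed and the toy shapes are assumptions made for this example only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)        # seeded only so the example is repeatable
x = rng.normal(size=(4, 3))           # 4 samples with 3 features each
w1 = rng.normal(size=(3, 2))          # weights of a layer with 3 inputs, 2 outputs
b1 = np.zeros((1, 2))

cache = {"a0": x}                     # the input is treated as activation a0
h1 = np.matmul(x, w1) + b1            # output before activation
a1 = sigmoid(h1)                      # output after activation
cache["h1"] = h1
cache["a1"] = a1

print(sorted(cache))                                # ['a0', 'a1', 'h1']
print(cache["h1"].shape, cache["a1"].shape)         # (4, 2) (4, 2)
```

Both values are needed later: the derivative of the activation is evaluated at `h1`, while `a1` is the input to the next layer.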

### Creating Weights and Biases of each layer
The first function we will define is not \__init\__. <br>
We will first look at __create_weights_and_biases__. It takes the number of input nodes and output nodes of a layer and creates the weights and biases for it. Weights are drawn from a normal distribution with mean 0 and standard deviation `sqrt(2 / (in_nodes + out_nodes))`, and biases start at zero. It returns a dictionary.<br>
Let's say you have passed `[10, 5, 3, 1]` as input: the layer with 10 neurons is the input layer and there is 1 output neuron, so there will be 3 sets of weights and biases (between the 10 and 5 neurons, between the 5 and 3 neurons, and between the 3 and 1 neurons).

```python
def create_weights_and_biases(self, in_nodes, out_nodes):
    std= np.sqrt(2.0/(in_nodes + out_nodes))
    w= np.random.normal(loc= 0, scale= std, size= (in_nodes, out_nodes))
    b= np.zeros((1, out_nodes))
    return {'weight':w , 'bias':b}
```
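
The following sketch shows the shapes this produces for the `[10, 5, 3, 1]` example. It is a standalone variant of the function above: passing a seeded `rng` generator is an assumption added here for repeatability, whereas the notebook's method calls `np.random.normal` directly. The standard deviation `sqrt(2 / (in_nodes + out_nodes))` is the one used by Glorot (Xavier) normal initialization.

```python
import numpy as np


def create_weights_and_biases(in_nodes, out_nodes, rng):
    # Standard deviation shrinks as the layer gets wider.
    std = np.sqrt(2.0 / (in_nodes + out_nodes))
    w = rng.normal(loc=0.0, scale=std, size=(in_nodes, out_nodes))
    b = np.zeros((1, out_nodes))
    return {"weight": w, "bias": b}


rng = np.random.default_rng(711)      # assumed seed, for repeatability only
architecture = [10, 5, 3, 1]
parameters = [
    create_weights_and_biases(n_in, n_out, rng)
    for n_in, n_out in zip(architecture[:-1], architecture[1:])
]

for layer in parameters:
    print(layer["weight"].shape, layer["bias"].shape)
# (10, 5) (1, 5)
# (5, 3) (1, 3)
# (3, 1) (1, 1)
```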



### The initializer of the class
At the start of the initializer we assign some variables that are used throughout the class. The __Learning Rate__ can be set when creating the class instance; it can also be changed (scheduled) while training.

```python
def __init__(self, architecture, activation= 'sigmoid', learning_rate= 0.001):
        self.architecture= architecture
        self.activation= []
        self.lr= learning_rate
        self.parameters=[]
        self.buffered_op={}
        self.prime_funcs=[]
        
        if isinstance(activation, str):
            if activation.lower().strip() in ['relu', 'sigmoid']:
                if activation.lower().strip()== 'relu':
                    self.activation= [self.relu for x in range(len(self.architecture) - 1)]
                    self.prime_funcs = [self.relu_prime for x in range(len(self.architecture) - 1)]
                else:
                    self.activation= [self.sigmoid for x in range(len(self.architecture) - 1)]
                    self.prime_funcs = [self.sigmoid_prime for x in range(len(self.architecture) - 1)]
                    
            else:
                raise ValueError("activation Value should be either relu or sigmoid")
                
        elif isinstance(activation, (list, tuple)):  # lists and tuples are both accepted
            if len(activation)+1 != len(self.architecture):
                raise ValueError("Number of Activations is not compatible with architecture")
            
            for func in activation:
                if isinstance(func, str):
                    self.activation.append(self.relu if func.lower().strip()== 'relu' else self.sigmoid)
                    self.prime_funcs.append(self.relu_prime if func.lower().strip()== 'relu' \
                                            else self.sigmoid_prime)
                    
                else:
                    raise ValueError("Custom functions are not supported yet")
                    
        
        for i in range(len(architecture) -1):
            layer_parameters= self.create_weights_and_biases(architecture[i], architecture[i + 1])
            self.parameters.append(layer_parameters)
```
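
The text above mentions that the learning rate can be scheduled while training. One simple possibility is a step decay, sketched below. The function name `step_decay` and its parameters `drop` and `epochs_per_drop` are hypothetical and are not part of the class in this notebook.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Hypothetical schedule: multiply the rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))


for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(0.1, epoch))
```

Because `train` sets the learning rate from its `learning_rate` argument on every call, one way to use such a schedule with this class would be to call `train` for one epoch at a time inside a loop, passing `learning_rate=step_decay(0.1, epoch)`.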

#### Sigmoid Function
$$
\operatorname{sig}(x) = \frac{1}{1+e^{-x}}
$$
<br>

#### Sigmoid prime function
$$
\operatorname{sig}'(x) = \operatorname{sig}(x) \cdot \bigl(1 - \operatorname{sig}(x)\bigr)
$$
<br>
The derivative of the sigmoid is explained in detail [in this article](https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e)
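
A quick way to convince yourself of the derivative formula is to compare it with a central finite difference. The sketch below uses only the standard library; the helper `numerical_derivative` and the step size `eps` are choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, eps=1e-6):
    # Central difference approximation of f'(x).
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


for x in (-2.0, 0.0, 3.0):
    analytic = sigmoid_prime(x)
    numeric = numerical_derivative(sigmoid, x)
    print(x, analytic, numeric)       # the two columns should agree closely
```

At `x = 0` the sigmoid equals 0.5, so the formula gives 0.5 * (1 - 0.5) = 0.25, which is the largest value the derivative takes.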

In [1]:
import numpy as np

In [15]:
class Network:
    
    def __init__(self, architecture, activation= 'sigmoid', learning_rate= 0.001):
        self.architecture= architecture
        self.activation= []
        self.lr= learning_rate
        self.parameters=[]
        self.buffered_op={}
        self.prime_funcs=[]
        
        if isinstance(activation, str):
            if activation.lower().strip() in ['relu', 'sigmoid']:
                if activation.lower().strip()== 'relu':
                    self.activation= [self.relu for x in range(len(self.architecture) - 1)]
                    self.prime_funcs = [self.relu_prime for x in range(len(self.architecture) - 1)]
                else:
                    self.activation= [self.sigmoid for x in range(len(self.architecture) - 1)]
                    self.prime_funcs = [self.sigmoid_prime for x in range(len(self.architecture) - 1)]
                    
            else:
                raise ValueError("activation Value should be either relu or sigmoid")
                
        elif isinstance(activation, (list, tuple)):  # lists and tuples are both accepted
            if len(activation)+1 != len(self.architecture):
                raise ValueError("Number of Activations is not compatible with architecture")
            
            for func in activation:
                if isinstance(func, str):
                    self.activation.append(self.relu if func.lower().strip()== 'relu' else self.sigmoid)
                    self.prime_funcs.append(self.relu_prime if func.lower().strip()== 'relu' \
                                            else self.sigmoid_prime)
                    
                else:
                    raise ValueError("Custom functions are not supported yet")
                    
        
        for i in range(len(architecture) -1):
            layer_parameters= self.create_weights_and_biases(architecture[i], architecture[i + 1])
            self.parameters.append(layer_parameters)
        
        
    def create_weights_and_biases(self, in_nodes, out_nodes):
        std= np.sqrt(2.0/(in_nodes + out_nodes))
        w= np.random.normal(loc= 0, scale= std, size= (in_nodes, out_nodes))
        b= np.zeros((1, out_nodes))
        return {'weight':w , 'bias':b}
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
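    def relu(self, x):
        # np.maximum returns a new array, so the cached pre-activation h is not modified
        return np.maximum(x, 0)
    
    def sigmoid_prime(self, x):
        s= self.sigmoid(x)
        return s * (1 - s)
    
    def relu_prime(self, x):
        # derivative is 1 where x > 0 and 0 elsewhere, computed without modifying x
        return (x > 0).astype(x.dtype)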

    def forward_pass(self, x):
        if x.ndim == 1:
            x= x.reshape((1, -1))
        self.buffered_op['a0']= x
        for i, parameter in enumerate(self.parameters):
            w= parameter['weight']
            b= parameter['bias']
            actv_func= self.activation[i]
            
            h= np.matmul(x, w) + b
            a= actv_func(h)
            self.buffered_op['h' + str(i+1)]= h
            self.buffered_op['a' + str(i+1)]= a
            x= a
        return a
    
    def loss(self, y_cap, y):
        return (y - y_cap) ** 2 #MSE
    
    def loss_prime(self, y_cap, y):
        return -2 * (y - y_cap)
    
    def backpropagation(self,y_cap, y):
        #last_layer= True
        layer_error= self.loss_prime( y_cap, y)
        gradients= []
        batch_size= y_cap.shape[0]
        
        
        for layer_idx in range(len(self.architecture) - 1, 0 , -1):
            prime_func= self.prime_funcs[layer_idx - 1]
            h_cur= self.buffered_op['h'+str(layer_idx)]
            a_prev= self.buffered_op['a'+str(layer_idx - 1)]
            
            
            error_term= layer_error * prime_func(h_cur)
            del_w= np.matmul(error_term.T , a_prev)
            del_b= error_term.sum(axis= 0)
            gradients.append((del_w, del_b))
            
            #updating layer_error term for next iteration
            layer_error= np.matmul(error_term , self.parameters[layer_idx -1]['weight'].T)
            
        gradients.reverse()
        
        #updating the weights:
        for i in range(len(self.parameters)):
            self.parameters[i]['weight'] -= (self.lr / batch_size) * gradients[i][0].T
            self.parameters[i]['bias'] -= (self.lr / batch_size) * gradients[i][1]
            
    def train(self, train_x, train_y, epochs= 5, learning_rate= 0.01, batch_size= 1):
        no_of_records= train_x.shape[0]
        self.lr= learning_rate
    
        for epoch in range(epochs):
            epoch_error= 0
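            for start in range(0, no_of_records, batch_size):
                # slice [start, start + batch_size) so the final, possibly smaller, batch is not skipped
                x_batch= train_x[start: start + batch_size]
                y_batch= train_y[start: start + batch_size]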
        
                y_cap= self.forward_pass(x_batch)
                batch_error= self.loss(y_cap, y_batch)
                #print(batch_error.shape)
                epoch_error+= batch_error.sum()
                self.backpropagation(y_cap, y_batch)
        
            print("Epoch {} - Training Loss - {}\n".format(epoch+1, epoch_error))
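
The `backpropagation` method above applies the chain rule layer by layer: the error arriving at a layer is multiplied by the derivative of the activation at `h`, and the result is combined with the previous layer's activation to get the weight gradient. A common sanity check for this kind of code is to compare the analytic gradient with a finite-difference estimate. The sketch below does this for a single sigmoid layer with the same summed squared-error loss. It is a standalone illustration with assumed toy data, not a test of the class itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def total_loss(w, b, x, y):
    # Summed squared error, matching loss() above before any averaging.
    y_cap = sigmoid(np.matmul(x, w) + b)
    return np.sum((y - y_cap) ** 2)


rng = np.random.default_rng(0)                     # assumed seed and toy shapes
x = rng.normal(size=(5, 3))                        # 5 samples, 3 features
y = rng.integers(0, 2, size=(5, 1)).astype(float)  # binary targets
w = rng.normal(size=(3, 1))
b = np.zeros((1, 1))

# Analytic gradient via the chain rule.
h = np.matmul(x, w) + b
y_cap = sigmoid(h)
error_term = -2.0 * (y - y_cap) * y_cap * (1.0 - y_cap)   # loss' * sigmoid'(h)
grad_w = np.matmul(x.T, error_term)                        # shape (3, 1), same as w

# Numerical gradient via central differences, one weight at a time.
eps = 1e-6
num_grad_w = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus = w.copy()
        w_plus[i, j] += eps
        w_minus = w.copy()
        w_minus[i, j] -= eps
        num_grad_w[i, j] = (total_loss(w_plus, b, x, y)
                            - total_loss(w_minus, b, x, y)) / (2.0 * eps)

print(np.max(np.abs(grad_w - num_grad_w)))         # should be very small
```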
            
            

In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [4]:
data= load_breast_cancer()
x= data.data
y= data.target
cols= data.target_names
print(type(x))
print(x.shape)

<class 'numpy.ndarray'>
(569, 30)


In [5]:
scaler= StandardScaler()
scaler.fit(x)
scaled_x= scaler.transform(x)
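
For intuition, standardization subtracts each column's mean and divides by its standard deviation. The sketch below reproduces that arithmetic with plain NumPy on a made-up array; `StandardScaler` uses the population standard deviation, which is also NumPy's default (`ddof=0`).

```python
import numpy as np

# Toy data: 3 samples, 2 features on very different scales.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

mean = x.mean(axis=0)
std = x.std(axis=0)                   # population standard deviation (ddof=0)
scaled = (x - mean) / std

print(scaled.mean(axis=0))            # approximately 0 for every column
print(scaled.std(axis=0))             # approximately 1 for every column
```

In the cell above the scaler is fitted on the full dataset. Fitting it on the training split only and then transforming both splits would keep test-set statistics out of the preprocessing.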

In [6]:
y[:5]

array([0, 0, 0, 0, 0])

In [7]:
y= y.reshape((-1,1))
x_train, x_test, y_train, y_test= train_test_split(scaled_x, y, test_size= 0.2, 
                                                   shuffle= True, random_state= 711)
print(x_train.shape)

(455, 30)


In [16]:
neu_net= Network([30, 10, 5, 1], ['sigmoid', 'relu', 'sigmoid'])
neu_net.train( x_train, y_train, epochs= 25, learning_rate= 0.1, batch_size= 8)

Epoch 1 - Training Loss - 99.83404976116122

Epoch 2 - Training Loss - 80.1516461945468

Epoch 3 - Training Loss - 59.87605354535778

Epoch 4 - Training Loss - 44.04947539042074

Epoch 5 - Training Loss - 32.49447697018934

Epoch 6 - Training Loss - 24.115135585056006

Epoch 7 - Training Loss - 18.541705552546983

Epoch 8 - Training Loss - 15.122142021692751

Epoch 9 - Training Loss - 12.978192744810874

Epoch 10 - Training Loss - 11.542966640589192

Epoch 11 - Training Loss - 10.526460748120416

Epoch 12 - Training Loss - 9.770086254523843

Epoch 13 - Training Loss - 9.188356444923578

Epoch 14 - Training Loss - 8.723530622856948

Epoch 15 - Training Loss - 8.344553180427846

Epoch 16 - Training Loss - 8.027487554992463

Epoch 17 - Training Loss - 7.759651752526115

Epoch 18 - Training Loss - 7.526697005272587

Epoch 19 - Training Loss - 7.3214430098351375

Epoch 20 - Training Loss - 7.140480818919122

Epoch 21 - Training Loss - 6.9778053909038995

Epoch 22 - Training Loss - 6.8308449

In [17]:
def acuracy_check_binary_classifier(predicted, actual):
    y_test_cap= np.zeros_like(predicted)
    y_test_cap[predicted >= 0.5] = 1
    correct_pred_count= (y_test_cap== actual).sum()
    return 100 * correct_pred_count / len(actual)

In [18]:
pred= neu_net.forward_pass(x_test)
print(acuracy_check_binary_classifier(pred, y_test))

97.36842105263158
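
The accuracy check turns the sigmoid outputs into class labels by thresholding at 0.5. The toy example below walks through that step with made-up predictions and labels.

```python
import numpy as np

predicted = np.array([[0.91], [0.20], [0.50], [0.49]])   # made-up network outputs
actual = np.array([[1], [0], [0], [0]])                   # made-up true labels

labels = (predicted >= 0.5).astype(int)                   # 0.5 itself counts as class 1
accuracy = 100.0 * (labels == actual).mean()
print(labels.ravel())                                     # [1 0 1 0]
print(accuracy)                                           # 75.0
```

Both arrays have shape `(n, 1)` here. If `actual` were left one-dimensional, comparing it with an `(n, 1)` prediction array would broadcast to an `(n, n)` matrix and give a wrong count, which is why the notebook reshapes `y` with `y.reshape((-1,1))` before splitting.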
