# <a>Understanding and Implementing Neural Networks from scratch</a>

<br>

In this kernel, I have explained the intution about neural networks and how to implement neural networks from scratch in python. 

## Contents  
<br>

**<a><i> 1. What are Neural Networks</i></a> **  

**<a><i> 2. Implement a Neural Network - Binary classification</i></a>**   

**<a><i> 3. Implement a Neural Network - Multiclass classification</i></a>**  

**<a><i> 4. What are Deep Neural Networks</i></a> **  

**<a><i> 5. Convolutional Neural Networks Implementation</i></a>**  

![](https://www.pangeanic.com/wp-content/uploads/sites/2/2017/07/neural-network-graph-624x492.jpg)

I would like to thank Andrew NG and deeplearning.ai course for their excellent material



## <a>1. What are Neural Networks </a>

Neural networks are a type of machine learning models which are designed to operate similar to biological neurons and human nervous system. These models are used to recognize complex patterns and relationships that exists within a labelled dataset. They have following properties:

1. The core architecture of a Neural Network model is comprised of a large number of simple processing nodes called Neurons which are interconnected and organized in different layers. 

2. An individual node in a layer is connected to several other nodes in the previous and the next layer. The inputs form one layer are received and processed to generate the output which is passed to the next layer.

3. The first layer of this architecture is often named as input layer which accepts the inputs, the last layer is named as the output layer which produces the output and every other layer between input and output layer is named is hidden layers. 

### Key concepts in a Neural Network 

#### A. Neuron:

A Neuron is a single processing unit of a Neural Network which are connected to different other neurons in the network. These connections repersents inputs and ouputs from a neuron. To each of its connections, the neuron assigns a “weight” (W) which signifies the importance the input and adds a bias (b) term. 

#### B. Activation Functions 

The activation functions are used to apply non-linear transformation on input to map it to output. The aim of activation functions is to predict the right class of the target variable based on the input combination of variables. Some of the popular activation functions are Relu, Sigmoid, and TanH. 

#### C. Forward Propagation 

Neural Network model goes through the process called forward propagation in which it passes the computed activation outputs in the forward direction. 

Z = W*X + b   
A = g(Z) 

- g is the activation function 
- A is the activation using the input 
- W is the weight associated with the input 
- B is the bias associated with the node 

#### D. Error Computation: 

The neural network learns by improving the values of weights and bias. The model computes the error in the predicted output in the final layer which is then used to make small adjustments the weights and bias. The adjustments are made such that the total error is minimized. Loss function measures the error in the final layer and cost function measures the total error of the network. 

Loss = Actual_Value - Predicted_Value   

Cost = Summation (Loss)   

#### E. Backward Propagation: 

Neural Network model undergoes the process called backpropagation in which the error is passed to backward layers so that those layers can also improve the associated values of weights and bias. It uses the algorithm called Gradient Descent in which the error is minimized and optimal values of weights and bias are obtained. This weights and bias adjustment is done by computing the derivative of error, derivative of weights, bias and subtracting them from the original values. 

<br>

## <a> 2. Implement a Neural Network - Binary Classification</a>  

Lets implement a basic neural network in python for binary classification which is used to classify if a given image is 0 or 1.  

In [1]:
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.models import Sequential
import pandas as pd 
import numpy as np 
import keras

Using TensorFlow backend.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


### 2.1 Dataset Preparation

First step is to load and prepare the dataset

In [2]:
train = pd.read_csv("E:/input/train.csv")
test = pd.read_csv("E:/input/test.csv")

# include only the rows having label = 0 or 1 (binary classification)
X = train[train['label'].isin([0, 1])]

# target variable
Y = train[train['label'].isin([0, 1])]['label']

# remove the label from X

X = X.drop(['label'], axis = 1)


In [3]:
X.head()

Unnamed: 0,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 2.2 Implementing a Activation Function 

We will use sigmoid activation function because it outputs the values between 0 and 1 so its a good choice for a binary classification problem

In [4]:
# implementing a sigmoid activation function
def sigmoid(z):
    #########################需要填写，写出sigmoid的计算公式，并赋值给s，sigmoid=1/1+exp(x)
    s = 1.0/ (1 + np.exp(-z))    
    #########################需要填写，写出sigmoid的计算公式，并赋值给s，sigmoid=1/1+exp(x)
    return s

### 2.3 Define Neural Network Architecture

Create a model with three layers - Input, Hidden, Output. 

In [5]:
def network_architecture(X, Y):
    # nodes in input layer
    ##########################需要填写，计算每层的个数，输入层个数赋值给n_x,输出层赋值给n_y
    n_x = X.shape[0] 
    # nodes in hidden layer
    n_h = 10          
    # nodes in output layer
    n_y = Y.shape[0] 
     ##########################需要填写，计算每层的个数，输入层个数赋值给n_x,输出层赋值给n_y
    return (n_x, n_h, n_y)

### 2.4 Define Neural Network Parameters 

Neural Network parameters are weights and bias which we need to initialze with zero values. The first layer only contains inputs so there are no weights and bias, but the hidden layer and the output layer have a weight and bias term. (W1, b1 and W2, b2)

In [6]:
def define_network_parameters(n_x, n_h, n_y):
    W1 = np.random.randn(n_h,n_x) * 0.01 # random initialization
    b1 = np.zeros((n_h, 1)) # zero initialization
    
    ##############################需要填写，请为输出层的W2和b2赋值
    W2 = np.random.randn(n_y,n_h) * 0.01 
    b2 = np.zeros((n_y, 1)) 
    ##############################需要填写，请为输出层的W2和b2赋值
    
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}    

### 2.5 Implement Forward Propagation

The hidden layer and output layer will compute the activations using sigmoid activation function and will pass it in the forward direction. While computing this activation, the input is multiplied with weight and added with bias before passing it to the function. 

In [7]:
def forward_propagation(X, params):
    Z1 = np.dot(params['W1'], X)+params['b1']
    A1 = sigmoid(Z1)
    
    ################################需要填写，请计算Z2和A2
    Z2 = np.dot(params['W2'], A1)+params['b2']
    A2 = sigmoid(Z2)
    ################################需要填写，请计算Z2和A2
    
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}    

### 2.6 Compute the Network Error 

To compute the cost, one straight forward approach is to compute the absolute error among prediction and actual value. But a better loss function is the log loss function which is defines as : 

  -Summ ( Log (Pred) * Actual + Log (1 - Pred ) * Actual ) / m

In [8]:
def compute_error(Predicted, Actual):                   #计算误差函数，可以选择不同的误差函数
    logprobs = np.multiply(np.log(Predicted), Actual)+ np.multiply(np.log(1-Predicted), 1-Actual)
    cost = -np.sum(logprobs) / Actual.shape[1] 
    return np.squeeze(cost)

### 2.7 Implement Backward Propagation

In backward propagation function, the error is passed backward to previous layers and the derivatives of weights and bias are computed. The weights and bias are then updated using the derivatives.  

In [9]:
def backward_propagation(params, activations, X, Y):                ##########反向传播算法，计算导数############
    m = X.shape[1]
    
    # output layer
    dZ2 = activations['A2'] - Y # compute the error derivative 
    dW2 = np.dot(dZ2, activations['A1'].T) / m # compute the weight derivative 
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m # compute the bias derivative
    
    # hidden layer
    dZ1 = np.dot(params['W2'].T, dZ2)*(1-np.power(activations['A1'], 2))
    dW1 = np.dot(dZ1, X.T)/m
    db1 = np.sum(dZ1, axis=1,keepdims=True)/m
    
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

def update_parameters(params, derivatives, alpha = 1.2):           ##########反向传播算法，更新参数############
    # alpha is the model's learning rate 
    
    params['W1'] = params['W1'] - alpha * derivatives['dW1']
    params['b1'] = params['b1'] - alpha * derivatives['db1']
    
    ####################需要填写，请计算W2和b2更新之后的值
    params['W2'] = params['W2'] - alpha * derivatives['dW2']
    params['b2'] = params['b2'] - alpha * derivatives['db2']
    ####################需要填写，请计算W2和b2更新之后的值
    
    return params

### 2.8 Compile and Train the Model

Create a function which compiles all the key functions and creates a neural network model. 

In [10]:
def neural_network(X, Y, n_h, num_iterations=100):                      ############编译网络
    n_x = network_architecture(X, Y)[0]
    n_y = network_architecture(X, Y)[2]
    
    params = define_network_parameters(n_x, n_h, n_y)                  ############初始化参数  
    for i in range(0, num_iterations):
        results = forward_propagation(X, params)                       ############前向传播 
        error = compute_error(results['A2'], Y)
        derivatives = backward_propagation(params, results, X, Y)      ############反向传播
        params = update_parameters(params, derivatives)                ############更新参数
    return params

In [11]:
y = Y.values.reshape(1, Y.size)
x = X.T.as_matrix()
model = neural_network(x, y, n_h = 10, num_iterations = 10)            ############生成一个新的网络

  
  after removing the cwd from sys.path.


### 2.9 Predictions 

In [12]:
def predict(parameters, X):
    results = forward_propagation(X, parameters)
    print (results['A2'][0])
    predictions = np.around(results['A2'])    
    return predictions

predictions = predict(model, x)
print ('Accuracy: %d' % float((np.dot(y,predictions.T) + np.dot(1-y,1-predictions.T))/float(y.size)*100) + '%')

[0.52351746 0.10311941 0.95776857 ... 0.95776857 0.10311941 0.95776857]
Accuracy: 96%


  after removing the cwd from sys.path.
