
In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from sklearn.metrics import confusion_matrix
import itertools

Class **dlnet** sets up and initializes our network.
- **X**: Holds the input layer, the data given to the network. X is a matrix with as many rows as features and as many columns as training samples.
- **Y**: Holds the desired output.
- **Yh**: Holds the output that our network produces. It has the same dimensions as **Y**, our desired target values. We initialize it to zero.
- **L**: The number of layers in our network, 2.
- **dims**: The number of neurons (units) in each layer, stored as a Python list. The first entry is the input, which is not counted as a layer of the network. It has 9 units because, as we will see in a bit, our dataset has 9 useful features. The first layer of the network has 15 neurons, and the second and final layer has 1 (the output of the network).
- **param**: A Python dictionary that holds the W and b parameters of each layer of the network.
- **ch**: A cache, a Python dictionary that holds the intermediate values (Z and A of each layer) needed during the backward pass of gradient descent.
- **lr**: Our learning rate, the step size of each gradient descent update (0.003 here).
- **sam**: The number of training samples we have.
- **loss**: A list where we store the loss value of the network every 500 iterations. The loss value expresses the difference between the predicted output of our network and the target one.
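
For reference, the loss computed by **nloss** in the class below appears to be the standard binary cross-entropy. Written out, with $m$ standing for **sam**, $\hat{y}$ for **Yh**, and the index notation being an assumption of this note rather than something the code names:

$$
\mathcal{L} = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log \hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big)\Big]
$$

As a sanity check, a network that outputs $\hat{y}^{(i)} = 0.5$ for every sample has a loss of $\log 2 \approx 0.693$ regardless of the labels.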

The function **nInit** initializes the parameters of our network: the weights with scaled random values and the biases with zeros.
- **W1**: The number of rows is the number of hidden units of that layer, dims[1], and the number of columns is the number of features/rows of the previous layer (in this case X, our input data), dims[0].
- **b1**: Same number of rows as W1 and a single column.
- **W2**: The number of rows is the number of units of that layer, dims[2], and the number of columns is again the number of rows of the input to that layer, dims[1].
- **b2**: Same number of rows as W2 and a single column.
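
The shape rules above can be checked with a short standalone sketch. It assumes the same `dims = [9, 15, 1]` as the class below; the 5-sample input `X` is hypothetical, and `np.random.default_rng` is used here instead of the notebook's `np.random.seed`.

```python
import numpy as np

dims = [9, 15, 1]  # input features, hidden units, output units

rng = np.random.default_rng(1)
# Weights: (units in this layer, units in the previous layer), scaled by 1/sqrt(fan-in)
W1 = rng.standard_normal((dims[1], dims[0])) / np.sqrt(dims[0])
b1 = np.zeros((dims[1], 1))
W2 = rng.standard_normal((dims[2], dims[1])) / np.sqrt(dims[1])
b2 = np.zeros((dims[2], 1))

# Hypothetical input: 9 features (rows) for 5 samples (columns)
X = rng.random((dims[0], 5))

Z1 = W1 @ X + b1                    # (15, 9) @ (9, 5) -> (15, 5); b1 broadcasts over columns
Z2 = W2 @ np.maximum(0, Z1) + b2    # (1, 15) @ (15, 5) -> (1, 5)

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (15, 9) (15, 1) (1, 15) (1, 1)
print(Z1.shape, Z2.shape)                      # (15, 5) (1, 5)
```

Because each bias has a single column, NumPy broadcasting adds it to every sample, which is why the same parameters work for any number of columns in X.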

In [0]:
class dlnet:
  def __init__(self, x, y):
    self.X = x
    self.Y = y
    self.Yh = np.zeros((1, self.Y.shape[1]))
    
    self.L = 2
    self.dims = [9, 15, 1]
    
    self.param = {}
    self.ch = {}
    self.grad = {}
    
    self.loss = []
    self.lr = 0.003
    self.sam = self.Y.shape[1]
  
  # Initialize the weights with scaled random values and the biases with zeros
  def nInit(self):    
    np.random.seed(1)
    self.param['W1'] = np.random.randn(self.dims[1], self.dims[0]) / np.sqrt(self.dims[0]) 
    self.param['b1'] = np.zeros((self.dims[1], 1))        
    self.param['W2'] = np.random.randn(self.dims[2], self.dims[1]) / np.sqrt(self.dims[1]) 
    self.param['b2'] = np.zeros((self.dims[2], 1))                
    return
  
  # Sigmoid activation, used on the output layer
  @staticmethod
  def Sigmoid(Z):
    return 1/(1+np.exp(-Z))
  
  # ReLU activation, used on the hidden layer
  @staticmethod
  def Relu(Z):
    return np.maximum(0,Z)
  
  # Cross-entropy loss: the distance between Yh and Y, averaged over the samples
  def nloss(self,Yh):
    loss = (1./self.sam) * (-np.dot(self.Y, np.log(Yh).T) - np.dot(1-self.Y, np.log(1-Yh).T))
    # The dot products yield a 1x1 array; convert it to a plain float
    return float(np.squeeze(loss))
    
  # Forward pass: take the input of the network and pass it through
  # its layers until it produces an output.
  #
  # Layer 1: multiply the weights of the first layer by the input data and
  # add the first bias vector, b1, to produce Z1.
  # Then apply the Relu function to Z1 to produce A1.
  #
  # Layer 2: multiply the weight matrix of the second layer by its input,
  # A1 (the output of the first layer), and add the second bias vector, b2,
  # to produce Z2. Then apply the Sigmoid function to Z2 to produce A2,
  # which is Yh, the output of the network.
  #
  # Z: the linear output of a layer, before the activation
  # A: the output of the activation function
  def forward(self):    
    # Layer 1
    Z1 = self.param['W1'].dot(self.X) + self.param['b1'] 
    # Relu and Sigmoid live in the class body, so they are reached through the class
    A1 = dlnet.Relu(Z1)
    self.ch['Z1'], self.ch['A1'] = Z1, A1
    
    # Layer 2
    Z2 = self.param['W2'].dot(A1) + self.param['b2']  
    A2 = dlnet.Sigmoid(Z2)
    self.ch['Z2'], self.ch['A2'] = Z2, A2
    self.Yh = A2
    loss = self.nloss(A2)
    
    return self.Yh, loss

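  # Gradient descent: repeat the forward and backward passes, logging the loss.
  # X and Y are unused here: training uses self.X and self.Y set in __init__.
  def gd(self,X, Y, iter = 3000):
    np.random.seed(1)

    self.nInit()

    for i in range(0, iter):
      Yh, loss=self.forward()
      # backward() must compute the gradients and update self.param;
      # it is not defined in this section
      self.backward()

      if i % 500 == 0:
        print ("Cost after iteration %i: %f" %(i, loss))
        self.loss.append(loss)

    return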
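
The **gd** loop calls a `backward` method that this section does not define. As a rough illustration of what one full training step involves, here is a standalone sketch, not the notebook's own implementation. It assumes the usual gradients for a ReLU hidden layer with a sigmoid output and cross-entropy loss; the function names, the toy data, and the learning rate of 0.1 are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def forward(params, X):
    # Same two-layer computation as dlnet.forward, returned as a cache dict
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = sigmoid(Z2)
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

def cross_entropy(Y, Yh):
    m = Y.shape[1]
    return float(-(Y * np.log(Yh) + (1 - Y) * np.log(1 - Yh)).sum() / m)

def backward(params, cache, X, Y):
    m = Y.shape[1]
    # With a sigmoid output and cross-entropy loss, dLoss/dZ2 simplifies to (A2 - Y) / m
    dZ2 = (cache["A2"] - Y) / m
    dW2 = dZ2 @ cache["A1"].T
    db2 = dZ2.sum(axis=1, keepdims=True)
    # Propagate to the hidden layer; the ReLU derivative is 1 where Z1 > 0, else 0
    dZ1 = (params["W2"].T @ dZ2) * (cache["Z1"] > 0)
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def train(X, Y, dims=(9, 15, 1), lr=0.1, iters=200, seed=1):
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.standard_normal((dims[1], dims[0])) / np.sqrt(dims[0]),
        "b1": np.zeros((dims[1], 1)),
        "W2": rng.standard_normal((dims[2], dims[1])) / np.sqrt(dims[1]),
        "b2": np.zeros((dims[2], 1)),
    }
    losses = []
    for _ in range(iters):
        cache = forward(params, X)
        losses.append(cross_entropy(Y, cache["A2"]))
        grads = backward(params, cache, X, Y)
        for key in params:
            params[key] -= lr * grads[key]  # plain gradient descent update
    return params, losses

# Toy data (hypothetical): 9 features, 40 samples, label is 1 when the features sum above 4.5
toy_rng = np.random.default_rng(0)
X_toy = toy_rng.random((9, 40))
Y_toy = (X_toy.sum(axis=0, keepdims=True) > 4.5).astype(float)

params, losses = train(X_toy, Y_toy)
print("first loss: %f, last loss: %f" % (losses[0], losses[-1]))
```

Inside the class, the same gradients would be computed from `self.ch` and stored in `self.grad`, with the update applied to `self.param` using `self.lr`.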

- Store the data in .csv format on your machine or online.
- Read the data with the pandas read_csv function.
- Clean and prepare the data, build the training and validation sets, and run gradient descent.

In [7]:
if __name__ == "__main__":
  data_url = 'https://raw.githubusercontent.com/masterhung0112/ml/master/NN2LayerScratch-wisconsin-cancer-dataset.csv'
  df = pd.read_csv(data_url, header=None)
  # Drop rows with a missing value ('?') in column 6, then convert everything to float
  df = df[~df[6].isin(['?'])]
  df = df.astype(float)
  # Map the class labels in column 10: 2 (benign) -> 0, 4 (malignant) -> 1
  df[10] = df[10].replace({2: 0, 4: 1})

  # Scale columns 0-9 to the [0, 1] range
  names = df.columns[0:10]
  scaler = MinMaxScaler() 
  scaled_df = scaler.fit_transform(df.iloc[:,0:10]) 
  scaled_df = pd.DataFrame(scaled_df, columns=names)
  
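  # Training set: the first 500 rows. Columns 1-9 are the features (column 0 is not used),
  # transposed so that rows are features and columns are samples
  x=scaled_df.iloc[0:500,1:10].values.transpose()
  y=df.iloc[0:500,10:].values.transpose()

  # Validation set: rows 501-682 (row 500 falls in neither split)
  xval=scaled_df.iloc[501:683,1:10].values.transpose()
  yval=df.iloc[501:683,10:].values.transpose()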

  print(df.shape, x.shape, y.shape, xval.shape, yval.shape)

  #nn = dlnet(x,y)
  #nn.gd(x, y, iter = 15000)

(683, 11) (9, 500) (1, 500) (9, 182) (1, 182)
