<a href="https://colab.research.google.com/github/raulcodec/NeuralNets/blob/master/NeuralNet_fromScratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Two-Layer Neural Net in NumPy**

In [5]:
import numpy as np

In [6]:
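# sigmoid function
# The sigmoid formula is 1 / (1 + e^-x)

def nonlin(x,deriv=False):
  # With deriv=True, x must already be a sigmoid output s, because the slope of the sigmoid is s*(1-s)
  if deriv:
    return x*(1-x)
  return 1/(1+np.exp(-x))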
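
The `deriv=True` branch only gives the right slope when it is handed the sigmoid's output rather than the raw input. The sketch below checks that identity against a finite-difference approximation. The names `sigmoid` and `sigmoid_grad_from_output` are illustrative and not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function: 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad_from_output(s):
    # Assumes s is already sigmoid(x); the slope is then s * (1 - s)
    return s * (1.0 - s)

x = np.array([-2.0, 0.0, 2.0])
s = sigmoid(x)
analytic = sigmoid_grad_from_output(s)

# Central finite difference as an independent estimate of the slope
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

print("analytic:", analytic)
print("numeric: ", numeric)
print("match:", np.allclose(analytic, numeric, atol=1e-8))
```

The slope peaks at 0.25 when the input is 0 and shrinks toward 0 as the output approaches 0 or 1.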

Data
```
Input 1   Input 2   Input 3   Output
0         0         1         0
0         1         1         0
1         0         1         1
1         1         1         1
```



In [7]:
# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
X.shape

(4, 3)

In [8]:
# output dataset            
y = np.array([[0,0,1,1]]).T
y.shape

(4, 1)
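
It can help to look at how the target relates to the inputs before training anything. The short check below, a sketch that reuses the same `X` and `y`, compares each input column with the output column.

```python
import numpy as np

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

# Compare each input column (kept as a 4x1 slice) with the target column
for col in range(X.shape[1]):
    matches = np.array_equal(X[:, [col]], y)
    print(f"column {col} equals y: {matches}")
```

Only the first column matches the output, so the network mainly has to learn a large positive weight for that feature. The third column is always 1, so its weight behaves like a bias term.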

**Synopsis of the Neural Net framework**

*   The data has 4 records with 3 features each, so the input layer needs 3 nodes.
*   Each input node has one synapse (a connecting edge that carries a weight), giving 3 weights in total. The weighted inputs are summed and passed through the sigmoid function (also called the activation function).
*   The sigmoid output feeds the output layer, which produces a binary classification of each record based on the input layer.
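
The structure described above can be sketched as a single forward pass. The weight values here are only illustrative (they are the initial weights the notebook generates further down with seed 42), and `sigmoid`, `weights`, `weighted_sum`, and `output` are names chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 4 records, 3 features each: one input node per feature
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# One weight per synapse (illustrative values, assumed for this sketch)
weights = np.array([[-0.25091976],
                    [ 0.90142861],
                    [ 0.46398788]])

# Weighted sum of the inputs for every record: (4, 3) @ (3, 1) -> (4, 1)
weighted_sum = X @ weights

# The activation squashes each sum into (0, 1)
output = sigmoid(weighted_sum)

print("weighted sum shape:", weighted_sum.shape)
print("output shape:", output.shape)
print(output)
```

Each row of `output` is the untrained network's estimate for one record, so at this point the values carry no useful information yet.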

In [9]:
# seed random numbers to make calculation deterministic 
np.random.seed(42)

# initialize weights randomly with mean 0
# np.random.random samples uniformly from [0, 1), so its mean is 0.5;
# multiplying by 2 and subtracting 1 maps the values to [-1, 1) with mean 0
syn0 = 2*np.random.random((3,1)) - 1
syn0

array([[-0.25091976],
       [ 0.90142861],
       [ 0.46398788]])
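
With only three weights the mean-zero property is hard to see, so the sketch below draws a much larger sample and applies the same rescaling. The sample size and the seed value 0 are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

raw = np.random.random(100_000)   # uniform samples in [0, 1)
scaled = 2 * raw - 1              # same rescaling used for syn0

print("raw range ok:   ", raw.min() >= 0 and raw.max() < 1)
print("scaled range ok:", scaled.min() >= -1 and scaled.max() < 1)
print("raw mean:   ", raw.mean())
print("scaled mean:", scaled.mean())
```

The raw mean should land close to 0.5 and the scaled mean close to 0, which is why the initial weights have no systematic bias toward positive or negative values.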

In [10]:
for _ in range(1000):

    # forward propagation
    l0 = X
    # np.dot multiplies the 4x3 input matrix by the 3x1 weight matrix, giving a 4x1 matrix
    # that matrix is passed through the non-linear sigmoid function, which squashes
    # each value into (0, 1) so it can be read as a probability
    l1 = nonlin(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1

    # backpropagation
    # nonlin(l1,True) is the slope of the sigmoid at each prediction
    # the slope is close to 0 when a prediction is near 0 or 1 (high confidence),
    # so multiplying the error by it shrinks the update for confident predictions;
    # the sign of l1_error sets the direction of the update
    l1_delta = l1_error * nonlin(l1,True)

    # update the weights for the next iteration (full batch: all 4 records at once)
    # l0.T is 3x4 and l1_delta is 4x1, so their product is 3x1, matching syn0
    syn0 += np.dot(l0.T,l1_delta)

print("Probabilities After Training:")
print(l1)

Probabilities After Training:
[[0.03190418]
 [0.02579624]
 [0.97905223]
 [0.97406172]]
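
To turn these probabilities into class labels, one common choice is to threshold at 0.5. The sketch below reruns the same training loop, records the mean absolute error at each step, and compares the thresholded predictions with `y`. The 0.5 threshold, the error metric, and the names `sigmoid`, `errors`, and `predictions` are assumptions made for this illustration and are not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(42)
syn0 = 2 * np.random.random((3, 1)) - 1

errors = []
for step in range(1000):
    l1 = sigmoid(X @ syn0)                 # forward pass
    l1_error = y - l1                      # how far off each prediction is
    l1_delta = l1_error * l1 * (1 - l1)    # error scaled by the sigmoid slope
    syn0 += X.T @ l1_delta                 # full-batch weight update
    errors.append(np.mean(np.abs(l1_error)))

# Assumed decision rule: probability above 0.5 means class 1
predictions = (l1 > 0.5).astype(int)

print("mean absolute error, first step:", errors[0])
print("mean absolute error, last step: ", errors[-1])
print("predicted classes:", predictions.ravel())
print("all correct:", np.array_equal(predictions, y))
```

Tracking the error per step also makes it easy to see whether 1000 iterations is enough or whether training has already flattened out.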
