<a href="https://colab.research.google.com/github/iriyagupta/GENAI-BA-CPlus/blob/main/Basic_concepts_in_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Neural Networks**
* Rough idea: mimic the human brain (not that we know how that works!)
* Learn through examples with no procedural learning algorithm
* As wikipedia says: <span style="color:blue">vaguely</span> inspired by animal brains

**Structure of neural networks**
* A neural network is a directed, often acyclic, graph
* **Neurons**: nodes
* **Synapses**: connections (edges) between nodes that contain weights
* a neuron calculates a weighted sum of its input nodes and then decides whether to fire or not (should it react to the input or not)
* **Activation functions**: the function that makes the activation decision
* **Layers**: aggregations of neurons that use the same transformation function (different layers can use different functions)
* **Input layer**: feature values from example cases enter the network here (think of it as the sensory organ of the network)
* **Output layer**: the result (action) layer (classes, continuous values)
* **Hidden layers**: all the layers between the input and output layers

![neuron](neuron.png)
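
As a minimal sketch of the neuron described above, the snippet below computes a weighted sum of the inputs and then makes a "fire or not" decision with a hard threshold. The input values, the weights and the threshold of 0 are made up for illustration.

```python
import numpy as np

# Hypothetical inputs and synapse weights for a single neuron
inputs = np.array([1.0, 0.0, 1.0])
weights = np.array([0.5, -0.3, 0.2])

def step(z):
    # Simplest possible activation: fire (1) if the weighted sum is positive
    return 1 if z > 0 else 0

weighted_sum = np.dot(inputs, weights)  # 1*0.5 + 0*(-0.3) + 1*0.2
fired = step(weighted_sum)
print(weighted_sum, fired)
```

Swapping `step` for a different function changes how the neuron reacts to the same weighted sum, which is why the activation function is treated as a separate choice.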

**Example of a network**
from: https://upload.wikimedia.org/wikipedia/commons/4/46/Colored_neural_network.svg

![network](https://upload.wikimedia.org/wikipedia/commons/4/46/Colored_neural_network.svg)

* The idea in a neural network is that the weights represent knowledge
* The network is trained with some input data using supervised learning
* Most networks are complicated, with many hidden layers and many (often millions of) synapses and weights
* The weights in the trained network are the "knowledge" necessary for the network to give appropriate outputs
* As more hidden layers are added, the network can learn more complex concepts

**Rough procedure: "for dummies"!**
* INITIALIZE: randomly assign weights to each connection
* RUN:
1. give an example to the network
2. calculate the weighted sum of inputs at each neuron
3. apply the activation function to that weighted sum
4. do this layer by layer until you get the output layer values
5. calculate the difference between calculated values and actual values
6. tweak weights in the entire network so that the calculated output gets "marginally" closer to the actual value
7. rinse and repeat

**A super simple example**
* Each input case has 3 features, each of which is either a 0 or a 1
* We need to classify each input case into either a 0 or 1 (binary classification)
* The actual "real world" rule is that <i>if feature 1 is 0, the case is classified as 0. If feature 1 is 1, the case is classified as 1</i>
* (In other words, features 2 and 3 are noise and contain no information)
* We want the net to learn this

![simple network](simple_network.png)

**Define inputs and outputs**

In [17]:
import numpy as np
X = np.array([  [0,0,1], #Since the first value is 0, the output value is 0
                [0,1,1], #Since the first value is 0, the output value is 0
                [1,0,1], #Since the first value is 1, the output value is 1
                [1,1,1]  #Since the first value is 1, the output value is 1
                ])
y = np.array([[0,0,1,1]]).T #Output array

**Initialization**
* generate random weights for every edge in the network
* we'll use np.random.random (generates random numbers between 0 and 1)
* and adjust the weights so that they are between -1 and 1

In [30]:
np.random.seed(42) #Sets the seed for the random numbers

#np.random.random((3,1)) returns a numpy array with 3 random numbers between 0 and 1
#multiply each value by 2. This gives us values between 0 and 2
#Subtract 1. This gives us values between -1 and 1
syn0 = 2*np.random.random((3,1)) - 1
syn0

array([[-0.25091976],
       [ 0.90142861],
       [ 0.46398788]])

**Define the activation function**
* Now we have input values (the X array) and weights (syn0)
* We can use these values to calculate the output from the output node
* The function that turns a node's weighted sum of inputs into the node's output is known as an **activation function**


**Let's see what our weighted sums look like**
* for each input case we can calculate the weighted sum
* $ \sum_{j} X_{i,j}*syn0_{j} $ where i is the i-th input case and j indexes the features of $X_{i}$


In [31]:

np.dot(X,syn0)

array([[0.46398788],
       [1.3654165 ],
       [0.21306812],
       [1.11449673]])

* Nice. We could say that anything less than 0 is a 0 and anything greater than 0 is a 1
* And use these "predicted" 0s and 1s to compute the error
* And use these errors to adjust weights
* Not ideal, because:
  * learning would be binary and the model would keep switching from class 0 to class 1 as we give it more training samples
  * what we would like is for learning to be smooth
  * "hmm, looks like a class 1 but I'll just tweak the probability that it is a class 1 rather than switch entirely to a class 1"
  * "that way, over time, I'll get more and more sure"
  * Also, if we're sure we've learned something, we want to change the weights by a lot less than if we're not very sure we've learned something
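
To make the problem with a hard cut-off concrete, here is a small sketch that applies the "less than 0 is a 0, greater than 0 is a 1" rule to the weighted sums printed above. The variable names `hard_pred` and `hard_error` are introduced here for illustration only.

```python
import numpy as np

# Weighted sums as printed by np.dot(X, syn0) above
sums = np.array([[0.46398788], [1.3654165], [0.21306812], [1.11449673]])
y = np.array([[0, 0, 1, 1]]).T

hard_pred = (sums > 0).astype(int)  # hard threshold at 0
hard_error = y - hard_pred          # every error is exactly -1, 0 or 1
print(hard_pred.ravel())
print(hard_error.ravel())
```

Every case is predicted as class 1, and the errors carry no information about how far off each weighted sum was, so there is nothing to scale the weight adjustment by.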


<h2>Sigmoid function</h2>


![sigmoid function](https://upload.wikimedia.org/wikipedia/commons/8/88/Logistic-curve.svg)

The sigmoid function takes a number as its argument and returns a value between 0 and 1. It has the following properties:
* $ sigmoid(x) = \dfrac{1}{1+e^{-x}} $
* sigmoid(0) returns 0.5
* the derivative (slope) of the function is highest at x = 0 and gets smaller as x moves away from 0
* the derivative is easy to calculate because it is a function of the sigmoid value
* $ \dfrac{d}{dx}sigmoid(x) = sigmoid(x)*(1-sigmoid(x)) $
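
A quick way to convince yourself of the derivative formula is to compare it with a numerical estimate. The sketch below does that with a central difference; the helper names `sigmoid_value` and `sigmoid_slope`, the step size `h` and the sample points are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid_value(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_slope(x):
    # Analytic derivative, written in terms of the sigmoid value itself
    s = sigmoid_value(x)
    return s * (1 - s)

h = 1e-5  # small step for the central-difference estimate
for x in [-6.0, -1.0, 0.0, 1.0, 6.0]:
    numeric = (sigmoid_value(x + h) - sigmoid_value(x - h)) / (2 * h)
    print(x, sigmoid_slope(x), numeric)
```

The two columns should agree to many decimal places, with the largest slope (0.25) at x = 0.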


**Complicated?**
* Focus on:
  * Easy to calculate
  * calculated value is between 0 and 1
  * the derivative is easy to calculate
  * the derivative is used to tweak the weights
  * the derivative is lower the closer you are to 0 or 1 and the weights will change less in those cases (we are more sure!)

In [25]:
#This function returns the sigmoid value (between 0 and 1)
# and the derivative if the deriv = True
def sigmoid(x,deriv=False):
    if deriv:
        return sigmoid(x)*(1-sigmoid(x))
    return 1/(1+np.exp(-x))

print(sigmoid(-6),sigmoid(-6,True))

0.0024726231566347743 0.002466509291360048


In [26]:


print(sigmoid(5.5) - sigmoid(5.45))
print(sigmoid(1.0) - sigmoid(0.95))
print(sigmoid(.5) - sigmoid(.45))
print(sigmoid(0.05) - sigmoid(0.0))
print(sigmoid(-5.5) - sigmoid(-5.55))



0.00020778770380880385
0.009943400607141828
0.011820097252632555
0.012497396484210332
0.00019773427522509594


**Now let's train the network**
* First we'll multiply X by the weight array to get the weighted sum for each input case
* Then apply the sigmoid function and get a value in the (0,1) range. This is the output value that we will use
* level_0: First (input layer)
* level_1: Second (hidden layer or the output layer in our case)
* Then compute the error (y - level_1). Note that y is either 0 or 1 but the value of level_1 is between 0 and 1
* Compute an adjustment factor (delta): multiply the error by the derivative of the sigmoid function
* Use these deltas to adjust the weights
* Repeat with the next set of training cases (or just give it the same set again)

**The function *run_net* runs the network for us**
* In each pass:
  * use the entire input to compute the error and adjust the weights
  * after n passes, the weights of the network will contain our rule

In [27]:
def sigmoid(x,deriv=False):
    if deriv:
        return sigmoid(x)*(1-sigmoid(x))
    return 1/(1+np.exp(-x))

def run_net(X,y,activation_function=sigmoid,passes=10):
    np.random.seed(1) #seed the random numbers
    syn0 = 2*np.random.random((3,1)) - 1    #Calculate initial weights
    for i in range(0,passes):
        level_0 = X  #Input to the nn
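        level_1 = activation_function(np.dot(level_0,syn0)) #Network output (predicted y's)
        level_1_error = y - level_1 #error (note: y is 1/0; level_1 is (0,1))
        #Get the derivative of the sigmoid (the change) and multiply by the error
        #Caveat: level_1 is already a sigmoid output, so this evaluates the slope at x = level_1
        #rather than at the weighted sum; the exact slope at the output is level_1*(1-level_1).
        #Both are positive, so the direction of the update is unchanged
        level_1_delta = level_1_error * activation_function(level_1,True)
        syn0 += np.dot(level_0.T,level_1_delta) #Update the weights (level_0 * deltas)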
    return syn0

In [28]:
final_weights = run_net(X,y,sigmoid,10)
final_weights

array([[ 1.7600348 ],
       [ 0.26494204],
       [-0.86577132]])
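
Note that `run_net` passes the already-activated `level_1` back into `sigmoid(..., True)`. For comparison, here is a sketch of the same loop with the slope written directly in terms of the output, `level_1 * (1 - level_1)`. The names `sigmoid_value` and `run_net_exact` and the choice of 10,000 passes are assumptions made for this illustration, and the weights it produces will differ from the ones shown above.

```python
import numpy as np

def sigmoid_value(x):
    return 1 / (1 + np.exp(-x))

def run_net_exact(X, y, passes=10000):
    np.random.seed(1)                          # same seed as run_net
    syn0 = 2 * np.random.random((3, 1)) - 1    # initial weights in [-1, 1)
    for _ in range(passes):
        level_1 = sigmoid_value(np.dot(X, syn0))
        level_1_error = y - level_1
        # Sigmoid slope expressed through the output itself
        level_1_delta = level_1_error * level_1 * (1 - level_1)
        syn0 += np.dot(X.T, level_1_delta)
    return syn0

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

weights = run_net_exact(X, y)
predictions = sigmoid_value(np.dot(X, weights))
print(predictions.round(3))
```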

**Generate predictions for the test sample**
* Multiply the test input by the weights
* Apply the sigmoid function
* values greater than 0.5 imply 1; values less than 0.5 imply 0

In [None]:
test_X = np.array([[1,1,1],[0,1,1],[1,0,0],[0,0,1]])
sigmoid(np.dot(test_X,final_weights))

array([[0.76118833],
       [0.35415399],
       [0.85321402],
       [0.29613496]])

**100% accuracy!**
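
The accuracy claim can be checked in a few lines. This sketch thresholds the probabilities printed above at 0.5 and compares them with the real-world rule (the class equals feature 1); the variable names are illustrative.

```python
import numpy as np

# Probabilities printed above for the four test cases
probs = np.array([[0.76118833], [0.35415399], [0.85321402], [0.29613496]])
test_X = np.array([[1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

predicted = (probs > 0.5).astype(int)  # 0.5 cut-off
expected = test_X[:, [0]]              # the rule: class = feature 1
accuracy = (predicted == expected).mean()
print(predicted.ravel(), expected.ravel(), accuracy)
```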

**Walkthrough**

In [32]:
level_0 = X #Input layer
syn0 = 2*np.random.random((3,1)) - 1 #initial weights
print(np.dot(level_0,syn0))
level_1 = sigmoid(np.dot(level_0,syn0)) #predicted y's
level_1_error = y - level_1 #error
level_1_delta = level_1_error * sigmoid(level_1,True) #change factor
syn0 += np.dot(level_0.T,level_1_delta) #new weights
print("level_0")
print(level_0,"\n")
print("syn0") #Note: syn0 was already updated in place above, so this prints the new weights
print(syn0,"\n")
print("level_1")
print(level_1,"\n")
print("y")
print(y,"\n")
print("level_1_error")
print(level_1_error,"\n")
print("deriv")
print(sigmoid(level_1,True),"\n")
print("level_1_delta")
print(level_1_delta,"\n")
print("new weights")
print(syn0,"\n")


[[-0.68801096]
 [-1.37597368]
 [-0.49069399]
 [-1.17865671]]
level_0
[[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]] 

syn0
[[ 0.53548084]
 [-0.54931301]
 [-0.48107527]] 

level_1
[[0.33447569]
 [0.20165642]
 [0.37973009]
 [0.23529381]] 

y
[[0]
 [0]
 [1]
 [1]] 

level_1_error
[[-0.33447569]
 [-0.20165642]
 [ 0.62026991]
 [ 0.76470619]] 

deriv
[[0.24313621]
 [0.24747554]
 [0.24120006]
 [0.24657148]] 

level_1_delta
[[-0.08132315]
 [-0.04990503]
 [ 0.14960914]
 [ 0.18855474]] 

new weights
[[ 0.53548084]
 [-0.54931301]
 [-0.48107527]] 



**Great! Let's try to find a different pattern**
* If any two or all three features are 1, then the class is 1
* Otherwise the class is zero

In [33]:
import numpy as np
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1],
              [1,1,0],
              [0,1,0],
              [1,0,0],
              [1,0,0]])

y = np.array([[0],[1],[1],[1],[1],[0],[0],[0]])

final_weights = run_net(X,y,sigmoid,10000)
test_X = np.array([[1,1,1],[0,1,1],[1,0,0],[0,0,1]])
sigmoid(np.dot(test_X,final_weights))

array([[0.80577894],
       [0.84660377],
       [0.42913136],
       [0.70142739]])
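
To see which test cases the network gets wrong, the sketch below writes the "two or more features are 1" rule as code and compares it with the outputs above, thresholded at 0.5. Variable names are illustrative.

```python
import numpy as np

test_X = np.array([[1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])
# Outputs printed above
probs = np.array([[0.80577894], [0.84660377], [0.42913136], [0.70142739]])

# Class is 1 when at least two of the three features are 1
expected = (test_X.sum(axis=1, keepdims=True) >= 2).astype(int)
predicted = (probs > 0.5).astype(int)

wrong = test_X[(predicted != expected).ravel()]
print(expected.ravel(), predicted.ravel())
print(wrong)
```

Only `[0, 0, 1]` is misclassified here: it has a single 1, yet the network assigns it a probability of about 0.70.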

**Not so good this time**

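* The single-layer net cannot represent this pattern: it has no bias term, so rejecting [0,1,0] and [0,0,1] needs negative weights on features 2 and 3, while accepting [0,1,1] needs their sum to be positive
* Adding layers, with a nonlinear activation between them, lets a network capture patterns that a single layer of weights cannot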
* We can try adding a hidden layer to our network


**Three layer network**
* Input layer: 3 nodes
* Hidden layer: 4 nodes (the structure of the hidden layer is our choice)
* Output: 1 node


![network with hidden layer](multi%20layer%20network.png)

**Initialize**
* The network has two sets of weights:
  1. set 1 between the input and the hidden layer
  2. set 2 between the hidden and the output layer
* Randomly assign weights at each level

In [34]:
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1


**Feed forward network**
* Calculate the node outputs at the hidden layer level
* These become the inputs to the next layer (could be a hidden layer or, as in our case, the output layer)
* In this manner, information moves from one layer to the next
* This is known as a **feed forward network**


In [35]:
level_0 = X
level_1 = sigmoid(np.dot(level_0,syn0))
level_2 = sigmoid(np.dot(level_1,syn1))
level_2

array([[0.5053376 ],
       [0.50687089],
       [0.51655106],
       [0.51587768],
       [0.46972561],
       [0.45029194],
       [0.46710261],
       [0.46710261]])
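
The same forward pass generalizes to any number of layers. The sketch below is one possible way to write it, assuming a list of weight matrices and sigmoid activations throughout; `feed_forward`, `sigmoid_value`, `X_demo` and the random data are hypothetical and only there to show how the shapes change from layer to layer.

```python
import numpy as np

def sigmoid_value(x):
    return 1 / (1 + np.exp(-x))

def feed_forward(inputs, weight_matrices):
    # Returns the activations of every layer, starting with the inputs
    activations = [inputs]
    for w in weight_matrices:
        activations.append(sigmoid_value(np.dot(activations[-1], w)))
    return activations

rng = np.random.default_rng(0)             # seed chosen arbitrarily
X_demo = rng.integers(0, 2, size=(8, 3))   # 8 cases, 3 binary features
weights = [2 * rng.random((3, 4)) - 1,     # input -> hidden
           2 * rng.random((4, 1)) - 1]     # hidden -> output

for layer in feed_forward(X_demo, weights):
    print(layer.shape)
```

With 8 cases the shapes go from `(8, 3)` at the input, to `(8, 4)` at the hidden layer, to `(8, 1)` at the output.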

**Backpropagation**
* Once information has moved all the way from the input layer, through the hidden layers, to the output layer, we have outputs from the network
* These outputs can be compared with the actual outputs in the training data to get the error
* This error needs to be propagated back through the hidden layers all the way to the input layer, adjusting weights between each set of layers
* The process of propagating this error backward is known as **backpropagation**
* The error is propagated back from the output layer to the input layer one layer at a time, adjusting weights along the way


In [36]:
level_2_error = y - level_2
level_2_error

array([[-0.5053376 ],
       [ 0.49312911],
       [ 0.48344894],
       [ 0.48412232],
       [ 0.53027439],
       [-0.45029194],
       [-0.46710261],
       [-0.46710261]])

In [37]:
level_2_delta = level_2_error*sigmoid(level_2,deriv=True)
level_2_delta

array([[-0.11860027],
       [ 0.11569104],
       [ 0.11314541],
       [ 0.11332227],
       [ 0.12551678],
       [-0.10705403],
       [-0.11063065],
       [-0.11063065]])

**Next, propagate the deltas back toward the input layer**

In [38]:
level_1_error = level_2_delta.dot(syn1.T)
level_1_delta = level_1_error * sigmoid(level_1,deriv=True)

**Calculate the new weights**

In [None]:
syn1 += level_1.T.dot(level_2_delta)
syn0 += level_0.T.dot(level_1_delta)

**Putting it all together**

In [39]:
def sigmoid(x,deriv=False):
    #With deriv=True, x is expected to be a sigmoid output already,
    #so the slope is simply x*(1-x)
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

def run_net(X,y,activation_function=sigmoid,passes=10):
    np.random.seed(1)
    syn0 = 2*np.random.random((3,4)) - 1
    syn1 = 2*np.random.random((4,1)) - 1

    for i in range(passes):
        level_0 = X
        level_1 = activation_function(np.dot(level_0,syn0))
        level_2 = activation_function(np.dot(level_1,syn1))

        level_2_error = y - level_2

        level_2_delta = level_2_error*activation_function(level_2,deriv=True)

        level_1_error = level_2_delta.dot(syn1.T)

        level_1_delta = level_1_error * activation_function(level_1,deriv=True)

        syn1 += level_1.T.dot(level_2_delta)
        syn0 += level_0.T.dot(level_1_delta)
    print(level_2)
    return syn0,syn1

In [40]:
syn0,syn1 = run_net(X,y,activation_function=sigmoid,passes=100)


[[0.29038182]
 [0.7597043 ]
 [0.72788203]
 [0.90378096]
 [0.73607507]
 [0.28925349]
 [0.19580023]
 [0.19580023]]
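
In matrix notation, one pass of this `run_net` can be summarized as below. The symbols are introduced here for illustration and do not appear in the code: $a_0$ is `level_0`, $a_1$ is `level_1`, $a_2$ is `level_2`, $W_0$ is `syn0`, $W_1$ is `syn1`, $\sigma$ is the sigmoid and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
a_1 &= \sigma(a_0 W_0) \\
a_2 &= \sigma(a_1 W_1) \\
\delta_2 &= (y - a_2) \odot a_2 \odot (1 - a_2) \\
\delta_1 &= (\delta_2 W_1^{T}) \odot a_1 \odot (1 - a_1) \\
W_1 &\leftarrow W_1 + a_1^{T} \delta_2 \\
W_0 &\leftarrow W_0 + a_0^{T} \delta_1
\end{aligned}
$$

The factor $a \odot (1 - a)$ is the sigmoid slope expressed through the layer's own output, which is why this version of `sigmoid(x, deriv=True)` returns `x*(1-x)`. Note that $\delta_1$ is computed with $W_1$ before $W_1$ is updated, matching the order of the statements in the loop.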


**Applying the net to test cases**


In [41]:
test_X

array([[1, 1, 1],
       [0, 1, 1],
       [1, 0, 0],
       [0, 0, 1]])

In [42]:
level_0 = test_X
level_1 = sigmoid(np.dot(level_0,syn0))
level_2 = sigmoid(np.dot(level_1,syn1))
level_2

array([[0.90480965],
       [0.76030727],
       [0.19489967],
       [0.28910602]])

**In Summary**
* Adding hidden layers lets the net capture more complex patterns
* However, as the network becomes more complex, it needs more computational power, because both the feed-forward pass and backpropagation multiply increasingly large matrices
* But because computing power has become cheap and, thanks to GPUs, more accessible, neural networks are transforming AI
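
As a rough illustration of how the work grows, the sketch below counts the weights in a fully connected network without bias terms, like the ones built here. A forward pass needs about one multiply-add per weight for every input case. The function name `count_weights` and the second, larger layout are hypothetical.

```python
def count_weights(layer_sizes):
    # One weight per connection between consecutive layers (no bias terms)
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

print(count_weights([3, 4, 1]))        # the network built in this notebook
print(count_weights([3, 64, 64, 1]))   # hypothetical wider, deeper network
```

The first layout has 16 weights; the second has 4352, even though it only adds one hidden layer and widens the other.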
