# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Additional Notebook (Ungraded): Automatic differentiation

## Learning Objectives

At the end of the experiment, you will be able to:

* understand the basics of automatic differentiation

* understand the backward and forward propagation for a given neural network

In [None]:
#@title Walkthrough Video
from IPython.display import HTML
HTML("""<video width="420" height="240" controls>
<source src="https://cdn-exec.ap-south-1.linodeobjects.com/content/Automatic_Differentiation_v1_Debasish_Bhaskar_edited.mp4">
</video>""")

## Information

**Understanding Basics of Neural Network and its parameters**

**Neural Network**

Neural networks are a class of machine learning algorithms used to model complex patterns in datasets using multiple hidden layers and non-linear activation functions. A neural network takes an input, passes it through multiple layers of hidden neurons (mini-functions with unique coefficients that must be learned), and outputs a prediction representing the combined input of all the neurons.

Neural networks are trained iteratively using optimization techniques like gradient descent. After each cycle of training, an error metric is calculated based on the difference between prediction and target.
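
To make the idea of iterative training concrete, here is a minimal sketch of gradient descent on a single parameter. The loss `toy_loss`, its gradient `toy_loss_grad`, the starting value, and `step_size` are all hypothetical choices for illustration; they are not part of this notebook's network.

```python
# Gradient descent on a hypothetical one-parameter loss (w - 3)^2.
def toy_loss(w):
    return (w - 3.0) ** 2

def toy_loss_grad(w):
    # derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0          # assumed starting value
step_size = 0.1  # assumed learning rate
for _ in range(100):
    # move against the gradient to reduce the loss
    w -= step_size * toy_loss_grad(w)

print(w, toy_loss(w))
```

Each step shrinks the distance to the minimizer by a constant factor here, so the printed parameter should end up very close to 3.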

**Neuron**

A neuron takes a group of weighted inputs, applies an activation function, and returns an output.

**Weights**

Weights are values that control the strength of the connection between two neurons. Each input is multiplied by a weight, which determines how much influence that input has on the output. When values are passed between neurons, the weights are applied to them and an additional value (the bias) is added.

**Bias**

Bias terms are additional constants attached to neurons and added to the weighted input before the activation function is applied. Bias terms help models represent patterns that do not necessarily pass through the origin.
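
The definitions of neuron, weights, and bias can be sketched as a single function. The names `neuron_output`, `x_demo`, `w_demo`, and `b_demo`, the numeric values, and the choice of tanh as the activation are assumptions made for this illustration.

```python
import numpy as np

def neuron_output(inputs, neuron_weights, bias):
    # weighted sum of the inputs plus the bias term
    z = np.dot(inputs, neuron_weights) + bias
    # activation function applied to the weighted input (tanh assumed here)
    return np.tanh(z)

x_demo = np.array([0.5, -1.0])  # hypothetical inputs
w_demo = np.array([0.8, 0.2])   # hypothetical weights
b_demo = 0.1                    # hypothetical bias
activation = neuron_output(x_demo, w_demo, b_demo)
print(activation)
```

Setting `b_demo` to zero forces the weighted input to be zero whenever all inputs are zero, which is the restriction the bias term removes.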

**Layers**

*Input Layer*

Holds the data your model will train on. Each neuron in the input layer represents a unique attribute in your dataset (e.g. height, hair color, etc.).

*Hidden Layer*

Sits between the input and output layers and applies an activation function before passing on the results. There are often multiple hidden layers in a network.

*Output Layer*

The final layer in a network. It receives input from the previous hidden layer, optionally applies an activation function, and returns an output representing your model’s prediction.

#### Importing required packages

In [None]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import numpy.linalg as npl #Linear algebra from numpy
from scipy.optimize import differential_evolution #Finds the global minimum of a multivariate function.
import math
from scipy.stats import norm

## Automatic Differentiation

How do neural networks calculate the partial derivatives of an expression? The answer lies in a process known as automatic differentiation. Automatic differentiation evaluates the partial derivative of an expression at a specific point; it does not produce a symbolic formula for the derivative.

## Backpropagation

Backpropagation is a special case of automatic differentiation. We can think of automatic differentiation as a set of techniques to numerically (in contrast to symbolically) evaluate the exact gradient of a function by working with intermediate variables and applying the chain rule. Automatic differentiation works on functions composed of elementary arithmetic operations (e.g., addition and multiplication) and elementary functions (e.g., sin, cos, exp, log). By applying the chain rule to these operations, the gradient of quite complicated functions can be computed automatically. Automatic differentiation applies to general computer programs and has forward and reverse modes.
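
As a rough illustration of forward mode, the sketch below carries a value and its derivative together through each elementary operation (often described as dual numbers). The class `Dual`, the helpers `dual_exp` and `dual_sin`, and the test function $g(x) = x \exp(x) + \sin(x)$ are hypothetical and chosen only to keep the example short.

```python
import math

class Dual:
    """Pair (value, deriv) propagated through elementary operations."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def dual_exp(u):
    # chain rule for exp
    e = math.exp(u.value)
    return Dual(e, e * u.deriv)

def dual_sin(u):
    # chain rule for sin
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)

# g(x) = x * exp(x) + sin(x), evaluated at x = 1.5 with seed derivative 1
x0 = Dual(1.5, 1.0)
g = x0 * dual_exp(x0) + dual_sin(x0)
print(g.value, g.deriv)
```

The derivative is obtained in the same sweep as the function value, from input to output. Reverse mode instead evaluates the function first and then works backward from the output.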

Let us look at an instructive example to understand the reverse mode of propagation.

*Example:* Consider the function

$f(x) = \sqrt{x^{2}+ \exp{(x^{2})}} + \cos{(x^{2}+ \exp{(x^{2})}) } $

If we were to implement a function f on a computer, we would be able to save some computation by using intermediate variables:

$a=x^{2},$

$b=\exp{(a)},$

$c=a+b,$

$d=\sqrt{c},$

$e=\cos{c},$

$f=d+e.$

![Image]( https://cdn.iisc.talentsprint.com/CDS/Images/Automatic_differentiation.png)

$\text{Figure: Computation graph with inputs x, function values f, and intermediate variables a, b, c, d, e.}$
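
The intermediate variables translate directly into code. The sketch below compares a direct evaluation of $f$ with the staged one; the function names `f_direct` and `f_staged` and the evaluation point are assumptions for illustration.

```python
import math

def f_direct(x):
    # evaluates x^2 + exp(x^2) once and reuses it in both terms
    s = x ** 2 + math.exp(x ** 2)
    return math.sqrt(s) + math.cos(s)

def f_staged(x):
    # same computation, one elementary operation per intermediate variable
    a = x ** 2
    b = math.exp(a)
    c = a + b
    d = math.sqrt(c)
    e = math.cos(c)
    return d + e

print(f_direct(0.7), f_staged(0.7))
```

Both functions should print the same value; the staged version simply names every node of the computation graph.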

The set of equations that include intermediate variables can be thought
of as a computation graph, a representation that is widely used in implementations of neural network software libraries. We can directly compute
the derivatives of the intermediate variables with respect to their corresponding inputs by recalling the definition of the derivative of elementary
functions. We obtain the following:

$\frac{\partial a}{\partial x} = 2x $

$\frac{\partial b}{\partial a} = \exp{(a)} $

$\frac{\partial c}{\partial a} = 1 = \frac{\partial c}{\partial b} $

$\frac{\partial d}{\partial c}= \frac{1}{2\sqrt{c}}$

$\frac{\partial e}{\partial c} = -\sin{(c)} $

$\frac{\partial f}{\partial e} = 1 = \frac{\partial f}{\partial d}$

By looking at the computation graph in the figure above, we can compute
$\partial f/\partial x$ by working backward from the output and obtain:

$\frac{∂f}{∂c} = \frac{∂f}{∂d}  \frac{∂d}{∂c} + \frac{∂f}{∂e}  \frac{∂e}{∂c}$

$\frac{∂f}{∂b} = \frac{∂f}{∂c}\frac{∂c}{∂b}$

$\frac{∂f}{∂a} = \frac{∂f}{∂b}\frac{∂b}{∂a} + \frac{∂f}{∂c}\frac{∂c}{∂a}$

$\frac{∂f}{∂x} = \frac{∂f}{∂a}\frac{∂a}{∂x}$

Note that we implicitly applied the chain rule to obtain $∂f/∂x$. By substituting the results of the derivatives of the elementary functions, we get

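$\frac{\partial f}{\partial c} = 1 \cdot \frac{1}{2\sqrt{c}} + 1 \cdot (-\sin(c))$

$\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c} \cdot 1$

$\frac{\partial f}{\partial a} = \frac{\partial f}{\partial b} \exp(a) + \frac{\partial f}{\partial c} \cdot 1$

$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \cdot 2x$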

By thinking of each of the derivatives above as a variable, we observe
that the computation required for calculating the derivative is of similar
complexity as the computation of the function itself.
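
The backward sweep for this example can be written out line by line. The sketch below is one possible transcription, with a central finite difference as a sanity check; the function name `f_and_grad`, the evaluation point `x0`, and the step `h` are assumptions for illustration.

```python
import math

def f_and_grad(x):
    # forward pass: compute and keep the intermediate variables
    a = x ** 2
    b = math.exp(a)
    c = a + b
    d = math.sqrt(c)
    e = math.cos(c)
    f = d + e
    # backward pass: apply the chain rule from the output toward the input
    df_dd = 1.0
    df_de = 1.0
    df_dc = df_dd * (1.0 / (2.0 * math.sqrt(c))) + df_de * (-math.sin(c))
    df_db = df_dc * 1.0
    df_da = df_db * math.exp(a) + df_dc * 1.0
    df_dx = df_da * 2.0 * x
    return f, df_dx

x0 = 0.7
h = 1e-6
value, grad = f_and_grad(x0)
# central finite difference, used only to check the result
numeric = (f_and_grad(x0 + h)[0] - f_and_grad(x0 - h)[0]) / (2.0 * h)
print(grad, numeric)
```

The backward pass uses about as many elementary operations as the forward pass, which matches the observation about similar complexity.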

Backpropagation is a standard method of training artificial neural networks.  This method is used for fine-tuning the weights of a neural net based on the error rate obtained in the previous iteration. Proper tuning of the weights reduces the error rates and allows the model to make increasingly reliable predictions. It traverses the network in reverse order, from the output to the input layer, according to the chain rule from calculus and helps to calculate the gradient of a loss function with respect to all the weights in the network.

![NN](https://cdn.iisc.talentsprint.com/CDS/NN.jpg)

# Training a neural network: Forward and Backward propagation

- Forward propagation sequentially calculates and stores intermediate variables within the computational graph defined by the neural network. It proceeds from the input to the output layer.

- Backpropagation sequentially calculates and stores the gradients of intermediate variables and parameters within the neural network in the reversed order.

**Implementing backpropagation**

The back propagation algorithm begins by comparing the actual value output by the forward propagation process to the expected value and then moves backward through the network, slightly adjusting each of the weights in a direction that reduces the size of the error by a small degree. Both forward and back propagation are re-run thousands of times on each input combination until the network can accurately predict the expected output of the possible inputs using forward propagation.
Here we take a simple example consisting of input X and output y, as given below. The four rows correspond to the XOR truth table.

In [None]:
# Initialize the input and output
X = np.array(([0, 0], [0, 1], [1, 0], [1, 1]), dtype=float)
y = np.array(([0], [1], [1], [0]), dtype=float)

In [None]:
# Initialize the parameters
iterations = 5000
output = None
learning_rate = 0.1
weights = [np.random.uniform(low=-0.2, high=0.2, size=(2, 2)), np.random.uniform(low=-2, high=2, size=(2, 1)) ]

#### Forward propagation

In forward propagation, the input is multiplied by the first weight matrix and passed through the activation function. The result becomes the input to the next layer, and the final layer produces the network output.

The function `feed_forward_pass()` below takes the input as its argument and produces the output by applying the weights of each layer in sequence. It returns the outputs of all layers, which are needed for backpropagation, along with the final layer output.


In [None]:
def feed_forward_pass(x_values):
    # forward
    input_layer = x_values
    hidden_layer = tang(np.dot(input_layer, weights[0]))
    # dot product of hidden layer output with weights and applying activation over it
    output_layer = tang(np.dot(hidden_layer, weights[1]))
    layers = [input_layer,hidden_layer,output_layer]
    # last layer is an output
    return layers, layers[2]

#### Backpropagation

Backpropagation is an algorithm commonly used to train neural networks. When the neural network is initialized, weights are set for its individual elements, called neurons. Inputs are loaded, they are passed through the network of neurons, and the network provides an output for each one, given the initial weights. Backpropagation helps to adjust the weights of the neurons so that the result comes closer and closer to the known true result.

![image.png](https://cdn.iisc.talentsprint.com/CDS/BP.JPG)

In [None]:
# backpropagate the error through the network layers
def backward_pass(target_output, actual_output, layers):
    global weights
    # error of the network output
    err = (target_output - actual_output)
    # backward from output to input layer
    # propagate gradients using chain rule
    for backward in range(2, 0, -1):
        # layers[backward] holds tanh outputs, which derivative_tang expects
        err_delta = err * derivative_tang(layers[backward])
        # propagate the error to the previous layer using the weights
        # from the forward pass, i.e. before they are updated
        err = np.dot(err_delta, weights[backward - 1].T)
        # update weights using computed gradient
        weights[backward - 1] += learning_rate * np.dot(layers[backward - 1].T, err_delta)
    return err
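
One way to gain confidence in a backpropagation routine is to compare its gradients with finite differences. The sketch below does this for a two-layer tanh network of the same shape as the one above, assuming the loss is half the sum of squared errors. The names `X_chk`, `y_chk`, `W0`, `W1`, `sq_loss`, `backprop_grads`, and `numeric_grads` are hypothetical and kept separate from the notebook's globals.

```python
import numpy as np

rng = np.random.default_rng(0)
X_chk = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_chk = np.array([[0.], [1.], [1.], [0.]])
W0 = rng.uniform(-0.5, 0.5, size=(2, 2))
W1 = rng.uniform(-0.5, 0.5, size=(2, 1))

def sq_loss(W0, W1):
    hidden = np.tanh(X_chk @ W0)
    out = np.tanh(hidden @ W1)
    return 0.5 * np.sum((y_chk - out) ** 2)

def backprop_grads(W0, W1):
    # forward pass
    hidden = np.tanh(X_chk @ W0)
    out = np.tanh(hidden @ W1)
    # backward pass; tanh'(z) = 1 - tanh(z)^2, written with the outputs
    delta_out = (out - y_chk) * (1.0 - out ** 2)
    grad_W1 = hidden.T @ delta_out
    # the error is propagated through the same W1 used in the forward pass
    delta_hidden = (delta_out @ W1.T) * (1.0 - hidden ** 2)
    grad_W0 = X_chk.T @ delta_hidden
    return grad_W0, grad_W1

def numeric_grads(W0, W1, eps=1e-6):
    grads = []
    for W in (W0, W1):
        g = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps
            plus = sq_loss(W0, W1)
            W[idx] = old - eps
            minus = sq_loss(W0, W1)
            W[idx] = old  # restore the entry
            g[idx] = (plus - minus) / (2.0 * eps)
        grads.append(g)
    return grads

g0, g1 = backprop_grads(W0, W1)
n0, n1 = numeric_grads(W0, W1)
print(np.max(np.abs(g0 - n0)), np.max(np.abs(g1 - n1)))
```

The sketch computes the gradient of the loss with the sign `out - y_chk` and would subtract it from the weights. Adding `learning_rate` times the product built from `target_output - actual_output`, as `backward_pass` does, amounts to the same descent step.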

#### Activation functions

An activation function determines the output of a neuron from its weighted input. It maps the resulting values into a bounded range, such as 0 to 1 for the sigmoid or -1 to 1 for tanh.

In [None]:
# activation functions
def tang(y):
    return np.tanh(y)

# derivative of tanh to use in backpropagation;
# y is the tanh output, not the pre-activation input
def derivative_tang(y):
    return 1.0 - y ** 2

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

# derivative of sigmoid to use in backpropagation;
# y is the sigmoid output, not the pre-activation input
def derivative_sigmoid(y):
    return y * (1 - y)
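
The derivative helpers are written in terms of the activation output rather than the pre-activation input. The sketch below checks both identities numerically; the grid `z`, the step `h`, and the variable names are assumptions for illustration.

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 9)  # hypothetical pre-activation values
h = 1e-6                       # finite-difference step

# tanh: derivative expressed through the output t = tanh(z)
t = np.tanh(z)
tanh_analytic = 1.0 - t ** 2
tanh_numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2.0 * h)

# sigmoid: derivative expressed through the output s = sigmoid(z)
def sigmoid_fn(v):
    return 1.0 / (1.0 + np.exp(-v))

s = sigmoid_fn(z)
sigmoid_analytic = s * (1.0 - s)
sigmoid_numeric = (sigmoid_fn(z + h) - sigmoid_fn(z - h)) / (2.0 * h)

print(np.max(np.abs(tanh_analytic - tanh_numeric)))
print(np.max(np.abs(sigmoid_analytic - sigmoid_numeric)))
```

Both printed differences should be tiny. Passing the pre-activation `z` instead of the output to these formulas would give a different, incorrect value.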

#### Train the network by calling `feed_forward_pass` and `backward_pass`

In [None]:
def train(x_values, target):
    # produce the output from forward pass
    layers , output = feed_forward_pass(x_values)
    # calculate the error and update the weights
    error = backward_pass(target, output,layers)
    return output

##### We train the network for n iterations to update the weights accordingly

In [None]:
# train the network for `iterations` (5000) passes, i.e. the weights are updated 5000 times
# report the loss at every tenth of the run (every 500 iterations)
report_every = iterations // 10
for i in range(iterations):
    # invoke the train function to get output
    output = train(X, y)

    if i % report_every == 0:
        print("Iteration number: {} / Squared loss:{} ".format(str(i), str(np.mean(np.square(y - output)))))

#### Predict

Let us define a function that predicts the output for a given input by running `feed_forward_pass` with the updated weights.

In [None]:
def predict(x_values):
    # passing inputs through the forward pass
    return feed_forward_pass(x_values)[1]

In [None]:
# predict
for i in range(len(X)):
    print('-' * 20)
    print('Input value: ' + str(X[i]))
    print('Predicted target: ' + str(predict(X[i])))
    print('Actual target: ' + str(y[i]))