# Start Of The Deep Learning ERA - BackProp 

An implementation of an `autograd engine`, built to understand `Backpropagation`.

The engine performs `Backpropagation`: it calculates the gradient of the loss function with respect to each weight.

Then it uses the `gradient descent` algorithm to `adjust the values of the weights` in the direction that reduces the loss, so that the `loss moves towards zero.`

## Some Stuff You Should Know Before Building an Autograd Engine:

### How Is Backpropagation Used In Training Deep Neural Networks?

#### `STEPS: `

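`Initialize Param:`  Start by initializing the neural network with random weights. (Why random? If every weight starts at the same value, all neurons in a layer compute the same output and receive the same gradient, so they never learn different features.)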

`Forward Pass:` Pass the input data through the network to get the predicted output.

`Compute Loss:` Calculate the loss by comparing the predicted output with the actual target values.

`Backprop:` Perform backpropagation to calculate the gradient of the loss with respect to each weight in the network.

`Update Weights:` Use the `gradient descent algorithm` to adjust the weights based on the calculated gradients, in order to minimize the loss.

`Repeat:` Repeat steps 2-5 (Forward Pass through Update Weights) for a set number of epochs or until the loss gets close to zero.
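
The steps above can be sketched for the smallest possible "network": a single weight `w` fitting `y = 2x`. This is only a minimal sketch. The toy data, the learning rate, and the hand-derived gradient are assumptions made for illustration and are not part of the engine built later.

```python
import random

random.seed(0)

# Toy data for y = 2x (made up for this illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Step 1. Initialize Param: start from a random weight.
w = random.uniform(-1.0, 1.0)
learning_rate = 0.01  # assumed value

losses = []
for epoch in range(200):
    # Step 2. Forward Pass: predictions of the one-weight model.
    preds = [w * x for x in xs]

    # Step 3. Compute Loss: mean squared error.
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    losses.append(loss)

    # Step 4. Backprop: dL/dw = mean(2 * (w*x - y) * x), derived by hand here.
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)

    # Step 5. Update Weights: move against the gradient.
    w -= learning_rate * grad_w

print(f"learned w = {w:.4f}, final loss = {losses[-1]:.6f}")
```

In a real network step 4 is exactly what the autograd engine automates, so the gradient never has to be derived by hand.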

 




### All Math Behind The BackProp 

#### 1. Derivation of add 

![add](../../images/add-1.drawio.png)

Given the equation:


$$
c = a + b
$$

##### Finding the Partial Derivative of \( c \) with Respect to \( a \):

We need to find the partial derivative of \( c \) with respect to \( a \):

$$
\frac{\partial c}{\partial a} = \frac{\partial a}{\partial a} + \frac{\partial b}{\partial a}
$$

Since:

$$
\frac{\partial a}{\partial a} = 1
$$

And:

$$
\frac{\partial b}{\partial a} = 0
$$

Therefore:

$$
\frac{\partial c}{\partial a} = 1
$$

##### Finding the Partial Derivative of \( c \) with Respect to \( b \):

Next, we find the partial derivative of \( c \) with respect to \( b \):

$$
\frac{\partial c}{\partial b} = \frac{\partial a}{\partial b} + \frac{\partial b}{\partial b}
$$

Since:

$$
\frac{\partial a}{\partial b} = 0
$$

And:

$$
\frac{\partial b}{\partial b} = 1
$$

Therefore:

$$
\frac{\partial c}{\partial b} = 1
$$
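
One way to sanity-check this result is a finite-difference estimate. The sketch below uses a hypothetical helper, `numerical_partial`, and arbitrary input values; both are assumptions for illustration.

```python
def add(a, b):
    return a + b


def numerical_partial(f, args, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. args[index]."""
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (f(*plus) - f(*minus)) / (2 * h)


a, b = 2.0, -3.0  # arbitrary values
dc_da = numerical_partial(add, (a, b), 0)
dc_db = numerical_partial(add, (a, b), 1)

# Both estimates should be close to 1, whatever a and b are.
print(dc_da, dc_db)
```

Because both partial derivatives are 1, an add node simply passes the incoming gradient to both of its inputs during backprop.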




#### 2. Derivation of mul


![mul](../../images/mul.drawio.png)


Given the equation:

$$
c = a \times b
$$

##### Finding the Partial Derivative of \( c \) with Respect to \( a \):

We need to find the partial derivative of \( c \) with respect to \( a \):

$$
\frac{\partial c}{\partial a} = b \frac{\partial a}{\partial a} + a \frac{\partial b}{\partial a}
$$

Since:

$$
\frac{\partial a}{\partial a} = 1
$$

And:

$$
\frac{\partial b}{\partial a} = 0
$$

Therefore:

$$
\frac{\partial c}{\partial a} = b
$$

##### Finding the Partial Derivative of \( c \) with Respect to \( b \):

Next, we find the partial derivative of \( c \) with respect to \( b \):

$$
\frac{\partial c}{\partial b} = b \frac{\partial a}{\partial b} + a \frac{\partial b}{\partial b}
$$

Since:

$$
\frac{\partial a}{\partial b} = 0
$$

And:

$$
\frac{\partial b}{\partial b} = 1
$$

Therefore:

$$
\frac{\partial c}{\partial b} = a
$$
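
The same kind of numerical check works for multiplication. This is a small sketch with arbitrary values, comparing the analytic results with central differences.

```python
a, b = 2.0, -3.0  # arbitrary values
h = 1e-6          # small step for the finite difference

# Analytic results from the derivation: dc/da = b and dc/db = a.
analytic_da, analytic_db = b, a

# Central-difference estimates of the same partial derivatives of c = a * b.
numeric_da = ((a + h) * b - (a - h) * b) / (2 * h)
numeric_db = (a * (b + h) - a * (b - h)) / (2 * h)

print("dc/da:", analytic_da, numeric_da)
print("dc/db:", analytic_db, numeric_db)
```

Note the swap: the gradient flowing to `a` is scaled by the value of `b`, and the gradient flowing to `b` is scaled by the value of `a`.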


#### 3. The Chain Rule 

Check out this PDF for a walkthrough of `BackProp Manually`: [Link](../../01-deep-neural-networks/01-dnn/backprop.pdf)
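
As a short illustration of the chain rule (a sketch with made-up values, independent of the PDF), take `d = a * b`, `e = d + c`, and `L = tanh(e)`. The gradient of `L` with respect to `a` is the product of the local derivatives along the path from `L` back to `a`:

$$
\frac{\partial L}{\partial a} = \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial a} = (1 - \tanh^2(e)) \cdot 1 \cdot b
$$

```python
import math

a, b, c = 2.0, -3.0, 6.5  # made-up values

# Forward pass
d = a * b          # mul node
e = d + c          # add node
L = math.tanh(e)   # output

# Backward pass: multiply local derivatives from the output back to each input.
dL_de = 1 - L ** 2      # derivative of tanh
dL_dd = dL_de * 1.0     # add passes the gradient through
dL_dc = dL_de * 1.0
dL_da = dL_dd * b       # mul scales by the other input
dL_db = dL_dd * a

# Finite-difference check of dL/da.
h = 1e-6
numeric_dL_da = (math.tanh((a + h) * b + c) - math.tanh((a - h) * b + c)) / (2 * h)
print(dL_da, numeric_dL_da)
```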

### Gradient Descent, The MC Of Backpropagation:

![img](../../images/for_revered_guest.png)




#### Let's Learn How It Works Internally `(All The Math):`

`J(θ1,θ2) -> J(w,b)`

$$
J(w, b) \text{ is the cost function.}
$$

$$
w_i = w_i - \alpha \frac{\partial J}{\partial w_i}
$$

$$
b = b - \alpha \frac{\partial J}{\partial b}
$$

forward pass `->` calculate loss `->` backprop `->` **update the weights** (by using above equations)

The `line` on the graph depicts the `gradient descent algorithm`: each step updates the parameters so that the loss moves towards its minimum.

The formulas above `perform all of the weight updates`.
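
The update rule can be tried on a toy cost function. The sketch below assumes `J(w, b) = (w - 3)^2 + (b + 1)^2`, chosen only because its minimum at `w = 3`, `b = -1` is known in advance. The starting point and the value of `alpha` are also assumptions.

```python
def cost(w, b):
    # Toy cost function with its minimum at w = 3, b = -1.
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def gradients(w, b):
    # Partial derivatives dJ/dw and dJ/db of the toy cost.
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)


w, b = 0.0, 0.0  # assumed starting point
alpha = 0.1      # learning rate (assumed)

history = [cost(w, b)]
for step in range(100):
    dJ_dw, dJ_db = gradients(w, b)
    w = w - alpha * dJ_dw  # w = w - alpha * dJ/dw
    b = b - alpha * dJ_db  # b = b - alpha * dJ/db
    history.append(cost(w, b))

print(f"w = {w:.4f}, b = {b:.4f}, J = {history[-1]:.2e}")
```

In a neural network the only difference is where the gradients come from: they are produced by backprop instead of a hand-written `gradients` function.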

### Read More, Learn More and Build More:

#### Books -> 

1. [Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka](https://www.amazon.in/Machine-Learning-PyTorch-Scikit-Learn-learning-ebook/dp/B09NW48MR1)

2. [Understanding Deep Learning](https://udlbook.github.io/udlbook/)


#### Video Tutorials ->

1. [Micrograd by Andrej Karpathy](https://www.youtube.com/watch?v=VMj-3S1tku0)
2. [What is backpropagation really doing? by 3Blue1Brown](https://www.youtube.com/@3blue1brown)


## Now Let's Build Micrograd by Andrej Karpathy (With More Functionality):

In [13]:
#import 

import math 
import torch 
import numpy as np
import matplotlib.pyplot as plt

In [11]:
class Tensor:

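    def __init__ (self, data, _children=(), _op='', label=''):
        # argument order matches the calls below: Tensor(value, (inputs...), 'op')
        self.data = data # the value held by this node
        self._op = _op # the operation that produced this node (like +, * etc)
        self._prev = set(_children) # the input nodes this node was computed from
        self.grad = 0.0 # gradient of the final output w.r.t. this node, zero until backward() runs
        self._backward = lambda : None # default: leaf nodes have nothing to propagate
        self.label = label # optional label for this node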

    def __add__(self, other):

        other = other if isinstance(other, Tensor) else Tensor(other)

        out = Tensor(self.data + other.data,(self,other),'+') # self.data = a, other.data = b

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1, so the upstream gradient passes through;
            # accumulate into .grad (not .data) so reused nodes sum their gradients
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward

        return out 
    
    def __mul__(self, other):

        other = other if isinstance(other, Tensor) else Tensor(other) # if the other object is not tensor then it converts the scalar to tensor

        out = Tensor(self.data * other.data,(self,other),'*')

        def _backward():

            # d(a * b)/da = b and d(a * b)/db = a, each scaled by the upstream gradient
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward


        return out 
    
    def tanh(self):
           x = self.data 
           
           t = (math.exp(2*x) - 1)/(math.exp(2*x) + 1)
           out = Tensor(t, (self, ), 'tanh')
           
           def _backward():
               
               self.grad += (1 - t**2) * out.grad
               
           out._backward = _backward
           
           return out 
      
    def relu(self): # relu 
        x = self.data
        t = np.maximum(0, x)
        out = Tensor(t, (self,), 'relu')

        def _backward():
            self.grad += (t > 0) * out.grad

        out._backward = _backward

        return out 
    
    def gelu(self): # Gelu 
        x = self.data
        t = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
        out = Tensor(t, (self,), 'gelu')

        def _backward():
            tanh_out = np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3)))
            # product rule plus chain rule; 0.134145 = 3 * 0.044715 comes from differentiating x**3
            derivative = 0.5 * (1 + tanh_out) + 0.5 * x * (1 - np.square(tanh_out)) * np.sqrt(2 / np.pi) * (1 + 0.134145 * np.power(x, 2))
            self.grad += derivative * out.grad

        out._backward = _backward

        return out
    
    def sigmoid(self):  # sigmoid 
        x = self.data 
        t = 1 / (1 + np.exp(-x))
        out = Tensor(t, (self,), 'sigmoid')

        def _backward():
            self.grad += t * (1 - t) * out.grad

        out._backward = _backward

        return out
    
    def softmax(self):  # softmax 
        x = self.data
        exps = np.exp(x - np.max(x))
        t = exps / np.sum(exps)
        out = Tensor(t, (self,), 'softmax')

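        def _backward():
            # softmax outputs depend on every input, so use the full Jacobian-vector product:
            # dL/dx_i = t_i * (g_i - sum_j g_j * t_j), where g is the upstream gradient
            g = np.asarray(out.grad)
            self.grad = self.grad + t * (g - np.sum(g * t))

        out._backward = _backward

        return out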
    
    
    def __rmul__(self,other): # lets `scalar * Tensor` work, since a * b == b * a

        return  self * other 
    

    def backward(self):
        
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
            
        build_topo(self)
        
        self.grad = 1.0
        
        for node in reversed(topo):
            node._backward()
         
        
    def __repr__(self):

        return f"Tensor(data={self.data})"
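
To see the engine's ideas working end to end, here is a self-contained sketch. `Scalar` is a hypothetical, scalar-only stand-in that mirrors the structure of the `Tensor` class (only `+`, `*`, `tanh`, and `backward`), so it runs without NumPy. The neuron inputs and weights are made-up values.

```python
import math


class Scalar:
    """Hypothetical scalar-only stand-in for the Tensor class (illustration only)."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        out = Scalar(self.data + other.data, (self, other), '+')

        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        out = Scalar(self.data * other.data, (self, other), '*')

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Scalar(t, (self,), 'tanh')

        def _backward():
            self.grad += (1 - t ** 2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order guarantees a node's gradient is complete
        # before it is pushed on to that node's inputs.
        topo, visited = [], set()

        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)

        build_topo(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()


# A single neuron: out = tanh(x1*w1 + x2*w2 + b), with made-up values.
x1, x2 = Scalar(2.0), Scalar(0.0)
w1, w2 = Scalar(-3.0), Scalar(1.0)
b = Scalar(6.88)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()

local = 1 - out.data ** 2  # derivative of tanh at the pre-activation
print("x1.grad:", x1.grad, "expected:", w1.data * local)
print("w1.grad:", w1.grad, "expected:", x1.data * local)

# Reusing a node: y = a + a, so dy/da = 2. This only works because of `+=`.
a = Scalar(3.0)
y = a + a
y.backward()
print("a.grad:", a.grad)
```

The last few lines show why `_backward` accumulates with `+=` instead of assigning: when a node feeds into the graph more than once, its gradient is the sum of the contributions from every path.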