# Back Propagation

## Chain Rule

So let's say,

**y = a(x)**

and

**z = b(y)**.

Now, we want to find the derivative of z with respect to x, where z depends on x only through y.

This will be represented as *dz/dx*.

In order to find this, we will apply the Chain Rule of differentiation:

> dz/dx = dz/dy \* dy/dx
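
As a quick sanity check, the sketch below compares the Chain Rule result with a numerical estimate. The choices `a(x) = x**2` and `b(y) = sin(y)` are hypothetical, picked only for illustration; any differentiable pair would do.

```python
import math

def a(x):
    # Inner function y = a(x); x**2 is an illustrative choice
    return x ** 2

def b(y):
    # Outer function z = b(y); sin(y) is an illustrative choice
    return math.sin(y)

def dz_dx_chain(x):
    # Chain Rule: dz/dx = dz/dy * dy/dx
    y = a(x)
    dy_dx = 2 * x        # derivative of x**2
    dz_dy = math.cos(y)  # derivative of sin(y)
    return dz_dy * dy_dx

def dz_dx_numeric(x, h=1e-6):
    # Central finite difference on the composed function b(a(x))
    return (b(a(x + h)) - b(a(x - h))) / (2 * h)

x0 = 1.5
analytic = dz_dx_chain(x0)
numeric = dz_dx_numeric(x0)
print(analytic, numeric)  # the two values should agree closely
```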

## Computational Graph

A **computational graph** is a *visual* <u>*representation*</u> of a mathematical *expression* or a series of operations. It is commonly used in **machine learning** and **deep learning** to represent <u>complex computations and their dependencies</u>.

In a computational graph, **nodes** represent **variables** or **operations**, and <u>edges</u> represent the <u>flow of data</u> between them. Each operation node performs a specific mathematical operation, such as addition, multiplication, or an activation function. The inputs to an operation are the outputs of other nodes in the graph.
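
To make the node and edge idea concrete, here is a minimal sketch of a graph for the expression `w * x + b`, stored as a plain dictionary. The layout (`op`, `inputs`) and the helper `evaluate` are hypothetical names for illustration; this is not how TensorFlow or PyTorch store their graphs internally.

```python
import operator

# Each node is either an input variable (op is None) or an operation.
# The "inputs" tuple lists the incoming edges of the node.
graph = {
    "x": {"op": None, "inputs": ()},
    "w": {"op": None, "inputs": ()},
    "b": {"op": None, "inputs": ()},
    "mul": {"op": operator.mul, "inputs": ("w", "x")},
    "add": {"op": operator.add, "inputs": ("mul", "b")},
}

def evaluate(graph, node, values):
    """Compute a node's value from its inputs, caching results in `values`."""
    if node in values:
        return values[node]
    spec = graph[node]
    args = [evaluate(graph, name, values) for name in spec["inputs"]]
    values[node] = spec["op"](*args)
    return values[node]

values = {"x": 3.0, "w": 2.0, "b": 1.0}
result = evaluate(graph, "add", values)
print(result)  # 2.0 * 3.0 + 1.0 = 7.0
```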

[Computational Graph by Tyler Elliot Bettilyon](https://medium.com/tebs-lab/deep-neural-networks-as-computational-graphs-867fcaa56c9)

By organizing computations into a graph structure, it becomes easier to understand and optimize complex mathematical expressions. It also enables efficient computation by allowing parallel execution of independent operations.

Computational graphs are particularly useful in deep learning frameworks like TensorFlow and PyTorch, where they are used to define and execute neural networks. They provide a clear and intuitive way to represent the flow of data and gradients through the network, making it easier to train and optimize models.

Overall, computational graphs provide a powerful tool for understanding, visualizing, and optimizing complex mathematical computations in various domains, especially in machine learning and deep learning.

![image.png](attachment:image.png)

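To find how the loss changes with respect to any input in the graph, we apply the Chain Rule backward from the loss, multiplying the local gradients of the nodes along the way: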

> dLoss/dx = dLoss/d(loss-1) * ... * dy/dx

This involves three steps:

1. Forward pass: Compute Loss
2. Compute local gradients
3. Backward pass: Compute dLoss/dWeights using Chain Rule. 
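
The three steps can be worked through by hand for a one-weight model. This is a sketch, assuming `y_hat = w * x` with a squared-error loss; the values `x = 1`, `y = 2`, `w = 1` and the intermediate name `s` are chosen for illustration.

```python
# Illustrative values for a one-weight model y_hat = w * x
x, y, w = 1.0, 2.0, 1.0

# 1. Forward pass: compute the loss, keeping the intermediate values
y_hat = w * x   # prediction
s = y_hat - y   # error
loss = s ** 2   # squared error

# 2. Local gradients: derivative of each node with respect to its input
dloss_ds = 2 * s   # d(s**2)/ds
ds_dyhat = 1.0     # d(y_hat - y)/dy_hat
dyhat_dw = x       # d(w * x)/dw

# 3. Backward pass: multiply the local gradients along the path (Chain Rule)
dloss_dw = dloss_ds * ds_dyhat * dyhat_dw

print(loss, dloss_dw)  # 1.0 -2.0
```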

![image.png](attachment:image.png)

> Fig: Linear regression

Here, the loss is the difference of predicted *y* and real *y*, squared.
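
Written out, assuming the prediction is the single product of the weight *w* and the input *x*, the loss and its gradient with respect to the weight are:

$$
\hat{y} = w x, \qquad \text{loss} = (\hat{y} - y)^2
$$

$$
\frac{\partial\, \text{loss}}{\partial w} = \frac{\partial\, \text{loss}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w} = 2(\hat{y} - y) \cdot x
$$

The sign of this gradient tells us which way to move *w*: when the prediction is too small and *x* is positive, the gradient is negative, so increasing *w* reduces the loss.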

### Q. <u>So how do we minimize our loss?</u>


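Take a single training sample, x=1 and y=2, and an initial weight w=1. The weight that fits this sample exactly is w=2, so training should move w toward 2.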

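In practice the data is also split into a training set, used to fit the weights, and a testing set, used to evaluate the model on data it has not seen. The split ratio matters: with too little training data the model fits poorly, and with too little testing data the evaluation becomes unreliable. Splits that favor the training set, such as 80/20, are common.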


In [1]:
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)

w = torch.tensor(1.0, requires_grad=True)

In [2]:
# Forward Pass and compute the loss

y_hat = w * x
loss = (y_hat - y)**2

print(loss)

tensor(1., grad_fn=<PowBackward0>)


In [3]:
# Backward Pass

loss.backward()
print(w.grad)

tensor(-2.)


In [4]:
### update weights
### next forward and backward passes
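
One way the update step and the following passes could look is plain gradient descent, sketched below. The `learning_rate` value and the number of steps are assumptions, not taken from the original notebook.

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

learning_rate = 0.1  # assumed step size, chosen for illustration

for step in range(50):
    # Forward pass: compute the prediction and the loss
    y_hat = w * x
    loss = (y_hat - y) ** 2

    # Backward pass: fills w.grad with dloss/dw
    loss.backward()

    # Update the weight without recording the update in the graph
    with torch.no_grad():
        w -= learning_rate * w.grad

    # Gradients accumulate across backward() calls, so reset them
    w.grad.zero_()

print(w.item(), loss.item())  # w should approach 2 and the loss should approach 0
```

The update runs inside `torch.no_grad()` so that autograd does not track it, and `w.grad.zero_()` is needed because each call to `backward()` adds to the existing gradient instead of replacing it.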
