# Autograd

$$y = x^2$$
#### Derivative
$$\frac{dy}{dx} = 2x$$
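Before using autograd, the derivative can be checked numerically. Below is a minimal plain-Python sketch that uses a central difference. The helper names `f` and `central_difference` and the step size `h` are illustrative choices that do not come from this notebook.

```python
# Central-difference check of dy/dx for y = x**2.
# Plain Python with no autograd: this is the kind of numerical check autograd replaces.

def f(x):
    return x ** 2

def central_difference(func, x, h=1e-5):
    # (f(x + h) - f(x - h)) / (2h) approximates f'(x) with O(h^2) error
    return (func(x + h) - func(x - h)) / (2 * h)

x = 3.0
approx = central_difference(f, x)
exact = 2 * x  # analytic derivative dy/dx = 2x
print(approx, exact)  # both close to 6.0
```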

Chain rule for a composition $x \to y \to z \to u$:
$$\frac{du}{dx} = \frac{du}{dz} \cdot \frac{dz}{dy} \cdot \frac{dy}{dx}$$

$$X \xrightarrow{\;w,\ b\;} z = wX + b \xrightarrow{\;\sigma\;} \hat{y} \longrightarrow \mathcal{L}$$

## 1. Forward Pass
$$z=wX + b$$
$$\hat{y} = \sigma(wX + b) = \frac{1}{1 + e^{-(wX + b)}}$$

## 2. Calculate Loss

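Binary cross-entropy between the prediction $\hat{y}$ and the target $y$:

$$\mathcal{L} = -\left[y \log \hat{y} + (1 - y) \log (1 - \hat{y})\right]$$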

## 3. Backward Pass

Compute the gradients of the loss with respect to the parameters $w$ and $b$ using the chain rule:

$$
\frac{d\mathcal{L}}{dw} = \frac{d\mathcal{L}}{d\hat{y}} \cdot \frac{d\hat{y}}{dz} \cdot \frac{dz}{dw}
$$

$$
\frac{d\mathcal{L}}{db} = \frac{d\mathcal{L}}{d\hat{y}} \cdot \frac{d\hat{y}}{dz} \cdot \frac{dz}{db}
$$
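The same chain rule can be evaluated by hand. Below is a minimal plain-Python sketch. It assumes binary cross-entropy loss and the toy values used in the autograd cells further down (x = 6.7, y = 0, w = 1, b = 0). Each local derivative is computed separately and then multiplied along the path.

```python
import math

# Same toy values as the autograd cells later in the notebook
x, y = 6.7, 0.0
w, b = 1.0, 0.0

# Forward pass
z = w * x + b
y_hat = 1.0 / (1.0 + math.exp(-z))  # sigmoid
loss = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))  # binary cross-entropy

# Local derivatives, one per arrow in the chain
dL_dyhat = -(y / y_hat) + (1 - y) / (1 - y_hat)
dyhat_dz = y_hat * (1 - y_hat)  # derivative of the sigmoid
dz_dw = x
dz_db = 1.0

# Chain rule: multiply the factors along each path
dL_dw = dL_dyhat * dyhat_dz * dz_dw
dL_db = dL_dyhat * dyhat_dz * dz_db
print(round(dL_dw, 4), round(dL_db, 4))  # prints 6.6918 0.9988
```

These values match the `w.grad` and `b.grad` that autograd reports for the same setup.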

## 4. Update the parameters with gradient descent
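Gradient descent moves each parameter a small step against its gradient. The learning rate $\eta$ is a hyperparameter that this notebook does not fix. One update step is:

$$
w \leftarrow w - \eta \frac{d\mathcal{L}}{dw}, \qquad b \leftarrow b - \eta \frac{d\mathcal{L}}{db}
$$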

#### Autograd: automatic differentiation of tensor operations, providing the gradients that gradient descent (optimization) needs

In [2]:
!uv pip install torch


In [3]:
import torch

In [4]:
x=torch.tensor(3.0, requires_grad=True)

In [5]:
x

tensor(3., requires_grad=True)

In [6]:
y=x**2

In [7]:
y

tensor(9., grad_fn=<PowBackward0>)

In [10]:
y.backward()

In [12]:
x.grad

tensor(6.)

In [13]:
x=torch.tensor(3.0,requires_grad=True)

In [14]:
y = x ** 2

In [15]:
z=torch.sin(y)

In [16]:
y

tensor(9., grad_fn=<PowBackward0>)

In [17]:
z

tensor(0.4121, grad_fn=<SinBackward0>)

In [21]:
z.backward()

In [22]:
x.grad

tensor(-5.4668)
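The value -5.4668 follows from the chain rule. With $y = x^2$ and $z = \sin(y)$, $\frac{dz}{dx} = \cos(x^2) \cdot 2x = 6\cos 9$ at $x = 3$. The short check below rebuilds the same graph from scratch and compares the autograd result with the hand-derived value.

```python
import math
import torch

x = torch.tensor(3.0, requires_grad=True)
z = torch.sin(x ** 2)
z.backward()  # the parentheses matter: this actually runs backpropagation

# Chain rule by hand: dz/dx = cos(x^2) * 2x
manual = math.cos(3.0 ** 2) * 2 * 3.0
print(x.grad.item(), manual)  # both approximately -5.4668
```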

In [25]:
y.backward()

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
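The error appears because the earlier `z.backward()` call already traversed the graph that `y = x ** 2` belongs to, and that call freed the graph's saved intermediate values. The sketch below shows two things. First, passing `retain_graph=True` to the first `backward()` call is the fix the error message suggests. Second, `.grad` accumulates across calls, so it usually needs to be zeroed between them.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = torch.sin(y)

# Keep the graph's saved tensors alive so the graph can be traversed again
z.backward(retain_graph=True)
grad_from_z = x.grad.clone()  # dz/dx = 6*cos(9), approximately -5.4668

# A second pass through the shared x -> y part now succeeds,
# but .grad ACCUMULATES: x.grad becomes dz/dx + dy/dx
y.backward()
accumulated = x.grad.item()  # approximately -5.4668 + 6 = 0.5332
print(accumulated)

# Reset before computing a fresh gradient
x.grad.zero_()
y2 = x ** 2
y2.backward()
print(x.grad)  # tensor(6.)
```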

In [24]:
x.grad

tensor(-5.4668)

Gradient of the loss with respect to $w$ (chain rule):
$$
\frac{d\mathcal{L}}{dw} = \frac{d\mathcal{L}}{d\hat{y}} \cdot \frac{d\hat{y}}{dz} \cdot \frac{dz}{dw}
$$
Linear Transformation:
$$ z=wX+b $$
Activation (Sigmoid fn):
$$\hat{y} = \sigma(wX + b) = \frac{1}{1 + e^{-(wX + b)}}$$
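Loss (binary cross-entropy):
$$\mathcal{L} = -\left[y \log \hat{y} + (1 - y) \log (1 - \hat{y})\right]$$
Gradient of the loss with respect to $b$ (chain rule):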
$$
\frac{d\mathcal{L}}{db} = \frac{d\mathcal{L}}{d\hat{y}} \cdot \frac{d\hat{y}}{dz} \cdot \frac{dz}{db}
$$

For binary cross-entropy loss, multiplying the three factors gives the gradients:
$$
\frac{d\mathcal{L}}{dw} = (\hat{y} - y) \cdot X
$$

$$
\frac{d\mathcal{L}}{db} = \hat{y} - y
$$
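These results assume binary cross-entropy loss, which is the loss implemented in the code cells below. The three local derivatives are:

$$
\frac{d\mathcal{L}}{d\hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})}, \qquad
\frac{d\hat{y}}{dz} = \hat{y}(1 - \hat{y}), \qquad
\frac{dz}{dw} = X, \qquad
\frac{dz}{db} = 1
$$

In the product, the $\hat{y}(1 - \hat{y})$ terms cancel:

$$
\frac{d\mathcal{L}}{dw} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \cdot \hat{y}(1 - \hat{y}) \cdot X = (\hat{y} - y) \cdot X
$$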

In [34]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [39]:
z = w*x+b

In [40]:
y_pred=torch.sigmoid(z)

In [41]:
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [42]:
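def binary_cross_entropy_loss(prediction, target):
    # Clamp predictions into [epsilon, 1 - epsilon] so torch.log never receives exactly 0.
    # 1e-7 (rather than 1e-8) keeps 1 - epsilon distinguishable from 1.0 in float32.
    epsilon = 1e-7
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    # BCE: -[y * log(y_hat) + (1 - y) * log(1 - y_hat)]
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))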
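PyTorch also provides this loss directly. The sketch below compares a copy of the hand-written loss with two built-in functions:

- `torch.nn.functional.binary_cross_entropy` takes probabilities.
- `torch.nn.functional.binary_cross_entropy_with_logits` takes the pre-sigmoid value z and applies the sigmoid internally in a numerically stable way.

It uses the same setup as above (x = 6.7, y = 0, w = 1, b = 0, so z = 6.7).

```python
import torch
import torch.nn.functional as F

def binary_cross_entropy_loss(prediction, target):
    # Copy of the hand-written loss from the cell above
    epsilon = 1e-7
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))

z = torch.tensor(6.7)  # w * x + b with w = 1, b = 0
y = torch.tensor(0.0)
y_pred = torch.sigmoid(z)

manual = binary_cross_entropy_loss(y_pred, y)
builtin = F.binary_cross_entropy(y_pred, y)           # expects probabilities
from_logits = F.binary_cross_entropy_with_logits(z, y)  # expects the raw z

print(manual, builtin, from_logits)  # all approximately 6.7012
```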

In [43]:
loss = binary_cross_entropy_loss(y_pred, y)

In [44]:
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [45]:
loss.backward()

In [47]:
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
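Step 4 of the outline (updating the parameters) is never run above. The sketch below is a minimal gradient descent loop on the same single example. It makes three assumptions:

- The learning rate is 0.1 and the loop runs 20 steps. Both values are arbitrary choices.
- `torch.nn.functional.binary_cross_entropy` replaces the hand-written loss so the block is self-contained.
- On the first step, the loop checks autograd's gradients against the closed forms $(\hat{y} - y) \cdot x$ and $\hat{y} - y$.

```python
import torch

x = torch.tensor(6.7)
y = torch.tensor(0.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

learning_rate = 0.1  # assumed value; not specified in this notebook
losses = []

for step in range(20):
    y_pred = torch.sigmoid(w * x + b)
    loss = torch.nn.functional.binary_cross_entropy(y_pred, y)
    losses.append(loss.item())

    loss.backward()  # fills w.grad and b.grad

    if step == 0:
        # Autograd's gradients should match the closed forms derived above
        assert torch.allclose(w.grad, (y_pred - y) * x)
        assert torch.allclose(b.grad, y_pred - y)

    with torch.no_grad():  # the update itself must not be recorded in the graph
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

    # backward() adds into .grad, so clear it before the next step
    w.grad.zero_()
    b.grad.zero_()

print(losses[0], losses[-1])
```

In larger models, `torch.optim.SGD` with `optimizer.step()` and `optimizer.zero_grad()` handles the same update-and-reset bookkeeping.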
