First, we import `torch`.

In [19]:
import torch

In [20]:
x = torch.randn(3, requires_grad=True)
print(x)

tensor([ 0.2513, -1.9036, -0.1739], requires_grad=True)


In [21]:
y = x + 2
print(y)

tensor([2.2513, 0.0964, 1.8261], grad_fn=<AddBackward0>)


In [27]:
z = y * y * 2
print(z)

tensor([10.1370,  0.0186,  6.6692], grad_fn=<MulBackward0>)


In [23]:
z = z.mean()
print(z)

tensor(5.6083, grad_fn=<MeanBackward0>)


Calling `z.backward()` now computes the gradient of `z` with respect to `x` and stores it in `x.grad`.

In [24]:
z.backward() #dz/dx
print(x.grad)

tensor([3.0018, 0.1285, 2.4348])
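
As a sanity check on the printed gradient, here is a hand derivation. It assumes the cells ran in the order shown, so that `z` is the mean of `2 * y * y` over three elements:

$$
y_i = x_i + 2, \qquad z = \frac{1}{3}\sum_{i=1}^{3} 2\,y_i^2
\quad\Longrightarrow\quad
\frac{\partial z}{\partial x_i} = \frac{4}{3}\,y_i = \frac{4}{3}\,(x_i + 2)
$$

For the second component, $\frac{4}{3} \times 0.0964 \approx 0.1285$, and for the third, $\frac{4}{3} \times 1.8261 \approx 2.4348$. Both agree with the values in `x.grad`.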


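*So*, there's a concept we'll encounter in the next video called the **chain rule**. Under the hood, autograd applies it as a *vector-Jacobian product*: the *Jacobian matrix* of partial derivatives is multiplied by a *gradient vector*. When the output is not a scalar, that vector has to be passed to `backward()` explicitly.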

In [28]:
# Re-run the z = y * y * 2 cell (In [27]) first so that z is a vector again
v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)
z.backward(v) # vector-Jacobian product; the result is added to the existing x.grad
print(x.grad)

tensor([3.9023, 0.5139, 2.4421])
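
The values printed above are larger than the vector-Jacobian product alone because `x.grad` still held the gradient from the earlier `backward()` call, and each call adds to it. Here is a minimal sketch of both effects. It uses fixed input values (an assumption, chosen so the numbers are easy to check) instead of `torch.randn`:

```python
import torch

# Fixed inputs (assumed for illustration) so the result is reproducible
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = y * y * 2                      # non-scalar output; its Jacobian is diag(4 * y)
v = torch.tensor([0.1, 1.0, 0.001])

z.backward(v)                      # stores v^T J in x.grad
expected = v * 4 * y.detach()      # the same product, written out by hand
print(x.grad)
print(torch.allclose(x.grad, expected))

# Rebuild the graph and call backward again: the new result is ADDED to x.grad
z = (x + 2) ** 2 * 2
z.backward(v)
print(torch.allclose(x.grad, 2 * expected))
```

Both comparisons should print `True`: the first confirms the product, the second confirms the accumulation.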


If a *scalar* tensor calls `backward()`, we can omit the argument [`backward()`]. If a *vector* (any non-scalar tensor) calls it, we need to provide a **tensor** of the same shape as input [`backward(v)`].
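
A quick way to see which case needs an argument is to try both. This is a small sketch with made-up values. The names `scalar_out` and `vector_out` are illustrative and do not appear in the lecture:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no argument needed
scalar_out = (x * 2).sum()
scalar_out.backward()
print(x.grad)

x.grad.zero_()                     # clear the gradient before the next experiment

# Non-scalar output: calling backward() without an argument raises an error
vector_out = x * 2
try:
    vector_out.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a tensor of the same shape works
vector_out = x * 2
vector_out.backward(torch.ones_like(vector_out))
print(x.grad)
```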

In [34]:
x = torch.randn(3, requires_grad=True)
print(x)

tensor([ 0.0625,  0.1874, -0.8321], requires_grad=True)


In [30]:
x.requires_grad_(False)
print(x)

tensor([ 0.9165, -0.3080,  0.4239])


In [31]:
y = x.detach()
print(y)

tensor([ 0.9165, -0.3080,  0.4239])


In [32]:
with torch.no_grad():
    y = x + 2
    print(y)

tensor([2.9165, 1.6920, 2.4239])


In [35]:
# executed after re-creating x with requires_grad=True (cell In [34])
y = x + 2
print(y)

tensor([2.0625, 2.1874, 1.1679], grad_fn=<AddBackward0>)


In [52]:
weights = torch.ones(4, requires_grad=True)
print(weights)

tensor([1., 1., 1., 1.], requires_grad=True)


The code runs a loop for 3 epochs, performing the following steps in each iteration:

1. Computes `model_output` by multiplying `weights` by 3 and summing the result.
2. Calls `backward()` on `model_output` to compute gradients.
3. Prints the gradient of `weights`.
4. Resets the gradient of `weights` to zero using `zero_()`, because `backward()` adds to `.grad` instead of overwriting it.

In [38]:
for epoch in range(3):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)

    weights.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
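
To see why the reset matters, here is the same loop with the `zero_()` call left out. The `history` list is a hypothetical helper added only to record each epoch's gradient:

```python
import torch

weights = torch.ones(4, requires_grad=True)
history = []                       # illustrative: keeps a copy of the gradient per epoch

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()        # adds 3 to every entry of weights.grad
    history.append(weights.grad.clone())
    print(weights.grad)
    # no weights.grad.zero_() here, so the gradient keeps growing
```

Without the reset, the gradient entries grow to 3, then 6, then 9, which would corrupt any weight update that relies on them.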


In [50]:
optimizer = torch.optim.SGD([weights], lr=0.01)


In [51]:
optimizer.step()
optimizer.zero_grad()

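*`optimizer.step()` updates `weights` in place using the gradients currently stored in `weights.grad`, and `optimizer.zero_grad()` then clears those gradients so they do not accumulate into the next iteration. (Lecture 4 is missing from these notes, so the wider context is not covered here.)*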
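
For context, here is a minimal sketch of how these two calls are typically placed inside a training loop. It reuses the toy `model_output` from the earlier loop as a stand-in for a loss, which is an assumption made for illustration and not something the lecture shows:

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    model_output = (weights * 3).sum()   # toy "loss"; its gradient is 3 everywhere
    model_output.backward()              # fills weights.grad
    optimizer.step()                     # weights <- weights - lr * weights.grad
    optimizer.zero_grad()                # clear gradients before the next iteration
    print(weights)
```

Each step subtracts `0.01 * 3 = 0.03` from every weight, so the values move from 1.0 to 0.97, 0.94, and then 0.91.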