## A simple example using PyTorch's autograd

We have a function $f$ that takes in an observation $\vec x$ and weights $\vec w$ to compute our predicted value $f(\vec w,\vec x)$.

We also have a target $y$. We want to tune the weights $\vec w$ so that $f(\vec w,\vec x)$ is as close as possible to $y$.

To measure how close our prediction is to the target, we define a simple squared-error loss function:

$$
\text{loss}(\vec w, \vec x, y)=(f(\vec w,\vec x) - y)^2 = (\vec w^T \vec x - y)^2
$$
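As a quick sanity check of the formula, here is the loss evaluated in plain Python (no PyTorch) for one illustrative choice of weights, $\vec w = (1, 2)$, which is an assumption made only for this example; $\vec x = (0.5, 1.5)$ and $y = 10$ are the values used in the notebook code below.

```python
def squared_error(w, x, y):
    """Squared error (w^T x - y)^2 for plain Python lists."""
    pred = sum(wi * xi for wi, xi in zip(w, x))  # w^T x
    return (pred - y) ** 2

w = [1.0, 2.0]  # illustrative weights (assumption)
x = [0.5, 1.5]
y = 10.0
print(squared_error(w, x, y))  # pred = 0.5 + 3.0 = 3.5, loss = (3.5 - 10)^2 = 42.25
```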

To tune the weights, we minimize the loss function with respect to the weights, updating them iteratively with gradient descent. This requires us to **compute the gradient of the loss function with respect to the weights**.
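Written out, one gradient-descent step moves the weights a small distance against the gradient. Here $\eta$ denotes the learning rate (our notation; it plays the role of `lr` in the weight-update cell at the end of this section):

$$
\vec w \leftarrow \vec w - \eta \, \nabla_{\vec w}\, \text{loss}(\vec w, \vec x, y)
$$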

```python
import torch

# autograd example
def f(w, x):
    """f(w, x) = w0*x0 + w1*x1, i.e. the dot product w^T x"""
    return torch.dot(w, x)


x = torch.tensor([0.5, 1.5])
y = 10

w = torch.randn(2, requires_grad=True)
print(w)

# forward pass
pred = f(w, x)
loss = (pred - y) ** 2

# Use autograd to compute the backward pass. This call computes the
# gradient of loss with respect to every leaf tensor created with
# requires_grad=True and stores it in that tensor's .grad (here, w.grad).
loss.backward()

gradient_w = w.grad
print(gradient_w)
```

```
tensor([ 0.8838, -0.9800], requires_grad=True)
tensor([-11.0281, -33.0844])
```
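One detail worth knowing before calling `backward()` more than once: PyTorch *accumulates* gradients into `.grad` instead of overwriting them. The sketch below uses separate, fixed illustrative tensors (the `_demo` names are ours, chosen so the notebook's `w`, `x` and `y` are not overwritten) and shows that a second backward pass on a freshly built graph doubles `w_demo.grad`, which is why training loops reset it between steps.

```python
import torch

x_demo = torch.tensor([0.5, 1.5])
y_demo = 10.0
w_demo = torch.tensor([1.0, 2.0], requires_grad=True)  # fixed illustrative weights

def squared_error(w, x, y):
    return (torch.dot(w, x) - y) ** 2

squared_error(w_demo, x_demo, y_demo).backward()
first = w_demo.grad.clone()    # 2 * (3.5 - 10) * x = [-6.5, -19.5]

squared_error(w_demo, x_demo, y_demo).backward()  # new graph, same inputs
doubled = w_demo.grad.clone()  # gradients were added: twice the first result
print(first, doubled)

w_demo.grad.zero_()            # reset before any further, independent backward pass
```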


**Compute the gradient manually to check whether PyTorch does the right thing.**

<img src="https://i.imgur.com/zpo409u.png" style="width:550px">
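The same result can be written out with the chain rule (our own derivation; it is the formula that the `compute_gradient` function below implements):

$$
\frac{\partial\, \text{loss}}{\partial w_i}
= 2\,(\vec w^T \vec x - y)\,\frac{\partial (\vec w^T \vec x)}{\partial w_i}
= 2\,(\vec w^T \vec x - y)\,x_i,
\qquad\text{so}\qquad
\nabla_{\vec w}\,\text{loss} = 2\,(\vec w^T \vec x - y)\,\vec x .
$$

With the weights printed above, $\vec w^T \vec x \approx 0.8838 \cdot 0.5 - 0.9800 \cdot 1.5 \approx -1.028$, so the residual is about $-11.03$ and the gradient is about $(-11.03, -33.08)$, in line with `w.grad`.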

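```python
def compute_gradient(w, x, y):
    """Analytic gradient of loss = (w^T x - y)^2 with respect to w.

    d loss / d w_i = 2 * (w^T x - y) * x_i, computed for all i at once.
    """
    with torch.no_grad():  # plain arithmetic; no autograd graph needed
        residual = torch.dot(w, x) - y
        return 2 * residual * x

compute_gradient(w, x, y)
```

```
tensor([-11.0281, -33.0844])
```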

Yes! Computing the gradient by hand gives the same result as PyTorch's autograd.
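A third check, which needs neither autograd nor the hand derivation, is a *finite-difference* gradient check. This is a standard technique that the notebook itself does not use: nudge each weight by a small step $h$ in both directions and measure how much the loss changes. The minimal sketch below is in plain Python and reuses the illustrative weights $\vec w = (1, 2)$ (an assumption) with the notebook's $\vec x$ and $y$; the helper names are hypothetical.

```python
def squared_error(w, x, y):
    """Squared error (w^T x - y)^2 for plain Python lists."""
    return (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2

def numerical_gradient(w, x, y, h=1e-6):
    """Central differences: (loss(w + h e_i) - loss(w - h e_i)) / (2h)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        grad.append((squared_error(w_plus, x, y) - squared_error(w_minus, x, y)) / (2 * h))
    return grad

def analytic_gradient(w, x, y):
    """2 * (w^T x - y) * x, the formula derived by hand."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2 * residual * xi for xi in x]

w_check = [1.0, 2.0]  # illustrative weights (assumption)
x_check = [0.5, 1.5]
y_check = 10.0
num = numerical_gradient(w_check, x_check, y_check)
ana = analytic_gradient(w_check, x_check, y_check)
print(num, ana)  # both approximately [-6.5, -19.5]
```

Central differences are used rather than one-sided ones because their truncation error is much smaller; for a quadratic loss like this one it vanishes entirely, leaving only floating-point rounding.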

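```python
# Using the computed gradient w.r.t. the weights,
# we can manually take one gradient-descent step.
lr = 0.01
with torch.no_grad():  # update w in place so it remains a leaf tensor with requires_grad=True
    w -= lr * gradient_w
print(w)
```

```
tensor([ 0.9941, -0.6492], requires_grad=True)
```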
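Repeating that single step in a loop is all that gradient descent does. Below is a minimal sketch, assuming a seeded random start and a learning rate and step count chosen only for illustration. Note the `w.grad.zero_()` call: without it, each `backward()` would add to the previous step's gradient.

```python
import torch

torch.manual_seed(0)                   # reproducible random starting weights
x = torch.tensor([0.5, 1.5])
y = 10.0
w = torch.randn(2, requires_grad=True)
lr = 0.01                              # learning rate (illustrative)

for step in range(200):
    loss = (torch.dot(w, x) - y) ** 2  # forward pass
    loss.backward()                    # autograd fills w.grad
    with torch.no_grad():
        w -= lr * w.grad               # gradient-descent update, in place
    w.grad.zero_()                     # clear the accumulated gradient

print(w, torch.dot(w, x).item())       # the prediction should now be close to y
```

For this linear model, each step shrinks the residual $\vec w^T \vec x - y$ by a factor of $1 - 2\eta\lVert\vec x\rVert^2 = 0.95$, so 200 steps bring the prediction very close to the target.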
