# Computing gradients of functions in PyTorch
## Polynomial in $\mathbb{R}^1$

In [18]:
import torch

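# f(x) = 3 - 2x^2 + x^4, evaluated at x = 3
x = torch.tensor([3.], requires_grad=True)
fx = 3 - 2 * x**2 + x**4
fx.backward()  # populates x.grad with df/dx

# Analytic derivative: f'(x) = -4x + 4x^3
dx = -4 * x + 4 * x**3
torch.equal(x.grad, dx)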

True
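
As a complementary sketch, the same derivative can be obtained with `torch.autograd.grad`, which returns the gradient directly instead of writing it to `x.grad`. The names `grad` and `expected` are illustrative only, and `torch.allclose` is used here as a more tolerant alternative to `torch.equal` for floating-point comparisons.

```python
import torch

x = torch.tensor([3.], requires_grad=True)
fx = 3 - 2 * x**2 + x**4

# torch.autograd.grad returns a tuple of gradients, one per input,
# and leaves x.grad untouched.
(grad,) = torch.autograd.grad(fx, x)

# Analytic derivative f'(x) = -4x + 4x^3, computed on a detached copy of x.
x0 = x.detach()
expected = -4 * x0 + 4 * x0**3

print(torch.allclose(grad, expected))
print(x.grad is None)
```

This variant avoids accumulating into `x.grad`, which can be convenient when the same tensor is differentiated several times.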

## Quadratic form in $\mathbb{R}^2$

In [11]:
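# f(x) = x^T A x, with x a 2x1 column vector
x = torch.tensor([[3.], [3.]], requires_grad=True)
A = torch.tensor([[2., -1.], [1., 3.]])
i = torch.mm(x.t(), A)  # x^T A, shape (1, 2)
fx = torch.mm(i, x)     # x^T A x, shape (1, 1)
fx.backward()

# Analytic gradient: (A + A^T) x
AA = torch.add(A, A.t())
torch.equal(x.grad, torch.mm(AA, x))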

True
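
The identity checked above, $\nabla_x (x^T A x) = (A + A^T)x$, follows from writing the quadratic form as a double sum. A short derivation, using index $k$ for the component being differentiated:

$$
f(x) = \sum_{i}\sum_{j} x_i A_{ij} x_j
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x_k} = \sum_{j} A_{kj} x_j + \sum_{i} x_i A_{ik} = (Ax)_k + (A^T x)_k
$$

For the values used here, $A + A^T = \begin{bmatrix} 4 & 0 \\ 0 & 6 \end{bmatrix}$ and $x = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, so the gradient is $\begin{bmatrix} 12 \\ 18 \end{bmatrix}$.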

## Linear in $\mathbb{R}^{2 \times 2}$ (Jacobian matrix)
For $f(x) = u^T x v$, and $x=
  \begin{bmatrix}
    x_1 & x_2 \\
    x_3 & x_4
  \end{bmatrix}$, since $f: \mathbb{R}^{2 \times 2} \to \mathbb{R}^1$,
  the gradient can be defined by the $2 \times 2$ Jacobian $J_f^{2 \times 2}=
  \begin{bmatrix}
     \frac{\partial f}{\partial x_1} &
     \frac{\partial f}{\partial x_2} \\
     \frac{\partial f}{\partial x_3} &
     \frac{\partial f}{\partial x_4}
  \end{bmatrix}$
  
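With $u = \begin{bmatrix} 1 & -1 \end{bmatrix}^T$ and $v = \begin{bmatrix} 1 & 2 \end{bmatrix}^T$, as in the code below, $f(x)=x_1 - x_3 + 2x_2 - 2x_4$.
This gives $J_f^{2 \times 2}=
  \begin{bmatrix}
     1 & 2 \\
     -1 & -2
  \end{bmatrix}$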

In [17]:
x = torch.tensor([[1., 2.],[3., 4.]], requires_grad=True)
u = torch.tensor([[1.], [-1.]])
v = torch.tensor([[1.], [2.]])

i = torch.mm(u.t(), x)  # u^T x, shape (1, 2)
fx = torch.mm(i, v)     # u^T x v, shape (1, 1)
fx.backward()

# Expected matrix of partial derivatives
J = torch.tensor([[1., 2.], [-1., -2.]])
torch.equal(x.grad, J)

True
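
Because $f$ is linear in $x$, each partial derivative $\partial f / \partial x_{ij}$ equals $u_i v_j$, so the matrix above is the outer product $u v^T$ and does not depend on the value of $x$. A minimal sketch checking this at a random point (the seed and the name `outer` are illustrative assumptions):

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

u = torch.tensor([[1.], [-1.]])
v = torch.tensor([[1.], [2.]])

# Any 2x2 input should give the same gradient, since f is linear in x.
x = torch.randn(2, 2, requires_grad=True)
fx = torch.mm(torch.mm(u.t(), x), v)  # u^T x v, shape (1, 1)
fx.backward()

outer = torch.mm(u, v.t())  # u v^T, shape (2, 2)
print(outer)
print(torch.allclose(x.grad, outer))
```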

## Learning Points

* Initialize tensors with the intended shape. `torch.tensor([3.])` is a 1-D tensor, while `torch.tensor([[3.]])` is a $1 \times 1$ matrix:

In [23]:
print(torch.tensor([3.]).shape)
print(torch.tensor([[3.]]).shape)

torch.Size([1])
torch.Size([1, 1])


* Use `torch.mm()` for strict 2-D matrix multiplication, $(n \times d)(d \times m)$; it does not broadcast
* Use `torch.matmul()` when the operand shapes vary (matrix-vector, matrix-matrix, batched, ...); it infers the operation from the shapes and broadcasts batch dimensions
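
The points above can be illustrated with a small sketch. The tensor names (`x_col`, `x_vec`, `batch`) and the flag `mm_accepts_1d` are made up for this example.

```python
import torch

A = torch.tensor([[2., -1.], [1., 3.]])
x_col = torch.tensor([[3.], [3.]])  # shape (2, 1): a column matrix
x_vec = torch.tensor([3., 3.])      # shape (2,): a 1-D vector

# torch.mm needs two 2-D tensors.
print(torch.mm(A, x_col).shape)

# torch.matmul accepts a 1-D right operand and returns a 1-D result.
print(torch.matmul(A, x_vec).shape)

# torch.mm rejects the 1-D vector.
try:
    torch.mm(A, x_vec)
    mm_accepts_1d = True
except RuntimeError:
    mm_accepts_1d = False
print(mm_accepts_1d)

# torch.matmul broadcasts x_col across the leading batch dimension.
batch = torch.ones(5, 2, 2)
print(torch.matmul(batch, x_col).shape)
```

Keeping inputs explicitly 2-D and using `torch.mm()` makes shape mistakes fail early, whereas `torch.matmul()` is more convenient when batch dimensions are involved.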