<a href="https://colab.research.google.com/github/lhlich/ml_coding/blob/master/deep_learning/gradient.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Gradient computation

We'll show three ways to compute a gradient:
- Analytical - vanilla: given the closed form of the original function, we take the partial derivatives with respect to all variables and implement the gradient as a separate function
  - We call it vanilla because it applies the definition directly
- Numerical: the gradient describes how the output changes in response to a small shift in the input, so we simply simulate that process
- Analytical - back propagation: we treat the target function as a computation graph of a few sub-functions whose analytical gradients are known. The overall analytical gradient then follows from the chain rule and is implemented with the back propagation algorithm


#### Example function
Target function: $y = f(\mathbf{x}) = x_1^2 + x_2^2$, where $\mathbf{x} = [x_1, x_2]^T$ is the variable vector

#### Analytical gradient - vanilla

The vanilla analytical approach requires deriving a closed form of the gradient that holds for any $\mathbf{x}$. In this example, $\frac{\partial y}{\partial x_i} = 2x_i$, and hence:

$\nabla f = [2x_1, 2x_2]^T = 2\mathbf{x}$

In [7]:
import numpy as np

def comp_grad_analytical(x):
    return 2*x

x = np.array([3.0, 4.0])
print(f'analytical gradient: {comp_grad_analytical(x)}')

analytical gradient: [6. 8.]


#### Numerical gradient

The numerical gradient is straightforward and generic, since it needs only function evaluations and no closed form:

1. Pick a small shift $\epsilon$ for one variable, e.g. $x_1$, and let $d\mathbf{x} = [\epsilon, 0]^T$
2. Compute $dy = f(\mathbf{x} + d\mathbf{x}) - f(\mathbf{x})$; the partial derivative with respect to $x_1$ is then approximately $\frac{\partial y}{\partial x_1} \approx dy / \epsilon$
3. Repeat for $x_2$

In [9]:

def comp_grad_numerical_generic(f, x, epsilon=0.1):
    # Forward-difference estimate of the gradient of f at x
    grad = []
    for i in range(len(x)):
        # Shift only the i-th coordinate by epsilon
        dx = np.zeros_like(x)
        dx[i] = epsilon
        dy = f(x + dx) - f(x)
        grad.append(dy / epsilon)

    return np.array(grad).reshape(x.shape)

def tar_func(x):
    x_1, x_2 = x[0], x[1]
    return x_1 * x_1 + x_2 * x_2

x = np.array([3.0, 4.0])
print(f'numerical gradient: {comp_grad_numerical_generic(tar_func, x)}')

numerical gradient: [6.1 8.1]
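
The offset of 0.1 in the output above is the forward-difference error: for this particular function, $\frac{(x_i + \epsilon)^2 - x_i^2}{\epsilon} = 2x_i + \epsilon$, so the estimate is off by exactly $\epsilon$. As a minimal sketch of one common alternative, the block below swaps in a central difference, $\frac{f(\mathbf{x} + d\mathbf{x}) - f(\mathbf{x} - d\mathbf{x})}{2\epsilon}$, which is a different scheme from the one used in the notebook. The helper name `comp_grad_central` and the default `epsilon=1e-4` are assumptions made for illustration.

```python
import numpy as np

def tar_func(x):
    # Same target as in the notebook: y = x_1^2 + x_2^2
    return x[0] ** 2 + x[1] ** 2

def comp_grad_central(f, x, epsilon=1e-4):
    # Hypothetical helper (not in the original notebook):
    # central difference (f(x + dx) - f(x - dx)) / (2 * epsilon) per coordinate
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        dx = np.zeros_like(x, dtype=float)
        dx[i] = epsilon
        grad[i] = (f(x + dx) - f(x - dx)) / (2 * epsilon)
    return grad

x = np.array([3.0, 4.0])
central = comp_grad_central(tar_func, x)
print(f'central-difference gradient: {central}')
print(f'max abs error vs 2x: {np.max(np.abs(central - 2 * x))}')
```

For a quadratic such as this one, the $\epsilon$ terms cancel in the central difference, so the only remaining error comes from floating-point rounding. The cost is two function evaluations per coordinate, and the `epsilon=0.1` forward-difference version has the same evaluation count per coordinate once $f(\mathbf{x})$ is recomputed each time, as it is above.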


#### Analytical gradient - back propagation

We use PyTorch directly to implement this. The key point is that $f(\mathbf{x}) = x_1^2 + x_2^2$ is composed of a square operation and a sum operation, both of which are computation graph nodes with known gradients.
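
Spelled out with the chain rule, using intermediate node names $s_1, s_2$ that are introduced here only for illustration (they do not appear in the original notebook):

$$
s_i = x_i^2, \qquad y = s_1 + s_2
$$

$$
\frac{\partial y}{\partial x_i} = \frac{\partial y}{\partial s_i} \cdot \frac{\partial s_i}{\partial x_i} = 1 \cdot 2x_i = 2x_i
$$

Back propagation evaluates these factors from the output toward the inputs: the sum node passes a gradient of 1 to each $s_i$, and each square node multiplies it by $2x_i$.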

In [10]:
import torch

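# requires_grad=True tells autograd to track operations on x
x = torch.tensor([3.0, 4.0], requires_grad=True)
x_1, x_2 = x[0], x[1]
y = x_1 * x_1 + x_2 * x_2  # equivalent to y = torch.sum(x ** 2)
# Back propagate from y; the result is accumulated in x.grad
y.backward()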

print(f'Analytical gradient from back propagation: {x.grad}')

Analytical gradient from back propagation: tensor([6., 8.])
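
To make the bookkeeping behind `y.backward()` more concrete, here is a minimal hand-written sketch of the same forward and backward pass in NumPy. It is only an illustration of the idea for this one function, not how PyTorch implements autograd; the function name `forward_backward` and the intermediate names `s`, `dy_ds`, and `dy_dx` are hypothetical.

```python
import numpy as np

def forward_backward(x):
    # Hypothetical helper: manual back propagation for y = sum(x ** 2)
    # Forward pass through the two graph nodes
    s = x ** 2       # square node: s_i = x_i^2
    y = s.sum()      # sum node:    y = s_1 + s_2

    # Backward pass, from the output toward the inputs
    dy_dy = 1.0                       # seed gradient at the output
    dy_ds = dy_dy * np.ones_like(s)   # sum node: dy/ds_i = 1
    dy_dx = dy_ds * 2 * x             # square node: ds_i/dx_i = 2 * x_i
    return y, dy_dx

y, grad = forward_backward(np.array([3.0, 4.0]))
print(f'y = {y}, manual back propagation gradient: {grad}')
```

Each node only needs its own local derivative and the gradient arriving from downstream, which is why the approach scales to graphs built from many known sub-functions.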
