### Autograd and gradient descent

First, understand how gradient descent and backpropagation work. Backpropagation computes the partial derivatives of the loss with respect to each weight and bias. Gradient descent then adjusts those parameters against the gradient, over many steps, to approach a local minimum of the loss.

$$P_{t+1} = P_t - \text{learning rate} \times \text{gradient}$$
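
The update rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the loss `(p - 3)^2`, the starting point, the learning rate and the step count are all made up for the example, and the gradient is written out by hand rather than computed by a framework.

```python
# Hypothetical toy loss: (p - 3)^2, whose minimum is at p = 3.
def loss_gradient(p):
    # Hand-derived derivative of (p - 3)^2 with respect to p.
    return 2.0 * (p - 3.0)

p = 0.0              # initial parameter value (assumed)
learning_rate = 0.1  # step size (assumed)

for step in range(100):
    # P_{t+1} = P_t - learning rate * gradient
    p = p - learning_rate * loss_gradient(p)

print(p)  # close to 3.0
```

Each step moves `p` a fraction of the way towards the minimum. A learning rate that is too large would overshoot instead of converging.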

The way we usually reason about this process is symbolic differentiation: deriving a closed-form expression for each element of the gradient vector. This is easy to understand but hard to implement for complex neural nets or certain activation functions, so in practice frameworks use another method:

#### Automatic differentiation

The implementation differs between frameworks, but all of them use some variant of automatic differentiation, which is conceptually harder to understand but easy to implement.

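Automatic differentiation works by recording the elementary operations applied during the forward pass and then applying the chain rule to them in reverse. Unlike numerical differentiation (finite differences), which is justified by a Taylor series expansion and only approximates the gradient, the result is exact up to floating-point precision. It is not strictly necessary to understand the details; just know that it computes gradients quickly, scales to large models and is simple to use.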
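
To make the idea concrete, here is a minimal sketch of reverse-mode differentiation for scalars in plain Python. It is not how PyTorch implements autograd; `Var` is a hypothetical helper class written for this example, and it supports only addition and multiplication.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # Chain rule: add this path's contribution, then pass it to the parents.
        self.grad += upstream
        for parent, local_derivative in self.parents:
            parent.backward(upstream * local_derivative)


x = Var(2.0)
y = Var(3.0)
z = x * y + x   # forward pass records the operations
z.backward()    # backward pass applies the chain rule

print(z.value, x.grad, y.grad)  # 8.0 4.0 2.0
```

Here `dz/dx = y + 1 = 4` and `dz/dy = x = 2`, which is exactly what the backward pass produces. This naive version walks every path through the graph separately, which is fine for a tiny example but is not how an efficient implementation would do it.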

#### Autograd Package

In PyTorch, automatic differentiation is provided by the `torch.autograd` package. We will look at how to use it here.

In [1]:
import torch

In [4]:
tensor1 = torch.Tensor([[1,2,3], 
                        [4,5,6]])
tensor1

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [5]:
tensor2 = torch.Tensor([[7,8,9],
                        [10,11,12]])
tensor2

tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])

In [7]:
tensor1.requires_grad # When True, operations on this tensor are tracked in the forward pass
                      # and gradients are computed for it in the backward pass

False

In [8]:
# Enable history tracking for the tensor (the trailing underscore means in place)
tensor1.requires_grad_()

tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)

In [11]:
print(tensor1.grad)  # hasn't been used in a backward pass yet

None


In [20]:
print(tensor1.grad_fn) # user-created (leaf) tensors have no grad_fn

None


In [13]:
output_tensor = tensor1 * tensor2

In [15]:
output_tensor.requires_grad  # True because tensor1 requires grad

True

In [17]:
print(output_tensor.grad) # None: no backward pass yet, and non-leaf tensors do not keep .grad by default

None


In [19]:
output_tensor.grad_fn  # So it has a reference to the function that created this tensor

<MulBackward0 at 0x7fd0e7cabc40>

In [23]:
# Now make a slightly different tensor and inspect it
output_tensor = (tensor1 * tensor2).mean()
output_tensor.grad_fn # It references only the last function (two operations run above: multiply, then mean)

<MeanBackward0 at 0x7fd0e79d6880>

In [28]:
output_tensor

tensor(36.1667, grad_fn=<MeanBackward0>)

In [24]:
output_tensor.backward()

In [27]:
tensor1

tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)

In [26]:
tensor1.grad

tensor([[1.1667, 1.3333, 1.5000],
        [1.6667, 1.8333, 2.0000]])
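
The values in `tensor1.grad` can be checked by hand. Writing $a_{ij}$ for the entries of `tensor1`, $b_{ij}$ for those of `tensor2` and $y$ for `output_tensor` (notation introduced here for illustration), the output is the mean of six elementwise products, so each partial derivative is the matching entry of `tensor2` divided by 6:

```latex
\[
y = \frac{1}{6}\sum_{i=1}^{2}\sum_{j=1}^{3} a_{ij}\, b_{ij}
\quad\Longrightarrow\quad
\frac{\partial y}{\partial a_{ij}} = \frac{b_{ij}}{6},
\qquad\text{e.g. }
\frac{\partial y}{\partial a_{11}} = \frac{7}{6} \approx 1.1667,
\quad
\frac{\partial y}{\partial a_{23}} = \frac{12}{6} = 2 .
\]
```

The same formula gives the forward value: the six products sum to 217, and 217 / 6 ≈ 36.1667.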

In [43]:
# Inside a no_grad block, results do not track gradients even if their inputs require grad
with torch.no_grad():
    new_tensor = tensor1 * 3
    print(f'{new_tensor=}')
    print(f'{tensor1.requires_grad=}')
    print(f'{new_tensor.requires_grad=}') # Note that it is False

new_tensor=tensor([[ 3.,  6.,  9.],
        [12., 15., 18.]])
tensor1.requires_grad=True
new_tensor.requires_grad=False


In [47]:
def calculate(t):
    return t * 2

In [48]:
@torch.no_grad()
def calculate_with_no_grad(t):
    return t * 2

In [53]:
result_tensor = calculate(tensor1)
f'{result_tensor=}'

'result_tensor=tensor([[ 2.,  4.,  6.],\n        [ 8., 10., 12.]], grad_fn=<MulBackward0>)'

In [55]:
result_tensor_no_grad = calculate_with_no_grad(tensor1)
f'{result_tensor_no_grad=}'

'result_tensor_no_grad=tensor([[ 2.,  4.,  6.],\n        [ 8., 10., 12.]])'

In [57]:
# enable_grad turns gradient tracking back on inside a no_grad block
with torch.no_grad():
    new_tensor_no_grad = tensor1 * 3
    print(f'{new_tensor_no_grad=}')
    print(f'{new_tensor_no_grad.requires_grad=}')
    with torch.enable_grad():
        new_tensor = tensor1 * 3
        print(f'{new_tensor=}')
        print(f'{tensor1.requires_grad=}')
        print(f'{new_tensor.requires_grad=}') # Note that it is True

new_tensor_no_grad=tensor([[ 3.,  6.,  9.],
        [12., 15., 18.]])
new_tensor_no_grad.requires_grad=False
new_tensor=tensor([[ 3.,  6.,  9.],
        [12., 15., 18.]], grad_fn=<MulBackward0>)
tensor1.requires_grad=True
new_tensor.requires_grad=True
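
A common place where `no_grad` shows up is the parameter update in a hand-written gradient descent loop. The following is a minimal sketch, assuming a made-up one-parameter loss `(w - 3)^2`. The names `w` and `learning_rate` and the step count are chosen for illustration and do not come from the cells above.

```python
import torch

# Hypothetical toy problem: minimise (w - 3)^2 with respect to w.
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # assumed step size

for step in range(100):
    loss = (w - 3.0) ** 2    # forward pass, tracked by autograd
    loss.backward()          # backward pass fills w.grad
    with torch.no_grad():    # the update itself should not be tracked
        w -= learning_rate * w.grad
    w.grad.zero_()           # .grad accumulates across backward calls, so reset it

print(w)  # close to 3.0
```

The update is wrapped in `no_grad` because it is bookkeeping, not part of the function being differentiated. In practice an optimizer from `torch.optim` would usually handle this step.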


In [101]:
t1 = torch.Tensor([2,4])
t2 = torch.Tensor([1,2])
t2.requires_grad_()
t2

tensor([1., 2.], requires_grad=True)

In [102]:
t3 = (t1 + t2).mean() - 0
t3

tensor(4.5000, grad_fn=<SubBackward0>)

In [103]:
t3.backward()
t2.grad

tensor([0.5000, 0.5000])
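
The result above can be sanity-checked without PyTorch. `t3` is the mean of two sums, so each element of `t2` contributes with weight 1/2. The sketch below estimates the same gradient with central finite differences in plain Python. The helper names `f` and `numerical_grad` and the step size `eps` are assumptions made for this example.

```python
def f(t2):
    # Same computation as t3 above: mean of (t1 + t2), with t1 = [2, 4].
    t1 = [2.0, 4.0]
    return sum(a + b for a, b in zip(t1, t2)) / len(t2)

def numerical_grad(func, point, eps=1e-4):
    # Central finite difference for each coordinate (an approximation).
    grads = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grads.append((func(up) - func(down)) / (2 * eps))
    return grads

approx = numerical_grad(f, [1.0, 2.0])
print(approx)  # both entries are approximately 0.5
```

Finite differences only approximate the gradient and need two function evaluations per parameter, which is why they are useful for checking results but not for training.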