#Automatic Differentiation with `torch.autograd`


##Tensors, Functions and Computational Graph
- In the code below, `w` and `b` are the parameters we need to optimize.
- To compute the gradients of the loss function with respect to these parameters, we set `requires_grad=True` when creating the tensors.

In [2]:
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

A function that we apply to tensors to construct the computational graph is in fact an object of class `Function`. This object knows how to compute the function in the forward direction and how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the `grad_fn` property of a tensor.

In [3]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x7f6749ba4710>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f6749bee390>
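
As a small sketch of how `grad_fn` relates to the graph, the snippet below contrasts tensors created directly (leaves) with a tensor produced by an operation. It rebuilds `x`, `w`, `b` and `z` with the same shapes as above so that it runs on its own; the printed object address will differ from run to run.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Tensors created directly by the user are leaves of the graph
# and have no backward function attached.
print(w.is_leaf, w.grad_fn)

# x is also a leaf, but it is not tracked because requires_grad is False.
print(x.is_leaf, x.requires_grad)

# z is the result of an operation on tracked tensors, so it is not a leaf
# and carries a reference to the backward function of that operation.
print(z.is_leaf, z.grad_fn)
```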


##Computing Gradients


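- To compute the derivatives of the loss with respect to the parameters, we call `loss.backward()` and then read the values from `w.grad` and `b.grad`.
- `z.grad` is `None` because `z` is not a leaf tensor: by default, autograd only populates `.grad` for leaf tensors that have `requires_grad=True`.

In [7]: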
loss.backward()
print(w.grad)
print(b.grad)
print(z.grad)

tensor([[0.1952, 0.1672, 0.3291],
        [0.1952, 0.1672, 0.3291],
        [0.1952, 0.1672, 0.3291],
        [0.1952, 0.1672, 0.3291],
        [0.1952, 0.1672, 0.3291]])
tensor([0.1952, 0.1672, 0.3291])
None
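
The values above can be checked by hand. The following is a sketch under two assumptions: that the loss uses the default `reduction="mean"`, and that we call `retain_grad()` on `z` so that autograd keeps the gradient of this non-leaf tensor. The seed is arbitrary and only makes the run repeatable. Since `x` is all ones, every row of `w.grad` equals the gradient with respect to `z`, which matches the repeated rows printed above.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
z.retain_grad()  # keep the gradient of the non-leaf tensor z
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# With mean reduction over the 3 outputs:
#   d loss / d z = (sigmoid(z) - y) / 3
dz = (torch.sigmoid(z.detach()) - y) / z.numel()

# z = x @ w + b, so d loss / d b = dz and d loss / d w = outer(x, dz).
print(torch.allclose(z.grad, dz))
print(torch.allclose(b.grad, dz))
print(torch.allclose(w.grad, torch.outer(x, dz)))
```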


##Disabling Gradient Tracking
- By default, all tensors with `requires_grad=True` track their computational history and support gradient computation.
- However, in some cases we only want to do forward computations through the network, for example when applying an already trained model to input data.
- We can stop tracking computations by surrounding our computation code with a `torch.no_grad()` block.

In [5]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False
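
As a further illustration, the sketch below shows that `torch.no_grad()` affects only the results computed inside the block: the parameters keep their `requires_grad` flag, while the result has no `grad_fn`, so calling `backward()` on anything derived from it fails. The tensors are recreated here so the snippet is self-contained, and the `raised` flag is just a helper for this example.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

with torch.no_grad():
    z = torch.matmul(x, w) + b

# The parameters themselves are unchanged by the context manager.
print(w.requires_grad, b.requires_grad)

# The result was computed without building a graph.
print(z.requires_grad, z.grad_fn)

# Without a graph there is nothing to backpropagate through.
raised = False
try:
    z.sum().backward()
except RuntimeError as err:
    raised = True
    print(f"backward failed: {err}")
```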


Another way to achieve the same result is to call the `detach()` method on the tensor:

In [8]:
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False
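
To make the effect of `detach()` more concrete, here is a minimal sketch with a made-up loss, `(z * z_det).sum()`, chosen only for illustration. The detached tensor shares memory with `z` but is treated as a constant during backpropagation, so the gradient with respect to `b` comes out as `z_det` rather than `2 * z`.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
z_det = z.detach()

# The detached tensor has no backward function...
print(z_det.grad_fn)
# ...but it shares the same underlying memory as z (no copy is made).
print(z_det.data_ptr() == z.data_ptr())

# Hypothetical loss for illustration: gradients flow only through z,
# while z_det acts as a constant factor.
loss = (z * z_det).sum()
loss.backward()

# d loss / d b = z_det, because z_det is not differentiated.
print(torch.allclose(b.grad, z_det))
```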
