## Torch Autograd

Autograd supports automatic computation of gradients "for any computation graph".
Presumably this assumes every operation in the graph is differentiable?

In [1]:
import torch

In [91]:
x = torch.ones(5) + 3
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)  # note, default is false
b = torch.randn(3)
b.requires_grad_(True)  # you can set grad requirements afterwards
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [92]:
loss

tensor(1.9705, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)

### Computation Graph
![](https://pytorch.org/tutorials/_images/comp-graph.png)

In [83]:
z.grad_fn, loss.grad_fn

(<AddBackward0 at 0x7fd4636dfd90>,
 <BinaryCrossEntropyWithLogitsBackward at 0x7fd4636df220>)

In [84]:
loss.backward()  # compute gradients

In [85]:
w.grad, b.grad  # rows of w.grad are identical because every entry of x is the same (4); each row is 4 * b.grad

(tensor([[0.0422, 1.3333, 1.3333],
         [0.0422, 1.3333, 1.3333],
         [0.0422, 1.3333, 1.3333],
         [0.0422, 1.3333, 1.3333],
         [0.0422, 1.3333, 1.3333]]),
 tensor([0.0106, 0.3333, 0.3333]))
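
The gradients above can be checked by hand. The sketch below assumes the default `reduction="mean"` of `binary_cross_entropy_with_logits`, for which d(loss)/dz = (sigmoid(z) - y) / 3; the seed is arbitrary and only there for reproducibility, and `dz` is a name introduced for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

x = torch.ones(5) + 3                      # every entry is 4
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Gradient of the mean BCE-with-logits loss w.r.t. the logits z.
dz = (torch.sigmoid(z.detach()) - y) / z.numel()

# z = x @ w + b, so d(loss)/db = dz and d(loss)/dw[i, j] = x[i] * dz[j].
print(torch.allclose(b.grad, dz, atol=1e-6))
print(torch.allclose(w.grad, x[:, None] * dz[None, :], atol=1e-6))
```

Since every `x[i]` is 4, each row of `w.grad` is `4 * b.grad`, which matches 0.3333 vs. 1.3333 in the output above.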

### Conjecture on Implementation

To obtain the gradient for a child node, multiply the gradient arriving from its parent by the local derivative of the operation connecting them (chain rule).
A full backprop therefore needs all intermediate function values, and produces the gradients
for the leaf nodes (inputs and weights) that _`requires_grad`_ (oh!).

I.e., `n.backward()` computes dn/di for every leaf node i that `requires_grad` and stores it in `i.grad`; gradients of intermediate nodes are not kept by default.
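
A minimal sketch of the chain rule on a two-step graph, to make the conjecture concrete. The tensors `a`, `h`, and `n` are made up for illustration; `retain_grad()` is only used so the intermediate gradient can be inspected.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)  # leaf node
h = a * a          # intermediate node, dh/da = 2a
h.retain_grad()    # keep h.grad; non-leaf gradients are dropped by default
n = 3 * h          # output node, dn/dh = 3

n.backward()

# Chain rule: dn/da = dn/dh * dh/da = 3 * 2a = 12 at a = 2
print(h.grad)  # tensor(3.)
print(a.grad)  # tensor(12.)
```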

In [93]:
try:
    z.backward()
except Exception as e:
    print("Error:", e)

Error: grad can be implicitly created only for scalar outputs


### `backward()` Function

Runs backward propagation starting from the current tensor, treating its value as the loss.
Called without arguments, it only works on a scalar tensor (hence the error above).
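
For a non-scalar tensor such as `z`, `backward()` accepts a `gradient` argument of the same shape. A sketch, assuming the same `x`, `w`, `b` setup as the earlier cells; passing a tensor of ones gives the same gradients as `z.sum().backward()`.

```python
import torch

x = torch.ones(5) + 3
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b     # shape (3,), not a scalar

# Weight each output of z by 1 when propagating backwards.
z.backward(gradient=torch.ones_like(z))

print(b.grad)  # tensor([1., 1., 1.])
print(w.grad)  # every entry is 4., since dz[j]/dw[i, j] = x[i] = 4
```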

- Q. What even is the definition of backprop?
- A. Finding the gradients of the cost function w.r.t. some input variables.
  Thus, recall, the gradients depend on how the cost changes, not on its value
  (adding a constant to the cost leaves them unchanged).
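
One way to read the last remark: shifting the cost by a constant changes its value but not its gradients. The sketch below uses a made-up quadratic cost to show this; `w.grad` is reset in between because `backward()` accumulates into `.grad`.

```python
import torch

w = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)

cost = (w ** 2).sum()
cost.backward()
grad_plain = w.grad.clone()

w.grad = None                      # reset, since backward() accumulates into .grad
shifted = (w ** 2).sum() + 100.0   # same function shape, different value
shifted.backward()

print(cost.item(), shifted.item())      # 5.25 105.25
print(torch.equal(grad_plain, w.grad))  # True
```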

### `no_grad()`

In [87]:
z = torch.matmul(x, w) + b  # requires grad because at least one input (w, b) requires grad
print(z.requires_grad)

True


In [88]:
with torch.no_grad():
    z = torch.matmul(x, w) + b
    print(z.requires_grad)
    print(w.requires_grad)
    print(b.requires_grad)

False
True
True


### Notes

`no_grad()` disables gradient tracking for any new nodes created inside the block, so they come out with `requires_grad=False` (even as part of a bigger computation graph).
Of course, leaf nodes preserve their `requires_grad`.

- Q. What about intermediate nodes?
- A. Yes! Any node defined before entering the block preserves its `requires_grad` (as it should).

In [90]:
z = torch.matmul(x, w) + b
with torch.no_grad():
    zz = torch.matmul(z, z)
z.requires_grad, zz.requires_grad

(True, False)
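
Another way to see the difference is through `grad_fn` and `is_leaf`. A small sketch with made-up tensors `tracked` and `untracked`: a result computed under `no_grad()` carries no backward function and is treated as a leaf.

```python
import torch

w = torch.randn(3, requires_grad=True)

tracked = w * 2            # recorded in the graph
with torch.no_grad():
    untracked = w * 2      # same arithmetic, nothing recorded

print(tracked.grad_fn is not None, tracked.is_leaf)   # True False
print(untracked.grad_fn is None, untracked.is_leaf)   # True True
```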

### `detach()`

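Returns a new tensor that is cut off from the computation graph (`requires_grad` is `False`) but shares storage with the original.
Because memory is shared, in-place changes to one show up in the other.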

In [95]:
zd = z.detach()
zd.requires_grad

False

In [96]:
z

tensor([-18.3748,  -3.6277,   5.8826], grad_fn=<AddBackward0>)

In [97]:
zd

tensor([-18.3748,  -3.6277,   5.8826])

In [99]:
zd.add_(1)  # in-place; the values below are 2 higher than in In [97], so this cell was apparently run twice
zd

tensor([-16.3748,  -1.6277,   7.8826])

In [100]:
z

tensor([-16.3748,  -1.6277,   7.8826], grad_fn=<AddBackward0>)
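
If the shared memory is not wanted, a common pattern is `detach().clone()`. A sketch with a made-up `z`; `data_ptr()` is used only to show whether two tensors point at the same storage.

```python
import torch

z = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) * 1  # non-leaf, like z above
view = z.detach()           # shares storage with z
copy = z.detach().clone()   # independent storage

print(view.data_ptr() == z.data_ptr())  # True
print(copy.data_ptr() == z.data_ptr())  # False

copy.add_(1)                # in-place on the copy, z is untouched
print(z)
```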

### Intermediate Break in `requires_grad`?

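A node in the middle of the chain that doesn't `requires_grad` stops backprop from reaching the nodes below it (because how would that happen?).
My first guess was some traversal logic that breaks the loop at such a node;
in practice no graph is recorded under `no_grad()`, so `z` has no `grad_fn` and there is nothing to traverse.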

In [111]:
one = torch.ones(3)
with torch.no_grad():
    z = torch.matmul(x, w) + b
zz = torch.matmul(z, one)
zz.requires_grad_(True)
print(z.requires_grad, zz.requires_grad)
zz.backward()

False True


In [116]:
x.grad, z.grad, zz.grad, w.grad

(None, None, tensor(1.), None)

Yes, every node on the path from the leaves to the output must `requires_grad` for full backprop (obviously).
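
A sketch contrasting the broken and unbroken chain, assuming the same `x`, `w`, `b`, `one` setup as the cell above (`z2` and `zz2` are names introduced here). With the break, neither `z` nor `zz` has a `grad_fn`; without it, `w` and `b` receive gradients.

```python
import torch

x = torch.ones(5) + 3
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
one = torch.ones(3)

# Broken chain: nothing is recorded for z, so zz has no history either.
with torch.no_grad():
    z = torch.matmul(x, w) + b
zz = torch.matmul(z, one)
print(z.grad_fn, zz.grad_fn, zz.is_leaf)  # None None True

# Unbroken chain: backward() can reach the leaves w and b.
z2 = torch.matmul(x, w) + b
zz2 = torch.matmul(z2, one)
zz2.backward()
print(w.grad is not None, b.grad)  # True tensor([1., 1., 1.])
```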