Some notes on `torch.rand` vs `torch.randn`:

| Function    | Distribution      | Range     | Mean | Std Dev |
|------------|------------------|----------|------|---------|
| `torch.rand`  | Uniform          | [0, 1)   | 0.5  | ~0.29   |
| `torch.randn` | Normal (Gaussian) | (-∞, ∞) | 0    | 1       |

- Use **torch.rand** when you need uniform randomness (e.g., initializing weights between 0 and 1).
- Use **torch.randn** when you need Gaussian-distributed randomness (e.g., adding noise to a model).
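
The figures in the table can be checked empirically. The following is a minimal sketch, assuming a sample size of one million and a fixed seed (both chosen here for illustration); the sample statistics should land close to the table values, not exactly on them.

```python
import torch

torch.manual_seed(0)  # assumed seed, only so the run is repeatable

n = 1_000_000          # assumed sample size, large enough for stable estimates
u = torch.rand(n)      # uniform samples on [0, 1)
g = torch.randn(n)     # standard normal samples

# Uniform: every value lies in [0, 1), mean near 0.5, std near 1/sqrt(12) ~ 0.29
print(u.min().item() >= 0.0, u.max().item() < 1.0)
print(u.mean().item(), u.std().item())

# Normal: mean near 0, std near 1
print(g.mean().item(), g.std().item())
```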

In [16]:
import torch

y = torch.rand(10)
y

tensor([0.5213, 0.2838, 0.8541, 0.5054, 0.9969, 0.9367, 0.0811, 0.2498, 0.7966,
        0.0846])

In [17]:
# Passing requires_grad=True flags the tensor so that autograd tracks operations on it
x = torch.rand(10, requires_grad=True)
x

tensor([0.9248, 0.0383, 0.5832, 0.2598, 0.7364, 0.4143, 0.5559, 0.6527, 0.9889,
        0.3576], requires_grad=True)

In [18]:
# Let's do some basic computation on each and see what happens
print((y**2).mean())
print((x**2).mean())

# In the output below, the second result has grad_fn attached. Any tensor computed
# from a tensor with requires_grad=True gets one. grad_fn references the backward
# function that PyTorch's autograd uses to propagate gradients through that operation.

tensor(0.3919)
tensor(0.3819, grad_fn=<MeanBackward0>)


In [19]:
# We can calculate gradients for anything that has grad_fn attached to it. Let's call
# backward on that result and see.

b = (x**2).mean()
print(b)
b.backward()

tensor(0.3819, grad_fn=<MeanBackward0>)


In [20]:
# backward runs backpropagation and populates x.grad, the gradient of b with respect to x.
print(x.grad)

tensor([0.1850, 0.0077, 0.1166, 0.0520, 0.1473, 0.0829, 0.1112, 0.1305, 0.1978,
        0.0715])


In [21]:
# We can verify this against the analytical derivative. Since b = mean(x**2) over
# 10 elements, db/dx_i = 2 * x_i / 10, which we compute directly from the original x:
print(x/10 * 2)
print(x.grad)

tensor([0.1850, 0.0077, 0.1166, 0.0520, 0.1473, 0.0829, 0.1112, 0.1305, 0.1978,
        0.0715], grad_fn=<MulBackward0>)
tensor([0.1850, 0.0077, 0.1166, 0.0520, 0.1473, 0.0829, 0.1112, 0.1305, 0.1978,
        0.0715])
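
The match between the two printed tensors follows from differentiating the mean of squares by hand. A short derivation, where $n$ is the number of elements in `x` (10 here):

$$
b = \frac{1}{n}\sum_{i=1}^{n} x_i^2
\quad\Longrightarrow\quad
\frac{\partial b}{\partial x_i} = \frac{2\,x_i}{n}
$$

For the first element above, $2 \times 0.9248 / 10 \approx 0.1850$, which agrees with the first entry of `x.grad`.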


Once we set the `requires_grad` parameter to `True`, PyTorch automatically
records every operation on that tensor in a computation graph. This allows us to call
`backward` on a scalar result when we are ready to compute the gradient.

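We cannot normally call `backward` twice on the same scalar: the first call frees the values the graph saved for the backward pass, so a second call typically raises a `RuntimeError` unless the first was made with `retain_graph=True`. Separately, gradients accumulate. Each successful `backward` call adds to `x.grad` rather than overwriting it, so reset the gradient (for example with `x.grad.zero_()`) between independent computations.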
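
Both behaviours can be seen in a small experiment. This is a sketch under stated assumptions: it uses a tensor of ones instead of random values so the numbers are easy to read, and the variable names (`first`, `accumulated`, `second_call_failed`) are introduced here for illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)

b = (x**2).mean()
b.backward()
first = x.grad.clone()          # gradient after one backward pass: 2 * x / 3

# A second backward on the same graph fails, because the values saved for
# the backward pass were freed by the first call.
try:
    b.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Rebuilding the computation gives a fresh graph. Its gradient is ADDED to
# the existing x.grad instead of replacing it.
b2 = (x**2).mean()
b2.backward()
accumulated = x.grad.clone()    # equals 2 * first

x.grad.zero_()                  # reset before the next independent computation

print(first)
print(second_call_failed)
print(accumulated)
```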

It is important to note that building the computation graph using `requires_grad` does
cost memory, because autograd keeps the intermediate values it needs for the backward
pass. By default, calling `backward` frees those saved values, which releases most of
that overhead.
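
When a computation does not need gradients at all, the graph can be skipped entirely. One common way to do this, not covered in the cells above, is the `torch.no_grad()` context manager. A minimal sketch (the names `tracked` and `untracked` are illustrative):

```python
import torch

x = torch.rand(10, requires_grad=True)

# Normal computation: autograd records the operations and attaches grad_fn
tracked = (x**2).mean()

# Inside no_grad, operations are not recorded, so no graph is built
with torch.no_grad():
    untracked = (x**2).mean()

print(tracked.grad_fn is not None)   # graph was recorded
print(untracked.requires_grad)       # result is not tracked
```

The two results hold the same value; only the bookkeeping differs.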