In [None]:
import torch
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter

Basic backprop:

In [None]:
x = torch.tensor(5., requires_grad=True)

y = (x-1).pow(2)

In [None]:
y.backward()
x.grad
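
As a quick sanity check, the derivative can be worked out by hand: for `y = (x-1)^2` we have `dy/dx = 2(x-1)`, which is 8 at `x = 5`. The sketch below compares that hand-computed value with what autograd stores in `x.grad`. The names `analytic_grad` and `autograd_grad` are illustrative only.

```python
import torch

x = torch.tensor(5., requires_grad=True)
y = (x - 1).pow(2)

# Populate x.grad with dy/dx evaluated at the current value of x.
y.backward()

# Hand-derived gradient: d/dx (x-1)^2 = 2(x-1).
analytic_grad = 2 * (x.item() - 1)
autograd_grad = x.grad.item()

print(analytic_grad, autograd_grad)
```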

In-place operations are named with a trailing underscore:

In [None]:
x = torch.tensor(5., requires_grad=True)

y = x-1
y.pow_(2)

y.backward()
x.grad
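
Note that the in-place call above is applied to `y`, an intermediate result, not to `x` itself. As a hedged illustration of why: PyTorch refuses in-place operations on a leaf tensor that requires grad, unless gradient tracking is switched off. The flag `raised` below is a made-up name for this sketch.

```python
import torch

x = torch.tensor(5., requires_grad=True)

# Modifying a leaf tensor that requires grad in place is rejected by autograd.
try:
    x.pow_(2)
    raised = False
except RuntimeError:
    raised = True

# With gradient tracking disabled, the in-place update is allowed.
with torch.no_grad():
    x.sub_(1)

print(raised, x)
```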

Here is a basic optimizer:

In [None]:
def f(in_x):
    return (in_x-1).pow(2)

x = torch.tensor(5., requires_grad=True)

# lr stands for learning rate, i.e. the step size.
optimizer = optim.Adam([x], lr=0.1)

for step_idx in range(250):
    optimizer.zero_grad()
    y = f(x)
    y.backward()
    optimizer.step()
x

You may wonder, "How does PyTorch know what to optimize here?" The point is that the optimizer _doesn't know about the objective function, and it doesn't need to._ It only needs the gradient of the objective with respect to `x`. That gradient is stored in `x.grad`, which `y.backward()` fills in on every step. `optimizer.zero_grad()` clears it first, because `backward()` accumulates into `.grad` rather than overwriting it.
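
To make this concrete, here is a minimal sketch that replaces the optimizer with a hand-written update. It uses plain gradient descent rather than Adam, so it is not what `optim.Adam` does internally, but it shows that the update step reads only `x.grad` and never looks at `f`. The names `lr` and `final_x` are assumptions made for this example.

```python
import torch

def f(in_x):
    return (in_x - 1).pow(2)

x = torch.tensor(5., requires_grad=True)
lr = 0.1  # step size for the hand-written update

for step_idx in range(100):
    # Equivalent of optimizer.zero_grad(): backward() accumulates into x.grad.
    if x.grad is not None:
        x.grad.zero_()
    y = f(x)
    y.backward()
    # Equivalent of optimizer.step() for plain gradient descent.
    # The update uses only x.grad; f itself is not consulted here.
    with torch.no_grad():
        x -= lr * x.grad

final_x = x.item()
print(final_x)
```

The `torch.no_grad()` block is needed because `x` is a leaf tensor that requires grad, so updating it in place must happen outside of gradient tracking.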

#### TensorBoard

At this point we may be interested in the convergence properties of our optimizer, and you may be tempted to reach for your favorite plotting library. Don't. Use [TensorBoard](https://www.tensorflow.org/tensorboard) instead.

Start TensorBoard with

    tensorboard --logdir=runs
    
We'll deliberately slow things down so we can see the optimization happen in real time:

In [None]:
import time

# Make a new summary writer for tensorboard; defaults to a `runs` directory.
writer = SummaryWriter()

x = torch.tensor(5., requires_grad=True)
optimizer = optim.Adam([x], lr=0.1)
for step_idx in range(100):
    optimizer.zero_grad()
    y = f(x)
    y.backward()
    optimizer.step()
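    # Log the objective value as a plain Python float under the tag "y".
    writer.add_scalar("y", y.item(), step_idx)
    time.sleep(1)

# Flush any pending events to disk and close the log file.
writer.close()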