
PyTorch: Variables and autograd
-------------------------------

A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Variables, and uses PyTorch autograd to compute gradients.

A PyTorch Variable is a wrapper around a PyTorch Tensor, and represents a node
in a computational graph. If x is a Variable then x.data is a Tensor giving its
value, and x.grad is another Variable holding the gradient of some scalar value
with respect to x.
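
As a small illustration of x.data and x.grad, here is a minimal sketch. It assumes PyTorch 0.4 or later, where Variable was merged into Tensor and a Tensor created with requires_grad=True plays the role of a Variable; the values and the names x and s are chosen only for this example.

```python
import torch

# Assumption: PyTorch 0.4 or later, where a Tensor created with
# requires_grad=True plays the role of a Variable.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# s is a scalar computed from x; autograd records how it was produced.
s = (x ** 2).sum()

# Populate x.grad with the gradient of s with respect to x, which is 2 * x.
s.backward()

print(x.data)  # the underlying values of x
print(x.grad)  # the gradient of the scalar s with respect to x
```

Here x.grad has the same shape as x, because it holds one partial derivative of the scalar s per element of x.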

PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation
you can do on a Tensor you can also do on a Variable; the difference is that
operations on Variables build a computational graph, which lets autograd
compute gradients automatically.
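
The sketch below illustrates that point: the same mm and clamp calls give the same values whether or not gradients are tracked, and only the tracked result carries graph information. It assumes PyTorch 0.4 or later (tensors with requires_grad=True instead of Variables); the sizes, the seed, and the names w_plain and w_tracked are hypothetical.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so repeated runs match

# Small illustrative sizes (hypothetical, much smaller than the tutorial's).
x = torch.randn(4, 5)
w_plain = torch.randn(5, 3)
# Same values as w_plain, but autograd tracks operations on this copy.
w_tracked = w_plain.clone().requires_grad_(True)

# Identical operations on both versions of the weights.
h_plain = x.mm(w_plain).clamp(min=0)
h_tracked = x.mm(w_tracked).clamp(min=0)

# The numerical results agree.
print(torch.equal(h_plain, h_tracked.detach()))
# Only the tracked result is part of a computational graph.
print(h_plain.requires_grad, h_tracked.requires_grad)
print(h_tracked.grad_fn is not None)
```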

Source Link: http://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_autograd.html

<h1 style="background-image: linear-gradient( 135deg, #ABDCFF 10%, #0396FF 100%);"> Original Tutorial Code</h1>

In [1]:
%matplotlib inline

In [2]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 32435778.0
1 27566394.0
2 26051080.0
3 24010326.0
4 19955918.0
5 14555348.0
6 9478826.0
7 5788778.0
8 3516427.25
9 2225696.0
10 1506760.0
11 1093481.0
12 841975.0
13 676903.625
14 560574.75
15 473537.3125
16 405448.90625
17 350482.625
18 305083.21875
19 267155.21875
20 235035.71875
21 207574.125
22 183923.4375
23 163435.53125
24 145631.25
25 130102.625
26 116509.1484375
27 104564.2890625
28 94048.59375
29 84753.71875
30 76521.421875
31 69214.7109375
32 62706.046875
33 56897.3046875
34 51704.33984375
35 47069.859375
36 42908.55859375
37 39165.109375
38 35795.76171875
39 32754.109375
40 30006.876953125
41 27518.8515625
42 25264.115234375
43 23216.369140625
44 21353.666015625
45 19658.45703125
46 18113.607421875
47 16704.01953125
48 15416.408203125
49 14238.5224609375
50 13160.0625
51 12171.9287109375
52 11265.3125
53 10432.8701171875
54 9668.044921875
55 8964.833984375
56 8317.6181640625
57 7721.30419921875
58 7171.51171875
59 6664.42919921875
60 6196.30419921875
61 5764.0927734375
...
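
The loop above zeroes w1.grad and w2.grad after every update because backward() adds to any gradient that is already stored rather than overwriting it. A minimal sketch of that behavior, assuming PyTorch 0.4 or later and using a hypothetical two-element weight w:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * w) with respect to w is [3, 3].
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Zero the stored gradient in place, then run backward again.
w.grad.zero_()
(3 * w).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

Without the zeroing step, each weight update in the training loop would use the sum of all gradients computed so far instead of the gradient for the current step.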

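The tutorial code above targets the Variable API. In PyTorch 0.4 and later, Variable was merged into Tensor, so roughly the same loop can be written without the wrapper. The following is a sketch under that assumption, not the tutorial's own code: the function name train, the seed parameter, and the returned list of losses are hypothetical additions for illustration.

```python
import torch


def train(steps=500, learning_rate=1e-6, seed=0):
    """Train the two-layer network and return the loss at each step."""
    torch.manual_seed(seed)  # hypothetical seed, added for repeatability

    # N is batch size; D_in is input dimension;
    # H is hidden dimension; D_out is output dimension.
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Inputs and targets do not need gradients (the default).
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    # Weights need gradients, so autograd tracks operations on them.
    w1 = torch.randn(D_in, H, requires_grad=True)
    w2 = torch.randn(H, D_out, requires_grad=True)

    losses = []
    for t in range(steps):
        # Forward pass and sum-of-squares loss, as in the tutorial.
        y_pred = x.mm(w1).clamp(min=0).mm(w2)
        loss = (y_pred - y).pow(2).sum()
        losses.append(loss.item())  # item() extracts a Python float

        # Backward pass: fills w1.grad and w2.grad.
        loss.backward()

        # Update and zero the gradients without recording these
        # operations in the computational graph.
        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            w2 -= learning_rate * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()
    return losses


losses = train(steps=50)
print(losses[0], losses[-1])
```

Wrapping the update in torch.no_grad() takes the place of working on w1.data and w2.data directly, and loss.item() takes the place of loss.data[0].
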
<h1 style="background-image: linear-gradient( 135deg, #ABDCFF 10%, #0396FF 100%);"> Version Without Annotations</h1>

In [3]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 40186456.0
1 38679132.0
2 38968388.0
3 33943436.0
4 23728504.0
5 13225834.0
6 6622705.0
7 3394681.0
8 1979982.25
9 1330394.125
10 992870.1875
11 787713.1875
12 646239.6875
13 540151.8125
14 456806.65625
15 389530.5625
16 334396.65625
17 288725.4375
18 250538.890625
19 218422.75
20 191237.9375
21 168069.609375
22 148211.390625
23 131130.828125
24 116368.078125
25 103549.1015625
26 92383.25
27 82627.890625
28 74086.9765625
29 66576.0859375
30 59958.71875
31 54103.67578125
32 48911.5234375
33 44296.6875
34 40186.65625
35 36513.94921875
36 33224.5390625
37 30275.16796875
38 27625.548828125
39 25238.169921875
40 23084.7109375
41 21140.40234375
42 19382.2421875
43 17794.375
44 16356.3828125
45 15050.724609375
46 13861.0244140625
47 12775.8232421875
48 11785.0283203125
49 10879.19921875
50 10050.177734375
51 9290.921875
52 8595.6328125
53 7956.3818359375
54 7369.29638671875
55 6829.5087890625
56 6332.99951171875
57 5875.48974609375
58 5453.9013671875
59 5065.16455078125
60 4706.1630859375
...

401 0.0003340636030770838
402 0.00032506551360711455
403 0.0003161711501888931
404 0.0003092814004048705
405 0.00030239493935368955
406 0.0002952552167698741
407 0.000287755043245852
408 0.0002808802528306842
409 0.0002740605268627405
410 0.000267561903456226
411 0.00026169815100729465
412 0.0002552659425418824
413 0.00025056549930013716
414 0.00024478178238496184
415 0.00023870989389251918
416 0.00023343593056779355
417 0.00022792293748352677
418 0.0002231706603197381
419 0.0002182705793529749
420 0.00021322461543604732
421 0.00020859550568275154
422 0.00020418758504092693
423 0.00020026958372909576
424 0.00019639609672594815
425 0.0001916886103572324
426 0.00018780511163640767
427 0.00018379111133981496
428 0.00018046698824036866
429 0.00017712355474941432
430 0.00017338921315968037
431 0.00017000782827381045
432 0.00016632885672152042
433 0.00016287063772324473
434 0.0001598523376742378
435 0.00015615053416695446
436 0.00015361698751803488
437 0.00015057843120303005
438 0.0001478320