# Lesson 1


What is PyTorch?
================

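PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can also run on GPUs.
- It can be used to define deep learning models and to train and run them flexibly.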

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the main difference is that a Tensor can also run on a GPU to accelerate computation.


In [2]:
import torch

Construct an uninitialized 5x3 matrix:

In [4]:
x = torch.empty(5,3)
x

tensor([[ 0.0000e+00, -8.5899e+09,  0.0000e+00],
        [-8.5899e+09,         nan,  0.0000e+00],
        [ 2.7002e-06,  1.8119e+02,  1.2141e+01],
        [ 7.8503e+02,  6.7504e-07,  6.5200e-10],
        [ 2.9537e-06,  1.7186e-04,         nan]])

Construct a randomly initialized matrix:

In [5]:
x = torch.rand(5,3)
x

tensor([[0.4628, 0.7432, 0.9785],
        [0.2068, 0.4441, 0.9176],
        [0.1027, 0.5275, 0.3884],
        [0.9380, 0.2113, 0.2839],
        [0.0094, 0.4001, 0.6483]])

Construct a matrix filled with zeros and of dtype long:

In [8]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [11]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [12]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

A tensor can also be created from an existing tensor. These methods reuse properties of the input tensor, such as its dtype, unless new values are provided.

In [16]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [17]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[ 0.2411, -0.3961, -0.9206],
        [-0.0508,  0.2653,  0.4685],
        [ 0.5368, -0.3606, -0.0073],
        [ 0.3383,  0.6826,  1.7368],
        [-0.0811, -0.6957, -0.4566]])
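
As a small sketch of the inheritance rule described above (the variable names `base`, `inherited`, `overridden`, and `same_shape` are illustrative, not from the original notebook), the following shows which properties carry over and how a keyword argument overrides them:

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

# new_* methods inherit dtype (and device) from the source tensor
# unless a keyword argument overrides them.
inherited = base.new_ones(4, 2)
overridden = base.new_ones(4, 2, dtype=torch.int32)

# *_like functions additionally inherit the shape.
same_shape = torch.randn_like(base)

print(inherited.dtype, overridden.dtype, same_shape.shape)
```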

Get the shape of a tensor:

In [20]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is a subclass of tuple, so it supports all tuple operations.</p></div>
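
A minimal sketch of what tuple behavior means in practice (the names `rows`, `cols`, and `as_list` are illustrative):

```python
import torch

t = torch.zeros(5, 3)
size = t.shape                 # t.size() returns the same kind of object

rows, cols = size              # tuple unpacking works
is_tuple = isinstance(size, tuple)
as_list = list(size)           # so does conversion to a list

print(rows, cols, is_tuple, as_list)
```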

Operations


There are many tensor operations. We start with addition.



In [21]:
y = torch.rand(5,3)
y

tensor([[0.9456, 0.3996, 0.1981],
        [0.8728, 0.7097, 0.3721],
        [0.7489, 0.9502, 0.6241],
        [0.5176, 0.0200, 0.5130],
        [0.3552, 0.2710, 0.7392]])

In [23]:
x + y

tensor([[ 1.1866,  0.0035, -0.7225],
        [ 0.8220,  0.9750,  0.8406],
        [ 1.2857,  0.5896,  0.6168],
        [ 0.8559,  0.7026,  2.2498],
        [ 0.2741, -0.4248,  0.2826]])

Another syntax for addition:


In [24]:
torch.add(x, y)

tensor([[ 1.1866,  0.0035, -0.7225],
        [ 0.8220,  0.9750,  0.8406],
        [ 1.2857,  0.5896,  0.6168],
        [ 0.8559,  0.7026,  2.2498],
        [ 0.2741, -0.4248,  0.2826]])

Addition: passing an output tensor as an argument

In [26]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 1.1866,  0.0035, -0.7225],
        [ 0.8220,  0.9750,  0.8406],
        [ 1.2857,  0.5896,  0.6168],
        [ 0.8559,  0.7026,  2.2498],
        [ 0.2741, -0.4248,  0.2826]])

In-place addition

In [28]:
y.add_(x)
y

tensor([[ 1.1866,  0.0035, -0.7225],
        [ 0.8220,  0.9750,  0.8406],
        [ 1.2857,  0.5896,  0.6168],
        [ 0.8559,  0.7026,  2.2498],
        [ 0.2741, -0.4248,  0.2826]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place has a ``_`` suffix.
    For example, ``x.copy_(y)`` and ``x.t_()`` both change ``x``.</p></div>
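
To make the distinction concrete, here is a short sketch contrasting `add` with `add_` (the tensors `a`, `b`, `c`, and `d` are illustrative):

```python
import torch

a = torch.ones(3)
b = torch.full((3,), 2.0)

c = a.add(b)             # out-of-place: returns a new tensor, a is unchanged
a_after_add = a.clone()  # snapshot of a, still all ones

d = a.add_(b)            # in-place: writes the result into a's own storage

print(a_after_add, a, d.data_ptr() == a.data_ptr())
```

The out-of-place call leaves `a` untouched, while the in-place call overwrites it and returns a tensor backed by the same memory.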

NumPy-style indexing works on PyTorch tensors.


In [31]:
x[1:, 1:]

tensor([[ 0.2653,  0.4685],
        [-0.3606, -0.0073],
        [ 0.6826,  1.7368],
        [-0.6957, -0.4566]])

Resizing: to resize or reshape a tensor, use ``torch.Tensor.view``:

In [39]:
x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
z

tensor([[-0.5683,  1.3885, -2.0829, -0.7613, -1.9115,  0.3732, -0.2055, -1.2300],
        [-0.2612, -0.4682, -1.0596,  0.7447,  0.7603, -0.4281,  0.5495,  0.1025]])
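
Two details of `view` are easy to miss: a dimension given as `-1` is inferred from the others, and the result shares storage with the original tensor. The sketch below illustrates both, and also shows `reshape` as a fallback for non-contiguous tensors (the variable names are illustrative):

```python
import torch

x = torch.arange(16.0).view(4, 4)
flat = x.view(16)          # same data seen as 1-D
wide = x.view(-1, 8)       # -1 is inferred as 16 / 8 = 2

flat[0] = 100.0            # a view shares storage, so x changes too

# A transposed tensor is not contiguous, so view() can fail on it;
# reshape() returns a view when possible and copies otherwise.
xt = x.t()
reshaped = xt.reshape(16)

print(wide.shape, x[0, 0].item(), xt.is_contiguous())
```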

If a tensor has exactly one element, ``.item()`` returns its value as a Python number.

In [40]:
x = torch.randn(1)
x

tensor([-1.1493])

In [44]:
x.item()

-1.1493233442306519

In [48]:
z.transpose(1,0)

tensor([[-0.5683, -0.2612],
        [ 1.3885, -0.4682],
        [-2.0829, -1.0596],
        [-0.7613,  0.7447],
        [-1.9115,  0.7603],
        [ 0.3732, -0.4281],
        [-0.2055,  0.5495],
        [-1.2300,  0.1025]])

**Further reading**


  Many more Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy and Tensor
-----------------------------------

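Converting between a Torch Tensor and a NumPy array is easy.

A CPU Torch Tensor and the NumPy array it is converted to or from share the same memory, so changing one changes the other.

Converting a Torch Tensor to a NumPy array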


In [49]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [50]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array.

In [51]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [52]:
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor

In [54]:
import numpy as np

In [55]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [56]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All Tensors on the CPU support conversion to NumPy and back.
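
Memory sharing applies to `torch.from_numpy`; by contrast, `torch.tensor` copies its input. A brief sketch of the difference (the names `arr`, `shared`, and `copied` are illustrative):

```python
import numpy as np
import torch

arr = np.ones(3)
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 5.0                     # only the shared tensor sees this change

print(shared, copied)
```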

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [60]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))
    

False

In [None]:
y.to("cpu").data.numpy()
y.cpu().data.numpy()

In [None]:
model = model.cuda()
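
A common way to write code that runs with or without a GPU is to choose the device once and pass it around. The following is a sketch of that pattern, not part of the original notebook; it falls back to the CPU when CUDA is unavailable:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, device=device)     # create directly on the chosen device
y = torch.ones_like(x)                # inherits x's device
z = (x + y).to("cpu", torch.double)   # move back and change dtype in one call

# .numpy() only works on CPU tensors, so the move to "cpu" must come first.
z_np = z.numpy()

print(device, z.dtype, z_np.shape)
```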



Warm-up: a two-layer neural network in numpy
--------------------------------------------

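A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss. Using the array shapes from the code in this section (each row of $X$ is one sample):
- $h = XW_1$
- $a = \max(0, h)$
- $\hat{y} = aW_2$

This implementation uses only numpy for the forward pass, the loss, and the backward pass: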
- forward pass
- loss
- backward pass
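
For reference, the backward pass can be derived with the chain rule. The sketch below assumes the shapes used in the code, $X \in \mathbb{R}^{N \times D_{in}}$, $W_1 \in \mathbb{R}^{D_{in} \times H}$, $W_2 \in \mathbb{R}^{H \times D_{out}}$, with $h = XW_1$, $a = \max(0, h)$, $\hat{y} = aW_2$, and the loss $L$ taken as the sum of squared errors. The indicator uses $h \ge 0$ to match the code, which zeroes the gradient only where $h < 0$.

```latex
\begin{aligned}
L &= \sum_{i,j} \left(\hat{y}_{ij} - y_{ij}\right)^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2\,(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^{\top} \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}}\, W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbf{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top} \frac{\partial L}{\partial h}
\end{aligned}
```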

A numpy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.
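
Because numpy has no notion of gradients, a hand-written backward pass is easy to get wrong. One way to gain confidence is a finite-difference check. The sketch below is an illustration under stated assumptions: it uses deliberately tiny layer sizes, a fixed seed, and a hypothetical helper `loss_fn`, and it compares the analytic gradient for `w1` against central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2     # tiny sizes keep the check fast
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient with respect to w1 (manual backward pass).
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

# Central finite differences, one entry of w1 at a time.
eps = 1e-6
num_grad_w1 = np.zeros_like(w1)
for i in range(D_in):
    for j in range(H):
        w_plus, w_minus = w1.copy(), w1.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        num_grad_w1[i, j] = (loss_fn(w_plus, w2) - loss_fn(w_minus, w2)) / (2 * eps)

max_abs_diff = np.abs(grad_w1 - num_grad_w1).max()
print(max_abs_diff)
```

A small maximum difference suggests the manual gradient is consistent with the loss; the same check could be repeated for `w2`.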



In [64]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 27884200.296767063
1 22640088.482883573
2 19700057.605353158
3 16724246.083260009
4 13350994.753179666
5 9840421.182578644
6 6862769.252303395
7 4609486.733672963
8 3094497.3743887204
9 2114125.623717485
10 1498078.4119558851
11 1104373.9657159406
12 847811.7806453439
13 673128.6402630594
14 549282.1188442816
15 457619.51065383974
16 387370.33323569444
17 331752.6998531452
18 286671.9775316542
19 249429.3177774855
20 218262.56590404245
21 191877.99344866752
22 169361.88476617387
23 150034.20040185965
24 133333.22038760997
25 118829.25730351105
26 106181.4237264164
27 95119.13147604409
28 85404.63617664737
29 76858.3029379211
30 69317.42648477189
31 62643.32216774904
32 56719.08053304868
33 51452.123757564375
34 46754.20868009022
35 42556.81696509062
36 38799.364642542605
37 35429.9266461819
38 32398.839951437476
39 29668.51353053862
40 27205.611448915617
41 24978.250022122458
42 22960.66338467052
43 21130.50875543994
44 19468.977275527104
45 17957.17830825845
46 16579.793706936765
...

422 0.0001054587518991514
423 0.00010089480733376736
424 9.653032783787445e-05
425 9.235523772628476e-05
426 8.836425492176173e-05
427 8.454286068015431e-05
428 8.088800988006702e-05
429 7.739126679934971e-05
430 7.404745045995573e-05
431 7.084834157747923e-05
432 6.778755348704148e-05
433 6.48599298896743e-05
434 6.206055729434882e-05
435 5.938111058275883e-05
436 5.681833412766656e-05
437 5.4365575470968294e-05
438 5.2019119727940945e-05
439 4.977436386801127e-05
440 4.7627132257161274e-05
441 4.5572887733785296e-05
442 4.360827613769697e-05
443 4.1728007667897395e-05
444 3.992908419482027e-05
445 3.8208444227944933e-05
446 3.656145582335509e-05
447 3.4986214747250005e-05
448 3.347854781706068e-05
449 3.203630046151662e-05
450 3.065690981983372e-05
451 2.933686566801108e-05
452 2.8073746978085527e-05
453 2.6865086374005622e-05
454 2.5708587224072755e-05
455 2.4602140793655132e-05
456 2.354358919176184e-05
457 2.25307167679796e-05
458 2.1561695844968057e-05
459 2.063458963233601e-05
...


PyTorch: Tensors
----------------

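This time we use PyTorch tensors to implement the forward pass, compute the loss, and run the backward pass.

A PyTorch Tensor is very similar to a numpy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or a GPU. To compute on a GPU, move the Tensor to a CUDA device.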


In [70]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 29762046.0
1 23066852.0
2 18711686.0
3 14801243.0
4 11081590.0
5 7897110.0
6 5449245.0
7 3736755.0
8 2600430.75
9 1863506.625
10 1384443.75
11 1066823.5
12 849056.375
13 694301.3125
14 579745.6875
15 491825.8125
16 422203.4375
17 365664.1875
18 318882.1875
19 279607.6875
20 246269.28125
21 217793.21875
22 193362.53125
23 172216.5
24 153804.609375
25 137681.765625
26 123520.8125
27 111042.7734375
28 100011.8671875
29 90226.9375
30 81530.6328125
31 73786.09375
32 66875.734375
33 60690.84375
34 55149.2578125
35 50171.6015625
36 45695.19140625
37 41664.26953125
38 38026.62890625
39 34739.9609375
40 31765.595703125
41 29070.78515625
42 26625.50390625
43 24404.0390625
44 22385.072265625
45 20547.837890625
46 18877.71875
47 17354.1875
48 15966.14453125
49 14698.5810546875
50 13539.5
51 12478.2666015625
52 11506.1162109375
53 10615.9287109375
54 9800.8994140625
55 9052.6728515625
56 8365.7060546875
57 7733.314453125
58 7152.142578125
59 6617.36279296875
60 6124.8115234375
61 5671.1171875
...

403 9.213548037223518e-05
404 9.012558439280838e-05
405 8.867125143297017e-05
406 8.695671567693353e-05
407 8.490754407830536e-05
408 8.312655700137839e-05
409 8.103609434328973e-05
410 7.975030894158408e-05
411 7.847649976611137e-05
412 7.696000102441758e-05
413 7.55072760512121e-05
414 7.409827230731025e-05
415 7.291947258636355e-05
416 7.175814243964851e-05
417 7.038572221063077e-05
418 6.912941171322018e-05
419 6.782382115488872e-05
420 6.693716568406671e-05
421 6.592227146029472e-05
422 6.481944001279771e-05
423 6.370364280883223e-05
424 6.258935172809288e-05
425 6.171658606035635e-05
426 6.061311432858929e-05
427 5.965070522506721e-05
428 5.834892363054678e-05
429 5.745037560700439e-05
430 5.65473637834657e-05
431 5.5780627008061856e-05
432 5.494891956914216e-05
433 5.410148878581822e-05
434 5.3175201173871756e-05
435 5.213084659771994e-05
436 5.159818465472199e-05
437 5.078062531538308e-05
438 4.9985017540166155e-05
439 4.9173006118508056e-05
440 4.8425557906739414e-05
...

A simple autograd example

In [72]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

# dy/dw = x = 1
print(w.grad)
# dy/dx = w = 2
print(x.grad)
# dy/db = 1
print(b.grad)


tensor(1.)
tensor(2.)
tensor(1.)
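
One behavior worth knowing before writing a training loop: `backward()` adds to `.grad` rather than overwriting it. The sketch below (with an illustrative scalar `w`) shows the accumulation and the reset:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # d(3w)/dw = 3

(3 * w).backward()
second = w.grad.item()     # gradients accumulate: 3 + 3 = 6

w.grad.zero_()             # reset before the next backward pass
third = w.grad.item()

print(first, second, third)
```

This is why training loops clear the gradients once per iteration.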



PyTorch: Tensors and autograd
-------------------------------

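One of PyTorch's key features is autograd: once the forward pass is defined and the loss is computed, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after a backward pass ``x.grad`` is another Tensor holding the gradient of some scalar (often the loss) with respect to ``x``.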
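
To see that autograd reproduces a hand-derived backward pass, the sketch below computes both for a small two-layer ReLU network and compares them. The reduced sizes, the fixed seed, and the tolerances are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 6, 4, 3
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Autograd: the forward pass records the graph, backward() fills .grad.
h = x.mm(w1)
y_pred = h.clamp(min=0).mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual gradients, computed without recording a graph.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.clamp(min=0).t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

match_w1 = torch.allclose(w1.grad, grad_w1, rtol=1e-4, atol=1e-4)
match_w2 = torch.allclose(w2.grad, grad_w2, rtol=1e-4, atol=1e-4)
print(match_w1, match_w2)
```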


In [104]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 30471860.0
1 26228534.0
2 23250712.0
3 19323444.0
4 14492299.0
5 9863642.0
6 6322839.0
7 3993820.5
8 2593148.75
9 1772902.75
10 1285633.625
11 982769.125
12 783322.875
13 643465.75
14 539724.875
15 459373.625
16 395244.0
17 342868.28125
18 299300.59375
19 262557.96875
20 231325.5
21 204574.59375
22 181519.578125
23 161545.640625
24 144163.359375
25 128993.734375
26 115730.0859375
27 104073.953125
28 93788.8828125
29 84687.84375
30 76613.078125
31 69429.625
32 63025.64453125
33 57302.21875
34 52180.94921875
35 47590.734375
36 43471.87890625
37 39764.48046875
38 36424.37890625
39 33403.87109375
40 30667.91796875
41 28185.20703125
42 25927.62109375
43 23873.6484375
44 22001.40234375
45 20293.447265625
46 18732.90625
47 17305.96484375
48 16000.0126953125
49 14803.1748046875
50 13705.9853515625
51 12698.6142578125
52 11772.5693359375
53 10921.0810546875
54 10137.337890625
55 9415.0673828125
56 8749.2607421875
57 8135.1337890625
58 7568.26171875
59 7044.49169921875
60 6560.39404296875
...

426 0.0005005761049687862
427 0.0004886656533926725
428 0.0004766747879330069
429 0.00046420813305303454
430 0.0004530760634224862
431 0.00044140161480754614
432 0.00043154825107194483
433 0.00042091053910553455
434 0.00041044564568437636
435 0.0004003559588454664
436 0.000391183712054044
437 0.0003828379267361015
438 0.00037331454223021865
439 0.0003651838924270123
440 0.0003571343549992889
441 0.00034884311025962234
442 0.00034054124262183905
443 0.0003324503777548671
444 0.00032492668833583593
445 0.0003185669193044305
446 0.00031215840135701
447 0.0003047320060431957
448 0.000297665799735114
449 0.00029143612482585013
450 0.0002847549912985414
451 0.0002797916531562805
452 0.0002737375907599926
453 0.0002680768957361579
454 0.0002621157036628574
455 0.0002568378404248506
456 0.0002514065126888454
457 0.0002464077842887491
458 0.00024145790666807443
459 0.0002364566025789827
460 0.00023189335479401052
461 0.0002266272495035082
462 0.00022262055426836014
463 0.00021841854322701693
...


PyTorch: nn
-----------


This time we use PyTorch's nn package to build the network.
Autograd still builds the computation graph,
and PyTorch computes the gradients for us automatically.




In [114]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ w_1^T, no bias term
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()

0 28937890.0
1 23708218.0
2 25900276.0
3 31564730.0
4 36221696.0
5 34548928.0
6 25147132.0
7 13910646.0
8 6474728.0
9 3005202.25
10 1618558.5
11 1055975.625
12 795853.375
13 649322.4375
14 550674.4375
15 475871.40625
16 415458.09375
17 365069.0
18 322357.28125
19 285773.0625
20 254263.21875
21 226923.859375
22 203099.890625
23 182238.15625
24 163887.015625
25 147698.84375
26 133395.546875
27 120695.09375
28 109397.8125
29 99327.9296875
30 90320.203125
31 82259.09375
32 75033.1796875
33 68532.6640625
34 62677.9140625
35 57397.265625
36 52616.3359375
37 48279.73046875
38 44344.37109375
39 40770.19140625
40 37517.8203125
41 34553.5859375
42 31851.201171875
43 29383.748046875
44 27127.37109375
45 25062.505859375
46 23171.8984375
47 21438.802734375
48 19848.447265625
49 18388.103515625
50 17046.625
51 15812.09765625
52 14676.0859375
53 13629.376953125
54 12664.22265625
55 11773.703125
56 10951.4765625
57 10192.1376953125
58 9490.6220703125
59 8841.31640625
60 8241.046875
61 7685.80126953125

415 0.0009464324684813619
416 0.0009199553169310093
417 0.0008938985411077738
418 0.0008681747131049633
419 0.0008440786623395979
420 0.0008184001198969781
421 0.0007966457051225007
422 0.0007744226022623479
423 0.0007520546205341816
424 0.0007311642402783036
425 0.0007106095436029136
426 0.0006915915291756392
427 0.0006718204822391272
428 0.0006538843153975904
429 0.0006378090474754572
430 0.0006195709574967623
431 0.000604025786742568
432 0.000588531605899334
433 0.0005726809613406658
434 0.0005578958080150187
435 0.0005443562404252589
436 0.0005287613021209836
437 0.0005157238338142633
438 0.0005042870179750025
439 0.0004913591546937823
440 0.0004785044293384999
441 0.0004674119991250336
442 0.00045650688116438687
443 0.00044453126611188054
444 0.0004345918423496187
445 0.00042317615589126945
446 0.00041341138421557844
447 0.0004029229166917503
448 0.0003939396410714835
449 0.0003840120625682175
450 0.000375825387891382
451 0.0003669420548249036
452 0.0003586196980904788
453 0.00035

In [113]:
model[0].weight

Parameter containing:
tensor([[-0.0218,  0.0212,  0.0243,  ...,  0.0230,  0.0247,  0.0168],
        [-0.0144,  0.0177, -0.0221,  ...,  0.0161,  0.0098, -0.0172],
        [ 0.0086, -0.0122, -0.0298,  ..., -0.0236, -0.0187,  0.0295],
        ...,
        [ 0.0266, -0.0008, -0.0141,  ...,  0.0018,  0.0319, -0.0129],
        [ 0.0296, -0.0005,  0.0115,  ...,  0.0141, -0.0088, -0.0106],
        [ 0.0289, -0.0077,  0.0239,  ..., -0.0166, -0.0156, -0.0235]],
       requires_grad=True)
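
Note that `nn.Linear` stores its weight with shape `(out_features, in_features)` and applies it as `x @ weight.T`. The sketch below checks this on a standalone layer; the name `layer`, the seed, and the tolerance are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(1000, 100, bias=False)
x = torch.randn(64, 1000)

weight_shape = tuple(layer.weight.shape)     # (out_features, in_features)
same = torch.allclose(layer(x), x.mm(layer.weight.t()), atol=1e-4)
n_params = sum(p.numel() for p in layer.parameters())

print(weight_shape, same, n_params)
```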


PyTorch: optim
--------------

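This time we no longer update the model weights by hand; instead we use the optim package to update the parameters.
The optim package provides a variety of optimization methods, including SGD with momentum, RMSProp, Adam, and others.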
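
For plain SGD without momentum, one optimizer step amounts to the manual update `param -= lr * param.grad`. The sketch below compares the two on a toy parameter; the names `w_manual` and `w_optim` and the quadratic loss are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
lr = 0.1

# Manual update
loss = (w_manual ** 2).sum()
loss.backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step through the optimizer
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()
(w_optim ** 2).sum().backward()
optimizer.step()

same_result = torch.allclose(w_manual, w_optim)
print(same_result)
```

Optimizers such as Adam keep extra per-parameter state, so their steps do not reduce to this one-line update.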


In [118]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ w_1^T, no bias term
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 24436214.0
1 20115276.0
2 18840850.0
3 18223790.0
4 17027580.0
5 14675071.0
6 11567663.0
7 8366201.5
8 5720385.0
9 3799774.75
10 2535152.0
11 1735162.5
12 1236944.25
13 922564.25
14 718680.1875
15 580795.125
16 483214.65625
17 410988.375
18 355345.84375
19 310880.21875
20 274325.625
21 243722.59375
22 217734.828125
23 195305.78125
24 175787.5625
25 158686.078125
26 143600.5
27 130217.0390625
28 118322.4765625
29 107720.890625
30 98256.671875
31 89756.2734375
32 82104.359375
33 75197.3125
34 68949.78125
35 63292.28515625
36 58161.140625
37 53495.71484375
38 49262.35546875
39 45406.51171875
40 41886.3671875
41 38671.0625
42 35729.078125
43 33036.390625
44 30567.708984375
45 28301.845703125
46 26222.076171875
47 24308.93359375
48 22548.6953125
49 20927.591796875
50 19433.642578125
51 18058.23046875
52 16788.662109375
53 15616.2177734375
54 14533.13671875
55 13530.798828125
56 12604.1884765625
57 11745.923828125
58 10950.625
59 10213.337890625
60 9529.8671875
61 8895.59375
62 8306.091796

438 0.0002393243630649522
439 0.00023363585933111608
440 0.00022893572167959064
441 0.00022382299357559532
442 0.00021802390983793885
443 0.0002135633840225637
444 0.00020877565839327872
445 0.00020438502542674541
446 0.000200132533791475
447 0.00019636568322312087
448 0.00019197550136595964
449 0.00018845757585950196
450 0.00018516821728553623
451 0.0001812220725696534
452 0.00017768006364349276
453 0.00017394236056134105
454 0.00017036692588590086
455 0.0001669702905928716
456 0.000163633594638668
457 0.0001608784223208204
458 0.00015729425649624318
459 0.00015425201854668558
460 0.0001512485760031268
461 0.00014837279741186649
462 0.00014571723295375705
463 0.00014315942826215178
464 0.0001404505455866456
465 0.00013795308768749237
466 0.00013533096353057772
467 0.00013275298988446593
468 0.0001303627504967153
469 0.00012791437620762736
470 0.00012587543460540473
471 0.00012379918189253658
472 0.000121756260341499
473 0.00011986290337517858
474 0.00011718282621586695
475 0.000115406


PyTorch: Custom nn Modules
--------------------------

We can define a model as a subclass of nn.Module. When a model is more complex than what Sequential can express, define it as a custom nn.Module.
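
As an example of something a plain Sequential cannot express, consider a layer whose input is reused after the layer is applied. The `ResidualBlock` class below is a hypothetical illustration, not part of the original notebook:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical example: a linear layer with a skip connection."""

    def __init__(self, dim):
        super().__init__()
        # Submodules assigned as attributes are registered automatically.
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        # The input x is added back after the layer, which needs custom forward logic.
        return x + torch.relu(self.linear(x))

block = ResidualBlock(16)
out = block(torch.randn(4, 16))
param_names = [name for name, _ in block.named_parameters()]

print(out.shape, param_names)
```

Because the submodule is registered on assignment, `block.parameters()` can be passed to an optimizer just like the parameters of a Sequential model.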



In [122]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 675.5318603515625
1 659.0227661132812
2 642.9276123046875
3 627.2772216796875
4 612.0651245117188
5 597.2562255859375
6 582.8500366210938
7 568.887451171875
8 555.327880859375
9 542.1878662109375
10 529.456787109375
11 517.1315307617188
12 505.17462158203125
13 493.5650329589844
14 482.40338134765625
15 471.6194152832031
16 461.161865234375
17 450.9877624511719
18 441.0876159667969
19 431.423828125
20 421.9713134765625
21 412.79229736328125
22 403.83502197265625
23 395.1403503417969
24 386.673583984375
25 378.3887939453125
26 370.2898254394531
27 362.3497009277344
28 354.5650634765625
29 346.9619140625
30 339.5443115234375
31 332.314208984375
32 325.2721252441406
33 318.3789978027344
34 311.65435791015625
35 305.1058349609375
36 298.7152099609375
37 292.43585205078125
38 286.27581787109375
39 280.24542236328125
40 274.3594055175781
41 268.6078796386719
42 262.9523010253906
43 257.3995666503906
44 251.9408721923828
45 246.5938262939453
46 241.3463592529297
47 236.2169647216797
...

360 0.0006928329239599407
361 0.0006594658480025828
362 0.0006276500644162297
363 0.0005973356892354786
364 0.0005684461793862283
365 0.0005409115692600608
366 0.0005146677722223103
367 0.0004896665341220796
368 0.0004658424004446715
369 0.0004431433626450598
370 0.0004215217486489564
371 0.00040092665585689247
372 0.00038129440508782864
373 0.0003626033430919051
374 0.00034480434260331094
375 0.00032785360235720873
376 0.0003117044398095459
377 0.0002963263541460037
378 0.000281691609416157
379 0.00026776010054163635
380 0.000254485901677981
381 0.00024186100927181542
382 0.00022984233510214835
383 0.0002183937467634678
384 0.0002075097436318174
385 0.00019714680092874914
386 0.00018728163558989763
387 0.00017789709090720862
388 0.00016896944725885987
389 0.00016047449025791138
390 0.00015239401545841247
391 0.00014470981841441244
392 0.00013740228314418346
393 0.0001304488250752911
394 0.00012383922876324505
395 0.00011755021841963753
396 0.00011157765402458608
397 0.0001058938141795