# Lesson 1


What is PyTorch?
================

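PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can use GPUs
- It can be used to define deep learning models and to train and run them flexibly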

Tensors
---------------


Tensors are similar to NumPy's ndarrays. The main difference is that Tensors can also run on a GPU to accelerate computation.


In [39]:
import torch

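Construct an uninitialized 5x3 matrix (``torch.empty`` only allocates memory, so the values are whatever that memory already held; they happen to be zeros here):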

In [40]:
x = torch.empty(5,3)
x

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

Construct a randomly initialized matrix:

In [41]:
x = torch.rand(5,3)
x

tensor([[0.9714, 0.0522, 0.4063],
        [0.3002, 0.3822, 0.0254],
        [0.1797, 0.9057, 0.9375],
        [0.1087, 0.8963, 0.7988],
        [0.3210, 0.0283, 0.5349]])

Construct a matrix filled with zeros, with dtype ``long``:

In [42]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [43]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [44]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

You can also create a tensor from an existing tensor. These methods reuse properties of the input tensor, such as its dtype, unless new values are provided.

In [45]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [46]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[-1.8272,  0.9588,  2.1063],
        [-0.0970, -1.6087, -0.0687],
        [ 1.1100,  1.1531,  0.3937],
        [ 0.1916,  0.4667, -0.4913],
        [ 1.1645, -1.0029,  0.3560]])
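A minimal sketch of the inheritance rule behind ``new_ones`` and ``randn_like`` above, using ``new_zeros``, ``zeros_like`` and ``rand_like`` (the tensor names are illustrative): the ``new_*`` methods take a new shape but keep the dtype and device of the source tensor, while the ``*_like`` functions copy its shape as well, and both accept an explicit ``dtype`` override.

```python
import torch

# A float64 tensor on the CPU used as the "template"
base = torch.ones(2, 3, dtype=torch.float64)

# new_zeros: new shape, but dtype/device are inherited from `base` unless overridden
a = base.new_zeros(4)
b = base.new_zeros(4, dtype=torch.int32)

# *_like: copies the shape (and dtype/device) of the argument; dtype can be overridden
c = torch.zeros_like(base)
d = torch.rand_like(base, dtype=torch.float32)

print(a.dtype, b.dtype, c.shape, d.dtype)
```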

Get the tensor's shape:

In [47]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a subclass of ``tuple``, so it supports all tuple operations.</p></div>
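Because ``torch.Size`` is a tuple, the usual tuple operations such as unpacking, ``len`` and negative indexing work on it directly. A short sketch (variable names are illustrative):

```python
import torch

x = torch.rand(5, 3)
size = x.shape           # the same value as x.size()

# torch.Size subclasses tuple, so tuple operations work on it
rows, cols = size        # unpacking
print(isinstance(size, tuple), rows, cols, len(size), size[-1])
```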

Operations
----------

There are many kinds of tensor operations. Let's start with addition.



In [48]:
y = torch.rand(5,3)
y

tensor([[0.5740, 0.2321, 0.2448],
        [0.7973, 0.5477, 0.0486],
        [0.6712, 0.0246, 0.2956],
        [0.7044, 0.3786, 0.4191],
        [0.0285, 0.4063, 0.0399]])

In [49]:
x + y

tensor([[-1.2532,  1.1909,  2.3511],
        [ 0.7004, -1.0610, -0.0201],
        [ 1.7812,  1.1777,  0.6893],
        [ 0.8960,  0.8453, -0.0722],
        [ 1.1929, -0.5966,  0.3959]])

Another way to write addition:


In [50]:
torch.add(x, y)

tensor([[-1.2532,  1.1909,  2.3511],
        [ 0.7004, -1.0610, -0.0201],
        [ 1.7812,  1.1777,  0.6893],
        [ 0.8960,  0.8453, -0.0722],
        [ 1.1929, -0.5966,  0.3959]])

Addition: providing an output tensor as an argument

In [51]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[-1.2532,  1.1909,  2.3511],
        [ 0.7004, -1.0610, -0.0201],
        [ 1.7812,  1.1777,  0.6893],
        [ 0.8960,  0.8453, -0.0722],
        [ 1.1929, -0.5966,  0.3959]])

In-place addition (adds ``x`` to ``y``):

In [52]:
y.add_(x)
y

tensor([[-1.2532,  1.1909,  2.3511],
        [ 0.7004, -1.0610, -0.0201],
        [ 1.7812,  1.1777,  0.6893],
        [ 0.8960,  0.8453, -0.0722],
        [ 1.1929, -0.5966,  0.3959]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
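One way to see the difference between an in-place method and its out-of-place counterpart is to compare storage addresses with ``data_ptr()``. This is a minimal sketch with illustrative names: ``add`` allocates a new tensor, while ``add_`` writes into the existing storage.

```python
import torch

a = torch.zeros(3)
ptr = a.data_ptr()       # address of a's storage before any operation

b = a.add(1)             # out-of-place: returns a new tensor, a is unchanged
a.add_(1)                # in-place: modifies a's storage directly

print(a, b)
print(a.data_ptr() == ptr, b.data_ptr() == ptr)
```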

Any NumPy-style indexing works on PyTorch tensors as well.


In [53]:
x[1:, 1:]

tensor([[-1.6087, -0.0687],
        [ 1.1531,  0.3937],
        [ 0.4667, -0.4913],
        [-1.0029,  0.3560]])
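Besides slicing, boolean masks work the same way as in NumPy, including assignment through a mask. The manual backward passes later in this lesson rely on exactly this in ``grad_h[h < 0] = 0``. A small sketch with made-up values:

```python
import torch

t = torch.tensor([[1., -2., 3.],
                  [-4., 5., -6.]])

col = t[:, 1]            # second column
block = t[1:, 1:]        # bottom-right 1x2 block
mask = t < 0             # boolean tensor with the same shape as t
t[mask] = 0              # in-place assignment through the mask
print(col, block, t)
```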

Resizing: to resize/reshape a tensor, use ``Tensor.view`` (a size of ``-1`` is inferred from the other dimensions):

In [54]:
x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
z

tensor([[-0.5236, -0.6079, -0.0589, -1.0521,  1.2506,  1.1236,  0.0346,  0.4248],
        [ 0.4836,  0.3078, -2.1852,  0.3522, -1.5902, -0.1560, -0.5163,  0.3367]])
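Two properties of ``view`` are worth checking: the result shares storage with the original tensor, and it only works when the requested shape is compatible with the tensor's memory layout. This sketch (names are illustrative) shows that a transposed tensor cannot be flattened with ``view``, while ``reshape`` falls back to copying when it has to.

```python
import torch

t = torch.arange(6)          # tensor([0, 1, 2, 3, 4, 5])
v = t.view(2, 3)             # shares storage with t
w = t.view(-1, 2)            # -1 is inferred: 6 / 2 = 3 rows

v[0, 0] = 100                # writing through the view also changes t
print(t[0].item(), w.shape)

tt = v.t()                   # the transpose is not contiguous in memory
view_failed = False
try:
    tt.view(6)               # incompatible strides, so view raises
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = tt.reshape(6)         # reshape copies the data when a view is impossible
print(flat)
```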

If you have a one-element tensor, use ``.item()`` to get its value as a Python number.

In [55]:
x = torch.randn(1)
x

tensor([-0.0135])

In [56]:
x.item()

-0.013530677184462547
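``.item()`` only works on tensors with exactly one element. A short sketch of the one-element case and of ``tolist()``, which is the usual way to get Python values out of a larger tensor (the exception type is caught broadly here as a precaution):

```python
import torch

s = torch.tensor([2.5])
value = s.item()                   # a plain Python float
print(value, type(value))

v = torch.tensor([1.0, 2.0])
item_failed = False
try:
    v.item()                       # two elements: cannot become one Python number
except (RuntimeError, ValueError) as err:
    item_failed = True
    print("item() failed:", err)

print(v.tolist())                  # nested Python lists for multi-element tensors
```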

In [57]:
z.transpose(1,0)

tensor([[-0.5236,  0.4836],
        [-0.6079,  0.3078],
        [-0.0589, -2.1852],
        [-1.0521,  0.3522],
        [ 1.2506, -1.5902],
        [ 1.1236, -0.1560],
        [ 0.0346, -0.5163],
        [ 0.4248,  0.3367]])

**Further reading**


  Tensor operations of all kinds, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random number generation, are documented at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy and Tensors
------------

Converting a Torch Tensor to a NumPy array and vice versa is easy.

A Torch Tensor on the CPU and the NumPy array converted from it share the same underlying memory, so changing one also changes the other.

Converting a Torch Tensor to a NumPy array:


In [58]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [59]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array:

In [60]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [61]:
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor:

In [62]:
import numpy as np

In [63]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [64]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
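Memory sharing depends on how the tensor is created: ``torch.from_numpy`` wraps the array's existing buffer, whereas ``torch.tensor`` always copies the data. A minimal sketch with illustrative names:

```python
import numpy as np
import torch

a = np.zeros(3)
shared = torch.from_numpy(a)   # wraps a's memory; float64 dtype is kept
copied = torch.tensor(a)       # independent copy of the data

a[0] = 5.0                     # modify the NumPy array only
print(shared, copied)
```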

Tensors on the CPU can be converted to NumPy arrays and back.
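In practice there are two preconditions for ``.numpy()``: the tensor must not be tracking gradients, and it must live on the CPU. A sketch of the common ``detach().cpu().numpy()`` chain (the name ``w`` is illustrative; ``requires_grad`` is covered in the autograd sections below):

```python
import torch

w = torch.ones(3, requires_grad=True)
numpy_failed = False
try:
    w.numpy()                      # refused: w is part of an autograd graph
except RuntimeError as err:
    numpy_failed = True
    print("numpy() failed:", err)

arr = w.detach().cpu().numpy()     # leave the graph, ensure CPU, then convert
print(arr)
```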

CUDA Tensors
------------

Tensors can be moved onto another device with the ``.to`` method.
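A common device-agnostic pattern, sketched below under the assumption that the same code should run with or without a GPU, is to choose the device once and pass it to ``.to`` or to the tensor constructors. It runs unchanged on a CPU-only build like the one used in this notebook.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 2, device=device)        # allocate directly on the chosen device
y = torch.randn(2, 2).to(device)           # or move an existing tensor there
z = (x + y).to("cpu", torch.float64)       # move back and change dtype in one call
print(device, z.device, z.dtype)
```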



In [65]:
torch.cuda.is_available()

False

In [66]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))
    

In [67]:
y.to("cpu").detach().numpy()
y.cpu().detach().numpy()  # equivalent; .detach() is preferred over the older .data

array([-0.5235799 , -0.6078731 , -0.05886782, -1.0520711 ,  1.2506382 ,
        1.1235696 ,  0.0345683 ,  0.4247931 ,  0.48359656,  0.30780393,
       -2.1852088 ,  0.3522228 , -1.5902361 , -0.15602021, -0.51632303,
        0.33669293], dtype=float32)

In [68]:
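model = model.cuda()  # moves a network (defined later in this notebook) to the GPU; this PyTorch build has no CUDA support, so it raises the error below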


AssertionError: Torch not compiled with CUDA enabled


Warm-up: a two-layer neural network in NumPy
--------------

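A fully connected ReLU network with one hidden layer and no biases, trained to predict y from x with an L2 (sum of squared errors) loss:
- $h = XW_1$
- $a = \max(0, h)$
- $\hat{y} = aW_2$
- $L = \sum (\hat{y} - y)^2$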
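Writing out the chain rule makes the backward pass easier to follow. Using the row-vector convention of the NumPy code in this section (``h = x.dot(w1)``, i.e. $h = XW_1$) and the sum-of-squares loss, each line below corresponds to one gradient computed in that code (``grad_y_pred``, ``grad_w2``, ``grad_h_relu``, ``grad_h``, ``grad_w1``):

$$
\begin{aligned}
L &= \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \geq 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$

Here $\odot$ is elementwise multiplication and $\mathbb{1}[h \geq 0]$ is 1 where $h \geq 0$ and 0 elsewhere, which is what ``grad_h[h<0] = 0`` implements; the ReLU derivative at exactly $h = 0$ is a convention.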

This implementation uses NumPy alone to compute the forward pass, the loss, and the backward pass:
- forward pass
- loss
- backward pass

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.



In [36]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 25250135.666233297
1 20035482.65923338
2 19395156.342213914
3 20498457.692693073
4 21453789.07256744
5 20550022.91859851
6 17351228.051822092
7 12691225.0329407
8 8267591.046402751
9 4990616.134062901
10 2960882.807282765
11 1803726.760755447
12 1167343.8542573187
13 812226.4080807143
14 605708.7824650104
15 477405.2930492705
16 391650.8994912789
17 329990.50758504355
18 282978.19241310464
19 245497.6694457304
20 214767.65185366786
21 189044.56297221687
22 167167.34611031177
23 148370.3584758169
24 132118.28005731734
25 117990.27353807155
26 105644.2091202003
27 94806.40095581808
28 85284.42428654662
29 76890.91761835765
30 69458.0291094233
31 62859.88110601035
32 56989.026111342566
33 51755.37487967838
34 47083.31835065319
35 42917.0186666597
36 39177.66633741387
37 35822.30004326107
38 32805.66772584387
39 30082.271674786905
40 27622.547575182303
41 25393.906453521755
42 23371.975613206916
43 21534.68076289299
44 19862.407800725683
45 18338.652064708764
46 16948.117102530698
...

481 3.1279000111310765e-06
482 2.979899079059689e-06
483 2.838925537345951e-06
484 2.7046689076290385e-06
485 2.5767147533512826e-06
486 2.4548283671808465e-06
487 2.338726521756138e-06
488 2.2281013759290303e-06
489 2.122710668845422e-06
490 2.0223271540341034e-06
491 1.9267177959244026e-06
492 1.8355975060772446e-06
493 1.7488106653686373e-06
494 1.6661241790200382e-06
495 1.587341461202875e-06
496 1.512295249182036e-06
497 1.4408020199678472e-06
498 1.372703156701728e-06
499 1.3078093888013226e-06
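A falling loss does not prove that hand-derived gradients are right. A standard sanity check is to compare them with central finite differences on a tiny problem. The sketch below reuses the forward and backward code from the loop above inside a hypothetical helper ``loss_and_grads``; ``numerical_grad`` is likewise a helper written for this illustration, not a NumPy function.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    # Forward pass, same as the training loop above
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()
    # Backward pass, same as the training loop above
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numerical_grad(loss_fn, w, eps=1e-6):
    # Central differences: perturb one entry of w at a time, in place
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        loss_plus = loss_fn()
        w[idx] = old - eps
        loss_minus = loss_fn()
        w[idx] = old  # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# A tiny random problem so the check runs instantly
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
y = rng.standard_normal((4, 2))
w1 = rng.standard_normal((5, 3))
w2 = rng.standard_normal((3, 2))

_, g1, g2 = loss_and_grads(x, y, w1, w2)
n1 = numerical_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w1)
n2 = numerical_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w2)
print(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
```

Both maximum differences should be tiny; a large mismatch would point to a bug in one of the ``grad_*`` lines.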



PyTorch: Tensors
----------------

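This time we use PyTorch tensors to implement the forward pass, compute the loss, and perform the backward pass by hand.

A PyTorch Tensor is conceptually very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run its computations on either the CPU or the GPU. To run on the GPU, the Tensor has to be moved to a CUDA device (for example with ``.to("cuda")``).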
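The PyTorch loop that follows is a line-by-line port of the NumPy one, mostly with renamed methods. This sketch (variable names are illustrative) checks on random data that ``dot``/``mm``, ``np.maximum``/``clamp``, ``.T``/``.t()`` and ``.copy()``/``.clone()`` give the same values.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
x_np = rng.standard_normal((3, 4))
w_np = rng.standard_normal((4, 2))

# The same data seen from PyTorch (from_numpy keeps float64 and shares memory)
x_t = torch.from_numpy(x_np)
w_t = torch.from_numpy(w_np)

h_np = np.maximum(x_np.dot(w_np), 0)     # NumPy: dot + maximum
h_t = x_t.mm(w_t).clamp(min=0)           # PyTorch: mm + clamp

g_np = h_np.T.copy()                     # NumPy: .T and .copy()
g_t = h_t.t().clone()                    # PyTorch: .t() and .clone()

print(np.allclose(h_np, h_t.numpy()), np.allclose(g_np, g_t.numpy()))
```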


In [27]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 30219848.0
1 29508568.0
2 35702392.0
3 43014136.0
4 43332020.0
5 32335092.0
6 17288746.0
7 7406249.0
8 3185697.5
9 1677055.75
10 1111566.75
11 851699.375
12 698789.375
13 590800.25
14 506789.6875
15 438368.3125
16 381520.1875
17 333751.5625
18 293263.125
19 258697.03125
20 229071.40625
21 203561.84375
22 181484.15625
23 162319.5625
24 145603.515625
25 130948.1875
26 118059.609375
27 106685.46875
28 96612.265625
29 87666.578125
30 79696.7734375
31 72582.40625
32 66216.40625
33 60506.89453125
34 55378.921875
35 50762.65234375
36 46598.59765625
37 42838.36328125
38 39433.359375
39 36343.90234375
40 33538.41796875
41 30984.85546875
42 28658.96875
43 26537.32421875
44 24599.21875
45 22825.16015625
46 21200.474609375
47 19709.80078125
48 18341.154296875
49 17082.43359375
50 15923.888671875
51 14855.802734375
52 13871.98046875
53 12963.5283203125
54 12123.82421875
55 11347.685546875
56 10629.3046875
57 9962.6376953125
58 9343.951171875
59 8769.45703125
60 8235.6171875
61 7738.98583984375
...

A simple autograd example:

In [70]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

# dy/dw = x = 1, dy/dx = w = 2, dy/db = 1
print(w.grad)
print(x.grad)
print(b.grad)


tensor(1.)
tensor(2.)
tensor(1.)



PyTorch: Tensors and autograd
-------------------------------

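An important feature of PyTorch is autograd: once the forward pass is defined and the loss is computed, PyTorch automatically computes the gradients of the loss with respect to all parameters of the model.

In PyTorch, a Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after ``backward()`` is called on a scalar (usually the loss), ``x.grad`` is another Tensor holding the gradient of that scalar with respect to ``x``.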
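One detail matters for the training loop that follows: ``backward()`` adds into ``.grad`` instead of overwriting it. This small sketch (names are illustrative) calls ``backward()`` twice and shows the accumulation, which is why the loop zeroes the gradients after every update.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

for _ in range(2):
    y = w * x          # a fresh graph is built each iteration
    y.backward()       # dy/dw = x = 3 is added to w.grad

accumulated = w.grad.item()   # 3 + 3 = 6, not 3
w.grad.zero_()                # reset before the next step
print(accumulated, w.grad.item())
```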


In [29]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
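    with torch.no_grad():  # don't record the weight updates in the computation graph
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # reset the gradients; backward() accumulates into .grad
        w1.grad.zero_()
        w2.grad.zero_()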

0 29354896.0
1 24517342.0
2 23098660.0
3 21776690.0
4 18776608.0
5 14388392.0
6 9773254.0
7 6150140.5
8 3750491.5
9 2335122.0
10 1530115.25
11 1070440.125
12 796859.125
13 623912.625
14 506807.6875
15 422399.34375
16 358217.53125
17 307447.03125
18 266129.71875
19 231866.9375
20 203106.65625
21 178726.25
22 157908.5625
23 139978.234375
24 124472.609375
25 111012.34375
26 99272.078125
27 88987.8515625
28 79957.0703125
29 72004.7890625
30 64970.5625
31 58732.59375
32 53187.8203125
33 48237.8359375
34 43817.34765625
35 39862.5625
36 36323.54296875
37 33144.28515625
38 30286.435546875
39 27711.294921875
40 25387.345703125
41 23285.998046875
42 21382.9140625
43 19656.9921875
44 18089.76953125
45 16665.10546875
46 15367.9169921875
47 14185.068359375
48 13105.7939453125
49 12119.927734375
50 11217.875
51 10392.5107421875
52 9636.1474609375
53 8942.2353515625
54 8304.939453125
55 7718.8466796875
56 7179.578125
57 6682.931640625
58 6225.07421875
59 5802.7900390625
60 5412.96484375
...

406 0.0037968403194099665
407 0.0036776720080524683
408 0.0035583730787038803
409 0.003447328694164753
410 0.0033431658521294594
411 0.0032393462024629116
412 0.0031409154180437326
413 0.0030428224708884954
414 0.00294701405800879
415 0.002855692058801651
416 0.002770806197077036
417 0.0026898325886577368
418 0.002609414979815483
419 0.0025315640959888697
420 0.00245482986792922
421 0.0023787319660186768
422 0.0023077602963894606
423 0.002241068985313177
424 0.0021750808227807283
425 0.002108415588736534
426 0.002047429559752345
427 0.0019872747361660004
428 0.001931672915816307
429 0.001877663074992597
430 0.001823310973122716
431 0.0017727430677041411
432 0.0017218637512996793
433 0.001673052553087473
434 0.0016258037649095058
435 0.00158192147500813
436 0.0015385765582323074
437 0.0014957452658563852
438 0.0014537353999912739
439 0.0014136192621663213
440 0.0013773508835583925
441 0.001339772716164589
442 0.0013021482154726982
443 0.0012682689120993018
444 0.0012367113959044218
...
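To confirm that autograd computes the same gradients as the hand-written backward pass of the "PyTorch: Tensors" section, both can be run on one small random problem. A minimal sketch: the sizes are shrunk and float64 is used so that rounding differences stay negligible.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 4, 5, 3, 2
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Hand-written backward pass (no graph needed)
with torch.no_grad():
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# The same loss, differentiated by autograd
loss = (x.mm(w1).clamp(min=0).mm(w2) - y).pow(2).sum()
loss.backward()

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```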


PyTorch: nn
-----------


This time we use PyTorch's ``nn`` package to build the network.
PyTorch autograd still builds the computation graph and
computes the gradients for us automatically.
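The ``nn`` version is meant to reproduce the manual network, so it helps to check two conventions on a small example (names are illustrative): ``nn.Linear`` stores its weight as ``(out_features, in_features)`` and computes ``x @ W.T``, and ``nn.MSELoss(reduction="sum")`` is the same sum of squared errors used in the manual loops.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(5, 3, bias=False)
x = torch.randn(4, 5)
y = torch.randn(4, 3)

# Weight shape is (out_features, in_features); the layer computes x @ W^T
print(lin.weight.shape)
same_linear = torch.allclose(lin(x), x @ lin.weight.t())

# Summed MSE equals the hand-written sum of squared errors
loss_fn = nn.MSELoss(reduction="sum")
same_loss = torch.allclose(loss_fn(lin(x), y), (lin(x) - y).pow(2).sum())
print(same_linear, same_loss)
```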




In [72]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # computes x @ W_1^T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update all model parameters with a gradient descent step
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()

0 35895356.0
1 32334904.0
2 31158580.0
3 27401592.0
4 20375792.0
5 12765371.0
6 7185037.0
7 4005996.25
8 2391291.25
9 1581230.625
10 1147521.625
11 890492.625
12 720503.625
13 597890.6875
14 503921.6875
15 429212.0625
16 368369.0625
17 318054.1875
18 276031.78125
19 240632.78125
20 210603.15625
21 184979.375
22 163027.921875
23 144131.875
24 127809.921875
25 113655.4296875
26 101335.1171875
27 90577.5078125
28 81157.125
29 72884.609375
30 65602.90625
31 59178.125
32 53487.71484375
33 48432.421875
34 43934.1875
35 39922.62890625
36 36336.4375
37 33124.84375
38 30243.0859375
39 27652.291015625
40 25321.1328125
41 23216.931640625
42 21314.326171875
43 19591.212890625
44 18028.48828125
45 16609.078125
46 15318.2685546875
47 14142.5654296875
48 13070.126953125
49 12091.267578125
50 11196.0849609375
51 10376.35546875
52 9625.267578125
53 8935.98828125
54 8303.01953125
55 7720.708984375
56 7184.7236328125
57 6690.89794921875
58 6236.4296875
59 5817.2255859375
60 5429.92041015625
...

460 0.00019566946139093488
461 0.00019155802146997303
462 0.00018852365610655397
463 0.00018422848370391876
464 0.00018051503866445273
465 0.00017723800556268543
466 0.0001735570258460939
467 0.00016951304860413074
468 0.0001666808529989794
469 0.00016333798703271896
470 0.00015976553549990058
471 0.00015643906954210252
472 0.00015374485519714653
473 0.0001510721631348133
474 0.00014829564315732569
475 0.00014597291010431945
476 0.000142583143315278
477 0.00013979457435198128
478 0.00013714887609239668
479 0.00013492033758666366
480 0.00013214937644079328
481 0.0001296122354688123
482 0.00012775913637597114
483 0.00012511796376202255
484 0.00012292985047679394
485 0.00012073105608578771
486 0.00011871040624100715
487 0.00011701132461894304
488 0.00011475250357761979
489 0.00011300275946268812
490 0.00011101971176685765
491 0.00010901474161073565
492 0.00010723536252044141
493 0.00010581276001175866
494 0.0001039438066072762
495 0.00010215784277534112
496 0.00010091236617881805
...

In [31]:
model[0].weight

Parameter containing:
tensor([[-1.2407,  0.1527, -0.5275,  ...,  0.3028, -0.2395, -0.1965],
        [ 0.9596, -1.3609, -1.3248,  ..., -0.6273, -0.8682, -0.6012],
        [ 0.6899,  1.1756, -0.0707,  ..., -0.1837, -0.4682,  0.3689],
        ...,
        [-0.1459,  0.9367, -0.6568,  ..., -0.1345, -2.0906, -0.4084],
        [-1.4582,  0.8315, -0.5785,  ..., -0.0471,  0.9356,  1.0780],
        [-0.4608,  0.5913, -1.2196,  ...,  0.6498,  0.7127,  0.0837]],
       requires_grad=True)


PyTorch: optim
--------------

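This time, instead of updating the model's weights by hand, we use the ``optim`` package to update the parameters.
``optim`` provides many optimization algorithms, including SGD with momentum, RMSProp, and Adam.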
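For plain SGD (no momentum, no weight decay), ``optimizer.step()`` performs the same update as the manual ``param -= learning_rate * param.grad`` loop in the previous section. A minimal sketch that checks this on a tiny hypothetical linear model:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1, bias=False)
x = torch.randn(8, 3)
y = torch.randn(8, 1)
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = torch.nn.MSELoss(reduction="sum")

optimizer.zero_grad()
loss_fn(model(x), y).backward()

# The update the manual loop would apply (a new tensor, computed before the step)
expected = model.weight.detach() - lr * model.weight.grad

optimizer.step()
print(torch.allclose(model.weight.detach(), expected))
```

Swapping in ``torch.optim.Adam`` (as in the commented-out lines of the next cell) changes only the update rule; the ``zero_grad`` / ``backward`` / ``step`` sequence stays the same.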


In [74]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # computes x @ W_1^T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 44023920.0
1 39908464.0
2 34860260.0
3 25495116.0
4 15362966.0
5 8209458.0
6 4467896.5
7 2711819.75
8 1867639.125
9 1411455.0
10 1126800.75
11 927079.625
12 776221.1875
13 657708.5
14 561830.25
15 483154.8125
16 417885.28125
17 363341.8125
18 317362.875
19 278406.125
20 245216.515625
21 216797.21875
22 192299.296875
23 171093.453125
24 152671.828125
25 136611.484375
26 122531.484375
27 110164.9453125
28 99275.2421875
29 89656.0234375
30 81135.65625
31 73555.5
32 66794.875
33 60751.875
34 55342.484375
35 50483.88671875
36 46113.3671875
37 42173.1015625
38 38615.0625
39 35397.92578125
40 32484.328125
41 29839.8515625
42 27438.611328125
43 25254.708984375
44 23266.3046875
45 21452.1796875
46 19795.71875
47 18281.59375
48 16897.3828125
49 15629.1953125
50 14466.310546875
51 13398.57421875
52 12417.609375
53 11515.810546875
54 10685.603515625
55 9921.30859375
56 9217.6962890625
57 8570.138671875
58 7972.24755859375
59 7419.642578125
60 6908.4130859375
61 6435.3818359375
62 5997.5703125
...

480 5.238951416686177e-05
481 5.1539711421355605e-05
482 5.084367876406759e-05
483 5.008694279240444e-05
484 4.9289428716292605e-05
485 4.85954005853273e-05
486 4.7969766455935314e-05
487 4.7210531192831695e-05
488 4.66282399429474e-05
489 4.590407479554415e-05
490 4.515946056926623e-05
491 4.4515654735732824e-05
492 4.361835453892127e-05
493 4.303583045839332e-05
494 4.250190977472812e-05
495 4.1883005906129256e-05
496 4.1274004615843296e-05
497 4.075744436704554e-05
498 4.014807564090006e-05
499 3.968972305301577e-05



PyTorch: custom nn Modules
--------------------------

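We can define our own model as a subclass of ``nn.Module``. Whenever a model is more complex than a simple ``Sequential`` stack of existing modules, define it as an ``nn.Module`` subclass.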
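Submodules such as ``nn.Linear`` are not the only way to give a module trainable weights: any ``nn.Parameter`` assigned as an attribute is registered automatically and returned by ``parameters()``. The class below, ``TwoLayerNetRaw``, is a hypothetical variant written for this illustration; it rebuilds the autograd example's ``w1``/``w2`` network as a module.

```python
import torch
import torch.nn as nn

class TwoLayerNetRaw(nn.Module):
    """Hypothetical variant of TwoLayerNet built from raw nn.Parameter weights."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        # Assigning an nn.Parameter to an attribute registers it with the module
        self.w1 = nn.Parameter(torch.randn(D_in, H))
        self.w2 = nn.Parameter(torch.randn(H, D_out))

    def forward(self, x):
        # Same computation as the autograd example: ReLU(x @ w1) @ w2
        return x.mm(self.w1).clamp(min=0).mm(self.w2)

model = TwoLayerNetRaw(5, 4, 2)
names = [name for name, _ in model.named_parameters()]
out = model(torch.randn(3, 5))
print(names, out.shape)
```

Because the parameters are registered, ``model.parameters()`` can be passed to an optimizer exactly as in the cell below.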



In [75]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 741.367919921875
1 723.1994018554688
2 705.5691528320312
3 688.4307861328125
4 671.8004760742188
5 655.5983276367188
6 639.82861328125
7 624.4682006835938
8 609.4628295898438
9 594.8193969726562
10 580.5537109375
11 566.6222534179688
12 553.0783081054688
13 539.9537353515625
14 527.1505126953125
15 514.7080688476562
16 502.5781555175781
17 490.7689208984375
18 479.28436279296875
19 468.068359375
20 457.13232421875
21 446.4938049316406
22 436.1311340332031
23 426.0333557128906
24 416.1996765136719
25 406.578369140625
26 397.1629638671875
27 387.9591369628906
28 378.9703674316406
29 370.2292785644531
30 361.73321533203125
31 353.4710693359375
32 345.3526916503906
33 337.35302734375
34 329.4936218261719
35 321.80535888671875
36 314.2646484375
37 306.8771057128906
38 299.64599609375
39 292.60235595703125
40 285.71514892578125
41 278.9482116699219
42 272.30914306640625
43 265.81829833984375
44 259.4661865234375
45 253.22535705566406
46 247.116943359375
47 241.1483154296875
...

363 0.00018425809685140848
364 0.0001750338706187904
365 0.0001662639988353476
366 0.00015792266640346497
367 0.0001500002108514309
368 0.0001424619258614257
369 0.00013530178694054484
370 0.0001284906466025859
371 0.00012202076322864741
372 0.00011586421896936372
373 0.00011002228711731732
374 0.0001044580785674043
375 9.917588613461703e-05
376 9.415853128302842e-05
377 8.938251266954467e-05
378 8.485325815854594e-05
379 8.05414019851014e-05
380 7.645088044228032e-05
381 7.25605568732135e-05
382 6.886084884172305e-05
383 6.535212742164731e-05
384 6.200971256475896e-05
385 5.884281563339755e-05
386 5.582921585300937e-05
387 5.2966915973229334e-05
388 5.0248945626663044e-05
389 4.7662480938015506e-05
390 4.521236405707896e-05
391 4.288506170269102e-05
392 4.067265035700984e-05
393 3.8571459299419075e-05
394 3.657730849226937e-05
395 3.468135400908068e-05
396 3.288297011749819e-05
397 3.117313463008031e-05
398 2.955781383207068e-05
399 2.801453956635669e-05
400 2.6553163479547948e-05
...