# Lesson 1

褚则伟 zeweichu@gmail.com

[Reference](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html)


What is PyTorch?
================

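PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can run on GPUs.
- It can be used to define deep learning models, and to train and use them flexibly.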

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the key difference covered here is that a Tensor can also be placed on a GPU to accelerate computation.


In [2]:
from __future__ import print_function
import torch

Construct an uninitialized 5x3 matrix (its values are whatever happened to be in the allocated memory):

In [3]:
x = torch.empty(5, 3)
print(x)

tensor([[4.0803e-34, 0.0000e+00, 4.0802e-34],
        [0.0000e+00, 4.0804e-34, 0.0000e+00],
        [4.0804e-34, 0.0000e+00, 4.0802e-34],
        [0.0000e+00, 4.0802e-34, 0.0000e+00],
        [4.0801e-34, 0.0000e+00, 4.0801e-34]])


Construct a randomly initialized matrix:

In [4]:
x = torch.rand(5, 3)
print(x)

tensor([[0.6923, 0.4466, 0.5979],
        [0.6293, 0.9607, 0.8093],
        [0.0310, 0.2723, 0.5436],
        [0.6810, 0.2583, 0.7221],
        [0.6866, 0.8760, 0.1252]])


Construct a matrix filled with zeros, with dtype long:

In [5]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Construct a tensor directly from data:

In [6]:
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


You can also create a tensor from an existing tensor. These methods reuse the properties of the input tensor, such as its dtype, unless new values are provided.

In [7]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.2076,  0.0085, -0.1971],
        [ 1.2965,  1.2431,  0.4297],
        [ 0.8921, -0.8453, -0.2545],
        [ 0.9012,  0.1385, -0.9099],
        [-0.3527, -0.7621,  0.9253]])


Get the shape of a tensor:

In [8]:
print(x.size())

torch.Size([5, 3])


<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a tuple, so it supports all tuple operations.</p></div>
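Because ``torch.Size`` is a subclass of tuple, the usual tuple operations apply to it. A minimal sketch (the names `rows` and `cols` are chosen only for illustration):

```python
import torch

x = torch.rand(5, 3)
size = x.size()

# torch.Size is a subclass of tuple, so unpacking, indexing and len() work.
rows, cols = size
print(isinstance(size, tuple))  # True
print(rows, cols, size[0], len(size))

# x.shape is an attribute that holds the same information as x.size().
print(x.shape == size)
```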

Operations
----------


There are many tensor operations. We start with addition.



In [9]:
y = torch.rand(5, 3)
print(x + y)

tensor([[ 1.1331,  0.0626,  0.1831],
        [ 2.0775,  1.2809,  0.7278],
        [ 1.1321, -0.1051,  0.2828],
        [ 1.0618,  0.3110,  0.0255],
        [ 0.4819, -0.3708,  1.0159]])


Another way to write addition:


In [10]:
print(torch.add(x, y))

tensor([[ 1.1331,  0.0626,  0.1831],
        [ 2.0775,  1.2809,  0.7278],
        [ 1.1321, -0.1051,  0.2828],
        [ 1.0618,  0.3110,  0.0255],
        [ 0.4819, -0.3708,  1.0159]])


Addition: passing an output tensor as an argument

In [11]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[ 1.1331,  0.0626,  0.1831],
        [ 2.0775,  1.2809,  0.7278],
        [ 1.1321, -0.1051,  0.2828],
        [ 1.0618,  0.3110,  0.0255],
        [ 0.4819, -0.3708,  1.0159]])


In-place addition

In [12]:
# adds x to y
y.add_(x)
print(y)

tensor([[ 1.1331,  0.0626,  0.1831],
        [ 2.0775,  1.2809,  0.7278],
        [ 1.1321, -0.1051,  0.2828],
        [ 1.0618,  0.3110,  0.0255],
        [ 0.4819, -0.3708,  1.0159]])


<div class="alert alert-info"><h4>Note</h4><p>Any operation that modifies a tensor in place ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
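The difference between the two variants can be seen side by side. A small sketch, with `x_after_add` introduced only to record the intermediate state:

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)

# Out-of-place: add returns a new tensor and leaves x unchanged.
z = x.add(y)
x_after_add = x.clone()
print(x_after_add)  # still all ones
print(z)

# In-place (trailing underscore): x itself is modified.
x.add_(y)
print(x)
```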

NumPy-style indexing works on PyTorch tensors.


In [13]:
print(x[:, 1])

tensor([ 0.0085,  1.2431, -0.8453,  0.1385, -0.7621])


Resizing: to resize or reshape a tensor, use ``torch.Tensor.view``:

In [14]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
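One point worth knowing about ``view`` is that it does not copy: the result shares storage with the original tensor. The sketch below illustrates this, and also shows ``reshape`` as a fallback for a non-contiguous tensor (the transpose is used here only as a convenient way to obtain one):

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)

# Writing through the view changes the original tensor as well.
y[0] = 1.0
print(x[0, 0])            # tensor(1.)

# view needs a compatible memory layout. A transposed tensor is no longer
# contiguous, so reshape (which may copy the data) is used here instead.
t = x.t()
print(t.is_contiguous())  # False
r = t.reshape(16)
print(r.size())
```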


If you have a one-element tensor, use ``.item()`` to get its value as a Python number.

In [15]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([0.5799])
0.5799381136894226


**Further reading**


  Many more Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy and Tensor
------------

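Converting between a Torch Tensor and a NumPy array is straightforward.

A CPU Torch Tensor and the NumPy array converted from it share the same underlying memory, so changing one also changes the other.

Converting a Torch Tensor to a NumPy array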


In [16]:
a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])


In [17]:
b = a.numpy()
print(b)

[1. 1. 1. 1. 1.]


Modify the tensor in place and observe that the NumPy array changes too.

In [18]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


Converting a NumPy ndarray to a Torch Tensor

In [19]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)


Tensors on the CPU support conversion to a NumPy array and back (a tensor that requires grad must be detached first).
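Not every conversion shares memory. As a rough illustration, ``torch.from_numpy`` shares the buffer while ``torch.tensor`` copies it, and a tensor that requires grad has to be detached before ``.numpy()`` can be called (the names `shared`, `copied` and `w_np` are only for this sketch):

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data

a[0] = 5.0
print(shared)  # reflects the change made to a
print(copied)  # unchanged

# A tensor that requires grad must be detached before calling .numpy().
w = torch.ones(3, requires_grad=True)
w_np = w.detach().numpy()
print(w_np)
```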

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [20]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!


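Warm-up: a two-layer neural network in NumPy
--------------

A fully connected ReLU network with one hidden layer and no biases, trained to predict y from x by minimizing the L2 (squared Euclidean) loss.

This implementation uses NumPy alone to compute the forward pass, the loss, and the backward pass.

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.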
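The hand-written backward pass follows from the chain rule. Using the same names as the NumPy code, with $\odot$ for the elementwise product and $\mathbb{1}[\cdot]$ for the indicator (matching `grad_h[h < 0] = 0`), the forward pass and its gradients can be written as:

```latex
\begin{aligned}
h &= x W_1, & h_{\mathrm{relu}} &= \max(h, 0), & \hat{y} &= h_{\mathrm{relu}} W_2, & L &= \sum_{n,k} (\hat{y}_{nk} - y_{nk})^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y), &
\frac{\partial L}{\partial W_2} &= h_{\mathrm{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, &
\frac{\partial L}{\partial h_{\mathrm{relu}}} &= \frac{\partial L}{\partial \hat{y}} W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{\mathrm{relu}}} \odot \mathbb{1}[h \ge 0], &
\frac{\partial L}{\partial W_1} &= x^{\top} \frac{\partial L}{\partial h}
\end{aligned}
```

Each line corresponds to one of `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h` and `grad_w1`.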



In [21]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    # loss = sum((y_pred - y) ** 2), so d(loss)/d(y_pred) = 2 * (y_pred - y)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 28868052.57356777
1 27115275.03508952
2 31291475.05254549
3 36302121.99052882
4 36287345.86749549
5 28174127.210597765
6 16587001.553902388
7 7938172.257247687
8 3660729.6697602975
9 1899107.5546131027
10 1189324.5312735755
11 868141.8878879608
12 693610.3965701633
13 579713.502252846
14 495424.57333640003
15 428565.2851574268
16 373590.59444716736
17 327615.2599573383
18 288629.45621817734
19 255299.14535285157
20 226613.9591706127
21 201820.56314657573
22 180287.53449350415
23 161496.22472958692
24 145031.52086992003
25 130560.25261010249
26 117797.68005167045
27 106505.07470140357
28 96502.94188954728
29 87612.63336911226
30 79680.24502064346
31 72591.66993989836
32 66242.68272904513
33 60541.763100286684
34 55408.20490757318
35 50776.946734590325
36 46593.70375101612
37 42807.34880428959
38 39373.264509031236
39 36253.71582424566
40 33417.44236603436
41 30833.30284397974
42 28476.57875081805
43 26325.36635069699
44 24358.97706427971
45 22557.976552249675
46 20906.467432163718
...

431 0.00018811575563015912
432 0.00018032404571764632
433 0.000172856828872306
434 0.0001657010226845256
435 0.00015884108314448466
436 0.00015226696357951427
437 0.00014596553187158124
438 0.00013992526799636485
439 0.0001341373578270604
440 0.00012858824270555293
441 0.00012326873717901756
442 0.00011817084156482767
443 0.0001132839096106661
444 0.00010860196673607986
445 0.00010411182027667209
446 9.980741924031327e-05
447 9.568202832909735e-05
448 9.172771309381655e-05
449 8.793786721117195e-05
450 8.430492216874929e-05
451 8.082243670190801e-05
452 7.748409438499e-05
453 7.428348882425407e-05
454 7.121551879668399e-05
455 6.827532731981041e-05
456 6.545615020018506e-05
457 6.27543271032003e-05
458 6.0164050411769975e-05
459 5.768085391444808e-05
460 5.530118606106452e-05
461 5.301943229142931e-05
462 5.08315280172722e-05
463 4.873409485465922e-05
464 4.6723472618261035e-05
465 4.4797082752335744e-05
466 4.2949757521381274e-05
467 4.117823966746819e-05
468 3.9480047020647116e-05
...
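A hand-derived gradient is easy to get wrong, so it can help to compare it against finite differences. The sketch below does this for `w2` only, on deliberately tiny dimensions; the helper `loss_fn`, the step `eps` and the sizes are assumptions made for this check, not part of the training code:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2   # small sizes chosen only for this check
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    # Same forward pass and loss as the training loop.
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient of the loss with respect to w2, as in the training loop.
h_relu = np.maximum(x.dot(w1), 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_w2 = h_relu.T.dot(grad_y_pred)

# Central finite differences, one entry of w2 at a time.
eps = 1e-6
num_grad_w2 = np.zeros_like(w2)
for i in range(H):
    for j in range(D_out):
        step = np.zeros_like(w2)
        step[i, j] = eps
        num_grad_w2[i, j] = (loss_fn(w1, w2 + step) - loss_fn(w1, w2 - step)) / (2 * eps)

# Largest absolute difference between the two gradients.
print(np.max(np.abs(grad_w2 - num_grad_w2)))
```

The same check could be applied to `w1`, although the ReLU kink at zero makes finite differences less reliable there.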


PyTorch: Tensors
----------------

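This time we use PyTorch Tensors to implement the forward pass, the loss computation, and the backward pass.

A PyTorch Tensor is much like a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, the Tensor must be moved to (or created on) a CUDA device.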


In [22]:
import torch


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 32956090.0
1 27097914.0
2 23614044.0
3 19576670.0
4 14652448.0
5 9967384.0
6 6359029.5
7 4007776.5
8 2601171.75
9 1785718.625
10 1301448.625
11 999863.0
12 799561.5
13 657965.5
14 552200.1875
15 469708.6875
16 403358.78125
17 348874.5625
18 303461.125
19 265178.6875
20 232671.1875
21 204858.625
22 180966.921875
23 160318.828125
24 142440.9375
25 126922.0390625
26 113374.8828125
27 101495.390625
28 91053.7109375
29 81857.9921875
30 73749.0
31 66559.65625
32 60171.37890625
33 54490.12109375
34 49420.89453125
35 44886.5234375
36 40824.4453125
37 37184.3671875
38 33915.8671875
39 30972.748046875
40 28317.734375
41 25920.28515625
42 23750.81640625
43 21785.75390625
44 20004.26953125
45 18386.236328125
46 16915.955078125
47 15578.9443359375
48 14360.3271484375
49 13249.484375
50 12235.0986328125
51 11307.298828125
52 10458.353515625
53 9680.435546875
54 8967.3271484375
55 8313.08984375
56 7711.58203125
57 7158.62109375
58 6650.458984375
59 6182.4794921875
60 5751.23681640625
61 5353.424804

376 0.00361926993355155
377 0.0034923870116472244
378 0.003370524849742651
379 0.0032517150975763798
380 0.0031378043349832296
381 0.0030334291514009237
382 0.0029294025152921677
383 0.0028252508491277695
384 0.002731822896748781
385 0.0026373667642474174
386 0.0025465653743594885
387 0.0024617582093924284
388 0.0023804130032658577
389 0.0022999965585768223
390 0.002224275143817067
391 0.0021548080258071423
392 0.002081837970763445
393 0.002016559476032853
394 0.0019506642129272223
395 0.0018879337003454566
396 0.0018292403547093272
397 0.0017698075389489532
398 0.0017155016539618373
399 0.0016602501273155212
400 0.0016090035205706954
401 0.0015593778807669878
402 0.0015113624976947904
403 0.0014660642482340336
404 0.001420457847416401
405 0.001376377185806632
406 0.0013358358992263675
407 0.0012945857597514987
408 0.0012563893105834723
409 0.001218687742948532
410 0.0011816875776275992
411 0.001146637718193233
412 0.0011110807536169887
413 0.0010805249912664294
414 0.00104948657099157

A simple autograd example

In [22]:
# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.
y.backward()

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

tensor(2.)
tensor(1.)
tensor(1.)
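Gradients accumulate across calls to ``backward()``, which is why training loops reset them between steps. A minimal sketch of that behaviour:

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)

# First backward pass: d(w * x)/dw = x = 1.
(w * x).backward()
print(w.grad)  # tensor(1.)

# A second backward pass adds to the stored gradient instead of replacing it.
(w * x).backward()
print(w.grad)  # tensor(2.)

# Reset the gradient in place before the next pass.
w.grad.zero_()
print(w.grad)  # tensor(0.)
```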



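PyTorch: Tensors and autograd
-------------------------------

An important feature of PyTorch is autograd: once the forward pass is defined and the loss is computed, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then ``x.grad`` is another Tensor holding the gradient of some scalar (often the loss) with respect to ``x``.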


In [24]:
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

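# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold the inputs and outputs.
# requires_grad defaults to False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.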
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
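    # Forward pass: compute predicted y using operations on Tensors. This is
    # the same as an ordinary forward pass, except that we do not need to keep
    # references to intermediate values because we are not implementing the
    # backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print the loss.
    # loss is a zero-dimensional (scalar) Tensor;
    # loss.item() returns its value as a Python number.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call computes the
    # gradient of the loss with respect to every Tensor with requires_grad=True
    # that was used to compute it. After this call, w1.grad and w2.grad hold
    # the gradients of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update the weights using gradient descent (an automatic
    # alternative is introduced later). Wrap the update in torch.no_grad()
    # because w1 and w2 have requires_grad=True, but we do not want autograd
    # to track the weight update itself.
    # An alternative is to operate on weight.data and weight.grad.data.
    # tensor.data gives a Tensor that shares its memory with the original
    # Tensor but does not record history in the computation graph.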
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

0 33571884.0
1 31929312.0
2 33276808.0
3 31921340.0
4 25814330.0
5 16884990.0
6 9453852.0
7 4965474.0
8 2758214.75
9 1719456.25
10 1210101.75
11 930089.625
12 754499.5625
13 630972.9375
14 537056.25
15 462069.59375
16 400510.84375
17 349197.0
18 305926.46875
19 269106.09375
20 237603.9375
21 210565.84375
22 187256.65625
23 167005.40625
24 149354.09375
25 133902.171875
26 120314.078125
27 108327.953125
28 97731.984375
29 88327.296875
30 79964.1484375
31 72520.0859375
32 65881.6328125
33 59940.60546875
34 54615.7265625
35 49834.5078125
36 45532.21484375
37 41651.41015625
38 38151.9765625
39 34989.80859375
40 32122.451171875
41 29519.587890625
42 27154.046875
43 25002.115234375
44 23041.447265625
45 21252.388671875
46 19617.90234375
47 18123.943359375
48 16757.15234375
49 15504.8359375
50 14357.3837890625
51 13303.3583984375
52 12335.654296875
53 11446.6826171875
54 10629.0009765625
55 9876.03125
56 9181.96484375
57 8541.224609375
58 7949.3125
59 7402.14404296875
60 6896.064453125
61 6427

395 0.000256311206612736
396 0.00024881219724193215
397 0.00024159329768735915
398 0.00023518210218753666
399 0.0002279756445204839
400 0.00022176736092660576
401 0.00021581229520961642
402 0.0002097237011184916
403 0.0002043000713456422
404 0.0001989398879231885
405 0.00019341582083143294
406 0.00018870840722229332
407 0.00018378450477030128
408 0.0001790841342881322
409 0.00017468162695877254
410 0.000169952807482332
411 0.00016584989498369396
412 0.00016170708113349974
413 0.00015763999545015395
414 0.00015358079690486193
415 0.00015000958228483796
416 0.00014676700811833143
417 0.0001430653064744547
418 0.00013947510160505772
419 0.0001365327916573733
420 0.00013288285117596388
421 0.00013032557035330683
422 0.00012723769759759307
423 0.00012406150926835835
424 0.00012134780990891159
425 0.00011836851626867428
426 0.00011615081166382879
427 0.00011359989730408415
428 0.00011063810961786658
429 0.00010837314039235935
430 0.00010598198423394933
431 0.00010371387179475278
432 0.000101
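The role of ``torch.no_grad()`` in the update step can be seen in isolation. In this small sketch (the tensor `w` and the step size are placeholders), operations inside the block are not recorded, while the leaf tensor keeps ``requires_grad=True`` afterwards:

```python
import torch

w = torch.randn(3, requires_grad=True)

# Outside no_grad, the result is part of the computation graph.
tracked = w * 2
print(tracked.requires_grad)    # True

# Inside no_grad, operations are not recorded.
with torch.no_grad():
    untracked = w * 2
print(untracked.requires_grad)  # False

# An in-place update of a leaf tensor is allowed inside no_grad.
with torch.no_grad():
    w -= 0.1 * torch.ones(3)
print(w.requires_grad)          # True: w itself still requires grad
```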


PyTorch: nn
-----------


This time we use PyTorch's nn package to build the network.
We still rely on PyTorch autograd to build the computation graph,
and PyTorch computes the gradients for us automatically.
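Before building the full model, it may help to look at what a single ``nn.Linear`` layer holds. The sketch below inspects its parameters and compares its output with the explicit matrix form; the tolerance passed to ``allclose`` is an arbitrary choice for this illustration:

```python
import torch

D_in, H = 1000, 100
linear = torch.nn.Linear(D_in, H)

# The weight is stored as (out_features, in_features); a bias is included by default.
for name, param in linear.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# The layer computes x @ weight.T + bias.
x = torch.randn(64, D_in)
manual = x.mm(linear.weight.t()) + linear.bias
print(torch.allclose(linear(x), manual, atol=1e-5))
```

Note that, unlike the hand-written networks earlier, each ``nn.Linear`` here has a bias term; passing ``bias=False`` to the constructor would remove it.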




In [26]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. With
# reduction='sum' the squared errors are summed rather than averaged.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

0 630.85986328125
1 582.67041015625
2 541.1131591796875
3 504.8827209472656
4 472.46527099609375
5 443.42401123046875
6 417.10595703125
7 392.8735046386719
8 370.28314208984375
9 349.292236328125
10 329.595458984375
11 310.9827880859375
12 293.49700927734375
13 277.09954833984375
14 261.648681640625
15 246.99583435058594
16 233.16456604003906
17 220.13485717773438
18 207.7181854248047
19 195.93637084960938
20 184.74789428710938
21 174.068359375
22 163.90664672851562
23 154.2133331298828
24 145.00772094726562
25 136.22647094726562
26 127.88792419433594
27 120.00291442871094
28 112.5560531616211
29 105.52362823486328
30 98.86790466308594
31 92.55335998535156
32 86.6054916381836
33 81.00811767578125
34 75.75829315185547
35 70.83821868896484
36 66.22423553466797
37 61.899906158447266
38 57.844520568847656
39 54.04759979248047
40 50.49897766113281
41 47.1849479675293
42 44.06108856201172
43 41.14775466918945
44 38.42915344238281
45 35.894100189208984
46 33.52758026123047
47 31.3200454711914

370 7.074732275214046e-05
371 6.863824091851711e-05
372 6.659422069787979e-05
373 6.460691656684503e-05
374 6.268275319598615e-05
375 6.082063555368222e-05
376 5.901150871068239e-05
377 5.726024028263055e-05
378 5.555655661737546e-05
379 5.390537990024313e-05
380 5.230847091297619e-05
381 5.075562512502074e-05
382 4.925175380776636e-05
383 4.779342998517677e-05
384 4.637831443687901e-05
385 4.500804789131507e-05
386 4.3676787754520774e-05
387 4.238497422193177e-05
388 4.113196337129921e-05
389 3.9917165850056335e-05
390 3.874160029226914e-05
391 3.759563696803525e-05
392 3.6489662306848913e-05
393 3.5414148442214355e-05
394 3.436878614593297e-05
395 3.335912697366439e-05
396 3.237826967961155e-05
397 3.142527202726342e-05
398 3.050278792215977e-05
399 2.960606616397854e-05
400 2.87373477476649e-05
401 2.789456266327761e-05
402 2.7074032914242707e-05
403 2.628388028824702e-05
404 2.5513101718388498e-05
405 2.4767006834736094e-05
406 2.4041921278694645e-05
407 2.3335256628342904e-05
...


PyTorch: optim
--------------

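This time we no longer update the model's weights by hand; instead we use the optim package to update the parameters for us.
The optim package provides a variety of optimization algorithms, including SGD with momentum, RMSProp, and Adam.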
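To connect this with the manual update used so far, the following sketch checks that one step of plain ``torch.optim.SGD`` matches `param - lr * param.grad`. The tiny model, the data, and the names `lr` and `expected` are placeholders chosen for this illustration:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randn(8, 2)
lr = 0.1

# Any optimizer from torch.optim takes the parameters to update, for example:
#   torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
#   torch.optim.RMSprop(model.parameters(), lr=lr)
#   torch.optim.Adam(model.parameters(), lr=lr)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

loss = torch.nn.functional.mse_loss(model(x), y, reduction='sum')
optimizer.zero_grad()
loss.backward()

# What the manual update would produce for the weight.
expected = model.weight.detach() - lr * model.weight.grad

optimizer.step()
print(torch.allclose(model.weight.detach(), expected, atol=1e-6))
```

Optimizers such as Adam keep extra per-parameter state, so their steps do not reduce to this simple formula.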


In [27]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

0 688.0264892578125
1 670.944580078125
2 654.359130859375
3 638.3897705078125
4 622.9364624023438
5 607.8897705078125
6 593.2568359375
7 579.0560913085938
8 565.3549194335938
9 552.1868286132812
10 539.4688110351562
11 527.09912109375
12 515.0761108398438
13 503.3723449707031
14 491.9695129394531
15 480.96319580078125
16 470.23773193359375
17 459.80914306640625
18 449.62982177734375
19 439.681396484375
20 429.95880126953125
21 420.4364929199219
22 411.13214111328125
23 402.05078125
24 393.1586608886719
25 384.416015625
26 375.84747314453125
27 367.4931945800781
28 359.349609375
29 351.44476318359375
30 343.7060241699219
31 336.2020568847656
32 328.8648681640625
33 321.69708251953125
34 314.6783142089844
35 307.7843017578125
36 301.01953125
37 294.3859558105469
38 287.869384765625
39 281.4539489746094
40 275.1696472167969
41 269.0242004394531
42 263.011474609375
43 257.1146545410156
44 251.33160400390625
45 245.66079711914062
46 240.09506225585938
47 234.62721252441406
48 229.2420349121

397 0.00011393141176085919
398 0.00010888657561736181
399 0.00010405841749161482
400 9.944082557922229e-05
401 9.50242902035825e-05
402 9.079783194465563e-05
403 8.675449498696253e-05
404 8.288954268209636e-05
405 7.919271592982113e-05
406 7.56545050535351e-05
407 7.227329479064792e-05
408 6.903617759235203e-05
409 6.594533624593168e-05
410 6.298521475400776e-05
411 6.01552237640135e-05
412 5.745018279412761e-05
413 5.4865467973286286e-05
414 5.239238453214057e-05
415 5.0026734243147075e-05
416 4.7767905925866216e-05
417 4.560576053336263e-05
418 4.3536296288948506e-05
419 4.156272188993171e-05
420 3.9676055166637525e-05
421 3.787132300203666e-05
422 3.6145673220744357e-05
423 3.4496861189836636e-05
424 3.2921816455200315e-05
425 3.141547495033592e-05
426 2.9978080419823527e-05
427 2.8603690225281753e-05
428 2.7288595447316766e-05
429 2.603433676995337e-05
430 2.4834369469317608e-05
431 2.368946479691658e-05
432 2.259447137475945e-05
433 2.15488689718768e-05
434 2.0553030481096357e-05



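PyTorch: Custom nn Modules
--------------------------

We can define a model as a class that inherits from nn.Module. When a model needs to be more complex than a simple sequence of existing Modules, define it as a subclass of nn.Module.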



In [28]:
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 665.3380737304688
1 615.0828247070312
2 571.7913818359375
3 533.4841918945312
4 499.0491027832031
5 468.1438903808594
6 440.30767822265625
7 414.8002014160156
8 391.4633483886719
9 369.9346008300781
10 349.84002685546875
11 330.90155029296875
12 313.066650390625
13 296.2475891113281
14 280.3448486328125
15 265.3222961425781
16 251.02975463867188
17 237.39248657226562
18 224.3234100341797
19 211.87503051757812
20 200.0669708251953
21 188.83961486816406
22 178.1814422607422
23 168.0193634033203
24 158.37615966796875
25 149.19854736328125
26 140.5041961669922
27 132.26853942871094
28 124.47457885742188
29 117.09898376464844
30 110.13732147216797
31 103.5682373046875
32 97.3167495727539
33 91.40064239501953
34 85.8296890258789
35 80.58805847167969
36 75.63665008544922
37 70.98626708984375
38 66.61186981201172
39 62.505393981933594
40 58.65114212036133
41 55.038394927978516
42 51.65107345581055
43 48.47539520263672
44 45.490631103515625
45 42.69648361206055
46 40.074886322021484
47 37.619

393 1.6788251741672866e-05
394 1.6258236428257078e-05
395 1.5743915355415083e-05
396 1.5246666407620069e-05
397 1.4764643310627434e-05
398 1.4299067515821662e-05
399 1.3849773495167028e-05
400 1.3414000932243653e-05
401 1.2992491974728182e-05
402 1.2584440810314845e-05
403 1.2190929737698752e-05
404 1.1808726412709802e-05
405 1.1438743968028575e-05
406 1.1081637239840347e-05
407 1.0733876479207538e-05
408 1.0399793609394692e-05
409 1.0076848411699757e-05
410 9.762939043866936e-06
411 9.460719411436003e-06
412 9.16464068723144e-06
413 8.881499525159597e-06
414 8.605771654401906e-06
415 8.339302439708263e-06
416 8.081084160949104e-06
417 7.831304174032994e-06
418 7.589589131384855e-06
419 7.356352853093995e-06
420 7.128825473046163e-06
421 6.909640433150344e-06
422 6.697578101011459e-06
423 6.491952262877021e-06
424 6.292283160291845e-06
425 6.099975053075468e-06
426 5.913933819101658e-06
427 5.7329466471855994e-06
428 5.557969416258857e-06
429 5.387428245740011e-06
430 5.223613243288128
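A skip connection is one case that a plain ``nn.Sequential`` of layers does not express directly, and a custom Module handles it easily. The class below is a hypothetical illustration (the name `ResidualNet` and its layer names are invented here and are not part of the lesson):

```python
import torch


class ResidualNet(torch.nn.Module):
    """Hypothetical example: a hidden layer with a skip connection."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.input_layer = torch.nn.Linear(D_in, H)
        self.hidden_layer = torch.nn.Linear(H, H)
        self.output_layer = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h = self.input_layer(x).clamp(min=0)
        # Add the input of the hidden layer back to its output; a plain
        # nn.Sequential passes only one tensor from layer to layer.
        h = h + self.hidden_layer(h).clamp(min=0)
        return self.output_layer(h)


model = ResidualNet(D_in=1000, H=100, D_out=10)
y_pred = model(torch.randn(64, 1000))
print(y_pred.shape)

# Submodules assigned as attributes are registered automatically, so
# model.parameters() yields the weights and biases of all three layers.
print(len(list(model.parameters())))
```

The training loop shown earlier would work unchanged with this model, since the optimizer only needs ``model.parameters()``.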

# FizzBuzz

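FizzBuzz is a simple counting game. The rules: count upward from 1; say fizz for a multiple of 3, buzz for a multiple of 5, and fizzbuzz for a multiple of 15; otherwise say the number itself.

We can write a small program that decides whether to return the number, fizz, buzz, or fizzbuzz.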

In [30]:
# Encode the desired output as a class index: 0 = number, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
def fizz_buzz_encode(i):
    if   i % 15 == 0: return 3
    elif i % 5  == 0: return 2
    elif i % 3  == 0: return 1
    else:             return 0
    
def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

print(fizz_buzz_decode(1, fizz_buzz_encode(1)))
print(fizz_buzz_decode(2, fizz_buzz_encode(2)))
print(fizz_buzz_decode(5, fizz_buzz_encode(5)))
print(fizz_buzz_decode(12, fizz_buzz_encode(12)))
print(fizz_buzz_decode(15, fizz_buzz_encode(15)))

1
2
buzz
fizz
fizzbuzz


First we define the model's inputs and outputs (the training data).

In [31]:
import numpy as np
import torch

NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

trX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = torch.LongTensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])
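To make the input representation concrete, here is what ``binary_encode`` produces for a couple of values. The function is repeated so the snippet runs on its own; note that the least significant bit comes first:

```python
import numpy as np

def binary_encode(i, num_digits):
    # Same definition as in the cell above: bit d of i, least significant bit first.
    return np.array([i >> d & 1 for d in range(num_digits)])

print(binary_encode(5, 10))    # 5 = 0b101
print(binary_encode(101, 10))  # first training input

# Numbers 101..1023 are used for training, so 1..100 are not part of the training data.
print(len(range(101, 2 ** 10)))
```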

Next we define the model in PyTorch.

In [32]:
# Define the model
NUM_HIDDEN = 100
model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN, 4)
)

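- To teach the model to play FizzBuzz, we need to define a loss function and an optimization algorithm.
- The optimizer repeatedly updates the model's parameters to reduce the loss, so that the model reaches as low a loss as possible on this task.
- A low loss usually means the model performs well; a high loss means it performs poorly.
- Because FizzBuzz is essentially a classification problem, we use the cross-entropy loss.
- For the optimizer we use stochastic gradient descent (SGD).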
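``torch.nn.CrossEntropyLoss`` expects raw scores (logits) and integer class indices, which is why the model ends in a plain ``Linear`` layer and the targets are a ``LongTensor``. The sketch below, using made-up logits and targets, compares it with a manual computation:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 4)                 # unnormalized scores for 4 classes
targets = torch.tensor([0, 1, 2, 3, 0])    # class indices, dtype long

loss = torch.nn.CrossEntropyLoss()(logits, targets)

# Equivalent manual computation: mean negative log-probability of the true class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(loss.item(), manual.item())
```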

In [33]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.05)

The training code follows:

In [34]:
# Start training it
BATCH_SIZE = 128
for epoch in range(10000):
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]

        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Find loss on training data
    loss = loss_fn(model(trX), trY).item()
    print('Epoch:', epoch, 'Loss:', loss)

Epoch: 0 Loss: 1.1935508251190186
Epoch: 1 Loss: 1.1643295288085938
Epoch: 2 Loss: 1.1539536714553833
Epoch: 3 Loss: 1.1493875980377197
Epoch: 4 Loss: 1.1470458507537842
Epoch: 5 Loss: 1.1456778049468994
Epoch: 6 Loss: 1.1447780132293701
Epoch: 7 Loss: 1.144127607345581
Epoch: 8 Loss: 1.1436216831207275
Epoch: 9 Loss: 1.1432068347930908
Epoch: 10 Loss: 1.1428511142730713
Epoch: 11 Loss: 1.142538070678711
Epoch: 12 Loss: 1.1422566175460815
Epoch: 13 Loss: 1.141999363899231
Epoch: 14 Loss: 1.141761302947998
Epoch: 15 Loss: 1.1415361166000366
Epoch: 16 Loss: 1.1413235664367676
Epoch: 17 Loss: 1.1411240100860596
Epoch: 18 Loss: 1.1409335136413574
Epoch: 19 Loss: 1.140749216079712
Epoch: 20 Loss: 1.1405739784240723
Epoch: 21 Loss: 1.1404038667678833
Epoch: 22 Loss: 1.1402405500411987
Epoch: 23 Loss: 1.140082836151123
Epoch: 24 Loss: 1.139930248260498
Epoch: 25 Loss: 1.1397836208343506
Epoch: 26 Loss: 1.1396406888961792
Epoch: 27 Loss: 1.1395002603530884
Epoch: 28 Loss: 1.1393667459487915
...

Epoch: 241 Loss: 1.121208667755127
Epoch: 242 Loss: 1.1210708618164062
Epoch: 243 Loss: 1.120936393737793
Epoch: 244 Loss: 1.1207857131958008
Epoch: 245 Loss: 1.1206512451171875
Epoch: 246 Loss: 1.1204979419708252
Epoch: 247 Loss: 1.12035071849823
Epoch: 248 Loss: 1.1201863288879395
Epoch: 249 Loss: 1.1200536489486694
Epoch: 250 Loss: 1.119877815246582
Epoch: 251 Loss: 1.119739294052124
Epoch: 252 Loss: 1.1195684671401978
Epoch: 253 Loss: 1.119421124458313
Epoch: 254 Loss: 1.119261622428894
Epoch: 255 Loss: 1.11910879611969
Epoch: 256 Loss: 1.118928074836731
Epoch: 257 Loss: 1.1187739372253418
Epoch: 258 Loss: 1.118602991104126
Epoch: 259 Loss: 1.1184405088424683
Epoch: 260 Loss: 1.118254542350769
Epoch: 261 Loss: 1.1181004047393799
Epoch: 262 Loss: 1.1179044246673584
Epoch: 263 Loss: 1.1177715063095093
Epoch: 264 Loss: 1.1175622940063477
Epoch: 265 Loss: 1.117408275604248
Epoch: 266 Loss: 1.1172043085098267
Epoch: 267 Loss: 1.1170350313186646
Epoch: 268 Loss: 1.1168056726455688
...

Epoch: 485 Loss: 1.043981671333313
Epoch: 486 Loss: 1.0441464185714722
Epoch: 487 Loss: 1.043056845664978
Epoch: 488 Loss: 1.043007493019104
Epoch: 489 Loss: 1.0421439409255981
Epoch: 490 Loss: 1.0417488813400269
Epoch: 491 Loss: 1.040831208229065
Epoch: 492 Loss: 1.041581392288208
Epoch: 493 Loss: 1.039963960647583
Epoch: 494 Loss: 1.040024995803833
Epoch: 495 Loss: 1.0388249158859253
Epoch: 496 Loss: 1.038525938987732
Epoch: 497 Loss: 1.0376614332199097
Epoch: 498 Loss: 1.0373626947402954
Epoch: 499 Loss: 1.0364928245544434
Epoch: 500 Loss: 1.0362681150436401
Epoch: 501 Loss: 1.035320520401001
Epoch: 502 Loss: 1.0346912145614624
Epoch: 503 Loss: 1.0349137783050537
Epoch: 504 Loss: 1.0343878269195557
Epoch: 505 Loss: 1.034001111984253
Epoch: 506 Loss: 1.0328458547592163
Epoch: 507 Loss: 1.0324289798736572
Epoch: 508 Loss: 1.0327140092849731
Epoch: 509 Loss: 1.0311836004257202
Epoch: 510 Loss: 1.0307331085205078
Epoch: 511 Loss: 1.0309778451919556
Epoch: 512 Loss: 1.0298060178756714
...

Epoch: 720 Loss: 0.8737407326698303
Epoch: 721 Loss: 0.8727753162384033
Epoch: 722 Loss: 0.8708634376525879
Epoch: 723 Loss: 0.870244562625885
Epoch: 724 Loss: 0.8690597414970398
Epoch: 725 Loss: 0.8684432506561279
Epoch: 726 Loss: 0.8673935532569885
Epoch: 727 Loss: 0.8662580251693726
Epoch: 728 Loss: 0.8660253882408142
Epoch: 729 Loss: 0.86440509557724
Epoch: 730 Loss: 0.863584041595459
Epoch: 731 Loss: 0.8619486689567566
Epoch: 732 Loss: 0.8611655831336975
Epoch: 733 Loss: 0.8600929379463196
Epoch: 734 Loss: 0.8591097593307495
Epoch: 735 Loss: 0.8579716682434082
Epoch: 736 Loss: 0.856694221496582
Epoch: 737 Loss: 0.855548620223999
Epoch: 738 Loss: 0.8546916842460632
Epoch: 739 Loss: 0.8529385924339294
Epoch: 740 Loss: 0.8536899089813232
Epoch: 741 Loss: 0.850752055644989
Epoch: 742 Loss: 0.8506751656532288
Epoch: 743 Loss: 0.849148690700531
Epoch: 744 Loss: 0.8475183248519897
Epoch: 745 Loss: 0.8470127582550049
Epoch: 746 Loss: 0.8461018800735474
Epoch: 747 Loss: 0.844601035118103
...

Epoch: 950 Loss: 0.5873972773551941
Epoch: 951 Loss: 0.5856122970581055
Epoch: 952 Loss: 0.5847106575965881
Epoch: 953 Loss: 0.5835503339767456
Epoch: 954 Loss: 0.5827803611755371
Epoch: 955 Loss: 0.5809729695320129
Epoch: 956 Loss: 0.5795504450798035
Epoch: 957 Loss: 0.5782725214958191
Epoch: 958 Loss: 0.5774836540222168
Epoch: 959 Loss: 0.5760705471038818
Epoch: 960 Loss: 0.5752243399620056
Epoch: 961 Loss: 0.5733708739280701
Epoch: 962 Loss: 0.5728501677513123
Epoch: 963 Loss: 0.5710330605506897
Epoch: 964 Loss: 0.5704750418663025
Epoch: 965 Loss: 0.5689558982849121
Epoch: 966 Loss: 0.5673718452453613
Epoch: 967 Loss: 0.5667601227760315
Epoch: 968 Loss: 0.5652709603309631
Epoch: 969 Loss: 0.5645804405212402
Epoch: 970 Loss: 0.5632051825523376
Epoch: 971 Loss: 0.5622420310974121
Epoch: 972 Loss: 0.5614804029464722
Epoch: 973 Loss: 0.5594627857208252
Epoch: 974 Loss: 0.5577782988548279
Epoch: 975 Loss: 0.5572025775909424
Epoch: 976 Loss: 0.5554510951042175
Epoch: 977 Loss: 0.555225968

Epoch: 1179 Loss: 0.36338838934898376
Epoch: 1180 Loss: 0.36292120814323425
Epoch: 1181 Loss: 0.3624146580696106
Epoch: 1182 Loss: 0.3615139424800873
Epoch: 1183 Loss: 0.36091873049736023
Epoch: 1184 Loss: 0.3602304756641388
Epoch: 1185 Loss: 0.3595029413700104
Epoch: 1186 Loss: 0.3585662543773651
Epoch: 1187 Loss: 0.3583712875843048
Epoch: 1188 Loss: 0.3571169674396515
Epoch: 1189 Loss: 0.356743186712265
Epoch: 1190 Loss: 0.35582390427589417
Epoch: 1191 Loss: 0.35526055097579956
Epoch: 1192 Loss: 0.354393869638443
Epoch: 1193 Loss: 0.35425645112991333
Epoch: 1194 Loss: 0.35319799184799194
Epoch: 1195 Loss: 0.35241976380348206
Epoch: 1196 Loss: 0.35216283798217773
Epoch: 1197 Loss: 0.3509347140789032
Epoch: 1198 Loss: 0.3503079414367676
Epoch: 1199 Loss: 0.3497981131076813
Epoch: 1200 Loss: 0.34907668828964233
Epoch: 1201 Loss: 0.34884220361709595
Epoch: 1202 Loss: 0.34780353307724
Epoch: 1203 Loss: 0.3475281000137329
Epoch: 1204 Loss: 0.34621065855026245
Epoch: 1205 Loss: 0.3459352254

Epoch: 1400 Loss: 0.2431095689535141
Epoch: 1401 Loss: 0.2428750991821289
Epoch: 1402 Loss: 0.24222278594970703
Epoch: 1403 Loss: 0.24187667667865753
Epoch: 1404 Loss: 0.2415074110031128
Epoch: 1405 Loss: 0.24137982726097107
Epoch: 1406 Loss: 0.24100692570209503
Epoch: 1407 Loss: 0.24031637609004974
Epoch: 1408 Loss: 0.2399897277355194
Epoch: 1409 Loss: 0.23962019383907318
Epoch: 1410 Loss: 0.23961882293224335
Epoch: 1411 Loss: 0.23882603645324707
Epoch: 1412 Loss: 0.23862339556217194
Epoch: 1413 Loss: 0.2380097657442093
Epoch: 1414 Loss: 0.23759859800338745
Epoch: 1415 Loss: 0.23737433552742004
Epoch: 1416 Loss: 0.23691913485527039
Epoch: 1417 Loss: 0.23654450476169586
Epoch: 1418 Loss: 0.23596780002117157
Epoch: 1419 Loss: 0.2357207089662552
Epoch: 1420 Loss: 0.23535242676734924
Epoch: 1421 Loss: 0.23535646498203278
Epoch: 1422 Loss: 0.23443584144115448
Epoch: 1423 Loss: 0.2342406064271927
Epoch: 1424 Loss: 0.23399804532527924
Epoch: 1425 Loss: 0.23357146978378296
...

Epoch: 1620 Loss: 0.1748911589384079
Epoch: 1621 Loss: 0.1745273470878601
Epoch: 1622 Loss: 0.1743646264076233
Epoch: 1623 Loss: 0.17426569759845734
Epoch: 1624 Loss: 0.17398467659950256
Epoch: 1625 Loss: 0.17395471036434174
Epoch: 1626 Loss: 0.17339137196540833
Epoch: 1627 Loss: 0.1731652319431305
Epoch: 1628 Loss: 0.17297546565532684
Epoch: 1629 Loss: 0.17265257239341736
Epoch: 1630 Loss: 0.17248444259166718
Epoch: 1631 Loss: 0.17213313281536102
Epoch: 1632 Loss: 0.1721889227628708
Epoch: 1633 Loss: 0.17191950976848602
Epoch: 1634 Loss: 0.17169487476348877
Epoch: 1635 Loss: 0.17146196961402893
Epoch: 1636 Loss: 0.1711539924144745
Epoch: 1637 Loss: 0.1710202544927597
Epoch: 1638 Loss: 0.17068880796432495
Epoch: 1639 Loss: 0.17026832699775696
Epoch: 1640 Loss: 0.17024733126163483
Epoch: 1641 Loss: 0.1698412448167801
Epoch: 1642 Loss: 0.1696302890777588
Epoch: 1643 Loss: 0.16946430504322052
Epoch: 1644 Loss: 0.1694747358560562
Epoch: 1645 Loss: 0.1691400557756424
Epoch: 1646 Loss: 0.168

Epoch: 1848 Loss: 0.13113094866275787
Epoch: 1849 Loss: 0.13100245594978333
Epoch: 1850 Loss: 0.13085870444774628
Epoch: 1851 Loss: 0.13053908944129944
Epoch: 1852 Loss: 0.13047167658805847
Epoch: 1853 Loss: 0.13041676580905914
Epoch: 1854 Loss: 0.1301671862602234
Epoch: 1855 Loss: 0.12997475266456604
Epoch: 1856 Loss: 0.12976863980293274
Epoch: 1857 Loss: 0.12973414361476898
Epoch: 1858 Loss: 0.12979799509048462
Epoch: 1859 Loss: 0.12938444316387177
Epoch: 1860 Loss: 0.12932173907756805
Epoch: 1861 Loss: 0.12923963367938995
Epoch: 1862 Loss: 0.12891517579555511
Epoch: 1863 Loss: 0.1288333237171173
Epoch: 1864 Loss: 0.12865279614925385
Epoch: 1865 Loss: 0.12848612666130066
Epoch: 1866 Loss: 0.12828415632247925
Epoch: 1867 Loss: 0.12818744778633118
Epoch: 1868 Loss: 0.12798355519771576
Epoch: 1869 Loss: 0.12805861234664917
Epoch: 1870 Loss: 0.12767630815505981
Epoch: 1871 Loss: 0.1277802586555481
Epoch: 1872 Loss: 0.12744542956352234
Epoch: 1873 Loss: 0.12737230956554413
...

Epoch: 2077 Loss: 0.10214802622795105
Epoch: 2078 Loss: 0.10197548568248749
Epoch: 2079 Loss: 0.10202232003211975
Epoch: 2080 Loss: 0.10179951786994934
Epoch: 2081 Loss: 0.10166376829147339
Epoch: 2082 Loss: 0.1015709638595581
Epoch: 2083 Loss: 0.10149800777435303
Epoch: 2084 Loss: 0.10155168920755386
Epoch: 2085 Loss: 0.1014132872223854
Epoch: 2086 Loss: 0.10124729573726654
Epoch: 2087 Loss: 0.10107357054948807
Epoch: 2088 Loss: 0.10096876323223114
Epoch: 2089 Loss: 0.1008598729968071
Epoch: 2090 Loss: 0.10072267055511475
Epoch: 2091 Loss: 0.10077453404664993
Epoch: 2092 Loss: 0.10061092674732208
Epoch: 2093 Loss: 0.10057484358549118
Epoch: 2094 Loss: 0.10040275007486343
Epoch: 2095 Loss: 0.10029936581850052
Epoch: 2096 Loss: 0.10026378184556961
Epoch: 2097 Loss: 0.10011971741914749
Epoch: 2098 Loss: 0.09996891766786575
Epoch: 2099 Loss: 0.09983088076114655
Epoch: 2100 Loss: 0.09987082332372665
Epoch: 2101 Loss: 0.09975840151309967
Epoch: 2102 Loss: 0.09957456588745117
...

KeyboardInterrupt: 

Finally, we use the trained model to play FizzBuzz on the numbers 1 through 100.

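In [35]:
# Encode the integers 1..100 as binary feature vectors and run the model on them.
testX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(1, 101)])
with torch.no_grad():  # inference only, so gradient tracking is not needed
    testY = model(testX)
# The index of the largest output in each row is the predicted class.
predictions = zip(range(1, 101), testY.max(1)[1].tolist())

print([fizz_buzz_decode(i, x) for (i, x) in predictions])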

['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', '10', '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19', 'buzz', 'fizz', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28', '29', 'fizzbuzz', '31', '32', 'fizz', '34', 'buzz', 'fizz', '37', '38', 'fizz', '40', '41', '42', '43', '44', 'fizzbuzz', '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', '53', 'fizz', 'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz', 'fizz', '67', 'buzz', 'buzz', 'buzz', '71', 'fizz', '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', 'fizz', '82', '83', 'fizz', 'buzz', '86', 'fizz', '88', '89', 'fizzbuzz', '91', '92', 'fizz', 'buzz', 'buzz', 'fizz', '97', '98', 'fizz', 'buzz']
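
The expression `testY.max(1)[1]` in the cell above can look cryptic. Called with a dimension, `Tensor.max` returns a pair of (values, indices), and element `[1]` holds the indices, which here are the predicted classes. The sketch below illustrates this on a small tensor; the `logits` values are made up for the illustration and are not output from the model above.

```python
import torch

# Made-up scores for 3 inputs and 4 classes (illustrative values only).
logits = torch.tensor([[2.0, 0.1, -1.0, 0.3],
                       [0.2, 1.5, 0.0, -0.4],
                       [-0.3, 0.0, 0.1, 3.0]])

# max over dimension 1 (the class dimension) returns (values, indices).
values, indices = logits.max(1)
labels = indices.tolist()
print(labels)  # [0, 1, 3]

# argmax along the same dimension gives the same indices.
assert labels == logits.argmax(dim=1).tolist()
```
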


In [36]:
# Compare the predicted classes with the ground-truth labels for 1..100.
correct = testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1, 101)])
print(np.sum(correct))  # number of correct predictions out of 100
correct

94


array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True,  True,  True,  True,  True,
        True])
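
The boolean array above shows where the model is right, but not which numbers it gets wrong. A minimal sketch of listing them follows. It assumes the usual class encoding (0 = the number itself, 1 = fizz, 2 = buzz, 3 = fizzbuzz). The helpers `fizz_buzz_label` and `misclassified` are hypothetical names introduced here, and the `predicted` list is a stand-in for `testY.max(1)[1].tolist()`, rebuilt from the six wrong answers visible in the printed output.

```python
def fizz_buzz_label(i):
    """Ground-truth class under the assumed encoding:
    0 = number, 1 = fizz, 2 = buzz, 3 = fizzbuzz."""
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0


def misclassified(numbers, predicted_labels):
    """Return the numbers whose predicted class differs from the ground truth."""
    return [n for n, p in zip(numbers, predicted_labels)
            if p != fizz_buzz_label(n)]


numbers = list(range(1, 101))

# Stand-in for the model's predictions: start from the ground truth and
# overwrite the six entries that the printed output shows as wrong.
predicted = [fizz_buzz_label(n) for n in numbers]
for n, wrong_class in {10: 0, 40: 0, 42: 0, 68: 2, 69: 2, 94: 2}.items():
    predicted[n - 1] = wrong_class

errors = misclassified(numbers, predicted)
print(errors)                      # [10, 40, 42, 68, 69, 94]
print(len(numbers) - len(errors))  # 94
```

With a real model, `predicted` would instead come straight from the network's output, and the same two helper functions would apply unchanged.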