# Lesson 1

褚则伟 zeweichu@gmail.com

[Reference](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html)


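What is PyTorch?
================

PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can also use GPUs
- It can be used to define deep learning models and to train and run them flexibly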

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the main difference is that a Tensor can also run on a GPU to accelerate computation.


In [1]:
from __future__ import print_function
import torch

Construct an uninitialized 5x3 matrix:

In [8]:
x = torch.empty(5, 3)
print(x)
print(x.size())
print(x.shape)

tensor([[4.7755e-39, 4.5000e-39, 4.2246e-39],
        [1.0286e-38, 1.0653e-38, 1.0194e-38],
        [8.4490e-39, 1.0469e-38, 9.3674e-39],
        [9.9184e-39, 8.7245e-39, 9.2755e-39],
        [8.9082e-39, 9.9184e-39, 8.4490e-39]])
torch.Size([5, 3])
torch.Size([5, 3])


Construct a randomly initialized matrix:

In [3]:
x = torch.rand(5, 3)
print(x)

tensor([[0.4821, 0.3854, 0.8517],
        [0.7962, 0.0632, 0.5409],
        [0.8891, 0.6112, 0.7829],
        [0.0715, 0.8069, 0.2608],
        [0.3292, 0.0119, 0.2759]])


Construct a matrix filled with zeros and of dtype long:

In [4]:
x = torch.zeros(5, 3, dtype=torch.int64)
print(x)
print(x.dtype)


tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
torch.int64


In [13]:
y = x.view(15)
print(y)
print(y.shape)


tensor([4.7755e-39, 4.5000e-39, 4.2246e-39, 1.0286e-38, 1.0653e-38, 1.0194e-38,
        8.4490e-39, 1.0469e-38, 9.3674e-39, 9.9184e-39, 8.7245e-39, 9.2755e-39,
        8.9082e-39, 9.9184e-39, 8.4490e-39])
torch.Size([15])


Construct a tensor directly from data:

In [5]:
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


A tensor can also be built from an existing tensor. These methods reuse the properties of the input tensor, such as its dtype, unless new values are supplied.

In [6]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.4793, -2.4772,  0.9738],
        [ 2.0328,  1.3981,  1.7509],
        [-0.7931, -0.0291, -0.6803],
        [-1.2944, -0.7352, -0.9346],
        [ 0.5917, -0.5149, -1.8149]])


Get the shape of a tensor:

In [7]:
print(x.size())

torch.Size([5, 3])


<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a tuple, so it supports all tuple operations.</p></div>
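Because ``torch.Size`` subclasses Python's tuple, it can be unpacked, indexed, and measured like any other tuple. A minimal sketch (the names `rows` and `cols` are only illustrative):

```python
import torch

x = torch.rand(5, 3)
size = x.size()

# torch.Size behaves like a tuple: it can be unpacked, indexed, and measured
rows, cols = size
print(isinstance(size, tuple))  # True
print(rows, cols, len(size), size[0])
```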

Operations


There are many kinds of tensor operations. We start with addition.



In [8]:
y = torch.rand(5, 3)
print(x + y)

tensor([[ 1.7113, -1.5490,  1.4009],
        [ 2.4590,  1.6504,  2.6889],
        [-0.3609,  0.4950, -0.3357],
        [-0.5029, -0.3086, -0.1498],
        [ 1.2850, -0.3189, -0.8868]])


Another syntax for addition:


In [9]:
print(torch.add(x, y))

tensor([[ 1.7113, -1.5490,  1.4009],
        [ 2.4590,  1.6504,  2.6889],
        [-0.3609,  0.4950, -0.3357],
        [-0.5029, -0.3086, -0.1498],
        [ 1.2850, -0.3189, -0.8868]])


Addition: passing an output tensor as an argument

In [10]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[ 1.7113, -1.5490,  1.4009],
        [ 2.4590,  1.6504,  2.6889],
        [-0.3609,  0.4950, -0.3357],
        [-0.5029, -0.3086, -0.1498],
        [ 1.2850, -0.3189, -0.8868]])


In-place addition

In [11]:
# adds x to y
y.add_(x)
print(y)

tensor([[ 1.7113, -1.5490,  1.4009],
        [ 2.4590,  1.6504,  2.6889],
        [-0.3609,  0.4950, -0.3357],
        [-0.5029, -0.3086, -0.1498],
        [ 1.2850, -0.3189, -0.8868]])


<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` both change ``x``.</p></div>
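To make the naming rule concrete, the sketch below contrasts `add` with `add_` on small made-up tensors. `data_ptr()` is used only to show that the in-place call keeps writing to the same memory.

```python
import torch

x = torch.ones(2, 2)
y = torch.ones(2, 2)

z = y.add(x)               # out-of-place: returns a new tensor, y is unchanged
y_unchanged = torch.equal(y, torch.ones(2, 2))

ptr_before = y.data_ptr()
y.add_(x)                  # in-place: y itself is overwritten
same_memory = y.data_ptr() == ptr_before

print(y_unchanged, same_memory, torch.equal(y, z))
```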

All the usual NumPy-style indexing works on PyTorch tensors.


In [12]:
print(x[:, 1])

tensor([-2.4772,  1.3981, -0.0291, -0.7352, -0.5149])


Resizing: to resize or reshape a tensor, use ``Tensor.view`` (called as ``x.view(...)``):

In [13]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
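One detail worth keeping in mind is that `view` does not copy: the result shares storage with the original tensor. The following sketch, on a made-up tensor, shows this and also shows `reshape` as a fallback when the memory layout is not contiguous.

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)     # same storage, different shape
y[0] = 1.0         # writing through the view ...
print(x[0, 0])     # ... is visible in the original: tensor(1.)

# view needs a compatible memory layout; a transposed tensor is
# non-contiguous, so reshape (which copies when it has to) is used instead
t = x.t()
r = t.reshape(16)
print(t.is_contiguous(), r.shape)
```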


If a tensor holds exactly one element, ``.item()`` returns its value as a Python number.

In [14]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([0.4726])
0.4726296067237854
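For tensors with more than one element, `.item()` does not apply, but `.tolist()` converts the whole tensor to plain Python values. A short sketch with made-up values:

```python
import torch

x = torch.tensor([1.5, 2.5, 3.5])

total = x.sum().item()   # a one-element tensor becomes a Python float
values = x.tolist()      # any tensor becomes a (nested) Python list

print(total, values)
```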


**Further reading**


  Many more Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  <https://pytorch.org/docs/torch>.

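Converting between NumPy and Tensor
------------

Converting between a Torch Tensor and a NumPy array is easy.

A CPU Torch Tensor and the NumPy array converted from it share memory, so changing one also changes the other.

Converting a Torch Tensor to a NumPy array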


In [15]:
a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])


In [16]:
b = a.numpy()
print(b)

[1. 1. 1. 1. 1.]


Modify the tensor in place and see that the NumPy array changes too.

In [17]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


Converting a NumPy ndarray to a Torch Tensor

In [18]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)


Tensors on the CPU can be converted to NumPy arrays and back.
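When shared memory is not wanted, the data can be copied instead. The sketch below contrasts `torch.from_numpy`, which shares memory, with `torch.tensor`, which copies; the variable names are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data

a += 1                         # in-place change to the NumPy array
print(shared)                  # follows the change
print(copied)                  # still all ones
```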

CUDA Tensors
------------

With the ``.to`` method, a Tensor can be moved to another device.



In [19]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
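A common way to write code that runs with or without a GPU is to pick the device once and pass it around. This is a minimal sketch of that pattern, not part of the original notebook:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4, device=device)
y = torch.ones_like(x)      # *_like factories inherit x's device and dtype
z = (x + y).to("cpu")       # bring the result back, e.g. before calling .numpy()

print(device, z.device)
```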


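Warm-up: a two-layer neural network in numpy
--------------

A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss.

This implementation uses only numpy to compute the forward pass, the loss, and the backward pass.

A numpy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.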



In [1]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    # loss = ((y_pred - y) ** 2).sum()
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 29212791.859680884
1 27861586.98489558
2 34070741.420994505
3 42770225.44678885
4 45859550.59920603
5 35862431.1329285
6 19518863.417024966
7 8048288.139459409
8 3347717.0324660894
9 1772114.1522542774
10 1209185.8892528694
11 949339.7414622486
12 790845.6393862172
13 675135.760130713
14 583127.4718763639
15 507284.89575062005
16 443615.99400353734
17 389708.61466754053
18 343682.80844853725
19 304136.5403121205
20 269976.10694006266
21 240329.31577578594
22 214493.05719204637
23 191900.63053206058
24 172062.7829313175
25 154595.3206658974
26 139203.74225135922
27 125591.17942194763
28 113511.94769618416
29 102751.61744513738
30 93145.08968311112
31 84552.30602355736
32 76853.2868236468
33 69939.44085974054
34 63733.76596974168
35 58140.09758160783
36 53094.78620192627
37 48539.80172423748
38 44415.186001896494
39 40679.66631811239
40 37291.587050186616
41 34215.53513794873
42 31418.396812765346
43 28871.25227040959
44 26543.620479985755
45 24419.942981159642
46 22481.450777000402
47

369 0.00047050049515680304
370 0.00044931028324234035
371 0.00042907948336257877
372 0.0004097732178079005
373 0.00039133867969161004
374 0.0003737473932471266
375 0.00035694753693724633
376 0.00034090960513486325
377 0.00032559827953511085
378 0.0003109813461990422
379 0.00029702468671869234
380 0.00028370195169810327
381 0.00027097925760644965
382 0.00025883172076388727
383 0.00024723456828620367
384 0.00023616020322294635
385 0.00022558716776547588
386 0.00021549093386315514
387 0.00020584971824134913
388 0.00019664494132284253
389 0.00018785322459160034
390 0.00017945785686163687
391 0.00017144429723087634
392 0.00016378897728563179
393 0.0001564770498419237
394 0.0001494958829279835
395 0.00014282715786780614
396 0.00013645848311252622
397 0.00013037696809582692
398 0.00012456749124797555
399 0.00011901941795717887
400 0.00011372006206246297
401 0.0001086587141671113
402 0.00010382435506081633
403 9.920722864924933e-05
404 9.479629285873174e-05
405 9.058348473158665e-05
406 8.6559
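The hand-written backward pass can be sanity-checked against finite differences. The sketch below is not from the original notebook: it reuses the same gradient formulas on a deliberately tiny network, and the sizes, the seed, and the helper names `loss_fn` and `numeric_grad` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2          # tiny sizes so the check is fast
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn():
    # Same forward pass and L2 loss as the training loop
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradients, using the same formulas as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

def numeric_grad(w, eps=1e-6):
    # Central differences: perturb one entry of w at a time
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = loss_fn()
        w[idx] = old - eps
        f_minus = loss_fn()
        w[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

max_err_w1 = np.abs(numeric_grad(w1) - grad_w1).max()
max_err_w2 = np.abs(numeric_grad(w2) - grad_w2).max()
print(max_err_w1, max_err_w2)   # both should be very small
```

If the two errors are not close to zero, the analytic formulas and the forward pass disagree somewhere.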


PyTorch: Tensors
----------------

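This time we use PyTorch tensors to implement the forward pass, the loss, and the backward pass.

A PyTorch Tensor is much like a numpy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, move the Tensor to a CUDA device.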


In [21]:
import torch


dtype = torch.float
device = torch.device("cpu")
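# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights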
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 31704728.0
1 25331164.0
2 22378086.0
3 19262238.0
4 15348289.0
5 11017595.0
6 7356282.0
7 4705923.5
8 3027346.5
9 2012536.375
10 1409662.25
11 1041771.75
12 807321.0625
13 649262.0
14 536533.1875
15 451980.875
16 385983.53125
17 332925.53125
18 289368.1875
19 253030.78125
20 222354.703125
21 196214.3125
22 173766.515625
23 154378.140625
24 137539.375
25 122867.1015625
26 110037.3515625
27 98769.4921875
28 88842.109375
29 80063.15625
30 72279.015625
31 65361.66796875
32 59195.42578125
33 53687.4453125
34 48757.57421875
35 44338.4453125
36 40370.34765625
37 36803.1484375
38 33587.4453125
39 30684.1640625
40 28059.435546875
41 25683.255859375
42 23528.814453125
43 21570.8515625
44 19792.4296875
45 18175.244140625
46 16704.6640625
47 15364.2578125
48 14141.7509765625
49 13026.609375
50 12007.3115234375
51 11075.3896484375
52 10221.8857421875
53 9439.876953125
54 8722.13671875
55 8063.46826171875
56 7458.20703125
57 6901.8876953125
58 6390.34375
59 5919.4794921875
60 5485.79345703125
61 5

375 0.0002844816190190613
376 0.00027625024085864425
377 0.0002687727683223784
378 0.0002608516369946301
379 0.00025311342324130237
380 0.0002469048195052892
381 0.00024049097555689514
382 0.0002342124644201249
383 0.00022811403323430568
384 0.00022231723414734006
385 0.0002166029589716345
386 0.00021077181736472994
387 0.00020510501053649932
388 0.00020020001102238894
389 0.0001948442222783342
390 0.00018990584067068994
391 0.00018529882072471082
392 0.00018070911755785346
393 0.00017650797963142395
394 0.00017214834224432707
395 0.0001683011942077428
396 0.00016451899136882275
397 0.00016050187696237117
398 0.00015686434926465154
399 0.00015321985119953752
400 0.0001501761726103723
401 0.00014639270375482738
402 0.00014274154091253877
403 0.0001396275474689901
404 0.0001364489580737427
405 0.00013346801279112697
406 0.00013024920190218836
407 0.00012755846546497196
408 0.00012532222899608314
409 0.0001224723382620141
410 0.00011974618973908946
411 0.00011740042100427672
412 0.0001144

A simple autograd example

In [22]:
# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.
y.backward()

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

tensor(2.)
tensor(1.)
tensor(1.)



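PyTorch: Tensors and autograd
-------------------------------

An important feature of PyTorch is autograd: once the forward pass is defined and the loss is computed, PyTorch can differentiate automatically to obtain the gradient of the loss with respect to every model parameter.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after ``backward()`` is called ``x.grad`` is another Tensor holding the gradient of some scalar (often the loss) with respect to ``x``.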


In [18]:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
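The value 4.5 can be derived by hand. Here `out` is the mean of 3(x_i + 2)^2 over four entries, so d(out)/dx_i = (1/4) * 6 * (x_i + 2) = 1.5 * (x_i + 2), which equals 4.5 at x_i = 1. The sketch below compares this closed form with what autograd computes:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Closed-form gradient: d(out)/dx_i = 1.5 * (x_i + 2)
expected = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, expected))  # True
```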


In [9]:
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
# requires_grad defaults to False, so no gradients are computed
# for these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
# Setting requires_grad=True means we want gradients with respect to
# these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)



learning_rate = 1e-6
for t in range(500):
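    # Forward pass: compute predicted y using operations on Tensors. This is the same
    # as an ordinary forward pass, but we do not need to keep references to the
    # intermediate values because we are not implementing the backward pass by hand.
    z = x.mm(w1)
    h = z.clamp(min=0)
    y_pred = h.mm(w2)

    # Compute the loss from the forward pass.
    # loss is a 0-dimensional (scalar) Tensor;
    # loss.item() returns its value as a Python number.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. backward() computes the gradient
    # of loss with respect to every Tensor with requires_grad=True. After this call,
    # w1.grad and w2.grad hold the gradients of the loss with respect to w1 and w2.
    loss.backward()

    # Update the weights manually with gradient descent (an automatic way comes later).
    # Wrap the update in torch.no_grad() because w1 and w2 have requires_grad=True
    # but the update itself should not be tracked by autograd.
    # An alternative is to operate on weight.data and weight.grad.data, which
    # does not affect gradient tracking: tensor.data gives a tensor that shares
    # storage with the original but does not record history in the computation graph.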
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

0 42752492.0
1 40156208.0
2 37741104.0
3 29867704.0
4 18967408.0
5 10095180.0
6 5173225.0
7 2891358.25
8 1865961.875
9 1357967.75
10 1066396.625
11 873625.0625
12 732421.875
13 622521.0625
14 534213.875
15 461792.75
16 401711.0625
17 351394.0625
18 308853.28125
19 272654.8125
20 241640.125
21 214918.625
22 191830.765625
23 171738.015625
24 154174.21875
25 138824.5625
26 125313.109375
27 113372.875
28 102792.4609375
29 93383.3671875
30 84989.6484375
31 77478.09375
32 70738.875
33 64691.28125
34 59244.63671875
35 54329.7265625
36 49885.01953125
37 45856.37109375
38 42202.8203125
39 38879.83203125
40 35853.671875
41 33096.0625
42 30579.59375
43 28280.71484375
44 26175.216796875
45 24247.478515625
46 22477.4375
47 20850.515625
48 19355.130859375
49 17979.01953125
50 16710.525390625
51 15540.9228515625
52 14461.517578125
53 13464.1328125
54 12542.8876953125
55 11690.197265625
56 10902.865234375
57 10173.9384765625
58 9498.19140625
59 8871.5830078125
60 8289.81640625
61 7749.42431640625
62 7
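The reason the loop calls `zero_()` on the gradients is that `backward()` adds to `.grad` rather than overwriting it. A minimal sketch on a single made-up scalar weight:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.clone()        # gradient after one backward pass: 3

(3 * w).backward()            # no zeroing: the new gradient is added on top
accumulated = w.grad.clone()  # 3 + 3 = 6

w.grad.zero_()                # reset, as in the training loop
(3 * w).backward()
after_reset = w.grad.clone()  # back to 3

print(first, accumulated, after_reset)
```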


PyTorch: nn
-----------


This time we build the network with PyTorch's nn package.
We still rely on PyTorch autograd to build the computation graph,
and PyTorch computes the gradients for us automatically.




In [22]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

torch.nn.init.xavier_normal_(model[0].weight)
torch.nn.init.xavier_normal_(model[2].weight)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

0 1600.8101806640625
1 1592.060791015625
2 1583.3863525390625
3 1574.7919921875
4 1566.282470703125
5 1557.848876953125
6 1549.4859619140625
7 1541.19384765625
8 1532.970703125
9 1524.820556640625
10 1516.7374267578125
11 1508.7215576171875
12 1500.7716064453125
13 1492.8951416015625
14 1485.0865478515625
15 1477.343994140625
16 1469.6800537109375
17 1462.0853271484375
18 1454.5545654296875
19 1447.0858154296875
20 1439.676025390625
21 1432.327880859375
22 1425.040283203125
23 1417.8131103515625
24 1410.6444091796875
25 1403.5374755859375
26 1396.4937744140625
27 1389.5074462890625
28 1382.576171875
29 1375.6995849609375
30 1368.8836669921875
31 1362.1265869140625
32 1355.422607421875
33 1348.77001953125
34 1342.171875
35 1335.624755859375
36 1329.12744140625
37 1322.681396484375
38 1316.286865234375
39 1309.9417724609375
40 1303.6439208984375
41 1297.3935546875
42 1291.190185546875
43 1285.03369140625
44 1278.927001953125
45 1272.8646240234375
46 1266.8477783203125
47 1260.87475585937

382 409.8556823730469
383 408.81683349609375
384 407.782470703125
385 406.75274658203125
386 405.725341796875
387 404.7022705078125
388 403.6829833984375
389 402.6669616699219
390 401.6535339355469
391 400.6441650390625
392 399.6383056640625
393 398.635986328125
394 397.6372375488281
395 396.6422424316406
396 395.6510009765625
397 394.66387939453125
398 393.6800537109375
399 392.6969909667969
400 391.717041015625
401 390.7403564453125
402 389.7679138183594
403 388.7990417480469
404 387.8339538574219
405 386.8717346191406
406 385.9132385253906
407 384.9574890136719
408 384.0051574707031
409 383.055419921875
410 382.1091003417969
411 381.16534423828125
412 380.2247314453125
413 379.2875671386719
414 378.35406494140625
415 377.4237976074219
416 376.4972839355469
417 375.57421875
418 374.6548156738281
419 373.738525390625
420 372.8254089355469
421 371.916015625
422 371.0097961425781
423 370.10614013671875
424 369.20599365234375
425 368.3086853027344
426 367.4137268066406
427 366.5216064453

In [16]:
model

Sequential(
  (0): Linear(in_features=1000, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=10, bias=True)
)
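The same structure can be inspected programmatically. The sketch below rebuilds a model with the layer sizes printed above and lists its parameter shapes; the dictionary name `shapes` is just illustrative.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# Parameter names are prefixed with the layer's index inside the Sequential
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
num_params = sum(p.numel() for p in model.parameters())

print(shapes)
print(num_params)  # 1000*100 + 100 + 100*10 + 10 = 101110
```

Note that a Linear layer stores its weight as (out_features, in_features), and that ReLU contributes no parameters.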


PyTorch: optim
--------------

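This time, instead of updating the model's weights by hand, we use the optim package to update the parameters.
The optim package provides a variety of optimization algorithms, including SGD+momentum, RMSProp, and Adam.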
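To see what an optimizer does on our behalf, the sketch below checks that one step of plain `torch.optim.SGD` (no momentum) matches the manual update `param - lr * grad` used earlier. The tiny model, the data, and the learning rate are made up for illustration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
x = torch.randn(8, 3)
y = torch.randn(8, 1)
lr = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()

# What the manual update would produce: param - lr * grad
expected = [p.detach() - lr * p.grad for p in model.parameters()]

optimizer.step()
matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(matches)  # True
```

Other optimizers such as Adam keep extra per-parameter state, so their updates do not reduce to this one-line formula.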


In [11]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-6
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

0 649.0877075195312
1 648.9208374023438
2 648.75390625
3 648.5870361328125
4 648.4202270507812
5 648.25341796875
6 648.0866088867188
7 647.9198608398438
8 647.753173828125
9 647.5864868164062
10 647.419921875
11 647.25341796875
12 647.0869750976562
13 646.9204711914062
14 646.7540283203125
15 646.587646484375
16 646.4213256835938
17 646.2550659179688
18 646.0888061523438
19 645.9225463867188
20 645.7564086914062
21 645.5902099609375
22 645.4241333007812
23 645.2581787109375
24 645.0922241210938
25 644.9263305664062
26 644.7604370117188
27 644.5946044921875
28 644.4288330078125
29 644.2630615234375
30 644.0973510742188
31 643.931640625
32 643.7660522460938
33 643.6004638671875
34 643.4349365234375
35 643.2694091796875
36 643.1039428710938
37 642.9384765625
38 642.7730712890625
39 642.6077270507812
40 642.4423828125
41 642.277099609375
42 642.11181640625
43 641.9466552734375
44 641.781494140625
45 641.6163330078125
46 641.4512329101562
47 641.2861328125
48 641.1210327148438
49 640.955993


PyTorch: Custom nn Modules
--------------------------

We can define a model as a class that inherits from nn.Module. To build a model more complex than what Sequential can express, subclass nn.Module.
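As an illustration of something a plain Sequential cannot express directly, here is a sketch of a block with a skip connection. `ResidualBlock` and its sizes are hypothetical and do not come from the original notebook.

```python
import torch


class ResidualBlock(torch.nn.Module):
    """Hypothetical example: output = x + f(x)."""

    def __init__(self, dim, hidden):
        super().__init__()
        # Assigning Modules as attributes registers their parameters automatically
        self.linear1 = torch.nn.Linear(dim, hidden)
        self.linear2 = torch.nn.Linear(hidden, dim)

    def forward(self, x):
        # The skip connection adds the input back after the two linear layers
        return x + self.linear2(torch.relu(self.linear1(x)))


block = ResidualBlock(dim=16, hidden=32)
x = torch.randn(4, 16)
out = block(x)
param_names = [name for name, _ in block.named_parameters()]

print(out.shape)
print(param_names)
```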



In [12]:
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 639.5822143554688
1 590.610595703125
2 548.9027709960938
3 513.0135498046875
4 481.6715087890625
5 453.6847839355469
6 428.40911865234375
7 405.3240966796875
8 384.060302734375
9 364.4021911621094
10 346.1095275878906
11 329.0256652832031
12 313.0076904296875
13 297.9903564453125
14 283.7117919921875
15 270.14373779296875
16 257.19989013671875
17 244.807861328125
18 232.8994903564453
19 221.48219299316406
20 210.51161193847656
21 199.99395751953125
22 189.93020629882812
23 180.29815673828125
24 171.07766723632812
25 162.28298950195312
26 153.85958862304688
27 145.84361267089844
28 138.21031188964844
29 130.93370056152344
30 124.01283264160156
31 117.42324829101562
32 111.15813446044922
33 105.2000961303711
34 99.51827239990234
35 94.11679077148438
36 88.98484802246094
37 84.1142578125
38 79.496337890625
39 75.1228256225586
40 70.98758697509766
41 67.06743621826172
42 63.3592643737793
43 59.847206115722656
44 56.51774978637695
45 53.37541580200195
46 50.40936279296875
47 47.6128196716

# FizzBuzz

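FizzBuzz is a simple game. The rules: count upward from 1; on a multiple of 3 say fizz, on a multiple of 5 say buzz, on a multiple of 15 say fizzbuzz, and otherwise say the number.

We can write a short program that decides whether to return the number itself or fizz, buzz, or fizzbuzz.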

In [27]:
# Encode the desired output as a class index: 0 = number, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
def fizz_buzz_encode(i):
    if   i % 15 == 0: return 3
    elif i % 5  == 0: return 2
    elif i % 3  == 0: return 1
    else:             return 0
    
def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

print(fizz_buzz_decode(1, fizz_buzz_encode(1)))
print(fizz_buzz_decode(2, fizz_buzz_encode(2)))
print(fizz_buzz_decode(5, fizz_buzz_encode(5)))
print(fizz_buzz_decode(12, fizz_buzz_encode(12)))
print(fizz_buzz_decode(15, fizz_buzz_encode(15)))

1
2
buzz
fizz
fizzbuzz


First we define the model's inputs and outputs (the training data).

In [28]:
import numpy as np
import torch

NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

trX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = torch.LongTensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])
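To see what the input encoding looks like, the sketch below applies `binary_encode` to a small number. The bits come out least-significant first. `binary_decode` is a hypothetical helper added only to check the round trip.

```python
import numpy as np


def binary_encode(i, num_digits):
    # Bit d of i, least-significant bit first
    return np.array([i >> d & 1 for d in range(num_digits)])


def binary_decode(bits):
    # Hypothetical inverse helper, used here only for checking
    return int(sum(int(b) << d for d, b in enumerate(bits)))


encoded = binary_encode(6, 4)
print(encoded)                 # [0 1 1 0]
print(binary_decode(encoded))  # 6
```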

Then we define the model in PyTorch.

In [29]:
# Define the model
NUM_HIDDEN = 100
model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN, 4)
)

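- For the model to learn the FizzBuzz game, we need a loss function and an optimization algorithm.
- The optimization algorithm repeatedly reduces the loss so that the model reaches as low a loss as possible on the task.
- A low loss usually means the model performs well; a high loss means it performs poorly.
- Because FizzBuzz is essentially a classification problem, we use the cross-entropy loss.
- For the optimizer we use Stochastic Gradient Descent.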
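`torch.nn.CrossEntropyLoss` takes raw scores (logits) and integer class indices, and applies log-softmax internally. The sketch below, with made-up logits for two samples and four classes, checks the built-in loss against a hand-computed value:

```python
import torch

# Made-up scores for 2 samples and 4 classes
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                       [0.2, 0.1, 3.0, 0.0]])
targets = torch.tensor([0, 2])   # class indices, not one-hot vectors

loss = torch.nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: mean of -log softmax(logits)[target]
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

This is why the model's last layer is a plain Linear layer with 4 outputs and no softmax.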

In [30]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.05)

The training code follows.

In [31]:
# Start training it
BATCH_SIZE = 128
for epoch in range(10000):
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]

        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Find loss on training data
    loss = loss_fn(model(trX), trY).item()
    print('Epoch:', epoch, 'Loss:', loss)

Epoch: 0 Loss: 1.1863759756088257
Epoch: 1 Loss: 1.157894492149353
Epoch: 2 Loss: 1.1498943567276
Epoch: 3 Loss: 1.1466518640518188
Epoch: 4 Loss: 1.1449916362762451
Epoch: 5 Loss: 1.143990159034729
Epoch: 6 Loss: 1.143303394317627
Epoch: 7 Loss: 1.142789602279663
Epoch: 8 Loss: 1.1423760652542114
Epoch: 9 Loss: 1.1420289278030396
Epoch: 10 Loss: 1.141723394393921
Epoch: 11 Loss: 1.141450047492981
Epoch: 12 Loss: 1.1412018537521362
Epoch: 13 Loss: 1.1409754753112793
Epoch: 14 Loss: 1.1407641172409058
Epoch: 15 Loss: 1.1405678987503052
Epoch: 16 Loss: 1.1403822898864746
Epoch: 17 Loss: 1.1402071714401245
Epoch: 18 Loss: 1.1400408744812012
Epoch: 19 Loss: 1.1398837566375732
Epoch: 20 Loss: 1.1397329568862915
Epoch: 21 Loss: 1.139588713645935
Epoch: 22 Loss: 1.1394506692886353
Epoch: 23 Loss: 1.139319896697998
Epoch: 24 Loss: 1.1391912698745728
Epoch: 25 Loss: 1.1390684843063354
Epoch: 26 Loss: 1.1389490365982056
Epoch: 27 Loss: 1.138834834098816
Epoch: 28 Loss: 1.138723611831665
Epoch: 2

Epoch: 235 Loss: 1.1243518590927124
Epoch: 236 Loss: 1.1242260932922363
Epoch: 237 Loss: 1.1241044998168945
Epoch: 238 Loss: 1.123986005783081
Epoch: 239 Loss: 1.12385892868042
Epoch: 240 Loss: 1.123731255531311
Epoch: 241 Loss: 1.1236205101013184
Epoch: 242 Loss: 1.123486876487732
Epoch: 243 Loss: 1.1233718395233154
Epoch: 244 Loss: 1.1232503652572632
Epoch: 245 Loss: 1.1231154203414917
Epoch: 246 Loss: 1.1229811906814575
Epoch: 247 Loss: 1.1228554248809814
Epoch: 248 Loss: 1.1227210760116577
Epoch: 249 Loss: 1.1225866079330444
Epoch: 250 Loss: 1.1224478483200073
Epoch: 251 Loss: 1.1223106384277344
Epoch: 252 Loss: 1.122177004814148
Epoch: 253 Loss: 1.1220247745513916
Epoch: 254 Loss: 1.1218791007995605
Epoch: 255 Loss: 1.1217379570007324
Epoch: 256 Loss: 1.1215908527374268
Epoch: 257 Loss: 1.121443271636963
Epoch: 258 Loss: 1.121286153793335
Epoch: 259 Loss: 1.1211326122283936
Epoch: 260 Loss: 1.1209757328033447
Epoch: 261 Loss: 1.1208267211914062
Epoch: 262 Loss: 1.1206504106521606


Epoch: 477 Loss: 1.0169470310211182
Epoch: 478 Loss: 1.0164058208465576
Epoch: 479 Loss: 1.0156841278076172
Epoch: 480 Loss: 1.015245795249939
Epoch: 481 Loss: 1.0140904188156128
Epoch: 482 Loss: 1.0138357877731323
Epoch: 483 Loss: 1.0129069089889526
Epoch: 484 Loss: 1.0119367837905884
Epoch: 485 Loss: 1.0111967325210571
Epoch: 486 Loss: 1.0102176666259766
Epoch: 487 Loss: 1.009842038154602
Epoch: 488 Loss: 1.0088571310043335
Epoch: 489 Loss: 1.0086297988891602
Epoch: 490 Loss: 1.0071672201156616
Epoch: 491 Loss: 1.0068907737731934
Epoch: 492 Loss: 1.0057222843170166
Epoch: 493 Loss: 1.0049949884414673
Epoch: 494 Loss: 1.0045706033706665
Epoch: 495 Loss: 1.0041097402572632
Epoch: 496 Loss: 1.0033283233642578
Epoch: 497 Loss: 1.0022178888320923
Epoch: 498 Loss: 1.0014426708221436
Epoch: 499 Loss: 1.0003271102905273
Epoch: 500 Loss: 1.000455379486084
Epoch: 501 Loss: 0.9996194243431091
Epoch: 502 Loss: 0.9984729290008545
Epoch: 503 Loss: 1.0009974241256714
Epoch: 504 Loss: 0.997575283050

Epoch: 706 Loss: 0.8089808821678162
Epoch: 707 Loss: 0.8104071617126465
Epoch: 708 Loss: 0.8067026138305664
Epoch: 709 Loss: 0.8088684678077698
Epoch: 710 Loss: 0.8047152757644653
Epoch: 711 Loss: 0.8059985041618347
Epoch: 712 Loss: 0.8031206727027893
Epoch: 713 Loss: 0.8029946684837341
Epoch: 714 Loss: 0.800683319568634
Epoch: 715 Loss: 0.8016392588615417
Epoch: 716 Loss: 0.7986423373222351
Epoch: 717 Loss: 0.7988008856773376
Epoch: 718 Loss: 0.7977583408355713
Epoch: 719 Loss: 0.7947818040847778
Epoch: 720 Loss: 0.7966616153717041
Epoch: 721 Loss: 0.7928966283798218
Epoch: 722 Loss: 0.7956968545913696
Epoch: 723 Loss: 0.7919869422912598
Epoch: 724 Loss: 0.7916623950004578
Epoch: 725 Loss: 0.7900850772857666
Epoch: 726 Loss: 0.7895097732543945
Epoch: 727 Loss: 0.7883893847465515
Epoch: 728 Loss: 0.7864912748336792
Epoch: 729 Loss: 0.7869855165481567
Epoch: 730 Loss: 0.7848015427589417
Epoch: 731 Loss: 0.7843725085258484
Epoch: 732 Loss: 0.7833723425865173
Epoch: 733 Loss: 0.7817403078

Epoch: 942 Loss: 0.5768752694129944
Epoch: 943 Loss: 0.5766890645027161
Epoch: 944 Loss: 0.574909508228302
Epoch: 945 Loss: 0.5747119784355164
Epoch: 946 Loss: 0.5732300281524658
Epoch: 947 Loss: 0.5732151865959167
Epoch: 948 Loss: 0.5713380575180054
Epoch: 949 Loss: 0.5713919401168823
Epoch: 950 Loss: 0.5702218413352966
Epoch: 951 Loss: 0.5690693259239197
Epoch: 952 Loss: 0.5682777166366577
Epoch: 953 Loss: 0.5681558847427368
Epoch: 954 Loss: 0.5673319697380066
Epoch: 955 Loss: 0.5662036538124084
Epoch: 956 Loss: 0.5652552843093872
Epoch: 957 Loss: 0.5641111135482788
Epoch: 958 Loss: 0.5636101961135864
Epoch: 959 Loss: 0.5629693865776062
Epoch: 960 Loss: 0.5614902377128601
Epoch: 961 Loss: 0.5607122182846069
Epoch: 962 Loss: 0.5602133870124817
Epoch: 963 Loss: 0.559083878993988
Epoch: 964 Loss: 0.558386504650116
Epoch: 965 Loss: 0.557803213596344
Epoch: 966 Loss: 0.5569421648979187
Epoch: 967 Loss: 0.5554912090301514
Epoch: 968 Loss: 0.5554984211921692
Epoch: 969 Loss: 0.5535005331039

Epoch: 1171 Loss: 0.41417765617370605
Epoch: 1172 Loss: 0.4141392111778259
Epoch: 1173 Loss: 0.4130527079105377
Epoch: 1174 Loss: 0.4125176966190338
Epoch: 1175 Loss: 0.41161713004112244
Epoch: 1176 Loss: 0.4111669659614563
Epoch: 1177 Loss: 0.41089871525764465
Epoch: 1178 Loss: 0.4098060727119446
Epoch: 1179 Loss: 0.4094087779521942
Epoch: 1180 Loss: 0.4089564383029938
Epoch: 1181 Loss: 0.40804198384284973
Epoch: 1182 Loss: 0.40796804428100586
Epoch: 1183 Loss: 0.40726423263549805
Epoch: 1184 Loss: 0.4063868820667267
Epoch: 1185 Loss: 0.4060220718383789
Epoch: 1186 Loss: 0.4052112400531769
Epoch: 1187 Loss: 0.4045411944389343
Epoch: 1188 Loss: 0.4041464626789093
Epoch: 1189 Loss: 0.403473824262619
Epoch: 1190 Loss: 0.4029342234134674
Epoch: 1191 Loss: 0.40284866094589233
Epoch: 1192 Loss: 0.4018949866294861
Epoch: 1193 Loss: 0.40180134773254395
Epoch: 1194 Loss: 0.4005822539329529
Epoch: 1195 Loss: 0.40030086040496826
Epoch: 1196 Loss: 0.3991815745830536
Epoch: 1197 Loss: 0.3986361026

Epoch: 1402 Loss: 0.29939204454421997
Epoch: 1403 Loss: 0.29862555861473083
Epoch: 1404 Loss: 0.29839155077934265
Epoch: 1405 Loss: 0.2978818416595459
Epoch: 1406 Loss: 0.29752880334854126
Epoch: 1407 Loss: 0.29764828085899353
Epoch: 1408 Loss: 0.296686589717865
Epoch: 1409 Loss: 0.2964232861995697
Epoch: 1410 Loss: 0.29574665427207947
Epoch: 1411 Loss: 0.2957584857940674
Epoch: 1412 Loss: 0.29485228657722473
Epoch: 1413 Loss: 0.29470449686050415
Epoch: 1414 Loss: 0.2943603992462158
Epoch: 1415 Loss: 0.29390591382980347
Epoch: 1416 Loss: 0.29357513785362244
Epoch: 1417 Loss: 0.2933928668498993
Epoch: 1418 Loss: 0.2927872836589813
Epoch: 1419 Loss: 0.2922026216983795
Epoch: 1420 Loss: 0.29222404956817627
Epoch: 1421 Loss: 0.29159119725227356
Epoch: 1422 Loss: 0.2912271320819855
Epoch: 1423 Loss: 0.29068514704704285
Epoch: 1424 Loss: 0.29047834873199463
Epoch: 1425 Loss: 0.2899259626865387
Epoch: 1426 Loss: 0.2897847294807434
Epoch: 1427 Loss: 0.2894900441169739
Epoch: 1428 Loss: 0.28894

Epoch: 1627 Loss: 0.22674560546875
Epoch: 1628 Loss: 0.22668349742889404
Epoch: 1629 Loss: 0.2263282835483551
Epoch: 1630 Loss: 0.2261933535337448
Epoch: 1631 Loss: 0.22580178081989288
Epoch: 1632 Loss: 0.22570860385894775
Epoch: 1633 Loss: 0.22543734312057495
Epoch: 1634 Loss: 0.2251395583152771
Epoch: 1635 Loss: 0.22485210001468658
Epoch: 1636 Loss: 0.2246858775615692
Epoch: 1637 Loss: 0.22418570518493652
Epoch: 1638 Loss: 0.22408628463745117
Epoch: 1639 Loss: 0.22388315200805664
Epoch: 1640 Loss: 0.22342777252197266
Epoch: 1641 Loss: 0.22338324785232544
Epoch: 1642 Loss: 0.2229192554950714
Epoch: 1643 Loss: 0.22274376451969147
Epoch: 1644 Loss: 0.22254346311092377
Epoch: 1645 Loss: 0.2222624570131302
Epoch: 1646 Loss: 0.22198820114135742
Epoch: 1647 Loss: 0.22187311947345734
Epoch: 1648 Loss: 0.22135613858699799
Epoch: 1649 Loss: 0.2214001566171646
Epoch: 1650 Loss: 0.22109432518482208
Epoch: 1651 Loss: 0.22070461511611938
Epoch: 1652 Loss: 0.22048650681972504
Epoch: 1653 Loss: 0.22

Epoch: 1853 Loss: 0.17884890735149384
Epoch: 1854 Loss: 0.17870470881462097
Epoch: 1855 Loss: 0.1785806566476822
Epoch: 1856 Loss: 0.17833738029003143
Epoch: 1857 Loss: 0.1781667023897171
Epoch: 1858 Loss: 0.17819958925247192
Epoch: 1859 Loss: 0.17786657810211182
Epoch: 1860 Loss: 0.17756561934947968
Epoch: 1861 Loss: 0.1775447428226471
Epoch: 1862 Loss: 0.17735233902931213
Epoch: 1863 Loss: 0.17718777060508728
Epoch: 1864 Loss: 0.17704997956752777
Epoch: 1865 Loss: 0.17674113810062408
Epoch: 1866 Loss: 0.17658349871635437
Epoch: 1867 Loss: 0.17650727927684784
Epoch: 1868 Loss: 0.17637978494167328
Epoch: 1869 Loss: 0.17611035704612732
Epoch: 1870 Loss: 0.17586664855480194
Epoch: 1871 Loss: 0.17589786648750305
Epoch: 1872 Loss: 0.17563015222549438
Epoch: 1873 Loss: 0.17554159462451935
Epoch: 1874 Loss: 0.17527280747890472
Epoch: 1875 Loss: 0.17503415048122406
Epoch: 1876 Loss: 0.17505910992622375
Epoch: 1877 Loss: 0.17476093769073486
Epoch: 1878 Loss: 0.17462843656539917

Epoch: 2084 Loss: 0.14500771462917328
Epoch: 2085 Loss: 0.14479458332061768
Epoch: 2086 Loss: 0.14463625848293304
Epoch: 2087 Loss: 0.14454081654548645
Epoch: 2088 Loss: 0.1444566398859024
Epoch: 2089 Loss: 0.14431659877300262
Epoch: 2090 Loss: 0.14415238797664642
Epoch: 2091 Loss: 0.14405903220176697
Epoch: 2092 Loss: 0.1438966691493988
Epoch: 2093 Loss: 0.1437695324420929
Epoch: 2094 Loss: 0.14366938173770905
Epoch: 2095 Loss: 0.1435183435678482
Epoch: 2096 Loss: 0.14344394207000732
Epoch: 2097 Loss: 0.1433134824037552
Epoch: 2098 Loss: 0.1431715190410614
Epoch: 2099 Loss: 0.1430383324623108
Epoch: 2100 Loss: 0.14290861785411835
Epoch: 2101 Loss: 0.14282990992069244
Epoch: 2102 Loss: 0.14267833530902863
Epoch: 2103 Loss: 0.14252060651779175
Epoch: 2104 Loss: 0.14244794845581055
Epoch: 2105 Loss: 0.1423359513282776
Epoch: 2106 Loss: 0.1421981006860733
Epoch: 2107 Loss: 0.142067551612854
Epoch: 2108 Loss: 0.14189712703227997
Epoch: 2109 Loss: 0.14187049865722656
Epoch: 2110 Loss: 0.141

Epoch: 2302 Loss: 0.12094835191965103
Epoch: 2303 Loss: 0.12082891911268234
Epoch: 2304 Loss: 0.12072534114122391
Epoch: 2305 Loss: 0.12065427005290985
Epoch: 2306 Loss: 0.12057747691869736
Epoch: 2307 Loss: 0.12043754756450653
Epoch: 2308 Loss: 0.12038204073905945
Epoch: 2309 Loss: 0.12027539312839508
Epoch: 2310 Loss: 0.1201474666595459
Epoch: 2311 Loss: 0.12011440843343735
Epoch: 2312 Loss: 0.12001331150531769
Epoch: 2313 Loss: 0.11988004297018051
Epoch: 2314 Loss: 0.11981222033500671
Epoch: 2315 Loss: 0.11968223005533218
Epoch: 2316 Loss: 0.11963416635990143
Epoch: 2317 Loss: 0.11959797888994217
Epoch: 2318 Loss: 0.1194261908531189
Epoch: 2319 Loss: 0.11933813244104385
Epoch: 2320 Loss: 0.11927976459264755
Epoch: 2321 Loss: 0.11914707720279694
Epoch: 2322 Loss: 0.11906152963638306
Epoch: 2323 Loss: 0.11897692829370499
Epoch: 2324 Loss: 0.11889945715665817
Epoch: 2325 Loss: 0.11876726150512695
Epoch: 2326 Loss: 0.11878830939531326
Epoch: 2327 Loss: 0.11858037859201431

Epoch: 2522 Loss: 0.10196613520383835
Epoch: 2523 Loss: 0.10189534723758698
Epoch: 2524 Loss: 0.10180655121803284
Epoch: 2525 Loss: 0.10179479420185089
Epoch: 2526 Loss: 0.10169700533151627
Epoch: 2527 Loss: 0.10156562179327011
Epoch: 2528 Loss: 0.10150012373924255
Epoch: 2529 Loss: 0.1015271246433258
Epoch: 2530 Loss: 0.10137586295604706
Epoch: 2531 Loss: 0.10125355422496796
Epoch: 2532 Loss: 0.10121708363294601
Epoch: 2533 Loss: 0.1011032685637474
Epoch: 2534 Loss: 0.101102314889431
Epoch: 2535 Loss: 0.1009720116853714
Epoch: 2536 Loss: 0.10090232640504837
Epoch: 2537 Loss: 0.10086604952812195
Epoch: 2538 Loss: 0.10076442360877991
Epoch: 2539 Loss: 0.10071643441915512
Epoch: 2540 Loss: 0.10058549046516418
Epoch: 2541 Loss: 0.10049436241388321
Epoch: 2542 Loss: 0.10044986009597778
Epoch: 2543 Loss: 0.10034430027008057
Epoch: 2544 Loss: 0.10030896216630936
Epoch: 2545 Loss: 0.10021300613880157
Epoch: 2546 Loss: 0.10015030950307846
Epoch: 2547 Loss: 0.10009554773569107

Epoch: 2747 Loss: 0.08582881093025208
Epoch: 2748 Loss: 0.0857662782073021
Epoch: 2749 Loss: 0.08567773550748825
Epoch: 2750 Loss: 0.08563879132270813
Epoch: 2751 Loss: 0.08557826280593872
Epoch: 2752 Loss: 0.08553162217140198
Epoch: 2753 Loss: 0.08547022938728333
Epoch: 2754 Loss: 0.08538629859685898
Epoch: 2755 Loss: 0.08532965928316116
Epoch: 2756 Loss: 0.08527617156505585
Epoch: 2757 Loss: 0.08518342673778534
Epoch: 2758 Loss: 0.0851394534111023
Epoch: 2759 Loss: 0.08507613837718964
Epoch: 2760 Loss: 0.08500506728887558
Epoch: 2761 Loss: 0.08496491611003876
Epoch: 2762 Loss: 0.08490445464849472
Epoch: 2763 Loss: 0.08489058166742325
Epoch: 2764 Loss: 0.08478289842605591
Epoch: 2765 Loss: 0.08472540974617004
Epoch: 2766 Loss: 0.08464295417070389
Epoch: 2767 Loss: 0.08458083868026733
Epoch: 2768 Loss: 0.08453255146741867
Epoch: 2769 Loss: 0.08445384353399277
Epoch: 2770 Loss: 0.08441763371229172
Epoch: 2771 Loss: 0.08430495113134384
Epoch: 2772 Loss: 0.0842999517917633

Epoch: 2965 Loss: 0.07356490939855576
Epoch: 2966 Loss: 0.07354529947042465
Epoch: 2967 Loss: 0.07350584864616394
Epoch: 2968 Loss: 0.07340973615646362
Epoch: 2969 Loss: 0.07335770130157471
Epoch: 2970 Loss: 0.07330495119094849
Epoch: 2971 Loss: 0.07325077801942825
Epoch: 2972 Loss: 0.07318269461393356
Epoch: 2973 Loss: 0.07317445427179337
Epoch: 2974 Loss: 0.07309599965810776
Epoch: 2975 Loss: 0.07305600494146347
Epoch: 2976 Loss: 0.07301383465528488
Epoch: 2977 Loss: 0.07298077642917633
Epoch: 2978 Loss: 0.07291411608457565
Epoch: 2979 Loss: 0.0728539228439331
Epoch: 2980 Loss: 0.07278301566839218
Epoch: 2981 Loss: 0.07276879996061325
Epoch: 2982 Loss: 0.07269257307052612
Epoch: 2983 Loss: 0.07265006005764008
Epoch: 2984 Loss: 0.07264236360788345
Epoch: 2985 Loss: 0.07255671918392181
Epoch: 2986 Loss: 0.07255373150110245
Epoch: 2987 Loss: 0.07245618104934692
Epoch: 2988 Loss: 0.07243408262729645
Epoch: 2989 Loss: 0.07237093895673752
Epoch: 2990 Loss: 0.0723646730184555

Epoch: 3185 Loss: 0.06357269734144211
Epoch: 3186 Loss: 0.06353369355201721
Epoch: 3187 Loss: 0.06349340081214905
Epoch: 3188 Loss: 0.06345989555120468
Epoch: 3189 Loss: 0.06341288238763809
Epoch: 3190 Loss: 0.0634339302778244
Epoch: 3191 Loss: 0.0633382797241211
Epoch: 3192 Loss: 0.06330052018165588
Epoch: 3193 Loss: 0.06323433667421341
Epoch: 3194 Loss: 0.06322506815195084
Epoch: 3195 Loss: 0.06316656619310379
Epoch: 3196 Loss: 0.06319136172533035
Epoch: 3197 Loss: 0.06311500072479248
Epoch: 3198 Loss: 0.0630607008934021
Epoch: 3199 Loss: 0.06301185488700867
Epoch: 3200 Loss: 0.06297914683818817
Epoch: 3201 Loss: 0.06294315308332443
Epoch: 3202 Loss: 0.06290165334939957
Epoch: 3203 Loss: 0.06283605098724365
Epoch: 3204 Loss: 0.06281360238790512
Epoch: 3205 Loss: 0.06276153773069382
Epoch: 3206 Loss: 0.06276555359363556
Epoch: 3207 Loss: 0.06268615275621414
Epoch: 3208 Loss: 0.06269682198762894
Epoch: 3209 Loss: 0.06260818988084793
Epoch: 3210 Loss: 0.0625920370221138

Epoch: 3414 Loss: 0.05518385395407677
Epoch: 3415 Loss: 0.05515581741929054
Epoch: 3416 Loss: 0.05510089918971062
Epoch: 3417 Loss: 0.05507148429751396
Epoch: 3418 Loss: 0.055032409727573395
Epoch: 3419 Loss: 0.055004190653562546
Epoch: 3420 Loss: 0.05498422682285309
Epoch: 3421 Loss: 0.05494361370801926
Epoch: 3422 Loss: 0.05491350591182709
Epoch: 3423 Loss: 0.05486806482076645
Epoch: 3424 Loss: 0.054837729781866074
Epoch: 3425 Loss: 0.05482647940516472
Epoch: 3426 Loss: 0.05478033423423767
Epoch: 3427 Loss: 0.05474850907921791
Epoch: 3428 Loss: 0.05471814051270485
Epoch: 3429 Loss: 0.054683517664670944
Epoch: 3430 Loss: 0.05466316640377045
Epoch: 3431 Loss: 0.054595593363046646
Epoch: 3432 Loss: 0.054579559713602066
Epoch: 3433 Loss: 0.054536182433366776
Epoch: 3434 Loss: 0.05453020706772804
Epoch: 3435 Loss: 0.05447082221508026
Epoch: 3436 Loss: 0.05447249487042427
Epoch: 3437 Loss: 0.05440276488661766
Epoch: 3438 Loss: 0.05441807582974434
Epoch: 3439 Loss: 0.05435502529144287

Epoch: 3629 Loss: 0.04869323596358299
Epoch: 3630 Loss: 0.04866374284029007
Epoch: 3631 Loss: 0.04861191287636757
Epoch: 3632 Loss: 0.0486217699944973
Epoch: 3633 Loss: 0.04858650639653206
Epoch: 3634 Loss: 0.048589807003736496
Epoch: 3635 Loss: 0.0485529787838459
Epoch: 3636 Loss: 0.0485168918967247
Epoch: 3637 Loss: 0.04846867918968201
Epoch: 3638 Loss: 0.04847374185919762
Epoch: 3639 Loss: 0.04844004288315773
Epoch: 3640 Loss: 0.04842530936002731
Epoch: 3641 Loss: 0.048388298600912094
Epoch: 3642 Loss: 0.0483526811003685
Epoch: 3643 Loss: 0.04832077398896217
Epoch: 3644 Loss: 0.04829636588692665
Epoch: 3645 Loss: 0.048253320157527924
Epoch: 3646 Loss: 0.048239342868328094
Epoch: 3647 Loss: 0.048190582543611526
Epoch: 3648 Loss: 0.048195064067840576
Epoch: 3649 Loss: 0.048155881464481354
Epoch: 3650 Loss: 0.0481334887444973
Epoch: 3651 Loss: 0.048104576766490936
Epoch: 3652 Loss: 0.04809153452515602
Epoch: 3653 Loss: 0.04806551709771156
Epoch: 3654 Loss: 0.04802782088518143

Traceback (most recent call last):
  File "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-c2cf5b37f087>", line 9, in <module>
    y_pred = model(batchX)
  File "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/Users/zeweichu/anaconda3/en

KeyboardInterrupt: 

Finally, we use the trained model to play FizzBuzz on the numbers 1 through 100.

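In [None]:
# Encode the integers 1..100 as binary feature vectors and run the trained model on them
testX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(1, 101)])
with torch.no_grad():  # inference only, so gradient tracking is not needed
    testY = model(testX)
# testY.max(1)[1] holds, for each row, the index of the highest-scoring class
predictions = zip(range(1, 101), testY.max(1)[1].tolist())

# Turn each (number, class index) pair back into its FizzBuzz string
print([fizz_buzz_decode(i, x) for (i, x) in predictions])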

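In [None]:
# Compare the predicted classes with the ground-truth labels for 1..100
correct = testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1, 101)])
print(np.sum(correct))  # number of correct predictions out of 100
correct                 # element-wise boolean mask, shown as the cell output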
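
The evaluation step can also be sketched without PyTorch or NumPy, which may make the decode-and-compare logic easier to follow. This is only an illustration under stated assumptions: `fizz_buzz_encode` and `fizz_buzz_decode` are restated here with an assumed labelling (0 = the number itself, 1 = fizz, 2 = buzz, 3 = fizzbuzz), the `argmax` helper is hypothetical and stands in for `testY.max(1)[1]`, and `fake_logits` is a made-up substitute for the model output, built from the true labels so that the result is predictable. It does not reflect the accuracy of the trained model above.

```python
def fizz_buzz_encode(i):
    # Assumed labelling: 0 = number itself, 1 = fizz, 2 = buzz, 3 = fizzbuzz
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0


def fizz_buzz_decode(i, prediction):
    # Map a class index back to the string the game expects
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]


def argmax(row):
    # Hypothetical helper: index of the largest score in one row of outputs
    return max(range(len(row)), key=lambda k: row[k])


numbers = list(range(1, 101))
labels = [fizz_buzz_encode(i) for i in numbers]

# Made-up stand-in for the model output: one row of 4 scores per number,
# with the highest score on the true class (that is, a perfect model).
fake_logits = [[1.0 if k == y else 0.0 for k in range(4)] for y in labels]

predicted = [argmax(row) for row in fake_logits]
decoded = [fizz_buzz_decode(i, p) for i, p in zip(numbers, predicted)]

# Count matches against the ground truth and express them as a fraction
num_correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = num_correct / len(numbers)

print(decoded[:15])
print(num_correct, accuracy)
```

With a real model, `fake_logits` would be replaced by the rows of `testY`, and `num_correct` would correspond to the value printed by `np.sum(...)` in the cell above.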