# Lesson 1


What is PyTorch?
================

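PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can use GPUs
- It can be used to define deep learning models, and to train and run them flexibly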

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the main difference is that a Tensor can use a GPU to accelerate computation.


In [118]:
import torch

Construct an uninitialized 5x3 matrix (its entries are whatever values happened to be in the allocated memory):

In [119]:
x = torch.empty(5,3)
x

tensor([[7.7986e+19, 4.5691e-41, 3.4396e-37],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

Construct a randomly initialized matrix (values drawn uniformly from [0, 1)):

In [120]:
x = torch.rand(5,3)
x

tensor([[0.3148, 0.5820, 0.6892],
        [0.5551, 0.1520, 0.9265],
        [0.3510, 0.5425, 0.8213],
        [0.8385, 0.6578, 0.8176],
        [0.1967, 0.7105, 0.1195]])

Construct a matrix filled with zeros and of dtype long:

In [121]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [122]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [123]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

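You can also create a tensor based on an existing tensor. These methods reuse the properties of the input tensor, such as its dtype, unless new values are provided.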

In [124]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [125]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[ 1.8381, -0.0824,  0.0219],
        [-1.4915, -2.1107,  0.9779],
        [ 1.4244, -0.7605, -0.9730],
        [-0.1420, -0.8863, -2.2631],
        [-0.6980, -1.0067, -0.0095]])
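To make the "reuse the properties of the existing tensor" rule concrete, here is a small sketch (the variable names are illustrative only): a `new_*` method takes a new shape but inherits the dtype, while a `*_like` function copies the shape and lets a keyword argument override the dtype, as `randn_like(x, dtype=torch.float)` does above.

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

# new_* methods: new shape given explicitly, dtype (and device) inherited from `base`
ones = base.new_ones((4,))

# *_like functions: shape copied from `base`; a keyword argument overrides the dtype
noise = torch.randn_like(base, dtype=torch.float32)

print(ones.dtype, tuple(ones.shape))    # torch.float64 (4,)
print(noise.dtype, tuple(noise.shape))  # torch.float32 (2, 3)
```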

Get the shape of a tensor:

In [126]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is a subclass of tuple, so it supports all tuple operations.</p></div>
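Because `torch.Size` is a tuple subclass, a shape can be unpacked, compared and measured like any tuple. A quick check (variable names are just for illustration):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()       # same value as x.shape

rows, cols = x.shape  # unpacks like an ordinary tuple
print(isinstance(size, tuple), rows, cols)  # True 5 3
print(size == (5, 3), len(size))            # True 2
```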

Operations
---------------


There are many kinds of tensor operations. We start with addition.



In [127]:
y = torch.rand(5,3)
y

tensor([[0.4954, 0.9024, 0.9164],
        [0.3689, 0.4199, 0.9162],
        [0.9710, 0.8278, 0.7800],
        [0.3130, 0.7839, 0.8759],
        [0.9094, 0.4873, 0.5652]])

In [128]:
x + y

tensor([[ 2.3335,  0.8199,  0.9383],
        [-1.1226, -1.6908,  1.8941],
        [ 2.3954,  0.0672, -0.1930],
        [ 0.1710, -0.1023, -1.3872],
        [ 0.2114, -0.5194,  0.5557]])

Another way to write addition:


In [129]:
torch.add(x, y)

tensor([[ 2.3335,  0.8199,  0.9383],
        [-1.1226, -1.6908,  1.8941],
        [ 2.3954,  0.0672, -0.1930],
        [ 0.1710, -0.1023, -1.3872],
        [ 0.2114, -0.5194,  0.5557]])

Addition: passing an output tensor as an argument

In [130]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 2.3335,  0.8199,  0.9383],
        [-1.1226, -1.6908,  1.8941],
        [ 2.3954,  0.0672, -0.1930],
        [ 0.1710, -0.1023, -1.3872],
        [ 0.2114, -0.5194,  0.5557]])

In-place addition (adds ``x`` to ``y``):

In [131]:
y.add_(x)
y

tensor([[ 2.3335,  0.8199,  0.9383],
        [-1.1226, -1.6908,  1.8941],
        [ 2.3954,  0.0672, -0.1930],
        [ 0.1710, -0.1023, -1.3872],
        [ 0.2114, -0.5194,  0.5557]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in-place is post-fixed with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
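A minimal sketch of the naming convention: the method without the underscore returns a new tensor and leaves the original alone, while the underscore version modifies the tensor it is called on.

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 3.0)

z = x.add(y)   # out-of-place: returns a new tensor, x is unchanged
print(x[0, 0].item(), z[0, 0].item())  # 1.0 4.0

x.add_(y)      # in-place: the trailing underscore means x itself is modified
print(x[0, 0].item())                  # 4.0

m = torch.arange(6.).view(2, 3)
m.t_()         # in-place transpose changes m's own shape
print(tuple(m.shape))                  # (3, 2)
```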

Standard NumPy-style indexing works on PyTorch tensors:


In [132]:
x[1:, 1:]

tensor([[-2.1107,  0.9779],
        [-0.7605, -0.9730],
        [-0.8863, -2.2631],
        [-1.0067, -0.0095]])
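A few more NumPy-style forms, sketched on a small integer tensor so the results are easy to read. As in NumPy, basic slices are views of the original data, whereas integer-array and boolean indexing return copies:

```python
import torch

x = torch.arange(12).view(3, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

col = x[:, 1]          # basic slicing (a view): second column
rows = x[[0, 2]]       # integer-array indexing (a copy): rows 0 and 2
evens = x[x % 2 == 0]  # boolean mask (a copy): 1-D tensor of matching elements
print(col.tolist())    # [1, 5, 9]
print(evens.tolist())  # [0, 2, 4, 6, 8, 10]

x[0, :] = -1           # write through an index
print(col.tolist())    # [-1, 5, 9]: the view sees the change
print(rows.tolist())   # [[0, 1, 2, 3], [8, 9, 10, 11]]: the copy does not
```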

Resizing: if you want to resize/reshape a tensor, use the ``.view()`` method (``Tensor.view``):

In [133]:
x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
z

tensor([[ 1.9573e+00,  1.2257e-01,  9.7921e-01,  1.3878e+00,  3.3277e-01,
         -1.4437e-03,  7.8901e-01,  1.3432e+00],
        [ 1.6884e+00,  7.9933e-01, -2.0388e+00,  1.9692e+00, -1.8358e-01,
          6.7984e-01,  1.0780e-01,  9.6673e-01]])
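Two properties of `view()` are worth seeing explicitly: a `-1` dimension is inferred from the total number of elements, and the result shares storage with the original tensor, so writing through one changes the other. A small sketch using `arange` so the values are predictable:

```python
import torch

x = torch.arange(16.).view(4, 4)
y = x.view(16)       # same 16 elements, new shape
z = x.view(-1, 8)    # -1 is inferred: 16 / 8 = 2
print(tuple(z.shape))  # (2, 8)

y[0] = 100.0         # a view shares memory with the tensor it came from
print(x[0, 0].item(), z[0, 0].item())  # 100.0 100.0
```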

If you have a one-element tensor, use the ``.item()`` method to get its value as a Python number.

In [134]:
x = torch.randn(1)
x

tensor([-0.7729])

In [135]:
x.item()

-0.7728920578956604
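`.item()` only works when the tensor holds exactly one element; for larger tensors, `.tolist()` converts everything to Python numbers. The exact exception raised for a multi-element `.item()` is not something this sketch relies on, so it catches both `RuntimeError` and `ValueError`:

```python
import torch

s = torch.tensor([2.5])
value = s.item()                    # a plain Python float
print(type(value).__name__, value)  # float 2.5

v = torch.tensor([1.0, 2.0])
item_failed = False
try:
    v.item()                        # two elements: cannot become one Python number
except (RuntimeError, ValueError):
    item_failed = True
print(item_failed)                  # True
print(v.tolist())                   # [1.0, 2.0]
```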

In [136]:
z.transpose(1,0)

tensor([[ 1.9573e+00,  1.6884e+00],
        [ 1.2257e-01,  7.9933e-01],
        [ 9.7921e-01, -2.0388e+00],
        [ 1.3878e+00,  1.9692e+00],
        [ 3.3277e-01, -1.8358e-01],
        [-1.4437e-03,  6.7984e-01],
        [ 7.8901e-01,  1.0780e-01],
        [ 1.3432e+00,  9.6673e-01]])
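`transpose(1, 0)` swaps the two dimensions, which for a 2-D tensor is the same as `.t()`. The result is a view with swapped strides and is therefore not contiguous, so calling `view()` on it fails; `reshape()` falls back to copying. A sketch on a deterministic tensor:

```python
import torch

z = torch.arange(16.).view(2, 8)
zt = z.transpose(1, 0)      # shape (8, 2); equal to z.t() for 2-D tensors
print(tuple(zt.shape), torch.equal(zt, z.t()))  # (8, 2) True
print(zt.is_contiguous())   # False

view_failed = False
try:
    zt.view(16)             # incompatible strides: view() cannot flatten this
except RuntimeError:
    view_failed = True

flat = zt.reshape(16)       # reshape copies when a view is impossible
print(view_failed, flat[:4].tolist())  # True [0.0, 8.0, 1.0, 9.0]
```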

**Further reading**


  Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy and Tensors
------------------------------------

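Converting a Torch Tensor to a NumPy array and vice versa is very easy.

A Torch Tensor on the CPU and the NumPy array converted from it share the same underlying memory, so changing one also changes the other.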

Converting a Torch Tensor to a NumPy array


In [137]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [138]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array; the Tensor changes too:

In [139]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [140]:
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor

In [141]:
import numpy as np

In [142]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [143]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All Tensors on the CPU support conversion to NumPy and back.
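Memory sharing depends on how the Tensor is created: `torch.from_numpy` shares the array's memory, while `torch.tensor` makes an independent copy. A short sketch contrasting the two:

```python
import numpy as np
import torch

a = np.zeros(3)
shared = torch.from_numpy(a)  # shares memory with `a`
copied = torch.tensor(a)      # independent copy of the data

a[0] = 7.0
print(shared.tolist())  # [7.0, 0.0, 0.0]
print(copied.tolist())  # [0.0, 0.0, 0.0]
print(shared.dtype)     # torch.float64: the dtype follows the NumPy array
```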

CUDA Tensors
------------

Tensors can be moved onto any device using the ``.to`` method.



In [144]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))
    

In [145]:
# Two equivalent ways to copy a tensor to CPU memory and convert it to NumPy
y.to("cpu").numpy()
y.cpu().numpy()

array([ 1.95728540e+00,  1.22565255e-01,  9.79205191e-01,  1.38781381e+00,
        3.32774490e-01, -1.44365639e-03,  7.89014876e-01,  1.34321415e+00,
        1.68836606e+00,  7.99332917e-01, -2.03880811e+00,  1.96919990e+00,
       -1.83580667e-01,  6.79840565e-01,  1.07799731e-01,  9.66726363e-01],
      dtype=float32)
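A common way to write code that runs with or without a GPU is to pick the device once and use it everywhere. This is a sketch of that pattern (not code from this notebook); on a CPU-only machine it simply runs on the CPU:

```python
import torch

# use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, 2)
y = torch.ones_like(x, device=device)  # created directly on `device`
z = x.to(device) + y                   # both operands must be on the same device

# NumPy only understands CPU memory; detach() drops autograd tracking if any
z_np = z.detach().cpu().numpy()
print(device, z_np.shape)
```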

In [146]:
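# `model` is defined in the nn section below; .cuda() also needs an NVIDIA GPU and driver,
# so on a CPU-only machine this call fails with the AssertionError shown below
model = model.cuda()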


AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx


Warm-up: a two-layer neural network in NumPy
--------------------------------------------

A fully connected ReLU network with one hidden layer and no biases, trained to predict y from x with an L2 loss:
- $h = XW_1$
- $a = \max(0, h)$
- $\hat{y} = aW_2$
- $L = \sum (\hat{y} - y)^2$

This implementation uses NumPy alone to compute the forward pass, the loss, and the backward pass:
- forward pass
- loss
- backward pass
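The hand-written backward pass in the code is the chain rule applied to these equations. A sketch of the derivation, written for the forward pass as the code computes it ($h = XW_1$ via `x.dot(w1)`, $\hat{y} = aW_2$) and the summed squared error; each line corresponds to `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h` and `grad_w1` in that order, and the ReLU mask uses $h \ge 0$ to match the code's `grad_h[h<0] = 0`:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$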

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.



In [None]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

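Hand-written gradients are easy to get wrong, so a numerical gradient check is a useful companion to the loop above. The sketch below is an assumption-laden illustration, not part of the original lesson: `loss_and_grads` and `numerical_grad` are hypothetical helper names, the sizes are tiny so the check is fast, and the central-difference step `eps=1e-6` is a typical but arbitrary choice.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Forward pass, L2 loss and hand-written backward pass, as in the loop above."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)  # gradient w.r.t. h_relu
    grad_h[h < 0] = 0               # ReLU passes gradient only where h >= 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw; perturbs w in place, then restores it."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        f_plus = f()
        w[idx] = original - eps
        f_minus = f()
        w[idx] = original
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 5)), rng.standard_normal((4, 3))
w1, w2 = rng.standard_normal((5, 6)), rng.standard_normal((6, 3))

_, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)
current_loss = lambda: loss_and_grads(x, y, w1, w2)[0]
max_err = max(np.abs(grad_w1 - numerical_grad(current_loss, w1)).max(),
              np.abs(grad_w2 - numerical_grad(current_loss, w2)).max())
print(max_err)  # expected to be very small if the backward pass is correct
```

A large `max_err` would point to a bug in one of the gradient formulas.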

PyTorch: Tensors
----------------

This time we use PyTorch tensors to implement the forward pass, the loss, and the backward pass.

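A PyTorch Tensor is very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU; to compute on the GPU, move the Tensor to a CUDA device (for example with ``.to("cuda")``).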


In [None]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

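Since the manual backward pass above uses only tensor operations, autograd can be used to check it. The sketch below runs one forward pass on small float64 tensors (the sizes and seed are arbitrary choices), computes the gradients by hand with the same formulas, and compares them with what `loss.backward()` produces:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 5, dtype=torch.float64)
y = torch.randn(8, 3, dtype=torch.float64)
w1 = torch.randn(5, 4, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)

# forward pass, same as the manual version
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# manual gradients, computed without autograd tracking
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

loss.backward()  # autograd fills w1.grad and w2.grad
print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))  # True True
```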
A simple autograd example:

In [None]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

# dy/dw = x = 1, dy/dx = w = 2, dy/db = 1
print(w.grad)  # tensor(1.)
print(x.grad)  # tensor(2.)
print(b.grad)  # tensor(1.)



PyTorch: Tensors and autograd
-------------------------------

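One of PyTorch's key features is autograd: once you have defined the forward pass and computed the loss, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.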

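A PyTorch Tensor represents a node in a computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after ``backward()`` has been called, ``x.grad`` is another Tensor holding the gradient of ``x`` with respect to some scalar (usually the loss).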
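One detail of `.grad` explains the `w1.grad.zero_()` calls in the training loop: each `backward()` call adds to the existing `.grad` rather than overwriting it. A minimal illustration with a single scalar parameter:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()   # 3.0, since d(3w)/dw = 3

(3 * w).backward()      # a second backward() ADDS to the existing .grad
second = w.grad.item()  # 6.0

w.grad.zero_()          # reset in place before the next step
print(first, second, w.grad.item())  # 3.0 6.0 0.0
```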


In [None]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()


PyTorch: nn
-----------


This time we use PyTorch's ``nn`` package to build the network.
PyTorch autograd builds the computation graph and computes the gradients for us.
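It helps to know what `nn.Linear` actually computes before using it: it stores its weight with shape `(out_features, in_features)` and returns `x @ weight.T` (plus a bias when one is enabled). A quick check with small, arbitrary sizes:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 2, bias=False)  # weight shape: (out_features, in_features)
x = torch.randn(3, 4)

out = lin(x)                       # computes x @ weight.T; no bias here
manual = x @ lin.weight.T
print(tuple(lin.weight.shape), torch.allclose(out, manual, atol=1e-6))  # (2, 4) True
print(lin.bias)                    # None
```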




In [147]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T (no bias)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update all model parameters (the weights of both Linear layers)
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()

0 25072262.0
1 19905086.0
2 17571300.0
3 15858608.0
4 13857529.0
5 11411684.0
6 8789994.0
7 6407556.0
8 4491858.5
9 3104328.25
10 2150641.75
11 1519154.125
12 1102484.875
13 826666.9375
14 640141.8125
15 510812.40625
16 418157.65625
17 349592.40625
18 297217.375
19 256058.125
20 222880.921875
21 195608.453125
22 172814.71875
23 153456.234375
24 136853.796875
25 122502.796875
26 110056.0234375
27 99151.3515625
28 89545.453125
29 81047.703125
30 73501.484375
31 66781.9765625
32 60774.3984375
33 55392.13671875
34 50572.57421875
35 46234.8046875
36 42320.3671875
37 38781.29296875
38 35577.09375
39 32669.32421875
40 30028.361328125
41 27629.3828125
42 25443.94921875
43 23450.453125
44 21629.78125
45 19964.931640625
46 18440.498046875
47 17044.193359375
48 15763.48046875
49 14587.8291015625
50 13507.2744140625
51 12513.7333984375
52 11599.1669921875
53 10759.337890625
54 9985.216796875
55 9271.345703125
56 8612.20703125
57 8003.66455078125
58 7441.3701171875
59 6921.11865234375
60 6439.79296

In [None]:
model[0].weight


PyTorch: optim
--------------

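This time, instead of updating the model's weights by hand, we use the ``optim`` package to update the parameters.
The ``optim`` package provides many optimization algorithms, including SGD+momentum, RMSProp, Adam, and more.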
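For plain SGD without momentum, `optimizer.step()` performs exactly the manual update from the previous section, `param -= lr * param.grad`. A small sketch confirming this on one step (the learning rate 0.1 and the quadratic loss are arbitrary choices):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
w_manual = w.detach().clone()

opt = torch.optim.SGD([w], lr=0.1)
loss = (w ** 2).sum()  # d(loss)/dw = 2w
opt.zero_grad()
loss.backward()
opt.step()             # w <- w - lr * w.grad

w_manual -= 0.1 * 2 * w_manual  # the same update written by hand
print(torch.allclose(w.detach(), w_manual))  # True
```

Optimizers such as Adam keep extra per-parameter state, so their updates are no longer this simple formula; that is the main reason to let `optim` handle them.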


In [148]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T (no bias)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    
    if it % 5 == 0:  # print every 5 iterations
        print(it, loss.item())
    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 29080318.0
5 10436923.0
10 2154657.0
15 532390.0
20 230546.546875
25 124364.1953125
30 73062.9453125
35 45142.203125
40 28932.646484375
45 19078.97265625
50 12885.3955078125
55 8874.8388671875
60 6213.01416015625
65 4410.171875
70 3168.23388671875
75 2300.213134765625
80 1685.985107421875
85 1246.32373046875
90 928.1376342773438
95 695.8846435546875
100 524.798095703125
105 397.8555908203125
110 303.046875
115 231.83480834960938
120 178.02894592285156
125 137.17715454101562
130 106.04219818115234
135 82.20759582519531
140 63.885318756103516
145 49.7601318359375
150 38.83662033081055
155 30.367061614990234
160 23.784595489501953
165 18.656888961791992
170 14.656617164611816
175 11.527595520019531
180 9.076955795288086
185 7.154776573181152
190 5.6459269523620605
195 4.4589433670043945
200 3.524202585220337
205 2.787775993347168
210 2.206611156463623
215 1.7478580474853516
220 1.3851509094238281
225 1.0984537601470947
230 0.8714742660522461
235 0.6917911171913147
240 0.5493975877761841


PyTorch: 自定义 nn Modules
--------------------------

We can define a model as a subclass of ``nn.Module``. Whenever you need a model more complex than a ``Sequential`` model, you need to define your own ``nn.Module`` subclass.
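`model.parameters()` (and therefore the optimizer) only sees layers that are registered with the module, which happens when a submodule is assigned as an attribute, as in `TwoLayerNet` below, or stored in a container such as `nn.ModuleList`. A layer kept in a plain Python list is silently left out. The class `ListNet` here is a hypothetical example built to show this:

```python
import torch
import torch.nn as nn

class ListNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.registered = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])
        self.hidden = [nn.Linear(4, 4)]  # plain Python list: NOT registered

    def forward(self, x):
        x = self.hidden[0](x).clamp(min=0)
        for layer in self.registered:
            x = layer(x)
        return x

net = ListNet()
n_params = len(list(net.parameters()))
out = net(torch.randn(5, 4))
print(n_params)   # 4: weight and bias of the two layers in the ModuleList
print(out.shape)  # torch.Size([5, 2])
```

An optimizer built from `net.parameters()` would never update the weights in `self.hidden`, even though the forward pass uses them.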



In [150]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    if it % 5 == 0:  # print every 5 iterations
        print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 696.9291381835938
5 615.1718139648438
10 544.909423828125
15 484.7691650390625
20 432.4112854003906
25 386.5163879394531
30 345.7978210449219
35 309.2781982421875
40 276.3218688964844
45 246.56204223632812
50 219.54469299316406
55 194.89508056640625
60 172.42190551757812
65 151.8444061279297
70 132.96307373046875
75 115.81351470947266
80 100.37520599365234
85 86.48570251464844
90 74.0639419555664
95 63.031436920166016
100 53.30567932128906
105 44.7796516418457
110 37.372623443603516
115 30.988656997680664
120 25.535381317138672
125 20.90952491760254
130 17.021394729614258
135 13.773482322692871
140 11.087059020996094
145 8.878859519958496
150 7.076192378997803
155 5.6155686378479
160 4.438809394836426
165 3.496457576751709
170 2.7458159923553467
175 2.1499686241149902
180 1.6794917583465576
185 1.3087910413742065
190 1.0175358057022095
195 0.7892367839813232
200 0.6106471419334412
205 0.47118714451789856
210 0.3626347482204437
215 0.27827179431915283
220 0.2128976583480835
225 0.1623