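### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.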

### Implementing differentiation and backpropagation with PyTorch

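Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, which the code below sums over every element of the batch: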

$$
L2\_loss = (y_{pred}-y)^2
$$

The two-layer neural network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
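
As a minimal sketch of the two formulas above, the forward pass and the loss can be written out on tiny tensors. The sizes, the seed, and the names `hidden` and `hidden_relu` are assumptions chosen for illustration and are not taken from the notebook.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Tiny illustrative sizes (assumed here, not the notebook's 64/1000/100/10)
N, D_in, H, D_out = 4, 3, 5, 2

X = torch.randn(N, D_in)
y = torch.randn(N, D_out)
W1 = torch.randn(D_in, H)
W2 = torch.randn(H, D_out)

# y_pred = ReLU(X W1) W2
hidden = X.mm(W1)                 # shape (N, H)
hidden_relu = torch.relu(hidden)  # equivalent to hidden.clamp(min=0)
y_pred = hidden_relu.mm(W2)       # shape (N, D_out)

# L2 loss, summed over every element of the batch
loss = (y_pred - y).pow(2).sum()

print(y_pred.shape, loss.item())
```

`torch.relu` and `clamp(min=0)` give the same values here; the notebook cells use `clamp`.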

In [3]:
%load_ext autotime

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 0 ns (started: 2021-02-11 16:54:03 +08:00)


In [4]:
import torch
device = torch.device('cpu')

time: 16 ms (started: 2021-02-11 16:54:04 +08:00)


In [5]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights w1 and w2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred
  h = x.mm(w1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # Compute the loss
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to w1 and w2
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
  grad_w1 = x.t().mm(grad_h)

  # Update the parameters
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

0 31508580.0
1 24548224.0
2 22363872.0
3 21033546.0
4 18763032.0
5 15073500.0
6 10915092.0
7 7234812.0
8 4610894.0
9 2931560.5
10 1932227.25
11 1341197.625
12 985655.1875
13 761161.375
14 611387.125
15 505248.59375
16 425868.96875
17 363912.03125
18 314042.90625
19 273018.90625
20 238673.515625
21 209689.421875
22 184936.109375
23 163639.4375
24 145270.109375
25 129327.796875
26 115424.03125
27 103273.5
28 92628.7578125
29 83256.4921875
30 74985.6015625
31 67674.015625
32 61190.10546875
33 55420.55859375
34 50284.42578125
35 45693.1953125
36 41582.5703125
37 37896.2421875
38 34585.8203125
39 31607.6953125
40 28923.576171875
41 26501.77734375
42 24309.962890625
43 22324.7421875
44 20524.109375
45 18887.166015625
46 17398.603515625
47 16043.083984375
48 14806.865234375
49 13678.828125
50 12647.3994140625
51 11703.302734375
52 10838.685546875
53 10045.8037109375
54 9317.826171875
55 8648.703125
56 8033.28662109375
57 7466.32568359375
58 6944.00048828125
59 6462.12158203125
60 6016.9570312
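
The manual backward pass above applies the chain rule layer by layer. With $G = 2(y_{pred} - y)$, it computes $\partial L / \partial W_2 = ReLU(XW_1)^\top G$ and $\partial L / \partial W_1 = X^\top \left[(G W_2^\top) \odot \mathbb{1}(XW_1 \ge 0)\right]$. One way to sanity-check hand-derived gradients like these is to compare them with the ones Autograd produces, which the next section introduces. The sketch below does that on small tensors; the sizes, the seed, and the use of `float64` are assumptions made for the check and are not part of the notebook.

```python
import torch

torch.manual_seed(0)

# Small assumed sizes; float64 keeps rounding error out of the comparison
N, D_in, H, D_out = 8, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same structure as the notebook cell
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Reference gradients from Autograd
loss.backward()

# Hand-derived gradients, computed without recording a graph
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

w1_matches = torch.allclose(grad_w1, w1.grad)
w2_matches = torch.allclose(grad_w2, w2.grad)
print(w1_matches, w2_matches)
```

If both comparisons print `True`, the manual formulas and Autograd agree on this example.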

### Using PyTorch's Autograd

In [None]:
import torch
device = torch.device('cpu')

In [6]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights w1 and w2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred
  y_pred = x.mm(w1).clamp(min=0).mm(w2)

  # Compute the loss
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to w1 and w2
  loss.backward()

  # Update the parameters. The update itself should not be recorded by
  # Autograd, so it is wrapped in torch.no_grad()
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after running the backward pass
    w1.grad.zero_()
    w2.grad.zero_()

0 35057264.0
1 35506408.0
2 40343504.0
3 41183720.0
4 32918338.0
5 19361440.0
6 9053010.0
7 4043210.75
8 2095643.25
9 1338065.875
10 994377.9375
11 799550.0625
12 666827.3125
13 566144.5
14 485555.28125
15 419344.75
16 364295.09375
17 318028.0625
18 278871.3125
19 245563.59375
20 217025.234375
21 192434.953125
22 171155.71875
23 152647.796875
24 136489.578125
25 122339.5703125
26 109900.34375
27 98931.0078125
28 89227.65625
29 80623.6953125
30 72982.2421875
31 66168.65625
32 60080.40234375
33 54626.84375
34 49738.5625
35 45345.40625
36 41389.203125
37 37818.83984375
38 34591.32421875
39 31670.30078125
40 29023.66015625
41 26621.837890625
42 24438.994140625
43 22453.40234375
44 20648.34375
45 19001.484375
46 17498.19921875
47 16124.6962890625
48 14868.6533203125
49 13719.4091796875
50 12666.396484375
51 11700.908203125
52 10814.9384765625
53 10001.6103515625
54 9254.21484375
55 8566.9580078125
56 7934.51611328125
57 7352.70166015625
58 6816.66015625
59 6323.0791015625
60 5867.9438476562
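
The loop above zeroes `w1.grad` and `w2.grad` after every update because `backward()` adds to an existing `.grad` rather than overwriting it. The following minimal sketch illustrates that behaviour, together with the effect of `torch.no_grad()`. The tensor `w` and the toy loss `sum(w^2)` are assumptions made for illustration only.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zero in place, then run backward again to get a fresh gradient
w.grad.zero_()
(w ** 2).sum().backward()
fresh = w.grad.clone()

# An in-place update inside torch.no_grad() is not recorded in the graph,
# so w stays a leaf tensor that still requires gradients
with torch.no_grad():
    w -= 0.1 * w.grad

print(first, accumulated, fresh, w.requires_grad)
```

Without the `zero_()` calls, each iteration of the training loop would step along the sum of all gradients computed so far instead of the current one.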