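### Assignment goal: Differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.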

### Implementing differentiation and backpropagation manually with PyTorch

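Here we implement a simple two-layer neural network for a regression problem, using the L2 loss (the sum of squared errors over all outputs) as the loss function: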

$$
L2\_loss = \sum_{i,j} \left(y_{pred,ij} - y_{ij}\right)^2
$$
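The gradient of this loss with respect to the prediction is $2(y_{pred} - y)$, which is the first step of the manual backward pass. A small sketch, using made-up $4 \times 3$ tensors in float64 (assumptions for illustration only), that checks this against autograd and against `torch.nn.functional.mse_loss` with `reduction='sum'`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Made-up prediction/target pair; float64 keeps the comparison tight
y = torch.randn(4, 3, dtype=torch.float64)
y_pred = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)

# L2 loss as used in this notebook: sum of squared errors over all entries
loss = (y_pred - y).pow(2).sum()

# Same quantity via PyTorch's built-in loss with summed (not averaged) reduction
loss_builtin = F.mse_loss(y_pred, y, reduction='sum')

# Analytic gradient d(loss)/d(y_pred) = 2 * (y_pred - y)
grad_manual = 2.0 * (y_pred.detach() - y)

# Gradient from autograd, stored in y_pred.grad
loss.backward()

print(torch.allclose(loss, loss_builtin))        # True
print(torch.allclose(y_pred.grad, grad_manual))  # True
```

Note that `mse_loss` defaults to `reduction='mean'`, which would divide by the number of elements and scale the gradient accordingly.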

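The two-layer neural network is defined as:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$
where $X \in \mathbb{R}^{N \times D_{in}}$ is the input batch, $W_1 \in \mathbb{R}^{D_{in} \times H}$, and $W_2 \in \mathbb{R}^{H \times D_{out}}$.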
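For reference, the gradients computed in the manual backward pass follow from the chain rule. A sketch of the derivation, writing $L$ for the L2 loss above, $h = XW_1$, $h_{relu} = \mathrm{ReLU}(h)$ (names chosen to match the code variables), and $\odot$ for elementwise multiplication:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h_{relu}} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{relu}} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

Each line corresponds to one of `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1` in the training loop.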

```python
import torch
device = torch.device('cpu')
```
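The notebook pins everything to the CPU. If a GPU might be available, a common pattern (a sketch only; nothing later in the notebook depends on it) is to choose the device at runtime and create tensors directly on it:

```python
import torch

# Use a CUDA GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors created with device=... are allocated on that device from the start
x = torch.randn(2, 3, device=device)
print(x.device.type)  # cuda or cpu, depending on the machine
```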

```python
# Quick check: forward pass through the first layer followed by ReLU
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
w1 = torch.randn(D_in, H, device=device)
h = x.mm(w1)
h_relu = h.clamp(min=0)  # negative entries become 0
print(h_relu)
```

```
tensor([[17.1077,  0.0000,  0.0000,  ..., 74.5968,  0.0000, 58.3127],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000, 11.1072,  0.0000],
        [23.1358,  0.0000, 37.9345,  ...,  0.0000, 17.3626,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ..., 14.2878,  3.4867, 30.5293],
        [ 3.3147,  7.0831, 12.6791,  ...,  0.0000, 47.3290,  0.0000],
        [31.0226, 25.0662, 17.3854,  ...,  0.0000,  0.0000,  0.0000]])
```
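The cell above uses `clamp(min=0)` as ReLU. A small sketch, assuming the same sizes as that cell plus a fixed seed, that checks the shape of `h` and compares the clamped result with the built-in `torch.relu`:

```python
import torch

torch.manual_seed(0)
N, D_in, H = 64, 1000, 100
x = torch.randn(N, D_in)
w1 = torch.randn(D_in, H)

h = x.mm(w1)              # (N, D_in) @ (D_in, H) -> (N, H)
h_clamp = h.clamp(min=0)  # negative entries set to 0
h_relu = torch.relu(h)    # built-in ReLU for comparison

print(h.shape)                       # torch.Size([64, 100])
print(torch.equal(h_clamp, h_relu))  # True
```

Because the entries of `x` and `w1` are symmetric around zero, roughly half of the entries of `h` are negative, which is why the printed `h_relu` contains so many zeros.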


```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate input x and target y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full-batch gradient descent)
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)             # (N, D_in) @ (D_in, H) -> (N, H)
    h_relu = h.clamp(min=0)  # clamp(min=0) is equivalent to ReLU
    y_pred = h_relu.mm(w2)   # (N, H) @ (H, D_out) -> (N, D_out)

    # Compute the loss (sum of squared errors)
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss w.r.t. W1 and W2
    grad_y_pred = 2.0 * (y_pred - y)      # dL/dy_pred
    grad_w2 = h_relu.t().mm(grad_y_pred)  # dL/dW2
    grad_h_relu = grad_y_pred.mm(w2.t())  # dL/dh_relu
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0                     # ReLU passes gradient only where h > 0
    grad_w1 = x.t().mm(grad_h)            # dL/dW1

    # Parameter update (plain gradient descent)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```

```
0 23012110.0
1 18336116.0
2 17597164.0
3 18319200.0
4 18828840.0
5 17892428.0
6 15175182.0
7 11374008.0
8 7683622.0
9 4840722.0
10 2977026.0
11 1850948.375
12 1197304.75
13 816849.25
14 590481.9375
15 449945.25
16 358121.1875
17 294658.46875
18 248385.765625
19 213006.890625
20 184974.921875
21 162058.734375
22 142932.90625
23 126721.875
24 112815.2578125
25 100779.515625
26 90293.5390625
27 81121.171875
28 73075.6640625
29 65979.390625
30 59689.390625
31 54101.9921875
32 49124.18359375
33 44677.21484375
34 40695.74609375
35 37125.3359375
36 33915.18359375
37 31023.572265625
38 28414.80859375
39 26056.30859375
40 23921.05859375
41 21994.240234375
42 20253.291015625
43 18669.775390625
44 17227.73828125
45 15913.0244140625
46 14712.33203125
47 13614.2392578125
48 12608.9609375
49 11687.4228515625
50 10841.970703125
51 10065.21484375
52 9350.91796875
53 8693.5048828125
54 8087.82373046875
55 7529.5849609375
56 7014.224609375
57 6538.25390625
58 6098.279296875
59 5691.1162109375
60 5314.07
```
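Before moving to Autograd, the hand-derived gradients can be checked against what autograd computes. A minimal sketch, assuming small made-up sizes and float64 precision (chosen only so rounding error does not obscure the comparison); the backward steps are copied from the training cell above:

```python
import torch

torch.manual_seed(0)
# Small sizes and float64 so the comparison is not dominated by rounding error
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, keeping intermediates for the manual backward pass
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Manual backward pass (same steps as the notebook), without graph tracking
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Autograd backward pass fills w1.grad and w2.grad
loss.backward()

print(torch.allclose(grad_w1, w1.grad))  # True
print(torch.allclose(grad_w2, w2.grad))  # True
```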

### Using PyTorch's Autograd

```python
import torch
device = torch.device('cpu')
```

```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate input x and target y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2; requires_grad=True tells autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full-batch gradient descent)
for t in range(500):
    # Forward pass: compute y_pred; autograd records the operations,
    # so intermediate results do not need to be kept by hand
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss (sum of squared errors)
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss w.r.t.
    # W1 and W2 and stores them in w1.grad and w2.grad
    loss.backward()

    # Parameter update: the update itself should not be recorded for
    # differentiation, so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2 with plain gradient descent
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset the gradients to zero: backward() accumulates (adds to) .grad,
        # so leftover values would corrupt the next iteration's gradient
        w1.grad.zero_()
        w2.grad.zero_()
```

```
0 24698974.0
1 20135458.0
2 18345370.0
3 17043706.0
4 15188817.0
5 12558057.0
6 9571705.0
7 6815713.5
8 4646282.0
9 3121506.0
10 2120791.5
11 1484363.875
12 1081572.25
13 822562.25
14 650365.0
15 531031.125
16 444690.375
17 379607.5625
18 328493.6875
19 287073.84375
20 252717.296875
21 223707.359375
22 198927.8125
23 177523.984375
24 158944.5625
25 142716.078125
26 128446.6640625
27 115842.2265625
28 104676.6875
29 94753.1171875
30 85913.046875
31 78031.25
32 70978.140625
33 64644.3203125
34 58958.578125
35 53846.86328125
36 49251.109375
37 45104.28515625
38 41356.08203125
39 37961.26171875
40 34882.765625
41 32085.796875
42 29541.705078125
43 27224.03515625
44 25111.052734375
45 23182.255859375
46 21419.94921875
47 19807.208984375
48 18330.046875
49 16975.3515625
50 15732.6015625
51 14591.765625
52 13542.083984375
53 12576.033203125
54 11685.943359375
55 10865.220703125
56 10107.630859375
57 9408.2275390625
58 8762.287109375
59 8164.818359375
60 7611.9296875
61 7100.0029296875
62 6625
```
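The reason the Autograd loop zeroes `w1.grad` and `w2.grad` is that `backward()` adds into `.grad` rather than overwriting it. A minimal sketch with a hypothetical two-element parameter `w` and the toy loss $\sum w^2$ (not part of the assignment), which also shows that results computed under `torch.no_grad()` are not tracked:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

def loss_fn(w):
    # Toy loss sum(w^2), so d(loss)/dw = 2w = [2, 4]
    return (w ** 2).sum()

loss_fn(w).backward()
print(w.grad)  # tensor([2., 4.])

# A second backward() without zeroing ADDS to the stored gradient
loss_fn(w).backward()
print(w.grad)  # tensor([4., 8.])

# Zeroing first gives the correct per-step gradient again
w.grad.zero_()
loss_fn(w).backward()
print(w.grad)  # tensor([2., 4.])

# Operations inside torch.no_grad() are not recorded by autograd
with torch.no_grad():
    w_new = w - 0.1 * w.grad
print(w_new.requires_grad)  # False
```

The same accumulation behavior is why the optimizers in `torch.optim` provide a `zero_grad()` method that is usually called once per iteration.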