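### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.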

### Implementing differentiation and backpropagation with PyTorch

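Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss (the code sums it over all elements of the batch):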

$$
L2\_loss = (y_{pred}-y)^2
$$

The two-layer neural network is defined as
$$
y_{pred} = ReLU(XW_1)W_2
$$
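
Because the first implementation in this section computes the backward pass by hand, it may help to write out the chain rule it relies on. This is a sketch under two assumptions: $L$ denotes the L2 loss summed over all elements (as the code does with `.sum()`), and $h = XW_1$ and $a = ReLU(h)$ are intermediate names introduced here for illustration. The indicator uses $h \ge 0$ to match the code, which zeroes the gradient only where `h < 0`.

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred}-y) \\
\frac{\partial L}{\partial W_2} &= a^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{T} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a}\odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

These five lines correspond, in order, to `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1` in the training loop.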

In [1]:
import torch
device = torch.device('cpu')

In [4]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss
    loss = (y_pred-y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    grad_y_pred = 2.0*(y_pred-y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 24037928.0
1 18716746.0
2 16474865.0
3 15153904.0
4 13680714.0
5 11731814.0
6 9399214.0
7 7083945.0
8 5085456.0
9 3560216.75
10 2477972.0
11 1748039.0
12 1263851.5
13 943600.25
14 728101.625
15 579540.6875
16 473752.8125
17 395790.9375
18 336350.4375
19 289596.59375
20 251806.625
21 220626.53125
22 194469.46875
23 172257.6875
24 153191.625
25 136692.390625
26 122314.15625
27 109725.640625
28 98660.53125
29 88897.515625
30 80253.3046875
31 72586.8046875
32 65765.265625
33 59678.9609375
34 54238.859375
35 49367.203125
36 44998.6484375
37 41077.17578125
38 37552.65625
39 34369.609375
40 31493.580078125
41 28889.359375
42 26528.880859375
43 24386.626953125
44 22440.69921875
45 20669.55078125
46 19055.49609375
47 17582.984375
48 16238.939453125
49 15009.5673828125
50 13883.98046875
51 12854.490234375
52 11910.00390625
53 11042.75390625
54 10246.083984375
55 9515.3525390625
56 8844.3974609375
57 8225.947265625
58 7655.34814453125
59 7129.0400390625
60 6642.89306640625
61 6193.4384765625
62
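
One way to gain confidence in the hand-written backward pass is to compare it against Autograd on the same forward pass. The following is a minimal sketch, not part of the original assignment: the small sizes, the fixed seed, and the use of `float64` are assumptions chosen only to make the comparison quick and numerically tight.

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes so the check runs quickly
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, identical in structure to the training loop
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Gradients computed by Autograd
loss.backward()

# Gradients computed by hand, using the same formulas as the training loop
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Compare the two sets of gradients
w1_match = torch.allclose(grad_w1, w1.grad)
w2_match = torch.allclose(grad_w2, w2.grad)
print(w1_match, w2_match)
```

If both values print as `True`, the manual formulas agree with Autograd for this random input, which is the behavior the next section relies on.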

### Using PyTorch's Autograd

In [5]:
import torch
device = torch.device('cpu')

In [6]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  y_pred = x.mm(w1).clamp(min=0).mm(w2)

  # Compute the loss
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Update the parameters: the update itself should not be tracked by Autograd,
  # so it is wrapped in torch.no_grad()
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after running the backward pass
    w1.grad.zero_()
    w2.grad.zero_()

0 30028314.0
1 22053538.0
2 18442708.0
3 15927830.0
4 13403562.0
5 10667800.0
6 8001995.0
7 5730674.0
8 3995762.5
9 2773601.0
10 1947078.25
11 1401018.125
12 1038720.0625
13 795291.4375
14 627362.25
15 508162.84375
16 420661.125
17 354350.5625
18 302560.1875
19 261224.859375
20 227503.09375
21 199500.75
22 175946.46875
23 155885.84375
24 138634.5625
25 123695.8046875
26 110710.7578125
27 99342.4765625
28 89339.5625
29 80515.203125
30 72701.21875
31 65757.90625
32 59578.734375
33 54065.8359375
34 49134.78125
35 44712.734375
36 40744.9453125
37 37175.0703125
38 33958.359375
39 31055.697265625
40 28431.3046875
41 26054.48828125
42 23899.701171875
43 21944.046875
44 20165.662109375
45 18547.546875
46 17073.197265625
47 15729.2021484375
48 14502.802734375
49 13381.9208984375
50 12356.25
51 11417.2744140625
52 10556.4462890625
53 9766.609375
54 9041.40234375
55 8374.294921875
56 7760.95068359375
57 7199.48583984375
58 6686.73828125
59 6213.86572265625
60 5777.2978515625
61 5373.955078125
62 
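
The training loop zeroes `w1.grad` and `w2.grad` after each update because `backward()` adds new gradients to whatever is already stored in `.grad` instead of overwriting it. The sketch below illustrates this on a toy tensor; the tensor `w` and the function `sum(w^2)` are hypothetical choices made for illustration and do not come from the assignment.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing in place, a backward pass yields the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

Here `accumulated` is twice `first`, while `after_zero` equals `first` again. Without the `zero_()` calls, each step of the training loop would update the weights with the sum of all past gradients.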