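### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd.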

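### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over all elements of the output:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$

The two-layer network is defined as

$$
y_{pred} = ReLU(XW_1)W_2
$$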
In [2]:
import torch
device = torch.device('cpu')

In [12]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###

    # Compute the loss
    ###<your code>###
    # print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###

    # Parameter update
    ###<your code>###
    pass  # placeholder so the loop is valid Python until the blanks are filled in

### Manual differentiation and backpropagation: completed example

In [1]:
import torch
device = torch.device('cpu')

In [15]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
    grad_w1 = x.t().mm(grad_h)

    # Parameter update: the update itself should not be tracked for differentiation,
    # so it runs under torch.no_grad(). (w1 and w2 do not set requires_grad here,
    # so this is a precaution rather than a requirement.)
    with torch.no_grad():
        # Update the parameters W1 and W2
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2

0 38011796.0
1 36699684.0
2 37898300.0
3 34270876.0
4 24937866.0
5 14305570.0
6 7269846.0
7 3778647.5
8 2258714.5
9 1565708.375
10 1202998.125
11 978256.4375
12 819188.3125
13 697051.125
14 599244.5625
15 518760.15625
16 451544.0625
17 394845.03125
18 346651.71875
19 305499.4375
20 270093.90625
21 239570.140625
22 213135.265625
23 190095.46875
24 170050.46875
25 152506.171875
26 137084.3125
27 123491.1171875
28 111466.90625
29 100797.7890625
30 91320.8046875
31 82877.078125
32 75332.5859375
33 68577.8828125
34 62517.078125
35 57069.69140625
36 52167.01953125
37 47747.55859375
38 43755.4375
39 40143.4453125
40 36871.5859375
41 33902.2421875
42 31204.873046875
43 28750.75390625
44 26516.36328125
45 24478.470703125
46 22617.705078125
47 20914.826171875
48 19356.5546875
49 17925.794921875
50 16614.103515625
51 15409.873046875
52 14303.8427734375
53 13286.541015625
54 12350.4267578125
55 11488.345703125
56 10693.171875
57 9958.845703125
58 9280.8037109375
59 8654.7529296875
60 8075.69384765
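
One way to gain confidence in hand-derived gradients like the ones above is to compare them against Autograd, which the next section covers. The following is a minimal sketch of such a check, not part of the original assignment. The small tensor sizes, the fixed seed, the `float64` dtype, and the names `w1_matches` and `w2_matches` are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Small sizes and float64 keep the comparison fast and numerically tight
# (illustrative values, not the sizes used in the cell above).
N, D_in, H, D_out = 8, 20, 16, 4
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same form as the manual implementation
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Gradients computed by Autograd
loss.backward()

# Hand-derived gradients, computed without recording a graph
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Compare the two sets of gradients up to floating-point tolerance
w1_matches = torch.allclose(grad_w1, w1.grad)
w2_matches = torch.allclose(grad_w2, w2.grad)
print(w1_matches, w2_matches)
```

If either comparison fails, the usual suspects are a missing transpose or a ReLU mask applied to the wrong tensor.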

### Using PyTorch's Autograd

In [16]:
import torch
device = torch.device('cpu')

In [18]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: Autograd computes the gradients of the loss
    # with respect to W1 and W2 and stores them in w1.grad and w2.grad
    loss.backward()

    # Parameter update: the update itself should not be tracked for differentiation,
    # so it runs under torch.no_grad()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()

0 33490348.0
1 27814308.0
2 23899532.0
3 18977766.0
4 13463171.0
5 8670337.0
6 5327908.0
7 3295176.0
8 2138003.75
9 1481901.75
10 1095866.5
11 853920.1875
12 690877.875
13 573438.75
14 484164.84375
15 413687.96875
16 356781.375
17 309816.1875
18 270456.0
19 237137.890625
20 208711.703125
21 184319.796875
22 163281.359375
23 145049.546875
24 129184.28125
25 115357.4765625
26 103261.578125
27 92630.4140625
28 83254.515625
29 74965.6640625
30 67621.78125
31 61097.4296875
32 55272.484375
33 50083.6953125
34 45447.640625
35 41303.83203125
36 37588.15234375
37 34250.97265625
38 31250.341796875
39 28543.830078125
40 26098.59765625
41 23888.26171875
42 21885.203125
43 20070.662109375
44 18423.482421875
45 16925.54296875
46 15562.0224609375
47 14317.052734375
48 13180.7939453125
49 12143.4794921875
50 11203.2509765625
51 10342.2353515625
52 9553.7607421875
53 8831.099609375
54 8167.5947265625
55 7558.6298828125
56 6998.9541015625
57 6484.22705078125
58 6010.43115234375
59 5574.27783203125
60 51
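
The training loop above zeroes `w1.grad` and `w2.grad` after every update because `backward()` adds to an existing `.grad` rather than overwriting it. The sketch below illustrates that behaviour on a toy tensor. The tensor `w` and the names `first_grad`, `accumulated_grad`, and `fresh_grad` are hypothetical, introduced only for this example.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
w.pow(2).sum().backward()
first_grad = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
w.pow(2).sum().backward()
accumulated_grad = w.grad.clone()

# Zero the gradient in place, then a backward pass gives the plain gradient again
w.grad.zero_()
w.pow(2).sum().backward()
fresh_grad = w.grad.clone()

print(first_grad, accumulated_grad, fresh_grad)
```

Without the `zero_()` calls, each step of the training loop would update the weights with the sum of all gradients seen so far instead of the current one.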