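### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.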

### Implementing differentiation and backpropagation in PyTorch

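Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, i.e. the squared error summed over all elements (matching `.sum()` in the code):

$$
L = \sum_{i,j}\left(y_{pred} - y\right)_{ij}^2
$$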
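This summed squared error is the same as PyTorch's `torch.nn.functional.mse_loss` with `reduction='sum'`. The default `reduction='mean'` divides by the number of elements. The quick check below uses small random tensors, which are arbitrary choices made only for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# Hand-written L2 loss: sum of squared errors over every element
manual = torch.square(y_pred - y).sum()
# Built-in equivalent; reduction='sum' (the default 'mean' would divide by 4*3 = 12)
builtin = F.mse_loss(y_pred, y, reduction='sum')

print(torch.allclose(manual, builtin))  # True
```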

The two-layer neural network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
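Let $L$ be the summed L2 loss, and write $H = XW_1$ and $H_{relu} = \mathrm{ReLU}(H)$. The chain rule then gives the gradients that the manual backward pass computes. Here $\odot$ is the element-wise product and $\mathbb{1}[H > 0]$ is the ReLU derivative mask, with the derivative at exactly 0 taken as 0:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= H_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial H} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{\top}\right) \odot \mathbb{1}[H > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial H}
\end{aligned}
$$

These four lines correspond to `y_pred_grad`, `W2_grad`, `h_grad`, and `W1_grad` in the code.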

```python
import torch
device = torch.device('cpu')
```

```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize weights W1, W2
W1 = torch.randn((D_in, H), device=device)
W2 = torch.randn((H, D_out), device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch, so one update per epoch)
for t in range(500):
    # Forward pass: compute y_pred
    h = torch.matmul(x, W1)
    h_relu = torch.relu(h)
    y_pred = torch.matmul(h_relu, W2)

    # Compute the loss (sum of squared errors)
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())

    # Backward pass: gradients of the loss with respect to W1 and W2
    y_pred_grad = 2. * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0.)  # ReLU passes gradient only where h > 0
    W1_grad = x.T.mm(h_grad)

    # Parameter update (plain gradient descent)
    W1 -= learning_rate * W1_grad
    W2 -= learning_rate * W2_grad
```

```
0 33824644.0
1 28769508.0
2 26712980.0
3 23669802.0
4 18821730.0
5 13139070.0
6 8355264.0
7 5061499.5
8 3103347.25
9 1995764.0
10 1371592.125
11 1005154.5
12 776291.0625
13 622744.4375
14 512632.96875
15 429571.125
16 364569.71875
17 312214.0
18 269215.875
19 233457.0625
20 203410.15625
21 177994.65625
22 156321.78125
23 137775.875
24 121845.671875
25 108063.40625
26 96101.578125
27 85668.6796875
28 76554.4453125
29 68558.078125
30 61519.6171875
31 55308.421875
32 49814.64453125
33 44941.6796875
34 40607.51953125
35 36745.4375
36 33297.65625
37 30214.20703125
38 27451.48828125
39 24970.34765625
40 22737.634765625
41 20726.443359375
42 18913.095703125
43 17275.453125
44 15794.3037109375
45 14453.37890625
46 13236.9609375
47 12132.4580078125
48 11128.212890625
49 10214.34765625
50 9382.4111328125
51 8623.470703125
52 7931.11328125
53 7298.56298828125
54 6720.3544921875
55 6191.23828125
56 5706.7861328125
57 5262.81884765625
58 4856.0498046875
59 4482.4697265625
60 4139.42919921875
61 382
```
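A simple way to confirm the hand-derived gradients is to compare them with what Autograd produces on a small problem. The sketch below makes two assumptions: it uses arbitrary small sizes, and it uses float64 so that the tolerances can be tight. It recomputes the manual gradients exactly as in the loop above and checks them against `.grad` after `loss.backward()`.

```python
import torch

torch.manual_seed(0)
# Small, arbitrary sizes chosen for this check (not the sizes used above)
N, D_in, H, D_out = 8, 5, 4, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same as the manual implementation
h = x.mm(W1)
h_relu = torch.relu(h)
y_pred = h_relu.mm(W2)
loss = torch.square(y_pred - y).sum()

# Let Autograd fill W1.grad and W2.grad
loss.backward()

# Manual backward pass, computed under no_grad so Autograd does not record it
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0)
    W1_grad = x.T.mm(h_grad)

print(torch.allclose(W1_grad, W1.grad), torch.allclose(W2_grad, W2.grad))  # True True
```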

### Using PyTorch's Autograd

```python
import torch
device = torch.device('cpu')
```

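```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize weights W1, W2. Create them directly on `device` so they stay leaf
# tensors: calling .to(device) after requires_grad=True returns a non-leaf copy
# when moving to a GPU, and that copy's .grad is never populated.
W1 = torch.randn((D_in, H), device=device, requires_grad=True)
W2 = torch.randn((H, D_out), device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = torch.matmul(torch.relu(torch.matmul(x, W1)), W2)

    # Compute the loss
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())

    # Backward pass: Autograd computes the gradients of the loss w.r.t. W1 and W2
    loss.backward()

    # Parameter update: the update itself should not be recorded by Autograd,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad

        # Clear the gradients: backward() accumulates into .grad,
        # so they must be reset before the next iteration
        W1.grad.zero_()
        W2.grad.zero_()
```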

```
0 26716772.0
1 22105446.0
2 21012478.0
3 20475166.0
4 18964022.0
5 15830608.0
6 11856766.0
7 8039481.0
8 5142452.0
9 3221592.5
10 2055475.0
11 1365986.5
12 957594.75
13 707946.25
14 548115.25
15 439949.6875
16 362667.5
17 304744.96875
18 259622.40625
19 223323.671875
20 193470.75
21 168534.453125
22 147480.65625
23 129524.53125
24 114128.859375
25 100846.734375
26 89336.7265625
27 79317.71875
28 70604.7265625
29 63006.6953125
30 56331.90234375
31 50454.1953125
32 45268.2421875
33 40692.73046875
34 36649.51953125
35 33056.84375
36 29858.5234375
37 27006.7421875
38 24460.19140625
39 22180.435546875
40 20136.71484375
41 18301.732421875
42 16652.923828125
43 15169.46484375
44 13832.4736328125
45 12626.185546875
46 11534.634765625
47 10547.4375
48 9653.5185546875
49 8842.287109375
50 8106.2021484375
51 7437.03076171875
52 6828.19189453125
53 6273.69921875
54 5767.8251953125
55 5306.3046875
56 4884.943359375
57 4499.90576171875
58 4147.5771484375
59 3824.80078125
60 3529.224365234375
61 3258
```
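The `zero_()` calls in the Autograd loop matter because `backward()` adds into `.grad` rather than overwriting it. Below is a minimal illustration with a single scalar parameter `w`, which is an arbitrary example and not part of the network above:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# d(w^2)/dw = 2w = 6
(w ** 2).backward()
first = w.grad.clone()

# A second backward without zeroing: the new gradient (6) is added to the old one
(w ** 2).backward()
accumulated = w.grad.clone()

# After zeroing, the next backward gives the plain gradient again
w.grad.zero_()
(w ** 2).backward()
after_zero = w.grad.clone()

print(first.item(), accumulated.item(), after_zero.item())  # 6.0 12.0 6.0
```

Without the reset, each update in the training loop would use the sum of all past gradients instead of the current one.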