### 作業目標: 使用Pytorch進行微分與倒傳遞
這份作業我們會實作微分與倒傳遞以及使用Pytorch的Autograd。

### 使用Pytorch實作微分與倒傳遞

這裡我們很簡單的實作兩層的神經網路進行回歸問題，其中loss function為L2 loss

$$
L2\_loss = (y_{pred}-y)^2
$$

兩層經網路如下所示
$$
y_{pred} = ReLU(XW_1)W_2
$$

In [1]:
import torch
device = torch.device('cpu')

In [2]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# 隨機生成x, y
x = torch.randn((N, D_in))
y = torch.randn((N, D_out))

# 初始化weight W1, W2
w1 = torch.randn((D_in, H), requires_grad=True)
w2 = torch.randn((H, D_out), requires_grad=True)

# 設置learning rate
learning_rate = 1e-6

# 訓練500個epoch
for t in range(500):
    # 向前傳遞: 計算y_pred
    y_hat = torch.relu(x.mm(w1)).mm(w2)

    # 計算loss
    residual = y_hat - y
    loss = (residual ** 2).sum()
    print(t, loss.item())
 
    # 倒傳遞: 計算W1與W2對loss的微分(梯度)
    w1_grad = (residual.mm(w2.T) * torch.where(x.mm(w1)>0, torch.tensor(1.), torch.tensor(0.))).T.mm(x)
    w2_grad = residual.T.mm(torch.relu(x.mm(w1)))

    # 參數更新
    w1.data -= 2 * learning_rate * w1_grad.T
    w2.data -= 2 * learning_rate * w2_grad.T

0 37175408.0
1 43424176.0
2 56170196.0
3 60003528.0
4 44192064.0
5 19852742.0
6 6710638.5
7 2680558.5
8 1609835.5
9 1220952.0
10 1003092.8125
11 846127.1875
12 722513.375
13 622138.4375
14 539249.0625
15 470018.0
16 411737.625
17 362336.3125
18 320203.40625
19 284050.25
20 252878.15625
21 225947.5625
22 202535.03125
23 182071.78125
24 164125.171875
25 148320.0
26 134348.0
27 121959.8125
28 110939.765625
29 101116.6796875
30 92343.953125
31 84474.859375
32 77405.6328125
33 71039.3984375
34 65293.71875
35 60096.1953125
36 55391.0859375
37 51121.34375
38 47240.1484375
39 43708.96875
40 40486.38671875
41 37542.48046875
42 34850.91015625
43 32386.0546875
44 30125.630859375
45 28050.904296875
46 26147.271484375
47 24396.28125
48 22782.94921875
49 21294.779296875
50 19920.625
51 18653.228515625
52 17481.6953125
53 16396.525390625
54 15390.984375
55 14457.53125
56 13590.283203125
57 12784.3623046875
58 12034.5322265625
59 11336.1376953125
60 10685.7734375
61 10079.111328125
62 9512.658203125
6

387 0.11976519227027893
388 0.11623319983482361
389 0.1128009557723999
390 0.1094713807106018
391 0.10621307790279388
392 0.10306760668754578
393 0.10001429170370102
394 0.09705635905265808
395 0.09420274198055267
396 0.09142209589481354
397 0.08870351314544678
398 0.08612120151519775
399 0.08355878293514252
400 0.08108048886060715
401 0.07868681102991104
402 0.07638822495937347
403 0.07411102205514908
404 0.07193981111049652
405 0.0698244571685791
406 0.06774572283029556
407 0.06574269384145737
408 0.0637972354888916
409 0.06192683055996895
410 0.060087595134973526
411 0.05832628160715103
412 0.05663030222058296
413 0.05495394393801689
414 0.05334911495447159
415 0.051789164543151855
416 0.050259917974472046
417 0.0487760491669178
418 0.04735102131962776
419 0.04597344622015953
420 0.044611431658267975
421 0.04329044371843338
422 0.04202975332736969
423 0.040793538093566895
424 0.039601460099220276
425 0.0384528674185276
426 0.03731822222471237
427 0.036224156618118286
428 0.035154506

### 使用Pytorch的Autograd

In [3]:
import torch
device = torch.device('cpu')

In [4]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# 隨機生成x, y
x = torch.randn((N, D_in))
y = torch.randn((N, D_out))

# 初始化weight W1, W2
w1 = torch.randn((D_in, H), requires_grad=True)
w2 = torch.randn((H, D_out), requires_grad=True)

# 設置learning rate
learning_rate = 1e-6

# 訓練500個epoch
for t in range(500):
    # 向前傳遞: 計算y_pred
    y_hat = torch.relu(x.mm(w1)).mm(w2)

    # 計算loss
    loss = ((y_hat - y) **2).sum()
    print(t, loss.item())

    # 倒傳遞: 計算W1與W2對loss的微分(梯度)
    loss.backward()

    # 參數更新: 這裡再更新參數時，我們不希望更新參數的計算也被紀錄微分相關的資訊，因此使用torch.no_grad()
    with torch.no_grad():
        w1.data -= 2 * learning_rate * w1.grad 
        w2.data -= 2 * learning_rate * w2.grad 

    # 將紀錄的gradient清空(因為已經更新參數)
    w1.grad.zero_()
    w2.grad.zero_()

0 36033100.0
1 167343024.0
2 734649472.0
3 7967878.5
4 935397.1875
5 546821.75
6 385044.96875
7 290068.78125
8 226726.90625
9 182590.359375
10 149142.984375
11 123127.2265625
12 102542.2578125
13 86167.578125
14 72927.265625
15 62166.87109375
16 53346.40625
17 46091.36328125
18 40029.00390625
19 34886.828125
20 30493.728515625
21 26746.099609375
22 23554.818359375
23 20788.123046875
24 18389.4765625
25 16310.962890625
26 14491.564453125
27 12905.037109375
28 11506.95703125
29 10281.0537109375
30 9200.232421875
31 8249.2939453125
32 7409.54296875
33 6669.99267578125
34 6015.046875
35 5434.55859375
36 4920.5927734375
37 4463.2607421875
38 4059.26708984375
39 3698.8486328125
40 3377.66650390625
41 3091.22802734375
42 2835.514404296875
43 2606.9833984375
44 2403.1796875
45 2221.637451171875
46 2058.4453125
47 1912.546630859375
48 1781.5247802734375
49 1664.328125
50 1559.573974609375
51 1465.1875
52 1380.4390869140625
53 1303.9566650390625
54 1235.0755615234375
55 1172.927734375
56 1116.81

396 575.0078735351562
397 575.0076904296875
398 575.0074462890625
399 575.0072021484375
400 575.0069580078125
401 575.0067138671875
402 575.0065307617188
403 575.0062866210938
404 575.0060424804688
405 575.005859375
406 575.005615234375
407 575.00537109375
408 575.0051879882812
409 575.0049438476562
410 575.0046997070312
411 575.0045166015625
412 575.0042724609375
413 575.0040283203125
414 575.0037841796875
415 575.0036010742188
416 575.0033569335938
417 575.0031127929688
418 575.0028686523438
419 575.0026245117188
420 575.00244140625
421 575.0022583007812
422 575.001953125
423 575.0017700195312
424 575.0015869140625
425 575.0013427734375
426 575.0010986328125
427 575.0008544921875
428 575.0006713867188
429 575.0004272460938
430 575.0001831054688
431 574.9999389648438
432 574.999755859375
433 574.9995727539062
434 574.999267578125
435 574.9990844726562
436 574.9988403320312
437 574.9986572265625
438 574.9983520507812
439 574.9981689453125
440 574.9979248046875
441 574.9976806640625
442