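### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then do the same with PyTorch's Autograd.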

### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$
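
As a small illustration of the loss above, the sketch below computes the summed squared error by hand and compares it with `torch.nn.functional.mse_loss`. The tensor shapes and the seed are arbitrary values chosen for this example, not taken from the assignment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so the example is repeatable

# Toy prediction and target tensors (shapes chosen only for illustration)
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# Summed squared error written out by hand
loss_manual = ((y_pred - y) ** 2).sum()

# The same quantity from the built-in, using reduction="sum"
loss_builtin = F.mse_loss(y_pred, y, reduction="sum")

# The default reduction ("mean") divides the sum by the number of elements
loss_mean = F.mse_loss(y_pred, y)

print(loss_manual.item(), loss_builtin.item(), loss_mean.item())
```

Note that the training loops in this notebook use the summed form, so the gradient magnitude grows with the batch size.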

The two-layer neural network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
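
The manual backward pass relies on the gradients of the summed squared error with respect to $W_1$ and $W_2$. The following is a sketch of that derivation via the chain rule. The symbols $Z$, $A$, and $R$ are notation introduced here for the intermediate values and do not appear in the original text; $L$ denotes the squared error summed over all elements, $\odot$ is the element-wise product, and $\mathbb{1}(Z > 0)$ is the element-wise indicator that serves as the ReLU derivative.

$$
\begin{aligned}
Z &= XW_1, \qquad A = ReLU(Z), \qquad y_{pred} = AW_2, \qquad R = y_{pred} - y \\
\frac{\partial L}{\partial y_{pred}} &= 2R \\
\frac{\partial L}{\partial W_2} &= 2A^\top R \\
\frac{\partial L}{\partial A} &= 2RW_2^\top \\
\frac{\partial L}{\partial W_1} &= 2X^\top \left[ (RW_2^\top) \odot \mathbb{1}(Z > 0) \right]
\end{aligned}
$$

Up to transposition and the point at which the factor of 2 is applied, these are the quantities computed as `w1_grad` and `w2_grad` in the training loop.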

In [1]:
import torch
device = torch.device('cpu')

In [2]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in))
y = torch.randn((N, D_out))

# Initialize the weights W1 and W2
# (gradients are computed by hand in this section, so autograd tracking is not needed)
w1 = torch.randn((D_in, H))
w2 = torch.randn((H, D_out))

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations
for t in range(500):
    # Forward pass: compute y_pred, keeping the intermediate values for the backward pass
    z = x.mm(w1)
    h = torch.relu(z)
    y_hat = h.mm(w2)

    # Compute the loss
    residual = y_hat - y
    loss = (residual ** 2).sum()
    print(t, loss.item())

    # Backward pass: gradients of the loss with respect to W1 and W2
    w2_grad = 2 * h.T.mm(residual)
    w1_grad = 2 * x.T.mm(residual.mm(w2.T) * (z > 0).float())

    # Parameter update
    w1 -= learning_rate * w1_grad
    w2 -= learning_rate * w2_grad

0 37232832.0
1 34822976.0
2 34941072.0
3 31264714.0
4 23067658.0
5 13835398.0
6 7371872.0
7 3905209.75
8 2280820.25
9 1514217.25
10 1119296.875
11 885922.625
12 729530.5
13 614124.4375
14 523721.125
15 450493.84375
16 389908.3125
17 339174.5625
18 296391.0
19 260064.78125
20 228998.28125
21 202288.640625
22 179237.234375
23 159252.171875
24 141858.59375
25 126674.9375
26 113382.40625
27 101719.90625
28 91444.1171875
29 82364.59375
30 74319.625
31 67177.078125
32 60814.8671875
33 55139.984375
34 50068.03125
35 45523.97265625
36 41445.35546875
37 37782.05078125
38 34486.7421875
39 31511.478515625
40 28822.23046875
41 26389.5234375
42 24185.69921875
43 22186.181640625
44 20372.4375
45 18722.98046875
46 17220.8359375
47 15851.294921875
48 14601.1357421875
49 13458.890625
50 12414.2685546875
51 11458.1015625
52 10581.7529296875
53 9777.98828125
54 9040.6318359375
55 8363.224609375
56 7740.701171875
57 7168.15087890625
58 6641.15966796875
59 6155.677734375
60 5708.427734375
61 5295.784179687

369 0.0002858289808500558
370 0.00027718310593627393
371 0.00026945763966068625
372 0.0002619648876134306
373 0.00025473436107859015
374 0.00024828992900438607
375 0.0002413849433651194
376 0.00023446345585398376
377 0.00022861023899167776
378 0.00022273774084169418
379 0.00021706342522520572
380 0.00021172850392758846
381 0.0002059590769931674
382 0.00020064435375388712
383 0.0001960883819265291
384 0.0001906491379486397
385 0.00018568101222626865
386 0.00018141197506338358
387 0.00017700239550322294
388 0.0001733724493533373
389 0.00016897820751182735
390 0.00016554989269934595
391 0.00016154279001057148
392 0.000157681672135368
393 0.00015420264389831573
394 0.00015084243204910308
395 0.00014718607417307794
396 0.00014401294174604118
397 0.00014044386625755578
398 0.00013746751938015223
399 0.00013484503142535686
400 0.0001318853464908898
401 0.00012869510101154447
402 0.0001261494617210701
403 0.00012316823995206505
404 0.00012122537009418011
405 0.00011834535689558834
406 0.000116
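
One way to gain confidence in hand-derived gradients is to compare them with the gradients that autograd computes (autograd itself is covered in the next section). The sketch below does this on a much smaller problem than the one above. The sizes, the seed, the use of `float64`, and the names `z`, `a`, `w1_ref`, and `w2_ref` are choices made for this example only.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability

# Small sizes chosen only for this check; float64 keeps rounding error low
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, keeping the intermediate values
z = x.mm(w1)
a = torch.relu(z)
residual = a.mm(w2) - y
loss = (residual ** 2).sum()

# Reference gradients from autograd
w1_ref, w2_ref = torch.autograd.grad(loss, (w1, w2))

# Hand-derived gradients of the summed squared error
with torch.no_grad():
    w2_manual = 2 * a.T.mm(residual)
    w1_manual = 2 * x.T.mm(residual.mm(w2.T) * (z > 0).to(z.dtype))

w1_ok = torch.allclose(w1_manual, w1_ref)
w2_ok = torch.allclose(w2_manual, w2_ref)
print(w1_ok, w2_ok)
```

If either comparison fails, the usual suspects are a missing transpose, a dropped factor of 2, or a ReLU mask computed from the wrong tensor.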

### Using PyTorch's Autograd

In [3]:
import torch
device = torch.device('cpu')

In [4]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in))
y = torch.randn((N, D_out))

# Initialize the weights W1 and W2; requires_grad=True tells autograd to track them
w1 = torch.randn((D_in, H), requires_grad=True)
w2 = torch.randn((H, D_out), requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations
for t in range(500):
    # Forward pass: compute y_pred
    y_hat = torch.relu(x.mm(w1)).mm(w2)

    # Compute the loss
    loss = ((y_hat - y) ** 2).sum()
    print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss with respect to W1 and W2
    loss.backward()

    # Parameter update: the update itself should not be tracked by autograd,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

    # Zero the stored gradients now that the parameters have been updated
    w1.grad.zero_()
    w2.grad.zero_()

0 34861556.0
1 28787300.0
2 25784012.0
3 21925274.0
4 16674957.0
5 11289573.0
6 7060354.5
7 4313097.5
8 2706564.5
9 1802662.5
10 1284974.5
11 972849.0625
12 771423.875
13 631884.75
14 529078.5
15 449784.03125
16 386556.75
17 334892.78125
18 292040.625
19 255993.0
20 225359.21875
21 199131.1875
22 176540.71875
23 156990.859375
24 140015.46875
25 125195.875
26 112206.1875
27 100788.078125
28 90721.15625
29 81813.296875
30 73937.234375
31 66944.5390625
32 60718.5078125
33 55155.421875
34 50174.59375
35 45706.87890625
36 41691.06640625
37 38078.265625
38 34819.33203125
39 31875.87890625
40 29214.47265625
41 26803.384765625
42 24617.486328125
43 22632.33203125
44 20826.94140625
45 19183.544921875
46 17686.1640625
47 16320.3037109375
48 15073.287109375
49 13933.25390625
50 12889.666015625
51 11933.947265625
52 11057.4140625
53 10252.814453125
54 9513.716796875
55 8834.5087890625
56 8209.640625
57 7634.22509765625
58 7103.9482421875
59 6614.81494140625
60 6163.263671875
61 5746.21630859375
62

378 0.007549749221652746
379 0.007297886535525322
380 0.007046550512313843
381 0.006811772007495165
382 0.006580808199942112
383 0.006356360390782356
384 0.006138919852674007
385 0.005930761806666851
386 0.005734529811888933
387 0.005545827094465494
388 0.005357684567570686
389 0.005179860163480043
390 0.005005280487239361
391 0.0048383623361587524
392 0.004677774850279093
393 0.004525898024439812
394 0.004375901538878679
395 0.004234540741890669
396 0.004097094759345055
397 0.0039624301716685295
398 0.0038311888929456472
399 0.003708834294229746
400 0.0035876024048775434
401 0.003471105359494686
402 0.003361415583640337
403 0.003252682276070118
404 0.0031497208401560783
405 0.0030497051775455475
406 0.002951697912067175
407 0.0028599868528544903
408 0.0027706788387149572
409 0.0026848148554563522
410 0.0026022661477327347
411 0.002522536553442478
412 0.0024444060400128365
413 0.002370986621826887
414 0.002295780461281538
415 0.002224948722869158
416 0.0021571912802755833
417 0.0020927
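
The training loop above zeroes `w1.grad` and `w2.grad` after every update because `backward()` adds to the stored gradient instead of overwriting it. The following minimal sketch shows that accumulation on a toy tensor. The tensor `w` and the function `sum(w^2)` are made up for this illustration.

```python
import torch

# Toy parameter; values chosen only for illustration
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to w.grad
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing, a backward pass yields the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```

Here `accumulated` is twice `first`, which is why skipping the zeroing step would make each update use the sum of all past gradients.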