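### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to do the same automatically.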
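As a minimal sketch of what Autograd does, the snippet below compares a hand-derived derivative with the one PyTorch computes through `backward()`. The function $f(x) = x^2 + 3x$ and the point $x = 2$ are illustrative choices, not part of the assignment.

```python
import torch

# Illustrative function f(x) = x^2 + 3x, evaluated at x = 2
x = torch.tensor(2.0, requires_grad=True)
f = x ** 2 + 3 * x

# Manual derivative: f'(x) = 2x + 3
manual_grad = 2 * x.item() + 3

# Autograd: backward() fills x.grad with df/dx
f.backward()
autograd_grad = x.grad.item()

print(manual_grad, autograd_grad)  # 7.0 7.0
```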

### Implementing differentiation and backpropagation in PyTorch

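Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over all samples and output dimensions (this matches the `.sum()` in the code):

$$
L2\_loss = \sum (y_{pred}-y)^2
$$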

The two-layer neural network is:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$
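Applying the chain rule to the loss and the network above gives the gradients that a manual backward pass has to compute. Here $Z$ and $A$ are names introduced for the intermediate activations, $\odot$ is element-wise multiplication, and $\mathbb{1}[Z>0]$ is the ReLU derivative (1 where $Z>0$, otherwise 0; the value at exactly 0 is taken as 0, as is conventional):

$$
\begin{aligned}
Z &= XW_1, \quad A = \mathrm{ReLU}(Z), \quad y_{pred} = AW_2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= A^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial Z} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{\top}\right) \odot \mathbb{1}[Z > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z}
\end{aligned}
$$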

In [1]:
```python
import torch
device = torch.device('cpu')
```

In [2]:
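```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x, y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2 (gradients are computed by hand here, so autograd tracking is not needed)
###<your code>###
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch: each iteration is one gradient-descent step over all N samples)
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)              # pre-activation, shape (N, H)
    h_relu = torch.relu(h)    # shape (N, H)
    y_pred = h_relu.mm(w2)    # shape (N, D_out)

    # Compute the loss (sum of squared errors)
    residual = y_pred - y
    loss = (residual ** 2).sum()
    print(t, loss.item())

    # Backward pass: gradients of the loss with respect to W1 and W2 (chain rule)
    grad_y_pred = 2.0 * residual              # dL/dy_pred, shape (N, D_out)
    grad_w2 = h_relu.t().mm(grad_y_pred)      # dL/dW2, shape (H, D_out)
    grad_h_relu = grad_y_pred.mm(w2.t())      # dL/dReLU(h), shape (N, H)
    grad_h = grad_h_relu * (h > 0).float()    # ReLU passes gradient only where h > 0
    grad_w1 = x.t().mm(grad_h)                # dL/dW1, shape (D_in, H)

    # Parameter update (gradient descent)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```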

0 31480740.0
1 31306418.0
2 31622678.0
3 28441084.0
4 21231132.0
5 13193777.0
6 7295791.5
7 3987757.5
8 2359625.25
9 1568990.375
10 1158467.875
11 919464.625
12 761907.5625
13 647164.25
14 557970.375
15 485801.0
16 425955.53125
17 375730.125
18 333000.8125
19 296332.28125
20 264677.78125
21 237229.03125
22 213297.546875
23 192320.1875
24 173859.015625
25 157557.3125
26 143126.671875
27 130286.875
28 118841.6171875
29 108608.640625
30 99428.015625
31 91170.9609375
32 83731.0390625
33 77020.9765625
34 70957.0078125
35 65455.96875
36 60457.9296875
37 55906.91015625
38 51753.4609375
39 47959.3203125
40 44487.5078125
41 41307.9609375
42 38396.8671875
43 35722.234375
44 33259.91015625
45 30991.1640625
46 28901.703125
47 26973.494140625
48 25193.85546875
49 23546.5546875
50 22020.17578125
51 20605.150390625
52 19292.1640625
53 18072.083984375
54 16936.611328125
55 15880.818359375
56 14898.505859375
57 13984.0341796875
58 13131.201171875
59 12335.89453125
60 11593.6298828125
61 10900.404296875

427 0.0013987371930852532
428 0.0013561330270022154
429 0.0013161487877368927
430 0.001275662099942565
431 0.0012368133757263422
432 0.0012019388377666473
433 0.00116674043238163
434 0.0011336004827171564
435 0.0011005650740116835
436 0.0010684884618967772
437 0.00103643792681396
438 0.0010079313069581985
439 0.000977282994426787
440 0.0009505372727289796
441 0.0009242293890565634
442 0.0008982125436887145
443 0.0008729651453904808
444 0.000848440220579505
445 0.0008260906906798482
446 0.000804270152002573
447 0.0007828167290426791
448 0.0007614369387738407
449 0.0007413321291096509
450 0.0007217499660328031
451 0.0007021233323030174
452 0.0006842723814770579
453 0.0006666373810730875
454 0.000649068271741271
455 0.0006337111699394882
456 0.0006176413153298199
457 0.000601526815444231
458 0.0005876736249774694
459 0.000573572819121182
460 0.0005588137428276241
461 0.0005452964687719941
462 0.0005331918364390731
463 0.0005200002924539149
464 0.0005075610824860632
465 0.00049540307372808
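One way to confirm that a hand-written backward pass is correct is to compare it with Autograd on the same inputs. The sketch below uses small, illustrative sizes and `float64` for a tighter comparison; the helper name `manual_grads` is hypothetical, and its gradients include the factor 2 that comes from the squared loss.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 5, 3  # small illustrative sizes

x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

def manual_grads(x, y, w1, w2):
    """Hypothetical helper: hand-derived gradients of sum((y_pred - y)^2)."""
    h = x.mm(w1)
    h_relu = torch.relu(h)
    y_pred = h_relu.mm(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t()) * (h > 0).to(h.dtype)
    grad_w1 = x.t().mm(grad_h)
    return grad_w1, grad_w2

# Manual gradients (no computation graph is needed for these)
with torch.no_grad():
    g1_manual, g2_manual = manual_grads(x, y, w1, w2)

# Autograd gradients for the same loss
loss = ((torch.relu(x.mm(w1)).mm(w2) - y) ** 2).sum()
loss.backward()

print(torch.allclose(g1_manual, w1.grad), torch.allclose(g2_manual, w2.grad))  # True True
```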

### Using PyTorch's Autograd

In [3]:
```python
import torch
device = torch.device('cpu')
```

In [4]:
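```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x, y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2; requires_grad=True tells autograd to track operations on them
###<your code>###
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = torch.relu(x.mm(w1)).mm(w2)

    # Compute the loss
    loss = ((y_pred - y) ** 2).sum()
    print(t, loss.item())

    # Backward pass: autograd computes dL/dW1 and dL/dW2 and stores them in w1.grad and w2.grad
    loss.backward()

    # Parameter update: the update itself should not be recorded in the computation graph,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset the gradients: backward() accumulates into .grad, so without this
        # the next iteration would add its gradients to the old ones
        w1.grad.zero_()
        w2.grad.zero_()
```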


0 24633476.0
1 18544020.0
2 16544405.0
3 16154951.0
4 15995438.0
5 15190672.0
6 13354802.0
7 10744150.0
8 7971277.0
9 5545358.5
10 3726257.5
11 2478526.0
12 1671090.5
13 1159677.5
14 836232.8125
15 628082.875
16 490429.0625
17 395991.5
18 328662.125
19 278585.96875
20 239987.140625
21 209266.984375
22 184144.84375
23 163158.984375
24 145357.125
25 130067.890625
26 116805.3671875
27 105225.9140625
28 95058.578125
29 86094.40625
30 78156.6875
31 71102.75
32 64813.3828125
33 59191.421875
34 54156.07421875
35 49638.78515625
36 45586.1953125
37 41936.9140625
38 38638.2890625
39 35651.2109375
40 32942.05078125
41 30480.46875
42 28241.1328125
43 26199.4453125
44 24337.083984375
45 22634.974609375
46 21076.41796875
47 19647.375
48 18335.7890625
49 17130.564453125
50 16027.8984375
51 15019.126953125
52 14088.2998046875
53 13227.908203125
54 12431.7197265625
55 11694.2587890625
56 11010.4169921875
57 10375.365234375
58 9785.857421875
59 9237.1865234375
60 8726.28515625
61 8250.111328125
62 7805.

410 0.0773552879691124
411 0.07512004673480988
412 0.07294633239507675
413 0.07085320353507996
414 0.06880820542573929
415 0.06684251874685287
416 0.06491239368915558
417 0.06305744498968124
418 0.06126152351498604
419 0.05949730798602104
420 0.057782240211963654
421 0.05611714348196983
422 0.05453085899353027
423 0.052969835698604584
424 0.051448702812194824
425 0.04997443035244942
426 0.048548489809036255
427 0.04716982692480087
428 0.04581749066710472
429 0.04450729861855507
430 0.04322466254234314
431 0.04199958220124245
432 0.04080897569656372
433 0.0396370105445385
434 0.03851832449436188
435 0.037409551441669464
436 0.03636263683438301
437 0.03532704338431358
438 0.03433135151863098
439 0.0333588644862175
440 0.032408617436885834
441 0.031481921672821045
442 0.030591221526265144
443 0.029725201427936554
444 0.02888460084795952
445 0.0280572809278965
446 0.02726813405752182
447 0.026501771062612534
448 0.02575579658150673
449 0.02503257244825363
450 0.024327054619789124
451 0.023
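The two housekeeping steps in the Autograd loop can be seen in isolation with a toy tensor (the values below are illustrative): `.grad` accumulates across `backward()` calls, which is why the loop calls `zero_()`, and results computed under `torch.no_grad()` are not tracked by autograd.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Two backward passes without zeroing: gradients are summed into w.grad
(w * 3).sum().backward()
first = w.grad.clone()         # d/dw of sum(3w) = [3, 3]
(w * 3).sum().backward()
accumulated = w.grad.clone()   # [3, 3] + [3, 3] = [6, 6]

# Zeroing resets the buffer before the next iteration
w.grad.zero_()
zeroed = w.grad.clone()        # [0, 0]

# Inside no_grad, operations are not recorded, so the result does not require grad
with torch.no_grad():
    updated = w - 0.1 * w.grad
tracked = (w - 0.1).requires_grad  # outside no_grad the same kind of op is tracked

print(first.tolist(), accumulated.tolist(), zeroed.tolist())  # [3.0, 3.0] [6.0, 6.0] [0.0, 0.0]
print(updated.requires_grad, tracked)                          # False True
```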