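### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.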

### Implementing differentiation and backpropagation with PyTorch

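Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$

The sum runs over every element of the batch, which is what the code computes with `.sum()`.

The two-layer neural network is

$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$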
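
For reference, the hand-written backward pass for this network follows from the chain rule. The sketch below writes $L$ for the L2 loss, assumes it is summed over all elements, and introduces $h = XW_1$ and $h_{relu} = \mathrm{ReLU}(h)$ as intermediate names that do not appear in the formulas above. $\odot$ is element-wise multiplication, and $\mathbb{1}[h > 0]$ is 1 where $h$ is positive and 0 elsewhere.

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{\top}\right) \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial h}
\end{aligned}
$$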

In [3]:
import torch
device = torch.device('cpu')

In [None]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate the inputs x and the targets y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1 and W2
W1 = torch.randn((D_in, H), device=device)
W2 = torch.randn((H, D_out), device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  h = torch.matmul(x, W1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, W2)

  # Compute the loss (sum of squared errors)
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  y_pred_grad = 2. * (y_pred - y)                # dL/dy_pred
  W2_grad = h_relu.T.mm(y_pred_grad)             # dL/dW2
  h_grad = y_pred_grad.mm(W2.T) * (h > 0.)       # dL/dh; ReLU passes gradient only where h > 0
  W1_grad = x.T.mm(h_grad)                       # dL/dW1

  # Parameter update (plain gradient descent)
  W1 -= learning_rate * W1_grad
  W2 -= learning_rate * W2_grad
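
One way to gain confidence in the hand-derived gradients is to compare them with what Autograd computes for the same forward pass. The following is a minimal sketch of such a check, not part of the original assignment: the small layer sizes, the fixed seed, the use of `float64`, and the name `grads_match` are assumptions chosen only to keep the comparison fast and numerically tight.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Small, hypothetical sizes chosen only to keep the check fast
N, D_in, H, D_out = 4, 6, 5, 3

# float64 keeps rounding error small in the comparison
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same structure as the training loop
h = x @ W1
h_relu = torch.relu(h)
y_pred = h_relu @ W2
loss = torch.square(y_pred - y).sum()

# Reference gradients from Autograd
loss.backward()

# Hand-derived gradients, same formulas as the training loop
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_grad = h_relu.T @ y_pred_grad
    h_grad = (y_pred_grad @ W2.T) * (h > 0)
    W1_grad = x.T @ h_grad

grads_match = torch.allclose(W1_grad, W1.grad) and torch.allclose(W2_grad, W2.grad)
print("manual gradients match Autograd:", grads_match)
```

If the two sets of gradients disagree, the usual suspects are a missing transpose or a ReLU mask taken from the wrong tensor.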

### Using PyTorch's Autograd

In [None]:
import torch
device = torch.device('cpu')

In [6]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

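# Randomly generate the inputs x and the targets y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1 and W2.
# Create them directly on the device: calling .to(device) on a tensor made with
# requires_grad=True returns a non-leaf copy when the device differs, and then
# W1.grad / W2.grad would stay None after backward().
W1 = torch.randn((D_in, H), device=device, requires_grad=True)
W2 = torch.randn((H, D_out), device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6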

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  h = torch.matmul(x, W1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, W2)

  # Compute the loss (sum of squared errors)
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Parameter update: the update itself should not be tracked by Autograd,
  # so it is wrapped in torch.no_grad()
  with torch.no_grad():
    # Update the parameters W1 and W2
    W1 -= learning_rate * W1.grad
    W2 -= learning_rate * W2.grad

    # Clear the stored gradients now that the parameters have been updated
    W1.grad.zero_()
    W2.grad.zero_()

0 27830312.0
1 20844748.0
2 17281254.0
3 14677181.0
4 12097094.0
5 9516668.0
6 7098132.0
7 5104751.5
8 3585382.0
9 2510410.25
10 1772468.25
11 1278700.75
12 947012.0
13 721973.5
14 565726.25
15 454414.6875
16 372728.8125
17 311022.8125
18 263183.125
19 225172.84375
20 194368.203125
21 169007.890625
22 147793.28125
23 129850.6484375
24 114539.3046875
25 101402.5703125
26 90062.25
27 80208.3984375
28 71619.09375
29 64099.9296875
30 57496.16015625
31 51677.09375
32 46534.859375
33 41979.8515625
34 37937.1015625
35 34339.9375
36 31133.60546875
37 28269.767578125
38 25706.658203125
39 23408.02734375
40 21343.15625
41 19485.79296875
42 17813.638671875
43 16304.8232421875
44 14943.572265625
45 13712.318359375
46 12596.216796875
47 11583.330078125
48 10663.0
49 9825.751953125
50 9063.125
51 8367.603515625
52 7732.48779296875
53 7153.63232421875
54 6625.0556640625
55 6140.888671875
56 5696.732421875
57 5289.06591796875
58 4914.43310546875
59 4570.2490234375
60 4253.54833984375
61 3961.867919921
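
The gradient-clearing step in the Autograd loop matters because `backward()` adds to `.grad` rather than overwriting it. The sketch below illustrates this on a tiny, hypothetical two-element tensor `w` (not part of the network above), using the function $\sum w^2$, whose gradient is $2w$.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# The gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing, a backward pass gives the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

Here `accumulated` is twice `first`, while `after_zero` equals `first` again. Without the `zero_()` calls in the training loop, each update would use the sum of all gradients seen so far instead of the current one.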