## Learning PyTorch with Examples (2)

The code is identical to that in the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).

### PyTorch: nn

The `nn` package provides higher-level abstractions such as layers and common loss functions.

In [1]:
import torch

N = 64
D_in, H, D_out = 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)
    
    loss = loss_fn(y_pred, y)
    if t % 20 == 0: print(t, loss.item())
    
    # reset gradients
    model.zero_grad()
    loss.backward()
    
    # update parameters (weights)
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

0 596.638671875
20 191.3060302734375
40 63.37615203857422
60 21.443504333496094
80 7.837594985961914
100 3.039156436920166
120 1.230210304260254
140 0.5139384269714355
160 0.222162127494812
180 0.09935349225997925
200 0.04575412720441818
220 0.02160056121647358
240 0.010423238389194012
260 0.005140047520399094
280 0.0025885088834911585
300 0.0013219524407759309
320 0.0006835238891653717
340 0.0003575181763153523
360 0.0001888629049062729
380 0.00010066367394756526
400 5.407274147728458e-05
420 2.9255164918140508e-05
440 1.5925021216389723e-05
460 8.719014658709057e-06
480 4.798503141500987e-06
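
To make the two `nn` building blocks used above less opaque, here is a minimal sketch (not part of the original tutorial) that checks them against hand-written tensor arithmetic. The small sizes and the names `manual`, `manual_loss`, `linear_matches` and `loss_matches` are chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# A small layer and a small batch, sized only for illustration.
linear = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)

# nn.Linear stores its weight with shape (out_features, in_features),
# so the layer computes x @ W^T + b.
manual = x @ linear.weight.T + linear.bias
linear_matches = torch.allclose(linear(x), manual, atol=1e-6)

# MSELoss(reduction='sum') adds up the squared differences
# instead of averaging them.
y_pred = torch.randn(2, 3)
y = torch.randn(2, 3)
loss_fn = torch.nn.MSELoss(reduction='sum')
manual_loss = ((y_pred - y) ** 2).sum()
loss_matches = torch.allclose(loss_fn(y_pred, y), manual_loss)

print(linear_matches, loss_matches)
```

Because the loss is summed rather than averaged, its magnitude grows with the batch size and output dimension, which is one reason a small learning rate such as `1e-4` is used here.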


### PyTorch: optim

The `optim` package provides optimization algorithms such as RMSProp and Adam.

With an optimizer, we no longer need to update the parameters manually from their gradients.

In [3]:
import torch

N = 64
D_in, H, D_out = 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=learning_rate
)
for t in range(500):
    y_pred = model(x)
    
    loss = loss_fn(y_pred, y)
    if t % 20 == 0: print(t, loss.item())
    
    # reset gradients
    optimizer.zero_grad()
    
    loss.backward()
    
    # the optimizer applies the parameter update for us
    optimizer.step()

0 671.2236938476562
20 410.0433044433594
40 255.60887145996094
60 157.2102813720703
80 92.89498901367188
100 52.19229507446289
120 27.536178588867188
140 13.678248405456543
160 6.4642157554626465
180 2.898084878921509
200 1.2635167837142944
220 0.5475822687149048
240 0.24350719153881073
260 0.11298777908086777
280 0.05504155904054642
300 0.027734756469726562
320 0.014103146269917488
340 0.0071294959634542465
360 0.0035456218756735325
380 0.0017235410632565618
400 0.0008161450969055295
420 0.0003756010555662215
440 0.00016776268603280187
460 7.26645375834778e-05
480 3.0509087082464248e-05
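
The loop above calls `optimizer.zero_grad()` before every `backward()`. The following sketch (an illustration, not part of the original tutorial) shows why: PyTorch adds new gradients to whatever is already stored in `.grad`. The tensor `w` and the variable names are hypothetical.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: d/dw of sum(2 * w) is 2 for every element.
(w * 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added
# to the one already stored in w.grad.
(w * 2).sum().backward()
accumulated = w.grad.clone()

# Reset, then run backward again: only the fresh gradient remains.
w.grad.zero_()
(w * 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, each update step would use the sum of all gradients computed so far rather than the gradient of the current loss.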


### PyTorch: Custom nn Modules

For more complex networks, you can define your own model as a subclass of `nn.Module`.

The `forward` method receives an input Tensor and returns an output Tensor.

In [5]:
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
    
N = 64
D_in, H, D_out = 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
optimizer = torch.optim.SGD(
    model.parameters(), 
    lr=learning_rate,
    momentum=0.9
)
for t in range(500):
    y_pred = model(x)
    
    loss = loss_fn(y_pred, y)
    if t % 20 == 0: print(t, loss.item())
        
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 688.5731811523438
20 23.339492797851562
40 3.4149560928344727
60 0.4524213969707489
80 0.054702699184417725
100 0.006798021495342255
120 0.0008228255901485682
140 8.810869621811435e-05
160 1.2004516065644566e-05
180 1.4686361282656435e-06
200 1.8112606881004467e-07
220 2.1172434472305213e-08
240 2.7380162581636114e-09
260 2.997810610860796e-10
280 4.1663280464510066e-11
300 9.575203477329985e-12
320 4.100678130392055e-12
340 3.464664319330346e-12
360 3.120985241078511e-12
380 2.9305527188966396e-12
400 2.7256051148699667e-12
420 2.401070661739446e-12
440 2.947020665694522e-12
460 2.4598916653628677e-12
480 2.9936728009744007e-12
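
Two details of the custom module above can be checked in isolation. This is a sketch under stated assumptions: `TinyNet` is a hypothetical, smaller stand-in for `TwoLayerNet`. It shows that layers assigned as attributes in `__init__` are registered automatically, which is why `model.parameters()` can be passed straight to the optimizer, and that `clamp(min=0)` gives the same result as ReLU.

```python
import torch


class TinyNet(torch.nn.Module):  # hypothetical stand-in for TwoLayerNet
    def __init__(self):
        super().__init__()
        # Assigning nn.Module instances as attributes registers them.
        self.linear1 = torch.nn.Linear(4, 3)
        self.linear2 = torch.nn.Linear(3, 2)

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))


model = TinyNet()

# Every registered parameter, keyed by its attribute path.
param_shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(param_shapes)

# clamp(min=0) replaces negative values with 0, exactly like ReLU.
h = torch.tensor([-1.0, 0.0, 2.0])
same_as_relu = torch.equal(h.clamp(min=0), torch.relu(h))
print(same_as_relu)
```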


### PyTorch: Control Flow + Weight Sharing

Here we define a network whose number of hidden layers changes on every forward pass (a random number between 0 and 3), with every hidden layer sharing the same weights.

In [8]:
import random
import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

N = 64
D_in, H, D_out = 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = DynamicNet(D_in, H, D_out)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=learning_rate,
    momentum=0.9
)
for t in range(500):
    y_pred = model(x)
    
    loss = loss_fn(y_pred, y)
    if t % 20 == 0: print(t, loss.item())
        
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 632.0711669921875
20 419.6955871582031
40 251.12831115722656
60 98.99148559570312
80 10.708175659179688
100 10.54945182800293
120 6.828139781951904
140 3.1934988498687744
160 3.984523296356201
180 1.022364854812622
200 11.797147750854492
220 10.55497932434082
240 11.350359916687012
260 3.1293466091156006
280 1.3497684001922607
300 1.7455763816833496
320 0.49634039402008057
340 0.6553925275802612
360 0.26412442326545715
380 0.49488359689712524
400 0.6152457594871521
420 0.432864785194397
440 0.2651008069515228
460 0.2529798746109009
480 0.34129878878593445
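
Weight sharing means the same parameter appears several times in the computation graph, and autograd sums the gradient contributions from each use. The sketch below is a deliberately tiny illustration, assuming a single 1x1 layer named `shared` with its weight set to 3 (both hypothetical, not from the tutorial), so the numbers can be verified by hand.

```python
import torch

# One 1x1 linear layer without bias: it just multiplies by its weight w.
shared = torch.nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)

x = torch.tensor([[2.0]])

# Apply the same layer twice: y = w * (w * x) = w^2 * x = 18.
y = shared(shared(x))
y.sum().backward()

# dy/dw = 2 * w * x = 12: both uses of w contribute to the gradient.
grad = shared.weight.grad.item()

# Reusing the layer does not create new parameters.
n_params = sum(p.numel() for p in shared.parameters())

print(y.item(), grad, n_params)
```

The same holds for `middle_linear` in `DynamicNet`: however many times it is applied in a given forward pass, the optimizer still sees one weight matrix and one bias. Because the depth is random on each pass, the loss printed above fluctuates instead of decreasing monotonically.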
