**We will work on the problem of fitting y = sin(x)!**

Overall, this feels like a review of the previous three tutorials (tensor, autograd, nn).

## Table of Contents
### 0. Tensors
* Designing a neural network with numpy
* Designing a neural network with Tensors

There is almost no difference between the two in structure or in accuracy. However, because Tensors can run on a GPU, the two can differ in running time.

### 1. Autograd
* Implementing the backward pass with autograd
* customizing autograd

### 2. nn Module
* Designing a neural network with the nn Module
* customizing nn module

---

## Tensors

### Warm-up: numpy
Although numpy is not optimized for heavy computations such as deep learning, the forward and backward passes can still be implemented with numpy alone.

In [1]:
import numpy as np
import math

# create input and output data
x= np.linspace(-math.pi, math.pi,  2000)
y= np.sin(x)

# randomly initialize weights
a= np.random.randn()
b= np.random.randn()
c= np.random.randn()
d= np.random.randn()

learning_rate= 1e-6
for t in range(2000):
    # forward pass - compute predicted y
    # y = a + bx + cx^2 + dx^3
    y_pred= a + b*x +  c*(x**2) + d*(x**3)
    
    # compute and print loss
    loss= np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)
        
    # backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred= 2.0 * (y_pred - y)
    grad_a= grad_y_pred.sum()
    grad_b= (grad_y_pred * x).sum()
    grad_c= (grad_y_pred * x ** 2).sum()
    grad_d= (grad_y_pred * x ** 3).sum()
    
    # update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
    
print(f'Result: y= {a} + {b}x + {c}x^2 + {d}x^3')

99 2340.702466078339
199 1554.2998485804023
299 1033.2621478778576
399 687.9966575224632
499 459.1737648492539
599 307.4994241573625
699 206.94633450518404
799 140.27272720480255
899 96.05539325798216
999 66.72510444116288
1099 47.26564733295402
1199 34.35222344563479
1299 25.780770833896106
1399 20.08993395821267
1499 16.310617814669083
1599 13.800045579097526
1699 12.131792149072778
1799 11.022901060325921
1899 10.285571832022322
1999 9.795129215912723
Result: y= -0.010919999357367796 + 0.828043194680475x + 0.0018838815987195895x^2 + -0.08924839845797145x^3
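
As a sanity check on the gradient-descent result above, the same cubic can also be fitted in closed form. The sketch below is not part of the original notebook: it assumes ordinary least squares via `np.polyfit` as the reference, reuses the same `x` and `y` grid, and the names `coeffs` and `best_loss` are illustrative.

```python
import math
import numpy as np

# same data as in the cell above
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# np.polyfit returns coefficients from the highest degree down, so reverse them
# to get (a, b, c, d) for y = a + b x + c x^2 + d x^3
coeffs = np.polyfit(x, y, 3)[::-1]
a, b, c, d = coeffs

# sum-of-squares loss of the least-squares cubic, same definition as in the loop above
y_pred = a + b * x + c * x ** 2 + d * x ** 3
best_loss = np.square(y_pred - y).sum()

print(f'Least squares: y= {a} + {b}x + {c}x^2 + {d}x^3')
print('loss:', best_loss)
```

The loss printed here is the floor that gradient descent is moving toward. The runs in this notebook should end slightly above it, since 2000 steps at this learning rate are not enough to converge fully.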


###  PyTorch: Tensors

The biggest advantage of a tensor over a numpy array is that it can use a **GPU** (apart from that, the two are reportedly not very different).

In [2]:
import torch
import math

dtype= torch.float
device= torch.device('cpu')

# create input and output data
x= torch.linspace(-math.pi, math.pi, 2000, device= device, dtype= dtype)
y= torch.sin(x)

# randomly initialize weights
a= torch.randn((), device= device, dtype= dtype)
b= torch.randn((), device= device, dtype= dtype)
c= torch.randn((), device= device, dtype= dtype)
d= torch.randn((), device= device, dtype= dtype)

learning_rate= 1e-6
for t in range(2000):
    # forward pass - compute predicted y
    y_pred= a + b*x + c*(x**2) + d*(x**3)
    
    # compute and print loss
    loss= (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)
    
    # back propagation to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred= 2.0 * (y_pred - y)
    grad_a= grad_y_pred.sum()
    grad_b= (grad_y_pred * x).sum()
    grad_c= (grad_y_pred * x ** 2).sum()
    grad_d= (grad_y_pred * x ** 3).sum()
    
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
    
print('Result: y= {} + {}x + {}x^2 + {}x^3'.format(a, b, c, d))

99 127.18893432617188
199 90.88992309570312
299 65.77717590332031
399 48.3851432800293
499 36.328285217285156
599 27.961936950683594
699 22.151092529296875
799 18.111495971679688
899 15.300758361816406
999 13.343435287475586
1099 11.979296684265137
1199 11.027802467346191
1299 10.3636474609375
1399 9.899721145629883
1499 9.575419425964355
1599 9.348576545715332
1699 9.189800262451172
1799 9.078601837158203
1899 9.000678062438965
1999 8.946041107177734
Result: y= -0.011340231634676456 + 0.8530916571617126x + 0.001956378808245063x^2 + -0.09281131625175476x^3


As expected, there is no difference in the quality of the fit. Setting device='cuda' may change the speed, though.
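
A minimal sketch of switching devices without editing the rest of the cell, assuming the usual `torch.cuda.is_available()` check (this selection logic is an addition, not something the notebook ran):

```python
import math
import torch

# pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dtype = torch.float

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# tensors derived from x live on the same device as x
print(device, x.device, y.device)
```

For a problem this small (2000 points, four scalar weights), kernel-launch overhead may well outweigh any GPU gain, so a speedup should not be taken for granted.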

---
## Autograd
### PyTorch: Tensors and autograd

In the examples above we implemented both the forward pass and the backward pass by hand, but in practice there is no need to implement the backward pass, thanks to **Autograd**. (Details on autograd are covered in Tutorial_Autograd.)

So all we have to do is define the network's computational graph through the forward pass and call ``.backward()`` so that autograd carries out the backward pass.

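* ``tensor.item()``: for a tensor with exactly one element, returns that element's value as a Python scalar
* When updating the weights there is no need to track the operations, so wrap the update in ``with torch.no_grad():``!
* After every weight update, the gradients must be cleared, because ``.backward()`` accumulates into ``.grad``.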
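
The three points above can be seen on a single scalar weight. This is a small illustrative sketch; the variable names `first`, `accumulated`, and `fresh` are made up for the example.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# first backward pass: d(w^2)/dw = 2w
(w ** 2).backward()
first = w.grad.item()

# second backward pass without clearing: the new gradient is added to the old one
(w ** 2).backward()
accumulated = w.grad.item()

# clear, then recompute: back to the gradient of a single pass
w.grad = None
(w ** 2).backward()
fresh = w.grad.item()

# inside no_grad() the in-place update is not recorded by autograd
with torch.no_grad():
    w -= 0.1 * w.grad

print(first, accumulated, fresh, w.item(), w.requires_grad)
```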

In [3]:
import torch
import math

dtype= torch.float
device= torch.device('cpu')

x= torch.linspace(-math.pi, math.pi, 2000, device= device, dtype= dtype)
y= torch.sin(x)

# create random tensors for weights
# set requires_grad=True to indicate we want to compute gradients of these Tensors during backward pass
a= torch.randn((),  device= device, dtype= dtype, requires_grad=True)
b= torch.randn((),  device= device, dtype= dtype, requires_grad=True)
c= torch.randn((),  device= device, dtype= dtype, requires_grad=True)
d= torch.randn((),  device= device, dtype= dtype, requires_grad=True)

learning_rate= 1e-6
for t in range(2000):
    # forward pass
    y_pred= a + b*x + c*(x**2) + d*(x**3)
    
    # compute loss, which is a scalar value here
    loss=  (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        # tensor.item() returns the value of a one-element tensor as a Python scalar
        print(t, loss.item())
    
    # use .backward() to compute backward pass
    loss.backward()
    
    # manually update weights using gradient descent.
    # wrap in torch.no_grad() because we don't need to track this update computation in autograd
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
    
        # manually clear the gradients after updating weights
        a.grad= None
        b.grad= None
        c.grad= None
        d.grad= None

print('Result: y= {} + {}x + {}x^2 + {}x^3'.format(a.item(), b.item(), c.item(), d.item()))

99 7313.1875
199 4875.544921875
299 3252.921875
399 2172.367431640625
499 1452.474853515625
599 972.642822265625
699 652.6620483398438
799 439.1701354980469
899 296.6507568359375
999 201.45591735839844
1099 137.83328247070312
1199 95.28516387939453
1299 66.81243896484375
1399 47.745582580566406
1499 34.96830749511719
1599 26.399581909179688
1699 20.648761749267578
1799 16.785982131958008
1899 14.189315795898438
1999 12.44221019744873
Result: y= -0.0352456234395504 + 0.8079785108566284x + 0.006080457009375095x^2 + -0.08639436960220337x^3
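
To confirm that ``.backward()`` computes the same quantities as the hand-written backward pass from the Tensors section, the two can be compared on one forward pass. This is a sketch under assumptions: the starting weights are arbitrary fixed values chosen for the example, and `auto` and `manual` are illustrative names.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# fixed starting weights (arbitrary values chosen for this sketch)
a = torch.tensor(0.1, requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)
c = torch.tensor(0.3, requires_grad=True)
d = torch.tensor(0.4, requires_grad=True)

y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()

# hand-derived gradients, using the same formulas as the Tensors section
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = torch.stack([
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ])

# gradients produced by autograd
auto = torch.stack([a.grad, b.grad, c.grad, d.grad])
print(torch.allclose(auto, manual, rtol=1e-3))
```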


### PyTorch: Defining new autograd functions

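If needed, we can define a new **autograd function**. To do so:
* create it as a subclass of ``torch.autograd.Function``.
* implement the ``forward`` and ``backward`` functions.

Here, we implement the model $y=a+b P_3(c+dx)$, where $P_3(x)=\frac{1}{2}(5x^3-3x)$ is the ``Legendre polynomial`` of degree three.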

In [4]:
import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    
    @staticmethod
    def forward(ctx, input):
        # ctx: context object - used for stashing information for backward computation.
        # can cache arbitrary objects for use in the backward pass using the ctx.save_for_backward method
        ctx.save_for_backward(input)
        
        return 0.5 * (5 * input ** 3 - 3 * input)
    
    @staticmethod
    def backward(ctx, grad_output):
        # in the backward pass, we receive a Tensor containing the gradient of the loss w.r.t. the output
        # we need to compute the gradient of the loss w.r.t. the input
        input, =ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

dtype= torch.float
device= torch.device("cpu")

x= torch.linspace(-math.pi, math.pi, 2000, device= device, dtype= dtype)
y= torch.sin(x)

# create tensors for weights, initialized to fixed values rather than randomly
# set requires_grad=True to indicate we want to compute gradients of these Tensors during backward pass
a= torch.full((), 0.0, device= device, dtype= dtype, requires_grad=True)
b= torch.full((), -1.0, device= device, dtype= dtype, requires_grad=True)
c= torch.full((), 0.0, device= device, dtype= dtype, requires_grad=True)
d= torch.full((), 0.3, device= device, dtype= dtype, requires_grad=True)

learning_rate= 5e-6
for t in range(2000):
    # to apply our function, we use Function.apply method. we alias this as 'P3'.
    P3= LegendrePolynomial3.apply
    
    # forward pass - compute predicted y using our operations.
    y_pred= a + b * P3(c + d*x)
    
    # compute loss
    loss= (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
        
    # use autograd to  compute the backward pass
    loss.backward()
    
    # update weights using g.d.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
    
        # manually clear the gradients after updating weights
        a.grad= None
        b.grad= None
        c.grad= None
        d.grad= None

print('Result: y= {} + {} * P3({} + {}x))'.format(a.item(), b.item(), c.item(), d.item()))

99 209.95834350585938
199 144.66018676757812
299 100.70249938964844
399 71.03519439697266
499 50.97850799560547
599 37.403133392333984
699 28.206867218017578
799 21.973188400268555
899 17.7457275390625
999 14.877889633178711
1099 12.931766510009766
1199 11.610918045043945
1299 10.714258193969727
1399 10.10548210144043
1499 9.692106246948242
1599 9.411375999450684
1699 9.220745086669922
1799 9.091285705566406
1899 9.003361701965332
1999 8.943639755249023
Result: y= -6.8844756562214116e-09 + -2.208526849746704 * P3(1.5037101563919464e-09 + 0.2554861009120941x))
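
A hand-written ``backward`` is easy to get wrong, so it can be worth checking it numerically. The sketch below assumes `torch.autograd.gradcheck` as the checker (the notebook itself does not use it) and repeats the class so that the block runs on its own; `test_input` and `ok` are illustrative names.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # derivative of P3: d/dx [0.5 * (5x^3 - 3x)] = 1.5 * (5x^2 - 1)
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# gradcheck compares the analytic gradient with a finite-difference estimate;
# it expects double-precision inputs that require grad
test_input = torch.randn(8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (test_input,), eps=1e-6, atol=1e-4)
print(ok)
```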


## nn Module
### PyTorch: nn

PyTorch's ``nn`` package provides high-level abstractions and operations for designing neural networks.
Notably,
* An nn Module plays roughly the role of a layer in a neural network.
* An nn Module receives an input Tensor and returns an output Tensor, and it holds internal state such as learnable parameters along the way.
* The nn package also defines things that are frequently used in neural networks, such as loss functions.

In [5]:
import torch
import math

x= torch.linspace(-math.pi, math.pi, 2000)
y= torch.sin(x)

We want to model ``y=sin(x)`` with a linear function.

That is, we want the form ``y= ax + bx^2 + cx^3 + d``.
Here, the value of y can be viewed as a linear combination of ``(x,  x^2, x^3)``.
So we turn the input value x into ``(x, x^2, x^3)``.

**Checking the shapes**
* ``x.unsqueeze(-1)``: (2000, 1)
* ``p``: (3,)
* ``xx``: (2000, 3) by *broadcasting*

In [6]:
p= torch.tensor([1, 2, 3])
xx= x.unsqueeze(-1).pow(p)
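
The shape reasoning above can be checked directly. This is a small sketch; the intermediate name `col` is introduced only for illustration.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
p = torch.tensor([1, 2, 3])

col = x.unsqueeze(-1)   # shape (2000, 1)
xx = col.pow(p)         # (2000, 1) broadcast against (3,) gives (2000, 3)

print(col.shape, p.shape, xx.shape)

# each column holds one power of x
print(torch.allclose(xx[:, 0], x),
      torch.allclose(xx[:, 1], x ** 2),
      torch.allclose(xx[:, 2], x ** 3))
```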

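* ``nn.Sequential``: holds other modules and applies them in order (it seems similar to a scikit-learn pipeline)
* ``nn.Linear``: takes the numbers of input and output features as arguments and computes a linear combination of the inputs with a learnable weight and bias
* ``nn.Flatten``: flattens the output of the linear layer to a 1D tensor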

In [7]:
model= torch.nn.Sequential(
        torch.nn.Linear(3, 1), 
        torch.nn.Flatten(0, 1)
)
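
To see what each layer does to the shapes, the two modules can be applied one at a time. In this sketch the random `xx` is only a stand-in for the real ``(x, x^2, x^3)`` features, and `after_linear` / `after_flatten` are illustrative names.

```python
import torch

linear = torch.nn.Linear(3, 1)      # 3 input features -> 1 output feature
flatten = torch.nn.Flatten(0, 1)    # merge dimensions 0 and 1 into one
model = torch.nn.Sequential(linear, flatten)

xx = torch.randn(2000, 3)           # stand-in for the (x, x^2, x^3) features
after_linear = linear(xx)           # shape (2000, 1)
after_flatten = flatten(after_linear)  # shape (2000,)

print(linear.weight.shape, linear.bias.shape)
print(after_linear.shape, after_flatten.shape, model(xx).shape)
```

The flatten step matters because the target ``y`` has shape (2000,); without it the prediction would keep the trailing dimension of size 1.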

``nn.MSELoss``: the nn package comes with a variety of built-in loss functions.

In [8]:
loss_fn= torch.nn.MSELoss(reduction= 'sum')
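
With ``reduction='sum'`` the loss is the same sum of squared errors used in the earlier sections, whereas the default ``reduction='mean'`` divides by the number of elements. A quick check on random stand-in data (not the notebook's data):

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(2000)
y = torch.randn(2000)

loss_sum = torch.nn.MSELoss(reduction='sum')(y_pred, y)
loss_mean = torch.nn.MSELoss(reduction='mean')(y_pred, y)

# the hand-written loss from the Tensors and Autograd sections
manual = (y_pred - y).pow(2).sum()

print(loss_sum.item(), manual.item(), loss_mean.item() * 2000)
```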

In [10]:
learning_rate= 1e-6
for t in range(2000):
    # forward pass
    ## calling the model on the input tensor returns the output it produces
    y_pred= model(xx)
    
    # compute loss
    loss= loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t,loss.item())
    
    # zero the gradients
    model.zero_grad()
    
    # backward pass
    ## each module's parameters are stored in Tensors with requires_grad=True, so backward() computes their gradients
    loss.backward()
    
    with torch.no_grad():
        for param in model.parameters():
            param -=  learning_rate * param.grad

99 200.06491088867188
199 137.82827758789062
299 95.92902374267578
399 67.69600677490234
499 48.65394973754883
599 35.79831314086914
699 27.11073875427246
799 21.233810424804688
899 17.25417137145996
999 14.556417465209961
1099 12.725672721862793
1199 11.481929779052734
1299 10.63603687286377
1399 10.060086250305176
1499 9.667488098144531
1599 9.39957332611084
1699 9.216527938842773
1799 9.09132194519043
1899 9.005584716796875
1999 8.946805000305176


Now we can access the layers of 'model' by indexing.

In [11]:
linear_layer= model[0]

bias and weights are stored in the linear layer

In [12]:
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Result: y = -0.009286156855523586 + 0.8496859669685364 x + 0.001602016156539321 x^2 + -0.09232688695192337 x^3


### PyTorch: optim
The ``torch.optim`` package contains many optimization algorithms.

In [13]:
import torch
import math


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate= 1e-3
optimizer= torch.optim.RMSprop(model.parameters(),  lr= learning_rate)
for t in range(2000):
    # forward pass
    y_pred=  model(xx)
    
    # compute loss
    loss= loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
        
    # before the backward pass, we have to zero the gradients
    optimizer.zero_grad()
    
    loss.backward()
    
    # calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
    
linear_layer= model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

99 446.4188537597656
199 250.09664916992188
299 139.93463134765625
399 72.65043640136719
499 34.09986877441406
599 15.825347900390625
699 9.85751724243164
799 8.870123863220215
899 8.829363822937012
999 8.844720840454102
1099 8.857608795166016
1199 8.907981872558594
1299 8.933329582214355
1399 8.907007217407227
1499 8.905284881591797
1599 8.925554275512695
1699 8.927555084228516
1799 8.918675422668457
1899 8.91840934753418
1999 8.921612739562988
Result: y = -0.0005127650802023709 + 0.857241690158844 x + -0.0005128494813106954 x^2 + -0.09282944351434708 x^3
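
To make concrete what ``optimizer.step()`` replaces, the sketch below compares one manual update (as in the nn section) with one step of ``torch.optim.SGD``. It deliberately uses plain SGD rather than the RMSprop used above, because plain SGD without momentum matches the manual rule exactly. The data are random stand-ins, and `make_model` is a hypothetical helper that gives both models identical starting weights.

```python
import torch

torch.manual_seed(0)
xx = torch.randn(100, 3)
y = torch.randn(100)

def make_model():
    torch.manual_seed(1)  # identical initial weights for both models
    return torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))

loss_fn = torch.nn.MSELoss(reduction='sum')
lr = 1e-3

# manual update, as in the nn section
manual_model = make_model()
loss_fn(manual_model(xx), y).backward()
with torch.no_grad():
    for param in manual_model.parameters():
        param -= lr * param.grad

# the same single step through torch.optim.SGD
optim_model = make_model()
optimizer = torch.optim.SGD(optim_model.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(optim_model(xx), y).backward()
optimizer.step()

same = all(torch.allclose(p, q) for p, q in
           zip(manual_model.parameters(), optim_model.parameters()))
print(same)
```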


### PyTorch: Custom nn Modules

Sometimes we need models that are more complex than a sequence of existing Modules. In these cases we can define our own Modules by subclassing ``nn.Module`` and defining ``forward`` (as we already did in tutorial_nn).

In [15]:
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        
        # we need to instantiate four parameters and assign them as member parameters
        super().__init__()
        self.a= torch.nn.Parameter(torch.randn(()))
        self.b= torch.nn.Parameter(torch.randn(()))
        self.c= torch.nn.Parameter(torch.randn(()))
        self.d= torch.nn.Parameter(torch.randn(()))
        
    def forward(self, x):
        # accept an input Tensor and return an output Tensor
        return self.a + self.b*x + self.c*(x**2) + self.d*(x**3)
    
    def string(self):
        return f'y= {self.a.item()} + {self.b.item()}x + {self.c.item()}x^2 + {self.d.item()}x^3'
    

x= torch.linspace(-math.pi, math.pi,2000)
y= torch.sin(x)

model= Polynomial3()

criterion= torch.nn.MSELoss(reduction= 'sum')
# passing model.parameters() to the SGD constructor tells the optimizer which learnable parameters to update (here the four nn.Parameter members of Polynomial3)
optimizer= torch.optim.SGD(model.parameters(), lr= 1e-6)

for t in range(2000):
    y_pred= model(x)
    
    loss= criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
        
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
print(f'Result: {model.string()}')

99 290.34405517578125
199 195.2422332763672
299 132.27935791015625
399 90.5899658203125
499 62.98386001586914
599 44.701622009277344
699 32.59307861328125
799 24.572511672973633
899 19.25904083251953
999 15.738619804382324
1099 13.405855178833008
1199 11.859786033630371
1299 10.835033416748047
1399 10.15568733215332
1499 9.705238342285156
1599 9.406512260437012
1699 9.20836067199707
1799 9.076900482177734
1899 8.98967170715332
1999 8.93175983428955
Result: y= 0.003001720178872347 + 0.8467066287994385x + -0.0005178470746614039x^2 + -0.09190310537815094x^3
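
Wrapping the weights in ``torch.nn.Parameter`` is what makes them visible to ``model.parameters()`` and therefore to the optimizer. A minimal sketch of the difference, using a hypothetical `Demo` module that is not part of the notebook:

```python
import torch

class Demo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # registered as a learnable parameter of the module
        self.a = torch.nn.Parameter(torch.randn(()))
        # a plain tensor attribute: tracked by autograd, but not registered
        self.b = torch.randn((), requires_grad=True)

model = Demo()
names = [name for name, _ in model.named_parameters()]
print(names)
```

Only ``a`` shows up in the list, so an optimizer built from ``model.parameters()`` would never update ``b``.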


### PyTorch: Control Flow + Weight Sharing


In [20]:
import random
import torch
import math

class DynamicNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # instantiate five parameters
        self.a= torch.nn.Parameter(torch.randn(()))
        self.b= torch.nn.Parameter(torch.randn(()))
        self.c= torch.nn.Parameter(torch.randn(()))
        self.d= torch.nn.Parameter(torch.randn(()))
        self.e= torch.nn.Parameter(torch.randn(()))
        
    def forward(self, x):
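        # Randomly use the cubic as is, or extend it with the x^4 term, or with both the x^4 and x^5 terms
        # (random.randint(4, 6) returns 4, 5, or 6). Every term of order 4 or higher reuses the same parameter e.
        # Ordinary Python control flow such as loops and conditional statements can be used to implement this.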
        y= self.a + self.b*x + self.c*(x**2) + self.d*(x**3)
        for exp in range(4, random.randint(4, 6)):
            y+= self.e * (x ** exp)
            
        return y
    
    def string(self):
        
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'
    
    
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model= DynamicNet()

criterion= torch.nn.MSELoss(reduction= 'sum')
optimizer= torch.optim.SGD(model.parameters(), lr= 1e-8,  momentum= 0.9)

for t in range(30000):
    y_pred= model(x)
    
    loss= criterion(y_pred,  y)
    if t % 2000 == 1999:
        print(t, loss)
        
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
print()
print(f'Result: {model.string()}')

1999 tensor(829.7263, grad_fn=<MseLossBackward>)
3999 tensor(407.3547, grad_fn=<MseLossBackward>)
5999 tensor(194.6680, grad_fn=<MseLossBackward>)
7999 tensor(92.6954, grad_fn=<MseLossBackward>)
9999 tensor(48.3777, grad_fn=<MseLossBackward>)
11999 tensor(27.8341, grad_fn=<MseLossBackward>)
13999 tensor(17.9325, grad_fn=<MseLossBackward>)
15999 tensor(13.1424, grad_fn=<MseLossBackward>)
17999 tensor(44.0211, grad_fn=<MseLossBackward>)
19999 tensor(9.8035, grad_fn=<MseLossBackward>)
21999 tensor(9.2782, grad_fn=<MseLossBackward>)
23999 tensor(9.4929, grad_fn=<MseLossBackward>)
25999 tensor(8.7682, grad_fn=<MseLossBackward>)
27999 tensor(8.8916, grad_fn=<MseLossBackward>)
29999 tensor(8.9219, grad_fn=<MseLossBackward>)

Result: y = -0.004061676561832428 + 0.8549294471740723 x + 0.00014459357771556824 x^2 + -0.09356360882520676 x^3 + 0.00012509786756709218 x^4 ? + 0.00012509786756709218 x^5 ?
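
The trailing ``?`` marks in the result reflect that the extra terms are only sometimes present. Which exponents the loop in ``forward`` visits depends on the value drawn by ``random.randint(4, 6)``, and the pure-Python sketch below enumerates the three cases. `extra_exponents` is a hypothetical helper written for this illustration.

```python
def extra_exponents(upper):
    """Exponents visited by `for exp in range(4, upper)` in DynamicNet.forward."""
    return list(range(4, upper))

# random.randint(4, 6) is inclusive on both ends, so upper is 4, 5, or 6
cases = {upper: extra_exponents(upper) for upper in (4, 5, 6)}
print(cases)  # {4: [], 5: [4], 6: [4, 5]}
```

So each forward pass evaluates a polynomial of order 3, 4, or 5, and the single parameter ``e`` is shared by the x^4 and x^5 terms whenever they appear.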
