## Optimizer: the algorithm that performs optimization
```
The process of finding the minimum of the loss function
```

### BGD: Batch Gradient Descent
```
Computes each update from the entire dataset at once

* Training time becomes very long

* Hard to escape local minima
```
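### SGD: Stochastic Gradient Descent
```
Trains on a randomly selected portion of the training data at each step
(strictly, one sample per update; a small random batch is the mini-batch variant)
```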
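To make the difference in update schedules concrete, here is a minimal sketch in plain Python (no PyTorch). The toy data `y = 2x`, the single weight `w`, the learning rate, and the function names `batch_gd` and `sgd` are all assumptions made for illustration, not part of the original notes.

```python
import random

# Toy data (assumed for illustration): y = 2x, no noise
xs = [i / 10 for i in range(1, 11)]
ys = [2.0 * x for x in xs]

def batch_gd(epochs=50, lr=0.5):
    """One update per epoch, using the gradient averaged over all samples."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        updates += 1
    return w, updates

def sgd(epochs=50, lr=0.5, seed=0):
    """One update per sample, visiting the samples in a random order each epoch."""
    rng = random.Random(seed)
    w, updates = 0.0, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            grad = 2 * xs[i] * (w * xs[i] - ys[i])
            w -= lr * grad
            updates += 1
    return w, updates

w_bgd, n_bgd = batch_gd()
w_sgd, n_sgd = sgd()
print(w_bgd, n_bgd)
print(w_sgd, n_sgd)
```

Both runs end near `w = 2`, but the batch version performs one update per epoch while the per-sample version performs ten.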

### Mini-batch
```
epoch : one full pass of optimization over the entire dataset
iteration : one optimization step on a single batch

Full dataset: 1000 samples, batch size: 200

Each 200-sample subset = one mini-batch

1 epoch => 5 iterations
```
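The epoch and iteration arithmetic above can be sketched in plain Python. The shuffling seed and variable names are assumptions for illustration; the sizes come from the example.

```python
import math
import random

dataset_size = 1000   # from the example above
batch_size = 200      # from the example above

indices = list(range(dataset_size))
random.Random(0).shuffle(indices)  # seed chosen arbitrarily

# Split the shuffled indices into consecutive mini-batches
batches = [indices[i:i + batch_size] for i in range(0, dataset_size, batch_size)]

# Number of optimization steps needed to see every sample once
iterations_per_epoch = math.ceil(dataset_size / batch_size)
print(len(batches), iterations_per_epoch)
```

`math.ceil` matters when the dataset size is not a multiple of the batch size: the last, smaller batch still counts as one iteration.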

### Local minima
```
Local minimum: a point whose cost is the lowest among the neighboring points
```
### Global minima
```
Global minimum: the point whose cost is the lowest over the whole cost function
```
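A small sketch of how the starting point decides which minimum gradient descent reaches. The function `f(x) = x^4 - 3x^2 + x` is an assumed example chosen because it has one local and one global minimum; it does not appear in the original notes.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def df(x):
    # Derivative of f
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=500):
    """Plain gradient descent on f starting from x."""
    for _ in range(steps):
        x -= lr * df(x)
    return x

x_right = descend(2.0)    # starts to the right of both minima
x_left = descend(-2.0)    # starts to the left of both minima
print(x_right, f(x_right))
print(x_left, f(x_left))
```

Starting from 2.0 the iterate settles in the shallower basin near x ≈ 1.13 (a local minimum), while starting from -2.0 it reaches the deeper one near x ≈ -1.30 (the global minimum).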


In [6]:
import torch

x = torch.rand(1, requires_grad=True)
y = torch.rand(1, requires_grad=True)

loss = y-x

loss

tensor([0.7513], grad_fn=<SubBackward0>)

In [7]:
loss.backward()
x.grad, y.grad

(tensor([-1.]), tensor([1.]))
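The gradients `-1` and `1` follow directly from `loss = y - x`. As a rough sanity check that does not rely on autograd, a central finite difference gives the same values; the sample point and step size `h` below are arbitrary assumptions.

```python
def loss_fn(x, y):
    return y - x

x0, y0, h = 0.3, 0.9, 1e-6  # arbitrary point and step size

# Central differences approximate the partial derivatives
dloss_dx = (loss_fn(x0 + h, y0) - loss_fn(x0 - h, y0)) / (2 * h)
dloss_dy = (loss_fn(x0, y0 + h) - loss_fn(x0, y0 - h)) / (2 * h)
print(dloss_dx, dloss_dy)
```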

## autograd

### W.grad : gradient of the loss with respect to the weights
### b.grad : gradient of the loss with respect to the bias

In [8]:
# Create the input, expected output, weights, and bias b

import torch

x = torch.ones(4)  # input tensor
y = torch.zeros(3)  # expected output
W = torch.rand(4, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
z = torch.matmul(x, W) + b
print (W, b, z)

tensor([[0.4967, 0.4260, 0.7743],
        [0.8774, 0.7788, 0.8450],
        [0.7319, 0.6691, 0.0249],
        [0.2951, 0.1788, 0.1588]], requires_grad=True) tensor([0.2347, 0.8928, 0.7382], requires_grad=True) tensor([2.6359, 2.9455, 2.5413], grad_fn=<AddBackward0>)


In [9]:
# Calling loss.backward() starts backpropagation, computing the gradients of the loss with respect to W and b

import torch.nn.functional as F

loss = F.mse_loss(z, y)
loss.backward()
print(loss, W.grad, b.grad)

tensor(7.3605, grad_fn=<MseLossBackward0>) tensor([[1.7572, 1.9636, 1.6942],
        [1.7572, 1.9636, 1.6942],
        [1.7572, 1.9636, 1.6942],
        [1.7572, 1.9636, 1.6942]]) tensor([1.7572, 1.9636, 1.6942])
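The printed gradients can be checked by hand. With `z = xW + b` and `F.mse_loss` averaging over the three outputs (its default reduction):

$$
L = \frac{1}{3}\sum_{j=1}^{3}(z_j - y_j)^2,
\qquad
\frac{\partial L}{\partial b_j} = \frac{2}{3}(z_j - y_j),
\qquad
\frac{\partial L}{\partial W_{ij}} = x_i \cdot \frac{2}{3}(z_j - y_j)
$$

Here `y` is all zeros and every `x_i` is 1, so each row of `W.grad` equals `b.grad`. For the first output, (2/3) × 2.6359 ≈ 1.757, which matches the printed value up to rounding.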


In [10]:
threshold = 0.1
learning_rate = 0.1
iteration_num = 0

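# Continues from the state left by in[37] (as numbered in the original notes),
# where one optimization pass has already run.
# Repeat the update and backpropagation until the loss drops below threshold.

while loss > threshold:
    iteration_num += 1
    # Gradient-descent update; the results are new non-leaf tensors
    W = W - learning_rate * W.grad
    b = b - learning_rate * b.grad
    # loss and z printed here are the values from before this update
    print(iteration_num, loss, z, y)
    
    # Detach from the old graph and track gradients again (leaf tensors with no stored gradient)
    W.detach_().requires_grad_(True)
    b.detach_().requires_grad_(True)
    
    # Forward and backward pass with the updated parameters
    z = torch.matmul(x, W) + b
    loss = F.mse_loss(z, y)
    loss.backward()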
    
print(iteration_num + 1, loss, z, y)

1 tensor(7.3605, grad_fn=<MseLossBackward0>) tensor([2.6359, 2.9455, 2.5413], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
2 tensor(3.2713, grad_fn=<MseLossBackward0>) tensor([1.7572, 1.9636, 1.6942], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
3 tensor(1.4539, grad_fn=<MseLossBackward0>) tensor([1.1715, 1.3091, 1.1294], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
4 tensor(0.6462, grad_fn=<MseLossBackward0>) tensor([0.7810, 0.8727, 0.7530], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
5 tensor(0.2872, grad_fn=<MseLossBackward0>) tensor([0.5207, 0.5818, 0.5020], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
6 tensor(0.1276, grad_fn=<MseLossBackward0>) tensor([0.3471, 0.3879, 0.3347], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
7 tensor(0.0567, grad_fn=<MseLossBackward0>) tensor([0.2314, 0.2586, 0.2231], grad_fn=<AddBackward0>) tensor([0., 0., 0.])
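The loss in the log shrinks by the same factor at every step. Because `x` is all ones, one update changes each `z_j` by `-learning_rate * (4 + 1) * (2/3) * z_j`, so `z` is multiplied by 2/3 and the loss by 4/9. The sketch below replays that recurrence in plain Python, assuming the initial `z` from the printed output (rounded to four decimals); the helper `mse` is hypothetical.

```python
threshold = 0.1
learning_rate = 0.1
n_inputs = 4  # length of x, which is all ones

# Initial z copied from the printed output above (rounded values)
z = [2.6359, 2.9455, 2.5413]

def mse(values):
    # y is all zeros, so the error is z itself
    return sum(v * v for v in values) / len(values)

losses = [mse(z)]
while losses[-1] > threshold:
    # dL/db_j = (2/3) z_j and dL/dW_ij = x_i * (2/3) z_j with x_i = 1,
    # so z_j changes by -lr * (n_inputs + 1) * (2/3) * z_j
    shrink = 1 - learning_rate * (n_inputs + 1) * 2 / len(z)
    z = [shrink * v for v in z]
    losses.append(mse(z))

print([round(value, 4) for value in losses])
```

The list has seven entries, one per printed line in the log, and agrees with the logged losses up to the rounding of the initial `z`.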


In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)
        
    def forward(self, x):
        return self.linear(x)
    
model = LinearRegressionModel(4, 3)

In [12]:
x = torch.ones(4)
y = torch.zeros(3)

In [13]:
learning_rate = 0.01
nb_epochs = 1000 
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(nb_epochs + 1):

    pred = model(x)
    loss = F.mse_loss(pred, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
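The loop calls `optimizer.zero_grad()` before `loss.backward()` because PyTorch adds new gradients into `.grad` instead of overwriting them. A minimal sketch of that accumulation, using a made-up one-element tensor `w` and clearing the gradient manually with `w.grad.zero_()` in place of an optimizer:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: the gradients add up (3 + 3)
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Clearing before the next pass gives the per-step gradient again
w.grad.zero_()
(3 * w).sum().backward()
cleared = w.grad.clone()

print(first, accumulated, cleared)
```

Without the reset, each `optimizer.step()` would move the parameters along the sum of all gradients seen so far rather than the current one.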

In [14]:
print(loss)
for param in model.parameters():
    print (param)

tensor(6.0256e-13, grad_fn=<MseLossBackward0>)
Parameter containing:
tensor([[-0.0788,  0.2903,  0.1098, -0.4175],
        [-0.2188, -0.3833,  0.3814, -0.1126],
        [ 0.3285,  0.1401, -0.1803, -0.1561]], requires_grad=True)
Parameter containing:
tensor([ 0.0962,  0.3334, -0.1322], requires_grad=True)


In [15]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
diabetes_data = load_diabetes()

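# Full description of the dataset
print (diabetes_data.DESCR) 

# Note: diabetes_data.data holds only the 10 feature columns.
# x takes the first nine features and y takes the last feature (s6);
# the disease-progression target is in diabetes_data.target and is not used here.
x = torch.from_numpy(np.array(diabetes_data.data[:, :-1], dtype=np.float32))
y = torch.from_numpy(np.array(diabetes_data.data[:, [-1]], dtype=np.float32))
# x.shape: (442, 9), y.shape: (442, 1)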

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, total serum cholesterol
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, total cholesterol / HDL
      - s5      ltg, possibly log of serum triglycerides level
      - s6      glu, blood sugar level

Note: Each of these 1

In [16]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)
        
    def forward(self, x):
        return self.linear(x)
    
model = LinearRegressionModel(x.size(1), y.size(1))

In [17]:
learning_rate = 0.01
nb_epochs = 10000
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(nb_epochs + 1):

    y_pred = model(x)
    loss = F.mse_loss(y_pred, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [18]:
print(loss)
for param in model.parameters():
    print (param)

tensor(0.0022, grad_fn=<MseLossBackward0>)
Parameter containing:
tensor([[ 0.0370,  0.2380, -0.1211, -0.0592,  0.0807,  0.2076, -0.1292,  0.2640,
         -0.0649]], requires_grad=True)
Parameter containing:
tensor([-2.2402e-10], requires_grad=True)
