In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# For reproducibility
torch.manual_seed(1)

<torch._C.Generator at 0x1dfbf4820b0>

# MLE (Maximum Likelihood Estimation)

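When a coin is tossed, it lands either heads or tails.

Because there are only two outcomes, each toss follows a Bernoulli distribution.

Suppose we toss the coin n = 100 times and observe heads 27 times.
Let's estimate theta, the probability of heads.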

![image-2.png](attachment:image-2.png)


Rewrite this formula as a function of theta.


![image.png](attachment:image.png)

Find the point where y reaches its maximum, and read off the value of theta at that point.

theta = 0.27 (that is, 27/100)
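
As a quick numerical check, the sketch below evaluates the log-likelihood on a grid of candidate theta values and picks the largest one. It assumes the function in the figure above is the Bernoulli likelihood theta^27 * (1 - theta)^73 (any constant factor that does not depend on theta is dropped). The names `log_likelihood` and `grid` are made up for this illustration.

```python
import math

n, k = 100, 27  # number of tosses and number of heads, from the text

def log_likelihood(theta):
    # log of theta**k * (1 - theta)**(n - k); assumed form of the likelihood
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# candidate values 0.01, 0.02, ..., 0.99 (0 and 1 are excluded because log(0) is undefined)
grid = [i / 100 for i in range(1, 100)]
best_theta = max(grid, key=log_likelihood)
print(best_theta)  # 0.27
```

Working with the logarithm does not move the maximum, because the logarithm is monotonically increasing, and it avoids multiplying many tiny probabilities together.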

How do we find the point where y is maximal?

=> Use the gradient => apply Gradient Ascent, which climbs step by step toward a (local) maximum
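
A minimal sketch of gradient ascent for the coin example, assuming the log-likelihood 27 * log(theta) + 73 * log(1 - theta). The starting point 0.5, the step size 1e-4, and the number of steps are arbitrary choices for illustration, not values from the lecture.

```python
n, k = 100, 27  # number of tosses and number of heads, from the text

def grad_log_likelihood(theta):
    # derivative of k*log(theta) + (n - k)*log(1 - theta) with respect to theta
    return k / theta - (n - k) / (1 - theta)

theta = 0.5  # initial guess (assumption)
lr = 1e-4    # step size (assumption)
for _ in range(1000):
    # ascent adds the gradient, so each step moves uphill on the log-likelihood
    theta += lr * grad_log_likelihood(theta)

print(round(theta, 4))  # 0.27
```

Gradient descent is the same procedure with the sign flipped: minimizing the negative log-likelihood gives the same theta.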

Downside: because MLE fits the observed data as closely as possible, it is prone to overfitting.

# Overfitting

![image.png](attachment:image.png)

When overfitting occurs, the fitted boundary can end up looking like the brown line.

![image.png](attachment:image.png)

To detect this, we split the data into a training set, a validation set, and a test set.

So at what point can we say that overfitting begins?
![image.png](attachment:image.png)

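We can say it begins at the moment the validation loss starts to increase.

Therefore, we can stop training at the point where the validation loss is at its minimum, that is, just before it starts to rise.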
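
The stopping rule above can be sketched as follows. The validation losses are made-up numbers, and `early_stopping_epoch` and `patience` are hypothetical names for this illustration. In a real training loop the loss would be computed on the validation set after each epoch.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping once the loss has failed
    to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # new minimum: remember it and reset the counter
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss keeps rising: stop training
    return best_epoch, best_loss

# hypothetical validation losses: they fall, then rise as the model overfits
val_losses = [1.20, 0.95, 0.80, 0.74, 0.78, 0.85, 0.93]
best_epoch, best_loss = early_stopping_epoch(val_losses)
print(best_epoch, best_loss)  # 3 0.74
```

Waiting for a few non-improving epochs instead of stopping at the first increase keeps a single noisy epoch from ending training too early.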

Are there other ways to prevent overfitting?

1 ) more data : the more data, the better

2 ) fewer features

3 ) regularization
* early stopping : stop when the validation loss no longer decreases
* reducing network size
* weight decay : limits the magnitude of the network's parameters
* dropout
* batch normalization
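
To illustrate the weight decay item in the list above, here is a minimal one-parameter sketch. It assumes the common convention of adding `weight_decay * w` to the gradient (the convention used by the `weight_decay` argument of `optim.SGD`). The toy loss (w - 3)^2 and all constants are made up for illustration.

```python
def fit(weight_decay, lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)           # gradient of the toy loss (w - 3)**2
        grad += weight_decay * w     # decay term pulls w toward zero
        w -= lr * grad
    return w

w_plain = fit(weight_decay=0.0)  # converges to the unregularized minimum, 3
w_decay = fit(weight_decay=1.0)  # converges to a smaller value, 6 / (2 + 1) = 2
print(round(w_plain, 4), round(w_decay, 4))  # 3.0 2.0
```

The regularized solution sits closer to zero, which is exactly the "limited parameter magnitude" effect described in the list.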

# Hands-on practice

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# For reproducibility
torch.manual_seed(1)

<torch._C.Generator at 0x1dfbf4820b0>

## data

In [13]:
x_train = torch.FloatTensor([[1, 2, 1],
                             [1, 3, 2],
                             [1, 3, 4],
                             [1, 5, 5],
                             [1, 7, 5],
                             [1, 2, 5],
                             [1, 6, 6],
                             [1, 7, 7]
                            ])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

In [14]:
x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]])
y_test = torch.LongTensor([2, 2, 2])

## model

In [16]:
class SoftmaxClassifierModel(nn.Module) :
    def __init__(self) :
        super().__init__()
        self.linear = nn.Linear(3, 3)  # 3 input features, 3 output classes
        
    def forward(self, x) :
        return self.linear(x)
    
model = SoftmaxClassifierModel()

In [17]:
optimizer = optim.SGD(model.parameters(), lr=0.1)

def train(model, optimizer, x_train, y_train) :
    
    nb_epochs = 20
    
    for epoch in range(nb_epochs) :
        
        prediction = model(x_train)
        
        cost = F.cross_entropy(prediction, y_train)
        
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

In [18]:
def test(model, optimizer, x_test, y_test):
    
    prediction = model(x_test)
    
    predicted_classes = prediction.max(1)[1]
    
    correct_count = (predicted_classes == y_test).sum().item()
    
    cost = F.cross_entropy(prediction, y_test)

    print('Accuracy: {}% Cost: {:.6f}'.format(
         correct_count / len(y_test) * 100, cost.item()
    ))

In [19]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 2.203667
Epoch    1/20 Cost: 1.199645
Epoch    2/20 Cost: 1.142985
Epoch    3/20 Cost: 1.117769
Epoch    4/20 Cost: 1.100901
Epoch    5/20 Cost: 1.089523
Epoch    6/20 Cost: 1.079872
Epoch    7/20 Cost: 1.071320
Epoch    8/20 Cost: 1.063325
Epoch    9/20 Cost: 1.055720
Epoch   10/20 Cost: 1.048378
Epoch   11/20 Cost: 1.041245
Epoch   12/20 Cost: 1.034285
Epoch   13/20 Cost: 1.027478
Epoch   14/20 Cost: 1.020813
Epoch   15/20 Cost: 1.014279
Epoch   16/20 Cost: 1.007872
Epoch   17/20 Cost: 1.001586
Epoch   18/20 Cost: 0.995419
Epoch   19/20 Cost: 0.989365


In [20]:
test(model, optimizer, x_test, y_test)
# The cost is higher than the final training cost; the test loss has probably already started to rise (overfitting).

Accuracy: 0.0% Cost: 1.425844


## Learning Rate

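If the learning rate is too large, the cost diverges.

If the learning rate is too small, it takes a long time to reach the target point.

So start with a reasonable value: if the cost diverges, make the learning rate smaller, and if the cost barely decreases, make it larger.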
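
Both failure modes can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, whose minimum is at w = 0. The function name `run_gradient_descent` and the three learning rates are assumptions chosen for illustration.

```python
def run_gradient_descent(lr, steps=20, w=1.0):
    # minimize f(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

w_large = run_gradient_descent(lr=1.1)   # each step multiplies w by -1.2: diverges
w_small = run_gradient_descent(lr=1e-4)  # each step multiplies w by 0.9998: barely moves
w_ok = run_gradient_descent(lr=0.1)      # each step multiplies w by 0.8: converges

print(abs(w_large) > 1.0)  # True, further from 0 than the start
print(w_small > 0.99)      # True, almost unchanged after 20 steps
print(abs(w_ok) < 0.02)    # True, close to the minimum
```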


### When the learning rate is too large

In [21]:
model = SoftmaxClassifierModel()
optimizer = optim.SGD(model.parameters(), lr=1e5)
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 1.280268
Epoch    1/20 Cost: 976950.750000
Epoch    2/20 Cost: 1279135.125000
Epoch    3/20 Cost: 1198379.000000
Epoch    4/20 Cost: 1098825.750000
Epoch    5/20 Cost: 1968197.625000
Epoch    6/20 Cost: 284763.218750
Epoch    7/20 Cost: 1532260.000000
Epoch    8/20 Cost: 1651504.000000
Epoch    9/20 Cost: 521878.500000
Epoch   10/20 Cost: 1397263.250000
Epoch   11/20 Cost: 750986.250000
Epoch   12/20 Cost: 918691.500000
Epoch   13/20 Cost: 1487888.125000
Epoch   14/20 Cost: 1582260.125000
Epoch   15/20 Cost: 685818.062500
Epoch   16/20 Cost: 1140048.750000
Epoch   17/20 Cost: 940566.562500
Epoch   18/20 Cost: 931638.250000
Epoch   19/20 Cost: 1971322.625000


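### When the learning rate is too small

Note that the cell below uses lr=1e-1, which is not actually too small, so the cost still falls steadily. With a far smaller value such as lr=1e-10, the parameter updates would be negligible and the cost would barely change over 20 epochs.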

In [22]:
model = SoftmaxClassifierModel()
optimizer = optim.SGD(model.parameters(), lr=1e-1)
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 3.187324
Epoch    1/20 Cost: 1.334308
Epoch    2/20 Cost: 1.047911
Epoch    3/20 Cost: 0.996043
Epoch    4/20 Cost: 0.985740
Epoch    5/20 Cost: 0.977224
Epoch    6/20 Cost: 0.970065
Epoch    7/20 Cost: 0.963589
Epoch    8/20 Cost: 0.957561
Epoch    9/20 Cost: 0.951825
Epoch   10/20 Cost: 0.946302
Epoch   11/20 Cost: 0.940942
Epoch   12/20 Cost: 0.935719
Epoch   13/20 Cost: 0.930613
Epoch   14/20 Cost: 0.925613
Epoch   15/20 Cost: 0.920711
Epoch   16/20 Cost: 0.915902
Epoch   17/20 Cost: 0.911182
Epoch   18/20 Cost: 0.906547
Epoch   19/20 Cost: 0.901994


## Data Preprocessing

$$ x'_j = \frac{x_j - \mu_j}{\sigma_j} $$
Here $\sigma_j$ is the standard deviation and $\mu_j$ is the mean of feature $j$.

In [24]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)
norm_x_train = (x_train - mu) / sigma

print(norm_x_train)

tensor([[-1.0674, -0.3758, -0.8398],
        [ 0.7418,  0.2778,  0.5863],
        [ 0.3799,  0.5229,  0.3486],
        [ 1.0132,  1.0948,  1.1409],
        [-1.0674, -1.5197, -1.2360]])
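
The first column of the output above can be reproduced by hand with the standard library. This sketch assumes the sample standard deviation (dividing by n - 1), which is what `torch.std` computes by default.

```python
import statistics

col = [73, 93, 89, 96, 73]     # first column of x_train
mu = statistics.mean(col)      # 84.8
sigma = statistics.stdev(col)  # sample standard deviation (divides by n - 1)
z = [(x - mu) / sigma for x in col]
print([round(v, 4) for v in z])  # [-1.0674, 0.7418, 0.3799, 1.0132, -1.0674]
```

After this transformation each feature has mean 0 and sample standard deviation 1, so all three features are on a comparable scale.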


In [25]:
class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)

In [26]:
model = MultivariateLinearRegressionModel()
optimizer = optim.SGD(model.parameters(), lr=1e-1)

def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):

        # compute H(x)
        prediction = model(x_train)

        # compute the cost
        cost = F.mse_loss(prediction, y_train)

        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

In [27]:
train(model, optimizer, norm_x_train, y_train)

Epoch    0/20 Cost: 29729.949219
Epoch    1/20 Cost: 18889.082031
Epoch    2/20 Cost: 12048.976562
Epoch    3/20 Cost: 7699.843750
Epoch    4/20 Cost: 4924.700195
Epoch    5/20 Cost: 3151.020264
Epoch    6/20 Cost: 2016.562866
Epoch    7/20 Cost: 1290.709106
Epoch    8/20 Cost: 826.216003
Epoch    9/20 Cost: 528.952271
Epoch   10/20 Cost: 338.703308
Epoch   11/20 Cost: 216.940063
Epoch   12/20 Cost: 139.006989
Epoch   13/20 Cost: 89.125130
Epoch   14/20 Cost: 57.196083
Epoch   15/20 Cost: 36.757317
Epoch   16/20 Cost: 23.672049
Epoch   17/20 Cost: 15.293401
Epoch   18/20 Cost: 9.927165
Epoch   19/20 Cost: 6.488903
