# Maximum Likelihood Estimation (MLE)
- Likelihood (in Korean: 가능도 or 우도)
- Maximum likelihood estimation (in Korean: 최대 가능도 추정 or 최대 우도 추정)

$K \sim B(n,\theta)  $ <br><br>
$ P(K = k) = \binom{n}{k}\theta^{k}(1-\theta)^{n-k} $ <br><br>
         $ = \frac{n!}{k!(n-k)!} \cdot\theta^{k}(1-\theta)^{n-k}$ <br><br>
- MLE is the process of finding the $\theta$ that best explains the observations.
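
As a minimal sketch of this idea, the binomial log-likelihood can be evaluated on a grid of candidate $\theta$ values and maximized numerically. The observation (`n = 100` trials, `k = 27` successes) is a hypothetical example, and the binomial coefficient is dropped because it does not depend on $\theta$.

```python
import torch

# Hypothetical observation: k successes out of n trials (illustrative values).
n, k = 100, 27

# Candidate values of theta on a grid strictly inside (0, 1).
theta = torch.linspace(0.001, 0.999, 999)

# Log-likelihood up to an additive constant:
# k * log(theta) + (n - k) * log(1 - theta)
log_likelihood = k * torch.log(theta) + (n - k) * torch.log(1 - theta)

# The grid point with the highest log-likelihood approximates the MLE.
theta_hat = theta[log_likelihood.argmax()].item()
print(theta_hat, k / n)
```

The grid maximizer should land close to the closed-form estimate $k/n$ for the binomial model.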

# Optimization via Gradient Descent
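
A minimal sketch of gradient descent applied to the binomial model from the MLE section: minimizing the negative log-likelihood with `optim.SGD` should approach the same estimate $k/n$. The observation (`n`, `k`), the step count, and the learning rate are illustrative assumptions, and the sigmoid reparameterization is one possible way to keep $\theta$ inside $(0, 1)$.

```python
import torch

# Hypothetical observation: k successes out of n trials (illustrative values).
n, k = 100, 27

# Optimize an unconstrained logit so that theta = sigmoid(logit) stays in (0, 1).
logit = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([logit], lr=1.0)

for step in range(200):
    theta = torch.sigmoid(logit)
    # Negative log-likelihood averaged over the n trials (constant term dropped).
    nll = -(k * torch.log(theta) + (n - k) * torch.log(1 - theta)) / n

    optimizer.zero_grad()
    nll.backward()
    optimizer.step()

theta_gd = torch.sigmoid(logit).item()
print(theta_gd, k / n)
```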

# Overfitting and Regularization
- Split the dataset into train (0.8), dev (also called validation, 0 to 0.1), and test (0.1 to 0.2) sets.
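
One way to produce such a split is `torch.utils.data.random_split`. This is a sketch on a hypothetical random dataset of 100 samples, using an 80/10/10 split as one choice within the ranges above.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 100 samples, 3 features, 3 classes.
features = torch.randn(100, 3)
labels = torch.randint(0, 3, (100,))
dataset = TensorDataset(features, labels)

# 80 / 10 / 10 split; a seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(1)
train_set, dev_set, test_set = random_split(dataset, [80, 10, 10], generator=generator)

print(len(train_set), len(dev_set), len(test_set))
```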
## Ways to Prevent Overfitting
- More data
- Fewer features (the variables used to represent the data distribution)
- Regularization
## Regularization (to Prevent Overfitting)
- Early stopping: stop training when the validation loss no longer decreases
- Reducing network size
- Weight decay (limits the magnitude of the parameters)
- **Dropout (widely used)**
- **Batch normalization (widely used)**
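
The last three items map onto standard PyTorch components. The sketch below is an assumption-laden illustration, not the model used later in these notes: the class name `RegularizedClassifier`, the hidden size, the dropout probability, and the `weight_decay` value are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class RegularizedClassifier(nn.Module):  # hypothetical example model
    def __init__(self, in_features=3, hidden=16, num_classes=3, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.BatchNorm1d(hidden),   # batch normalization
            nn.ReLU(),
            nn.Dropout(p_drop),       # dropout
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = RegularizedClassifier()
# weight_decay adds an L2 penalty on the parameters inside the SGD update.
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

x = torch.randn(8, 3)  # hypothetical batch of 8 samples

model.train()          # dropout is active, batch norm uses batch statistics
train_logits = model(x)

model.eval()           # dropout is off, batch norm uses running statistics
with torch.no_grad():
    eval_logits_a = model(x)
    eval_logits_b = model(x)
print(train_logits.shape, torch.equal(eval_logits_a, eval_logits_b))
```

Switching between `model.train()` and `model.eval()` matters here because both dropout and batch normalization behave differently in the two modes.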
## Basic Approach to Train DNN
1. Build the neural network architecture
2. Train, then check whether the model has overfit
 - If it has not overfit, increase the model size (deeper & wider)
 - If it has overfit, add regularization (dropout, batch normalization)
3. Repeat step 2
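
Step 2 amounts to tracking the training cost and the validation cost side by side. The sketch below adds a simple early-stopping rule on top of that check. The random data, the `patience` value, and the plain linear model are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

# Hypothetical random data standing in for real train / validation splits.
x_tr, y_tr = torch.randn(80, 3), torch.randint(0, 3, (80,))
x_val, y_val = torch.randn(20, 3), torch.randint(0, 3, (20,))

model = nn.Linear(3, 3)
optimizer = optim.SGD(model.parameters(), lr=0.1)

patience = 5            # epochs to wait for a better validation cost
max_epochs = 200
best_val = float('inf')
epochs_without_improvement = 0
epochs_run = 0

for epoch in range(max_epochs):
    # One training step on the training split.
    model.train()
    train_cost = F.cross_entropy(model(x_tr), y_tr)
    optimizer.zero_grad()
    train_cost.backward()
    optimizer.step()

    # Evaluate on the validation split without tracking gradients.
    model.eval()
    with torch.no_grad():
        val_cost = F.cross_entropy(model(x_val), y_val).item()

    epochs_run = epoch + 1
    if val_cost < best_val:
        best_val = val_cost
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation cost stopped improving

print('epochs run: {}, best validation cost: {:.6f}'.format(epochs_run, best_val))
```

A training cost that keeps falling while the validation cost rises is the usual sign of overfitting.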

## Import

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [None]:
# Fix the random seed for reproducibility
torch.manual_seed(1)

<torch._C.Generator at 0x7f724409f170>

## Training and Test Dataset

In [None]:
x_train = torch.FloatTensor([[1,2,1],
                             [1,3,2],
                             [1,3,4],
                             [1,5,5],
                             [1,7,5],
                             [1,2,5],
                             [1,6,6],
                             [1,7,7]])         # shape (m, 3)
y_train = torch.LongTensor([2,2,2,1,1,1,0,0])  # shape (m,)

In [None]:
x_test = torch.FloatTensor([[2,1,1],[3,1,2],[3,3,4]]) # (m',3)
y_test = torch.LongTensor([2,2,2]) # (m', )

## Model

In [None]:
class SoftmaxClassifierModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(3,3)
  def forward(self, x):
    return self.linear(x)

In [None]:
model = SoftmaxClassifierModel()

In [None]:
# Optimizer
optimizer = optim.SGD(model.parameters(), lr = 0.1)

## Training

In [None]:
def train(model, optimizer, x_train, y_train):
  nb_epochs = 20
  for epoch in range(nb_epochs):

    # Compute the hypothesis H(x)
    prediction = model(x_train)

    # Compute the cost
    cost = F.cross_entropy(prediction, y_train)

    # Update the parameters of H(x) using the gradient of the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

## Test (Validation)

In [None]:
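def test(model, optimizer, x_test, y_test):
  # No parameters are updated here, so gradient tracking is turned off.
  # `optimizer` is unused and kept only so existing calls still work.
  with torch.no_grad():
    prediction = model(x_test)
    # The index of the largest logit in each row is the predicted class.
    predicted_classes = prediction.max(1)[1]
    correct_count = (predicted_classes == y_test).sum().item()
    cost = F.cross_entropy(prediction, y_test)

  print('Accuracy: {}% Cost: {:.6f}'.format(correct_count / len(y_test) * 100, cost.item()))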

## Run

In [None]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 2.203667
Epoch    1/20 Cost: 1.199645
Epoch    2/20 Cost: 1.142985
Epoch    3/20 Cost: 1.117769
Epoch    4/20 Cost: 1.100901
Epoch    5/20 Cost: 1.089523
Epoch    6/20 Cost: 1.079872
Epoch    7/20 Cost: 1.071320
Epoch    8/20 Cost: 1.063325
Epoch    9/20 Cost: 1.055720
Epoch   10/20 Cost: 1.048378
Epoch   11/20 Cost: 1.041245
Epoch   12/20 Cost: 1.034285
Epoch   13/20 Cost: 1.027478
Epoch   14/20 Cost: 1.020813
Epoch   15/20 Cost: 1.014279
Epoch   16/20 Cost: 1.007872
Epoch   17/20 Cost: 1.001586
Epoch   18/20 Cost: 0.995419
Epoch   19/20 Cost: 0.989365


In [None]:
test(model, optimizer, x_test, y_test)

Accuracy: 0.0% Cost: 1.425844


# Learning Rate
- If the learning rate is too large, each update overshoots the minimum and the cost diverges.
- $ \theta \leftarrow \theta - \alpha \nabla_{\theta} L(x; \theta) $, where $\alpha$ is the learning rate
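
To make the update rule concrete, the sketch below applies one step by hand and compares it with one step of `optim.SGD` using the same learning rate. The input `x` and the quadratic `loss_fn` are hypothetical, chosen only to have something to differentiate.

```python
import torch

torch.manual_seed(1)
alpha = 0.1  # learning rate

# Two copies of the same initial parameter vector.
theta_manual = torch.randn(3, requires_grad=True)
theta_optim = theta_manual.detach().clone().requires_grad_(True)
optimizer = torch.optim.SGD([theta_optim], lr=alpha)

x = torch.tensor([1.0, 2.0, 3.0])  # hypothetical input

def loss_fn(theta):
    # Hypothetical loss: squared error of a linear prediction against target 1.0.
    return ((theta * x).sum() - 1.0) ** 2

# Manual update: theta <- theta - alpha * grad
loss_fn(theta_manual).backward()
with torch.no_grad():
    theta_manual -= alpha * theta_manual.grad

# Same update performed by the optimizer.
optimizer.zero_grad()
loss_fn(theta_optim).backward()
optimizer.step()

print(torch.allclose(theta_manual, theta_optim))
```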

In [None]:
model = SoftmaxClassifierModel()

In [None]:
optimizer = optim.SGD(model.parameters(), lr = 1e5)

In [None]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 1.280268
Epoch    1/20 Cost: 976950.750000
Epoch    2/20 Cost: 1279135.250000
Epoch    3/20 Cost: 1198378.875000
Epoch    4/20 Cost: 1098825.750000
Epoch    5/20 Cost: 1968197.750000
Epoch    6/20 Cost: 284763.250000
Epoch    7/20 Cost: 1532260.250000
Epoch    8/20 Cost: 1651503.750000
Epoch    9/20 Cost: 521878.593750
Epoch   10/20 Cost: 1397263.250000
Epoch   11/20 Cost: 750986.375000
Epoch   12/20 Cost: 918691.375000
Epoch   13/20 Cost: 1487888.250000
Epoch   14/20 Cost: 1582260.250000
Epoch   15/20 Cost: 685818.125000
Epoch   16/20 Cost: 1140048.875000
Epoch   17/20 Cost: 940566.375000
Epoch   18/20 Cost: 931638.250000
Epoch   19/20 Cost: 1971322.750000


- If the learning rate is too small, the cost barely decreases.

In [None]:
model = SoftmaxClassifierModel()

In [None]:
optimizer = optim.SGD(model.parameters(), lr = 1e-10)

In [None]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 3.187324
Epoch    1/20 Cost: 3.187324
Epoch    2/20 Cost: 3.187324
Epoch    3/20 Cost: 3.187324
Epoch    4/20 Cost: 3.187324
Epoch    5/20 Cost: 3.187324
Epoch    6/20 Cost: 3.187324
Epoch    7/20 Cost: 3.187324
Epoch    8/20 Cost: 3.187324
Epoch    9/20 Cost: 3.187324
Epoch   10/20 Cost: 3.187324
Epoch   11/20 Cost: 3.187324
Epoch   12/20 Cost: 3.187324
Epoch   13/20 Cost: 3.187324
Epoch   14/20 Cost: 3.187324
Epoch   15/20 Cost: 3.187324
Epoch   16/20 Cost: 3.187324
Epoch   17/20 Cost: 3.187324
Epoch   18/20 Cost: 3.187324
Epoch   19/20 Cost: 3.187324


- Start from a reasonable value, then decrease it if the cost diverges and increase it if the cost does not go down.

In [None]:
model = SoftmaxClassifierModel()

In [None]:
optimizer = optim.SGD(model.parameters(), lr = 1e-1)

In [None]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 2.939317
Epoch    1/20 Cost: 1.887239
Epoch    2/20 Cost: 1.055398
Epoch    3/20 Cost: 0.936401
Epoch    4/20 Cost: 0.917493
Epoch    5/20 Cost: 0.911811
Epoch    6/20 Cost: 0.906384
Epoch    7/20 Cost: 0.901102
Epoch    8/20 Cost: 0.895959
Epoch    9/20 Cost: 0.890947
Epoch   10/20 Cost: 0.886062
Epoch   11/20 Cost: 0.881298
Epoch   12/20 Cost: 0.876650
Epoch   13/20 Cost: 0.872114
Epoch   14/20 Cost: 0.867685
Epoch   15/20 Cost: 0.863359
Epoch   16/20 Cost: 0.859132
Epoch   17/20 Cost: 0.855000
Epoch   18/20 Cost: 0.850961
Epoch   19/20 Cost: 0.847009
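
The three runs above can be folded into one small sweep. This is a sketch rather than a tuning recipe: the helper name `first_and_last_cost` is hypothetical, a plain `nn.Linear(3, 3)` stands in for `SoftmaxClassifierModel`, and the seed is reset before each run so that every learning rate starts from the same weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

x = torch.FloatTensor([[1, 2, 1], [1, 3, 2], [1, 3, 4], [1, 5, 5],
                       [1, 7, 5], [1, 2, 5], [1, 6, 6], [1, 7, 7]])
y = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

def first_and_last_cost(lr, nb_epochs=20):
    torch.manual_seed(1)       # same initial weights for every learning rate
    model = nn.Linear(3, 3)    # stand-in for SoftmaxClassifierModel
    optimizer = optim.SGD(model.parameters(), lr=lr)
    costs = []
    for _ in range(nb_epochs):
        cost = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        costs.append(cost.item())
    return costs[0], costs[-1]

results = {lr: first_and_last_cost(lr) for lr in (1e5, 1e-10, 1e-1)}
for lr, (first, last) in results.items():
    print('lr={:g} first cost: {:.6f} last cost: {:.6f}'.format(lr, first, last))
```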


# Data Processing

In [None]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

$ x'_{j} = \frac{x_{j}-\mu_{j}}{\sigma_{j}} $ (assuming the data are normally distributed)<br><br>
$\sigma_{j}$ is the standard deviation and $\mu_{j}$ is the mean of feature $j$.

In [None]:
mu = x_train.mean(dim =0)

In [None]:
sigma = x_train.std(dim = 0)

In [None]:
norm_x_train = (x_train - mu) / sigma

In [None]:
print(norm_x_train) # each feature (column) now has mean 0 and standard deviation 1

tensor([[-1.0674, -0.3758, -0.8398],
        [ 0.7418,  0.2778,  0.5863],
        [ 0.3799,  0.5229,  0.3486],
        [ 1.0132,  1.0948,  1.1409],
        [-1.0674, -1.5197, -1.2360]])
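
A quick check of the result, as a sketch: after standardization each column should have mean 0 and standard deviation 1. Note that `Tensor.std` uses the sample (Bessel-corrected) standard deviation by default. The new sample `x_new` is a hypothetical input added to show that unseen data should reuse the training statistics.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])

mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)  # sample standard deviation by default
norm_x_train = (x_train - mu) / sigma

# Per-feature statistics after standardization.
print(norm_x_train.mean(dim=0))  # approximately 0 for each column
print(norm_x_train.std(dim=0))   # approximately 1 for each column

# Hypothetical unseen sample: normalize it with the TRAINING mu and sigma.
x_new = torch.FloatTensor([[80, 85, 82]])
norm_x_new = (x_new - mu) / sigma
print(norm_x_new)
```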


## Training with Preprocessed Data

In [None]:
class MultivariateLinearRegressionModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(3,1)
  
  def forward(self, x):
    return self.linear(x)

In [None]:
model = MultivariateLinearRegressionModel()

In [None]:
optimizer = optim.SGD(model.parameters(), lr=1e-1)

In [None]:
def train(model, optimizer, x_train, y_train):
  nb_epochs = 20
  for epoch in range(nb_epochs):

    # Compute the hypothesis H(x)
    prediction = model(x_train)

    # Compute the cost
    cost = F.mse_loss(prediction, y_train)

    # Update the parameters of H(x) using the gradient of the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

In [None]:
train(model, optimizer, norm_x_train, y_train)

Epoch    0/20 Cost: 29474.621094
Epoch    1/20 Cost: 18722.042969
Epoch    2/20 Cost: 11941.124023
Epoch    3/20 Cost: 7630.645996
Epoch    4/20 Cost: 4880.464844
Epoch    5/20 Cost: 3122.821289
Epoch    6/20 Cost: 1998.639404
Epoch    7/20 Cost: 1279.363037
Epoch    8/20 Cost: 819.076416
Epoch    9/20 Cost: 524.500732
Epoch   10/20 Cost: 335.968170
Epoch   11/20 Cost: 215.298920
Epoch   12/20 Cost: 138.062103
Epoch   13/20 Cost: 88.621628
Epoch   14/20 Cost: 56.971245
Epoch   15/20 Cost: 36.706795
Epoch   16/20 Cost: 23.729731
Epoch   17/20 Cost: 15.416880
Epoch   18/20 Cost: 10.089418
Epoch   19/20 Cost: 6.672885
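
For contrast, the sketch below trains the same kind of model on the raw and on the standardized inputs with the same learning rate. The helper `run` is hypothetical and uses a plain `nn.Linear(3, 1)` in place of `MultivariateLinearRegressionModel`. With features on the order of 100, a learning rate of `1e-1` is expected to make the raw run blow up, while the standardized run keeps decreasing.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])
norm_x_train = (x_train - x_train.mean(dim=0)) / x_train.std(dim=0)

def run(x, y, lr=1e-1, nb_epochs=20):
    torch.manual_seed(1)       # same initial weights for both runs
    model = nn.Linear(3, 1)    # stand-in for MultivariateLinearRegressionModel
    optimizer = optim.SGD(model.parameters(), lr=lr)
    costs = []
    for _ in range(nb_epochs):
        cost = F.mse_loss(model(x), y)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        costs.append(cost.item())
    return costs

raw_costs = run(x_train, y_train)
norm_costs = run(norm_x_train, y_train)

print('raw run stayed finite:', math.isfinite(raw_costs[-1]))
print('standardized: first {:.6f} last {:.6f}'.format(norm_costs[0], norm_costs[-1]))
```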


# Week 5 Quiz Solutions

In [None]:
import torch
torch_tensor = torch.FloatTensor([[[1],[2],[3]],[[4],[5],[6]], [[7],[8],[9]]])
print(torch_tensor.squeeze(), torch_tensor.squeeze().shape)

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]]) torch.Size([3, 3])


In [None]:
torch_t1 = torch.FloatTensor([1,2,3])
torch_t2 = torch.FloatTensor([[4],[5],[6]])
print(torch_t1+torch_t2)

tensor([[5., 6., 7.],
        [6., 7., 8.],
        [7., 8., 9.]])


In [None]:
torch_tensor = torch.FloatTensor([[1,2,3,4],[5,6,7,8]])
torch_tensor = torch_tensor.view([-1,2])
print(torch_tensor)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.],
        [7., 8.]])


In [None]:
torch_tensor1 = torch.FloatTensor([[1,2,3,4]])
torch_tensor2 = torch.FloatTensor([[5,6,7,8]])
print(torch.cat([torch_tensor1, torch_tensor2], dim=0))

tensor([[1., 2., 3., 4.],
        [5., 6., 7., 8.]])
