<br><br>
<center><font size='5'><b>Deep Learning for All_pytorch</b></font><br><br><font size='5'>Chap3. Logistic Regression</font></center>

# Logistic Regression

![image](https://user-images.githubusercontent.com/48466625/61102264-2395ed80-a4a8-11e9-95da-887407b967a0.png)

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

In [2]:
# Fix the random seed so the results are reproducible
torch.manual_seed(1)

<torch._C.Generator at 0x23725599090>

## Training Data

In [3]:
x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]

In [8]:
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

In [7]:
# When the data is an np.ndarray, convert it like this:
# dtype = torch.FloatTensor
# x_train = Variable(torch.from_numpy(x_data).type(dtype), requires_grad=False)

In [13]:
print(x_train.shape)
print(y_train.shape)

torch.Size([6, 2])
torch.Size([6, 1])


## Hypothesis

In [17]:
W = torch.zeros((2,1), requires_grad=True)
b = torch.zeros((1), requires_grad=True)

In [19]:
hypothesis = torch.sigmoid(x_train.matmul(W) + b)
print(hypothesis)

tensor([[0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000]], grad_fn=<SigmoidBackward>)


## Cost function (binary cross-entropy loss)

In [20]:
F.binary_cross_entropy(hypothesis, y_train)

tensor(0.6931, grad_fn=<BinaryCrossEntropyBackward>)
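
The value 0.6931 can be checked by hand. The sketch below uses plain Python rather than PyTorch and applies the binary cross-entropy formula, the mean of -[y log h + (1 - y) log(1 - h)], to the six labels when every prediction equals sigmoid(0) = 0.5. The helper names `sigmoid` and `binary_cross_entropy` are illustrative, not PyTorch functions.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(probs, labels):
    """Mean of -[y*log(h) + (1-y)*log(1-h)] over all samples."""
    total = 0.0
    for h, y in zip(probs, labels):
        total += -(y * math.log(h) + (1 - y) * math.log(1 - h))
    return total / len(probs)

labels = [0, 0, 0, 1, 1, 1]
# With W and b all zero, every logit is 0, so every prediction is sigmoid(0) = 0.5.
probs = [sigmoid(0.0) for _ in labels]
cost = binary_cross_entropy(probs, labels)
print(round(cost, 4))  # 0.6931, i.e. ln 2
```

Since every prediction is 0.5, each sample contributes -log(0.5) = ln 2 regardless of its label.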

## Full code

In [43]:
x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

print(x_train.shape)
print(y_train.shape)

torch.Size([6, 2])
torch.Size([6, 1])


In [23]:
# Initialize the model parameters
W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):

    # Compute the cost
    hypothesis = torch.sigmoid(x_train.matmul(W) + b) # or .mm or @
    cost = F.binary_cross_entropy(hypothesis, y_train)

    # Use the cost to improve H(x)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Epoch    0/1000 Cost: 0.693147
Epoch  100/1000 Cost: 0.134722
Epoch  200/1000 Cost: 0.080643
Epoch  300/1000 Cost: 0.057900
Epoch  400/1000 Cost: 0.045300
Epoch  500/1000 Cost: 0.037261
Epoch  600/1000 Cost: 0.031672
Epoch  700/1000 Cost: 0.027556
Epoch  800/1000 Cost: 0.024394
Epoch  900/1000 Cost: 0.021888
Epoch 1000/1000 Cost: 0.019852
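
To make explicit what `cost.backward()` and `optimizer.step()` do in the loop above, here is a minimal sketch of the same training procedure in plain Python, with the gradient of the mean binary cross-entropy written out by hand. It runs in double precision, so its numbers are not expected to match the float32 log above digit for digit; the small `eps` guard is an added assumption, not part of the notebook.

```python
import math

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.0, 0.0]
b = 0.0
lr = 1.0
n = len(x_data)
eps = 1e-12  # guards against log(0); an added safety, not in the notebook
costs = []

for epoch in range(1001):
    # Forward pass: h_i = sigmoid(w . x_i + b)
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x in x_data]

    # Mean binary cross-entropy over the six samples
    cost = -sum(y * math.log(hi + eps) + (1 - y) * math.log(1 - hi + eps)
                for hi, y in zip(h, y_data)) / n
    costs.append(cost)

    # Gradients of the mean BCE: dC/dw_j = mean((h - y) * x_j), dC/db = mean(h - y)
    grad_w = [sum((hi - y) * x[j] for hi, y, x in zip(h, y_data, x_data)) / n
              for j in range(2)]
    grad_b = sum(hi - y for hi, y in zip(h, y_data)) / n

    # Plain gradient-descent step (what SGD does when no momentum is used)
    w = [w[j] - lr * grad_w[j] for j in range(2)]
    b -= lr * grad_b

print(costs[0], costs[-1])
```

The first recorded cost is ln 2, as in the log above, because all parameters start at zero.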


## Accuracy

In [32]:
hypothesis = torch.sigmoid(x_train.matmul(W) + b) # in practice, evaluate on x_test rather than x_train
print(hypothesis.shape)
print(hypothesis[:5,:])

# These are the predicted probabilities

torch.Size([6, 1])
tensor([[2.7648e-04],
        [3.1608e-02],
        [3.8977e-02],
        [9.5622e-01],
        [9.9823e-01]], grad_fn=<SliceBackward>)


In [33]:
prediction = hypothesis >= torch.FloatTensor([0.5]) # ByteTensor holding the True/False results

print(prediction[:5,:])

tensor([[0],
        [0],
        [0],
        [1],
        [1]], dtype=torch.uint8)


In [34]:
print(y_train[:5,:])

tensor([[0.],
        [0.],
        [0.],
        [1.],
        [1.]])


In [36]:
correct_prediction = prediction.float() == y_train # prediction is a ByteTensor, so cast it with .float()

print(correct_prediction[:5,:])

tensor([[1],
        [1],
        [1],
        [1],
        [1]], dtype=torch.uint8)


In [41]:
accuracy = correct_prediction.sum().item() / len(correct_prediction)
print('accuracy : {}%'.format(accuracy*100))

accuracy : 100.0%
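
The thresholding and accuracy steps above can also be sketched without tensors. The probabilities below are hypothetical values chosen for illustration (they are not the model's outputs), so that not every prediction is correct.

```python
# Hypothetical predicted probabilities and labels, for illustration only.
probs = [0.02, 0.40, 0.55, 0.91, 0.97, 0.30]
labels = [0, 0, 0, 1, 1, 1]

# Threshold at 0.5 to turn probabilities into 0/1 predictions.
predictions = [1 if p >= 0.5 else 0 for p in probs]

# Count the matches and divide by the number of samples.
correct = sum(int(p == y) for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print('accuracy : {:.2f}%'.format(accuracy * 100))  # accuracy : 66.67%
```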


## Higher-level implementation with nn.Module

In [44]:
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2,1) # linear layer holding both W and b: 2 input features, 1 output, and a bias of size 1
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x): # the forward pass of the whole model
        return self.sigmoid(self.linear(x))

In [45]:
model = BinaryClassifier()

In [46]:
# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1)

nb_epochs = 100
for epoch in range(nb_epochs + 1):

    # Compute H(x)
    hypothesis = model(x_train)

    # Compute the cost
    cost = F.binary_cross_entropy(hypothesis, y_train)

    # Use the cost to improve H(x)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    # Log every 10 epochs
    if epoch % 10 == 0:
        prediction = hypothesis >= torch.FloatTensor([0.5])
        correct_prediction = prediction.float() == y_train
        accuracy = correct_prediction.sum().item() / len(correct_prediction)
        print('Epoch {:4d}/{} Cost: {:.6f} Accuracy {:2.2f}%'.format(
            epoch, nb_epochs, cost.item(), accuracy * 100,
        ))

Epoch    0/100 Cost: 0.539713 Accuracy 83.33%
Epoch   10/100 Cost: 0.614853 Accuracy 66.67%
Epoch   20/100 Cost: 0.441875 Accuracy 66.67%
Epoch   30/100 Cost: 0.373145 Accuracy 83.33%
Epoch   40/100 Cost: 0.316358 Accuracy 83.33%
Epoch   50/100 Cost: 0.266094 Accuracy 83.33%
Epoch   60/100 Cost: 0.220498 Accuracy 100.00%
Epoch   70/100 Cost: 0.182095 Accuracy 100.00%
Epoch   80/100 Cost: 0.157299 Accuracy 100.00%
Epoch   90/100 Cost: 0.144091 Accuracy 100.00%
Epoch  100/100 Cost: 0.134272 Accuracy 100.00%


# Softmax Classification

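- Rather than simply picking the maximum with a hard argmax, as the Logistic Regression above effectively does,
- softmax picks it more softly, giving every class a probability (in the binary case above it was enough to output 1 if the probability exceeded 0.5 and 0 otherwise).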

![image](https://user-images.githubusercontent.com/48466625/61103799-8a1e0a00-a4ae-11e9-9c45-a4d88a456a6b.png)

In [47]:
z = torch.FloatTensor([1,2,3])
hypothesis = F.softmax(z, dim=0)
print(hypothesis)

##############################################
# P(paper | scissors) = 0.6652: given that scissors was played, the probability of playing paper is 0.6652 (about 66.5%)

tensor([0.0900, 0.2447, 0.6652])
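
The printed probabilities follow directly from the softmax definition, exp(z_i) / sum_j exp(z_j). A minimal plain-Python sketch (the `softmax` helper here is illustrative, not `F.softmax`; subtracting the maximum is a standard stability trick that does not change the result):

```python
import math

def softmax(logits):
    """exp(z_i) / sum_j exp(z_j), shifted by max(z) for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
```

The outputs are all positive and sum to 1, which is what lets them be read as class probabilities.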


## Cross Entropy

![image](https://user-images.githubusercontent.com/48466625/61104049-c0a85480-a4af-11e9-8f3f-30b462a9a0b2.png)

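- Given two probability distributions P and Q,
- sample x from P,
- and take the average of -log Q(x) over those samples.

Minimizing the cross entropy moves the model's distribution step by step toward P (Q_2 -> Q_1 -> P), so the distribution our model represents gradually approximates P.

- P(x) is the ground-truth label / Q(x) is the prediction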
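
Written out as a formula, this is the standard definition of cross entropy. The second line assumes P is a one-hot label distribution that puts all its mass on the true class y, which is the case for the classification examples in this chapter.

```latex
H(P, Q) = -\mathbb{E}_{x \sim P}\left[\log Q(x)\right] = -\sum_{x} P(x)\,\log Q(x)

% One-hot label: P(y) = 1 and P(x) = 0 for x \neq y
H(P, Q) = -\log Q(y)
```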

### Low-level

In [98]:
z = torch.randn((3,5), requires_grad=True)
print(z)

hypothesis = F.softmax(z, dim=1)
print(hypothesis) # predicted probabilities

tensor([[ 0.1974, -2.4616, -0.2671,  1.5170, -1.9809],
        [ 0.5254,  0.3045,  0.8922,  2.1862, -0.4101],
        [ 0.5338, -1.5819, -0.0689, -1.1353, -2.4166]], requires_grad=True)
tensor([[0.1801, 0.0126, 0.1132, 0.6738, 0.0204],
        [0.1123, 0.0901, 0.1621, 0.5914, 0.0441],
        [0.5239, 0.0632, 0.2868, 0.0987, 0.0274]], grad_fn=<SoftmaxBackward>)


In [99]:
y = torch.randint(5, (3,)) # draw a random 1-D vector of shape (3,) with values in [0, 5) and treat it as the true labels
y

tensor([3, 3, 1])

In [100]:
y_one_hot = torch.zeros_like(hypothesis) # start from a 3x5 zero matrix with the same shape as hypothesis

y_one_hot.scatter_(1, y.unsqueeze(1),1)

# y.unsqueeze(1) reshapes y from (3,) to 3x1,
# the first argument 1 means "along dim=1",
# and the last argument 1 is the value to scatter

tensor([[0., 0., 0., 1., 0.],
        [0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.]])
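
What `scatter_` does here can be pictured as a simple loop: for each row, write a 1 at the column given by that row's label. A plain-Python sketch of the same result (the `one_hot` helper is a hypothetical name used only for this illustration):

```python
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # same effect as scatter_ writing 1 at column `label`
        rows.append(row)
    return rows

y = [3, 3, 1]
for row in one_hot(y, 5):
    print(row)
```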

In [101]:
cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()
print(cost)

tensor(1.2274, grad_fn=<MeanBackward0>)
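
Because the targets are one-hot, only the probability assigned to the true class survives the multiplication, so the cost is just the mean of -log of those three probabilities. The sketch below recomputes it in plain Python from the printed (4-decimal) hypothesis values, so it agrees with 1.2274 only to about three decimal places.

```python
import math

# Softmax probabilities copied from the printed hypothesis above (rounded to 4 decimals).
hypothesis = [
    [0.1801, 0.0126, 0.1132, 0.6738, 0.0204],
    [0.1123, 0.0901, 0.1621, 0.5914, 0.0441],
    [0.5239, 0.0632, 0.2868, 0.0987, 0.0274],
]
y = [3, 3, 1]

# With one-hot targets, only the probability of the true class contributes.
losses = [-math.log(row[label]) for row, label in zip(hypothesis, y)]
cost = sum(losses) / len(losses)
print(round(cost, 3))  # 1.227
```

The third sample dominates the cost: its true class received a probability of only 0.0632.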


### High level

Order: softmax -> log -> negative log likelihood

In [102]:
z = torch.randn((3,5), requires_grad=True)
print(z)

tensor([[-0.3919,  1.3774,  0.7206, -3.3305, -1.1509],
        [ 0.3037, -2.2122,  0.0484, -0.0916,  0.1386],
        [-0.6317,  0.3046, -0.3564,  0.7867,  0.3036]], requires_grad=True)


In [103]:
cost1 = (y_one_hot * -torch.log(F.softmax(z, dim=1))).sum(dim=1).mean()
print(cost1)
cost2 = (y_one_hot * -F.log_softmax(z, dim=1)).sum(dim=1).mean()
print(cost2)

tensor(2.8018, grad_fn=<MeanBackward0>)
tensor(2.8018, grad_fn=<MeanBackward0>)


In [104]:
# High level
# nll = negative log likelihood

# Sometimes the softmax output itself is needed, so this form is also likely to be used often
cost3 = F.nll_loss(F.log_softmax(z, dim=1), y)
print(cost3)

tensor(2.8018, grad_fn=<NllLossBackward>)


In [105]:
# Or use F.cross_entropy (F.log_softmax and F.nll_loss combined)

F.cross_entropy(z, y)

tensor(2.8018, grad_fn=<NllLossBackward>)
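
All four variants give the same number because log-softmax is simply z_i minus log(sum_j exp(z_j)). The plain-Python sketch below checks this for a single sample, using the first row of z printed above and its label 3; the helper functions are illustrative stand-ins, not the PyTorch ones.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """z_i - log(sum_j exp(z_j)), using the max-shift (log-sum-exp) trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

logits = [-0.3919, 1.3774, 0.7206, -3.3305, -1.1509]  # first row of z printed above
label = 3

loss_a = -math.log(softmax(logits)[label])  # softmax -> log -> pick the true class
loss_b = -log_softmax(logits)[label]        # log_softmax -> pick the true class (NLL)
print(abs(loss_a - loss_b) < 1e-9)  # True
```

Computing the log-softmax directly avoids taking the log of a very small probability, which is one reason the combined form is generally preferred.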

## Full code at lower level

In [84]:
x_train = [[1, 2, 1, 1],
           [2, 1, 3, 2],
           [3, 1, 3, 4],
           [4, 1, 5, 5],
           [1, 7, 5, 5],
           [1, 2, 5, 6],
           [1, 6, 6, 6],
           [1, 7, 7, 7]]
y_train = [2, 2, 2, 1, 1, 1, 0, 0] # there are 3 classes
# The weight has size 4x3: the output is 8x3, and taking the argmax of each row should give the y_train values.
x_train = torch.FloatTensor(x_train)
y_train = torch.LongTensor(y_train)

print(x_train.shape)
print(y_train.shape)

torch.Size([8, 4])
torch.Size([8])


In [91]:
# Initialize the model parameters
W = torch.zeros((4, 3), requires_grad=True) 
b = torch.zeros(1, requires_grad=True) # a single bias shared by all 3 classes (broadcast)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=0.1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):

    # Compute the cost (1)
    hypothesis = F.softmax(x_train.matmul(W) + b, dim=1) # or .mm or @
    y_one_hot = torch.zeros_like(hypothesis)
    y_one_hot.scatter_(1, y_train.unsqueeze(1), 1)
    cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()

    # Use the cost to improve H(x)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Epoch    0/1000 Cost: 1.098612
Epoch  100/1000 Cost: 0.761050
Epoch  200/1000 Cost: 0.689991
Epoch  300/1000 Cost: 0.643229
Epoch  400/1000 Cost: 0.604117
Epoch  500/1000 Cost: 0.568255
Epoch  600/1000 Cost: 0.533922
Epoch  700/1000 Cost: 0.500291
Epoch  800/1000 Cost: 0.466908
Epoch  900/1000 Cost: 0.433507
Epoch 1000/1000 Cost: 0.399962
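
The cost at epoch 0 is not arbitrary. With W and b initialized to zero every logit is equal, so softmax assigns probability 1/3 to each of the 3 classes and the cross entropy is -log(1/3) = ln 3. A one-line check in plain Python:

```python
import math

num_classes = 3
# All-zero W and b give equal logits, so softmax assigns 1/3 to every class.
uniform_prob = 1.0 / num_classes
initial_cost = -math.log(uniform_prob)
print('{:.6f}'.format(initial_cost))  # 1.098612
```

This is a handy sanity check: a freshly zero-initialized classifier with K classes should start at a cost of ln K.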


In [86]:
# Initialize the model parameters
W = torch.zeros((4, 3), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=0.1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):

    # Compute the cost (2)
    z = x_train.matmul(W) + b # or .mm or @
    cost = F.cross_entropy(z, y_train) # uses F.cross_entropy, so one-hot encoding is no longer needed

    # Use the cost to improve H(x)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Epoch    0/1000 Cost: 1.098612
Epoch  100/1000 Cost: 0.761050
Epoch  200/1000 Cost: 0.689991
Epoch  300/1000 Cost: 0.643229
Epoch  400/1000 Cost: 0.604117
Epoch  500/1000 Cost: 0.568255
Epoch  600/1000 Cost: 0.533922
Epoch  700/1000 Cost: 0.500291
Epoch  800/1000 Cost: 0.466908
Epoch  900/1000 Cost: 0.433507
Epoch 1000/1000 Cost: 0.399962


## Full code with nn.Module 

In [87]:
class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3) # 3 outputs, one per class

    def forward(self, x):
        return self.linear(x)

In [88]:
model = SoftmaxClassifierModel()

In [89]:
# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):

    # Compute H(x)
    prediction = model(x_train)

    # Compute the cost
    cost = F.cross_entropy(prediction, y_train)

    # Use the cost to improve H(x)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    # Log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Epoch    0/1000 Cost: 1.333909
Epoch  100/1000 Cost: 0.657153
Epoch  200/1000 Cost: 0.576202
Epoch  300/1000 Cost: 0.522045
Epoch  400/1000 Cost: 0.477590
Epoch  500/1000 Cost: 0.438068
Epoch  600/1000 Cost: 0.401299
Epoch  700/1000 Cost: 0.365866
Epoch  800/1000 Cost: 0.330623
Epoch  900/1000 Cost: 0.294698
Epoch 1000/1000 Cost: 0.259046


--------------------

- With two classes,
  - use Binary Cross Entropy with Sigmoid
- With multiple classes,
  - use Cross Entropy with Softmax
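
The two cases are closely related: a softmax over the two logits [0, z] gives the second class the probability e^z / (1 + e^z), which is exactly sigmoid(z). A short plain-Python sketch of that identity (helper names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Softmax over [0, z] gives class 1 the probability e^z / (1 + e^z) = sigmoid(z).
for z in [-2.0, 0.0, 0.5, 3.0]:
    two_class = softmax([0.0, z])[1]
    assert abs(two_class - sigmoid(z)) < 1e-12
    print(z, round(two_class, 4))
```

In that sense the sigmoid model is the two-class special case of the softmax model with one logit fixed at zero.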

# Tips

## Maximum Likelihood Estimation

![image](https://user-images.githubusercontent.com/48466625/61106152-20562e00-a4b7-11e9-93bf-d8e26e2c819c.png)

## Overfitting

- Collect more data,
- reduce the number of features,
- or apply __Regularization__

## Regularization

- Early stopping: stop training when the validation loss starts to rise
- Dropout
- Batch Normalization
- Reduce the model size
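
As one possible illustration of early stopping, the sketch below scans a list of validation losses and reports the epoch at which training would stop. The `patience` parameter, the function name, and the loss values are all assumptions made for this example; the notebook itself does not implement early stopping.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float('inf')
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran to the end

# Hypothetical validation losses: they fall, then start rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(early_stopping_epoch(val_losses))  # 5
```

A small patience avoids stopping on a single noisy epoch while still halting soon after the validation loss turns upward.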

## Examples

### A learning rate problem?

In [106]:
x_train = torch.FloatTensor([[1, 2, 1],
                             [1, 3, 2],
                             [1, 3, 4],
                             [1, 5, 5],
                             [1, 7, 5],
                             [1, 2, 5],
                             [1, 6, 6],
                             [1, 7, 7]
                            ])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

In [107]:
x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]])
y_test = torch.LongTensor([2, 2, 2])

In [108]:
class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)
    def forward(self, x):
        return self.linear(x)

In [109]:
model = SoftmaxClassifierModel()

In [110]:
# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

In [111]:
def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):

        # Compute H(x)
        prediction = model(x_train)

        # Compute the cost
        cost = F.cross_entropy(prediction, y_train)

        # Use the cost to improve H(x)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

In [112]:
def test(model, optimizer, x_test, y_test):
    prediction = model(x_test)
    predicted_classes = prediction.max(1)[1]
    correct_count = (predicted_classes == y_test).sum().item()
    cost = F.cross_entropy(prediction, y_test)

    print('Accuracy: {}% Cost: {:.6f}'.format(
         correct_count / len(y_test) * 100, cost.item()
    ))

In [113]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 3.525460
Epoch    1/20 Cost: 2.114969
Epoch    2/20 Cost: 1.237921
Epoch    3/20 Cost: 1.042164
Epoch    4/20 Cost: 1.000805
Epoch    5/20 Cost: 0.985407
Epoch    6/20 Cost: 0.972492
Epoch    7/20 Cost: 0.962170
Epoch    8/20 Cost: 0.952626
Epoch    9/20 Cost: 0.944189
Epoch   10/20 Cost: 0.936234
Epoch   11/20 Cost: 0.928885
Epoch   12/20 Cost: 0.921889
Epoch   13/20 Cost: 0.915285
Epoch   14/20 Cost: 0.908959
Epoch   15/20 Cost: 0.902918
Epoch   16/20 Cost: 0.897105
Epoch   17/20 Cost: 0.891517
Epoch   18/20 Cost: 0.886123
Epoch   19/20 Cost: 0.880915


In [114]:
test(model, optimizer, x_test, y_test)

Accuracy: 100.0% Cost: 0.270031


If the learning rate is too large, training diverges and the cost blows up instead of decreasing (overshooting).

In [115]:
model = SoftmaxClassifierModel()

In [116]:
optimizer = optim.SGD(model.parameters(), lr=1e5)

In [117]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 1.306830
Epoch    1/20 Cost: 784394.125000
Epoch    2/20 Cost: 983080.000000
Epoch    3/20 Cost: 1535176.250000
Epoch    4/20 Cost: 1451581.750000
Epoch    5/20 Cost: 1269042.375000
Epoch    6/20 Cost: 1015644.125000
Epoch    7/20 Cost: 1052363.750000
Epoch    8/20 Cost: 1390917.375000
Epoch    9/20 Cost: 1704706.750000
Epoch   10/20 Cost: 576854.937500
Epoch   11/20 Cost: 1277099.250000
Epoch   12/20 Cost: 1285176.125000
Epoch   13/20 Cost: 676794.562500
Epoch   14/20 Cost: 1729706.750000
Epoch   15/20 Cost: 339402.562500
Epoch   16/20 Cost: 1242724.250000
Epoch   17/20 Cost: 991426.125000
Epoch   18/20 Cost: 800292.437500
Epoch   19/20 Cost: 2079706.750000


If the learning rate is too small, the cost barely decreases.

In [118]:
model = SoftmaxClassifierModel()

In [119]:
optimizer = optim.SGD(model.parameters(), lr=1e-10)

In [120]:
train(model, optimizer, x_train, y_train)

Epoch    0/20 Cost: 2.135019
Epoch    1/20 Cost: 2.135019
Epoch    2/20 Cost: 2.135019
Epoch    3/20 Cost: 2.135019
Epoch    4/20 Cost: 2.135019
Epoch    5/20 Cost: 2.135019
Epoch    6/20 Cost: 2.135019
Epoch    7/20 Cost: 2.135019
Epoch    8/20 Cost: 2.135019
Epoch    9/20 Cost: 2.135019
Epoch   10/20 Cost: 2.135019
Epoch   11/20 Cost: 2.135019
Epoch   12/20 Cost: 2.135019
Epoch   13/20 Cost: 2.135019
Epoch   14/20 Cost: 2.135019
Epoch   15/20 Cost: 2.135019
Epoch   16/20 Cost: 2.135019
Epoch   17/20 Cost: 2.135019
Epoch   18/20 Cost: 2.135019
Epoch   19/20 Cost: 2.135019
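
Both failure modes show up even on the simplest possible problem. The sketch below is a toy illustration, not the model above: it minimizes f(w) = w^2 with plain gradient descent for 20 steps, where each update multiplies w by (1 - 2 * lr). The function name and the chosen learning rates are assumptions for this example.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(abs(gradient_descent(lr=0.1)))    # shrinks toward 0
print(abs(gradient_descent(lr=1.5)))    # doubles in size every step: overshoot
print(abs(gradient_descent(lr=1e-10)))  # stays essentially at the starting value 1.0
```

With lr=0.1 each step scales w by 0.8, with lr=1.5 by -2 (so it jumps across the minimum and grows), and with lr=1e-10 the factor is so close to 1 that nothing visibly changes.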


### Data preprocessing (normalization)

- Normalization (standardization) can make training easier
- It rescales each feature of the training set to zero mean and unit standard deviation

In [122]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

In [123]:
mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)

In [124]:
norm_x_train = (x_train - mu) / sigma
print(norm_x_train)

tensor([[-1.0674, -0.3758, -0.8398],
        [ 0.7418,  0.2778,  0.5863],
        [ 0.3799,  0.5229,  0.3486],
        [ 1.0132,  1.0948,  1.1409],
        [-1.0674, -1.5197, -1.2360]])
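
The first column of the output can be reproduced by hand with the standard library. Note that `statistics.stdev` divides by n - 1 (the sample standard deviation), which matches the default behaviour of `torch.Tensor.std`; this is a sketch for a single column only.

```python
import statistics

# First feature column of x_train above.
column = [73, 93, 89, 96, 73]

mu = statistics.mean(column)
# statistics.stdev divides by n - 1, matching the default of torch.Tensor.std.
sigma = statistics.stdev(column)

standardized = [(v - mu) / sigma for v in column]
print([round(v, 4) for v in standardized])  # [-1.0674, 0.7418, 0.3799, 1.0132, -1.0674]
```

After this transformation the column has mean 0 and sample standard deviation 1, so all features end up on a comparable scale.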


In [125]:
class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)

In [126]:
model = MultivariateLinearRegressionModel()

In [127]:
optimizer = optim.SGD(model.parameters(), lr=1e-1)

In [128]:
def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):

        # Compute H(x)
        prediction = model(x_train)

        # Compute the cost
        cost = F.mse_loss(prediction, y_train)

        # Use the cost to improve H(x)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

In [129]:
train(model, optimizer, norm_x_train, y_train)

Epoch    0/20 Cost: 29669.376953
Epoch    1/20 Cost: 18853.203125
Epoch    2/20 Cost: 12026.902344
Epoch    3/20 Cost: 7686.023438
Epoch    4/20 Cost: 4915.989746
Epoch    5/20 Cost: 3145.527832
Epoch    6/20 Cost: 2013.113892
Epoch    7/20 Cost: 1288.561035
Epoch    8/20 Cost: 824.897644
Epoch    9/20 Cost: 528.162292
Epoch   10/20 Cost: 338.248932
Epoch   11/20 Cost: 216.698837
Epoch   12/20 Cost: 138.900085
Epoch   13/20 Cost: 89.102600
Epoch   14/20 Cost: 57.225838
Epoch   15/20 Cost: 36.818623
Epoch   16/20 Cost: 23.752193
Epoch   17/20 Cost: 15.383974
Epoch   18/20 Cost: 10.022925
Epoch   19/20 Cost: 6.586579


### Overfitting -> Regularization

In [130]:
def train_with_regularization(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):

        # Compute H(x)
        prediction = model(x_train)

        # Compute the cost
        cost = F.mse_loss(prediction, y_train)
        
        # Compute the L2 norm of the parameters
        l2_reg = 0
        for param in model.parameters():
            l2_reg += torch.norm(param)
            
        cost += l2_reg

        # Use the cost to improve H(x)
```
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch+1, nb_epochs, cost.item()
        ))

In [131]:
model = MultivariateLinearRegressionModel()

optimizer = optim.SGD(model.parameters(), lr=1e-1)

train_with_regularization(model, optimizer, norm_x_train, y_train)

Epoch    1/20 Cost: 29593.251953
Epoch    2/20 Cost: 18853.375000
Epoch    3/20 Cost: 12092.292969
Epoch    4/20 Cost: 7793.868652
Epoch    5/20 Cost: 5051.215332
Epoch    6/20 Cost: 3298.339355
Epoch    7/20 Cost: 2177.197998
Epoch    8/20 Cost: 1459.866089
Epoch    9/20 Cost: 1000.827148
Epoch   10/20 Cost: 707.052612
Epoch   11/20 Cost: 519.036194
Epoch   12/20 Cost: 398.701477
Epoch   13/20 Cost: 321.682037
Epoch   14/20 Cost: 272.384674
Epoch   15/20 Cost: 240.829529
Epoch   16/20 Cost: 220.629486
Epoch   17/20 Cost: 207.697067
Epoch   18/20 Cost: 199.416183
Epoch   19/20 Cost: 194.112396
Epoch   20/20 Cost: 190.714249
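
The loop above adds the unscaled L2 norm of every parameter tensor, the bias included, so a large bias adds directly to the reported cost. A common alternative, shown here as an assumption and not as what the notebook does, penalizes the squared norm of the weights only and scales it by a coefficient. The sketch uses plain Python with hypothetical parameter values and a hypothetical coefficient `lam`.

```python
import math

def l2_norm(values):
    """Euclidean norm sqrt(sum(v**2)) of a list of numbers."""
    return math.sqrt(sum(v * v for v in values))

# Hypothetical parameter values, for illustration only.
weights = [3.0, 4.0]
bias = [170.0]
lam = 0.01  # hypothetical regularization strength

# What the loop above adds: one unscaled norm per parameter tensor, bias included.
penalty_as_above = l2_norm(weights) + l2_norm(bias)  # 5.0 + 170.0

# A common alternative: squared norm of the weights only, scaled by lam.
penalty_scaled = lam * sum(w * w for w in weights)   # 0.01 * 25.0

print(penalty_as_above, penalty_scaled)  # 175.0 0.25
```

The coefficient controls how strongly the penalty competes with the data-fitting term; without it, the penalty can dominate the total cost.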
