
**Theoretical Overview**
$$ H(x) = Wx + b $$
$$ cost(W, b) = \frac{1}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right)^2 $$
- $H(x)$: how the model predicts an output for a given $x$
- $cost(W, b)$: how well $H(x)$ predicted $y$
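
For reference, the gradients that gradient descent follows can be derived directly from the cost above. This is a standard calculus step rather than something stated in the notebook, with $m$ the number of training examples as in the cost definition:

$$ \frac{\partial\, cost}{\partial W} = \frac{2}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right) x^{(i)} $$
$$ \frac{\partial\, cost}{\partial b} = \frac{2}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right) $$

Each update then moves the parameters against these gradients, scaled by a learning rate $\alpha$ (a symbol introduced here for illustration): $W \leftarrow W - \alpha \, \partial cost / \partial W$ and $b \leftarrow b - \alpha \, \partial cost / \partial b$.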

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [2]:
# For reproducibility
torch.manual_seed(1)

<torch._C.Generator at 0x7f559d9e2b58>

**Data**

We will use fake data for this example.
- The data are stored as torch.tensor objects.
- Inputs and outputs are kept in separate tensors.

In [3]:
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

In [4]:
print(x_train)
print(x_train.shape)

tensor([[1.],
        [2.],
        [3.]])
torch.Size([3, 1])


In [5]:
print(y_train)
print(y_train.shape)

tensor([[1.],
        [2.],
        [3.]])
torch.Size([3, 1])


**Weight Initialization**

- Initialize the weight and the bias to 0.
- requires_grad=True: declares that the tensor will be trained (W and b are the parameters to learn).

In [6]:
W = torch.zeros(1, requires_grad=True)
print(W)
b = torch.zeros(1, requires_grad=True)
print(b)

tensor([0.], requires_grad=True)
tensor([0.], requires_grad=True)


**Hypothesis**
$$ H(x) = Wx + b $$

In [7]:
hypothesis = x_train * W + b
print(hypothesis)

tensor([[0.],
        [0.],
        [0.]], grad_fn=<AddBackward0>)


**Cost**
$$ cost(W, b) = \frac{1}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right)^2 $$

In [8]:
print(hypothesis - y_train)
print((hypothesis - y_train) ** 2)

tensor([[-1.],
        [-2.],
        [-3.]], grad_fn=<SubBackward0>)
tensor([[1.],
        [4.],
        [9.]], grad_fn=<PowBackward0>)


In [9]:
cost = torch.mean((hypothesis - y_train) ** 2)  # torch.mean: average of the squared errors
print(cost)

tensor(4.6667, grad_fn=<MeanBackward0>)
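
As a check on the printed value, the cost can be worked out by hand. With $W = 0$ and $b = 0$ every prediction is $0$, so for the three training points:

$$ cost(0, 0) = \frac{1}{3} \left( (0-1)^2 + (0-2)^2 + (0-3)^2 \right) = \frac{14}{3} \approx 4.6667 $$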


**Gradient Descent**

1. Use the torch.optim library
- [W, b] are the tensors to be trained
- lr=0.01 is the learning rate

2. Three lines that always go together
- optimizer.zero_grad() resets the gradients to 0
- cost.backward() computes the gradients
- optimizer.step() updates the parameters
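
The reset comes first because PyTorch adds each newly computed gradient to whatever is already stored in .grad instead of overwriting it. The sketch below is a minimal illustration under assumed toy values: a single weight with no bias, and a helper named cost_fn that is not part of the notebook. It calls backward() twice without a reset, then once after a reset.

```python
import torch

# Toy setup (assumed for illustration): one weight, targets equal to inputs.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 2.0, 3.0])
W = torch.zeros(1, requires_grad=True)

def cost_fn():
    # Mean squared error of the bias-free hypothesis x * W.
    return torch.mean((x * W - y) ** 2)

cost_fn().backward()
first = W.grad.clone()        # gradient from a single backward pass

cost_fn().backward()          # no reset: the new gradient is added on top
accumulated = W.grad.clone()

W.grad.zero_()                # clear the stored gradient, as a reset would
cost_fn().backward()
fresh = W.grad.clone()

print(first, accumulated, fresh)
```

Without the reset the stored gradient is twice the true one, so an optimizer step taken at that point would be twice as large as intended.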

In [10]:
optimizer = optim.SGD([W, b], lr=0.01)

In [11]:
optimizer.zero_grad()
cost.backward()
optimizer.step()

In [12]:
print(W)
print(b)

tensor([0.0933], requires_grad=True)
tensor([0.0400], requires_grad=True)
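
The updated values can be reproduced without PyTorch. The following is a minimal sketch in plain Python, assuming plain gradient descent on the mean squared error (which is what optim.SGD does here with no momentum); the variable names grad_W and grad_b are introduced only for this illustration.

```python
# Training data from the notebook, as plain Python lists.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

W, b, lr = 0.0, 0.0, 0.01
m = len(xs)

# Prediction errors H(x) - y for the current W and b.
errors = [W * x + b - y for x, y in zip(xs, ys)]

# Gradients of the mean squared error with respect to W and b.
grad_W = 2 / m * sum(e * x for e, x in zip(errors, xs))
grad_b = 2 / m * sum(errors)

# One gradient descent step.
W -= lr * grad_W
b -= lr * grad_b

print(round(W, 4), round(b, 4))
```

The result agrees with the tensors printed above to four decimal places.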


**Training with Full Code**

In reality, we will be training on the dataset for multiple epochs. This can be done simply with loops.

In [13]:
# Data
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])
# Initialize the model parameters
W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=0.01)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):
    
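    # Compute H(x)
    hypothesis = x_train * W + b

    # Compute the cost
    cost = torch.mean((hypothesis - y_train) ** 2)

    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Manual equivalent of the three lines above (for reference only):
    # grad_W = 2 * torch.mean((hypothesis - y_train) * x_train)
    # grad_b = 2 * torch.mean(hypothesis - y_train)
    # W -= lr * grad_W
    # b -= lr * grad_b

    # Print a log line every 100 epochs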
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} W: {:.3f}, b: {:.3f} Cost: {:.6f}'.format(
            epoch, nb_epochs, W.item(), b.item(), cost.item()
        ))

Epoch    0/1000 W: 0.093, b: 0.040 Cost: 4.666667
Epoch  100/1000 W: 0.873, b: 0.289 Cost: 0.012043
Epoch  200/1000 W: 0.900, b: 0.227 Cost: 0.007442
Epoch  300/1000 W: 0.921, b: 0.179 Cost: 0.004598
Epoch  400/1000 W: 0.938, b: 0.140 Cost: 0.002842
Epoch  500/1000 W: 0.951, b: 0.110 Cost: 0.001756
Epoch  600/1000 W: 0.962, b: 0.087 Cost: 0.001085
Epoch  700/1000 W: 0.970, b: 0.068 Cost: 0.000670
Epoch  800/1000 W: 0.976, b: 0.054 Cost: 0.000414
Epoch  900/1000 W: 0.981, b: 0.042 Cost: 0.000256
Epoch 1000/1000 W: 0.985, b: 0.033 Cost: 0.000158
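
The log shows W drifting toward 1 and b toward 0. For a one-feature problem this target can be checked without any training, using the standard closed-form least-squares solution (a textbook formula, not something the notebook uses). A minimal sketch in plain Python, with W_star and b_star as names chosen for this illustration:

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)

x_mean = sum(xs) / m
y_mean = sum(ys) / m

# Least-squares slope: covariance of x and y divided by the variance of x.
W_star = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
# Least-squares intercept: makes the fitted line pass through the mean point.
b_star = y_mean - W_star * x_mean

print(W_star, b_star)
```

Gradient descent approaches this optimum gradually, which is why the cost keeps shrinking but has not reached zero after 1000 epochs.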


**Multivariate Linear Regression**
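- So far we have done Simple Linear Regression, which predicts the output from a single kind of input (one feature).
- How should the computation change when several kinds of input (multiple features) are given?

1. Define the data
2. Define the model
3. Define the optimizer
4. Compute the hypothesis
5. Compute the cost (MSE)
6. Gradient descent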

In [14]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

In [15]:
print(x_train.shape)
print(y_train.shape)

torch.Size([5, 3])
torch.Size([5, 1])


In [16]:
# Initialize the model parameters
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=1e-5)

nb_epochs = 20
for epoch in range(nb_epochs + 1):

    # Compute H(x) as a matrix product: more concise, needs no code change
    # when the number of features changes, and runs faster
    hypothesis = x_train.matmul(W) + b

    # Compute the cost
    cost = torch.mean((hypothesis - y_train) ** 2)

    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Print a log line every epoch
    print('Epoch {:4d}/{} hypothesis: {} Cost: {:.6f}'.format(
        epoch, nb_epochs, hypothesis.squeeze().detach(), cost.item()
    ))

Epoch    0/20 hypothesis: tensor([0., 0., 0., 0., 0.]) Cost: 29661.800781
Epoch    1/20 hypothesis: tensor([67.2578, 80.8397, 79.6523, 86.7394, 61.6605]) Cost: 9298.520508
Epoch    2/20 hypothesis: tensor([104.9128, 126.0990, 124.2466, 135.3015,  96.1821]) Cost: 2915.712402
Epoch    3/20 hypothesis: tensor([125.9942, 151.4381, 149.2133, 162.4896, 115.5097]) Cost: 915.040527
Epoch    4/20 hypothesis: tensor([137.7967, 165.6247, 163.1911, 177.7112, 126.3307]) Cost: 287.936096
Epoch    5/20 hypothesis: tensor([144.4044, 173.5674, 171.0168, 186.2332, 132.3891]) Cost: 91.371063
Epoch    6/20 hypothesis: tensor([148.1035, 178.0143, 175.3980, 191.0042, 135.7812]) Cost: 29.758249
Epoch    7/20 hypothesis: tensor([150.1744, 180.5042, 177.8509, 193.6753, 137.6805]) Cost: 10.445267
Epoch    8/20 hypothesis: tensor([151.3336, 181.8983, 179.2240, 195.1707, 138.7440]) Cost: 4.391237
Epoch    9/20 hypothesis: tensor([151.9824, 182.6789, 179.9928, 196.0079, 139.3396]) Cost: 2.493121
Epoch   10/20 hypo

**Matrix Data Representation**
$$
\begin{pmatrix}
x_1 & x_2 & x_3
\end{pmatrix}
\cdot
\begin{pmatrix}
w_1 \\
w_2 \\
w_3 \\
\end{pmatrix}
=
\begin{pmatrix}
x_1w_1 + x_2w_2 + x_3w_3
\end{pmatrix}
$$
$$ H(X) = XW $$
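
To make the matrix form concrete, the sketch below multiplies two rows of the training inputs by a column of weights in plain Python. The weight values are made up for illustration and the helper name matmul is hypothetical; in the notebook, x_train.matmul(W) performs the same row-by-column products for all samples at once.

```python
def matmul(X, W):
    # (n x k) times (k x 1): one dot product per row of X.
    return [[sum(x_j * w_j[0] for x_j, w_j in zip(row, W))] for row in X]

X = [[73, 80, 75],
     [93, 88, 93]]          # two samples, three features each
W = [[0.5], [0.5], [1.0]]   # illustrative weights, one per feature

H = matmul(X, W)
print(H)
```

Each row of X yields one prediction, so an input of shape (n, 3) gives an output of shape (n, 1).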


---



By default, every PyTorch model is built by inheriting from the provided nn.Module.

- Create the model by subclassing nn.Module
- nn.Linear(3, 1)
  - input dimension: 3
  - output dimension: 1
- The hypothesis (prediction) is computed in forward()
- PyTorch takes care of computing the gradients


In [17]:
class MultivariateLinearRegressionModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear=nn.Linear(3,1)
  
  def forward(self,x):
    return self.linear(x)
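
As a quick way to see what nn.Linear(3, 1) holds, the sketch below inspects a standalone layer rather than the model class above. The variable names and the all-zero input are assumptions made for illustration.

```python
import torch
import torch.nn as nn

linear = nn.Linear(3, 1)

# nn.Linear stores its weight as (out_features, in_features) plus a bias vector.
shapes = [tuple(p.shape) for p in linear.parameters()]
print(shapes)

# A batch of 5 samples with 3 features each maps to 5 predictions.
out = linear(torch.zeros(5, 3))
print(tuple(out.shape))
```

These are the same two tensors that model.parameters() hands to the optimizer, taking the place of the hand-made W and b.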

**F.mse_loss**
- Use a loss function provided by torch.nn.functional
- It can easily be swapped for a different loss
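
A small sketch of both points, using made-up prediction values: F.mse_loss gives the same number as the hand-written mean of squared errors, and changing the loss is a one-line edit (F.l1_loss, the mean absolute error, is used here only as an example of an alternative).

```python
import torch
import torch.nn.functional as F

prediction = torch.tensor([[150.0], [186.0], [181.0]])  # made-up predictions
target = torch.tensor([[152.0], [185.0], [180.0]])

mse = F.mse_loss(prediction, target)              # mean of squared errors
manual = torch.mean((prediction - target) ** 2)   # same quantity written out
l1 = F.l1_loss(prediction, target)                # swapped-in alternative loss

print(mse.item(), manual.item(), l1.item())
```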

In [18]:
# Data
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])
# Initialize the model
model = MultivariateLinearRegressionModel()
# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-5)

nb_epochs = 20
for epoch in range(nb_epochs+1):

    # Compute H(x)
    prediction = model(x_train)

    # Compute the cost
    cost = F.mse_loss(prediction, y_train)

    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Print a log line every epoch
    print('Epoch {:4d}/{} Cost: {:.6f}'.format(
        epoch, nb_epochs, cost.item()
    ))

Epoch    0/20 Cost: 31667.597656
Epoch    1/20 Cost: 9926.265625
Epoch    2/20 Cost: 3111.513672
Epoch    3/20 Cost: 975.451477
Epoch    4/20 Cost: 305.908630
Epoch    5/20 Cost: 96.042679
Epoch    6/20 Cost: 30.260782
Epoch    7/20 Cost: 9.641681
Epoch    8/20 Cost: 3.178685
Epoch    9/20 Cost: 1.152871
Epoch   10/20 Cost: 0.517862
Epoch   11/20 Cost: 0.318802
Epoch   12/20 Cost: 0.256388
Epoch   13/20 Cost: 0.236810
Epoch   14/20 Cost: 0.230660
Epoch   15/20 Cost: 0.228719
Epoch   16/20 Cost: 0.228095
Epoch   17/20 Cost: 0.227880
Epoch   18/20 Cost: 0.227799
Epoch   19/20 Cost: 0.227759
Epoch   20/20 Cost: 0.227732


**Minibatch Gradient Descent**
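- A very large dataset cannot be trained on all at once.
- Instead, train on only part of the data at a time.
- The whole dataset is split evenly into minibatches, and each update uses one minibatch.
- Updates can be made more quickly.
- Because each update does not use the whole dataset, it may move in the wrong direction.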
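
The splitting step can be sketched in plain Python. The function name make_minibatches and its seed argument are hypothetical, chosen only for this illustration; PyTorch's own tooling for this is introduced in the sections that follow.

```python
import random

def make_minibatches(n_samples, batch_size, seed=0):
    # Shuffle the sample indices, then cut them into consecutive chunks.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

batches = make_minibatches(n_samples=5, batch_size=2)
print(batches)
```

With 5 samples and a batch size of 2, the last minibatch holds only one sample, so the split is only approximately even.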

**PyTorch Dataset**
- Subclass torch.utils.data.Dataset
- __len__(): the total number of samples in the dataset
- __getitem__(): given an index idx, returns the corresponding input and output data

In [19]:
from torch.utils.data import Dataset

class CustomDataset(Dataset):
  def __init__(self):
    self.x_data=[[73, 80, 75],
                  [93, 88, 93],
                  [89, 91, 90],
                  [96, 98, 100],
                  [73, 66, 70]]
    self.y_data=[[152],[185],[180],[196],[142]]
  
  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    x=torch.FloatTensor(self.x_data[idx])
    y=torch.FloatTensor(self.y_data[idx])

    return x,y

dataset=CustomDataset()
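
When the data already fit in tensors, PyTorch's built-in TensorDataset offers the same indexing behavior without a custom class. This is an alternative shown for comparison, not the approach the notebook takes, and the variable names are chosen for this sketch.

```python
import torch
from torch.utils.data import TensorDataset

x_data = torch.FloatTensor([[73, 80, 75],
                            [93, 88, 93],
                            [89, 91, 90],
                            [96, 98, 100],
                            [73, 66, 70]])
y_data = torch.FloatTensor([[152], [185], [180], [196], [142]])

# TensorDataset pairs up rows of its tensors by index.
tensor_dataset = TensorDataset(x_data, y_data)

x0, y0 = tensor_dataset[0]
print(len(tensor_dataset), x0, y0)
```

A custom Dataset is still the better fit when samples must be loaded or transformed on demand.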

**PyTorch DataLoader**
- Use torch.utils.data.DataLoader
- batch_size=2: the size of each minibatch, conventionally set to a power of 2
- shuffle=True: reshuffles the dataset every epoch, so the order in which samples are seen changes




In [20]:
from torch.utils.data import DataLoader

dataloader=DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
)

**Full code with Dataset and DataLoader**
- enumerate(dataloader): yields the minibatch index together with the minibatch data
- len(dataloader): the number of minibatches per epoch
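
To see both calls in isolation, the sketch below iterates over a loader built on stand-in data with the notebook's shapes (5 samples, 3 features, 1 target). The names demo_dataset and demo_loader are assumptions, chosen so they do not clash with the notebook's own variables.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: all zeros, only the shapes matter here.
demo_dataset = TensorDataset(torch.zeros(5, 3), torch.zeros(5, 1))
demo_loader = DataLoader(demo_dataset, batch_size=2, shuffle=True)

batch_sizes = []
for batch_idx, (x_batch, y_batch) in enumerate(demo_loader):
    batch_sizes.append(x_batch.shape[0])
    print(batch_idx, tuple(x_batch.shape), tuple(y_batch.shape))

print(len(demo_loader), batch_sizes)
```

Five samples with batch_size=2 give three minibatches per epoch, the last one holding a single sample.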

In [21]:
nb_epochs=20
for epoch in range(nb_epochs+1):
  for batch_idx, samples in enumerate(dataloader):
    x_train,y_train=samples
    # Compute H(x)
    prediction=model(x_train)

    # Compute the cost
    cost=F.mse_loss(prediction, y_train)

    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
        epoch, nb_epochs, batch_idx+1, len(dataloader),
        cost.item()
    ))

Epoch    0/20 Batch 1/3 Cost: 0.010698
Epoch    0/20 Batch 2/3 Cost: 0.511435
Epoch    0/20 Batch 3/3 Cost: 0.142174
Epoch    1/20 Batch 1/3 Cost: 0.562570
Epoch    1/20 Batch 2/3 Cost: 0.045300
Epoch    1/20 Batch 3/3 Cost: 0.047180
Epoch    2/20 Batch 1/3 Cost: 0.052634
Epoch    2/20 Batch 2/3 Cost: 0.528745
Epoch    2/20 Batch 3/3 Cost: 0.014046
Epoch    3/20 Batch 1/3 Cost: 0.116527
Epoch    3/20 Batch 2/3 Cost: 0.043575
Epoch    3/20 Batch 3/3 Cost: 0.976579
Epoch    4/20 Batch 1/3 Cost: 0.507152
Epoch    4/20 Batch 2/3 Cost: 0.044356
Epoch    4/20 Batch 3/3 Cost: 0.217999
Epoch    5/20 Batch 1/3 Cost: 0.491854
Epoch    5/20 Batch 2/3 Cost: 0.226925
Epoch    5/20 Batch 3/3 Cost: 0.066735
Epoch    6/20 Batch 1/3 Cost: 0.048714
Epoch    6/20 Batch 2/3 Cost: 0.563030
Epoch    6/20 Batch 3/3 Cost: 0.006366
Epoch    7/20 Batch 1/3 Cost: 0.129833
Epoch    7/20 Batch 2/3 Cost: 0.578195
Epoch    7/20 Batch 3/3 Cost: 0.009279
Epoch    8/20 Batch 1/3 Cost: 0.416764
Epoch    8/20 Batch 2/3 C