# Chapter 04. Multivariate Linear Regression
calculate one predicted output value from multiple inputs.
- Simple Linear Regression Review
- Multivariate Linear Regression Theory
- Naive Data Representation
- Matrix Data Representation
- Multivariate Linear Regression Code
- about nn.Module
- about F.mse_loss

---

## I. Simple Linear Regression Review
one input, one output
- but in most cases we need more information -> multivariate regression solves this problem.

---

## II. Multivariate Linear Regression
more than one input, one output

> example data

|Quiz 1 (x1)|Quiz 2 (x2)|Quiz 3 (x3)|Final (y)|
|---|---|---|---|
|73|80|75|152|
|93|88|93|185|
|89|91|90|180|
|96|98|100|196|
|73|66|70|142|

In [1]:
import torch

In [4]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

### Hypothesis Function
structure of neural network

$$ H(x) = x_1w_1 + x_2w_2 + x_3w_3 + b $$

if the model has *three inputs*, it needs *three* **weights** too (one per input).

---

## III. Naive Data Representation

### Hypothesis Function: Naive
- the simple hypothesis definition

In [None]:
# calculate H(x)
hypothesis = x1_train * w1 + x2_train * w2 + x3_train * w3 + b

- but how can we code this if x is a vector of length 1000?

writing out 3 terms by hand is fine, but with many more inputs this approach breaks down (the expression would be far too long).
> the answer is matmul()
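
A small check that the naive sum and `matmul()` give the same result. This is only a sketch: the weights `w1`, `w2`, `w3` and the bias `b` are arbitrary illustrative values (not learned ones), and the column slicing of `x_train` is an assumption about how `x1_train`, `x2_train`, `x3_train` would be built.

```python
import torch

# same quiz scores as x_train above
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])

# one column per input, each with shape (5, 1)
x1_train = x_train[:, 0:1]
x2_train = x_train[:, 1:2]
x3_train = x_train[:, 2:3]

# arbitrary illustrative parameters (assumption, not learned values)
w1, w2, w3 = torch.tensor(0.5), torch.tensor(1.0), torch.tensor(0.25)
b = torch.tensor(2.0)

# naive version: one term per input
naive = x1_train * w1 + x2_train * w2 + x3_train * w3 + b

# matrix version: stack the weights into a (3, 1) column
W = torch.stack([w1, w2, w3]).reshape(3, 1)
matrix = x_train.matmul(W) + b

same = torch.allclose(naive, matrix)
print(naive.shape, matrix.shape, same)
```

with 1000 inputs only the shapes of `x_train` and `W` would change; the `matmul` line stays the same.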

---

## IV. Matrix Data Representation

### Hypothesis Function: Matrix
- we can calculate the Hypothesis Function *at once* by using *matmul()*.
    - simpler
    - no need to change the code even if the length of x changes.
    - faster

$$ H(X) = XW + b $$

In [None]:
# calculate H(x)
hypothesis = x_train.matmul(W) + b # or .mm or @
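
To make the shapes explicit, here is a minimal sketch comparing the three spellings mentioned in the comment (`.matmul`, `.mm`, `@`). The all-ones `W` is an illustrative assumption chosen so that each prediction is just the sum of a row.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])   # shape (5, 3): 5 samples, 3 inputs

W = torch.ones((3, 1))   # illustrative weights, shape (3, 1)
b = torch.zeros(1)       # shape (1,), broadcast over all 5 rows

h_matmul = x_train.matmul(W) + b
h_mm = x_train.mm(W) + b
h_at = x_train @ W + b

# (5, 3) x (3, 1) -> (5, 1): one prediction per sample
print(h_matmul.shape)
print(torch.equal(h_matmul, h_mm), torch.equal(h_matmul, h_at))
```

for 2-D tensors like these the three forms compute the same product.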

---

## V. Multivariate Linear Regression Code

### Cost Function: MSE
same code as the original Simple Linear Regression code

$$ cost(W, b) = \frac{1}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right)^2 $$

> $\frac{1}{m}\sum$: mean

> $H(x^{(i)})$: Prediction

> $y^{(i)}$: Target

In [None]:
cost = torch.mean((hypothesis - y_train) ** 2)
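
A quick sketch of the cost before any training, assuming `W` and `b` start at zero so every prediction is 0. It also compares the hand-written mean with `F.mse_loss`, which should compute the same mean squared error with its default settings.

```python
import torch
import torch.nn.functional as F

y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

# with W = 0 and b = 0 every prediction is 0
hypothesis = torch.zeros_like(y_train)

# MSE written out by hand
cost_manual = torch.mean((hypothesis - y_train) ** 2)

# same quantity through the built-in loss (default reduction is the mean)
cost_builtin = F.mse_loss(hypothesis, y_train)

print(cost_manual.item(), cost_builtin.item())
```

both values come out at about 29661.8, which is the cost reported for Epoch 0 in the training log of the full code.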

### Gradient Descent with ```torch.optim```
$$ \nabla W = \frac{\partial cost}{\partial W} = \frac{2}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right)x^{(i)} $$
$$ W := W - \alpha \nabla W $$

In [None]:
# set optimizer
optimizer = torch.optim.SGD([W, b], lr=1e-5)

# how to use optimizer
optimizer.zero_grad()
cost.backward()
optimizer.step()
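
The sketch below checks one optimizer step against the gradient formula above. The names `grad_W_manual`, `grad_b_manual` and `W_expected` are introduced here only for illustration, and the bias gradient (the mean error times 2) is an addition that is not written out in these notes.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 1e-5
optimizer = torch.optim.SGD([W, b], lr=lr)

hypothesis = x_train.matmul(W) + b
cost = torch.mean((hypothesis - y_train) ** 2)

optimizer.zero_grad()   # clear old gradients
cost.backward()         # autograd fills W.grad and b.grad

# gradient computed by hand from the formula: (2/m) * sum(error * x)
m = x_train.shape[0]
error = (hypothesis - y_train).detach()
grad_W_manual = (2.0 / m) * x_train.t().matmul(error)
grad_b_manual = ((2.0 / m) * error.sum()).reshape(1)

grads_match = (torch.allclose(W.grad, grad_W_manual)
               and torch.allclose(b.grad, grad_b_manual))

# plain SGD update: W := W - lr * grad
W_expected = W.detach().clone() - lr * W.grad
optimizer.step()
step_matches = torch.allclose(W.detach(), W_expected)

print(grads_match, step_matches)
```

so `zero_grad()`, `backward()`, `step()` together do what the two formulas describe, without writing the gradient by hand.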

### Full Code with ```torch.optim(1)```

> 1. set data
> 2. set model
>     * only the definition of W is different from Simple Linear Regression: its shape is now (3, 1).
> 3. set optimizer
> 4. calculate Hypothesis
> 5. calculate Cost (with MSE)
> 6. Gradient Descent

In [4]:
# data
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

# initialize model
W = torch.zeros((3, 1), requires_grad = True)
b = torch.zeros(1, requires_grad = True)

# set optimizer
optimizer = torch.optim.SGD([W, b], lr=1e-5)


nb_epochs = 20
for epoch in range(nb_epochs + 1):
    
    # calculate H(x)
    hypothesis = x_train.matmul(W) + b # or .mm or @
    
    # calculate cost
    cost = torch.mean((hypothesis - y_train) ** 2)
    
    # improve H(x) by cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    print('Epoch {:4d}/{} hypothesis: {} Cost: {:.6f}'.format(epoch, nb_epochs, hypothesis.squeeze().detach(), cost.item()))

Epoch    0/20 hypothesis: tensor([0., 0., 0., 0., 0.]) Cost: 29661.800781
Epoch    1/20 hypothesis: tensor([67.2578, 80.8397, 79.6523, 86.7394, 61.6605]) Cost: 9298.520508
Epoch    2/20 hypothesis: tensor([104.9128, 126.0990, 124.2466, 135.3015,  96.1821]) Cost: 2915.712402
Epoch    3/20 hypothesis: tensor([125.9942, 151.4381, 149.2133, 162.4896, 115.5097]) Cost: 915.040527
Epoch    4/20 hypothesis: tensor([137.7968, 165.6247, 163.1911, 177.7112, 126.3307]) Cost: 287.936005
Epoch    5/20 hypothesis: tensor([144.4044, 173.5674, 171.0168, 186.2332, 132.3891]) Cost: 91.371010
Epoch    6/20 hypothesis: tensor([148.1035, 178.0144, 175.3980, 191.0042, 135.7812]) Cost: 29.758139
Epoch    7/20 hypothesis: tensor([150.1744, 180.5042, 177.8508, 193.6753, 137.6805]) Cost: 10.445305
Epoch    8/20 hypothesis: tensor([151.3336, 181.8983, 179.2240, 195.1707, 138.7440]) Cost: 4.391228
Epoch    9/20 hypothesis: tensor([151.9824, 182.6789, 179.9928, 196.0079, 139.3396]) Cost: 2.493135
Epoch   10/20 hypo

result overview
- the cost keeps decreasing -> getting closer to 0.
- H(x) gets closer to y.
- it can diverge depending on the learning rate.
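
To illustrate the last point, here is a sketch that runs the same training loop with two learning rates. The helper `train` and the larger rate `1e-4` are assumptions made for this example; with these inputs (values near 100) the larger rate is expected to overshoot, so the cost grows instead of shrinking.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

def train(lr, nb_epochs=20):
    """Run the training loop from above and return the cost at each epoch."""
    W = torch.zeros((3, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.SGD([W, b], lr=lr)
    costs = []
    for _ in range(nb_epochs + 1):
        hypothesis = x_train.matmul(W) + b
        cost = torch.mean((hypothesis - y_train) ** 2)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        costs.append(cost.item())
    return costs

costs_small = train(lr=1e-5)   # learning rate used in these notes
costs_large = train(lr=1e-4)   # 10x larger, chosen only for illustration

print('lr=1e-5: first {:.2f}, last {:.2f}'.format(costs_small[0], costs_small[-1]))
print('lr=1e-4: first {:.2f}, last {:.2e}'.format(costs_large[0], costs_large[-1]))
```

scaling the inputs or lowering the learning rate are the usual ways to keep the updates stable.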

---

## VI. about nn.Module

---

## VII. about F.mse_loss