# **4-1. Multivariate Linear Regression**

**Jonathan Choi 2021**

**[Deep Learning By Torch] End-to-end study scripts for deep learning, with hands-on code practice in PyTorch.**

If you have any issues, please open a PR at the repository below.

[[Deep Learning By Torch] - Github @JonyChoi](https://github.com/jonychoi/Deep-Learning-By-Torch)

## Theoretical Overview

$ H(x_1, x_2, x_3) = x_1w_1 + x_2w_2 + x_3w_3 + b $

$ cost(W, b) = \frac{1}{m} \sum^m_{i=1} \left( H(x^{(i)}) - y^{(i)} \right)^2 $

- $H(x)$: the hypothesis, i.e. the prediction made for a given input $x$.
- $cost(W, b)$: how well $H(x)$ predicts $y$, measured as the mean squared error (lower is better).
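
As a quick sanity check of the two formulas, the sketch below evaluates $H$ and $cost$ by hand on the training data used later in this notebook, with every weight and the bias set to zero. The variable names `X`, `w`, and `H` are illustrative choices, not part of the original script. With zero parameters every prediction is 0, so the cost reduces to the mean of $y^2$.

```python
import torch

# Training data used later in this notebook: one row per sample, three features each.
X = torch.tensor([[73., 80., 75.],
                  [93., 88., 93.],
                  [89., 91., 90.],
                  [96., 98., 100.],
                  [73., 66., 70.]])
y = torch.tensor([152., 185., 180., 196., 142.])

# Zero-initialized parameters (illustrative names).
w = torch.zeros(3)
b = torch.zeros(1)

# H(x) = x1*w1 + x2*w2 + x3*w3 + b, evaluated for every sample at once.
H = X[:, 0] * w[0] + X[:, 1] * w[1] + X[:, 2] * w[2] + b

# cost(W, b) = mean of the squared errors.
cost = torch.mean((H - y) ** 2)
print(cost.item())
```

The printed value should agree with the cost reported at epoch 0 of the first training loop in this notebook (about 29661.8).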

## Imports

In [13]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [14]:
torch.manual_seed(1)

<torch._C.Generator at 0x204e34d0910>

## Naive Data Representation

In [15]:
#Data

x1_train = torch.FloatTensor([73, 93, 89, 96, 73])
x2_train = torch.FloatTensor([80, 88, 91, 98, 66])
x3_train = torch.FloatTensor([75, 93, 90, 100, 70])
y_train = torch.FloatTensor([152, 185, 180, 196, 142])

''' Shapes used in the original script; the 1-D version above works too
x1_train = torch.FloatTensor([[73], [93], [89], [96], [73]])
x2_train = torch.FloatTensor([[80], [88], [91], [98], [66]])
x3_train = torch.FloatTensor([[75], [93], [90], [100], [70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])
'''

In [16]:
#Model Initialize
w1 = torch.zeros(1, requires_grad = True)
w2 = torch.zeros(1, requires_grad = True)
w3 = torch.zeros(1, requires_grad = True)
b = torch.zeros(1, requires_grad = True)

#Set Optimizer
optimizer = optim.SGD([w1, w2, w3, b], lr=1e-5)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    #Hypothesis
    hypothesis = x1_train * w1 + x2_train * w2 + x3_train * w3 + b

    #Cost Function
    cost = torch.mean((hypothesis - y_train)**2)

    #Gradients
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print('Epoch {:4d}/{} w1: {:.3f} w2: {:.3f} w3: {:.3f} b: {:.3f} Cost: {:.6f}'.format(epoch, nb_epochs, w1.item(), w2.item(), w3.item(), b.item(), cost.item()))

Epoch    0/1000 w1: 0.294 w2: 0.294 w3: 0.297 b: 0.003 Cost: 29661.800781
Epoch  100/1000 w1: 0.674 w2: 0.661 w3: 0.676 b: 0.008 Cost: 1.563634
Epoch  200/1000 w1: 0.679 w2: 0.655 w3: 0.677 b: 0.008 Cost: 1.497607
Epoch  300/1000 w1: 0.684 w2: 0.649 w3: 0.677 b: 0.008 Cost: 1.435026
Epoch  400/1000 w1: 0.689 w2: 0.643 w3: 0.678 b: 0.008 Cost: 1.375730
Epoch  500/1000 w1: 0.694 w2: 0.638 w3: 0.678 b: 0.009 Cost: 1.319511
Epoch  600/1000 w1: 0.699 w2: 0.633 w3: 0.679 b: 0.009 Cost: 1.266222
Epoch  700/1000 w1: 0.704 w2: 0.627 w3: 0.679 b: 0.009 Cost: 1.215696
Epoch  800/1000 w1: 0.709 w2: 0.622 w3: 0.679 b: 0.009 Cost: 1.167818
Epoch  900/1000 w1: 0.713 w2: 0.617 w3: 0.680 b: 0.009 Cost: 1.122429
Epoch 1000/1000 w1: 0.718 w2: 0.613 w3: 0.680 b: 0.009 Cost: 1.079378


## Matrix Data Representation

$ \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \cdot \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ \end{pmatrix} = \begin{pmatrix} x_1w_1 + x_2w_2 + x_3w_3 \end{pmatrix} $

$ H(X) = XW $
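
To make the equivalence concrete, here is a minimal sketch comparing the per-feature sum with the matrix product on two samples. The weight values are arbitrary numbers chosen for illustration, not trained parameters.

```python
import torch

# Two samples with three features each.
x_train = torch.tensor([[73., 80., 75.],
                        [93., 88., 93.]])

# Arbitrary illustrative weights, shape (3, 1).
W = torch.tensor([[0.5], [1.0], [2.0]])

# Per-feature form: x1*w1 + x2*w2 + x3*w3, reshaped to a column.
naive = (x_train[:, 0] * W[0]
         + x_train[:, 1] * W[1]
         + x_train[:, 2] * W[2]).unsqueeze(1)

# Matrix form: H(X) = XW.
matrix = x_train.matmul(W)

print(matrix.shape)                   # one prediction per sample
print(torch.allclose(naive, matrix))  # both forms give the same values
```

The matrix form needs only one weight tensor and one operation regardless of how many features there are, which is why the rest of the notebook uses it.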

In [17]:
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

In [18]:
print(x_train.shape)
print(y_train.shape)

torch.Size([5, 3])
torch.Size([5, 1])


### Take a Moment!

```tensor.detach()```

`tensor.detach()` returns a new tensor that shares storage with the original but does not require grad. The result is detached from the computational graph, so no gradient is backpropagated through it. In the training loop below it is used only to print the predictions.

**StackOverflow**

https://stackoverflow.com/questions/56816241/difference-between-detach-and-with-torch-nograd-in-pytorch
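
The behavior described above can be checked with a small sketch. The tensors `w`, `y`, and `d` are hypothetical names used only for this illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)
y = w * 2                # y is part of the graph and depends on w

d = y.detach()           # same values, but cut off from the graph
print(d.requires_grad)                 # False
print(d.data_ptr() == y.data_ptr())    # True: both share the same storage

# d is treated as a constant during backpropagation,
# so the gradient reaches w only through y.
loss = (y * d).sum()
loss.backward()
print(w.grad)            # tensor([4., 4., 4.])
```

If `d` were not detached, the loss would be the sum of $(2w)^2$ and the gradient would be $8w$, i.e. 8 for each element instead of 4.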

In [19]:
#Model Initialize
W = torch.zeros((3, 1), requires_grad = True)
b = torch.zeros(1, requires_grad = True)

#Set Optimizer

optimizer = optim.SGD([W, b], lr=1e-5)

nb_epochs = 20

for epoch in range(nb_epochs + 1):

    #hypothesis
    pred = x_train.matmul(W) + b

    #cost
    cost = torch.mean((pred - y_train)**2)

    print('Epoch {:4d}/{} Hypothesis: {} Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))

    #reduce cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

Epoch    0/20 Hypothesis: tensor([0., 0., 0., 0., 0.]) Cost: 29661.800781
Epoch    1/20 Hypothesis: tensor([67.2578, 80.8397, 79.6523, 86.7394, 61.6605]) Cost: 9298.520508
Epoch    2/20 Hypothesis: tensor([104.9128, 126.0990, 124.2466, 135.3015,  96.1821]) Cost: 2915.712891
Epoch    3/20 Hypothesis: tensor([125.9942, 151.4381, 149.2133, 162.4896, 115.5097]) Cost: 915.040527
Epoch    4/20 Hypothesis: tensor([137.7968, 165.6247, 163.1911, 177.7112, 126.3307]) Cost: 287.936005
Epoch    5/20 Hypothesis: tensor([144.4044, 173.5674, 171.0168, 186.2332, 132.3891]) Cost: 91.371010
Epoch    6/20 Hypothesis: tensor([148.1035, 178.0144, 175.3980, 191.0042, 135.7812]) Cost: 29.758139
Epoch    7/20 Hypothesis: tensor([150.1744, 180.5042, 177.8508, 193.6753, 137.6805]) Cost: 10.445305
Epoch    8/20 Hypothesis: tensor([151.3336, 181.8983, 179.2240, 195.1707, 138.7440]) Cost: 4.391228
Epoch    9/20 Hypothesis: tensor([151.9824, 182.6789, 179.9928, 196.0079, 139.3396]) Cost: 2.493135
Epoch   10/20 Hypo

## High-level Implementation with ```nn.Module```
Do you remember this model?

In [20]:
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

We just need to change the input dimension from 1 to 3!

In [21]:
class MultivariateLinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)
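
As a rough check of what `nn.Linear(3, 1)` holds, the sketch below lists the model's parameters and reproduces its output by hand. The input `x` and the name `manual` are illustrative. Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the manual version multiplies by the transpose.

```python
import torch
import torch.nn as nn

class MultivariateLinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)

model = MultivariateLinearRegression()

# One weight of shape (1, 3) and one bias of shape (1,).
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# A single sample with three features (illustrative input).
x = torch.tensor([[73., 80., 75.]])

# Same computation written out by hand: x @ W^T + b.
manual = x.matmul(model.linear.weight.t()) + model.linear.bias
print(torch.allclose(model(x), manual))
```

The weight and bias here play the roles of `W` and `b` from the matrix section, but they are randomly initialized rather than set to zero, so the predictions at epoch 0 are not all zero.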

In [22]:
# Data

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

#Model Initialize

model = MultivariateLinearRegression()

#Set Optimizer

optimizer = optim.SGD(model.parameters(), lr=1e-5)

nb_epochs = 20

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = model(x_train)

    #Cost
    cost = F.mse_loss(pred, y_train)

    print('Epoch {:4d}/{} Hypothesis: {} Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

Epoch    0/20 Hypothesis: tensor([-6.7933, -4.8968, -6.5155, -7.3361, -2.6660]) Cost: 31667.599609
Epoch    1/20 Hypothesis: tensor([62.7036, 78.6330, 75.7880, 82.2903, 61.0462]) Cost: 9926.265625
Epoch    2/20 Hypothesis: tensor([101.6124, 125.3983, 121.8667, 132.4688,  96.7163]) Cost: 3111.513916
Epoch    3/20 Hypothesis: tensor([123.3960, 151.5805, 147.6645, 160.5620, 116.6867]) Cost: 975.451355
Epoch    4/20 Hypothesis: tensor([135.5919, 166.2389, 162.1078, 176.2903, 127.8673]) Cost: 305.908539
Epoch    5/20 Hypothesis: tensor([142.4200, 174.4456, 170.1940, 185.0960, 134.1269]) Cost: 96.042496
Epoch    6/20 Hypothesis: tensor([146.2428, 179.0401, 174.7213, 190.0260, 137.6314]) Cost: 30.260748
Epoch    7/20 Hypothesis: tensor([148.3830, 181.6125, 177.2559, 192.7861, 139.5934]) Cost: 9.641701
Epoch    8/20 Hypothesis: tensor([149.5814, 183.0526, 178.6749, 194.3314, 140.6919]) Cost: 3.178671
Epoch    9/20 Hypothesis: tensor([150.2523, 183.8588, 179.4694, 195.1966, 141.3068]) Cost: 1.1