# Lab 2: Linear Regression

Edited By Steve Ive

Based on the notebook by SeungJae Lee:

https://github.com/deeplearningzerotoall/PyTorch/blob/master/lab-02_linear_regression.ipynb

## Theoretical Overview

$H(x)$: the hypothesis, i.e. the model's prediction for a given input $x$.

$cost(W,b)$: how well $H(x)$ predicts the target $y$; a lower cost means a better fit.

## Imports

In [291]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [292]:
torch.manual_seed(1)

<torch._C.Generator at 0x1f51b314470>

## Data

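PyTorch tensors conventionally put the batch dimension first (for images this is the NCHW format). Here each tensor has shape (3, 1): 3 samples with 1 feature each.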

In [293]:
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

In [294]:
print(x_train)
print(x_train.shape)

tensor([[1.],
        [2.],
        [3.]])
torch.Size([3, 1])


In [295]:
print(y_train)
print(y_train.shape)

tensor([[1.],
        [2.],
        [3.]])
torch.Size([3, 1])


## Weight Initialization

In [296]:
W = torch.zeros(1, requires_grad = True)
print(W)

tensor([0.], requires_grad=True)


In [297]:
b = torch.zeros(1, requires_grad = True)
print(b)

tensor([0.], requires_grad=True)


## Hypothesis

$H(x) = Wx + b$

In [298]:
hypothesis = x_train * W + b
print(hypothesis)

tensor([[0.],
        [0.],
        [0.]], grad_fn=<AddBackward0>)
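
The expression `x_train * W + b` relies on broadcasting: `W` and `b` have shape (1,), while `x_train` has shape (3, 1). The sketch below illustrates this with made-up parameter values (2.0 and 0.5 are assumptions chosen for readability, not values from this lab).

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1): 3 samples, 1 feature
W = torch.full((1,), 2.0)                # shape (1,), illustrative value
b = torch.full((1,), 0.5)                # shape (1,), illustrative value

# W and b are broadcast across the 3 rows of x.
hypothesis = x * W + b

print(hypothesis.shape)                 # torch.Size([3, 1])
print(hypothesis.squeeze(1).tolist())   # [2.5, 4.5, 6.5]
```

Each row is computed independently as $Wx + b$, so the result keeps the (3, 1) shape of the input.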


## Cost Function

$cost(W, b) = \frac{1}{m} \sum_{i=1}^{m} \left( H(x^{(i)}) - y^{(i)} \right)^2$

In [299]:
print(hypothesis)
print(y_train)
print(hypothesis - y_train)

tensor([[0.],
        [0.],
        [0.]], grad_fn=<AddBackward0>)
tensor([[1.],
        [2.],
        [3.]])
tensor([[-1.],
        [-2.],
        [-3.]], grad_fn=<SubBackward0>)


In [300]:
print((hypothesis - y_train)**2)

tensor([[1.],
        [4.],
        [9.]], grad_fn=<PowBackward0>)


In [301]:
cost = torch.mean((hypothesis - y_train)**2)
print(cost)

tensor(4.6667, grad_fn=<MeanBackward0>)


## Gradient Descent

In [302]:
optimizer = optim.SGD([W, b], lr=0.01)

In [303]:
optimizer.zero_grad()  # Clear gradients left over from any previous backward pass.
cost.backward()        # Compute the gradient of cost w.r.t. the graph leaves (W and b).
optimizer.step()       # Perform a single optimization step, updating W and b.

In [304]:
print(W)
print(b)

tensor([0.0933], requires_grad=True)
tensor([0.0400], requires_grad=True)
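
These values can be checked by hand. Differentiating the cost gives $\frac{\partial cost}{\partial W} = \frac{2}{m}\sum_{i=1}^{m}(H(x^{(i)}) - y^{(i)})\,x^{(i)}$ and $\frac{\partial cost}{\partial b} = \frac{2}{m}\sum_{i=1}^{m}(H(x^{(i)}) - y^{(i)})$, and plain SGD subtracts `lr` times each gradient. The following is a minimal sketch of that check, assuming the same data, zero initialization, and `lr=0.01` as above; the names `grad_W`, `grad_b`, `new_W`, and `new_b` are introduced here for illustration.

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[1.0], [2.0], [3.0]])
W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.01

# Gradients from autograd.
cost = torch.mean((x * W + b - y) ** 2)
cost.backward()

with torch.no_grad():
    # Gradients from the analytic formulas.
    error = x * W + b - y
    grad_W = 2 * torch.mean(error * x)   # 2/3 * (-1 - 4 - 9) = -9.3333
    grad_b = 2 * torch.mean(error)       # 2/3 * (-1 - 2 - 3) = -4.0

    # One plain SGD step: parameter - lr * gradient.
    new_W = W - lr * W.grad
    new_b = b - lr * b.grad

print(grad_W.item(), W.grad.item())
print(new_W.item(), new_b.item())
```

The analytic and autograd gradients agree, and the updated values match the `0.0933` and `0.0400` printed above.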


Now let's check whether the updated parameters give a better hypothesis.

In [305]:
hypothesis = x_train * W + b
print(hypothesis)

tensor([[0.1333],
        [0.2267],
        [0.3200]], grad_fn=<AddBackward0>)


In [306]:
cost = torch.mean((hypothesis - y_train) ** 2)
print(cost)

tensor(3.6927, grad_fn=<MeanBackward0>)


## Training with Full Code

In practice, we train on the dataset for multiple epochs, repeating the same three steps (hypothesis, cost, update) in a loop.



In [307]:
# Data
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

# Model initialization
W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Set optimizer
optimizer = optim.SGD([W, b], lr=0.01)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    # Hypothesis
    hypothesis = x_train * W + b

    # Cost
    cost = torch.mean((hypothesis - y_train)**2)

    # Update W and b
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Print logs every 100 epochs (W and b are post-update; cost is pre-update)
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} W: {:.3f}, b: {:.3f} Cost: {:.6f}'.format(epoch, nb_epochs, W.item(), b.item(), cost.item()))

Epoch    0/1000 W: 0.093, b: 0.040 Cost: 4.666667
Epoch  100/1000 W: 0.873, b: 0.289 Cost: 0.012043
Epoch  200/1000 W: 0.900, b: 0.227 Cost: 0.007442
Epoch  300/1000 W: 0.921, b: 0.179 Cost: 0.004598
Epoch  400/1000 W: 0.938, b: 0.140 Cost: 0.002842
Epoch  500/1000 W: 0.951, b: 0.110 Cost: 0.001756
Epoch  600/1000 W: 0.962, b: 0.087 Cost: 0.001085
Epoch  700/1000 W: 0.970, b: 0.068 Cost: 0.000670
Epoch  800/1000 W: 0.976, b: 0.054 Cost: 0.000414
Epoch  900/1000 W: 0.981, b: 0.042 Cost: 0.000256
Epoch 1000/1000 W: 0.985, b: 0.033 Cost: 0.000158


In [308]:
print(hypothesis)

tensor([[1.0186],
        [2.0040],
        [2.9894]], grad_fn=<AddBackward0>)
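
Once trained, the parameters can be used to predict $y$ for an unseen $x$. The sketch below is illustrative only: it hard-codes the rounded values `W = 0.985` and `b = 0.033` from the final log line above, and the input `x_new = 4` is a made-up example. Wrapping the computation in `torch.no_grad()` skips graph construction, which is not needed for inference.

```python
import torch

# Rounded parameters copied from the last log line (an assumption for illustration).
W = torch.tensor([0.985], requires_grad=True)
b = torch.tensor([0.033], requires_grad=True)

x_new = torch.tensor([[4.0]])  # hypothetical unseen input

# No gradient tracking is needed when only predicting.
with torch.no_grad():
    prediction = x_new * W + b

print(prediction)                 # close to 4, since the data follows y = x
print(prediction.requires_grad)   # False
```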


## High-level Implementation with ```nn.Module```

In [309]:
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

#### All models provided by PyTorch inherit from ```nn.Module```. Now we are going to build a linear regression model in the same way.

In [310]:
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

#### In the model's ```__init__```, we define the layers that will be used. Since we are building a linear regression model, we use ```nn.Linear```. In ```forward```, we specify how the model computes its output from the input.

In [311]:
model = LinearRegressionModel()

## Hypothesis

Now let's compute the hypothesis by calling the model.

The ***hypothesis*** is the return value of the ```forward()``` method of the ```nn.Module```, which here is the output of the linear layer.

### **Hypothesis === Forward**

In [312]:
hypothesis = model(x_train)
print(list(model.parameters()))

[Parameter containing:
tensor([[0.5153]], requires_grad=True), Parameter containing:
tensor([-0.4414], requires_grad=True)]


In [313]:
print(hypothesis)

tensor([[0.0739],
        [0.5891],
        [1.1044]], grad_fn=<AddmmBackward>)
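
To connect this with the manual version earlier, the output of `nn.Linear` can be reproduced from its `weight` and `bias` parameters, since the layer computes $xW^T + b$. The following is a small sketch of that check; the names `layer`, `layer_out`, and `manual` are introduced here for illustration, and the randomly initialized parameter values depend on the PyTorch version and seed.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
layer = nn.Linear(1, 1)  # in_features=1, out_features=1

x = torch.tensor([[1.0], [2.0], [3.0]])

# Output of the layer's forward pass.
layer_out = layer(x)

# Same computation written out by hand: x @ W^T + b.
manual = x @ layer.weight.t() + layer.bias

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
print(torch.allclose(layer_out, manual))     # True
```

With one input and one output feature, `weight` plays the role of $W$ and `bias` the role of $b$ from the first half of this lab.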


## Cost

Now let's compute the cost using MSE (mean squared error). PyTorch provides this as ```F.mse_loss```.

In [314]:
print(hypothesis)
print(y_train)

tensor([[0.0739],
        [0.5891],
        [1.1044]], grad_fn=<AddmmBackward>)
tensor([[1.],
        [2.],
        [3.]])


In [315]:
cost = F.mse_loss(hypothesis, y_train)

In [316]:
print(cost)

tensor(2.1471, grad_fn=<MseLossBackward>)
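
`F.mse_loss` averages the squared errors by default, which is the same quantity as the `torch.mean((hypothesis - y_train)**2)` expression used earlier. The sketch below illustrates this, along with the `reduction` argument, using the all-zero predictions from the start of this lab rather than the model output above.

```python
import torch
import torch.nn.functional as F

pred = torch.zeros(3, 1)
target = torch.tensor([[1.0], [2.0], [3.0]])

# Default reduction='mean': average of the squared errors.
mean_loss = F.mse_loss(pred, target)

# Hand-written equivalent.
manual = torch.mean((pred - target) ** 2)

# Other reductions: per-element squared errors, and their sum.
per_element = F.mse_loss(pred, target, reduction='none')
total = F.mse_loss(pred, target, reduction='sum')

print(mean_loss.item(), manual.item())  # both approximately 4.6667
print(per_element.tolist())             # [[1.0], [4.0], [9.0]]
print(total.item())                     # 14.0
```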


## Gradient Descent

Let's reduce the cost with an optimizer provided by PyTorch. You can use any of the optimizers in ```torch.optim```. Here, we will use SGD.

In [317]:
optimizer = optim.SGD(model.parameters(), lr=0.01)

In [318]:
optimizer.zero_grad()
cost.backward()
optimizer.step()
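
The call to `zero_grad()` matters because `backward()` adds to any gradient already stored on a parameter instead of overwriting it. The sketch below demonstrates that accumulation on a single weight; it clears the gradient directly with `W.grad.zero_()` to keep the example small, whereas the lab uses `optimizer.zero_grad()` for the same purpose. The helper `compute_cost` is a name introduced here for illustration.

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[1.0], [2.0], [3.0]])
W = torch.zeros(1, requires_grad=True)

def compute_cost():
    # MSE of a bias-free hypothesis, enough to show accumulation.
    return torch.mean((x * W - y) ** 2)

compute_cost().backward()
first = W.grad.clone()        # gradient from one backward pass

compute_cost().backward()     # no reset in between
accumulated = W.grad.clone()  # twice the single-pass gradient

W.grad.zero_()                # reset the stored gradient
compute_cost().backward()
fresh = W.grad.clone()        # back to the single-pass gradient

print(first.item(), accumulated.item(), fresh.item())
```

Without the reset, each step would use a sum of stale and current gradients, so the effective update would grow with every iteration.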

## Training with Full Code

Now that we understand each step of linear regression, let's put them together and fit the model.

In [321]:
# Data
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

# Initialize the model
model = LinearRegressionModel()

# Set the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    # Hypothesis
    pred = model(x_train)

    # Cost
    cost = F.mse_loss(pred, y_train)

    # Minimize the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # Print logs every 100 epochs
    if epoch % 100 == 0:
        params = list(model.parameters())
        W = params[0].item()
        b = params[1].item()
        print('Epoch {:4d}/{} W: {: .3f}, b: {: .3f} Cost: {: .6f}'.format(
            epoch, nb_epochs, W, b, cost.item()
        ))

Epoch    0/1000 W: -0.114, b:  0.547 Cost:  4.589475
Epoch  100/1000 W:  0.700, b:  0.683 Cost:  0.067199
Epoch  200/1000 W:  0.764, b:  0.537 Cost:  0.041525
Epoch  300/1000 W:  0.814, b:  0.422 Cost:  0.025660
Epoch  400/1000 W:  0.854, b:  0.332 Cost:  0.015856
Epoch  500/1000 W:  0.885, b:  0.261 Cost:  0.009798
Epoch  600/1000 W:  0.910, b:  0.205 Cost:  0.006055
Epoch  700/1000 W:  0.929, b:  0.161 Cost:  0.003741
Epoch  800/1000 W:  0.944, b:  0.127 Cost:  0.002312
Epoch  900/1000 W:  0.956, b:  0.100 Cost:  0.001429
Epoch 1000/1000 W:  0.966, b:  0.078 Cost:  0.000883
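
For a one-feature problem like this, the values that gradient descent is approaching can also be computed directly with the ordinary least-squares formulas $W = \frac{\sum_i (x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_i (x^{(i)} - \bar{x})^2}$ and $b = \bar{y} - W\bar{x}$. This closed form is not part of the original lab; the sketch below is added only as a sanity check on the training logs, using plain Python on the same three data points.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)

x_mean = sum(xs) / m
y_mean = sum(ys) / m

# Slope: covariance of x and y divided by the variance of x.
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
denominator = sum((x - x_mean) ** 2 for x in xs)
W = numerator / denominator

# Intercept: the fitted line passes through the point of means.
b = y_mean - W * x_mean

print(W, b)  # 1.0 0.0
```

The exact solution is $W = 1$, $b = 0$, which is where both training runs above are heading: $W$ rises toward 1 and $b$ decays toward 0 as the cost shrinks.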
