
# Linear Regression Model
- In this lab, we implement a linear regression model from scratch and train it with mini-batch stochastic gradient descent.

In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# Defining Linear Regression Model
- Each row of the matrix X holds the input features of one example.
- The vector w holds the weight for each feature.
- b is the bias, i.e. the output when all features are 0.
- Because w and b are the parameters we want to learn, they must be created with requires_grad=True.
- PyTorch can automatically compute gradients for tensors that have requires_grad=True.
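
As a small illustration of the last two points, the sketch below uses a hypothetical tensor `x` (not part of the lab's model) to show how autograd fills in `.grad` after `backward()` is called on a scalar. For f(x) = sum(x_i^2), the gradient is 2x.

```python
import torch

# Hypothetical tensor for illustration only; it is not one of the lab's parameters.
x = torch.tensor([1., 2., 3.], requires_grad=True)

# f(x) = sum(x_i^2), so df/dx_i = 2 * x_i
f = (x ** 2).sum()
f.backward()  # autograd computes df/dx and stores it in x.grad

print(x.grad)  # tensor([2., 4., 6.])
```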

In [2]:
# linear regression model y = Xw + b
def linreg(X, w, b):
    return torch.matmul(X,w) + b

In [3]:
X = torch.eye(3)
w = torch.normal(0,0.01,size=(3,1),requires_grad = True)
#w = torch.tensor([1.,1,1],requires_grad = True)
b = torch.zeros(1,requires_grad = True)

print("X:",X)
print("w:",w)
print("b:",b)

X: tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
w: tensor([[-0.0077],
        [-0.0122],
        [-0.0162]], requires_grad=True)
b: tensor([0.], requires_grad=True)


In [4]:
y = linreg(X,w,b)
print("y:",y)

y: tensor([[-0.0077],
        [-0.0122],
        [-0.0162]], grad_fn=<AddBackward0>)


# Defining Squared Loss
- In tensor operations, it is very important that the tensor shapes match.
- When the shapes do not match, broadcasting may be applied implicitly and produce results you did not intend.

In [5]:
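y_hat = torch.normal(0, 0.1, size=(3, 1))  # shape (3, 1)
y = torch.tensor([1., 2, 3])               # shape (3,)

# Shapes differ, so the subtraction broadcasts to (3, 3) instead of (3, 1)
print((y - y_hat) ** 2)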

tensor([[1.1650, 4.3237, 9.4824],
        [0.7369, 3.4537, 8.1706],
        [0.9049, 3.8075, 8.7100]])
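
The 3x3 result above comes from broadcasting a `(3,)` tensor against a `(3, 1)` tensor. A minimal sketch that makes the shapes explicit follows; it uses zeros for `y_hat` so the result is deterministic, and the names `mismatched` and `matched` are introduced here only for illustration.

```python
import torch

y_hat = torch.zeros(3, 1)        # column vector, shape (3, 1)
y = torch.tensor([1., 2., 3.])   # 1-D tensor, shape (3,)

# (3,) against (3, 1) broadcasts to (3, 3): every label is paired with every prediction.
mismatched = (y - y_hat) ** 2

# Reshaping y to y_hat's shape keeps the element-wise pairing, shape (3, 1).
matched = (y_hat - y.reshape(y_hat.shape)) ** 2

print(mismatched.shape)  # torch.Size([3, 3])
print(matched.shape)     # torch.Size([3, 1])
```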


In [6]:
y_hat = torch.normal(0, 0.1, size=(3, 1))
y = torch.tensor([1., 2, 3])

# Reshape y to match y_hat so the result keeps shape (3, 1)
print((y_hat - y.reshape(y_hat.shape)) ** 2 / 2)

tensor([[0.4614],
        [1.9234],
        [3.9592]])


In [7]:
def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape))**2 / 2

In [8]:
squared_loss(y_hat,y)

tensor([[0.4614],
        [1.9234],
        [3.9592]])

# Calculating Gradients
- Using the loss function defined above, we compute the gradients with respect to w and b.


In [9]:
y = torch.tensor([1.,2,3]) # label (ground truth)
X = torch.eye(3)
w = torch.tensor([1.,1,1],requires_grad = True)
b = torch.zeros(1,requires_grad = True)

print('# initial parameters')
print(w,b)
print('# initial gradients: None')
print(w.grad, b.grad)

y_hat = linreg(X,w,b)
print('# first prediction using initial parameters')
print(y_hat)

l = squared_loss(y_hat, y)
l.sum().backward()
print('# calculated loss')
print(l.sum())

print('# calculated gradients using the loss')
print(w.grad, b.grad)

# initial parameters
tensor([1., 1., 1.], requires_grad=True) tensor([0.], requires_grad=True)
# initial gradients: None
None None
# first prediction using initial parameters
tensor([1., 1., 1.], grad_fn=<AddBackward0>)
# calculated loss
tensor(2.5000, grad_fn=<SumBackward0>)
# calculated gradients using the loss
tensor([ 0., -1., -2.]) tensor([-3.])
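
These values can be checked by hand. For the summed squared loss, the gradient with respect to w is X^T (y_hat - y) and the gradient with respect to b is the sum of the residuals. The sketch below compares those closed-form expressions with what autograd returns; the names `residual`, `manual_w_grad`, and `manual_b_grad` are introduced here for illustration.

```python
import torch

y = torch.tensor([1., 2., 3.])
X = torch.eye(3)
w = torch.tensor([1., 1., 1.], requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Same model and loss as in the lab, written inline.
y_hat = torch.matmul(X, w) + b
loss = ((y_hat - y) ** 2 / 2).sum()
loss.backward()

# Closed-form gradients of the summed squared loss, computed without autograd.
with torch.no_grad():
    residual = torch.matmul(X, w) + b - y          # y_hat - y
    manual_w_grad = torch.matmul(X.T, residual)    # dL/dw = X^T (y_hat - y)
    manual_b_grad = residual.sum().reshape(1)      # dL/db = sum(y_hat - y)

print(torch.allclose(w.grad, manual_w_grad))
print(torch.allclose(b.grad, manual_b_grad))
```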


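- We use the computed gradients to update the parameters.
- After updating the parameters, the gradient of each parameter must be reset to zero, because backward() adds new gradients to whatever is already stored in .grad.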
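
To see why the reset matters, the following sketch calls `backward()` twice on a hypothetical parameter without zeroing in between, then once more after zeroing. The variable names `first`, `accumulated`, and `after_reset` are illustrative only.

```python
import torch

w = torch.tensor([1., 1., 1.], requires_grad=True)

# First backward pass: d/dw sum(2 * w) = 2 for every element.
(2 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(2 * w).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then run backward again: back to the single-pass value.
w.grad.zero_()
(2 * w).sum().backward()
after_reset = w.grad.clone()

print(first)        # tensor([2., 2., 2.])
print(accumulated)  # tensor([4., 4., 4.])
print(after_reset)  # tensor([2., 2., 2.])
```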

In [10]:
lr = 1
batch_size = 3
print([w,b])
print([w.grad,b.grad])
with torch.no_grad():
    w -= lr * w.grad / batch_size  # w = w - lr * w.grad / batch_size
    b -= lr * b.grad / batch_size  # b = b - lr * b.grad / batch_size
    w.grad.zero_()
    b.grad.zero_()
print([w,b])
print([w.grad,b.grad])

[tensor([1., 1., 1.], requires_grad=True), tensor([0.], requires_grad=True)]
[tensor([ 0., -1., -2.]), tensor([-3.])]
[tensor([1.0000, 1.3333, 1.6667], requires_grad=True), tensor([1.], requires_grad=True)]
[tensor([0., 0., 0.]), tensor([0.])]


# Defining Minibatch Stochastic Gradient Descent


In [11]:
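def sgd(params, lr, batch_size):
    """Minibatch SGD: step each parameter along its negative gradient."""
    # Updates must not be recorded in the autograd graph.
    with torch.no_grad():
        for param in params:
            # Gradients come from a summed loss, so divide by batch_size to average.
            param -= lr * param.grad / batch_size
            # Reset so the next backward() does not accumulate onto this gradient.
            param.grad.zero_()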

- The model architecture, loss function, and optimization algorithm needed to train the linear regression model are now all in place.
- We walk through the training process on a simple dataset of 3 examples.

In [12]:
lr = 1
batch_size = 3

y = torch.tensor([1.,2,3]) # label (ground truth)
X = torch.eye(3)
w = torch.tensor([1.,1,1],requires_grad = True)
b = torch.zeros(1,requires_grad = True)

print('# initial parameters')
print(w,b)

# 1st iteration
y_hat = linreg(X,w,b)
print('\n')
print('## first prediction using initial parameters ##')
print(y_hat)

l = squared_loss(y_hat, y)
l.sum().backward()
sgd([w,b],lr,batch_size)
print('# updated parameters')
print(w, b)

# 2nd iteration
y_hat = linreg(X,w,b)
print('\n')
print('## second prediction using the updated parameters ##')
print(y_hat)

l = squared_loss(y_hat, y)
l.sum().backward()
sgd([w,b],lr,batch_size)
print('# updated parameters')
print(w, b)


# 3rd iteration
y_hat = linreg(X,w,b)
print('\n')
print('## third prediction using the updated parameters ##')
print(y_hat)

l = squared_loss(y_hat, y)
l.sum().backward()
sgd([w,b],lr,batch_size)
print('# updated parameters')
print(w, b)


# 4th iteration
y_hat = linreg(X,w,b)
print('\n')
print('## fourth prediction using the updated parameters ##')
print(y_hat)

l = squared_loss(y_hat, y)
l.sum().backward()
sgd([w,b],lr,batch_size)
print('# updated parameters')
print(w, b)

# initial parameters
tensor([1., 1., 1.], requires_grad=True) tensor([0.], requires_grad=True)


## first prediction using initial parameters ##
tensor([1., 1., 1.], grad_fn=<AddBackward0>)
# updated parameters
tensor([1.0000, 1.3333, 1.6667], requires_grad=True) tensor([1.], requires_grad=True)


## second prediction using the updated parameters ##
tensor([2.0000, 2.3333, 2.6667], grad_fn=<AddBackward0>)
# updated parameters
tensor([0.6667, 1.2222, 1.7778], requires_grad=True) tensor([0.6667], requires_grad=True)


## third prediction using the updated parameters ##
tensor([1.3333, 1.8889, 2.4444], grad_fn=<AddBackward0>)
# updated parameters
tensor([0.5556, 1.2593, 1.9630], requires_grad=True) tensor([0.7778], requires_grad=True)


## fourth prediction using the updated parameters ##
tensor([1.3333, 2.0370, 2.7407], grad_fn=<AddBackward0>)
# updated parameters
tensor([0.4444, 1.2469, 2.0494], requires_grad=True) tensor([0.7407], requires_grad=True)
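
The four hand-written iterations above repeat the same three steps (predict, compute the loss and its gradients, update), so they could also be written as a loop. The sketch below is one way to do that under the same settings; `num_iters` and the loop variable `it` are names introduced here, and the three helper functions are repeated only so the block runs on its own.

```python
import torch

def linreg(X, w, b):
    return torch.matmul(X, w) + b

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

lr = 1
batch_size = 3
num_iters = 4  # hypothetical name; matches the four manual iterations

y = torch.tensor([1., 2., 3.])  # label (ground truth)
X = torch.eye(3)
w = torch.tensor([1., 1., 1.], requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for it in range(num_iters):
    y_hat = linreg(X, w, b)        # predict with the current parameters
    l = squared_loss(y_hat, y)     # per-example loss
    l.sum().backward()             # gradients of the summed loss
    sgd([w, b], lr, batch_size)    # update parameters and zero the gradients
    print(f"iteration {it + 1}: w={w.tolist()}, b={b.tolist()}")
```

With these settings the final parameters should agree with the values printed after the fourth manual iteration.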
