# Linear Regression with Full-Batch Gradient Descent

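* Linear regression is a regression analysis technique that models the linear relationship between a dependent variable $y$ and one or more independent (explanatory) variables $X$. With a single explanatory variable it is called simple linear regression; with two or more it is called multiple linear regression. [Reference: Wikipedia](https://ko.wikipedia.org/wiki/선형_회귀)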

$$y_{\textrm{pred}} = \boldsymbol{w}^{\top}\boldsymbol{x} + b$$
or
$$y_{\textrm{pred}} = w_{0} + w_{1} x_{1} + w_{2} x_{2} + \cdots + w_{d} x_{d},$$
where $w_{0} = b$.

* $\mathbf{x} = [x_{1}, x_{2}, \cdots, x_{d}]^{\top}$
* $\mathbf{w} = [w_{1}, w_{2}, \cdots, w_{d}]^{\top}$
* Loss function: $\mathcal{L} = \sum_{i=1}^{N} (y_{\textrm{pred}, i} - y_{i})^{2}$
  * where $N$ is the number of examples
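
The model and loss above can be evaluated directly with NumPy. The following is a minimal sketch on hypothetical toy data; the helper names `predict` and `sum_squared_error` are illustrative and not part of this notebook.

```python
import numpy as np

def predict(w, b, X):
    """Return w^T x + b for every row x of X (X has shape (N, d))."""
    return X @ w + b

def sum_squared_error(y_pred, y):
    """Sum of squared residuals over all N examples."""
    return np.sum((y_pred - y) ** 2)

# Hypothetical toy data: N = 3 examples, d = 2 features.
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
y = np.array([3.0, 0.0, 6.0])
w = np.array([2.0, -1.0])
b = 1.0

y_pred = predict(w, b, X)            # [1., 0., 8.]
loss = sum_squared_error(y_pred, y)  # (-2)^2 + 0^2 + 2^2 = 8
print(y_pred, loss)
```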

## Import

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import LogNorm

## Data generation

In [None]:
np.random.seed(219)
N = 200
a = 4
b = -3
low = -3.0
high = 4.0
data_x = np.random.uniform(low=low, high=high, size=N)
data_y = np.zeros(N)
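for i, x in enumerate(data_x):
  # Noise standard deviation depends on x: largest mid-range, 1.5 at both ends
  scale = - (x - low) * (x - high) / 3. + 1.5
  # Draw a scalar sample (no size=1) so a float, not a length-1 array, is assigned
  data_y[i] = a * x + b + np.random.normal(loc=0.0, scale=scale)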

### Data plot

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.axhline(0, color='black', lw=1)
plt.axvline(0, color='black', lw=1)
plt.show()

## Exact solution of linear regression

The linear regression model is
$$y_{\textrm{pred}} = \boldsymbol{w}^{\top}\boldsymbol{x} + b$$
or
$$y_{\textrm{pred}} = w_{0} + w_{1} x_{1} + w_{2} x_{2} + \cdots + w_{d} x_{d},$$
where $w_{0} = b$.

We can extend this class of models by considering linear combinations of fixed nonlinear functions of the input variables:

$$y_{\textrm{pred}} = w_{0} + w_{1} \phi_{1}(\mathbf{x}) + w_{2} \phi_{2}(\mathbf{x}) + \cdots + w_{M-1} \phi_{M-1}(\mathbf{x}),$$

$$y_{\textrm{pred}} = w_{0} + \sum_{j=1}^{M-1} w_{j} \phi_{j}(\mathbf{x}).$$

Each $\phi_{j}(\mathbf{x})$ is called a *basis function*.
Adding a dummy basis function $\phi_{0}(\mathbf{x}) = 1$ absorbs the bias $w_{0}$ into the sum, so that

$$y_{\textrm{pred}} = \sum_{j=0}^{M-1} w_{j} \phi_{j}(\mathbf{x}) = \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}).$$
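
As an illustration of the basis-function form, here is a small sketch that assumes a polynomial basis $\phi_{j}(x) = x^{j}$ for a scalar input. The basis choice, the function name `phi`, and the weights are hypothetical; the notebook itself only needs $\phi_{0}(x) = 1$ and $\phi_{1}(x) = x$.

```python
import numpy as np

def phi(x, M):
    """Polynomial basis vector [x**0, x**1, ..., x**(M-1)]; phi_0 = 1 is the dummy basis."""
    return np.array([x ** j for j in range(M)])

w = np.array([1.0, -2.0, 0.5])  # hypothetical weights w_0, w_1, w_2
x = 2.0

features = phi(x, M=3)  # [1., 2., 4.]
y_pred = w @ features   # 1*1 - 2*2 + 0.5*4 = -1
print(features, y_pred)
```

The model stays linear in $\mathbf{w}$ even though it is nonlinear in $x$, which is why the normal equation still applies.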

### Normal Equation (analytic solution of linear model)

$$\mathbf{w} = (\mathbf{\Phi}^{\top} \mathbf{\Phi})^{-1} (\mathbf{\Phi}^{\top} \mathbf{Y})$$

where $\mathbf{\Phi}$ is an $N \times M$ matrix called the *design matrix*:

$$\mathbf{\Phi} = \left( \begin{array}{cccc}
\phi_{0}(\mathbf{x}_{1}) & \phi_{1}(\mathbf{x}_{1}) & \cdots & \phi_{M-1}(\mathbf{x}_{1})\\
\phi_{0}(\mathbf{x}_{2}) & \phi_{1}(\mathbf{x}_{2}) & \cdots & \phi_{M-1}(\mathbf{x}_{2})\\
\vdots & \vdots & \ddots & \vdots\\
\phi_{0}(\mathbf{x}_{N}) & \phi_{1}(\mathbf{x}_{N}) & \cdots & \phi_{M-1}(\mathbf{x}_{N})
\end{array} \right).$$

and $\mathbf{Y}$ is the target vector (label data):
* $\mathbf{Y} = [y_{1}, y_{2}, \cdots, y_{N}]^{\top}$
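
The normal equation can be checked on a tiny example. The sketch below uses hypothetical noise-free data generated from $y = 1 + 2x$ and compares the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq`, which are usually preferred numerically because they avoid forming the inverse. The columns follow the $[\phi_{0}, \phi_{1}] = [1, x]$ order of the formula above.

```python
import numpy as np

# Hypothetical noise-free data from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * x

# Design matrix with phi_0(x) = 1 and phi_1(x) = x, shape (N, M) = (4, 2).
Phi = np.stack([np.ones_like(x), x], axis=1)

A = Phi.T @ Phi  # Phi^T Phi, shape (2, 2)
B = Phi.T @ Y    # Phi^T Y, shape (2,)

w_inv = np.linalg.inv(A) @ B                       # literal normal equation
w_solve = np.linalg.solve(A, B)                    # solves A w = B without inverting A
w_lstsq, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # least squares on Phi directly

print(w_inv, w_solve, w_lstsq)
```

All three recover $[w_{0}, w_{1}] = [1, 2]$ up to floating-point error.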

### Problem

* Our problem is the following:
$$y_{\textrm{pred}} = w_{0} + w_{1} x,$$

$$y_{\textrm{pred}} = \sum_{j=0}^{M-1} w_{j} \phi_{j}(\mathbf{x}) = \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}).$$

* Write out $\mathbf{\Phi}$ for our problem yourself.
  1. First, write out $\phi_{0}(\mathbf{x})$ and $\phi_{1}(\mathbf{x})$.
  2. Then think about what the design matrix $\mathbf{\Phi}$ looks like.

### Building $\Phi$ and $\mathbf{Y}$
* The variable `X` stands for $\Phi$
* The variable `Y` stands for $\mathbf{Y}$

In [None]:
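# One possible solution. The columns are ordered [x, 1] so that
# W_exact = [w, b], which is the order the print statements below expect.
X = np.stack([data_x, np.ones(N)], axis=1)  # shape: (N, 2)
Y = np.expand_dims(data_y, axis=1)          # shape: (N, 1)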

### Implementing $\mathbf{w} = (\mathbf{\Phi}^{\top} \mathbf{\Phi})^{-1} (\mathbf{\Phi}^{\top} \mathbf{Y})$
* A: $\mathbf{\Phi}^{\top} \mathbf{\Phi}$
* invA: inverse of A
* B: $\mathbf{\Phi}^{\top} \mathbf{Y}$
* W_exact: $\mathbf{w}$ with shape: (2,)

In [None]:
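%%time
# One possible solution of the normal equation
A = np.matmul(X.T, X)          # Phi^T Phi, shape: (2, 2)
invA = np.linalg.inv(A)        # inverse of A
B = np.matmul(X.T, Y)          # Phi^T Y
W_exact = np.matmul(invA, B)
W_exact = np.squeeze(W_exact)  # shape: (2,)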
minima = W_exact.reshape(2, 1)

In [None]:
print("Real parameters used creating the data")
print("w: {:.4f}  b: {:.4f}".format(a, b))
print("Exact Solution using the normal equation")
print("w: {:.4f}  b: {:.4f}".format(W_exact[0], W_exact[1]))

Result
```shell
Real parameters used creating the data
w: 4.0000  b: -3.0000
Exact Solution using the normal equation
w: 4.1533  b: -3.2426
```

## Training Pseudo Code using Gradient Descent

```python
for epoch in max_epochs: # 1 epoch: one full pass over all N training examples
  for step in num_batches: # num_batches = int(data_size / batch_size)
    1. sampling mini-batches with batch_size
      1-1. data augmentation (if needed)
    2. calculate the logits # logits = f(x)
    3. calculate the loss # loss = loss(logits, labels)
    4. calculate the gradient with respect to weights
    5. update weights
```
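
The pseudocode above can be turned into a runnable loop. The following is a minimal sketch for a one-dimensional linear model, assuming hypothetical noise-free data, a half mean squared error loss, and illustrative hyperparameters; here the "logits" are simply the predictions `y_pred`. Setting `batch_size = len(x)` would give the full-batch case used in the rest of this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free data from y = 4x - 3, so the minimum is w = 4, b = -3.
x = rng.uniform(-1.0, 1.0, size=64)
y = 4.0 * x - 3.0

w, b = 0.0, 0.0
lr, batch_size, max_epochs = 0.5, 16, 200
num_batches = len(x) // batch_size

for epoch in range(max_epochs):
    order = rng.permutation(len(x))  # reshuffle once per epoch
    for step in range(num_batches):
        # 1. sample a mini-batch
        idx = order[step * batch_size:(step + 1) * batch_size]
        xb, yb = x[idx], y[idx]
        # 2. predictions ("logits")
        y_pred = w * xb + b
        # 3. loss (half mean squared error)
        loss = 0.5 * np.mean((y_pred - yb) ** 2)
        # 4. gradients with respect to w and b
        dw = np.mean((y_pred - yb) * xb)
        db = np.mean(y_pred - yb)
        # 5. gradient descent update
        w -= lr * dw
        b -= lr * db

print(w, b)
```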

## Build a LinearRegression model
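
For reference, the gradients that `loss_derivative` has to return follow from the chain rule. The derivation below assumes the half mean squared error that `loss_for_plot` already uses; a different normalization of the loss would only rescale the gradients.

$$\mathcal{L}(w, b) = \frac{1}{2N} \sum_{i=1}^{N} (w x_{i} + b - y_{i})^{2}$$

$$\frac{\partial \mathcal{L}}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} (w x_{i} + b - y_{i})\, x_{i}, \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (w x_{i} + b - y_{i})$$

Gradient descent then updates both parameters with the learning rate $\eta$:

$$w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial \mathcal{L}}{\partial b}$$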

In [None]:
class LinearRegression(object):
  def __init__(self, data_x, data_y, w_init=None, b_init=None, learning_rate=0.1):
    scale = 4.0
    if w_init is not None:
      self.w = w_init
    else:
      self.w = np.random.uniform(low=a-scale, high=a+scale)
    if b_init is not None:
      self.b = b_init
    else:
      self.b = np.random.uniform(low=b-scale, high=b+scale)
    print("w_init: {:.3f}".format(self.w))
    print("b_init: {:.3f}".format(self.b))
      
    self.x = data_x
    self.y = data_y
    self.lr = learning_rate
  
  def inference(self, x):
    """Inference function for a linear model
      y_pred = w * x + b.
    
    Args:
      x: full-batch data, shape: (1-rank Tensor (vector) np.array)
    
    Returns:
      y_pred: full-batch y_pred, shape: (1-rank Tensor (vector) np.array)
    """
    # Linear model applied element-wise to the full batch
    y_pred = self.w * x + self.b
    return y_pred
  
  def loss_for_plot(self, w, b):
    """List of loss function with respect to given list of (w, b).
      
    Args:
      w: shape: (1-rank Tensor (vector) np.array)
      b: shape: (1-rank Tensor (vector) np.array)
    
    Returns:
      loss_for_plot: shape: (1-rank Tensor (vector) np.array)
    """
    y_pred = np.matmul(np.expand_dims(self.x, axis=1), np.expand_dims(w, axis=0)) + b
    loss_for_plot = 0.5 * (y_pred - np.expand_dims(self.y, axis=1))**2
    loss_for_plot = np.mean(loss_for_plot, axis=0)
    return loss_for_plot
  
  def loss_fn(self, labels, predictions):
    """Loss function.
    
    Args:
      labels: target data y, shape: (1-rank Tensor (vector) np.array)
      predictions: model inference y_pred, shape: (1-rank Tensor (vector) np.array)
    
    Returns:
      loss: mean value of loss for full-batch data, shape: (0-rank Tensor (scalar))
    """
    # Half mean squared error, consistent with loss_for_plot
    loss = 0.5 * (predictions - labels)**2
    loss = np.mean(loss)
    return loss
  
  def loss_derivative(self):
    """Loss derivative.
    
    Returns:
      dw: dL / dw, mean value of derivatives for full-batch data, shape: (0-rank Tensor (scalar))
      db: dL / db, mean value of derivatives for full-batch data, shape: (0-rank Tensor (scalar))
    """
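    # Gradients of the half mean squared error; uses self.y_pred set in train()
    dw = np.mean((self.y_pred - self.y) * self.x)
    db = np.mean(self.y_pred - self.y)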
    return dw, db
  
  def weights_update(self):
    """Weights update using Gradient descent.
    
      w' = w - lr * dL/dw
    """
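    # Step against the gradient computed by loss_derivative()
    self.w = self.w - self.lr * self.dw
    self.b = self.b - self.lr * self.db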
    
  def train(self, max_epochs):
    self.loss_history = []
    self.w_history = []
    self.b_history = []
    pre_loss = 0.0
    for epoch in range(max_epochs):
      # Forward pass and loss on the full batch
      self.y_pred = self.inference(self.x)
      self.loss = self.loss_fn(self.y, self.y_pred)
      
      self.loss_history.append(self.loss)
      self.w_history.append(self.w)
      self.b_history.append(self.b)
      #print("epochs: {}  loss: {:.6f}  w: {:.4f}  b: {:.4f}".format(epoch, self.loss, self.w, self.b))
      
      self.dw, self.db = self.loss_derivative()
      self.weights_update()
      
      if np.abs(pre_loss - self.loss) < 1e-6:
        # Converged: record the loss at the updated weights, then stop
        self.loss = self.loss_fn(self.y, self.inference(self.x))
        self.loss_history.append(self.loss)
        self.w_history.append(self.w)
        self.b_history.append(self.b)
        print("epochs: {}  loss: {:.6f}  w: {:.4f}  b: {:.4f}".format(epoch+1, self.loss, self.w, self.b))
        break
      pre_loss = self.loss
    
    self.w_history = np.array(self.w_history)
    self.b_history = np.array(self.b_history)
    self.path = np.concatenate((np.expand_dims(self.w_history, 1), np.expand_dims(self.b_history, 1)), axis=1).T

### Create a `LinearRegression` instance

In [None]:
model = LinearRegression(data_x, data_y, w_init=1., b_init=0., learning_rate=0.3)
#model = LinearRegression(data_x, data_y, w_init=1., b_init=-3., learning_rate=0.1)
#model = LinearRegression(data_x, data_y, w_init=7., b_init=-6., learning_rate=0.1)
#model = LinearRegression(data_x, data_y, w_init=7., b_init=0., learning_rate=0.3)
#model = LinearRegression(data_x, data_y, w_init=4., b_init=0., learning_rate=0.1)
#model = LinearRegression(data_x, data_y, w_init=None, b_init=None, learning_rate=0.1)

### Training

In [None]:
%%time
model.train(100)

### Results

In [None]:
print("Real parameters used creating the data")
print("w: {:.4f}  b: {:.4f}".format(a, b))
print("Exact Solution using the normal equation")
print("w: {:.4f}  b: {:.4f}".format(W_exact[0], W_exact[1]))
print("Solution using the gradient descent")
print("w: {:.4f}  b: {:.4f}".format(model.w, model.b))

### Loss function plot

In [None]:
#Plot the loss function
plt.title('Loss Function L')
plt.xlabel('Number of epochs')
plt.ylabel('Loss')
plt.plot(model.loss_history)
plt.show()

### Plot the data with our model

In [None]:
plt.plot(data_x, data_y, 'ro', label='Real data')
plt.plot(data_x, model.w * data_x + model.b, lw=5, label='model')
plt.axhline(0, color='black', lw=1)
plt.axvline(0, color='black', lw=1)
plt.legend()
plt.show()

## Plot the loss surface

In [None]:
# putting together our points to plot in a 3D plot
number_of_points = 50
margin = 4.
w_min = a - margin
w_max = a + margin
b_min = b - margin
b_max = b + margin
w_points = np.linspace(w_min, w_max, number_of_points) 
b_points = np.linspace(b_min, b_max, number_of_points)
w_mesh, b_mesh = np.meshgrid(w_points, b_points)
loss_ = np.array([model.loss_for_plot(wps, bps) for wps, bps in zip(w_mesh, b_mesh)])

### 3D plot with learning path

In [None]:
#%matplotlib inline
#%matplotlib notebook
#%pylab

path = model.path

fig = plt.figure(figsize=(10, 8))
ax = plt.axes(projection='3d', elev=40, azim=-100)

ax.plot_surface(w_mesh, b_mesh, loss_, norm=LogNorm(), rstride=1, cstride=1, 
                edgecolor='none', alpha=.8, cmap=plt.cm.jet)

ax.plot(*minima, model.loss_for_plot(*minima), 'r*', markersize=20)
ax.quiver(path[0,:-1], path[1,:-1], model.loss_for_plot(*path[::,:-1]),
          path[0,1:]-path[0,:-1], path[1,1:]-path[1,:-1],
          model.loss_for_plot(*path[::,1:]) - model.loss_for_plot(*path[::,:-1]),
          color='k', length=0.2, normalize=True)

ax.set_xlabel('w')
ax.set_ylabel('b')
ax.set_zlabel('loss')

ax.set_xlim((w_min, w_max))
ax.set_ylim((b_min, b_max))

#plt.draw()
plt.show()

### Contour plot with learning path

In [None]:
fig, ax = plt.subplots(figsize=(10, 8))

ax.contour(w_mesh, b_mesh, loss_, levels=np.logspace(-1, 2, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.plot(*minima, 'r*', markersize=20)
ax.quiver(path[0,:-1], path[1,:-1], path[0,1:]-path[0,:-1], path[1,1:]-path[1,:-1],
          scale_units='xy', angles='xy', scale=1, color='k')

ax.set_xlabel('w')
ax.set_ylabel('b')

ax.set_xlim((w_min, w_max))
ax.set_ylim((b_min, b_max))

plt.show()