# Implementing Linear Regression Step by Step

You've seen the [theory](https://nickyfoto.github.io/blog/entries/linear-regression) behind linear regression; now let's see how to implement it.

In [1]:
# import necessary modules
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from evaluation import test
from utils import load_data, predict_image, scatter_plot, contour_plot
from utils import plot_boundary, load_cat_dataset, load_iris_2D
from utils import plot_costs

from sklearn import datasets
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

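# from lr import LogisticRegression
# Our own implementations. Note that this LinearRegression shadows the
# sklearn.linear_model.LinearRegression imported above; the sklearn version
# is still reachable as linear_model.LinearRegression.
from lm import LinearRegression, Ridge, SGDRegressor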

In this tutorial, you will learn:

- How to implement Least Mean Squares linear regression.
- How to implement stochastic gradient descent linear regression.
- How to evaluate the performance using coefficient of determination, usually denoted as $R^2$.
- How to implement normal equation linear regression.

## LMS algorithm

Taking an example from sklearn, we can see that the model recovers `coef_` and `intercept_` correctly. How does it do that? There are many ways for a model to learn the weights; the first one we introduce is the Least Mean Squares (LMS) update rule.

In [2]:
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
reg = linear_model.LinearRegression().fit(X, y)

In [3]:
reg.score(X, y)
reg.coef_, reg.intercept_ 
reg.predict(np.array([[3, 5]]))

1.0

(array([1., 2.]), 3.0000000000000018)

array([16.])

Given `X`, `coef_`, and `intercept_`, we predict new examples using

```python
np.dot(X, coef_) + intercept_
```

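Note that the only difference from logistic regression is the decision function. Here we drop the sigmoid function and use the linear output directly. Our own implementation below stores `coef_` as a row vector of shape `(1, n_features)`, hence the transpose: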

```python
np.dot(X, coef_.T) + intercept_
```
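Written out, one step of the update used in the next cell looks as follows. The symbols are assumptions chosen to mirror the code: $w$ for `coef_`, $b$ for `intercept_`, $\alpha$ for `learning_rate`, $m$ for the number of training examples, and $e$ for the vector of errors. The step is the gradient of the cost $J = \frac{1}{2m}\sum_{i=1}^{m} e_i^2$, scaled by $\alpha$.

$$
\begin{aligned}
e &= Xw + b - y \\
w &:= w - \frac{\alpha}{m} X^{\top} e \\
b &:= b - \frac{\alpha}{m} \sum_{i=1}^{m} e_i
\end{aligned}
$$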

In [4]:
m, n_features = X.shape
coef_ = np.zeros(shape=(1, n_features))
intercept_ = np.zeros(shape=(1,))
y.shape = (m, 1)
max_iter = 1000
learning_rate = 1e-1
for step in range(max_iter):  
    preds = np.dot(X, coef_.T) + intercept_
    error = preds - y
    gradient = np.dot(X.T, error) 
    coef_ -= learning_rate * gradient.T / m
    intercept_ -= learning_rate * error.sum() / m
coef_, intercept_

(array([[1.00081836, 1.99989287]]), array([2.99890748]))

We can now use the learned weights to predict new data and to compute $R^2$ on the training set.

In [5]:
np.dot(np.array([[3,5]]), coef_.T) + intercept_
r2_score(y_true=y.flatten(), y_pred = np.dot(X, coef_.T) + intercept_)

array([[16.0008269]])

0.9999999582739324
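The `r2_score` call above can be reproduced by hand. The following is a minimal sketch of the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$; the function name `r2_by_hand` and the demo arrays are hypothetical and only serve as illustration.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Same targets as the toy data set used in this tutorial.
y_demo = np.array([6.0, 8.0, 9.0, 11.0])

perfect = r2_by_hand(y_demo, y_demo)                               # exact fit
baseline = r2_by_hand(y_demo, np.full_like(y_demo, y_demo.mean())) # always predict the mean
```

A perfect fit scores 1, while a model that always predicts the mean of `y` scores 0, which is why values very close to 1 indicate that the learned weights explain nearly all of the variance.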

## Stochastic gradient descent linear regression

Note that even after 1000 iterations, the batch update above only approximates the true weights. Is there a better way? As with logistic regression, we can try stochastic gradient descent, which computes the prediction for a single example `x` at a time:

```python
pred = np.dot(x, coef_.T) + intercept_
```

In [6]:
coef_ = np.zeros(shape=(1, n_features))
max_iter = 600
intercept_ = np.zeros(shape=(1,))
for i in range(max_iter):
    for idx, x in enumerate(X):
        pred = np.dot(x, coef_.T) + intercept_
        error = pred - y[idx]
        gradient = x * error
        coef_ -= learning_rate * gradient.T
        intercept_ -= learning_rate * error
coef_, intercept_

(array([[1., 2.]]), array([3.]))

Because we update the weights after every training example, fewer passes over the data are needed to converge. Wrapping this functionality in our own `SGDRegressor`, we get

In [7]:
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
y
sgdreg = SGDRegressor(learning_rate=1e-1)
sgdreg.fit(X, y)
sgdreg.score(X, y)
sgdreg.coef_, sgdreg.intercept_ 
sgdreg.predict(np.array([[3, 5]]))

array([ 6,  8,  9, 11])

SGDRegressor(batch=False, c_lambda=0, fit_intercept=True, learning_rate=0.1,
             max_iter=1000, penalty=None)

1.0

(array([[1., 2.]]), array([3.]))

array([[16.]])
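The source of `lm.SGDRegressor` is not shown in this post. As a rough idea of what such a wrapper could look like, here is a minimal sketch that packages the per-example loop from the previous cell into a class. The name `TinySGDRegressor` and its methods are hypothetical; it omits the `batch`, `penalty`, `c_lambda`, and `fit_intercept` options that appear in the printed representation above, and it stores `coef_` as a 1-D array rather than a row vector.

```python
import numpy as np

class TinySGDRegressor:
    """Hypothetical minimal wrapper; NOT the actual lm.SGDRegressor source."""

    def __init__(self, learning_rate=1e-1, max_iter=1000):
        self.learning_rate = learning_rate
        self.max_iter = max_iter

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        self.coef_ = np.zeros(X.shape[1])
        self.intercept_ = 0.0
        for _ in range(self.max_iter):
            # One pass over the data, updating after every single example.
            for x_i, y_i in zip(X, y):
                error = x_i @ self.coef_ + self.intercept_ - y_i
                self.coef_ -= self.learning_rate * error * x_i
                self.intercept_ -= self.learning_rate * error
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def score(self, X, y):
        # Coefficient of determination R^2 on the given data.
        y = np.asarray(y, dtype=float).ravel()
        residual = y - self.predict(X)
        return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)

X_demo = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y_demo = X_demo @ np.array([1, 2]) + 3
tiny = TinySGDRegressor().fit(X_demo, y_demo)
```

Calling `tiny.predict([[3, 5]])` should then give a value close to 16, in line with the output of the real `SGDRegressor` above.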

## The normal equations

According to the [normal equation](https://nickyfoto.github.io/blog/entries/linear-regression#the-normal-equation), we can analytically solve for $\theta$.
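As a reminder of where the closed form comes from, here is a short derivation sketch. It assumes the cost $J(\theta) = \frac{1}{2}\lVert X\theta - y \rVert^2$, with $X$ already containing the column of ones for the intercept; the notation is an assumption and may differ slightly from the theory post.

$$
\begin{aligned}
\nabla_\theta J(\theta) &= X^{\top}(X\theta - y) = 0 \\
X^{\top}X\,\theta &= X^{\top}y \\
\theta &= (X^{\top}X)^{-1}X^{\top}y
\end{aligned}
$$

The cells below use `np.linalg.pinv` in place of the plain inverse, which gives the same result when $X^{\top}X$ is invertible and still returns a least-squares solution when it is not.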

In [8]:
intercept = np.ones((X.shape[0], 1))
X = np.hstack((intercept, X))
X

array([[1., 1., 1.],
       [1., 1., 2.],
       [1., 2., 2.],
       [1., 2., 3.]])

Before doing so, remember to add the intercept term (a column of ones) so that the fitted line does not have to pass through the origin.

In [9]:
theta = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
theta.T

array([[3., 1., 2.]])

Solving for $\theta$ analytically.

In [10]:
np.dot(X, theta).T

array([[ 6.,  8.,  9., 11.]])

Making predictions using $\theta$.
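As an alternative to forming $X^{\top}X$ explicitly, NumPy's `np.linalg.lstsq` solves the same least-squares problem directly. This is a different routine from the one used above, shown here only as a sketch for comparison; the variable names (`X_raw`, `X_aug`, `theta_lstsq`, `theta_pinv`) are hypothetical.

```python
import numpy as np

# Rebuild the toy data set so the example is self-contained.
X_raw = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y_demo = X_raw @ np.array([1.0, 2.0]) + 3.0

# Prepend the column of ones for the intercept, as in the cells above.
X_aug = np.hstack((np.ones((X_raw.shape[0], 1)), X_raw))

# Least-squares solve; rcond=None selects NumPy's current default cutoff.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(
    X_aug, y_demo, rcond=None
)

# Equivalent solution via the pseudo-inverse of X itself.
theta_pinv = np.linalg.pinv(X_aug) @ y_demo
```

Both `theta_lstsq` and `theta_pinv` should agree with the `theta` computed above (intercept first, then the two coefficients). Working with $X$ rather than $X^{\top}X$ avoids squaring the condition number, which matters more on larger, less well-behaved data than on this toy example.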