# LinearRegression from scratch

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

## Algorithm
**Input:**  
- `X`: an array of shape `(N,d)` whose rows are samples and columns are features
- `y`: the labels of shape `(N,)`
- `fit_intercept`: whether to calculate the intercept or not  
- `algorithm`: `"projection"` or `"grad_descent"`
- `learning_rate`: learning rate for the gradient descent algorithm
- `n_iter`: number of iterations for the gradient descent algorithm 

**Output:**  
A tuple `(predict, coef, intercept)`.  
- `predict`: a function that takes some samples `X_test` and returns the prediction `X_test.dot(coef) + intercept` 
- `coef`: an array of shape `(d,)` that stores the coefficients
- `intercept`: a float for the intercept

**Steps:**
1. If `fit_intercept`, let $A$ be the matrix obtained from $X$ by adding a column of ones on the left; otherwise, let $A = X$ (make a copy).  Let `dp` be the number of columns of $A$.
2. If `algorithm=="projection"`, compute ${\bf v} = (A^\top A)^{-1}A^\top {\bf y}$.
3. If `algorithm=="grad_descent"`, run the gradient descent algorithm as follows:
    1. Pick a random vector ${\bf v}$ of shape `(dp,)` .
    2. Calculate the gradient $\nabla = \frac{2}{N}(A{\bf v} - {\bf y})^\top A$.
    3. Update ${\bf v}$ by ${\bf v} - \alpha\nabla$, where $\alpha$ is the `learning_rate`.
    4. Repeat sub-steps 2 and 3 so that they run `n_iter` times in total.
4. If `fit_intercept`, let `coef` be ${\bf v}[1:]$ and `intercept` be ${\bf v}[0]$; otherwise, let `coef` be ${\bf v}$ and `intercept` be 0.  
5. Define `predict` as a function that sends `X_test` to `X_test.dot(coef) + intercept`.
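
For reference, both the projection formula and the gradient used in the steps above can be traced back to the mean square error. The following is a short derivation sketch, assuming $A^\top A$ is invertible; the name $c$ for the cost is the one used later in Exercise 3.

$$
\begin{aligned}
c({\bf v}) &= \frac{1}{N}\|A{\bf v} - {\bf y}\|^2 = \frac{1}{N}(A{\bf v} - {\bf y})^\top(A{\bf v} - {\bf y}),\\
\nabla c({\bf v}) &= \frac{2}{N}(A{\bf v} - {\bf y})^\top A,\\
\nabla c({\bf v}) = {\bf 0} &\iff A^\top A{\bf v} = A^\top{\bf y} \iff {\bf v} = (A^\top A)^{-1}A^\top{\bf y}.
\end{aligned}
$$

So the projection algorithm solves $\nabla c = {\bf 0}$ directly, while gradient descent approaches the same minimizer step by step.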

## Pseudocode
Translate the algorithm into pseudocode.  
This helps you identify the parts that you do not yet know how to implement.  

    1. 
    2. 
    3. ...

## Code

In [None]:
### your answer here

def LR(X, y, algorithm="projection", learning_rate=.001, n_iter=100, fit_intercept=True):
    '''Linear Regression. '''
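    N = X.shape[0]
    # prepend a column of ones so that v[0] plays the role of the intercept
    A = np.c_[np.ones(N), X] if fit_intercept else X.copy()
    
    if algorithm=="projection":
        # normal equations: v = (A^T A)^{-1} A^T y
        v = np.linalg.inv(A.T@A)@A.T@y
    elif algorithm=="grad_descent":
        v = np.random.rand(A.shape[1],)
        for i in range(n_iter):
            # gradient of the mean square error (1/N)*||Av - y||^2
            d = 2/N*(A@v-y).T@A
            v = v - learning_rate*d
    else:
        raise ValueError('algorithm must be "projection" or "grad_descent"')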
    
    (coef, intercept) = (v[1:], v[0]) if fit_intercept else (v, 0)
    predict = lambda X_test: X_test@coef + intercept
    
    return predict, coef, intercept

## Test
Take some sample data from [LinearRegression-with-scikit-learn](LinearRegression-with-scikit-learn.ipynb) and check whether your code produces outputs similar to those of the existing packages.

##### Name of the data
Description of the data.

##### Jephian:
You are supposed to give a brief description of the data, e.g., the number of samples, the features, and what it looks like.

In [None]:
### results with your code

x = np.arange(10)
X = x[:, None]
y = 0.5 * x + 3 + 0.3*np.random.randn(10)
X_test = np.linspace(0,10,20)[:, None]

result = LR(X, y)
print("coef = ", result[1])
print("intercept = ", result[2])
y_new = result[0](X_test)

%matplotlib inline
plt.scatter(X,y)
plt.plot(X_test, y_new,c='r')


In [None]:
### results with existing packages

model = LinearRegression()

model.fit(X, y)
print("coef = ", model.coef_)
print("intercept = ", model.intercept_)
y_new = model.predict(X_test)

%matplotlib inline
plt.scatter(x,y)
plt.plot(X_test,y_new,c = 'r')


## Comparison

##### Exercise 1
Set `algorithm="projection"` .  
Let  
```python
x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)
```
Apply your code to `X1` with `fit_intercept=True` and obtain `(predict1, coef1, intercept1)` .  
Apply your code to `X2` with `fit_intercept=False` and obtain `(predict2, coef2, intercept2)` .  
What is the relation between `coef1`, `intercept1` and `coef2` ?

In [None]:
x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)

In [None]:
### your answer here

_, coef1, intercept1 = LR(X1, y)
_, coef2, intercept2 = LR(X2, y, fit_intercept=False)
print("projection:")
print("  coef1 = ", coef1)
print("  intercept1 = ", intercept1)


print("projection without intercept:")
print("  coef2 = ", coef2)
print("  intercept2 = ", intercept2)


intercept1 = coef2[0]


coef1 = coef2[1]

##### Jephian:
More precisely, it should be `coef1 = coef2[1:]` .
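
The relation can also be checked numerically. The snippet below is a minimal sketch, not the `LR` function itself: it assumes that `fit_intercept=True` amounts to prepending a column of ones, and it uses `np.linalg.lstsq` in place of the explicit inverse. The seed and the names `A1`, `v1`, `v2` are chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)

# what fit_intercept=True does to X1: prepend a column of ones
A1 = np.c_[np.ones(X1.shape[0]), X1]
same_matrix = np.array_equal(A1, X2)

# least-squares solutions of both systems
v1 = np.linalg.lstsq(A1, y, rcond=None)[0]
v2 = np.linalg.lstsq(X2.astype(float), y, rcond=None)[0]
intercept1, coef1 = v1[0], v1[1:]
coef2 = v2

print("same matrix:", same_matrix)
print("intercept1 == coef2[0]:", np.isclose(intercept1, coef2[0]))
print("coef1 == coef2[1:]:", np.allclose(coef1, coef2[1:]))
```

Since the two matrices coincide, the two fits solve the same least-squares problem, which is why `intercept1` matches `coef2[0]` and `coef1` matches `coef2[1:]`.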

##### Exercise 2
Let  
```python
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)
```

In [None]:
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)

###### 2(a)
Apply the linear regression algorithm to `X`  
1. by your code with `algorithm="projection"` ,  
2. by your code with `algorithm="grad_descent"` ,  
3. by `sklearn.linear_model.LinearRegression` .  
Check if the outputs are almost the same (up to some numerical errors).

In [None]:
### your answer here

_ , coef, intercept = LR(X, y, algorithm="projection")
print("projection:")
print("  coef =", coef)
print("  intercept =", intercept)

_ , coef, intercept = LR(X, y, algorithm="grad_descent")
print("grad_descent:")
print("  coef =", coef)
print("  intercept =", intercept)

model.fit(X,y)
print("sklearn.linear_model.LinearRegression:")
print("  coef =", model.coef_)
print("  intercept =", model.intercept_)


The results from projection and sklearn linear regression are the same (up to numerical error).

The result from grad_descent is different, because it depends on `n_iter` and `learning_rate`: with the defaults (`learning_rate=0.001`, `n_iter=100`) it has not yet converged.

##### Jephian:
Indeed.
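
To illustrate the dependence on `n_iter`, here is a small self-contained sketch. The helper `grad_descent`, the seeds, and the choice of 50000 iterations are assumptions made for this example, not part of the exercise. It measures how far the gradient descent result is from the least-squares solution after few and after many iterations.

```python
import numpy as np

def grad_descent(A, y, learning_rate, n_iter, seed=0):
    """Plain gradient descent on the mean square error (1/N)*||Av - y||^2."""
    N = A.shape[0]
    v = np.random.default_rng(seed).random(A.shape[1])  # random start in [0, 1)
    for _ in range(n_iter):
        v = v - learning_rate * 2 / N * (A @ v - y) @ A
    return v

rng = np.random.default_rng(1)
x = np.arange(10)
A = np.c_[np.ones(10), x]  # column of ones for the intercept
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)

v_exact = np.linalg.lstsq(A, y, rcond=None)[0]
err_short = np.linalg.norm(grad_descent(A, y, 0.001, 100) - v_exact)
err_long = np.linalg.norm(grad_descent(A, y, 0.001, 50000) - v_exact)

print("distance after 100 iterations:  ", err_short)
print("distance after 50000 iterations:", err_long)
```

With the same learning rate, the longer run should land much closer to the projection result; the intercept is the slow direction for this data.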

###### 2(b)
Change `learning_rate=0.1` .  
What happens?

In [None]:
### your answer here

_ , coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=.001)
print("learning_rate=0.001:")
print("  coef =", coef)
print("  intercept =", intercept)

_ , coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=.1)
print("learning_rate=0.1:")
print("  coef =", coef)
print("  intercept =", intercept)


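It diverges: with `learning_rate=0.1` the coefficients grow in magnitude at every step instead of approaching the values found by projection.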
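
One way to see why a step size can be too large: the cost is quadratic in ${\bf v}$, and gradient descent with a constant step on a quadratic converges only when the learning rate is below $2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian $\frac{2}{N}A^\top A$. The sketch below computes that threshold for the data of this exercise; the variable names `H`, `lam_max` and `threshold` are introduced here for illustration.

```python
import numpy as np

x = np.arange(10)
A = np.c_[np.ones(10), x]          # X with the column of ones prepended
N = A.shape[0]

H = 2 / N * A.T @ A                # Hessian of (1/N)*||Av - y||^2; it does not depend on y
lam_max = np.linalg.eigvalsh(H).max()
threshold = 2 / lam_max            # constant-step gradient descent needs a rate below this

print("largest eigenvalue:", lam_max)
print("step-size threshold:", threshold)
for lr in (0.1, 0.001, 0.0001):
    print(lr, "converges" if lr < threshold else "diverges")
```

For this data the threshold lies between `0.001` and `0.1`, which is consistent with the behaviour observed in 2(a) and 2(b).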

###### 2(c)
Change `learning_rate=0.0001` .  
What happens?

In [None]:
### your answer here

_ , coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=.001)
print("learning_rate=0.001")
print("coef = ", coef)
print("intercept = ", intercept)

_ , coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=.0001)
print("learning_rate=0.0001")
print("coef = ", coef)
print("intercept = ", intercept)

The result of grad_descent is still not the same as the other two.

With `learning_rate=0.0001` the steps are so small that the default 100 iterations are not enough for it to converge.

###### 2(d)
Modify your code so that it prints the mean square error at each step of the gradient descent.  
Check if it is always decreasing.

In [None]:
### your answer here

def LR_MSE(X, y, algorithm="projection", learning_rate=.001, n_iter=10000, fit_intercept=True):
    '''LinearRegression that prints the mean square error at each gradient descent step'''
    N = X.shape[0]
    A = np.c_[np.ones(N), X] if fit_intercept else X.copy()
    if algorithm=="projection":
        v = np.linalg.inv(A.T@A)@A.T@y
    elif algorithm=="grad_descent":
        v = np.random.rand(A.shape[1],)
        for i in range(n_iter):
            d = 2/N*(A@v-y).T@A
            v = v - learning_rate*d
            # mean square error after this update
            print(np.linalg.norm(A.dot(v)-y)**2/N)
    else:
        raise ValueError('algorithm must be "projection" or "grad_descent"')
    (coef, intercept) = (v[1:], v[0]) if fit_intercept else (v, 0)
    predict = lambda X_test: X_test@coef + intercept
    return predict, coef, intercept

_ = LR_MSE(X, y, algorithm="grad_descent", learning_rate=.001, n_iter=10)


Yes, it is decreasing at every step.
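
Instead of reading the printed numbers by eye, the check can be automated. This is a minimal sketch under the assumption that the errors are collected in a list (called `mse_history` here, a name not used elsewhere in the notebook) rather than printed; the seed and the 100 iterations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(10)
A = np.c_[np.ones(10), x]  # column of ones for the intercept
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)
N = A.shape[0]

v = rng.random(A.shape[1])
mse_history = []
for _ in range(100):
    v = v - 0.001 * 2 / N * (A @ v - y) @ A            # one gradient descent step
    mse_history.append(np.linalg.norm(A @ v - y) ** 2 / N)

# the sequence is decreasing exactly when all consecutive differences are negative
always_decreasing = bool(np.all(np.diff(mse_history) < 0))
print("always decreasing:", always_decreasing)
```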

##### Exercise 3
This exercise checks if the gradient formula is correct (or at least reasonable).  
Let  
```python
N,dp = 100,3
np.random.seed(20025)
A = np.random.randn(N,dp)
v = np.random.randn(dp)
y = np.random.randn(N)
```
Define $c(A, {\bf v}, {\bf y}) = \frac{1}{N}\|A{\bf v} - {\bf y}\|^2$.

In [None]:
N,dp = 100,3
np.random.seed(20025)
A = np.random.randn(N,dp)
v = np.random.randn(dp)
y = np.random.randn(N)

###### 3(a)
Write a function `cost(A, v, y)` that calculates $c(A, {\bf v}, {\bf y})$.

In [None]:
### your answer here

def cost(A, v, y):
    # take N from A itself rather than from the global variable
    return 1/A.shape[0]*(np.linalg.norm(A@v-y))**2


###### 3(b)
Calculate the gradient  
$$\frac{2}{N}(A{\bf v} - {\bf y})^\top A.$$

In [None]:
### your answer here

2/N*(A@v-y).T@A


###### 3(c)
Let  
```python
h = 0.001
i = 0
e = np.zeros((dp,))
e[i] = 1
g = (cost(A, v+h*e, y) - cost(A, v, y)) / h
```
Run the code for `i = 0,1,2` and compare `g` with the gradient in 3(b).  
If necessary, change `h` to smaller numbers.

In [None]:
### your answer here

h = 0.00000001
l = []
for i in range(3):
    e = np.zeros((dp,))
    e[i] = 1
    g = (cost(A, v+h*e, y) - cost(A, v, y)) / h
    l.append(g)
l

The output of 3(b) is similar to the output of 3(c), so the gradient formula agrees with the finite-difference estimate.
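
As a variation on 3(c), the comparison can be done for all coordinates at once and checked with `np.allclose`. The sketch below swaps the forward difference for a central difference, which is more accurate for the same `h`; the step `h = 1e-5`, the tolerance, and the name `numeric` are assumptions made for this example.

```python
import numpy as np

def cost(A, v, y):
    """Mean square error (1/N)*||Av - y||^2, with N taken from A."""
    return np.linalg.norm(A @ v - y) ** 2 / A.shape[0]

N, dp = 100, 3
np.random.seed(20025)
A = np.random.randn(N, dp)
v = np.random.randn(dp)
y = np.random.randn(N)

grad = 2 / N * (A @ v - y) @ A     # formula from 3(b)

h = 1e-5
# central differences: (c(v + h e_i) - c(v - h e_i)) / (2h) for each coordinate i
numeric = np.array([
    (cost(A, v + h * e, y) - cost(A, v - h * e, y)) / (2 * h)
    for e in np.eye(dp)
])

print(grad)
print(numeric)
print("match:", np.allclose(grad, numeric, atol=1e-6))
```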

In [None]:
class LR:
    '''
    Linear Regression.

    Parameters
    ----------
    fit_intercept: whether to calculate the intercept or not
    algorithm: "projection" or "grad_descent"
    learning_rate: learning rate for the gradient descent algorithm
    n_iter: number of iterations for the gradient descent algorithm

    Methods
    -------
    fit(X, y): X is an array of shape (N,d) whose rows are samples and columns are features; y is the labels of shape (N,)
    predict(X_test): takes some samples X_test and returns the prediction X_test.dot(coef_) + intercept_

    Attributes (available after fit)
    ----------
    coef_: an array of shape (d,) that stores the coefficients
    intercept_: a float for the intercept

    '''
    def __init__(self, **kwargs):
        self._algorithm = kwargs.pop('algorithm', "projection")
        self._a = kwargs.pop('learning_rate', .001)
        self._n_iter = kwargs.pop('n_iter', 10000)
        self._fit_intercept = kwargs.pop('fit_intercept', True)
        if kwargs:
            raise ValueError(f'Unknown keywords ({list(kwargs.keys())})')
        if self._algorithm not in ("projection", "grad_descent"):
            raise ValueError('algorithm must be "projection" or "grad_descent"')
    
    def fit(self, X, y, MSQ = False):
        N = X.shape[0]
        A = np.c_[np.ones(N), X] if self._fit_intercept else X.copy()
        if self._algorithm=="projection":
            v = np.linalg.inv(A.T@A)@A.T@y
        else:
            v = np.random.rand(A.shape[1],)
            for i in range(self._n_iter):
                # named grad so that it cannot be confused with the number of features d
                grad = 2/N*(A@v-y).T@A
                v = v - self._a*grad
                if MSQ:
                    print(np.linalg.norm(A.dot(v)-y)**2/N)
        (self.coef_, self.intercept_) = (v[1:], v[0]) if self._fit_intercept else (v, 0)

    def predict(self, X_test):
        return X_test @ self.coef_ + self.intercept_
    

##### Jephian:
Well done!