# LinearRegression from scratch

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

## Algorithm
**Input:**  
- `X`: an array of shape `(N,d)` whose rows are samples and columns are features
- `y`: the labels of shape `(N,)`
- `fit_intercept`: whether to calculate the intercept or not  
- `algorithm`: `"projection"` or `"grad_descent"`
- `learning_rate`: learning rate for the gradient descent algorithm
- `n_iter`: number of iterations for the gradient descent algorithm 

**Output:**  
A tuple `(predict, coef, intercept)`:  
- `predict`: a function that takes samples `X_test` and returns the prediction `X_test.dot(coef) + intercept`
- `coef`: an array of shape `(d,)` that stores the coefficients
- `intercept`: a float for the intercept

**Steps:**
1. If `fit_intercept`, let $A$ be the matrix obtained from $X$ by adding a column of ones on the left; otherwise, let $A = X$ (make a copy).  Let `dp` be the number of columns of $A$.
2. If `algorithm=="projection"`, compute ${\bf v} = (A^\top A)^{-1}A^\top {\bf y}$.
3. If `algorithm=="grad_descent"`, run the gradient descent algorithm as follows:
    1. Pick a random vector ${\bf v}$ of shape `(dp,)` .
    2. Calculate the gradient $\nabla = \frac{2}{N}(A{\bf v} - {\bf y})^\top A$.
    3. Update ${\bf v}$ to ${\bf v} - \alpha\nabla$, where $\alpha$ is `learning_rate`.
    4. Repeat Steps B and C `n_iter` times.
4. If `fit_intercept`, let `coef` be ${\bf v}[1:]$ and `intercept` be ${\bf v}[0]$; otherwise, let `coef` be ${\bf v}$ and `intercept` be 0.  
5. Define `predict` as a function that sends `X_test` to `X_test.dot(coef) + intercept`.
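
Step 2 is the least-squares solution written through the normal equations. The sketch below uses a small made-up dataset (the arrays are hypothetical, chosen so the exact answer is known) to walk through Steps 1, 2 and 4, and checks that the formula agrees with `np.linalg.lstsq`, which solves the same least-squares problem without forming $(A^\top A)^{-1}$ explicitly:

```python
import numpy as np

# Hypothetical toy data: N = 6 samples, d = 2 features, exact labels y = 1 + 2*x1 - 3*x2
X = np.array([[0., 1.], [1., 0.], [2., 1.], [3., 2.], [4., 1.], [5., 3.]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

# Step 1: prepend a column of ones so that v[0] plays the role of the intercept
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Step 2: v = (A^T A)^{-1} A^T y
v_normal = np.linalg.inv(A.T @ A) @ A.T @ y

# The same least-squares solution via lstsq (returns solution, residuals, rank, singular values)
v_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# Step 4: split v into the intercept and the coefficients
intercept, coef = v_normal[0], v_normal[1:]
print(intercept, coef)  # intercept close to 1, coef close to [2, -3]
```

Because the labels here contain no noise, both routes recover the generating coefficients up to rounding.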

## Pseudocode
Translate the algorithm into pseudocode.  
This helps you identify the parts you do not yet know how to implement.  

    1. 
    2. 
    3. ...

## Code

In [None]:
### your answer here

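def LR(X, y, fit_intercept=True, algorithm="projection", learning_rate=0.01, n_iter=1000):
    N, d = X.shape

    # Step 1: design matrix A, with a leading column of ones if fitting an intercept
    if fit_intercept:
        A = np.hstack([np.ones((N, 1)), X])
        dp = d + 1
    else:
        A = X.copy()
        dp = d

    if algorithm == "projection":
        # Step 2: v = (A^T A)^{-1} A^T y, computed by solving (A^T A) v = A^T y
        v = np.linalg.solve(A.T.dot(A), A.T.dot(y))
    elif algorithm == "grad_descent":
        # Step 3: gradient descent from a random starting vector
        v = np.random.rand(dp)
        for i in range(n_iter):
            grad = (2 / N) * (A.dot(v) - y).dot(A)
            v = v - learning_rate * grad
    else:
        raise ValueError('algorithm must be "projection" or "grad_descent"')

    # Step 4: split v into the intercept and the coefficients
    if fit_intercept:
        coef = v[1:]
        intercept = v[0]
    else:
        coef = v.copy()
        intercept = 0

    # Step 5: prediction function
    predict = lambda X_test: X_test.dot(coef) + intercept

    return predict, coef, intercept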

## Test
Take some sample data from [LinearRegression-with-scikit-learn](LinearRegression-with-scikit-learn.ipynb) and check whether your code produces outputs similar to those of the existing packages.

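##### Noisy samples of a line
Ten points $x = 0, 1, \ldots, 9$ with labels $y = 0.5x + 3 + \varepsilon$, where $\varepsilon$ is Gaussian noise with standard deviation 0.3 (the data of sklearn exercise 1).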

#### Alex:
You should give a brief description of your data.

In [None]:
### results with your code
### (sklearn exercise 1)
x = np.arange(10)
y = 0.5*x + 3 + 0.3*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

predict, coef, intercept = LR(X, y, algorithm="projection")
y_new = predict(X_test)

%matplotlib inline
plt.scatter(x, y)
plt.plot(x_test, y_new, c='r')

print('coef_=', coef)
print('intercept_=', intercept) 

In [None]:
### results with existing packages
x = np.arange(10)
y = 0.5*x + 3 + 0.3*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
y_new = model.predict(X_test)

%matplotlib inline
plt.scatter(x, y)
plt.plot(x_test, y_new, c='r')

print('model.coef_=', model.coef_)   
print('model.intercept_=', model.intercept_)   

#### Alex:
Since you generate random data twice, your function and the package are fitted to different data, so the results differ. Use the same data for both to check whether your code produces outputs similar to those of the existing package.
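
Following this suggestion, one way to compare on identical data is to generate it once and fit both models to the same arrays. The sketch below assumes a fixed seed (not in the original) for reproducibility and re-implements only the projection branch of `LR` inline so that it runs on its own:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # assumed seed, so the run is reproducible
x = np.arange(10)
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)
X = x[:, np.newaxis]

# Projection branch of LR: prepend ones, then solve the normal equations
A = np.hstack([np.ones((10, 1)), X])
v = np.linalg.solve(A.T @ A, A.T @ y)
my_intercept, my_coef = v[0], v[1:]

# sklearn fitted on the very same X and y
model = LinearRegression().fit(X, y)

print('mine:   ', my_coef, my_intercept)
print('sklearn:', model.coef_, model.intercept_)
print('same?   ', np.allclose(my_coef, model.coef_) and np.isclose(my_intercept, model.intercept_))
```

With a single dataset, any remaining difference can only come from the algorithms, not from the noise.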

## Comparison

##### Exercise 1
Set `algorithm="projection"` .  
Let  
```python
x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)
```
Apply your code to `X1` with `fit_intercept=True` and obtain `(predict1, coef1, intercept1)` .  
Apply your code to `X2` with `fit_intercept=False` and obtain `(predict2, coef2, intercept2)` .  
What is the relation between `coef1`, `intercept1` and `coef2` ?

In [None]:
### your answer here

x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)

predict1, coef1, intercept1 = LR(X1, y, algorithm="projection")
print('coef1 = ',coef1)
print('intercept1 = ',intercept1)

predict2, coef2, intercept2 = LR(X2, y, algorithm="projection", fit_intercept=False)
print('coef2 = ',coef2)


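# coef1 equals coef2[1:] and intercept1 equals coef2[0] (up to rounding),
# because the column of ones in X2 plays the role of the intercept column.
print(np.allclose(coef1, coef2[1:]), np.isclose(intercept1, coef2[0]))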
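
Why the relation holds: with `fit_intercept=True`, Step 1 builds $A$ by putting a column of ones in front of `X1`, which is exactly `X2`. Both calls therefore solve the same least-squares problem with the same matrix $A$. A small sketch, assuming a fixed seed for reproducibility, checks this directly:

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed
x = np.arange(10)
X1 = np.vstack([x]).T
X2 = np.vstack([np.ones_like(x), x]).T
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)

# Design matrix built from X1 when fit_intercept=True
A1 = np.hstack([np.ones((10, 1)), X1])
# With fit_intercept=False the design matrix is just (a copy of) X2
A2 = X2.astype(float)

print(np.array_equal(A1, A2))  # True: the two design matrices coincide

# Hence the normal-equation solutions coincide as well
v1 = np.linalg.solve(A1.T @ A1, A1.T @ y)  # [intercept1, coef1[0]]
v2 = np.linalg.solve(A2.T @ A2, A2.T @ y)  # coef2
print(np.allclose(v1, v2))
```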

##### Exercise 2
Let  
```python
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)
```

###### 2(a)
Apply the linear regression algorithm to `X`  
1. by your code with `algorithm="projection"`,  
2. by your code with `algorithm="grad_descent"`,  
3. by `sklearn.linear_model.LinearRegression`.  
Check if the outputs are almost the same (up to some numerical errors).

In [None]:
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3*np.random.randn(10)

predict, coef, intercept = LR(X, y, algorithm="projection")
print('projection')
print('coef = ',coef)
print('intercept = ',intercept)

predict, coef, intercept = LR(X, y, algorithm="grad_descent")
print('grad_descent')
print('coef = ',coef)
print('intercept = ',intercept)
 
model = LinearRegression()
model.fit(X,y)
print('sklearn.linear_model.LinearRegression')
print('coef = ',model.coef_)
print('intercept = ',model.intercept_)

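# The results from projection and sklearn's LinearRegression are almost the same.
# The result from gradient descent agrees with them only to roughly two or three decimal places,
# because 1000 iterations with learning_rate=0.01 are not quite enough to converge.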
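
The remaining gap is an iteration-count issue rather than a bug: gradient descent is still converging after 1000 steps. The sketch below assumes a fixed seed, a zero starting vector, and a hypothetical helper `grad_descent` (none of these are in the original); it shows the gradient-descent iterate approaching the projection solution as `n_iter` grows:

```python
import numpy as np

rng = np.random.default_rng(2)  # assumed seed
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)

N = X.shape[0]
A = np.hstack([np.ones((N, 1)), X])
v_proj = np.linalg.solve(A.T @ A, A.T @ y)  # exact least-squares solution

def grad_descent(A, y, learning_rate, n_iter):
    """Hypothetical helper: plain gradient descent on (1/N)||Av - y||^2 from v = 0."""
    N = A.shape[0]
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = (2 / N) * (A @ v - y) @ A
        v = v - learning_rate * grad
    return v

for n_iter in [1000, 5000, 20000]:
    v = grad_descent(A, y, learning_rate=0.01, n_iter=n_iter)
    print(n_iter, np.max(np.abs(v - v_proj)))  # the error shrinks as n_iter grows
```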

###### 2(b)
Change `learning_rate=0.1` .  
What happened?

In [None]:
### your answer here

predict, coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=0.1)

print(' Change learning_rate=0.1: ')
print(coef)
print(intercept)

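# It diverges: the step size is too large, so each update overshoots the minimum
# and the values of coef and intercept grow without bound until they overflow.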
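
Why `learning_rate=0.1` diverges: for a quadratic cost, gradient descent is stable only when the learning rate is below $2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian $\frac{2}{N}A^\top A$ (a standard fact about quadratics, not stated in the original). The Hessian does not depend on `y`, so the threshold can be computed from `x` alone:

```python
import numpy as np

x = np.arange(10)
X = np.vstack([x]).T
N = X.shape[0]
A = np.hstack([np.ones((N, 1)), X])

# The Hessian of c(v) = (1/N)||Av - y||^2 is constant: H = (2/N) A^T A
H = (2 / N) * A.T @ A
eigvals = np.linalg.eigvalsh(H)  # ascending order for a symmetric matrix
lr_max = 2 / eigvals[-1]

print('largest eigenvalue:', eigvals[-1])
print('stable learning rates: below', lr_max)
print('0.01 stable?', 0.01 < lr_max, '| 0.1 stable?', 0.1 < lr_max)
```

For this `x` the threshold is about 0.034, which lies between 0.01 (converges) and 0.1 (diverges).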

###### 2(c)
Change `learning_rate=0.0001` .  
What happened?

In [None]:
### your answer here

predict, coef, intercept = LR(X, y, algorithm="grad_descent", learning_rate=0.0001)

print(' Change learning_rate=0.0001: ')
print(coef)
print(intercept)

# It does not converge within 1000 steps because the learning rate is too small.
# The result differs from algorithm='projection' and sklearn,
# and also from the grad_descent result with learning_rate=0.01.
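
One way to quantify "too small": for this quadratic cost, along the eigenvector of the Hessian with the smallest eigenvalue $\lambda_{\min}$, each step multiplies the error by exactly $1 - \alpha\lambda_{\min}$, where $\alpha$ is the learning rate. The sketch below (same `x`; the Hessian does not depend on `y`) compares the fraction of that error left after 1000 steps for the two learning rates:

```python
import numpy as np

x = np.arange(10)
N = x.size
A = np.hstack([np.ones((N, 1)), x[:, np.newaxis]])
H = (2 / N) * A.T @ A
lam_min = np.linalg.eigvalsh(H)[0]  # smallest eigenvalue

n_iter = 1000
for lr in [0.01, 0.0001]:
    remaining = (1 - lr * lam_min) ** n_iter
    print(f'learning_rate={lr}: {remaining:.3f} of the slow-direction error remains')
```

Roughly 0.3% of that error remains with `learning_rate=0.01`, versus about 95% with `learning_rate=0.0001`.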

###### 2(d)
Modify your code so that it prints the mean square error at each step of the gradient descent.  
Check if it is always decreasing.

In [None]:
### your answer here

def LR_gradient(X, y, fit_intercept=True, algorithm="projection", learning_rate=0.01, n_iter=10):
    # Variant of LR that prints the mean squared error after every gradient step
    N, d = X.shape

    if fit_intercept:
        A = np.hstack([np.ones((N, 1)), X])
        dp = d + 1
    else:
        A = X.copy()
        dp = d

    if algorithm == "projection":
        v = np.linalg.solve(A.T.dot(A), A.T.dot(y))
    elif algorithm == "grad_descent":
        v = np.random.rand(dp)
        for i in range(n_iter):
            grad = (2 / N) * (A.dot(v) - y).dot(A)
            v = v - learning_rate * grad
            # mean squared error (1/N)||Av - y||^2 after this update
            print(np.linalg.norm(A.dot(v) - y)**2 / N)
    else:
        raise ValueError('algorithm must be "projection" or "grad_descent"')

LR_gradient(X, y, algorithm="grad_descent")

# Yes: with learning_rate=0.01 the printed MSE decreases at every step.
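
Instead of reading printed numbers, one can store the MSE history and test monotonicity programmatically. For a quadratic cost, a strict decrease at every step is guaranteed whenever the learning rate is below $2/\lambda_{\max}$ of the Hessian (about 0.034 for this `x`). A minimal sketch, assuming a fixed seed and 100 iterations (illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)  # assumed seed
x = np.arange(10)
X = np.vstack([x]).T
y = 0.5 * x + 3 + 0.3 * rng.standard_normal(10)

N = X.shape[0]
A = np.hstack([np.ones((N, 1)), X])
v = rng.random(A.shape[1])  # random start, as in LR
learning_rate, n_iter = 0.01, 100

history = []
for _ in range(n_iter):
    grad = (2 / N) * (A @ v - y) @ A
    v = v - learning_rate * grad
    history.append(np.mean((A @ v - y) ** 2))  # MSE after this step

history = np.array(history)
print('always decreasing:', bool(np.all(np.diff(history) < 0)))
```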

##### Exercise 3
This exercise checks if the gradient formula is correct (or at least reasonable).  
Let  
```python
N,dp = 100,3
np.random.seed(20025)
A = np.random.randn(N,dp)
v = np.random.randn(dp)
y = np.random.randn(N)
```
Define $c(A, {\bf v}, {\bf y}) = \frac{1}{N}\|A{\bf v} - {\bf y}\|^2$.

###### 3(a)
Write a function `cost(A, v, y)` that calculates $c(A, {\bf v}, {\bf y})$.

In [None]:
### your answer here

N,dp = 100,3
np.random.seed(20025)
A = np.random.randn(N,dp)
v = np.random.randn(dp)
y = np.random.randn(N)

def cost(A, v, y):
    # (1/N) * ||Av - y||^2, with N taken from the number of rows of A
    return (1 / A.shape[0]) * np.linalg.norm(A.dot(v) - y)**2
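
Since $\frac{1}{N}\|A{\bf v}-{\bf y}\|^2$ is just the mean of the squared residuals, the cost can also be written with `np.mean`, which reads $N$ from the data instead of a global variable. A quick check on the exercise's data; the names `cost_norm` and `cost_mean` are hypothetical:

```python
import numpy as np

N, dp = 100, 3
np.random.seed(20025)
A = np.random.randn(N, dp)
v = np.random.randn(dp)
y = np.random.randn(N)

def cost_norm(A, v, y):
    # (1/N) * ||Av - y||^2, with N read from A
    return np.linalg.norm(A @ v - y) ** 2 / A.shape[0]

def cost_mean(A, v, y):
    # mean of the squared residuals: the same quantity
    return np.mean((A @ v - y) ** 2)

print(cost_norm(A, v, y), cost_mean(A, v, y))
```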

###### 3(b)
Calculate the gradient  
$$\frac{2}{N}(A{\bf v} - {\bf y})^\top A.$$
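
For reference, the formula follows from expanding the cost (standard matrix calculus, added here as a worked derivation):

$$
\begin{aligned}
c(A,{\bf v},{\bf y}) &= \frac{1}{N}(A{\bf v}-{\bf y})^\top(A{\bf v}-{\bf y})
= \frac{1}{N}\left({\bf v}^\top A^\top A{\bf v} - 2{\bf y}^\top A{\bf v} + {\bf y}^\top{\bf y}\right),\\
\nabla_{\bf v}\, c &= \frac{1}{N}\left(2A^\top A{\bf v} - 2A^\top{\bf y}\right) = \frac{2}{N}A^\top(A{\bf v}-{\bf y}).
\end{aligned}
$$

Its transpose is the row-vector form $\frac{2}{N}(A{\bf v}-{\bf y})^\top A$; for 1-D NumPy arrays the two orientations are the same array, which is why `.T` has no effect in the code.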

In [None]:
### your answer here

2/N*(A.dot(v)-y).T.dot(A)

###### 3(c)
Let  
```python
h = 0.001
i = 0
e = np.zeros((dp,))
e[i] = 1
g = (cost(A, v+h*e, y) - cost(A, v, y)) / h
```
Run the code for `i = 0,1,2` and compare `g` with the gradient in 3(b).  
If necessary, change `h` to smaller numbers.

In [None]:
### your answer here

h = 1e-8

for i in range(3):
    e = np.zeros((dp,))
    e[i] = 1
    g = (cost(A, v+h*e, y) - cost(A, v, y)) / h
    print(g)
    
# Each printed g agrees with the corresponding entry of the gradient in 3(b) to several decimal places.
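
The forward difference in 3(c) has an error proportional to `h`. A central difference, $\big(c(A, {\bf v}+h{\bf e}, {\bf y}) - c(A, {\bf v}-h{\bf e}, {\bf y})\big)/(2h)$, is a standard alternative (not part of the exercise) whose truncation error is proportional to $h^2$; for a quadratic cost like $c$ it is exact up to rounding. A sketch on the exercise's data:

```python
import numpy as np

N, dp = 100, 3
np.random.seed(20025)
A = np.random.randn(N, dp)
v = np.random.randn(dp)
y = np.random.randn(N)

def cost(A, v, y):
    return np.linalg.norm(A @ v - y) ** 2 / A.shape[0]

grad = (2 / N) * (A @ v - y) @ A  # analytic gradient from 3(b)

h = 1e-4
E = np.eye(dp)  # row i is the unit vector e_i
g_forward = np.array([(cost(A, v + h * E[i], y) - cost(A, v, y)) / h for i in range(dp)])
g_central = np.array([(cost(A, v + h * E[i], y) - cost(A, v - h * E[i], y)) / (2 * h) for i in range(dp)])

print('forward error:', np.max(np.abs(g_forward - grad)))
print('central error:', np.max(np.abs(g_central - grad)))
```

Even with the relatively large `h = 1e-4`, the central difference matches the analytic gradient far more closely than the forward difference does.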