# PolynomialRegression from scratch

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Algorithm
**Input:**  
- `X`: an array of shape `(N,1)` whose rows are samples and columns are features
- `y`: the labels of shape `(N,)`
- `degree`: the degree of the polynomial 
- `**kwargs`: keywords for your linear regression function

**Output:**  
Revised output of your linear regression function.

**Steps:**
1. Let `X_ex = X**np.arange(1, degree + 1)` .
2. Suppose `LR` is your linear regression function.  
Let `predict_lin,coef,intercept = LR(X_ex, y, **kwargs)` .  
3. Define the function `predict` that sends `X_test` to `(X_test**np.arange(1, degree+1)).dot(coef) + intercept` .
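
The three steps above can be sketched as a single function. This is only a minimal illustration: `polynomial_regression` is a hypothetical name, and `np.linalg.lstsq` is used here as a stand-in for your own `LR` function, so `**kwargs` is omitted.

```python
import numpy as np

def polynomial_regression(X, y, degree):
    """Fit y ~ intercept + coef[0]*x + ... + coef[degree-1]*x**degree."""
    powers = np.arange(1, degree + 1)
    # Step 1: expand the single feature into x, x**2, ..., x**degree.
    X_ex = X ** powers
    # Step 2: stand-in for LR (an assumption, not your own function);
    # prepend a column of ones so the first unknown is the intercept.
    A = np.hstack([np.ones((X_ex.shape[0], 1)), X_ex])
    v, *_ = np.linalg.lstsq(A, y, rcond=None)
    intercept, coef = v[0], v[1:]

    # Step 3: the prediction function applies the same expansion to new data.
    def predict(X_test):
        return (X_test ** powers).dot(coef) + intercept

    return predict, coef, intercept

# Noise-free quadratic data, so the fit should recover the true coefficients.
x = np.arange(10)
X = x[:, np.newaxis]
y = 0.1 * x**2 + 0.2 * x + 0.3
predict, coef, intercept = polynomial_regression(X, y, degree=2)
print(coef, intercept)
```

Because the sample data has no noise, `coef` and `intercept` should agree with `0.2`, `0.1`, and `0.3` up to numerical error.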

## Pseudocode
Translate the algorithm into pseudocode.  
This helps you identify the parts that you do not yet know how to implement.  

    1. 
    2. 
    3. ...

## Code

In [None]:
### your answer here
from sklearn.metrics import mean_absolute_error

class PolynomialRegression:
    def __init__(self, degree):
        self.degree = degree
    
    def LR_proj(self, x, y, regularization=None, alpha=1e-4):
        A = x
        v = np.linalg.inv(A.T.dot(A)).dot(A.T.dot(y))
        return v[0], v[1:]
    
    def LR_gd(self, x, y, n_iter=100, learning_rate=1e-4, regularization=None, alpha=1e-4, verbose=False):
        self.scores = []
        A = x
        v = np.random.rand(A.shape[1],)
        for i in range(n_iter):
            # use the local design matrix A, not the global X, for the sample count
            d = 2/A.shape[0]*(A.dot(v)-y).T.dot(A)
            if regularization == 'L1':
                d = d + alpha * np.sign(v)
            if regularization == 'L2':
                d = d + alpha * 2 * v
            v = v - learning_rate*d
            y_pred = A[:, 1:].dot(v[1:]) + v[0]
            self.scores.append(mean_absolute_error(y, y_pred))
            if verbose:
                print(self.scores[-1])
        return v[0], v[1:]
    
    def fit(self, x, y, algorithm='projection', regularization=None, verbose=False):
        self.powers = list(range(self.degree+1))
        self.X = x**self.powers
        self.y = y
        if algorithm == 'projection':
            self.intercept_, self.coef_ = self.LR_proj(self.X, y, regularization)
        if algorithm == 'grad_descent':
            self.intercept_, self.coef_ = self.LR_gd(self.X, y, 100, 1e-4, regularization, 1e-4, verbose)
        
    def predict(self, x):
        X = x**self.powers[1:]
        return X.dot(self.coef_) + self.intercept_

#### Alex: 
Your `fit` method does not let the caller adjust `n_iter`, `learning_rate`, or `alpha`, since these values are hard-coded in `fit`. You should also include `fit_intercept` as a parameter.

You should not use a package to compute the error; calculate it yourself.
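
Computing the error by hand takes one line of NumPy per metric. The two helpers below are a hypothetical sketch (the names are chosen for illustration and are not part of the code above), assuming `y_true` and `y_pred` are one-dimensional and have the same length.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Residuals are 0, 0, -2: squared error sums to 4, absolute error to 2.
mse = mean_squared_error([1, 2, 3], [1, 2, 5])
mae = mean_absolute_error([1, 2, 3], [1, 2, 5])
print(mse, mae)
```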

## Test
Take some sample data from [PolynomialRegression-with-scikit-learn](PolynomialRegression-with-scikit-learn.ipynb) and check whether your code produces outputs similar to those of the existing packages.

##### Name of the data
Description of the data.

In [None]:
### results with your code
x = np.arange(10)
X = x[:,np.newaxis]
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)

x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)

y_pred = model.predict(X_test)

print(model.coef_)
print(model.intercept_)
print(y_pred)

In [None]:
### results with existing packages
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def SKPolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), 
                         LinearRegression(fit_intercept=fit_intercept))

model = SKPolynomialRegression(2)
model.fit(X, y)
y_pred = model.predict(X_test)

print(model[1].coef_)
print(model[1].intercept_)
print(y_pred)

## Comparison

##### Exercise 1
Let  
```python
degree = 3
x = np.arange(5)
X = x[:,np.newaxis]
```

###### 1(a)
Let `X_ex1 = X**np.arange(1, degree+1)` .  
The new data `X_ex1` is supposed to be the same as the output of `sklearn.preprocessing.PolynomialFeatures` with `include_bias=False` .  
Check if this is true.

In [None]:
### your answer here
degree = 3
x = np.arange(5)
X = x[:,np.newaxis]
X_ex1 = X**np.arange(1, degree+1)

ftr_model = PolynomialFeatures(degree, include_bias=False)
sk_X_ex1 = ftr_model.fit_transform(X)
print(X_ex1)
print(sk_X_ex1)
print('X_ex1 is the same as the sklearn output')

###### 1(b)
Let `X_ex1 = X**np.arange(0, degree+1)` .  
The new data `X_ex1` is supposed to be the same as the output of `sklearn.preprocessing.PolynomialFeatures` with `include_bias=False` .  
Check if this is true.

In [None]:
### your answer here
degree = 3
x = np.arange(5)
X = x[:,np.newaxis]
X_ex1 = X**np.arange(0, degree+1)

ftr_model = PolynomialFeatures(degree, include_bias=False)
sk_X_ex1 = ftr_model.fit_transform(X)
print(X_ex1)
print(sk_X_ex1)
print('X_ex1 is not the same as sk_X_ex1')

#### Alex:
It would be the same if you set `include_bias=True`.
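
This can be checked directly. The sketch below reuses the setup of Exercise 1 and compares the arrays with `np.allclose` rather than by eye; with `include_bias=True`, `PolynomialFeatures` puts a column of ones in front, which matches the `x**0` column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

degree = 3
x = np.arange(5)
X = x[:, np.newaxis]

# Columns are x**0, x**1, ..., x**degree.
X_ex1 = X ** np.arange(0, degree + 1)

# include_bias=True prepends the constant (all-ones) column.
sk_X_ex1 = PolynomialFeatures(degree, include_bias=True).fit_transform(X)

same = np.allclose(X_ex1, sk_X_ex1)
print(same)
```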

##### Exercise 2
Let  
```python
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
```

###### 2(a)
Let `degree=2` .
Apply the linear regression algorithm to `X`  
1. by your code with `algorithm=="projection"` ,  
2. by your code with `algorithm=="grad_descent"` ,  
3. by `sklearn.linear_model.LinearRegression` .  

Check if the outputs are almost the same (up to some numerical errors).  

In [None]:
### your answer here
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
degree = 2

model = PolynomialRegression(2)
model.fit(X, y, algorithm='projection')
proj_y_pred = model.predict(X)

model.fit(X, y, algorithm='grad_descent', verbose=False)
gd_y_pred = model.predict(X)

sk_model = SKPolynomialRegression(2)
sk_model.fit(X, y)
sk_y_pred = sk_model.predict(X)

In [None]:
print(proj_y_pred)
print(gd_y_pred)
print(sk_y_pred)
print('Results 1 and 3 are the same')

#### Alex:
You can adjust `n_iter` and `learning_rate` so that the second result matches the others.
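
As a rough illustration, the sketch below runs plain gradient descent on the same kind of data and compares it with the least-squares solution. It is a standalone function, not the class above; the seed, the zero starting point, and the values `n_iter=100_000` and `learning_rate=5e-4` are assumptions picked for this particular data set.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10)
y = 0.1 * x**2 + 0.2 * x + 0.3 + 0.5 * rng.standard_normal(10)
# Design matrix with columns 1, x, x**2.
A = (x[:, np.newaxis] ** np.arange(0, 3)).astype(float)

# Reference solution (plays the role of the projection method).
v_proj, *_ = np.linalg.lstsq(A, y, rcond=None)

def gradient_descent(A, y, n_iter, learning_rate):
    """Minimize the mean square error starting from the zero vector."""
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 / A.shape[0] * A.T.dot(A.dot(v) - y)
        v = v - learning_rate * grad
    return v

v_short = gradient_descent(A, y, n_iter=100, learning_rate=1e-4)
v_long = gradient_descent(A, y, n_iter=100_000, learning_rate=5e-4)

err_short = np.abs(v_short - v_proj).max()
err_long = np.abs(v_long - v_proj).max()
print(err_short, err_long)
```

The columns `1`, `x`, `x**2` have very different scales, so convergence is slow and 100 steps are far from enough here. The learning rate cannot be raised much further either: for this data the iteration diverges once the step size exceeds roughly `6e-4`.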

###### 2(b)
Modify your code so that it prints the mean square error at each step of the gradient descent.  
Check if it is always decreasing.

In [None]:
### your answer here
plt.plot(list(range(100)), model.scores)

#### Alex:
Your `model.scores` stores the mean absolute error, but you are supposed to print out the mean square error. Also, it is hard to tell from the figure whether the errors are always decreasing.
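
One way to check monotonicity without relying on a plot is to record the mean square error at every step and test the successive differences. The loop below is a minimal standalone sketch, not the class above; the seed, the zero starting point, and the names `mse_history` and `always_decreasing` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10)
y = 0.1 * x**2 + 0.2 * x + 0.3 + 0.5 * rng.standard_normal(10)
# Design matrix with columns 1, x, x**2.
A = (x[:, np.newaxis] ** np.arange(0, 3)).astype(float)

n_iter = 100
learning_rate = 1e-4
v = np.zeros(A.shape[1])
mse_history = []
for step in range(n_iter):
    residual = A.dot(v) - y
    # Mean square error of the current coefficients, before the update.
    mse_history.append(np.mean(residual ** 2))
    v = v - learning_rate * (2 / A.shape[0]) * A.T.dot(residual)

# Every difference between consecutive errors must be negative.
always_decreasing = bool(np.all(np.diff(mse_history) < 0))
print(always_decreasing)
```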

##### Exercise 3
Add a new keyword `regularization`, which can be `None`, `"L1"`, or `"L2"` .  
Add another keyword `alpha`, which is a positive number.  

When `regularization==None`, the cost function is 
$$\frac{1}{N}\sum_{i=0}^{N-1}\|f({\bf x}_i) - y_i\|^2.$$ 
When `regularization=="L1"`, the cost function is 
$$\frac{1}{N}\sum_{i=0}^{N-1}\|f({\bf x}_i) - y_i\|^2 + \alpha\sum_{i=0}^{d-1}|c_i|.$$ 
When `regularization=="L2"`, the cost function is 
$$\frac{1}{N}\sum_{i=0}^{N-1}\|f({\bf x}_i) - y_i\|^2 + \alpha\sum_{i=0}^{d-1}c_i^2.$$ 
Here ${\bf x}_i$ are the data, $y_i$ are the labels, and $c_i$ are the coefficients to be solved.

The regularization term penalizes large coefficients, which keeps them from growing too large.
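
The three cost functions can be written as one small helper. This is a sketch under two assumptions: `cost` is a hypothetical name, and every entry of `c` is penalized, including the intercept if it is stored in `c`.

```python
import numpy as np

def cost(A, y, c, regularization=None, alpha=1e-4):
    """Mean square error of A.dot(c) plus an optional L1 or L2 penalty."""
    mse = np.mean((A.dot(c) - y) ** 2)
    if regularization == "L1":
        return mse + alpha * np.sum(np.abs(c))
    if regularization == "L2":
        return mse + alpha * np.sum(c ** 2)
    return mse

# Tiny worked example: predictions are [1, -1], residuals are [0, -3].
A = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 2.0])
c = np.array([1.0, -2.0])

plain = cost(A, y, c)                  # (0 + 9) / 2 = 4.5
l1 = cost(A, y, c, "L1", alpha=0.1)    # 4.5 + 0.1 * (1 + 2)
l2 = cost(A, y, c, "L2", alpha=0.1)    # 4.5 + 0.1 * (1 + 4)
print(plain, l1, l2)
```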

###### 3(a)
When `regularization=="L1"`, the correct gradient is `g = g0 + alpha * np.sign(c)` , where `g0` is the gradient when `regularization==None` .  
Update your code for L1.

In [None]:
### your answer here
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
degree = 2

model = PolynomialRegression(2)
model.fit(X, y, algorithm='grad_descent', regularization='L1', verbose=False)
l1_gd_y_pred = model.predict(X)

sk_model = SKPolynomialRegression(2)
sk_model.fit(X, y)
sk_y_pred = sk_model.predict(X)

In [None]:
print(l1_gd_y_pred)
print(sk_y_pred)

###### 3(b)
When `regularization=="L2"`, the correct gradient is `g = g0 + alpha * 2 * c` , where `g0` is the gradient when `regularization==None` .  
Update your code for L2.

In [None]:
### your answer here
model.fit(X, y, algorithm='grad_descent', regularization='L2', verbose=False)
l2_gd_y_pred = model.predict(X)

In [None]:
print(l2_gd_y_pred)
print(sk_y_pred)
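
The gradient formulas stated in 3(a) and 3(b) can be sanity-checked against a finite-difference approximation of the cost functions in Exercise 3. The helpers below (`cost`, `gradient`, `numerical_gradient`) are hypothetical names, the data is random, and the coefficients are chosen away from zero because the L1 penalty is not differentiable there.

```python
import numpy as np

def cost(A, y, c, regularization, alpha):
    mse = np.mean((A.dot(c) - y) ** 2)
    if regularization == "L1":
        return mse + alpha * np.sum(np.abs(c))
    if regularization == "L2":
        return mse + alpha * np.sum(c ** 2)
    return mse

def gradient(A, y, c, regularization, alpha):
    g = 2 / A.shape[0] * A.T.dot(A.dot(c) - y)   # g0, no regularization
    if regularization == "L1":
        g = g + alpha * np.sign(c)
    if regularization == "L2":
        g = g + alpha * 2 * c
    return g

def numerical_gradient(f, c, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(c)
    for i in range(c.size):
        step = np.zeros_like(c)
        step[i] = eps
        g[i] = (f(c + step) - f(c - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
c = np.array([0.5, -1.0, 2.0])
alpha = 0.1

checks = {}
for reg in (None, "L1", "L2"):
    analytic = gradient(A, y, c, reg, alpha)
    numeric = numerical_gradient(lambda v: cost(A, y, v, reg, alpha), c)
    checks[reg] = bool(np.allclose(analytic, numeric, atol=1e-6))
print(checks)
```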