# PolynomialRegression with scikit-learn

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Code
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), 
                         LinearRegression(fit_intercept=fit_intercept))
```

Official Reference: [PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) and [make_pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html)

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), 
                         LinearRegression(fit_intercept=fit_intercept))

## Parameters
- `degree`: the degree of the polynomial  
for example, if `degree=2` and two features a and b are given  
then a, b, a^2, a*b, b^2 are generated as the expanded features
- `fit_intercept`: whether to calculate the intercept or not  
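
The `degree=2` example above can be checked directly. This is a minimal sketch; the values in `X` are made up for illustration, with the first column playing the role of a and the second the role of b.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples with two features each (illustrative values): columns are a and b.
X = np.array([[2, 3],
              [4, 5]])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_ex = poly.fit_transform(X)

# The columns of X_ex are a, b, a^2, a*b, b^2, in that order.
print(X_ex)
```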

## Attributes
For `model[0]`:
- `n_features_in_`: the width of `X` (named `n_input_features_` in older scikit-learn releases)
- `n_output_features_`: the number of expanded features
- `powers_`: an array of shape `(n_output_features_, n_features_in_)` that stores the exponent of each input feature in each expanded feature

For `model[1]`:
- `coef_`: an array of shape `(n_output_features_,)` that stores the coefficients for each expanded feature
- `intercept_`: the constant term of the fitted polynomial (zero when `fit_intercept=False`)
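
The sketch below inspects these attributes on a fitted pipeline. It assumes a recent scikit-learn release, where the fitted attributes carry a trailing underscore (`n_features_in_`, `n_output_features_`). The random data and the seed are illustrative only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

# Illustrative data: 20 samples with 2 features.
rng = np.random.default_rng(0)
X = rng.random((20, 2))
y = X[:, 0] + 2 * X[:, 1] ** 2

model = PolynomialRegression(2)
model.fit(X, y)

print(model[0].n_features_in_)      # width of X
print(model[0].n_output_features_)  # number of expanded features
print(model[0].powers_.shape)       # (expanded features, input features)
print(model[1].coef_.shape)         # one coefficient per expanded feature
print(model[1].intercept_)          # constant term
```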

## Sample data

##### Exercise 1
Let  
```python
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X_test)
```

###### 1(a)
Use `plt.scatter` to plot the points with `x` and `y` .  
Use `plt.plot( ..., c='r')` to plot the line with `x_test` and `y_new` .  
Print `model[1].coef_` and  `model[1].intercept_` .  
Can you guess these values by the definition of `y` ?

In [None]:
### your answer here
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X_test)

plt.scatter(x,y)
plt.plot(X_test, y_new ,c = 'r')
print(model[1].coef_)
print(model[1].intercept_)

#### Alex:
You did not answer the question: Can you guess these values by the definition of `y`?
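
One way to approach the question: the noise term has mean zero, so the fitted values should land near the coefficients in the definition of `y`. The sketch below is not the exercise itself. It drops the noise term (an assumption made only to make the comparison exact) and checks that the fit recovers 0.2 and 0.1 in `coef_` and 0.3 in `intercept_`.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3   # same definition as in the exercise, without the noise term
X = x[:, np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)

# coef_ is ordered like the expanded features: x first, then x**2.
print(model[1].coef_)
print(model[1].intercept_)
```

With the noise term restored and only 10 points, the printed values scatter around these numbers rather than matching them.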

In [None]:
print(model[0].n_features_in_)
print(model[0].n_output_features_)
print(model[0].powers_)

###### 1(b)
Redo 1(a) with the setting `fit_intercept=False` .

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=False):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), 
                         LinearRegression(fit_intercept=fit_intercept))

In [None]:
### your answer here
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X_test)
plt.scatter(x,y)
plt.plot(X_test, y_new ,c = 'r')
print(model[1].coef_)
print(model[1].intercept_)


#### Alex:
You do not need to redefine the function with the default `fit_intercept=False`. Instead, call the existing function with the argument `fit_intercept=False`.

Note that, because of this redefinition, the default value is `False` in each question below where you do not pass `fit_intercept` to the function.
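
A minimal sketch of that suggestion: the function keeps its original default `fit_intercept=True`, and only the call changes. The seeded generator is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*rng.standard_normal(10)
X = x[:, np.newaxis]

# Override the default at call time instead of redefining the function.
model = PolynomialRegression(2, fit_intercept=False)
model.fit(X, y)

print(model[1].coef_)
print(model[1].intercept_)  # 0.0, because no intercept is fitted
```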

##### Exercise 2
Let  
```python
x1 = np.arange(5)
X = np.vstack([x1]).T

model = PolynomialFeatures(degree=3, include_bias=False)
X_ex = model.fit_transform(X)
```

###### 2(a)
Understand the relation between `X` and `X_ex` .  
Can you generate `X_ex` by broadcasting instead of using `PolynomialFeatures`?

In [None]:
### your answer here
x1 = np.arange(5)
X = np.vstack([x1]).T

model = PolynomialFeatures(degree=3, include_bias=False)
X_ex = model.fit_transform(X)
print(X)
print(X_ex)

#### Alex:
You can use `X**np.arange(1,4)` to generate the same result as `X_ex`.
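
The broadcasting works because `X` has shape `(5, 1)` and `np.arange(1,4)` has shape `(3,)`, so the power is taken pairwise and the result has shape `(5, 3)`. A short check, as a sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x1 = np.arange(5)
X = np.vstack([x1]).T                      # shape (5, 1)

X_ex = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)

# Broadcasting (5, 1) against (3,) gives the columns x, x**2, x**3.
X_bc = X ** np.arange(1, 4)

print(X_bc.shape)
print(np.allclose(X_ex, X_bc))
```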

###### 2(b)
Switch the setting to `include_bias=True` .  
Understand the relation between `X` and `X_ex` .  

In [None]:
### your answer here
x1 = np.arange(5)
X = np.vstack([x1]).T

model = PolynomialFeatures(degree=3, include_bias=True)
X_ex = model.fit_transform(X)
print(X)
print(X_ex)
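
With `include_bias=True`, the expansion gains a leading column of ones, which corresponds to the exponent 0. The sketch below compares the output with a broadcast power that starts at exponent 0. It relies on NumPy evaluating the integer power `0**0` as 1.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x1 = np.arange(5)
X = np.vstack([x1]).T

X_ex = PolynomialFeatures(degree=3, include_bias=True).fit_transform(X)

# Exponents 0, 1, 2, 3: the exponent 0 produces the bias column of ones.
X_bc = X ** np.arange(4)

print(np.allclose(X_ex, X_bc))
```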

###### 2(c)
Let  
```python
x1 = np.arange(5)
x2 = np.arange(5,10)
X = np.vstack([x1,x2]).T

model = PolynomialFeatures(degree=2, include_bias=False)
X_ex = model.fit_transform(X)
```
Print `model.powers_` and understand the relation between `X` and `X_ex` .  

In [None]:
### your answer here
x1 = np.arange(5)
x2 = np.arange(5,10)
X = np.vstack([x1,x2]).T

model = PolynomialFeatures(degree=2, include_bias=False)
X_ex = model.fit_transform(X)
print(model.powers_)
print(X)
print(X_ex)
# X    = | x1 | x2 |
# X_ex = | x1 | x2 | x1**2 | x1*x2 | x2**2 |
# Each row of model.powers_ gives the exponents of x1 and x2 in one column of X_ex.
# For example, [1 0] means x1 to the power 1 times x2 to the power 0.
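
The role of `powers_` can be made concrete by rebuilding `X_ex` from `X` and `powers_` alone. This is a sketch of one way to do it with broadcasting. The name `X_rebuilt` is introduced here only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x1 = np.arange(5)
x2 = np.arange(5, 10)
X = np.vstack([x1, x2]).T

model = PolynomialFeatures(degree=2, include_bias=False)
X_ex = model.fit_transform(X)

# X[:, np.newaxis, :] has shape (5, 1, 2); powers_ has shape (5, 2).
# Raising one to the other gives shape (5, 5, 2): for every sample and every
# expanded feature, each input raised to its exponent. The product over the
# last axis multiplies those factors into a single expanded feature.
X_rebuilt = np.prod(X[:, np.newaxis, :] ** model.powers_[np.newaxis, :, :], axis=2)

print(model.powers_)
print(np.allclose(X_ex, X_rebuilt))
```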

##### Exercise 3
Let  
```python
r = 100 * np.random.rand(100)
area = 4*np.pi*r**2 + 0.5*np.random.randn(100)
```
be a collection of data of 100 balls,  
where `r` stores the radii and  
`area` stores the surface areas.  
Suppose you know nothing about the formula for the surface area of a sphere.  
How would you guess their relation?

In [None]:
### your answer here
r = 100 * np.random.rand(100)
area = 4*np.pi*r**2 + 0.5*np.random.randn(100)
R = r[:,np.newaxis]
r_test = np.linspace(0,100,30)
R_test = r_test[:,np.newaxis]

model = PolynomialRegression(2) 
model.fit(R, area)
area_test = model.predict(R_test)
plt.scatter(r,area)
plt.plot(R_test,area_test,c='r')

print(model[1].coef_)
print(model[1].intercept_)
# The coefficient of r**2 prints as about 12.5, which is quite close to 4*pi

#### Alex:
In general, if you do not know the formula, you should include the intercept term.
In fact, you do not even know which degree to pick, so you can try various degrees and choose a reasonable one.
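
A minimal sketch of that approach: fit with the intercept included for several degrees and compare the training scores. The seeded generator and the dictionary names `scores` and `coefs` are assumptions added for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
r = 100 * rng.random(100)
area = 4*np.pi*r**2 + 0.5*rng.standard_normal(100)
R = r[:, np.newaxis]

scores = {}
coefs = {}
for degree in range(1, 5):
    model = PolynomialRegression(degree, fit_intercept=True)
    model.fit(R, area)
    scores[degree] = model.score(R, area)   # R^2 on the training data
    coefs[degree] = model[1].coef_
    print(degree, scores[degree], coefs[degree])
```

The score jumps from degree 1 to degree 2 and barely moves afterwards, which suggests a quadratic relation. At degree 2, the coefficient of `r**2` is close to `4*np.pi`.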

##### Exercise 4
Let  
```python
r = 100 * np.random.rand(100)
volume = 4/3*np.pi*r**3 + 0.5*np.random.randn(100)
```
be a collection of data of 100 balls,  
where `r` stores the radii and  
`volume` stores the volumes.  
Suppose you know nothing about the formula for the volume of a sphere.  
How would you guess their relation?

In [None]:
### your answer here
r = 100 * np.random.rand(100)
R = r[:,np.newaxis]
volume = 4/3*np.pi*r**3 + 0.5*np.random.randn(100)

model = PolynomialRegression(3)
model.fit(R, volume)
coefs = model[1].coef_
intercept = model[1].intercept_
print(intercept)
print(coefs)
# The coefficient of r**3 is about 4.18, which is quite close to 4/3*pi

#### Alex:
Similar to the previous question, you should pass the argument `fit_intercept=True`.

## Experiments

##### Exercise 5
Let  
```python
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]
```
For `k = 0, ..., 4`, run the polynomial regression model with `degree=k`.  
Let `scores` be a list storing their scores.  
Plot the scores.  
Which degree is an appropriate guess?

In [None]:
### your answer here
from sklearn.metrics import mean_absolute_error

x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]
x_test = np.linspace(0,10,20)
X_test = x_test[:,np.newaxis]
y_test = 0.1*x_test**2 + 0.2*x_test + 0.3 + 0.5*np.random.randn(20)

scores = []
coeffs = []

for k in range(1, 5):
    model = PolynomialRegression(k)
    model.fit(X, y)
    coeffs.append(model[1].coef_)
    y_pred = model.predict(X_test)
    scores.append(mean_absolute_error(y_test, y_pred))


for i in range(4):
    print(i+1, scores[i], coeffs[i])
    
# The best score (the lowest mean absolute error) was achieved with degree 2, as expected

#### Alex:
Usually, you will not have `y_test` and you cannot generate it by yourself since you do not know the actual formula, so you should use your training data to calculate the scores. 
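
A sketch of scoring on the training data, using the pipeline's own `score` method (R^2 for a regression pipeline). Degree 0 is skipped on the assumption that, with `include_bias=False`, it would leave no features to fit. The seeded generator is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*rng.standard_normal(10)
X = x[:, np.newaxis]

degrees = list(range(1, 5))
scores = []
for k in degrees:
    model = PolynomialRegression(k)
    model.fit(X, y)
    scores.append(model.score(X, y))   # R^2 on the training data

for k, s in zip(degrees, scores):
    print(k, s)
```

With an intercept, the training score cannot decrease as the degree grows, so the highest score is not the right criterion. Look instead for the degree after which the improvement becomes small. The scores can be plotted with `plt.plot(degrees, scores)`.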

##### Exercise 6
Let  
```python
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X)

a0 = model[1].intercept_
a1,a2 = model[1].coef_
```
The prediction `y_new` is supposed to be the same as `a0 + a1*x + a2*x**2` .  
Check if it is true.

In [None]:
### your answer here
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*np.random.randn(10)
X = x[:,np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X)

a0 = model[1].intercept_
a1,a2 = model[1].coef_
print(y_new-(a0 + a1*x+a2*x**2))
# The differences are zero up to floating-point error, so y_new = a0 + a1*x + a2*x**2 holds
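
Instead of reading the printed differences by eye, the comparison can be made with `np.allclose`, which tolerates floating-point rounding. A minimal sketch, with a seeded generator added only for reproducibility:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def PolynomialRegression(degree=2, fit_intercept=True):
    return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         LinearRegression(fit_intercept=fit_intercept))

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
x = np.arange(10)
y = 0.1*x**2 + 0.2*x + 0.3 + 0.5*rng.standard_normal(10)
X = x[:, np.newaxis]

model = PolynomialRegression(2)
model.fit(X, y)
y_new = model.predict(X)

a0 = model[1].intercept_
a1, a2 = model[1].coef_

# True when the prediction matches the polynomial up to rounding error.
print(np.allclose(y_new, a0 + a1*x + a2*x**2))
```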