# Introduction
<hr style = "border:2px solid black" > </hr >

<div class="alert alert-warning">
<font color=black>

**What?** Testing linear and non-linear regression models.

</font>
</div>

# Linear vs. non-linear methods
<hr style = "border:2px solid black" > </hr >

<div class="alert alert-info">
<font color=black>

- You cannot know beforehand which algorithms are best suited to your problem.
- You must try a number of methods and focus on those that prove the most promising.
<br><br>
- **LINEAR** models:
    - Linear Regression
    - Ridge Regression
    - LASSO Linear Regression
    - Elastic Net Regression
- **NON-LINEAR** models:
    - k-Nearest Neighbors
    - Classification and Regression Trees
    - Support Vector Machines

</font>
</div>
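
The spot-checking idea above can be sketched as a single loop over candidate models. This is only an illustration under stated assumptions: it uses synthetic data from `make_regression` as a stand-in dataset, and the `candidates` dictionary and `scores` name are hypothetical helpers, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data stands in for the real dataset.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)

# Hypothetical dictionary of models to spot-check, all with default settings.
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "ElasticNet": ElasticNet(),
    "KNN": KNeighborsRegressor(),
    "CART": DecisionTreeRegressor(random_state=7),
    "SVR": SVR(),
}

# Same folds for every model, so the scores are comparable.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = {}
for name, model in candidates.items():
    results = cross_val_score(model, X_demo, y_demo, cv=kfold,
                              scoring="neg_mean_squared_error")
    scores[name] = results.mean()

# Higher (closer to zero) is better for negated MSE.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Reusing one `KFold` object keeps the comparison fair, since every model is evaluated on identical train/test splits.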

# Import modules
<hr style = "border:2px solid black" > </hr >

In [1]:
import numpy as np
from pandas import read_csv
from sklearn.svm import SVR
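# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import only works with older scikit-learn versions.
from sklearn.datasets import load_boston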
from sklearn.model_selection import KFold
from IPython.display import Markdown, display
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Read-in the dataset
<hr style = "border:2px solid black" > </hr >

In [2]:
X, Y = load_boston(return_X_y = True)
Y = np.array(Y)
print("Inputs shape: ", X.shape)
print("Labels shape", Y.shape)

Inputs shape:  (506, 13)
Labels shape (506,)


# Linear models
<hr style = "border:2px solid black" > </hr >

## Linear regression

<div class="alert alert-info">
<font color=black>

- Linear regression is often described as assuming that the input variables have a Gaussian distribution; strictly, the classical normality assumption applies to the residuals rather than to the inputs.
- It is also assumed that the input variables are relevant to the output variable and that they are not highly correlated with each other (a problem called COLLINEARITY).

</font>
</div>
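
A quick way to look for collinearity is to inspect the pairwise correlations between the input columns. The sketch below is an illustration only: the data is synthetic, and the names `X_demo`, `threshold`, and `collinear_pairs` as well as the 0.9 cut-off are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs: x2 is almost an exact multiple of x1, x3 is independent.
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
X_demo = np.column_stack([x1, x2, x3])

# Correlation matrix between columns (rowvar=False treats columns as variables).
corr = np.corrcoef(X_demo, rowvar=False)

# Assumed cut-off: flag column pairs whose absolute correlation exceeds 0.9.
threshold = 0.9
n_cols = X_demo.shape[1]
collinear_pairs = [
    (i, j)
    for i in range(n_cols)
    for j in range(i + 1, n_cols)
    if abs(corr[i, j]) > threshold
]
print("Highly correlated column pairs:", collinear_pairs)
```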

In [3]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = LinearRegression()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -23.746501811313365
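
The score is negative because scikit-learn reports `neg_mean_squared_error`, the mean squared error with its sign flipped so that higher is always better. The following is a rough sketch of what shuffled k-fold evaluation computes, written with NumPy only. The function `kfold_neg_mse` and the synthetic data are hypothetical, and the shuffling will not reproduce the exact folds that `KFold` generates.

```python
import numpy as np

def kfold_neg_mse(X, y, n_splits=10, seed=7):
    """Negated test MSE of a least-squares fit on each of n_splits folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))          # shuffle once
    folds = np.array_split(indices, n_splits)  # roughly equal-sized folds
    scores = []
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        # Prepend a column of ones so the fit includes an intercept.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        mse = np.mean((A_test @ coef - y[test_idx]) ** 2)
        scores.append(-mse)                    # negate: higher is better
    return np.array(scores)

# Assumption: synthetic linear data with a small amount of noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

demo_scores = kfold_neg_mse(X_demo, y_demo)
print("Mean:", demo_scores.mean())
```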


## Ridge regression

<div class="alert alert-info">
<font color=black>

- Ridge regression is an extension of linear regression in which the loss function is modified to penalize model complexity, measured as the sum of the squared coefficient values (the squared L2-norm).

</font>
</div>
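
The effect of the L2 penalty can be seen in the closed-form solution. This is a minimal sketch, assuming no intercept and synthetic data; `ridge_coefficients` is a hypothetical helper, not scikit-learn's implementation (which, among other differences, does not penalize the intercept).

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y; alpha = 0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Assumption: synthetic data with known coefficients plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([3.0, -1.0, 0.0, 2.0]) + rng.normal(scale=0.5, size=50)

w_ols = ridge_coefficients(X_demo, y_demo, alpha=0.0)
w_ridge = ridge_coefficients(X_demo, y_demo, alpha=10.0)

# The penalty shrinks the coefficient vector towards zero.
print("OLS norm:  ", np.linalg.norm(w_ols))
print("Ridge norm:", np.linalg.norm(w_ridge))
```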

In [4]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = Ridge()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -23.88989018505344


## LASSO regression

<div class="alert alert-info">
<font color=black>

- LASSO = Least Absolute Shrinkage and Selection Operator
- Like ridge regression, it is a modification of linear regression in which the loss function is modified to penalize model complexity, here measured as the sum of the absolute coefficient values (the L1-norm).

</font>
</div>
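
One reason the L1 penalty performs "selection" is that it can set coefficients exactly to zero. In the special case of an orthonormal design, the LASSO solution is the soft-thresholded least-squares solution. The sketch below shows only that operator; `soft_threshold` and the example values are illustrative assumptions, not the solver scikit-learn uses internally.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each value towards zero by t; values within [-t, t] become exactly 0."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Hypothetical least-squares coefficients before the penalty is applied.
ols_coefficients = np.array([3.0, -0.5, 1.2])
lasso_like = soft_threshold(ols_coefficients, t=1.0)
print(lasso_like)  # the small middle coefficient is zeroed out
```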

In [5]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = Lasso()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -28.74589007585154


## ElasticNet regression

<div class="alert alert-info">
<font color=black>

- ElasticNet is a form of regularized regression that combines the properties of both ridge regression and LASSO regression.
- It seeks to limit the complexity of the regression model (the magnitude and number of regression coefficients) by penalizing the model with both the L2-norm (sum of squared coefficient values) and the L1-norm (sum of absolute coefficient values).

</font>
</div>
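
The combined penalty can be written out directly. This sketch follows the penalty term documented for scikit-learn's `ElasticNet`, where `alpha` sets the overall strength and `l1_ratio` the mix between L1 and L2; the function name `elastic_net_penalty` and the example weights are hypothetical.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2"""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared

example_weights = [1.0, -2.0, 0.0]
mixed = elastic_net_penalty(example_weights)                     # equal mix of L1 and L2
lasso_only = elastic_net_penalty(example_weights, l1_ratio=1.0)  # pure L1 term
ridge_only = elastic_net_penalty(example_weights, l1_ratio=0.0)  # pure L2 term
print(mixed, lasso_only, ridge_only)
```

Setting `l1_ratio` to 1 leaves only the L1 term, and setting it to 0 leaves only the (halved) squared L2 term.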

In [6]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = ElasticNet()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -27.908420360231055


# Non-linear models
<hr style = "border:2px solid black" > </hr >

## K-Nearest Neighbors

<div class="alert alert-info">
<font color=black>

- The k-Nearest Neighbors algorithm (KNN) locates the k most similar instances in the training dataset for a new data instance.
- From the k neighbors, the mean or median of the output variable is taken as the prediction. Of note is the distance metric used (the `metric` argument).
- The Minkowski distance is used by default. It generalizes both the Euclidean distance (p = 2, used when all inputs have the same scale) and the Manhattan distance (p = 1, for when the scales of the input variables differ). With scikit-learn's default p = 2 it is equivalent to the Euclidean distance.

</font>
</div>
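
The two ideas above, a Minkowski distance and a mean over the k nearest targets, fit in a few lines of plain Python. This is a simplified sketch for illustration; `minkowski`, `knn_predict`, and the toy data are hypothetical and ignore the indexing structures a real implementation would use.

```python
def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Predict the mean target of the k training points closest to query."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: minkowski(pair[0], query, p))
    nearest_targets = [target for _, target in by_distance[:k]]
    return sum(nearest_targets) / k

print(minkowski((0, 0), (3, 4), p=2))  # Euclidean
print(minkowski((0, 0), (3, 4), p=1))  # Manhattan

# Toy one-dimensional training set; the outlier at 10 is not among the 3 nearest.
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_predict(train_X, train_y, query=[1.2], k=3)
print(prediction)
```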

In [7]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = KNeighborsRegressor()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -38.852320266666666


## Trees (CART)

<div class="alert alert-info">
<font color=black>

- Decision trees, also known as Classification and Regression Trees (CART), use the training data to select the best points at which to split the data in order to minimize a cost metric.

</font>
</div>
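
To make the split search concrete, here is a minimal sketch for a single feature, assuming the cost metric is the sum of squared errors around each side's mean. The helpers `sse` and `best_split` and the toy data are hypothetical; a full tree repeats this search over every feature and recurses into each side.

```python
def sse(values):
    """Sum of squared errors of values around their mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, cost) of the split on x that minimizes total SSE of y."""
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Toy data: targets jump from 1 to 5 between x = 3 and x = 10.
threshold, cost = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
print(threshold, cost)
```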

In [8]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = DecisionTreeRegressor()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -22.831329411764706


## Support Vector Machines (SVM)

<div class="alert alert-info">
<font color=black>

- Support Vector Machines (SVM) were originally developed for binary classification.
- The technique has been extended to the prediction of real-valued quantities, where it is called Support Vector Regression (SVR).

</font>
</div>
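
What distinguishes SVR from the regressions above is its epsilon-insensitive loss: errors smaller than a margin `epsilon` cost nothing, and larger errors are penalized linearly. The sketch below shows only that loss; the function name is hypothetical and the default of 0.1 mirrors the default `epsilon` of scikit-learn's `SVR`.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

inside_tube = epsilon_insensitive_loss(1.0, 1.05)   # error 0.05 is within the margin
outside_tube = epsilon_insensitive_loss(1.0, 1.5)   # error 0.5 exceeds it by 0.4
print(inside_tube, outside_tube)
```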

In [9]:
kfold = KFold(n_splits = 10, shuffle = True, random_state=7)
model = SVR()
results = cross_val_score(model, X, Y, cv=kfold, scoring = 'neg_mean_squared_error') 
print("Mean: ", results.mean())

Mean:  -67.64140705473743


# References
<hr style = "border:2px solid black" > </hr >

<div class="alert alert-warning">
<font color=black>

- https://machinelearningmastery.com/

</font>
</div>