# <font color="blue">Lesson 3 Basic Models</font>

# Testing Regression Algorithms on the Same Dataset

I'm using k-fold cross-validation (scikit-learn's `KFold`) on the Boston House Pricing Dataset (downloaded from Kaggle) with several regression algorithms. The goal is to predict the median home value (a continuous target) from a variety of features, including proximity to the Charles River.
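
As a minimal sketch of what k-fold cross-validation does, the snippet below splits ten toy samples into five folds; each fold serves once as the held-out test set. The toy array and the choice of five folds are illustrative assumptions, not part of the lesson's dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples (illustrative only; not the housing data)
X_toy = np.arange(10).reshape(-1, 1)

# Without shuffling, KFold cuts the data into consecutive blocks
kfold_demo = KFold(n_splits=5)

folds = []
for train_idx, test_idx in kfold_demo.split(X_toy):
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
```

Every sample appears in exactly one test fold, so each model is scored on data it did not see during fitting.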

Don't worry about the "details" for now. Just focus on testing the error of these algorithms side by side. This is a method you can use to identify the best regressor for a particular type of problem.

**Linear**
* Linear Regression
* Ridge Regression (L2-Norm)
* Least Absolute Shrinkage and Selection Operator (LASSO) (L1-Norm)
* ElasticNet (L1- and L2-Norm)

**Nonlinear**
* kNN Regressor
* Regression tree (CART)
* Support Vector Regressor (SVR)

In [None]:
# Load Boston Housing Data

import pandas as pd
import numpy as np

# read in the file and assign headers
headers = ['crime', 'zone_res','zone_ind', 'C_river', 'NOX', 'rooms', 'age', 'dist', 'hwy_acc', 
         'prop_tax', 'PT_ratio', 'AA_prop', 'low_inc', 'median_val' ]
df = pd.read_csv('https://library.startlearninglabs.uw.edu/DATASCI420/Datasets/housing.csv', 
                 sep=r'\s+', names=headers)  # whitespace-delimited file; delim_whitespace is deprecated in recent pandas

x1 = df.iloc[:, 0:13]   # load features into X DF
y1 = df.iloc[:, 13]     # Load target into Y DF
df.describe()

In [None]:
from matplotlib import pyplot

df.plot(kind='box', subplots=True, layout=(5,3), sharex=False, sharey=False, figsize=(12,15))
pyplot.show()

## Linear Algorithms for Regression 

### Linear Regression
Assumes the following:
* the residuals (errors) are normally distributed
* all of the independent variables are relevant to the dependent variable
* the independent variables are not highly correlated with one another (no multicollinearity)
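
The last assumption can be checked roughly by inspecting pairwise correlations between features. A minimal sketch on synthetic data follows; the column names, the data, and the 0.8 threshold are illustrative assumptions. For the housing data, the same idea would apply to the feature DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a * 2.0 + rng.normal(scale=0.1, size=200),  # nearly a scaled copy of "a"
    "c": rng.normal(size=200),                        # independent of both
})

corr = toy.corr().abs()

# Collect each feature pair once (upper triangle) whose |r| exceeds the threshold
threshold = 0.8  # assumed cutoff; adjust to taste
high_pairs = [
    (corr.index[i], corr.columns[j])
    for i in range(len(corr))
    for j in range(i + 1, len(corr))
    if corr.iloc[i, j] > threshold
]
print(high_pairs)
```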

In [None]:
from sklearn.model_selection import KFold, cross_val_score 
from sklearn.linear_model import LinearRegression

# 10-fold cross-validation. shuffle=True is required for random_state to take effect;
# random_state=7 keeps the folds consistent from run to run.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

lin_reg_results = cross_val_score(LinearRegression(), x1, y1, cv=kfold, scoring='neg_mean_squared_error') 
print("Negative MSE-> mean:%.3f (std:%.3f)" % (lin_reg_results.mean(), lin_reg_results.std()))

### Ridge Regression
Ridge regression is an extension of linear regression in which the loss function is modified to minimize model complexity, measured as the sum of the squared coefficient values (the L2-norm).
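
To illustrate the effect of the L2 penalty, this sketch fits `Ridge` on synthetic data with increasing `alpha` (the penalty strength; scikit-learn's default is 1.0) and prints the size of the coefficient vector. The data and the alpha values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))
y_toy = X_toy @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)

coef_norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X_toy, y_toy)
    norm = float(np.linalg.norm(model.coef_))
    coef_norms.append(norm)
    print("alpha=%g  ||coef||=%.3f" % (alpha, norm))
```

A stronger penalty pulls the coefficients toward zero, though Ridge does not typically set any of them to exactly zero.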

In [None]:
from sklearn.linear_model import Ridge 
rid_reg_results = cross_val_score(Ridge(), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (rid_reg_results.mean(), rid_reg_results.std()))

### LASSO Regression (Least Absolute Shrinkage and Selection Operator)
LASSO is a modification of linear regression in which the loss function is modified to minimize model complexity, measured as the sum of the absolute values of the coefficients (the L1-norm).
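
A practical consequence of the L1 penalty is that some coefficients are driven to exactly zero, which is the "selection" in the name. The sketch below shows this on synthetic data where only two of ten features matter; the data and `alpha=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 10))
# Only the first two features influence the target
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X_toy, y_toy)
n_zero = int(np.sum(lasso.coef_ == 0))
print("coefficients:", np.round(lasso.coef_, 2))
print("exactly zero:", n_zero, "of", lasso.coef_.size)
```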

In [None]:
from sklearn.linear_model import Lasso

lasso_results = cross_val_score(Lasso(), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (lasso_results.mean(), lasso_results.std()))

### ElasticNet Regression
Combines the properties of both LASSO and Ridge regression. It limits the complexity of a regression model in terms of both the magnitude and the number of regression coefficients by penalizing the model with both the L2-norm (sum of squared coefficient values) and the L1-norm (sum of absolute coefficient values).
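
The three penalties can be written side by side. This is a generic textbook form with assumed notation ($w$ for the coefficients, $\lambda$ for the penalty strength, $\rho$ for the L1/L2 mix); libraries scale the terms differently, so the constants will not match scikit-learn's `alpha` and `l1_ratio` exactly.

```latex
\begin{aligned}
\text{Ridge:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \\
\text{LASSO:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \\
\text{ElasticNet:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \left( \rho \lVert w \rVert_1 + (1 - \rho) \lVert w \rVert_2^2 \right)
\end{aligned}
```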

In [None]:
from sklearn.linear_model import ElasticNet

elast_results = cross_val_score(ElasticNet(), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (elast_results.mean(), elast_results.std()))

## Nonlinear Regression Algorithms
### kNN Regressor
A kNN regressor finds the k training instances most similar to a new input and takes the mean or median of their output values as the prediction. The distance metric is Minkowski by default, which generalizes both the Euclidean distance (used when all inputs have the same scale) and the Manhattan distance (used when the scales of the input variables differ).
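
A minimal sketch of both ideas in plain NumPy: the Minkowski distance with `p=1` (Manhattan) and `p=2` (Euclidean), followed by a hand-rolled kNN prediction that averages the targets of the nearest neighbors. The function names and toy data are hypothetical, for illustration only; `KNeighborsRegressor` handles all of this internally.

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

def knn_predict(X_train, y_train, x_new, k=3, p=2):
    """Predict by averaging the targets of the k closest training points."""
    dists = [minkowski(row, x_new, p) for row in X_train]
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(y_train)[nearest]))

manhattan = minkowski([0, 0], [3, 4], p=1)
euclidean = minkowski([0, 0], [3, 4], p=2)
print(manhattan, euclidean)

X_train = [[1.0], [2.0], [3.0], [10.0]]
y_train = [10.0, 20.0, 30.0, 100.0]
prediction = knn_predict(X_train, y_train, [2.1], k=3)
print(prediction)
```

The far-away point at 10.0 is not among the three nearest neighbors of 2.1, so its target does not affect the prediction.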

In [None]:
from sklearn.neighbors import KNeighborsRegressor

kNN_results = cross_val_score(KNeighborsRegressor(), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (kNN_results.mean(), kNN_results.std()))

### CART
Like the decision tree classifier, except that it predicts a continuous target instead of a class label.
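
As a small illustration, a regression tree predicts the mean target of the training samples in the leaf where a new input lands. The toy data and `max_depth=1` are illustrative assumptions.

```python
from sklearn.tree import DecisionTreeRegressor

# Two clearly separated groups with constant targets
X_toy = [[1], [2], [3], [10], [11], [12]]
y_toy = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

# A depth-1 tree makes a single split, producing two leaves
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_toy, y_toy)
preds = stump.predict([[2.5], [11.0]])
print(preds)
```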

In [None]:
from sklearn.tree import DecisionTreeRegressor

# random_state fixes the tree's tie-breaking so results are repeatable
reg_tree_results = cross_val_score(DecisionTreeRegressor(random_state=7), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (reg_tree_results.mean(), reg_tree_results.std()))

### SVR
Like the support vector classifier, except that it predicts a continuous target instead of a class label.
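
SVR with its default RBF kernel is sensitive to feature scale, so it is commonly paired with standardization. The sketch below is an assumption about how one might do that with a pipeline; it is not part of the original comparison, and it uses synthetic data so it runs without the housing file.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Two features on very different scales (illustrative)
X_toy = np.column_stack([rng.normal(size=150), rng.normal(scale=1000.0, size=150)])
y_toy = 2.0 * X_toy[:, 0] + 0.003 * X_toy[:, 1] + rng.normal(scale=0.1, size=150)

cv = KFold(n_splits=5, shuffle=True, random_state=7)

# The scaler is fit inside each training fold, so test folds do not leak into it
scaled_svr = make_pipeline(StandardScaler(), SVR())

raw_scores = cross_val_score(SVR(), X_toy, y_toy, cv=cv, scoring='neg_mean_squared_error')
scaled_scores = cross_val_score(scaled_svr, X_toy, y_toy, cv=cv, scoring='neg_mean_squared_error')

print("SVR raw    -> mean:%.3f" % raw_scores.mean())
print("SVR scaled -> mean:%.3f" % scaled_scores.mean())
```

Comparing the two printed means gives a quick sense of how much scaling matters for a given dataset.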

In [None]:
from sklearn.svm import SVR

svr_results = cross_val_score(SVR(), x1, y1, cv=kfold, scoring='neg_mean_squared_error')
print("Negative MSE-> mean:%.3f (std:%.3f)" % (svr_results.mean(), svr_results.std()))

In [None]:
print("Linear Neg MSE-> mean:%.3f (std:%.3f)" % (lin_reg_results.mean(), lin_reg_results.std()))
print("Ridge  Neg MSE-> mean:%.3f (std:%.3f)" % (rid_reg_results.mean(), rid_reg_results.std()))
print("LASSO  Neg MSE-> mean:%.3f (std:%.3f)" % (lasso_results.mean(), lasso_results.std()))
print("Elast  Neg MSE-> mean:%.3f (std:%.3f)" % (elast_results.mean(), elast_results.std()))
print("kNN    Neg MSE-> mean:%.3f (std:%.3f)" % (kNN_results.mean(), kNN_results.std()))
print("CART   Neg MSE-> mean:%.3f (std:%.3f)" % (reg_tree_results.mean(), reg_tree_results.std()))
print("SVR    Neg MSE-> mean:%.3f (std:%.3f)" % (svr_results.mean(), svr_results.std()))

What does this tell us?
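
One way to approach this question is to put the scores on a readable scale. scikit-learn reports MSE negated so that higher is always better; flipping the sign and taking the square root gives RMSE in the target's own units. The sketch below uses synthetic data and a hypothetical `models` dict so it runs on its own; with the housing data, `x1` and `y1` would take the place of the toy arrays.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(120, 4))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=120)

cv = KFold(n_splits=10, shuffle=True, random_state=7)
models = {  # hypothetical subset of the regressors compared above
    "Linear": LinearRegression(),
    "Ridge": Ridge(),
    "LASSO": Lasso(),
}

rmse = {}
for name, model in models.items():
    neg_mse = cross_val_score(model, X_toy, y_toy, cv=cv, scoring='neg_mean_squared_error')
    rmse[name] = float(np.sqrt(-neg_mse.mean()))

# Lowest RMSE first
ranking = sorted(rmse, key=rmse.get)
for name in ranking:
    print("%-6s RMSE: %.3f" % (name, rmse[name]))
```

The ranking on toy data says nothing about the housing results; it only shows the mechanics of turning the scores into a comparison.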