> Reference:
+ [machinelearningmastery: regression mla spot checking](http://machinelearningmastery.com/spot-check-regression-machine-learning-algorithms-python-scikit-learn/)

+ Linear Machine Learning Algorithms:
    - Linear Regression
    - Ridge Regression
    - LASSO Linear Regression
    - Elastic Net Regression
+ Nonlinear Machine Learning Algorithms:
    - K-Nearest Neighbors
    - Classification and Regression Trees
    - Support Vector Machines
    
A test harness with 10-fold cross-validation is used to spot check each machine learning algorithm, and mean squared error (MSE) is used to measure performance. Note that the reported MSE values are negative. This is not an error: `cross_val_score()` expects scores where a larger value is better, so error metrics are negated (in current scikit-learn the scorer is named `neg_mean_squared_error`). A value closer to zero therefore means a smaller error.
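To make the sign convention concrete, the sketch below computes the per-fold MSE by hand and compares it with `cross_val_score(..., scoring="neg_mean_squared_error")`. It assumes a recent scikit-learn, where `KFold` and `cross_val_score` live in `sklearn.model_selection`, and it uses synthetic data from `make_regression` instead of the housing file so that it runs offline.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data (illustrative assumption)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
kfold = KFold(n_splits=10)  # unshuffled 10-fold split

# Manual k-fold loop: fit on 9 folds, compute the MSE on the held-out fold
manual_mse = []
for train_idx, test_idx in kfold.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    manual_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
manual_mse = np.array(manual_mse)

# The library scorer returns the same per-fold values with the sign flipped
scores = cross_val_score(LinearRegression(), X, y, cv=kfold,
                         scoring="neg_mean_squared_error")
print(np.allclose(scores, -manual_mse))  # True
print(manual_mse.mean(), -scores.mean())
```

Negating the mean score recovers an ordinary, positive MSE if that is easier to report.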

In [8]:
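import pandas
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides
# the same KFold and cross_val_score, so it is imported under the name used below
from sklearn import model_selection as cross_validation
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited; sep=r'\s+' replaces the deprecated delim_whitespace=True
dataframe = pandas.read_csv(url, sep=r'\s+', names=names)
array = dataframe.values
X = array[:,0:13]  # the 13 input features
Y = array[:,13]    # MEDV, the target
num_folds = 10
# Scorers follow "larger is better", so the MSE is reported as its negative
scoring = 'neg_mean_squared_error'
# Unshuffled folds, as in the original run (random_state only takes effect with shuffle=True)
kfold = cross_validation.KFold(n_splits=num_folds)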

# Linear MLA: Linear Regression #
+ Assumes that the errors (residuals) are Gaussian distributed; the input features themselves do not need to be Gaussian.
+ Assumes that input variables are relevant to the output variable and that they are not highly correlated with each other (a problem called collinearity).
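One quick, informal way to look for collinearity is to scan the pairwise correlations of the inputs. The sketch below builds a hypothetical three-feature matrix in which the third column is almost a copy of the first, and flags pairs whose absolute correlation exceeds a 0.9 cutoff; both the data and the threshold are illustrative assumptions, not rules.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.05, size=n)  # hypothetical feature that nearly duplicates x1
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
threshold = 0.9                      # illustrative cutoff

# Report every feature pair (i, j) whose absolute correlation exceeds the cutoff
pairs = [(i, j, corr[i, j])
         for i in range(corr.shape[0])
         for j in range(i + 1, corr.shape[0])
         if abs(corr[i, j]) > threshold]
print(pairs)
```

On the housing data the same check could be run on `dataframe.corr()`.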

In [9]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.7052559445


# Linear MLA: Ridge Regression #
+ An extension of linear regression where the loss function is modified to minimize the complexity of the model, measured as the sum of the squared coefficient values (the squared l2-norm).
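For intuition, ridge regression without an intercept has the closed-form solution `w = (X^T X + alpha * I)^-1 X^T y`. The sketch below, on synthetic data, checks that this formula matches scikit-learn's `Ridge` when `fit_intercept=False` (an assumption made only to keep the algebra short) and that the penalty shrinks the coefficients relative to ordinary least squares.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=4, noise=5.0, random_state=1)
alpha = 1.0  # same as Ridge()'s default

# Minimizer of ||y - Xw||^2 + alpha * ||w||^2, with no intercept term
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
# Ordinary least squares (alpha = 0) for comparison
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(np.allclose(w_ridge, model.coef_))               # True
print(np.linalg.norm(w_ridge), np.linalg.norm(w_ols))  # the ridge norm is the smaller one
```

Larger values of `alpha` shrink the coefficient vector further toward zero.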

In [10]:
from sklearn.linear_model import Ridge
model = Ridge()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.0782462093


# Linear MLA: LASSO Regression #
+ The Least Absolute Shrinkage and Selection Operator (LASSO) is, like ridge regression, a modification of linear regression in which the loss function also penalizes model complexity, here measured as the sum of the absolute coefficient values (the l1-norm). Unlike the l2 penalty, the l1 penalty can shrink some coefficients exactly to zero.
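The practical effect of the l1 penalty is that some coefficients become exactly zero, which is where the "selection" in the name comes from. The sketch below uses synthetic data in which only 3 of 10 features influence the target and compares the non-zero coefficients of `LinearRegression` and `Lasso(alpha=1.0)`; the data set and the value of `alpha` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# 10 features, only 3 of which influence y (true_coef is zero for the rest)
X, y, true_coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                                  noise=1.0, coef=True, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)  # alpha=1.0 is also Lasso()'s default

print("OLS non-zero coefficients:  ", np.count_nonzero(ols.coef_))
print("LASSO non-zero coefficients:", np.count_nonzero(lasso.coef_))
# Coefficients of the irrelevant features are set exactly to zero by the l1 penalty
print("Irrelevant features zeroed: ", np.all(lasso.coef_[true_coef == 0] == 0))
```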

In [12]:
from sklearn.linear_model import Lasso
model = Lasso()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.4640845883


# Linear MLA: ElasticNet Regression #
+ ElasticNet is a form of regularized regression that combines the properties of both ridge regression and LASSO regression.
+ It seeks to minimize the complexity of the regression model (the magnitude and number of regression coefficients) by penalizing the model with both the squared l2-norm (sum of the squared coefficient values) and the l1-norm (sum of the absolute coefficient values).
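scikit-learn's `ElasticNet` documents its penalty term as `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The helper `elastic_net_penalty` below is a hypothetical function that evaluates that term, and the sketch then checks on synthetic data that `l1_ratio=1.0` reduces to `Lasso` with the same `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_penalty(w, alpha, l1_ratio):
    """Hypothetical helper: the penalty term of scikit-learn's ElasticNet objective."""
    l1 = np.sum(np.abs(w))   # ||w||_1
    l2_sq = np.sum(w ** 2)   # ||w||_2^2
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_sq

w = np.array([3.0, -4.0, 0.0])
# ElasticNet() defaults: alpha=1.0, l1_ratio=0.5 -> 0.5 * 7 + 0.25 * 25 = 9.75
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5))

# With l1_ratio=1.0 only the l1 term remains, i.e. the LASSO penalty
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=2)
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print(np.allclose(enet_l1.coef_, lasso.coef_))  # True
```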

In [13]:
from sklearn.linear_model import ElasticNet
model = ElasticNet()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-31.1645737142


# Nonlinear MLA: K-Nearest Neighbors #
+ K-Nearest Neighbors (KNN) locates the K most similar instances in the training dataset for a new data instance and uses the mean or median of their output values as the prediction; scikit-learn's KNeighborsRegressor uses the mean of K = 5 neighbors by default.
+ The default distance metric is the **Minkowski distance** with p=2, which is the Euclidean distance; p=1 gives the Manhattan distance, so the Minkowski distance generalizes both. All of these distances are sensitive to the scale of the inputs, so features on very different scales are usually standardized first.
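The prediction rule can be written out directly. The sketch below is a hypothetical brute-force implementation (`minkowski` and `knn_predict` are illustrative helpers, not library functions) that averages the outputs of the K closest training rows, compared with `KNeighborsRegressor` on random data for p=1 (Manhattan) and p=2 (Euclidean).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def minkowski(a, B, p):
    """Minkowski distance from point a to every row of B."""
    return np.sum(np.abs(B - a) ** p, axis=1) ** (1.0 / p)

def knn_predict(X_train, y_train, x_new, k=5, p=2):
    """Hypothetical helper: mean output of the k training rows nearest to x_new."""
    distances = minkowski(x_new, X_train, p)
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = rng.normal(size=50)
X_new = rng.normal(size=(5, 3))

matches = {}
for p in (1, 2):  # p=1: Manhattan distance, p=2: Euclidean distance
    manual = np.array([knn_predict(X_train, y_train, x, k=5, p=p) for x in X_new])
    library = KNeighborsRegressor(n_neighbors=5, p=p).fit(X_train, y_train).predict(X_new)
    matches[p] = np.allclose(manual, library)
print(matches)
```

The brute-force version computes every distance for each query; the library can use tree structures to find the same neighbors faster.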

In [14]:
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-107.28683898


# Nonlinear MLA: Classification and Regression Trees #
+ CART uses the training data to select the split points that minimize a cost metric.
+ For regression trees, the default cost metric is the mean squared error, set through the `criterion` parameter (`'mse'` in older scikit-learn versions, `'squared_error'` since version 1.0).
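To illustrate how a split point is chosen, the sketch below scans every threshold on a single synthetic feature and keeps the one with the smallest total squared error in the two child nodes, which is equivalent to minimizing the weighted MSE. The helper `best_split` is hypothetical; the comparison with a depth-1 `DecisionTreeRegressor` assumes scikit-learn's convention of placing thresholds halfway between adjacent sorted values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def best_split(x, y):
    """Hypothetical helper: threshold on x that minimizes the children's total squared error."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no split is possible between identical values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_threshold, best_sse = (xs[i - 1] + xs[i]) / 2.0, sse
    return best_threshold, best_sse

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = np.where(x < 4, 1.0, 5.0) + rng.normal(scale=0.5, size=40)  # step at x = 4 plus noise

threshold, sse = best_split(x, y)
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
print(threshold, stump.tree_.threshold[0])  # the two thresholds agree
```

A full tree repeats this search over every feature at every node, then recurses into the two children.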

In [18]:
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-38.5316560784


# Nonlinear MLA: Support Vector Machines #
+ Support Vector Machines were developed for binary classification.
+ The technique has been extended to the prediction of real-valued outputs, called Support Vector Regression (SVR).
+ Like the SVC classifier, scikit-learn's SVR is built on the LIBSVM library.
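SVR replaces squared error with an epsilon-insensitive loss: residuals inside a tube of half-width `epsilon` cost nothing, and larger residuals cost their excess over `epsilon`. The sketch below defines a hypothetical `epsilon_insensitive_loss` helper and then fits a linear-kernel `SVR` with a large `C` to noise-free linear data, where every training residual should end up inside the tube up to solver tolerance; the data and parameter values are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Hypothetical helper: zero inside the epsilon tube, |error| - epsilon outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

loss = epsilon_insensitive_loss(np.array([1.0, 2.0, 3.0]),
                                np.array([1.05, 2.5, 2.0]), epsilon=0.1)
print(loss)  # values 0.0, 0.4, 0.9 (up to floating-point rounding)

# Noise-free line y = 2x + 1; a large C makes leaving the tube expensive
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * x.ravel() + 1
svr = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(x, y)
residuals = np.abs(y - svr.predict(x))
print(residuals.max())  # close to epsilon = 0.1
```

Because residuals inside the tube carry no loss, SVR accepts a flatter fit than least squares would as long as the points stay within `epsilon`.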

In [19]:
from sklearn.svm import SVR
model = SVR()
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-91.0478243332
