<a href="https://colab.research.google.com/github/marcelounb/ML-Mastery-with-Python-Course/blob/master/chap12_Spot_Check_Regression_Algorithms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Spot-checking is a way of discovering which algorithms perform well on your machine learning problem. You cannot know which algorithms are best suited to your problem beforehand. You must trial a number of methods and focus attention on those that prove themselves the most promising.

# Algorithms Overview
In this lesson we are going to take a look at seven regression algorithms that you can spot-check on your dataset. 

Starting with four linear machine learning algorithms:

*   Linear Regression
*   Ridge Regression
*   LASSO Linear Regression
*   Elastic Net Regression

Then looking at three nonlinear machine learning algorithms:

*   k-Nearest Neighbors
*   Classification and Regression Trees
*   Support Vector Machines



Each recipe is demonstrated on the Boston House Price dataset. This is a regression problem where all attributes are numeric. A test harness with k-fold cross-validation (the code below uses 30 folds) is used to spot-check each machine learning algorithm, and mean squared error is used to indicate algorithm performance. Note that the mean squared error values are reported with their sign flipped (negative). This is a quirk of the cross_val_score() function, which expects every scoring metric to follow the convention that a larger value is better, so error metrics are negated.
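The sign convention is easier to see with a small example. The sketch below is not part of the original lesson: it uses synthetic data from `make_regression` as a stand-in for the housing data (an assumption, as are the names `X_demo` and `y_demo`), and shows how to flip the scores back into ordinary MSE and RMSE values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; this is NOT the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = KFold(n_splits=10)
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=cv, scoring='neg_mean_squared_error')

# Every fold score is zero or negative; flip the sign to recover the usual MSE.
mse = -neg_mse
# RMSE is in the same units as the target, which is often easier to read.
rmse = np.sqrt(mse)

print("mean MSE:  %.3f" % mse.mean())
print("mean RMSE: %.3f" % rmse.mean())
```

When comparing models on the negated scores, the value closest to zero is the best one.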

# Linear Machine Learning Algorithms

In [0]:
from pandas import read_csv 
from sklearn.model_selection import KFold 
from sklearn.model_selection import cross_val_score 
from sklearn.linear_model import LinearRegression 
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet

In [2]:
filename = 'housing.csv' 
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV'] 
dataframe = read_csv(filename, sep=r'\s+', names=names)  # sep=r'\s+' replaces the deprecated delim_whitespace=True
array = dataframe.values 
X = array[:,0:13] 
Y = array[:,13] 
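# random_state is omitted: it only applies when shuffle=True, and recent scikit-learn versions raise an error otherwise
kfold = KFold(n_splits=30) 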
modelos = []
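A note on how `KFold` builds its folds, since it affects reproducibility. The following is a minimal sketch on a toy array (the name `data` and the fold counts are illustrative assumptions): without shuffling, the folds are contiguous blocks in row order, so no seed is needed; a seed only matters once `shuffle=True` is set.

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(10)  # toy data, purely illustrative

# Without shuffling, test folds are contiguous blocks in row order.
plain = KFold(n_splits=5)
plain_tests = [test.tolist() for _, test in plain.split(data)]
print(plain_tests)

# With shuffle=True, random_state makes the shuffled assignment repeatable.
shuffled_a = KFold(n_splits=5, shuffle=True, random_state=7)
shuffled_b = KFold(n_splits=5, shuffle=True, random_state=7)
tests_a = [test.tolist() for _, test in shuffled_a.split(data)]
tests_b = [test.tolist() for _, test in shuffled_b.split(data)]
print(tests_a == tests_b)
```

Switching to shuffled folds would change the scores reported in this lesson, so the unshuffled split is the one that matches the outputs shown.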



## Linear Regression


In [0]:
model = LinearRegression() 
scoring = 'neg_mean_squared_error' 
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)

In [4]:
modelos.append(['Linear Regression', results.mean()])
results.mean()

-30.734398351340932

## Ridge Regression

In [0]:
model = Ridge()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)

In [6]:
modelos.append(['Ridge', results.mean()])
results.mean()

-30.462475684757823

## LASSO Linear Regression

In [0]:
model = Lasso()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)

In [8]:
modelos.append(['Lasso', results.mean()])
results.mean()

-33.342725397610884

## ElasticNet Regression

In [0]:
model = ElasticNet()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)

In [10]:
modelos.append(['ElasticNet', results.mean()])
results.mean()

-31.21606919247604
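The four cells above repeat the same pattern, so the spot-check can also be written as a loop that ranks the candidates. This is a sketch under stated assumptions rather than the lesson's own code: it runs on synthetic data from `make_regression` instead of `housing.csv`, and the names `candidates`, `summary`, `X_demo` and `y_demo` are introduced here for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; this is NOT the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=13, noise=15.0, random_state=0)

# The four linear algorithms from this lesson, all with default settings.
candidates = [
    ('Linear Regression', LinearRegression()),
    ('Ridge', Ridge()),
    ('Lasso', Lasso()),
    ('ElasticNet', ElasticNet()),
]

cv = KFold(n_splits=10)
summary = []
for name, estimator in candidates:
    scores = cross_val_score(estimator, X_demo, y_demo,
                             cv=cv, scoring='neg_mean_squared_error')
    summary.append([name, scores.mean()])

# Negated MSE: the value closest to zero is best, so sort in descending order.
summary.sort(key=lambda row: row[1], reverse=True)
for name, score in summary:
    print("%-18s %.3f" % (name, score))
```

The ranking on synthetic data says nothing about the housing problem; the point is only the shape of the loop, which mirrors how `modelos` is filled above.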