# Algorithms Overview

We are going to take a look at **seven regression algorithms** that we can spot-check on our dataset.

**1. Linear ML Algorithms**
   * Linear Regression
   * Ridge Regression
   * LASSO Linear Regression
   * Elastic Net Regression
   
**2. Nonlinear ML Algorithms**
   * k-Nearest Neighbors
   * Classification and Regression Trees
   * Support Vector Machines

## 1.1 Linear Regression
* It performs best when the input variables are approximately Gaussian.
* It also assumes that the input variables are relevant to the output variable and are not highly correlated with each other (no strong multicollinearity).
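The correlation assumption can be checked before fitting. The sketch below flags pairs of input columns whose absolute Pearson correlation is high. The helper `highly_correlated_pairs` and its 0.9 cutoff are hypothetical choices for illustration. The demo uses synthetic data so that it runs on its own, but the same call should work on the `X` array loaded in the next cell.

```python
import numpy as np

def highly_correlated_pairs(X, threshold=0.9):
    """Hypothetical helper: return (i, j, r) for feature pairs with |r| > threshold."""
    corr = np.corrcoef(X, rowvar=False)  # rowvar=False: columns are the variables
    n_features = corr.shape[0]
    pairs = []
    for i in range(n_features):
        for j in range(i + 1, n_features):  # upper triangle only, skip the diagonal
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Synthetic inputs: column 2 is a noisy copy of column 0, column 1 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + rng.normal(scale=0.05, size=200)
X_demo = np.column_stack([a, b, c])

pairs = highly_correlated_pairs(X_demo)
print(pairs)  # a single pair involving columns 0 and 2
```

If a pair is flagged, a common response is to drop one of the two columns or to use a penalized model such as Ridge Regression.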

```python
# Linear Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# Load the whitespace-delimited housing data (no header row)
filename = 'housing.csv'
names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
# sep=r'\s+' replaces the deprecated delim_whitespace=True
dataframe = read_csv(filename, sep=r'\s+', names=names)
array = dataframe.values
X = array[:, 0:13]  # the 13 input features
Y = array[:, 13]    # the target, MEDV

# 10-fold cross-validation without shuffling
kfold = KFold(n_splits=10)
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```


-34.70525594452491
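Scikit-learn scorers follow a "greater is better" convention, so `'neg_mean_squared_error'` reports the MSE with its sign flipped. That is why every score in this section is negative. For the result above, an MSE of about 34.7 corresponds to an RMSE of about 5.9, in the units of the target, `MEDV`.

A minimal sketch of recovering MSE and RMSE from the scores follows. It uses synthetic data from `make_regression` as a stand-in for the housing file.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
kfold_demo = KFold(n_splits=10)

neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=kfold_demo, scoring='neg_mean_squared_error')
mse = -neg_mse        # flip the sign back: one MSE per fold
rmse = np.sqrt(mse)   # per-fold RMSE, in the units of the target

print(f"MSE:  {mse.mean():.2f} (+/- {mse.std():.2f})")
print(f"RMSE: {rmse.mean():.2f}")

# scikit-learn also provides an RMSE scorer with the same sign convention
neg_rmse = cross_val_score(LinearRegression(), X_demo, y_demo,
                           cv=kfold_demo, scoring='neg_root_mean_squared_error')
```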


## 1.2 Ridge Regression
* It extends linear regression by modifying the loss function to penalize model complexity, measured as the sum of the squared coefficient values (the squared L2 norm).
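In scikit-learn's parameterization, `Ridge` minimizes `||y - Xw||^2 + alpha * ||w||^2`. Without an intercept, this has the closed-form solution `w = (X^T X + alpha * I)^-1 X^T y`.

The sketch below checks that closed form against `Ridge` on synthetic data, with `fit_intercept=False` so that the two problems match. It also shows the coefficient norm shrinking as `alpha` grows. The helper `ridge_closed_form` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data standing in for the housing features
X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha * I) w = X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alpha = 10.0
w_manual = ridge_closed_form(X_demo, y_demo, alpha)
w_sklearn = Ridge(alpha=alpha, fit_intercept=False, solver='cholesky').fit(X_demo, y_demo).coef_
print(np.allclose(w_manual, w_sklearn))

# A larger penalty gives a smaller L2 norm of the coefficient vector
norms = []
for alpha_i in [0.1, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha_i, fit_intercept=False).fit(X_demo, y_demo).coef_
    norms.append(np.linalg.norm(coef))
print(norms)
```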

```python
# Ridge Regression
from sklearn.linear_model import Ridge
model = Ridge()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-34.07824620925937


## 1.3 LASSO Regression
* LASSO - **Least Absolute Shrinkage and Selection Operator**.
* The loss function is modified to penalize model complexity, measured as the sum of the absolute coefficient values (the L1 norm).
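In practice, the L1 penalty differs from the L2 penalty in that it can drive some coefficients to exactly zero, which is where the "selection" in the name comes from.

The sketch below uses synthetic data in which only 2 of 10 features carry signal. This setup is an assumption of the demo, not a property of the housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 2 of the 10 synthetic features influence the target
X_demo, y_demo = make_regression(n_samples=100, n_features=10, n_informative=2,
                                 noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print("Lasso zero coefficients:", n_zero_lasso)
print("Ridge zero coefficients:", n_zero_ridge)
```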

```python
# Lasso Regression
from sklearn.linear_model import Lasso
model = Lasso()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-34.46408458830232


## 1.4 ElasticNet Regression
* It is a form of regularized regression that combines the properties of both Ridge Regression and LASSO Regression.
* It seeks to minimize the complexity of the regression model (**the magnitude and number of regression coefficients**) by penalizing the model with both the L2 norm and the L1 norm.
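In scikit-learn, the `l1_ratio` parameter controls the mix of the two penalties. `ElasticNet` minimizes `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2`, and its default is `l1_ratio=0.5`.

The sketch below uses synthetic data. It checks that `l1_ratio=1.0` reproduces `Lasso`, then reports how many coefficients remain nonzero as the L1 share is lowered.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X_demo, y_demo = make_regression(n_samples=100, n_features=10, n_informative=2,
                                 noise=1.0, random_state=0)

# l1_ratio=1.0 leaves only the L1 term, i.e. the Lasso objective
enet_l1 = ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)
print(np.allclose(enet_l1.coef_, lasso.coef_))

# Lower l1_ratio shifts weight from the L1 term to the L2 term
nonzero_counts = {}
for ratio in [1.0, 0.5, 0.1]:
    model = ElasticNet(alpha=1.0, l1_ratio=ratio).fit(X_demo, y_demo)
    nonzero_counts[ratio] = int(np.sum(model.coef_ != 0.0))
print(nonzero_counts)
```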

```python
# ElasticNet Regression
from sklearn.linear_model import ElasticNet
model = ElasticNet()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-31.16457371424976


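## 2.1 k-Nearest Neighbors
* It finds the k training instances closest to a new data point and uses the mean or median of their output values as the prediction. Scikit-learn's `KNeighborsRegressor` uses the mean, with k = 5 by default.
* A distance metric determines which instances are the closest.
* The **Minkowski distance is used by default. It is a generalization of both the Euclidean distance (used when all inputs have the same scale) and the Manhattan distance (for when the scales of the input variables differ)**. With scikit-learn's default power parameter `p=2`, it is equivalent to the Euclidean distance.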
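The prediction rule takes a few lines of NumPy: compute Euclidean distances (Minkowski with `p=2`), take the k closest training points, and average their targets. On synthetic data with no distance ties, this should agree with `KNeighborsRegressor` under its default uniform weighting. The helper `knn_predict` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: mean target of the k nearest training points (Euclidean)."""
    preds = []
    for x in X_query:
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = np.argsort(dists)[:k]  # indices of the k closest points
        preds.append(y_train[nearest].mean())
    return np.array(preds)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
X_query = rng.normal(size=(5, 3))

manual = knn_predict(X_train, y_train, X_query, k=5)
reference = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train).predict(X_query)
print(np.allclose(manual, reference))
```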

```python
# KNN Regression
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-107.28683898039215
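Because KNN is distance-based, features measured on large scales can dominate the neighbor search. One possible contributor to the weak KNN score above is that the raw housing features are not standardized; this hypothesis has not been verified here.

The sketch below uses synthetic data with one informative feature and one irrelevant feature on a much larger scale. It compares plain KNN with a pipeline that applies `StandardScaler` first. Fitting the scaler inside the pipeline means each fold's scaling statistics come from its training part only.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
x_signal = rng.normal(size=n)               # informative feature, unit scale
x_noise = rng.normal(scale=1000.0, size=n)  # irrelevant feature, huge scale
X_demo = np.column_stack([x_signal, x_noise])
y_demo = 3.0 * x_signal + rng.normal(scale=0.1, size=n)

kfold_demo = KFold(n_splits=10)
scoring = 'neg_mean_squared_error'

raw = cross_val_score(KNeighborsRegressor(), X_demo, y_demo, cv=kfold_demo, scoring=scoring)
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsRegressor()),
                         X_demo, y_demo, cv=kfold_demo, scoring=scoring)
print("unscaled MSE:", -raw.mean())
print("scaled MSE:  ", -scaled.mean())
```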


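## 2.2 Classification and Regression Trees
* It uses the training data to select the best points at which to split the data so as to minimize a cost metric. For regression, scikit-learn's `DecisionTreeRegressor` minimizes the squared error by default.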
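For a single numeric feature, the split search can be written directly. Try every threshold between consecutive sorted values, and keep the one with the lowest total squared error of the two halves, where each half predicts its own mean.

The hypothetical helper `best_split` does this on a toy step-shaped dataset. The result is compared with the root split of a depth-1 `DecisionTreeRegressor`, which also places thresholds halfway between adjacent values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sse(values):
    """Sum of squared errors around the mean."""
    return float(np.sum((values - values.mean()) ** 2))

def best_split(x, y):
    """Hypothetical helper: threshold on x that minimizes SSE(left) + SSE(right)."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_cost = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold separates equal values
        cost = sse(y_sorted[:i]) + sse(y_sorted[i:])
        if cost < best_cost:
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2.0  # midpoint
            best_cost = cost
    return best_threshold, best_cost

# Toy data: the target jumps between x = 4 and x = 5
x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 5.0) + 0.1 * (x % 2)

threshold, cost = best_split(x, y)
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
print(threshold, stump.tree_.threshold[0])  # both should be 4.5
```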

```python
# Decision Tree Regression
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-34.269355294117645


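## 2.3 Support Vector Machines
* Support Vector Regression (SVR) extends SVMs to regression. It ignores errors smaller than a margin `epsilon` and penalizes larger ones. Scikit-learn's `SVR` uses an RBF kernel by default (`kernel='rbf'`, `C=1.0`, `epsilon=0.1`).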

```python
# SVM Regression
from sklearn.svm import SVR
model = SVR()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
```

-72.25543311855311
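SVR is built around an epsilon-insensitive loss. Residuals smaller than `epsilon` cost nothing, and larger ones are penalized linearly by how far they exceed it.

Below is a minimal NumPy sketch of that loss (the function name is hypothetical), together with a check of the defaults of the `SVR()` used in the cell above.

```python
import numpy as np
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Hypothetical helper: max(0, |y_true - y_pred| - epsilon), elementwise."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

losses = epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 3.0], epsilon=0.1)
print(losses)  # the first and last residuals fall inside the epsilon tube

# Defaults of the SVR() used above
model = SVR()
print(model.kernel, model.C, model.epsilon)  # rbf 1.0 0.1
```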


# Summary
* We learned how to spot-check seven machine learning regression algorithms on a dataset.
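The seven cells above repeat the same evaluation with a different model each time. A compact way to spot-check them together is to loop over a dictionary of estimators with one shared `KFold`.

This sketch uses `make_regression` as a stand-in for `housing.csv`, so its scores will differ from those reported above. It also fixes `random_state` on the tree so that its score is repeatable; the earlier cell does not.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in with 13 features, like the housing inputs
X_demo, y_demo = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

models = {
    'LR': LinearRegression(),
    'Ridge': Ridge(),
    'LASSO': Lasso(),
    'EN': ElasticNet(),
    'KNN': KNeighborsRegressor(),
    'CART': DecisionTreeRegressor(random_state=0),
    'SVR': SVR(),
}

kfold = KFold(n_splits=10)
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=kfold, scoring='neg_mean_squared_error')
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```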