# <font color="red">TODO: Regression</font>
## Sections:

* [1. Feature selection](#section1)
    * [1.1 Univariate feature selection](#section1.1)
    * [1.2 Feature selection using fitted model](#section1.2)
* [2. Models for regression](#section2)
    * [2.1 Lasso and Linear SVR](#section2.1)
    * [2.2 ARD regression](#section2.2)
    * [2.3 Nonlinear methods](#section2.3)
* [3. Model selection](#section3)


## <font color="blue">1. Feature selection</font> <a id="section1"/>

### 1.1 Univariate feature selection (<a href="http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection">link</a>) <a id="section1.1"/>

In [13]:
#example
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression, mutual_info_regression, chi2

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)
print(X_new.shape)

(150, 4)
(150, 2)


In [23]:
X_new[0,:]

array([ 1.4,  0.2])
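To see which columns `SelectKBest` kept, rather than only the reduced array, the fitted selector can be queried directly. The following is a minimal sketch on the same iris data; the variable names (`selector`, `mask`, `selected_names`) are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_regression

iris = load_iris()
X, y = iris.data, iris.target

# Fit the selector without transforming, so it can be inspected afterwards
selector = SelectKBest(f_regression, k=2).fit(X, y)

# Boolean mask over the original columns: True for the k retained features
mask = selector.get_support()
selected_names = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(selected_names)

# Univariate F-statistic of each feature against the target
for name, score in zip(iris.feature_names, selector.scores_):
    print(name, round(float(score), 1))
```

The two retained columns are the petal measurements, which matches the first transformed row `[1.4, 0.2]` shown above.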

### 1.2 Feature selection using fitted model (<a href="http://scikit-learn.org/stable/modules/feature_selection.html#feature-selection-using-selectfrommodel">link</a>) <a id="section1.2"/>

In [32]:
#use l1-regularized linear model for regression (Lasso) to estimate feature importance
from sklearn.linear_model import LassoCV, Lasso
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

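#without tuning alpha (regularization term parameter)
#note: `normalize` was removed in scikit-learn 1.2; on newer versions drop it and scale X beforehand if needed
lasso_model = Lasso(alpha=0.01, normalize=True, random_state=1234).fit(X, y)
#with tuning alpha (regularization term parameter)
#note: this reassignment replaces the model above, so the LassoCV fit is the one used for selection below
lasso_model = LassoCV(eps=1e-6, n_alphas=100, normalize=False, cv=3, n_jobs=-1, random_state=1234, selection='random').fit(X,y)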

model = SelectFromModel(lasso_model, prefit=True)
X_new = model.transform(X)
print(X_new.shape)

(150, 4)
(150, 2)


In [33]:
X_new[0,:]

array([ 1.4,  0.2])
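`SelectFromModel` can also fit the estimator itself, which makes it easy to inspect the coefficients that drive the selection. The sketch below assumes a plain `Lasso` with an illustrative `alpha=0.01` (not the tuned LassoCV model above) and prints which features fall at or above the selection threshold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

iris = load_iris()
X, y = iris.data, iris.target

# Let SelectFromModel fit the estimator instead of passing prefit=True
selector = SelectFromModel(Lasso(alpha=0.01)).fit(X, y)

coef = selector.estimator_.coef_   # fitted Lasso coefficients
mask = selector.get_support()      # True for features that are kept
for name, c, keep in zip(iris.feature_names, coef, mask):
    print(name, round(float(c), 4), "kept" if keep else "dropped")

# A feature is kept when |coefficient| >= threshold_
print("threshold:", selector.threshold_)
X_sel = selector.transform(X)
print(X_sel.shape)
```

Which features survive depends on `alpha`: a larger value drives more coefficients to zero and therefore drops more columns.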

### NB: feature selection can be used as a step in a Pipeline (<a href="http://scikit-learn.org/stable/modules/feature_selection.html#feature-selection-as-part-of-a-pipeline">link</a>)
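A minimal sketch of such a pipeline, assuming `SelectKBest` as the selection step and `Lasso` as the regressor (the step names `"select"` and `"regress"` and the value `alpha=0.01` are illustrative choices). Wrapping both steps in a `Pipeline` means the selector is refitted on each training fold, so the held-out fold never influences which features are chosen.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", SelectKBest(f_regression, k=2)),  # feature selection step
    ("regress", Lasso(alpha=0.01)),              # regression step
])

# iris is ordered by class, so shuffle before splitting into folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
print(scores)
print("mean R2:", scores.mean())
```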

## <font color="blue">2. Models for regression</font> <a id="section2"/>

### 2.1 Lasso (<a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html">link</a>) or LinearSVR (<a href="http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html">link</a>) <a id="section2.1"/>

In [40]:
#fit l1-regularized linear models (Lasso, LassoCV) and a linear SVR, then use them to make predictions
from sklearn.linear_model import LassoCV, Lasso
from sklearn.svm import LinearSVR
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

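#without tuning alpha (regularization term parameter)
#note: `normalize` was removed in scikit-learn 1.2; on newer versions drop it and scale X beforehand if needed
lasso_model = Lasso(alpha=0.01, normalize=True, random_state=1234).fit(X, y)
#with tuning alpha (regularization term parameter)
lassoCV_model = LassoCV(eps=1e-6, n_alphas=100, normalize=False, cv=3, n_jobs=-1, random_state=1234, selection='random').fit(X,y)
#Support Vector Machine Regressor
#note: epsilon=1e2 is far wider than the target range (0 to 2), so no residual is penalized and the fit is degenerate
svr_model = LinearSVR(epsilon=1e2, C=1e-6, max_iter=50).fit(X,y)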

#take model to make predictions
models = [lasso_model, lassoCV_model, svr_model]
y_preds = []
for model in models:
    y_preds.append(model.predict(X))

(150, 4)


### Metrics for regression: MSE and R2-score (<a href="http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination">link</a>)

In [60]:
#metrics for validation of our model
from sklearn.metrics import mean_squared_error, r2_score
names = ["Lasso  ", "LassoCV", "Linear SVR"]
for y_pred, name in zip(y_preds, names):
    print(name + ' => MSE: \t', mean_squared_error(y, y_pred))
    print(name + ' => R2: \t\t', r2_score(y, y_pred))
    print()

Lasso   => MSE: 	 0.0648117971182
Lasso   => R2: 		 0.902782304323

LassoCV => MSE: 	 0.0579932650788
LassoCV => R2: 		 0.913010102382

Linear SVR => MSE: 	 1.66666666667
Linear SVR => R2: 		 -1.5
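Both metrics are simple to compute by hand, which also helps in reading the Linear SVR scores above. The sketch below evaluates a hypothetical predictor that outputs 0 for every sample: on the iris target it yields MSE = 250/150 and R2 = 1 - 250/100 = -1.5, the same values reported for Linear SVR. This is consistent with `epsilon=1e2` leaving the model with nothing to fit, though that reading is an interpretation rather than something the notebook states.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error, r2_score

_, y = load_iris(return_X_y=True)

# Hypothetical degenerate predictor: always outputs 0
y_zero = np.zeros(len(y))

# MSE: mean of squared residuals
mse = np.mean((y - y_zero) ** 2)

# R2: 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y - y_zero) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse, "sklearn:", mean_squared_error(y, y_zero))
print("R2: ", r2, "sklearn:", r2_score(y, y_zero))
```

A negative R2 means the model does worse than always predicting the mean of the target.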



### 2.2 Bayesian Automatic Relevance Determination (ARD) Regression (<a href="http://scikit-learn.org/stable/modules/linear_model.html#automatic-relevance-determination-ard">link</a>) <a id="section2.2"/>

ARDRegression is very similar to Bayesian Ridge Regression, but can lead to sparser weights. ARDRegression poses a different prior over w, by dropping the assumption of the Gaussian being spherical.
Instead, the distribution over w is assumed to be an axis-parallel, elliptical Gaussian distribution.

In [62]:
from sklearn.linear_model import ARDRegression
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

ard_model = ARDRegression(compute_score=True).fit(X,y)

y_pred = ard_model.predict(X)
print("ARD" + ' => MSE: \t', mean_squared_error(y, y_pred))
print("ARD" + ' => R2: \t', r2_score(y, y_pred))

(150, 4)
ARD => MSE: 	 0.0465415031752
ARD => R2: 	 0.930187745237
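The difference between the spherical and the axis-parallel elliptical prior shows up in the fitted `lambda_` attribute: Bayesian Ridge learns one shared precision for all weights, while ARD learns one per weight. The sketch below uses synthetic data (an assumption made for illustration, with only the first two of ten features relevant) so that the effect on the coefficients is visible.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

# Synthetic data: only the first two features carry signal
rng = np.random.RandomState(0)
n_samples, n_features = 100, 10
X = rng.randn(n_samples, n_features)
true_w = np.zeros(n_features)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + 0.1 * rng.randn(n_samples)

ard = ARDRegression().fit(X, y)
br = BayesianRidge().fit(X, y)

# ARD: one precision per weight; Bayesian Ridge: a single shared precision
print("ARD lambda_ shape:", ard.lambda_.shape)
print("BayesianRidge lambda_ ndim:", np.ndim(br.lambda_))

# Compare how strongly each model shrinks the irrelevant weights
print("ARD coef:          ", np.round(ard.coef_, 3))
print("BayesianRidge coef:", np.round(br.coef_, 3))
```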


### 2.3 Nonlinear methods (optional) <a id="section2.3"/>

In [66]:
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

svr_model = SVR(C=1e2).fit(X,y)
rf_model = RandomForestRegressor(n_estimators=100, n_jobs=-1).fit(X,y)

#take the nonlinear models to make predictions
models = [svr_model, rf_model]
y_preds = []
for model in models:
    y_preds.append(model.predict(X))

#metrics for validation of our model
from sklearn.metrics import mean_squared_error, r2_score
names = ["SVR RBF kernel", "Random Forest"]
for y_pred, name in zip(y_preds, names):
    print(name + ' => MSE: \t\t', mean_squared_error(y, y_pred))
    print(name + ' => R2: \t\t', r2_score(y, y_pred))
    print()

(150, 4)
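The scores in section 2 are computed on the same data the models were fitted on, which tends to flatter flexible models such as random forests. A minimal sketch of the gap, assuming a fixed `random_state=0` and 5 shuffled folds (both illustrative choices), compares the training R2 with a cross-validated R2:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

rf = RandomForestRegressor(n_estimators=100, random_state=0)

# R2 on the data the forest was fitted on
train_r2 = r2_score(y, rf.fit(X, y).predict(X))

# R2 averaged over 5 shuffled held-out folds (cross_val_score refits clones of rf)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(rf, X, y, cv=cv, scoring="r2").mean()

print("train R2:", train_r2)
print("CV R2:   ", cv_r2)
```

The cross-validated figure is the fairer basis for comparing linear and nonlinear models.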



## <font color="blue">3. Model selection</font> <a id="section3"/>

Each method should be fine-tuned. We can do this using cross-validation.

In [70]:
import numpy as np

In [68]:
from sklearn.model_selection import GridSearchCV, train_test_split

In [72]:
def training_pipeline(model, parameters, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)
    gs_model = GridSearchCV(model, parameters, scoring='neg_mean_squared_error', n_jobs=-1, verbose=5)
    gs_model.fit(X_train, y_train)
    
    best_regressor = gs_model.best_estimator_
    y_pred = best_regressor.predict(X_test)
    print('MSE: \t', mean_squared_error(y_test, y_pred))
    print('R2: \t', r2_score(y_test, y_pred))
    return best_regressor, y_pred
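Besides `best_estimator_`, the fitted `GridSearchCV` object exposes the chosen hyperparameters and the cross-validated score. The sketch below repeats the same split-then-search idea for a single model; the fixed `random_state=0`, the explicit `cv=3`, and the variable names are assumptions made for reproducibility, not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

grid = {"alpha": np.logspace(-4, 4, 9)}
gs = GridSearchCV(Lasso(), grid, scoring="neg_mean_squared_error", cv=3)
gs.fit(X_train, y_train)

# Hyperparameters of the best candidate and its mean cross-validated MSE
print("best params:", gs.best_params_)
print("CV MSE:     ", -gs.best_score_)

# Score of the refitted best model on the held-out test split (negated MSE)
print("test MSE:   ", -gs.score(X_test, y_test))
```

The scorer returns negated MSE so that larger is always better, hence the sign flips when printing.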

In [75]:
models = [Lasso(), LinearSVR(), ARDRegression(), SVR(), RandomForestRegressor()]

parameters_lasso = {'alpha': np.logspace(-4,4,9),
                    'selection': ['random']}
parameters_linear_svr = {'C': np.logspace(-4,4,9),
                         'epsilon': [1e-4,1e-3,0.]}
parameters_ard = {'alpha_1': [1e-06,1e-5], 
                  'alpha_2': [1e-06,1e-5],
                  'lambda_1': [1e-5,1e-06],
                  'lambda_2': [1e-5,1e-06]}
parameters_svr = {'C': np.logspace(-4,4,9),
                 'kernel': ['rbf', 'sigmoid']}
parameters_rf = {'max_depth': [1,3,5],
                 'n_jobs': [-1]}
params = [parameters_lasso, parameters_linear_svr, parameters_ard, parameters_svr, parameters_rf]

y_preds = []
models_ = []
for model, param in zip(models, params):
    model_, y_pred = training_pipeline(model, param, X, y)
    models_.append(model_)
    y_preds.append(y_pred)

Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    2.8s
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    2.9s finished


MSE: 	 0.0562313527756
R2: 	 0.902863306146
Fitting 3 folds for each of 27 candidates, totalling 81 fits


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    3.1s
[Parallel(n_jobs=-1)]: Done  40 out of  81 | elapsed:    3.1s remaining:    3.2s
[Parallel(n_jobs=-1)]: Done  81 out of  81 | elapsed:    3.4s finished


MSE: 	 0.0380227907457
R2: 	 0.939000870461
Fitting 3 folds for each of 16 candidates, totalling 48 fits


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    3.3s
[Parallel(n_jobs=-1)]: Done  48 out of  48 | elapsed:    3.5s finished


MSE: 	 0.0570714066293
R2: 	 0.909728882309
Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:    3.5s finished


MSE: 	 0.0433080517477
R2: 	 0.935997953082
Fitting 3 folds for each of 3 candidates, totalling 9 fits


[Parallel(n_jobs=-1)]: Done   4 out of   9 | elapsed:    3.4s remaining:    4.3s
[Parallel(n_jobs=-1)]: Done   6 out of   9 | elapsed:    3.5s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done   9 out of   9 | elapsed:    3.8s finished


MSE: 	 0.0311031519274
R2: 	 0.955496285


We see that once the models are tuned, they perform almost equally well. However, for large data sets, linear models are typically much faster to train.