# Advanced regression

In this section I will perform hyperparameter optimisation on the **RandomForest**, **LightGBM** and **MLPRegressor** estimators. Then, using the three estimators obtained from the optimisation procedure, I will develop a simple custom regressor that averages their predictions to obtain a more robust and accurate estimator.
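
To make the averaging idea concrete before any tuning, here is a minimal sketch on synthetic numbers rather than the Ubaar data. It assumes three predictors whose errors are independent, which models trained on the same data satisfy only partly, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" log-prices and three imperfect predictors with independent noise
y_true = rng.normal(loc=14.0, scale=1.0, size=5000)
preds = [y_true + rng.normal(scale=0.3, size=y_true.size) for _ in range(3)]

# Simple averaging strategy: the ensemble prediction is the mean of the three
y_avg = np.mean(preds, axis=0)

def mae(y, y_hat):
    # Mean absolute error between targets and predictions
    return np.mean(np.abs(y - y_hat))

individual_mae = [mae(y_true, p) for p in preds]
ensemble_mae = mae(y_true, y_avg)
print(individual_mae, ensemble_mae)
```

Under the independence assumption the averaged prediction has a smaller error than any single predictor; with correlated errors the gain shrinks.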

So let's begin.

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.model_selection import train_test_split

In [9]:
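# Define a function for splitting the data into train and test sets, stratified on the binned target
def split_data(X, y):
    # Bin the continuous target by its own quantiles (99 interior edges, so 100 bins)
    # so that stratification sees roughly equally populated bins
    bins = np.quantile(y, np.linspace(0, 1, 101)[1:-1])
    y_binned = np.digitize(y, bins)
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y_binned)
    return x_train, x_test, y_train, y_test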

# load previously saved data
data = pd.read_csv('ubaar_data.csv', index_col='ID')
X, y = data.drop('logPrice', axis=1), data['logPrice']
x_train, x_test, y_train, y_test = split_data(X, y)

# Evaluation metric: 100 minus the mean absolute percentage error, computed on the
# original price scale (the model targets are log-prices)
from sklearn.model_selection import cross_val_score

def maps(y, y_pred):
    y_true = np.exp(y)
    y_pred = np.exp(y_pred)
    return 100 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100

scorer = make_scorer(maps)

# Define a function for evaluating the regressor using t-times k-fold cross validation.
# Note that this function does not use stratified folding, and that an integer `cv`
# yields unshuffled folds, so each of the t repetitions reuses the same k splits.
def t_times_k_fold_cv(model, X, y, t=5, k=3):
    scores = []
    for _ in range(t):
        scores.extend(cross_val_score(model, X, y, cv=k, scoring=scorer).tolist())
    return np.array(scores)
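
As a quick sanity check of the metric, the sketch below applies the same formula to three made-up prices (not taken from the dataset). The predictions are 10% too high, exact, and 20% too low, so the mean absolute percentage error is 10% and the score should come out at 90.

```python
import numpy as np

def maps(y, y_pred):
    # y and y_pred are log-prices; errors are measured on the original price scale
    y_true = np.exp(y)
    y_pred = np.exp(y_pred)
    return 100 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Made-up prices for illustration only
true_prices = np.array([100.0, 200.0, 400.0])
pred_prices = np.array([110.0, 200.0, 320.0])

score = maps(np.log(true_prices), np.log(pred_prices))
print(round(score, 2))
```

Because the score is "100 minus percentage error", higher is better, which is why `make_scorer` can be used with its default `greater_is_better=True`.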

# Hyperparameter optimization

The RandomForest, LightGBM and MLP algorithms take many parameters that must be set by the data scientist before training the model, such as the number of trees in a RandomForest. These hyperparameters affect model performance by shifting the balance between underfitting and overfitting.
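
The underfitting/overfitting trade-off can be illustrated with a deliberately simple stand-in: polynomial regression on synthetic data, where the polynomial degree plays the role of the hyperparameter. This is not one of the three estimators tuned here, only a small sketch of the effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth function, split into train and validation halves
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)
x_tr, y_tr, x_va, y_va = x[:30], y[:30], x[30:], y[30:]

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# The polynomial degree plays the role of the hyperparameter
results = {}
for degree in (1, 4, 12):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    results[degree] = (mse(coeffs, x_tr, y_tr), mse(coeffs, x_va, y_va))
    print(degree, results[degree])
```

The training error can only go down as the degree grows, while the validation error typically stops improving at some point; that gap is what hyperparameter tuning tries to control.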

The problem with choosing the right hyperparameters is that the optimal set is different for every machine learning problem. Therefore, the only reliable way to find good settings is to try out a number of them on each new dataset. We will use **random search with cross-validation** for hyperparameter optimization.

## Random Search with Cross-Validation

We define a grid of hyperparameters and build models from combinations of hyperparameters sampled at random from that grid. Each sampled combination is evaluated with K-fold cross-validation, and the combination with the best mean validation score is selected.
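
The procedure can be sketched in plain Python. The grid values and the `toy_cv_score` function below are hypothetical stand-ins: in the real search, the score is the mean K-fold validation score of a model trained with the sampled parameters.

```python
import itertools
import random

# Toy grid (values are illustrative, not the ones tuned below)
grid = {
    'n_estimators': [50, 200, 500],
    'max_depth': [None, 5, 20],
    'min_samples_leaf': [1, 5, 20],
}

# Every combination in the grid, as a list of parameter dictionaries
names = list(grid)
all_candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]

# Random search: evaluate only n_iter combinations drawn without replacement
n_iter = 5
rng = random.Random(42)
sampled = rng.sample(all_candidates, n_iter)

def toy_cv_score(params):
    # Hypothetical stand-in for the mean K-fold score of a model trained with `params`
    depth = params['max_depth'] or 30
    return params['n_estimators'] ** 0.5 + depth - params['min_samples_leaf']

best = max(sampled, key=toy_cv_score)
print(len(all_candidates), len(sampled), best)
```

The point of sampling is cost: only `n_iter` of the 27 combinations are scored, at the price of possibly missing the best one.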


### Hyperparameter optimization for RandomForestRegressor

Let's fine-tune the RandomForestRegressor:

In [7]:
# Random search for good hyperparameters of RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import make_scorer

n_estimators = [50, 200, 500, 1000]
max_features = [0.1, 0.5, 0.9]
max_depth = [None, 5, 20, 50]
min_samples_split = [2, 30, 100]
min_samples_leaf = [1, 5, 20]
max_leaf_nodes = [None, 5, 20]
min_impurity_decrease = [0, 0.1, 0.5]

hyperparameter_grid = {'n_estimators': n_estimators,
                       'max_features': max_features,
                       'max_depth': max_depth,
                       'min_samples_split': min_samples_split,
                       'min_samples_leaf': min_samples_leaf,
                       'max_leaf_nodes': max_leaf_nodes,
                       'min_impurity_decrease': min_impurity_decrease}
model = RandomForestRegressor(random_state=42, n_jobs=6)
random_cv = RandomizedSearchCV(estimator=model,
                               param_distributions=hyperparameter_grid,
                               cv=4,
                               n_iter=50, 
                               scoring=scorer,
                               n_jobs=1,
                               verbose=2,
                               return_train_score=True,
                               random_state=42)
random_cv.fit(x_train, y_train)

Fitting 4 folds for each of 50 candidates, totalling 200 fits
[CV] n_estimators=50, min_samples_split=2, min_samples_leaf=5, min_impurity_decrease=0.5, max_leaf_nodes=20, max_features=0.9, max_depth=50 
[CV]  n_estimators=50, min_samples_split=2, min_samples_leaf=5, min_impurity_decrease=0.5, max_leaf_nodes=20, max_features=0.9, max_depth=50, total=   0.6s
...

[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:  8.7min finished


RandomizedSearchCV(cv=4, error_score='raise',
          estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=6,
           oob_score=False, random_state=42, verbose=0, warm_start=False),
          fit_params=None, iid=True, n_iter=50, n_jobs=1,
          param_distributions={'n_estimators': [50, 200, 500, 1000], 'max_features': [0.1, 0.5, 0.9], 'max_depth': [None, 5, 20, 50], 'min_samples_split': [2, 30, 100], 'min_samples_leaf': [1, 5, 20], 'max_leaf_nodes': [None, 5, 20], 'min_impurity_decrease': [0, 0.1, 0.5]},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score=True, scoring=make_scorer(maps), verbose=2)

In [8]:
# Get all of the cv results and sort by the test performance
random_results = pd.DataFrame(random_cv.cv_results_).sort_values('mean_test_score', ascending = False)
random_results.head(10)

| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_max_depth | param_max_features | param_max_leaf_nodes | param_min_impurity_decrease | param_min_samples_leaf | param_min_samples_split | ... | split1_test_score | split1_train_score | split2_test_score | split2_train_score | split3_test_score | split3_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1.137851 | 0.132754 | 77.412762 | 79.504476 | 20.0 | 0.5 | None | 0 | 5 | 30 | ... | 77.322374 | 79.558909 | 77.884309 | 79.419349 | 77.060896 | 79.59019 | 0.001726 | 0.001212 | 0.297987 | 0.071735 |
| 15 | 3.826383 | 0.233544 | 76.02034 | 77.31169 | None | 0.1 | None | 0 | 5 | 2 | ... | 75.842894 | 77.305787 | 76.557419 | 77.20116 | 75.686678 | 77.408844 | 0.162585 | 0.000954 | 0.328608 | 0.074267 |
| 6 | 18.339679 | 0.334336 | 75.820222 | 76.832475 | None | 0.5 | None | 0 | 20 | 2 | ... | 75.73119 | 76.831001 | 76.325401 | 76.714625 | 75.507243 | 76.967242 | 0.772019 | 0.001516 | 0.304844 | 0.089849 |
| 36 | 0.488123 | 0.13421 | 75.692684 | 77.059321 | 20.0 | 0.1 | None | 0 | 1 | 30 | ... | 75.682697 | 77.170042 | 76.146757 | 76.910745 | 75.327128 | 77.094434 | 0.057877 | 0.002727 | 0.294138 | 0.094306 |
| 49 | 1.457882 | 0.133087 | 75.118046 | 75.958506 | 50.0 | 0.1 | None | 0 | 5 | 100 | ... | 74.988432 | 76.00911 | 75.650397 | 75.808366 | 74.740912 | 76.013647 | 0.043227 | 0.00081 | 0.332817 | 0.086767 |
| 39 | 1.402197 | 0.136084 | 74.5969 | 75.499618 | 20.0 | 0.1 | None | 0 | 5 | 2 | ... | 74.474741 | 75.555926 | 75.074759 | 75.336089 | 74.339561 | 75.656051 | 0.080701 | 0.005007 | 0.282471 | 0.119169 |
| 25 | 0.439426 | 0.135782 | 72.875429 | 73.270343 | None | 0.1 | None | 0 | 20 | 100 | ... | 72.658115 | 73.343762 | 73.324974 | 73.050033 | 72.796115 | 73.50762 | 0.003258 | 0.004642 | 0.264093 | 0.172044 |
| 12 | 12.477301 | 0.335175 | 71.65784 | 71.856248 | 5.0 | 0.9 | None | 0 | 20 | 30 | ... | 71.569136 | 71.903252 | 72.031965 | 71.720679 | 71.364083 | 71.9535 | 0.368386 | 0.002176 | 0.241968 | 0.086778 |
| 13 | 0.559337 | 0.132805 | 71.051521 | 71.229702 | None | 0.5 | 20.0 | 0 | 20 | 30 | ... | 71.047477 | 71.277502 | 71.358092 | 71.12003 | 70.752952 | 71.254083 | 0.044661 | 0.001692 | 0.213983 | 0.06386 |
| 42 | 0.556485 | 0.132899 | 71.047461 | 71.228053 | None | 0.5 | 20.0 | 0 | 1 | 30 | ... | 71.048508 | 71.286976 | 71.348915 | 71.112401 | 70.743461 | 71.24333 | 0.041731 | 0.001639 | 0.214061 | 0.068555 |



The best one:

In [9]:
# Get best estimator object
random_cv.best_estimator_

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=20,
           max_features=0.5, max_leaf_nodes=None, min_impurity_decrease=0,
           min_impurity_split=None, min_samples_leaf=5,
           min_samples_split=30, min_weight_fraction_leaf=0.0,
           n_estimators=50, n_jobs=6, oob_score=False, random_state=42,
           verbose=0, warm_start=False)

Now I will perform a grid search over a smaller set of parameters:

In [14]:
# Grid search on RandomForestRegressor
max_depth = [None]
max_features = [0.5, 0.7, 0.9]
min_samples_leaf = [1, 3, 5]
min_samples_split = [2, 5]
n_estimators = [200]
grid = {'max_depth': max_depth,
        'max_features': max_features,
        'min_samples_leaf': min_samples_leaf,
        'min_samples_split': min_samples_split,
        'n_estimators': n_estimators}
model = RandomForestRegressor(random_state=42, n_jobs=6)
grid_cv = GridSearchCV(estimator=model,
                       param_grid=grid,
                       cv=4,
                       scoring=scorer,
                       n_jobs=1,
                       verbose=1,
                       return_train_score=True)
grid_cv.fit(x_train, y_train)

grid_results = pd.DataFrame(grid_cv.cv_results_).sort_values('mean_test_score', ascending = False)
grid_results.head(10)

Fitting 4 folds for each of 18 candidates, totalling 72 fits


[Parallel(n_jobs=1)]: Done  72 out of  72 | elapsed:  8.7min finished


GridSearchCV(cv=4, error_score='raise',
       estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=6,
           oob_score=False, random_state=42, verbose=0, warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'max_depth': [None], 'max_features': [0.5, 0.7, 0.9], 'min_samples_leaf': [1, 3, 5], 'min_samples_split': [2, 5], 'n_estimators': [200]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=make_scorer(maps), verbose=1)

In [15]:
grid_results = pd.DataFrame(grid_cv.cv_results_).sort_values('mean_test_score', ascending = False)
grid_results.head(10)

| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_max_depth | param_max_features | param_min_samples_leaf | param_min_samples_split | param_n_estimators | params | ... | split1_test_score | split1_train_score | split2_test_score | split2_train_score | split3_test_score | split3_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.962511 | 0.233499 | 79.513676 | 89.659085 | None | 0.5 | 1 | 2 | 200 | {'max_depth': None, 'max_features': 0.5, 'min_... | ... | 79.338984 | 89.679047 | 79.776701 | 89.653303 | 79.314044 | 89.65455 | 0.18228 | 0.000821 | 0.194894 | 0.011677 |
| 6 | 7.442325 | 0.233292 | 79.473091 | 89.644433 | None | 0.7 | 1 | 2 | 200 | {'max_depth': None, 'max_features': 0.7, 'min_... | ... | 79.311034 | 89.657644 | 79.678108 | 89.63215 | 79.298231 | 89.649158 | 0.14722 | 0.00093 | 0.170487 | 0.009743 |
| 1 | 5.68488 | 0.233783 | 79.463705 | 87.59806 | None | 0.5 | 1 | 5 | 200 | {'max_depth': None, 'max_features': 0.5, 'min_... | ... | 79.285599 | 87.631032 | 79.698003 | 87.608109 | 79.298515 | 87.613436 | 0.442782 | 0.000994 | 0.177328 | 0.034767 |
| 7 | 7.73786 | 0.237637 | 79.434433 | 87.839555 | None | 0.7 | 1 | 5 | 200 | {'max_depth': None, 'max_features': 0.7, 'min_... | ... | 79.269199 | 87.876538 | 79.68497 | 87.83621 | 79.255606 | 87.847463 | 0.308201 | 0.003564 | 0.180826 | 0.02814 |
| 12 | 9.117845 | 0.23448 | 79.416449 | 89.62296 | None | 0.9 | 1 | 2 | 200 | {'max_depth': None, 'max_features': 0.9, 'min_... | ... | 79.257991 | 89.642713 | 79.613172 | 89.604424 | 79.248436 | 89.62539 | 0.085906 | 0.002605 | 0.164975 | 0.01372 |
| 13 | 8.650456 | 0.23439 | 79.395334 | 87.994861 | None | 0.9 | 1 | 5 | 200 | {'max_depth': None, 'max_features': 0.9, 'min_... | ... | 79.250328 | 88.035059 | 79.60655 | 87.993379 | 79.259644 | 87.999735 | 0.085844 | 0.000491 | 0.14906 | 0.029757 |
| 8 | 7.021611 | 0.24021 | 78.787363 | 84.086538 | None | 0.7 | 3 | 2 | 200 | {'max_depth': None, 'max_features': 0.7, 'min_... | ... | 78.622677 | 84.157319 | 79.153178 | 84.041696 | 78.54646 | 84.103872 | 0.602027 | 0.004214 | 0.234815 | 0.047942 |
| 9 | 6.792192 | 0.185277 | 78.787363 | 84.086538 | None | 0.7 | 3 | 5 | 200 | {'max_depth': None, 'max_features': 0.7, 'min_... | ... | 78.622677 | 84.157319 | 79.153178 | 84.041696 | 78.54646 | 84.103872 | 0.316202 | 0.049629 | 0.234815 | 0.047942 |
| 3 | 4.845764 | 0.133055 | 78.782149 | 83.559732 | None | 0.5 | 3 | 5 | 200 | {'max_depth': None, 'max_features': 0.5, 'min_... | ... | 78.612694 | 83.62756 | 79.191344 | 83.504359 | 78.524657 | 83.593352 | 0.002775 | 0.000364 | 0.256303 | 0.05225 |
| 2 | 4.849395 | 0.132918 | 78.782149 | 83.559732 | None | 0.5 | 3 | 2 | 200 | {'max_depth': None, 'max_features': 0.5, 'min_... | ... | 78.612694 | 83.62756 | 79.191344 | 83.504359 | 78.524657 | 83.593352 | 0.004542 | 0.000578 | 0.256303 | 0.05225 |
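
One way to read this table is to compare train and test scores for each candidate. The short sketch below copies two rows from the results above and computes the gap; it is only arithmetic on the reported numbers, not a new experiment.

```python
# (min_samples_leaf, mean_test_score, mean_train_score) copied from the table above
rows = [
    (1, 79.513676, 89.659085),  # max_features=0.5, min_samples_split=2
    (3, 78.787363, 84.086538),  # max_features=0.7, min_samples_split=2
]

# Gap between train and test score, in score points, keyed by min_samples_leaf
gaps = {leaf: train - test for leaf, test, train in rows}
for leaf, gap in gaps.items():
    print(f"min_samples_leaf={leaf}: train-test gap = {gap:.2f} points")
```

The candidate with `min_samples_leaf=1` scores best on the validation folds but also shows the largest gap to its training score, so the best-ranked forest is also the one that fits the training data most closely.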


Let's store the best configuration found by the grid search as **best_rf** (here with 500 trees instead of 200) and evaluate its performance:

In [10]:
best_rf = RandomForestRegressor(max_features=0.5, min_samples_leaf=1, min_samples_split=2, n_estimators=500, n_jobs=6)
t_times_k_fold_cv(best_rf, X, y, 2, 5)

array([ 79.97379088,  80.31594707,  80.15113928,  80.04235786,
        80.19813729,  80.01685546,  80.30545582,  80.15136359,
        80.0684021 ,  80.18318024])
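
A list of fold scores is easier to compare across models when reduced to a mean and a spread. The sketch below does that for the ten scores printed above; the values are copied from the output, nothing is re-run.

```python
import numpy as np

# Scores copied from the output above (2 repetitions of 5-fold CV)
scores = np.array([79.97379088, 80.31594707, 80.15113928, 80.04235786, 80.19813729,
                   80.01685546, 80.30545582, 80.15136359, 80.0684021, 80.18318024])

# Sample standard deviation (ddof=1) as a rough measure of fold-to-fold spread
mean, std = scores.mean(), scores.std(ddof=1)
print(f"{mean:.2f} +/- {std:.2f}")
```

The same two lines can be applied to the score arrays of the other tuned models to compare them on equal terms.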

### Hyperparameter optimization for LightGBM

Define a grid of hyperparameters for random searching:

In [20]:
# Random search for good hyperparameters of the LightGBM regressor
import lightgbm as lgb

num_leaves = [10, 100, 200, 500]
max_depth = [-1, 10, 30]
n_estimators = [100, 500, 1000, 1500]
subsample_for_bin = [20000, 50000, 100000]
min_split_gain = [0, 0.001, 0.1]
min_child_samples = [2, 10, 30, 100]
colsample_bytree = [0.1, 0.3, 0.5, 1]
reg_alpha = [0, 0.01, 1]
reg_lambda = [0, 0.01, 1]
bagging_fraction = [0.3, 0.5, 1]
val_metric = ['mae', 'mape', 'mse']

hyperparameter_grid = {'num_leaves': num_leaves,
                       'max_depth': max_depth,
                       'n_estimators': n_estimators,
                       'subsample_for_bin': subsample_for_bin,
                       'min_split_gain': min_split_gain,
                       'min_child_samples': min_child_samples,
                       'colsample_bytree': colsample_bytree,
                       'reg_alpha': reg_alpha,
                       'reg_lambda': reg_lambda,
                       'bagging_fraction': bagging_fraction,
                       'val_metric': val_metric}
model = lgb.LGBMModel(boosting_type='gbdt', objective='regression', random_state=1, n_jobs=6)
random_cv = RandomizedSearchCV(estimator=model,
                               param_distributions=hyperparameter_grid,
                               cv=4, n_iter=50, 
                               scoring = scorer,
                               n_jobs = 1, verbose = 1, 
                               return_train_score = True,
                               random_state=1)
random_cv.fit(x_train, y_train)

# Get all of the cv results and sort by the test performance
random_results = pd.DataFrame(random_cv.cv_results_).sort_values('mean_test_score', ascending = False)
random_results.head(10)

Fitting 4 folds for each of 50 candidates, totalling 200 fits


[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:  5.6min finished


| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_bagging_fraction | param_colsample_bytree | param_max_depth | param_min_child_samples | param_min_split_gain | param_n_estimators | ... | split1_test_score | split1_train_score | split2_test_score | split2_train_score | split3_test_score | split3_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | 1.426008 | 0.202459 | 79.883928 | 85.520381 | 0.5 | 0.5 | 10 | 2 | 0.0 | 500 | ... | 79.723863 | 85.622972 | 80.175128 | 85.491379 | 79.692076 | 85.488662 | 0.044517 | 0.006107 | 0.194234 | 0.059425 |
| 30 | 3.8896 | 0.397936 | 79.610667 | 87.655277 | 0.3 | 0.5 | 30 | 10 | 0.0 | 500 | ... | 79.492295 | 87.717045 | 79.781353 | 87.655963 | 79.511104 | 87.672042 | 0.029644 | 0.048311 | 0.117568 | 0.050923 |
| 47 | 3.222759 | 0.587429 | 79.389978 | 83.545482 | 1.0 | 0.5 | 30 | 100 | 0.0 | 1000 | ... | 79.302595 | 83.677426 | 79.605373 | 83.513488 | 79.270237 | 83.449253 | 0.094454 | 0.002731 | 0.1308 | 0.083227 |
| 27 | 1.097066 | 0.121675 | 79.306559 | 82.639007 | 0.3 | 0.5 | -1 | 10 | 0.001 | 1000 | ... | 79.275162 | 82.870056 | 79.634482 | 82.46857 | 78.957882 | 82.571846 | 0.050279 | 0.002443 | 0.241251 | 0.147467 |
| 29 | 2.126065 | 0.388462 | 79.298866 | 83.154185 | 0.5 | 1.0 | 30 | 100 | 0.0 | 500 | ... | 79.211225 | 83.229124 | 79.62681 | 83.075801 | 79.132236 | 83.222477 | 0.102307 | 0.062421 | 0.192624 | 0.071814 |
| 43 | 0.947217 | 0.122921 | 79.283309 | 82.618823 | 0.3 | 0.5 | 30 | 10 | 0.001 | 500 | ... | 79.180766 | 82.845767 | 79.659647 | 82.460799 | 79.005844 | 82.615667 | 0.026494 | 0.007376 | 0.239342 | 0.142135 |
| 41 | 1.614932 | 0.32164 | 79.257574 | 82.794362 | 0.5 | 1.0 | -1 | 100 | 0.0 | 500 | ... | 79.160328 | 82.868162 | 79.548871 | 82.756873 | 79.072382 | 82.811701 | 0.012934 | 0.005158 | 0.179361 | 0.050077 |
| 6 | 1.98943 | 0.363029 | 79.254701 | 82.766956 | 0.3 | 0.5 | 10 | 30 | 0.0 | 1000 | ... | 79.131359 | 82.829129 | 79.517022 | 82.672065 | 79.087255 | 82.819253 | 0.09275 | 0.03818 | 0.167982 | 0.063222 |
| 28 | 1.872244 | 0.360536 | 79.252754 | 82.817076 | 0.3 | 0.5 | 10 | 30 | 0.0 | 1000 | ... | 79.158931 | 82.893185 | 79.517358 | 82.754424 | 79.097888 | 82.828162 | 0.05035 | 0.051656 | 0.160508 | 0.051096 |
| 35 | 3.284025 | 0.629318 | 79.224143 | 82.46871 | 1.0 | 0.3 | -1 | 100 | 0.0 | 1500 | ... | 79.125424 | 82.51781 | 79.543946 | 82.413409 | 79.079589 | 82.479825 | 0.001982 | 0.011217 | 0.186258 | 0.037473 |


In [21]:
pd.set_option('display.max_columns', 50)
random_results.head(10)

| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_bagging_fraction | param_colsample_bytree | param_max_depth | param_min_child_samples | param_min_split_gain | param_n_estimators | param_num_leaves | param_reg_alpha | param_reg_lambda | param_subsample_for_bin | param_val_metric | params | rank_test_score | split0_test_score | split0_train_score | split1_test_score | split1_train_score | split2_test_score | split2_train_score | split3_test_score | split3_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | 1.426008 | 0.202459 | 79.883928 | 85.520381 | 0.5 | 0.5 | 10 | 2 | 0.0 | 500 | 200 | 0.01 | 1.0 | 50000 | mae | {'val_metric': 'mae', 'subsample_for_bin': 500... | 1 | 79.944638 | 85.478509 | 79.723863 | 85.622972 | 80.175128 | 85.491379 | 79.692076 | 85.488662 | 0.044517 | 0.006107 | 0.194234 | 0.059425 |
| 30 | 3.8896 | 0.397936 | 79.610667 | 87.655277 | 0.3 | 0.5 | 30 | 10 | 0.0 | 500 | 500 | 0.0 | 1.0 | 100000 | mape | {'val_metric': 'mape', 'subsample_for_bin': 10... | 2 | 79.65791 | 87.576058 | 79.492295 | 87.717045 | 79.781353 | 87.655963 | 79.511104 | 87.672042 | 0.029644 | 0.048311 | 0.117568 | 0.050923 |
| 47 | 3.222759 | 0.587429 | 79.389978 | 83.545482 | 1.0 | 0.5 | 30 | 100 | 0.0 | 1000 | 200 | 0.0 | 0.01 | 20000 | mape | {'val_metric': 'mape', 'subsample_for_bin': 20... | 3 | 79.381707 | 83.541761 | 79.302595 | 83.677426 | 79.605373 | 83.513488 | 79.270237 | 83.449253 | 0.094454 | 0.002731 | 0.1308 | 0.083227 |
| 27 | 1.097066 | 0.121675 | 79.306559 | 82.639007 | 0.3 | 0.5 | -1 | 10 | 0.001 | 1000 | 200 | 0.01 | 0.0 | 20000 | mse | {'val_metric': 'mse', 'subsample_for_bin': 200... | 4 | 79.358703 | 82.645557 | 79.275162 | 82.870056 | 79.634482 | 82.46857 | 78.957882 | 82.571846 | 0.050279 | 0.002443 | 0.241251 | 0.147467 |
| 29 | 2.126065 | 0.388462 | 79.298866 | 83.154185 | 0.5 | 1.0 | 30 | 100 | 0.0 | 500 | 500 | 0.01 | 1.0 | 50000 | mse | {'val_metric': 'mse', 'subsample_for_bin': 500... | 5 | 79.225203 | 83.089338 | 79.211225 | 83.229124 | 79.62681 | 83.075801 | 79.132236 | 83.222477 | 0.102307 | 0.062421 | 0.192624 | 0.071814 |
| 43 | 0.947217 | 0.122921 | 79.283309 | 82.618823 | 0.3 | 0.5 | 30 | 10 | 0.001 | 500 | 200 | 0.0 | 0.0 | 50000 | mse | {'val_metric': 'mse', 'subsample_for_bin': 500... | 6 | 79.28698 | 82.55306 | 79.180766 | 82.845767 | 79.659647 | 82.460799 | 79.005844 | 82.615667 | 0.026494 | 0.007376 | 0.239342 | 0.142135 |
| 41 | 1.614932 | 0.32164 | 79.257574 | 82.794362 | 0.5 | 1.0 | -1 | 100 | 0.0 | 500 | 100 | 0.01 | 1.0 | 100000 | mae | {'val_metric': 'mae', 'subsample_for_bin': 100... | 7 | 79.248715 | 82.740713 | 79.160328 | 82.868162 | 79.548871 | 82.756873 | 79.072382 | 82.811701 | 0.012934 | 0.005158 | 0.179361 | 0.050077 |
| 6 | 1.98943 | 0.363029 | 79.254701 | 82.766956 | 0.3 | 0.5 | 10 | 30 | 0.0 | 1000 | 500 | 0.01 | 1.0 | 100000 | mape | {'val_metric': 'mape', 'subsample_for_bin': 10... | 8 | 79.283167 | 82.747376 | 79.131359 | 82.829129 | 79.517022 | 82.672065 | 79.087255 | 82.819253 | 0.09275 | 0.03818 | 0.167982 | 0.063222 |
| 28 | 1.872244 | 0.360536 | 79.252754 | 82.817076 | 0.3 | 0.5 | 10 | 30 | 0.0 | 1000 | 500 | 0.01 | 0.0 | 100000 | mape | {'val_metric': 'mape', 'subsample_for_bin': 10... | 9 | 79.236842 | 82.792532 | 79.158931 | 82.893185 | 79.517358 | 82.754424 | 79.097888 | 82.828162 | 0.05035 | 0.051656 | 0.160508 | 0.051096 |
| 35 | 3.284025 | 0.629318 | 79.224143 | 82.46871 | 1.0 | 0.3 | -1 | 100 | 0.0 | 1500 | 200 | 0.01 | 1.0 | 100000 | mse | {'val_metric': 'mse', 'subsample_for_bin': 100... | 10 | 79.147621 | 82.463797 | 79.125424 | 82.51781 | 79.543946 | 82.413409 | 79.079589 | 82.479825 | 0.001982 | 0.011217 | 0.186258 | 0.037473 |


In [22]:
# Take a look at the best estimator found by the random search
random_cv.best_estimator_

LGBMModel(bagging_fraction=0.5, boosting_type='gbdt', class_weight=None,
     colsample_bytree=0.5, learning_rate=0.1, max_depth=10,
     min_child_samples=2, min_child_weight=0.001, min_split_gain=0,
     n_estimators=500, n_jobs=6, num_leaves=200, objective='regression',
     random_state=1, reg_alpha=0.01, reg_lambda=1, silent=True,
     subsample=1.0, subsample_for_bin=50000, subsample_freq=0,
     val_metric='mae')

Implement grid search on a grid of hyperparameters chosen based on previous results:

In [21]:
# Grid search
import lightgbm as lgb
num_leaves = [200]
max_depth = [10, 30, 50]
n_estimators = [1000]
subsample_for_bin = [50000, 100000]
min_split_gain = [0]
min_child_samples = [2, 10, 50]
colsample_bytree = [0.5]
reg_alpha = [0, 0.01]
reg_lambda = [0.1, 1]
bagging_fraction = [0.3, 0.5, 1]
val_metric = ['mape']

grid = {'num_leaves': num_leaves,
        'max_depth': max_depth,
       'n_estimators': n_estimators,
       'subsample_for_bin': subsample_for_bin,
       'min_split_gain': min_split_gain,
       'min_child_samples': min_child_samples,
       'colsample_bytree': colsample_bytree,
       'reg_alpha': reg_alpha,
       'reg_lambda': reg_lambda,
       'bagging_fraction': bagging_fraction,
       'val_metric': val_metric}
model = lgb.LGBMModel(boosting_type='gbdt', objective='regression', random_state=1, n_jobs=6)
grid_cv = GridSearchCV(estimator=model,
                       param_grid=grid,
                       cv=4,
                       scoring=scorer,
                       n_jobs=1,
                       verbose=1,
                       return_train_score=True)
grid_cv.fit(x_train, y_train)

grid_results = pd.DataFrame(grid_cv.cv_results_).sort_values('mean_test_score', ascending = False)
grid_results.head(10)

Fitting 4 folds for each of 216 candidates, totalling 864 fits


[Parallel(n_jobs=1)]: Done 864 out of 864 | elapsed: 79.6min finished


| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_bagging_fraction | param_colsample_bytree | param_max_depth | param_min_child_samples | param_min_split_gain | param_n_estimators | ... | split1_test_score | split1_train_score | split2_test_score | split2_train_score | split3_test_score | split3_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 151 | 3.400658 | 0.414143 | 79.956666 | 91.375292 | 1.0 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.016025 | 0.005341 | 0.164385 | 0.06642 |
| 150 | 3.409882 | 0.409655 | 79.956666 | 91.375292 | 1.0 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.026659 | 0.005054 | 0.164385 | 0.06642 |
| 78 | 3.438544 | 0.422869 | 79.956666 | 91.375292 | 0.5 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.044113 | 0.005134 | 0.164385 | 0.06642 |
| 79 | 3.430827 | 0.418382 | 79.956666 | 91.375292 | 0.5 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.033325 | 0.001496 | 0.164385 | 0.06642 |
| 6 | 3.307406 | 0.414642 | 79.956666 | 91.375292 | 0.3 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.078221 | 0.007308 | 0.164385 | 0.06642 |
| 7 | 3.27599 | 0.417634 | 79.956666 | 91.375292 | 0.3 | 0.5 | 10 | 2 | 0 | 1000 | ... | 79.981425 | 91.372666 | 79.773866 | 91.425801 | 79.860212 | 91.434842 | 0.058506 | 0.011467 | 0.164385 | 0.06642 |
| 74 | 3.591896 | 0.421124 | 79.945928 | 91.713865 | 0.5 | 0.5 | 10 | 2 | 0 | 1000 | ... | 80.090771 | 91.759219 | 79.699839 | 91.828647 | 79.815825 | 91.658246 | 0.15658 | 0.006323 | 0.194922 | 0.085512 |
| 75 | 3.668024 | 0.468882 | 79.945928 | 91.713865 | 0.5 | 0.5 | 10 | 2 | 0 | 1000 | ... | 80.090771 | 91.759219 | 79.699839 | 91.828647 | 79.815825 | 91.658246 | 0.147417 | 0.044352 | 0.194922 | 0.085512 |
| 147 | 3.451521 | 0.417384 | 79.945928 | 91.713865 | 1.0 | 0.5 | 10 | 2 | 0 | 1000 | ... | 80.090771 | 91.759219 | 79.699839 | 91.828647 | 79.815825 | 91.658246 | 0.018783 | 0.003034 | 0.194922 | 0.085512 |
| 146 | 3.469971 | 0.417385 | 79.945928 | 91.713865 | 1.0 | 0.5 | 10 | 2 | 0 | 1000 | ... | 80.090771 | 91.759219 | 79.699839 | 91.828647 | 79.815825 | 91.658246 | 0.030556 | 0.006087 | 0.194922 | 0.085512 |


Keep the best estimator found by the grid search, then evaluate its performance with repeated k-fold cross-validation:

In [22]:
best_lgb = grid_cv.best_estimator_
t_times_k_fold_cv(best_lgb, X, y)

array([ 80.66245659,  80.78797208,  80.70654313,  80.86267896,
        80.70861318,  80.66245659,  80.78797208,  80.70654313,
        80.86267896,  80.70861318,  80.66245659,  80.78797208,
        80.70654313,  80.86267896,  80.70861318,  80.66245659,
        80.78797208,  80.70654313,  80.86267896,  80.70861318,
        80.66245659,  80.78797208,  80.70654313,  80.86267896,  80.70861318])
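The repetitions above return identical scores, most likely because passing an integer `cv` to `cross_val_score` builds unshuffled folds for a regressor, so every repetition reuses the same splits. Below is a minimal sketch of one way to make each repetition use different folds, using scikit-learn's `RepeatedKFold`. The synthetic data, the `LinearRegression` stand-in and the names `mape_score`, `X_demo`, `y_demo` and `demo_scores` are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import RepeatedKFold, cross_val_score


def mape_score(y_true_log, y_pred_log):
    # Same idea as the notebook's score: 100 - MAPE on back-transformed prices
    y_true, y_pred = np.exp(y_true_log), np.exp(y_pred_log)
    return 100 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Synthetic stand-in data (hypothetical, not the Ubaar dataset)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(scale=0.1, size=300)

# 5 repetitions of 3-fold CV; each repetition shuffles the rows differently
cv = RepeatedKFold(n_splits=3, n_repeats=5, random_state=0)
demo_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                              cv=cv, scoring=make_scorer(mape_score))
print(demo_scores.shape)  # (15,)
```

With shuffled repetitions, the spread of the scores reflects the variability due to the choice of folds, which the unshuffled version cannot show.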

### Hyperparameter optimization on MLPRegressor


In [15]:
# Random search for good hyperparameters of the MLP regressor
from sklearn.neural_network import MLPRegressor

hidden_layer_sizes = [(10, 10), (10, 100), (100, 10), (100, 100), (10, 10, 10)]
activation = ['logistic', 'tanh', 'relu']
alpha = [0.0001, 0.01]
batch_size = ['auto', 50, 200, 1000]
# Note: momentum is only used when solver='sgd'; with the default 'adam' solver it has no effect
momentum = [0.9, 0.7, 0.5]

hyperparameter_grid = {'hidden_layer_sizes': hidden_layer_sizes,
                       'activation': activation,
                       'alpha': alpha,
                       'batch_size': batch_size,
                       'momentum': momentum}
model = MLPRegressor(max_iter=1000, random_state=42)
# Score with the `maps` function defined earlier (100 - MAPE on back-transformed prices)
random_cv = RandomizedSearchCV(estimator=model,
                               param_distributions=hyperparameter_grid,
                               cv=4, n_iter=50,
                               scoring=make_scorer(score_func=maps),
                               n_jobs=6, verbose=2,
                               return_train_score=True,
                               random_state=42)
random_cv.fit(x_train, y_train)

Fitting 4 folds for each of 50 candidates, totalling 200 fits


[Parallel(n_jobs=6)]: Done  29 tasks      | elapsed:  1.8min
[Parallel(n_jobs=6)]: Done 150 tasks      | elapsed:  8.5min
[Parallel(n_jobs=6)]: Done 200 out of 200 | elapsed: 12.0min finished


RandomizedSearchCV(cv=4, error_score='raise',
          estimator=MLPRegressor(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=42, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False),
          fit_params=None, iid=True, n_iter=50, n_jobs=6,
          param_distributions={'hidden_layer_sizes': [(10, 10), (10, 100), (100, 10), (100, 100), (10, 10, 10)], 'activation': ['logistic', 'tanh', 'relu'], 'alpha': [0.0001, 0.01], 'batch_size': ['auto', 50, 200, 1000], 'momentum': [0.9, 0.7, 0.5]},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score=True, scoring='neg_mean_squared_error',
          verbose=2)
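The custom scorer in the search code above converts log-price predictions back to prices and reports 100 minus the mean absolute percentage error, so higher is better. Here is a small worked example of that formula; the prices and the name `mape_score` are made up for illustration.

```python
import numpy as np


def mape_score(y_true_log, y_pred_log):
    # Back-transform from log-price to price before computing the error
    y_true, y_pred = np.exp(y_true_log), np.exp(y_pred_log)
    # 100 - MAPE (in percent)
    return 100 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Hypothetical prices: both predictions are off by 10% of the true price
true_prices = np.array([100.0, 200.0])
pred_prices = np.array([110.0, 180.0])

score = mape_score(np.log(true_prices), np.log(pred_prices))
print(round(score, 6))  # 90.0
```

Each prediction is off by 10% of the true price, so the MAPE is 10% and the score is 90.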

In [16]:
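# Get all of the CV results and sort by test score, best first.
# Note: the output above echoes scoring='neg_mean_squared_error', so the scores in this table
# are negative MSE on logPrice (closer to zero is better), not the percentage score.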
random_results = pd.DataFrame(random_cv.cv_results_).sort_values('mean_test_score', ascending = False)
random_results.head(10)

Unnamed: 0,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_activation,param_alpha,param_batch_size,param_hidden_layer_sizes,param_momentum,params,...,split1_test_score,split1_train_score,split2_test_score,split2_train_score,split3_test_score,split3_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
6,22.253013,0.058843,-0.080866,-0.078523,logistic,0.0001,50,"(10, 10)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (10, 1...",...,-0.081795,-0.078338,-0.081593,-0.077881,-0.079422,-0.078513,4.714045,0.004172,0.000938,0.000536
15,29.707349,0.095745,-0.0812,-0.074536,tanh,0.0001,auto,"(100, 10)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (100, ...",...,-0.083488,-0.075977,-0.083231,-0.074075,-0.080976,-0.076739,3.217046,0.000705,0.002558,0.002077
44,39.928854,0.099734,-0.0812,-0.074536,tanh,0.0001,200,"(100, 10)",0.7,"{'momentum': 0.7, 'hidden_layer_sizes': (100, ...",...,-0.083488,-0.075977,-0.083231,-0.074075,-0.080976,-0.076739,6.261992,0.002908,0.002558,0.002077
36,30.913944,0.085772,-0.081456,-0.078916,logistic,0.0001,50,"(100, 10)",0.7,"{'momentum': 0.7, 'hidden_layer_sizes': (100, ...",...,-0.082875,-0.079749,-0.082667,-0.078846,-0.079282,-0.077512,5.55804,0.005231,0.00145,0.000878
39,12.256478,0.05635,-0.082085,-0.080294,logistic,0.01,200,"(10, 10)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (10, 1...",...,-0.082555,-0.07974,-0.0836,-0.08025,-0.081316,-0.080819,1.655196,0.00295,0.001071,0.000384
40,10.95072,0.059092,-0.08265,-0.079913,tanh,0.01,auto,"(10, 10)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (10, 1...",...,-0.083242,-0.079734,-0.084125,-0.079554,-0.082248,-0.080737,0.196536,0.002481,0.001168,0.00048
12,12.290141,0.065324,-0.08265,-0.079913,tanh,0.01,200,"(10, 10)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (10, 1...",...,-0.083242,-0.079734,-0.084125,-0.079554,-0.082248,-0.080737,1.09046,0.007243,0.001168,0.00048
35,36.611858,0.102975,-0.083455,-0.081756,logistic,0.0001,200,"(100, 100)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (100, ...",...,-0.084268,-0.081261,-0.085418,-0.082238,-0.08149,-0.080955,7.886792,0.003888,0.001503,0.000667
31,29.985923,0.102228,-0.083455,-0.081756,logistic,0.0001,auto,"(100, 100)",0.9,"{'momentum': 0.9, 'hidden_layer_sizes': (100, ...",...,-0.084268,-0.081261,-0.085418,-0.082238,-0.08149,-0.080955,2.08937,0.003192,0.001503,0.000667
28,33.583297,0.102724,-0.083784,-0.08153,logistic,0.0001,50,"(100, 100)",0.7,"{'momentum': 0.7, 'hidden_layer_sizes': (100, ...",...,-0.083302,-0.080195,-0.083196,-0.079782,-0.083851,-0.083077,0.853462,0.004622,0.00063,0.001549


In [17]:
best_mlp = random_cv.best_estimator_
t_times_k_fold_cv(best_mlp, X, y)

array([ 78.30386441,  77.81224731,  77.4210379 ,  77.09670888,
        76.52366954,  78.30386441,  77.81224731,  77.4210379 ,
        77.09670888,  76.52366954,  78.30386441,  77.81224731,
        77.4210379 ,  77.09670888,  76.52366954,  78.30386441,
        77.81224731,  77.4210379 ,  77.09670888,  76.52366954,
        78.30386441,  77.81224731,  77.4210379 ,  77.09670888,  76.52366954])

## Averaging regressor

Now I will define an estimator class that takes a list of estimators, fits each of them on the training data, and averages their predictions:

In [18]:
from sklearn.base import BaseEstimator, RegressorMixin

# RegressorMixin (not ClassifierMixin) marks this as a regressor and gives it an R^2 score() method
class AveragingRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, estimators=()):
        # Store the argument untouched so that sklearn.base.clone() keeps working
        self.estimators = estimators

    def fit(self, X, y=None):
        # Fit every base estimator on the same training data
        for estimator in self.estimators:
            estimator.fit(X, y)
        return self

    def predict(self, X):
        # Unweighted mean of the base estimators' predictions
        return np.mean(np.array([estimator.predict(X) for estimator in self.estimators]), axis=0)

    def get_params(self, deep=True):
        return {"estimators": self.estimators}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
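To illustrate why a plain average can help, here is a minimal NumPy-only sketch under an idealised assumption: three hypothetical models whose errors are independent and equally large. Real models trained on the same data usually have correlated errors, so the gain in practice is smaller. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)
y_true = rng.normal(size=10_000)

# Three hypothetical models: each equals the truth plus independent noise
preds = [y_true + rng.normal(scale=0.5, size=y_true.shape) for _ in range(3)]

# Same operation as AveragingRegressor.predict: element-wise mean over models
avg_pred = np.mean(np.array(preds), axis=0)


def mse(pred):
    # Mean squared error against the known truth
    return np.mean((y_true - pred) ** 2)


individual_mse = [mse(p) for p in preds]
avg_mse = mse(avg_pred)
print(individual_mse, avg_mse)
```

Under the independence assumption, the averaged prediction has roughly one third of the error variance of each individual model.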

Evaluate performance:

In [23]:
myregressor = AveragingRegressor(estimators=[best_rf, best_lgb, best_mlp])
t_times_k_fold_cv(myregressor, X, y)

array([ 80.72004038,  80.62611435,  80.45712291,  80.31719054,
        80.2183164 ,  80.72523548,  80.6184246 ,  80.44931862,
        80.31194192,  80.2182443 ,  80.72142188,  80.63177045,
        80.45287373,  80.3193459 ,  80.21204319,  80.7169413 ,
        80.62308526,  80.46470665,  80.33018874,  80.21544911,
        80.72691201,  80.62911723,  80.45282844,  80.31567721,  80.21755084])