Overview

This notebook shows how to tune plankton ML models using the 'tune' class.

This is the first notebook in a set of three:

    - tune.ipynb: tune hyper-parameters to find the best model configuration

    - predict.ipynb: make predictions using the best fitting model

    - post.ipynb: analyse predictions and calculate metrics such as diversity

Several dependencies need to be installed before running this notebook:

    pandas
    numpy
    scikit-learn
    xgboost
    joblib
    pyyaml
    

Tuned models and scoring are saved using the following directory structure:

    
    /your_base_path/scoring/xgb/sppA_reg.sav
    /your_base_path/scoring/rf/sppA_reg.sav

    
    /your_base_path/tuning/xgb/sppA_reg.sav
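
The layout above can be reproduced with a small path helper. This is only a sketch: `output_path` and its arguments are hypothetical names, not part of the 'tune' class, and it assumes file names follow the `<species>_<phase>.sav` pattern shown in the examples.

```python
from pathlib import PurePosixPath


def output_path(base_path, stage, model, species, phase="reg"):
    """Hypothetical helper: build <base>/<stage>/<model>/<species>_<phase>.sav."""
    return PurePosixPath(base_path) / stage / model / f"{species}_{phase}.sav"


# The two stages ("scoring", "tuning") and model names are taken from the listing above.
print(output_path("/your_base_path", "scoring", "xgb", "sppA"))
print(output_path("/your_base_path", "tuning", "xgb", "sppA"))
```

Keeping the stage and model as separate path components means results for different model types never overwrite each other.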


In [6]:
# import required packages
import pandas as pd
import numpy as np
from tune import tune 
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_regression

from yaml import safe_load, load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper   

from functions import example_data

In [7]:
# Setting up the model

with open('/home/phyto/planktonSDM/model_config.yml', 'r') as f:
    model_config = load(f, Loader=Loader)


seed = 1 # random seed
n_threads = 2 # how many cpu threads to use
n_spp = 0 # which species to model
path_out = "/home/phyto/ModelOutput/test/" #where to save model output

X, y = example_data(y_name="Coccolithus pelagicus", n_samples=500, n_features=5, noise=20, random_state=seed)

cv = 3
verbose = 3

In [8]:
'''
1-phase Random forest 
'''
reg_scoring = model_config['reg_scoring']
reg_param_grid = model_config['rf_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, cv=cv, model="rf", zir=False, log="yes")

Fitting 3 folds for each of 54 candidates, totalling 162 fits
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.292, test=-0.250) total time=   0.4s
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.279, test=-0.281) total time=   0.4s
[CV 3/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.270, test=-0.312) total time=   0.4s
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.5, regressor__n_estimators=100;, score=(train=-0.292, test=-0.250) total time=   0.4s
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min

[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.243) R2: (test=0.293) RMSE: (test=-0.307) total time=   0.3s
[CV] END  MAE: (test=-0.212) R2: (test=0.356) RMSE: (test=-0.270) total time=   0.3s

[CV] END  MAE: (test=-0.249) R2: (test=0.357) RMSE: (test=-0.308) total time=   0.3s
finished tuning model
reg rRMSE: 43%
reg rMAE: 34%
reg R2: 0.34
execution time: 30.50816535949707 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.7s finished
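
The summary lines in the output above (for example `reg R2: 0.34`) appear consistent with averaging the three per-fold test scores. The sketch below redoes that arithmetic with the fold values copied from the log. The `relative_error` helper is an assumption about how the relative metrics (rRMSE, rMAE) might be defined, namely the error as a percentage of the mean target value; the 'tune' class may compute them differently.

```python
from statistics import mean

# Per-fold test scores copied from the 1-phase random forest output above.
# scikit-learn error scorers are negated ("neg_*"), so the sign is flipped back.
fold_mae = [-0.243, -0.212, -0.249]
fold_rmse = [-0.307, -0.270, -0.308]
fold_r2 = [0.293, 0.356, 0.357]

mae = -mean(fold_mae)
rmse = -mean(fold_rmse)
r2 = mean(fold_r2)


def relative_error(error, y_mean):
    """Assumed definition: error expressed as a percentage of the mean target."""
    return 100 * error / y_mean


print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.2f}")
```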


In [9]:
'''
2-phase Random forest 
note: for the 2-phase model we need to define the model configuration for both the classifier and the regressor
'''

reg_scoring = model_config['reg_scoring']
clf_scoring = model_config['clf_scoring']

clf_param_grid = model_config['rf_param_grid']['clf_param_grid']
reg_param_grid = model_config['rf_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, clf_scoring = clf_scoring, clf_param_grid = clf_param_grid, 
      cv=cv, model="rf", zir=True, log="yes")

Fitting 3 folds for each of 54 candidates, totalling 162 fits
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.292, test=-0.250) total time=   0.3s
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.279, test=-0.281) total time=   0.3s
[CV 3/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.270, test=-0.312) total time=   0.3s
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.5, regressor__n_estimators=100;, score=(train=-0.292, test=-0.250) total time=   0.3s
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min

Fitting 3 folds for each of 18 candidates, totalling 54 fits
[CV 2/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=3, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=3, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=2, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=3, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.4s
[CV 2/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 3/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.2, n_estimators=100;, score=0.500 total time=   0.5s
[CV 1/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.5, n_estimators=100;, score=0.500 total time=   0.4s
[CV 1/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.5s
[CV 2/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.4s
[CV 3/3] END max_depth=5, max_features=4, max_samples=0.5, min_samples_leaf=0.8, n_estimators=100;, score=0.500 total time=   0.5s
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END ............................. accuracy: (test=0.500) total time=   0.5s




[CV] END ............................. accuracy: (test=0.500) total time=   0.5s
[CV] END ............................. accuracy: (test=0.500) total time=   0.4s


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.9s finished
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.243) R2: (test=0.293) RMSE: (test=-0.307) total time=   0.4s
[CV] END  MAE: (test=-0.212) R2: (test=0.356) RMSE: (test=-0.270) total time=   0.4s
[CV] END  MAE: (test=-0.249) R2: (test=0.357) RMSE: (test=-0.308) total time=   0.3s


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.7s finished
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.212) R2: (test=0.356) RMSE: (test=-0.270) total time=   0.8s




[CV] END  MAE: (test=-0.243) R2: (test=0.293) RMSE: (test=-0.307) total time=   0.8s
[CV] END  MAE: (test=-0.249) R2: (test=0.357) RMSE: (test=-0.308) total time=   0.8s
finished tuning model
reg rRMSE: 43%
reg rMAE: 34%
reg R2: 0.34
zir rRMSE: 43%
zir rMAE: 34%
zir R2: 0.34
execution time: 51.498241901397705 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    1.6s finished
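
The classifier search above reports "18 candidates, totalling 54 fits". That count follows from the grid: every combination of parameter values is one candidate, and each candidate is fitted once per fold. The sketch below rebuilds the grid from the values visible in the `[CV ...] END` lines; the actual `model_config.yml` may list them differently, so treat the dictionary as a reconstruction.

```python
from itertools import product

# Classifier grid values as they appear in the "[CV ...] END" lines above.
clf_param_grid = {
    "max_depth": [3, 5],
    "max_features": [2, 3, 4],
    "max_samples": [0.5],
    "min_samples_leaf": [0.2, 0.5, 0.8],
    "n_estimators": [100],
}
cv = 3  # number of cross-validation folds, as set earlier in the notebook

# One candidate per combination of values (the Cartesian product of the lists).
candidates = [
    dict(zip(clf_param_grid, values))
    for values in product(*clf_param_grid.values())
]
n_fits = len(candidates) * cv
print(f"{len(candidates)} candidates x {cv} folds = {n_fits} fits")
```

Because the count is a product, adding one more value to any list multiplies the run time, which is worth keeping in mind when extending a grid.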


In [10]:
'''
Testing the impact of log transformation on the 1-phase Random forest 

note: setting log="both" tunes the model both with and without log transformation
'''

reg_scoring = model_config['reg_scoring']
reg_param_grid = model_config['rf_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, cv=cv, model="rf", zir=False, log="both")

Fitting 3 folds for each of 54 candidates, totalling 162 fits
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.271, test=-0.270) total time=   0.3s
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.282, test=-0.246) total time=   0.4s
[CV 3/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.2, regressor__n_estimators=100;, score=(train=-0.261, test=-0.300) total time=   0.3s
[CV 1/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min_samples_leaf=0.5, regressor__n_estimators=100;, score=(train=-0.282, test=-0.246) total time=   0.3s
[CV 2/3] END regressor__max_depth=3, regressor__max_features=2, regressor__max_samples=0.2, regressor__min

[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.198) R2: (test=0.325) RMSE: (test=-0.277) total time=   0.4s
[CV] END  MAE: (test=-0.227) R2: (test=0.283) RMSE: (test=-0.309) total time=   0.4s
[CV] END  MAE: (test=-0.243) R2: (test=0.327) RMSE: (test=-0.315) total time=   0.4s
finished tuning model
reg rRMSE: 44%
reg rMAE: 32%
reg R2: 0.31
execution time: 61.73108768463135 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.8s finished
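
The `log` option controls whether the target is log-transformed before fitting. How the 'tune' class implements this is not shown in this notebook. The sketch below assumes a `log1p`/`expm1` pair purely for illustration, chosen because it stays defined for zero abundances; the function names and sample values are made up.

```python
import math


def to_log(y):
    """Forward transform (assumed): log(1 + y), defined for y = 0."""
    return [math.log1p(v) for v in y]


def from_log(y_log):
    """Inverse transform: exp(y) - 1, mapping predictions back to the original scale."""
    return [math.expm1(v) for v in y_log]


y = [0.0, 3.0, 120.0]  # made-up abundances
y_back = from_log(to_log(y))
print(y_back)
```

Whatever transform is used, scores are only comparable between the log and no-log runs if predictions are mapped back to the same scale before scoring.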


In [11]:
'''
1-phase Gradient boosting with XGBoost:
'''

reg_scoring = model_config['reg_scoring']
reg_param_grid = model_config['xgb_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, cv=cv, model="xgb", zir=False, log="yes")

Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV 1/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.246, test=-0.218) total time=   0.1s
[CV 2/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.229, test=-0.245) total time=   0.1s
[CV 3/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.228, test=-0.265) total time=   0.1s
[CV] END  MAE: (test=-0.245) R2: (test=0.256) RMSE: (test=-0.315) total time=   0.1s
[CV] END  MAE: (test=-0.218) R2: (test=0.273) RMSE: (test=-0.287) total time=   0.1s



[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.265) R2: (test=0.228) RMSE: (test=-0.337) total time=   0.1s
finished tuning model
reg rRMSE: 46%
reg rMAE: 35%
reg R2: 0.25
execution time: 0.886012077331543 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.2s finished


In [12]:
'''
2-phase Gradient boosting with XGBoost:
'''

reg_scoring = model_config['reg_scoring']
clf_scoring = model_config['clf_scoring']

clf_param_grid = model_config['xgb_param_grid']['clf_param_grid']
reg_param_grid = model_config['xgb_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, clf_scoring = clf_scoring, clf_param_grid = clf_param_grid,
      cv=cv, model="xgb", zir=True, log="yes")

Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV 2/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.229, test=-0.245) total time=   0.1s
[CV 1/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.246, test=-0.218) total time=   0.1s
[CV 3/3] END regressor__alpha=1, regressor__colsample_bytree=0.6, regressor__eta=0.01, regressor__gamma=1, regressor__max_depth=4, regressor__n_estimators=100, regressor__subsample=0.6;, score=(train=-0.228, test=-0.265) total time=   0.1s
[CV 2/3] END alpha=1, colsample_bytree=0.6, eta=0.01, gamma=1, max_depth=4, n_estimators=100, subsample=0.6;, score=0.625 total time=   0.1s
[CV 1/3] END alpha=1, colsample_bytree=0.6, eta=0.01, gamma=1, max_depth=4, n_est

[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.2s finished


[CV] END  MAE: (test=-0.245) R2: (test=0.256) RMSE: (test=-0.315) total time=   0.1s
[CV] END  MAE: (test=-0.218) R2: (test=0.273) RMSE: (test=-0.287) total time=   0.2s


[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.265) R2: (test=0.228) RMSE: (test=-0.337) total time=   0.1s


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.3s finished
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.165) R2: (test=0.470) RMSE: (test=-0.245) total time=   0.2s
[CV] END  MAE: (test=-0.220) R2: (test=0.252) RMSE: (test=-0.316) total time=   0.2s


[CV] END  MAE: (test=-0.253) R2: (test=0.224) RMSE: (test=-0.338) total time=   0.2s
finished tuning model
reg rRMSE: 46%
reg rMAE: 35%
reg R2: 0.25
zir rRMSE: 44%
zir rMAE: 31%
zir R2: 0.32
execution time: 2.437180995941162 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.4s finished
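
In the 2-phase runs a classifier and a regressor are tuned separately. One common way to combine such a pair (a hurdle-style model) is to keep the regressor's prediction only where the classifier predicts presence. The sketch below illustrates that idea with plain lists. It is an assumption about what `zir=True` does, not code from the 'tune' class, and `two_phase_predict` is a hypothetical name.

```python
def two_phase_predict(presence, abundance):
    """Combine classifier output (0/1) with regressor output.

    Where the classifier predicts absence (0) the result is 0.0;
    otherwise the regressor's abundance prediction is kept.
    """
    if len(presence) != len(abundance):
        raise ValueError("presence and abundance must have the same length")
    return [a if p else 0.0 for p, a in zip(presence, abundance)]


# Made-up predictions for three samples.
print(two_phase_predict([1, 0, 1], [2.5, 4.0, 0.7]))
```

Under this reading, a classifier that predicts presence everywhere leaves the regressor's predictions unchanged, so the "zir" and "reg" scores would coincide.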


In [13]:
'''
1-phase nearest neighbors with a bagged KNN
note: KNN is bagged, so the number of bags must be set; here bagging_estimators=30
'''

reg_scoring = model_config['reg_scoring']
reg_param_grid = model_config['knn_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid, cv=cv, model="knn", zir=False, log="yes", bagging_estimators=30)

Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV 1/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.160, test=-0.168) total time=   0.1s
[CV 2/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.160, test=-0.191) total time=   0.1s
[CV 3/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.164, test=-0.221) total time=   0.1s
[CV] END  MAE: (test=-0.191) R2: (test=0.504) RMSE: (test=-0.257) total time=   0.2s


[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.175) R2: (test=0.518) RMSE: (test=-0.234) total time=   0.2s
[CV] END  MAE: (test=-0.218) R2: (test=0.492) RMSE: (test=-0.273) total time=   0.2s
finished tuning model
reg rRMSE: 37%
reg rMAE: 28%
reg R2: 0.5
execution time: 1.3672292232513428 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.4s finished
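
Bagging a KNN means fitting many KNN models on random subsamples of the training data and averaging their predictions. The pure-Python sketch below illustrates the idea using settings visible in the log above (3 neighbours, Manhattan distance `p=1`, uniform weights, `max_samples=0.5`, 30 bags). It is a simplified illustration rather than the 'tune' implementation: it omits the feature subsampling implied by `max_features=0.5`, and the function names and toy data are made up.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k nearest training points (Manhattan distance)."""
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def bagged_knn_predict(X, y, x, n_bags=30, max_samples=0.5, k=3, seed=1):
    """Average KNN predictions over n_bags bootstrap subsamples of the data."""
    rng = random.Random(seed)
    n_draw = max(k, int(max_samples * len(X)))
    preds = []
    for _ in range(n_bags):
        # Draw row indices with replacement to form one bag.
        idx = [rng.randrange(len(X)) for _ in range(n_draw)]
        preds.append(knn_predict([X[i] for i in idx], [y[i] for i in idx], x, k))
    return sum(preds) / len(preds)


# Toy data: two features, target roughly the sum of the features.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [0.0, 1.0, 1.0, 2.0, 4.0, 6.0]
print(bagged_knn_predict(X, y, [0.5, 0.5]))
```

Averaging over bags smooths the prediction, since a single KNN fit can change sharply when one neighbour enters or leaves the neighbourhood.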


In [14]:
'''
2-phase nearest neighbors with a bagged KNN
note: KNN is bagged, so the number of bags must be set; here bagging_estimators=30
'''

reg_scoring = model_config['reg_scoring']
clf_scoring = model_config['clf_scoring']

clf_param_grid = model_config['knn_param_grid']['clf_param_grid']
reg_param_grid = model_config['knn_param_grid']['reg_param_grid']

m = tune(X, y, seed, n_threads, verbose, cv, path_out)
m.XGB(reg_scoring, reg_param_grid,  clf_scoring = clf_scoring, clf_param_grid = clf_param_grid,  
      cv=cv, model="knn", zir=True, log="both", bagging_estimators=30)

Fitting 3 folds for each of 1 candidates, totalling 3 fits


[CV 1/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.156, test=-0.163) total time=   0.1s
[CV 2/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.157, test=-0.184) total time=   0.2s
[CV 3/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, regressor__max_features=0.5, regressor__max_samples=0.5;, score=(train=-0.152, test=-0.208) total time=   0.1s
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV 1/3] END regressor__estimator__leaf_size=30, regressor__estimator__n_neighbors=3, regressor__estimator__p=1, regressor__estimator__weights=uniform, 



[CV 1/3] END estimator__leaf_size=30, estimator__n_neighbors=3, estimator__p=1, estimator__weights=uniform, max_features=0.5, max_samples=0.5;, score=0.500 total time=   0.2s
[CV 2/3] END estimator__leaf_size=30, estimator__n_neighbors=3, estimator__p=1, estimator__weights=uniform, max_features=0.5, max_samples=0.5;, score=0.500 total time=   0.2s
Fitting 3 folds for each of 1 candidates, totalling 3 fits




[CV 3/3] END estimator__leaf_size=30, estimator__n_neighbors=3, estimator__p=1, estimator__weights=uniform, max_features=0.5, max_samples=0.5;, score=0.574 total time=   0.2s




[CV] END ............................. accuracy: (test=0.552) total time=   0.2s
[CV] END ............................. accuracy: (test=0.518) total time=   0.2s


[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END ............................. accuracy: (test=0.537) total time=   0.2s


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.190) R2: (test=0.491) RMSE: (test=-0.261) total time=   0.2s
[CV] END  MAE: (test=-0.174) R2: (test=0.491) RMSE: (test=-0.240) total time=   0.2s
[CV] END  MAE: (test=-0.212) R2: (test=0.466) RMSE: (test=-0.280) total time=   0.1s


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.3s finished
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.


[CV] END  MAE: (test=-0.172) R2: (test=0.444) RMSE: (test=-0.251) total time=   0.4s
[CV] END  MAE: (test=-0.185) R2: (test=0.443) RMSE: (test=-0.273) total time=   0.4s


[CV] END  MAE: (test=-0.218) R2: (test=0.375) RMSE: (test=-0.303) total time=   0.3s
finished tuning model
reg rRMSE: 38%
reg rMAE: 28%
reg R2: 0.48
zir rRMSE: 40%
zir rMAE: 28%
zir R2: 0.42
execution time: 4.431487321853638 seconds


[Parallel(n_jobs=2)]: Done   3 out of   3 | elapsed:    0.7s finished


TO DO:
    
Add print statement for log="both"

Add tau scoring