# Stacking ensembling

We are going to build a stacking ensemble, based on the following sources:
 - https://mlwave.com/kaggle-ensembling-guide/
 - http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/
 - https://github.com/emanuele/kaggle_pbr/blob/master/blend.py

The notebook is organized into the following sections:
 - [Predictores base](#Predictores-base)
     - [Xgboost](#Xgboost)
     - [Random Forest](#Random-Forest)
     - [AdaBoost](#AdaBoost)
 - [Metafeatures](#Metafeatures)
 - [Predictor Stacking](#Predictor-Stacking)

In [105]:
from sklearn.model_selection import cross_val_score, RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold
from bayes_opt import BayesianOptimization
from xgboost import XGBRegressor
from time import time
import xgboost as xgb
import pandas as pd
import numpy as np
import os

In [6]:
if '__file__' in locals():
    current_folder = os.path.dirname(os.path.abspath(__file__))
else:
    current_folder = os.getcwd()

set_de_entrenamiento_testing_y_prediccion = '"{}"'.format(os.path.join(
    current_folder,
    '..',
    'Set de entrenamiento, testing y predicción.ipynb'
))
merge_features = '"{}"'.format(os.path.join(current_folder, '..', 'Features', 'Merge features.ipynb'))

Load the DataFrame with the features.

In [7]:
pd.options.mode.chained_assignment = None
%run $merge_features

KeyboardInterrupt: La limpieza ya corrió en este Kernel

In [9]:
# Sanity check: one row of features per distinct person
assert df_features.shape[0] == get_clean_df()['person'].unique().shape[0]

Load the training set.

In [35]:
%run $set_de_entrenamiento_testing_y_prediccion

labels_with_features = labels.merge(df_features, how='inner', on='person')
train = labels_with_features.drop('label', axis=1)
train_target = labels_with_features['label']

### Predictores base

In this section we set up the base predictors. They are the same models found in the *Algoritmos de ML* folder.

In [11]:
base_predictors = []

#### Xgboost

Note: we use XGBRegressor so that it exposes the same interface as the other predictors. Its hyperparameters are named differently from those of the native xgboost API, but they produce the same results. The parameter names are listed in the xgboost documentation: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor
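As a rough illustration of the naming difference, the sketch below translates a native-style parameter dict into the keyword names used by the scikit-learn wrapper. The mapping covers only the aliases relevant to this notebook, and `to_sklearn_names` is a hypothetical helper, not part of xgboost.

```python
# Native xgboost name -> scikit-learn wrapper name (partial mapping).
NATIVE_TO_SKLEARN = {
    'eta': 'learning_rate',
    'lambda': 'reg_lambda',
    'alpha': 'reg_alpha',
}

def to_sklearn_names(native_params):
    """Return a copy of native_params with wrapper-style keyword names."""
    return {NATIVE_TO_SKLEARN.get(k, k): v for k, v in native_params.items()}

native = {'eta': 0.3, 'lambda': 3, 'alpha': 2, 'max_depth': 9, 'gamma': 10}
sklearn_style = to_sklearn_names(native)
print(sklearn_style)
```

Names such as `max_depth` and `gamma` are shared by both interfaces, so they pass through unchanged. The number of boosting rounds, which the native API takes as `num_boost_round`, corresponds to `n_estimators` in the wrapper.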

In [21]:
param = {
    'objective': 'reg:logistic',
    'colsample_bylevel': 1,
    'colsample_bytree': 1,
    'min_child_weight': 1,
    'learning_rate': 0.3,
    'max_delta_step': 5,
    'n_estimators': 25,
    'reg_lambda': 3,
    'max_depth': 9,
    'silent': True,
    'subsample': 1,
    'reg_alpha': 2,
    'gamma': 10
}

base_predictors.append(XGBRegressor(**param))

#### Random Forest

In [14]:
param = {
    'bootstrap': True,
    'max_depth': 9,
    'max_features': 37,
    'min_samples_leaf': 30,
    'min_samples_split': 7,
    'n_estimators': 784
}

base_predictors.append(RandomForestRegressor(**param))

#### AdaBoost

In [19]:
param = {
    'n_estimators': 124,
    'loss': 'linear',
    'learning_rate': 0.07,
    # Only max_depth differs from the DecisionTreeRegressor defaults.
    # Note: scikit-learn >= 1.2 names this argument `estimator`.
    'base_estimator': DecisionTreeRegressor(max_depth=4)
}

base_predictors.append(AdaBoostRegressor(**param))

### Metafeatures

Next, we generate one prediction per base predictor and add each of them as a new column to a copy of the training set (*train_meta*).

In [76]:
train_meta = train.copy()
for predictor in base_predictors:
    train_meta[predictor.__class__.__name__] = np.nan

The predictions are generated with cross-validation, so the metafeature of each row comes from a model that did not see that row during training.

In [77]:
%%time
kf = KFold(n_splits=10, shuffle=False)
for train_i, validation_i in kf.split(train):
    for predictor in base_predictors:
        name = predictor.__class__.__name__
        # With warm_start=False, every call to fit retrains the model from scratch.
        predictor.fit(train.iloc[train_i], train_target.iloc[train_i])
        # Store the out-of-fold predictions in this predictor's column.
        train_meta.iloc[validation_i, train_meta.columns.get_loc(name)] = predictor.predict(train.iloc[validation_i])

CPU times: user 7min 39s, sys: 96 ms, total: 7min 39s
Wall time: 7min 39s
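The same out-of-fold scheme can also be expressed with scikit-learn's `cross_val_predict`, which clones the estimator for each fold and returns the predictions in the original row order. The following is a minimal sketch on synthetic data; the data and the two lightweight stand-in models are assumptions chosen so that it runs quickly, not the notebook's actual predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the training set (assumption: not the real data).
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(float)

# Stand-in base predictors; the notebook uses XGBoost, Random Forest and AdaBoost.
models = [LinearRegression(), DecisionTreeRegressor(max_depth=3, random_state=0)]

kf = KFold(n_splits=10, shuffle=False)
# Each column holds one model's out-of-fold predictions: every row is
# predicted by a clone fitted on the other nine folds.
oof = np.column_stack([cross_val_predict(m, X, y, cv=kf) for m in models])
print(oof.shape)
```

Because every value in `oof` is produced by a model that never saw the corresponding row, the columns can be used as metafeatures without leaking the target into the second-level model.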


In [79]:
train_meta.head(3)

| person | screen_resolution_height mean | screen_resolution_width mean | screen_resolution_height std | screen_resolution_width std | ad campaign hit | brand listing | checkout | conversion | generic listing | lead | search engine hit | searched products | ... | Saturday | Sunday | Thursday | Tuesday | Wednesday | madrugada | tarde | noche | maniana | RandomForestRegressor | AdaBoostRegressor | XGBRegressor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0566e9c1 | 568.0 | 320.0 | 0.0 | 0.0 | 6.0 | 3.0 | 1.0 | 1.0 | 15.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 1 | 0 | 0 | 0.02153 | 0.134871 | 0.019391 |
| 6ec7ee77 | 640.0 | 360.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.065669 | 0.312528 | 0.111775 |
| abe7a2fb | 640.0 | 360.0 | 0.0 | 0.0 | 9.0 | 14.0 | 1.0 | 0.0 | 9.0 | 0.0 | 4.0 | 6.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.024404 | 0.134112 | 0.014011 |


### Predictor Stacking

Now we train a new model that uses the previous predictions (the metafeatures) as its features. Some of the original features can be added as well.
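Combining metafeatures with a few original features amounts to selecting both groups of columns from the same frame. A small sketch with a toy stand-in for `train_meta`; the values are made up for illustration, and picking `checkout` as the extra feature is a hypothetical choice.

```python
import pandas as pd

# Toy stand-in for train_meta (values are made up for illustration).
train_meta_demo = pd.DataFrame({
    'checkout': [1.0, 0.0, 1.0],
    'conversion': [1.0, 0.0, 0.0],
    'RandomForestRegressor': [0.02, 0.07, 0.02],
    'XGBRegressor': [0.02, 0.11, 0.01],
})

meta_cols = ['RandomForestRegressor', 'XGBRegressor']  # base-model predictions
extra_cols = ['checkout']                              # hypothetical original feature

# The stacking model would be trained on this reduced frame.
stack_demo = train_meta_demo[meta_cols + extra_cols]
print(list(stack_demo.columns))
```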

Tuning this model with a random search is still pending (TODO); for now we evaluate a logistic regression with the liblinear solver.

In [141]:
param = {
    'solver': 'liblinear'
}
regr = LogisticRegression(**param)
stack_train = train_meta[[predictor.__class__.__name__ for predictor in base_predictors]]

In [143]:
%%time
# The scoring metric is ROC AUC, so the printed label says so.
scores = cross_val_score(regr, stack_train, train_target, cv=10, scoring='roc_auc')
print("ROC AUC: %0.6f (+/- %0.6f)" % (scores.mean(), scores.std() * 2))

ROC AUC: 0.875144 (+/- 0.024633)
CPU times: user 212 ms, sys: 4 ms, total: 216 ms
Wall time: 217 ms
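The random search listed as a TODO in this section could look roughly like the following. This is only a sketch under stated assumptions: the data is synthetic, and the search space (`C` and `penalty`), the number of iterations and the number of folds are illustrative choices rather than settings taken from this notebook.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic metafeatures and binary labels (assumption: stand-ins for
# stack_train and train_target).
rng = np.random.RandomState(0)
X = rng.rand(300, 3)
y = (X.mean(axis=1) + 0.1 * rng.randn(300) > 0.5).astype(int)

search = RandomizedSearchCV(
    LogisticRegression(solver='liblinear'),
    param_distributions={
        'C': loguniform(1e-3, 1e2),   # inverse regularization strength
        'penalty': ['l1', 'l2'],      # both are supported by liblinear
    },
    n_iter=20,
    scoring='roc_auc',
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

With the real data, `X` and `y` would be replaced by `stack_train` and `train_target`, and `cv=10` would keep the evaluation comparable to the score reported above.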


### Bayesian Optimization

In [151]:
pbounds = {
    'max_depth': (2, 20),
    'eta': (0, 0.3),
    'gamma': (0, 10),
#     'min_child_weight': (1, 5),
#     'max_delta_step': (1, 5),
#     'subsample': (0, 1),
#     'colsample_bytree': (0, 1),
#     'colsample_bylevel': (0, 1),
#     'lambda': (1, 3),
#     'alpha': (0, 2)
}

discrete = ['max_depth'] # discrete parameters
cv_splits = 10 # number of CV splits
num_round = 100 # maximum number of boosting rounds

In [152]:
dtrain = xgb.DMatrix(stack_train, label=train_target)

def cv_score_xgb(**param):
    """Objective for the optimizer: best mean test AUC across boosting rounds."""
    param['silent'] = 1
    param['objective'] = 'reg:logistic'

    # Cast the parameters that must be discrete to int
    for d in discrete:
        param[d] = int(param[d])

    # Run the cross-validation
    scores = xgb.cv(
        param, dtrain, nfold=cv_splits, metrics='auc', verbose_eval=False,
        shuffle=False, stratified=False, num_boost_round=num_round
    )
    return scores['test-auc-mean'].max()

In [155]:
%%time
optimizer = BayesianOptimization(f=cv_score_xgb, pbounds=pbounds)
optimizer.maximize(
    init_points=5,
    n_iter=100,
)

|   iter    |  target   |    eta    |   gamma   | max_depth |
-------------------------------------------------------------
|  1        |  0.8775   |  0.1599   |  9.376    |  3.968    |
|  2        |  0.8775   |  0.2449   |  6.801    |  2.428    |
|  3        |  0.8713   |  0.09288  |  3.098    |  13.05    |
|  4        |  0.868    |  0.276    |  1.789    |  11.12    |
|  5        |  0.8692   |  0.2701   |  3.239    |  14.14    |
|  6        |  0.8696   |  0.02418  |  9.984    |  2.067    |
|  7        |  0.8292   |  2.45e-09 |  10.0     |  20.0     |
|  8        |  0.8762   |  0.2969   |  8.086    |  10.51    |
|  9        |  0.5      |  0.0     

|  80       |  0.8776   |  0.3      |  10.0     |  2.0      |
|  81       |  0.8766   |  0.3      |  8.782    |  9.501    |
|  82       |  0.8746   |  0.3      |  5.824    |  12.26    |
|  83       |  0.8744   |  0.3      |  8.044    |  14.98    |
|  84       |  0.8756   |  0.3      |  5.608    |  9.915    |
|  85       |  0.8767   |  0.3      |  10.0     |  19.04    |
|  86       |  0.8738   |  0.3      |  6.196    |  16.37    |
|  87       |  0.8659   |  0.3      |  2.691    |  19.27    |
|  88       |  0.8778   |  0.3      |  6.32     |  2.0      |
|  89       |  0.8659   |  0.3      |  0.0      | 

In [154]:
optimizer.max

{'params': {'eta': 0.2990783546409911,
  'gamma': 3.1805084110474757,
  'max_depth': 2.0045991562435495},
 'target': 0.8780648000000001}

In [157]:
optimizer.max

{'params': {'eta': 0.3, 'gamma': 5.263545124083896, 'max_depth': 2.0},
 'target': 0.8779366999999999}
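The optimizer reports `max_depth` as a float because it searches a continuous space, while `cv_score_xgb` truncates it with `int` before calling xgboost. To reuse the best point, the same cast has to be applied. A minimal sketch, where `finalize_params` is a hypothetical helper and not part of any library:

```python
def finalize_params(best_params, discrete):
    """Cast the parameters listed in `discrete` to int, as cv_score_xgb does."""
    return {k: int(v) if k in discrete else v for k, v in best_params.items()}

# Values copied from the second optimizer.max output above.
best = {'eta': 0.3, 'gamma': 5.263545124083896, 'max_depth': 2.0}
final_params = finalize_params(best, discrete=['max_depth'])
print(final_params)
```

Since `int` truncates rather than rounds, a reported value such as 2.0045991562435495 also maps to a depth of 2, which is what was actually evaluated during the search.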