<div style="background-color:rgba(108, 29, 21, 0.5);">
    <h1><center>Importing Libraries and Data</center></h1>
</div>

This is a sandbox of sorts, where I explore a few stacking methods that came to mind.
I am aware that adding more stacking levels may improve accuracy further.

After getting weights for my simple ensemble, I try **different methods of stacking**:
1. Taking the predictions as separate and using them for training my meta-classifier
2. Aggregating the two predictions (mean) and using that for my meta.
3. Adding the predictions to whole training data.
4. Adding the predictions to a subset of the whole data, which has most 'important' features.
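
To make the four options concrete, here is a minimal sketch of what the meta-classifier's input looks like in each case. The arrays `X_raw`, `oof_a`, `oof_b` and the index list `important` are toy stand-ins I made up for illustration; they are not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 6, 4

# Toy stand-ins (hypothetical): raw features and out-of-fold predictions of two base models
X_raw = rng.normal(size=(n_rows, n_features))
oof_a = rng.uniform(size=n_rows)
oof_b = rng.uniform(size=n_rows)
important = [0, 2]  # hypothetical positions of the 'important' columns

meta_1 = np.column_stack([oof_a, oof_b])                       # 1. predictions kept separate
meta_2 = ((oof_a + oof_b) / 2).reshape(-1, 1)                  # 2. mean of the two predictions
meta_3 = np.column_stack([X_raw, oof_a, oof_b])                # 3. all features + predictions
meta_4 = np.column_stack([X_raw[:, important], oof_a, oof_b])  # 4. important subset + predictions

for name, m in [("1", meta_1), ("2", meta_2), ("3", meta_3), ("4", meta_4)]:
    print(f"method {name}: meta-feature shape {m.shape}")
```

Only the width of the meta-feature matrix changes between the methods; the target and the meta-classifier stay the same.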

For each of these 4 methods I have calculated the mean AUC across the folds.

I have compared the results as well, and I hope this gives you an idea for trying out a different method of stacking. Note that **I have used only 10000 rows of the data** for this demonstration, to save time and memory.

**Do upvote if you find this notebook useful :)**

In [None]:
import random
random.seed(123)

import pandas as pd
import numpy as np
import datatable as dt
import warnings
warnings.filterwarnings("ignore")

# importing model selection, metrics and preprocessing packages

from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import MinMaxScaler,StandardScaler,PowerTransformer
from sklearn.decomposition import PCA

# importing modelling packages

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

In [None]:
# loading only the first 10000 rows with pandas (datatable is imported above but not used here)

train = pd.read_csv(r'../input/tabular-playground-series-oct-2021/train.csv',nrows=10000)
test = pd.read_csv(r'../input/tabular-playground-series-oct-2021/test.csv',nrows=10000)

<div style="background-color:rgba(108, 29, 21, 0.5);">
    <h1><center>Memory Reduction</center></h1>
</div>

In [None]:
def reduce_mem_usage(df, verbose=True):
    numerics = ['int16', 'int32', 'int64','float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2    
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                else:
                    df[col] = df[col].astype(np.float32)    
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose: print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df

In [None]:
train = reduce_mem_usage(train)
test  = reduce_mem_usage(test)
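
A quick sanity check of what downcasting buys and what it costs. This is a toy sketch on a made-up one-column frame, not the competition data: memory halves with each step down, but `float16` keeps only about three significant decimal digits, which may or may not matter for your model.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical): one float64 column with 1000 rows
df = pd.DataFrame({"a": np.linspace(0.0, 1.0, 1000)})

bytes_64 = df.memory_usage(index=False).sum()
bytes_32 = df.astype(np.float32).memory_usage(index=False).sum()
bytes_16 = df.astype(np.float16).memory_usage(index=False).sum()
print("bytes per dtype:", bytes_64, bytes_32, bytes_16)

# Rounding error introduced by each downcast for a single value
x = 0.123456789
err16 = abs(float(np.float16(x)) - x)
err32 = abs(float(np.float32(x)) - x)
print("float16 error:", err16)
print("float32 error:", err32)
```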

<div style="background-color:rgba(108, 29, 21, 0.5);">
    <h1><center>Data Splitting</center></h1>
</div>

In [None]:
X = train.drop(columns=["id", "target"]).copy()
y = train["target"].copy()
test_for_model = test.drop(columns=["id"]).copy()

# freeing up some memory

del train
del test

<div style="background-color:rgba(108, 29, 21, 0.5);">
    <h1><center>Initialising Baseline Models</center></h1>
</div>

In [None]:
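# using baseline xgb and catboost models - on gpu
# (XGBoost >= 2.0 deprecates 'gpu_hist' and 'predictor' in favour of tree_method='hist', device='cuda')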

cat_params = {"task_type": "GPU"}
xgb_params = {'tree_method': 'gpu_hist','predictor': 'gpu_predictor'}

# Simple Ensembling

In [None]:
folds = StratifiedKFold(n_splits = 10, random_state = 2021, shuffle = True)

predictions_cb = np.zeros(len(test_for_model))
predictions_xgb = np.zeros(len(test_for_model))

cat_oof = np.zeros(X.shape[0])
xgb_oof = np.zeros(X.shape[0])

for fold, (trn_idx, val_idx) in enumerate(folds.split(X,y)):
    print(f"Fold: {fold+1}")
    X_train, X_val = X.iloc[trn_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[trn_idx], y.iloc[val_idx]

    model_cb =  CatBoostClassifier(**cat_params,verbose=0,random_state=2021)
    model_xgb = XGBClassifier(**xgb_params,random_state=2021)
    
    model_cb.fit(X_train, y_train)
    pred_cb = model_cb.predict_proba(X_val)[:,1]
    cat_oof[val_idx] = pred_cb
    print('ROC of CB: ',roc_auc_score(y_val,pred_cb))
    
    model_xgb.fit(X_train, y_train)
    pred_xgb = model_xgb.predict_proba(X_val)[:,1]
    xgb_oof[val_idx] = pred_xgb
    print('ROC of XGB: ',roc_auc_score(y_val,pred_xgb))
    
    print("-"*50)
    
    predictions_cb += model_cb.predict_proba(test_for_model)[:,1] / folds.n_splits
    predictions_xgb += model_xgb.predict_proba(test_for_model)[:,1] / folds.n_splits
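
The out-of-fold (OOF) idea in the loop above can also be sketched with scikit-learn's `cross_val_predict`. This is a CPU-only toy version: the synthetic data and the `LogisticRegression` stand-in are my assumptions, chosen only so the snippet runs quickly without a GPU.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic binary classification data (hypothetical stand-in for X, y)
X_toy, y_toy = make_classification(n_samples=500, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2021)

# Each row is predicted by the one fold model that did not see it during training
oof = cross_val_predict(
    LogisticRegression(max_iter=1000), X_toy, y_toy, cv=cv, method="predict_proba"
)[:, 1]

print("OOF AUC:", roc_auc_score(y_toy, oof))
```

Because every OOF prediction comes from a model that never saw that row, these predictions can be used as training features for a meta-classifier without leaking the target.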

In [None]:
# calculating appropriate weights for ensemble

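from scipy.optimize import minimize

def class_optimizer(w, a0, a1):
    # w[0] is the CatBoost weight; XGBoost gets the remainder
    oof = w[0]*a0 + (1-w[0])*a1
    return (1-roc_auc_score(y, oof))

# note: AUC is piecewise constant in w, so gradient-based BFGS may stay close to x0
res = minimize(
    fun=class_optimizer,
    x0=[0.5],
    args=(cat_oof, xgb_oof),
    method='BFGS',
    options={'maxiter': 1000})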

print(res)
print(f"coef0 {res.x[0]}, coef1 {1-res.x[0]}")
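
Since AUC depends only on the ranking of the blended scores, it changes in small jumps as the weight moves, and a gradient-based optimizer may see a flat surface. A plain grid search over the weight is one simple alternative. The sketch below uses made-up noisy scores (`oof_a`, `oof_b`) rather than the real OOF arrays.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_toy = rng.integers(0, 2, size=2000)

# Two noisy toy score vectors standing in for the OOF predictions (hypothetical)
oof_a = y_toy + rng.normal(scale=1.0, size=y_toy.size)
oof_b = y_toy + rng.normal(scale=1.5, size=y_toy.size)

# Evaluate the blend at 101 evenly spaced weights in [0, 1]
weights = np.linspace(0.0, 1.0, 101)
aucs = np.array([roc_auc_score(y_toy, w * oof_a + (1 - w) * oof_b) for w in weights])

best_w = weights[aucs.argmax()]
print(f"best weight for model a: {best_w:.2f}, blended AUC: {aucs.max():.4f}")
print(f"AUC of a alone: {aucs[-1]:.4f}, AUC of b alone: {aucs[0]:.4f}")
```

With one free weight, 101 evaluations are cheap, and the result never does worse on the OOF data than either model alone, because both endpoints are part of the grid.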

In [None]:
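# sample_submission.csv is assumed to sit next to train.csv/test.csv; read the same 10000 rows as test
sub = pd.read_csv(r'../input/tabular-playground-series-oct-2021/sample_submission.csv',nrows=10000)

ensemble_pred = res.x[0] * predictions_cb  + (1-res.x[0]) * predictions_xgb
sub['target'] = ensemble_pred
sub.to_csv('submission_simple_ensemble.csv',index = False)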

# Stacking using the 2 predictions separately

In [None]:
# creating stack datasets for our meta-classifier

cat_train = pd.DataFrame(cat_oof,columns=['CAT_train'])
xgb_train = pd.DataFrame(xgb_oof,columns=['XGB_train'])
cat_test = pd.DataFrame(predictions_cb,columns=['CAT_train'])
xgb_test = pd.DataFrame(predictions_xgb,columns=['XGB_train'])

stack_x_train = pd.concat((cat_train,xgb_train), axis = 1)
stack_x_test = pd.concat((cat_test,xgb_test), axis = 1)

In [None]:
# random_state has no effect without shuffle (recent scikit-learn raises an error), so it is omitted
stk = StratifiedKFold(n_splits = 10)

test_pred = 0
fold = 1
total_auc = 0

for train_index, valid_index in stk.split(stack_x_train, y):
    x_train, y_train = stack_x_train.iloc[train_index], y.iloc[train_index]
    x_valid, y_valid = stack_x_train.iloc[valid_index], y.iloc[valid_index]
    
    lr = LogisticRegression(n_jobs = -1, random_state = 42, C = 1000, max_iter = 1000)
    lr.fit(x_train, y_train)
    
    valid_pred = lr.predict_proba(x_valid)[:,1]
    # average the test predictions over the folds
    test_pred += lr.predict_proba(stack_x_test)[:,1] / stk.n_splits
    auc = roc_auc_score(y_valid, valid_pred)
    total_auc += auc / stk.n_splits
    print('Fold', fold, 'AUC :', auc)
    fold += 1
    
print('Mean AUC score :', total_auc)
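
The per-fold AUC loop above can be written more compactly with `cross_val_score`. A minimal sketch, assuming toy meta-features (`stack_toy`) built from noisy scores instead of the real CatBoost and XGBoost OOF predictions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
y_toy = rng.integers(0, 2, size=1000)

# Two toy meta-features standing in for the base-model OOF predictions (hypothetical)
stack_toy = np.column_stack([
    y_toy + rng.normal(scale=1.0, size=y_toy.size),
    y_toy + rng.normal(scale=1.5, size=y_toy.size),
])

cv = StratifiedKFold(n_splits=10)
meta = LogisticRegression(C=1000, max_iter=1000)

# One AUC per fold, computed on the held-out part of each split
scores = cross_val_score(meta, stack_toy, y_toy, cv=cv, scoring="roc_auc")
print("fold AUCs:", np.round(scores, 4))
print("mean AUC :", scores.mean())
```

This gives the same kind of number as the loop (mean validation AUC of the meta-classifier) but does not produce test predictions, so the explicit loop is still needed for a submission.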

# Stacking using average of the 2 predictions

In [None]:
# creating stack datasets for our meta-classifier

stack_x_train['pred'] = stack_x_train.mean(axis=1)
stack_x_test['pred'] = stack_x_test.mean(axis=1)
stack_x_train = pd.DataFrame(stack_x_train['pred'])
stack_x_test = pd.DataFrame(stack_x_test['pred'])

In [None]:
# random_state has no effect without shuffle (recent scikit-learn raises an error), so it is omitted
stk = StratifiedKFold(n_splits = 10)

test_pred = 0
fold = 1
total_auc = 0

for train_index, valid_index in stk.split(stack_x_train, y):
    x_train, y_train = stack_x_train.iloc[train_index], y.iloc[train_index]
    x_valid, y_valid = stack_x_train.iloc[valid_index], y.iloc[valid_index]
    
    lr = LogisticRegression(n_jobs = -1, random_state = 42, C = 1000, max_iter = 1000)
    lr.fit(x_train, y_train)
    
    valid_pred = lr.predict_proba(x_valid)[:, 1]
    # average the test predictions over the folds
    test_pred += lr.predict_proba(stack_x_test)[:, 1] / stk.n_splits
    auc = roc_auc_score(y_valid, valid_pred)
    total_auc += auc / stk.n_splits
    print('Fold', fold, 'AUC :', auc)
    fold += 1
    
print('Mean AUC score :', total_auc)

# Stacking using whole data+predictions separately

In [None]:
# adding whole data to the predictions

stack_x_train = pd.concat((X,cat_train, xgb_train), axis = 1)
stack_x_test = pd.concat((test_for_model,cat_test, xgb_test), axis = 1)

In [None]:
# random_state has no effect without shuffle (recent scikit-learn raises an error), so it is omitted
stk = StratifiedKFold(n_splits = 10)

test_pred = 0
fold = 1
total_auc = 0

for train_index, valid_index in stk.split(stack_x_train, y):
    x_train, y_train = stack_x_train.iloc[train_index], y.iloc[train_index]
    x_valid, y_valid = stack_x_train.iloc[valid_index], y.iloc[valid_index]
    
    lr = LogisticRegression(n_jobs = -1, random_state = 42, C = 1000, max_iter = 1000)
    lr.fit(x_train, y_train)
    
    valid_pred = lr.predict_proba(x_valid)[:, 1]
    # average the test predictions over the folds
    test_pred += lr.predict_proba(stack_x_test)[:, 1] / stk.n_splits
    auc = roc_auc_score(y_valid, valid_pred)
    total_auc += auc / stk.n_splits
    print('Fold', fold, 'AUC :', auc)
    fold += 1
    
print('Mean AUC score :', total_auc)
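
Once raw features sit next to the predictions, the meta-classifier sees columns on very different scales, and logistic regression can be sensitive to that. I have not tested whether this explains the score of this variant; the sketch below only shows how a `StandardScaler` could be put in front of the meta-classifier. The data is a toy stand-in with deliberately mismatched scales.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
y_toy = rng.integers(0, 2, size=n)

# Toy raw features on very different scales, plus two toy prediction columns (all hypothetical)
raw = rng.normal(scale=[1.0, 100.0, 0.01], size=(n, 3))
preds = np.column_stack([
    y_toy + rng.normal(scale=1.0, size=n),
    y_toy + rng.normal(scale=1.5, size=n),
])
stack_toy = np.column_stack([raw, preds])

# The scaler is fitted inside each training fold only, so validation rows do not influence it
meta_model = make_pipeline(StandardScaler(), LogisticRegression(C=1000, max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(meta_model, stack_toy, y_toy, cv=cv, scoring="roc_auc")
print("mean AUC with scaling:", scores.mean())
```

Wrapping the scaler and the classifier in one pipeline keeps the cross-validation honest, because the scaling statistics are recomputed for every fold.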

# Stacking using important features+predictions separately

In [None]:
imp_features= ["f22", "f179", "f69", "f58", "f214", "f78", "f136", "f156",
               "f8", "f3", "f77", "f200", "f92", "f185", "f142", "f115", "f284"]
X_new = X[imp_features]
test_for_model_new = test_for_model[imp_features]

stack_x_train = pd.concat((X_new,cat_train, xgb_train), axis = 1)
stack_x_test = pd.concat((test_for_model_new,cat_test, xgb_test), axis = 1)
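
This notebook does not show how the `imp_features` list was chosen. One common way to get such a list, shown here only as a hedged sketch on synthetic data, is to rank columns by a tree model's `feature_importances_` and keep the top few. The random forest, the toy frame and `top_k` are my assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with named columns (hypothetical stand-in for X, y)
X_arr, y_toy = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_toy = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

# Fit a tree ensemble and rank the columns by impurity-based importance
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_toy, y_toy)
importances = pd.Series(model.feature_importances_, index=X_toy.columns)

top_k = 5  # hypothetical cut-off
imp_features_toy = importances.sort_values(ascending=False).head(top_k).index.tolist()
X_new_toy = X_toy[imp_features_toy]

print("selected features:", imp_features_toy)
```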

In [None]:
# random_state has no effect without shuffle (recent scikit-learn raises an error), so it is omitted
stk = StratifiedKFold(n_splits = 10)

test_pred = 0
fold = 1
total_auc = 0

for train_index, valid_index in stk.split(stack_x_train, y):
    x_train, y_train = stack_x_train.iloc[train_index], y.iloc[train_index]
    x_valid, y_valid = stack_x_train.iloc[valid_index], y.iloc[valid_index]
    
    lr = LogisticRegression(n_jobs = -1, random_state = 42, C = 1000, max_iter = 1000)
    lr.fit(x_train, y_train)
    
    valid_pred = lr.predict_proba(x_valid)[:, 1]
    # average the test predictions over the folds
    test_pred += lr.predict_proba(stack_x_test)[:, 1] / stk.n_splits
    auc = roc_auc_score(y_valid, valid_pred)
    total_auc += auc / stk.n_splits
    print('Fold', fold, 'AUC :', auc)
    fold += 1
    
print('Mean AUC score :', total_auc)

# Summary

Mean 10-fold AUC of the meta-classifier for each method:

1. Taking the predictions as separate and using them for training my meta-classifier - 0.8394 - BEST
2. Aggregating the two predictions (mean) and using that for my meta - 0.8345
3. Adding the predictions to whole training data - 0.8301 - WORST (unexpectedly)
4. Adding the predictions to a subset of the whole data, which has most 'important' features - 0.8384
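
To put the spread in perspective, the four scores above can be ranked in a few lines. The numbers are copied from the list; the short labels are mine.

```python
# Mean AUC per stacking method, copied from the summary above
results = {
    "1. predictions kept separate": 0.8394,
    "2. mean of the predictions": 0.8345,
    "3. whole data + predictions": 0.8301,
    "4. important features + predictions": 0.8384,
}

# Sort from best to worst
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for name, auc in ranked:
    print(f"{auc:.4f}  {name}")

gap = ranked[0][1] - ranked[-1][1]
print(f"best - worst = {gap:.4f}")
```

The gap between best and worst is under 0.01 AUC on a 10000-row sample, so the ordering may well change on the full data.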

<div style="background-color:rgba(108, 29, 21, 0.5);">
    <h1><center>Use for your own experiments and do upvote. Thanks :)</center></h1>
</div>