# VotingClassifier + Power Averaging - TPS Oct 2021

## What is Power Averaging?

The main idea behind Power Averaging is to combine **highly correlated** submissions to get a better AUC.

But why does Power Averaging work?
1. AUC scores predictions by their ranking only. Example: (0,1,2) gives the same AUC as (0,50,100).
2. Raising the predictions to a power before averaging them **amplifies** the distance between probabilities.
3. This makes the order of the ranks clearer, which gives a better AUC.
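A small check of point 1, using scikit-learn's `roc_auc_score` on made-up labels and probabilities (not competition data). Rescaling the scores leaves AUC unchanged. The same holds for raising a *single* model's probabilities to a power: it is a monotonic transform, so the ranking and therefore the AUC stay the same. The effect described in points 2 and 3 only shows up once several powered predictions are averaged together.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1])
probs = np.array([0.1, 0.6, 0.35, 0.8])

auc_original = roc_auc_score(y_true, probs)
auc_scaled = roc_auc_score(y_true, probs * 50)   # same order, bigger gaps
auc_powered = roc_auc_score(y_true, probs ** 4)  # same order, reshaped gaps

# All three share the same ranking, so all three AUCs are equal
print(auc_original, auc_scaled, auc_powered)  # 0.75 0.75 0.75
```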

If you want a more detailed explanation of Power Averaging, check out my previous notebooks:
* [Simple Power Averaging](https://www.kaggle.com/edrickkesuma/power-averaging-is-your-friend)
* [In depth Power Averaging](https://www.kaggle.com/edrickkesuma/in-depth-power-averaging-0-81848)

## What is VotingClassifier?

VotingClassifier takes a group of classifiers/models and combines their predictions. With `voting='soft'`, which this notebook uses, it **averages** their predicted probabilities.

Remember that we need **highly correlated** submissions to get a good AUC. 

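The easiest way to get them is to just change the **random_state** when creating each model. Because row and column subsampling are turned on (`subsample` and `colsample_*` in the parameters below), each seed makes the model see slightly different rows and features. The predictions therefore differ a little but stay close to each other, i.e. they are highly correlated.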

Create several of these models with different random states and put them into a VotingClassifier.
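A minimal sketch of both ideas on synthetic data. It uses scikit-learn's `GradientBoostingClassifier` with `subsample=0.6` as a fast stand-in for `XGBClassifier`; that model choice and the data are assumptions for illustration only. It checks that soft voting is exactly the plain average of the fitted models' probabilities, and prints how correlated the seed-only variants are.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier

# Synthetic stand-in for the competition data
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)

# Same hyperparameters, only random_state changes; subsample < 1 makes
# each model train on different rows, so the seed actually matters
estimators = [
    (f"gb{seed}", GradientBoostingClassifier(n_estimators=50, subsample=0.6, random_state=seed))
    for seed in range(1, 4)
]

voter = VotingClassifier(estimators=estimators, voting="soft")
voter.fit(X_demo, y_demo)

# Soft voting (no weights) = mean of each fitted model's probabilities
individual = np.array([est.predict_proba(X_demo)[:, 1] for est in voter.estimators_])
ensemble_proba = voter.predict_proba(X_demo)[:, 1]
print(np.allclose(ensemble_proba, individual.mean(axis=0)))  # True

# Pairwise correlation between the seed-only variants
print(np.corrcoef(individual).round(3))
```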

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import datatable as dt

from xgboost import XGBClassifier
import time
from sklearn.ensemble import VotingClassifier

import gc

# Read in the data

In [None]:
%%time

train_data = dt.fread('../input/tabular-playground-series-oct-2021/train.csv')
test_data = dt.fread('../input/tabular-playground-series-oct-2021/test.csv')

print(train_data.shape, test_data.shape)

In [None]:
%%time 

# Credit: https://www.kaggle.com/hardyxu52/tps-oct-2021-reduce-memory-usage-but-faster
for i, col in enumerate(train_data):
    if col.type.name == 'float64':
        train_data[:,i] = dt.as_type(col, 'float32')

for i, col in enumerate(test_data):
    if col.type.name == 'float64':
        test_data[:,i] = dt.as_type(col, 'float32')

train_data = train_data.to_pandas()
test_data = test_data.to_pandas()

In [None]:
train_data = train_data.set_index('id', drop=True)
test_data = test_data.set_index('id', drop=True)
train_data.head()

# Preprocessing

In [None]:
# Turn True/False into 0s and 1s
bool_cols_train = []
for i, col in enumerate(train_data.columns):
    if train_data[col].dtypes == bool:
        bool_cols_train.append(i)

In [None]:
bool_cols_test = []
for i, col in enumerate(test_data.columns):
    if test_data[col].dtypes == bool:
        bool_cols_test.append(i)

In [None]:
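# Assign by column name: with pandas 2.x, `df.iloc[:, cols] = ...` tries to write into the existing bool columns in place instead of replacing them
train_data[train_data.columns[bool_cols_train]] = train_data.iloc[:, bool_cols_train].astype(int)
test_data[test_data.columns[bool_cols_test]] = test_data.iloc[:, bool_cols_test].astype(int)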

In [None]:
label = 'target'
# Drop the label from the column index without copying the whole DataFrame
features = train_data.columns.drop(label)

In [None]:
X = train_data[features].copy()
y = train_data[label].copy()
X_test = test_data.copy()

del train_data, test_data
gc.collect()

In [None]:
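# Row-wise statistics over the original features only. Computing them on
# `features` keeps each new column out of the statistics added after it
# (otherwise 'min' would include 'std', 'max' would include 'min', etc.)
for df in (X, X_test):
    base = df[features]
    df['std'] = base.std(axis=1)
    df['min'] = base.min(axis=1)
    df['max'] = base.max(axis=1)
    df['var'] = base.var(axis=1)

del df, base
gc.collect()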

# Create models + VotingClassifier

Ideally, you would run each batch on a separate kernel to train and predict in **parallel**.
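The three batch cells that follow differ only in their seeds, so they could also be driven by one function. Below is a sketch with a hypothetical helper `fit_batch` and a hypothetical `make_model` factory (seed to unfitted classifier). In this notebook the factory would be `lambda seed: XGBClassifier(**params, random_state=seed)`. The demo swaps in scikit-learn's `RandomForestClassifier` and synthetic data purely so the sketch runs quickly without a GPU.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier


def fit_batch(seeds, make_model, X, y, X_test):
    """Fit one soft-voting batch and return positive-class test probabilities.

    `make_model` is a hypothetical factory: seed -> unfitted classifier.
    """
    estimators = [(f"model{seed}", make_model(seed)) for seed in seeds]
    start = time.time()
    voter = VotingClassifier(estimators=estimators, voting="soft")
    voter.fit(X, y)
    preds = voter.predict_proba(X_test)[:, -1]
    print(f"seeds {list(seeds)}: {time.time() - start:.2f}sec")
    return preds


# Demo on synthetic data; seed ranges mirror batches 1-3 in this notebook
X_demo, y_demo = make_classification(n_samples=300, random_state=0)
batches = [range(1, 6), range(6, 11), range(11, 16)]
batch_preds = [
    fit_batch(seeds, lambda s: RandomForestClassifier(n_estimators=20, random_state=s),
              X_demo, y_demo, X_demo)
    for seeds in batches
]
```

Each call corresponds to one batch cell, so on separate kernels you would run one `fit_batch` call per kernel and save its predictions.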

In [None]:
# Credit: https://www.kaggle.com/shenurisumanasekara/tabular-october-xgbclassifier-stepbystep
# Tuned hyperparameters taken from the notebook credited above
params = {
    'max_depth': 6,
    'n_estimators': 5500,
    'subsample': 0.6,
    'colsample_bytree': 0.2,
    'colsample_bylevel': 0.4,
    'min_child_weight': 0.0475667709098205,
    'reg_lambda': 50.33144833870577,
    'reg_alpha': 0.01634917276171278,
    'gamma': 5.507875585868313,
    'booster': 'gbtree',
    'eval_metric': 'auc',
    # GPU settings for XGBoost 1.x; XGBoost 2.0+ uses tree_method='hist' with device='cuda'
    'tree_method': 'gpu_hist',
    'predictor': 'gpu_predictor',
    'use_label_encoder': False
}

In [None]:
# Batch 1

xgb_clf1 = XGBClassifier(**params, random_state=1)
xgb_clf2 = XGBClassifier(**params, random_state=2)
xgb_clf3 = XGBClassifier(**params, random_state=3)
xgb_clf4 = XGBClassifier(**params, random_state=4)
xgb_clf5 = XGBClassifier(**params, random_state=5)

In [None]:
estimators=[('xgb1', xgb_clf1), 
            ('xgb2', xgb_clf2), 
            ('xgb3', xgb_clf3), 
            ('xgb4', xgb_clf4),
            ('xgb5', xgb_clf5)
           ]

start = time.time()
print(f'fitting ...')
model = VotingClassifier(estimators=estimators, voting='soft', verbose=True)
model.fit(X, y)

print('predicting ...')
model_preds1 = model.predict_proba(X_test)[:, -1]

elapsed = time.time() - start
print(f'elapsed time: {elapsed:.2f}sec\n')
print('model_preds1 ready!')

In [None]:
# Batch 2

xgb_clf6 = XGBClassifier(**params, random_state=6)
xgb_clf7 = XGBClassifier(**params, random_state=7)
xgb_clf8 = XGBClassifier(**params, random_state=8)
xgb_clf9 = XGBClassifier(**params, random_state=9)
xgb_clf10 = XGBClassifier(**params, random_state=10)

In [None]:
estimators=[('xgb6', xgb_clf6), 
            ('xgb7', xgb_clf7), 
            ('xgb8', xgb_clf8), 
            ('xgb9', xgb_clf9),
            ('xgb10', xgb_clf10)
           ]

start = time.time()
print(f'fitting ...')
model = VotingClassifier(estimators=estimators, voting='soft', verbose=True)
model.fit(X, y)

print('predicting ...')
model_preds2 = model.predict_proba(X_test)[:, -1]

elapsed = time.time() - start
print(f'elapsed time: {elapsed:.2f}sec\n')
print('model_preds2 ready!')

In [None]:
# Batch 3

xgb_clf11 = XGBClassifier(**params, random_state=11)
xgb_clf12 = XGBClassifier(**params, random_state=12)
xgb_clf13 = XGBClassifier(**params, random_state=13)
xgb_clf14 = XGBClassifier(**params, random_state=14)
xgb_clf15 = XGBClassifier(**params, random_state=15)

In [None]:
estimators=[('xgb11', xgb_clf11), 
            ('xgb12', xgb_clf12), 
            ('xgb13', xgb_clf13), 
            ('xgb14', xgb_clf14),
            ('xgb15', xgb_clf15)
           ]

start = time.time()
print(f'fitting ...')
model = VotingClassifier(estimators=estimators, voting='soft', verbose=True)
model.fit(X, y)

print('predicting ...')
model_preds3 = model.predict_proba(X_test)[:, -1]

elapsed = time.time() - start
print(f'elapsed time: {elapsed:.2f}sec\n')
print('model_preds3 ready!')

# Check for correlation

In [None]:
import plotly.express as px

group_labels = ['batch1', 'batch2', 'batch3']

# Pairwise (Pearson) correlation between the three batch predictions
data = np.corrcoef([model_preds1, model_preds2, model_preds3])
fig = px.imshow(data, x=group_labels, y=group_labels)

fig.show()
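`np.corrcoef` measures Pearson correlation. AUC depends only on ranks, so the Spearman rank correlation is arguably a more direct measure of how similarly the batches order the test rows. A sketch with a hypothetical helper `prediction_correlations` built on pandas' `DataFrame.corr`, demonstrated on synthetic predictions. In this notebook you would pass `[model_preds1, model_preds2, model_preds3]` and `group_labels` instead.

```python
import numpy as np
import pandas as pd


def prediction_correlations(preds, names, method="pearson"):
    """Pairwise correlation matrix between prediction arrays ('pearson' or 'spearman')."""
    return pd.DataFrame(dict(zip(names, preds))).corr(method=method)


# Synthetic predictions: one shared signal plus small per-batch noise
rng = np.random.default_rng(0)
signal = rng.random(1000)
preds = [np.clip(signal + rng.normal(0, 0.05, 1000), 0, 1) for _ in range(3)]
names = ["batch1", "batch2", "batch3"]

pearson_corr = prediction_correlations(preds, names, "pearson")
spearman_corr = prediction_correlations(preds, names, "spearman")
print(pearson_corr.round(3))
print(spearman_corr.round(3))
```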

# Power averaging

The formula is **Final Predictions = (Predictions1^Power + Predictions2^Power + Predictions3^Power) / n_predictions**
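The same formula as a small NumPy function for any number of prediction arrays. `power_average` is a hypothetical helper name, and the three toy prediction arrays are made up for illustration. Calling `power_average([model_preds1, model_preds2, model_preds3], power=4)` gives the same values as the ensemble line in the next cell.

```python
import numpy as np


def power_average(preds, power):
    """Raise each prediction array to `power`, then take the element-wise mean."""
    stacked = np.vstack(preds)  # shape: (n_predictions, n_samples)
    return (stacked ** power).mean(axis=0)


# Made-up predictions from three correlated models for two samples
p1 = np.array([0.2, 0.9])
p2 = np.array([0.4, 0.8])
p3 = np.array([0.3, 0.7])

plain = power_average([p1, p2, p3], power=1)   # ordinary mean
powered = power_average([p1, p2, p3], power=4)
print(plain, powered)
```

With `power=1` the second sample scores about 2.7 times the first; with `power=4` the ratio grows to about 37, which is the amplification described at the top. The outputs are no longer calibrated probabilities, but that does not matter for AUC, which only looks at the ranking.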

In [None]:
sample_sub = pd.read_csv('../input/tabular-playground-series-oct-2021/sample_submission.csv')
ensemble = sample_sub.copy()

power = 4

ensemble.loc[:,'target'] = (model_preds1**power + model_preds2**power + model_preds3**power)/3

In [None]:
ensemble

In [None]:
ensemble.to_csv('submission.csv', index=False)

# Closing thoughts

This strategy carried me to 20th place in the last TPS. However, it doesn't seem to work as well on this dataset, mainly because the correlations between the batches aren't as high.

From what I've seen, this might still work if the models are trained differently, e.g. by changing learning rates or by stacking.

Open to any suggestions.