## <div style='background:#2b6684;color:white;padding:0.5em;border-radius:0.2em'>Introduction</div>

Hi,
I just wanted to share a quick notebook with the results of spot-checking the "big three" gradient-boosting libraries: LightGBM, CatBoost and XGBoost.<br>
The list of models can be modified to expand the spot-check. The results can then help decide which algorithm to focus on<br>
for feature engineering and hyperparameter tuning.

Thanks for checking out this "quick one", and have fun with this competition!

Best Regards

## <div style='background:#2b6684;color:white;padding:0.5em;border-radius:0.2em'>Import, Preprocess, Spot-Check</div>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from warnings import filterwarnings
filterwarnings('ignore')

plt.rcParams['font.family'] = 'serif'
cmap = sns.color_palette("ch:start=.2,rot=-.3")
sns.set_palette(cmap)

In [None]:
# read dataframe
df_train = pd.read_csv('../input/tabular-playground-series-sep-2021/train.csv')
df_test = pd.read_csv('../input/tabular-playground-series-sep-2021/test.csv')
sample_submission = pd.read_csv('../input/tabular-playground-series-sep-2021/sample_solution.csv')

In [None]:
# prepare dataframe for modeling
X = df_train.drop(columns=['id','claim']).copy()
y = df_train['claim'].copy()

test_data = df_test.drop(columns=['id']).copy()

In [None]:
# create preprocessing pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

pipeline = Pipeline([
    ('impute', SimpleImputer()),
    ('scale', StandardScaler())
])
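To make the preprocessing step concrete, here is a small sketch on a made-up 3×2 array. The toy arrays and the name `demo_pipeline` are illustrative only. It shows that `SimpleImputer` fills missing values with the column mean by default, and that calling `fit_transform` on the training split and only `transform` on the hold-out split (as the spot-check cell does) reuses the training statistics instead of leaking information from the hold-out data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# same two steps as the notebook's pipeline: mean imputation, then standardisation
demo_pipeline = Pipeline([
    ('impute', SimpleImputer()),   # default strategy='mean'
    ('scale', StandardScaler())
])

# toy training data with one missing value (purely illustrative)
train_toy = np.array([[1.0, 10.0],
                      [np.nan, 20.0],
                      [3.0, 30.0]])
# toy hold-out row, transformed with the statistics learned on train_toy
holdout_toy = np.array([[np.nan, 40.0]])

train_out = demo_pipeline.fit_transform(train_toy)
holdout_out = demo_pipeline.transform(holdout_toy)

# column means learned on the training data only
print(demo_pipeline.named_steps['impute'].statistics_)
# the hold-out NaN is filled with the training mean (2.0), which scales to 0.0
print(holdout_out)
```

Since tree-based models split on feature thresholds, the scaling step is harmless but not required for the three boosting libraries; the imputation is what matters if the data contains missing values.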

In [None]:
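# model params (all three configs assume a CUDA-capable GPU is available)
lgbm_params = {
    'device_type' : 'gpu'   # requires a LightGBM build with GPU support
}

catb_params = {
    'task_type' : 'GPU',
    'devices' : '0',        # use the first GPU
    'verbose' : 0           # silence per-iteration logging
}

# note: XGBoost >= 2.0 replaces 'gpu_hist' / 'gpu_id' with
# tree_method='hist' plus device='cuda'; these values target older versions
xgb_params = {
    'predictor': 'gpu_predictor',
    'tree_method': 'gpu_hist',
    'gpu_id' : 0,
    'verbosity': 0
}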
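The configurations above only work on a GPU session. If none is available, one option is to switch each library to its CPU settings. The sketch below wraps both variants in a hypothetical helper `make_params`; the CPU values (`device_type='cpu'` for LightGBM, `task_type='CPU'` for CatBoost, `tree_method='hist'` for XGBoost) are standard options of each library, but the helper itself and the `USE_GPU` environment variable are assumptions, not part of the original notebook.

```python
import os


def make_params(use_gpu: bool) -> dict:
    """Hypothetical helper: per-library params for GPU or CPU training."""
    if use_gpu:
        # same settings as the GPU cell above
        return {
            'LGBM': {'device_type': 'gpu'},
            'CATB': {'task_type': 'GPU', 'devices': '0', 'verbose': 0},
            'XGB': {'predictor': 'gpu_predictor', 'tree_method': 'gpu_hist',
                    'gpu_id': 0, 'verbosity': 0},
        }
    # CPU fallback (assumption: not in the original notebook)
    return {
        'LGBM': {'device_type': 'cpu'},
        'CATB': {'task_type': 'CPU', 'verbose': 0},
        'XGB': {'tree_method': 'hist', 'verbosity': 0},
    }


# hypothetical switch: set USE_GPU=0 in the environment to force CPU settings
use_gpu = os.environ.get('USE_GPU', '1') == '1'
params = make_params(use_gpu)
lgbm_params, catb_params, xgb_params = params['LGBM'], params['CATB'], params['XGB']
```

Keeping both variants in one place means the spot-check loop itself does not need to change when moving between hardware.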

In [None]:
# spot-check which model to choose
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=True, random_state=1)

# preprocessing
X_train = pipeline.fit_transform(X_train)
X_test = pipeline.transform(X_test)

models = [
    ('LGBM', LGBMClassifier(**lgbm_params)),
    ('CATB', CatBoostClassifier(**catb_params)),
    ('XGB', XGBClassifier(**xgb_params))
]

scores = dict()

for name, model in models:
    model.fit(X_train, y_train)
    y_hat = model.predict_proba(X_test)[:,1]
    fpr, tpr, _ = roc_curve(y_test, y_hat)
    auc_score = auc(fpr, tpr)
    scores[name] = auc_score
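The loop scores each model by building the ROC curve with `roc_curve` and integrating it with `auc`. As a small sanity check on made-up labels and scores (illustrative values, not competition data), the sketch below shows that this gives the same number as scikit-learn's one-call `roc_auc_score`, and that AUC can be read as the fraction of (positive, negative) pairs in which the positive example gets the higher score.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# the notebook's approach: build the ROC curve, then integrate it (trapezoidal rule)
fpr, tpr, _ = roc_curve(y_true, y_score)
auc_from_curve = auc(fpr, tpr)

# the one-call shortcut for binary labels
auc_direct = roc_auc_score(y_true, y_score)

# rank interpretation: share of positive/negative pairs ranked correctly (no ties here)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise = (pos[:, None] > neg[None, :]).mean()

print(auc_from_curve, auc_direct, pairwise)  # all three are 0.75
```

Either form works inside the loop; `roc_auc_score(y_test, y_hat)` simply saves one step.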

In [None]:
scores_df = pd.DataFrame([scores]).transpose().rename(columns={0:'AUC'})

fig, ax = plt.subplots(figsize=(12,6))

sns.barplot(
    data=scores_df,
    x='AUC',
    y=scores_df.index,
    orient='h',
    ax=ax
)

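for idx, x in enumerate(scores_df['AUC']):
    # label each bar with its AUC, right-aligned just inside the end of the bar
    # (the text is passed positionally: the old 's=' keyword was renamed to 'text' in Matplotlib)
    ax.annotate(
        f"AUC: {np.round(x, 3)}",
        xy=(x - 0.01, idx),
        va='center', ha='right'
    )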

sns.despine(left=True)
plt.show()