The competition is about fine margins. I have had reasonable success blending models and averaging many outputs, but not much from removing outliers, transforming the data or attempting to normalize the target, which all seems counterintuitive.<br>

So is there another method we could use to gain a competitive advantage, since a fractional improvement could mean a decent place on the leaderboard?

This kernel is inspired by the notebook: [handling-multimodal-distributions-fe-techniques](https://www.kaggle.com/iamleonie/handling-multimodal-distributions-fe-techniques/comments). <br>
The analysis below shows there is potential for a great score if we can split the target into 2 distributions.<br>

In [None]:
# libs
import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture
from sklearn.metrics import confusion_matrix

from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from lightgbm import LGBMClassifier
from xgboost import XGBRegressor

import optuna 
from optuna import Trial
from optuna.samplers import TPESampler

In [None]:
# load
df_train = pd.read_csv('/kaggle/input/tabular-playground-series-jan-2021/train.csv')
df_test = pd.read_csv('/kaggle/input/tabular-playground-series-jan-2021/test.csv')
df_train.info(verbose=False, memory_usage='deep')

In [None]:
# config
train_mode = False
target = 'target'
random_state = 42
x_cols = [c for c in df_train.columns if 'cont' in c] # all training columns

# Evaluation.
Simple kfold prediction: returns the oof predictions and an average prediction on the test set (which I disregard when tuning).

In [None]:
# evaluation function
def _evaluate(model, x_cols, df, target=target, n_folds=5):
    
    oof = np.zeros(len(df[target])) # means 'out of fold' - basically where we are going to store our test fold predictions
    preds_test = np.zeros(len(df_test)) # test set predictions

    # enum folds
    kf = KFold(n_splits=n_folds, random_state=random_state, shuffle=True)
    for idx_train, idx_test in kf.split(df):

        # setup test / train data
        x_train = df.loc[idx_train, x_cols].values
        y_train = df.loc[idx_train, target].values
        x_test = df.loc[idx_test, x_cols].values
        y_test = df.loc[idx_test, target].values
    
        # fit / predict
        model.fit(x_train, y_train)#, eval_set = [(x_test, y_test)], early_stopping_rounds=100, verbose=False)
        preds_train = model.predict(x_test) # held-out fold predictions, not train predictions (used for hypertuning)
        preds_test += model.predict(df_test[x_cols].values) / n_folds
        
        # store the held-out fold predictions in the oof array
        oof[idx_test] = preds_train
    
    return oof, preds_test

# Baseline.
Let's get a baseline score to try and beat...

In [None]:
# get baseline score
lgbm = LGBMRegressor(seed=random_state)
oof, preds_test = _evaluate(lgbm, x_cols, df_train)

# plot
fig, ax = plt.subplots(figsize=(12, 4))
sns.kdeplot(df_train[target], color='b', label='actual')
sns.kdeplot(oof, color='r', label='prediction')
ax.set_title('rmse: ' + str(round(mean_squared_error(df_train[target], oof, squared=False), 5))) # squared=False returns the root mse
plt.show()

Rmse of 0.702. You can also see above that the model hasn't quite handled the bimodal distribution. <br>
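
The score in the plot title is computed with `squared=False`, so it is a root mean squared error. Recent scikit-learn releases deprecate that argument, so a small NumPy helper is one version-independent way to get the same number. The name `rmse` is my own and not part of the original notebook; treat this as a minimal sketch.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the same quantity as mean_squared_error(..., squared=False)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# toy check: errors of 0, 0 and 3 give sqrt(9 / 3)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]))
```

In the notebook this could be called as `rmse(df_train[target], oof)`.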
# Eda.
## Correlations.

In [None]:
# correlations / mask
corr = df_train.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))

# plot
fig, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .6})

plt.show()

It seems too convenient that the target doesn't correlate with anything. Initial observations are that fields 1 and 6-13 correlate well with each other and 2-5 not so well. <br>
It does lead you to think that this is 2 datasets merged together and the challenge is to unpick it.
## Boxplots (vs test data).

In [None]:
# setup plot
plot_cols = 5 # no of columns / change to preference
list_feats = [a for a in df_train.columns if 'cont' in a] # all features
total_rows = int(np.ceil(len(list_feats) / plot_cols))
fig, ax = plt.subplots(nrows=total_rows, ncols=plot_cols, figsize=(20, 4 * total_rows)) # setup subplots

# loop each feature / plot
j = 0 # keeps track of rows
for i in range(0, len(list_feats)): # loop index of each feature
    
    # create plot
    sns.boxplot(data=[df_train[list_feats[i]], df_test[list_feats[i]]], palette='vlag', ax=ax[j,i % plot_cols])
    ax[j,i % plot_cols].set_title(list_feats[i])
    ax[j,i % plot_cols].set_ylabel('')
    ax[j,i % plot_cols].set_xlabel('')
    
    # increment row
    if i % plot_cols == (plot_cols - 1): j += 1 # basically says at end of each column start a new row

plt.show()

Straight away you can see cont2 has a different distribution and cont7 / cont10 have extra outliers - we can trial removing them.
## Distributions.

In [None]:
# setup plot
plot_cols = 5 # no of columns / change to preference
list_feats = [a for a in df_train.columns if 'cont' in a] # all features
total_rows = int(np.ceil(len(list_feats) / plot_cols))
fig, ax = plt.subplots(nrows=total_rows, ncols=plot_cols, figsize=(20, 4 * total_rows)) # setup subplots

# loop each feature / plot
j = 0 # keeps track of rows
for i in range(0, len(list_feats)): # loop index of each feature
    
    # create plot
    sns.distplot(df_train[list_feats[i]], ax=ax[j,i % plot_cols])
    ax[j,i % plot_cols].set_title(list_feats[i])
    ax[j,i % plot_cols].set_ylabel('')
    ax[j,i % plot_cols].set_xlabel('')
    
    # increment row
    if i % plot_cols == (plot_cols - 1): j += 1 # basically says at end of each column start a new row

plt.show()

This dataset is riddled with multimodal distributions. We may get a better result from binning... something to test.
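
One way the binning idea could be tested is quantile binning, where the bin edges are learned from the training column and then applied to any data. This is only a sketch on a synthetic bimodal feature: `quantile_bin` and the choice of 10 bins are my own assumptions, not something the notebook has evaluated.

```python
import numpy as np

def quantile_bin(train_values, values, n_bins=10):
    """Assign each value to a quantile bin whose edges come from the training data."""
    # interior edges only, so the outer bins are open-ended and unseen extremes still get a bin
    edges = np.quantile(train_values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

rng = np.random.default_rng(42)
# synthetic bimodal feature standing in for one of the 'cont' columns
feature = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.8, 0.05, 500)])
bins = quantile_bin(feature, feature, n_bins=10)
print(np.bincount(bins))  # number of rows in each bin
```

The binned column could then be added as an extra feature next to the raw one and scored with `_evaluate`.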

## Iqr.
Using the Interquartile Range (Iqr) to remove outliers is fairly common practice. I wonder why sklearn doesn't have its own Iqr function?

In [None]:
# mask of rows inside the iqr fences (True = keep the row)
def _iqr_mask(feature):
    
    # compute quantiles
    q1 = df_train[feature].quantile(0.25)
    q3 = df_train[feature].quantile(0.75)
    iqr = q3 - q1
    min_quartile = q1 - 1.5 * iqr
    max_quartile = q3 + 1.5 * iqr

    # create bool mask
    mask_iqr = (df_train[feature] >= min_quartile) & (df_train[feature] <= max_quartile)
    
    return mask_iqr

# get iqr mask for target
mask_iqr_target = _iqr_mask(target)

# plot
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20,5))
sns.boxplot(data=df_train[target], ax=ax[0,0])
sns.kdeplot(df_train[target], ax=ax[1,0])
sns.boxplot(data=df_train.loc[mask_iqr_target, target], ax=ax[0,1])
sns.kdeplot(df_train.loc[mask_iqr_target, target], ax=ax[1,1])
ax[0,0].set_title('pre iqr')
ax[0,1].set_title('post iqr')
plt.show()

Visually the distribution looks much better and yields a much better training score, however the same cannot be said about the LB score.

# Cont 7 / Cont 10
Again I get a better oof score when the following are removed, but this isn't reflected in the LB score.

In [None]:
# # remove outliers / reset index
# df_train = df_train[df_train['cont7'] > -0.03]
# df_train = df_train.loc[_iqr_mask('cont10')]
# df_train.reset_index(drop=True, inplace=True)

# plot
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20,5))
sns.boxplot(df_train['cont7'], ax=ax[0])
sns.boxplot(df_train['cont10'], ax=ax[1])
ax[0].set_title('cont7')
ax[1].set_title('cont10')
plt.show()

# Optuna.
Always been a fan of GridSearch, until I found Optuna (it's much more flexible, uses ranges rather than brute-forcing lists and works better on larger datasets)...

In [None]:
# lgbm / objective function
def objective(trial):
    
    # hyperparameters
    param = {
        'boosting_type':'gbdt',
        'num_leaves':trial.suggest_int('num_leaves', 3, 150),
        'max_depth':trial.suggest_int('max_depth', -1, 20),
        'learning_rate':trial.suggest_float('learning_rate', 0.001, 0.6),
        'n_estimators':trial.suggest_int('n_estimators', 50, 500),
        'min_child_weight':trial.suggest_float('min_child_weight', 0.2, 0.6),
        'min_child_samples':trial.suggest_int('min_child_samples', 15, 30),
        'subsample':trial.suggest_float('subsample', 0.5, 1.0),
        'subsample_freq':trial.suggest_int('subsample_freq', 3, 150),
        'random_state':random_state,
        'lambda_l1':trial.suggest_float('lambda_l1', 0.0, 5.0)
    }
    
    # model / evaluate
    lgbm = LGBMRegressor(**param) # model
    x_cols = [c for c in df_train.columns if 'cont' in c] # all columns
    oof, preds_test = _evaluate(lgbm, x_cols, df_train.reset_index(drop=True)) # evaluate
    
    return mean_squared_error(df_train[target].reset_index(drop=True), oof, squared=False)

if train_mode:

    # run study
    study = optuna.create_study(direction='minimize', sampler=TPESampler())
    study.optimize(objective, n_trials=200)

    # output study
    print (study.best_value)
    print (study.best_params)

In [None]:
# cat / objective function
def objective(trial):
    
    # hyperparameters
    param = {
        'iterations':1000,
        'verbose':False,
        'random_state':random_state,
        'loss_function':'RMSE',
        'bootstrap_type':'Bernoulli',
        'learning_rate':trial.suggest_float('learning_rate', 0.0001, 0.31),
        'max_depth':trial.suggest_int('max_depth', 3, 10),
        'colsample_bylevel':trial.suggest_float('colsample_bylevel', 0.3, 0.8),
    }
    
    # model / evaluate
    cat = CatBoostRegressor(**param) # model
    x_cols = [c for c in df_train.columns if 'cont' in c] # all columns
    oof, preds_test = _evaluate(cat, x_cols, df_train, target=target) # evaluate
    
    return mean_squared_error(df_train[target], oof, squared=False)

if train_mode:
    
    # run study
    study = optuna.create_study(direction='minimize', sampler=TPESampler())
    study.optimize(objective, n_trials=100)

    # output study
    print (study.best_value)
    print (study.best_params)

In [None]:
# xgb / objective function
def objective(trial):
    
    # hyperparameters
    param = {
        'random_state':random_state,
        'objective':'reg:squarederror',
        'booster':'gbtree',
        'learning_rate':trial.suggest_float('learning_rate', 0.001, 0.1),
        'alpha':trial.suggest_float('alpha', 0.001, 0.1),
        'colsample_bylevel':trial.suggest_float('colsample_bylevel', 0.05, 0.1),
        'colsample_bytree':trial.suggest_float('colsample_bytree', 0.1, 0.9),
        'gamma':trial.suggest_float('gamma', 0, 0.5),
        'max_depth':trial.suggest_int('max_depth', 3, 18),
        'min_child_weight': trial.suggest_float ('min_child_weight', 1, 20),
        #'reg_lambda':trial.suggest_int('reg_lambda', 0, 10),
        #'reg_alpha':trial.suggest_int('reg_alpha', 10, 50),
        'subsample':trial.suggest_float('subsample', 0.3, 0.7),
    }
    
    # model / evaluate
    xgb = XGBRegressor(**param) # model
    x_cols = [c for c in df_train.columns if 'cont' in c] # all columns
    oof, preds_test = _evaluate(xgb, x_cols, df_train, target=target) # evaluate
    
    return mean_squared_error(df_train[target], oof, squared=False)

if train_mode:
    
    # run study
    study = optuna.create_study(direction='minimize', sampler=TPESampler())
    study.optimize(objective, n_trials=100)

    # output study
    print (study.best_value)
    print (study.best_params)

Run each of our hyper optimized models, but this time keep hold of the test predictions...

In [None]:
# hyper params per each model
param_lgbm = {'num_leaves': 148, 'max_depth': 19, 'learning_rate': 0.04168752594808129, 'n_estimators': 468, 'min_child_weight': 0.35392113973764505, 'min_child_samples': 29, 'subsample': 0.9348697769228501, 'subsample_freq': 6, 'lambda_l1': 4.639129744838143}
param_cat = {'learning_rate': 0.07468089271003528, 'max_depth': 8, 'colsample_bylevel': 0.7338241468797853}
param_xgb = {'learning_rate': 0.0652222304334701, 'alpha': 0.0036866921576056855, 'colsample_bylevel': 0.09959606060270643, 'colsample_bytree': 0.863554381598069, 'gamma': 0.3959383978062547, 'max_depth': 15, 'min_child_weight': 19.357558021086128, 'subsample': 0.6991638855748524}

# lgbm hyperopt score
lgbm = LGBMRegressor(seed=random_state, **param_lgbm)
oof_lgbm, preds_lgbm = _evaluate(lgbm, x_cols, df_train, n_folds=5)
print ('hyper lgbm:', mean_squared_error(df_train[target], oof_lgbm, squared=False))

# catboost hyperopt score
cat = CatBoostRegressor(iterations=1000, verbose=False, random_state=random_state, loss_function='RMSE', bootstrap_type='Bernoulli', **param_cat)
oof_cat, preds_cat = _evaluate(cat, x_cols, df_train, n_folds=5)
print ('hyper cat:', mean_squared_error(df_train[target], oof_cat, squared=False))

# xgb score
xgb = XGBRegressor(random_state=random_state, **param_xgb)
oof_xgb, preds_xgb = _evaluate(xgb, x_cols, df_train, n_folds=2) # scaled back so it runs quicker for public.
print ('hyper xgb:', mean_squared_error(df_train[target], oof_xgb, squared=False)) # score the xgb oof, not the baseline oof

# blend score
print ('hyper blend:', mean_squared_error(df_train[target], ((oof_lgbm * 0.4) + (oof_cat * 0.4) + (oof_xgb * 0.2)), squared=False))

Even though we have only gone from 0.702 to 0.696, it's a huge improvement!

# Submission (pt1).
Let's blend our model predictions to get a more robust average...
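
The blend weights used in this notebook are hand-picked. As a hedged alternative, the weights could be chosen by a small grid search over the oof predictions. The sketch below uses synthetic arrays in place of `oof_lgbm`, `oof_cat` and `oof_xgb`; the helper `best_blend_weights` is hypothetical, and weights tuned on oof predictions can overfit slightly.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def best_blend_weights(y_true, oof_list, step=0.1):
    """Grid-search non-negative weights that sum to 1 and minimise the oof rmse (3 models)."""
    best_w, best_score = None, np.inf
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    for w0 in grid:
        for w1 in grid:
            w2 = round(1.0 - w0 - w1, 10)
            if w2 < 0:
                continue  # weights must stay non-negative
            blend = w0 * oof_list[0] + w1 * oof_list[1] + w2 * oof_list[2]
            score = rmse(y_true, blend)
            if score < best_score:
                best_w, best_score = (float(w0), float(w1), float(w2)), score
    return best_w, best_score

# synthetic stand-ins for the three oof arrays
rng = np.random.default_rng(42)
y = rng.normal(8.0, 0.7, 2000)
oofs = [y + rng.normal(0, s, y.size) for s in (0.5, 0.5, 0.8)]
weights, score = best_blend_weights(y, oofs)
print(weights, round(score, 5))
```

The same weights would then be applied to the matching test predictions.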

In [None]:
# merge model outputs to test
df_test['hyper_lgbm'] = preds_lgbm
df_test['hyper_cat'] = preds_cat
df_test['hyper_xgb'] = preds_xgb
df_test['hyper_blend'] = (preds_lgbm * 0.4) + (preds_cat * 0.4) + (preds_xgb * 0.2)

# assign target
df_test['target'] = df_test['hyper_blend']
df_test[['id','target']].to_csv('submission_hyper_blend.csv', index=False)

This produces a LB score of: <br>
0.69851 - no records removed. <br>
0.69858 - cont7 / cont 10 outliers removed. <br>
0.69762 - xgb blend.

# Seeds.
If at first you don't succeed, pick another seed and try again (or blend a few)... <br>
The following is commented out because of the time it takes to run.
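
To illustrate why blending seeds can help, here is a toy sketch in which each "model" is simply the truth plus seed-dependent noise. `noisy_model_predict` is a hypothetical stand-in, not a real model. Because the noise here is fully independent between seeds, the gain is much larger than you should expect from real models, whose errors are mostly shared across seeds.

```python
import numpy as np

def noisy_model_predict(y_true, seed, noise=0.3):
    """Hypothetical stand-in for a seeded model: the truth plus seed-dependent noise."""
    rng = np.random.default_rng(seed)
    return y_true + rng.normal(0.0, noise, y_true.size)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y = np.random.default_rng(0).normal(8.0, 0.7, 5000)
seeds = [11, 22, 33, 44, 55]

single = noisy_model_predict(y, seeds[0])
averaged = np.zeros_like(y)
for seed in seeds:
    averaged += noisy_model_predict(y, seed) / len(seeds)  # running mean over seeds

print('single seed:', round(rmse(y, single), 4))
print('seed average:', round(rmse(y, averaged), 4))
```

Note that the sketch divides by the number of seeds, which keeps the seed count separate from the fold count.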

In [None]:
n_folds = 3 # controls no of times model is run and no of folds within evaluation

# arrays to store oof and predictions
oof_lgbm_seeds = np.zeros(len(df_train)) 
oof_car_seeds = np.zeros(len(df_train))
preds_lgbm_seeds = np.zeros(len(df_test)) 
preds_car_seeds = np.zeros(len(df_test)) 

# enum each fold / generate random num (used for seed)
for g in range(n_folds):
    rand_num = random.randint(1, 5000)
    
    # lgbm predictions
    lgbm = LGBMRegressor(seed=rand_num, **param_lgbm)
    oof, preds_lgbm = _evaluate(lgbm, x_cols, df_train, n_folds=n_folds)
    oof_lgbm_seeds += (oof / n_folds) # merge with seed results
    preds_lgbm_seeds += (preds_lgbm / n_folds) # merge with seed results
    
    # cat boost predictions
    cat = CatBoostRegressor(iterations=1000, verbose=False, random_state=rand_num, loss_function='RMSE', bootstrap_type='Bernoulli', **param_cat)
    oof, preds_cat = _evaluate(cat, x_cols, df_train, n_folds=n_folds)
    oof_car_seeds += (oof / n_folds) # merge with seed results
    preds_car_seeds += (preds_cat / n_folds) # merge with seed results

In [None]:
print ('hyper lgbm:', mean_squared_error(df_train[target], oof_lgbm_seeds, squared=False))
print ('hyper cat:', mean_squared_error(df_train[target], oof_car_seeds, squared=False))

# hyper lgbm: 0.6959915123976514
# hyper cat: 0.697341630050542

# Submission (pt2).

In [None]:
# merge model outputs to test
df_test['seed_lgbm'] = preds_lgbm_seeds
df_test['seed_cat'] = preds_car_seeds
df_test['seed_blend'] = (preds_lgbm_seeds * 0.4) + (preds_car_seeds * 0.6)

# assign target
df_test['target'] = df_test['seed_blend']
df_test[['id','target']].to_csv('submission_seed_blend.csv', index=False)

Lb: 0.69847 - barely any difference.

# Gaussian Mixture (Gmm).
This was inspired by [https://www.kaggle.com/iamleonie/handling-multimodal-distributions-fe-techniques/comments](https://www.kaggle.com/iamleonie/handling-multimodal-distributions-fe-techniques/comments).<br>
The basics of the following process are:
* Split the data into 2 based on the Gmm distribution output.
* Train a classification model to predict which distribution it belongs to.
* Train 2 regression models on each of the distributions.

In [None]:
# apply gaussian mix / assign bins back to training
gmm = GaussianMixture(n_components=2, random_state=42)
gmm.fit(df_train[target].values.reshape(-1, 1))
df_train['target_gmm'] = gmm.predict(df_train[target].values.reshape(-1, 1))

# masks for each gmm bin
mask_gmm_0 = df_train['target_gmm'] == 0
mask_gmm_1 = df_train['target_gmm'] == 1

# plot
fig, ax = plt.subplots(figsize=(12, 4))
sns.kdeplot(data=df_train.loc[mask_gmm_0, target], label='gmm bin 0')
sns.kdeplot(data=df_train.loc[mask_gmm_1, target], label='gmm bin 1')
plt.show()

In [None]:
# oof masks for dataframe
oof_gmm_0, preds = _evaluate(lgbm, x_cols, df_train.loc[mask_gmm_0].reset_index(drop=True))
oof_gmm_1, preds = _evaluate(lgbm, x_cols, df_train.loc[mask_gmm_1].reset_index(drop=True))

# plot
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(18, 4))
sns.kdeplot(data=df_train.loc[mask_gmm_0, target], ax=ax[0], label='actual')
sns.kdeplot(data=oof_gmm_0, ax=ax[0], label='prediction')
sns.kdeplot(data=df_train.loc[mask_gmm_1, target], ax=ax[1], label='actual')
sns.kdeplot(data=oof_gmm_1, ax=ax[1], label='prediction')

# decorate
ax[0].set_title('gmm_0: ' + str(round(mean_squared_error(df_train.loc[mask_gmm_0, target], oof_gmm_0, squared=False), 5)))
ax[1].set_title('gmm_1: ' + str(round(mean_squared_error(df_train.loc[mask_gmm_1, target], oof_gmm_1, squared=False), 5)))
plt.show()

Interesting results: we have gone from roughly 0.70 to 0.38/0.36. If we could only split the test set accurately we could be onto a winning prediction...
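
A word of caution on these scores: the rows were split using the target itself, so part of the drop comes from the split rather than from the features. The toy sketch below shows the effect on a synthetic bimodal target that is unrelated to any feature. The midpoint cut is a crude stand-in for the Gmm labels, and all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# synthetic bimodal target, deliberately unrelated to any feature
y = np.concatenate([rng.normal(7.0, 0.4, 5000), rng.normal(8.5, 0.4, 5000)])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# crude stand-in for the Gmm labels: cut the target at the midpoint between the modes
labels = (y > 7.75).astype(int)

# predict a constant (the mean) for all rows, then separately within each group
overall = rmse(y, np.full_like(y, y.mean()))
per_group = [
    rmse(y[labels == k], np.full((labels == k).sum(), y[labels == k].mean()))
    for k in (0, 1)
]
print('constant prediction, all rows:', round(overall, 3))
print('constant prediction, per group:', [round(s, 3) for s in per_group])
```

Even a constant prediction scores far better per group, which is why everything hinges on how well the group can be predicted from the features alone.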
# Classification.
To make this easier to test we will reserve a random third of the training data (100,000 rows) for a final test.

In [None]:
# split training data / reset indexes
df_gmm_test = df_train.sample(100000, random_state=random_state).copy() # seeded so the holdout is reproducible
df_gmm_train = df_train[(~df_train['id'].isin(df_gmm_test['id'].to_list()))].copy()
df_gmm_test.reset_index(drop=True, inplace=True)
df_gmm_train.reset_index(drop=True, inplace=True)
print (df_gmm_train.shape, df_gmm_test.shape)

In [None]:
# baseline score for gmm classifier
lgbm_gmm_class = LGBMClassifier(random_state=random_state)
oof, preds = _evaluate(lgbm_gmm_class, x_cols, df_gmm_train, target='target_gmm')
print(accuracy_score(df_gmm_train['target_gmm'], oof))
print(confusion_matrix(df_gmm_train['target_gmm'], oof))

Unfortunately not that accurate. Can we tune our way to a better value...
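
To judge how good an accuracy figure is, it helps to compare it with the accuracy of always predicting the most frequent class. A minimal sketch, using a made-up label array in place of `df_gmm_train['target_gmm']` and a hypothetical helper name:

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    return float(counts.max() / labels.size)

# hypothetical labels standing in for df_gmm_train['target_gmm']
example_labels = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])
print(majority_baseline_accuracy(example_labels))
```

A classifier only adds information if it clearly beats this number.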

# Optuna.

In [None]:
# # lgbm / objective function
# def objective(trial):
    
#     # hyperparameters
#     param = {
#         'num_leaves':trial.suggest_int('num_leaves', 3, 150),
#         'max_depth':trial.suggest_int('max_depth', -1, 20),
#         'learning_rate':trial.suggest_float('learning_rate', 0.001, 0.6),
#         'n_estimators':trial.suggest_int('n_estimators', 50, 500),
#         'min_child_weight':trial.suggest_float('min_child_weight', 0.2, 0.6),
#         'min_child_samples':trial.suggest_int('min_child_samples', 15, 30),
#         'subsample':trial.suggest_float('subsample', 0.5, 1.0),
#         'subsample_freq':trial.suggest_int('subsample_freq', 3, 150),
#         'random_state':random_state,
#         'lambda_l1':trial.suggest_float('lambda_l1', 0.0, 5.0)
#     }
    
#     # model / evaluate
#     lgbm_gmm_class = LGBMClassifier(**param) # model
#     oof, preds = _evaluate(lgbm_gmm_class, x_cols, df_gmm_train, target='target_gmm') # evaluate
    
#     return accuracy_score(df_gmm_train['target_gmm'], oof)

# # run study
# study = optuna.create_study(direction='maximize',sampler=TPESampler())
# study.optimize(objective, n_trials=30) # scaled down for public (i maxed out at 0.6)

# # output study
# print (study.best_value)
# print (study.best_params)

To summarise the below:
* Fit a classification model that predicts which Gmm component each row belongs to.
* Fit 2 regression models, one per Gmm component (roughly 0.38 rmse each).
* Re-fit the Lgbm and Catboost models on the new training set.
* Perform predictions on the reserved test set.

In [None]:
# optuna hyper params
param_lgbm_gmm_class = {'num_leaves': 144, 'max_depth': 10, 'learning_rate': 0.0934759313797003, 'n_estimators': 107, 'min_child_weight': 0.30363944085393396, 'min_child_samples': 17, 'subsample': 0.9463879414095, 'subsample_freq': 80, 'lambda_l1': 4.67131782429971}

# init classifiers / regressors (re-fit other ones on new partition)
lgbm_gmm_class = LGBMClassifier(random_state=random_state, **param_lgbm_gmm_class)
lgbm_gmm_reg_0 = LGBMRegressor(seed=random_state)
lgbm_gmm_reg_1 = LGBMRegressor(seed=random_state)
lgbm_gmm_reg = LGBMRegressor(seed=random_state, **param_lgbm)
lgbm_cat_reg = CatBoostRegressor(iterations=1000, verbose=False, random_state=random_state, loss_function='RMSE', bootstrap_type='Bernoulli', **param_cat)

# masks for each gmm bin
mask_gmm_0 = df_gmm_train['target_gmm'] == 0
mask_gmm_1 = df_gmm_train['target_gmm'] == 1

# fit
lgbm_gmm_class.fit(df_gmm_train[x_cols], df_gmm_train['target_gmm'])
lgbm_gmm_reg_0.fit(df_gmm_train.loc[mask_gmm_0, x_cols].reset_index(drop=True), df_gmm_train.loc[mask_gmm_0, target].reset_index(drop=True))
lgbm_gmm_reg_1.fit(df_gmm_train.loc[mask_gmm_1, x_cols].reset_index(drop=True), df_gmm_train.loc[mask_gmm_1, target].reset_index(drop=True))
lgbm_gmm_reg.fit(df_gmm_train[x_cols], df_gmm_train[target])
lgbm_cat_reg.fit(df_gmm_train[x_cols], df_gmm_train[target])

# class predictions (obtain probabilities rather than class)
df_gmm_test[['gmm_class_' + str(a) + '_pred' for a in lgbm_gmm_class.classes_]] = lgbm_gmm_class.predict_proba(df_gmm_test[x_cols])

# regression predictions (on all data)
df_gmm_test['lgbm'] = lgbm_gmm_reg.predict(df_gmm_test[x_cols])
df_gmm_test['cat'] = lgbm_cat_reg.predict(df_gmm_test[x_cols])
df_gmm_test['hyper_blend'] = (df_gmm_test['lgbm'] * 0.5) + (df_gmm_test['cat'] * 0.5)
df_gmm_test['gmm_reg_0'] = lgbm_gmm_reg_0.predict(df_gmm_test[x_cols])
df_gmm_test['gmm_reg_1'] = lgbm_gmm_reg_1.predict(df_gmm_test[x_cols])

In [None]:
# plot
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(18, 4))
sns.kdeplot(df_gmm_test['lgbm'], ax=ax[0], label='lgbm')
sns.kdeplot(df_gmm_test['cat'], ax=ax[0], label='cat boost')
sns.kdeplot(df_gmm_test['hyper_blend'], ax=ax[0], label='blend')
sns.kdeplot(df_gmm_test['target'], ax=ax[0], label='target')
sns.kdeplot(df_gmm_test['gmm_reg_0'], ax=ax[1], label='regression 0')
sns.kdeplot(df_gmm_test['gmm_reg_1'], ax=ax[1], label='regression 1')
sns.kdeplot(df_gmm_test['target'], ax=ax[1], label='target')

# decorate
ax[0].set_title('full regressors')
ax[1].set_title('gmm regressors')
plt.show()

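Making a prediction using only the classifier and the 2 Gmm regressors doesn't actually yield a good prediction (it's easy to see why: each regressor stays within its own component, so a misclassified row ends up in the wrong mode). <br>
So instead, can we gain a competitive advantage by using our blended model by default and switching to the 2 Gmm regressors only where the Gmm classifier is confident (a predicted probability of 0.7 or more in the code below)? Let's call this 'super target'...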
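
The rule can also be written without a row-wise `apply`. The sketch below uses `np.select` with the same 0.7 threshold as the notebook code and the same priority order (class 0 is checked first). It also shows a softer alternative that weights both Gmm regressors by the class probabilities. Both function names and the three example rows are hypothetical, and the soft version has not been tested against the LB.

```python
import numpy as np

def super_target(p0, p1, reg_0, reg_1, blend, threshold=0.7):
    """Use a Gmm regressor where the classifier is confident, else fall back to the blend."""
    return np.select([p0 >= threshold, p1 >= threshold], [reg_0, reg_1], default=blend)

def soft_target(p0, p1, reg_0, reg_1):
    """Alternative: weight both Gmm regressors by the predicted class probabilities."""
    return p0 * reg_0 + p1 * reg_1

# hypothetical values for three rows
p0 = np.array([0.9, 0.5, 0.1])
p1 = 1.0 - p0
reg_0 = np.array([7.0, 7.1, 7.2])
reg_1 = np.array([8.5, 8.6, 8.7])
blend = np.array([7.9, 8.0, 8.1])

print(super_target(p0, p1, reg_0, reg_1, blend))
print(soft_target(p0, p1, reg_0, reg_1))
```

With the notebook's dataframe, the inputs would be the `gmm_class_0_pred`, `gmm_class_1_pred`, `gmm_reg_0`, `gmm_reg_1` and `hyper_blend` columns passed as arrays.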

In [None]:
# create super target
def _super_target(x):
    
    # certain class
    if x['gmm_class_0_pred'] >= 0.7: return x['gmm_reg_0']
    if x['gmm_class_1_pred'] >= 0.7: return x['gmm_reg_1']
    
    return x['hyper_blend']

# apply super target
df_gmm_test['super_target'] = df_gmm_test.apply(lambda x: _super_target(x), axis=1)

# plot
fig, ax = plt.subplots(figsize=(18, 4))
sns.kdeplot(df_gmm_test[target])
sns.kdeplot(df_gmm_test['super_target'])
plt.show()

Visually this looks promising, but the LB tells me otherwise...

In [None]:
print ('blend:', mean_squared_error(df_gmm_test[target], df_gmm_test['hyper_blend'], squared=False))
print ('super:', mean_squared_error(df_gmm_test[target], df_gmm_test['super_target'], squared=False))