<font size=6>Tabular Playground Series - December 2021 - with CatBoost and pseudo labels</font><br>
Inspired by 
* https://www.kaggle.com/kaaveland/tps202112-reasonable-xgboost-model (thank you rkaveland)
* https://www.kaggle.com/remekkinas/tps-12-nn-tpu-pseudolabeling-0-95690 (thank you Remek Kinas)
------------
In this notebook I tried CatBoost. The results are starting to look decent, but they are not as high as with XGBoost.
Why CatBoost? Because the Soil_Type and Wilderness_Area features can be treated as categorical features.
- - - - - - -
What's new :<br>
* CatBoost instead of XGBoost
* randomized grid search
* feature importance by value permutation, restricted to a subset of features because it is slow. Only if ```GRID_SEARCH == False```
* debugging mode (small train set for fast execution)
* some other plots
____________
To do by cross validation :<br>
* test more parameters
* drop useless features
- - - - - - -

|Version|CV accuracy|n folds|Public LB|Note|
|:--:|--:|--:|--:|--:|
|V7||2||DEBUG mode. A grid search for max_leaves and l2_leaf_reg|
|V8||2||A grid search for max_leaves and l2_leaf_reg|
|V9|0.96188|5|0.95620|max_leaves 255 & l2_leaf_reg 100 **with** feature selection|
|V10||5||max_leaves 255 & l2_leaf_reg 100 **without** feature selection|

# 1. LIBRARIES & PARAMETERS

In [None]:
SEED = 666
DEBUG = False

# For feature importance by value permutation - only used when not GRID_SEARCH:
MAKE_VALUE_PERMUTATION = ["Elevation", "Hydrology_Elevation", "slope_o_elevation", "firep_m_elev", "h_hydro_eps_p_road"
    , "h_hydro_eps_p_fire", "Aspect2", "Aspect_sin", "Slope", "Hillshade_sum", "Aspect", "Aspect_mod_360"
    , "Hillshade_3pm_clipped", "slope_m_elevation"]

FOLDS = 2 if DEBUG else 5
MAX_ELEM_IN_RANDOM_GRID_SEARCH = 2 if DEBUG else 16 # time control

## Grid of parameters


In [None]:
# V8
GRID = {"learning_rate":[.15], "subsample":[.25], "l2_leaf_reg":[50, 100], "max_leaves":[223, 255]
        # low border_count (max_bin) for better generalization, except for Elevation and Hydrology_Elevation, which are very important
        # 19 is the index of Hydrology_Elevation in the train.columns, 
    , "border_count":[96], 'per_float_feature_quantization':[['0:border_count=256', '19:border_count=256']]
    , "depth":[1000]
    , "drop_feats":[
        ['slope_o_elevation', 'firep_m_elev', 'Hillshade_Noon_clipped_upper', 'Hillshade_3pm_clipped_upper'
        , 'Hillshade_9am_clipped_upper', 'Hillshade_Noon', 'Hillshade_3pm', 'Hillshade_9am']]}

# V9 # V10
GRID = {"learning_rate":[.15], "subsample":[.25], "l2_leaf_reg":[100], "max_leaves":[255]
    # low border_count (max_bin) for better generalization, except for Elevation and Hydrology_Elevation, which are very important
    # 21 is the index of Hydrology_Elevation in the train.columns, 
    , "border_count":[96], 'per_float_feature_quantization':[['0:border_count=256', '21:border_count=256']]
    , "depth":[1000]
    , "drop_feats":[['slope_o_elevation', 'firep_m_elev', 'Hillshade_Noon_clipped_upper', 'Hillshade_3pm_clipped_upper'
            , 'Hillshade_9am_clipped_upper', 'Hillshade_Noon', 'Hillshade_3pm', 'Hillshade_9am']]}

from itertools import product

_, values = zip(*GRID.items())
GRID_SEARCH = len(list(product(*values))) > 1

## Libraries

In [None]:
import os, gc, random, pickle

import numpy as np 

import pandas as pd
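# Use the full option names: the short form 'max_columns' is ambiguous in recent pandas versions
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 200)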

import datatable as dt

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

import seaborn as sns
sns.set(style='darkgrid', context='notebook', rc={'figure.figsize': (16, 12), 'figure.frameon': False})

from sklearn.model_selection import StratifiedKFold
splits = StratifiedKFold(n_splits=FOLDS, shuffle=True, random_state=SEED)

from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from catboost import CatBoostClassifier, Pool

## What are the default values of the CatBoost parameters?

In [None]:
train_data = [[1, 4, 5, 6],
              [4, 5, 6, 7],
              [30, 40, 50, 60]]

train_labels = [0, 1, 0]
model = CatBoostClassifier(**{"task_type": "GPU", "random_seed": SEED,
    "loss_function": "MultiClass", 'eval_metric': "Accuracy", 
    "grow_policy": "Lossguide", "iterations":10, "max_leaves":255})  
model.fit(train_data,
          train_labels,
          verbose=False)
print(model.get_all_params())
print(model.get_all_params()["depth"])

# 2. DATA & FEATURE ENGINEERING
## Read data

In [None]:
%%time
train = dt.fread('../input/tabular-playground-series-dec-2021/train.csv').to_pandas()
test = dt.fread('../input/tabular-playground-series-dec-2021/test.csv').to_pandas()
sub = test[["Id"]].copy()
pseudo = dt.fread('../input/tps12-pseudolabels/tps12-pseudolabels_v2.csv').to_pandas()

## Feature Engineering

In [None]:
for df in [train, test, pseudo]:
    
    df["Sum_Wilderness_Area"] = df[[f"Wilderness_Area{i+1}" for i in range(4)]].sum(axis=1)
    df["Sum_Soil_Type"] = df[[f"Soil_Type{i+1}" for i in range(40)]].sum(axis=1)
    
    # The raw one-hot columns are not used as-is, so drop them (along with Id)
    df.drop([f"Wilderness_Area{i+1}" for i in range(4)] + [f"Soil_Type{i+1}" for i in range(40)] + ["Id"]
           , inplace=True, axis=1)
    
gc.collect()

Create categorical features from the Soil_Type and Wilderness_Area columns.<br>
CatBoost runs faster with these features than with the original one-hot encoded columns.<br>
The score is about the same.
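
To illustrate the idea, here is a minimal sketch using pandas only: the 0/1 columns of a group are joined into one string per row, and that string becomes a single categorical value. This is an alternative to the datatable/CSV round trip used in this notebook, not the same implementation. The toy frame and the helper name `combine_one_hot` are assumptions made for the example.

```python
import pandas as pd

def combine_one_hot(df, cols, name):
    """Join the 0/1 columns `cols` into one categorical column called `name` (hypothetical helper)."""
    joined = df[cols].astype(str).agg("".join, axis=1)
    return joined.astype("category").rename(name)

# Toy data: the last row has two flags set, which also happens in this synthetic dataset
toy = pd.DataFrame({
    "Wilderness_Area1": [1, 0, 0, 1],
    "Wilderness_Area2": [0, 1, 0, 1],
    "Wilderness_Area3": [0, 0, 0, 0],
    "Wilderness_Area4": [0, 0, 1, 0],
})

all_wilderness_area = combine_one_hot(toy, list(toy.columns), "all_wilderness_area")
print(all_wilderness_area.tolist())
print(all_wilderness_area.nunique())
```

Each distinct combination of flags becomes its own category, so rows with several flags set are kept apart from rows with a single flag.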

In [None]:
def categ_feats(feats, file, name_col):
    
    # `...: None` tells datatable to skip every column that is not listed in feats
    feats.update({...:None})
    
    df = dt.fread(file, columns=feats)
    df.to_csv("temp.csv")
    
    # ";" as separator, so that all columns are read back as one single column
    new_col = dt.fread("temp.csv", sep=";").to_pandas() 
    new_col.columns=[name_col]
    new_col[name_col] = new_col[name_col].astype('category')
    
    return new_col


for feats, new_col in zip([
    {f"Wilderness_Area{i+1}":f"Wilderness_Area{i+1}" for i in range(4)}
    , {f"Soil_Type{i+1}":f"Soil_Type{i+1}" for i in range(10)}
    , {f"Soil_Type{i+11}":f"Soil_Type{i+11}" for i in range(10)}
    , {f"Soil_Type{i+21}":f"Soil_Type{i+21}" for i in range(10)}
    , {f"Soil_Type{i+31}":f"Soil_Type{i+31}" for i in range(10)}
    ], ["all_wilderness_area", "ST_1_10", "ST_11_20", "ST_21_30", "ST_31_40"]):
    
    train = pd.concat([train, categ_feats(feats, '../input/tabular-playground-series-dec-2021/train.csv', new_col)], axis=1)
    test = pd.concat([test, categ_feats(feats, '../input/tabular-playground-series-dec-2021/test.csv', new_col)], axis=1)
    pseudo = pd.concat([pseudo, categ_feats(feats, '../input/tps12-pseudolabels/tps12-pseudolabels_v2.csv', new_col)], axis=1)

In [None]:
train.head()

In [None]:
for f in ["all_wilderness_area", "ST_1_10", "ST_11_20", "ST_21_30", "ST_31_40"]:
    print(f"Unique values in train for the new categorical feature {f} : {train[f].nunique()}")

## Reduce memory usage
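
A minimal sketch of the downcasting idea, assuming a small toy frame. It uses `pandas.to_numeric` with its `downcast` argument rather than the explicit `np.iinfo` / `np.finfo` range checks of the function below, so it shows the same principle, not the same code.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): three columns stored with 64-bit dtypes
toy = pd.DataFrame({
    "Slope": np.array([0, 12, 45], dtype=np.int64),
    "Elevation": np.array([1800, 2950, 4100], dtype=np.int64),
    "Aspect_sin": np.array([0.0, 0.5, -1.0], dtype=np.float64),
})

before = toy.memory_usage(index=False).sum()
for col in toy.columns:
    # Pick the smallest integer or float dtype that can hold the column
    kind = "integer" if pd.api.types.is_integer_dtype(toy[col]) else "float"
    toy[col] = pd.to_numeric(toy[col], downcast=kind)
after = toy.memory_usage(index=False).sum()

print(toy.dtypes.to_dict())
print(f"{before} bytes -> {after} bytes")
```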

In [None]:
def reduce_mem_usage(df, verbose = True):
    
    numerics = ['int8','int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2

    for col in df.columns:
        col_type = df[col].dtypes

        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()

            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)

    end_mem = df.memory_usage().sum() / 1024**2

    if verbose:
        print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
 
    return df

## Feature engineering for numeric features
Inspired by https://www.kaggle.com/kaaveland/tps202112-reasonable-xgboost-model
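
As a small worked example, here is what the three Aspect transforms used below produce on a few hand-picked angles. The input values are assumptions chosen to cover a negative angle and an angle above 360.

```python
import numpy as np
import pandas as pd

aspect = pd.Series([-30, 45, 200, 370], name="Aspect")

# Wrap the angle into [0, 360)
aspect_mod_360 = aspect % 360
# Sine of the angle, converted from degrees to radians
aspect_sin = np.sin(np.deg2rad(aspect))
# Shift by half a turn: subtract 180 when Aspect + 180 exceeds 360, otherwise add 180
aspect2 = (aspect - 180).where(aspect + 180 > 360, aspect + 180)

print(pd.DataFrame({
    "Aspect": aspect,
    "Aspect_mod_360": aspect_mod_360,
    "Aspect_sin": aspect_sin.round(4),
    "Aspect2": aspect2,
}))
```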

In [None]:
def start_at_eps(series, eps=1e-10): return series - series.min() + eps

wilderness = test.columns[test.columns.str.startswith('Wilderness')]
soil_type = test.columns[test.columns.str.startswith('Soil_Type')]
hillshade = test.columns[test.columns.str.startswith('Hillshade')]

all_df = pd.concat([train.assign(ds=0), pseudo.assign(ds=1), test.assign(ds=2)])

pos_h_hydrology = start_at_eps(all_df.Horizontal_Distance_To_Hydrology)
pos_v_hydrology = start_at_eps(all_df.Vertical_Distance_To_Hydrology)

all_df = pd.concat([
        all_df,

#        all_df[wilderness].sum(axis=1).rename('Wilderness_Sum').astype(np.float32),
#        all_df[soil_type].sum(axis=1).rename('Soil_Type_Sum').astype(np.float32),
    
        (all_df.Aspect % 360).rename('Aspect_mod_360'),
        (all_df.Aspect * np.pi / 180).apply(np.sin).rename('Aspect_sin').astype(np.float32),
        (all_df.Aspect - 180).where(all_df.Aspect + 180 > 360, all_df.Aspect + 180).rename('Aspect2'),

        (all_df.Elevation - all_df.Vertical_Distance_To_Hydrology).rename('Hydrology_Elevation'),
        all_df.Vertical_Distance_To_Hydrology.apply(np.sign).rename('Water_Vertical_Direction'),

        (pos_h_hydrology + pos_v_hydrology).rename('Manhatten_positive_hydrology').astype(np.float32),
        (all_df.Horizontal_Distance_To_Hydrology.abs() + all_df.Vertical_Distance_To_Hydrology.abs()).rename('Manhattan_abs_hydrology'),
        (pos_h_hydrology ** 2 + pos_v_hydrology ** 2).apply(np.sqrt).rename('Euclidean_positive_hydrology').astype(np.float32),
        (all_df.Horizontal_Distance_To_Hydrology ** 2 + all_df.Vertical_Distance_To_Hydrology ** 2).apply(np.sqrt).rename('Euclidean_hydrology'),

        all_df[hillshade].clip(upper=255).add_suffix('_clipped_upper'),
        all_df[hillshade].clip(lower=0, upper=255).add_suffix('_clipped'),
        all_df[hillshade].sum(axis=1).rename('Hillshade_sum'),  

        (all_df.Slope / all_df.Elevation).rename('slope_o_elevation'), # ALDPARIS
        (all_df.Slope * all_df.Elevation).rename('slope_m_elevation'), # ALDPARIS
        (all_df.Horizontal_Distance_To_Fire_Points * all_df.Elevation).rename('firep_m_elev'), # ALDPARIS
        (all_df.Horizontal_Distance_To_Roadways * all_df.Elevation).rename('road_m_elev'),
        (all_df.Vertical_Distance_To_Hydrology * all_df.Elevation).rename('vhydro_elevation'),
        (all_df.Elevation - all_df.Horizontal_Distance_To_Hydrology * .2).rename('elev_sub_.2_h_hydro').astype(np.float32),

        (all_df.Horizontal_Distance_To_Hydrology + all_df.Horizontal_Distance_To_Fire_Points).rename('h_hydro_p_fire'),
        (start_at_eps(all_df.Horizontal_Distance_To_Hydrology) + start_at_eps(all_df.Horizontal_Distance_To_Fire_Points)).rename('h_hydro_eps_p_fire').astype(np.float32),
        (all_df.Horizontal_Distance_To_Hydrology - all_df.Horizontal_Distance_To_Fire_Points).rename('h_hydro_s_fire'),
        (start_at_eps(all_df.Horizontal_Distance_To_Hydrology) + start_at_eps(all_df.Horizontal_Distance_To_Roadways)).rename('h_hydro_eps_p_road').astype(np.float32),

        (all_df.Horizontal_Distance_To_Fire_Points + all_df.Horizontal_Distance_To_Roadways).abs().rename('abs_h_fire_p_road'),
        (all_df.Horizontal_Distance_To_Fire_Points - all_df.Horizontal_Distance_To_Roadways).abs().rename('abs_h_fire_s_road'),
        ], axis=1)
    
types = {'Cover_Type': np.int8}
train = all_df.loc[all_df.ds == 0].astype(types).drop(columns=['ds'])
pseudo = all_df.loc[all_df.ds == 1].astype(types).drop(columns=['ds'])
test = all_df.loc[all_df.ds == 2].drop(columns=['Cover_Type', 'ds'])
    
del all_df, pos_h_hydrology, pos_v_hydrology

In [None]:
train = train[train["Cover_Type"] != 5]

if DEBUG:
    train = train.sample(frac=.05, random_state=SEED)
    pseudo = pseudo.sample(frac=.1, random_state=SEED)
    
le = LabelEncoder()
y = le.fit_transform(train.Cover_Type)
y_pseudo = le.transform(pseudo.Cover_Type)

gc.collect()
    
train = reduce_mem_usage(train)
pseudo = reduce_mem_usage(pseudo)
test = reduce_mem_usage(test)
    
gc.collect()

# All possible categorical features for catboost
cat_features = ["Sum_Wilderness_Area", "all_wilderness_area", "ST_1_10", "ST_11_20", "ST_21_30", "ST_31_40"]

In [None]:
print("Index of Elevation : {}".format(list(train.columns).index("Elevation")))
print("Index of Hydrology_Elevation : {}".format(list(train.columns).index("Hydrology_Elevation")))

# 3. CATBOOST
Inspired by https://www.kaggle.com/kaaveland/tps202112-reasonable-xgboost-model<br>
I'm using pandas.concat to preserve dtype of features (integer for all cat_features).
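
A minimal sketch of why `pandas.concat` is used here, assuming two toy frames that stand in for the train fold and the pseudo-labelled rows: stacking with pandas keeps one dtype per column, while stacking the underlying NumPy arrays forces a single common dtype for everything.

```python
import numpy as np
import pandas as pd

# Toy frames (assumption): one integer column and one float column each
X = pd.DataFrame({"Sum_Soil_Type": np.array([1, 2], dtype=np.int8),
                  "Aspect_sin": np.array([0.5, -0.5], dtype=np.float32)})
X_pseudo = pd.DataFrame({"Sum_Soil_Type": np.array([0, 3], dtype=np.int8),
                         "Aspect_sin": np.array([1.0, 0.0], dtype=np.float32)})

# Row-wise stacking with pandas: each column keeps its own dtype
stacked_pd = pd.concat([X, X_pseudo], axis=0)

# Row-wise stacking with NumPy: the integer column is upcast to float
stacked_np = np.vstack([X.to_numpy(), X_pseudo.to_numpy()])

print(stacked_pd.dtypes.to_dict())
print(stacked_np.dtype)
```

CatBoost expects categorical features to be integers or strings, so a column silently upcast to float could no longer be declared in `cat_features`.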

In [None]:
def make_trainset(X, X_pseudo, feats, seed = SEED):
    
    np.random.seed(seed)
    
    ix = np.arange(X.shape[0] + X_pseudo.shape[0])
    np.random.shuffle(ix)
    
    X_trn = pd.concat([X[feats], X_pseudo[feats]], axis=0)
    y_trn = pd.concat([X["Cover_Type"], X_pseudo["Cover_Type"]], axis=0)
    
    return X_trn.iloc[ix], y_trn.iloc[ix]

In [None]:
def cv_catboost(params, drop_feats=[], early_stopping_rounds=30):

    oof_proba = np.zeros((train.shape[0], len(le.classes_)), dtype=np.float32)

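    # Keep the column order of test so that feature positions are the same on every run
    dropped = set(drop_feats)
    feats = [f for f in test.columns if f not in dropped]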
    cat_features_pos = [feats.index(f) for f in cat_features if f in feats]
    
    test_proba = np.zeros((test.shape[0], len(le.classes_)), dtype=np.float32)

    accs = [] ; feat_imp = {} ; df_feat_imp = pd.DataFrame(index=feats)
    
    if not GRID_SEARCH:
        fig, ax = plt.subplots(nrows = splits.n_splits, ncols = 3, figsize=(20, 5 * splits.n_splits))
        plt.subplots_adjust(hspace = 0.5, wspace = 0.3)

    for fold, (trn_idx, val_idx) in enumerate(splits.split(train, train["Cover_Type"])): 
    
        X_trn, y_trn = make_trainset(train.iloc[trn_idx], pseudo, feats, seed = SEED + fold)
        X_val, y_val = train.iloc[val_idx][feats], train.iloc[val_idx]["Cover_Type"]
    
        model = CatBoostClassifier(**params)  
    
        os.environ['PYTHONHASHSEED'] = str(SEED)

        model.fit(Pool(data = X_trn, label=y_trn, cat_features = cat_features_pos),
            use_best_model=True,
            verbose = 100,
            plot = False,
            eval_set=Pool(data = X_val, label=y_val, cat_features = cat_features_pos),
            early_stopping_rounds=early_stopping_rounds)
    
        oof_proba[val_idx] = model.predict_proba(X_val)
        accs.append( accuracy_score(y_val, le.inverse_transform(oof_proba[val_idx].argmax(axis=1))) )
        print("Fold {} - Accuracy {:.5f} - Best iteration : {}".format(
            fold + 1, accs[fold], model.get_best_iteration()))
    
        del X_trn, y_trn
        gc.collect()
        
        # Test prediction and feature importance by value permutation
        if not GRID_SEARCH:
    
            # Test prediction
            test_proba += model.predict_proba(test[feats]) / splits.n_splits
        
            # Feature importance with CatBoost
            df_temp = pd.DataFrame({'value': model.feature_importances_}, index=feats).sort_values("value", ascending = False)
            df_temp[-10:].plot.barh(ax=ax[fold, 0])
            ax[fold, 0].set_title(f"CatBoost least important features - fold n°{fold+1}")
            ax[fold, 0].set_xlabel('Importance')
            df_feat_imp = pd.concat([df_feat_imp, df_temp.rename({"value":f'Fold{fold+1}'}, axis=1)], axis=1)

            # Learning history
            df_hist = pd.concat([pd.DataFrame(model.evals_result_["learn"]).rename(
                    {"Accuracy":"Train Accuracy", "MultiClass":"Train Loss"}, axis=1)
                , pd.DataFrame(model.evals_result_["validation"]).rename(
                    {"Accuracy":"Valid Accuracy", "MultiClass":"Valid Loss"}, axis=1)], axis=1)
            df_hist.index.names=["Iteration"]
            df_hist=df_hist[:-early_stopping_rounds]

            df_hist[["Train Loss", "Valid Loss"]].plot(ax=ax[fold, 1])
            df_hist[["Train Accuracy", "Valid Accuracy"]].plot(ax=ax[fold, 2])
            for p, t, l in zip(list(range(2)), ["Loss", "Accuracy"], [[0, .2], [.8, 1.]]):
                ax[fold, p+1].set_title(f"CatBoost fit History - fold n°{fold + 1}")
                ax[fold, p+1].set_xlabel('Iteration')
                ax[fold, p+1].set_ylabel(t)
                ax[fold, p+1].set(ylim=(l[0], l[1]))

            # Feature importance with value permutation
            for f, feat in enumerate(feats):
                if feat in MAKE_VALUE_PERMUTATION or MAKE_VALUE_PERMUTATION == "all":
                    if feat not in feat_imp.keys(): feat_imp[feat] = {}
                    temp_df = X_val.copy()
                    temp_df[feat] = np.random.permutation(temp_df[feat])
                    y_temp = model.predict_proba(temp_df)
                    y_temp = le.inverse_transform(y_temp.argmax(axis=1))
                    feat_imp[feat][fold] = accs[fold] - accuracy_score(y_val, y_temp)
                
            # Save with pickle so the predictions can be blended with other models later
            with open("oof_proba.pkl", "wb") as f_out:
                pickle.dump(oof_proba, f_out)
            with open("test_proba.pkl", "wb") as f_out:
                pickle.dump(test_proba, f_out)
                    
            # End if not grid search
                    
        del X_val, y_val
        if "temp_df" in locals(): del temp_df, y_temp
        
        gc.collect()
        
        # In grid-search mode, stop after two folds because each fit takes a long time
        if GRID_SEARCH and fold == 1: break
        # End loop
                
    print("Mean Accuracy {:.5f} - Std Accuracy {:.5f} - OOF Accuracy {:.5f}".format(
        np.mean(accs), np.std(accs), 
            accuracy_score(train["Cover_Type"], le.inverse_transform(oof_proba.argmax(axis=1)))))
    
    return {"acc":np.mean(accs), "test_proba":test_proba, "oof_proba":oof_proba, "feat_imp":feat_imp, "df_feat_imp":df_feat_imp}

In [None]:
params = {
    "task_type": "GPU", 
    "random_seed": SEED, # Useless : catboost is not deterministic with GPU...
    "loss_function": "MultiClass",
    'eval_metric': "Accuracy", 
    "grow_policy": "Lossguide",
    'iterations': 1000, 
    "learning_rate" : .2,
    "subsample": .2,
    "bootstrap_type": 'Poisson',
    "l2_leaf_reg": 25,
    "max_leaves": 255,
}

In [None]:
#%%time 
#cv_res = cv_catboost(params)

# 4. RANDOMIZED GRID SEARCH
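
The principle, as a minimal sketch with the standard library only: build every combination of the grid with `itertools.product`, then evaluate only a random sample of them to keep the run time under control. The toy grid and the budget `max_elems` are assumptions for the example.

```python
import random
from itertools import product

# Toy grid (assumption): 2 x 2 x 1 = 4 combinations
toy_grid = {"l2_leaf_reg": [50, 100], "max_leaves": [223, 255], "learning_rate": [0.15]}
max_elems = 3  # hypothetical budget: evaluate at most 3 combinations

keys, values = zip(*toy_grid.items())
all_combinations = [dict(zip(keys, combo)) for combo in product(*values)]

# Draw without replacement, never asking for more than what exists
rng = random.Random(666)
sampled = rng.sample(all_combinations, k=min(max_elems, len(all_combinations)))

for candidate in sampled:
    print(candidate)  # each candidate would be passed to one cross-validation run
```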

In [None]:
%%time

all_cv_res = []

keys, values = zip(*GRID.items())
list_comb = list(product(*values))

print("With all GRID ({} combinations) : almost {} hours".format(len(list_comb), (len(list_comb)*18)//60+1))
final_list_comb = random.sample(list_comb, k = min(MAX_ELEM_IN_RANDOM_GRID_SEARCH, len(list_comb)))
print("With only {} combinations : almost {} hours".format(len(final_list_comb), (len(final_list_comb)*18)//60+1))

for i, comb in enumerate(final_list_comb):

    boost_params = params.copy()

    d = dict(zip(keys, comb))
    all_cv_res.append(d.copy())
    d.pop("drop_feats")
    boost_params.update(d)

    cv_res = cv_catboost(params = boost_params, drop_feats = all_cv_res[i]["drop_feats"])
    
    y_pred = le.inverse_transform(cv_res["oof_proba"].argmax(axis=1))
    print(classification_report(train["Cover_Type"], y_pred, digits = 3))

    all_cv_res[i].update({"acc":cv_res["acc"], "oof_acc":accuracy_score(train["Cover_Type"], y_pred)})
    print("\nMean CV Accuracy {:.5f} (std:{:.5f}) | OOF Accuracy : {:.5f}".format(
        np.mean(cv_res["acc"]), np.std(cv_res["acc"]), accuracy_score(train["Cover_Type"], y_pred)))
    
print(all_cv_res)

In [None]:
df_all_res_cv = pd.DataFrame(all_cv_res).sort_values("acc", ascending = False)
print(df_all_res_cv.iloc[0]["drop_feats"])
with open("df_all_res_cv.pkl", "wb") as f_out:
    pickle.dump(df_all_res_cv, f_out)

df_all_res_cv

# 5. FEATURE IMPORTANCE
## CatBoost feature importances

In [None]:
if cv_res["df_feat_imp"].shape[1] > 0:
    cv_res["df_feat_imp"]["mean"] = cv_res["df_feat_imp"].mean(axis=1)
    cv_res["df_feat_imp"][["mean"]].sort_values("mean", ascending=True).plot(
        kind = "barh", figsize = (15,15), title = "Mean importance with CatBoost"
        , xlabel="Importance", legend = False)

## Importance with value permutation for a sample of features
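
A minimal sketch of the technique on toy data: shuffle one column of the validation set, predict again, and measure how much the accuracy drops. The `predict` function is a hypothetical stand-in for a fitted model, and the data is an assumption built so that only the first column matters.

```python
import numpy as np

rng = np.random.default_rng(666)

# Toy data: the label depends only on column 0, column 1 is pure noise
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)

def predict(features):
    """Stand-in for a fitted model (hypothetical): thresholds column 0."""
    return (features[:, 0] > 0).astype(int)

def accuracy(y_true, y_hat):
    return float(np.mean(y_true == y_hat))

baseline = accuracy(y, predict(X))

importance = {}
for col in range(X.shape[1]):
    X_perm = X.copy()
    # Break the link between this column and the target
    X_perm[:, col] = rng.permutation(X_perm[:, col])
    importance[col] = baseline - accuracy(y, predict(X_perm))

print(baseline, importance)
```

A large drop means the model relies on the feature. A drop close to zero, or a negative one, suggests the feature is a candidate for removal.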

In [None]:
def plot_feat_imp(feat_imp, drop=["Elevation", "Hydrology_Elevation"]):
    
    if feat_imp == {}: return 
    
    nf = len(feat_imp[list(feat_imp.keys())[0]])
    
    feat_imp_df = pd.DataFrame(feat_imp).transpose()
    scores_decrease = [i for i in range(nf)]
    scores_decrease_sign = [f"sign_{i}" for i in range(nf)]

    feat_imp_df[scores_decrease_sign] = 0
    for score_decrease, score_decrease_sign in zip(scores_decrease, scores_decrease_sign):
        feat_imp_df.loc[feat_imp_df[score_decrease]>0, score_decrease_sign] = 1
    
    feat_imp_df["nb_folds"] = feat_imp_df[scores_decrease_sign].sum(axis=1)
        
    feat_imp_df["mean"] = feat_imp_df[list(range(nf))].mean(axis=1)
    feat_imp_df.sort_values("mean", ascending = False, inplace=True)
    
    # drop the features that are so important they would dwarf the others in the plot
    if drop is not None:
        print("Very important features dropped from the plot:")
        for f in drop:
            print("{} : {:.6f}".format(f, feat_imp_df.loc[f, "mean"]))
        feat_imp_df.drop(drop, axis=0, inplace = True)
    
    fig = plt.figure(figsize = (15, int(feat_imp_df.shape[0] * 2/3)), constrained_layout=False)
    gs = fig.add_gridspec(nrows=20, ncols=5, left=0.05, right=0.95,
                        wspace=0.1, hspace=.1)
    ax1 = fig.add_subplot(gs[:, 0])
    ax2 = fig.add_subplot(gs[:, 2:])

    sns.barplot(x="nb_folds",
            y=feat_imp_df.index,
            data=feat_imp_df, ax=ax1, color = "green")
    ax1.set_title("N folds where feature is important")
    ax1.set_xlabel('Nb folds')

    sns.barplot(x = "mean", y = feat_imp_df.index ,data = feat_imp_df, ax = ax2, color = "green")
    ax2.set_title("Accuracy decrease after values permutation")
    
    ax2.set_xlabel('Accuracy difference after values permutations')

In [None]:
plot_feat_imp(cv_res["feat_imp"])

# 6. CONFUSION MATRIX 
Inspired by:
* https://www.kaggle.com/ambrosm/tpsdec21-01-keras-quickstart
* and https://www.kaggle.com/teckmengwong/dcnv2-softmaxclassification (my recall computation is different)
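
A small worked example of the two normalizations, on a made-up 3-class confusion matrix (the counts are assumptions). It uses `DataFrame.div` with an explicit axis, which gives the same result as the transpose-based recall in the function below, and it shows what goes wrong when the row sums are divided without care for alignment.

```python
import pandas as pd

labels = [1, 2, 3]
# Rows are true labels, columns are predicted labels
cm = pd.DataFrame([[8, 2, 0],
                   [1, 6, 3],
                   [0, 0, 5]], index=labels, columns=labels)

# Precision: divide each column by its column sum
precision = cm.div(cm.sum(axis=0), axis=1)
# Recall: divide each row by its row sum
recall = cm.div(cm.sum(axis=1), axis=0)
# Pitfall: this divides column j by the sum of row j
misaligned = cm / cm.sum(axis=1)

print(recall.round(3))
print(misaligned.round(3))
```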

In [None]:
def plot_confusion_matrix(y_true, y_pred, labels):
    
    cm = pd.DataFrame(confusion_matrix(y_true, y_pred, labels = labels), index = labels, columns = labels)
        
    plots = {
        'Count':[cm, ',d'],
        'Accuracy on diagonal (sum of matrix = 100%)': [cm / cm.sum().sum(), '.1%'],
        'Precision on diagonal (sum of each column = 100%)': [cm / cm.sum(axis = 0), '.1%'],
        'Recall on diagonal (sum of each row = 100%)' : [(cm.transpose() / cm.sum(axis = 1)).transpose(), '.1%'],
        # Recall: cm / cm.sum(axis = 1), as in https://www.kaggle.com/teckmengwong/dcnv2-softmaxclassification,
        # divides column j by the sum of row j, hence the transposes before and after the division
    }
    
    fig, ax = plt.subplots(2,2, tight_layout = True, figsize = (15,15))
    plt.subplots_adjust(hspace = 0.7, wspace = 0.6)
    ax = ax.flatten()

    for idx, (title, (numbers, fmt)) in enumerate(plots.items()):

        sns.heatmap(
            data = numbers,
            cmap = sns.dark_palette("#69d", reverse=True, as_cmap=True),
            cbar = False,
            lw = 0.25,
            annot = True,
            fmt = fmt,
            ax = ax[idx]
        )
        
        ax[idx].set_title(title, fontweight='bold')
        ax[idx].set_ylabel('True label')
        ax[idx].set_xlabel('Predicted label')

In [None]:
plot_confusion_matrix(train["Cover_Type"], y_pred, le.classes_)

# 7. SUBMISSION
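
A minimal sketch of how the submission labels are obtained, with made-up probabilities: the fold predictions are averaged, and the argmax index is mapped back to the original Cover_Type values. The array `classes` plays the role of `le.classes_` here, and the probabilities are assumptions for the example.

```python
import numpy as np

# Cover_Type 5 was removed from train, so index 4 maps to label 6
classes = np.array([1, 2, 3, 4, 6, 7])
n_splits = 2

# Hypothetical per-fold probabilities for 3 test rows (each row sums to 1)
fold_probas = [
    np.array([[0.7, 0.1, 0.1, 0.05, 0.03, 0.02],
              [0.1, 0.2, 0.1, 0.10, 0.40, 0.10],
              [0.2, 0.5, 0.1, 0.10, 0.05, 0.05]]),
    np.array([[0.6, 0.2, 0.1, 0.05, 0.03, 0.02],
              [0.1, 0.1, 0.1, 0.10, 0.50, 0.10],
              [0.4, 0.3, 0.1, 0.10, 0.05, 0.05]]),
]

# Average the folds, as done inside the cross-validation loop
test_proba = np.zeros_like(fold_probas[0])
for proba in fold_probas:
    test_proba += proba / n_splits

# Map the argmax position back to the original label
predicted = classes[test_proba.argmax(axis=1)]
print(predicted)
```

Using the argmax position directly as the label would be wrong for the last two classes, because the missing class 5 shifts them by one.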

In [None]:
if not GRID_SEARCH:
    
    sub["Cover_Type"] = le.inverse_transform(cv_res["test_proba"].argmax(axis=1))
    sub.to_csv('submission.csv', index=False)

Distribution of the test predictions<br>
Inspired by https://www.kaggle.com/ambrosm/tpsdec21-01-keras-quickstart

In [None]:
if not GRID_SEARCH:

    plt.figure(figsize=(10,3))
    plt.hist(train["Cover_Type"], bins=np.linspace(0.5, 7.5, 8), density=True, label='Train labels')
    plt.hist(sub['Cover_Type'], bins=np.linspace(0.5, 7.5, 8), density=True, rwidth=0.7, label='Test predictions')
    plt.xlabel('Cover_Type')
    plt.ylabel('Frequency')
    plt.gca().yaxis.set_major_formatter(PercentFormatter(1.0))
    plt.legend()
    plt.show();