<a href="https://www.kaggle.com/rsizem2/tps-10-21-catboost-optuna-starter-w-pruning?scriptVersionId=84928750" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# CatBoost Hyperparameter Search

In this notebook we use Optuna to perform a hyperparameter search on a CatBoost model, with early exits driven by an Optuna [pruner](https://optuna.readthedocs.io/en/stable/reference/pruners.html). Unfortunately, unlike LightGBM and XGBoost, CatBoost does not yet have a built-in [integration](https://optuna.readthedocs.io/en/stable/reference/integration.html), so we report scores to the pruner ourselves.

We check each set of parameters using k-fold cross-validation. After each fold, the pruner compares the validation AUC with the AUCs that previous trials reached on the same fold. If the current AUC falls below the pruner's percentile threshold, we exit the trial early (prune it), which saves time by not finishing unpromising models.
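The pruning decision described above can be sketched without Optuna. This is a simplified, median-based version (the 50th-percentile case), not Optuna's actual implementation; the function name `should_prune`, the `min_trials` argument, and the AUC values are hypothetical and only for illustration.

```python
from statistics import median

def should_prune(current_auc, previous_aucs, min_trials=5):
    """Return True if current_auc is below the median AUC that
    earlier trials reached on the same fold."""
    if len(previous_aucs) < min_trials:
        return False  # too little history to judge this trial
    return current_auc < median(previous_aucs)

# Hypothetical fold-0 AUCs from five earlier trials
history = [0.8559, 0.8561, 0.8560, 0.8564, 0.8561]

print(should_prune(0.8558, history))      # below the median of the history
print(should_prune(0.8565, history))      # above the median of the history
print(should_prune(0.8558, history[:3]))  # fewer than min_trials previous results
```

A real pruner also tracks which fold (step) each score belongs to, which is what `trial.report(value, step)` is for.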

**Note:** This notebook takes several hours to run. To shorten the runtime, reduce `NUM_TRIALS` below.

In [1]:
# Global variables for testing changes to this notebook quickly
RANDOM_SEED = 0
NUM_FOLDS = 3
MAX_TREES = 20000
EARLY_STOP = 150
NUM_TRIALS = 50

In [2]:
# General imports
import numpy as np
import pandas as pd
import datatable as dt
import time
import gc

# Model and evaluation
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from catboost import CatBoostClassifier
import catboost

# Optuna
import optuna
from optuna.visualization import plot_param_importances, plot_parallel_coordinate
from optuna.pruners import PercentilePruner

# Hide warnings (makes optuna output easier to parse)
import warnings
warnings.filterwarnings('ignore')

# Preparing the Data

1. Load data with `datatable` and convert to `pandas`
2. Reduce memory usage by downcasting datatypes
3. Get holdout set from training data using a stratified scheme
4. Save categorical features
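As a rough illustration of step 2, the snippet below downcasts a tiny frame the same way the helper in the next cell does. The column names and values are made up for the example; only `pd.to_numeric(..., downcast=...)` and `astype` are used.

```python
import pandas as pd

# Hypothetical toy frame; column names are illustrative only
df = pd.DataFrame({
    "count": pd.Series([0, 1, 100], dtype="int64"),
    "flag": pd.Series([True, False, True], dtype="bool"),
    "value": pd.Series([0.25, 0.5, 0.75], dtype="float64"),
})

before = df.memory_usage().sum()

df["count"] = pd.to_numeric(df["count"], downcast="integer")  # smallest int type that fits
df["flag"] = df["flag"].astype("int8")                        # booleans stored as int8
df["value"] = pd.to_numeric(df["value"], downcast="float")    # float64 -> float32

after = df.memory_usage().sum()
print(df.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Note that boolean columns become `int8`, so the later check for dtypes starting with `"int"` treats them as categorical features as well.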

In [3]:
# Helper function for downcasting 
def reduce_memory_usage(df, verbose=True):
    start_mem = df.memory_usage().sum() / 1024 ** 2
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent call
    for col, dtype in df.dtypes.items():
        if dtype.name.startswith('int'):
            df[col] = pd.to_numeric(df[col], downcast = 'integer')
        elif dtype.name == 'bool':
            df[col] = df[col].astype('int8')
        elif dtype.name.startswith('float'):
            df[col] = pd.to_numeric(df[col], downcast = 'float')
        
    end_mem = df.memory_usage().sum() / 1024 ** 2
    if verbose:
        print(
            "Mem. usage decreased to {:.2f} Mb ({:.1f}% reduction)".format(
                end_mem, 100 * (start_mem - end_mem) / start_mem
            )
        )
    return df

In [4]:
%%time

# Load training data
train = dt.fread(r'../input/tabular-playground-series-oct-2021/train.csv').to_pandas()
train = reduce_memory_usage(train)

# Holdout set for testing our models
train, holdout = train_test_split(
    train,
    test_size = 0.5,
    shuffle = True,
    stratify = train['target'],
    random_state = RANDOM_SEED,
)

train.reset_index(drop = True, inplace = True)
holdout.reset_index(drop = True, inplace = True)

# Save features and categorical features
features = [x for x in train.columns if x not in ['id','target']]
categorical_features = [i for i,x in enumerate(features) if train[x].dtype.name.startswith("int")]

Mem. usage decreased to 963.21 Mb (48.7% reduction)
CPU times: user 39 s, sys: 44.2 s, total: 1min 23s
Wall time: 1min 32s


# CatBoost

We create a function to train a CatBoost model and return the holdout AUC.

## 1. Default Parameters

* `Bernoulli` bootstrap
* `Plain` boosting type
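These defaults are later combined with the trial-specific parameters through dictionary unpacking, `{**default_params, **model_params}`. A minimal sketch of how that merge behaves, using shortened, partly hypothetical dictionaries:

```python
# Shortened stand-ins for the notebook's dictionaries
default_params = dict(n_estimators=20000, boosting_type="Plain", bootstrap_type="Bernoulli")
model_params = dict(max_depth=5, n_estimators=500)  # hypothetical override of n_estimators

# Later entries win, so values in model_params override the defaults
merged = {**default_params, **model_params}

print(merged)
print(default_params["n_estimators"])  # the defaults themselves are left untouched
```

Because the merge builds a new dictionary, each trial starts from the same unmodified defaults.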

In [5]:
# Default CatBoost params, used for ALL models considered
default_params = dict(            
    random_state = RANDOM_SEED,
    n_estimators = MAX_TREES,
    early_stopping_rounds = EARLY_STOP,
    boosting_type = 'Plain',
    bootstrap_type = 'Bernoulli',
    eval_metric = 'Logloss',
    task_type = 'GPU',
)

## 2. Scoring Function

We define a scoring function that performs cross-validation on the training set and predicts on the holdout set with each fold's model. We prune based on the cross-validation fold scores and evaluate using the holdout score.

* `trial` - optuna trial object passed if used as part of an optuna trial
* `model_params` - parameters passed to `CatBoostClassifier`
* `fit_params` - parameters passed to the `fit` method.
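The holdout score comes from averaging the predictions of the fold models and computing one AUC on that average. The sketch below shows the bookkeeping with made-up predictions; `pairwise_auc` is a hypothetical helper written for this example (the notebook itself uses `roc_auc_score`), and it is only practical for small arrays.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

NUM_FOLDS = 3
y_holdout = np.array([0, 0, 1, 1])

# Hypothetical holdout probabilities from three fold models
fold_predictions = [
    np.array([0.1, 0.6, 0.4, 0.9]),
    np.array([0.2, 0.3, 0.7, 0.8]),
    np.array([0.3, 0.3, 0.4, 0.7]),
]

# Same accumulation pattern as the scoring function: add each fold's share
holdout_preds = np.zeros(len(y_holdout))
for preds in fold_predictions:
    holdout_preds += preds / NUM_FOLDS

print(holdout_preds)
print(pairwise_auc(y_holdout, fold_predictions[0]))  # first fold model alone
print(pairwise_auc(y_holdout, holdout_preds))        # averaged predictions
```

In this toy case the averaged predictions rank the holdout rows better than the first fold model does on its own, which is the usual motivation for averaging.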

In [6]:
def score_catboost(trial = None, model_params = {}, fit_params = {}):
    
    # Store the holdout predictions
    holdout_preds = np.zeros((holdout.shape[0],))
    scores = np.zeros(NUM_FOLDS)
    
    # Stratified k-fold cross-validation
    skf = StratifiedKFold(n_splits = NUM_FOLDS, shuffle = True, random_state = RANDOM_SEED)
    for fold, (train_idx, valid_idx) in enumerate(skf.split(train, train['target'])):
        
        # Training and Validation Sets
        X_train, y_train = train[features].iloc[train_idx], train['target'].iloc[train_idx]
        X_valid, y_valid = train[features].iloc[valid_idx], train['target'].iloc[valid_idx]
        
        start = time.time()
        
        # Define Model
        model = CatBoostClassifier(**{**default_params, **model_params})
        gc.collect()
        
        model.fit(
            X_train, y_train,
            verbose = False,
            eval_set = [(X_valid, y_valid)],
            cat_features = categorical_features,
            use_best_model = True,
            **fit_params
        )
        
        # validation/holdout predictions
        valid_preds = model.predict_proba(X_valid)[:, 1]
        holdout_preds += model.predict_proba(holdout[features])[:, 1] / NUM_FOLDS
        valid_auc = roc_auc_score(y_valid, valid_preds)
        end = time.time()
        print(f'Fold {fold} AUC: {round(valid_auc, 6)} in {round((end-start) / 60, 2)} minutes.')
        
        if trial is not None:
            # Report the fold AUC so the pruner can compare it with earlier trials
            trial.report(
                value = valid_auc,
                step = fold
            )
            # Prune bad fold AUCs, and folds that took longer than 480 seconds (8 minutes)
            if trial.should_prune() or round(end - start, 1) > 480:
                raise optuna.TrialPruned()
        
        time.sleep(0.5)
        
    return roc_auc_score(holdout['target'], holdout_preds)

## 3. Pruning

There's no built-in integration for CatBoost, but we can still prune based on the fold AUC, which should save a decent amount of time.

* `n_startup_trials` - number of trials (models trained) that must finish before pruning starts
* `n_warmup_steps` - number of steps (here, folds) a trial runs before pruning checks begin
* `interval_steps` - number of steps (folds) between pruning checks
* `n_min_trials` - skip the pruning check at a step if fewer than this many trials have reported a value for it

In [7]:
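# Tweak Pruner settings
# NOTE: Optuna expects `percentile` on a 0-100 scale, so 0.66 keeps only the
# top 0.66% of trials at each step. This is why almost every trial after the
# startup trials is pruned in the log below. Use e.g. 66.0 to keep the top 66%.
pruner = PercentilePruner(
    percentile = 0.66,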
    n_startup_trials = 5,
    n_warmup_steps = 0,
    interval_steps = 1,
    n_min_trials = 5,
)

# Hyperparameter Search

In [8]:
def parameter_search(trials):
    
    # Optuna objective function
    def objective(trial):
        # suggest_float replaces the deprecated suggest_uniform,
        # suggest_loguniform and suggest_discrete_uniform methods
        model_params = dict( 
            # default 6
            max_depth = trial.suggest_int(
                "max_depth", 2, 8
            ), 
            # default 0.03
            learning_rate = trial.suggest_float(
                "learning_rate", 0.009, 0.03, log = True
            ),
            # default 1
            min_child_samples = trial.suggest_int(
                "min_child_samples", 1, 20000
            ), 
            # default 1
            random_strength = trial.suggest_float(
                "random_strength", 1, 100
            ), 
            # default depends on the loss function
            leaf_estimation_iterations = trial.suggest_int(
                "leaf_estimation_iterations", 1, 20
            ),             
            subsample = trial.suggest_float(
                'subsample', 0.2, 1.0, step = 0.001
            ),
            # default 3.0
            reg_lambda = trial.suggest_float(
                'reg_lambda', 1e-10, 100, log = True
            ),
        )
        
        return score_catboost(trial, model_params = model_params)
    
    optuna.logging.set_verbosity(optuna.logging.DEBUG)
    study = optuna.create_study(pruner = pruner, direction = "maximize")
    # (nearly) default
    study.enqueue_trial({
        'max_depth': 6, 
        'learning_rate': 0.0125730000436306,
        'min_child_samples': 1, 
        'random_strength': 1, 
        'leaf_estimation_iterations': 10,
        'subsample': 1.0, 
        'reg_lambda': 3, 
    })
    study.optimize(objective, n_trials=trials)
    return study

In [9]:
study = parameter_search(NUM_TRIALS)

[I 2022-01-11 01:48:52,587] A new study created in memory with name: no-name-5e89bd8f-2393-443b-b398-4259ccc74d74
[D 2022-01-11 01:48:52,589] Trial 0 popped from the trial queue.


Fold 0 AUC: 0.855915 in 3.89 minutes.
Fold 1 AUC: 0.856843 in 2.75 minutes.
Fold 2 AUC: 0.852764 in 2.49 minutes.


[I 2022-01-11 01:58:03,810] Trial 0 finished with value: 0.8567309410174533 and parameters: {'max_depth': 6, 'learning_rate': 0.0125730000436306, 'min_child_samples': 1, 'random_strength': 1, 'leaf_estimation_iterations': 10, 'subsample': 1.0, 'reg_lambda': 3}. Best is trial 0 with value: 0.8567309410174533.


Fold 0 AUC: 0.8561 in 2.73 minutes.
Fold 1 AUC: 0.856942 in 2.88 minutes.
Fold 2 AUC: 0.852965 in 2.31 minutes.


[I 2022-01-11 02:06:02,499] Trial 1 finished with value: 0.8567118979116093 and parameters: {'max_depth': 3, 'learning_rate': 0.015467235977943172, 'min_child_samples': 1554, 'random_strength': 32.142501990482394, 'leaf_estimation_iterations': 6, 'subsample': 0.8720000000000001, 'reg_lambda': 0.001434694014790261}. Best is trial 0 with value: 0.8567309410174533.


Fold 0 AUC: 0.856079 in 2.02 minutes.
Fold 1 AUC: 0.856902 in 2.03 minutes.
Fold 2 AUC: 0.852937 in 1.86 minutes.


[I 2022-01-11 02:12:00,160] Trial 2 finished with value: 0.8567454313510312 and parameters: {'max_depth': 4, 'learning_rate': 0.022635500778733744, 'min_child_samples': 8485, 'random_strength': 32.81521395766903, 'leaf_estimation_iterations': 9, 'subsample': 0.901, 'reg_lambda': 6.160657054034542e-06}. Best is trial 2 with value: 0.8567454313510312.


Fold 0 AUC: 0.856056 in 2.97 minutes.
Fold 1 AUC: 0.8569 in 2.98 minutes.
Fold 2 AUC: 0.852958 in 2.96 minutes.


[I 2022-01-11 02:20:57,909] Trial 3 finished with value: 0.8565072022158046 and parameters: {'max_depth': 2, 'learning_rate': 0.016470203675977985, 'min_child_samples': 7048, 'random_strength': 46.88601099531622, 'leaf_estimation_iterations': 6, 'subsample': 0.493, 'reg_lambda': 5.422294279443791e-09}. Best is trial 2 with value: 0.8567454313510312.


Fold 0 AUC: 0.856385 in 2.84 minutes.
Fold 1 AUC: 0.85718 in 2.86 minutes.
Fold 2 AUC: 0.853196 in 2.78 minutes.


[I 2022-01-11 02:29:30,589] Trial 4 finished with value: 0.8567890227998773 and parameters: {'max_depth': 5, 'learning_rate': 0.013255723998037535, 'min_child_samples': 13618, 'random_strength': 19.80046051984836, 'leaf_estimation_iterations': 3, 'subsample': 0.394, 'reg_lambda': 2.437971169514396e-10}. Best is trial 4 with value: 0.8567890227998773.
[I 2022-01-11 02:31:02,864] Trial 5 pruned.


Fold 0 AUC: 0.855999 in 1.53 minutes.


[I 2022-01-11 02:34:10,094] Trial 6 pruned.


Fold 0 AUC: 0.856342 in 3.11 minutes.


[I 2022-01-11 02:36:53,286] Trial 7 pruned.


Fold 0 AUC: 0.856074 in 2.71 minutes.


[I 2022-01-11 02:39:28,503] Trial 8 pruned.


Fold 0 AUC: 0.856246 in 2.58 minutes.


[I 2022-01-11 02:42:58,840] Trial 9 pruned.


Fold 0 AUC: 0.856054 in 3.5 minutes.
Fold 0 AUC: 0.856383 in 3.42 minutes.


[I 2022-01-11 02:49:55,264] Trial 10 pruned.


Fold 1 AUC: 0.857098 in 3.49 minutes.


[I 2022-01-11 02:52:16,111] Trial 11 pruned.


Fold 0 AUC: 0.85632 in 2.34 minutes.


[I 2022-01-11 02:54:17,844] Trial 12 pruned.


Fold 0 AUC: 0.856195 in 2.02 minutes.


[I 2022-01-11 02:56:56,224] Trial 13 pruned.


Fold 0 AUC: 0.855644 in 2.63 minutes.


[I 2022-01-11 03:00:20,947] Trial 14 pruned.


Fold 0 AUC: 0.856353 in 3.4 minutes.


[I 2022-01-11 03:02:38,784] Trial 15 pruned.


Fold 0 AUC: 0.856306 in 2.29 minutes.


[I 2022-01-11 03:04:27,789] Trial 16 pruned.


Fold 0 AUC: 0.856326 in 1.81 minutes.


[I 2022-01-11 03:07:03,514] Trial 17 pruned.


Fold 0 AUC: 0.856354 in 2.58 minutes.


[I 2022-01-11 03:08:43,662] Trial 18 pruned.


Fold 0 AUC: 0.856127 in 1.66 minutes.


[I 2022-01-11 03:11:48,102] Trial 19 pruned.


Fold 0 AUC: 0.856181 in 3.06 minutes.


[I 2022-01-11 03:14:07,738] Trial 20 pruned.


Fold 0 AUC: 0.855925 in 2.32 minutes.


[I 2022-01-11 03:17:18,833] Trial 21 pruned.


Fold 0 AUC: 0.855988 in 3.17 minutes.


[I 2022-01-11 03:21:06,826] Trial 22 pruned.


Fold 0 AUC: 0.856291 in 3.79 minutes.


[I 2022-01-11 03:24:04,319] Trial 23 pruned.


Fold 0 AUC: 0.856335 in 2.95 minutes.


[I 2022-01-11 03:26:34,262] Trial 24 pruned.


Fold 0 AUC: 0.856204 in 2.49 minutes.


[I 2022-01-11 03:28:34,637] Trial 25 pruned.


Fold 0 AUC: 0.855671 in 2.0 minutes.


[I 2022-01-11 03:32:12,003] Trial 26 pruned.


Fold 0 AUC: 0.856364 in 3.61 minutes.


[I 2022-01-11 03:36:05,363] Trial 27 pruned.


Fold 0 AUC: 0.85625 in 3.88 minutes.


[I 2022-01-11 03:38:31,856] Trial 28 pruned.


Fold 0 AUC: 0.856237 in 2.43 minutes.


[I 2022-01-11 03:41:10,518] Trial 29 pruned.


Fold 0 AUC: 0.856228 in 2.63 minutes.


[I 2022-01-11 03:43:44,564] Trial 30 pruned.


Fold 0 AUC: 0.856045 in 2.56 minutes.


[I 2022-01-11 03:46:22,006] Trial 31 pruned.


Fold 0 AUC: 0.855955 in 2.61 minutes.


[I 2022-01-11 03:49:12,709] Trial 32 pruned.


Fold 0 AUC: 0.856037 in 2.83 minutes.


[I 2022-01-11 03:52:14,283] Trial 33 pruned.


Fold 0 AUC: 0.855963 in 3.02 minutes.


[I 2022-01-11 03:55:06,644] Trial 34 pruned.


Fold 0 AUC: 0.855949 in 2.86 minutes.


[I 2022-01-11 03:56:58,053] Trial 35 pruned.


Fold 0 AUC: 0.856063 in 1.85 minutes.


[I 2022-01-11 03:59:16,931] Trial 36 pruned.


Fold 0 AUC: 0.856297 in 2.3 minutes.


[I 2022-01-11 04:02:03,554] Trial 37 pruned.


Fold 0 AUC: 0.855973 in 2.77 minutes.


[I 2022-01-11 04:05:18,394] Trial 38 pruned.


Fold 0 AUC: 0.856284 in 3.24 minutes.


[I 2022-01-11 04:09:06,776] Trial 39 pruned.


Fold 0 AUC: 0.856356 in 3.8 minutes.


[I 2022-01-11 04:11:31,808] Trial 40 pruned.


Fold 0 AUC: 0.856182 in 2.41 minutes.


[I 2022-01-11 04:14:31,698] Trial 41 pruned.


Fold 0 AUC: 0.855974 in 2.99 minutes.


[I 2022-01-11 04:17:16,971] Trial 42 pruned.


Fold 0 AUC: 0.856039 in 2.74 minutes.


[I 2022-01-11 04:20:07,991] Trial 43 pruned.


Fold 0 AUC: 0.856221 in 2.84 minutes.


[I 2022-01-11 04:23:41,504] Trial 44 pruned.


Fold 0 AUC: 0.856012 in 3.55 minutes.


[I 2022-01-11 04:25:51,697] Trial 45 pruned.


Fold 0 AUC: 0.856282 in 2.16 minutes.


[I 2022-01-11 04:29:06,124] Trial 46 pruned.


Fold 0 AUC: 0.856359 in 3.23 minutes.


[I 2022-01-11 04:31:07,244] Trial 47 pruned.


Fold 0 AUC: 0.85616 in 2.01 minutes.


[I 2022-01-11 04:33:00,792] Trial 48 pruned.


Fold 0 AUC: 0.856257 in 1.88 minutes.


[I 2022-01-11 04:37:01,797] Trial 49 pruned.


Fold 0 AUC: 0.856343 in 4.01 minutes.


# Evaluation

## 1. Best Parameters

In [10]:
print("Best Parameters:", study.best_params)

Best Parameters: {'max_depth': 5, 'learning_rate': 0.013255723998037535, 'min_child_samples': 13618, 'random_strength': 19.80046051984836, 'leaf_estimation_iterations': 3, 'subsample': 0.394, 'reg_lambda': 2.437971169514396e-10}


## 2. Parameter Importances

In [11]:
plot_param_importances(study)

## 3. Parallel Coordinate Plot

In [12]:
plot_parallel_coordinate(study)

# Make Submission

In [13]:
%%time
del train, holdout; gc.collect()
train = dt.fread(r'../input/tabular-playground-series-oct-2021/train.csv').to_pandas()
test = dt.fread(r'../input/tabular-playground-series-oct-2021/test.csv').to_pandas()
submission = dt.fread(r'../input/tabular-playground-series-oct-2021/sample_submission.csv').to_pandas()

train = reduce_memory_usage(train)
test = reduce_memory_usage(test)
gc.collect()

Mem. usage decreased to 963.21 Mb (48.7% reduction)
Mem. usage decreased to 481.13 Mb (48.8% reduction)
CPU times: user 1min 4s, sys: 52 s, total: 1min 56s
Wall time: 1min 56s


0

In [14]:
# Similar to scoring function but trains on full data and predicts on test
def train_catboost(folds, model_params = {}):
    
    # Store the test predictions
    test_preds = np.zeros((test.shape[0],))
    print('')
    
    # Stratified k-fold cross-validation
    skf = StratifiedKFold(n_splits = folds, shuffle = True, random_state = RANDOM_SEED)
    for fold, (train_idx, valid_idx) in enumerate(skf.split(train, train['target'])):
        
        # Training and Validation Sets
        start = time.time()
        X_train, y_train = train[features].iloc[train_idx], train['target'].iloc[train_idx]
        X_valid, y_valid = train[features].iloc[valid_idx], train['target'].iloc[valid_idx]
        
        # Define Model
        model = CatBoostClassifier(**{**default_params, **model_params})
        gc.collect()
        
        model.fit(
            X_train, y_train,
            verbose = False,
            eval_set = [(X_valid, y_valid)],
            cat_features = categorical_features,
            use_best_model = True,
        )
        
        # validation and test predictions
        valid_preds = model.predict_proba(X_valid)[:, 1]
        test_preds += model.predict_proba(test[features])[:, 1] / folds
        
        # fold auc score
        fold_auc = roc_auc_score(y_valid, valid_preds)
        end = time.time()
        print(f'Fold {fold} AUC: {round(fold_auc, 6)} in {round((end-start) / 60, 2)} minutes.')
        
    return test_preds

In [15]:
# Make submission
submission['target'] = train_catboost(6, study.best_params)
submission.to_csv('catboost_submission.csv', index=False)


Fold 0 AUC: 0.856795 in 4.79 minutes.
Fold 1 AUC: 0.856837 in 5.32 minutes.
Fold 2 AUC: 0.856584 in 5.95 minutes.
Fold 3 AUC: 0.857154 in 5.47 minutes.
Fold 4 AUC: 0.856857 in 5.45 minutes.
Fold 5 AUC: 0.856331 in 4.62 minutes.




I hope you found this notebook useful. Feel free to fork it and adapt it to your own uses.