# CatBoost

## Contents

 - **Theory:**
   - [Intro to CatBoost](#intro-to-catboost)
   - [CatBoost Models at High-Level (High-Level API)](#catboost-apis)
   - [Why use "CatBoost" over other Python Gradient Boosting libraries?](#why-catboost)
   - [Important Attributes & Methods of "CatBoost" Object](#important-attributes-methods)
 - **Solving regression problems with CatBoost:**
   - [Boston Housing Dataset](#boston-housing-dataset)

---

<div id="intro-to-catboost"></div>

## Intro to CatBoost

 - **CatBoost (short for Categorical Boosting)** is an open-source machine learning library that provides a fast and reliable implementation of gradient boosting on decision trees **(gradient boosted decision trees)**.
 - Gradient boosted decision trees are a type of gradient boosting machine in which every estimator of the ensemble is a decision tree.
 - The final output is obtained by combining the predictions of these weak tree learners.

**NOTE:**  
**CatBoost** can be used for *classification*, *regression*, *ranking*, and other machine learning tasks.


---

<div id="catboost-apis"></div>

## CatBoost Models at High-Level (High-Level API)

CatBoost provides three different estimators to perform **classification** and **regression** tasks.

 - **CatBoost:**
   - It is a universal estimator that can handle both *classification* and *regression* datasets, depending on its settings (for example, the `loss_function` parameter).
 - **CatBoostRegressor:**
   - It is an estimator with a *scikit-learn*-like API designed for *regression* datasets.
 - **CatBoostClassifier:**
   - It is an estimator with a *scikit-learn*-like API designed for *classification* datasets.


---

<div id="why-catboost"></div>

## Why use "CatBoost" over other Python Gradient Boosting libraries?

 - CatBoost provides built-in support for **categorical** and **text features**, so the developer does not need to encode them separately.
 - CatBoost also supports **grid search** and **randomized search**, which let us try lists of parameter values to find the combination that gives the best results.
 - CatBoost often achieves **good accuracy** with its default parameter settings.
 - CatBoost can also run training on a **GPU**, and even on multiple GPUs with simple configuration.
 - Finally, CatBoost provides APIs in **Python** and **R**.

---

<div id="important-attributes-methods"></div>

## Important Attributes & Methods of "CatBoost" Object

Below is a list of important attributes and methods of the CatBoost estimator. Note that the list is not exhaustive; more methods are covered later.

 - **Attributes:**
   - **best_score_** - A dictionary with the best metric values achieved by the model for each dataset (e.g. `learn`).
   - **classes_** - The list of classes for classification problems (empty for regression).
   - **feature_names_** - The list of feature names.
   - **feature_importances_** - The importance of each feature, as computed by the trained model.
   - **learning_rate_** - The learning rate used by the algorithm.
   - **random_seed_** - The random seed used during training.
   - **tree_count_** - The number of trees in the ensemble.
   - **n_features_in_** - The number of features used to train the model.
   - **evals_result_** - A dictionary of per-iteration metric values. If we provided an evaluation set, its results are included as well.
 - **Methods:**
   - **get_best_score()** - Returns the best score of the estimator.
   - **get_params()** - Returns a dictionary of the parameters (and their values) that were passed when creating the CatBoost estimator.
   - **get_all_params()** - Returns a dictionary of all parameters of the trained CatBoost estimator and their values, including defaults.
   - **get_cat_feature_indices()** - Returns the list of indices of the categorical features.
   - **get_feature_importance()** - Returns the importance of each feature according to the trained model.
   - **shrink(ntree_end, ntree_start=0)** - Shrinks the ensemble in place so that it keeps only the trees with indices in the range [ntree_start, ntree_end), discarding all other trees.
   - **set_params()** - Sets parameters of the estimator. Note that it only works before the model is trained.
   - **calc_leaf_indexes(data, ntree_start=0, ntree_end=0)** - For each sample in `data`, returns the index of the leaf it falls into in each tree used to make the prediction. The output has shape n_samples x n_trees (with `ntree_end=0`, all trees are included).
   - **get_leaf_values()** - Returns the leaf values of the trees in the ensemble.
   - **get_leaf_weights()** - Returns the weight of each leaf of the trees in the ensemble.

---

<div id="boston-housing-dataset"></div>

## Boston Housing Dataset
To start, let's import and check the Boston Housing Dataset:

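```python
# NOTE: load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston

import pandas as pd
import warnings

# Silence FutureWarnings (such as the load_boston() deprecation notice).
warnings.simplefilter(action='ignore', category=FutureWarning)

boston = load_boston()

# Print the part of the description that lists the attributes.
for line in boston.DESCR.split("\n")[5:29]:
    print(line)

# Build a DataFrame with the features plus the target as a "Price" column.
boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_df["Price"] = boston.target

boston_df.head()
```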

```
**Data Set Characteristics:**

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the
```

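| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |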


Now, let's create an instance of CatBoost:

```python
from catboost import CatBoost

booster = CatBoost(params={'iterations': 1000, 'verbose': 100, 'loss_function': 'RMSE'})
booster
```

```
<catboost.core.CatBoost at 0x7fcec071a890>
```

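 - The **params={}** dictionary receives:
   - **iterations:** The maximum number of trees that can be built.
   - **verbose:** Controls the training output, and its meaning depends on the type of the value. A boolean turns logging on or off; an integer enables logging and sets the logging period, so `verbose=100` prints the metrics every 100 iterations.
   - **loss_function:** The loss (metric) optimized during training; here, RMSE (root mean squared error).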

Next, let's split the dataset into training and test sets:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(
    boston.data,    # X variables.
    boston.target,  # Target variable.
    train_size=0.9,
    random_state=123
)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

```
((455, 13), (51, 13), (455,), (51,))
```

With the data divided, let's train the model with the training set:

```python
booster.fit(X_train, Y_train)
# X_train is a NumPy array without column names, so we attach them after training.
booster.set_feature_names(boston.feature_names)
```

```
Learning rate set to 0.036153
0:	learn: 8.7584915	total: 1.14ms	remaining: 1.14s
100:	learn: 2.7001504	total: 52.4ms	remaining: 466ms
200:	learn: 1.9482117	total: 101ms	remaining: 403ms
300:	learn: 1.5785985	total: 148ms	remaining: 344ms
400:	learn: 1.3053959	total: 200ms	remaining: 299ms
500:	learn: 1.0964266	total: 247ms	remaining: 246ms
600:	learn: 0.9553074	total: 295ms	remaining: 196ms
700:	learn: 0.8308987	total: 342ms	remaining: 146ms
800:	learn: 0.7226049	total: 390ms	remaining: 96.9ms
900:	learn: 0.6310833	total: 440ms	remaining: 48.3ms
999:	learn: 0.5571620	total: 487ms	remaining: 0us
```


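Because we did not specify a learning rate, CatBoost selected one automatically, as the first line of the log shows. The CatBoost estimator provides the **predict()** method, which accepts feature values and returns model predictions. Below, we compute predictions for the **train** and **test** datasets: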

```python
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds[:5], train_preds[:5]
```

```
(array([24.57487228, 25.39175445, 46.08034269, 18.43230229, 30.56652189]),
 array([29.59084579, 20.03641629, 11.42639395, 20.00427448, 15.05563612]))
```

Now, let's use the **eval_metric()** function from the **utils** module of **catboost** to evaluate the R2 metric on both the **train** and **test** sets. It returns a list of values, so we take the first element:

```python
from catboost.utils import eval_metric

print("Test  R2 : %.2f" % eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f" % eval_metric(Y_train, train_preds, "R2")[0])
```

```
Test  R2 : 0.86
Train R2 : 1.00
```


Now, let's check some important attributes:

```python
print("Best Score                : ", booster.best_score_)
print("\nList of Target Classes  : ", booster.classes_)
print("\nData Feature Names      : ", booster.feature_names_)
print("\nFeature Importance      : ", booster.feature_importances_)
print("\nLearning Rate           : ", booster.learning_rate_)
print("\nRandom Seed             : ", booster.random_seed_)
print("\nNumber of Trees         : ", booster.tree_count_)
print("\nNumber of Features      : ", booster.n_features_in_)
```

```
Best Score                :  {'learn': {'RMSE': 0.5571619854653218}}

List of Target Classes  :  []

Data Feature Names      :  ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

Feature Importance      :  [ 4.82077561  0.71119418  2.12293615  0.49655873  6.20955767 33.65050709
  4.1250052   7.81298913  1.97410998  3.13793476  4.66776243  3.31914864
 26.95152044]

Learning Rate           :  0.036152999848127365

Random Seed             :  0

Number of Trees         :  1000

Number of Features      :  13
```


Now, let's use some important methods:

```python
leaf_indices = booster.calc_leaf_indexes(X_train)

print("Leaf Indices Size : ", leaf_indices.shape)

leaf_indices[:2]
```

```
Leaf Indices Size :  (455, 1000)

array([[33, 16,  4, ..., 60, 55, 19],
       [36, 34, 20, ..., 60, 63, 19]], dtype=uint32)
```

```python
print("Parameters Passed When Creating Model : ", booster.get_params())
print("\nAll Model Parameters                : ", booster.get_all_params())
```

```
Parameters Passed When Creating Model :  {'iterations': 1000, 'verbose': 100, 'loss_function': 'RMSE'}

All Model Parameters                :  {'nan_mode': 'Min', 'eval_metric': 'RMSE', 'iterations': 1000, 'sampling_frequency': 'PerTree', 'leaf_estimation_method': 'Newton', 'grow_policy': 'SymmetricTree', 'penalties_coefficient': 1, 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'eval_fraction': 0, 'force_unit_auto_pair_weights': False, 'l2_leaf_reg': 3, 'random_strength': 1, 'rsm': 1, 'boost_from_average': True, 'model_size_reg': 0.5, 'pool_metainfo_options': {'tags': {}}, 'subsample': 0.800000011920929, 'use_best_model': False, 'random_seed': 0, 'depth': 6, 'posterior_sampling': False, 'border_count': 254, 'classes_count': 0, 'auto_class_weights': 'None', 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate':
```

```python
print("\nBest Score                  : ", booster.get_best_score())
print("\nCategorical Feature Indices : ", booster.get_cat_feature_indices())
print("\nFeature Importances         : ", booster.get_feature_importance())
```

```
Best Score                  :  {'learn': {'RMSE': 0.5571619854653218}}

Categorical Feature Indices :  []

Feature Importances         :  [ 4.82077561  0.71119418  2.12293615  0.49655873  6.20955767 33.65050709
  4.1250052   7.81298913  1.97410998  3.13793476  4.66776243  3.31914864
 26.95152044]
```


```python
print("\nLeaf Values Shape  : ", booster.get_leaf_values().shape)
print("\nLeaf Values        : ", booster.get_leaf_values()[:10])
print("\nLeaf Weights Shape : ", booster.get_leaf_weights().shape)
print("\nLeaf Weights       : ", booster.get_leaf_weights()[:10])
```

```
Leaf Values Shape  :  (63632,)

Leaf Values        :  [ 0.         -0.02473389  0.          0.          0.          0.
  0.          0.          0.         -0.09003159]

Leaf Weights Shape :  (63632,)

Leaf Weights       :  [0. 4. 0. 0. 0. 0. 0. 0. 0. 3.]
```


Below, we use the shrink() method to reduce the original ensemble to its first 50 trees (note that shrink() modifies the model in place) and then re-evaluate the R2 metric on the train and test sets. Both scores drop noticeably with the smaller ensemble:

```python
booster.shrink(ntree_end=50)

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("Test  R2 : %.2f" % eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f" % eval_metric(Y_train, train_preds, "R2")[0])
```

```
Test  R2 : 0.65
Train R2 : 0.81
```


### How to Perform Hyperparameter Tuning with CatBoost?
The CatBoost estimator also supports grid search through its grid_search() method. To run a grid search, we create an estimator without setting the parameters we want to tune, then call grid_search() on that instance, passing it a dictionary of candidate parameter values and the data; it then tries the different parameter combinations.

 - **Important Parameters of "grid_search()":**
   - **param_grid** - A dictionary mapping parameter names to the list of values to try for each parameter.
   - **X** - The feature values, as a NumPy array, a pandas DataFrame, or a `catboost.Pool`.
   - **y** - The target labels. It can be omitted when X is a `catboost.Pool` that already contains the labels.
   - **cv** - Either an integer or a scikit-learn splitter (KFold, StratifiedKFold, ShuffleSplit, StratifiedShuffleSplit). An integer sets the number of cross-validation folds. The default is 3.
   - **calc_cv_statistics** - A boolean specifying whether to calculate cross-validation statistics. The default is True.
   - **refit** - A boolean specifying whether to retrain the model with the best parameter setting found by the search. The default is True.
   - **stratified** - Whether to perform a stratified partition of the dataset so that class proportions are preserved in each split. By default, stratification is used for classification and not for other tasks.

The method returns a dictionary with two keys:

 - **params** - the best parameter settings.
 - **cv_results** - the cross-validation results.

Below is an example of a grid search over different values of the parameters **iterations**, **learning_rate**, and **bootstrap_type**, using the training data created from the Boston dataset earlier. We then evaluate the estimator with the best setting by calculating the **R2** score on the train and test sets:

```python
booster = CatBoost()

# Candidate values for each parameter; every combination is tried.
params = {
    'iterations': [10, 50],
    'learning_rate': [0.01, 0.1],
    'bootstrap_type': ['Bayesian', 'Bernoulli', 'No'],
}

search_results = booster.grid_search(params, X_train, Y_train, cv=5)

print("\nBest Params : ", search_results['params'])

# With refit=True (the default), booster is now trained with the best parameters.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 : %.2f" % eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f" % eval_metric(Y_train, train_preds, "R2")[0])
```

```
0:	learn: 23.9335242	test: 23.1547415	best: 23.1547415 (0)	total: 1.81ms	remaining: 16.3ms
1:	learn: 23.7431491	test: 22.9756094	best: 22.9756094 (1)	total: 2.63ms	remaining: 10.5ms
2:	learn: 23.5452925	test: 22.7854415	best: 22.7854415 (2)	total: 3.38ms	remaining: 7.88ms
3:	learn: 23.3594349	test: 22.6119514	best: 22.6119514 (3)	total: 3.98ms	remaining: 5.97ms
4:	learn: 23.1760962	test: 22.4354434	best: 22.4354434 (4)	total: 4.64ms	remaining: 4.64ms
5:	learn: 22.9814178	test: 22.2468305	best: 22.2468305 (5)	total: 5.43ms	remaining: 3.62ms
6:	learn: 22.7963713	test: 22.0643806	best: 22.0643806 (6)	total: 6.06ms	remaining: 2.6ms
7:	learn: 22.6155219	test: 21.8926442	best: 21.8926442 (7)	total: 6.77ms	remaining: 1.69ms
8:	learn: 22.4335623	test: 21.7204730	best: 21.7204730 (8)	total: 7.47ms	remaining: 830us
9:	learn: 22.2618190	test: 21.5523948	best: 21.5523948 (9)	total: 8.04ms	remaining: 0us

bestTest = 21.55239476
bestIteration = 9

0:	loss: 21.5523948	best: 21.5523948 (0)	total: 10.9

18:	learn: 6.3230410	test: 6.4966474	best: 6.4966474 (18)	total: 10.6ms	remaining: 17.2ms
19:	learn: 5.9889306	test: 6.1562438	best: 6.1562438 (19)	total: 11.5ms	remaining: 17.3ms
20:	learn: 5.7092726	test: 5.9163349	best: 5.9163349 (20)	total: 15.9ms	remaining: 21.9ms
21:	learn: 5.4559110	test: 5.6761658	best: 5.6761658 (21)	total: 16.4ms	remaining: 20.9ms
22:	learn: 5.1964849	test: 5.4634404	best: 5.4634404 (22)	total: 17.1ms	remaining: 20.1ms
23:	learn: 4.9654881	test: 5.2557654	best: 5.2557654 (23)	total: 17.8ms	remaining: 19.2ms
24:	learn: 4.7525487	test: 5.0588605	best: 5.0588605 (24)	total: 18.6ms	remaining: 18.6ms
25:	learn: 4.5764585	test: 4.8915886	best: 4.8915886 (25)	total: 19.4ms	remaining: 17.9ms
26:	learn: 4.4301882	test: 4.7915183	best: 4.7915183 (26)	total: 20.1ms	remaining: 17.1ms
27:	learn: 4.2825766	test: 4.6622412	best: 4.6622412 (27)	total: 20.8ms	remaining: 16.4ms
28:	learn: 4.1528333	test: 4.5405439	best: 4.5405439 (28)	total: 21.5ms	remaining: 15.6ms
29:	learn:


Best Params :  {'iterations': 50, 'learning_rate': 0.1, 'bootstrap_type': 'No'}

Test  R2 : 0.81
Train R2 : 0.93
```


```python
cv_results = pd.DataFrame(search_results["cv_results"])

cv_results.head(10)
```

| | iterations | test-RMSE-mean | test-RMSE-std | train-RMSE-mean | train-RMSE-std |
|---|---|---|---|---|---|
| 0 | 0 | 22.02577 | 0.792004 | 21.976345 | 0.180601 |
| 1 | 1 | 20.316437 | 0.810409 | 20.23199 | 0.168389 |
| 2 | 2 | 18.925032 | 0.837807 | 18.761584 | 0.165503 |
| 3 | 3 | 17.52751 | 0.882776 | 17.297391 | 0.104576 |
| 4 | 4 | 16.218119 | 0.88079 | 15.962735 | 0.09326 |
| 5 | 5 | 15.022021 | 0.813361 | 14.708146 | 0.079067 |
| 6 | 6 | 13.939678 | 0.767475 | 13.591252 | 0.088936 |
| 7 | 7 | 13.002558 | 0.787296 | 12.618372 | 0.087095 |
| 8 | 8 | 12.129643 | 0.779681 | 11.717757 | 0.08189 |
| 9 | 9 | 11.332411 | 0.767541 | 10.880621 | 0.087322 |


### "Pool" Data Structure to Represent Datasets
> This section shows how to use **Pool**, CatBoost's internal data structure for holding datasets.

 - It also shows how to create a CatBoost estimator with the **train()** function of the catboost library.
 - Note that **catboost.train()** accepts almost the same parameters as the **CatBoost.fit()** method.

> A **Pool** wraps our feature values together with the target values. Building a Pool once and reusing it avoids converting the raw data on every call, which can make training faster.

 - **Important Parameters Of "Pool()" Constructor:**
   - **data** - The feature values, as a NumPy array, pandas DataFrame, or list.
   - **label** - The target labels, as a NumPy array, pandas DataFrame, or list.
   - **cat_features** - A list of integers specifying the indices of the categorical feature columns (with a pandas DataFrame, column names can be used instead).
   - **text_features** - A list of integers specifying the indices of the text feature columns.

Below, we wrap the train and test sets in Pool objects and use the **train()** function to generate the CatBoost estimator:

```python
from catboost import Pool
import catboost

X_train, X_test, Y_train, Y_test = train_test_split(
    boston.data,
    boston.target,
    train_size=0.9,
    random_state=123
)

# Wrap features and labels in Pool objects.
train_data = Pool(X_train, Y_train)
test_data = Pool(X_test, Y_test)

booster = catboost.train(
    pool=train_data,
    params={
        'iterations': 100,
        'verbose': 10,
        'loss_function': 'RMSE',
    }
)

booster.set_feature_names(boston.feature_names)

test_preds = booster.predict(test_data)
train_preds = booster.predict(train_data)

print("\nTest  R2 : %.2f" % eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f" % eval_metric(Y_train, train_preds, "R2")[0])
```

```
Learning rate set to 0.235042
0:	learn: 7.6937837	total: 605us	remaining: 59.9ms
10:	learn: 3.2770506	total: 5.8ms	remaining: 47ms
20:	learn: 2.3723670	total: 10.7ms	remaining: 40.2ms
30:	learn: 2.0514184	total: 15.6ms	remaining: 34.8ms
40:	learn: 1.8302414	total: 20.5ms	remaining: 29.5ms
50:	learn: 1.5577899	total: 25.8ms	remaining: 24.8ms
60:	learn: 1.3697292	total: 31ms	remaining: 19.8ms
70:	learn: 1.2388124	total: 36ms	remaining: 14.7ms
80:	learn: 1.1207019	total: 41.7ms	remaining: 9.77ms
90:	learn: 1.0237971	total: 46.8ms	remaining: 4.63ms
99:	learn: 0.9493643	total: 51.4ms	remaining: 0us

Test  R2 : 0.82
Train R2 : 0.99
```


The next example passes an evaluation set to fit(); CatBoost evaluates it during training and reports its metric alongside the training metric in the log:

```python
from catboost import Pool
from catboost.utils import eval_metric

X_train, X_test, Y_train, Y_test = train_test_split(
    boston.data,
    boston.target,
    train_size=0.9,
    random_state=123
)

train_data = Pool(X_train, Y_train)
test_data = Pool(X_test, Y_test)

booster = CatBoost({'verbose': 10, 'iterations': 100})

# The evaluation set is scored at every iteration during training.
booster.fit(train_data, eval_set=test_data)
booster.set_feature_names(boston.feature_names)

test_preds = booster.predict(test_data)
train_preds = booster.predict(train_data)

print("\nTest  R2 : %.2f" % eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f" % eval_metric(Y_train, train_preds, "R2")[0])
```

```
Learning rate set to 0.183317
0:	learn: 7.9639861	test: 10.0965243	best: 10.0965243 (0)	total: 587us	remaining: 58.2ms
10:	learn: 3.6206010	test: 6.0935815	best: 6.0935815 (10)	total: 5.75ms	remaining: 46.5ms
20:	learn: 2.6457843	test: 5.1887011	best: 5.1887011 (20)	total: 10.5ms	remaining: 39.6ms
30:	learn: 2.2583514	test: 4.7137170	best: 4.7137170 (30)	total: 15.5ms	remaining: 34.5ms
40:	learn: 1.9887718	test: 4.3556914	best: 4.3556914 (40)	total: 20.5ms	remaining: 29.6ms
50:	learn: 1.7782435	test: 4.1495225	best: 4.1495225 (50)	total: 25.6ms	remaining: 24.6ms
60:	learn: 1.6338388	test: 4.0720440	best: 4.0720440 (60)	total: 30.8ms	remaining: 19.7ms
70:	learn: 1.4952212	test: 4.0379851	best: 4.0379851 (70)	total: 35.9ms	remaining: 14.7ms
80:	learn: 1.3448055	test: 3.9966906	best: 3.9966906 (80)	total: 41.2ms	remaining: 9.67ms
90:	learn: 1.2487035	test: 3.9619051	best: 3.9594628 (86)	total: 46.8ms	remaining: 4.63ms
99:	learn: 1.1472042	test: 3.9506018	best: 3.9391287 (95)	total: 51.2ms
```

---

**REFERENCES:**  
[CatBoost - An In-Depth Guide [Python API]](https://coderzcolumn.com/tutorials/machine-learning/catboost-an-in-depth-guide-python)  
[How to use catboost in python: Hyperparameter tuning of catboost](https://techfor-today.com/catboost-in-python-hyperparameter-tuning-of-catboost/)  
[Introducing text information signals into CatBoost](https://github.com/catboost/tutorials/blob/master/text_features/text_features_in_catboost.ipynb)  
[Cross Validation tutorial](https://github.com/catboost/tutorials/blob/master/cross_validation/cv_tutorial.ipynb)  
[Solving classification problems with CatBoost](https://github.com/catboost/tutorials/blob/master/classification/classification_tutorial.ipynb)  
[Simple classification example with missing feature handling and parameter tuning](https://github.com/catboost/tutorials/blob/master/classification/classification_with_parameter_tuning_tutorial.ipynb)  
[CatBoost tutorial: Categorical features parameters](https://github.com/catboost/tutorials/blob/master/categorical_features/categorical_features_parameters.ipynb)  
[Preparing data](https://github.com/catboost/tutorials/blob/master/hyperparameters_tuning/hyperparameters_tuning.ipynb)  
[Hyperparameters tuning](https://github.com/catboost/tutorials/blob/master/hyperparameters_tuning/hyperparameters_tuning_using_optuna_and_hyperopt.ipynb)  
[AUC in CatBoost](https://github.com/catboost/tutorials/blob/master/metrics/AUC_tutorial.ipynb)  
[Using catboost.metrics module](https://github.com/catboost/tutorials/blob/master/metrics/Metrics_tutorial.ipynb)  
[CatBoost learning to rank on Microsoft dataset](https://github.com/catboost/tutorials/blob/master/ranking/ranking_tutorial.ipynb)  
[Tutorial: Poisson regression with CatBoost](https://github.com/catboost/tutorials/blob/master/regression/poisson.ipynb)  
[Survival analysis with Catboost](https://github.com/catboost/tutorials/blob/master/regression/survival.ipynb)  
[Tweedie Regression](https://github.com/catboost/tutorials/blob/master/regression/tweedie.ipynb)  

---

**R**odrigo **L**eite da **S**ilva - **drigols**

---