# **CatBoost Regression**

* Developed by the Russian company Yandex and released in 2017
* A gradient boosting algorithm with built-in support for categorical features, which makes manual encoding steps such as one-hot encoding largely unnecessary
* [Nice resource](https://towardsdatascience.com/catboost-regression-in-6-minutes-3487f3e5b329) that I have been using to learn CatBoost
* November 2021 update: [been using this too](https://towardsdatascience.com/how-do-you-use-categorical-features-directly-with-catboost-947b211c2923)
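
The built-in categorical handling mentioned in the list above rests, roughly, on ordered target statistics: each category value is replaced by a smoothed average of the targets of rows seen earlier in a random permutation. The sketch below is a simplified pure-Python illustration of that idea, not CatBoost's actual implementation. The function name, the `prior_weight` parameter and the toy `status` / `life_exp` values are all made up for the example.

```python
import random


def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each category using only targets of rows that appear earlier
    in a random permutation (a simplified ordered target statistic)."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    running_sum = {}    # sum of targets seen so far, per category
    running_count = {}  # number of rows seen so far, per category
    encoded = [0.0] * len(categories)

    for idx in order:
        cat = categories[idx]
        s = running_sum.get(cat, 0.0)
        n = running_count.get(cat, 0)
        # Smoothed mean of earlier targets; the row's own target is not used
        encoded[idx] = (s + prior_weight * prior) / (n + prior_weight)
        running_sum[cat] = s + targets[idx]
        running_count[cat] = n + 1
    return encoded


# Toy data (hypothetical values, not taken from the notebook's dataset)
status = ['Developing', 'Developed', 'Developing', 'Developed', 'Developing']
life_exp = [62.0, 80.0, 65.0, 79.0, 60.0]
prior = sum(life_exp) / len(life_exp)

encoded = ordered_target_stats(status, life_exp, prior)
print(encoded)
```

The first row visited for each category falls back to the prior, because no earlier targets exist for it yet. Excluding a row's own target from its encoding is what limits target leakage.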

Noah Rubin - July 2021

In [None]:
# Local project modules
import data_prep
import helper_funcs

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor, Pool
from sklearn.impute import KNNImputer

# Personal display settings
#===========================

# Suppress scientific notation
np.set_printoptions(suppress=True)

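# Show DataFrame floats to 2 decimal places and never truncate column contents
pd.options.display.float_format = '{:.2f}'.format
pd.set_option('display.max_colwidth', None)

# Note: matplotlib >= 3.6 renames this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')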
%matplotlib inline

%load_ext autoreload
%autoreload 2

In [None]:
train = pd.read_csv('../datasets/train_updated.csv')
test = pd.read_csv('../datasets/test_updated.csv')

# Separate the features from the target ('Life_exp'); 'HDI' is also excluded from the features
to_drop = ['HDI', 'Life_exp']

X_train = train.drop(to_drop, axis='columns')
X_test = test.drop(to_drop, axis='columns')

y_train = train['Life_exp']
y_test = test['Life_exp']

## Build Model

In [None]:
# Tell CatBoost which columns are categorical so it can encode them internally
cat_features = ['Country', 'Status']

training_set = Pool(X_train, y_train, cat_features=cat_features)
test_set = Pool(X_test, cat_features=cat_features)

# A small, shallow ensemble: 200 trees of depth 3 with a fairly high learning rate
model = CatBoostRegressor(n_estimators=200,
                          loss_function='RMSE',
                          learning_rate=0.4,
                          depth=3,
                          task_type='CPU',
                          random_state=1,
                          verbose=False)
model.fit(training_set)
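
To give some intuition for how `n_estimators` and `learning_rate` interact, here is a minimal sketch of plain gradient boosting for squared error, using one-split trees on a single numeric feature. It is not CatBoost's algorithm, which among other things uses symmetric trees and ordered boosting. All function names and the toy data are made up for the illustration.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that minimises squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_estimators, learning_rate):
    """Fit stumps sequentially to the residuals, shrinking each by learning_rate."""
    base = sum(y) / len(y)  # start from the mean prediction
    preds = [base] * len(y)
    for _ in range(n_estimators):
        # For squared error, the negative gradient is simply the residual
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        preds = [p + learning_rate * (left_mean if xi <= threshold else right_mean)
                 for xi, p in zip(x, preds)]
    return preds


def rmse(y, preds):
    return (sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)) ** 0.5


# Toy data (hypothetical values)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [50.0, 52.0, 55.0, 61.0, 66.0, 70.0, 75.0, 79.0]

few = boost(x, y, n_estimators=5, learning_rate=0.4)
many = boost(x, y, n_estimators=200, learning_rate=0.4)
print(rmse(y, few), rmse(y, many))
```

More trees at the same learning rate fit the training data more closely. Whether that helps on unseen data is a separate question that only a held-out set can answer.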

## Evaluation Metrics

In [None]:
# Predict on the Pool built earlier so the categorical columns are declared explicitly
r2, mse, rmse, mae = helper_funcs.display_regression_metrics(y_test, model.predict(test_set))
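
The `helper_funcs` module is not shown in this notebook, so the exact behaviour of `display_regression_metrics` is unknown. As a rough guide to what the four returned values conventionally mean, here is a hypothetical stand-in (`regression_metrics_sketch` is an invented name) that computes them with the standard formulas on toy numbers.

```python
import math


def regression_metrics_sketch(y_true, y_pred):
    """Hypothetical stand-in: return (r2, mse, rmse, mae) using standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e ** 2 for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                    # same units as the target
    mae = sum(abs(e) for e in errors) / n    # mean absolute error

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return r2, mse, rmse, mae


# Toy values (not from the notebook's dataset)
y_true = [60.0, 70.0, 80.0]
y_pred = [62.0, 69.0, 79.0]

r2, mse, rmse, mae = regression_metrics_sketch(y_true, y_pred)
print(f"R2={r2:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

RMSE and MAE are both in the target's units (years of life expectancy here), which makes them easier to interpret than MSE.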

## Save Model

In [None]:
# joblib.dump does not create missing directories, so ./saved_models must already exist
joblib.dump(model, './saved_models/CatBoost.joblib')