# CatBoost

CatBoost is short for Categorical Boosting. It is another GBM variant: fast, strong in practice, and able to handle categorical variables automatically.
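One idea behind this automatic handling is to replace each category with a statistic of the target that is computed only from rows appearing earlier in a random ordering (often called ordered target statistics). The following is a simplified pure-Python sketch of that idea, not the library's implementation; the function name `ordered_target_stats`, the `prior` value, and the toy data are assumptions made for illustration.

```python
import random


def ordered_target_stats(categories, targets, prior=0.5, seed=17):
    """Encode each category using only rows seen earlier in a random order.

    Simplified illustration:
    encoded = (positives_so_far + prior) / (count_so_far + 1)
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # random permutation of row indices

    positives = {}  # category -> sum of targets seen so far
    counts = {}     # category -> number of rows seen so far
    encoded = [0.0] * len(categories)

    for idx in order:
        cat = categories[idx]
        seen_pos = positives.get(cat, 0)
        seen_cnt = counts.get(cat, 0)
        # The current row's own target is not used, which limits target leakage.
        encoded[idx] = (seen_pos + prior) / (seen_cnt + 1)
        positives[cat] = seen_pos + targets[idx]
        counts[cat] = seen_cnt + 1

    return encoded


# Toy data (hypothetical, not taken from diabetes.csv).
cities = ["a", "b", "a", "a", "b", "c"]
labels = [1, 0, 1, 0, 1, 0]
encoded_cities = ordered_target_stats(cities, labels)
print(encoded_cities)
```

In the library itself, the columns to treat as categorical are passed through the `cat_features` argument of `CatBoostClassifier`, so no manual encoding like this is needed.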

## CatBoost Application

In [2]:
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import GridSearchCV, cross_validate, RandomizedSearchCV, validation_curve

warnings.simplefilter(action='ignore', category=Warning)

from xgboost import XGBClassifier
# !pip install lightgbm
from lightgbm import LGBMClassifier
# !pip install catboost
from catboost import CatBoostClassifier

In [3]:
df = pd.read_csv("datasets/diabetes.csv")

In [4]:
y = df["Outcome"]
X = df.drop(["Outcome"], axis=1)

### CatBoost

In [5]:
catboost_model = CatBoostClassifier(random_state=17, verbose=False)

In [12]:
# scores before hyperparameter optimization
cv_results = cross_validate(catboost_model, X, y, cv=5, scoring=["accuracy", "f1", "roc_auc"])
print("Accuracy: ", cv_results["test_accuracy"].mean())
print("F1: ", cv_results["test_f1"].mean())
print("ROC-AUC: ", cv_results["test_roc_auc"].mean())

Accuracy:  0.7735251676428148
F1:  0.6502723851348231
ROC-AUC:  0.8378923829489867
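To make the three `scoring` names concrete, here is a small pure-Python sketch of what each metric measures on a single held-out fold. The helper functions and the toy labels and scores are hypothetical; scikit-learn's own implementations are more general and should be preferred in practice.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (hypothetical).
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

acc_value = accuracy(y_true, y_pred)
f1_value = f1(y_true, y_pred)
auc_value = roc_auc(y_true, y_score)
print(acc_value, f1_value, auc_value)
```

Accuracy and F1 depend on the thresholded predictions, while ROC-AUC depends only on how the scores rank positives against negatives. That is why the three numbers can move in different directions when the model changes.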


In [13]:
catboost_params = {"iterations": [200, 500],
                  "learning_rate": [0.01, 0.1],
                  "depth": [3, 6]}

In [14]:
catboost_best_grid = GridSearchCV(catboost_model, catboost_params, cv=5, n_jobs=-1, verbose=True).fit(X, y)

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:   30.7s finished
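The "8 candidates, totalling 40 fits" in the log follows directly from the grid: two values for each of three hyperparameters, each combination evaluated on 5 folds. A minimal sketch of that count, using only the standard library (the variable names `candidates`, `n_candidates`, and `n_fits` are illustrative):

```python
from itertools import product

catboost_params = {"iterations": [200, 500],
                   "learning_rate": [0.01, 0.1],
                   "depth": [3, 6]}
n_folds = 5  # matches cv=5 in the grid search

# Every combination of one value per hyperparameter is one candidate.
names = list(catboost_params)
candidates = [dict(zip(names, values))
              for values in product(*catboost_params.values())]

n_candidates = len(candidates)
n_fits = n_candidates * n_folds
print(n_candidates, n_fits)
```

With the default `refit=True`, `GridSearchCV` then trains the best candidate once more on all of `X, y`; that extra fit is not included in the count shown in the log.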


In [16]:
catboost_best_grid.best_params_

{'depth': 3, 'iterations': 500, 'learning_rate': 0.01}

In [18]:
catboost_final = catboost_model.set_params(**catboost_best_grid.best_params_, random_state=17).fit(X, y)

In [19]:
# scores after hyperparameter optimization
cv_results = cross_validate(catboost_final, X, y, cv=5, scoring=["accuracy", "f1", "roc_auc"])
print("Accuracy: ", cv_results["test_accuracy"].mean())
print("F1: ", cv_results["test_f1"].mean())
print("ROC-AUC: ", cv_results["test_roc_auc"].mean())

Accuracy:  0.7721755368814192
F1:  0.6322580676028952
ROC-AUC:  0.842001397624039
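Comparing the two sets of scores side by side shows that tuning did not help uniformly. The short sketch below only does the subtraction on the numbers printed above; the dictionary names are illustrative.

```python
# Mean 5-fold scores copied from the two outputs above.
before = {"accuracy": 0.7735251676428148,
          "f1": 0.6502723851348231,
          "roc_auc": 0.8378923829489867}
after = {"accuracy": 0.7721755368814192,
         "f1": 0.6322580676028952,
         "roc_auc": 0.842001397624039}

# Positive delta means the tuned model scored higher.
deltas = {name: after[name] - before[name] for name in before}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

ROC-AUC rises slightly while accuracy and F1 fall slightly. Two things are worth keeping in mind when reading this. First, the baseline model ran with CatBoost's default settings, which are not among the 8 grid candidates, so the grid search could not select them. Second, no `scoring` argument was passed to `GridSearchCV`, so candidates were ranked by the classifier's default score (accuracy), not by F1 or ROC-AUC. Differences of this size may also fall within fold-to-fold variation.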
