# Category Boosting (CatBoost)
- Another fast, high-performing GBM derivative that can handle categorical variables automatically.
- It provides fast and scalable GPU support.
- It is claimed to produce more accurate predictions.
- Both training and prediction are claimed to be fast.
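
To make the first point more concrete, the sketch below illustrates the idea of an ordered target statistic: each row's category is replaced by a smoothed mean of the targets of the rows that came before it, so a row never sees its own target. This is a simplified illustration only, not CatBoost's actual implementation (which, among other things, works on random permutations of the data). The function name `ordered_target_stats`, the `prior_weight` parameter, and the toy values are assumptions made for this example.

```python
def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each row using only the targets of EARLIER rows in the same category.

    Hypothetical helper for illustration; not part of the catboost package.
    """
    sums, counts = {}, {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        # Only now does the current row's target become visible to later rows.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Toy data (made up for this sketch).
leagues = ["A", "N", "A", "A", "N"]
salaries = [100.0, 300.0, 200.0, 600.0, 500.0]
prior = sum(salaries) / len(salaries)  # global mean used as the prior

encoded = ordered_target_stats(leagues, salaries, prior)
print(encoded)
```

The first occurrence of each category falls back to the prior, and later occurrences move toward the running category mean.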


**Required Libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import scale, StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn import  neighbors
from sklearn.svm import SVR

In [2]:
# We use this module to suppress warning messages.
from warnings import filterwarnings
filterwarnings("ignore")

# CatBoost - Model and Prediction

In [3]:
df = pd.read_csv("./Hitters.csv")
# Drop the rows that contain missing observations (NA).
df = df.dropna()

# Convert the categorical variables in the data set to dummy variables.
dms = pd.get_dummies(df[["League","Division","NewLeague"]])

# Dependent variable
y = df[["Salary"]]

# Drop the dependent variable and the original categorical columns from the data set.
X_ = df.drop(["Salary","League","Division","NewLeague"], axis = 1).astype("float64")

# Concatenate X_ with the selected dummy columns to build the independent variables.
X = pd.concat([X_, dms[["League_N","Division_W","NewLeague_N"]]], axis=1)

# Create the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size = 0.25,
                                                    random_state= 42) 

In [5]:
!pip install catboost



In [11]:
from catboost import CatBoostRegressor

**Model**

In [12]:
catboost_model = CatBoostRegressor().fit(X_train,y_train)

Learning rate set to 0.029229
0:	learn: 438.1974206	total: 4.61ms	remaining: 4.6s
1:	learn: 432.4168868	total: 6.97ms	remaining: 3.48s
2:	learn: 426.3836690	total: 8.68ms	remaining: 2.88s
3:	learn: 420.2261014	total: 9.95ms	remaining: 2.48s
4:	learn: 414.9976675	total: 11.2ms	remaining: 2.23s
5:	learn: 409.6125323	total: 13.2ms	remaining: 2.18s
6:	learn: 403.9277911	total: 15.6ms	remaining: 2.22s
7:	learn: 398.4395285	total: 18.8ms	remaining: 2.33s
8:	learn: 392.4517081	total: 20.9ms	remaining: 2.3s
9:	learn: 387.4871123	total: 22.5ms	remaining: 2.22s
10:	learn: 382.6230510	total: 23.8ms	remaining: 2.14s
11:	learn: 378.1012454	total: 25.3ms	remaining: 2.08s
12:	learn: 372.6002306	total: 27.6ms	remaining: 2.1s
13:	learn: 368.4682192	total: 30.4ms	remaining: 2.14s
14:	learn: 364.0565766	total: 31.6ms	remaining: 2.07s
15:	learn: 359.5683249	total: 33ms	remaining: 2.03s
16:	learn: 355.1782794	total: 34.6ms	remaining: 2s
17:	learn: 350.4689946	total: 35.9ms	remaining: 1.96s
18:	learn: 346.2

**Prediction**

In [13]:
y_pred = catboost_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

350.2683163098795
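
The value above is the root mean squared error (RMSE): the square root of the average squared difference between the actual and the predicted salaries, expressed in the same units as `Salary`. As a sketch of what `np.sqrt(mean_squared_error(...))` computes, here is the same calculation in plain Python on made-up numbers; the function name `rmse` and the toy values are assumptions for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values: the errors are -10, +10 and -30.
toy_true = [100.0, 200.0, 300.0]
toy_pred = [110.0, 190.0, 330.0]
toy_rmse = rmse(toy_true, toy_pred)
print(toy_rmse)
```

Because the errors are squared before averaging, the single error of 30 dominates the result.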

## Model Tuning

In [17]:
catb_params = {"iterations": [200,500,1000],
              "learning_rate":[0.01,0.1],
              "depth": [3,6,8]
              }
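
The grid above contains 3 x 2 x 3 = 18 parameter combinations, so a 5-fold search trains 90 models before refitting the best one. The following sketch enumerates the combinations with `itertools.product` to make that count explicit; the variable names other than `catb_params` are introduced only for this example.

```python
from itertools import product

catb_params = {"iterations": [200, 500, 1000],
               "learning_rate": [0.01, 0.1],
               "depth": [3, 6, 8]}

# Every combination of one value per parameter, as a list of dicts.
candidates = [dict(zip(catb_params, values))
              for values in product(*catb_params.values())]

n_folds = 5
n_candidates = len(candidates)
total_fits = n_candidates * n_folds

print(n_candidates, total_fits)  # 18 90
print(candidates[0])             # {'iterations': 200, 'learning_rate': 0.01, 'depth': 3}
```

With the default `refit=True`, `GridSearchCV` then trains one more model on the full training set using the best combination.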

In [18]:
catb_model = CatBoostRegressor()

In [19]:
catb_cv_model = GridSearchCV(catb_model,
                             catb_params,
                             cv=5,
                             n_jobs=-1,
                             verbose=2).fit(X_train,y_train)

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:   12.4s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:  1.8min finished


0:	learn: 442.4903140	total: 4.07ms	remaining: 810ms
1:	learn: 440.4621805	total: 7.1ms	remaining: 703ms
2:	learn: 438.5132091	total: 10.1ms	remaining: 662ms
3:	learn: 436.2180377	total: 12.9ms	remaining: 632ms
4:	learn: 434.0461579	total: 15.9ms	remaining: 620ms
5:	learn: 431.8437770	total: 18.7ms	remaining: 606ms
6:	learn: 430.1594587	total: 21.9ms	remaining: 603ms
7:	learn: 428.0941830	total: 25.1ms	remaining: 602ms
8:	learn: 426.0998774	total: 28.6ms	remaining: 607ms
9:	learn: 424.0249067	total: 29.5ms	remaining: 560ms
10:	learn: 422.1921868	total: 32.9ms	remaining: 566ms
11:	learn: 420.2506764	total: 36.4ms	remaining: 570ms
12:	learn: 418.3116383	total: 40.2ms	remaining: 578ms
13:	learn: 416.2966847	total: 43.5ms	remaining: 578ms
14:	learn: 414.5776175	total: 47.1ms	remaining: 581ms
15:	learn: 412.8009394	total: 50.2ms	remaining: 577ms
16:	learn: 410.9774146	total: 53.9ms	remaining: 580ms
17:	learn: 409.1047417	total: 57.1ms	remaining: 577ms
18:	learn: 407.6243957	total: 60.5ms	re

In [20]:
catb_cv_model.best_params_

{'depth': 8, 'iterations': 200, 'learning_rate': 0.01}
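
When no `scoring` argument is given, `GridSearchCV` ranks the candidates with the estimator's own `score` method, which for regressors is R². If the model is later judged by RMSE, it may be more consistent to search on RMSE as well. The sketch below shows the `scoring="neg_root_mean_squared_error"` option on synthetic data, with `DecisionTreeRegressor` as a fast stand-in estimator. The data and the small grid are assumptions, and the same argument is assumed to work unchanged with `CatBoostRegressor`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.2, size=120)

search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      {"max_depth": [2, 4, 6]},
                      cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X_demo, y_demo)

# scikit-learn maximizes scores, so the RMSE is stored with a negative sign.
cv_rmse = -search.best_score_
print(search.best_params_, cv_rmse)
```

Here `cv_rmse` is the mean RMSE over the validation folds for the best candidate, which is a different quantity from the RMSE on a held-out test set.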

**Final Model**

In [27]:
catb_tuned = CatBoostRegressor(depth = 8,
                               iterations = 200,
                               learning_rate = 0.01).fit(X_train,y_train)

0:	learn: 442.4903140	total: 5.96ms	remaining: 1.19s
1:	learn: 440.4621805	total: 11.1ms	remaining: 1.1s
2:	learn: 438.5132091	total: 16ms	remaining: 1.05s
3:	learn: 436.2180377	total: 19.5ms	remaining: 957ms
4:	learn: 434.0461579	total: 22.9ms	remaining: 893ms
5:	learn: 431.8437770	total: 26.3ms	remaining: 852ms
6:	learn: 430.1594587	total: 29.7ms	remaining: 819ms
7:	learn: 428.0941830	total: 33.2ms	remaining: 798ms
8:	learn: 426.0998774	total: 36.8ms	remaining: 780ms
9:	learn: 424.0249067	total: 37.8ms	remaining: 719ms
10:	learn: 422.1921868	total: 41.4ms	remaining: 712ms
11:	learn: 420.2506764	total: 44.9ms	remaining: 703ms
12:	learn: 418.3116383	total: 49ms	remaining: 705ms
13:	learn: 416.2966847	total: 53.6ms	remaining: 713ms
14:	learn: 414.5776175	total: 59.9ms	remaining: 738ms
15:	learn: 412.8009394	total: 64.5ms	remaining: 742ms
16:	learn: 410.9774146	total: 70.5ms	remaining: 759ms
17:	learn: 409.1047417	total: 74.6ms	remaining: 755ms
18:	learn: 407.6243957	total: 78.8ms	remain

In [28]:
y_tuned_pred = catb_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_tuned_pred))

369.6970696250705

In [29]:
catb_tuned = CatBoostRegressor(depth = 3,
                               iterations = 500,
                               learning_rate = 0.1).fit(X_train,y_train)


0:	learn: 425.7900818	total: 779us	remaining: 389ms
1:	learn: 404.8723520	total: 1.64ms	remaining: 409ms
2:	learn: 387.4057666	total: 2.45ms	remaining: 405ms
3:	learn: 372.2801584	total: 3.12ms	remaining: 387ms
4:	learn: 358.9204229	total: 3.76ms	remaining: 372ms
5:	learn: 347.0083933	total: 4.51ms	remaining: 371ms
6:	learn: 336.0130818	total: 5.02ms	remaining: 354ms
7:	learn: 324.3923300	total: 5.53ms	remaining: 340ms
8:	learn: 314.8690957	total: 6.17ms	remaining: 337ms
9:	learn: 308.5075563	total: 6.82ms	remaining: 334ms
10:	learn: 298.8587285	total: 7.26ms	remaining: 323ms
11:	learn: 294.7655438	total: 7.69ms	remaining: 313ms
12:	learn: 288.0697862	total: 8.24ms	remaining: 309ms
13:	learn: 282.6697154	total: 8.84ms	remaining: 307ms
14:	learn: 277.6121667	total: 9.38ms	remaining: 303ms
15:	learn: 273.4383979	total: 9.81ms	remaining: 297ms
16:	learn: 269.1556201	total: 10.3ms	remaining: 293ms
17:	learn: 264.8098704	total: 10.8ms	remaining: 289ms
18:	learn: 261.6700768	total: 11.2ms	re


In [30]:
y_tuned_pred = catb_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_tuned_pred))

336.40041748521486
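
The three test RMSE values reported in this notebook can be put side by side. The short sketch below computes each model's change relative to the default model; the dictionary labels are written for this example, and the numbers are copied from the outputs above.

```python
# Test RMSE values copied from the notebook outputs.
rmse_by_model = {
    "default": 350.2683163098795,
    "grid-search best (depth=8, iterations=200, learning_rate=0.01)": 369.6970696250705,
    "manual (depth=3, iterations=500, learning_rate=0.1)": 336.40041748521486,
}

baseline = rmse_by_model["default"]
change_pct = {name: 100.0 * (value - baseline) / baseline
              for name, value in rmse_by_model.items()}

for name, value in rmse_by_model.items():
    print(f"{name}: RMSE={value:.2f} ({change_pct[name]:+.2f}% vs default)")

best_name = min(rmse_by_model, key=rmse_by_model.get)
print("lowest test RMSE:", best_name)
```

On this split the grid-search winner is about 5.5% worse than the default model, while the manually chosen setting is about 4% better. Since that last setting was picked by looking at the test RMSE, its score is likely an optimistic estimate of performance on new data.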