# CatBoost Algorithm

CatBoost is a gradient boosting framework that handles categorical features natively, using a permutation-driven alternative to the classical gradient boosting algorithm.
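To make the "permutation-driven" idea concrete, here is a minimal sketch of an ordered target statistic: each categorical value is encoded using only the targets of rows that come earlier in a random permutation, so a row never sees its own label. The function name, the `prior` and `weight` parameters, and the toy values are assumptions for illustration; this is a simplification, not CatBoost's actual implementation.

```python
import random

def ordered_target_stats(categories, targets, prior=0.5, weight=1.0, seed=0):
    """Encode each category using only rows that precede it in a random permutation."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # the random permutation of row positions

    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of the targets of *earlier* rows with the same category
        encoded[i] = (s + prior * weight) / (c + weight)
        # Only now does this row's own target become visible to later rows
        sums[cat] = s + targets[i]
        counts[cat] = c + 1
    return encoded

# Toy values (made up), shaped like the Credit_History / Loan_Status columns
credit_history = ["Yes", "Yes", "No", "Yes", "No", "Yes"]
loan_status = [1, 0, 0, 1, 0, 1]
encoded = ordered_target_stats(credit_history, loan_status)
print(encoded)
```

The first row of each category in the permutation receives the prior, and later rows receive a value that depends only on what came before them.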

## Advantages

1. Little preprocessing required, since categorical features can be passed to the model directly.
2. Often faster than other boosting algorithms, particularly at prediction time, because it builds symmetric trees.
3. Fast GPU implementation.
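The speed benefit of symmetric trees can be sketched as follows: every node at a given depth applies the same test, so a tree of depth d needs only d comparisons, and the results form the bits of the leaf index. The split thresholds and leaf values below are hypothetical, and the function is an illustration rather than CatBoost's internal code.

```python
def symmetric_tree_predict(row, splits, leaf_values):
    """Predict with a symmetric tree: one (feature, threshold) test per depth level."""
    leaf_index = 0
    for feature, threshold in splits:
        bit = 1 if row[feature] > threshold else 0
        leaf_index = (leaf_index << 1) | bit  # append this level's answer as a bit
    return leaf_values[leaf_index]

# Hypothetical depth-2 tree: 2 splits, 2**2 = 4 leaves
splits = [("ApplicantIncome", 4000.0), ("LoanAmount", 120.0)]
leaf_values = [-0.2, 0.1, 0.3, -0.4]

row = {"ApplicantIncome": 5849.0, "LoanAmount": 0.0}
prediction = symmetric_tree_predict(row, splits, leaf_values)
print(prediction)  # income test passes (bit 1), loan test fails (bit 0): leaf 0b10
```

Because the leaf lookup is a plain index computation with no data-dependent branching between nodes, it maps well onto vectorized and GPU execution.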


Installation of CatBoost

In [14]:
!pip install catboost  # Install CatBoost with pip



Importing the necessary libraries

In [15]:
import pandas as pd  # For data manipulation and analysis
from catboost import CatBoostClassifier  # Gradient boosting classifier
from sklearn.model_selection import train_test_split  # For splitting into train and test sets
from sklearn.metrics import accuracy_score, f1_score  # For evaluating the model

Loading Data

In [16]:
data = pd.read_csv("data.csv")

In [17]:
data.head()

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Loan_Status
0,5849.0,0.0,0.0,360.0,Yes,1
1,4583.0,1508.0,128.0,360.0,Yes,0
2,3000.0,0.0,66.0,360.0,Yes,1
3,2583.0,2358.0,120.0,360.0,Yes,1
4,6000.0,0.0,141.0,360.0,Yes,1


Separating X and Y from the data

In [18]:
X = data.loc[:, data.columns != "Loan_Status"]  # Features: every column except the target
Y = data.loc[:, data.columns == "Loan_Status"]  # Target: kept as a one-column DataFrame
X

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History
0,5849.0,0.0,0.0,360.0,Yes
1,4583.0,1508.0,128.0,360.0,Yes
2,3000.0,0.0,66.0,360.0,Yes
3,2583.0,2358.0,120.0,360.0,Yes
4,6000.0,0.0,141.0,360.0,Yes
...,...,...,...,...,...
609,2900.0,0.0,71.0,360.0,Yes
610,4106.0,0.0,40.0,180.0,Yes
611,8072.0,240.0,253.0,360.0,Yes
612,7583.0,0.0,187.0,360.0,Yes


Train and test split

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
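What a seeded hold-out split does can be sketched in plain Python: shuffle the row positions with a fixed seed, then set aside a fraction of them for testing. The helper name `holdout_split` is hypothetical, rounding the test size up is an assumption of this sketch, and the shuffle order will not match the one scikit-learn produces.

```python
import math
import random

def holdout_split(n_rows, test_size=0.2, seed=0):
    """Return (train_positions, test_positions) for a shuffled hold-out split."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)   # same seed, same shuffle
    n_test = math.ceil(test_size * n_rows)   # assumption: round the test size up
    return positions[n_test:], positions[:n_test]

train_idx, test_idx = holdout_split(10, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))
```

Fixing the seed, as `random_state=0` does above, keeps the split reproducible across runs, which is why the shuffled row labels in `X_train.head()` stay the same each time.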

In [20]:
X_train.head()

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History
90,2958.0,2900.0,131.0,360.0,Yes
533,11250.0,0.0,196.0,360.0,No
452,3948.0,1733.0,149.0,360.0,No
355,3813.0,0.0,116.0,180.0,Yes
266,4708.0,1387.0,150.0,360.0,Yes


In [21]:
y_train.head()

Unnamed: 0,Loan_Status
90,1
533,0
452,0
355,1
266,1


Creating a CatBoostClassifier

In [22]:
# task_type='GPU' requires a CUDA-capable GPU; use task_type='CPU' otherwise
model = CatBoostClassifier(task_type='GPU', iterations=150,
                           random_state=2000,
                           eval_metric="F1")

Model Fitting

In [23]:
model.fit(X_train, y_train, cat_features=["Credit_History"], plot=True, eval_set=(X_test, y_test))

MetricVisualizer(layout=Layout(align_self='stretch', height='500px'))

Learning rate set to 0.165913
0:	learn: 0.8472222	test: 0.8629442	best: 0.8629442 (0)	total: 9.08ms	remaining: 1.35s
1:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 13.7ms	remaining: 1.01s
2:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 18.7ms	remaining: 915ms
3:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 25.8ms	remaining: 943ms
4:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 32.2ms	remaining: 934ms
5:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 39.3ms	remaining: 943ms
6:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 44.9ms	remaining: 917ms
7:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 51.7ms	remaining: 918ms
8:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 56.3ms	remaining: 882ms
9:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 63.5ms	remaining: 888ms
10:	learn: 0.8377997	test: 0.8617021	best: 0.8629442 (0)	total: 70.5ms	remaining: 891ms
11:	learn: 0

<catboost.core.CatBoostClassifier at 0x7fb36f5db690>
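The `best:` column in the log above reports the highest evaluation-set F1 seen so far, with the iteration it occurred at in parentheses. A minimal sketch of that bookkeeping, using the first four `test:` values from the log (the helper name `track_best` is made up for illustration):

```python
def track_best(metric_values):
    """Return (best_value, best_iteration) for a metric where higher is better."""
    best_value, best_iter = None, None
    for iteration, value in enumerate(metric_values):
        if best_value is None or value > best_value:
            best_value, best_iter = value, iteration
    return best_value, best_iter

# First four "test:" values from the training log
test_f1 = [0.8629442, 0.8617021, 0.8617021, 0.8617021]
best_value, best_iter = track_best(test_f1)
print(f"best: {best_value} ({best_iter})")  # best: 0.8629442 (0)
```

In the visible part of the log the best score stays at iteration 0, meaning later trees have not improved F1 on the evaluation set.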

Predicting the values for X_test

In [32]:
y_pred = model.predict(X_test)


Evaluating the model using f1_score and accuracy_score

In [33]:
f1_score(y_test, y_pred)


0.8736842105263158

In [34]:
accuracy_score(y_test,y_pred)

0.8048780487804879
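To see how the two scores relate, the sketch below computes accuracy and F1 by hand from the confusion counts of a binary problem. The labels are toy values, not the loan data, and the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy and F1 for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1

# Toy labels (made up)
y_true_toy = [1, 1, 1, 0, 0, 1]
y_pred_toy = [1, 1, 0, 0, 1, 1]
accuracy, f1 = binary_metrics(y_true_toy, y_pred_toy)
print(accuracy, f1)
```

F1 does not count true negatives, while accuracy does, which is one reason the two scores above can differ on the same predictions.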