# What is CatBoost?

- CatBoost is an open-source algorithm, and a library of the same name, for gradient boosting on decision trees.
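
To make "gradient boosting on decision trees" concrete, here is a minimal pure-Python sketch of the general idea for squared-error loss, using depth-1 trees (stumps) on one-dimensional toy data. This is not CatBoost's actual algorithm, which adds techniques such as symmetric trees and ordered boosting. The helper names (`fit_stump`, `boost`) and the data are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split that best fits the residuals (least squares).

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data: a noisy step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps, boosted_preds = boost(xs, ys)
baseline_mse = mse(ys, [base] * len(ys))
boosted_mse = mse(ys, boosted_preds)
print(f"MSE with mean only: {baseline_mse:.4f}, after boosting: {boosted_mse:.4f}")
```

Each stump corrects part of the error left by the stumps before it, and the learning rate controls how much of each correction is applied.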


### Functions in the CatBoost library

- Models 
    - CatBoostClassifier
    - CatBoostRegressor
- Utilities (`catboost.utils`)
    - eval_metric
    - get_confusion_matrix
    - get_gpu_device_count(): returns 0 if the installed or compiled package does not support training on GPU.
    - quantize
    - select_threshold
- Model analysis
    - Feature importance
    - Feature analysis charts
    - Feature interaction

## How to Install

In [None]:
!pip -q install catboost

# Install visualization tools.
# The ipywidgets Python package is required (version 7.x or higher):
!pip -q install ipywidgets

# Turn on the widgets extension:
!jupyter nbextension enable --py widgetsnbextension


On Kaggle, all of these modules come pre-installed by default.

In [None]:
import catboost as cb
from catboost.datasets import titanic
import numpy as np
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import pandas as pd
import seaborn as sns

sns.set(rc = {'figure.figsize': (14,8)})

# Tabular Playground Series (TPS) June 2021 dataset

In [None]:
df = pd.read_csv('../input/tabular-playground-series-jun-2021/train.csv')

train_df, eval_df = train_test_split(df, train_size=0.75, random_state=42)

train_df.head()
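
For a multiclass target, it can help to keep the class proportions identical in both splits. The sketch below shows one way to do this with the `stratify` argument of `train_test_split`. The toy DataFrame, its column `feature_0`, and the class counts are invented for illustration and are not the competition data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with an imbalanced three-class target.
toy_df = pd.DataFrame({
    "id": range(40),
    "feature_0": [i % 7 for i in range(40)],
    "target": ["Class_1"] * 20 + ["Class_2"] * 12 + ["Class_3"] * 8,
})

# stratify keeps the 50% / 30% / 20% class mix in both parts.
toy_train, toy_eval = train_test_split(
    toy_df, train_size=0.75, random_state=42, stratify=toy_df["target"]
)

print(toy_train["target"].value_counts().to_dict())
print(toy_eval["target"].value_counts().to_dict())
```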

In [None]:
x_train = train_df.drop(['id', 'target'], axis=1)
y_train = train_df['target']

x_eval = eval_df.drop(['id', 'target'], axis=1)
y_eval = eval_df['target']

In [None]:
# Model Creation

params = {
    'iterations': 50,
    #'learning_rate': 0.1,
    'random_seed': 42,
    'loss_function':'MultiClass',
    'logging_level': 'Silent',
    'use_best_model': True
}

model = CatBoostClassifier(**params)



In [None]:
model.fit(
    x_train, y_train,
    eval_set=(x_eval, y_eval),
    # logging_level='Verbose',  # you can uncomment this for text output
    plot=True);

# Make Predictions

In [None]:
test_df = pd.read_csv('../input/tabular-playground-series-jun-2021/test.csv')
x_test = test_df.drop(['id'], axis=1)
x_test.head()

In [None]:
predictions_probs = model.predict_proba(x_test)
predictions_probs[:10]

In [None]:
sub = pd.DataFrame(predictions_probs)

sample_df = pd.read_csv('../input/tabular-playground-series-jun-2021/sample_submission.csv')
# Assumes the probability columns follow the same class order as the sample submission.
sample_df[sample_df.columns[1:]] = sub

sample_df.to_csv('submission.csv', index=False)
sample_df.head()
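
A submission frame can also be built with explicit column names, so the class order does not rely on position alone. The sketch below uses made-up ids, class names, and probabilities. In the notebook, the names would come from `model.classes_` and the probabilities from `predict_proba`.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for test ids, model.classes_, and predict_proba output.
ids = [200000, 200001, 200002]
class_names = ["Class_1", "Class_2", "Class_3"]
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])

# Name each probability column after its class, then put the id column first.
submission = pd.DataFrame(probs, columns=class_names)
submission.insert(0, "id", ids)

print(submission)
```

Naming the columns explicitly makes a mismatch between the model's class order and the submission header easier to notice.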

In [None]:
# Predict class labels
predictions = model.predict(x_test)
print(predictions[:10])

# Feature Importances

In [None]:
validate_pool = Pool(x_eval, y_eval)

In [None]:
# Refit a small model on the validation pool and compute per-feature importances.
model = CatBoostClassifier(iterations=50, random_seed=42, logging_level='Silent').fit(validate_pool)
feature_importances = model.get_feature_importance(validate_pool)

# Histogram of the importance values (one bin per feature).
sns.histplot(x=feature_importances, bins=len(feature_importances), legend=True, kde=True)

# Plotting Tree

In [None]:
model.plot_tree(tree_idx=0)

# Model saving and loading

In [None]:
# Model saving
train_pool = Pool(x_train, y_train)  # build the training pool from the training split
model = CatBoostClassifier(iterations=10, random_seed=42, logging_level='Silent').fit(train_pool)
model.save_model('catboost_model.dump')

# Model loading
model = CatBoostClassifier()
model.load_model('catboost_model.dump');

# Topics for future notebook
- Integrating TensorBoard