![Neptune + CatBoost](https://neptune.ai/wp-content/uploads/2023/09/catboost.svg)

# Neptune + CatBoost

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/integrations-and-supported-tools/catboost/notebooks/Neptune_CatBoost.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/integrations-and-supported-tools/catboost/notebooks/Neptune_CatBoost.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://app.neptune.ai/o/common/org/catboost-support/runs/details?viewId=standard-view&detailsTab=dashboard&dashboardId=Overview-99f571df-0fec-4447-9ffe-5a4c668577cd&shortId=CAT-2"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><a target="_blank" href="https://docs.neptune.ai/integrations/catboost/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

[CatBoost](https://catboost.ai/) is a high-performance open-source library for gradient boosting on decision trees.  
This guide shows you how to:

* Upload experiment datasets
* Upload CatBoost model parameters and attributes
* Upload training results to Neptune

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
%pip install -U catboost neptune ipython ipywidgets scikit-learn

## (Neptune) Start a run

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/catboost-support](https://app.neptune.ai/o/common/org/catboost-support). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

### Log to your own project instead

To log to your own project, replace the default code cell at the end of this section with the following:

```python
import neptune
from getpass import getpass

run = neptune.init_run(
    project="workspace-name/project-name",  # replace with your own (see instructions below)
    api_token=getpass("Enter your Neptune API token: "),
    tags=["catboost", "classifier", "notebook"],  # (optional) replace with your own
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app.

    To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.
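
If you'd rather not paste credentials into the notebook, Neptune also reads them from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables when the `api_token` and `project` arguments are omitted. The sketch below shows one way to check for them before starting a run. The helper names `missing_neptune_vars` and `start_run_from_env` are made up for this example and are not part of the Neptune API.

```python
import os

# Environment variables that Neptune reads when `api_token` and `project`
# are not passed to `neptune.init_run()`.
NEPTUNE_ENV_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(env=None):
    """Return the names of Neptune variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in NEPTUNE_ENV_VARS if not env.get(name)]


def start_run_from_env(**init_kwargs):
    """Hypothetical helper: start a run with credentials taken from the environment."""
    missing = missing_neptune_vars()
    if missing:
        raise RuntimeError(
            "Set these environment variables first: " + ", ".join(missing)
        )

    import neptune  # imported lazily so the check above works without Neptune installed

    return neptune.init_run(**init_kwargs)
```

With both variables set, `run = start_run_from_env(tags=["catboost", "classifier", "notebook"])` would create the run without any credentials appearing in the notebook.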

In [None]:
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,  # Replace with your own
    project="common/catboost-support",  # Replace with your own
    capture_hardware_metrics=True,  # This is turned off by default in Notebooks
    tags=["catboost", "classifier", "notebook"],  # (optional) use your own
)

**To open the run in the Neptune web app, click the link that appeared in the cell output.**

We'll use the `run` object we just created to log metadata. You'll see the metadata appear in the app.

## Load data

In [None]:
from catboost.datasets import titanic

titanic_train, titanic_test = titanic()

titanic_train.head(3)

### (Neptune) Upload raw data
You can upload a pandas dataframe directly to Neptune as an HTML file. [Learn more in the docs &rarr;](https://docs.neptune.ai/tools/pandas/)

In [None]:
from neptune.types import File

run["data/raw/train"].upload(File.as_html(titanic_train))
run["data/raw/test"].upload(File.as_html(titanic_test))

### Preprocess data

In [None]:
titanic_train.isna().sum()

In [None]:
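# Assign back rather than using inplace=True on a selected column,
# which no longer updates the DataFrame under pandas copy-on-write
titanic_train["Age"] = titanic_train["Age"].fillna(titanic_train["Age"].median())
titanic_train["Cabin"] = titanic_train["Cabin"].fillna("")
titanic_train["Embarked"] = titanic_train["Embarked"].fillna(titanic_train["Embarked"].mode()[0])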
titanic_train.isna().sum()

In [None]:
titanic_test.isna().sum()

In [None]:
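# Assign back instead of calling fillna(inplace=True) on a selected column
titanic_test["Age"] = titanic_test["Age"].fillna(titanic_test["Age"].median())
titanic_test["Fare"] = titanic_test["Fare"].fillna(titanic_test["Fare"].median())
titanic_test["Cabin"] = titanic_test["Cabin"].fillna("")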
titanic_test.isna().sum()

In [None]:
label = ["Survived"]
cat_features = ["Sex", "Embarked"]
text_features = ["Name", "Ticket", "Cabin"]

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_eval, y_train, y_eval = train_test_split(
    titanic_train.drop(columns=label + ["PassengerId"]),
    titanic_train[label],
    test_size=0.25,
    shuffle=True,
)

### (Neptune) Upload processed data

In [None]:
run["data/processed/train"].upload(File.as_html(titanic_train))
run["data/processed/test"].upload(File.as_html(titanic_test))

## Train a CatBoost model

In [None]:
from catboost import CatBoostClassifier

model = CatBoostClassifier()

plot_file = "training_plot.html"

model.fit(
    X=X_train,
    y=y_train,
    eval_set=(X_eval, y_eval),
    cat_features=cat_features,
    text_features=text_features,
    plot=True,
    plot_file=plot_file,
    use_best_model=True,
)

### (Neptune) Upload training results

#### Upload training plot
Upload the HTML file that CatBoost saved to `plot_file` during training. Neptune displays it as an interactive plot.

In [None]:
run["training/plot"].upload(plot_file)

#### Upload training metrics

In [None]:
from neptune.utils import stringify_unsupported

run["training/best_score"] = stringify_unsupported(model.get_best_score())
run["training/best_iteration"] = stringify_unsupported(model.get_best_iteration())
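
The best score is a single summary value per metric. If you also want the per-iteration learning curves as charts, one option is to read them from `model.get_evals_result()` after training and append each list of values to a series field. The sketch below assumes the result is a dictionary shaped like `{dataset: {metric: [values]}}` and that the run's series fields accept `extend()`. `iter_metric_series` and `log_evals_result` are hypothetical helper names, not part of Neptune or CatBoost.

```python
def iter_metric_series(evals_result, prefix="training/metrics"):
    """Yield (field_path, values) pairs from a {dataset: {metric: [values]}} dict."""
    for dataset, metrics in evals_result.items():
        for metric, values in metrics.items():
            # Convert to plain floats so NumPy scalars are also accepted
            yield f"{prefix}/{dataset}/{metric}", [float(v) for v in values]


def log_evals_result(run, evals_result, prefix="training/metrics"):
    """Append each metric history to a series field on `run`."""
    for path, values in iter_metric_series(evals_result, prefix):
        run[path].extend(values)
```

In this notebook, the call would be `log_evals_result(run, model.get_evals_result())`, which should produce one series per dataset and metric, for example `training/metrics/validation/Logloss`.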

## Make predictions

In [None]:
titanic_test["prediction"] = model.predict(
    data=titanic_test.drop(columns=["PassengerId"]),
    prediction_type="Class",
)
titanic_test

### (Neptune) Upload predictions
You can upload a CSV file to Neptune and view it as an interactive table.

In [None]:
titanic_test.to_csv("results.csv", index=False)

run["data/results"].upload("results.csv")

## (Neptune) Upload model metadata

### Upload model binary

In [None]:
model.save_model("model.cbm")

run["model/binary"].upload("model.cbm")

### Upload model attributes

In [None]:
run["model/attributes/tree_count"] = model.tree_count_
run["model/attributes/feature_importances"] = dict(
    zip(model.feature_names_, model.get_feature_importance())
)
run["model/attributes/probability_threshold"] = model.get_probability_threshold()
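
A dictionary of importances is easier to read once it is ranked. As a minimal sketch, the hypothetical helper `rank_features` below pairs names with scores, sorts them in descending order, and optionally keeps only the top `k`. It works on plain Python sequences, and the sample names and scores are illustrative only.

```python
def rank_features(names, importances, k=None):
    """Return (name, importance) pairs sorted from most to least important."""
    pairs = sorted(
        zip(names, (float(score) for score in importances)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return pairs if k is None else pairs[:k]


# Illustrative values only, not results from the model above.
print(rank_features(["Sex", "Age", "Fare"], [40.0, 15.5, 22.0], k=2))
# [('Sex', 40.0), ('Fare', 22.0)]
```

To log the ranked subset, you could assign `dict(rank_features(model.feature_names_, model.get_feature_importance(), k=5))` to a field such as `run["model/attributes/top_features"]`; that field name is only a suggestion.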

### Upload model parameters

In [None]:
run["model/parameters"] = stringify_unsupported(model.get_all_params())

## Stop logging

Once you are done logging, stop tracking the run.

In [None]:
run.stop()
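
If a cell raises an error before `run.stop()` executes, the run may stay open until the kernel shuts down. One way to guard against that is to wrap the logging code so that `stop()` is always called. The sketch below uses a hypothetical `tracked_run` helper built on `contextlib`. Here `start_run` stands for any zero-argument callable that returns a run object with a `stop()` method, such as a lambda wrapping `neptune.init_run`.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(start_run):
    """Yield a run and call its stop() method even if the body raises."""
    run = start_run()
    try:
        yield run
    finally:
        run.stop()
```

In this notebook, the logging code would then sit inside a `with tracked_run(...) as run:` block, with the `neptune.init_run` call from the "Start a run" section wrapped in a lambda. Neptune run objects can, as far as I know, also be used directly in a `with` statement, which has the same effect.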

## Analyze run in the Neptune app
Follow the run link in the above cell output and explore the logged metadata.  
You can also explore this [example run](https://app.neptune.ai/o/common/org/catboost-support/runs/details?viewId=standard-view&detailsTab=dashboard&dashboardId=Overview-99f571df-0fec-4447-9ffe-5a4c668577cd&shortId=CAT-2).