![Neptune + scikit-learn](https://neptune.ai/wp-content/uploads/2023/09/sklearn-1.svg)

# Neptune + scikit-learn

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/integrations-and-supported-tools/sklearn/notebooks/Neptune_Scikit_learn.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/integrations-and-supported-tools/sklearn/notebooks/Neptune_Scikit_learn.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://app.neptune.ai/o/common/org/sklearn-integration/runs/table?viewId=9b015358-59bd-4c02-a020-3426f5a8f09e"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><a target="_blank" href="https://docs.neptune.ai/integrations/sklearn/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

Neptune helps you keep track of your machine learning runs. If you use scikit-learn, you can add tracking with only a few lines of code.

This quickstart shows you how to log the following scikit-learn summaries, each with a single function call:

* regression summary,
* classification summary,
* KMeans clustering summary.

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
%pip install -U neptune[sklearn] scikit-learn matplotlib_inline

In [None]:
# Works around the intermittent "RuntimeError: main thread is not in main loop" on Windows with Python 3.8
import matplotlib.pyplot as plt

plt.switch_backend("agg")

## Scikit-learn regression

### Create and fit random forest regressor

Define the regressor parameters, which will later be passed to Neptune.

In [None]:
parameters = {"n_estimators": 100, "max_depth": 5, "min_samples_split": 5}

Create and fit the regressor.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rfr = RandomForestRegressor(**parameters)

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

rfr.fit(X_train, y_train)

### Initialize Neptune

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in a public project. **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

### Log to your own project instead

Replace the code below with the following:

```python
import neptune
from getpass import getpass

run = neptune.init_run(
    project="workspace-name/project-name",  # replace with your own (see instructions below)
    api_token=getpass("Enter your Neptune API token: "),
    name="regression-example",
    tags=["RandomForestRegressor", "regression"],
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app. To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.
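
As an alternative to typing the token each time, Neptune can also pick up credentials from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables. The sketch below only illustrates the lookup logic. The helper name `neptune_credentials` is hypothetical and not part of the Neptune API.

```python
import os


def neptune_credentials(env=None):
    """Collect init_run() keyword arguments from environment variables.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    env = os.environ if env is None else env
    kwargs = {}
    # Only pass values that are actually set, so that anything missing
    # can still be supplied explicitly by the caller.
    if env.get("NEPTUNE_PROJECT"):
        kwargs["project"] = env["NEPTUNE_PROJECT"]
    if env.get("NEPTUNE_API_TOKEN"):
        kwargs["api_token"] = env["NEPTUNE_API_TOKEN"]
    return kwargs


# Example with an explicit mapping instead of the real environment.
example_env = {"NEPTUNE_PROJECT": "workspace-name/project-name"}
print(neptune_credentials(example_env))
```

With both variables set in your environment, you could then call `neptune.init_run(**neptune_credentials())` without hard-coding either value in the notebook.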

In [None]:
import neptune

run = neptune.init_run(
    project="common/sklearn-integration",
    api_token=neptune.ANONYMOUS_API_TOKEN,
    name="regression-example",
    tags=["RandomForestRegressor", "regression"],
)

**To open the run in the Neptune web app, click the link that appeared in the cell output.**

We'll use the `run` object we just created to log metadata. You'll see the metadata appear in the app.

### Log regressor summary

In [None]:
import neptune.integrations.sklearn as npt_utils

run["rfr_summary"] = npt_utils.create_regressor_summary(rfr, X_train, X_test, y_train, y_test)

You just logged information about the regressor, including:

* [logged regressor parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Fall_params),
* [logged pickled model](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2F&attribute=pickled_model),
* [logged test predictions](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Ftest&attribute=preds),
* [logged test scores](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Ftest%2Fscores),
* [logged regressor visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Fdiagnostics_charts&attribute=feature_importance),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/source-code?file=main.py).
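
To get a feel for the kind of values that appear under the test scores, you can compute a few common regression metrics yourself with scikit-learn. This is only a sketch: it uses synthetic data from `make_regression` instead of the California housing dataset, and the choice of metrics is an assumption, so check the run in the app for the scores that are actually logged.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, so the sketch runs without downloading anything.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

model = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Predictions on the held-out split, and two scores computed from them.
y_pred = model.predict(X_test)
scores = {
    "r2": r2_score(y_test, y_pred),
    "mean_absolute_error": mean_absolute_error(y_test, y_pred),
}
print(scores)
```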

### Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, you should stop tracking the run using the `stop()` method.
This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.

In [None]:
run.stop()
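
One way to make sure `stop()` is always called, even if a logging step raises an exception, is to wrap the logging code in `try`/`finally`. The sketch below uses a hypothetical stand-in class, `DummyRun`, in place of a real Neptune run so that it works offline. With a real run, you would create the object with `neptune.init_run(...)` instead.

```python
class DummyRun:
    """Hypothetical stand-in for a Neptune run; stores values in a dict."""

    def __init__(self):
        self.data = {}
        self.stopped = False

    def __setitem__(self, key, value):
        self.data[key] = value

    def stop(self):
        self.stopped = True


run = DummyRun()
try:
    run["parameters/n_estimators"] = 100
    # Simulate a failure in the middle of logging.
    raise ValueError("something went wrong while logging")
except ValueError as error:
    print(f"Logging failed: {error}")
finally:
    # Runs whether or not the code above raised.
    run.stop()

print(run.stopped)
```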

### Explore results

You just learned how to log a scikit-learn regression summary to Neptune using a single function.

Click the link printed in the cell output, or [go here](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all) to explore a run similar to yours. In particular, check:

* [logged regressor parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Fall_params),
* [logged pickled model](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2F&attribute=pickled_model),
* [logged test predictions](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Ftest&attribute=preds),
* [logged test scores](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Ftest%2Fscores),
* [logged regressor visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=rfr_summary%2Fdiagnostics_charts&attribute=feature_importance),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-92/source-code?file=main.py).

## Scikit-learn classification

### Create and fit gradient boosting classifier

Define the classifier parameters, which will later be passed to Neptune.

In [None]:
parameters = {
    "n_estimators": 80,
    "learning_rate": 0.1,
    "min_samples_split": 5,
    "min_samples_leaf": 5,
}

Create and fit the classifier.

In [None]:
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

gbc.fit(X_train, y_train)

### Initialize Neptune

Connect your script to the Neptune application and create a new run.

In [None]:
import neptune

run = neptune.init_run(
    project="common/sklearn-integration",
    api_token=neptune.ANONYMOUS_API_TOKEN,
    name="classification-example",
    tags=["GradientBoostingClassifier", "classification"],
)

Click the link above to open this run in Neptune. It is empty for now, but keep the tab open to see what happens next.

You tell Neptune:

* **who you are**: your Neptune API token `api_token`
* **where you want to send your data**: your Neptune `project`.

At this point you have a new run in Neptune. From now on, you will use `run` to log metadata to it.

---

**Note**


Instead of logging data to the public project `common/sklearn-integration` as the anonymous user `neptuner`, you can log it to your own project.

To do that:

1. Get your [Neptune API token](https://docs.neptune.ai/setup/setting_api_token/).
2. Pass the token to the `api_token` argument of `neptune.init_run()`: `api_token=YOUR_API_TOKEN`.
3. Pass your project to the `project` argument of `neptune.init_run()`.

For example:

```python
run = neptune.init_run(
    project="YOUR_WORKSPACE/YOUR_PROJECT",
    api_token="YOUR_API_TOKEN",
)
```

### Log classifier summary

In [None]:
import neptune.integrations.sklearn as npt_utils

run["cls_summary"] = npt_utils.create_classifier_summary(gbc, X_train, X_test, y_train, y_test)

You just logged information about the classifier, including:

* [logged classifier parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Fall_params),
* [logged pickled model](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2F&attribute=pickled_model),
* [logged test predictions](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest&attribute=preds),
* [logged test predictions probabilities](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest&attribute=preds_proba),
* [logged test scores](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest%2Fscores%2F),
* [logged classifier visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Fdiagnostics_charts&attribute=class_prediction_error),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/source-code?file=main.py&filePath=integrations%2Fsklearn%2F).
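
The test predictions and prediction probabilities in the list above are the kind of arrays that scikit-learn's `predict()` and `predict_proba()` return for the test split. A minimal sketch, assuming a smaller `n_estimators` than in the example so that it runs quickly (the variable names are chosen for this illustration):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

# Fewer trees than in the example above, to keep the sketch fast.
clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)  # one class label per test sample
preds_proba = clf.predict_proba(X_test)  # one probability per class, per sample

print(preds.shape, preds_proba.shape)
# Each row of probabilities sums to 1.
print(np.allclose(preds_proba.sum(axis=1), 1.0))
```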

### Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, you should stop tracking the run using the `stop()` method.
This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.

In [None]:
run.stop()

### Explore results

You just learned how to log a scikit-learn classification summary to Neptune using a single function.

Click the link printed in the cell output, or [go here](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all) to explore a run similar to yours. In particular, check:

* [logged classifier parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Fall_params),
* [logged pickled model](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2F&attribute=pickled_model),
* [logged test predictions](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest&attribute=preds),
* [logged test predictions probabilities](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest&attribute=preds_proba),
* [logged test scores](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Ftest%2Fscores%2F),
* [logged classifier visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=cls_summary%2Fdiagnostics_charts&attribute=class_prediction_error),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-95/source-code?file=main.py&filePath=integrations%2Fsklearn%2F).

## Scikit-learn KMeans clustering

### Create KMeans object and example data

Define the KMeans clustering parameters, which will later be passed to Neptune.

In [None]:
parameters = {"n_init": 12, "max_iter": 250}

Create the KMeans model and generate example data.

In [None]:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(**parameters)

X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

### Initialize Neptune

Connect your script to the Neptune application and create a new run.

In [None]:
import neptune

run = neptune.init_run(
    project="common/sklearn-integration",
    api_token=neptune.ANONYMOUS_API_TOKEN,
    name="clustering-example",
    tags=["KMeans", "clustering"],
)

Click the link above to open this run in Neptune. It is empty for now, but keep the tab open to see what happens next.

You tell Neptune:

* **who you are**: your Neptune API token `api_token`
* **where you want to send your data**: your Neptune `project`.

At this point you have a new run in Neptune. From now on, you will use `run` to log metadata to it.

---

**Note**


Instead of logging data to the public project `common/sklearn-integration` as the anonymous user `neptuner`, you can log it to your own project.

To do that:

1. Get your [Neptune API token](https://docs.neptune.ai/setup/setting_api_token/).
2. Pass the token to the `api_token` argument of `neptune.init_run()`: `api_token=YOUR_API_TOKEN`.
3. Pass your project to the `project` argument of `neptune.init_run()`.

For example:

```python
run = neptune.init_run(
    project="YOUR_WORKSPACE/YOUR_PROJECT",
    api_token="YOUR_API_TOKEN",
)
```

### Log KMeans clustering summary

In [None]:
import neptune.integrations.sklearn as npt_utils

run["kmeans_summary"] = npt_utils.create_kmeans_summary(km, X, n_clusters=17)

You just logged information about the KMeans clustering, including:

* [logged KMeans parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2Fall_params),
* [logged cluster labels](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2F&attribute=cluster_labels),
* [logged KMeans clustering visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2Fdiagnostics_charts&attribute=silhouette),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/source-code?file=main.py&filePath=integrations%2Fsklearn%2F).
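
To see what cluster labels and a silhouette score look like, you can fit a KMeans model directly and compute both with scikit-learn. This is a rough sketch: `n_clusters=7` is chosen here to match the number of generated blob centers and is not taken from the example above, and `random_state=0` is added only to make the sketch repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

# n_clusters=7 matches the number of blob centers (an illustrative choice).
km = KMeans(n_clusters=7, n_init=12, max_iter=250, random_state=0)
labels = km.fit_predict(X)  # one cluster index per sample

# Mean silhouette coefficient: closer to 1 means better-separated clusters.
score = silhouette_score(X, labels)
print(len(labels), round(float(score), 3))
```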

### Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, you should stop tracking the run using the `stop()` method.
This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.

In [None]:
run.stop()

### Explore results

You just learned how to log a scikit-learn KMeans clustering summary to Neptune using a single function.

Click the link printed in the cell output, or [go here](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all) to explore a run similar to yours. In particular, check:

* [logged KMeans parameters](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2Fall_params),
* [logged cluster labels](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2F&attribute=cluster_labels),
* [logged KMeans clustering visualizations](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=kmeans_summary%2Fdiagnostics_charts&attribute=silhouette),
* [logged metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/all?path=sys),
* [logged code and git metadata](https://app.neptune.ai/o/common/org/sklearn-integration/e/SKLEAR-96/source-code?file=main.py&filePath=integrations%2Fsklearn%2F).

## Other logging options

The Neptune-scikit-learn integration also lets you log only specific metadata of your choice, using additional functions.

Below are a few examples. For the full example, see the [scikit-learn integration documentation](https://docs.neptune.ai/integrations-and-supported-tools/model-training/sklearn).

### Before you start: create and fit random forest classifier

In [None]:
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rfc = RandomForestClassifier()

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

rfc.fit(X_train, y_train)

### Import scikit-learn integration and create a run

In [None]:
import neptune
import neptune.integrations.sklearn as npt_utils

run = neptune.init_run(
    project="common/sklearn-integration",
    api_token=neptune.ANONYMOUS_API_TOKEN,
    name="other-options",
)

Open the link above to see the logged metadata appear as we add it below.

### Log estimator parameters

In [None]:
from neptune.utils import stringify_unsupported

run["estimator/parameters"] = stringify_unsupported(npt_utils.get_estimator_params(rfc))
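
Estimator parameters often include values such as `None`, which is why the cell above wraps the dictionary in `stringify_unsupported()`. The sketch below is a hand-rolled illustration of the idea, using scikit-learn's `get_params()`. It is not Neptune's implementation, and the set of types treated as supported is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
params = rfc.get_params()

# Some default values, such as max_depth, are None.
print(params["max_depth"])

# Illustration only: turn anything that is not a plain bool, int, float,
# or str into its string representation.
supported = (bool, int, float, str)
as_strings = {
    name: value if isinstance(value, supported) else str(value)
    for name, value in params.items()
}
print(as_strings["max_depth"], as_strings["n_estimators"])
```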

### Log model

In [None]:
run["estimator/pickled-model"] = npt_utils.get_pickled_model(rfc)
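
A pickled model can later be loaded back and used for prediction. The following sketch shows the round trip with the standard `pickle` module on in-memory bytes. Downloading the file from Neptune is not covered here, and the model settings are chosen only for this illustration.

```python
import pickle

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Serialize the fitted model to bytes, then restore it.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The restored model gives the same predictions as the original.
print(np.array_equal(model.predict(X), restored.predict(X)))
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.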

### Log confusion matrix

In [None]:
run["confusion-matrix"] = npt_utils.create_confusion_matrix_chart(
    rfc, X_train, X_test, y_train, y_test
)
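
If you also want the raw numbers behind a confusion matrix chart, scikit-learn's `confusion_matrix` computes them. A small sketch on the same digits data, with variable names and model settings chosen for this example:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, clf.predict(X_test))
print(matrix.shape)

# Correct predictions are on the diagonal.
print(matrix.trace(), "of", matrix.sum(), "test samples classified correctly")
```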

### Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, you should stop tracking the run using the `stop()` method.
This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.

In [None]:
run.stop()