# Tour with Scikit-learn

## Install dependencies

Before we start, make sure that you have all dependencies installed.

In [None]:
! pip install --quiet scikit-learn==0.24.1 neptune-client==0.5.0 neptune-contrib[monitoring]==0.26.0

In [None]:
! pip install --quiet --upgrade scikit-learn neptune-client neptune-contrib[monitoring]

## Introduction

This tour shows you how to start using Neptune and Scikit-learn together. In the following sections you will learn Neptune's basics through a common Scikit-learn task: classification.

In this tour you will learn:

* how to set a project and create an experiment in Neptune,
* how to log sklearn model parameters and scores,
* how to automatically log sklearn training metadata using Neptune's integrations with Scikit-learn,
* where to explore the results.

## Logging Scikit-learn classifier meta-data to Neptune

### Basic example

Define the classifier parameters that will later be passed to Neptune.

In [None]:
parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

Create and fit the classifier. We will use it a few times in this tour.

In [None]:
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

gbc.fit(X_train, y_train)
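The `**parameters` unpacking above sets only four of the estimator's settings; every other setting keeps its scikit-learn default. The sketch below is an illustration of that split, using an unfitted copy named `clf` (a name introduced here, not part of the tour) so it runs instantly.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Same dictionary as in the tour.
parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

# An unfitted estimator is enough to inspect its configuration.
clf = GradientBoostingClassifier(**parameters)
all_params = clf.get_params()

# Settings we passed explicitly versus settings left at their defaults.
overridden = {name: all_params[name] for name in parameters}
untouched = sorted(set(all_params) - set(parameters))

print('overridden:', overridden)
print('left at defaults:', untouched)
```

This distinction matters later in the tour: the check cells compare the experiment's parameters against the keys of `parameters` and the experiment's properties against the keys of `gbc.get_params()`.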

Once the classifier is fitted, we can create a Neptune experiment and log:

* model parameters,
* scores on the test set.

#### Initialize Neptune

In [None]:
import neptune

neptune.init('shared/sklearn-integration', api_token='ANONYMOUS')

Neptune gives you an option of logging data under a public folder as an anonymous user. This is great when you are just trying out the application and don't have a Neptune account yet.

If you already have a [Neptune account](https://neptune.ai/register), you can create your own experiment and start logging to it using your personal API token. Include your username in the `project_qualified_name` argument of the `neptune.init()` method: `project_qualified_name='YOUR_USERNAME/YOUR_PROJECT_NAME'`. If you don't have a project yet, use `YOUR_USERNAME/sandbox`. The `sandbox` project is automatically created for you.
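One way to switch between anonymous and personal logging without editing the cell is sketched below. The helper `neptune_credentials` is hypothetical, not part of the Neptune client, and the environment variable name `NEPTUNE_API_TOKEN` is an assumption about where you keep your token; adjust both to your setup.

```python
import os

SHARED_PROJECT = 'shared/sklearn-integration'  # public project used in this tour


def neptune_credentials(username=None, project='sandbox',
                        env_var='NEPTUNE_API_TOKEN'):
    """Return (project_qualified_name, api_token) for neptune.init().

    Hypothetical helper: a personal project is used only when both a
    username and a token (read from the environment variable `env_var`)
    are available; otherwise it falls back to the shared project with
    the anonymous token.
    """
    token = os.environ.get(env_var)
    if username and token:
        return '{}/{}'.format(username, project), token
    return SHARED_PROJECT, 'ANONYMOUS'


project_name, api_token = neptune_credentials(username='YOUR_USERNAME')
# In the notebook, these values would then be passed on:
# neptune.init(project_qualified_name=project_name, api_token=api_token)
```

Reading the token from the environment keeps it out of the notebook, which helps if the notebook is shared or committed to version control.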

#### Create an experiment and log classifier parameters

The cell below creates an experiment in Neptune.

Once you have a live experiment you can log things to it. Here you also pass the `parameters` dictionary defined earlier.

In [None]:
neptune.create_experiment(params=parameters,
                          name='classification-example',
                          tags=['GradientBoostingClassifier', 'classification'])

Click on the link printed in the cell output above to open this experiment in Neptune.

For now it is empty, but keep the tab with the experiment open to see what happens next.

#### Log scores on test data to Neptune

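Here, we use Neptune's basic method, `log_metric()`, which logs numeric values to the experiment. Note that `max_error`, `mean_absolute_error`, and `r2_score` are regression metrics; they are applied to the integer class labels here only to show how numeric values are logged.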

In [None]:
from sklearn.metrics import max_error, mean_absolute_error, r2_score

y_pred = gbc.predict(X_test)

neptune.log_metric('max_error', max_error(y_test, y_pred))
neptune.log_metric('mean_absolute_error', mean_absolute_error(y_test, y_pred))
neptune.log_metric('r2_score', r2_score(y_test, y_pred))
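For a classifier, scores such as accuracy or macro-averaged F1 are usually more informative. The sketch below shows one possible way to compute them and hand them to any logging function that accepts `(name, value)`. The helpers `classification_scores` and `log_scores` and the stand-in labels are illustrative names introduced here, not part of Neptune or of this tour's code.

```python
from sklearn.metrics import accuracy_score, f1_score


def classification_scores(y_true, y_pred):
    """Return classification metrics as a name -> float dictionary."""
    return {
        'accuracy': float(accuracy_score(y_true, y_pred)),
        'f1_macro': float(f1_score(y_true, y_pred, average='macro')),
    }


def log_scores(scores, log_fn):
    """Send every score to log_fn(name, value)."""
    for name, value in scores.items():
        log_fn(name, value)


# Stand-in labels so the sketch runs without a fitted model;
# in the notebook, use y_test and gbc.predict(X_test) instead.
y_true_demo = [0, 1, 2, 2, 1, 0]
y_pred_demo = [0, 1, 2, 1, 1, 0]

logged = []  # records calls in place of a live Neptune experiment
log_scores(classification_scores(y_true_demo, y_pred_demo),
           lambda name, value: logged.append((name, value)))
print(logged)
```

In the notebook you could pass `neptune.log_metric` as `log_fn`. Keep in mind that doing so adds log names beyond the three that the check cell further down expects.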

In [None]:
# tests
exp = neptune.get_experiment()

#### Stop Neptune experiment after logging scores

The method below is necessary only for notebook users. In Python scripts, the experiment is closed automatically when the script finishes.

In [None]:
neptune.stop()
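In a notebook it is easy to forget the stop call when an earlier cell fails. A small context manager can make the cleanup automatic. This is only a sketch: `stop_when_done` is a hypothetical helper, and a recording function stands in for `neptune.stop` so the example runs without a live experiment.

```python
from contextlib import contextmanager


@contextmanager
def stop_when_done(stop_fn):
    """Run the body of the with-block, then call stop_fn() even on errors."""
    try:
        yield
    finally:
        stop_fn()


calls = []  # stand-in record of what happened, in order

with stop_when_done(lambda: calls.append('stopped')):
    calls.append('logging')  # logging calls would go here

print(calls)
```

With a live experiment, the same pattern would read `with stop_when_done(neptune.stop): ...`.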

In [None]:
# tests
# check logs
correct_logs_set = {'max_error', 'mean_absolute_error', 'r2_score'}
from_exp_logs = set(exp.get_logs().keys())
assert correct_logs_set == from_exp_logs, '{} - incorrect logs'.format(exp)

# check parameters
assert set(exp.get_parameters().keys()) == set(parameters.keys()), '{} parameters do not match'.format(exp)

### Basic example: summary

Now, go back to the previously opened browser tab with your experiment to see tracked [parameters](https://ui.neptune.ai/shared/sklearn-integration/e/SKLEARN-5281/parameters) and [scores](https://ui.neptune.ai/shared/sklearn-integration/e/SKLEARN-5281/charts). Look for these tabs on the left side.

You just learned how to:
* set a project and create an experiment using the Neptune API,
* log sklearn classifier parameters and scores to the experiment.

This kind of logging is the most basic way to track sklearn experiments with Neptune.

#### If you want to learn more, go to the [Neptune documentation](https://docs.neptune.ai/integrations/sklearn.html).

### Automatically log classifier summary to Neptune

In this section we will use Neptune's integration with sklearn to automatically log multiple types of meta-data related to the trained sklearn classifier, including:

* all parameters as properties,
* pickled model,
* test predictions,
* test predictions probabilities,
* test scores,
* visualizations, such as a confusion matrix,
* other metadata including git summary info.

#### Initialize Neptune

In [None]:
import neptune

neptune.init('shared/sklearn-integration', api_token='ANONYMOUS')

#### Create an experiment and log classifier parameters

In [None]:
neptune.create_experiment(params=parameters,
                          name='classification-example',
                          tags=['GradientBoostingClassifier', 'classification'])

#### Log classifier summary

Use Neptune's integration with sklearn to do the logging.

In [None]:
from neptunecontrib.monitoring.sklearn import log_classifier_summary

log_classifier_summary(gbc, X_train, X_test, y_train, y_test)

You just logged information about the classifier, including:

* [logged classifier parameters](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/details) as properties,
* [logged pickled model](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=model%2F&file=estimator.skl),
* [logged test predictions](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=csv%2F&file=test_predictions.csv),
* [logged test predictions probabilities](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=csv%2F&file=test_preds_proba.csv),
* [logged test scores](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/charts),
* [logged classifier visualizations](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/logs) - look for "charts_sklearn",
* [logged metadata](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/details) including git summary info.
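To get a feel for the per-class test scores, the sketch below derives precision, recall, F-beta, and support for each class with scikit-learn and names them using the `{name}_class_{i}_test_sklearn` pattern that the check cell later in this section expects. It is an approximation written for illustration, not the integration's actual code. `per_class_scores` is a hypothetical helper, and it assumes class labels are the integers 0..k-1, as they are for the digits dataset.

```python
from sklearn.metrics import precision_recall_fscore_support


def per_class_scores(y_true, y_pred):
    """Return per-class scores keyed as '<name>_class_<i>_test_sklearn'."""
    names = ['precision', 'recall', 'fbeta_score', 'support']
    # One array per score type, each with one entry per class.
    results = precision_recall_fscore_support(y_true, y_pred)
    scores = {}
    for name, values in zip(names, results):
        for class_index, value in enumerate(values):
            key = '{}_class_{}_test_sklearn'.format(name, class_index)
            scores[key] = float(value)
    return scores


# Stand-in labels so the sketch runs without a fitted model;
# in the notebook, use y_test and gbc.predict(X_test) instead.
y_true_demo = [0, 1, 2, 2, 1, 0]
y_pred_demo = [0, 1, 2, 1, 1, 0]

scores = per_class_scores(y_true_demo, y_pred_demo)
for key in sorted(scores):
    print(key, scores[key])
```

With the ten digit classes, this naming yields 40 score names, which together with `charts_sklearn` matches the set built in the check cell.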

Use `log_regressor_summary` in the same way to log meta-data for sklearn regressors. If you want to learn more, go to the [Neptune documentation](https://docs.neptune.ai/integrations/sklearn.html).

In [None]:
# tests
exp = neptune.get_experiment()

#### Stop Neptune experiment after logging summary

The method below is necessary only for notebook users. In Python scripts, the experiment is closed automatically when the script finishes.

In [None]:
neptune.stop()

In [None]:
# check logs
correct_logs_set = {'charts_sklearn'}
for name in ['precision', 'recall', 'fbeta_score', 'support']:
    for i in range(10):
        correct_logs_set.add('{}_class_{}_test_sklearn'.format(name, i))
from_exp_logs = set(exp.get_logs().keys())
assert correct_logs_set == from_exp_logs, '{} - incorrect logs'.format(exp)

# check sklearn parameters
assert set(exp.get_properties().keys()) == set(gbc.get_params().keys()), '{} parameters do not match'.format(exp)

# check neptune parameters
assert set(exp.get_parameters().keys()) == set(parameters.keys()), '{} parameters do not match'.format(exp)

### Automatic logging to Neptune: summary

You just learned how to log a scikit-learn classification summary to Neptune using a single function.

Click on the link that was printed to the console or [go here](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/charts) to explore an experiment similar to yours. In particular, check:

* [logged classifier parameters](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/details) as properties,
* [logged pickled model](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=model%2F&file=estimator.skl),
* [logged test predictions](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=csv%2F&file=test_predictions.csv),
* [logged test predictions probabilities](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/artifacts?path=csv%2F&file=test_preds_proba.csv),
* [logged test scores](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/charts),
* [logged classifier visualizations](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/logs) - look for "charts_sklearn",
* [logged metadata](https://ui.neptune.ai/o/shared/org/sklearn-integration/e/SKLEARN-312/details) including git summary info.

## If you want to learn more, go to the [Neptune documentation](https://docs.neptune.ai/integrations/sklearn.html).