# SigOpt Experiment And Optimization Demo

In this notebook, you will learn how to:

* Install the SigOpt Python client
* Set your SigOpt API token
* Create your first project
* Instrument your model
* Create your first Experiment and optimize your model metric with SigOpt
* Visualize Results

## Install `sigopt` Python Client


In [None]:
!pip install sigopt

## Set Your API Token

Once you've installed SigOpt, you need to add your SigOpt API token.

If you don't have an account yet, sign up for a free account at [app.sigopt.com/signup](https://app.sigopt.com/signup).

To get your API token, visit https://app.sigopt.com/tokens/info. You can reach this page from anywhere in the app by clicking on your name in the top right corner and selecting "API Tokens".

<img src="https://public.sigopt.com/get-started-notebooks/v1/find-api-token.gif" width="900"/>

Once you have your API token, run the code cell below to authenticate, configure SigOpt and load the notebook integration.


In [None]:
!sigopt config
import sigopt
%load_ext sigopt

## Instrument Your Model

Let’s start by importing some useful libraries and loading our data.

In [None]:
from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import datasets
import numpy
import sigopt
import time

In [None]:
DATASET_NAME = "Sklearn Wine"
FEATURE_ENG_PIPELINE_NAME = "Sklearn Standard Scaler"
PREDICTION_TYPE = "Multiclass"
DATASET_SRC = "sklearn.datasets"

def get_data():

  """
  Load sklearn wine dataset, and scale features to be zero mean, unit variance.
  One hot encode labels (3 classes), to be used by sklearn OneVsRestClassifier.
  """

  data = datasets.load_wine()
  X = data["data"]
  y = data["target"]

  scaler = StandardScaler()
  X_scaled = scaler.fit_transform(X)

  enc = OneHotEncoder()
  Y = enc.fit_transform(y[:, numpy.newaxis]).toarray()

  return (X_scaled, Y)

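To make the two preprocessing steps in `get_data` concrete, here is a minimal sketch in plain Python (no scikit-learn required) of what standardizing features and one-hot encoding labels produce. The helpers `standardize_columns` and `one_hot` and the toy data are hypothetical and only for illustration; the notebook itself uses `StandardScaler` and `OneHotEncoder`. The sketch assumes no column is constant, since that would lead to a division by zero here.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumes no constant column
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


def one_hot(labels):
    """Map each integer label to an indicator row, one column per class."""
    classes = sorted(set(labels))
    return [[1.0 if label == cls else 0.0 for cls in classes] for label in labels]


# Hypothetical toy data: 4 samples, 2 features, 3 classes.
toy_features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
toy_labels = [0, 2, 1, 2]

scaled = standardize_columns(toy_features)
encoded = one_hot(toy_labels)

print(scaled[0])   # first sample after scaling
print(encoded[1])  # label 2 becomes [0.0, 0.0, 1.0]
```
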
We now create our model function, `evaluate_xgboost_model`, which instantiates one XGBoost classifier per class in our 3-class dataset and evaluates the one-vs-rest classifier set using cross-validation with `number_of_cross_val_folds` folds. It returns the mean score and the wall-clock time taken to instantiate, train, and validate the models.


In [None]:
MODEL_NAME = "OneVsRestClassifier(XGBoostClassifier)"

def evaluate_xgboost_model(X, y,
                           number_of_cross_val_folds=5,
                           max_depth=6,
                           learning_rate=0.3,
                           min_split_loss=0):
    t0 = time.time()
    classifier = OneVsRestClassifier(XGBClassifier(
        objective="binary:logistic",
        max_depth=max_depth,
        learning_rate=learning_rate,
        min_split_loss=min_split_loss,
        use_label_encoder=False,
        verbosity=0,
    ))
    cv_accuracies = cross_val_score(classifier, X, y, cv=number_of_cross_val_folds)
    tf = time.time()
    training_and_validation_time = (tf-t0)
    return numpy.mean(cv_accuracies), training_and_validation_time

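If the cross-validation and timing pattern in `evaluate_xgboost_model` is new to you, the following sketch shows the idea in plain Python: split the sample indices into folds, score each fold, then report the mean score together with the elapsed wall-clock time. The helpers `k_fold_indices` and `timed_cross_val_mean` and the scorer `toy_score` are hypothetical; this is a simplified illustration with contiguous, unshuffled folds, not a description of how `cross_val_score` splits data internally.

```python
import time
from statistics import fmean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def timed_cross_val_mean(score_fold, n_samples, n_folds=5):
    """Return (mean fold score, elapsed seconds) for a fold-scoring callable."""
    t0 = time.perf_counter()
    scores = [score_fold(train, val) for train, val in k_fold_indices(n_samples, n_folds)]
    elapsed = time.perf_counter() - t0
    return fmean(scores), elapsed


def toy_score(train_idx, val_idx):
    """Hypothetical stand-in for 'fit on train, score on validation'."""
    return sum(1 for i in val_idx if i % 2 == 0) / len(val_idx)


mean_score, seconds = timed_cross_val_mean(toy_score, n_samples=10, n_folds=5)
print(mean_score, seconds)
```
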
The second function `run_and_track_in_sigopt` uses SigOpt methods to log and track key model information including:
* the type of model used (`sigopt.log_model`),
* the name of the dataset (`sigopt.log_dataset`),
* the hyperparameters used to build the model that will be tuned during the Experiment, including their default value (`sigopt.params.setdefault`),
* the hyperparameters used to build the model that will just be tracked, but not tuned during the Experiment (`sigopt.params.[PARAMETER_NAME]`),
* various attributes relevant to the model (`sigopt.log_metadata`) and
* the model output metrics (`sigopt.log_metric`).

In [None]:
def run_and_track_in_sigopt():

    (features, labels) = get_data()

    sigopt.log_dataset(DATASET_NAME)
    sigopt.log_metadata(key="Dataset Source", value=DATASET_SRC)
    sigopt.log_metadata(key="Feature Eng Pipeline Name", value=FEATURE_ENG_PIPELINE_NAME)
    sigopt.log_metadata(key="Dataset Rows", value=features.shape[0]) # assumes features is array-like with a .shape attribute
    sigopt.log_metadata(key="Dataset Columns", value=features.shape[1])
    sigopt.log_metadata(key="Execution Environment", value="Colab Notebook")
    sigopt.log_model(MODEL_NAME)

    sigopt.params.setdefault("max_depth", numpy.random.randint(low=3, high=15, dtype=int))
    sigopt.params.setdefault("learning_rate", numpy.random.random(size=1)[0])
    sigopt.params.setdefault("min_split_loss", numpy.random.random(size=1)[0]*10)

    args = dict(X=features,
                y=labels,
                max_depth=sigopt.params.max_depth,
                learning_rate=sigopt.params.learning_rate,
                min_split_loss=sigopt.params.min_split_loss)

    mean_accuracy, training_and_validation_time = evaluate_xgboost_model(**args)

    sigopt.log_metric(name='accuracy', value=mean_accuracy)
    sigopt.log_metric(name='training and validation time (s)', value=training_and_validation_time)

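The difference between tuned and tracked-only hyperparameters comes down to which value wins: one supplied from outside, or the default set in the function. The sketch below illustrates this with a plain dictionary, assuming that `sigopt.params.setdefault` follows the same "only set if missing" semantics as Python's `dict.setdefault`. The helper `resolve_params` and the suggested values are hypothetical and not part of the SigOpt client.

```python
def resolve_params(suggested, defaults):
    """Combine externally suggested values with in-code defaults.

    A default is applied only when the parameter has no value yet,
    mirroring dict.setdefault (assumed to match sigopt.params.setdefault).
    """
    params = dict(suggested)
    for name, value in defaults.items():
        params.setdefault(name, value)
    return params


# Hypothetical values suggested for the two tuned parameters.
suggested = {"max_depth": 7, "learning_rate": 0.12}
# Defaults set inside the instrumented function.
defaults = {"max_depth": 6, "learning_rate": 0.3, "min_split_loss": 0.0}

params = resolve_params(suggested, defaults)
print(params)
# {'max_depth': 7, 'learning_rate': 0.12, 'min_split_loss': 0.0}
```

Under that assumption, `max_depth` and `learning_rate` keep the suggested values, while `min_split_loss` falls back to its default because nothing suggested a value for it.
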
## Define Your Experiment Configuration

A SigOpt Experiment is an automated search of your model's hyperparameter space. A SigOpt Experiment works as follows:

<img src="https://static.sigopt.com/b/d4ed0c2c4741dfb05b18877368ba2732ac6f26fd/static/img/landing/homepage_graph1.svg" width="900"/>

With the `experiment` command below, you set your Experiment configuration by giving it a name, defining accuracy as the metric to maximize, and finally setting your hyperparameter space by instructing SigOpt to explore values within set boundaries. In our case, we ask SigOpt's optimization engine to return values for `max_depth` between 3 and 12 and for `learning_rate` between 0 and 1. Finally, the budget defines how many times we'll train our model. In this case, we will train our model 20 times, representing 20 SigOpt Runs.

In [None]:
%%experiment
{
    'name': 'XGBoost Optimization',
    'metrics': [
        {
            'name': 'accuracy',
            'strategy': 'optimize',
            'objective': 'maximize',
        }
    ],
    'parameters': [
        {
            'name': 'max_depth',
            'type': 'int',
            'bounds': {'min': 3, 'max': 12}
        },
        {
            'name': 'learning_rate',
            'type': 'double',
            'bounds': {'min': 0.0, 'max': 1.0}
        }
    ],
    'budget': 20
}

SigOpt will conveniently output the Experiment link in the terminal so you can check your Experiment was created.
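
Before running an optimization, it can help to check that the names in the configuration line up with the names used in the instrumented function. The sketch below does this with plain Python on a copy of the configuration above. The helper `check_config` and the two name sets are hypothetical conveniences written for this notebook, not part of the SigOpt client.

```python
experiment_config = {
    "name": "XGBoost Optimization",
    "metrics": [
        {"name": "accuracy", "strategy": "optimize", "objective": "maximize"},
    ],
    "parameters": [
        {"name": "max_depth", "type": "int", "bounds": {"min": 3, "max": 12}},
        {"name": "learning_rate", "type": "double", "bounds": {"min": 0.0, "max": 1.0}},
    ],
    "budget": 20,
}

# Names used by run_and_track_in_sigopt in this notebook.
LOGGED_METRICS = {"accuracy", "training and validation time (s)"}
INSTRUMENTED_PARAMS = {"max_depth", "learning_rate", "min_split_loss"}


def check_config(config, logged_metrics, instrumented_params):
    """Return (problems, untuned) for a config dict; a hypothetical sanity check."""
    problems = []
    for metric in config["metrics"]:
        if metric["name"] not in logged_metrics:
            problems.append(f"metric {metric['name']!r} is never logged")
    for param in config["parameters"]:
        if param["name"] not in instrumented_params:
            problems.append(f"parameter {param['name']!r} is never used")
        if param["bounds"]["min"] >= param["bounds"]["max"]:
            problems.append(f"parameter {param['name']!r} has empty bounds")
    tuned = {param["name"] for param in config["parameters"]}
    return problems, sorted(instrumented_params - tuned)


problems, untuned = check_config(experiment_config, LOGGED_METRICS, INSTRUMENTED_PARAMS)
print(problems)  # []
print(untuned)   # ['min_split_loss']
```

Here `min_split_loss` shows up as untuned: it is used by the model but does not appear in the Experiment's parameter list, so it is tracked rather than optimized.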

## Execute SigOpt Optimization
Let's run our optimization using the `%%optimize` magic command. SigOpt will pick up the `experiment` configuration automatically and output links in the terminal to the current Run in our web application.

In [None]:
%%optimize My_First_Optimization
run_and_track_in_sigopt()

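Conceptually, the optimization is a loop that repeats "get parameter values, train, report the metric" until the budget is used up. The sketch below mimics that loop offline, with one important swap: it draws values uniformly at random within the bounds, which is plain random search and not SigOpt's optimization engine. The functions `sample_parameters`, `run_budget`, and `toy_metric` are hypothetical stand-ins for illustration only.

```python
import random


def sample_parameters(parameters, rng):
    """Draw one value per parameter, uniformly within its bounds."""
    values = {}
    for param in parameters:
        low, high = param["bounds"]["min"], param["bounds"]["max"]
        if param["type"] == "int":
            values[param["name"]] = rng.randint(low, high)  # both ends inclusive
        else:
            values[param["name"]] = rng.uniform(low, high)
    return values


def run_budget(parameters, budget, evaluate, seed=0):
    """Evaluate `budget` sampled configurations and return (values, score) pairs."""
    rng = random.Random(seed)
    runs = []
    for _ in range(budget):
        values = sample_parameters(parameters, rng)
        runs.append((values, evaluate(**values)))
    return runs


def toy_metric(max_depth, learning_rate):
    """Hypothetical metric that peaks at max_depth=6, learning_rate=0.3."""
    return 1.0 - abs(max_depth - 6) / 10 - abs(learning_rate - 0.3)


parameters = [
    {"name": "max_depth", "type": "int", "bounds": {"min": 3, "max": 12}},
    {"name": "learning_rate", "type": "double", "bounds": {"min": 0.0, "max": 1.0}},
]

runs = run_budget(parameters, budget=20, evaluate=toy_metric)
best_values, best_score = max(runs, key=lambda run: run[1])
print(best_values, best_score)
```

Each entry in `runs` plays the role of one Run: a set of parameter values and the metric they produced.
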
## Visualize Results

You can click on any of the Run links above and view your completed Run in our web application. Here's a view of a Run page:

<img src="https://public.sigopt.com/get-started-notebooks/v1/view-run-page.gif" width="900"/>

The charts on the Run page show how it compares on key metrics with other Runs in the same project.

From the Run page, click on the Project Name at the top of the page to navigate to your project. At the project level, you can compare Runs, sort and filter through your Runs and view useful charts to gain insight into everything you've tried.

<img src="https://public.sigopt.com/get-started-notebooks/v1/sort-runs-in-project.gif" width="900"/>

From the Project page, click on the Experiments tab, and click on the Experiment you just created. The Experiment Summary page features the Experiment best value and shows Experiment improvement in a graph that plots the best recorded model metric throughout the course of your Experiment.
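
A "best recorded metric so far" curve of this kind is simply a running maximum over the Runs in order. As a rough sketch of what such a plot represents, the snippet below computes that trace from a list of accuracies. The helper `best_seen_trace` and the accuracy values are hypothetical, made up purely for illustration.

```python
from itertools import accumulate


def best_seen_trace(metric_values, objective="maximize"):
    """Return the best value observed up to and including each Run."""
    pick = max if objective == "maximize" else min
    return list(accumulate(metric_values, pick))


# Hypothetical accuracies from five consecutive Runs.
accuracies = [0.91, 0.94, 0.93, 0.97, 0.95]
print(best_seen_trace(accuracies))
# [0.91, 0.94, 0.94, 0.97, 0.97]
```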

The Experiment Analysis page features additional visualizations to help you gain insight into your optimization problem, including Parameter Importance, Parallel Coordinates, and interactive graphs that help you create 2D and 3D representations of your metric and parameter space.

## From Experiments To Runs

In this demo, we've covered the recommended way to instrument and optimize your model, and visualize your results with SigOpt. You learned that Experiments are collections of Runs that search through a defined parameter space for one or more metrics. Check out this [notebook](https://colab.research.google.com/github/sigopt/sigopt-examples/blob/master/get-started/sigopt_runs_demo.ipynb/) for a closer look at a single Run, and to see how to track one-off Runs without creating an Experiment.