# SigOpt Training Runs Demo

In this notebook demo, you will learn how to:

* Install the SigOpt Python client
* Set your SigOpt API token
* Create your first project
* Instrument your model
* Create your first run and log your model's metrics and parameters to SigOpt
* Visualize results

## Install `sigopt` Python Client


In [None]:
! pip install sigopt


In [None]:
! pip install xgboost==1.4.2



## Set Your API Token

Once you've installed SigOpt, you need to add your SigOpt API token.

If you don't have an account yet, sign up for a free account at [app.sigopt.com/signup](https://app.sigopt.com/signup).

To get your API token, visit https://app.sigopt.com/tokens/info. You can reach this page from anywhere in the app by clicking your name in the top-right corner and selecting "API Tokens".

<img src="https://public.sigopt.com/get-started-notebooks/v1/find-api-token.gif" width="900"/>




In [None]:
MY_API_TOKEN = "YOUR_API_TOKEN_HERE"
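If you would rather not paste the token into the notebook, a minimal sketch like the one below reads it from the `SIGOPT_API_TOKEN` environment variable (the same variable the project cell sets later) and falls back to the placeholder otherwise. The helper `load_api_token` is hypothetical and not part of the SigOpt client.

```python
import os

# Hypothetical helper: prefer a token already present in the environment
# over a value hard-coded in the notebook.
def load_api_token(placeholder="YOUR_API_TOKEN_HERE"):
    token = os.environ.get("SIGOPT_API_TOKEN", "").strip()
    if token:
        return token
    return placeholder

MY_API_TOKEN = load_api_token()
```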

Then configure your connection with SigOpt:

In [None]:
from sigopt import Connection
conn = Connection(client_token=MY_API_TOKEN)

## Create Your Project

Training runs are created within projects. The project allows you to sort and filter your training runs and view useful charts with insights into everything you've tried.

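Feel free to edit the project name below. The cell also stores your API token in the `SIGOPT_API_TOKEN` environment variable and the project name in `SIGOPT_PROJECT`, then runs `%load_ext sigopt` to load the SigOpt IPython extension, which provides the `%%run` magic used later.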

In [None]:
import os
os.environ['SIGOPT_API_TOKEN'] = MY_API_TOKEN
os.environ['SIGOPT_PROJECT'] = "SigOpt_Run_XGB_Classifier"
%load_ext sigopt

## Instrument Your Model

Let's start by importing some useful libraries and then defining a function to load our data:

In [None]:
from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import datasets
import numpy
import sigopt
import time

In [None]:
DATASET_NAME = "Sklearn Wine"
FEATURE_ENG_PIPELINE_NAME = "Sklearn Standard Scaler"
PREDICTION_TYPE = "Multiclass"
DATASET_SRC = "sklearn.datasets"

def get_data():
    """
    Load the sklearn wine dataset and scale the features to zero mean, unit variance.
    One-hot encode the labels (3 classes) for use with sklearn's OneVsRestClassifier.
    """
    data = datasets.load_wine()
    X = data["data"]
    y = data["target"]

    # Standardize each feature column.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Turn the 1-D label vector into a dense (n_samples, 3) indicator matrix.
    enc = OneHotEncoder()
    Y = enc.fit_transform(y[:, numpy.newaxis]).toarray()

    return (X_scaled, Y)
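To make the two preprocessing steps in `get_data` concrete, the dependency-free sketch below computes them on a toy input. Standardization subtracts each column's mean and divides by the column's population standard deviation (the `ddof=0` convention that `StandardScaler` uses), and one-hot encoding turns each label into an indicator row whose columns follow the sorted label values, which is how `OneHotEncoder` orders its categories by default. The helper names and toy data are illustrative assumptions, not scikit-learn code.

```python
import math

def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) variance."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n  # ddof=0
        std = math.sqrt(var)
        stds.append(std if std > 0 else 1.0)  # avoid dividing by zero for constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

def one_hot(labels):
    """Map each label to an indicator row; columns follow sorted label order."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[y] == i else 0.0 for i in range(len(categories))] for y in labels]

X_toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_toy = [0, 2, 1]
print(standardize_columns(X_toy))
print(one_hot(y_toy))  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```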

Next, we define our model function, `evaluate_xgboost_model`. It wraps `XGBClassifier` in scikit-learn's `OneVsRestClassifier`, which fits one binary XGBoost classifier per class of our 3-class dataset, evaluates this one-vs-rest ensemble with `number_of_cross_val_folds`-fold cross-validation, and returns the mean cross-validation accuracy along with the wall-clock time taken to build, train, and validate the models.


In [None]:
MODEL_NAME = "OneVsRestClassifier(XGBoostClassifier)"

def evaluate_xgboost_model(X, y,
                           number_of_cross_val_folds=5,
                           max_depth=6,
                           learning_rate=0.3,
                           min_split_loss=0):
    t0 = time.time()
    # One binary XGBoost classifier is fitted per class (one-vs-rest).
    classifier = OneVsRestClassifier(XGBClassifier(
        objective="binary:logistic",
        max_depth=max_depth,
        learning_rate=learning_rate,
        min_split_loss=min_split_loss,
        use_label_encoder=False,
        verbosity=0
    ))
    # Score (accuracy) of each cross-validation fold.
    cv_accuracies = cross_val_score(classifier, X, y, cv=number_of_cross_val_folds)
    tf = time.time()
    training_and_validation_time = tf - t0
    return numpy.mean(cv_accuracies), training_and_validation_time
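With an integer `cv`, `cross_val_score` splits the samples into `number_of_cross_val_folds` folds, trains on all but one fold, scores the held-out fold, and `evaluate_xgboost_model` then averages the fold scores. As we understand scikit-learn's rules, the one-hot (multilabel) targets returned by `get_data` make an integer `cv` resolve to plain, unshuffled `KFold` (a 1-D vector of class labels would get `StratifiedKFold` instead). The sketch below reproduces that contiguous split in plain Python; the helper names are hypothetical. Because `load_wine` lists its samples grouped by class, these unshuffled folds are not class-balanced, and passing a shuffled `KFold` as `cv` is one way to change that.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    The first n_samples % n_folds folds get one extra sample, the same
    sizing rule scikit-learn's KFold uses.
    """
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def mean_cv_score(score_fold, n_samples, n_folds):
    """Average a per-fold score, as numpy.mean(cross_val_score(...)) does."""
    scores = [score_fold(train, test) for train, test in kfold_indices(n_samples, n_folds)]
    return sum(scores) / len(scores)

# The wine dataset has 178 samples, so 5 folds hold 36, 36, 36, 35 and 35 samples.
print([len(test) for _, test in kfold_indices(178, 5)])  # [36, 36, 36, 35, 35]
```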

The next function, `run_and_track_in_sigopt`, uses SigOpt methods to log and track key model information, including:
* the type of model used (`sigopt.log_model`),
* the name of the dataset (`sigopt.log_dataset`),
* the hyperparameters used to build the model (`sigopt.get_parameter`),
* various attributes relevant to the model (`sigopt.log_metadata`) and
* the model output metrics (`sigopt.log_metric`).
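To exercise this instrumentation pattern without a SigOpt connection, one option is a small stand-in object whose methods are named like the SigOpt calls listed above. The `OfflineRun` class below is hypothetical and is not part of the `sigopt` package; it assumes that `get_parameter` returns a value assigned by SigOpt when one exists (as in an optimization experiment) and otherwise returns the default, recording whichever value was used. Nothing is sent to SigOpt.

```python
class OfflineRun:
    """Hypothetical in-memory stand-in for the SigOpt logging calls listed above.

    It only records values in dictionaries; nothing is sent to SigOpt.
    """

    def __init__(self, assignments=None):
        # Values an optimizer would have assigned; empty for a plain training run.
        self.assignments = dict(assignments or {})
        self.dataset = None
        self.model = None
        self.parameters = {}
        self.metadata = {}
        self.metrics = {}

    def log_dataset(self, name):
        self.dataset = name

    def log_model(self, model_type):
        self.model = model_type

    def get_parameter(self, name, default=None):
        # Assumed behavior: use an assigned value if present, else the default,
        # and record whichever value was used.
        value = self.assignments.get(name, default)
        self.parameters[name] = value
        return value

    def log_metadata(self, key, value):
        self.metadata[key] = value

    def log_metric(self, name, value):
        self.metrics[name] = value


run = OfflineRun(assignments={"max_depth": 4})
max_depth = run.get_parameter("max_depth", default=6)            # 4: assigned value wins
learning_rate = run.get_parameter("learning_rate", default=0.3)  # 0.3: falls back to default
run.log_metric(name="accuracy", value=0.0)                       # placeholder value
print(run.parameters, run.metrics)
```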

In [None]:
def run_and_track_in_sigopt():

    (features, labels) = get_data()

    # Log the dataset, descriptive metadata, and model type to the current run.
    sigopt.log_dataset(DATASET_NAME)
    sigopt.log_metadata(key="Dataset Source", value=DATASET_SRC)
    sigopt.log_metadata(key="Feature Eng Pipeline Name", value=FEATURE_ENG_PIPELINE_NAME)
    sigopt.log_metadata(key="Dataset Rows", value=features.shape[0])  # assumes features is a 2-D numpy array
    sigopt.log_metadata(key="Dataset Columns", value=features.shape[1])
    sigopt.log_metadata(key="Execution Environment", value="Colab Notebook")
    sigopt.log_model(MODEL_NAME)

    # Each hyperparameter uses a randomly drawn default unless SigOpt assigns a value.
    args = dict(X=features, y=labels,
                max_depth=sigopt.get_parameter("max_depth", default=numpy.random.randint(low=3, high=15, dtype=int)),
                learning_rate=sigopt.get_parameter("learning_rate", default=numpy.random.random(size=1)[0]),
                min_split_loss=sigopt.get_parameter("min_split_loss", default=numpy.random.random(size=1)[0]*10))

    mean_accuracy, training_and_validation_time = evaluate_xgboost_model(**args)

    # Log the output metrics.
    sigopt.log_metric(name='accuracy', value=mean_accuracy)
    sigopt.log_metric(name='training and validation time (s)', value=training_and_validation_time)
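The defaults above are drawn at random on every run: `max_depth` is an integer from 3 to 14 (NumPy's `high=15` is exclusive), `learning_rate` is uniform on [0, 1), and `min_split_loss` is uniform on [0, 10). If you want reproducible defaults, a sketch using a seeded `random.Random` with the same ranges could look like this; the helper name and the seed are assumptions, not part of the original notebook.

```python
import random

def sample_default_hyperparameters(rng):
    """Draw defaults from the same ranges as run_and_track_in_sigopt."""
    return {
        "max_depth": rng.randint(3, 14),      # inclusive bounds, like numpy's randint(3, 15)
        "learning_rate": rng.random(),        # uniform on [0, 1)
        "min_split_loss": rng.random() * 10,  # uniform on [0, 10)
    }

rng = random.Random(42)  # hypothetical seed for reproducibility
print(sample_default_hyperparameters(rng))
```

The returned values could then be passed as the `default=` arguments to `sigopt.get_parameter` in place of the NumPy draws.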

## Execute SigOpt Runs
Let's run and track our model using the `%%run` cell magic, where `My_First_Run` is the name of the run.

In [None]:
%%run My_First_Run
run_and_track_in_sigopt()

## Visualize Results
When a run executes, SigOpt prints links to the run page in our web application. Click the run link in the output above to view your completed run. Here's a view of a training run page:

<img src="https://public.sigopt.com/get-started-notebooks/v1/view-run-page.gif" width="900"/>

The charts on the training run page show how the run compares on key metrics with other runs in the same project.

From the run page, click the project name at the top of the page to navigate to your project. At the project level, you can compare training runs, sort and filter them, and view useful charts that give insight into everything you've tried.

Here's a view of a project page with multiple runs:

<img src="https://public.sigopt.com/get-started-notebooks/v1/sort-runs-in-project.gif" width="900"/>

From the project page, click the Analysis tab. You can configure the default visualizations and add more plots to compare runs and draw conclusions.

Here's a view of the analysis dashboard with multiple runs:

<img src="https://public.sigopt.com/get-started-notebooks/v1/analyze-runs-in-project.gif" width="900"/>

Scroll to the bottom of the page to see all of your runs in a single table. You can sort and filter the table to identify the most promising runs, customize it, and save it as a “View” for later. Filters applied to the table of runs can also be applied to the charts, so you can focus on the runs of interest.

<img src="https://public.sigopt.com/get-started-notebooks/v1/filter-project-runs.gif" width="900"/>

## From Training Runs To Optimization Experiments

In this demo, we've covered the recommended way to instrument your training run with SigOpt. Once your model is instrumented, it is easy to take advantage of SigOpt's optimization features. Optimization helps you find the parameter values for your model that give the best metric (e.g., the highest accuracy). Check out this [notebook](https://colab.research.google.com/github/sigopt/sigopt-examples/blob/master/get-started/sigopt_experiment_and_optimization_demo.ipynb/) to see how you can create an experiment!
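To make the idea of optimization concrete, here is its simplest form, plain random search: sample parameter values, evaluate a metric for each sample, and keep the best. This is only an illustration and not how SigOpt's optimizer chooses parameters, which is designed to need fewer evaluations. The `toy_accuracy` objective is a made-up stand-in for `evaluate_xgboost_model`, used so the example runs without XGBoost.

```python
import random

def toy_accuracy(max_depth, learning_rate):
    """Hypothetical objective that peaks at max_depth=6, learning_rate=0.3."""
    return 1.0 - 0.01 * (max_depth - 6) ** 2 - (learning_rate - 0.3) ** 2

def random_search(objective, n_trials, rng):
    """Return (best_score, best_params) after n_trials random evaluations."""
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {"max_depth": rng.randint(3, 14), "learning_rate": rng.random()}
        score = objective(**params)
        if score > best_score:  # maximize the metric
            best_score, best_params = score, params
    return best_score, best_params

best_score, best_params = random_search(toy_accuracy, n_trials=50, rng=random.Random(0))
print(best_score, best_params)
```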