# Introduction to Hyperparameter Optimization

## Discussion of HPO principles and collaboration on an interactive use case

`The tutorial will be run by Jim Blomo (jim@sigopt.com)`


`Original content created by Jeremy Bivaud (jeremy@sigopt.com) and Tobias Andreasen (tobias@sigopt.com)`

___

## Introduction

*Experimentation is critical to developing models but can be a messy process. Modelers often spend significant time on tasks like tracking runs, creating visualizations, and troubleshooting hyperparameter optimization jobs, all of which could be supported or automated with software.*

*Join expert Jim Blomo to learn best practices for tracking, training, and tuning models and using the information from these processes to make the best decisions around the model development process. You’ll focus specifically on hyperparameter optimization (HPO): selecting the best method, executing tuning jobs, and analyzing the results of these jobs to select the best model for production. Along the way, you’ll see firsthand just how useful HPO is through an anomaly detection problem (based on a Kaggle financial dataset) that uses an XGBoost classification model. You’ll then use SigOpt to perform your own tuning jobs and cover open source alternatives and how to implement them* - [O'Reilly](https://www.oreilly.com/live-training/courses/introduction-to-hyperparameter-optimization/0636920470830/#instructors)




## Instructor: Jim Blomo

*Jim Blomo is an executive engineering leader at SigOpt. He’s achieved strong business results in technology companies by creating a culture of performance, innovation, and teamwork. Previously, Jim led data-mining efforts as a director of engineering at Yelp, operations at the startup PBWorks, and search infrastructure at Amazon’s A9 subsidiary. He enjoys speaking and travel; he’s lectured on data mining and web architecture at UC Berkeley's School of Information and presented at conferences such as AWS re:Invent, O’Reilly OSCON, Wolfram Data Summit, and RecSys. He loves exploring the food and outdoors of the Bay Area with his family* - [Jim Blomo](https://www.linkedin.com/in/jimblomo/)

<img src="https://sigopt.com/wp-content/uploads/2019/02/img-Jim.jpg" alt="Black Box Overview"  width="200" align="center"/> 

<img src="https://static.sigopt.com/b/ac75899cedb91b80acf515ed92fef03aa4ef690d/static/img/SigOpt_logo_horiz.svg" alt="Black Box Overview" width="300"/>

___

# Session Abstract

This session’s primary objective is to teach attendees how to balance model training and hyperparameter tuning to develop high-performing models. Its secondary objective is to dive deeper into hyperparameter tuning, and, through the process, help attendees develop some intuition around the practical aspects of automated model optimization.

> First, we will spend time discussing the best ways to track runs as you go through the modeling process. Second, we will discuss useful visualizations for analyzing model behavior, comparing architectures, and evaluating metrics. Finally, we will delve into methods for automated hyperparameter optimization with a focus on how tuning hyperparameters boosts model performance, provides model insights, and bolsters modeling workflows and team collaboration.

We will present an anomaly detection problem based on this [Kaggle financial dataset](https://www.kaggle.com/ntnu-testimon/paysim1) using an [XGBoost classification model](https://xgboost.readthedocs.io/en/latest/) to show, rather than tell, how HPO can be useful. After presenting the use case and the dataset, we’ll dive into the optimization journey, presenting the model performance and workflow uplifts that a modeler can observe when following a structured optimization approach using SigOpt. The session will feature multiple interactive sections, including code exercises and group discussions, that use Jupyter notebooks. Every attendee will get free access to [SigOpt](https://sigopt.com/) so they can perform their own tuning jobs as part of the tutorial. We will also cover open source alternatives to SigOpt and show attendees how to integrate them if they need a replacement for SigOpt.


## 🚦 **TO DO [Recommended but optional] - Sign up for a free SigOpt account in order to follow along during the tutorial - [SIGN UP FOR SIGOPT](https://modeling.sigopt.com/oreilly-offer).**

*- After signing up for the SigOpt account you will receive an email with instructions on how to activate your personal SigOpt account.*

___

# Session Agenda

This session is divided into the following sections:

* [__Data Import and preprocessing__](#Data-Import-and-Preprocessing)
    * Modeling Environment
    * Importing Libraries
    * Importing the Dataset
    * Defining our Feature and Label Sets
    * Objective Metric Selection
    * Splitting the dataset
* [__Experiment Tracking__](#Experiment-Tracking)
    * Experiment Tracking with SigOpt
        * Training Runs
        * Experiments
    * Setting Up SigOpt
    * Setting our baseline
* [__Hyperparameter Optimization__](#Hyperparameter-Optimization)
    * Define your Parameter Space
    * Configure your Experiment
    * Instrument your model and run your Experiment
    * Multimetric Experimentation
* [__Learn more at your own time__](#Learn-more-at-your-own-time)
    * EfficientBERT
    * Advanced features
    * Documentation

There will be a 5 min break between the Experiment Tracking and Hyperparameter Optimization sections. Let's get started!

___

# Data Import and Preprocessing

## Modeling Environment

In the interest of time, we have already prepared the environment that we'll be using for this tutorial. All of the libraries listed below have been installed using *pip install*.

🚦 **TO DO - Run the following code cell to have a look at the environment that we will be using for the tutorial.**

In [1]:
with open('../requirements.txt', 'r') as txt:
    print('Preinstalled libraries: \n')
    print(txt.read())

Preinstalled libraries: 

Keras==2.4.3
jupyter==1.0.0
pandas==1.1.2
scikit-learn==0.23.2
tensorflow==2.3.0
xgboost==1.2.0
https://sigopt-python-mpm.s3-us-west-1.amazonaws.com/sigopt-7.1.4-py2.py3-none-any.whl



## Importing Libraries
All of the above libraries have been pre-installed and are ready to be imported.

Links and a short description of why each of these libraries is important for the tutorial can be found below.
- **pandas** to load the data file as a Pandas dataframe, analyze and process the data directly within the notebook
- **numpy** and **math** for computing scientific functions
- **time** to measure inference and training time
- **SigOpt** for experimentation and optimization
- from **sklearn**, we'll import [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split our dataset into train and test subsets, [average_precision_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html), [confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) and [f1_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) to track our model performance
- and finally, from **xgboost**, we'll import the [XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier) class that we will use to build and parametrize our model

We also use a random_state variable across this notebook to guarantee that all our functions are deterministic, and results are repeatable.

🚦 **TO DO - Run the following code cell to import the required libraries and classes.**

In [2]:
import os
import pandas as pd
import numpy as np
import math
import time
import sigopt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, average_precision_score, f1_score 
from xgboost.sklearn import XGBClassifier
random_state = 3

## Importing the Dataset
The dataset we are using is a synthetic dataset generated using the [PaySim](https://github.com/EdgarLopezPhD/PaySim) simulator. PaySim uses aggregated data from a private dataset to generate a synthetic dataset that resembles the normal operation of transactions and injects malicious behavior to later evaluate the performance of fraud detection methods. The dataset is publicly available under this Kaggle [project](https://www.kaggle.com/ntnu-testimon/paysim1). For your convenience, we have already performed the data preprocessing and feature engineering on the dataset. We also picked a 10% subset of this data so we can iterate fast and try enough sets of hyperparameters to showcase the optimization process and still fit this journey within a two-hour interactive session.

🚦 **TO DO - Run the following code cell to import the data set and create the corresponding Pandas dataframe.**

In [3]:
df_path = '../data/Fraud_Detection_SigOpt_dataset.csv'
df = pd.read_csv(df_path)
df.head()

| Unnamed: 0 | step | type | amount | oldBalanceOrig | newBalanceOrig | oldBalanceDest | newBalanceDest | isFraud | errorBalanceOrig | errorBalanceDest |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 181.0 | 181.0 | 0.0 | 0.0 | 0.0 | 1 | 0.0 | 181.0 |
| 1 | 1 | 1 | 181.0 | 181.0 | 0.0 | 21182.0 | 0.0 | 1 | 0.0 | 21363.0 |
| 2 | 1 | 1 | 229133.94 | 15325.0 | 0.0 | 5083.0 | 51513.44 | 0 | 213808.94 | 182703.5 |
| 3 | 1 | 0 | 215310.3 | 705.0 | 0.0 | 22425.0 | 0.0 | 0 | 214605.3 | 237735.3 |
| 4 | 1 | 0 | 311685.89 | 10835.0 | 0.0 | 6267.0 | 2719172.89 | 0 | 300850.89 | -2401220.0 |


## Defining our Feature and Label Sets
🚦 **TO DO - Run the following code cell to split our dataset into the feature set X and the label set Y.**

In [4]:
# Keep the label column separate without mutating the original dataframe
Y = df['isFraud']
X = df.drop(columns=['isFraud'])

In most anomaly detection problems, the main obstacle to training a robust ML model is the highly imbalanced nature of the data. The code below measures the dataset skew, i.e. the share of fraudulent transactions in our dataset, which is a little more than 0.1%.

🚦 **TO DO - Run the following code cell to find out the share of fraudulent activity in your dataset**

In [5]:
print('Share of fraudulent activity: {}%'.format(100*(len(Y.loc[Y == 1]) / float(len(Y)))))

Share of fraudulent activity: 0.1378084498528364%


## Objective Metric Selection

Now that we have established that the dataset is highly skewed towards one of our classes, it is important to pick metrics that account for this type of class imbalance. For this tutorial we will focus on some common metrics used for imbalanced datasets:

- **F1-score**, **precision** and **recall**: F1 is the harmonic mean of precision and recall. Because it ignores true negatives, it is not inflated when one class has a much larger representation than the other.
- **Area under the precision-recall curve (AUPRC)** and **area under the receiver operating characteristic (AUROC)**: these two metrics do have similar characteristics, but oftentimes AUPRC does a better job than AUROC at weighing incorrect predictions that occur in the minority class. More info on this tradeoff in this [publication](http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf).
- **Confusion matrix**: the confusion matrix provides the counts of predictions compared to the actual label.
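
To make these definitions concrete, here is a minimal sketch in plain Python that computes the confusion-matrix counts, precision, recall and F1 by hand. The helper names and the toy labels are made up for illustration; in the notebook itself the scikit-learn functions imported earlier do this work.

```python
def confusion_counts(labels, predictions):
    """Return (tn, fp, fn, tp) for binary labels and predictions (0 or 1)."""
    tn = fp = fn = tp = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 1 and p == 0:
            fn += 1
        elif y == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(labels, predictions):
    """Compute precision, recall and F1 (the harmonic mean of the two)."""
    tn, fp, fn, tp = confusion_counts(labels, predictions)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (made up for illustration): 2 fraud cases among 1,000 transactions.
labels = [1, 1] + [0] * 998

# A "model" that never flags fraud looks accurate but catches nothing.
never_flag = [0] * 1000
accuracy = sum(y == p for y, p in zip(labels, never_flag)) / len(labels)
print(accuracy)                                 # 0.998
print(precision_recall_f1(labels, never_flag))  # (0.0, 0.0, 0.0)

# A model that catches one fraud case and raises one false alarm.
one_hit = [1, 0, 1] + [0] * 997
print(precision_recall_f1(labels, one_hit))     # (0.5, 0.5, 0.5)
```

The first case shows why plain accuracy is a poor objective here: it rewards a model that ignores the minority class entirely, while precision, recall and F1 all drop to zero.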

Additionally, computing business-level metrics allows modelers to translate model behavior into impacts other stakeholders may care about. This step is critical in explaining how a model will likely behave in production.

- **max_missed_fraud( )**: The maximum transaction amount we missed flagging as fraud (false negative)
- **max_missed_valid( )**: The maximum transaction amount we missed flagging as valid (false positive)

🚦 **TO DO - Run the following code cells to define these two objective metrics**


In [6]:
def max_missed_fraud(prediction, label, amount):
    """record the maximum transaction amount among missed fraudulent transactions (false negatives)"""
    fn_vec = (~prediction) & (label > 0)
    fraud_loss_max = np.where(fn_vec, np.abs(amount), np.zeros_like(amount)).max()
    return fraud_loss_max

In [7]:
def max_missed_valid(prediction, label, amount):
    """record the maximum transaction amount among valid transactions flagged as fraud (false positives)"""
    fp_vec = prediction & (label == 0)
    valid_loss_max = np.where(fp_vec, np.abs(amount), np.zeros_like(amount)).max()
    return valid_loss_max

## Exercise: Define a business metric

Write your own business metric, one you think a stakeholder would care about. Fill in the code below, which includes a simple test case.

🚦 **TO DO - Fill in the function below to compute a business metric. Run the cell to run a simple test.**


In [8]:
def exercise_business_metric(prediction, label, amount):
    """TODO: Fill in your documentation"""
    # TODO: Fill in your code here, and replace the raise statement with a return
    raise NotImplementedError("TODO: Fill in code")
    
print(exercise_business_metric(np.array([1,1,0,0]), np.array([1,0,1,0]), np.array([10,20,30,40])))

NotImplementedError: TODO: Fill in code

## Consistent Metric Logging

Having a consistent place to log all your metrics can ensure you have all the data you need on hand to make decisions or engage in a discussion.

🚦 **TO DO - Add your new function above to the metrics being logged with `sigopt.log_metric`.**


In [9]:
def log_all_metrics(prediction, probabilities, testY, testX):
    """Log all relevant metrics using the `prediction` generated by the model,
    the `probabilities` associated with those predictions, the `testY` actual
    labels from the dataset, and `testX` the features."""
    F1score = f1_score(testY, prediction)
    AUPRC = average_precision_score(testY, probabilities)
    tn, fp, fn, tp = confusion_matrix(testY, prediction).ravel()

    sigopt.log_metric('AUPRC', AUPRC)
    sigopt.log_metric('F1score', F1score)
    sigopt.log_metric('False Positive', fp)
    sigopt.log_metric('False Negative', fn)
    sigopt.log_metric('True Positive', tp)
    sigopt.log_metric('True Negative', tn)
    sigopt.log_metric('Max $ Missed Fraudulent', max_missed_fraud(prediction, testY, testX['amount']))
    sigopt.log_metric('Max $ Missed Valid', max_missed_valid(prediction, testY, testX['amount']))

## Splitting the dataset

It is important that we produce a model that generalizes to unseen data. To check this, we split our full dataset into a training set (80% of the data) and a testing set (20% of the data).

🚦 **TO DO - Run the following code cell to create your training and test sets**

In [10]:
trainX, testX, trainY, testY = train_test_split(X, Y, test_size = 0.2, random_state = random_state)

Note that SigOpt is not limited to a train/test split for estimating generalization. It is perfectly reasonable to use approaches such as k-fold cross-validation or simulation. Ultimately, it comes down to what you feel most comfortable with.
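
As an illustration of the k-fold alternative, the sketch below generates stratified folds in plain Python, so that each fold keeps roughly the same share of the rare class. The function name and the toy labels are hypothetical; in practice scikit-learn's `StratifiedKFold` provides this out of the box.

```python
import random


def stratified_kfold_indices(labels, n_splits=5, seed=3):
    """Yield (train_idx, test_idx) pairs that keep the class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    # Deal the shuffled indices of each class round-robin across the folds.
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_splits].append(i)
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


# Toy labels (made up): 10 positives among 1,000 samples.
labels = [1] * 10 + [0] * 990
for train_idx, test_idx in stratified_kfold_indices(labels, n_splits=5):
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)  # 800 200 2 for every fold
```

With a skew as strong as in the fraud dataset, an unstratified split could leave a fold with very few positive examples, which makes the per-fold metrics noisy.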

___

# Experiment Tracking

### [Training Runs](https://app.sigopt.com/docs/runs/overview)
A SigOpt Run stores the training and evaluation of a model, so that modelers can see a history of their work. This is the fundamental building block of SigOpt. Runs record everything you might need to understand how a model was built, reconstitute the model in the future, or explain the process to a colleague.

Common attributes of a Run include:
- the model type,
- dataset identifier,
- evaluation metrics,
- hyperparameters,
- logs, and
- a code reference.

For a complete list of attributes see the [API Reference](https://app.sigopt.com/docs/runs/reference). Training runs can be recorded by integrating code snippets into Python that you run in a [notebook](https://app.sigopt.com/docs/runs/notebook) or via the [command line](https://app.sigopt.com/docs/runs/editor).
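
To picture what such a record holds, here is a purely illustrative sketch of a local data structure with the attributes listed above. `RunRecord` and its fields are hypothetical names, not part of the SigOpt client, and the values are placeholders rather than real results.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunRecord:
    """Hypothetical local record mirroring the Run attributes listed above."""
    model_type: str
    dataset: str
    code_reference: str = ""
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)

    def log_metric(self, name, value):
        """Store one evaluation metric under its name."""
        self.metrics[name] = value


run = RunRecord(model_type="XGBoost", dataset="Fraud_Detection_SigOpt_dataset.csv")
run.hyperparameters["max_depth"] = 6   # placeholder value
run.log_metric("AUPRC", 0.9)           # placeholder value, not a real result
print(json.dumps(asdict(run), indent=2))
```

Keeping all of these fields together in one record is what makes it possible to rebuild or compare models later without relying on memory or scattered notes.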

## Setting up SigOpt

Before starting to do modeling, we will need to configure the SigOpt library to connect to the SigOpt backend and come up with a name for our project.

🚦 **TO DO - 1) log into your SigOpt account and access your personal [API token](https://app.sigopt.com/tokens/info), 2) run the following code cell, and 3) follow the instructions to connect to the SigOpt backend**

In [11]:
from getpass import getpass
API_TOKEN = getpass('Insert your API token: ')
os.environ['SIGOPT_API_TOKEN'] = API_TOKEN

Insert your API token: ········


🚦 **TO DO - 1) run the following code cell and 2) follow the instructions to name your project**

In [12]:
%load_ext sigopt
PROJECT_NAME = input('Name your project: ')
os.environ['SIGOPT_PROJECT'] = PROJECT_NAME

Name your project: oreilly


In [13]:
from keras.callbacks import Callback
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
import keras.backend as K
import tensorflow as tf

In [14]:
# create a Keras callback to record checkpoints
class CheckpointCB(Callback):
    def on_train_begin(self, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        if logs:
            sigopt.log_checkpoint(logs)

In [15]:
# implement f1 metric and loss to handle imbalanced dataset
import tensorflow as tf
import keras.backend as K

def f1(y_true, y_pred):
    y_true = K.cast(y_true, 'float')
    
    tp = K.sum(y_true*y_pred, axis=0)
    tn = K.sum((1-y_true)*(1-y_pred), axis=0)
    fp = K.sum((1-y_true)*y_pred, axis=0)
    fn = K.sum(y_true*(1-y_pred), axis=0)

    p = tp / (tp + fp + K.epsilon())
    r = tp / (tp + fn + K.epsilon())

    f1 = 2*p*r / (p+r+K.epsilon())
    f1 = tf.where(tf.math.is_nan(f1), tf.zeros_like(f1), f1)
    return K.mean(f1)

def f1_loss(y_true, y_pred):
    y_true = K.cast(y_true, 'float')
    
    tp = K.sum(y_true*y_pred, axis=0)
    tn = K.sum((1-y_true)*(1-y_pred), axis=0)
    fp = K.sum((1-y_true)*y_pred, axis=0)
    fn = K.sum(y_true*(1-y_pred), axis=0)

    p = tp / (tp + fp + K.epsilon())
    r = tp / (tp + fn + K.epsilon())

    f1 = 2*p*r / (p+r+K.epsilon())
    f1 = tf.where(tf.math.is_nan(f1), tf.zeros_like(f1), f1)
    loss = 1 - f1
    mean_loss = K.mean(loss)
    return tf.where(mean_loss < K.epsilon(), tf.zeros_like(mean_loss), tf.math.log(mean_loss))

In [16]:
#standardizing the input feature
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Fit the scaler on the training set only, then reuse it on the test set to avoid leakage
scaled_trainX, scaled_testX = sc.fit_transform(trainX), sc.transform(testX)

In [26]:
%%run Keras First Run

sigopt.log_model('MLP (keras.models.Sequential)')
sigopt.log_dataset('Scaled ' + df_path)

#model parametrization
model_keras = Sequential()
model_keras.add(Dense(
    trainX.shape[1] * 2,
    activation='relu',
    kernel_initializer='random_normal',
    bias_initializer='zeros',
    input_dim=trainX.shape[1]
))
model_keras.add(Dense(
    trainX.shape[1] * 2,
    activation='relu',
    kernel_initializer='random_normal',
    bias_initializer='zeros'
))
model_keras.add(Dense(
    1,
    activation='sigmoid',
    kernel_initializer='random_normal',
    bias_initializer='zeros'
))
model_keras.compile(
    optimizer=Adam(lr=np.exp(sigopt.get_parameter('log_learning_rate', np.log(0.01)))),
    loss=f1_loss,
    metrics=[f1]
)
model_keras.fit(
    scaled_trainX,
    trainY,
    batch_size=sigopt.get_parameter('batch_size', default=4096),
    epochs=sigopt.get_parameter('epochs', default=6),
    callbacks=[CheckpointCB()],
    validation_data=(scaled_testX, testY),
)

#Collect model metrics
probability = model_keras.predict(scaled_testX).flatten()
prediction = np.where(probability > 0.5, np.ones_like(probability, dtype=bool), np.zeros_like(probability, dtype=bool))

log_all_metrics(prediction, probability, testY, testX)

Run started, view it on the SigOpt dashboard at https://app.sigopt.com/run/25005
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
Run finished, view it on the SigOpt dashboard at https://app.sigopt.com/run/25005


## Setting our baseline

In any optimization problem, it is critical to define a baseline so any uplift in performance can be evaluated against that baseline. In our scenario the baseline is running our XGBoost classifier with all default parameters. Let's now look at three SigOpt methods we are using to build consistency in our experimentation approach:

- [sigopt.get_parameter](https://app.sigopt.com/docs/runs/reference#get_parameter) allows us to store our default parameters for the baseline model on our dashboard.
- [sigopt.log_model](https://app.sigopt.com/docs/runs/reference#log_model) stores a text value that you can use to filter your runs in the SigOpt web view. In our example, you might want to filter by Fraud_Analysis to compare models of the same use case in the web charts.
- The most important information about a model is how it performed. With [sigopt.log_metric](https://app.sigopt.com/docs/runs/reference#log_metric) you can take advantage of SigOpt's analysis dashboard of custom charts and advanced sorting and filtering.

🚦 **TO DO - Run the following code cell to create your baseline model, fit it, and collect your performance metrics and other metrics of interest**

In [36]:
%%run XGBoost
sigopt.log_model('XGboost for Fraud Analysis')
sigopt.log_dataset(df_path)

model = XGBClassifier(random_state = random_state)

# Record the default hyperparameters on the Run. NaN defaults (here `missing`)
# are skipped because the SigOpt API rejects them with a 400 error.
for key, value in model.get_params().items():
    if isinstance(value, float) and math.isnan(value):
        continue
    sigopt.get_parameter(name=key, default=value)

modelfit = model.fit(trainX,trainY)
prediction = modelfit.predict(testX)
probabilities = modelfit.predict_proba(testX)[:, 1]

log_all_metrics(prediction, probabilities, testY, testX)

Run started, view it on the SigOpt dashboard at https://app.sigopt.com/run/25009
Run finished, view it on the SigOpt dashboard at https://app.sigopt.com/run/25009


In [35]:
model = XGBClassifier(random_state = random_state)
model.get_params()

{'objective': 'binary:logistic',
 'base_score': None,
 'booster': None,
 'colsample_bylevel': None,
 'colsample_bynode': None,
 'colsample_bytree': None,
 'gamma': None,
 'gpu_id': None,
 'importance_type': 'gain',
 'interaction_constraints': None,
 'learning_rate': None,
 'max_delta_step': None,
 'max_depth': None,
 'min_child_weight': None,
 'missing': nan,
 'monotone_constraints': None,
 'n_estimators': 100,
 'n_jobs': None,
 'num_parallel_tree': None,
 'random_state': 3,
 'reg_alpha': None,
 'reg_lambda': None,
 'scale_pos_weight': None,
 'subsample': None,
 'tree_method': None,
 'validate_parameters': None,
 'verbosity': None}

🚦 **TO DO - Click the Run hyperlink above. You will be redirected to the corresponding Run page. Now let's explore the UI and look at the data we collected during this Run.**

> Our AUPRC baseline is 0.96077. It is also worth noting that our baseline model only missed on 5 predictions, all False Negative (i.e. fraudulent activities that were predicted as non fraudulent) out of a total 55,585 predictions.

___

# Hyperparameter Optimization

The previous section illustrated how you can log and track a single run of your model by leveraging SigOpt training runs. We now want to leverage SigOpt's optimization engine and let it suggest sets of parameters for tuning that same model. As before, we'll first go over some SigOpt terminology.

<img src="https://sigopt.com/wp-content/uploads/2019/05/SigOpt-interaction-model-1.png" alt="Black Box Overview"  width="700"/>

## Define your Parameter Space

Today, we will explore the following parameter space:
- **min_child_weight**: used to control overfitting, this parameter is the minimum sum of instance weights required in a child node; a split that would create a child below this threshold is not made. Higher values prevent the model from learning relations that are highly specific to the particular sample selected for a tree.
- **max_depth**: the maximum depth of a tree. This parameter controls overfitting, as a higher depth allows the model to learn relations very specific to a particular sample.
- **n_estimators**: the number of trees to fit. More trees usually let the model fit the data better. However, adding a lot of trees slows down training and can introduce overfitting.
- **learning_rate**: controls the weighting of new trees added to the model. Lowering this value helps prevent overfitting, but requires the model to add a larger number of trees.

## Configure your Experiment

The experiment definition will include the name, project, which of your parameters to optimize, metrics and other options that you would like to run your experiment with. The observation budget is the number of runs you would like created in the optimization. We recommend using 20 for testing. When you're ready, visit the [observation budget page](https://app.sigopt.com/docs/overview/observation_budget) to learn our rule of thumb for the appropriate observation budget.

You can format the experiment definition in Python, YAML, or JSON. In this example, we're using YAML. See our documentation [here](https://app.sigopt.com/docs/runs/optimize#formatting) for Python and JSON examples and examples on how to create experiments using R, Matlab or Java.

🚦 **TO DO - Run the following code cell to create your experiment**

In [None]:
%%experiment
'''
name: Fraud_Analysis
parameters:
  - name: min_child_weight
    bounds:
      min: 1
      max: 15
    type: int
  - name: max_depth
    bounds:
      min: 2
      max: 15
    type: int
  - name: n_estimators
    bounds:
      min: 20
      max: 400
    type: int
  - name: learning_rate
    bounds:
      min: 0.001
      max: 1
    transformation: log
    type: double
metrics:
  - name: AUPRC
    objective: maximize
observation_budget: 20
'''

Once your experiment is created, you will get a link to it.

🚦 **TO DO - Click the link above to confirm your experiment was properly created.**
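
To build intuition for the `transformation: log` setting on `learning_rate`, the sketch below compares sampling directly between the bounds with sampling uniformly in log space. It uses the bounds from the experiment definition above, but the helper names are made up and the sampling scheme is an assumption for illustration; it does not reproduce how SigOpt's engine generates suggestions.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Sample directly in the original scale."""
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    """Sample uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(3)
low, high = 0.001, 1.0  # learning_rate bounds from the experiment definition
n = 10_000

plain = [sample_uniform(rng, low, high) for _ in range(n)]
logged = [sample_log_uniform(rng, low, high) for _ in range(n)]

# Share of samples below 0.01, i.e. in the lowest of the three decades.
print(sum(x < 0.01 for x in plain) / n)   # roughly 0.01
print(sum(x < 0.01 for x in logged) / n)  # roughly 0.33
```

Without the log scale, almost every candidate lands between 0.1 and 1, so small learning rates are barely explored even though they are often the interesting ones.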

## Instrument your model and run your Experiment

With SigOpt, it's very easy to instrument your model and run an optimization loop. The [sigopt.get_parameter](https://app.sigopt.com/docs/runs/reference#get_parameter) method also allows us to access the parameters suggested throughout the optimization process. In our example, the parameters that we will be optimizing are **min_child_weight**, **max_depth**, **n_estimators** and **learning_rate**. **Note that SigOpt does not require you to optimize all of your parameters, meaning that you can keep the default values for the parameters you don't want to optimize.** When running the optimization, this method will seamlessly return a value generated from a SigOpt Experiment's Suggestion.

🚦 **TO DO - Run the following code cell to create the function that will generate a new model every time SigOpt has a new set of hyperparameters to evaluate**

In [None]:
%%optimize
sigopt.log_model('XGboost for Fraud Analysis')
sigopt.log_dataset(df_path)

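# Log the model defaults, skipping NaN values (e.g. `missing`), which the SigOpt API rejects
for key, value in model.get_params().items():
    if isinstance(value, float) and math.isnan(value):
        continue
    sigopt.get_parameter(name=key, default=value)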

min_child_weight = sigopt.get_parameter('min_child_weight')
max_depth = sigopt.get_parameter('max_depth')
n_estimators = sigopt.get_parameter('n_estimators')
learning_rate = sigopt.get_parameter('learning_rate')


model = XGBClassifier(min_child_weight=min_child_weight,
                      max_depth=max_depth,
                      n_estimators=n_estimators,
                      learning_rate=learning_rate,
                      random_state = random_state)

modelfit = model.fit(trainX,trainY)
prediction = modelfit.predict(testX)
# Keep only the probability of the positive (fraud) class
probabilities = modelfit.predict_proba(testX)[:, 1]

# Reuse the shared helper so every run logs the same set of metrics
log_all_metrics(prediction, probabilities, testY, testX)

The [Optimization Loop](https://app.sigopt.com/docs/overview/optimization) is the backbone of using SigOpt. In the code cell above, you run through the following three simple steps in a loop:
- Receive a suggested set of parameters from SigOpt
- Evaluate your model objective metric
- Report your model objective metric to SigOpt

<img src="https://static.sigopt.com/b/7db309215269c8e1d7f88041f283b36b0e0f3884/static/img/optimization-loop.svg" alt="Optimization Loop"  width="250"/>
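
The three steps above can be sketched without any service at all. In the minimal example below, random search stands in for the optimizer and a toy formula stands in for training the model and computing AUPRC; both are assumptions made for illustration and neither reflects how SigOpt chooses its suggestions. The bounds are borrowed from the experiment definition, with `min_child_weight` and `learning_rate` left out for brevity.

```python
import random

SPACE = {
    "max_depth": (2, 15),       # integer bounds from the experiment definition
    "n_estimators": (20, 400),
}


def suggest(rng):
    """Step 1: propose a set of parameters (random search as a stand-in)."""
    return {name: rng.randint(low, high) for name, (low, high) in SPACE.items()}


def evaluate(params):
    """Step 2: toy objective standing in for 'train the model, compute AUPRC'.

    It peaks at max_depth=8 and n_estimators=200 and is purely illustrative.
    """
    depth_term = ((params["max_depth"] - 8) / 13) ** 2
    trees_term = ((params["n_estimators"] - 200) / 380) ** 2
    return 1.0 - depth_term - trees_term


def optimize(budget=20, seed=3):
    """Run the suggest / evaluate / report loop for `budget` observations."""
    rng = random.Random(seed)
    history = []  # Step 3: "report" by recording each observation
    for _ in range(budget):
        params = suggest(rng)
        value = evaluate(params)
        history.append((value, params))
    best = max(history, key=lambda item: item[0])
    return best, history


(best_value, best_params), history = optimize()
print(len(history), best_params, round(best_value, 4))
```

Random search ignores the history when proposing the next point. A model-based optimizer keeps the same loop but uses the reported observations to decide where to look next, which is why reporting every result matters.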

🚦 **TO DO - Click the Run hyperlink above. You will be redirected to the corresponding Run page. Now let's explore the UI and look at how an Experiment composed of multiple runs differs from a single run like the baseline training run we created earlier.**



## Multimetric Experimentation

Most problems of practical relevance involve two or more competing objectives. Even in a case like a fraud detection classifier, where a clear measure of success is effectively blocking fraudulent transactions while allowing legitimate ones, the classifier inference time should be part of your model evaluation. So far we've graded the accuracy of the model without worrying about inference time, or how fast we can predict whether a transaction is fraudulent or not. In real life, a model that returns a 99.99% accuracy is useless if it has to churn data for 20 seconds to provide an answer. SigOpt supports [multimetric problems](https://app.sigopt.com/docs/overview/multimetric) where 2 competing metrics can be optimized at the same time, and an additional 50 metrics can be stored.

🚦 **TO DO - Run the next two code cells. The first cell updates our experiment definition to include a second objective metric. The second cell runs the optimization loop.**

In [None]:
%%experiment
'''
name: Fraud_Analysis_Multimetric
parameters:
  - name: min_child_weight
    bounds:
      min: 1
      max: 15
    type: int
  - name: max_depth
    bounds:
      min: 2
      max: 15
    type: int
  - name: n_estimators
    bounds:
      min: 20
      max: 400
    type: int
  - name: learning_rate
    bounds:
      min: 0.001
      max: 1
    type: double
    transformation: log
metrics:
  - name: AUPRC
    objective: maximize
  - name: Inference Time
    objective: minimize    
observation_budget: 40
'''

In [None]:
%%optimize
sigopt.log_model('XGboost for Fraud Analysis using multimetric')
sigopt.log_dataset(df_path)

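# Log the model defaults, skipping NaN values (e.g. `missing`), which the SigOpt API rejects
for key, value in model.get_params().items():
    if isinstance(value, float) and math.isnan(value):
        continue
    sigopt.get_parameter(name=key, default=value)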

min_child_weight = sigopt.get_parameter('min_child_weight')
max_depth = sigopt.get_parameter('max_depth')
n_estimators = sigopt.get_parameter('n_estimators')
learning_rate = sigopt.get_parameter('learning_rate')


model = XGBClassifier(min_child_weight=min_child_weight,
                      max_depth=max_depth,
                      n_estimators=n_estimators,
                      learning_rate=learning_rate,
                      random_state = random_state)

modelfit = model.fit(trainX,trainY)
# Time only the prediction step to measure inference time
start = time.time()
prediction = modelfit.predict(testX)
inferenceTime = time.time() - start
# Keep only the probability of the positive (fraud) class
probabilities = modelfit.predict_proba(testX)[:, 1]

sigopt.log_metric('Inference Time', inferenceTime)
# Reuse the shared helper so every run logs the same set of metrics
log_all_metrics(prediction, probabilities, testY, testX)

🚦 **TO DO - Click the link above to open the analysis page. We'll first go over a little bit of theory around multimetric optimization and the shape of a multimetric solution. Then we'll dive into the analysis once everyone has collected enough data to start seeing a Pareto frontier.**
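
As a small illustration of what a Pareto frontier is, the sketch below keeps only the runs that no other run beats on both metrics at once. The function name and the (AUPRC, inference time) pairs are made up for illustration and are not results from this experiment.

```python
def pareto_frontier(runs):
    """Return the runs not dominated by any other run.

    Each run is (auprc, inference_time); higher AUPRC and lower time are better.
    """
    frontier = []
    for i, (auprc_i, time_i) in enumerate(runs):
        dominated = any(
            auprc_j >= auprc_i and time_j <= time_i
            and (auprc_j > auprc_i or time_j < time_i)
            for j, (auprc_j, time_j) in enumerate(runs) if j != i
        )
        if not dominated:
            frontier.append((auprc_i, time_i))
    return sorted(frontier)


# Made-up (AUPRC, inference time in seconds) pairs, for illustration only.
runs = [(0.90, 0.05), (0.95, 0.10), (0.96, 0.30),
        (0.94, 0.20), (0.90, 0.08), (0.97, 0.30)]
print(pareto_frontier(runs))
# [(0.9, 0.05), (0.95, 0.1), (0.97, 0.3)]
```

Every point on the frontier is a defensible choice: moving from one to another always trades some AUPRC for speed or the reverse, so picking among them is a business decision rather than an optimization problem.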

___

# Learn more at your own time
## EfficientBERT

🚦 **TO DO [At your own time] - Learn about the optimization of EfficientBERT by going through the [dashboard](https://app.sigopt.com/guest?guest_token=RDSLBTPPKENPFFAAWDXWGKRUGAJZOPTZZYCRVKCFOZWZZTYL) and watching this [presentation](https://www.youtube.com/watch?v=ZnBYV2h_6RA) by Meghana Ravikumar, Machine Learning Engineer at SigOpt**

> With the publication of BERT, transfer learning was suddenly accessible for NLP, unlocking a plethora of model zoos and boosting performances for domain specific problems.  Although BERT has accelerated many modeling efforts, its size is limiting. In this talk, we will explore how to reduce the size of BERT while retaining its capacity in the context of English Question Answering tasks. We'll show how scalable hyperparameter optimization can help you tackle difficult modeling problems, draw insights, and make informed decisions.

> Our approach encompasses fine-tuning, distillation, architecture search, and hyperparameter optimization at scale. First, we fine-tune BERT on SQUAD 2.0 (our teacher model) and use distillation to compress fine-tuned BERT to a smaller model (our student model). Then, combining SigOpt and Ray, we use multimetric Bayesian optimization at scale to find the optimal architecture for the student model. Finally, we explore the trade-offs of our hyperparameter decisions to draw insights for our student model's architecture. 

___

## Advanced features

🚦 **TO DO [At your own time] - Learn about SigOpt's [advanced features](https://app.sigopt.com/docs/overview/multimetric) for optimization**

> The SigOpt optimization engine goes beyond traditional hyperparameter tuning packages and methods with numerous advanced features that empower modelers to accelerate model development and solve new optimization problems.

___


## Documentation

🚦 **TO DO [At your own time] - Learn about SigOpt by looking through the [documentation](https://app.sigopt.com/docs)**

___