# Benchmarking

## 0. Set up logging

This step configures logging in our environment so that we can follow the
steps that Draco performs.

In [1]:
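import logging
import warnings

# Configure the root handler, then silence everything below ERROR by default.
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(level=logging.ERROR)
# Let Draco's own loggers report progress at INFO level.
logging.getLogger('draco').setLevel(level=logging.INFO)

# Suppress library warnings to keep the notebook output readable.
warnings.simplefilter("ignore")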
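To see why this configuration shows Draco's messages while hiding those of other libraries, the following minimal sketch checks the effective levels with the standard `logging` module only. The logger name `some_other_library` is a hypothetical placeholder; `draco.benchmark` is the logger name that appears in the output further down.

```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.ERROR)
logging.getLogger('draco').setLevel(logging.INFO)

# Child loggers inherit the level of their closest configured ancestor.
draco_child = logging.getLogger('draco.benchmark')
other = logging.getLogger('some_other_library')  # hypothetical library name

draco_info_enabled = draco_child.isEnabledFor(logging.INFO)
other_info_enabled = other.isEnabledFor(logging.INFO)
other_error_enabled = other.isEnabledFor(logging.ERROR)

print(draco_info_enabled, other_info_enabled, other_error_enabled)
```

Loggers under the `draco` namespace inherit the INFO level, whereas any other logger falls back to the root level and only emits ERROR messages or above.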


## Running the Benchmarking

The user API for the Draco Benchmarking is the `draco.benchmark.evaluate_templates` function.

The `evaluate_templates` function accepts the following arguments:
* `templates (list)`: List of templates to try.
* `window_size_rule (list)`: List of `(window_size, resample_rule)` tuples; each element can be an int, a str or a Timedelta object.
* `metric (function or str)`: Metric to use. If a `str` is given, it must be one of the metrics defined in the `draco.metrics.METRICS` dictionary.
* `tuning_iterations (int)`: Number of iterations to be used.
* `init_params (dict)`: Initialization parameters for the pipelines.
* `target_times (DataFrame)`: Contains the specification of the problem that we are solving, which has three columns:
    * `turbine_id`: Unique identifier of the turbine which this label corresponds to.
    * `cutoff_time`: Time associated with this target.
    * `target`: The value that we want to predict. This can either be a numerical value
        or a categorical label. This column can also be skipped when preparing
        data that will be used only to make predictions and not to fit any
        pipeline.
* `readings (DataFrame)`: Contains the signal data from different sensors, with the following columns:
    * `turbine_id`: Unique identifier of the turbine which this reading comes from.
    * `signal_id`: Unique identifier of the signal which this reading comes from.
    * `timestamp (datetime)`: Time at which the reading took place, as a datetime.
    * `value (float)`: Numeric value of this reading.
* `preprocessing (int, list or dict)`: Number of preprocessing steps to be used.
* `cost (bool)`: Whether the metric is a cost function (the lower the better) or not.
* `test_size (float)`: Proportion of the dataset to be used for testing.
* `cv_splits (int)`: Number of cross-validation splits to create.
* `random_state (int)`: Random seed used for the train/test split.
* `output_path (str)`: Path where the benchmark report will be saved.
* `cache_path (str)`: If given, cache the generated cross-validation splits in this folder. Defaults to `None`.
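As an illustration of the column layout described for `target_times` and `readings`, here is a minimal sketch that builds both tables with pandas. All identifiers and values (`T001`, `S01`, the dates and the numbers) are made up for the example and do not come from any Draco dataset.

```python
import pandas as pd

# Hypothetical toy data: turbine ids, signal ids and values are invented.
target_times = pd.DataFrame({
    'turbine_id': ['T001', 'T001', 'T002'],
    'cutoff_time': pd.to_datetime(['2013-01-12', '2013-01-13', '2013-01-12']),
    'target': [0, 1, 0],
})

readings = pd.DataFrame({
    'turbine_id': ['T001', 'T001', 'T002', 'T002'],
    'signal_id': ['S01', 'S02', 'S01', 'S02'],
    'timestamp': pd.to_datetime([
        '2013-01-11 00:00', '2013-01-11 00:00',
        '2013-01-11 00:00', '2013-01-11 00:00',
    ]),
    'value': [323.0, 1.2, 317.0, 0.9],
})

# Sanity checks on the layout: column names and a numeric `value` column.
assert list(target_times.columns) == ['turbine_id', 'cutoff_time', 'target']
assert list(readings.columns) == ['turbine_id', 'signal_id', 'timestamp', 'value']
assert readings['value'].dtype == float
```

Tables shaped like these could then be passed through the `target_times` and `readings` arguments.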

In [2]:
templates = [
    'probability.unstack_lstm_timeseries_classifier',
    'probability.normalize_dfs_xgb_classifier'
]
window_size_rule = [('1d', '1h'), ('2d', '2h')]
init_params = {
    'unstack_lstm_timeseries_classifier': {
        'keras.Sequential.LSTMTimeSeriesClassifier#1': {
            'epochs': 1,
        }
    }
}
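To get a feel for what each `window_size_rule` pair implies, the sketch below computes how many resampled time steps fit in one window. It assumes that the first element of each tuple is the window size and the second is the resampling rule, as the `window_size` and `resample_rule` columns of the results suggest; `steps_per_window` is a hypothetical helper, not part of Draco.

```python
import pandas as pd

window_size_rule = [('1d', '1h'), ('2d', '2h')]


def steps_per_window(window_size, resample_rule):
    """Number of resampled time steps that fit in one window (illustrative helper)."""
    return pd.Timedelta(window_size) // pd.Timedelta(resample_rule)


for window_size, resample_rule in window_size_rule:
    print(window_size, resample_rule, steps_per_window(window_size, resample_rule))
```

Under this reading, both pairs produce sequences of 24 steps: the second one covers twice the time span at half the resolution.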


In [3]:
from draco.benchmark import evaluate_templates

results = evaluate_templates(
    templates=templates,
    window_size_rule=window_size_rule,
    init_params=init_params,
    tuning_iterations=3,
    cv_splits=3,
)

INFO:draco.benchmark:Evaluating template probability.unstack_lstm_timeseries_classifier on problem None (1d, 1h)
INFO:draco.pipeline:New configuration found:
  Template: probability.unstack_lstm_timeseries_classifier 
    Hyperparameters: 
      ('sklearn.impute.SimpleImputer#1', 'strategy'): mean
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'lstm_1_units'): 80
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dropout_1_rate'): 0.3
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dense_1_units'): 80
INFO:draco.pipeline:New configuration found:
  Template: probability.unstack_lstm_timeseries_classifier 
    Hyperparameters: 
      ('sklearn.impute.SimpleImputer#1', 'strategy'): constant
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'lstm_1_units'): 287
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dropout_1_rate'): 0.565737233372491
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dense_1_units'): 145
INFO:draco.pipeline:New configuration found

In [4]:
results

| | problem_name | window_size | resample_rule | template | default_test | default_cv | tuned_cv | tuned_test | tuning_metric | tuning_metric_kwargs | fit_predict_time | default_cv_time | average_cv_time | total_time | status | accuracy_threshold/0.5 | f1_threshold/0.5 | fpr_threshold/0.5 | tpr_threshold/0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1d | 1h | probability.unstack_lstm_timeseries_classifier | 0.350122 | 0.538316 | 0.618558 | 0.463675 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:04.250012 | 0 days 00:00:14.374875 | 0 days 00:00:15.360015 | 0 days 00:01:10.806375 | OK | 0.640449 | 0.058824 | 1.0 | 0.0 |
| 1 | | 2d | 2h | probability.unstack_lstm_timeseries_classifier | 0.686203 | 0.491949 | 0.556803 | 0.510989 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:04.410682 | 0 days 00:00:14.411205 | 0 days 00:00:10.633619 | 0 days 00:00:55.011304 | OK | 0.595506 | 0.307692 | 1.0 | 0.0 |
| 2 | | 1d | 1h | probability.normalize_dfs_xgb_classifier | 0.697802 | 0.669508 | 0.701792 | 0.766789 | roc_auc_score | {'threshold': 0.5} | 0 days 00:01:11.416859 | 0 days 00:02:55.012078 | 0 days 00:00:00.806430 | 0 days 00:05:20.653100 | OK | 0.797753 | 0.666667 | 1.0 | 0.0 |
| 3 | | 2d | 2h | probability.normalize_dfs_xgb_classifier | 0.720391 | 0.718617 | 0.740664 | 0.782662 | roc_auc_score | {'threshold': 0.5} | 0 days 00:01:03.612676 | 0 days 00:02:26.925796 | 0 days 00:00:00.755424 | 0 days 00:04:37.570182 | OK | 0.820225 | 0.692308 | 1.0 | 0.0 |
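
One possible way to read these results is to rank the configurations by their tuned test score and to check how much tuning improved the cross-validation score. The sketch below rebuilds a subset of the table by hand so that it runs on its own; in the notebook, the `results` DataFrame returned by `evaluate_templates` could be used directly. The `cv_gain` column is a name introduced here for illustration.

```python
import pandas as pd

# Values copied from the results table above (subset of columns).
results = pd.DataFrame({
    'window_size': ['1d', '2d', '1d', '2d'],
    'resample_rule': ['1h', '2h', '1h', '2h'],
    'template': [
        'probability.unstack_lstm_timeseries_classifier',
        'probability.unstack_lstm_timeseries_classifier',
        'probability.normalize_dfs_xgb_classifier',
        'probability.normalize_dfs_xgb_classifier',
    ],
    'default_cv': [0.538316, 0.491949, 0.669508, 0.718617],
    'tuned_cv': [0.618558, 0.556803, 0.701792, 0.740664],
    'tuned_test': [0.463675, 0.510989, 0.766789, 0.782662],
})

# Gain in cross-validation score obtained by tuning (illustrative column).
results['cv_gain'] = results['tuned_cv'] - results['default_cv']

# Rank configurations by tuned test score; higher is better for ROC AUC.
ranked = results.sort_values('tuned_test', ascending=False)
best = ranked.iloc[0]

print(ranked[['template', 'window_size', 'resample_rule', 'tuned_test', 'cv_gain']])
```

In this run, the `probability.normalize_dfs_xgb_classifier` template with the `('2d', '2h')` setting has the highest tuned test score, and tuning improved the cross-validation score for all four configurations.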
