# Benchmarking

## 0. Set up logging

This step configures logging in our environment so that we can follow the
steps that Draco performs.

In [1]:
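import logging
import warnings

# Create the default handler, then keep the root logger at ERROR so that
# third-party libraries stay quiet.
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(level=logging.ERROR)
# Still show INFO messages coming from Draco itself.
logging.getLogger('draco').setLevel(level=logging.INFO)

# Hide warnings to keep the notebook output readable.
warnings.simplefilter("ignore")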
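
To see what this configuration does, the following minimal sketch queries the effective levels with the standard `logging` module. The logger name `some_other_library` is a hypothetical placeholder for any non-Draco package; `draco.benchmark` is one of the logger names that appears in the output further down.

```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.ERROR)
logging.getLogger('draco').setLevel(logging.INFO)

# Child loggers such as 'draco.benchmark' inherit the level set on 'draco'.
draco_logger = logging.getLogger('draco.benchmark')

# 'some_other_library' is a hypothetical name standing in for any other package;
# it falls back to the root logger level, which is ERROR.
other_logger = logging.getLogger('some_other_library')

print(draco_logger.isEnabledFor(logging.INFO))   # True
print(other_logger.isEnabledFor(logging.INFO))   # False
print(other_logger.isEnabledFor(logging.ERROR))  # True
```

With this setup, Draco progress messages are shown while other libraries only report errors.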


## Running the Benchmark

The user-facing API for Draco benchmarking is the `draco.benchmark.evaluate_templates` function.

The `evaluate_templates` function accepts the following arguments:
* `templates (list)`: List of templates to try.
* `window_size_rule (list)`: List of `(window_size, resample_rule)` tuples, whose values can be an int, str or Timedelta object.
* `metric (function or str)`: Metric to use. If a `str` is given, it must be one of the metrics defined in the `draco.metrics.METRICS` dictionary.
* `tuning_iterations (int)`: Number of tuning iterations to be used.
* `init_params (dict)`: Initialization parameters for the pipelines.
* `target_times (DataFrame)`: Contains the specification of the problem that we are solving, which has three columns:
    * `turbine_id`: Unique identifier of the turbine which this label corresponds to.
    * `cutoff_time`: Time associated with this target.
    * `target`: The value that we want to predict. This can either be a numerical value
        or a categorical label. This column can also be skipped when preparing
        data that will be used only to make predictions and not to fit any
        pipeline.
* `readings (DataFrame)`: Contains the signal data from different sensors, with the following columns:
    * `turbine_id`: Unique identifier of the turbine which this reading comes from.
    * `signal_id`: Unique identifier of the signal which this reading comes from.
    * `timestamp (datetime)`: Time at which the reading took place, as a datetime.
    * `value (float)`: Numeric value of this reading.
* `preprocessing (int, list or dict)`: Number of preprocessing steps to be used.
* `cost (bool)`: Whether the metric is a cost function (the lower the better) or not.
* `test_size (float)`: Percentage of the data set to be used for testing.
* `cv_splits (int)`: Number of cross-validation splits to create.
* `random_state (int)`: Random state used for the train/test split.
* `output_path (str)`: Path where the benchmark report will be saved.
* `cache_path (str)`: If given, cache the generated cross-validation splits in this folder. Defaults to `None`.
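
As an illustration of the two input tables described above, here is a minimal sketch that builds `target_times` and `readings` with pandas. Only the column names come from the argument list; the turbine and signal identifiers, dates and values are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data: only the column names follow the description above.
target_times = pd.DataFrame({
    'turbine_id': ['T001', 'T001', 'T002'],
    'cutoff_time': pd.to_datetime(['2013-01-12', '2013-01-13', '2013-01-12']),
    'target': [0, 1, 0],
})

readings = pd.DataFrame({
    'turbine_id': ['T001', 'T001', 'T002', 'T002'],
    'signal_id': ['S01', 'S02', 'S01', 'S02'],
    'timestamp': pd.to_datetime(['2013-01-10 00:00'] * 4),
    'value': [323.0, 320.0, 318.0, 317.5],
})

# Basic sanity checks on the expected layout.
assert list(target_times.columns) == ['turbine_id', 'cutoff_time', 'target']
assert list(readings.columns) == ['turbine_id', 'signal_id', 'timestamp', 'value']
assert pd.api.types.is_datetime64_any_dtype(readings['timestamp'])

# Every turbine that has a target should also have readings.
has_readings = set(target_times['turbine_id']) <= set(readings['turbine_id'])
print(has_readings)
```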

In [2]:
templates = [
    'lstm_prob_with_unstack',
    'double_lstm_prob_with_unstack'
]
window_size_rule = [('1d', '1h'), ('2d', '2h')]
init_params = {
    'lstm_prob_with_unstack': {
        'keras.Sequential.LSTMTimeSeriesClassifier#1': {
            'epochs': 1,
        }
    },
    'double_lstm_prob_with_unstack': {
        'keras.Sequential.DoubleLSTMTimeSeriesClassifier#1': {
            'epochs': 1,
        }
    }
}
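
To get a feel for the `window_size_rule` values, the sketch below computes how many resampled intervals fit in each window using `pandas.Timedelta`. It assumes that each tuple is `(window_size, resample_rule)`, as the result columns suggest, and that resampling produces one value per rule interval; this is an illustration, not a description of Draco internals.

```python
import pandas as pd

window_size_rule = [('1d', '1h'), ('2d', '2h')]

# Number of rule-sized intervals that fit in each window.
# Assumption: one aggregated value per interval after resampling.
steps_per_window = [
    pd.Timedelta(window_size) // pd.Timedelta(rule)
    for window_size, rule in window_size_rule
]

for (window_size, rule), steps in zip(window_size_rule, steps_per_window):
    print(f'window={window_size} rule={rule} -> {steps} intervals')
```

Both configurations yield 24 intervals per window, so under this assumption they differ in time span and resolution rather than in sequence length.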


In [3]:
from draco.benchmark import evaluate_templates

results = evaluate_templates(
    templates=templates,
    window_size_rule=window_size_rule,
    init_params=init_params,
    tuning_iterations=3,
    cv_splits=3,
)

INFO:draco.benchmark:Evaluating template lstm_prob_with_unstack on problem None (1d, 1h)
2023-02-27 13:30:50.986746: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-02-27 13:30:51.005488: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fdfbf1be480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-02-27 13:30:51.005504: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:draco.pipeline:New configuration found:
  Template: lstm_prob_with_unstack 
    Hyperparameters: 
      ('sklearn.impute.SimpleImputer#1', 'strategy'): mean
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'lstm_1_units'): 80
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dropout_1_rate'): 0.3
      ('keras.Sequential.LSTMTimeSeriesClassifier#1', 'dense_1_units'): 80
INFO:draco.pipeline:New con

In [4]:
results

| Unnamed: 0 | problem_name | window_size | resample_rule | template | default_test | default_cv | tuned_cv | tuned_test | tuning_metric | tuning_metric_kwargs | fit_predict_time | default_cv_time | average_cv_time | total_time | status | accuracy_threshold/0.5 | f1_threshold/0.5 | fpr_threshold/0.5 | tpr_threshold/0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1d | 1h | lstm_prob_with_unstack | 0.216422 | 0.399042 | 0.513071 | 0.519231 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:12.094481 | 0 days 00:00:40.481418 | 0 days 00:00:26.153966 | 0 days 00:02:23.894670 | OK | 0.719101 | 0.074074 | 1.0 | 0.0 |
| 1 | | 2d | 2h | lstm_prob_with_unstack | 0.354701 | 0.529236 | 0.590157 | 0.571429 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:10.151120 | 0 days 00:00:38.269350 | 0 days 00:00:29.627763 | 0 days 00:02:28.485501 | OK | 0.393258 | 0.490566 | 1.0 | 0.0 |
| 2 | | 1d | 1h | double_lstm_prob_with_unstack | 0.227717 | 0.361814 | 0.517905 | 0.637363 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:12.056312 | 0 days 00:00:32.238018 | 0 days 00:00:25.599763 | 0 days 00:02:13.255656 | OK | 0.550562 | 0.52381 | 1.0 | 0.0 |
| 3 | | 2d | 2h | double_lstm_prob_with_unstack | 0.363858 | 0.596251 | 0.596251 | 0.720085 | roc_auc_score | {'threshold': 0.5} | 0 days 00:00:12.781163 | 0 days 00:00:34.025537 | 0 days 00:00:34.601621 | 0 days 00:02:42.523351 | OK | 0.651685 | 0.597403 | 1.0 | 0.0 |
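
One way to read these results is to rank the runs by `tuned_test` and to check how much tuning improved the cross-validation score. The sketch below rebuilds a subset of the columns shown above so that it runs on its own; in the notebook, the `results` DataFrame can be used directly. The column name `cv_gain` is introduced here for illustration only, and the ranking assumes that a higher `roc_auc_score` is better.

```python
import pandas as pd

# Subset of the results shown above, copied by hand so the example is self-contained.
results = pd.DataFrame({
    'window_size': ['1d', '2d', '1d', '2d'],
    'resample_rule': ['1h', '2h', '1h', '2h'],
    'template': [
        'lstm_prob_with_unstack',
        'lstm_prob_with_unstack',
        'double_lstm_prob_with_unstack',
        'double_lstm_prob_with_unstack',
    ],
    'default_cv': [0.399042, 0.529236, 0.361814, 0.596251],
    'tuned_cv': [0.513071, 0.590157, 0.517905, 0.596251],
    'tuned_test': [0.519231, 0.571429, 0.637363, 0.720085],
})

# 'cv_gain' is a hypothetical helper column: improvement obtained by tuning.
results['cv_gain'] = results['tuned_cv'] - results['default_cv']

# Assumes a higher score is better (the metric is not a cost function).
best = results.loc[results['tuned_test'].idxmax()]

print(results.sort_values('tuned_test', ascending=False))
print(best['template'], best['window_size'], best['resample_rule'])
```

On the values above, the best test score comes from `double_lstm_prob_with_unstack` with the `('2d', '2h')` configuration, which is also the only run where tuning did not improve on the default cross-validation score.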
