# DETAILED CANDIDATE GENERATION

This notebook performs the same steps as basic_candidate_generation.ipynb, but it calls each workflow method individually instead of calling run().

In [1]:
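from bayesopt_core import *    # provides BayesianOptimizer and OptimizationConfig
import sys, os
# Add the project root (parent of the notebook folder) to the import path so `etl` can be found
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))
from etl.extractors.provenance_extractor import ProvenanceExtractor
from bayesopt_core.helpers.visualization import visualize_data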


### 1 - DATA DEFINITION AND OPTIMIZATION INITIALIZATION

In [2]:
data_needed = {
    'input': ['DROPOUT', 'BATCH_SIZE', 'EPOCHS', 'LR'],
    'output': ['accuracy', 'emissions']
}
extractor = ProvenanceExtractor('../test/prov', data_needed)
inp, out = extractor.extract_all()      # cols are parameters/metrics, rows are runs

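bayesopt = BayesianOptimizer(OptimizationConfig(
    data_needed['output'],       # objective metrics
    data_needed['input'],        # parameters to optimize
    ['MAX', 'MIN'],              # one direction per metric: maximize accuracy, minimize emissions
    n_candidates=3,              # number of candidates to propose
    n_restarts=10,               # restarts of the acquisition-function optimizer
    raw_samples=200,             # random samples used to initialize those restarts
    optimizers='optimize_acqf',  # acquisition optimizer to use
    acqf='ucb',                  # Upper Confidence Bound acquisition function
    beta=1.5,                    # UCB exploration weight
    verbose=True                 # print progress messages
))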

data = {
    'parameters': inp,
    'metrics': out
}

### 2 - DATA TRANSFORMATIONS
This block calls prepare_data(), which:
- applies a minimization transformation that flips the sign of the metrics to be minimized, so that every objective can be treated as a maximization
- generates bounds with BoundsGenerator and uses them to normalize the input parameters

The method stores the following tensors as attributes of bayesopt: X_data, Y_data, original_bounds and X_normalized.
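The two transformations can be sketched in plain Python as below. This is a minimal, dependency-free sketch, not the library's code: the function names `flip_minimized`, `make_bounds` and `normalize` are hypothetical, the toy data is made up for illustration, and the rule used for the bounds (observed min/max widened by a `margin` of 2% of the range on each side) is an assumption about what BoundsGenerator might do.

```python
def flip_minimized(Y, directions):
    """Negate every metric marked 'MIN' so all objectives become maximization."""
    signs = [-1.0 if d == 'MIN' else 1.0 for d in directions]
    return [[s * y for s, y in zip(signs, row)] for row in Y]


def make_bounds(X, margin=0.02):
    """Per-parameter (lower, upper) bounds: observed min/max widened by
    margin * range on each side (hypothetical rule, not BoundsGenerator's code)."""
    bounds = []
    for column in zip(*X):
        lo, hi = min(column), max(column)
        pad = margin * (hi - lo)
        bounds.append((lo - pad, hi + pad))
    return bounds


def normalize(X, bounds):
    """Min-max scale each parameter into [0, 1] using the given bounds."""
    return [
        [(x - lo) / ((hi - lo) or 1.0) for x, (lo, hi) in zip(row, bounds)]
        for row in X
    ]


# Two toy runs: DROPOUT, BATCH_SIZE, EPOCHS, LR -> (accuracy, emissions)
X = [[0.1, 16, 5, 1e-4], [0.5, 64, 15, 1e-3]]
Y = [[0.70, 0.010], [0.64, 0.003]]

Y_t = flip_minimized(Y, ['MAX', 'MIN'])   # emissions become negative
bounds = make_bounds(X)
X_norm = normalize(X, bounds)
```

Because of the margin, observed values never land exactly on 0 or 1 after normalization, which leaves the optimizer a little room beyond the explored range.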

In [3]:
bayesopt.prepare_data(data)

   -> Data transformed
   -> Bounds generated
   -> Data normalized


### 3 - MODEL CREATION AND TRAINING
This block calls the model_training() method, which performs some validation and then runs the train_model() function. train_model() returns a SingleTaskGP or a ModelListGP depending on the number of objective metrics, and the trained model is saved as an attribute of bayesopt.
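The choice between a single model and a model list can be sketched as follows. This is a hypothetical illustration, not train_model() itself: `build_model`, `ModelList` and the toy `fit_mean` "model" are invented names, and a real setup would fit a GP per objective instead. In BoTorch the multi-objective case is commonly written as `ModelListGP(*[SingleTaskGP(train_X, train_Y[:, j:j+1]) for j in range(m)])`, but the notebook's implementation may differ.

```python
class ModelList:
    """Minimal stand-in for a ModelListGP: one independent model per objective."""

    def __init__(self, models):
        self.models = list(models)

    def __call__(self, x):
        # Collect the per-objective predictions into one output vector
        return [model(x) for model in self.models]


def build_model(train_X, train_Y, fit_single):
    """Return a single model for one objective, otherwise a ModelList.

    fit_single(train_X, y) is a hypothetical fitting routine that takes the
    inputs and one column of targets and returns a callable model."""
    n_objectives = len(train_Y[0])
    columns = [[row[j] for row in train_Y] for j in range(n_objectives)]
    if n_objectives == 1:
        return fit_single(train_X, columns[0])
    return ModelList(fit_single(train_X, col) for col in columns)


def fit_mean(train_X, y):
    """Toy 'model' that always predicts the mean of its targets."""
    mean = sum(y) / len(y)
    return lambda x: mean


train_X = [[0.0], [1.0]]
single = build_model(train_X, [[1.0], [3.0]], fit_mean)               # one objective
multi = build_model(train_X, [[1.0, 10.0], [3.0, 20.0]], fit_mean)    # two objectives
```

Keeping one independent model per objective lets each metric (here accuracy and emissions) have its own scale and smoothness, at the cost of ignoring correlations between metrics.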

In [4]:
bayesopt.model_training()

   -> Model trained


### 4 - OPTIMIZATION
This block calls the optimize() method, which uses the get_candidates() function to obtain the candidates and their acquisition values. get_candidates() works in two steps:
- it builds the acquisition function with the generate_acqf() function
- it optimizes that function with the optimizer selected in the configuration (each optimizer has its own function that returns the candidates and their acq_values)

Finally, optimize() denormalizes the candidates with the denormalize_val() function and returns the candidates in both normalized and original scales, together with their acquisition values.
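A simplified version of this loop is sketched below, assuming a single objective and the UCB form used by BoTorch's `UpperConfidenceBound`, `mean + sqrt(beta) * std`. Everything here is hypothetical: `pick_candidates`, `denormalize` and `toy_posterior` are invented names, plain random search over `raw_samples` points replaces the gradient-based restarts of `optimize_acqf`, and the top `n_candidates` points are kept independently, whereas a real batch optimizer selects the candidates jointly. How the notebook combines two objectives under `'ucb'` is not shown here.

```python
import math
import random


def ucb(mean, std, beta=1.5):
    """Upper Confidence Bound for maximization: mean + sqrt(beta) * std."""
    return mean + math.sqrt(beta) * std


def denormalize(x_norm, bounds):
    """Map a point from [0, 1]^d back to the original parameter ranges."""
    return [lo + x * (hi - lo) for x, (lo, hi) in zip(x_norm, bounds)]


def pick_candidates(posterior, dim, n_candidates=3, raw_samples=200, beta=1.5, seed=0):
    """Score raw_samples random points in [0, 1)^dim with UCB and keep the best.

    posterior(x) must return (mean, std) for a normalized point x. Plain random
    search replaces the gradient-based restarts of a real acquisition optimizer."""
    rng = random.Random(seed)
    pool = [[rng.random() for _ in range(dim)] for _ in range(raw_samples)]
    scored = [(ucb(*posterior(x), beta=beta), x) for x in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    best = scored[:n_candidates]
    return [x for _, x in best], [value for value, _ in best]


def toy_posterior(x):
    # Hypothetical surrogate: mean peaks at x[0] = 0.7, uncertainty grows with x[1]
    return -(x[0] - 0.7) ** 2, 0.1 * x[1]


bounds = [(0.092, 0.508), (4.8, 15.2)]   # e.g. DROPOUT and EPOCHS ranges
norm_candidates, acq_values = pick_candidates(toy_posterior, dim=2)
denorm_candidates = [denormalize(x, bounds) for x in norm_candidates]
```

A larger `beta` weights the standard deviation more heavily, pushing candidates toward unexplored regions; a smaller one favours points where the predicted mean is already high.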

In [5]:
norm_candidates, denorm_candidates, acq_value = bayesopt.optimize()
visualize_data(denorm_candidates, bayesopt.config.optimization_parameters)

   -> Candidates obtained
   -> Candidates denormalized
┌───────────┬──────────────┬───────────┬──────────┐
│   DROPOUT │   BATCH_SIZE │    EPOCHS │       LR │
├───────────┼──────────────┼───────────┼──────────┤
│  0.508000 │    15.040000 │ 15.200000 │ 0.000175 │
├───────────┼──────────────┼───────────┼──────────┤
│  0.092000 │    15.335344 │  4.800000 │ 0.000228 │
├───────────┼──────────────┼───────────┼──────────┤
│  0.464893 │    64.960000 │ 15.200000 │ 0.000903 │
└───────────┴──────────────┴───────────┴──────────┘


### 5 - CANDIDATE ESTIMATION [Optional]
If needed, the estimate() method predicts how each candidate is expected to perform, so the candidates can be compared before deciding which one to execute.
It computes the model's posterior at each candidate configuration (saved as bayesopt.posterior): the posterior mean gives the expected value of each metric, and the square root of the posterior variance gives its standard deviation, as printed by print_estimations().
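To make the mean and standard deviation concrete, here is a minimal sketch of a Gaussian-process posterior for one objective in plain Python. It is a simplification, not what SingleTaskGP does internally: it assumes a zero prior mean, a squared-exponential kernel with a fixed `lengthscale` and unit output scale, and a fixed `noise` term, whereas SingleTaskGP learns its hyperparameters from the data. The function names are hypothetical; the returned standard deviations play the role of `posterior.variance.sqrt()`.

```python
import math


def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel with unit output scale."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(train_X, train_y, test_X, noise=1e-4, lengthscale=0.5):
    """Posterior mean and standard deviation of a zero-mean GP at test_X."""
    n = len(train_X)
    K = [[rbf(train_X[i], train_X[j], lengthscale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, train_y))   # alpha = K^-1 y
    means, stds = [], []
    for x in test_X:
        k_star = [rbf(x, xi, lengthscale) for xi in train_X]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(L, k_star)
        var = rbf(x, x, lengthscale) - sum(vi * vi for vi in v)
        means.append(mean)
        stds.append(math.sqrt(max(var, 0.0)))   # clip tiny negative round-off
    return means, stds


train_X = [[0.0], [1.0]]
train_y = [1.0, 2.0]
means, stds = gp_posterior(train_X, train_y, [[0.0], [10.0]])
```

At an observed point the mean stays close to the observed value with a small standard deviation, while far from the data the prediction falls back to the prior (mean 0, standard deviation 1), which is why the STD column is a useful signal of how much the model actually knows about each candidate.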

In [6]:
bayesopt.estimate(norm_candidates)
bayesopt.print_estimations(bayesopt.posterior.mean, bayesopt.posterior.variance.sqrt())

CANDIDATE 1
┌───────────┬──────────┬──────────┐
│ METRIC    │     MEAN │      STD │
├───────────┼──────────┼──────────┤
│ accuracy  │ 0.701184 │ 0.059575 │
├───────────┼──────────┼──────────┤
│ emissions │ 0.010417 │ 0.000368 │
└───────────┴──────────┴──────────┘ 

CANDIDATE 2
┌───────────┬──────────┬──────────┐
│ METRIC    │     MEAN │      STD │
├───────────┼──────────┼──────────┤
│ accuracy  │ 0.637644 │ 0.081429 │
├───────────┼──────────┼──────────┤
│ emissions │ 0.002609 │ 0.000687 │
└───────────┴──────────┴──────────┘ 

CANDIDATE 3
┌───────────┬──────────┬──────────┐
│ METRIC    │     MEAN │      STD │
├───────────┼──────────┼──────────┤
│ accuracy  │ 0.651324 │ 0.074734 │
├───────────┼──────────┼──────────┤
│ emissions │ 0.005485 │ 0.000890 │
└───────────┴──────────┴──────────┘ 

