In [None]:
import torch
import pandas as pd
import pytorch_lightning as pl

from minerva.utils import get_index_from_lookup, draw_sobol_samples


## Notebook 4: Quasi-Random Sobol initialisation on a generated chemical space 

This tutorial shows how to initialise a Bayesian optimisation (BO) campaign from a pre-generated search space by drawing quasi-random Sobol samples and exporting the selected experiments as a dataframe for execution. `run_one_iteration.ipynb` shows how to run one iteration of a BO campaign once experimental results from this initialisation are available. The example in this notebook was used to generate the initial experiments for the experimental HTE campaign described in the manuscript accompanying this code repository.

In [None]:
# set random seed for reproducibility
seed = 49
pl.seed_everything(seed)

# dtype and device used for all tensors in this notebook
tkwargs = {
    "dtype": torch.double,
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}

## 1. Loading chemical space

- We use a pre-generated chemical space from our experimental Nickel Suzuki campaign described in the manuscript accompanying this code repository.
- This chemical space contains every combination of experimental conditions that we considered for our optimisation campaign, totalling `88,000` possible reaction conditions.
- Each reaction condition is given both as text and as numerical descriptors.
- The notebook `generate_chemical_space.ipynb` shows an example of generating a chemical space in this format.
- Note that the inputs in this chemical space are already **pre-normalised** to `[0,1]`.
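
Because the descriptors are expected to lie in `[0,1]`, it can be worth confirming this before sampling. The snippet below is a minimal sketch of such a check on a toy dataframe; the helper `columns_outside_unit_range` and the `feature_*` column names are hypothetical and are not part of `minerva`.

```python
import pandas as pd


def columns_outside_unit_range(df, columns, tol=1e-9):
    """Hypothetical helper: names of columns with any value outside [0, 1]."""
    selected = df[list(columns)]
    too_low = selected.min() < -tol       # per-column minimum below 0
    too_high = selected.max() > 1 + tol   # per-column maximum above 1
    return list(selected.columns[(too_low | too_high).to_numpy()])


# toy data standing in for the descriptor columns of a chemical space
toy_space = pd.DataFrame({
    "feature_a": [0.0, 0.25, 1.0],
    "feature_b": [0.1, 0.5, 0.9],
    "feature_c": [0.0, 0.5, 1.2],  # deliberately out of range
})

offending = columns_outside_unit_range(toy_space, toy_space.columns)
print(offending)  # ['feature_c']
```

An empty list would indicate that every checked column is within the expected range.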

In [None]:
# total possible reaction condition space including reaction descriptors 
total_chem_space = pd.read_csv('../experimental_campaigns/design_spaces/ni_suzuki_chemical_space.csv', index_col=0)
total_chem_space

Now we have the opportunity to drop any reaction conditions that we do not want to include in the initialisation. Following the manuscript, we initialise the campaign using only reaction conditions with temperatures of `70` and `100` degrees.

In [None]:
# filtered chemical reaction condition space with only those having temperatures of 70 and 100
filtered_chem_space = total_chem_space[(total_chem_space['temperature'] == 70) | (total_chem_space['temperature'] == 100)]
filtered_chem_space
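
When more than a couple of values are allowed, the same kind of filter can be written with `Series.isin`. The sketch below uses a toy dataframe (its values are made up for illustration) and checks that both forms select the same rows.

```python
import pandas as pd

# toy stand-in for the chemical space; values are illustrative only
toy_space = pd.DataFrame({
    "rxn_id": [0, 1, 2, 3, 4],
    "temperature": [40, 70, 85, 100, 70],
})

allowed_temperatures = [70, 100]

# membership test against a list of allowed values
by_isin = toy_space[toy_space["temperature"].isin(allowed_temperatures)]

# equivalent explicit comparison, as used in the cell above
by_or = toy_space[(toy_space["temperature"] == 70) | (toy_space["temperature"] == 100)]

print(by_isin["rxn_id"].tolist())  # [1, 3, 4]
```

Both selections keep the original row labels, so the filtered rows can still be traced back to the full dataframe.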

Next, we get the numerical descriptor (feature) columns from the dataframe. These columns are the model inputs.

In [None]:

descriptor_index = 7 # our reaction condition representation is contained in column 7, 'ligand PC1', onwards to the end of the dataframe
# this index will vary depending on your dataframe.

descriptor_columns = total_chem_space.columns[descriptor_index:]
descriptor_columns
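
Since the position of the first descriptor column depends on the dataframe, one option is to look it up by name instead of hard-coding the integer. This is a minimal sketch on a toy dataframe; apart from `'ligand PC1'`, `'temperature'` and `'rxn_id'`, the column names and all values are hypothetical.

```python
import pandas as pd

# toy stand-in: a few text/metadata columns followed by descriptor columns
toy_space = pd.DataFrame({
    "rxn_id": [0, 1],
    "ligand": ["L1", "L2"],
    "temperature": [70, 100],
    "ligand PC1": [0.1, 0.9],
    "ligand PC2": [0.4, 0.2],
})

first_descriptor = "ligand PC1"  # name of the first descriptor column

# get_loc returns the integer position of a uniquely named column
descriptor_index = toy_space.columns.get_loc(first_descriptor)
descriptor_columns = toy_space.columns[descriptor_index:]

print(descriptor_index)          # 3
print(list(descriptor_columns))  # ['ligand PC1', 'ligand PC2']
```

This assumes the descriptor columns are contiguous and run to the end of the dataframe, as in the chemical space used here.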

Now we convert the descriptors into tensors for input to the models.

In [None]:
# get the descriptor from column 7 onwards (from ligand PC1 onwards)
total_chem_descriptors = total_chem_space[descriptor_columns]
filtered_chem_descriptors = filtered_chem_space[descriptor_columns]

# convert the descriptors into torch tensors for model input
x_space = torch.tensor(total_chem_descriptors.to_numpy()).to(**tkwargs) # total reaction condition space 
init_x_space = torch.tensor(filtered_chem_descriptors.to_numpy()).to(**tkwargs) # restricted space with only certain initial temperatures

## 2. Quasi-Random Sobol initialisation in the absence of initial data 
Now we draw quasi-random Sobol samples from the restricted (filtered) chemical space to start our campaign. We then look up the indices of the drawn experiments in the original, full search space to obtain our initial experiments. `run_bo_iteration.ipynb` shows how to run BO on the results of these experiments.

In [None]:
init_x = draw_sobol_samples(n_samples=96, feature_matrix=init_x_space) # draw sobol samples from the restricted space
init_x_index = get_index_from_lookup(init_x, x_space) # looks up index of drawn experiments from total chemical condition space.
init_x_experiments = total_chem_space.iloc[init_x_index].sort_values(by='rxn_id') # select the drawn experiments and order them by reaction id
init_x_experiments.to_csv('../experimental_campaigns/experiments/tutorial_data/initial_experiments.csv') # save the initial experiments for execution
init_x_experiments
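
To make the two steps above more concrete, the following is a rough sketch of one way to draw Sobol-guided samples from a discrete candidate set and map them back to the full space. It is not the implementation of `draw_sobol_samples` or `get_index_from_lookup` in `minerva`, which may work differently. The helpers `sobol_pick_rows` and `lookup_row_indices` are hypothetical, and the approach assumed here is to snap each Sobol point to its nearest unused candidate row and then find that row in the full space by exact match.

```python
import torch
from torch.quasirandom import SobolEngine


def sobol_pick_rows(feature_matrix, n_samples, seed=0):
    """Hypothetical helper: indices of n_samples distinct rows chosen via Sobol points."""
    n_rows, n_dims = feature_matrix.shape
    if n_samples > n_rows:
        raise ValueError("cannot pick more distinct rows than are available")
    engine = SobolEngine(dimension=n_dims, scramble=True, seed=seed)
    points = engine.draw(n_samples).to(feature_matrix)  # Sobol points in [0, 1]^d
    distances = torch.cdist(points, feature_matrix)     # (n_samples, n_rows)
    chosen = []
    for row in distances:
        row = row.clone()
        if chosen:
            row[chosen] = float("inf")  # never pick the same candidate twice
        chosen.append(int(torch.argmin(row)))
    return torch.tensor(chosen)


def lookup_row_indices(rows, lookup):
    """Hypothetical helper: index in `lookup` of each row in `rows` (exact match)."""
    matches = (rows[:, None, :] == lookup[None, :, :]).all(dim=-1)
    if not bool(matches.any(dim=1).all()):
        raise ValueError("some rows are not present in the lookup matrix")
    return matches.int().argmax(dim=1)


# toy data: a full space of 200 candidates and a filtered subset of every second row
generator = torch.Generator().manual_seed(0)
full_space = torch.rand(200, 3, dtype=torch.double, generator=generator)
filtered_space = full_space[::2]

picked = sobol_pick_rows(filtered_space, n_samples=8, seed=49)  # indices into filtered_space
global_index = lookup_row_indices(filtered_space[picked], full_space)  # indices into full_space
```

Exact-match lookup relies on the rows of the full space being unique; with duplicated rows, only one matching index per drawn row would be returned.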