# Initial Design

This notebook contains the code used to select the reaction conditions for the initial training of the Gaussian process (GP). A modified Latin hypercube sampling (LHS) strategy is used to select values for both the discrete and the continuous variables. We previously demonstrated that several different designs work well for solvent selection, so LHS was chosen because it is already implemented in GPyOpt.
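To make the idea concrete, here is a minimal sketch of Latin hypercube sampling over one continuous and one discrete variable. It is not the `ModifiedLatinDesign` implementation: the helper names (`latin_hypercube`, `scale_continuous`, `pick_discrete`) are hypothetical, and mapping a unit-interval sample onto a discrete level by binning is an assumption about how a "modified" LHS could treat discrete variables. Only the standard library is used.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per bin in every dimension."""
    columns = []
    for _ in range(n_dims):
        # Draw one point uniformly inside each of the n_samples equal-width bins
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that the bin order is independent between dimensions
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per experiment, one column per variable
    return [list(row) for row in zip(*columns)]


def scale_continuous(u, lower, upper):
    """Map a unit-interval sample onto the bounds of a continuous variable."""
    return lower + u * (upper - lower)


def pick_discrete(u, levels):
    """Map a unit-interval sample onto one of the discrete levels (assumed scheme)."""
    index = min(int(u * len(levels)), len(levels) - 1)
    return levels[index]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
unit = latin_hypercube(n_samples=10, n_dims=2, rng=rng)

# Bounds and levels are taken from the domain defined later in this notebook
design = [
    {
        "temperature": scale_continuous(u_temp, 20, 50),
        "co_cat": pick_discrete(u_cat, ["co_cat_1", "co_cat_2"]),
    }
    for u_temp, u_cat in unit
]

for experiment in design:
    print(experiment)
```

Because every dimension is stratified, each tenth of the temperature range is sampled exactly once, and with 10 runs and two levels each co-catalyst is selected five times.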

## 1. Setup

Let's get everything loaded and ready to go.

In [2]:
# Autoreload automatically reloads any dependencies as you change them
%load_ext autoreload
%autoreload 2

In [21]:
#Import all the necessary packages
import summit
# from summit.intial_design import ModifiedLatinDesign
from summit.domain import Domain, ContinuousVariable, DiscreteVariable
import GPyOpt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

In [None]:
# Specify the optimization space

# Domain is the class imported from summit.domain in the Setup cell
domain = Domain()

domain += ContinuousVariable(name="temperature",
                             description="reaction temperature",
                             bounds=[20, 50],
                             units=units.degC)

domain += ContinuousVariable(name="acid_conc",
                             description="propionic acid concentration",
                             bounds=[1, 40],
                             units=units.millimolar)

domain += ContinuousVariable(name="cat_load",
                             description="catalyst loading",
                             bounds=[0.1, 10],
                             units=units.millimolar)

domain += ContinuousVariable(name="co_cat_load",
                             description="co-catalyst loading",
                             bounds=[15, 1500],
                             units=units.millimolar)

domain += ContinuousVariable(name="acrylate_amine_ratio",
                             description="molar ratio of acrylate to amine",
                             bounds=[0.8, 2])

domain += ContinuousVariable(name="aldehyde_amine_ratio",
                             description="molar ratio of aldehyde to amine",
                             bounds=[0.8, 2])

domain += DiscreteVariable(name="co_cat",
                           description="Enumeration of the two potential cocatalysts",
                           values=["co_cat_1", "co_cat_2"])

domain += SolventDescriptorSet(select_subset=summit.data.UCB_CPRD_GUIDE)

domain  # The domain should display as a pandas dataframe

In [22]:
# Try to build a GPyOpt design space that mixes continuous and bandit variables
variables = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1, 1), 'dimensionality': 1},
             {'name': 'var_2', 'type': 'continuous', 'domain': (-1, 1), 'dimensionality': 1},
             {'name': 'stations', 'type': 'bandit', 'domain': np.array([[1, 2], [2, 3]])}]
domain = GPyOpt.Design_space(variables)

InvalidConfigError: Invalid mixed domain configuration. Bandit variables cannot be mixed with other types.

## 2. Construct Initial Design

In [None]:
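# ModifiedLatinDesign must be imported first (its import is commented out in the Setup cell)
lhs = ModifiedLatinDesign(domain)
# Generate 10 experiments using the maximin criterion
exps = lhs.generate_experiments(n_exper=10, criterion='maximin')
exps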
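For readers unfamiliar with the `maximin` criterion, the sketch below shows one common way to apply it: draw several random Latin hypercube designs and keep the one whose closest pair of points is farthest apart. This is an illustration under stated assumptions, not the code behind `generate_experiments`. The function names and the `n_candidates` parameter are hypothetical, and whether `ModifiedLatinDesign` selects its design this way is not confirmed by this notebook.

```python
import itertools
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """One stratified point per bin in every dimension, on the unit cube."""
    columns = []
    for _ in range(n_dims):
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of a design."""
    return min(math.dist(a, b) for a, b in itertools.combinations(points, 2))


def maximin_lhs(n_samples, n_dims, n_candidates=50, seed=0):
    """Pick, among random LHS candidates, the design with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_samples, n_dims, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)


# 10 experiments over the 6 continuous variables of the domain, in unit-cube coordinates
best = maximin_lhs(n_samples=10, n_dims=6)
print(min_pairwise_distance(best))
```

Every candidate is still a valid Latin hypercube, so the selection only improves how evenly the points are spread and never breaks the one-point-per-bin property.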

In [None]:
# Plot the design projected onto two principal components
lhs.pca_plot(exps, n_components=2)

In [None]:
lhs.design_coverage()