# Initial Design

This notebook contains the code used to select the reaction conditions for the initial training of the Gaussian process (GP) model. A modified Latin hypercube sampling (LHS) strategy is used to select both the discrete and continuous variables. We previously demonstrated that several different designs work well for solvent selection, so LHS was chosen because it is already implemented in GPyOpt.
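To make the sampling idea concrete, here is a minimal NumPy sketch of a centered Latin hypercube for continuous variables only. It is not the Summit or GPyOpt implementation, and it does not cover the modifications used for discrete or descriptor variables. The function name `centered_lhs` is hypothetical, and the two bounds are copied from the temperature and acid concentration variables defined in the Setup section.

```python
import numpy as np

def centered_lhs(n_samples, bounds, rng):
    """Centered Latin hypercube: each variable gets one point at the
    midpoint of each of n_samples equal-width strata (illustrative sketch)."""
    # Midpoints of n_samples equal-width strata on [0, 1]
    midpoints = (np.arange(n_samples) + 0.5) / n_samples
    columns = []
    for low, high in bounds:
        # Shuffle the strata independently for each variable,
        # then rescale from [0, 1] to the variable's bounds
        unit = rng.permutation(midpoints)
        columns.append(low + unit * (high - low))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
bounds = [(20, 50), (1, 40)]  # temperature and acid_conc bounds
design = centered_lhs(10, bounds, rng)
print(design)
```

Because each column is an independent permutation of the same stratum midpoints, every variable is covered evenly across its range regardless of how many variables there are.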

## 1. Setup

Let's get everything loaded and ready to go.

In [17]:
# Autoreload automatically reloads any dependencies as you change them
%load_ext autoreload
%autoreload 2


In [59]:
#Import all the necessary packages
from summit.data import solvent_ds
from summit.domain import Domain, ContinuousVariable, DiscreteVariable, DescriptorsVariable
from summit.experiment_design import LatinDesign
from summit.utils.dataframe import normalize_df, zero_one_scale_df
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

In [62]:
#Specify the optimization space

domain = Domain()

domain += ContinuousVariable(name='temperature',
                             description = "reaction temperature",
                             bounds=[20, 50])

domain += ContinuousVariable(name="acid_conc",
                             description = "propionic acid concentration",
                             bounds=[1,40])

domain += ContinuousVariable(name="cat_load",
                             description = "catalyst loading",
                             bounds=[0.1, 10])

domain += ContinuousVariable(name="co_cat_load",
                             description = "co-catalyst loading",
                             bounds=[15, 1500])

domain += ContinuousVariable(name="acrylate_amine_ratio",
                             description = "molar ratio of acrylate to amine",
                             bounds = [0.8, 2])


domain += ContinuousVariable(name="aldehyde_amine_ratio",
                             description = "molar ratio of aldehyde to amine",
                             bounds=[0.8, 2])

domain += DiscreteVariable(name="co_cat",
                           description="enumeration of the two potential cocatalysts",
                           levels = ['co_cat_1', 'co_cat_2'])


domain += DescriptorsVariable(name="solvent",
                              description="18 descriptors of the solvent",
                              df=solvent_ds)

domain  # The domain should display as an HTML table

| Name | Type | Description | Values |
|---|---|---|---|
| temperature | continuous | reaction temperature | [20,50] |
| acid_conc | continuous | propionic acid concentration | [1,40] |
| cat_load | continuous | catalyst loading | [0.1,10] |
| co_cat_load | continuous | co-catalyst loading | [15,1500] |
| acrylate_amine_ratio | continuous | molar ratio of acrylate to amine | [0.8,2] |
| aldehyde_amine_ratio | continuous | molar ratio of aldehyde to amine | [0.8,2] |
| co_cat | discrete | enumeration of the two potential cocatalysts | 2 levels |
| solvent | descriptors | 18 descriptors of the solvent | 459 examples of 17 descriptors |


## 2. Construct Initial Design

In [66]:
# Generate 10 initial experiments with the Latin hypercube design
lhs = LatinDesign(domain)
experiments = lhs.generate_experiments(10)
experiments

| | temperature | acid_conc | cat_load | co_cat_load | acrylate_amine_ratio | aldehyde_amine_ratio | co_cat | stenutz_name | cosmo_name | cas_number | chemical_formula |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45.5 | 34.15 | 4.555 | 1425.75 | 1.22 | 1.82 | co_cat_1 | 1-dodecanol | dodecanol | 112-53-8 | C12H26O |
| 1 | 36.5 | 38.05 | 1.585 | 89.25 | 0.86 | 1.1 | co_cat_2 | 2-methyl-2-phenylpropane | (1,1-dimethylethyl)benzene | 98-06-6 | C10H14 |
| 2 | 48.5 | 2.95 | 6.535 | 534.75 | 1.94 | 1.94 | co_cat_2 | 2-methyl-2-phenylpropane | (1,1-dimethylethyl)benzene | 98-06-6 | C10H14 |
| 3 | 39.5 | 6.85 | 9.505 | 1277.25 | 1.82 | 0.86 | co_cat_2 | trifluoroacetic acid | trifluoroaceticacid | 76-05-1 | C2HF3O2 |
| 4 | 27.5 | 30.25 | 7.525 | 237.75 | 1.58 | 1.22 | co_cat_2 | 1,3-dioxolan-2-one | 1,3-dioxolan-2-one | 96-49-1 | C3H4O3 |
| 5 | 30.5 | 10.75 | 2.575 | 831.75 | 1.46 | 1.58 | co_cat_1 | 3-bromoaniline | 3-bromoaniline | 591-19-5 | C6H6BrN |
| 6 | 21.5 | 22.45 | 5.545 | 980.25 | 0.98 | 1.34 | co_cat_2 | tetranitromethane | tetranitro-methane | 509-14-8 | CN4O8 |
| 7 | 42.5 | 14.65 | 3.565 | 1128.75 | 1.1 | 1.7 | co_cat_1 | 4-methoxyaniline | p-anisidine | 104-94-9 | C7H9NO |
| 8 | 33.5 | 18.55 | 0.595 | 683.25 | 1.7 | 0.98 | co_cat_2 | 2-methyl-2-phenylpropane | (1,1-dimethylethyl)benzene | 98-06-6 | C10H14 |
| 9 | 24.5 | 26.35 | 8.515 | 386.25 | 1.34 | 1.46 | co_cat_2 | sulfolane | tetrahydrothiophene-1,1-dioxide | 126-33-0 | C4H8O2S |
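
As a quick sanity check on the design, the sketch below tests the defining Latin hypercube property for one column: with 10 experiments, each of the 10 equal-width strata of a variable's range should contain exactly one point. The temperature values are copied from the experiments table; the variable names `strata` and `is_latin` are only illustrative, and the same check could be repeated for the other continuous columns.

```python
import numpy as np

# Temperature column copied from the experiments table
temperature = np.array([45.5, 36.5, 48.5, 39.5, 27.5,
                        30.5, 21.5, 42.5, 33.5, 24.5])
low, high = 20, 50  # bounds of the temperature variable
n = len(temperature)

# Index of the equal-width stratum that each value falls into
strata = np.floor((temperature - low) / (high - low) * n).astype(int)

# A Latin hypercube design hits every stratum exactly once
is_latin = sorted(strata.tolist()) == list(range(n))
print(strata, is_latin)
```

Each temperature also sits at the midpoint of its stratum (21.5, 24.5, and so on up to 48.5), which is consistent with a centered design.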


In [68]:
# Convert the experiments to a DataFrame and save them to CSV
expr_df = experiments.to_frame()
expr_df.to_csv('initial_experiments.csv')
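
When the saved file is read back later, note that `to_csv` writes the row index as an unnamed first column by default. The sketch below, which uses a two-row stand-in DataFrame and a temporary directory rather than the real design, shows how that column comes back as `Unnamed: 0` unless `index_col=0` is passed to `pd.read_csv`.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the experiments DataFrame, for illustration only
df = pd.DataFrame({"temperature": [45.5, 36.5],
                   "co_cat": ["co_cat_1", "co_cat_2"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "initial_experiments.csv")
    df.to_csv(path)  # the row index is written as an unnamed first column

    # Without index_col, the saved index becomes a column named "Unnamed: 0"
    naive = pd.read_csv(path)

    # With index_col=0, the first column is used as the index again
    restored = pd.read_csv(path, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```

Alternatively, passing `index=False` to `to_csv` omits the index from the file, which is reasonable when the row order alone identifies each experiment.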