## Nanomole-scale high-throughput chemistry for the synthesis of complex molecules

DOI: 10.1126/science.1259203

Alexander Buitrago Santanilla, Erik L. Regalado, Tony Pereira, Michael Shevlin, Kevin Bateman, Louis-Charles Campeau, Jonathan Schneeweis, Simon Berritt, Zhi-Cai Shi, Philippe Nantermet, Yong Liu, Roy Helmy, Christopher J. Welch, Petr Vachal, Ian W. Davies, Tim Cernak, Spencer D. Dreher *Science* **2015**, *347*, 6217, 49-53. https://www.sciencemag.org/lookup/doi/10.1126/science.1259203

Submission test by Michael R. Maser, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA. mmaser@caltech.edu

Import schema and helper functions

In [1]:
try:
    import ord_schema
    import rdkit
except:
    import sys
    !wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    !time bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
    !time conda install -q -y -c rdkit rdkit
    !time conda install -q -y -c anaconda protobuf
    !git clone https://github.com/Open-Reaction-Database/ord-schema.git
    %cd ord-schema
    !python setup.py install
    sys.path.append('/usr/local/lib/python3.7/site-packages/')

In [2]:
import ord_schema
from datetime import datetime
from ord_schema.proto import dataset_pb2
from ord_schema.proto import reaction_pb2
from ord_schema.units import UnitResolver
from ord_schema import validations
from ord_schema import message_helpers

unit_resolver = UnitResolver()

In [3]:
from tqdm import tqdm

In [4]:
import pandas as pd
from rdkit import Chem

## Load the dataset
The original dataset published by the authors is found in ./1259203_Datafiles.xlsx, and the "Data S2- Experiment 2" page was converted to .csv in ./experiment_2.csv

In [5]:
# Create pandas dataframe
data = pd.read_csv('experiment_2.csv')

# View dataframe
data

Unnamed: 0,Plate Position,Electrophile,Electrophile charge,Nucleophile,Nucleophile charge,Catalyst,Catalyst charge,Base,Base charge,IS,ArBr,Prod,Nu,Pd/IS,Unnamed: 14
0,A1,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),DBU 24,250 nL (200 nmol),240134.0,1206548.0,0.0,2638835.0,0.00,
1,A2,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),MTBD 25,250 nL (200 nmol),238726.0,1130276.0,0.0,2474074.0,0.00,
2,A3,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTMG 26,250 nL (200 nmol),235018.0,1099909.0,0.0,2387052.0,0.00,
3,A4,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),238060.0,1111019.0,32684.0,2511884.0,0.14,
4,A5,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),232567.0,1116529.0,0.0,2576513.0,0.00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1533,AF46,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),202191.0,815521.0,453729.0,,2.24,
1534,AF47,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),201368.0,622883.0,1109048.0,,5.51,
1535,AF48,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),P2Et 29,250 nL (200 nmol),197245.0,515647.0,468247.0,,2.37,
1536,,,,,,,,,,,,,,,


In [6]:
# Remove unnecessary rows and columns
data = data.drop(['Unnamed: 14'], axis=1)
data.drop(data.tail(2).index, inplace=True)

# View new dataframe
data

Unnamed: 0,Plate Position,Electrophile,Electrophile charge,Nucleophile,Nucleophile charge,Catalyst,Catalyst charge,Base,Base charge,IS,ArBr,Prod,Nu,Pd/IS
0,A1,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),DBU 24,250 nL (200 nmol),240134.0,1206548.0,0.0,2638835.0,0.00
1,A2,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),MTBD 25,250 nL (200 nmol),238726.0,1130276.0,0.0,2474074.0,0.00
2,A3,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTMG 26,250 nL (200 nmol),235018.0,1099909.0,0.0,2387052.0,0.00
3,A4,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),238060.0,1111019.0,32684.0,2511884.0,0.14
4,A5,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),232567.0,1116529.0,0.0,2576513.0,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1531,AF44,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),MTBD 25,250 nL (200 nmol),178234.0,0.0,3046576.0,,17.09
1532,AF45,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BTMG 26,250 nL (200 nmol),229584.0,254050.0,2789535.0,,12.15
1533,AF46,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),202191.0,815521.0,453729.0,,2.24
1534,AF47,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),201368.0,622883.0,1109048.0,,5.51


## Identify and replace coded reaction components

In [13]:
# Extract unique values for each reaction component
reagent_dict = {}
component_types = ['Electrophile', 'Nucleophile', 'Catalyst', 'Base']

# Create dictionary for each component set with names and SMILES
for component in component_types:
    component_dict = {}
    for i, unique in enumerate(list(data[component].unique())):
        component_dict[f'{component}_{i}'] = {'name': unique, 
                                              'SMILES': 'placeholder'}
    reagent_dict[f'{component}s'] = component_dict    

# Check length of each list 
for reagent, uniques in reagent_dict.items():
    print(f'{reagent} count: {len(uniques)}')

reagent_dict

Electrophiles count: 1
Nucleophiles count: 16
Catalysts count: 16
Bases count: 6


{'Electrophiles': {'Electrophile_0': {'name': 'bromide 22',
   'SMILES': 'placeholder'}},
 'Nucleophiles': {'Nucleophile_0': {'name': 'amine S1',
   'SMILES': 'placeholder'},
  'Nucleophile_1': {'name': 'aniline S2', 'SMILES': 'placeholder'},
  'Nucleophile_2': {'name': 'amide S4', 'SMILES': 'placeholder'},
  'Nucleophile_3': {'name': 'sulfonamide S5', 'SMILES': 'placeholder'},
  'Nucleophile_4': {'name': 'aminopyridine S3', 'SMILES': 'placeholder'},
  'Nucleophile_5': {'name': 'amidine S6', 'SMILES': 'placeholder'},
  'Nucleophile_6': {'name': 'tBu carbamate S7', 'SMILES': 'placeholder'},
  'Nucleophile_7': {'name': 'indazole S8', 'SMILES': 'placeholder'},
  'Nucleophile_8': {'name': 'alcohol S9', 'SMILES': 'placeholder'},
  'Nucleophile_9': {'name': 'phenol S10', 'SMILES': 'placeholder'},
  'Nucleophile_10': {'name': 'thiophenol S11', 'SMILES': 'placeholder'},
  'Nucleophile_11': {'name': 'phosphine S12', 'SMILES': 'placeholder'},
  'Nucleophile_12': {'name': 'boronate S14/water', 'S

In [33]:
# Add product identifier (dependent on nucleophile)
# Create nucleophile to product mapping
product_dict = {}
reagent_dict['Products'] = {}
for i, nucleophile in enumerate(reagent_dict['Nucleophiles'].keys()):
    nucleophile_name = reagent_dict['Nucleophiles'][nucleophile]['name']
    product_name = f'Product_{i}'
    product_dict[nucleophile_name] = product_name
    
    # Add products to reagent_dict
    reagent_dict['Products'][f'Product_{i}'] = {'name': product_name,
                                                'SMILES': 'placeholder'}

# Add product column to dataframe using map
data['Product'] = data['Nucleophile'].map(product_dict)

# View changes
data

Unnamed: 0,Plate Position,Electrophile,Electrophile charge,Nucleophile,Nucleophile charge,Catalyst,Catalyst charge,Base,Base charge,IS,ArBr,Prod,Nu,Pd/IS,Product
0,A1,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),DBU 24,250 nL (200 nmol),240134.0,1206548.0,0.0,2638835.0,0.00,Product_0
1,A2,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),MTBD 25,250 nL (200 nmol),238726.0,1130276.0,0.0,2474074.0,0.00,Product_0
2,A3,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTMG 26,250 nL (200 nmol),235018.0,1099909.0,0.0,2387052.0,0.00,Product_0
3,A4,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),238060.0,1111019.0,32684.0,2511884.0,0.14,Product_0
4,A5,bromide 22,250 nL (50 nmol),amine S1,250 nL (100 nmol),BINAP Pd G3 30,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),232567.0,1116529.0,0.0,2576513.0,0.00,Product_0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1531,AF44,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),MTBD 25,250 nL (200 nmol),178234.0,0.0,3046576.0,,17.09,Product_15
1532,AF45,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BTMG 26,250 nL (200 nmol),229584.0,254050.0,2789535.0,,12.15,Product_15
1533,AF46,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BEMP 27,250 nL (200 nmol),202191.0,815521.0,453729.0,,2.24,Product_15
1534,AF47,bromide 22,250 nL (50 nmol),alkyne S16,250 nL (100 nmol),AdBrettPhos Pd G3 45,250 nL (10 nmol),BTTP 28,250 nL (200 nmol),201368.0,622883.0,1109048.0,,5.51,Product_15


In [34]:
# See updated reagent_dict
reagent_dict

{'Electrophiles': {'Electrophile_0': {'name': 'bromide 22',
   'SMILES': 'placeholder'}},
 'Nucleophiles': {'Nucleophile_0': {'name': 'amine S1',
   'SMILES': 'placeholder'},
  'Nucleophile_1': {'name': 'aniline S2', 'SMILES': 'placeholder'},
  'Nucleophile_2': {'name': 'amide S4', 'SMILES': 'placeholder'},
  'Nucleophile_3': {'name': 'sulfonamide S5', 'SMILES': 'placeholder'},
  'Nucleophile_4': {'name': 'aminopyridine S3', 'SMILES': 'placeholder'},
  'Nucleophile_5': {'name': 'amidine S6', 'SMILES': 'placeholder'},
  'Nucleophile_6': {'name': 'tBu carbamate S7', 'SMILES': 'placeholder'},
  'Nucleophile_7': {'name': 'indazole S8', 'SMILES': 'placeholder'},
  'Nucleophile_8': {'name': 'alcohol S9', 'SMILES': 'placeholder'},
  'Nucleophile_9': {'name': 'phenol S10', 'SMILES': 'placeholder'},
  'Nucleophile_10': {'name': 'thiophenol S11', 'SMILES': 'placeholder'},
  'Nucleophile_11': {'name': 'phosphine S12', 'SMILES': 'placeholder'},
  'Nucleophile_12': {'name': 'boronate S14/water', 'S

### Set SMILES from hand-drawn structures in Chemdraw
Structures are drawn in ./Santanilla_component_smiles.cdx and Chemdraw generated SMILES (non-canonical) are copied to ./experiment_2_molecules.csv

In [36]:
# Load SMILES info from Chemdraw .csv
smiles_data = pd.read_csv('experiment_2_molecules.csv')
smiles_data

Unnamed: 0,component_name,dataset_name,non_canonical_smiles
0,Electrophile_0,bromide 22,BrC1=CN=CC=C1
1,Nucleophile_0,amine S1,CC(N)CCC1=CC=CC=C1
2,Nucleophile_1,aniline S2,NC1=CC=CC=C1
3,Nucleophile_2,amide S4,NC(C1=CC=CC=C1)=O
4,Nucleophile_3,sulfonamide S5,NS(C1=CC=CC=C1)(=O)=O
5,Nucleophile_4,aminopyridine S3,NC1=NC=C(C2=CC=CC=C2)C=C1
6,Nucleophile_5,amidine S6,NC(CC1=CC=CC=C1)=N
7,Nucleophile_6,tBu carbamate S7,NC(OC(C)(C)C)=O
8,Nucleophile_7,indazole S8,C12=CC=CC=C1NN=C2
9,Nucleophile_8,alcohol S9,OCCCC1=CC=CC=C1


In [37]:
# Canonicalize component SMILES
#for smiles in smiles_data['non_canonical_smiles']:
smiles_data['canonical_smiles'] = [Chem.MolToSmiles(Chem.MolFromSmiles(smiles)) for smiles in smiles_data['non_canonical_smiles']]

RDKit ERROR: [15:01:48] Explicit valence for atom # 4 N, 4, is greater than permitted


ArgumentError: Python argument types in
    rdkit.Chem.rdmolfiles.MolToSmiles(NoneType)
did not match C++ signature:
    MolToSmiles(RDKit::ROMol mol, bool isomericSmiles=True, bool kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool allBondsExplicit=False, bool allHsExplicit=False, bool doRandom=False)

In [None]:
# Define products

# Define a single reaction

This experiment details the screening of cross-coupling reactions between one aryl halide with 16 nucleophiles using 16 different precatalysts and 6 bases (1536 reactions)

The general procedure from the SI is as follows:

**Experiment 2. 1536-Well Plate Screening of Pd Cross-Coupling Reactions of 3- Bromopyridine 22 with 16 Nucleophiles (16 Precatalysts, 6 Bases)**

A 1536-well plate experiment examining the reactivity of 3-bromopyridine 22 with 16 different classes of nucleophiles under 96 Pd cross-coupling reaction conditions was run at 100 nanomolar scale by dosing from a 384-well plate containing stock solutions of the starting materials and reagents into a 1536-Well Plate by Mosquito.<br>

**Procedure.** Stock solutions of each of the reaction components were prepared as follows: Pd- precatalysts (**30-45**, 0.04 M in DMSO), aryl halide (**22**, 0.4 M in DMSO), nucleophiles (**S1-S16**, 0.6 M in DMSO), and base (**24-29**, 0.8 M in DMSO). Each of the solutions was dispensed in 75 uL charges to a 384-well plate (source plate map is shown in Figure S8, components listed in Table S2).<br>

The Mosquito was used to combine the source plate solutions by multi-aspiration of 250 nL of each of the four reaction components and then to dose the resulting reaction mixture (1 uL) into a 1536-well plate. Once the 1536-well plate was fully dosed the plate was covered by a PFA film and clamped to minimize low-level component volatility. The plate was then allowed to sit at room temperature for 22 hours. Using the Mosquito, the plate was then quenched with 3 uL of a DMSO stock solution of acetic 5% acid and biphenyl (to give 3 mol% biphenyl relative to **22**), which was transferred from a 384-well source plate. The Mosquito then sampled 1 uL from the quenched reaction plate into 4 x 384-well plates containing 75 uL of DMSO per well. The Mosquito mixing feature was used three times per aspiration and dispense steps in order to ensure homogeneity of the analytical sample. The 384-well plate was then heat-sealed and subjected to chromatographic analysis by a Waters UPLC Instrument. The ratio of the LC area counts of product over internal standard was used to directly compare the relative performance of these reactions.

In [7]:
# Define Reaction
reaction = reaction_pb2.Reaction()
reaction.identifiers.add(value=r'Pd-catalyzed cross-coupling', type='NAME')

type: NAME
value: "Pd-catalyzed cross-coupling"

In [20]:
print(dir(reaction))

['GetEntryClass', 'MergeFrom', '_MutableMapping__marker', '__abstractmethods__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_abc_impl', 'clear', 'get', 'get_or_create', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
