<a href="https://colab.research.google.com/github/kimjc95/computational-chemistry/blob/main/Boltz_on_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Boltz on Colab
2025-02-13 by Joo-Chan Kim at [MSBL](https://msbl.kaist.ac.kr), KAIST

This is a Google Colaboratory notebook for running Boltz-1 with ease (BSD-3 license).

If you encounter a bug, please report it on [my GitHub](https://github.com/kimjc95/computational-chemistry/issues) or by email (kimjoochan@kaist.ac.kr).


In [1]:
#@title Install dependencies
#@markdown This will take 1-3 minutes...

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)

# Use substring membership; find() > 0 would miss a match at index 0.
if 'NVIDIA A100' in gpu_info:
    num_workers = 4
    accelerator = 'gpu'
    print('Found a Nvidia A100 GPU.\n')
elif 'NVIDIA L4' in gpu_info:
    num_workers = 2
    accelerator = 'gpu'
    print('Found a Nvidia L4 GPU.\n')
elif 'Tesla T4' in gpu_info:
    num_workers = 2
    accelerator = 'gpu'
    print('Found a Nvidia T4 GPU.\n')
else:
    num_workers = 2
    accelerator = 'cpu'
    print('You chose a CPU runtime, which can be very slow!\n')

print('Installing dependencies... ', end='')
import subprocess

subprocess.run("pip install rdkit pyyaml ipywidgets", shell=True)
subprocess.run("git clone https://github.com/jwohlwend/boltz.git", shell=True)
subprocess.run("sed -i 's/==/>=/g' /content/boltz/pyproject.toml", shell=True)
subprocess.run("cd boltz; pip install -e .", shell=True)

print('done.')

Found a Nvidia A100 GPU.

Installing dependencies... done.
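
The install cell picks `accelerator` and `num_workers` from substrings of the `nvidia-smi` output. As a minimal sketch, the same mapping can be written as a pure function, which makes it easy to check without a GPU. The names `GPU_WORKERS` and `pick_runtime` are illustrative and not part of Boltz or this notebook.

```python
# Hypothetical helper mirroring the if/elif chain in the install cell.
GPU_WORKERS = {
    "NVIDIA A100": 4,
    "NVIDIA L4": 2,
    "Tesla T4": 2,
}

def pick_runtime(gpu_info):
    """Return (accelerator, num_workers) for an nvidia-smi text dump."""
    for name, workers in GPU_WORKERS.items():
        if name in gpu_info:
            return "gpu", workers
    # No known GPU string found: fall back to the CPU settings.
    return "cpu", 2

print(pick_runtime("| 0  NVIDIA A100-SXM4-40GB  Off |"))
print(pick_runtime("nvidia-smi: command not found"))
```

Keeping the GPU names in one table means a new runtime type only needs one extra entry.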


In [11]:
#@title Enter sequence input
#@markdown Type the job title, without spaces, in the box below.
job_title = "BTL2_R" #@param {type:"string"}
#@markdown Run this cell, then use the interactive widgets below to enter the sequence data for each molecule.

#@markdown For small-molecule ligands or modified residues, you can enter the CCD ID (Chemical Component Dictionary code), which can be looked up on the [PDBeChem website](https://www.ebi.ac.uk/pdbe-srv/pdbechem/).

import ipywidgets as widgets
from IPython.display import display, HTML
import os
import re
import requests
from rdkit import Chem, RDLogger
from rdkit.Chem import Draw, AllChem


def validate_input(text, input_type):
    if input_type == 'protein':
        return re.match(r'^[AC-IK-NP-TVWY]+$', text.upper()) is not None
    elif input_type == 'dna':
        return re.match(r'^[ACGT]+$', text.upper()) is not None
    elif input_type == 'rna':
        return re.match(r'^[ACGU]+$', text.upper()) is not None
    elif input_type == 'smiles':
        RDLogger.DisableLog('rdApp.*')
        try:
            mol = Chem.MolFromSmiles(text, sanitize=True)
        except Exception:
            # Avoid a bare except, which would also swallow KeyboardInterrupt.
            return False
        return mol is not None
    elif input_type == 'ccd':
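        # An empty code cannot be valid, so skip the network round trip.
        if not text.strip():
            return False
        url = f"https://data.rcsb.org/rest/v1/core/chemcomp/{text.upper()}"
        try:
            # A timeout keeps the widget from hanging on a stalled request.
            response = requests.get(url, timeout=10)
            return response.status_code == 200
        except requests.exceptions.RequestException:
            return False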


def get_residue_ccd(pol_type, sequence, index):
    AA_codes = {
        'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP', 'C': 'CYS',
        'E': 'GLU', 'Q': 'GLN', 'G': 'GLY', 'H': 'HIS', 'I': 'ILE',
        'L': 'LEU', 'K': 'LYS', 'M': 'MET', 'F': 'PHE', 'P': 'PRO',
        'S': 'SER', 'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL'}

    DNA_codes = {'A': 'DA', 'T': 'DT', 'G': 'DG', 'C': 'DC'}

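    if not index:
        return None

    # Reject non-numeric input instead of raising ValueError in the widget callback.
    try:
        position = int(index)
    except (TypeError, ValueError):
        return None

    if position < 1 or position > len(sequence):
        return None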

    residue = sequence[position-1].upper()

    if pol_type == 'protein':
        return AA_codes[residue]
    elif pol_type == 'dna':
        return DNA_codes[residue]
    else:
        return residue


def get_atom_names(ccd_id):
    response = requests.get(f"https://files.rcsb.org/ligands/download/{ccd_id}.cif")
    if response.status_code == 200:
        lines = response.text.split('\n')
    else:
        print(f'Failed fetching {ccd_id}.cif file from RCSB.')
        return None

    atom_names = []

    read_atoms = False
    for line in lines:
        if '_chem_comp_atom.' in line:
            read_atoms = True
            continue
        if read_atoms and line.strip():
            if line.startswith('#'):
                break
            else:
                parts = line.split()
                # parts[3] (element symbol) is read below, so require four fields.
                if len(parts) >= 4:
                    atom_name = parts[1].strip('"')
                    atom_type = parts[3]
                    if atom_type != 'H':
                        atom_names.append(atom_name)

    return atom_names


seq_data = []
mod_data = []
bond_data = []
binder = ''
pocket_data = []

class modify_entries():
    def __init__(self, container):
        self.container = container

    def remove_seq_entry(self, b):
        for i in range(1, len(self.container.children)-1):
            if self.container.children[i].children[0].children[-1] == b:
                newList = []
                for j in range(i+1, len(self.container.children)-1):
                    show_chain = widgets.Label(value= 'chain '+str(chr(ord('A')+j-2)))
                    newline = widgets.HBox([show_chain]+list(self.container.children[j].children[0].children[1:]))
                    newEntry = widgets.VBox([newline]+list(self.container.children[j].children[1:]))
                    newList.append(newEntry)

                self.container.children = list(self.container.children[:i]) + newList + [self.container.children[-1]]
                break

    def add_seq_entry(self, b):
        show_chain = widgets.Label(value= 'chain '+str(chr(ord('A')+len(self.container.children)-2)))

        select_type = widgets.Dropdown(
            options=['protein', 'dna', 'rna', 'smiles', 'ccd'],
            description=' is ',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='120px'))

        enter_sequence = widgets.Text(
            description=' described as :',
            placeholder='MAKEY... or CC1=CC=CC=C1 or ATP',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='600px'))

        remove_btn = widgets.Button(description='-', layout=widgets.Layout(width='30px'))
        remove_btn.on_click(self.remove_seq_entry)

        message = widgets.HTML(value='', layout=widgets.Layout(width='600px', padding='5px'))

        def validate_string(change):
            if validate_input(enter_sequence.value, select_type.value):
                message.value = ""
            elif select_type.value in ['protein', 'dna', 'rna']:
                message.value = f"<span style='color: red;'>Enter the valid {select_type.value} sequence.</span>"
            else:
                message.value = f"<span style='color: red;'>Enter the valid {select_type.value} string.</span>"

        enter_sequence.observe(validate_string, names='value')
        select_type.observe(validate_string, names='value')

        line = widgets.HBox([show_chain, select_type, enter_sequence, remove_btn])
        entry = widgets.VBox([line, message])

        self.container.children = list(self.container.children[:-1]) + [entry, self.container.children[-1]]


    def update_seq_data(self, b):
        seq_data.clear()
        for i in range(1, len(self.container.children)-1):
            entry = self.container.children[i]
            line = entry.children[0]
            message = entry.children[1]
            if message.value != '':
                continue
            if line.children[2].value == '':
                continue

            seq = {'chain': line.children[0].value[-1],
                   'type': line.children[1].value,
                   'sequence': line.children[2].value}
            seq_data.append(seq)


    def remove_an_entry(self, b):
        for i in range(1, len(self.container.children)-1):
            if self.container.children[i].children[0].children[-1] == b:
                self.container.children = list(self.container.children[:i]) + list(self.container.children[i+1:])
                break

    def add_mod_entry(self, b):
        polymers = []
        for s in seq_data:
            if s['type'] in ['protein', 'dna', 'rna']:
                polymers.append(s['chain'])

        select_chain = widgets.Dropdown(
            options=polymers,
            description='chain ',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='90px'))

        enter_index = widgets.Text(
            description='position: ',
            placeholder = '0 <',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='300px'))

        show_res = widgets.Label(value='change from: -', layout=widgets.Layout(width='150px'))

        enter_ccd = widgets.Text(
            description='to:',
            placeholder='CCD code',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='300px'))

        remove_btn = widgets.Button(description='-', layout=widgets.Layout(width='30px'))
        remove_btn.on_click(self.remove_an_entry)

        message = widgets.HTML(value='', layout=widgets.Layout(width='600px', padding='5px'))

        def validate_string(change):
            error_type = ''
            for s in seq_data:
                if s['chain'] == select_chain.value:
                    res = get_residue_ccd(s['type'], s['sequence'], enter_index.value)
                    if res is None:
                        error_type = 'index'
                        show_res.value = 'change from: -'
                    else:
                        show_res.value = f"change from: {res}"
                    break

            if not validate_input(enter_ccd.value, 'ccd'):
                error_type = 'CCD'

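            # Compare indices only once the position field holds a number;
            # int('') or int('abc') would raise ValueError here.
            if enter_index.value.strip().isdecimal():
                for m in mod_data:
                    if m['chain'] == select_chain.value and m['index'] == int(enter_index.value):
                        error_type = "duplicate"
                        break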

            if error_type == '':
                message.value = ""
            elif error_type == "duplicate":
                message.value = f"<span style='color: red;'>Duplicate residue index.</span>"
            else:
                message.value = f"<span style='color: red;'>Enter the valid {error_type}.</span>"

        select_chain.observe(validate_string, names='value')
        enter_index.observe(validate_string, names='value')
        enter_ccd.observe(validate_string, names='value')

        line = widgets.HBox([select_chain, enter_index, show_res, enter_ccd, remove_btn])
        entry = widgets.VBox([line, message])

        self.container.children = list(self.container.children[:-1]) + [entry, self.container.children[-1]]

    def update_mod_data(self, b):
        mod_data.clear()
        for i in range(1, len(self.container.children)-1):
            entry = self.container.children[i]
            line = entry.children[0]
            message = entry.children[1]
            if message.value != '':
                continue

            if line.children[1].value == '':
                continue

            if line.children[3].value == '':
                continue

            mod = {'chain': line.children[0].value,
                   'index': int(line.children[1].value),
                   'ccd': line.children[3].value}
            mod_data.append(mod)


    def add_bond_entry(self, b):
        select_chain1 = widgets.Dropdown(
            options=[s['chain'] for s in seq_data],
            description='Atom1 chain:',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='150px'))

        enter_res1 = widgets.Text(
            description='resi: ',
            placeholder = '> 0',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='200px'))

        select_atom1 = widgets.Dropdown(
            description='atom: ',
            options=[],
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='200px'))

        select_chain2 = widgets.Dropdown(
            options=[s['chain'] for s in seq_data],
            description='Atom2 chain:',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='150px'))

        enter_res2 = widgets.Text(
            description='resi: ',
            placeholder = '> 0',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='200px'))

        select_atom2 = widgets.Dropdown(
            description='atom: ',
            options=[],
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='200px'))

        remove_btn = widgets.Button(description='-', layout=widgets.Layout(width='30px'))
        remove_btn.on_click(self.remove_an_entry)

        message = widgets.HTML(value='', layout=widgets.Layout(width='600px', padding='5px'))

        def validate_string(change):
            error_type = ''

            if select_chain1.value == select_chain2.value and enter_res1.value == enter_res2.value:
                error_type = 'duplicate'

            else:
                for s in seq_data:
                    if s['chain'] == select_chain1.value:
                        res1 = get_residue_ccd(s['type'], s['sequence'], enter_res1.value)
                        if res1 is None:
                            error_type = 'index1'
                            select_atom1.options = []
                        else:
                            select_atom1.options = get_atom_names(res1) or []  # None if the download failed
                    if s['chain'] == select_chain2.value:
                        res2 = get_residue_ccd(s['type'], s['sequence'], enter_res2.value)
                        if res2 is None:
                            error_type = 'index2'
                            select_atom2.options = []
                        else:
                            select_atom2.options = get_atom_names(res2) or []  # None if the download failed

            if error_type == '':
                message.value = ""
            elif error_type == "duplicate":
                message.value = f"<span style='color: red;'>Duplicate residue index.</span>"
            else:
                message.value = f"<span style='color: red;'>Enter the valid number for {error_type}.</span>"

        select_chain1.observe(validate_string, names='value')
        enter_res1.observe(validate_string, names='value')
        select_chain2.observe(validate_string, names='value')
        enter_res2.observe(validate_string, names='value')

        line = widgets.HBox([select_chain1, enter_res1, select_atom1, select_chain2, enter_res2, select_atom2, remove_btn])
        entry = widgets.VBox([line, message])

        self.container.children = list(self.container.children[:-1]) + [entry, self.container.children[-1]]

    def update_bond_data(self, b):
        bond_data.clear()
        for i in range(1, len(self.container.children)-1):
            entry = self.container.children[i]
            line = entry.children[0]
            message = entry.children[1]
            if message.value != '':
                continue

            if line.children[1].value == '':
                continue
            if line.children[4].value == '':
                continue

            bond = {'chain1': line.children[0].value,
                    'index1': int(line.children[1].value),
                    'atom1': line.children[2].value,
                    'chain2': line.children[3].value,
                    'index2': int(line.children[4].value),
                    'atom2': line.children[5].value}
            bond_data.append(bond)



    def add_pocket_entry(self, b):
        receptor = []
        for s in seq_data:
            if s['chain'] != binder:
                receptor.append(s['chain'])

        select_chain = widgets.Dropdown(
            options=receptor,
            description='receptor chain:',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='150px'))

        enter_index = widgets.Text(
            description='pocket position: ',
            placeholder = 'resi>0',
            style={'description_width': 'initial'},
            layout=widgets.Layout(width='300px'))

        show_res = widgets.Label(value="residue index : -", layout=widgets.Layout(width='300px'))

        remove_btn = widgets.Button(description='-', layout=widgets.Layout(width='30px'))
        remove_btn.on_click(self.remove_an_entry)

        message = widgets.HTML(value='', layout=widgets.Layout(width='600px', padding='5px'))

        def validate_string(change):
            error_type = ''

            for s in seq_data:
                if s['chain'] == select_chain.value:
                    res = get_residue_ccd(s['type'], s['sequence'], enter_index.value)
                    if res is None:
                        error_type = 'index'
                        show_res.value = 'residue index : -'
                        break
                    else:
                        show_res.value = f"residue index : {res}"
                        break

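            # Compare indices only once the position field holds a number;
            # int('') or int('abc') would raise ValueError here.
            if enter_index.value.strip().isdecimal():
                for p in pocket_data:
                    if p['chain'] == select_chain.value and p['index'] == int(enter_index.value):
                        error_type = "duplicate"
                        break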

            if error_type == '':
                message.value = ""
            elif error_type == "duplicate":
                message.value = f"<span style='color: red;'>Duplicate residue index found.</span>"
            else:
                message.value = f"<span style='color: red;'>Enter the valid {error_type}.</span>"

        select_chain.observe(validate_string, names='value')
        enter_index.observe(validate_string, names='value')

        line = widgets.HBox([select_chain, enter_index, show_res, remove_btn])
        entry = widgets.VBox([line, message])

        self.container.children = list(self.container.children[:-1]) + [entry, self.container.children[-1]]

    def update_pocket_data(self, b):
        pocket_data.clear()
        for i in range(1, len(self.container.children)-1):
            entry = self.container.children[i]
            line = entry.children[0]
            message = entry.children[1]
            if message.value != '':
                continue
            if line.children[1].value == '':
                continue

            pocket = {'chain': line.children[0].value,
                      'index': int(line.children[1].value)}
            pocket_data.append(pocket)


title = widgets.HTML("<h4>Click the plus button to add molecules, and minus button to remove ones. Click the confirm button after entering all entries.</h4>")
add_button = widgets.Button(description='+', layout=widgets.Layout(width='30px'))
confirm_button = widgets.Button(description='confirm', layout=widgets.Layout(width='100px'))
buttons = widgets.HBox([add_button, confirm_button])
seq_container = widgets.VBox([title, buttons])

add_new_seq = modify_entries(seq_container)

add_button.on_click(add_new_seq.add_seq_entry)
confirm_button.on_click(add_new_seq.update_seq_data)
display(seq_container)

VBox(children=(HTML(value='<h4>Click the plus button to add molecules, and minus button to remove ones. Click …
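
The polymer checks in `validate_input` are plain character-class tests, and each confirmed row ends up as one dictionary in `seq_data`. The sketch below reproduces both ideas without widgets or RDKit. `PATTERNS`, `is_valid_polymer`, and the example values are made up for illustration; `fullmatch` is used here in place of the `^...$` anchors, which is slightly stricter about trailing newlines.

```python
import re

# Same alphabets as validate_input in the cell above.
PATTERNS = {
    "protein": re.compile(r"[AC-IK-NP-TVWY]+"),
    "dna": re.compile(r"[ACGT]+"),
    "rna": re.compile(r"[ACGU]+"),
}

def is_valid_polymer(text, input_type):
    """True if text is a non-empty sequence over the allowed alphabet."""
    return PATTERNS[input_type].fullmatch(text.upper()) is not None

# Shape of seq_data after clicking "confirm" (values are illustrative).
example_seq_data = [
    {"chain": "A", "type": "protein", "sequence": "MAKEY"},
    {"chain": "B", "type": "ccd", "sequence": "ATP"},
]

print(is_valid_polymer("MAKEY", "protein"))  # True
print(is_valid_polymer("MAKEZ", "protein"))  # False: Z is not a standard residue
print(is_valid_polymer("acgu", "rna"))       # True: input is upper-cased first
```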

#### (Optional) Enter modification / constraint data

In [None]:
#@markdown (Optional) Run this cell to enter modification data

#@markdown Select the chain, and type the 1-based index for the residue to modify. Type the CCD code for the modified residue.

assert len(seq_data) > 0, "No molecule info entered"
pol_flag = False
for s in seq_data:
    if s['type'] in ['protein', 'dna', 'rna']:
        pol_flag = True
        break
assert pol_flag, "No polymer molecules were entered"

title = widgets.HTML("<h4>Click the plus button to modify biopolymers you entered above, and minus button to remove modifications. Click the confirm button after entering all entries.</h4>")
add_button = widgets.Button(description='+', layout=widgets.Layout(width='30px'))
confirm_button = widgets.Button(description='confirm', layout=widgets.Layout(width='100px'))
buttons = widgets.HBox([add_button, confirm_button])
mod_container = widgets.VBox([title, buttons])

add_new_mod = modify_entries(mod_container)

add_button.on_click(add_new_mod.add_mod_entry)
confirm_button.on_click(add_new_mod.update_mod_data)
display(mod_container)
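
Modification positions are 1-based, so position 1 refers to the first letter of the sequence. The sketch below shows how a position maps to the three-letter code displayed next to "change from:", and what a confirmed `mod_data` entry looks like. The sequence, the reduced `AA_CODES` table, and the helper name `residue_at` are assumptions for this example; SEP is the CCD code for phosphoserine.

```python
# Subset of the one-letter to three-letter table used by get_residue_ccd.
AA_CODES = {"M": "MET", "A": "ALA", "S": "SER", "K": "LYS", "E": "GLU", "Y": "TYR"}

def residue_at(sequence, position):
    """Return the three-letter code at a 1-based position, or None if out of range."""
    if position < 1 or position > len(sequence):
        return None
    return AA_CODES.get(sequence[position - 1].upper())

sequence = "MASKEY"  # illustrative sequence
print(residue_at(sequence, 1))  # MET
print(residue_at(sequence, 3))  # SER
print(residue_at(sequence, 7))  # None: past the end of the sequence

# A confirmed entry replacing Ser3 on chain A with phosphoserine.
example_mod = {"chain": "A", "index": 3, "ccd": "SEP"}
```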

In [None]:
#@markdown (Optional) Enter bond constraints

assert len(seq_data) > 0, "No molecule info entered"

title = widgets.HTML("<h4>Click the plus button to add covalent bond constraints to the molecules, and minus button to remove constraints. Click the confirm button after entering all entries.</h4>")
add_button = widgets.Button(description='+', layout=widgets.Layout(width='30px'))
confirm_button = widgets.Button(description='confirm', layout=widgets.Layout(width='100px'))
buttons = widgets.HBox([add_button, confirm_button])
bond_container = widgets.VBox([title, buttons])

add_new_bond = modify_entries(bond_container)

add_button.on_click(add_new_bond.add_bond_entry)
confirm_button.on_click(add_new_bond.update_bond_data)
display(bond_container)
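
The atom dropdowns in the bond widget are filled from the `_chem_comp_atom` loop of the residue's CIF file, keeping only non-hydrogen atoms. As a sketch of that parsing step, the version below looks up columns by their header names instead of fixed positions. `CIF_SNIPPET` is a hand-written, heavily abbreviated excerpt for illustration only, and `heavy_atom_names` is a hypothetical helper, not the notebook's `get_atom_names`.

```python
# Abbreviated, hand-written excerpt; real ligand files carry many more columns.
CIF_SNIPPET = """\
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.alt_atom_id
_chem_comp_atom.type_symbol
GLY N   N   N
GLY CA  CA  C
GLY C   C   C
GLY O   O   O
GLY OXT OXT O
GLY H   H   H
GLY HA2 HA2 H
#
"""

def heavy_atom_names(cif_text):
    """Collect atom_id values of non-hydrogen atoms from the atom loop."""
    columns, names = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_chem_comp_atom."):
            # Header line: remember the column name in order of appearance.
            columns.append(line.split(".", 1)[1])
            continue
        if not columns or not line:
            continue  # still before the atom loop, or a blank line
        if line.startswith("#"):
            break  # '#' closes the atom loop
        fields = line.split()
        if len(fields) < len(columns):
            continue  # skip malformed rows
        row = dict(zip(columns, fields))
        if row["type_symbol"] != "H":
            names.append(row["atom_id"].strip('"'))
    return names

print(heavy_atom_names(CIF_SNIPPET))
```

Reading the header first avoids depending on the element symbol always being the fourth field.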

In [12]:
#@markdown (Optional) Enter pocket constraints

assert len(seq_data) > 1, "Less than 2 molecules entered"

title = widgets.HTML("<h4>Choose the ligand chain. Then add the pocket constraints. Click the confirm button after adding all entries.</h4>")
choose_ligand = widgets.Dropdown(
    options=[s['chain'] for s in seq_data],
    description='ligand chain ID:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='150px'))
add_button = widgets.Button(description='+', layout=widgets.Layout(width='30px'))
confirm_button = widgets.Button(description='confirm', layout=widgets.Layout(width='100px'))
header = widgets.VBox([title, choose_ligand])
buttons = widgets.HBox([add_button, confirm_button])
pocket_container = widgets.VBox([header, buttons])

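def update_binder(change):
    global binder
    binder = change['new']

# The observer fires only on changes, so record the default selection as well.
binder = choose_ligand.value
choose_ligand.observe(update_binder, names='value')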

add_new_pocket = modify_entries(pocket_container)
add_button.on_click(add_new_pocket.add_pocket_entry)
confirm_button.on_click(add_new_pocket.update_pocket_data)
display(pocket_container)

VBox(children=(VBox(children=(HTML(value='<h4>Choose the ligand chain. Then add the pocket constraints. Click …

### Run Prediction

In [13]:
#@title Create YAML file from the input data
#@markdown Once you are confident with all the inputs above, run this cell to generate the input file.

import yaml

data = {'sequences':[]}

for seq in seq_data:
    if seq['type'] in ['protein', 'dna', 'rna']:
        modifications = []
        for mod in mod_data:
            if mod['chain'] == seq['chain']:
                modifications.append({'position':mod['index'], 'ccd':mod['ccd']})
        if len(modifications) > 0:
            data['sequences'].append({seq['type']:{'id':seq['chain'], 'sequence':seq['sequence'], 'modifications':modifications}})
        else:
            data['sequences'].append({seq['type']:{'id':seq['chain'], 'sequence':seq['sequence']}})
    elif seq['type'] == 'smiles':
        data['sequences'].append({'ligand':{'id':seq['chain'], 'smiles':seq['sequence']}})
    elif seq['type'] == 'ccd':
        data['sequences'].append({'ligand':{'id':seq['chain'], 'ccd':seq['sequence']}})

if len(bond_data) > 0 or len(pocket_data)>0:
    data['constraints'] = []
for bond in bond_data:
    data['constraints'].append({'bond':{'atom1':[bond['chain1'], bond['index1'], bond['atom1']], 'atom2':[bond['chain2'], bond['index2'], bond['atom2']]}})

if len(pocket_data) > 0:
    data['constraints'].append({'pocket':{'binder':binder, 'contacts':[]}})
    for pocket in pocket_data:
        data['constraints'][-1]['pocket']['contacts'].append([pocket['chain'], pocket['index']])

with open(f'{job_title}.yaml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False, sort_keys=False)
    print('Done!')
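
The cell above turns the widget data into a dictionary with a `sequences` list and an optional `constraints` list before dumping it to YAML. The sketch below packages the same logic as a function so the resulting structure can be inspected directly. The function name `build_boltz_input` and the example inputs are assumptions for illustration; the key names are the ones used in the cell above.

```python
def build_boltz_input(seq_data, mod_data=(), bond_data=(), pocket_data=(), binder=""):
    """Assemble the dictionary that the cell above writes out as YAML."""
    sequences = []
    for seq in seq_data:
        if seq["type"] in ("protein", "dna", "rna"):
            entry = {"id": seq["chain"], "sequence": seq["sequence"]}
            mods = [{"position": m["index"], "ccd": m["ccd"]}
                    for m in mod_data if m["chain"] == seq["chain"]]
            if mods:
                entry["modifications"] = mods
            sequences.append({seq["type"]: entry})
        else:
            # 'smiles' and 'ccd' rows both become ligand entries.
            sequences.append({"ligand": {"id": seq["chain"], seq["type"]: seq["sequence"]}})

    constraints = []
    for b in bond_data:
        constraints.append({"bond": {
            "atom1": [b["chain1"], b["index1"], b["atom1"]],
            "atom2": [b["chain2"], b["index2"], b["atom2"]]}})
    if pocket_data:
        constraints.append({"pocket": {
            "binder": binder,
            "contacts": [[p["chain"], p["index"]] for p in pocket_data]}})

    data = {"sequences": sequences}
    if constraints:
        data["constraints"] = constraints
    return data

example = build_boltz_input(
    seq_data=[{"chain": "A", "type": "protein", "sequence": "MASKEY"},
              {"chain": "B", "type": "ccd", "sequence": "ATP"}],
    pocket_data=[{"chain": "A", "index": 3}],
    binder="B",
)
print(example)
```

Passing the returned dictionary to `yaml.dump(..., default_flow_style=False, sort_keys=False)` would write the same kind of file as the cell above.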

In [14]:
#@title Run prediction using Boltz-1
#@markdown Lower the step scale to increase the diversity of the results. (default: 1.638)
step_scale = 1.638 #@param {type:"slider", min:1, max:2, step:0.001}
#@markdown Set the number of diffusion samples to be generated. (default: 1, AlphaFold3: 5)
num_samples = 5 #@param {type:"slider", min:1, max:10, step:1}
#@markdown Set the number of recycling steps for the prediction. (default: 3, AlphaFold3: 10)
num_recycles = 10 #@param {type:"slider", min:1, max:25, step:1}

!boltz predict {job_title}.yaml --accelerator {accelerator} --num_workers {num_workers} --step_scale {step_scale} --recycling_steps {num_recycles} --diffusion_samples {num_samples} --use_msa_server --out_dir /content/{job_title}

Checking input data.
Running predictions for 1 structure
Processing input data.
  0% 0/1 [00:00<?, ?it/s]Generating MSA for BTL2_R.yaml with 1 protein entities.

  0% 0/150 [00:00<?, ?it/s][A
SUBMIT:   0% 0/150 [00:00<?, ?it/s][A
COMPLETE:   0% 0/150 [00:00<?, ?it/s][A
COMPLETE: 100% 150/150 [00:02<00:00, 62.85it/s] 
100% 1/1 [00:03<00:00,  3.12s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
2025-02-12 18:39:59.446785: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-poin
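
Outside Colab, the `!boltz predict ...` line can be assembled as an argument list and passed to `subprocess.run`. The sketch below uses only the flags that appear in the cell above; the function name `boltz_command` and its defaults are assumptions for illustration, and the actual launch is left commented out.

```python
import shlex

def boltz_command(job_title, accelerator="gpu", num_workers=2,
                  step_scale=1.638, num_recycles=3, num_samples=1):
    """Build the argument list for the boltz predict call shown above."""
    return [
        "boltz", "predict", f"{job_title}.yaml",
        "--accelerator", accelerator,
        "--num_workers", str(num_workers),
        "--step_scale", str(step_scale),
        "--recycling_steps", str(num_recycles),
        "--diffusion_samples", str(num_samples),
        "--use_msa_server",
        "--out_dir", f"/content/{job_title}",
    ]

cmd = boltz_command("BTL2_R", num_workers=4, num_recycles=10, num_samples=5)
print(shlex.join(cmd))
# To launch the prediction: subprocess.run(cmd, check=True)
```

An argument list avoids shell quoting problems if the job title ever contains unusual characters.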

In [15]:
#@markdown Download results

import zipfile
from google.colab import files

print("Downloading result files.")
filename = f'{job_title}.zip'

with zipfile.ZipFile(filename, 'w') as zip_file:
    dir_path = f'/content/{job_title}/boltz_results_{job_title}'
    for root, directory, items in os.walk(dir_path):
        for item in items:
            path = os.path.join(root, item)
            zip_file.write(path, arcname=os.path.relpath(path, dir_path), compress_type=zipfile.ZIP_DEFLATED)

files.download(filename)

Downloading result files.


<IPython.core.display.Javascript object>
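
The download cell stores each file under a path relative to the results directory, so the archive unpacks without the `/content/...` prefix. Here is a minimal sketch of that pattern on a temporary directory. The helper name `zip_directory` and the file names are made up for the example and do not reflect the actual Boltz output layout.

```python
import os
import tempfile
import zipfile

def zip_directory(dir_path, zip_path):
    """Zip dir_path recursively and return the archive member names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(dir_path):
            for name in files:
                path = os.path.join(root, name)
                # arcname is relative to dir_path, so absolute prefixes are dropped.
                zf.write(path, arcname=os.path.relpath(path, dir_path))
        return zf.namelist()

with tempfile.TemporaryDirectory() as tmp:
    results = os.path.join(tmp, "results")
    os.makedirs(os.path.join(results, "predictions"))
    with open(os.path.join(results, "predictions", "model.cif"), "w") as fh:
        fh.write("data_demo\n")
    members = zip_directory(results, os.path.join(tmp, "results.zip"))
    print(members)
```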

## Tools used

* Boltz-1
> J. Wohlwend, G. Corso, S. Passaro, M. Reveiz, K. Leidal, W. Swiderski, T. Portnoi, I. Chinn, J. Silterra, T. Jaakkola, and R. Barzilay (2024) "Boltz-1: Democratizing Biomolecular Interaction Modeling" bioRxiv. DOI:[10.1101/2024.11.19.624167](https://doi.org/10.1101/2024.11.19.624167)

* ColabFold
> M. Mirdita, K. Schütze, Y. Moriwaki, L. Heo, S. Ovchinnikov, and M. Steinegger (2022) "ColabFold: making protein folding accessible to all" Nature Methods, 19, 679-682. DOI:[10.1038/s41592-022-01488-1](https://doi.org/10.1038/s41592-022-01488-1)

* RDKit
> RDKit: Open-source cheminformatics. https://www.rdkit.org
