Seq-Stack-Reaction

Software for "A Bayesian Method for Concurrently Designing Molecules and Synthetic Reaction Networks"

Installation

STEP 1: Get source code.

# Step 1: Get sources from GitHub
$ git clone git@github.com:qi-zh/Seq-Stack-Reaction.git

STEP 2: Create conda environment.

Make sure you have Conda installed.

# Step 2: Create conda environment
$ cd conda_env
$ conda env create -f ssr.yml
$ conda activate ssr

The following table lists the core packages in the SSR conda environment.

Package	Version
PyTorch	1.7.1
OpenNMT	1.2.0
rdkit	2020.03.1
ugtm	2.0.0

Refer to the following folder hierarchy and move each model and data file into its folder.

─SSR
 ├─ssr
 │ └─*.py
 ├─data
 │ └─pool.csv
 ├─model
 │ ├─molecular_transformer.pt
 │ ├─GTM
 │ │ └─enamine_gtm
 │ └─QSPR
 │   ├─product_logp
 │   ├─product_qed
 │   ├─reactant_logp
 │   └─reactant_qed
 ├─conda_env
 │ └─ssr.yml
 ├─*.py
 └─*.ipynb

Quick start

quick implementation of molecular design

STEP 1: Launch the forward prediction module.

# Run the following command in a shell session.
$ python launch_forward.py

STEP 2: Launch the Sequential Monte Carlo module.

# Run the following command in another shell session.
$ python launch_smc.py

Customize experiment

customize parameters

The following parameters are adjustable in the ssr/setting.json file.

Parameter	Description
reactor_gpu_id	Index of GPU, -1 for CPU
n_forward	Number of forward prediction modules
n_smc_steps	Total number of Sequential Monte Carlo steps
n	Number of particles in each Sequential Monte Carlo step
n_r	Number of reaction steps, larger than 1
generation_threshold	Threshold for filtering reactants: 1 uses only the initial reactants; larger values allow more complex reactions
product_len	Maximum length of the product SMILES
p_exploitation	Proportion of particles for "exploitation"
refresh_rate	Interval at which the forward prediction module checks the output of the Sequential Monte Carlo module
target_region	Region of the properties of interest
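
As a rough illustration of working with these parameters, the sketch below checks a settings dictionary for the keys listed in the table and for a few basic value constraints. Only the key names come from the table; the example values, the `check_settings` helper, and the layout of `target_region` are assumptions made for illustration and may not match the defaults shipped with the repository.

```python
import json

# Hypothetical values for illustration; only the key names come from the table.
example_settings = {
    "reactor_gpu_id": -1,        # -1 selects the CPU
    "n_forward": 1,
    "n_smc_steps": 500,
    "n": 100,
    "n_r": 2,
    "generation_threshold": 1,
    "product_len": 100,
    "p_exploitation": 0.8,
    "refresh_rate": 1,
    "target_region": [[0.8, 1.0], [2.0, 4.0]],  # assumed layout, not documented here
}

POSITIVE_INT_KEYS = ("n_forward", "n_smc_steps", "n", "n_r", "product_len")


def check_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []
    for key in example_settings:
        if key not in settings:
            problems.append(f"missing key: {key}")
    for key in POSITIVE_INT_KEYS:
        value = settings.get(key)
        if key in settings and not (isinstance(value, int) and value > 0):
            problems.append(f"{key} should be a positive integer")
    gpu = settings.get("reactor_gpu_id")
    if "reactor_gpu_id" in settings and not (isinstance(gpu, int) and gpu >= -1):
        problems.append("reactor_gpu_id should be a GPU index or -1 for CPU")
    p = settings.get("p_exploitation")
    if "p_exploitation" in settings and not (isinstance(p, (int, float)) and 0 <= p <= 1):
        problems.append("p_exploitation should be a proportion between 0 and 1")
    return problems


# Round-trip through JSON text to mimic reading the settings file.
loaded = json.loads(json.dumps(example_settings))
print(check_settings(loaded))
```

To check a real file, `loaded` could instead be read with `json.load` from the settings path used in your checkout.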

customize models

Seq-Stack-Reaction provides a scaffolding for molecular design, where users can plug in arbitrary reaction prediction models, property prediction models, and a set of commercial compounds.

To use your own property prediction models and set of commercial compounds, simply modify the model/data paths in the ssr/setting.json file.

To use your customized reaction prediction model, see this guidance.

Applications

Application 1: design of drug-like molecules

Introduction

The task is to design drug-like molecules with any given region of drug-likeness (QED) score and logP.
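
To make the idea of a target property region concrete, here is a minimal sketch that computes QED and logP for a molecule and tests whether both fall inside given bounds. It uses RDKit's built-in `QED.qed` and `Crippen.MolLogP` rather than the QSPR regression models listed below, and the `in_target_region` helper and its `region` dictionary layout are hypothetical; they are not the format of `target_region` in the settings file.

```python
from rdkit import Chem
from rdkit.Chem import QED, Crippen


def in_target_region(smiles, region):
    """Return True if every property named in `region` lies within its bounds.

    `region` maps "qed" and/or "logp" to (low, high) tuples (assumed layout).
    Unparseable SMILES are treated as outside the region.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    values = {"qed": QED.qed(mol), "logp": Crippen.MolLogP(mol)}
    return all(low <= values[name] <= high for name, (low, high) in region.items())


# Example: is aspirin inside a (hypothetical) drug-like region?
print(in_target_region("CC(=O)Oc1ccccc1C(=O)O", {"qed": (0.4, 1.0), "logp": (0.0, 5.0)}))
```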

Download materials

Download the following components.

Component	Description
pool.csv	Initial reactant pool, consisting of the Enamine building block catalog global stock
molecular_transformer.pt	Molecular Transformer model that predicts the product SMILES based on the reactant SMILES
enamine_gtm	Generative Topographic Maps model for dimensionality reduction and clustering
product_logp	Regression model that predicts the log P value for a given product molecule
product_qed	Regression model that predicts the QED value for a given product molecule
reactant_logp	Regression model that predicts the log P value for a given reactant set
reactant_qed	Regression model that predicts the QED value for a given reactant set

Application 2: design of highly viscous lubricant molecules

Introduction

An example application in materials science. The task is to identify highly viscous lubricant molecules. Using approximately 55,000 samples obtained from all-atom classical molecular dynamics simulations, we predict the viscosity index (VI) and dynamic viscosity index (DVI) (properties that describe the temperature dependence of viscosity) from the chemical structure of any given lubricant molecule.

Summary

First, forward models predicting VI and DVI are constructed. An input molecule is transformed into a binary descriptor vector of dimension 3239 by concatenating RDKit fingerprints (length 2048), Molecular ACCess System (MACCS) keys (length 167), and Morgan fingerprints of radius 2 (length 1024). Gradient boosting regression is applied to learn a mapping from any given fingerprinted molecule to VI or DVI.

As the reaction prediction model, we employ the Molecular Transformer. This attention-based neural translation model defines a translation between the SMILES strings of reactants and their products. We use a subset of the Enamine building block catalog global stock as the set of commercially available reactants from which virtual molecules are synthesized.

The design task is to identify highly synthesizable products with higher VI and DVI that, according to the Molecular Transformer, can be synthesized from the set of commercially available reactants. In the example code, we perform SMC-RECUR-GTM-SR-PL with the number of iterations set to T=500, the number of particles set to m=100, and the exploration-exploitation trade-off parameter of the proposal distribution set to alpha = 0.8.

The change in the joint distribution of predicted VI and DVI over successive SMC steps is illustrated under "Property refinements in different steps". Four examples of synthetic products exhibiting high VI and DVI, together with their designed reaction pathways, are illustrated under "Designed synthetic route pathways".
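
The 3239-dimensional descriptor described in this summary can be sketched with RDKit as follows. This is an illustrative reconstruction under assumptions, not the code used to train the released models: the helper names are hypothetical, and fingerprint options other than the lengths and the Morgan radius stated above are left at RDKit defaults.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, MACCSkeys


def bits_to_array(fp):
    """Copy the on-bits of an RDKit bit vector into a 0/1 NumPy array."""
    arr = np.zeros(fp.GetNumBits(), dtype=np.uint8)
    arr[list(fp.GetOnBits())] = 1
    return arr


def descriptor_vector(smiles):
    """Concatenate RDKit (2048), MACCS (167) and Morgan radius-2 (1024) bits."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    rdkit_fp = Chem.RDKFingerprint(mol, fpSize=2048)
    maccs_fp = MACCSkeys.GenMACCSKeys(mol)  # 167 bits
    morgan_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    return np.concatenate([bits_to_array(fp) for fp in (rdkit_fp, maccs_fp, morgan_fp)])


vec = descriptor_vector("CCCCCCCCCCCC(=O)OCC")  # an arbitrary example ester
print(vec.shape)
```

A vector of this form could then be passed to a gradient boosting regressor; which implementation and hyperparameters the authors used is not stated here.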

Property refinements in different steps

Designed synthetic route pathways

Download materials

Download the following pre-trained Python models.

Component	Description
product_vi	Regression model that predicts the viscosity index value for a given product molecule
product_dvi	Regression model that predicts the dynamic viscosity index value for a given product molecule
reactant_vi	Regression model that predicts the viscosity index value for a reactant set
reactant_dvi	Regression model that predicts the dynamic viscosity index value for a reactant set

Data

Refer to Kajita et al. for more details about the design of this experiment and the data used for model training.

Copyright and license

References

Landrum, G. et al. RDKit: Open-source cheminformatics (2006). http://rdkit.org/

Durant, J. L., Leland, B.A., Henry, D.R., et al. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42, 1273–1280 (2002). https://www.semanticscholar.org/paper/Reoptimization-of-MDL-Keys-for-Use-in-Drug-Durant-Leland/ad40b25e38314f39a82f193dc4806e6a1c2c6b69

Morgan, H.L. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J Chem Doc 5, 107–113 (1965). https://pubs.acs.org/doi/10.1021/c160017a018

Schwaller, P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5, 1572–1583 (2019). https://pubmed.ncbi.nlm.nih.gov/31572784/

Kajita, S., Kinjo, T., Nishi, T. Autonomous molecular design by Monte-Carlo tree search and rapid evaluations using molecular dynamics simulations. Commun Phys 3, 77 (2020). https://doi.org/10.1038/s42005-020-0338-y
