Software for "A Bayesian Method for Concurrently Designing Molecules and Synthetic Reaction Networks"
- Installation
- Download materials
- Quick start
- Customize experiment
- Applications
- Copyright and license
- References
# Step 1: Get sources from GitHub
$ git clone git@github.com:qi-zh/Seq-Statck-Reaction.git
Make sure you have Conda installed.
# Step 2: Create conda environment
$ cd conda_env
$ conda env create -f ssr.yml
$ conda activate ssr
Package | Version |
---|---|
PyTorch | 1.7.1 |
OpenNMT | 1.2.0 |
rdkit | 2020.03.1 |
ugtm | 2.0.0 |
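After activating the environment, the pinned versions listed above can be compared with what is actually installed. The following is a minimal sketch, not part of the repository. It assumes Python 3.8+ (for `importlib.metadata`) and that the distribution names are `torch`, `OpenNMT-py`, `rdkit`, and `ugtm`; conda builds may register different names or no metadata at all, in which case a package is reported as not installed even though it imports fine.

```python
from importlib import metadata

# Pinned versions from the table above. The distribution names are
# assumptions: pip/conda names can differ from the names shown in the table.
EXPECTED = {
    "torch": "1.7.1",
    "OpenNMT-py": "1.2.0",
    "rdkit": "2020.03.1",
    "ugtm": "2.0.0",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop a local build suffix such as "+cu110" before comparing.
            found = lookup(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pinned version.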
Refer to the following folder hierarchy and place each model and data file in its folder.
─SSR
├─ssr
│ └─*.py
├─data
│ └─pool.csv
├─model
│ ├─molecular_transformer.pt
│ ├─GTM
│ │ └─enamine_gtm
│ └─QSPR
│ ├─product_logp
│ ├─product_qed
│ ├─reactant_logp
│ └─reactant_qed
├─conda_env
│ └─ssr.yml
├─*.py
└─*.ipynb
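Before launching anything, it can help to confirm that the downloaded files ended up where the hierarchy above expects them. The sketch below is an illustrative helper, not a script shipped with the repository. The relative paths are copied from the tree; whether the GTM and QSPR entries are files or directories is not stated, so the check only tests for existence.

```python
from pathlib import Path

# Paths taken from the folder hierarchy above, relative to the SSR root.
REQUIRED = [
    "data/pool.csv",
    "model/molecular_transformer.pt",
    "model/GTM/enamine_gtm",
    "model/QSPR/product_logp",
    "model/QSPR/product_qed",
    "model/QSPR/reactant_logp",
    "model/QSPR/reactant_qed",
    "conda_env/ssr.yml",
]


def missing_paths(root, required=REQUIRED):
    """Return the required relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All expected files are in place.")
```

Run it from the SSR root directory; any path it prints still needs to be downloaded or moved.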
A quick implementation of molecular design:
# Run the following command in a shell session.
$ python launch_forward.py
# Run the following command in another shell session.
$ python launch_smc.py
The following parameters are adjustable in the ssr/setting.json file.
Parameter | Description |
---|---|
reactor_gpu_id | Index of GPU, -1 for CPU |
n_forward | Number of forward prediction modules |
n_smc_steps | Total number of Sequential Monte Carlo steps |
n | Number of particles in each Sequential Monte Carlo step |
n_r | Number of reaction steps, larger than 1 |
generation_threshold | Threshold for filtering reactants: 1 uses only the initial reactants; larger values result in more complex reactions |
product_len | Maximum length of the product SMILES |
p_exploitation | Proportion of particles for "exploitation" |
refresh_rate | Interval at which the forward prediction module checks the output of the Sequential Monte Carlo module |
target_region | Region of the properties of interest |
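The constraints stated in the table can be checked before a run. The sketch below is a hypothetical helper under several assumptions: that ssr/setting.json is a flat JSON object keyed by the parameter names above, that "larger than 1" for `n_r` is meant literally, and that `p_exploitation` is a fraction between 0 and 1. The example values are illustrative only and are not the repository defaults. `refresh_rate` and `target_region` are not validated because their format is not described here.

```python
import json


def load_settings(path="ssr/setting.json"):
    """Load the settings file (assumed here to be a flat JSON object)."""
    with open(path) as fh:
        return json.load(fh)


def validate_settings(cfg):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []
    for key in ("n_forward", "n_smc_steps", "n", "product_len"):
        if not (isinstance(cfg.get(key), int) and cfg[key] > 0):
            problems.append(f"{key} should be a positive integer")
    gpu = cfg.get("reactor_gpu_id")
    if not (isinstance(gpu, int) and gpu >= -1):
        problems.append("reactor_gpu_id should be a GPU index, or -1 for CPU")
    if not cfg.get("n_r", 0) > 1:
        problems.append("n_r should be larger than 1")
    if not cfg.get("generation_threshold", 0) >= 1:
        problems.append("generation_threshold should be at least 1")
    if not 0 <= cfg.get("p_exploitation", -1) <= 1:
        problems.append("p_exploitation should be a proportion in [0, 1]")
    return problems


# Illustrative values only (assumptions, not the shipped defaults).
EXAMPLE = {
    "reactor_gpu_id": -1,
    "n_forward": 2,
    "n_smc_steps": 500,
    "n": 100,
    "n_r": 2,
    "generation_threshold": 2,
    "product_len": 100,
    "p_exploitation": 0.8,
}

if __name__ == "__main__":
    for problem in validate_settings(load_settings()):
        print(problem)
```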
Seq-Stack-Reaction provides a scaffolding for molecular design, where users can plug in arbitrary reaction prediction models, property prediction models, and a set of commercial compounds.
To use your own property prediction models and set of commercial compounds, simply modify the model/data paths in the ssr/setting.json file.
To use your customized reaction prediction model, see this guidance.
The task is to design drug-like molecules whose drug-likeness (QED) score and logP fall within any given target region.
Download the following components.
Component | Description |
---|---|
pool.csv | Initial reactant pool consisting of the Enamine building block catalog global stock |
molecular_transformer.pt | Molecular Transformer model that predicts the product SMILES based on the reactant SMILES |
enamine_gtm | Generative Topographic Maps model for dimensionality reduction and clustering |
product_logp | Regression model that predicts the log P value for a given product molecule |
product_qed | Regression model that predicts the QED value for a given product molecule |
reactant_logp | Regression model that predicts the log P value for a given reactant set |
reactant_qed | Regression model that predicts the QED value for a given reactant set |
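To make the QED/logP design target concrete, the sketch below tests whether a molecule's properties fall inside a rectangular target region. It is not the repository's code: the region format (a dict of `(low, high)` intervals) is hypothetical, and the properties are computed directly with RDKit's `QED.qed` and `Crippen.MolLogP` rather than with the downloaded regression models. RDKit is imported inside the function so that the region check itself has no dependencies.

```python
def in_target_region(props, region):
    """True if every property lies inside its [low, high] interval."""
    return all(low <= props[name] <= high
               for name, (low, high) in region.items())


def rdkit_properties(smiles):
    """Compute QED and Crippen logP with RDKit (not the QSPR models)."""
    from rdkit import Chem
    from rdkit.Chem import QED, Crippen

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {"qed": QED.qed(mol), "logp": Crippen.MolLogP(mol)}


if __name__ == "__main__":
    # Hypothetical target region; choose intervals to suit the design goal.
    region = {"qed": (0.7, 1.0), "logp": (2.0, 3.0)}
    props = rdkit_properties("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    print(props, in_target_region(props, region))
```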
An example application in materials science. The task is to identify highly viscous lubricant molecules. Using approximately 55,000 samples obtained from all-atom classical molecular dynamics simulations, we predict the viscosity index (VI) and dynamic viscosity index (DVI) (properties that describe the temperature dependence of viscosity) from the chemical structure of any given lubricant molecule.
First, forward models predicting VI and DVI are constructed. An input molecule is transformed into a binary descriptor vector of dimension 3239 by concatenating RDKit fingerprints (length 2048), Molecular ACCess System (MACCS) keys (length 167), and Morgan fingerprints of radius 2 (length 1024). Gradient boosting regression is applied to learn a mapping from any given fingerprinted molecule to VI or DVI.

As a reaction prediction model, we employed the Molecular Transformer. This attention-based neural translation model defines a translation between the SMILES strings of reactants and their products. We use a subset of the Enamine building block catalog global stock as the set of commercially available reactants from which virtual molecules are synthesized.

The design task is to identify highly synthesizable products with high VI and DVI that can be synthesized with the Molecular Transformer from the set of commercially available reactants. In the example code, we perform SMC-RECUR-GTM-SR-PL with the number of iterations set to T = 500, the number of particles set to m = 100, and the exploration-exploitation trade-off parameter of the proposal distribution set to alpha = 0.8. We illustrate the change of the joint distribution of predicted VI and DVI for increasing steps of SMC here. Four examples of synthetic products exhibiting high VI and DVI and their designed reaction pathways are illustrated here.
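The 3239-dimensional descriptor described above can be sketched with RDKit as follows. This is a minimal illustration, assuming default RDKit settings for everything except the stated lengths and Morgan radius; the fingerprint options used to train the released models are not given here and may differ. RDKit is imported inside the function so the length arithmetic can be checked without it.

```python
RDKIT_FP_BITS = 2048   # RDKit topological fingerprint
MACCS_BITS = 167       # MACCS keys as generated by RDKit
MORGAN_BITS = 1024     # Morgan fingerprint, radius 2
DESCRIPTOR_DIM = RDKIT_FP_BITS + MACCS_BITS + MORGAN_BITS  # 3239


def featurize(smiles):
    """Return the concatenated binary descriptor as a list of 0/1 ints."""
    from rdkit import Chem
    from rdkit.Chem import AllChem, MACCSkeys

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    fingerprints = [
        Chem.RDKFingerprint(mol, fpSize=RDKIT_FP_BITS),
        MACCSkeys.GenMACCSKeys(mol),
        AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=MORGAN_BITS),
    ]
    # Concatenate the three bit vectors in the order listed in the text.
    bits = [int(c) for fp in fingerprints for c in fp.ToBitString()]
    if len(bits) != DESCRIPTOR_DIM:
        raise RuntimeError(f"Unexpected descriptor length: {len(bits)}")
    return bits


if __name__ == "__main__":
    vector = featurize("CCCCCCCCCCCC")  # dodecane, as an arbitrary example
    print(len(vector), sum(vector))
```

Vectors produced this way could be stacked into a matrix and passed to any gradient boosting regressor, for example scikit-learn's `GradientBoostingRegressor`; which implementation was used for the released models is not stated here.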
Download the following pre-trained Python models.
Component | Description |
---|---|
product_vi | Regression model that predicts the viscosity index value for a given product molecule |
product_dvi | Regression model that predicts the dynamic viscosity index value for a given product molecule |
reactant_vi | Regression model that predicts the viscosity index value for a reactant set |
reactant_dvi | Regression model that predicts the dynamic viscosity index value for a reactant set |
Refer to Kajita et al. for more details about the design of this experiment and the data used for model training.
Landrum, G. et al. RDKit: Open-source cheminformatics (2006). http://rdkit.org/
Durant, J. L., Leland, B. A., Henry, D. R., et al. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42, 1273–1280 (2002). https://www.semanticscholar.org/paper/Reoptimization-of-MDL-Keys-for-Use-in-Drug-Durant-Leland/ad40b25e38314f39a82f193dc4806e6a1c2c6b69
Morgan, H. L. The generation of a unique machine description for chemical structures: a technique developed at Chemical Abstracts Service. J Chem Doc 5, 107–113 (1965). https://pubs.acs.org/doi/10.1021/c160017a018
Schwaller, P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5, 1572–1583 (2019). https://pubmed.ncbi.nlm.nih.gov/31572784/
Kajita, S., Kinjo, T., Nishi, T. Autonomous molecular design by Monte-Carlo tree search and rapid evaluations using molecular dynamics simulations. Commun Phys 3, 77 (2020). https://doi.org/10.1038/s42005-020-0338-y