ChemMORT

ChemMORT (Molecular Represent & Translate) consists of three modules: the SMILES Encoder, the Embedding Decoder, and the Molecular Optimizer.

Introduction

SMILES Encoder

The ChemMRAT SMILES Encoder embeds a SMILES string into a 512-dimensional vector that can be used as a descriptor set for building a QSAR model. For deep neural networks (DNNs) in particular, these descriptors follow the core idea of representation learning: a DNN should learn a suitable representation of the data from a simple but complete featurization rather than rely on sophisticated human-engineered representations. DNNs also tend to require large amounts of training data, whereas the available QSAR datasets are often small. Enumerating the different SMILES strings of each molecule expands a dataset to several times its original size. Users can submit a chemical in three ways: drawing it in the included chemical sketcher window, uploading a structure text file, or entering the SMILES of the chemical structure.
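The augmentation idea can be illustrated with a small sketch. It assumes the alternative SMILES strings of each molecule are already available (in practice they would typically be generated by a cheminformatics toolkit such as RDKit). The function name `augment_dataset` and the label value are hypothetical and not part of ChemMORT.

```python
from typing import Dict, List, Tuple


def augment_dataset(
    variants: Dict[str, List[str]], labels: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Pair every SMILES variant of a molecule with that molecule's label."""
    rows = []
    for name, smiles_list in variants.items():
        # Drop repeated strings while keeping their order, so that a
        # variant listed twice is not counted twice.
        for smiles in dict.fromkeys(smiles_list):
            rows.append((smiles, labels[name]))
    return rows


# Three equivalent SMILES strings for ethanol (same molecule, different atom order).
variants = {"ethanol": ["CCO", "OCC", "C(O)C"]}
labels = {"ethanol": -0.31}  # toy label value, for illustration only

augmented = augment_dataset(variants, labels)
print(augmented)
```

Each training row keeps the label of its parent molecule, so one measured value yields as many training examples as there are distinct SMILES variants.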

Embedding Decoder

The ChemMRAT Embedding Decoder translates embedding descriptors, as produced by the ChemMRAT SMILES Encoder, back into SMILES strings. The Decoder supports molecular property optimization: the user can adjust the embedding descriptors toward a target property and then use the Decoder to obtain the SMILES of each resulting molecule. Users upload a .csv file, and a .smi file is returned within a few seconds to a few minutes.
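The .csv to .smi round trip can be sketched as follows. The expected layout of the .csv file is not specified here, so this example assumes one embedding per row, written as comma-separated floats with no header. The `decode` argument is a hypothetical stand-in for the actual Decoder, and the demo uses 3-dimensional toy vectors instead of 512-dimensional embeddings.

```python
import csv
import io
from typing import Callable, Iterable, List


def read_embeddings(handle: Iterable[str]) -> List[List[float]]:
    """Parse one embedding per CSV row, skipping empty rows."""
    return [[float(x) for x in row] for row in csv.reader(handle) if row]


def write_smi(smiles: Iterable[str], handle) -> None:
    """Write one SMILES string per line, the usual layout of a .smi file."""
    for s in smiles:
        handle.write(s + "\n")


def decode_file(
    csv_handle,
    smi_handle,
    decode: Callable[[List[List[float]]], List[str]],
) -> None:
    """Read embeddings, decode them in one batch, and write the SMILES."""
    embeddings = read_embeddings(csv_handle)
    write_smi(decode(embeddings), smi_handle)


def fake_decode(embeddings: List[List[float]]) -> List[str]:
    # Placeholder decoder (hypothetical): returns ethanol for every row.
    return ["CCO" for _ in embeddings]


csv_text = io.StringIO("0.1,0.2,0.3\n0.4,0.5,0.6\n")
out = io.StringIO()
decode_file(csv_text, out, fake_decode)
print(out.getvalue())
```

Passing file handles rather than paths keeps the sketch easy to try in memory; real files opened with `open(...)` work the same way.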

Molecular Optimizer

The ChemMRAT Molecular Optimizer combines the Encoder, the Decoder, and the Particle Swarm Optimization (PSO) method. It was designed to optimize molecules with respect to a single objective, under chemical substructure constraints, and with respect to a multi-objective value function. The proposed method not only performs competitively with or better than the baseline method in finding optimal solutions, it also achieves a significant reduction in computational time. After the user enters the SMILES of a chemical structure and selects the property to optimize, the results list several of the best solutions.
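To make the PSO step concrete, here is a minimal sketch of a particle swarm search over a continuous vector, which is the kind of space the embedding provides. It is not the ChemMORT or mso implementation: the function name `pso_maximize`, the hyperparameter values, and the toy objective are all assumptions chosen for illustration. In the real tool the score would come from the property models rather than from a closed-form function.

```python
import random
from typing import Callable, List, Tuple


def pso_maximize(
    score: Callable[[List[float]], float],
    start: List[float],
    n_particles: int = 20,
    n_steps: int = 50,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Search around `start` for a vector that maximizes `score`."""
    rng = random.Random(seed)
    dim = len(start)
    # Particles start near the query point; the first one sits exactly on it,
    # so the returned solution is never worse than the input.
    xs = [[s + rng.gauss(0.0, 0.1) for s in start] for _ in range(n_particles)]
    xs[0] = list(start)
    vs = [[0.0] * dim for _ in range(n_particles)]
    best_xs = [x[:] for x in xs]
    best_fs = [score(x) for x in xs]
    g = max(range(n_particles), key=lambda i: best_fs[i])
    g_x, g_f = best_xs[g][:], best_fs[g]

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia plus pulls toward the particle's own best
                # position and toward the swarm's best position.
                vs[i][d] = (
                    inertia * vs[i][d]
                    + c_personal * r1 * (best_xs[i][d] - xs[i][d])
                    + c_global * r2 * (g_x[d] - xs[i][d])
                )
                xs[i][d] += vs[i][d]
            f = score(xs[i])
            if f > best_fs[i]:
                best_xs[i], best_fs[i] = xs[i][:], f
                if f > g_f:
                    g_x, g_f = xs[i][:], f
    return g_x, g_f


def toy_score(x: List[float]) -> float:
    # Toy objective standing in for a property model: maximum 0 at (1, 1, 1, 1).
    return -sum((xi - 1.0) ** 2 for xi in x)


start = [0.0, 0.0, 0.0, 0.0]
best_x, best_f = pso_maximize(toy_score, start)
print(best_x, best_f)
```

In an embedding-based workflow, `start` would be the encoded query molecule and the best vectors would be passed to the Decoder to recover SMILES strings.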

Endpoint of Optimizer

Endpoint	Description	Performance	Type	Method
logD_7.4	Log of the octanol/water distribution coefficient at pH 7.4. * Optimal: 1–3.	Test Set: RMSE 0.555±0.010, MAE 0.426±0.007, R² 0.840±0.004; 5-Fold CV: RMSE 0.562±0.009, MAE 0.428±0.13, R² 0.834±0.005	Basic property	XGBoost
AMES	The probability of a positive Ames test result. * The smaller the AMES score, the less likely the compound is to be Ames positive.	Test Set: ACC 0.813±0.007, SEN 0.835±0.013, SPE 0.787±0.013, AUC 0.888±0.004; 5-Fold CV: ACC 0.810±0.016, SEN 0.838±0.014, SPE 0.777±0.031, AUC 0.889±0.013	Toxicity	XGBoost
Caco-2	Papp (Caco-2 permeability). * Optimal: higher than -5.15 log units (or -4.70 or -4.80).	Test Set: RMSE 0.332±0.007, MAE 0.244±0.004, R² 0.718±0.019; 5-Fold CV: RMSE 0.328±0.004, MAE 0.245±0.005, R² 0.728±0.011	Absorption	XGBoost & Data Augmentation
MDCK	Papp (MDCK permeability).	Test Set: RMSE 0.323±0.022, MAE 0.232±0.011, R² 0.650±0.041; 5-Fold CV: RMSE 0.322±0.034, MAE 0.235±0.021, R² 0.644±0.057	Absorption	XGBoost & Data Augmentation
PPB	Plasma protein binding. * Significant for drugs that are highly protein-bound and have a low therapeutic index.	Test Set: RMSE 0.152±0.003, MAE 0.104±0.002, R² 0.691±0.016; 5-Fold CV: RMSE 0.154±0.010, MAE 0.106±0.007, R² 0.691±0.025	Distribution	DNN
QED	Quantitative estimate of drug-likeness.	n/a	Drug-likeness score	Molecular Function
SlogP	Log of the octanol/water partition coefficient, based on an atomic contribution model [Crippen 1999]. * Optimal: 0 < logP < 3. * logP < 0: poor lipid bilayer permeability. * logP > 3: poor aqueous solubility.	Fitted on an extensive training set of 9920 molecules, with R² = 0.918 and σ = 0.677	Basic property	Molecular Function
logS	Log of solubility. * Optimal: higher than -4 log mol/L. * <10 μg/mL: low solubility. * 10–60 μg/mL: moderate solubility. * >60 μg/mL: high solubility.	Test Set: RMSE 0.823±0.026, MAE 0.572±0.009, R² 0.862±0.011; 5-Fold CV: RMSE 0.842±0.084, MAE 0.592±0.056, R² 0.839±0.029	Basic property	XGBoost
hERG	The probability of being a hERG blocker. * The higher the hERG score, the more likely the compound is to be a hERG blocker.	Test Set: ACC 0.814±0.026, SEN 0.841±0.042, SPE 0.760±0.065, AUC 0.854±0.032; 5-Fold CV: ACC 0.800±0.036, SEN 0.820±0.068, SPE 0.754±0.147, AUC 0.857±0.053	Toxicity	XGBoost
Hepatoxicity	The probability of liver toxicity. * The smaller the Hepatoxicity score, the less likely the compound is to be liver toxic.	Test Set: ACC 0.729±0.016, SEN 0.732±0.019, SPE 0.724±0.044, AUC 0.794±0.015; 5-Fold CV: ACC 0.700±0.026, SEN 0.701±0.030, SPE 0.691±0.075, AUC 0.764±0.030	Toxicity	XGBoost
LD50	LD50 of acute toxicity. * High toxicity: 1–50 mg/kg. * Toxicity: 51–500 mg/kg. * Low toxicity: 501–5000 mg/kg.	Test Set: ACC 0.765±0.007, SEN 0.764±0.015, SPE 0.765±0.014, AUC 0.848±0.007; 5-Fold CV: ACC 0.741±0.045, SEN 0.742±0.128, SPE 0.740±0.111, AUC 0.833±0.033	Toxicity	XGBoost
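One common way to turn several endpoint predictions into a single multi-objective value is to map each raw prediction to a desirability between 0 and 1 and then take a weighted average. The sketch below illustrates that idea only; it is not confirmed to be the scheme ChemMORT uses. The breakpoints are assumptions loosely based on the "Optimal" notes in the table, and the names `desirability` and `combined_score` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (raw value, desirability) points, sorted by raw value.
Curve = List[Tuple[float, float]]


def desirability(value: float, curve: Curve) -> float:
    """Piecewise-linear interpolation, clamped to the end points of the curve."""
    xs = [x for x, _ in curve]
    if value <= xs[0]:
        return curve[0][1]
    if value >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, value)  # xs[i - 1] <= value < xs[i]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


def combined_score(
    values: Dict[str, float],
    curves: Dict[str, Curve],
    weights: Dict[str, float],
) -> float:
    """Weighted average of the per-endpoint desirabilities."""
    total = sum(weights[name] for name in values)
    weighted = sum(
        weights[name] * desirability(values[name], curves[name]) for name in values
    )
    return weighted / total


curves = {
    # logD_7.4: fully desirable between 1 and 3, falling to 0 two log units outside.
    "logD_7.4": [(-1.0, 0.0), (1.0, 1.0), (3.0, 1.0), (5.0, 0.0)],
    # AMES: a lower probability of a positive test is better.
    "AMES": [(0.0, 1.0), (1.0, 0.0)],
}
weights = {"logD_7.4": 1.0, "AMES": 1.0}

print(combined_score({"logD_7.4": 2.0, "AMES": 0.25}, curves, weights))  # 0.875
```

Because every endpoint is rescaled to the same 0 to 1 range, regression endpoints (such as logD_7.4) and probability endpoints (such as AMES) can be combined without one dominating the other purely because of its units.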

Downloading Pretrained Model

A pretrained model, as described in ref. 1, is available on Google Drive. Download and unzip it by executing the bash script "download_default_model.sh":

./download_default_model.sh

The default_model.zip file can also be downloaded manually from https://drive.google.com/open?id=1oyknOulq_j0w9kzOKKIHdTLo5HphT99h
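If the archive is downloaded manually, it still needs to be unpacked. The following sketch does this with the Python standard library. The helper name `extract_model` is hypothetical, and the target directory is an assumption, since where the code expects the model files is not stated here.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_model(zip_path: str, dest_dir: str) -> List[str]:
    """Unpack the archive into dest_dir and return the names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumes default_model.zip was saved in the current directory;
    # extracting into "." is a guess, adjust it to the repository layout.
    if Path("default_model.zip").exists():
        print(extract_model("default_model.zip", "."))
```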

Dev Environment

tensorflow==1.14.0
scikit-learn==0.23.2
rdkit==2019.03.1

Base

cddd
mso

