Water solubility predictor (RandomForestRegressor) trained on ESOL-Delaney dataset

kuteykin/solubility-predictor

project solubility-predictor (in progress)

Predicting Aqueous Solubility of Organic Molecules Using a Molecular Descriptor Model

ML model: Random Forest Regressor

ML features: physico-chemical descriptors obtained with PaDEL software

According to the article "Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations" by Panapitiya et al., 2021 (https://arxiv.org/pdf/2105.12638v1.pdf), the Molecular Descriptor Model outperforms fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet.
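The descriptor-based approach can be sketched as follows, assuming scikit-learn and NumPy are installed. The descriptor matrix below is synthetic random data standing in for PaDEL output, and the hyperparameters and train/test split are illustrative assumptions, not the settings used in this repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a descriptor table (rows: molecules, columns: descriptors).
# Real input would be the PaDEL descriptors computed for each molecule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Hypothetical target: a noisy function of two descriptors, standing in for LogS.
y = -2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the molecules to estimate the prediction error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed hyperparameters; the repository's actual values may differ.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"RMSE on held-out synthetic data: {rmse:.3f}")
```

With real data, `X` would hold the PaDEL descriptors and `y` the measured LogS values from the training dataset.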

Training dataset: ESOL (Delaney), water solubility data (LogS, log solubility in mol per litre) for common organic small molecules.
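Since LogS is the base-10 logarithm of the solubility in mol per litre, a predicted value can be converted back to a concentration. The sketch below shows the arithmetic; the function names and the example values are hypothetical and are not taken from the dataset.

```python
def logs_to_mol_per_litre(log_s: float) -> float:
    """Convert LogS (log10 of solubility in mol/L) to solubility in mol/L."""
    return 10.0 ** log_s


def logs_to_g_per_litre(log_s: float, molar_mass: float) -> float:
    """Convert LogS to solubility in g/L, given the molar mass in g/mol."""
    return logs_to_mol_per_litre(log_s) * molar_mass


# Hypothetical molecule: LogS = -2.0, molar mass = 100.0 g/mol.
example_log_s = -2.0
example_molar_mass = 100.0

mol_per_litre = logs_to_mol_per_litre(example_log_s)
g_per_litre = logs_to_g_per_litre(example_log_s, example_molar_mass)
print(mol_per_litre, g_per_litre)
```

A more negative LogS means a less soluble compound: each unit decrease corresponds to a tenfold drop in molar solubility.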

"Input/solubility_delaney_processed.csv" training dataset, obtained from https://moleculenet.org/datasets-1

"Input/solubility_delaney.csv" original data from Delaney, "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure", J. Chem. Inf. Comput. Sci. 2004, 44, 3, 1000–1005

"Input/unknown.smi" UNKNOWN dataset: 105 organic molecules from ChEMBL (SMILES and ChEMBL names) with unknown water solubility, used to predict solubility from structure
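A file of SMILES strings and names could be read as sketched below. This assumes the common `.smi` layout of one molecule per line, with the SMILES string first and the name after whitespace; the actual layout of `Input/unknown.smi` may differ. The function name and the sample records are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_smi_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse lines of the assumed form '<SMILES><whitespace><name>'.

    Blank lines are skipped; a line without a name yields an empty name.
    """
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split only on the first run of whitespace so names may contain spaces.
        parts = line.split(maxsplit=1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        records.append((smiles, name))
    return records


# Hypothetical sample content (ethanol and benzene with made-up names).
sample_lines = ["CCO mol_A\n", "\n", "c1ccccc1\tmol_B\n"]
molecules = parse_smi_lines(sample_lines)
print(molecules)
```

To read from disk, the same function accepts an open file object, for example `parse_smi_lines(open("Input/unknown.smi"))`.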

"Output/*.*" files generated by the script

"Output/unknown_solubility.csv" Results: predicted solubilities for the UNKNOWN dataset
