Molecule properties not retained with MolStandardize.rdMolStandardize.Cleanup() #2965

Closed
ZacharyKaplan opened this issue Feb 20, 2020 · 1 comment
@ZacharyKaplan

Configuration:

  • RDKit Version: 2019.09.3.0
  • Operating system: Centos 7
  • Python version (if relevant): 3.6, 3.7
  • Are you using conda? Yes
  • If you are using conda, which channel did you install the rdkit from? rdkit
  • If you are not using conda: how did you install the RDKit? N/A

Description:

Rdkit Team,
Certain molecules do not retain their properties when standardized with rdkit.Chem.MolStandardize.rdMolStandardize.Cleanup(). This function was changed in an earlier release for speed improvements and to fix this very issue. We did not see properties being dropped in the 2019.09.2.0 release, but the problem seems to have reappeared. Running on a collection of a million SMILES, we found that the molecular properties are dropped on about 5% of the constructed molecules. Here are some examples of SMILES that fail:

Cl.c1cnc(OCCCC2CCNCC2)cn1
CC@@H[C@H]1C(=O)N2C(C(=O)O)=C(S[C@@h]3CNC@HC3)C@H[C@H]12.O
C=C[C@H]1CCC[NH2+]1.[Cl-]
CN1CCC2CCN(C(=O)C3CNC3)C2C1.Cl
COc1cc(NC(=O)CNC(=O)[C@@H](N)C(C)C)ccc1NS(C)(=O)=O.Cl
CNC(=O)C(c1ccccc1)n1cnc2c([NH2+]C)ncnc21.[Cl-]
C=CCN+(C)CC=C.[Cl-].[Cl-]
CN1C(=C2C(=[NH2+])N3CSCC3C2=O)N(C)c2ccccc21.[Cl-]

from rdkit.Chem import MolFromSmiles
from rdkit.Chem.MolStandardize import rdMolStandardize

# One of the failing SMILES reported above.
smiles = "COc1cc(NC(=O)CNC(=O)[C@@H](N)C(C)C)ccc1NS(C)(=O)=O.Cl"
mol = MolFromSmiles(smiles)
# Attach a custom string property to the input molecule.
mol.SetProp("testing_prop", str(1234))

print("Smiles", smiles)

print("Props Before CleanUp")
print(mol.GetPropsAsDict())

# Cleanup() returns a new molecule; the property is expected to carry over.
standard_mol = rdMolStandardize.Cleanup(mol)

print("Props After CleanUp")
print(standard_mol.GetPropsAsDict())

Thank you,
Zach Kaplan
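
For anyone stuck on an affected release, one possible stopgap is to compare the property names before and after standardization and copy back whatever went missing. The sketch below is only an illustration, not the fix that was later merged: the helper names (missing_props, restore_props, cleanup_keeping_props) are made up for this example, and it assumes string properties set with SetProp(), as in the report above. It relies only on the Mol methods GetPropNames(), HasProp(), GetProp() and SetProp().

```python
def missing_props(before, after):
    """Names of properties set on `before` that are absent on `after`."""
    return [name for name in before.GetPropNames() if not after.HasProp(name)]


def restore_props(before, after):
    """Copy any dropped properties from `before` onto `after` as strings."""
    for name in missing_props(before, after):
        after.SetProp(name, before.GetProp(name))
    return after


def cleanup_keeping_props(mol):
    """Hypothetical wrapper: run Cleanup() and re-attach dropped properties."""
    # Imported lazily so the two helpers above stay usable without RDKit loaded.
    from rdkit.Chem.MolStandardize import rdMolStandardize

    return restore_props(mol, rdMolStandardize.Cleanup(mol))
```

With RDKit installed, calling cleanup_keeping_props(mol) in place of rdMolStandardize.Cleanup(mol) in the script above should leave "testing_prop" on the returned molecule whether or not Cleanup() preserved it. An empty list from missing_props(mol, standard_mol) indicates that nothing was dropped for that molecule.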

@greglandrum greglandrum added this to the 2019_09_4 milestone Feb 28, 2020
@greglandrum
Member

Hi Zach,
Thanks for reporting this. I'll take a look and try to get this fixed.

shrey183 added a commit to shrey183/rdkit that referenced this issue Mar 11, 2020
shrey183 added a commit to shrey183/rdkit that referenced this issue Mar 11, 2020
greglandrum pushed a commit that referenced this issue Mar 14, 2020
* fixed issue #2965

* added test case for issue #2965

* fixed formatting and added comment.
greglandrum added a commit that referenced this issue Oct 9, 2020
* fixed issue #2965

* added test case for issue #2965

* fixed formatting and added comment.

* update

* General Reader files

* removed dependency on boost filesystems

* removed class

* clang-format

* added-comments

* further-cleanup

* added clang-formatting

* braces-for-if-else

* changed error messages, added option for windows file path

* fixed getFileName function

* cleanup

* option for filename without path

* further-cleanup

* added tests for determineFileFormat

* cleanup, const arguments for validate function

* init

* cleanup

* cleanup

* clang-format does not work for CMake

* added RDK_TEST_MULTITHREADED option

* add-flag

* cleanup

* Delete ConcurrentQueue.h

This PR deals with the Generalized File Reader.

* Delete testConcurrentQueue.cpp

This PR deals with the Generalized File Reader.

* no change

* concurrent queue

* print values

* Single Producer Multiple Consumer works

* cleanup

* Producer Consumer Example

* update queue methods and tests

* cleanup

* test

* fixed tests

* cleanup, updated tests

* Delete ProducerConsumer.h

* Delete testProducerConsumer.cpp

* cleanup

* futher cleanup

* changes based on feedback

* make queue non copyable

* psuedocode

* possible implementation

* untested implementation

* change class to typename

* basic-setup

* need to fix segfault

* need to fix blocking

* need to fix blocking

* need to fix blocking

* fix indentation

* one possibility

* without lambda function

* possible fix with some test cases

* performance tests

* added support for record id and item text

* cleanup

* cleanup

* fixed memory leak and added methods with tests for getting last id and item text

* cleanup

* added more test cases with different smi files

* cleanup

* SD mol supplier

* modified the parsing for SDMolSupplier

* cleanup

* cleanup

* new file for testing

* added support for reading molecule properties with tests

* thread-safe logging and exception handling

* cleanup

* without thread safe logging

* cleanup

* cleanup, modified MultithreadedSmilesMolSupplier

* cleanup, made reader and writer functions private

* move O2.sdf

* basic python wrapper with tests

* cleanup, added new methods for python wrappers

* made changes suggested by Andrew

* file and compression formats are case-insensitive

* cannot open files with gzstream

* cleanup

* possible fix for opening compressed streams (SMILES)

* removed seekg() and tellg() methods from multithreadeded suppliers

* cleanup

* test cases for python wrappers

* some wrapper cleanup

* cleanup, removed unused functions

* update the MT tests so that they actually do some work
also includes some cleanup here

* cleanup

* remove iterator_next header include

* added support for multithreaded readers

* use getNumThreadsToUse for multithreaded suppliers

* fixed documentation for multithreaded python wrappers

* commented performance test

* first draft of final evaluation report

* removed inline variables

* first draft getting started in python

* fixed typos in getting started in python

* fixed typos

* fix documentation tests

* fixed documentation tests

* added links to important files and PR

* added perfomance results

* first version of wrappers with compressed streams

* getting rid of streambuf stream method

* modified General File Reader

* make this work when building in non-threads mode

* rename a test

* rename a function in the python API

* rearrange the python test a bit

* disable the stream-based constructors in Python

* mark the multithreaded classes as experimental

Co-authored-by: greg landrum <greg.landrum@gmail.com>