Memory overflow with GetBestRMS when removeHs=False #320

Closed
jandom opened this issue Aug 26, 2014 · 3 comments

@jandom
Contributor

jandom commented Aug 26, 2014

By memory overflow I don't mean an actual exception: my laptop just stops responding once its 20 GB of RAM is exhausted ;]

The problem is super simple: I load a molecule from a mol file,

from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromMolFile(f, removeHs=False)  # f is the path to the mol file
ref = Chem.MolFromMolFile(f, removeHs=False)

with no hydrogens removed, and try to RMS-align the two mol objects.

rms = AllChem.GetBestRMS(ref, mol)

This explodes: memory just fills up until my machine crashes. I looked at the molecules visually and they seem totally fine. The problem disappears when I set removeHs=True.

An example script and inputs to reproduce the problem are accessible via the link below. This test case was created with RDKit from 94aef1f.

https://drive.google.com/file/d/0BzI3NK6qw0lJV0RGUU1EM3ZORm8/edit?usp=sharing

@greglandrum
Member

GetBestRMS() has a "small" problem with combinatorial explosions when there are Hs in the molecule. There are some notes about this, and suggestions about what to do, here:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03417.html

The short description of the problem is that every CH3 group multiplies the number of possible mappings by 6 (the 3! ways of permuting its three Hs), and every CH2 group doubles it.

Something as simple as butane has 288 possible mappings that need to be explored when Hs are taken into account:

In [1]: from rdkit import Chem

In [2]: m = Chem.AddHs(Chem.MolFromSmiles('CCCC'))

In [3]: len(m.GetSubstructMatches(m,uniquify=False))
Out[3]: 288
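
To see where the 288 comes from, here is a minimal sketch that counts the same self-mappings by brute force, without RDKit. The helper names `build_alkane` and `count_automorphisms` are made up for this illustration and are not part of any library. For butane the count factors as 2 (chain flip) x 6 x 6 (two CH3 groups) x 2 x 2 (two CH2 groups) = 288.

```python
def build_alkane(n_carbons):
    """Return (elements, adjacency) for a linear alkane with explicit Hs."""
    elements = ["C"] * n_carbons
    bonds = [(i, i + 1) for i in range(n_carbons - 1)]
    for c in range(n_carbons):
        # Fill each carbon up to four bonds with hydrogens.
        n_h = 4 - sum(c in bond for bond in bonds)
        for _ in range(n_h):
            elements.append("H")
            bonds.append((c, len(elements) - 1))
    adjacency = [set() for _ in elements]
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return elements, adjacency


def count_automorphisms(elements, adjacency):
    """Count atom-to-atom self-mappings that preserve elements and bonds."""
    n = len(elements)
    mapping = [None] * n
    used = [False] * n

    def extend(i):
        if i == n:
            return 1
        total = 0
        for j in range(n):
            if used[j] or elements[j] != elements[i]:
                continue
            # Every already-mapped neighbour of i must map to a neighbour of j.
            if all(mapping[k] in adjacency[j] for k in adjacency[i] if k < i):
                mapping[i] = j
                used[j] = True
                total += extend(i + 1)
                used[j] = False
        return total

    return extend(0)


def closed_form(n_carbons):
    """2 * 6**2 * 2**(n-2) for a linear alkane with at least two carbons."""
    return 2 * 6 ** 2 * 2 ** (n_carbons - 2)


for n in range(2, 7):
    print(n, count_automorphisms(*build_alkane(n)), closed_form(n))
```

Each extra CH2 in the chain doubles the count, so the number of mappings grows exponentially with chain length even for unbranched molecules.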

@jandom
Contributor Author

jandom commented Aug 26, 2014

Oh, OK. So the RMS in RDKit attempts all the atom-order permutations? My expectation (from protein alignment) was that even structures with several thousand atoms can be aligned in a flash, but that assumes the atom order is preserved.

@greglandrum
Member

Yeah, AllChem.GetBestRMS() tries all possible atom-order permutations that are compatible with the topological symmetry. If you know the atom order you can just use AllChem.AlignMol(); this only does one alignment and is much faster.
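
A rough sketch of the two workarounds discussed in this thread, using butane as a stand-in for the real input: align with a fixed atom order via AllChem.AlignMol(), or check the number of mappings first and fall back to heavy atoms before calling AllChem.GetBestRMS(). The helpers `embed` and `count_mappings` and the cap of 1000 are illustrative choices, not part of RDKit, and the embedded conformers only stand in for the coordinates that would come from the mol file.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed(smiles, seed):
    """Build a molecule with explicit Hs and one 3D conformer (illustrative)."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise RuntimeError("embedding failed")
    return mol


def count_mappings(mol, cap=1000):
    """Count self-mappings of mol, stopping once `cap` matches are found."""
    return len(mol.GetSubstructMatches(mol, uniquify=False, maxMatches=cap))


ref = embed("CCCC", seed=1)
probe = embed("CCCC", seed=2)

# Option 1: the atom order is known to be identical, so do a single alignment.
# AlignMol moves the probe in place, hence the copy.
rms_fixed = AllChem.AlignMol(Chem.Mol(probe), ref)

# Option 2: look at the size of the permutation search before starting it.
n_with_hs = count_mappings(ref)          # grows quickly with every CH2/CH3
ref_heavy = Chem.RemoveHs(ref)
probe_heavy = Chem.RemoveHs(probe)
n_heavy = count_mappings(ref_heavy)      # far fewer mappings without Hs

# With only heavy atoms the symmetry-aware search is cheap.
rms_best_heavy = AllChem.GetBestRMS(ref_heavy, probe_heavy)

print(n_with_hs, n_heavy, rms_fixed, rms_best_heavy)
```

The heavy-atom RMS ignores hydrogen positions, so it is not the same quantity as an all-atom RMS; whether that matters depends on what the comparison is for.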

jandom added a commit to jandom/rdkit that referenced this issue Aug 27, 2014
- minor changes,
- improve docs, note about the combinatorial explosion,
- suggest alternative in AlignMol,
- issue a warning if number of matches is above an arbitrary cutoff
greglandrum added a commit that referenced this issue Aug 28, 2014
Issue #320 Making GetBestRMS more idiot-proof
@jandom jandom closed this as completed Aug 29, 2014
@greglandrum greglandrum added this to the 2014_09_1 milestone Oct 1, 2014