Memory overflow with GetBestRMS when removeHs=False #320
Comments
GetBestRMS() has a "small" problem with combinatorial explosions when there are Hs in the molecule. There are some notes about this, and suggestions about what to do, here:
The short description of the problem is that every CH3 group multiplies the number of possible alignments by 6 (the 3! orderings of its hydrogens), and every CH2 group doubles the work. Something as simple as butane has 288 possible mappings that need to be explored when Hs are taken into account:
In [2]: m = Chem.AddHs(Chem.MolFromSmiles('CCCC'))
In [3]: len(m.GetSubstructMatches(m,uniquify=False))
Out[3]: 288
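The count of 288 for butane can be checked by hand without RDKit. The following is a rough sketch, assuming that hydrogens attached to the same heavy atom are freely interchangeable and that the symmetry of the heavy-atom skeleton is supplied by the caller. The function `count_mappings` and its parameters are hypothetical names for illustration, not part of RDKit.

```python
from math import factorial, prod


def count_mappings(h_counts, heavy_atom_symmetry=1):
    """Estimate the number of self-mappings of a molecule with explicit Hs.

    h_counts: number of hydrogens on each heavy atom.
    heavy_atom_symmetry: number of symmetry-equivalent orderings of the
        heavy-atom skeleton (an input; this sketch does not compute it).
    """
    # n hydrogens on one atom can be ordered in n! ways.
    return heavy_atom_symmetry * prod(factorial(n) for n in h_counts)


# Butane, CH3-CH2-CH2-CH3: the chain can be read in 2 directions.
butane = count_mappings([3, 2, 2, 3], heavy_atom_symmetry=2)
print(butane)  # 288
```

Each additional CH2 group doubles the result, so under the same assumptions a ten-carbon chain already gives 18432 mappings.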
Oh, OK. So the RMS in RDKit attempts all the atom-order permutations? My expectation (from protein alignment) was that even structures with multiple thousands of atoms can be aligned in a flash, but only provided that atom order is preserved.
Yeah, AllChem.GetBestRMS() tries all possible atom-order permutations that are compatible with the topological symmetry. If you know the atom order you can just use AllChem.AlignMol(); this only does one alignment and is much faster.
- minor changes
- improve docs, note about the combinatorial explosion
- suggest alternative in AlignMol
- issue a warning if number of matches is above an arbitrary cutoff
Issue #320 Making GetBestRMS more idiot-proof
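The warning mentioned in the commit notes could look roughly like the sketch below. This is not the code from the commit: the function name, the cutoff value, and the message text are all assumptions made for illustration.

```python
import warnings

# Hypothetical cutoff; the value actually chosen in the commit is not stated.
MAX_MATCHES_BEFORE_WARNING = 10_000


def warn_if_too_many_matches(n_matches, cutoff=MAX_MATCHES_BEFORE_WARNING):
    """Warn when the number of atom mappings exceeds the cutoff.

    Returns True if a warning was issued, False otherwise.
    """
    if n_matches > cutoff:
        warnings.warn(
            f"{n_matches} atom mappings to evaluate (cutoff {cutoff}); "
            "consider removing Hs or using AlignMol with a known atom order.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

The count passed in could come from `len(m.GetSubstructMatches(m, uniquify=False))`, keeping in mind that the `maxMatches` argument of `GetSubstructMatches` limits how many matches are returned.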
By memory overflow I don't mean an actual exception: my laptop just stops responding once its 20 GB of RAM are exhausted ;]
The problem is super simple: I load a molecule from a mol file with no hydrogens removed, and try to RMS-align the two mol objects.
This explodes: memory just fills up until my machine crashes. I looked at the molecules visually and they seem totally fine. The problem disappears when I set removeHs=True.
An example script and inputs to reproduce the problem are accessible via the link below. This test case was created with RDKit from 94aef1f
https://drive.google.com/file/d/0BzI3NK6qw0lJV0RGUU1EM3ZORm8/edit?usp=sharing