DetermineBondOrders running out of memory on medium-sized disconnected structure #5902
Comments
I spent a little more time on this today, and came up with a slightly less pathological example. This molecule has the SMILES string
Importing the attached XYZ file results in runaway memory usage. I haven't waited to see whether it crashes, because I kill it once I see the memory go over 50GB. The bottleneck is these two lines. The above molecule has 20 oxygen atoms, so the variable Even if it could finish that allocation I wonder about the timing for the 4-fold nested It is interesting that the original python
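A back-of-the-envelope sketch of why building every valence combination up front cannot fit in memory for this molecule. Only the count of 20 oxygen atoms comes from the report above. The 3 candidate valences per oxygen and the one byte per stored entry are illustrative assumptions, not values read from the RDKit source, and `eager_combination_bytes` is a hypothetical helper name.

```python
# Rough size of an eagerly built table of valence combinations.
# ASSUMPTIONS (illustration only): each atom has the same number of candidate
# valences, and every stored valence costs `bytes_per_entry` bytes.

def eager_combination_bytes(n_atoms: int, choices_per_atom: int,
                            bytes_per_entry: int = 1) -> int:
    """Bytes needed to hold every combination in memory at the same time."""
    n_combinations = choices_per_atom ** n_atoms
    # each combination stores one entry per atom
    return n_combinations * n_atoms * bytes_per_entry


n_oxygens = 20  # from the report above
choices = 3     # assumed number of candidate valences per oxygen
total = eager_combination_bytes(n_oxygens, choices)
print(f"{choices ** n_oxygens:,} combinations, roughly {total / 1e9:.0f} GB")
```

Under these assumptions the table alone is already larger than the 50GB observed before the process was killed, and a wider integer type per entry would multiply that further.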
@jasondbiggs Thanks for the report and for doing the extra research. We'll take a look at it and see what we can do. @gosreya: do you have the time and inclination to look into this or shall I do it?
@greglandrum Yes, I'll look into this!
@gosreya thanks!
@gosreya I had some time off over the holiday, so I took this as a puzzle to solve. It's not too difficult to emulate python's My fix works great for the test_20_ketones.txt example. It also makes the test_13_components.txt not die from consuming available memory, but it still just spins and spins because of the combinatorics. That will require a more extensive fix I think.
@jasondbiggs Thank you for writing up a solution! Feel free to make a PR, or I can integrate your solution if you'd prefer to copy and paste, whatever works for you. I'd observed the same thing about test_13_components.txt when I ran it on the Python xyz2mol, so yeah, I suppose that case might require reworking the algorithm itself.
lazily generate the combinations of possible atom valences rather than computing them up front
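A minimal sketch of what "lazily generate the combinations" can look like, written as an odometer-style generator that behaves like Python's standard `itertools.product`. This is not the patch from the thread (its code is not shown here). `lazy_product` is a hypothetical name, and the candidate valences in the usage lines are made up for illustration.

```python
import itertools
from typing import Iterator, Sequence


def lazy_product(choices: Sequence[Sequence[int]]) -> Iterator[tuple]:
    """Yield one combination at a time; the last position varies fastest."""
    if any(len(c) == 0 for c in choices):
        return  # an empty choice list means no combination exists
    idx = [0] * len(choices)  # one counter per atom, like odometer digits
    while True:
        yield tuple(c[i] for c, i in zip(choices, idx))
        # advance the rightmost counter that has not reached its limit
        pos = len(idx) - 1
        while pos >= 0:
            idx[pos] += 1
            if idx[pos] < len(choices[pos]):
                break
            idx[pos] = 0  # roll over and carry to the left
            pos -= 1
        if pos < 0:
            return  # every counter rolled over: all combinations emitted


# Hypothetical input: 20 atoms, 3 candidate valences each.
valences = [[2, 1, 3]] * 20
# Only the combinations actually consumed are ever built.
first_three = list(itertools.islice(lazy_product(valences), 3))
print(first_three)
```

Memory use stays proportional to the number of atoms, because only the counters and the current tuple exist at any time. The running time is unchanged, which matches the observation above that the 13-component case no longer dies but still spins on the combinatorics.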
I observed the same with this xyz data:
That molecule corresponds to CID=8576 from PubChem, with optimized geometry from PubChemQC Project.
This XYZ file is perhaps a bit pathological, but it was in our test set. It has 13 disconnected fragments that have the same connectivity. DetermineConnectivity has no problem with this molecule, but DetermineBondOrders runs away, consuming more and more memory, and is killed by the shell:
If I take one isolated fragment from the input file and run that through (see attached test_1_component.txt) it can determine bond orders, although it does return a charge-separated molecule that isn't quite right (see #5888).
I have run DetermineBondOrders on structures with many more atoms than this, even disconnected structures with more atoms: make a 3D table of ~50 benzene molecules translated in space and write them to an XYZ file, no issue with DetermineBondOrders. But this one fails.
Is the code somehow considering permutations of bond orders between fragments? I assumed DetermineBondOrders would work as if it iterated over the disconnected fragments.
test_1_component.txt
test_13_components.txt
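The question about fragments can be made concrete with a small counting sketch. If candidate assignments for independent fragments are enumerated jointly, the counts multiply. If each fragment is handled on its own, they add. Nothing below reflects how RDKit actually enumerates. The toy 3-atom fragments, the figure of 10 candidates per fragment, and both helper names are assumptions chosen only to show the scaling.

```python
from collections import defaultdict


def connected_components(n_atoms, bonds):
    """Group atom indices into fragments with an iterative depth-first search."""
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, fragment = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for nbr in neighbors[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        fragments.append(sorted(fragment))
    return fragments


def search_space(n_fragments, candidates_per_fragment):
    """Return (joint, per-fragment) numbers of candidates to examine."""
    joint = candidates_per_fragment ** n_fragments         # counts multiply
    per_fragment = candidates_per_fragment * n_fragments   # counts add
    return joint, per_fragment


# Toy stand-in for the report: 13 identical, disconnected 3-atom chains.
bonds = [(3 * i + j, 3 * i + j + 1) for i in range(13) for j in range(2)]
frags = connected_components(39, bonds)
joint, separate = search_space(len(frags), candidates_per_fragment=10)
print(f"{len(frags)} fragments: joint={joint:,} vs per-fragment={separate}")
```

With these made-up numbers the joint enumeration has to consider ten trillion candidates, while the per-fragment one considers 130. Splitting by fragment is only valid if any shared constraint, such as the total charge, can be distributed across fragments, so this is a way to reason about the symptom rather than a drop-in fix.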