Exceptionally long runtime for a few organic structures from materials project #20
Comments
Glad you're finding xyz2mol useful! Molecules with many nitro and phosphate groups take a long time, and there isn't really a general way to speed this up without changing the entire approach. One could add hacks that identify these specific groups and handle them differently, but I am reluctant to put that in the official version of the code. Let me know if you want to implement it locally and I can give you some tips. I also note that xyz2mol didn't identify the bonding correctly for the second molecule. Maybe removing the Huckel option will fix it. In general it's a good idea to use both and visually inspect the cases where they differ as a sanity check.
Alright, thanks a lot! I think it should be fine for now since only very few structures seem to be affected. Also, I will keep the sanity check with and without the Huckel option in mind, thanks for the hint!
I just tried the second example without Huckel: it finished in less than a second and obtained the correct bonding (the first example, however, also takes a long time without Huckel). Is there a general rule for which of the two approaches is faster or more reliable, or does it strongly depend on the structure?
In my experience, the Huckel option is more reliable. In fact, I was really surprised to see that it failed for molecule 2. Molecules with many nitro groups will always take a long time, but beyond that the runtime is hard to predict. Anyway, just be aware that xyz2mol will occasionally screw up and it's hard to predict if and when it happens.
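For anyone who wants to automate the with/without-Huckel sanity check discussed above, here is a minimal sketch. It assumes that `xyz2mol.py` sits in the working directory and prints its result (the SMILES string) to stdout, and that a plain string comparison of the two outputs is good enough to flag a disagreement. The helper names `build_command`, `run_xyz2mol`, and `needs_inspection` are made up for this example and are not part of xyz2mol.

```python
import subprocess
import sys


def build_command(xyz_path, use_huckel, charge=0, script="xyz2mol.py"):
    """Build the command line used in this thread, with or without --use-huckel."""
    cmd = [sys.executable, script, xyz_path, "--charge", str(charge)]
    if use_huckel:
        cmd.append("--use-huckel")
    return cmd


def run_xyz2mol(xyz_path, use_huckel, charge=0, script="xyz2mol.py"):
    """Run the script once and return whatever it printed, stripped.

    Assumption: the script writes its result to stdout.
    A non-zero exit code raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_command(xyz_path, use_huckel, charge, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def needs_inspection(xyz_path, charge=0, script="xyz2mol.py"):
    """Return (differs, with_huckel, without_huckel) for one structure."""
    with_huckel = run_xyz2mol(xyz_path, True, charge, script)
    without_huckel = run_xyz2mol(xyz_path, False, charge, script)
    return with_huckel != without_huckel, with_huckel, without_huckel
```

Calling `needs_inspection("molecule.xyz")` runs the script twice; when the first element of the returned tuple is `True`, the two results disagree and the structure is worth a visual check. Two different SMILES strings can describe the same molecule, so a mismatch here only marks a candidate for inspection, not a confirmed error.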
Thanks again! Do the slow molecules always get stuck at the same lines of code, e.g. in a particular loop?
it is the loop over
Okay, I should be fine with handling these special cases then.
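One way to handle such special cases when processing thousands of structures is to give every run a time limit and set aside the ones that exceed it. The sketch below is only an illustration under that assumption: `run_with_timeout` and `split_by_timeout` are hypothetical helper names, the commands are whatever command line you already use, and the time limit is up to you.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stripped stdout, or None if it exceeds timeout_s.

    subprocess.run kills the child process when the timeout expires.
    The exit code is not checked here, so a failed run returns whatever
    it printed (possibly an empty string).
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip()


def split_by_timeout(commands, timeout_s):
    """Run each named command; return (finished outputs, names that timed out).

    commands maps a label (e.g. a file name) to its command list.
    """
    finished, timed_out = {}, []
    for name, cmd in commands.items():
        output = run_with_timeout(cmd, timeout_s)
        if output is None:
            timed_out.append(name)
        else:
            finished[name] = output
    return finished, timed_out
```

The structures collected in the second return value can then be rerun later with a longer limit or with different options.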
Hello,
first of all thanks for this great script! It is really useful and does a good job at solving this tricky task.
I was using it on a few thousand organic molecules from the Materials Project database and noticed that a few structures always lead to exceptionally long runtimes (~10 minutes compared to <1 second for most other molecules).
I'm calling with:
python xyz2mol.py molecule.xyz --use-huckel --charge 0
When I set
--no-charged-fragments
the calculation instead takes only ~1 minute, but this is still a lot longer than for other structures. Do you have any idea why these structures take so long? Is there anything I could do about it?
You can find the .xyz of two example structures and the resulting SMILES strings below:
N#CC(C#N)=C1c2cc([N+](=O)[O-])cc([N+](=O)[O-])c2-c2c1cc([N+](=O)[O-])cc2[N+](=O)[O-]
[NH2+]=C1N=CN=C2N3[C@@H]4O[C@H](CO[P@@](=O)(O[P@](=O)(O)O[P@@]([O-])(O)=[OH+])OC35[N-]C125)[C@@H](O)[C@H]4O