-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init draft aromatic #38
Conversation
Many thanks! First your questions, I'll have a look at the code as well.
Wildcard should always be able to form a double bond without inducing a charge, so they should be part of the delocalized subgraph. I'm not a hundred percent sure why
I need to brood on this. Can this be summarized in such a general way, or do we need to do actualy octet-rule?
This is a problem IMO, naphthalene does show DIME. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I do think it can be cleaned up a little bit more though.
We can also draw some inspiration from https://github.com/aspuru-guzik-group/selfies/blob/master/selfies/mol_graph.py#L287
We can assume that any atom that did not specify any number of hydrogens is saturated and can be pruned from the DS. So no need to prefill the valence.
The wildcard atoms can become an issue if they result in a non-perfect matching, so maybe all wildcards that are not in a maximal matching should be removed before checking whether it's perfect.
@pckroon small update: so naphthalene is correctly assigned (now). I had an earlier error. All systems that show DIME are identified as aromatic as far as test-cases go. Only those like Thiophene are not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it. Small nitpick on some code, and a doubt on the indole test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, I love it <3
Could you add 2 more testcases? One with a triangle, and one that cannot be kekulized and trips the error. That should bring the coverage back up |
This is the initial draft of the new aromaticity algorithm following the ideas outlined here and here.
The key difference is that we are not trying to assign chemical aromaticity but rather kekulize the molecule (i.e. fixing hcount). In a nutshell, the algorithm proceeds as follows:
5a. If the system is cyclic we assume it to be (anti-)aromatic and give it a bond order of 1.5.
5b. If it is not cyclic then we simply assign a bond order of 2 to the edges that constitute the perfect matching.
Some differences in behaviour to the previous version:
The molecule from this blog-post mentioned in #19 is also fixed.
Overall I'd say this algorithm is more robust as it raises an Error for hard fails like
c1cncc1
but is also linenet towards chemically intuative smiles likecccc
.The major problems are:
[*]1[*][*]1
is not aromatic anymore.missing = valance - bonds + charges