Is your feature request related to a problem? Please describe.
When creating canonical SMILES from an RDKit molecule that carries additional context, said context is ignored for canonicalization. The issue is rather difficult to describe, so please be patient and ask if my explanations are unclear.
The generated SMILES look the same, but the atom indices are in a different order depending on the input from which the molecule was generated. For my use case it is important that the atom indices stay the same.
Example 1 with SMARTS:
from rdkit import Chem
m1 = Chem.MolFromSmarts("[C]!@;:C-,=C(-[C&R1])-C")
m2 = Chem.MolFromSmarts("C-C(-[C&R1])-,=C!@;:[C]")
Chem.MolToSmiles(m1)
# CC(C)C~C
Chem.MolToSmiles(m2)
# CC(C)C~C
# as said, the output is the same, but now let's look at the atom indices
m1.GetProp("_smilesAtomOutputOrder")
# [3,2,4,1,0,]
m2.GetProp("_smilesAtomOutputOrder")
# [0,1,2,3,4,]
Depending on how the molecule was created, the canonical SMILES can start with either the "C" atom or the "[C&R1]" atom. This isn't even clear from the image above. Since "C" and "[C&R1]" are not the same, it would be very helpful to break the tie by taking additional context, such as the query, into account so that the order is well defined (either the query atom always first, or vice versa).
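To make the difference concrete, here is a minimal sketch in plain Python (no RDKit needed) that parses the two `_smilesAtomOutputOrder` strings shown above and looks up where the "[C&R1]" atom ends up in each SMILES. The helper names `parse_output_order` and `smiles_positions` are made up for illustration and are not part of RDKit; the sketch only assumes that the property lists atom indices in the order they are written to the SMILES.

```python
import re


def parse_output_order(prop: str) -> list[int]:
    """Parse a string such as "[3,2,4,1,0,]" into a list of atom indices."""
    return [int(token) for token in re.findall(r"\d+", prop)]


def smiles_positions(order: list[int], atom_indices: list[int]) -> list[int]:
    """Map molecule atom indices to their positions in the written SMILES.

    order[p] is the index of the atom written at position p, so the
    inverse lookup gives the position of each requested atom.
    """
    position_of = {atom_idx: pos for pos, atom_idx in enumerate(order)}
    return [position_of[idx] for idx in atom_indices]


# Property strings copied from the example above.
m1_order = parse_output_order("[3,2,4,1,0,]")
m2_order = parse_output_order("[0,1,2,3,4,]")

# "[C&R1]" is atom 3 in m1's SMARTS and atom 2 in m2's SMARTS.
m1_query_position = smiles_positions(m1_order, [3])
m2_query_position = smiles_positions(m2_order, [2])
print(m1_query_position, m2_query_position)
```

The two positions differ even though the SMILES strings are identical, which is exactly the ambiguity described here.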
Example 2 with attachment point:
Same issue here. The same SMILES *OC.c1ccncc1 is generated, but the atom order is different. As a result, the "variable attachment atoms" get different atom indices, namely 3,7,8 for m and 3,4,5 for m1.
In case of a tie (e.g. when the order doesn't matter for the final SMILES), additional context, in this case participation in a variable attachment, should be taken into account to get a canonical order.
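As a rough illustration of the tie-breaking idea, the sketch below orders atoms by a canonical rank and falls back to a context key only when ranks are equal. It is plain Python and not how RDKit works internally. The function name `order_with_context`, the rank values, and the context keys are all hypothetical; in practice the ranks might come from something like `Chem.CanonicalRankAtoms(mol, breakTies=False)`, and the context key would have to be derived from the query or the variable attachment in a way that does not depend on input order.

```python
from typing import Sequence


def order_with_context(ranks: Sequence[int], context: Sequence[int]) -> list[int]:
    """Return atom indices sorted by rank, breaking ties with a context key.

    ranks[i]   -- canonical rank of atom i; tied atoms share a rank
    context[i] -- extra key for atom i, e.g. 1 if the atom carries a
                  query or takes part in a variable attachment, else 0
    """
    if len(ranks) != len(context):
        raise ValueError("ranks and context must have the same length")
    # sorted() is stable, so atoms that tie on both rank and context keep
    # their input order. The result is input-independent only if the
    # context key resolves every remaining tie.
    return sorted(range(len(ranks)), key=lambda i: (ranks[i], context[i]))


# Hypothetical values: atoms 3 and 4 tie on rank, and only one of them
# carries extra context. The two inputs differ in which one it is.
order_a = order_with_context([0, 3, 4, 1, 1], [0, 0, 0, 1, 0])
order_b = order_with_context([0, 3, 4, 1, 1], [0, 0, 0, 0, 1])
print(order_a, order_b)
```

In both cases the tied atom without context is placed before the one with context, regardless of which input index it had.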
Describe the solution you'd like
When canonicalizing to SMILES, additional context should be taken into account whenever that context needs to be used elsewhere and is linked to the canonical SMILES by atom index.
Hi @kienerj. I'd love to be able to do this, but canonicalizing queries is quite involved (almost a small research project) and not something which is likely to show up any time soon.
Additional context
Molfiles for molecules with attachment points: