Atom map from substructure match flips indices for symmetrical atoms #27

Closed
ChayaSt opened this issue Feb 8, 2019 · 4 comments
Labels
bug Something isn't working

Comments

@ChayaSt
Collaborator

ChayaSt commented Feb 8, 2019

This is related to openforcefield/cmiles#15

ChayaSt added the bug label Feb 8, 2019
@ChayaSt
Collaborator Author

ChayaSt commented Feb 17, 2019

It turns out that the map indices are also flipped with the newer solution in openforcefield/cmiles#15.

Example:
Mapped SMILES generated with cmiles:
'[H:7][C:1]1([C:2]([C:4]([C:5]([C:3]1([H:11])[H:12])([H:15])[C:6]([H:16])([H:17])[H:18])([H:13])[H:14])([H:9])[H:10])[H:8]'

Generating a molecule from the mapped SMILES produces this mapping:
[image]

Generating a molecule from the canonical SMILES, canonicalizing the atom order, and then setting each map index to the atom index + 1 (the way cmiles generates the mapped SMILES) produces this mapping:

[image]
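To make the "map index = atom index + 1" convention concrete, here is a minimal sketch that reads the map indices out of the mapped SMILES from the example and checks them against the order in which the atoms are written. It uses only a regular expression, not cmiles or a cheminformatics toolkit, and the helper names (`read_atom_maps`, `map_is_index_plus_one`, `order_by_map_index`) are made up for illustration. It assumes simple bracket atoms such as `[C:1]`, which is all this example contains.

```python
import re

MAPPED_SMILES = (
    "[H:7][C:1]1([C:2]([C:4]([C:5]([C:3]1([H:11])[H:12])([H:15])"
    "[C:6]([H:16])([H:17])[H:18])([H:13])[H:14])([H:9])[H:10])[H:8]"
)

# Matches simple bracket atoms such as [C:1] or [H:12]. Charges, isotopes and
# chirality tags are deliberately not handled in this sketch.
ATOM_PATTERN = re.compile(r"\[([A-Z][a-z]?):(\d+)\]")


def read_atom_maps(mapped_smiles):
    """Return (element, map_index) pairs in the order atoms appear in the string."""
    return [(element, int(index)) for element, index in ATOM_PATTERN.findall(mapped_smiles)]


def map_is_index_plus_one(atom_maps):
    """True if the atom written at position i carries map index i + 1."""
    return all(map_index == i + 1 for i, (_, map_index) in enumerate(atom_maps))


def order_by_map_index(atom_maps):
    """String positions sorted so that the atom with map index 1 comes first."""
    ranked = sorted(enumerate(atom_maps), key=lambda item: item[1][1])
    return [position for position, _ in ranked]


atom_maps = read_atom_maps(MAPPED_SMILES)
print(len(atom_maps))                      # 18 atoms
print(map_is_index_plus_one(atom_maps))    # False: written order differs from map order
print(order_by_map_index(atom_maps)[:6])   # string positions holding map indices 1..6
```

Reordering the atoms with the list returned by `order_by_map_index` restores the convention. Whether a toolkit's canonical ordering then agrees with it is exactly what this issue is about.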

@j-wags, this means that the logic I implemented in openforcefield/cmiles#15 fails for this molecule: when the atom order of the molecule generated from the mapped SMILES is canonicalized, the map indices no longer equal the atom indices + 1.
For this case, a substructure search does find the right ordering in the first match. It might be best to use the substructure search and pick the match that corresponds to the mapped SMILES.
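The idea of picking the match that corresponds to the mapped SMILES can be sketched without a toolkit. The example below brute-forces every bond-preserving mapping of the heavy-atom skeleton onto itself and then keeps the mapping that agrees with a reference map. The bond list is my reading of the mapped SMILES above (hydrogens omitted), and `all_matches` and `pick_match` are hypothetical helpers, not cmiles or OpenEye functions. A real implementation would use the toolkit's substructure search instead of permutations.

```python
from itertools import permutations

# Heavy-atom bonds of the example, written with map indices as read from the
# mapped SMILES above (hydrogens omitted to keep the search tiny).
BONDS = {(1, 2), (2, 4), (4, 5), (3, 5), (1, 3), (5, 6)}
ATOMS = sorted({atom for bond in BONDS for atom in bond})


def normalise(bonds):
    """Store each bond as a frozenset so (a, b) and (b, a) compare equal."""
    return {frozenset(bond) for bond in bonds}


def all_matches(pattern_bonds, target_bonds, atoms):
    """Brute-force every one-to-one atom mapping that preserves all bonds."""
    pattern, target = normalise(pattern_bonds), normalise(target_bonds)
    matches = []
    for perm in permutations(atoms):
        mapping = dict(zip(atoms, perm))
        image = {frozenset(mapping[atom] for atom in bond) for bond in pattern}
        if image == target:
            matches.append(mapping)
    return matches


def pick_match(matches, reference):
    """Return the first match that agrees with every entry in `reference`."""
    for mapping in matches:
        if all(mapping[atom] == mapped for atom, mapped in reference.items()):
            return mapping
    return None


matches = all_matches(BONDS, BONDS, ATOMS)
print(len(matches))                  # 2: the identity and the mirror image
print(pick_match(matches, {1: 2}))   # the match in which atoms 1 and 2 are swapped
```

The two matches differ only by swapping the symmetric ring atoms (1 with 2, and 3 with 4), which is the kind of flip described in the issue title. Fixing even one atom from the reference map is enough to select a single match here.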

@jchodera
Member

jchodera commented Feb 17, 2019

This problem is likely unavoidable, which is why for fragmenter pipelines we always want to generate the molecule with the ordering we want and continue to use that ordering/mapping throughout the remainder of the pipeline.

@ChayaSt
Collaborator Author

ChayaSt commented Feb 18, 2019

The problem arises from the way fragmenter was designed.
Fragmenter generates a list of fragments with cmiles IDs before the fragments have geometries, as seen in this planning image:
[image]
The remedy is to have cmiles generate a mapped SMILES only if the input molecule also has a geometry. This requires the following changes in fragmenter and cmiles:

  1. The fragment JSON will not have mapped SMILES.
  2. cmiles will allow OEMol inputs (currently only isomeric explicit-hydrogen SMILES or QCJSON are allowed).
  3. cmiles will only generate mapped SMILES if the input molecule has a geometry. It will first check that the molecule with geometry is in canonical order and, if not, reorder it. It may be a good idea to have a strict flag here so that mapped SMILES without geometries can still be allowed.
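The decision logic in item 3 could look roughly like the sketch below. Everything in it is hypothetical: `Molecule` is a stand-in data class rather than an OEMol or a cmiles object, `canonical_order` is assumed to be supplied by a toolkit, the coordinates are illustrative values, and the reading that `strict=False` is what permits a mapped SMILES without a geometry is my assumption about how the flag would behave.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Coordinates = List[Tuple[float, float, float]]


@dataclass
class Molecule:
    """Stand-in for a toolkit molecule; NOT the cmiles or OpenEye data model."""
    symbols: List[str]
    # canonical_order[k] is the current index of the atom that belongs at position k.
    canonical_order: List[int]
    geometry: Optional[Coordinates] = None


def reorder(mol, order):
    """Return a copy of `mol` with symbols and coordinates permuted by `order`."""
    symbols = [mol.symbols[i] for i in order]
    geometry = None if mol.geometry is None else [mol.geometry[i] for i in order]
    return Molecule(symbols, list(range(len(order))), geometry)


def prepare_for_mapped_smiles(mol, strict=True):
    """Return a canonically ordered molecule, or None if no mapped SMILES should be written."""
    if mol.geometry is None and strict:
        return None  # strict mode: no geometry, no mapped SMILES
    if mol.canonical_order != list(range(len(mol.symbols))):
        mol = reorder(mol, mol.canonical_order)
    return mol


# Illustrative input: a water molecule whose atoms are not in canonical order.
water = Molecule(
    ["H", "O", "H"],
    [1, 0, 2],
    [(0.76, 0.59, 0.0), (0.0, 0.0, 0.0), (-0.76, 0.59, 0.0)],
)
ordered = prepare_for_mapped_smiles(water)
print(ordered.symbols)
```

After this step the map indices can be assigned as atom index + 1 and will refer to the same atoms as the stored geometry.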

@ChayaSt
Collaborator Author

ChayaSt commented Jul 12, 2019

This is not a concern: since the atoms are symmetrical, it is the same molecule.

ChayaSt closed this as completed Jul 12, 2019