Conserve atom mapping when comparing compounds #1232

kovasap opened this Issue Dec 27, 2016 · 2 comments



kovasap commented Dec 27, 2016

I am currently trying to solve the following two-part problem:

  1. Given a compound A represented as an RDKit mol object, find a compound A' in a list of compounds (also represented as RDKit mols) that has the same structure as A.
  2. Take the two matching compounds (A and A') and generate a mapping between their atoms, which are labeled through the molAtomMapNumber atom property.
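For readers following along, the two parts could be sketched roughly as below. This is only an illustration under stated assumptions, not the linked method: the helper names (`_unmapped_smiles`, `map_numbers_between`, `find_match`) are hypothetical, "same structure" is taken to mean identical canonical SMILES once map numbers are cleared, and every atom is assumed to carry a map number.

```python
from rdkit import Chem


def _unmapped_smiles(mol):
    """Canonical SMILES of a copy of mol with atom map numbers cleared."""
    copy = Chem.Mol(mol)
    for atom in copy.GetAtoms():
        atom.SetAtomMapNum(0)
    return Chem.MolToSmiles(copy)


def map_numbers_between(mol_a, mol_b):
    """Return {map number in mol_a: map number in mol_b}, or None if no match.

    Hypothetical helper: structures are compared by canonical SMILES, then
    a substructure match supplies the atom-to-atom correspondence.
    """
    if _unmapped_smiles(mol_a) != _unmapped_smiles(mol_b):
        return None
    # match[i] is the index in mol_b of the atom matched to atom i of mol_a
    match = mol_b.GetSubstructMatch(mol_a)
    if not match:
        return None
    return {
        mol_a.GetAtomWithIdx(i).GetAtomMapNum():
            mol_b.GetAtomWithIdx(j).GetAtomMapNum()
        for i, j in enumerate(match)
    }


def find_match(mol_a, candidates):
    """Return (first matching candidate, mapping), or (None, None)."""
    for candidate in candidates:
        mapping = map_numbers_between(mol_a, candidate)
        if mapping is not None:
            return candidate, mapping
    return None, None
```

For symmetric molecules more than one correspondence is valid, and this sketch simply returns the first one the substructure search finds.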

My current solution uses a slightly modified version of the method described here, and it handles most cases well. However, a recurring case that this method cannot solve on its own is when A and A' differ only by charge/protonation state. For that case, I have tried neutralizing both A and A' using this method (modified to work on RDKit mols directly) before comparing them. This solves part 1 of the problem, but it wipes the molAtomMapNumber property from the atoms that the neutralization function touches, which makes it impossible to complete the mapping in part 2.

I haven't yet been able to figure out how to preserve atom map numbers through the neutralization function I use, so any ideas on how to do this would be greatly appreciated. Additionally, if anyone has ideas for a better way to solve this problem, I would love to hear them!

greglandrum added the question label Jan 3, 2017

Am I correct in assuming that the atom mapping information you want to keep is located in the input molecules?

The atom map information is stored as atom properties. These are not carried over by the neutralization strategy from the RDKit Cookbook post, so to make this work you would need an approach that preserves that information. I'm afraid that I can't come up with a quick answer to this one.

An easier solution may be possible if you can provide a bit more information about what your input looks like and what you would like to have for output.
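One approach that might preserve the properties is to edit the charged atoms in place rather than replacing substructures, since in-place edits leave atom properties untouched. The following is a minimal sketch under that assumption. The function name `neutralize_keeping_maps` is hypothetical, and the SMARTS pattern is one commonly used for simple protonation/deprotonation cases. It will not cover every charged group (for example, quaternary nitrogens are left alone).

```python
from rdkit import Chem

# Assumed pattern: +1 atoms that carry at least one H, or -1 atoms, skipping
# atoms adjacent to an opposite charge (e.g. nitro groups, N-oxides).
_CHARGED = Chem.MolFromSmarts(
    "[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]"
)


def neutralize_keeping_maps(mol):
    """Return a neutralized copy of mol; atom properties are not modified."""
    out = Chem.Mol(mol)  # copying a Mol also copies its atom properties
    for (idx,) in out.GetSubstructMatches(_CHARGED):
        atom = out.GetAtomWithIdx(idx)
        charge = atom.GetFormalCharge()
        h_count = atom.GetTotalNumHs()
        atom.SetFormalCharge(0)
        # Remove one H from a cation, add one H to an anion
        atom.SetNumExplicitHs(h_count - charge)
        atom.UpdatePropertyCache()
    return out
```

Because atoms are never deleted or re-created, the molAtomMapNumber property on each atom should survive, and the neutralized copies can then be compared and mapped as before.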

kovasap commented Jan 3, 2017

Yes, that is correct: the input molecule A carries the mapping information in its atoms' molAtomMapNumber properties.

One thing that would help is if the mapping information could somehow be carried through the substructure replacement done by the neutralization function. For instance, is there a way to specify, in the SMARTS/SMILES pairs used to neutralize charged groups, which atoms map to which? An ideal solution would look like this:

[C:1][O-:2] --> replace_substruct [O-:J] with [O:J] --> [C:1][O:2]

where J is an atom number that can be taken from the substruct in the original molecule to be replaced. In this case, the replace substruct process may look like this:

  1. find a charged substructure in the input molecule that matches a charged substructure in the list currently used by the neutralization function
  2. map atoms in the matching substructure in the molecule to the general atom numbers (or maybe some arbitrary numbers like 1, 2, etc.) in the neutralization function structure list
  3. use the mapping to change the numbers in the neutral substructure that will replace the charged one to match the original molecule's numbers
  4. do the replacement of substructures in the original molecule
  5. get a final product without lost mapping

I think my issue is in step 2. I can define atom numbers in the SMARTS/SMILES pairs used for neutralization, but completing the mapping while doing the replace_substruct call still seems tricky to me.
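For step 2 specifically, one possible sketch is shown below. It relies on the fact that a substructure match returns molecule atom indices in the order of the pattern's atoms, so pattern labels and molecule map numbers can be paired up directly. The function name `pattern_labels_to_map_numbers` and the example pattern are hypothetical, chosen only for illustration.

```python
from rdkit import Chem


def pattern_labels_to_map_numbers(mol, pattern):
    """Return {label in pattern: map number in mol} for the first match.

    Hypothetical helper: match[q] is the index in mol of the atom matched
    to pattern atom q, so the two sets of numbers can be zipped together.
    Returns an empty dict when the pattern does not match.
    """
    match = mol.GetSubstructMatch(pattern)
    return {
        pattern.GetAtomWithIdx(q).GetAtomMapNum():
            mol.GetAtomWithIdx(t).GetAtomMapNum()
        for q, t in enumerate(match)
    }


# Usage: a carboxylate pattern labeled 1-3, matched against a molecule
# whose atoms carry their own map numbers 10-13.
pattern = Chem.MolFromSmarts("[C:1](=[O:2])[O-:3]")
mol = Chem.MolFromSmiles("[CH3:10][C:11](=[O:12])[O-:13]")
labels = pattern_labels_to_map_numbers(mol, pattern)
```

With such a dictionary in hand, the numbers in the neutral replacement fragment could be rewritten before the substitution (step 3), although whether they survive the replacement call itself is not something this sketch verifies.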
