I am currently trying to find a way to solve the following two-part problem:
My current solution uses a slightly modified version of the method described here, and it does a good job of solving my problem. However, a recurring case that this method alone cannot solve is the one where A and A' differ by charge/protonation state. For that case, I have tried neutralizing both A and A' using this method (modified to deal with RDKit mols only) before comparing them. This solves part 1 of the problem, but it wipes the molAtomMapNumber property from every atom that the neutralization function touches, which makes completing the mapping impossible.
I haven't yet been able to figure out how to preserve atom map numbers through the neutralization function I use; any ideas about how to do this would be greatly appreciated. Additionally, if anyone has ideas for how to better solve this problem, I would love to hear them!
Am I correct in assuming that the atom mapping information you want to keep is located in the input molecules?
The atom map information is stored as atom properties. These are not copied along by the neutralization strategy from the RDKit Cookbook post. In order to make this work you would need to use an approach that preserves that information. I'm afraid that I can't come up with a quick answer to this one.
An easier solution may be possible if you can provide a bit more information about what your input looks like and what you would like to have for output.
Yes, that is correct: the input molecule A contains the mapping information in its atoms' molAtomMapNumber property.
It would help if the mapping information could somehow be carried through the substructure replacement done by the neutralization function. For instance, is there a way to specify, in the SMARTS/SMILES pairs used to neutralize charged groups, which atoms map to which? An ideal solution would look like this:
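For concreteness, here is a minimal sketch of how those map numbers can be read back from a molecule. The helper name `atom_map_by_index` and the example SMILES are just for illustration (they are not part of RDKit or my real data); `GetAtomMapNum()` is the accessor for the molAtomMapNumber property and returns 0 for unmapped atoms.

```python
from rdkit import Chem


def atom_map_by_index(mol):
    """Return {atom index: atom map number} for every mapped atom.

    Hypothetical helper for illustration only.
    """
    return {
        atom.GetIdx(): atom.GetAtomMapNum()
        for atom in mol.GetAtoms()
        if atom.GetAtomMapNum() != 0  # 0 means "no map number set"
    }


# Illustrative input: a mapped methoxide.
mol_a = Chem.MolFromSmiles("[CH3:1][O-:2]")
mapping = atom_map_by_index(mol_a)
```

Comparing this dictionary before and after neutralization is a quick way to see which atoms lost their map numbers.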
[C:1][O-:2] --> replace_substruct [O-:J] with [O:J] --> [C:1][O:2]
where J is an atom number that can be taken from the substruct in the original molecule to be replaced. In this case, the replace substruct process may look like this:
I think my issue is in step two. I can define atom map numbers in the SMARTS/SMILES pairs for neutralization, but completing the mapping during the replace_substruct call still seems tricky to me.
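One possible way around this, which avoids substructure replacement entirely, is to edit the charged atoms in place so that they keep all of their properties. The following is only a sketch under some assumptions: it assumes the charged groups can be neutralized by adjusting formal charge and hydrogen count alone, the SMARTS pattern is adapted from the in-place neutralization recipe in the RDKit Cookbook as I remember it, and the function name `neutralize_keep_maps` is made up for this example. It should be checked against the charged groups that actually occur in your data.

```python
from rdkit import Chem

# Charged atoms that can be neutralized by adding or removing a hydrogen:
# +1 atoms that carry at least one H and are not next to an anion, and
# -1 atoms that are not next to a cation (so zwitterion-like pairs are left alone).
_CHARGED = Chem.MolFromSmarts(
    "[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]"
)


def neutralize_keep_maps(mol):
    """Return a neutralized copy of mol; atoms keep their properties.

    Hypothetical helper: atoms are modified, never replaced, so
    molAtomMapNumber survives.
    """
    mol = Chem.Mol(mol)  # work on a copy so the input is left untouched
    for (idx,) in mol.GetSubstructMatches(_CHARGED):
        atom = mol.GetAtomWithIdx(idx)
        charge = atom.GetFormalCharge()
        n_hs = atom.GetTotalNumHs()
        atom.SetFormalCharge(0)
        # Removing a +1 charge drops one H; removing a -1 charge adds one H.
        atom.SetNumExplicitHs(n_hs - charge)
        atom.UpdatePropertyCache()
    return mol


charged = Chem.MolFromSmiles("[CH3:1][O-:2]")
neutral = neutralize_keep_maps(charged)
```

Because no atoms are added or removed, atom indices and map numbers in `neutral` line up one-to-one with those in `charged`, so the mapping step can proceed as it would for uncharged input.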