You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I propose that the rdkit/Contrib/mmpa/indexing.py:get_symmetry_class() change from using Chem.AssignStereochemistry() to use the new CanonicalRankAtoms().
I was just wondering if it there is a function to get the "symmetry class" of an atom in a molecule using rdkit ? Just to clarify, what I mean by "symmetry class" is that it is an arbitrary label for atoms that are symmetrically equivalent in a molecule - so to identify atoms that are symmetrically equivalent you just look for atoms that have the same symmetry class.
On Aug 5, 2011, at 6:43 AM, Greg Landrum wrote:
I did a bit more investigation this morning, and it turns out that it is possible to get the required values already. Here's an example:
>>> m = Chem.MolFromSmiles('c1ncccc1c1cccnc1')
>>> Chem.AssignStereochemistry(m,cleanIt=True,force=True,flagPossibleStereoCenters=True)
>>> [atom.GetProp('_CIPRank') for atom in m.GetAtoms()]
['4', '5', '3', '1', '0', '2', '2', '0', '1', '3', '5', '4']
The key pieces is the "flagPossibleStereoCenters" argument, which tells the stereochemistry assignment code to determine the CIP ranks of atoms even if it doesn't find a potential stereocenter in the molecule.
This insight is used in the MMPA code in rdkit/Contrib/mmpa/indexing.py .
Earlier this year, as a result of github issue #414 at #421 (titled "expose MolOps::rankAtoms() and MolOps::rankAtomsInFragment() to python"), there is a new function named CanonicalRankAtoms(). I decided to investigate if it was more appropriate for the MMPA code, and concluded that it should be used instead of the older CIP code.
I wrote a short program which loops through a set of structures to compute the two assignment lists:
Chem.AssignStereochemistry(mol, flagPossibleStereoCenters=True)
cip = [int(a.GetProp('_CIPRank')) for a in mol.GetAtoms()]
rank = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
This creates values like:
[1, 3, 2, 6, 0, 0, 2, 6, 3, 1, 5, 4, 4, 5]
I also wrote a helper function to convert those lists into a set of symmetry class assignments:
from collections import defaultdict
def get_groups(values):
value_to_indices = defaultdict(list)
for i, value in enumerate(values):
value_to_indices[value].append(i)
groups = set(tuple(indices) for indices in value_to_indices.values())
return groups
>>> get_groups([1, 3, 2, 6, 0, 0, 2, 6, 3, 1, 5, 4, 4, 5])
set([(2, 6), (4, 5), (10, 13), (1, 8), (0, 9), (3, 7), (11, 12)])
>>> sorted(get_groups([1, 3, 2, 6, 0, 0, 2, 6, 3, 1, 5, 4, 4, 5]))
[(0, 9), (1, 8), (2, 6), (3, 7), (4, 5), (10, 13), (11, 12)]
In this case, the atom with index 10 is in the same group as the atom with index 13. These sets should be identical if and only if the AssignStereochemistry() and CanonicalRankAtoms() assign atoms to the same symmetry groups.
I processed 86781 isomeric SMILES from PubChem, selected with the following:
As charge is important for MMPA, I believe this example also shows that CanonicalRankAtom() is the better ranking method than AssignStereochemistry(), for what MMPA needs.
I propose that the rdkit/Contrib/mmpa/indexing.py:get_symmetry_class() change from using Chem.AssignStereochemistry() to use the new CanonicalRankAtoms().
I'll start by dredging up an old conversation, which you can see at http://sourceforge.net/p/rdkit/mailman/message/27902778/
On Aug 3, 2011, at 8:58 PM, Alan Smith wrote:
On Aug 5, 2011, at 6:43 AM, Greg Landrum wrote:
The key pieces is the "flagPossibleStereoCenters" argument, which tells the stereochemistry assignment code to determine the CIP ranks of atoms even if it doesn't find a potential stereocenter in the molecule.
This insight is used in the MMPA code in rdkit/Contrib/mmpa/indexing.py .
Earlier this year, as a result of github issue #414 at #421 (titled "expose MolOps::rankAtoms() and MolOps::rankAtomsInFragment() to python"), there is a new function named CanonicalRankAtoms(). I decided to investigate if it was more appropriate for the MMPA code, and concluded that it should be used instead of the older CIP code.
I wrote a short program which loops through a set of structures to compute the two assignment lists:
This creates values like:
I also wrote a helper function to convert those lists into a set of symmetry class assignments:
In this case, the atom with index 10 is in the same group as the atom with index 13. These sets should be identical if and only if the AssignStereochemistry() and CanonicalRankAtoms() assign atoms to the same symmetry groups.
I processed 86781 isomeric SMILES from PubChem, selected with the following:
and found 25 differences. The full script is attached.
Here's one example:
In that one, the topologies are the same but there isn't perfect symmetry. To get perfect symmetry, the current SMILES:
should be
I think the CanonicalRankAtom is correct (in the context of the needs for the MMPA code) in saying these are different.
Here's another one:
with the simpler test case of:
The difference seems to be the [N-] as changing the charge does not affect the CIP rank:
while it does get CanonicalRankAtoms() to give a symmetrical assignment:
As charge is important for MMPA, I believe this example also shows that CanonicalRankAtom() is the better ranking method than AssignStereochemistry(), for what MMPA needs.
See https://gist.github.com/adalke/fe39441b22251c0c1a5a for a patch and for the comparison code.
The text was updated successfully, but these errors were encountered: