Multiple absolute stereo groups shouldn't be allowed on a single mol #8873

@rachelnwalker

Describe the bug
Currently, RDKit allows multiple absolute (ABS) stereo groups on a single mol. Strictly speaking, this goes against the BIOVIA standard, which specifies that a mol should have only one ABS stereo group. Allowing several also doesn't really make sense: separate stereo groups are meant to represent relative configurations, which is meaningless once the absolute configuration is known.

To Reproduce
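The main way this manifests as a bug is in the registration hash (molhash): the ABS groups are written separately in the CXSMILES, so two otherwise identical structures end up with different hashes.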

from rdkit import Chem
from rdkit.Chem import RegistrationHash

def print_hash(mol):
    layers = RegistrationHash.GetMolLayers(mol)
    deduplication_hash = RegistrationHash.GetMolHash(layers)
    print("SMILES:", layers[RegistrationHash.HashLayer.CANONICAL_SMILES])
    print("Deduplication hash:", deduplication_hash)

one_abs = next(Chem.SDMolSupplier("one_abs_group.sdf"))
two_abs = next(Chem.SDMolSupplier("two_abs_groups.sdf"))

print_hash(one_abs)
print()
print_hash(two_abs)

SMILES: CCC@@HC@@HCC |a:2,4|
Deduplication hash: c348d1abb8b34acc193edac5d7077c7a968d73cd

SMILES: CCC@@HC@@HCC |a:2,a:4|
Deduplication hash: f5d04a0faf2b45866c6a50a6949b3c17f406127a
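
To see how the ABS groups are actually partitioned on a mol, independent of the hash, the stereo groups can be listed directly. This is only a small sketch: `abs_group_atoms` is a hypothetical helper name, and the molecule is an illustrative stand-in, not the contents of the SDF files above.

```python
from rdkit import Chem


def abs_group_atoms(mol):
    """Return one sorted list of atom indices per ABS stereo group on `mol`."""
    return [
        sorted(atom.GetIdx() for atom in group.GetAtoms())
        for group in mol.GetStereoGroups()
        if group.GetGroupType() == Chem.StereoGroupType.STEREO_ABSOLUTE
    ]


# Illustrative molecule (an assumption, not taken from the attached files):
# both stereocenters sit in a single ABS group.
mol = Chem.MolFromSmiles("CC[C@@H](C)[C@@H](C)CC |a:2,4|")
print(abs_group_atoms(mol))
```

A mol that triggers the problem described here would return more than one inner list.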

Expected behavior
For the bug described above, I think the same CXSMILES and hash should be written for both structures. For a more general solution, there are a few options:

  1. Update the SDF and CXSMILES writers to combine ABS groups before writing to string
  2. Update the SDF reader to combine ABS groups on read (something similar was done for CXSMILES in #6050, "stereogroups not combined when parsing CXSMILES", except that that change combines all stereo groups with the same ID, not just ABS)
  3. Update the enhanced stereo code to enforce the presence of only a single ABS stereo group, probably in SetStereoGroups
  4. Require any functionality that combines molecules that may have their own stereo groups to manually handle the merging of ABS groups
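
As a rough illustration of what the merging in options 1, 2 and 4 could look like from Python, here is a minimal sketch. `merge_abs_groups` is a hypothetical helper, not an existing RDKit function. It rebuilds every group by atom index on a copy of the mol, so group read/write IDs and any bonds held by a stereo group are not carried over; it is not a proposal for the actual fix.

```python
from rdkit import Chem

ABS = Chem.StereoGroupType.STEREO_ABSOLUTE


def merge_abs_groups(mol):
    """Return a copy of `mol` with all ABS stereo groups merged into one.

    Sketch only: groups are recreated from atom indices, so group IDs and
    bond members (if any) are dropped.
    """
    abs_atoms = set()
    others = []  # (group type, atom indices) for the AND/OR groups
    for group in mol.GetStereoGroups():
        idxs = [atom.GetIdx() for atom in group.GetAtoms()]
        if group.GetGroupType() == ABS:
            abs_atoms.update(idxs)
        else:
            others.append((group.GetGroupType(), idxs))

    merged = Chem.RWMol(mol)
    new_groups = []
    if abs_atoms:
        # A single ABS group holding every atom that was in any ABS group
        new_groups.append(Chem.CreateStereoGroup(ABS, merged, sorted(abs_atoms)))
    for group_type, idxs in others:
        new_groups.append(Chem.CreateStereoGroup(group_type, merged, idxs))
    merged.SetStereoGroups(new_groups)
    return merged.GetMol()


# Build a mol carrying two separate ABS groups (illustrative molecule).
two_abs = Chem.RWMol(Chem.MolFromSmiles("CC[C@@H](C)[C@@H](C)CC"))
two_abs.SetStereoGroups([
    Chem.CreateStereoGroup(ABS, two_abs, [2]),
    Chem.CreateStereoGroup(ABS, two_abs, [4]),
])
fixed = merge_abs_groups(two_abs)
print(Chem.MolToCXSmiles(fixed))
```

Running such a helper before writing or hashing would be one way to apply option 4 in enumeration code until one of the other options is in place.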

Additional context

We ran into this problem with structures generated by an rgroup enumeration implementation that simply copies all stereo groups from the scaffold and rgroups onto the product. However, this also seems to be an issue in reaction enumeration. Here is a quick example:

from rdkit import Chem
from rdkit.Chem import AllChem


rxn = AllChem.ReactionFromSmarts('[C,c,n,N,o,O:1][#0].[#0][*:2]>>[C,c,n,N,o,O:1]-[*:2]')

scaff = Chem.MolFromMolBlock("""
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 10 10 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0.008608 -4.200000 0.000000 0
M  V30 2 C 1.245787 -3.485714 0.000000 0
M  V30 3 C 1.245787 -2.057143 0.000000 0
M  V30 4 C 0.008608 -1.342857 0.000000 0
M  V30 5 C -1.228571 -2.057143 0.000000 0
M  V30 6 C -1.228571 -3.485714 0.000000 0
M  V30 7 C 2.482966 -1.342857 0.000000 0
M  V30 8 C 3.720145 -2.057143 0.000000 0
M  V30 9 C 2.482966 0.085714 0.000000 0
M  V30 10 R# 3.720145 0.800000 0.000000 101 RGROUPS=(1 1)
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 4 5
M  V30 5 1 5 6
M  V30 6 1 6 1
M  V30 7 1 3 7
M  V30 8 1 7 8 CFG=1
M  V30 9 1 7 9
M  V30 10 1 9 10
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(1 7)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
$$$$

""")
rg = Chem.MolFromSmiles("*CC[C@@H](N)C1CCCCC1 |$_AP1;;;;;;;;;;$,a:3|")
product = rxn.RunReactants((scaff, rg))[0][0]
product.RemoveAllConformers()
for at in product.GetAtoms():
    for prop in at.GetPropNames():
        at.ClearProp(prop)
print(Chem.MolToCXSmiles(product))

CC@HC1CCCCC1 |a:1,a:5|
