Multiple absolute stereo groups shouldn't be allowed on a single mol #8873
Description
Describe the bug
Currently, RDKit allows a molecule to carry multiple absolute (ABS) stereo groups. This technically goes against the BIOVIA standard, which specifies that there should be only one ABS stereo group per molecule. Allowing multiple ABS groups also doesn't really make sense: separate stereo groups exist to represent relative configurations, and a relative configuration is meaningless once the absolute configuration is known.
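To make the rule concrete, here is a minimal sketch of a check that counts the ABS groups on a molecule using RDKit's Python stereo group API. The helper names `count_abs_groups` and `has_single_abs_group` are hypothetical (they are not RDKit functions), and the test molecule is an arbitrary example, not one from this report:

```python
from rdkit import Chem

ABS = Chem.StereoGroupType.STEREO_ABSOLUTE


def count_abs_groups(mol):
    """Return how many STEREO_ABSOLUTE groups are attached to mol."""
    return sum(1 for group in mol.GetStereoGroups() if group.GetGroupType() == ABS)


def has_single_abs_group(mol):
    """True if mol has at most one ABS group (a mol with none also passes)."""
    return count_abs_groups(mol) <= 1


# Hypothetical example: stereocenters at atoms 1 and 3, each in its own ABS group.
mol = Chem.RWMol(Chem.MolFromSmiles("C[C@H](O)[C@H](N)C"))
mol.SetStereoGroups([
    Chem.CreateStereoGroup(ABS, mol, [1]),
    Chem.CreateStereoGroup(ABS, mol, [3]),
])
print(count_abs_groups(mol), has_single_abs_group(mol))  # 2 False
```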
To Reproduce
The main way this manifests as a bug is in molhash, since the ABS groups are written separately in CXSMILES:
```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash


def print_hash(mol):
    # Compute the registration-hash layers and the combined deduplication hash.
    layers = RegistrationHash.GetMolLayers(mol)
    deduplication_hash = RegistrationHash.GetMolHash(layers)
    print("SMILES:", layers[RegistrationHash.HashLayer.CANONICAL_SMILES])
    print("Deduplication hash:", deduplication_hash)


one_abs = next(Chem.SDMolSupplier("one_abs_group.sdf"))
two_abs = next(Chem.SDMolSupplier("two_abs_groups.sdf"))
print_hash(one_abs)
print()
print_hash(two_abs)
```
SMILES: CCC@@HC@@HCC |a:2,4|
Deduplication hash: c348d1abb8b34acc193edac5d7077c7a968d73cd
SMILES: CCC@@HC@@HCC |a:2,a:4|
Deduplication hash: f5d04a0faf2b45866c6a50a6949b3c17f406127a
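The two SDF files are not attached here. As a hedged, self-contained stand-in, the same comparison can be set up in memory with a hypothetical molecule (not the one from the report): one copy gets a single ABS group covering both stereocenters, the other gets one ABS group per stereocenter. The helper `with_abs_groups` is illustrative only, and no particular hash output is claimed:

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash

ABS = Chem.StereoGroupType.STEREO_ABSOLUTE


def with_abs_groups(smiles, groups):
    """Hypothetical helper: build mol from SMILES and attach one ABS group per atom-index list."""
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    mol.SetStereoGroups([Chem.CreateStereoGroup(ABS, mol, atoms) for atoms in groups])
    return mol.GetMol()


def print_hash(mol):
    layers = RegistrationHash.GetMolLayers(mol)
    print("SMILES:", layers[RegistrationHash.HashLayer.CANONICAL_SMILES])
    print("Deduplication hash:", RegistrationHash.GetMolHash(layers))


# Stand-in molecule with stereocenters at atoms 1 and 3.
smiles = "C[C@H](O)[C@H](N)C"
one_abs = with_abs_groups(smiles, [[1, 3]])    # one ABS group: {1, 3}
two_abs = with_abs_groups(smiles, [[1], [3]])  # two ABS groups: {1} and {3}
print_hash(one_abs)
print()
print_hash(two_abs)
```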
Expected behavior
For the bug described above, I think the same CXSMILES and hash should be written for both structures. For a more general solution, there are a few options:
- Update the SDF and CXSMILES writers to combine ABS groups before writing to string
- Update the SDF reader to combine ABS groups on read (something similar is done for CXSMILES in #6050, "stereogroups not combined when parsing CXSMILES", except that change combines all stereo groups with the same ID, not just ABS groups)
- Update the enhanced stereo code to enforce the presence of only a single ABS stereo group, probably in SetStereoGroups
- Require any functionality that combines molecules which may carry their own stereo groups to handle the merging of ABS groups manually
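The "combine ABS groups" options can be sketched on the Python side as a post-processing step. The following is a minimal sketch, assuming RDKit's stereo group API: `merge_abs_groups` is a hypothetical helper (not part of RDKit) that collects the atoms of every ABS group into one new ABS group and passes AND/OR groups through unchanged. It assumes the ABS groups reference atoms only (bond stereo groups are not handled):

```python
from rdkit import Chem

ABS = Chem.StereoGroupType.STEREO_ABSOLUTE


def merge_abs_groups(mol):
    """Return a copy of mol whose ABS stereo groups are merged into a single group."""
    rw = Chem.RWMol(mol)
    groups = list(rw.GetStereoGroups())
    # Union of all atoms that appear in any ABS group.
    abs_atoms = sorted({
        atom.GetIdx()
        for group in groups
        if group.GetGroupType() == ABS
        for atom in group.GetAtoms()
    })
    # Keep AND/OR groups as they are; replace all ABS groups with one.
    new_groups = [g for g in groups if g.GetGroupType() != ABS]
    if abs_atoms:
        new_groups.append(Chem.CreateStereoGroup(ABS, rw, abs_atoms))
    rw.SetStereoGroups(new_groups)
    return rw.GetMol()


# Usage on a hypothetical molecule with two separate ABS groups.
mol = Chem.RWMol(Chem.MolFromSmiles("C[C@H](O)[C@H](N)C"))
mol.SetStereoGroups([Chem.CreateStereoGroup(ABS, mol, [1]),
                     Chem.CreateStereoGroup(ABS, mol, [3])])
merged = merge_abs_groups(mol)
print(len(merged.GetStereoGroups()))  # 1
```

Merging rather than rejecting keeps callers that already produce several ABS groups working; the SetStereoGroups option would instead make such input an error.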
Additional context
We ran into this problem with structures generated by an R-group enumeration implementation that simply copies all stereo groups from the scaffold and the R-groups onto the product. However, this also seems to be an issue in reaction enumeration. Here is a quick example:
```python
from rdkit import Chem
from rdkit.Chem import AllChem

rxn = AllChem.ReactionFromSmarts('[C,c,n,N,o,O:1][#0].[#0][*:2]>>[C,c,n,N,o,O:1]-[*:2]')
# Scaffold: one ABS stereocenter (V3000 atom 7) and an R-group attachment point (atom 10).
scaff = Chem.MolFromMolBlock("""
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 10 10 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0.008608 -4.200000 0.000000 0
M  V30 2 C 1.245787 -3.485714 0.000000 0
M  V30 3 C 1.245787 -2.057143 0.000000 0
M  V30 4 C 0.008608 -1.342857 0.000000 0
M  V30 5 C -1.228571 -2.057143 0.000000 0
M  V30 6 C -1.228571 -3.485714 0.000000 0
M  V30 7 C 2.482966 -1.342857 0.000000 0
M  V30 8 C 3.720145 -2.057143 0.000000 0
M  V30 9 C 2.482966 0.085714 0.000000 0
M  V30 10 R# 3.720145 0.800000 0.000000 101 RGROUPS=(1 1)
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 4 5
M  V30 5 1 5 6
M  V30 6 1 6 1
M  V30 7 1 3 7
M  V30 8 1 7 8 CFG=1
M  V30 9 1 7 9
M  V30 10 1 9 10
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(1 7)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
$$$$
""")
# R-group with its own ABS stereocenter (atom 3).
rg = Chem.MolFromSmiles("*CC[C@@H](N)C1CCCCC1 |$_AP1;;;;;;;;;;$,a:3|")
product = rxn.RunReactants((scaff, rg))[0][0]
# Drop conformers and reaction bookkeeping properties so the CXSMILES stays short.
product.RemoveAllConformers()
for at in product.GetAtoms():
    for prop in at.GetPropNames():
        at.ClearProp(prop)
print(Chem.MolToCXSmiles(product))
```
CC@HC1CCCCC1 |a:1,a:5|
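Under the last option, the enumeration caller would merge the ABS groups itself. A minimal sketch, assuming RDKit's stereo group API: `merge_abs_groups` is a hypothetical helper (not part of RDKit), and the scaffold is a CXSMILES stand-in for the V3000 block above whose configuration at atom 1 is arbitrary. The before/after counts are printed rather than claimed, since they depend on how the RDKit version in use carries stereo groups through reactions:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

ABS = Chem.StereoGroupType.STEREO_ABSOLUTE


def count_abs(mol):
    return sum(1 for g in mol.GetStereoGroups() if g.GetGroupType() == ABS)


def merge_abs_groups(mol):
    """Hypothetical helper: merge all ABS groups on mol into one, keeping AND/OR groups."""
    rw = Chem.RWMol(mol)
    groups = list(rw.GetStereoGroups())
    abs_atoms = sorted({a.GetIdx() for g in groups if g.GetGroupType() == ABS
                        for a in g.GetAtoms()})
    kept = [g for g in groups if g.GetGroupType() != ABS]
    if abs_atoms:
        kept.append(Chem.CreateStereoGroup(ABS, rw, abs_atoms))
    rw.SetStereoGroups(kept)
    return rw.GetMol()


rxn = AllChem.ReactionFromSmarts(
    "[C,c,n,N,o,O:1][#0].[#0][*:2]>>[C,c,n,N,o,O:1]-[*:2]")
# Stand-in scaffold: ABS stereocenter at atom 1, dummy attachment point at atom 3.
scaff = Chem.MolFromSmiles("C[C@H](C*)C1CCCCC1 |a:1|")
rg = Chem.MolFromSmiles("*CC[C@@H](N)C1CCCCC1 |$_AP1;;;;;;;;;;$,a:3|")
product = rxn.RunReactants((scaff, rg))[0][0]
merged = merge_abs_groups(product)
print(count_abs(product), "->", count_abs(merged))
```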