Describe the bug
Chem.SDMolSupplier raises warnings when reading a V3000 SD file made by Dassault/Biovia/Scitegic software, and ignores the enhanced stereochemistry information, whereas the same molecule written out to a V3000 SDF from within rdkit can be read back correctly.
To Reproduce
Run:
from rdkit import Chem

with Chem.SDMolSupplier('mol_with_enhanced_stereo_2_And_groups.sdf') as SDF:
    ms2 = [m for m in SDF if m is not None]
Expected behavior
Both files should result in a V3000 MolBlock like the following:

     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 22 24 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O 7.414605 -6.052405 0.000000 0
M  V30 2 C 6.201079 -6.934083 0.000000 0
M  V30 3 N 4.830761 -6.323978 0.000000 0
M  V30 4 C 4.673969 -4.832195 0.000000 0
M  V30 5 C 3.303650 -4.222090 0.000000 0
M  V30 6 C 2.004612 -4.972090 0.000000 0
M  V30 7 C 0.889895 -3.968394 0.000000 0
M  V30 8 C 1.500000 -2.598076 0.000000 0
M  V30 9 C 0.750000 -1.299038 0.000000 0
M  V30 10 C 1.500000 0.000000 0.000000 0
M  V30 11 C 0.750000 1.299038 0.000000 0
M  V30 12 C -0.750000 1.299038 0.000000 0
M  V30 13 C -1.500000 0.000000 0.000000 0
M  V30 14 C -0.750000 -1.299038 0.000000 0
M  V30 15 O 2.991783 -2.754869 0.000000 0
M  V30 16 N 6.357872 -8.425866 0.000000 0
M  V30 17 C 7.728190 -9.035971 0.000000 0
M  V30 18 C 9.027228 -8.285971 0.000000 0
M  V30 19 O 10.141946 -9.289667 0.000000 0
M  V30 20 C 9.531841 -10.659985 0.000000 0
M  V30 21 C 8.040058 -10.503192 0.000000 0
M  V30 22 O 7.036362 -11.617910 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 1 5 6
M  V30 6 1 6 7
M  V30 7 1 8 7 CFG=3
M  V30 8 1 8 9
M  V30 9 2 9 10
M  V30 10 1 10 11
M  V30 11 2 11 12
M  V30 12 1 12 13
M  V30 13 2 13 14
M  V30 14 1 8 15
M  V30 15 1 2 16
M  V30 16 1 17 16 CFG=3
M  V30 17 1 17 18
M  V30 18 1 18 19
M  V30 19 1 19 20
M  V30 20 1 20 21
M  V30 21 1 21 22 CFG=3
M  V30 22 1 15 5
M  V30 23 1 21 17
M  V30 24 1 14 9
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
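In the expected MolBlock, the enhanced stereochemistry is carried only by the COLLECTION block: each `MDLV30/STERACn` entry defines AND group n, and the first number inside `ATOMS=(...)` is the atom count, so `ATOMS=(2 5 8)` means atoms 5 and 8. As an illustration, here is a hypothetical stdlib-only reader for such entries (`parse_stereo_collection` is not an RDKit function, and it ignores V3000 continuation lines ending in `-`). It splits on whitespace rather than anchoring a pattern to the end of the line, so trailing spaces do not affect it.

```python
import re
from typing import Dict, List, Tuple

# Entry names from the V3000 format: STEABS, STERELn (OR groups), STERACn (AND groups).
_GROUP_RE = re.compile(r"MDLV30/(STEABS|STEREL|STERAC)(\d*)$")
_ATOMS_RE = re.compile(r"ATOMS=\((.*)\)$")


def parse_stereo_collection(lines: List[str]) -> Dict[Tuple[str, int], List[int]]:
    """Hypothetical helper (not RDKit): map (kind, index) -> 1-based atom indices.

    Continuation lines (ending in "-") are not handled in this sketch.
    """
    groups: Dict[Tuple[str, int], List[int]] = {}
    for line in lines:
        fields = line.split()  # str.split() drops leading/trailing whitespace
        if fields[:2] == ["M", "V30"]:
            fields = fields[2:]
        if not fields:
            continue
        head = _GROUP_RE.match(fields[0])
        if head is None:
            continue  # BEGIN/END COLLECTION or an unrelated collection type
        kind, index = head.group(1), int(head.group(2) or 0)
        # Rejoin the remaining tokens so "ATOMS=(2 5 8)" is matched as one unit.
        atoms_field = _ATOMS_RE.match(" ".join(fields[1:]))
        if atoms_field is None:
            raise ValueError(f"malformed ATOMS list: {line!r}")
        numbers = [int(tok) for tok in atoms_field.group(1).split()]
        # The first number is the atom count; it must match the number of indices.
        if not numbers or numbers[0] != len(numbers) - 1:
            raise ValueError(f"inconsistent ATOMS count: {line!r}")
        groups[(kind, index)] = numbers[1:]
    return groups


collection = [
    "M  V30 BEGIN COLLECTION",
    "M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)",
    "M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)   ",  # trailing spaces are harmless here
    "M  V30 END COLLECTION",
]
print(parse_stereo_collection(collection))
# {('STERAC', 1): [5, 8], ('STERAC', 2): [17, 21]}
```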
Instead, the first file (the one causing warnings) results in the following V3000 MolBlock, in which the COLLECTION block holding the two STERAC groups is missing:
2 And groups, from CXSMILES
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 22 24 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O 7.414600 -6.052410 0.000000 0
M  V30 2 C 6.201080 -6.934080 0.000000 0
M  V30 3 N 4.830760 -6.323980 0.000000 0
M  V30 4 C 4.673970 -4.832190 0.000000 0
M  V30 5 C 3.303650 -4.222090 0.000000 0
M  V30 6 C 2.004610 -4.972090 0.000000 0
M  V30 7 C 0.889900 -3.968390 0.000000 0
M  V30 8 C 1.500000 -2.598080 0.000000 0
M  V30 9 C 0.750000 -1.299040 0.000000 0
M  V30 10 C 1.500000 0.000000 0.000000 0
M  V30 11 C 0.750000 1.299040 0.000000 0
M  V30 12 C -0.750000 1.299040 0.000000 0
M  V30 13 C -1.500000 0.000000 0.000000 0
M  V30 14 C -0.750000 -1.299040 0.000000 0
M  V30 15 O 2.991780 -2.754870 0.000000 0
M  V30 16 N 6.357870 -8.425870 0.000000 0
M  V30 17 C 7.728190 -9.035970 0.000000 0
M  V30 18 C 9.027230 -8.285970 0.000000 0
M  V30 19 O 10.141950 -9.289670 0.000000 0
M  V30 20 C 9.531840 -10.659990 0.000000 0
M  V30 21 C 8.040060 -10.503190 0.000000 0
M  V30 22 O 7.036360 -11.617910 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 1 5 6
M  V30 6 1 6 7
M  V30 7 1 8 7 CFG=3
M  V30 8 1 8 9
M  V30 9 2 9 10
M  V30 10 1 10 11
M  V30 11 2 11 12
M  V30 12 1 12 13
M  V30 13 2 13 14
M  V30 14 1 8 15
M  V30 15 1 2 16
M  V30 16 1 17 16 CFG=3
M  V30 17 1 17 18
M  V30 18 1 18 19
M  V30 19 1 19 20
M  V30 20 1 20 21
M  V30 21 1 21 22 CFG=3
M  V30 22 1 15 5
M  V30 23 1 21 17
M  V30 24 1 14 9
M  V30 END BOND
M  V30 END CTAB
M  END
Screenshots
Not applicable.
Configuration (please complete the following information):
RDKit version: 2021.09.2 build py39hccf6a74_0
OS: CentOS Linux 7
Python version (if relevant): 3.9.7
Are you using conda? yes
If you are using conda, which channel did you install the rdkit from? conda-forge
If you are not using conda: how did you install the RDKit? NA
Additional context
Not applicable.
In [1]: from rdkit import Chem
In [2]: Chem.MolFromMolFile('/Users/wandschn/Downloads/mol_with_enhanced_stereo_2_And_groups.sdf')
[16:16:42] Skipping unrecognized collection type at line 58: MDLV30/STERAC1 ATOMS=(2 5 8)
[16:16:42] Skipping unrecognized collection type at line 59: MDLV30/STERAC2 ATOMS=(2 17 21)
It appears that lines 58 & 59 have trailing spaces. This is absolutely a bug in the regex we're using to parse these lines.
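Since the trailing spaces are the trigger, one possible workaround for releases without the fix (a suggestion, not one given in this thread) is to strip trailing whitespace from the `M  V30` lines before RDKit sees them, assuming that whitespace is the only problem in the file. A minimal stdlib-only sketch; `strip_v3000_trailing_whitespace` is a hypothetical name. It also returns the 1-based numbers of the lines it changed, which can be compared with the line numbers in RDKit's warnings. The cleaned text could then be passed to `Chem.MolFromMolBlock` for a single record, or written to a new .sdf file for `Chem.SDMolSupplier`.

```python
from typing import List, Tuple


def strip_v3000_trailing_whitespace(text: str) -> Tuple[str, List[int]]:
    """Hypothetical workaround helper (not part of RDKit).

    Removes trailing spaces and tabs from lines starting with "M  V30" and
    leaves every other line untouched. Returns the cleaned text together
    with the 1-based numbers of the lines that were changed.
    """
    out: List[str] = []
    changed: List[int] = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        body = line.rstrip("\r")  # set aside the "\r" of CRLF line endings
        cr = line[len(body):]
        if body.startswith("M  V30"):
            stripped = body.rstrip(" \t")
            if stripped != body:
                changed.append(lineno)
                body = stripped
        out.append(body + cr)
    return "\n".join(out), changed


sample = (
    "M  V30 BEGIN COLLECTION\n"
    "M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)  \n"
    "M  V30 END COLLECTION\n"
)
fixed, changed = strip_v3000_trailing_whitespace(sample)
print(changed)  # [2]
```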
* Improved regex whitespace handling
Change was made in the parseEnhancedStereo function
* Files for Github #5165 test case
Both files use enhanced stereochemistry, but differ in whitespace content
* Test case for Github Issue #5165
Catches whitespace parsing error
* Improves test case check
Makes test case more specific, less prone to potential invalid access to container
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Improves test case check
Makes test case more specific, less prone to potential invalid access to container
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update test case "Github #5165"
Add 'require(mol)' to confirm valid molecule before additional testing
* Cleans up test for Issue #5165
* Cleans up test for Issue #5165
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
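The commits above describe the fix as improved whitespace handling in the regex used by `parseEnhancedStereo`, tested with two files that use enhanced stereochemistry but differ in whitespace. The actual RDKit pattern and its C++ test are not shown in this thread, so the following is a hypothetical Python illustration of the same failure mode and test idea: `STRICT` (single spaces only, anchored right after the closing parenthesis) rejects the Biovia-style line with trailing spaces, while `TOLERANT` accepts optional whitespace, so every whitespace variant parses to the same group.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns for the text after "M  V30 "; neither is RDKit's own.
# STRICT requires single spaces and the end of the line right after ")".
STRICT = re.compile(r"MDLV30/(STEABS|STEREL|STERAC)(\d*) ATOMS=\(([\d ]+)\)$")
# TOLERANT allows runs of whitespace between tokens and at the end of the line.
TOLERANT = re.compile(
    r"MDLV30/(STEABS|STEREL|STERAC)(\d*)\s+ATOMS=\(\s*([\d\s]+?)\s*\)\s*$"
)


def parse(pattern: re.Pattern, text: str) -> Optional[Tuple[str, int, List[int]]]:
    m = pattern.match(text)
    if m is None:
        return None  # analogous to "Skipping unrecognized collection type"
    numbers = [int(tok) for tok in m.group(3).split()]
    # The first number in ATOMS=(...) is the count; the rest are atom indices.
    return m.group(1), int(m.group(2) or 0), numbers[1:]


# One collection entry written with different whitespace.
VARIANTS = [
    "MDLV30/STERAC1 ATOMS=(2 5 8)",
    "MDLV30/STERAC1 ATOMS=(2 5 8)  ",    # trailing spaces, as in the Biovia file
    "MDLV30/STERAC1  ATOMS=(2 5 8)\t",   # extra inner space and a trailing tab
]

for v in VARIANTS:
    print(repr(v), parse(STRICT, v), parse(TOLERANT, v))
```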
Attached files
GitHub did not accept .sdf as a file type, so both files were uploaded with a .txt suffix; rename them back to .sdf before running the reproduction code.
mol_with_enhanced_stereo_2_And_groups.sdf.txt: the file that triggers the warnings
m_with_enh_stereo.sdf.txt: the file used for comparison, which is read without warnings