R# atom label information lost in molfile if not handled by the RGP spec #5810

Closed
eloyfelix opened this issue Nov 29, 2022 · 0 comments
@eloyfelix
Contributor

Describe the bug

Related to #5763.

We've encountered more than 800 molfiles in ChEBI containing R# atoms that are not covered by an M  RGP entry.
RDKit replaces the R# with * and, even with the mol.Debug() function, it is not possible to tell that the original molfile had the R symbol/label there.
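
To get a feel for how many input files are affected, the molblock text can be scanned before it ever reaches RDKit. The following is a minimal sketch in plain Python, not part of RDKit: the function name `find_unhandled_r_atoms` and the `EXAMPLE` molblock are made up for illustration, and it assumes a V2000 molblock with the standard fixed-column atom block (symbol in columns 32-34).

```python
from typing import List


def find_unhandled_r_atoms(molblock: str) -> List[int]:
    """Return 1-based numbers of R# atoms that no 'M  RGP' line mentions.

    Hypothetical helper for illustration; only V2000 molblocks are handled.
    """
    lines = molblock.split("\n")
    counts = lines[3]  # header is 3 lines, then the counts line
    if "V2000" not in counts:
        raise ValueError("only V2000 molblocks are handled in this sketch")
    n_atoms = int(counts[0:3])

    # Collect atoms whose symbol field (columns 32-34) is R#.
    r_atoms = set()
    for number, line in enumerate(lines[4:4 + n_atoms], start=1):
        if line[31:34].strip() == "R#":
            r_atoms.add(number)

    # Collect atom numbers listed in 'M  RGP' property lines.
    # Layout: M  RGP <n pairs> <atom> <rgroup> <atom> <rgroup> ...
    handled = set()
    for line in lines[4 + n_atoms:]:
        if line.startswith("M  RGP"):
            tokens = line.split()
            n_pairs = int(tokens[2])
            pairs = tokens[3:3 + 2 * n_pairs]
            handled.update(int(atom) for atom in pairs[0::2])

    return sorted(r_atoms - handled)


# Illustrative molblock: atom 2 is covered by M  RGP, atom 3 is not.
EXAMPLE = "\n".join([
    "",
    "  sketch",
    "",
    "  3  2  0  0  0  0            999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  RGP  1   2   1",
    "M  END",
])

print(find_unhandled_r_atoms(EXAMPLE))
```

An empty list means every R# atom is covered by an M  RGP entry; any number returned is an atom whose label would be affected by the behaviour described here.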

To Reproduce

from rdkit import Chem

molfile = """
  Marvin  08301217132D          

  6  5  0  0  0  0            999 V2000
    7.3491   -3.6869    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.6346   -3.2744    0.0000 P   0  0  0  0  0  0  0  0  0  0  0  0
    5.9201   -3.6869    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    6.6346   -2.4494    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.8533   -4.0696    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    8.0660   -3.2844    0.0000 R#   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  2  1  0  0  0  0
  2  4  2  0  0  0  0
  2  5  1  0  0  0  0
  1  6  1  0  0  0  0
M  CHG  2   3  -1   5  -1
M  END
"""

mol = Chem.MolFromMolBlock(molfile)
mol.Debug()

Output:

Atoms:
	0 8 O chg: 0  deg: 2 exp: 2 imp: 0 hyb: 4 arom?: 0 chi: 0
	1 15 P chg: 0  deg: 4 exp: 5 imp: 0 hyb: 4 arom?: 0 chi: 0
	2 8 O chg: -1  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
	3 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
	4 8 O chg: -1  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
	5 0 * chg: 0  deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
Bonds:
	0 1->0 order: 1 conj?: 0 aromatic?: 0
	1 2->1 order: 1 conj?: 0 aromatic?: 0
	2 1->3 order: 2 conj?: 0 aromatic?: 0
	3 1->4 order: 1 conj?: 0 aromatic?: 0
	4 0->5 order: 1 conj?: 0 aromatic?: 0

Getting an R# instead of a * is quite important for ChEBI, since the two symbols have different meanings in the ontology:
* = attaches to something
R = something attaches here

Expected behavior
Keep the R# label information when the molfile is parsed, even if no M  RGP entry covers the atom.
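
Until the parser keeps the label, one possible workaround is to record the original symbols in a side table keyed by atom index. This is only a sketch under stated assumptions: `placeholder_labels`, `PLACEHOLDER_SYMBOLS` and `SAMPLE` are hypothetical names, the set of symbols treated as placeholders is a guess based on the `*` and `R` cases mentioned above, and only V2000 molblocks with the standard fixed-column atom block are handled.

```python
from typing import Dict

# Assumption: the symbols this sketch treats as placeholders.
PLACEHOLDER_SYMBOLS = ("R#", "R", "*")


def placeholder_labels(molblock: str) -> Dict[int, str]:
    """Map 0-based atom index -> original symbol for placeholder atoms.

    Hypothetical helper for illustration; only V2000 molblocks are handled.
    """
    lines = molblock.split("\n")
    counts = lines[3]  # header is 3 lines, then the counts line
    if "V2000" not in counts:
        raise ValueError("only V2000 molblocks are handled in this sketch")
    n_atoms = int(counts[0:3])

    labels = {}
    for index, line in enumerate(lines[4:4 + n_atoms]):
        symbol = line[31:34].strip()  # symbol field, columns 32-34
        if symbol in PLACEHOLDER_SYMBOLS:
            labels[index] = symbol
    return labels


# Illustrative molblock with one R# atom and one * atom.
SAMPLE = "\n".join([
    "",
    "  sketch",
    "",
    "  3  2  0  0  0  0            999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
])

print(placeholder_labels(SAMPLE))
```

After parsing the same molblock with Chem.MolFromMolBlock, the recorded symbol could be attached to the matching atom with `atom.SetProp(...)` under a property name of your choosing. The indices only line up if the parser removes no atoms, so this assumes something like `removeHs=False` or a molblock without explicit hydrogens.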

Configuration:

  • RDKit version: 2022.09.2
  • Installation method (not conda): pip
@eloyfelix eloyfelix added the bug label Nov 29, 2022
greglandrum added a commit to greglandrum/rdkit that referenced this issue Nov 29, 2022
@greglandrum greglandrum added this to the 2022_09_3 milestone Nov 29, 2022
greglandrum added a commit that referenced this issue Dec 8, 2022
* Fixes #5810

* expand the test
eloyfelix added 4 commits to eloyfelix/rdkit that referenced this issue Dec 12, 2022
greglandrum pushed a commit that referenced this issue Dec 14, 2022
* fix #5810 for V2000

* fix #5810 for V2000

* fix #5810 for V2000

* fix #5810 for V2000

* fix test molfile indentation
greglandrum pushed a commit that referenced this issue Feb 23, 2023
* fix #5810 for V2000

* fix #5810 for V2000

* fix #5810 for V2000

* fix #5810 for V2000

* fix test molfile indentation