SubstanceGroup output doesn't correctly quote " symbols #3768

Closed
greglandrum opened this issue Jan 28, 2021 · 0 comments
Describe the bug
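Here's a demonstration. A DAT SubstanceGroup whose FIELDINFO and QUERYOP values are a single " character (written as """" in the input) is read correctly, but the quoting is broken when the molecule is written back out: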

In [2]: m = Chem.MolFromMolBlock('''foo
   ...:      RDKit          2D
   ...: 
   ...:   0  0  0  0  0  0  0  0  0  0999 V3000
   ...: M  V30 BEGIN CTAB
   ...: M  V30 COUNTS 6 6 1 0 0
   ...: M  V30 BEGIN ATOM
   ...: M  V30 1 C -0.866400 -0.499400 0.000000 0
   ...: M  V30 2 C -0.865600 0.500600 0.000000 0
   ...: M  V30 3 C 0.000800 1.000000 0.000000 0
   ...: M  V30 4 C 0.866400 0.499400 0.000000 0
   ...: M  V30 5 C 0.865600 -0.500600 0.000000 0
   ...: M  V30 6 C -0.000800 -1.000000 0.000000 0
   ...: M  V30 END ATOM
   ...: M  V30 BEGIN BOND
   ...: M  V30 1 2 1 2
   ...: M  V30 2 1 2 3
   ...: M  V30 3 2 3 4
   ...: M  V30 4 1 4 5
   ...: M  V30 5 2 5 6
   ...: M  V30 6 1 1 6
   ...: M  V30 END BOND
   ...: M  V30 BEGIN SGROUP
   ...: M  V30 1 DAT 0 ATOMS=(1 1) FIELDNAME=[DUP]Tempstruct FIELDINFO="""" -
   ...: M  V30 FIELDDISP="    2.1037    1.5400    DA    ALL  0       0" QUERYOP="""" -
   ...: M  V30 FIELDDATA=Foo1
   ...: M  V30 END SGROUP
   ...: M  V30 END CTAB
   ...: M  END
   ...: ''')

In [3]: print(Chem.MolToV3KMolBlock(m))
foo
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 6 6 1 0 0
M  V30 BEGIN ATOM
M  V30 1 C -0.866400 -0.499400 0.000000 0
M  V30 2 C -0.865600 0.500600 0.000000 0
M  V30 3 C 0.000800 1.000000 0.000000 0
M  V30 4 C 0.866400 0.499400 0.000000 0
M  V30 5 C 0.865600 -0.500600 0.000000 0
M  V30 6 C -0.000800 -1.000000 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 2 3 4
M  V30 4 1 4 5
M  V30 5 2 5 6
M  V30 6 1 1 6
M  V30 END BOND
M  V30 BEGIN SGROUP
M  V30 1 DAT 0 ATOMS=(1 1) FIELDNAME=[DUP]Tempstruct FIELDINFO=" FIELDDISP="   -
M  V30  2.1037    1.5400    DA    ALL  0       0" QUERYOP=" FIELDDATA="Foo1"
M  V30 END SGROUP
M  V30 END CTAB
M  END

We can see that it's not a parsing problem, since the parsed SubstanceGroup properties contain the single " character as expected:

In [4]: sgs = Chem.GetMolSubstanceGroups(m)

In [5]: sgs[0].GetPropsAsDict()
Out[5]: 
{'TYPE': 'DAT',
 'index': 1,
 'FIELDNAME': '[DUP]Tempstruct',
 'FIELDINFO': '"',
 'FIELDDISP': '    2.1037    1.5400    DA    ALL  0       0',
 'QUERYOP': '"',
 'DATAFIELDS': <rdkit.rdBase._vectNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at 0x7f334d06c990>}
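
For readers unfamiliar with the quoting convention, the reading side can be sketched in plain Python. This is only an illustration, not RDKit's parser: `parse_v30_fields` is a hypothetical helper, it handles only simple `KEY=value` tokens (not parenthesized lists such as `ATOMS=(1 1)`), and it assumes that a doubled `""` inside a quoted value stands for one literal `"`.

```python
def parse_v30_fields(line: str) -> dict:
    """Hypothetical helper: split KEY=value tokens, honoring quoted values.

    Assumption: inside a quoted value, a doubled "" means one literal ".
    """
    fields = {}
    i, n = 0, len(line)
    while i < n:
        if line[i] == " ":
            i += 1
            continue
        eq = line.find("=", i)
        if eq == -1:
            break
        key = line[i:eq]
        i = eq + 1
        if i < n and line[i] == '"':
            # Quoted value: read until the closing (undoubled) quote.
            i += 1
            chars = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        chars.append('"')  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])
                i += 1
            fields[key] = "".join(chars)
        else:
            # Unquoted value: runs to the next space or end of line.
            end = line.find(" ", i)
            if end == -1:
                end = n
            fields[key] = line[i:end]
            i = end
    return fields


parsed = parse_v30_fields(
    'FIELDNAME=[DUP]Tempstruct FIELDINFO="""" QUERYOP="""" FIELDDATA=Foo1'
)
print(parsed)
```

Under these assumptions the sketch recovers a single `"` for both FIELDINFO and QUERYOP, matching the property values shown above.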

Expected behavior
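Values containing " characters need to be enclosed in " in SubstanceGroup output, with each embedded " doubled. A value consisting of a single " should therefore be written as """", as in the input above.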
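
The expected escaping rule can be sketched in plain Python. This is a minimal illustration, not RDKit's writer code: `quote_v3k_value` and `unquote_v3k_value` are hypothetical names, and the decision to also quote values containing spaces (and to leave other values bare) is an assumption made for the example.

```python
def quote_v3k_value(value: str) -> str:
    """Hypothetical helper: quote a field value for V3000 SGROUP output.

    Values containing a double quote are enclosed in double quotes with each
    embedded quote doubled. Quoting values that contain spaces is an
    assumption of this sketch.
    """
    if '"' not in value and " " not in value:
        return value
    return '"' + value.replace('"', '""') + '"'


def unquote_v3k_value(text: str) -> str:
    """Hypothetical inverse: strip the enclosing quotes and undouble."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1].replace('""', '"')
    return text


for raw in ['"', "Foo1", "    2.1037    1.5400    DA    ALL  0       0"]:
    written = quote_v3k_value(raw)
    # Round trip: reading back the written form gives the original value.
    assert unquote_v3k_value(written) == raw
    print(repr(raw), "->", written)
```

With this rule a FIELDINFO value of a single " is written as FIELDINFO="""", which is the form used in the input mol block.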

Configuration:

  • RDKit version: master and releases
  • OS: all
@greglandrum greglandrum added this to the 2020_09_5 milestone Jan 29, 2021
greglandrum added a commit to greglandrum/rdkit that referenced this issue Jan 29, 2021
greglandrum added a commit that referenced this issue Mar 1, 2021
* Fixes #3768

* changes in response to review

* re-enable a test