Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OBConversion reinitialization for WriteString() and WriteFile() #1922

Open
1 task
adalke opened this issue Jan 15, 2019 · 0 comments
Open
1 task

OBConversion reinitialization for WriteString() and WriteFile() #1922

adalke opened this issue Jan 15, 2019 · 0 comments

Comments

@adalke
Copy link
Contributor

adalke commented Jan 15, 2019

  • [ X] I believe this to be a bug with Open Babel
  • This is a feature request

Environment Information

Open Babel version: 2.4.90 (HEAD as of 14 January 2019)
Operating system and version: macOS 10.12.2

Expected Behavior

I expect to be able to write to FPS format using OBConversion without having a segfault. Here are examples of expected behavior:

Example 1 (expected)

>>> import pybel
>>> print(pybel.readstring("smi", "C").write("fps")[:70])
#FPS1
#num_bits=1021
#type=OpenBabel-FP2/1
#software=OpenBabel/2.4.90

Example 2 (expected)

>>> import openbabel as ob
>>> conv = ob.OBConversion()
>>> conv.SetInAndOutFormats("smi", "fps")
True
>>> mol = ob.OBMol()
>>> conv.ReadString(mol, "C")
True
>>> conv.WriteString(mol)
'#FPS1\n#num_bits=1021\n#type=OpenBabel-FP2/1\n#software=OpenBabel/2.4.90\n ...`
>>> conv.WriteFile(mol, "example.fps")
True
>>> print(open("example.fps").read()[:80])
#FPS1
#num_bits=1021
#type=OpenBabel-FP2/1
#software=OpenBabel/2.4.90
#source=
#

Actual Behavior

The above code on the current Open Babel HEAD (and with 2.4.1) causes segmentation faults:

Steps to Reproduce

Example 1 (actual)

>>> import pybel
>>> pybel.readstring("smi", "C").write("fps")
Segmentation fault

Example 2 (actual)

>>> import openbabel as ob
>>> conv = ob.OBConversion()
>>> conv.SetInAndOutFormats("smi", "fps")
True
>>> mol = ob.OBMol()
>>> conv.ReadString(mol, "C")
True
>>> conv.WriteString(mol)
Segmentation fault

(Replace the WriteString() with the WriteFile() and there is still a segmentation fault.)

The problem seems to be that WriteString() and WriteFile() on a newly constructed OBConversion() leave the Index at the initial value of 0.

Some of the formats use GetOutputIndex() and, if it is 1, perform an initialization. For the FPS format, the initialization loads the correct fingerprint type. If it is uninitialized then it will deference an unassigned variable, causing the segfault.

I have patched WriteString() and WriteFile(). I will submit a pull request shortly to demonstrate one possible solution.

Related symptoms

issue #1783

Note that this is likely related to issue #1783 . With an un-patched system:

>>> import pybel
>>> pybel.readstring("smi", "c1ccccc1").write("fpt")
'>\n'

with my patch in place:

>>> import pybel
>>> print(pybel.readstring("smi", "c1ccccc1").write("fpt"))
>   6 bits set
00000000 00000000 00000000 00000200 00000000 00000000
00000000 00000000 00000000 00000840 00000000 00008000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 08000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00020000
00000000 00000000

The proposed solution in that issue was to enable the 'h' option, which worked because the relevant code sets hexoutput using the following.

    if(pConv->IsOption("h") || (pConv->GetOutputIndex()==1 && pConv->IsLast()))
      hexoutput=true;

It may also be related to issue #852 .

CML output

As another example, the CML output differs between an unpatched system:

>>> print(pybel.readstring("smi", "C").write("cml"))
<molecule>
 <atomArray>
  <atom id="a1" elementType="C"/>
 </atomArray>
</molecule>

and a patched system:

>>> print(pybel.readstring("smi", "C").write("cml"))
<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="C" hydrogenCount="4"/>
 </atomArray>
</molecule>

Note the leading XML declaration in the patched version. This is because of the following test in format/xml/cmlformat.cpp :

    if(!_pxmlConv->IsOption("MolsNotStandalone") && _pxmlConv->GetOutputIndex()==1)
     ...
            xmlTextWriterStartDocument(writer(), NULL, NULL, NULL);
adalke added a commit to adalke/openbabel that referenced this issue Jan 16, 2019
… otherwise the formats won't initialize correctly. (Issue openbabel#1922)

This commit includes a new test program which checks WriteString() and WriteFile() for most of the supported formats.

Without this patch, several of them will fail, and the FPS format will segfault.
adalke added a commit to adalke/openbabel that referenced this issue Jan 18, 2019
…iple records.

My first patch set Index=1 in WriteFile() and WriteString() so that
the output format plugin could perform any initialization. This worked,
but there was still a problem with some formats when saving multiple
records, including the CML format.

The CML format uses <molecule> as the top-level element when there is
only one input molecule, and <cml> as the top-level element when there
are two or more molecules. The <cml> element contains a sequence of
<molecule> elements.

If Index==1 and IsLast()==true then it's a single-molecule file.
If Index==1 and IsLast()==false then it's a multi-molecule file.

However, OBConversion::Write() always called SetOneObjectOnly(),
which set Last=true, so the CML format always thought it was writing
a single-record output file.

The fix was to remove SetOneObjectOnly() from the Write() and
do more intialization in WriteFile() and WriteString(). Previously
Write() did Index++ *after* calling the format's WriteMolecule().
This was incorrect. It needs to be done *before*, so the value
of GetOutputIndex() matches the number of molecules which were
sent as output. (This is a better emulation of how Convert() works.)

This in turn means WriteFile() and WriteString() need to reset
Index to 0. They then call Write(), which increments Index, so
WriteMolecule() always sees Index=1 for the first molecule.
baoilleach pushed a commit to baoilleach/openbabel that referenced this issue Apr 11, 2020
… otherwise the formats won't initialize correctly. (Issue openbabel#1922)

This commit includes a new test program which checks WriteString() and WriteFile() for most of the supported formats.

Without this patch, several of them will fail, and the FPS format will segfault.
baoilleach pushed a commit to baoilleach/openbabel that referenced this issue Apr 11, 2020
…iple records.

My first patch set Index=1 in WriteFile() and WriteString() so that
the output format plugin could perform any initialization. This worked,
but there was still a problem with some formats when saving multiple
records, including the CML format.

The CML format uses <molecule> as the top-level element when there is
only one input molecule, and <cml> as the top-level element when there
are two or more molecules. The <cml> element contains a sequence of
<molecule> elements.

If Index==1 and IsLast()==true then it's a single-molecule file.
If Index==1 and IsLast()==false then it's a multi-molecule file.

However, OBConversion::Write() always called SetOneObjectOnly(),
which set Last=true, so the CML format always thought it was writing
a single-record output file.

The fix was to remove SetOneObjectOnly() from the Write() and
do more intialization in WriteFile() and WriteString(). Previously
Write() did Index++ *after* calling the format's WriteMolecule().
This was incorrect. It needs to be done *before*, so the value
of GetOutputIndex() matches the number of molecules which were
sent as output. (This is a better emulation of how Convert() works.)

This in turn means WriteFile() and WriteString() need to reset
Index to 0. They then call Write(), which increments Index, so
WriteMolecule() always sees Index=1 for the first molecule.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant