Rework MaeMolSupplier, fix #2617 #2620

ricrogz · 2019-08-17T05:28:54Z

This started as a fix for #2617, but I added some more changes. That issue is caused by #2579: after hitting the problem, the MaeMolSupplier does not stop iterating even though it cannot get any more structures, and keeps returning "None" forever.

Also, this depends on schrodinger/maeparser#46 and schrodinger/maeparser#49, and should not be merged until these changes are accessible to RDKit.

Changes in the PR:

Check the state of the stream after creating the Supplier. On construction, the Reader populates its internal buffer, so when nothing goes wrong the stream is left in one of two states, both of which are OK: a good state, if more data was available than fits in the internal buffer, or with the fail and eof bits set, if less data was available than the buffer tried to read.
Do not read the next block / structure before it is requested. Before this change, the MaeMolSupplier tried to read the first structure inside the constructor, right after constructing the Reader object, the second one when the first was requested, and so on. With that mechanism, if a problem was encountered while reading the second structure, the first one was never returned.
Catch exceptions thrown during and after reading the mae::Block, and rethrow them as FileParseException so that RDKit can handle them (e.g., by preventing further reads, as should happen in #2617, "Chem.rdmolfiles.MaeMolSupplier Never Stops Reading").
Change the way atEnd() is handled. This depends on schrodinger/maeparser#49 ("Provide eof() methods to check end of data").
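
To make the read-ahead point concrete, here is a minimal toy model in plain Python. It is only a sketch: `parse`, `ParseError`, `EagerSupplier`, `LazySupplier` and `next_mol` are hypothetical names invented for this illustration, not RDKit or maeparser code. It shows why reading one structure ahead can lose a perfectly good structure when the following one is broken.

```python
# Toy model only: every name here is hypothetical, none of it is RDKit code.


class ParseError(Exception):
    """Stands in for a parser failure on a malformed block."""


def parse(block):
    # A block counts as malformed here if it is the literal string "BROKEN".
    if block == "BROKEN":
        raise ParseError("malformed block")
    return block.upper()


class EagerSupplier:
    """Reads one structure ahead (the behaviour described as 'before')."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._pending = self._read()  # first read happens in the constructor

    def _read(self):
        block = next(self._blocks, None)
        return None if block is None else parse(block)

    def next_mol(self):
        current = self._pending
        if current is None:
            raise StopIteration
        # Reading ahead may raise before `current` is handed to the caller.
        self._pending = self._read()
        return current


class LazySupplier:
    """Reads a structure only when it is requested."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)

    def next_mol(self):
        block = next(self._blocks, None)
        if block is None:
            raise StopIteration
        return parse(block)


blocks = ["first", "BROKEN"]

try:
    eager_first = EagerSupplier(blocks).next_mol()
except ParseError:
    eager_first = None  # the valid first structure was lost

lazy_first = LazySupplier(blocks).next_mol()  # the first structure is returned
```

In this model the eager supplier fails on the very first request, because the read-ahead of the second block raises, while the lazy supplier returns the first structure and only fails when the second one is actually requested.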

Translate FileParseExceptions into Python RuntimeErrors. This is the part that actually prevents an infinite reading loop as in #2617 ("Chem.rdmolfiles.MaeMolSupplier Never Stops Reading"), and also the one I am most interested in, especially because of the comment at:

rdkit/Code/GraphMol/Wrap/MolSupplier.h

Lines 90 to 94 in 01fbec3

    
    // it's kind of doofy that we can't just catch the FileParseException
    // that is thrown when we run off the end of the supplier, but it seems
    // that the cross-shared library exception handling thing is just too
    // hard for me as of boost 1.35.0 (also 1.34.1) and g++  4.1.x on linux
    // this approach works as well:

I tried iterating over a MaeMolSupplier, an SDMolSupplier and a ForwardSDMolSupplier after introducing this patch, and did not observe any problems, such as unexpected RuntimeErrors (e.g. at the end of the iteration).
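
As a rough illustration of why surfacing the error matters on the Python side, the sketch below contrasts a supplier that swallows a parse failure with one that raises. It is a toy model under stated assumptions: `SilentSupplier`, `RaisingSupplier` and the `"BROKEN"` marker are hypothetical and only mimic the behaviour described for #2617; they are not the RDKit wrappers.

```python
import itertools

# Toy model only: hypothetical classes, not the RDKit Python wrappers.


class SilentSupplier:
    """After a bad block, keeps returning None and never ends iteration."""

    def __init__(self, blocks):
        self._blocks = list(blocks)
        self._pos = 0
        self._failed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._failed:
            return None  # stuck: neither StopIteration nor an error
        if self._pos >= len(self._blocks):
            raise StopIteration
        block = self._blocks[self._pos]
        self._pos += 1
        if block == "BROKEN":
            self._failed = True
            return None
        return block


class RaisingSupplier(SilentSupplier):
    """Reports the failure as a RuntimeError, then refuses further reads."""

    def __next__(self):
        if self._failed:
            raise StopIteration
        mol = super().__next__()
        if self._failed:
            raise RuntimeError("parse failure")
        return mol


blocks = ["m1", "BROKEN", "m3"]

# A plain `for` loop over SilentSupplier would never finish, so cap it.
silent_seen = list(itertools.islice(SilentSupplier(blocks), 5))

raising_seen = []
message = None
try:
    for mol in RaisingSupplier(blocks):
        raising_seen.append(mol)
except RuntimeError as err:
    message = str(err)
```

With the silent variant the caller has no way to tell a skipped molecule from a dead stream; with the raising variant the loop ends at the first failure and the caller can decide how to proceed.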

ricrogz · 2019-08-28T15:35:46Z

Updated with changes to copy over structure and atom properties from the Mae files to the mol, and updated to current master, including @lorton's changes to include PDB info (which I refactored a little).

Checks are broken because this still depends on some changes that have not yet been merged into maeparser.

ricrogz · 2019-08-30T21:14:25Z

Updated to parse chirality, pseudochirality and stereo bond properties instead of just copying them to the mol, and updated to current master again.

This adds a new dependency on another maeparser PR, schrodinger/maeparser/pull/50, which adds constants for the prefixes used in stereo properties, and some comments on how these properties are built.

ricrogz · 2019-09-14T16:25:48Z

I removed the dependency on maeparser's eof() method, as we had certain reservations about it, and we probably won't implement it (or at least not this way, or not right now).

ricrogz · 2019-09-22T11:57:48Z

Merged in current master (after the maeparser and coordgen updates). This should build fine now, as schrodinger/maeparser/pull/50 has also been merged.

greglandrum · 2019-09-22T14:14:57Z

@ricrogz : those build failures on windows look real. Can you look into them?

ricrogz · 2019-09-22T16:42:34Z

> @ricrogz : those build failures on windows look real. Can you look into them?

Yeah, sorry, I noticed, but something came up before I could look into it. I will do so soon.

ricrogz · 2019-10-03T15:27:27Z

Ok, now it should finally build and pass tests on all platforms :)

ricrogz · 2019-11-01T19:22:31Z

I realized this had become too messy, and have reset the original branch.

d-b-w

Is the diff up to date?

d-b-w · 2019-11-07T21:30:34Z