Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OutOfMemoryException when parsing Excel document / endless while-loop #30

Closed
henning-krause opened this issue Oct 23, 2018 · 11 comments · Fixed by #41
Closed

OutOfMemoryException when parsing Excel document / endless while-loop #30

henning-krause opened this issue Oct 23, 2018 · 11 comments · Fixed by #41
Assignees
Labels

Comments

@henning-krause
Copy link
Contributor

henning-krause commented Oct 23, 2018

I'm trying to parse an Excel document (xls) using the OpenMcdf.Extensions package.

We call AsOLEProperties extension method and eventually, the execution gets stuck in this while loop (https://github.com/ironfede/openmcdf/blob/master/sources/OpenMcdf/CompoundFile.cs#L1493-L1514) :

while (true)
{
    if (nextSecID == Sector.ENDOFCHAIN)
        break;

    Sector ms = new Sector(Sector.MINISECTOR_SIZE, sourceStream);
    byte[] temp = new byte[Sector.MINISECTOR_SIZE];

    ms.Id = nextSecID;
    ms.Type = SectorType.Mini;

    miniStreamView.Seek(nextSecID * Sector.MINISECTOR_SIZE, SeekOrigin.Begin);
    miniStreamView.Read(ms.GetData(), 0, Sector.MINISECTOR_SIZE);

    result.Add(ms);

    miniFATView.Seek(nextSecID * 4, SeekOrigin.Begin);
    nextSecID = miniFATReader.ReadInt32();
}

When the loop is entered, the nextSecID is 27. At the end of the loop the nextSecID is set to 0. And the nextSecID keeps being null as the same data is read on each loop.

Is 0 even a valid value for nextSecID?

Any idea, what to do about this?

The document in question can be opened fine in Excel. Unfortunately, we got it from a customer of ours, so we can't share it.

@michaelp
Copy link
Contributor

I think i have a fix for this but i am in the middle of the release crunch time. @henning-krause can it wait or you need it urgently?

@henning-krause
Copy link
Contributor Author

@michaelp My colleague already added a PR (#31) which fixes this. Until you integrate this in the official build, we'll use our fork.

@ironfede
Copy link
Owner

@henning-krause , fix provided doesn't prevent exception on file loading by OpenMcdf in the described use-case. Is this the expected behaviour?
Fix is ok from a specification point of view and will be integrated asap but it would really useful if you could provide an example excel file without user-data that triggers the described issue in order to understand minifat chaining in the specific case. Has the original file been produced by Excel?

@ironfede
Copy link
Owner

@henning-krause : pr31 merged to master.
@michaelp Do you have any advice/idea on this strange minifat chaining? A corrupted file?

@ironfede ironfede self-assigned this Oct 27, 2018
@ironfede ironfede added the bug label Oct 27, 2018
@henning-krause
Copy link
Contributor Author

@ironfede Of course, this doesn't fix the underlying problem, but at least the load operation fails fast and does not cause an OutOfMemoryException. So it's a quick fix.

That being said, I don't even think that this PR fixed all problems - only the simple ones. In multiple places, the following check is performed:

if (next != nextSecID)
    nextSecID = next;
else
    throw new CFCorruptedFileException("Cyclic sector chain found. File is corrupted");

This certainly helps when you have a corrupt file where one sector points to itself. But what if you have something like a cyclic chain?

1 -> 2 -> 3 -> 1

This wouldn't be catched and we would be in the same mess as before. I would propose a change in that check that we keep a list of all processed ids and if we hit one we had before, we'll throw the CFCorruptedFileException.

As for the source file: It was created by one of our customers. I'll try to strip out all the content and to reproduce the error. If it still happens, I might be able to share it with you.

@henning-krause
Copy link
Contributor Author

@ironfede shouldn't we implement the safeguards I wrote about in my last message? We're happy to provide a PR for that.

@ironfede
Copy link
Owner

Ok, thank you @henning-krause !
I've reopened issue.

@ironfede ironfede reopened this Nov 27, 2018
@Numpsy
Copy link
Contributor

Numpsy commented Jan 8, 2019

Hi,

Not sure what the situation is with the original repro for this, but on a related note:

I've been been trying to have a play with SharpFuzz, and when Itried it against OpenMcdf, it produced the file out_of_memory.zip which causes out of memory errors if you try to load it (i only had a quick look at debugging it, but it did look like it included repeated sector ids as suggested above).

@Numpsy
Copy link
Contributor

Numpsy commented Jan 9, 2019

would something like Numpsy@e9badd8 work as approach to treat repeated sector ids as an error?

@henning-krause
Copy link
Contributor Author

would something like Numpsy@e9badd8 work as approach to treat repeated sector ids as an error?

I think that would work - not ideal from a performance point of view. However, this check should be implemented in all the places where this error can occur: GetMiniSectorChain, GetFatSectorChain and GetNormalSectorChain.

@ironfede Would you agree? If yes, I would implement the necessary changes.

@Numpsy
Copy link
Contributor

Numpsy commented Jan 22, 2019

Yes, there are potentially issues in multiple places (there is a file attached to #40 that makes it blow up in GetDifatSectorChain).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants