Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added CAB decompression parameter MSCABD_PARAM_SALVAGE #18

Merged
merged 1 commit into from Oct 16, 2018

Conversation

micahsnyder
Copy link
Contributor

Added CAB decompression parameter MSCABD_PARAM_SALVAGE which makes a best effort to extract as many files as possible from damaged, mangled, malformed or otherwise non-standard CAB archives.

…best effort to extract as many files as possible from damaged, mangled, malformed or otherwise non-standard CAB archives.
@kyz
Copy link
Owner

kyz commented Oct 16, 2018

Thanks very much for this!

For context, you and I have been discussing merging the differences between ClamAV's copy of libmspack into libmspack proper, and most of the changes are to be more forgiving about non-compliant / broken CAB files, provided it gets to the data inside them.

So, the addition of a SALVAGE mode means you can turn this feature on and get at data no matter how broken a file is. I'll also be adding this to cabextract's existing "-f" flag, which is already used to salvage broken MSZIP files.

Copy link
Owner

@kyz kyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not keen on this redefinition of CAB_BLOCKMAX and CAB_LENGTHMAX.

You found a non-compliant / block CAB file which declares it goes beyond these limits. Having looked at it, it does have individually-compressed MSZIP blocks that expand to >32KB, so it looks like a human created the file without fully reading the specification. However, they wouldn't get away with this trying to make LZX or Quantum compressed blocks, as unlike MSZIP these keep track of the overall offset beyond each block, and would likely break after 2GB.

I think, for the purposes of salvage mode, it should just avoid checking the size of the each block, rather than define and perform a redundant check - block size is an unsigned 16-bit integer, so will be within 0-65535 anyway.

The CAB_LENGTHMAX should stay in place, to protect LZX/Quantum decompression, and in your file's case, it will result in the same outcome: a corrupt file size says the file is >2GB in size, there aren't nearly enough blocks to provide 2GB of data to decompress, so decompressing the file will fail as soon as the blocks run out. Allowing 4GB doesn't decompress any more than allowing 2GB.

@kyz kyz merged commit 5a13e60 into kyz:master Oct 16, 2018
@micahsnyder
Copy link
Contributor Author

Thanks for the collaboration, and for the in-depth analysis of the non-standard/broken CAB files. We definitely appreciate it.

@micahsnyder micahsnyder deleted the clamav-salvage-extraction branch June 1, 2021 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants