Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the behaviour of ZipInputStream when reading zip entries with unsupported compression methods #333

Merged
merged 3 commits into from
Aug 15, 2020

Conversation

Numpsy
Copy link
Contributor

@Numpsy Numpsy commented Apr 1, 2019

When I was looking at BZip2 support in zip files, I had an issue where changing ZipEntry.IsCompressionMethodSupported to return true for BZip2 compression caused issues with ZipInputStream, where it went through its read logic and didn't read anything, but also didn't 'fail'.

When i went to make a change to do something about that, I found that calling GetNextEntry() to get an entry compressed with BZip2 compression (without any other changes) caused it to throw, when it attempted to set ZipEntry.CompressionMethod after construction.

This PR Makes 2 changes that might help with that:

  1. Change ZipInputStream.GetNextEntry to use the ZipEntry constructor that takes the CompressionMethod as a parameter, rather than setting it afterwards.

  2. Change it to use its own IsEntryCompressionMethodSupported function rather than relying on ZipEntry.IsCompressionMethodSupported (Given that the stream itself is in control of the supported methods, it seems reasonable to make that choice locally).

I wonder if it might be better here if ZipEntry.CanDecompress returned a value derived from the input source (ZipFile/ZipInputStream) rather than using a fixed algorithm list? something of a bigger change though.

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

@Numpsy Numpsy changed the title Improve the behaviour of ZipInputStream when reading zip entries with BZip2 compression. Improve the behaviour of ZipInputStream when reading zip entries with unsupported compression methods Apr 3, 2019
@Numpsy
Copy link
Contributor Author

Numpsy commented Apr 3, 2019

Actually, I labelled this as BZip2, but it's really an issue with any compression method that it doesnt understand (e.g. Deflate64).

e.g., if you just do

using (var input = File.OpenRead("Deflate64.zip"))
{
    var zis = new ZipInputStream(input);
    var entry = zis.GetNextEntry();
}

Then it throws System.NotSupportedException, which doesn't seem ideal - this changes allows it to get the ZipEntry and get the details from that, while still giving errors if you try to actually decompress the data.

@Numpsy
Copy link
Contributor Author

Numpsy commented May 23, 2019

Would it be better if I split this into two?
The change to pass the compresion method into the ZipEntry constructor instead of setting it afterwards is seperate from the changes to the query functions really.

@piksel
Copy link
Member

piksel commented Jun 15, 2019

Yes, the constructor switch is much more sane.

The IsCompressionMethodSupported is a static method on ZipEntry, which is shared between Zip[In/Out]putStream and ZipFile. Since technically, one could support a format while the other doesn't, splitting them might be a good idea.

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 15, 2019

I'll look at breaking the constructor change out into its own PR.

My original issue with the static IsCompressionMethodSupported things was that I changed things so that ZipFile supported reading BZip2 compressed entries but ZipInputStream still didn't, but as the same situation might occur with ZipOutputStream the area might benefit from another look (I suppose you could also have a situation where ZipFile can read more formats than it can write)

@piksel
Copy link
Member

piksel commented Jun 15, 2019

About IsCompressionMethodSupported: Well, that was exactly my point :) It would probably be better if ZipInputStream also had it's own method and the shared one removed.

The changes are related in that they are making it easier to support additional formats, but they should probably be split into different PRs anyway.

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 16, 2019

Rebased and minimized to just use a local IsCompressionMethodSupported instead of the ZipEntry version.

The IsCompressionMethodSupported method I added to ZipInputStream is private - not sure if there is any benedit to it being public given there is already a CanDecompressEntry function that can be used?

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 16, 2019

With regards to:

It would probably be better if ZipInputStream also had it's own method and the shared one removed.

Seems reasonable, api breaks aside ( a static function there doesn't make much sense i think), there's a couple other other attached things there:

  1. The property setter for ZipEntry.CompressionMethod checks for supported methods and throws if it think thing's are unsupported. Possibly that isn't the right place to check anyway and it should'nt try?

  2. The ZipEntry.CanDecompress property uses that hardcoded list to check if it thinks the method is supported, which also doesn;t really work. That was what my last comment in the op was about - perhaps ZipInputStream and the read side of ZipFile could set the ZipEntry CanDecompress property based on their own capabilities when they create the entries?

@piksel
Copy link
Member

piksel commented Jun 18, 2019

  1. no, I don't think the Entry in itself should have any knowledge of potential compression methods. It should be up to the consumers of said entry to handle.

  2. That would work, but I'm not sure that it'd be very useful. The check could be done when the entity is added to the consumer, and likewise, the consumer could be asked if it can decompress an entity or if it supports a method, no?
    For backwards compatibility, we could let the consumers set the property, but only through an internal constructor and with clear documentation about what it actually represents.

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 18, 2019

Backwards compatibility with the entry members is what i was thinking about, and I did mean setting the CanDecompress flag via an internal property.

The consumers/producers do make some attempts to do the validation, but it's a bit mixed:

ZipInputStream has a CanDecompressEntry property and could have a means of querying supported algorithms (this PR adds a private member function to do that, could have a public version if that's useful).

ZipOutputStream doesn't seem to validate things, but PutNextEntry could throw for unsupported methods.

ZipFile looks a bit inconsistent, where some of the Add() overrides throw ArgumentOutOfRangeException if the compression method is unsupported and some don't (would that be considered a bug?). It does throw on stream access for unsupported methds though.
If ZipFile needed a public api for querying supported methods (rather than just calling it and letting it throw), it might need seperate functions for compression/decompression support, in case the 2 cases ended up different at any point?

@piksel
Copy link
Member

piksel commented Jun 19, 2019

I think we should just decide where this should be done and then be consistent with it.
It's hard to say if it's a bug since we don't really have any defined "right" way to do it. Perhaps it throws somewhere else? Or relies on another check? The point is that it's too hard to tell.

Perhaps interfaces, like IZipConsumer and IZipProducer could be used for querying supported methods? Something like

public interface IZipConsumer
{
    IEnumerable<CompressionMethod> GetSupportedDecompressMethods()
}

public static class ZipConsumerExtensions 
{
    public static bool CanDecompress(this IZipConsumer zc, ZipEntry ze)
        => zc.GetSupportedDecompressMethods().Contains(ze.CompressionMethod);
}

Which ZipInfo then would implement both the interface and are free to respond differently to Compress and Decompress.

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 19, 2019

PR updated to get ZipInputStream.CanDecompressEntry to return false for AES encrypted entries that it can't handle, though that seems to have caused the debug CI build of the InvalidPasswordNonSeekable test fail for some reason (it passes locally :-( )

(Is the CI build supposed to be listed as a pass if a test fails?).

@Numpsy
Copy link
Contributor Author

Numpsy commented Jun 19, 2019

It's hard to say if it's a bug since we don't really have any defined "right" way to do it. Perhaps it throws somewhere else? Or relies on another check? The point is that it's too hard to tell.

Some of the ZipFile.Add() overloads throw ArgumentOutOfRangeException, the ZipEntry property setter throws NotSupportedException, and ZipFile.GetOutputStream throws ZipException.

Given that some of the Add() variants are documented as throwing ArgumentOutOfRangeException, could make them all consistent with that?

Perhaps interfaces, like IZipConsumer and IZipProducer could be used for querying supported methods? Something like

Could od something like that (I hadn't been thinking about querying lists of supportedmethods rather than just querying the state of a single one.

@piksel piksel merged commit 83c80d2 into icsharpcode:master Aug 15, 2020
@Numpsy Numpsy deleted the zipinputstream_bzip2 branch August 16, 2020 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants