Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async support on ZipOutputStream #547

Closed
wants to merge 1 commit into from

Conversation

0xced
Copy link
Contributor

@0xced 0xced commented Dec 7, 2020

This is a very rough first pass at adding async support to ZipOutputStream.

All method doing synchronous stream writes have been duplicated with an asynchronous counterpart, starting from ZipOutputStream.PutNextEntryAsync(). Unfortunately, this makes a lot of duplicated code but is unavoidable to have both sync + async code paths.

.NET Standard 2.1 has been added to the target frameworks. It's required to get full async support with Stream.DisposeAsync().

New unit tests must be written to cover the new async code paths.

Fixes #223

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

This is a very rough first pass at adding async support to ZipOutputStream.

All method doing synchronous stream writes have been duplicated with an asynchronous counterpart, starting from `ZipOutputStream.PutNextEntryAsync()`. Unfortunately, this makes a lot of duplicated code but is unavoidable to have both sync + async code paths.

.NET Standard 2.1 has been added to the target frameworks. It's required to get full async support with `Stream.DisposeAsync()`.

New unit tests must be written to cover the new async code paths.

Fixes icsharpcode#223
@Numpsy
Copy link
Contributor

Numpsy commented Dec 7, 2020

New unit tests must be written to cover the new async code paths.

I started adapting some of the existing sync unit tests in #457 (only the deflate ones to start with) with the idea that those could be a base for testing async changes later (as there can be tests that call the async stream functions before implementing said async functions in the zip/gzip streams), but there is more work to do there for the zip/gzip subclass streams.

this makes a lot of duplicated code

hopefully there is some scope for refactoring things to avoid duplicating everything!

@Numpsy
Copy link
Contributor

Numpsy commented Dec 7, 2020

.NET Standard 2.1 is perhaps something that can be considered seperately, as you can add it seperately from the other code changes, and that also opens up the possibility of doing things with the Span based apis and such as well?

@0xced
Copy link
Contributor Author

0xced commented Dec 7, 2020

.NET Standard 2.1 is perhaps something that can be considered separately

I don't think it's possible to consider it separately from the async stream feature because Stream.DisposeAsync() was introduced in .NET Standard 2.1.

If you want to totally avoid synchronous stream writes (typically for writing to an ASP.NET Core HTTP response body), you must also support DisposeAsync() (through await using in this example) which must be used instead of Dispose() to finish writing the zip stream asynchronously:

await using var zipStream = new ZipOutputStream(httpContext.Response.Body);
await zipStream.PutNextEntryAsync();

See also Add async dispose support to ZipArchive which is the exact same issue but for the built-in ZipArchive class.

@0xced
Copy link
Contributor Author

0xced commented Dec 7, 2020

hopefully there is some scope for refactoring things to avoid duplicating everything!

There's some room for improvement there, especially in methods which do several WriteLE… in a row where this could be extracted in a new method writing to a stream and then that stream would be CopyTo or CopyToAsync to the destination.

I just wanted to have a quick working proof of concept so duplicating everything and adding async support was faster without refactoring everything. I used a modified version of https://github.com/StephenClearyExamples/AsyncDynamicZip to ensure that the whole chain was actually async. I'll try to write some tests with an AsyncOnlyStream to ensure that no synchronous call was left.

@Numpsy
Copy link
Contributor

Numpsy commented Dec 7, 2020

I don't think it's possible to consider it separately from the async stream feature because Stream.DisposeAsync() was introduced in .NET Standard 2.1.

I mean that you can add a 2.1 target to the build/package without making any other changes, so a seperate PR could be done first to keep that change on its own.

several WriteLE… in a row

Think I've wondered in the past (in #470 perhaps) if doing many sequential writes for a small amount of fixed size datais the best approach, or if it could be assembled into memory first and then written into the stream in one go

@0xced
Copy link
Contributor Author

0xced commented Dec 7, 2020

I mean that you can add a 2.1 target to the build/package without making any other changes, so a separate PR could be done first to keep that change on its own.

Oh I see, I misunderstood what you meant. I can do a pull request with just that: adding the .NET Standard 2.1 target framework.

Think I've wondered in the past (in #470 perhaps) if doing many sequential writes for a small amount of fixed size datais the best approach, or if it could be assembled into memory first and then written into the stream in one go

This should probably be benchmarked to make sure what approach is the most performant. Let's hope that writing in memory first is more performant so that we can win on both maintainability and performance!

@piksel
Copy link
Member

piksel commented Dec 8, 2020

Yeah, basically everything that is not actual compressed data should probably just be written to a buffer and then piped asynchronously or not to the base stream.
This would allow us to keep the duplicated code minimal and probably not lose any noticeable performance.
Calls to WriteLEShortAsync is probably not ideal to do anyway, since the stack overhead of that call is larger than the actual data needing to be buffered.

@Numpsy
Copy link
Contributor

Numpsy commented Dec 11, 2020

This might be getting too far ahead of things, but something to consider in case - The way that ZipOutputStream and GZipOutputStream both derive from DeflatorOutputStream could possibly cause issues if Async functions are added in DeflatorOutputStream but not GZipOutputStream (e.g. does adding a FinishAsync to DeflatorOutputStream mean that someone could call that via an instance of GZipOutputStream and have it do the wrong thing?)

@piksel
Copy link
Member

piksel commented Dec 12, 2020

@Numpsy If we don't implement them, I guess they should be made virtual and overridden by throw NotImplementedException() and a comment about them not being available for GZip yet.

@0xced
Copy link
Contributor Author

0xced commented Dec 14, 2020

Another possible approach to avoid code duplication is using a t4 text template to generate both sync and async versions of the same code. I saw this today in the CsvHelper library.

Not sure what is the most maintainable...

@piksel
Copy link
Member

piksel commented Dec 16, 2020

I think this a pretty good use case for meta programming. Too bad the tooling support for T4 is terrible. Maybe that has changed though, haven't tried it with OmniSharp or Rider...

@0xced
Copy link
Contributor Author

0xced commented Dec 17, 2020

Rider got support for T4 Text Templates in version 2019.3 and it's pretty decent.

@piksel
Copy link
Member

piksel commented Feb 14, 2021

Actually, that link pointed to another problem; github doesn't highlight them :/

I made an attempt at splitting the logic from the "actual" writes so that just the writes could be put in separate Sync/Async methods in #574.

The methods like PutNextEntry are ~250 LoC, which is really hard to maintain in two copies.

@0xced
Copy link
Contributor Author

0xced commented May 2, 2021

Awesome work, Nils! I'm closing this pull request in favour of yours (much improved) in #574.

@0xced 0xced closed this May 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature request: full async/await pattern stream support
3 participants