Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: not possible to specify deflate compression level #8359

Closed
gopherbot opened this issue Jul 11, 2014 · 13 comments
Closed

archive/zip: not possible to specify deflate compression level #8359

gopherbot opened this issue Jul 11, 2014 · 13 comments
Assignees
Milestone

Comments

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 11, 2014

by peter.waller:

Discussion link:

https://groups.google.com/forum/#!topic/golang-dev/WtYAcfgPNBw/discussion

There is a TODO to support specifying the compression level.

https://code.google.com/p/go/source/browse/src/pkg/archive/zip/writer.go#17

I do not currently see how it is possible to fit this in to the API without breaking
backwards compatibility, especially since the Compressor interface has no room for
additional parameters.

I would tack a crack at this myself otherwise.
@minux
Copy link
Member

@minux minux commented Jul 11, 2014

Comment 1:

Labels changed: added release-none, repo-main.

Status changed to Accepted.

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jul 11, 2014

Comment 2:

We could add a struct field to FileHeader, like:
   // CompressionLevel optionally specifies the flate compression level
   // when used with CreateHeader. It isn't populated when reading zip
   // files.
   CompressionLevel int

Owner changed to @adg.

@gopherbot
Copy link
Author

@gopherbot gopherbot commented Jul 11, 2014

Comment 3 by peter.waller:

How does one then get it into the compressor?
https://code.google.com/p/go/source/browse/src/pkg/archive/zip/writer.go#183
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jul 11, 2014

Comment 4:

Probably special-case the Deflate compressor type first and do our thing for that type.
We could also extend the definition of Compressor to say that compressor can type-assert
an optional interface on top of the io.Writer (like zip.RequestedCompressionLeveler) but
that's gross.  But that could be how we special case it internally with a private
interface.
The biggest question is how to handle the sync.Pool when the flate level can change.
One answer is to not use a sync.Pool for all flate.Writers which aren't level 5. That
will lead to bad performance when zipping lots of items (which is why we added the pool
here in the first place). Another item is having a sync.Pool per flate size. That seems
controversial. I recommend you get an answer from Russ or Rob on that first before you
start doing any work on this.
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc removed release-none labels Apr 10, 2015
@dsnet
Copy link
Member

@dsnet dsnet commented Aug 6, 2015

Just a thought, since I ran into something related to this.

Perhaps the solution would be simply to allow an existing Compressor to be replaced. Currently, RegisterCompressor panics if the method was already registered. Maybe the library can allow them to be replaced.

This way, the user has more flexibility than simply setting the compression level. They can do:

zip.RegisterCompressor(zip.Deflate, func(w io.Writer) (io.WriteCloser, error) {
    return flate.NewWriter(w, flate.BestSpeed)
})

This gives users more power. For example, I can swap in custom version of the flate library for various purposes (like one that supports parallel compression or has a faster version of flate). One down-side is that this will lose the benefit of the sync.Pool that the library places used writers back in, but it shouldn't be too difficult for a user to re-implement the sync.Pool features.

@bradfitz In archive/zip/register.go, what was the purpose of guarding each Write and Close call of pooledFlateWriter with a mutex?

@gopherbot
Copy link
Author

@gopherbot gopherbot commented Nov 5, 2015

CL https://golang.org/cl/16669 mentions this issue.

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 5, 2015

Copying my comments from CL/16669 for discussion (and is an expanded argument from my comment above):

I definitely have an interest in seeing finer control of the compression performed in archive/zip, but I'm skeptical that the right approach is to couple aspects of DEFLATE into archive/zip. While this does solve the ability to select the compression level for the standard library's DEFLATE, what happens if I wanted to completely switch out the DEFLATE encoder with a parallelized one (for very high throughput)? What if I wanted to use the Zopfli algorithm (which produces a higher compression ratio than level 9, at the expense of very slow encoding speed)?

Thus, I believe the better solution is to just give the user the power to register any arbitrary Compressor or Decompressor. As it exists, it would be problematic to do this at the package level. For example, I may want to use Zopfli for one specific zip file, but not all of them. Thus, maybe the right approach would be to allow a user to register a Compressor/Decompressor at the Reader and Writer level? Thus, if a specific DEFLATE compressor is specified for that given Writer, it will use that instead of the package level pool.

New API methods:

// RegisterCompressor registers custom compressors for a specific method ID.
// If no compressors are registered, Writer will default to using the
// compressors defined at the package level.
(*Writer) RegisterCompressor(method uint16, enc Compressor)

// RegisterDecompressor registers custom decompressors for a specific method ID.
// If no decompressors are registered, Reader will default to using the
// decompressors defined at the package level.
(*Reader) RegisterDecompressor(method uint16, dec Decompressor)

Example usage:

w := zip.NewWriter(buf)
w.RegisterCompressor(zip.Deflate, func(wr io.Writer) (io.Writecloser, error) {
    return zopfli.NewWriter(wr), nil
})
...

I believe this solution is better for the following reasons:

  • It enables much more flexibility and power. Although the expanded API does not directly solve the feature requested in this issue, it allows it to be trivially achieve with a few lines of code. In this sense, it enables exponentially more by doing less.
  • It avoids more coupling of DEFLATE components into archive/zip, which I believe is the wrong approach. The zip format supports more compression methods than DEFLATE, including bzip2, and lzma. Why special case DEFLATE but not other algorithms?
@colincross
Copy link
Contributor

@colincross colincross commented Nov 5, 2015

I like the per-Writer compressors, but the pooled compressor code is a little tricky to expect every user to copy and paste. What about a precursor patch that exports a new api:

// ResetableWriteCloser is an interface that groups the Write, Close, and Reset
// methods.
type ResetableWriteCloser interface {
    io.WriteCloser
    Reset(io.Writer)
}

var _ ResetableWriteCloser = (*flate.Writer)(nil)

// NewPooledCompressor wraps a compressor that returns a ResetableWriteCloser
// in a pool, reusing the ResetableWriteCloser after Close is called on it.
func NewPooledCompressor(comp Compressor) Compressor
@dsnet
Copy link
Member

@dsnet dsnet commented Nov 6, 2015

Hmmm, that's true, but I definitely don't believe we should expose any of this pooled compressor stuff to the public API of zip.

It may make more sense to define:

type CompressResetter interface { Reset(io.Writer) }

And basically, say in the API that if the compressor satisfies the CompressResetter interface, then an internal pool will be used. Otherwise, the compressor function will be called for every call to Writer.Create or Writer.CreateHeader.

Personally, I even vote to have CompressResetter be an unexported interface and just say that if the compressor contains the Reset(io.Writer) method, then it will use an internal compressor pool. This helps keep the zip documentation cleaner; but we can see what other people think.

For example:

// RegisterCompressor registers custom compressors for a specific method ID.
// If no compressors are registered, Writer will default to using the
// compressors defined at the package level.
// If the compressor returned by enc also contains a Reset(io.Writer) method,
// then an internal compressor pool will be used for higher performance.
(*Writer) RegisterCompressor(method uint16, enc Compressor)

As a side note, I am a little surprised that a pool is not used for decompression since each DEFLATE reader allocates more than 32KiB for it's internal dictionary and state.

@colincross
Copy link
Contributor

@colincross colincross commented Nov 6, 2015

I believe the pool is expected to survive across multiple calls to zip.NewWriter, at least that is what BenchmarkCompressedZipGarbage tests. If the pool is created automatically by Writer.RegisterCompressor the benchmark will always have an empty pool.

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 6, 2015

According to the original commit that introduced pooling, this was to prevent the trash generated by a single large zip file (which I believe is the far more common use case). It is probably sufficient to just reset the compressor within a given zip.Writer.

There is another benefit here. The API for zip.Writer forbids concurrent calls to its methods, so you wouldn't even need a sync.Pool anymore, and can just store a local reference to a single compressor and reset the compressor for every call to zip.Writer.Create or zip.Writer.CreateHeader.

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 6, 2015

Actually, now that I realized that a custom compressor to zip.Writer doesn't require any crazy sync.Pool, it would be easier to avoid doing any form of resetting for the user at all since the user can trivially do it on their own using closures.

For example:

var zw *zopfli.Writer
w := zip.NewWriter(buf)
w.RegisterCompressor(zip.Deflate, func(wr io.Writer) (_ io.Writecloser, err error) { 
    if zw == nil {
        zw, err = zopfli.NewWriter(wr)
        return zw, err
    }
    zw.Reset(wr)
    return &zw, nil
})
...
@bradfitz bradfitz closed this in 46300a0 Dec 1, 2015
@gopherbot
Copy link
Author

@gopherbot gopherbot commented Jan 14, 2016

CL https://golang.org/cl/18627 mentions this issue.

gopherbot pushed a commit that referenced this issue Jan 27, 2016
…ecompressor

Clarify that Compressor and Decompressor callbacks must support being invoked
concurrently, but that the writer or reader returned need not be.

Updates #8359

Change-Id: Ia407b581dd124185f165c25f5701018a8ce4357a
Reviewed-on: https://go-review.googlesource.com/18627
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Jan 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.