Refactor to support caching compression. #867
Conversation
Today `tarball.LayerFromOpener` must either compress or decompress its input, depending on which it receives. Virtually all clients of this interface have an uncompressed tarball they want to turn into a layer, but the API we have presents two fun problems:

1. If we simply present the uncompressed tarball (as most clients do), then we will often end up compressing the layer (expensive!) twice: the first time to compute the layer's digest, and the second typically to publish the layer as part of `remote.Write`.
2. If we present a pre-compressed tarball (as ko does), then we will have an extra decompression (less expensive, but still redundant) to compute the diff id.

This refactors the layer to store an opener for each form of the layer. The constructor sniffs whether the input is compressed and, based on this, makes the "other" opener (the one that requires conversion) a lazy thunk that performs the conversion each time (this is effectively what we were doing before based on the stored `compressed` bit). The advantage of this approach is that by setting both openers up before option evaluation we can make them memoize their first invocation. Given that most consumers of this library only pass the uncompressed form (or should), I have only provided an option for memoizing the compression.
Why does ko pass a pre-compressed tarball? Should we recommend against it if we're going to treat the other path as special?
@@ -79,6 +70,31 @@ func WithCompressionLevel(level int) LayerOption {
	}
}

// WithCompressedCaching ensures that we only compress the tarball once.
func WithCompressedCaching(l *layer) {
This is an exported method that takes a param of an unexported type, so I don't think any external package can actually successfully call it.
Should it just be unexported?
This implements `LayerOption`; it is meant to be passed to `LayerFromOpener`.
To pile on: `tarball.LayerFromOpener(opener, tarball.WithCompressedCaching)` -- it doesn't get called, just passed in.

We could thunk it or mess with signatures to make it clearer that this is an option? Or just document it as "`WithCompressedCaching` is a functional option that..."
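The pattern under discussion can be sketched like this. Field and function bodies here are illustrative, not the real implementation; only the names `layer`, `LayerOption`, `WithCompressedCaching`, and `LayerFromOpener` come from the PR.

```go
package main

// layer is unexported; options mutate it during construction.
// The field here is illustrative, not the real struct.
type layer struct {
	compressedCaching bool
}

// LayerOption mirrors the functional-option shape under discussion.
type LayerOption func(*layer)

// WithCompressedCaching is a functional option: callers hand the function
// value itself to LayerFromOpener and never invoke it directly, which is
// why an exported func taking an unexported *layer still works.
func WithCompressedCaching(l *layer) {
	l.compressedCaching = true
}

// LayerFromOpener sketches option evaluation only; the real constructor
// also takes an Opener and sniffs whether the input is compressed.
func LayerFromOpener(opts ...LayerOption) *layer {
	l := &layer{}
	for _, opt := range opts {
		opt(l) // options are only ever called here, inside the package
	}
	return l
}
```

Because the option is applied inside the package, external callers never need to construct or dereference a `*layer` themselves, so the exported-function/unexported-type combination is intentional rather than a mistake.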
I think @jonjohnsonjr was trying to optimize the ko side, and by eagerly compressing things we trade off 2 gzips for 1 gzip and 1 gunzip (and gunzip is faster). When an uncompressed tarball is passed with this option, it will now be a single gzip.
It's not really "special", we just don't surface an option for the other path. We could if folks are using it, but the consumers I've looked at don't seem to be using that path (except …

Yeah, I guess I just mean: should we document, "you don't have to compress before passing into here; this will take care of compression for you if it's not compressed."

👍 I added a detailed comment. In other news, does Windows work at all, ever?

👍 this was the case when I measured it.

Yes, but it seems to be flakier than other legs.
	opener: opener,
}

if compressed {
I'd really like to drop the above (L123) use of `IsGzipped`, because this came up recently and is really annoying: #112 (comment)

We should be able to call `opener` just once, which would fix this... what about something like this: use a `bufio.Reader` to `Peek` at the first 2 bytes. We technically only need a 2-byte buffer for this, but we should probably use 4K to read (at least) a whole block. This will let us know if it's compressed without consuming the reader.

`crane append` uses `tarball.LayerFromFile`, which we should usually be able to call multiple times, but that doesn't seem to work with process substitution without fiddling.
Okay, so I went ahead and fixed the thing that I actually care about: #868

I still think it's a good idea to use a `bufio.Reader` here to peek at stuff, but I care about it 15% less than I did before.
@imjasonh somehow this change is causing Windows to fail because we're trying to open the same file twice without closing it (I think?), but scanning through the code I don't see an obvious issue. It looks like we Close everything, but maybe not?

I think I tracked down the Windows bug (see the final commit). We may have this problem in other places, but this was dislodged by this change using the utility function in a place that previously used a …

That fixed things 😅