Refactor to support caching compression. #867

Merged

6 commits merged into google:master from the refactor-tarball-layer branch on Dec 13, 2020

Conversation

mattmoor
Collaborator


Today `tarball.LayerFromOpener` must either compress or decompress its input, depending on which form it receives.

Virtually all clients of this interface have an uncompressed tarball they want to turn into a layer, but the API we have presents two fun problems.
1. If we simply present the uncompressed tarball (as most clients do) then we will often end up compressing the layer (expensive!) twice.  The first time is to compute the layer's digest, and the second is typically to publish the layer as part of `remote.Write`.
2. If we present a precompressed tarball (as ko does), then we will have an extra decompression (less expensive, but still redundant) to compute the diff id.

This refactors the layer to store an opener for each form of the layer. The constructor sniffs whether the input is compressed and, based on that, makes the "other" opener (the one that requires conversion) a lazy thunk that performs the conversion on each call (effectively what we were doing before based on the stored `compressed` bit). The advantage of this approach is that, because the openers are set up before option evaluation, options can make them memoize their result on first invocation. Given that most consumers of this library only pass the uncompressed form (or should), I have only provided an option for memoizing the compression.
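
For illustration, here is a minimal sketch of the two-opener shape described above. `Opener` mirrors `tarball.Opener`, while `gzipOpener` and `memoize` are hypothetical names for this sketch rather than the code in this PR:

```go
package layersketch

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// Opener returns a fresh reader over the layer contents each time it is called.
type Opener func() (io.ReadCloser, error)

// gzipOpener is the "lazy thunk": every call re-reads the uncompressed opener
// and compresses it on the fly.
func gzipOpener(uncompressed Opener) Opener {
	return func() (io.ReadCloser, error) {
		rc, err := uncompressed()
		if err != nil {
			return nil, err
		}
		pr, pw := io.Pipe()
		go func() {
			zw := gzip.NewWriter(pw)
			_, err := io.Copy(zw, rc)
			rc.Close()
			if cerr := zw.Close(); err == nil {
				err = cerr
			}
			pw.CloseWithError(err)
		}()
		return pr, nil
	}
}

// memoize is the kind of thing an option like WithCompressedCaching can do to
// the compressed opener: run the conversion once, then replay the buffered
// bytes on later calls.
func memoize(open Opener) Opener {
	var (
		once sync.Once
		buf  []byte
		err  error
	)
	return func() (io.ReadCloser, error) {
		once.Do(func() {
			rc, openErr := open()
			if openErr != nil {
				err = openErr
				return
			}
			defer rc.Close()
			buf, err = io.ReadAll(rc)
		})
		if err != nil {
			return nil, err
		}
		return io.NopCloser(bytes.NewReader(buf)), nil
	}
}
```

Under this shape, computing the digest and later publishing via `remote.Write` can share one memoized compressed opener, so the expensive gzip runs at most once.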
@imjasonh (Collaborator) left a comment

Why does ko pass a pre-compressed tarball? Should we recommend against it if we're going to treat the other path as special?

@@ -79,6 +70,31 @@ func WithCompressionLevel(level int) LayerOption {
}
}

// WithCompressedCaching ensures that we only compress the tarball once.
func WithCompressedCaching(l *layer) {
Collaborator

This is an exported method that takes a param of an unexported type, so I don't think any external package can actually successfully call it.

Should it just be unexported?

Collaborator Author

This implements `LayerOption`; it is meant to be passed to `LayerFromOpener`.

Collaborator

To pile on: `tarball.LayerFromOpener(opener, tarball.WithCompressedCaching)` -- it doesn't get called, just passed in.

We could thunk it or mess with the signatures to make it clearer that this is an option? Or just document it as "`WithCompressedCaching` is a functional option that...".
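
For reference, a usage sketch of the option being discussed, passed (not called) at the call site; the `layer.tar` path below is illustrative:

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	// WithCompressedCaching is passed by name; LayerFromFile applies it while
	// constructing the layer.
	layer, err := tarball.LayerFromFile("layer.tar", tarball.WithCompressedCaching)
	if err != nil {
		log.Fatal(err)
	}

	// Digest needs the compressed form; with the caching option the gzip work
	// happens once and is reused by later consumers such as remote.Write.
	digest, err := layer.Digest()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("digest:", digest)
}
```

The option is applied during construction, which is why it reads like a bare function reference at the call site.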

@mattmoor
Collaborator Author

> Why does ko pass a pre-compressed tarball?

I think @jonjohnsonjr was trying to optimize the ko side, and by eagerly compressing things we trade off 2 gzips for 1 gzip and 1 gunzip (and gunzip is faster). When an uncompressed tarball is passed with this option, it will now be a single gzip.

> Should we recommend against it if we're going to treat the other path as special?

It's not really "special"; we just don't surface an option for the other path. We could if folks are using it, but the consumers I've looked at don't seem to be using that path (except ko, which I will change as soon as this lands).

@imjasonh
Collaborator

Yeah, I guess I just mean: should we document, "you don't have to compress before passing in here; this will take care of compression for you if it's not already compressed"?

@mattmoor
Collaborator Author

👍 I added a detailed comment. In other news, does Windows work at all, ever?

@jonjohnsonjr
Collaborator

> I think @jonjohnsonjr was trying to optimize the ko side, and by eagerly compressing things we trade off 2 gzips for 1 gzip and 1 gunzip (and gunzip is faster).

👍 this was the case when I measured it.

> In other news, does Windows work at all, ever?

Yes, but seems to be flakier than other legs.

opener: opener,
}

if compressed {
Collaborator

I'd really like to drop the above (L123) use of `IsGzipped`, because this came up recently and is really annoying: #112 (comment)

We should be able to call `opener` just once, which would fix this... what about something like this:

Use a `bufio.Reader` to `Peek` at the first 2 bytes. We technically only need a 2-byte buffer for this, but we should probably use 4K so a read pulls in (at least) a whole block. This lets us know whether it's compressed without consuming the reader.

`crane append` uses `tarball.LayerFromFile`, which we should usually be able to call multiple times, but that doesn't seem to work with process substitution without fiddling.
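
A rough sketch of that peek-based detection (the `isGzipped` helper and package name here are illustrative, not the library's code):

```go
package gzipsniff

import (
	"bufio"
	"io"
)

// isGzipped peeks at the first two bytes of r without consuming them and
// reports whether they match the gzip magic number (0x1f, 0x8b). Callers must
// keep reading through the returned *bufio.Reader so the peeked bytes aren't lost.
func isGzipped(r io.Reader) (bool, *bufio.Reader, error) {
	// A 4K buffer so a subsequent read pulls in (at least) a whole block,
	// as suggested above; Peek itself only needs 2 bytes.
	br := bufio.NewReaderSize(r, 4096)
	magic, err := br.Peek(2)
	if err != nil {
		if err == io.EOF {
			// Fewer than two bytes: cannot be a gzip stream.
			return false, br, nil
		}
		return false, br, err
	}
	return magic[0] == 0x1f && magic[1] == 0x8b, br, nil
}
```

Since `Peek` does not advance the reader, the same buffered reader can then be handed to `gzip.NewReader` or consumed directly, without calling the opener a second time.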

Collaborator

Okay, so I went ahead and fixed the thing that I actually care about: #868

I still think it's a good idea to use a `bufio.Reader` here to peek at stuff, but I care about it 15% less than I did before.

@jonjohnsonjr
Collaborator

@imjasonh somehow this change is causing Windows to fail, because we're trying to open the same file twice without closing it (I think?), but scanning through the code I don't see an obvious issue. It looks like we `Close` everything, but maybe not?

@mattmoor
Collaborator Author

I think I tracked down the Windows bug (see the final commit). We may have this problem in other places, but it was dislodged here because this change uses the utility function in a place that previously used `gzip.NewReader` directly.

@mattmoor
Collaborator Author

That fixed things 😅

@jonjohnsonjr merged commit 8b5370a into google:master on Dec 13, 2020
@mattmoor deleted the refactor-tarball-layer branch on December 13, 2020 at 19:19