Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate compression options to the inline cache export #2405

Merged
merged 1 commit into from
Oct 28, 2021

Conversation

ktock
Copy link
Collaborator

@ktock ktock commented Oct 7, 2021

This is a following-up patch for #2350 to move that forward.

Currently, compression options aren't propagated to the inline cache export and it always uses gzip compressor. This leads to an issue that the compression option is ignored when --export-cache type=inline is specified.

For example, a build something like the following ignores compression=uncompressed option and creates gzip images.

buildctl build --progress=plain --frontend=dockerfile.v0 --local context=/tmp/tmp.XAIR6qwH5h --local dockerfile=/tmp/tmp.XAIR6qwH5h \
               --output type=image,name=registry:5000/image:1,push=true,compression=uncompressed \
               --export-cache type=inline

This patch solves this issue by propagating compression options to the inline cache export as well. This also adds an option to solver.(*exporter).ExportTo() to avoid unexpected contents are recorded when inline export.

This adds @tonistiigi as a co-author because this patch is based on a commit of #2350.

@tonistiigi
Copy link
Member

I'm not sure if TargetLayers is the correct approach in here. We seem to be leaking inline cache logic into solver components. In #2350 the suggestion was LoadRemotes([]compression) []Remotes(not sure if compression needs to be full array) and then inline code can do the filter like before. We may need a possibility to export multiple blobs anyway. The case I want to support (for regular remote cache, not inline) is export cache using zstd and image export in gzip. Also explained in #2347 . For that, if the blob is cached as zstd but then image is created from the same blob, the blob should now be cached as gzip as well. Either we need to export both or we need mode where if both gzip and zstd blob exists, then gzip is exported, but if no blob exists then zstd blob is created.

From #2347, if it doesn't depend on this PR, maybe make sense to try to change the field to array before changing this logic as it will need to change with the datatype change anyway?

@ktock
Copy link
Collaborator Author

ktock commented Oct 8, 2021

In LoadRemotes([]Compression) []Remotes, does []Compression mean the allowed list of compression right?

Why I introduced TargetLayers here is that Compression option is ambiguous and LoadRemote(gzip) can return a chain like [gzip,zstd,zstd] when the caller wants [gzip,gzip,gzip]. And additionally, there are no guarantee that the digest of the returned blob is the same as expected even if the compression type is the same. If LoadRemote returns an unexpected chain, Load and Convert won't be called anymore in ExportTo() so it possibly loses the chance to load more candidate blobs.

@tonistiigi
Copy link
Member

tonistiigi commented Oct 8, 2021

In LoadRemotes([]Compression) []Remotes, does []Compression mean the allowed list of compression right?

Yes, maybe with #2347 we don't need an array of compressions as every object already contains array. But we would still need []Remote.

Why I introduced TargetLayers here is that Compression option is ambiguous and LoadRemote(gzip) can return a chain like [gzip,zstd,zstd] when the caller wants [gzip,gzip,gzip].

Yes, and with the alternative logic it would just export all available chains. So when inline cache filters the one it needs should be in there.

Load and Convert won't be called anymore in ExportTo() so it possibly loses the chance to load more candidate blobs.

ExportTo calls CacheExportModeMin anyway so stopping early doesn't give us any benefit as the export step never makes new blobs anyway.

@ktock ktock marked this pull request as draft October 11, 2021 00:25
@ktock
Copy link
Collaborator Author

ktock commented Oct 11, 2021

@tonistiigi Committed the initial attempt of LoadRemotes. PTAL.

solver/types.go Outdated
// Mode defines a cache export algorithm
Mode CacheExportMode
// Session is the session group to client (for auth credentials etc)
Session session.Group
// CompressionOpt is options to specify compressions. All objects that meets
// any of the options will be cached
CompressionOpt []CompressionOpt
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't strictly need to be a slice as of now because we always pass one or zero element here. (maybe *CompressionOpt is just enough) But in the future, when we want to pass more compressions to LoadRemote this might need to be a slice.
WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eventually CompressionOpt.Type should be a slice and then in here and in LoadRemotes we possibly don't need a slice. Unless I'm missing something atm I don't think we even need a Force field (still need it as csv option). I'm fine with leaving the array conversion to the next PR though if you prefer.


func (sr *immutableRef) getRemote(ctx context.Context, createIfNeeded bool, compressionopt solver.CompressionOpt, s session.Group) (*solver.Remote, error) {
compressionType := compressionopt.Type
forceCompression := compressionopt.Force
ctx, done, err := leaseutil.WithLease(ctx, sr.cm.LeaseManager, leaseutil.MakeTemporary)
Copy link
Member

@tonistiigi tonistiigi Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lease should be moved to the caller as it is somewhat expensive and we shouldn't call it for every compression. In follow-up we should figure out if there is a way to avoid it at all when no new blobs need to be created.

remote, remotes = remotes[0], remotes[1:] // pop the first element
}
if len(opt.CompressionOpt) > 0 {
for _, r := range remotes { // record all reamaining remotes as well
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remaining (same on lin 136)

solver/types.go Outdated
// Mode defines a cache export algorithm
Mode CacheExportMode
// Session is the session group to client (for auth credentials etc)
Session session.Group
// CompressionOpt is options to specify compressions. All objects that meets
// any of the options will be cached
CompressionOpt []CompressionOpt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eventually CompressionOpt.Type should be a slice and then in here and in LoadRemotes we possibly don't need a slice. Unless I'm missing something atm I don't think we even need a Force field (still need it as csv option). I'm fine with leaving the array conversion to the next PR though if you prefer.

@ktock ktock marked this pull request as ready for review October 12, 2021 11:25
@ktock
Copy link
Collaborator Author

ktock commented Oct 12, 2021

@tonistiigi I think we can focus on the bugfix part mentioned in #2347 and #2350 then we can unblock enhancement PRs.
For fixing the current broken behaviour of the cache export, we don't need array compressions so let's keep it as non-array.

@ktock ktock marked this pull request as draft October 12, 2021 11:32
@ktock ktock marked this pull request as ready for review October 12, 2021 11:45
@tonistiigi
Copy link
Member

I'm not sure why you have reverted the LoadRemotes() in the last refactor. There can be multiple blobs matching single compression(eg. gzip) plus also see the performance comment in #2405 (review)

@ktock ktock marked this pull request as draft October 13, 2021 04:00
@ktock
Copy link
Collaborator Author

ktock commented Oct 13, 2021

@tonistiigi Fixed to have LoadRemotes and GetRemotes.
The current design of them returns []*solver.Remote with all combination of the available compression variants. WDYT?
Once the design is agreed, I'll add tests.

@tonistiigi
Copy link
Member

Current design sgtm if we agree that follow up will change compression params into array.

@ktock
Copy link
Collaborator Author

ktock commented Oct 14, 2021

@tonistiigi Added tests. The following is the overview and note about the current behaviour of GetRemotes:

GetRemotes(ctx context.Context, createIfNeeded bool, compressionopt solver.CompressionOpt, all bool, s session.Group) ([]*solver.Remote, error)

If all = false, the specified compression type is applied only to the newly created blobs unless Force == true. It always returns a single *solver.Remote on succeeded.

If all = true, GetRemotes appends all available chains that have the specified compression at the topmost blob to the returned slice. Please note that the first element is still the same as the return value of GetRemotes with all = false. Thus it's possible that the first element of the returned []*solver.Remote doesn't contain any blobs of the specified compressionopt type (because the compresion type was only applied to the newly created blobs).

Maybe we can change the behaviour of GetRemotes with all = true to always discard the first element and guarantee that all returned *solver.Remote has the compression type at the topmost blob.

follow up will change compression params into array.

I think we can change solver.CompressionOpt.Type to slice when we support the array compression options,.

@ktock ktock marked this pull request as ready for review October 14, 2021 09:59
cache/remote.go Outdated
return
}

func (sr *immutableRef) getAvailableBlobs(ctx context.Context, compressionopt solver.CompressionOpt, s session.Group, target *compression.Type) ([]*solver.Remote, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is target pointer in here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored not to use target.

cache/remote.go Outdated
if len(compressions) == 0 {
// no available compression blobs for this blob (maybe a lazy ref).
// return a single remote with the specified compression
remote, err := sr.getRemote(ctx, false, compressionopt, s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this returns nil?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored not to call getRemote here.

cache/remote.go Outdated
var parentVariants []*solver.Remote
if sr.parent != nil {
var err error
parentVariants, err = sr.parent.getAvailableBlobs(ctx, compressionopt, s, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this inefficient? computeBlobChain/getRemote already resolves parents and this does the same over again? The parents returned by getRemote are thrown away iiuc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to leverage the chain already resolved by getRemote.

solver/types.go Outdated
@@ -94,11 +95,20 @@ const (
// CacheExportOpt defines options for exporting build cache
type CacheExportOpt struct {
// Convert can convert a build result to transferable object
Convert func(context.Context, Result) (*Remote, error)
Convert func(context.Context, Result) ([]*Remote, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe smth like ResolveRemotes makes more sense as a name now as it doesn't quite make sense how you can convert something into an array. Or at least pluralize the comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to ResolveRemotes.

Co-authored-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tonistiigi tonistiigi merged commit 4b86211 into moby:master Oct 28, 2021
@ktock ktock deleted the cachecompression branch October 28, 2021 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants