
Fix cache cannot reuse lazy layers #3109

Merged · 1 commit · Feb 15, 2023

Conversation

ktock (Collaborator) commented Sep 12, 2022

Fixes docker/buildx#1306

Currently, the cache cannot reuse lazy layers that return cache.NeedsRemoteProviderError on cacheresult.cacheResultStorage.Exist().
This causes the cache to fail to reuse lazy layers, as reported in docker/buildx#1306.
This commit fixes the issue by allowing cache.NeedsRemoteProviderError during cache lookup.
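The idea of the fix, reduced to its essence, is that an existence check returning "needs a remote provider" should count as a cache hit rather than a miss. The sketch below is illustrative only, assuming a simplified stand-in type for cache.NeedsRemoteProviderError and a hypothetical exists helper; it is not BuildKit's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// NeedsRemoteProviderError is a simplified stand-in for
// cache.NeedsRemoteProviderError: the layer is lazy, and a remote
// provider must be attached before its content can be materialized.
type NeedsRemoteProviderError struct{ digest string }

func (e NeedsRemoteProviderError) Error() string {
	return "needs remote provider for " + e.digest
}

// exists mimics an existence check like cacheResultStorage.Exist: a lazy
// layer fails with NeedsRemoteProviderError even though it is usable later.
func exists(id string) error {
	switch id {
	case "lazy":
		return NeedsRemoteProviderError{digest: "sha256:abc"}
	case "local":
		return nil
	default:
		return errors.New("not found")
	}
}

// usableForCache treats "needs remote provider" as a hit rather than a
// miss, which is the essence of the change discussed in this PR.
func usableForCache(id string) bool {
	err := exists(id)
	if err == nil {
		return true
	}
	var nrpe NeedsRemoteProviderError
	return errors.As(err, &nrpe)
}

func main() {
	fmt.Println(usableForCache("local"))   // fully local layer: usable
	fmt.Println(usableForCache("lazy"))    // lazy layer: now also usable
	fmt.Println(usableForCache("missing")) // genuinely missing: not usable
}
```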

@ktock ktock marked this pull request as ready for review September 12, 2022 22:19

tonistiigi (Member) left a comment


Is this specific to stargz, or does it apply to unpulled layers as well? E.g. a cache match for merge(image(), ..).

Do we have a guarantee that we have connected the correct remote provider when the actual cache load would happen in this case?

sipsma (Collaborator) commented Sep 17, 2022

> Is this specific to stargz or for unpulled layers as well. Eg. cache match for merge(image(), ..).

Thinking out loud: not sure if a cache match on unpulled non-stargz layers is really that useful. If you need to pull the layers anyways then that's as good as a cache miss. I'm guessing that for stargz this matters because you can have a "half-pulled" layer where some of the data is present locally but you still need a remote provider for the unpulled data, is that correct @ktock?

For the change itself, it seems like the main code path where Exists gets called is CacheManager.Records, which I think only gets called in solver/edge.go when e.cacheMap is available. That in turn probably means that we could plumb the cache opts from cacheMap through ctx to Exists and verify that the caller actually has a remote provider hooked up? Maybe? It's obviously kind of convoluted, so not sure if this actually adds up. If it does make sense though then maybe that'd help guarantee the result is actually going to be loadable.

The other place Exists gets called is ReleaseUnreferenced, which we'd need to handle too.

If there's a different way of guaranteeing the record will be loadable that's simpler then I'm all for it, the above is just my first thought looking through the code paths.

ktock (Collaborator, Author) commented Sep 19, 2022

@tonistiigi @sipsma

Thank you for taking a look at this.

> I'm guessing that for stargz this matters because you can have a "half-pulled" layer where some of the data is present locally but you still need a remote provider for the unpulled data, is that correct @ktock?

Thank you for elaborating on this. Yes, that's correct.

> For the change itself, it seems like the main code path where Exists gets called is CacheManager.Records, which I think only gets called in solver/edge.go when e.cacheMap is available. That in turn probably means that we could plumb the cache opts from cacheMap through ctx to Exists and verify that the caller actually has a remote provider hooked up? Maybe? It's obviously kind of convoluted, so not sure if this actually adds up. If it does make sense though then maybe that'd help guarantee the result is actually going to be loadable.
> The other place Exists gets called is ReleaseUnreferenced, which we'd need to handle too.

Thank you for the suggestion. I fixed the patch to pass cache opts to CacheManager.Records through ctx (similar to what sharedOp.LoadCache does when calling CacheManager.Load).

tonistiigi (Member) commented

@ktock Could you also take a look at docker/buildx#1325 (comment) . Lmk if you think that is completely different.

ktock (Collaborator, Author) commented Oct 27, 2022

> @ktock Could you also take a look at docker/buildx#1325 (comment) . Lmk if you think that is completely different.

@tonistiigi @sipsma It seems that docker/buildx#1325 (comment) can be addressed separately from this PR. (Please see also #3229).
Could we move this PR forward if the patch looks fine?

@ktock ktock requested a review from tonistiigi October 31, 2022 22:49
tonistiigi (Member) left a comment


This does not mean that Records() can become a long-running call, right? It looks like it is called directly from the scheduler event loop, where blocking is not allowed.

ktock (Collaborator, Author) commented Nov 4, 2022

@tonistiigi It seems that a Records API call can reach (worker/base).Worker.LoadRef(), which calls the cache.cacheManager.Get() API. The current implementation of Get checks for the existence of descHandlers, but it doesn't appear to trigger slow operations such as unlazying a layer. It calls the snapshotter's Stat API, but that is just a metadata lookup in the bolt DB.

@ktock ktock force-pushed the reuseremotelayers branch 2 times, most recently from e9d3c6f to e198844 Compare November 29, 2022 08:32
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
ktock (Collaborator, Author) commented Jan 25, 2023

CI passed; can we move this patch forward? @tonistiigi

@ktock ktock requested a review from tonistiigi February 7, 2023 05:31

Successfully merging this pull request may close these issues:

- Cache is not used in simple Dockerfile with stargz snapshotter

4 participants