Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solver: Release unused refs in LoadWithParents #3815

Merged
merged 1 commit into from
Apr 25, 2023

Conversation

sipsma
Copy link
Collaborator

@sipsma sipsma commented Apr 24, 2023

Before this any refs that were loaded but not put into results would be leaked.


cc @tonistiigi noticed in certain load tests I was running w/ Dagger that cache refs were leaking, this fixes it. I am still trying to understand the exact circumstances it arises in.

I know for sure it only happens when using a cacheManager constructed from a remote cache import. And I also think it might happen when there are multiple results that end up with the same layers (i.e. two image refs resolve to the same set of layers). However, the context in which this was triggered involved cache manifest json constructed by custom code, not by the normal buildkit code, so I don't know if it's possible to trigger this using a cache manifest generated by buildkit upstream.

Either way, I think this fix is desirable upstream since we should never leave refs unreleased anyways.


EDIT: @tonistiigi I was able to reproduce this same bug purely with Buildkit, see the newly added integration test, which fails without this change present.

The problem happens when a remote cache import has multiple records with the same layers. In this case the resultID is calculated as a hash of the layer digests, so both records are associated with the same resultID. This causes them both to be loaded in LoadWithParents in the remote cache result storage. filterResults then only chooses one of them to use for the returned results list, but it never releases the refs for the others it didn't end up choosing.

Before this any refs that were loaded but not put into results would be
leaked.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@@ -217,6 +217,11 @@ func (c *cacheManager) LoadWithParents(ctx context.Context, rec *CacheRecord) ([
r.Release(context.TODO())
}
}
for _, r := range m {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in the defer of filterResults?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterResults is recursive so it seemed complicated to try to update it relative to just emptying it out here. Figured it made sense still since m isn't constructed by filterResults

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is also inefficiency with LoadWithParents. IIRC it will load parents recursively multiple times. This will not cause extra cache loads but will cause too many boltdb accesses. Just FYI if you are looking into this area and have ideas.

@sipsma sipsma merged commit 20af103 into moby:master Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants