Add locking to the ZFS driver #43136
Conversation
Confirmed it solves the issue here
Would love to see this merged. This problem happens in some other areas of building too, not just for COPY.
Can you switch this to use https://github.com/moby/locker instead?
Otherwise this LGTM
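For context, github.com/moby/locker provides exactly this kind of name-keyed lock (`locker.New()`, then `Lock(name)` / `Unlock(name)`), so two operations on different layer IDs don't serialize each other. Below is a minimal stdlib-only sketch of the idea; the real package additionally reference-counts entries so idle locks can be deleted, and `"layer-abc123"` is a made-up key:

```go
// Minimal sketch of a name-keyed locker in the spirit of
// github.com/moby/locker: callers Lock/Unlock by string key.
// (The real package also reference-counts entries so they can be
// removed when idle; that bookkeeping is omitted here.)
package main

import (
	"fmt"
	"sync"
)

type Locker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func New() *Locker {
	return &Locker{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for name, creating it on first use.
func (l *Locker) get(name string) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	m, ok := l.locks[name]
	if !ok {
		m = &sync.Mutex{}
		l.locks[name] = m
	}
	return m
}

func (l *Locker) Lock(name string)   { l.get(name).Lock() }
func (l *Locker) Unlock(name string) { l.get(name).Unlock() }

// run has 100 goroutines increment a counter under the same named lock.
func run() int {
	l := New()
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Lock("layer-abc123") // hypothetical layer ID
			count++
			l.Unlock("layer-abc123")
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(run()) // 100
}
```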
Sure, I was referring to the code from fc1cf19 and didn't notice that the locking mechanism is now imported differently. I'll try to update this and re-test today/tomorrow.
@cpuguy83 also, any suggestion how to test this, so it doesn't regress? I'm not very familiar with the Go ecosystem or how this codebase manages its tests, so any pointers are welcome. If it's non-trivial, I could also do it as a follow-up PR, so this isn't held up any longer.
@jaen Testing race conditions reliably is rarely easy.
@jaen would you be able to address #43136 (review)? If that's addressed I think we should be able to merge this.
Trying to build Docker images with buildkit on ZFS-backed storage was unreliable due to an apparent race condition between adding layers to and removing layers from the storage (see: moby/buildkit#1758). The issue describes a similar problem with the BTRFS driver that was resolved by adding additional locking based on the scheme used in the OverlayFS driver. This commit replicates that scheme in the ZFS driver, which stops the problem reported in the issue from occurring. Signed-off-by: Tomasz Mańko <hi@jaen.me>
@cpuguy83 right, my point is I'm not really sure how to "test" this other than running my reproduction script from the issue thread, and I don't think shelling out to a script against a live Docker daemon would be an acceptable test. If you could show me an existing example of a test that drives a Docker daemon from Go code, I could figure out how to translate this; otherwise I have no idea how to do it. I think it would be useful to test this, even in a follow-up PR, if there's an acceptable way to write such a test in the Docker test suite, so if I could be pointed to an existing test doing similar things, I'll be happy to implement it when I have some time to spare.

@thaJeztah sorry, I didn't have time to do this since the review comment was left; it should be using the new import path now. Verified to still work.
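One common pattern for this in Go is a stress test run under the race detector (`go test -race` / `go run -race`): hammer the contended path from many goroutines and let the detector flag unsynchronized access. A self-contained sketch with the driver stubbed out (`fakeDriver` and its methods are invented for illustration, not the real graphdriver API):

```go
// Hypothetical stress test for the locking change: create and remove
// layers concurrently from many goroutines. Run under the race
// detector; without the mutex, the map accesses below are flagged as
// a data race. ZFS itself is stubbed out with a map.
package main

import (
	"fmt"
	"sync"
)

type fakeDriver struct {
	mu       sync.Mutex      // stands in for the per-ID locking added in the PR
	datasets map[string]bool // stands in for real ZFS datasets
}

func (d *fakeDriver) Create(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.datasets[id] = true
}

func (d *fakeDriver) Remove(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.datasets, id)
}

// stress creates and removes 50 layers concurrently and returns how
// many datasets remain; every Create is paired with a Remove, so the
// expected result is 0.
func stress() int {
	d := &fakeDriver{datasets: make(map[string]bool)}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			id := fmt.Sprintf("layer-%d", n)
			d.Create(id)
			d.Remove(id)
		}(i)
	}
	wg.Wait()
	return len(d.datasets)
}

func main() {
	fmt.Println("datasets remaining:", stress()) // datasets remaining: 0
}
```

Deleting the `d.mu.Lock()`/`Unlock()` calls and re-running with `-race` demonstrates what the detector catches.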
LGTM, thanks!
Let's get this one in; I'll take @cpuguy83's earlier LGTM (as the only thing changed since was the import path).
The patch incorporates changes merged into the upstream in this PR: moby/moby#43136
What I did
Fixed a race condition in the ZFS driver when building images with buildkit.
How I did it
The relevant issue (COPY --from consistently fails with ZFS storage driver buildkit#1758)
describes a similar problem with the BTRFS driver that was resolved by adding
additional locking based on the scheme used in the OverlayFS driver. This PR
replicates that scheme in the ZFS driver, which stops the problem reported
in the issue from occurring. I'm not sure how optimal this is, but at the very least
it appears to not be incorrect.
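The scheme borrowed from the overlay driver boils down to: every Get/Put for a layer ID first takes a lock keyed on that ID, so a concurrent Put can no longer race a Get for the same dataset. A stdlib-only sketch of that shape; all names here (`Driver`, `layerState`, the mountpoint path) are illustrative, and the real code uses github.com/moby/locker plus the actual ZFS mount/unmount calls:

```go
// Illustrative sketch of per-ID locking around mount reference
// counting, in the style of the overlay driver's scheme. Not the
// real moby code: the driver, state struct and path are stand-ins.
package main

import (
	"fmt"
	"sync"
)

type layerState struct {
	sync.Mutex
	mounts int // active references to this layer's mountpoint
}

type Driver struct {
	mu     sync.Mutex
	layers map[string]*layerState
}

func NewDriver() *Driver {
	return &Driver{layers: make(map[string]*layerState)}
}

// layer returns the per-ID state, creating it on first use.
func (d *Driver) layer(id string) *layerState {
	d.mu.Lock()
	defer d.mu.Unlock()
	st, ok := d.layers[id]
	if !ok {
		st = &layerState{}
		d.layers[id] = st
	}
	return st
}

// Get takes the per-ID lock, bumps the refcount, and returns the
// mountpoint; the first reference is where the real driver would
// mount the ZFS dataset.
func (d *Driver) Get(id string) string {
	st := d.layer(id)
	st.Lock()
	defer st.Unlock()
	st.mounts++
	if st.mounts == 1 {
		// first reference: mount the dataset here in the real driver
	}
	return "/var/lib/docker/zfs/graph/" + id // hypothetical path
}

// Put releases one reference under the same per-ID lock, unmounting
// when none remain.
func (d *Driver) Put(id string) {
	st := d.layer(id)
	st.Lock()
	defer st.Unlock()
	st.mounts--
	if st.mounts == 0 {
		// last reference: unmount the dataset here in the real driver
	}
}

func main() {
	d := NewDriver()
	fmt.Println(d.Get("abc")) // /var/lib/docker/zfs/graph/abc
	d.Put("abc")
}
```

Because both Get and Put serialize on the same per-ID lock, a remove can never observe a half-mounted layer, which is the failure mode the issue describes.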
How to verify it
There's a bash script to reproduce the problem in the aforementioned issue.
Without the fix the problem occurs (provided the build isn't already
cached); with the fix it stops happening. A real test would be useful so
this doesn't regress, but I have no idea how to go about writing one.
Description for the changelog
Fix a race condition when using buildkit on a ZFS-backed Docker daemon.
A picture of a cute animal (not mandatory but encouraged)
![photo5972299493246351105](https://user-images.githubusercontent.com/227236/148763610-8dfb2a0c-a66d-4a63-8215-04f5e0229c0e.jpg)
‘Developer unable to believe he finally has a fix for the Docker ZFS issue that had been plaguing him for months’, 2022 (colorised, private collection)