Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locking to the ZFS driver #43136

Merged
merged 1 commit into from
Mar 7, 2022
Merged

Add locking to the ZFS driver #43136

merged 1 commit into from
Mar 7, 2022

Conversation

jaen
Copy link
Contributor

@jaen jaen commented Jan 10, 2022

  • What I did
    Fixed an issue with the ZFS driver race condition when building images with buildkit.

  • How I did it
    The relevant issue (COPY --from consistently fails with ZFS storage driver buildkit#1758)
    describes a similar problem with the BTRFS driver that was resolved by adding
    additional locking based on the scheme used in the OverlayFS driver. This PR
    replicates the scheme to the ZFS driver which makes the problem as reported
    in the issue stop happening. I'm not sure how optimal this is, but at the very least
    it appears to not be incorrect.

  • How to verify it
    There's a bash script to reproduce the problem in the aforementioned issue.
    Without the fix the problem will be happening (unless the build is not already
    cached), with the fix it stops happening. Maybe a real test would be useful for
    this so no regression happens, but no idea how to go about it.

  • Description for the changelog
    Fix a race condition when using buildkit on a ZFS-backed Docker daemon.

  • A picture of a cute animal (not mandatory but encouraged)
    photo5972299493246351105
    ‘Developer unable to believe he finally has a fix for the Docker ZFS issue that had been plaguing him for months’, 2022 (colorised, private collection)

@3nprob
Copy link

3nprob commented Feb 28, 2022

Confirmed it solves the issue here

@kmcmurtrieca
Copy link

Would love to see this merged. This problem happens in some other areas of building too, not just for COPY.

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you switch this to use https://github.com/moby/locker instead?

Otherwise this LGTM

@jaen
Copy link
Contributor Author

jaen commented Mar 1, 2022

Sure, I was referring to the code from fc1cf19 and didn't notice that the locking mechanism is now imported differently. I'll try to update this and re-test today/tomorrow.

@jaen
Copy link
Contributor Author

jaen commented Mar 1, 2022

@cpuguy83 also, any suggestion how to test this, so it doesn't regress? I'm not very familiar with the go ecosystem or how this codebase manages it's tests, so any pointers are welcome. If it's non-trivial, I could also do it as a follow-up PR, so it's not held up more.

@cpuguy83
Copy link
Member

cpuguy83 commented Mar 1, 2022

@jaen Testing race conditions relaibly is rarely easy.
If you can setup a unit test that 100% always exploits the issue that would be best.
Otherwise it is fair, particularly for this case, to just have the fix.

@thaJeztah
Copy link
Member

@jaen would you be able to address #43136 (review)? If that's addressed I think we should be able to merge this.

Can you switch this to use https://github.com/moby/locker instead?

Otherwise this LGTM

@thaJeztah thaJeztah added this to the 21.xx milestone Mar 5, 2022
Trying to build Docker images with buildkit using a ZFS-backed storage
was unreliable due to apparent race condition between adding and
removing layers to the storage (see: moby/buildkit#1758).
The issue describes a similar problem with the BTRFS driver that was
resolved by adding additional locking based on the scheme used in the
OverlayFS driver. This commit replicates the scheme to the ZFS driver
which makes the problem as reported in the issue stop happening.

Signed-off-by: Tomasz Mańko <hi@jaen.me>
@jaen
Copy link
Contributor Author

jaen commented Mar 6, 2022

@cpuguy83 right, my point is I'm not really sure how to "test" this other than running my reproduction script from the issue thread. And I don't think shelling out to a script using a live docker daemon would be an acceptable test. If you could show me an existing example of a test which drives a docker daemon from go code then I could figure out how to translate this, but otherwise I have no idea how to do it. I think it would be useful to test this, even in a follow-up PR, if there's an acceptable way to write this test in the Docker test suite, so if I could be pointed into an existing test doing similar things, then I'll be happy to implement it when I have some time to spare.

@thaJeztah sorry, didn't have time to do this since the review comment was left, it should be using the new import path now. Verified to still work.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@thaJeztah
Copy link
Member

Let's get this on in; I'll take @cpuguy83's earlier LGTM (as the only thing changed since was the import path to github.com/moby/locker)

@thaJeztah thaJeztah merged commit 327699c into moby:master Mar 7, 2022
amaxine pushed a commit to NixOS/nixpkgs that referenced this pull request Mar 20, 2022
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136
blitz pushed a commit to blitz/nixpkgs that referenced this pull request Apr 29, 2022
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)
awake-bot pushed a commit to awakesecurity/nixpkgs that referenced this pull request Jul 11, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
jsoo1 added a commit to awakesecurity/nixpkgs that referenced this pull request Jul 11, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
jsoo1 added a commit to awakesecurity/nixpkgs that referenced this pull request Jul 12, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
jsoo1 added a commit to awakesecurity/nixpkgs that referenced this pull request Jul 13, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
awake-bot pushed a commit to awakesecurity/nixpkgs that referenced this pull request Jul 20, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
jsoo1 added a commit to awakesecurity/nixpkgs that referenced this pull request Jul 21, 2023
The patch incorporates changes merged into the upstream in this PR:
moby/moby#43136

(cherry picked from commit 3d25f04)

# Conflicts:
#	pkgs/applications/virtualization/docker/default.nix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants