COPY --from consistently fails with ZFS storage driver #1758
It would be awesome to see this fixed, because BuildKit is otherwise much faster than the normal docker build engine when using the zfs storage driver. |
Please provide a reproducible testcase. |
Edit: sorry, I was following a link from Stack Overflow that talked about the copy-range issue and referenced this ticket, but this appears distinct. I opened a new issue on moby about my problem, which occurs when running:
`docker inspect crossbario/crossbar:latest`
|
@tonistiigi I think I have a fairly minimal reproduction; a script follows:

```bash
#!/usr/bin/env bash
CONTEXT_TAR_PATH="${PWD}/context.tar"
if [ ! -f "${CONTEXT_TAR_PATH}" ]; then
  CONTEXT_TEMP="$(mktemp -d)"
  echo "test-1" > "${CONTEXT_TEMP}/test-1"
  echo "test-2" > "${CONTEXT_TEMP}/test-2"
  cat > "${CONTEXT_TEMP}/Dockerfile" <<-EODOCKERFILE
	FROM alpine as builder1
	COPY test-1 /test-1
	FROM alpine as builder2
	COPY test-2 /test-2
	FROM alpine
	COPY --from=builder1 /test-1 /stuff/test-1
	COPY --from=builder2 /test-2 /stuff/test-2
	EODOCKERFILE
  (
    cd "${CONTEXT_TEMP}"
    tar -cf "${CONTEXT_TAR_PATH}" .
  )
fi
cat "${CONTEXT_TAR_PATH}" | env DOCKER_BUILDKIT=1 docker build -t zfs-issue -
```

This fails fairly reliably for me if I run
Any idea how hard this could be to fix? I'm using ZFS and I need to use buildkit with it, so it's a fairly blocking issue for me. |
Sorry I didn't circle back to this, but the above comment provides a reproducible example that is in line with the experience I opened this issue for. |
This is still happening with containerd.io 1.4.3 and docker-ce 20.10.3. It seems to be a race condition that happens when Docker's root is on a ZFS volume and the build is multi-stage, containing a `COPY --from`. @tonistiigi This script reliably reproduces it, without the need for system prune or similar workarounds (thanks @jaen):

```sh
#!/bin/sh
cd "$(mktemp -d)"
echo test-1 > test-1
echo test-2 > test-2
cat > Dockerfile <<EOF
FROM alpine as builder1
COPY test-1 /test-1
FROM alpine as builder2
COPY test-2 /test-2
FROM alpine
COPY --from=builder1 /test-1 /stuff/test-1
COPY --from=builder2 /test-2 /stuff/test-2
EOF
TAR="$(mktemp)"
tar -cf "$TAR" .
env DOCKER_BUILDKIT=1 docker build - < "$TAR"
```

I don't know why the tar step triggers the bug more easily, but it does. Sometimes you get this error:
Other times you get this:
PS. I think this is the same issue as moby/moby#37207 |
The only workaround I found for this issue is to not use the zfs storage driver at all (explicitly set a different storage driver). |
@tonistiigi now that there is a reproducible case that multiple people have corroborated, can we please give this a higher priority? It seriously hinders usage of buildkit on ZFS. |
@jaen Sorry, zfs is not at the top of the priorities atm for us so can't give ETA. It has always been a community maintained driver. Maybe https://github.com/moby/moby/blob/master/daemon/graphdriver/zfs/MAINTAINERS have time to look at it meanwhile. |
@tonistiigi sure, while somewhat disappointing for me personally, this is at least a clear statement of current priorities, so I know what to expect. Thanks for the update! @Mic92 @baloo, sorry for the ats, but do any of you have an inkling what's going on here? I'm afraid my ZFS/docker internals/go–fu isn't good enough to even guess at the cause (and possibly not good enough to fix it even if you point me in the right direction, but it's worth a shot). |
My knowledge of both Docker and ZFS internals is exactly zero, but from an outsider's point of view it seems very likely to be a race condition. Meaning: one thread (or process, or green thread, I don't know) is building a layer called X for the previous stage; another thread needs layer X as the source of a COPY command. Somehow, with the Docker ZFS driver, the second thread is signaled ("wake up, layer X is ready") before layer X is actually committed to disk, causing a file-not-found error. Considering the relative maturity and usage of the Linux ZFS driver versus the Docker ZFS driver, I would wager the issue is in the latter.

If I knew where to look, I would strengthen the locks that are involved. The issue is likely on the committing side, meaning the code that signals "layer is ready" is doing so before all underlying operations have been flushed, so maybe it needs an additional lock or system call. Maybe there's a ZFS system call that is usually "fast enough" but is not technically synchronous, so an additional call is needed to wait until it completes.

The easiest way to "work around" this bug, without having to identify the actual culprit system call, would be to add a tiny busy loop right after a ZFS layer is supposedly created (but before telling the rest of Docker that it has been created) that tries to access the new layer on disk and waits until that actually succeeds. Where are the sources of the Docker ZFS driver, anyway? |
Mhm. Never used buildkit. The same script seems to work with both docker and podman. |
But I can reproduce it with the buildkit option:
|
This is where the code lives: https://github.com/moby/moby/blob/master/daemon/graphdriver/zfs/zfs.go |
This is where the error is thrown: https://github.com/moby/moby/blob/01ae718aef64ff684501a8702f587af0ec034e71/daemon/graphdriver/zfs/zfs.go#L390 Here is one possible location where it could delete the mountpoint: https://github.com/moby/moby/blob/01ae718aef64ff684501a8702f587af0ec034e71/daemon/graphdriver/zfs/zfs.go#L417 |
I have never used buildkit and don't know anything about its architecture, but does it somehow call Get() and Put() from different processes? Otherwise I cannot explain how it would delete the mountpoint before mounting a different one. |
No, everything is internal to dockerd |
We hit this issue earlier this week. We switched our buildx builder from docker (with the zfs storage driver) to docker-container (which points to the local docker, but buildkit seems to use different storage within the container, which is kind of expected), and our build times skyrocketed: on multiple large images, each build instruction adds tens of seconds instead of sub-seconds, even a COPY of a small entrypoint file or a WORKDIR definition, resulting in hours of build time instead of the few minutes we had with zfs storage.

Anyway, we found a dirty/ugly workaround that lets us stay with the default docker builder on zfs. We're not sure yet how reliable it is, as timing is likely of the essence here, but we have not been able to reproduce the error with this approach yet. Basically, for each stage that will be copied from, we run docker build with --target=stage (and no --output specified) before running the final build without any --target.
We need to do more cleanup at the end to get rid of the temporary images created for those builder stages, but at least it gives us our favorite zfs driver back. Really looking forward to this defect being fixed, though. |
With BuildKit, it is possible for Get() and Put() to be called concurrently from different goroutines. Briefly looking at the zfs code, I could maybe see how there can be a race condition related to the mount directory being missing. Assume that there are two threads, and some layer with ID 824f183 has been mounted once by thread 1. Now, thread 1 wants to unmount it, and thread 2 wants to mount that same layer. There is a possible execution sequence that will result in the above error.
In this sequence, the only mutex/lock present is in the refcount increment/decrement (this comes from code outside the ZFS file); the rest has no mutexes. |
@saiarcot895 If you compare this with overlay/aufs then afaics there the reference counter logic is always behind an id-based locker. |
@tonistiigi I see locks on the complete Get()/Put() paths there. |
So does this mean that I should try replicating locks from that commit in the ZFS driver and see if it fixes the issue? |
I believe so, yes. I was working on some changes to the btrfs driver (porting patches that another person posted in a merge request to the current version of the code), and hit a similar issue when using buildkit to build the Docker build environment (with the host Docker running with my changes). After I added in the locks that the overlay driver has into the btrfs driver, that issue went away. |
Ok, so I have built docker with this commit, jaen/moby@a4ab9f5, and it seems to fix the issue; I can't reproduce it using the script I posted above anymore. |
Feel free to open a pull request with your changes to have it reviewed 👍 |
Trying to build Docker images with buildkit using a ZFS-backed storage was unreliable due to apparent race condition between adding and removing layers to the storage (see: moby/buildkit#1758). The issue describes a similar problem with the BTRFS driver that was resolved by adding additional locking based on the scheme used in the OverlayFS driver. This commit replicates the scheme to the ZFS driver which makes the problem as reported in the issue stop happening. Signed-off-by: Tomasz Mańko <hi@jaen.me>
Made a PR with it, @saiarcot895 thanks a lot for pointing me in the right direction! |
I'm potentially in a good position to test this. I've been testing a deployment using Arch Linux on ZFS, and never hit this until recently (moby/moby#41866), which it seems this would fix. Sorry to ask for hand-holding, but what's the easiest way to get this patch on Arch for testing? I'm consistently getting the 'failed to copy files: copy file range failed: invalid argument' error when trying to build or run this container: https://github.com/dperson/samba. Strangely, I can build the container after removing the VOLUMES section at the bottom of the Dockerfile. |
@oramirite I've since moved to NixOS so can't easily test this works for sure, but I think you would just want to use |
I can reproduce this at will by running certain containers. For example:
I had trouble building with the patch, so I was not able to test the fix. |
FWIW https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=docker-git is very out of date and does not build as-is. This is not complete, may be broken, and required me to manually run |
Is there a chance that the merged fix will make it into a docker-ce release any time soon? Will it be in the 20.10.x series? Some future version? |
FWIW I don't think that fix… well, fixes all the issues with the ZFS driver. It's better, but sometimes I still run into errors saying that a snapshot doesn't exist (for example when building or even pulling images). I guess that could be considered a separate issue, since the reproduction isn't failing anymore. |
I installed the 22.06 beta build from the APT test channel about a week ago and have not seen this problem since then. It was happening roughly every other build otherwise. The fix from moby/moby#43136 is in 22.06 beta, but not in the 20.10.x series. |
This issue is still present in Docker 20.10.21 on Debian 11 when my root drive is zfs.
Oddly it only showed up today; no idea why previous builds didn't have the problem. Though I did just reinstall over Thanksgiving and haven't run many builds locally since... my previous install was on ext4 (I think). My workaround is switching to the devicemapper driver. I tried overlay2 first, but docker says it isn't supported? devicemapper seems to work fine, so I'm good for now. Hopefully this will eventually get fixed. |
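For reference, that workaround amounts to pinning a non-zfs storage driver in /etc/docker/daemon.json. This is just a sketch; which driver actually works depends on your setup, and as noted above, overlay2 may be rejected on a ZFS-backed root:

```json
{
  "storage-driver": "devicemapper"
}
```

The daemon has to be restarted after the change, and switching storage drivers hides any images built under the previous driver.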
To be clear, it's fixed, but only in the 22.x builds. |
@mschout fyi, 22.x is now 23, see moby/moby#44589 for more info. |
Apologies @crazy-max. 23.x is also immune. |
Given that 23.x is out now, I feel like this can be closed. I have not seen it at all since upgrading. |
Yup, let's go ahead and close this one 👍 |
Running any Dockerfile that uses `COPY --from`, when the previous layer wasn't already in the cache, appears to cause the following issue. Other info:
Please let me know if I can provide any further helpful info.