error creating zfs mount: no such file or directory #37207

Open
stephan2012 opened this issue Jun 4, 2018 · 17 comments


stephan2012 commented Jun 4, 2018

Description

docker build reports no such file or directory when /var/lib/docker is on a ZFS filesystem.

Steps to reproduce the issue:
Execute docker build when /var/lib/docker is mounted on a ZFS filesystem.

Describe the results you received:

Step 7/7 : COPY ${APP_BINARY_FILENAME} ${TOMCAT_HOME}/webapps/${APP_BINARY_FILENAME}
error creating zfs mount of kpool/docker/192a72289b6040ea928e8873c5f2695029bf856ada2c232df312a17775ddae9a to /var/lib/docker/zfs/graph/192a72289b6040ea928e8873c5f2695029bf856ada2c232df312a17775ddae9a: no such file or directory
ERROR: Job failed: error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 1

Describe the results you expected:
No errors

Additional information you deem important (e.g. issue happens only occasionally):
Intermittent error. Usually docker build succeeds after three to four retries. Probably a race condition.
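As a stopgap, the retries can be scripted. A minimal sketch (the image tag and retry count here are arbitrary, not part of the actual CI job):

#!/bin/sh
# Hypothetical retry wrapper for the intermittent failure: re-run the build
# a few times and give up if it never succeeds.
for i in 1 2 3 4; do
    docker build -t myapp:latest . && exit 0
    echo "build attempt $i failed, retrying..." >&2
done
exit 1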

# df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
kpool/docker     45G   14M   45G   1% /var/lib/docker
# rpm -q zfs
zfs-0.7.8-1.el7_4.x86_64

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:20:16 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:23:58 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 49
Server Version: 18.03.1-ce
Storage Driver: zfs
 Zpool: kpool
 Zpool Health: ONLINE
 Parent Dataset: kpool/docker
 Space Used By Parent: 3222265856
 Space Available: 48431497216
 Parent Quota: no
 Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.2.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 15.51GiB
Name: hseeckm01
ID: VE37:YIDL:4NAH:TOQR:SUED:MTR5:MZBK:ZJI6:BVQV:CGMR:TVRX:XA2X
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: http://proxy.company.com:8080
HTTPS Proxy: http://proxy.company.com:8080
No Proxy: localhost,127.0.0.1,.internal.company.com
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):

@ibm-rtvs

Issue also seen on Ubuntu 16.04

uname -a

Linux XXXX 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

df -h /app/docker

Filesystem Size Used Avail Use% Mounted on
app 6.0T 2.4T 3.6T 40% /app

apt-cache policy zfsutils-linux

zfsutils-linux:
Installed: 0.6.5.6-0ubuntu25
Candidate: 0.6.5.6-0ubuntu25
Version table:
*** 0.6.5.6-0ubuntu25 500
500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
100 /var/lib/dpkg/status
0.6.5.6-0ubuntu8 500
500 http://us.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

docker version

Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:24:56 2018
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:21 2018
OS/Arch: linux/amd64
Experimental: false

docker info

Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 58
Server Version: 18.06.1-ce
Storage Driver: zfs
Zpool: app
Zpool Health: ONLINE
Parent Dataset: app
Space Used By Parent: 2602410360832
Space Available: 3902053898240
Parent Quota: no
Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-138-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 28
Total Memory: 51.11GiB
Name: XXXXXXXXXXXX
ID: LMTP:WEWR:GLWB:GU53:B4Z3:TZHY:G4VT:MZ6R:JP3U:Y3JP:XMHB:2AUX
Docker Root Dir: /app/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
provider=generic
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

@ibm-rtvs

 [exec] Step 6/8 : COPY --from=base /config/apps /config/apps
 [exec] error creating zfs mount of app/13b0a7c2ccbb0cffd077895a33875ac5002709a1ad2346bc0abfb3946989cd6d to /app/docker/zfs/graph/13b0a7c2ccbb0cffd077895a33875ac5002709a1ad2346bc0abfb3946989cd6d: no such file or directory

@alphaDev23

Same issue. Output of docker info:

Containers: 96
Running: 0
Paused: 0
Stopped: 96
Images: 673
Server Version: 18.06.1-ce
Storage Driver: zfs
Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
Zpool Health: not available
Parent Dataset: tankssd1/docker
Space Used By Parent: 2217741455360
Space Available: 1643426701312
Parent Quota: no
Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local local-persist
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-34-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 188.8GiB
Name: st1
ID: YLV7:USHH:UUVL:XJ7Q:ARFP:QNB2:SEJG:XDZW:RICN:S6WU:RFHC:UMXZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false


bsutton commented Jul 15, 2020

I'm having the same problem.
The build had been working, then stopped working for three or four builds, and then started working again.

The build command:

docker build --progress=plain --ssh default -t myrepo/nginx:$version .

The exact error:

> [stage-1 25/25] COPY --from=builder /home/build/bin/certbot_dns_cleanup_hook /home/bin/certbot_dns_cleanup_hook:
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: failed to compute cache key: error creating zfs mount: mount rpool/var/lib/docker/ux47l87frh6rq0rpw4m0gt20o:/var/lib/docker/zfs/graph/ux47l87frh6rq0rpw4m0gt20o: no such file or directory
Unhandled exception:
docker build --progress=plain --ssh default -t myrepo/nginx:1.0.0 . 

The Dockerfile:

# syntax=docker/dockerfile:1.0.0-experimental
# builder
# compiles the dart scripts.
FROM ubuntu:18.04 as builder


RUN apt-get update
RUN apt-get install --no-install-recommends -y wget ca-certificates gnupg2 openssh-client

RUN apt-get install -y git

# install dshell
# update this line to force a download of the latest dshell release.
COPY pull-dshell.txt  /dev/nul
RUN wget https://github.com/bsutton/dshell/raw/master/bin/linux/dshell_install
RUN chmod +x dshell_install
RUN ./dshell_install
ENV PATH="${PATH}:/usr/lib/dart/bin:/root/.pub-cache/bin"

RUN dshell version


RUN mkdir -p /home/build/target
RUN mkdir -p /home/build/bin
RUN mkdir -p /home/build/lib

COPY bin/*.dart /home/build/bin/
COPY lib /home/build/lib/
COPY pubspec.yaml /home/build
# COPY pubspec.lock /home/build
# COPY .packages /home/build

# Give git access to your ssh keys
RUN mkdir -m 700 /root/.ssh; 
RUN touch -m 600 /root/.ssh/known_hosts; 
RUN ssh-keyscan bitbucket.org > /root/.ssh/known_hosts

WORKDIR /home/build

RUN ls -lar


#RUN --mount=type=ssh  pub get
RUN --mount=type=ssh  dshell compile bin/cmd_dispatcher.dart 
RUN --mount=type=ssh  dshell compile bin/certbot_dns_auth_hook.dart
RUN --mount=type=ssh  dshell compile bin/certbot_dns_cleanup_hook.dart


# CMD ["/bin/bash"]

# Final image
FROM ubuntu:18.04

RUN apt-get update


# set the timezone
ENV TZ=Australia/Melbourne
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install -y  tzdata

RUN apt-get install -y  gnupg nginx ca-certificates openssl # lastpass-cli 

RUN useradd nginx



# install certbot 

RUN apt-get install -y certbot python3-certbot-nginx
RUN apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN add-apt-repository -y ppa:certbot/certbot
RUN apt-get update

WORKDIR /

# pre-generate Diffie-Hellman key exchange parameters
RUN mkdir -p /etc/nginx/ssl/
RUN openssl dhparam -out /etc/nginx/ssl/dhparam.pem 2048

# location for storing lets-encrypt certificates
# This needs to be mapped to a persistent volume
# so the certificates persist across sessions.
RUN mkdir -p /etc/letsencrypt

# Defines the directory where you can place 'upstream' files
# that define the set of upstream servers that we proxy calls to.
# You will need to include locations files in /etc/nginx/locations
# with proxy_pass statements to send requests to these upstream servers.
# Files should be of the form <purpose.upstream>
RUN mkdir -p /etc/nginx/upstream

# Defines the directory where non-standard locations
# can be included.
# Files should be of the form <purpose.location>
RUN mkdir -p /etc/nginx/locations

# location for the .well-known folder certbot will interact with.
RUN mkdir -p /opt/letsencrypt/wwwroot

COPY nginx_config/default.conf /etc/nginx/default.conf
COPY nginx_config/nginx.conf /etc/nginx/nginx.conf


RUN mkdir -p /home
ENV HOME="/home"


# lastpass volume will be mounted here.
RUN mkdir -p /home/.lastpass
# update the following number if you have changed the cmd_dispatcher or certbotDNSAuthHook
# and need the latest compiled version included.
ENV UPDATE_CMD_DISPATCHER=6
# copy in the cmd_dispatcher
RUN mkdir -p /home/bin
COPY --from=builder /home/build/bin/cmd_dispatcher /home/bin/cmd_dispatcher
COPY --from=builder /home/build/bin/certbot_dns_auth_hook /home/bin/certbot_dns_auth_hook
COPY --from=builder /home/build/bin/certbot_dns_cleanup_hook /home/bin/certbot_dns_cleanup_hook

ENV CERTBOT_DNS_AUTH_HOOK_PATH="/home/bin/certbot_dns_auth_hook"
ENV CERTBOT_DNS_CLEANUP_HOOK_PATH="/home/bin/certbot_dns_cleanup_hook"


EXPOSE 80 443

ENTRYPOINT ["/home/bin/cmd_dispatcher"]

CMD ["start"]


docker info
Client:
 Debug Mode: false

Server:
 Containers: 178
  Running: 1
  Paused: 0
  Stopped: 177
 Images: 106
 Server Version: 19.03.8
 Storage Driver: zfs
  Zpool: rpool
  Zpool Health: ONLINE
  Parent Dataset: rpool/var/lib/docker
  Space Used By Parent: 19263926272
  Space Available: 211152093184
  Parent Quota: no
  Compression: lz4
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-40-generic
 Operating System: Ubuntu 20.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.37GiB
 Name: slayer4
 ID: JDNU:CPUL:NO37:VT6E:HXIF:26LA:E4PB:DLAY:TEAT:VJJP:JQDV:YXWZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: <me>
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support


bsutton commented Jul 16, 2020

This is now happening on every second build. If you are looking for help reproducing it, I can probably assist.

@stephan2012 (Author)

@bsutton I have not seen this issue for a while. Maybe it is worth trying with recent versions of Docker and containerd.io (the latter one probably being more important).
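On Ubuntu, that upgrade is roughly the following (a sketch using the standard package names from Docker's apt repository; adjust for your setup):

sudo apt-get update
sudo apt-get install --only-upgrade docker-ce docker-ce-cli containerd.io
sudo systemctl restart docker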


bsutton commented Jul 16, 2020

Sorry, I'm not even certain what containerd.io is (I'm new to Docker).
I have just managed to sidestep the problem by restructuring my project. I needed SSH keys, hence using BuildKit, but with some restructuring I no longer need them.


tobia commented Feb 8, 2021

@stephan2012 I can confirm that this is still happening with containerd.io 1.4.3 and docker-ce 20.10.3 (from Docker's official repo for Debian buster).

This seems to be a race condition that happens when Docker's root is on a ZFS volume and the build is multi-stage, containing a COPY --from=... step. If the source stage takes more time to build than the destination stage (and is not already in cache), this triggers the bug.

The reason it seems to work every n-th time is that the race condition is not triggered if the previous build step is already in cache. The way to reproduce this is:

  1. use a ZFS installation, meaning that data-root is on a ZFS volume (even if daemon.json does not explicitly set "storage-driver": "zfs")
  2. take a moderately complex multi-stage build, with COPY --from=... commands (any such build should trigger the bug, as long as the source stage takes more time to build than the destination stage)
  3. perform a build with --no-cache

See also moby/buildkit#1758; it is the same bug, IMHO.


bsutton commented Feb 8, 2021 via email


tobia commented Feb 8, 2021

@bsutton If anything, the sleep should go in the destination stage, so that it can wait for the source stage to be ready. But it's not reliable. Sometimes you even get a successful build, but the image does not contain some of the files! So I would advise against it.

As I posted on the other issue, moby/buildkit#1758, this is a minimal script that triggers the bug (thanks @jaen):

#!/bin/sh
cd `mktemp -d`
echo test-1 > test-1
echo test-2 > test-2
cat > Dockerfile <<EOF
FROM alpine as builder1
COPY test-1 /test-1
FROM alpine as builder2
COPY test-2 /test-2
FROM alpine
COPY --from=builder1 /test-1 /stuff/test-1
COPY --from=builder2 /test-2 /stuff/test-2
EOF
TAR=`mktemp`
tar -cf $TAR .
env DOCKER_BUILDKIT=1 docker build - < $TAR

I don't know why the TAR step triggers it more easily, but it does.

Sometimes you get this error:

[+] Building 0.8s (5/8)                                                                             
 => [internal] load remote build context                                                       0.0s
 => copy /context /                                                                            0.1s
 => [internal] load metadata for docker.io/library/alpine:latest                               0.5s
 => [builder2 1/2] FROM docker.io/library/alpine@sha256:08d6ca16c60fe7490c03d10dc339d9fd8ea67  0.0s
 => ERROR [builder2 2/2] COPY test-2 /test-2                                                   0.0s
------
 > [builder2 2/2] COPY test-2 /test-2:
------
failed to compute cache key: error creating zfs mount: mount tank/docker/x4pmhzudwpq49o1f2cg24mec8:/tank/docker/zfs/graph/x4pmhzudwpq49o1f2cg24mec8: no such file or directory

Other times you get this:

[+] Building 0.2s (2/2) FINISHED                                                                    
 => [internal] load remote build context                                                       0.0s
 => copy /context /                                                                            0.1s
failed to solve with frontend dockerfile.v0: failed to read dockerfile: open /tank/docker/tmp/buildkit-mount210258653/Dockerfile: no such file or directory


tobia commented Feb 10, 2021

PS: the only workaround I found for this issue is not to use the zfs storage driver at all (explicitly set "storage-driver": "aufs" in /etc/docker/daemon.json). Of course, if you want to do this on an existing server, you will have to back up your volumes, destroy your data root, and re-create it.
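Concretely, that amounts to something like this (a sketch; back up first, and note the driver value must be one your kernel actually supports):

# Sketch: point dockerd at a non-zfs storage driver and restart it.
# Existing images and containers created under the zfs driver will not be
# visible afterwards, hence the backup/re-create step mentioned above.
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "aufs"
}
EOF
systemctl restart docker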

@sebastianwebber

I'm having the same issue. Does anyone know how I can set up a dev environment to work on a PR for this?

I identified that the error is thrown in this file, but I have no idea how to debug this =/

Any tips?


vespian commented Jul 28, 2021

+1

I am going to move my docker mount onto a zvol + zfs + overlay. This is not the only bug I experienced :(
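For reference, that kind of migration looks roughly like this (a sketch, not a tested recipe; pool/dataset names, the size, and the choice of ext4 on the zvol are placeholders):

# Sketch: carve out a zvol, put a conventional filesystem on it, and mount it
# at the Docker root so the overlay2 driver can be used instead of zfs.
systemctl stop docker
zfs create -V 100G rpool/docker-vol
mkfs.ext4 /dev/zvol/rpool/docker-vol
mv /var/lib/docker /var/lib/docker.zfs-backup
mkdir /var/lib/docker
mount /dev/zvol/rpool/docker-vol /var/lib/docker
# then set "storage-driver": "overlay2" in /etc/docker/daemon.json
systemctl start docker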

@ritchie-spinlock

+1

Also experiencing this issue.

Running Arch Linux with kernel version 6.0.10.

Output of df -h /var/lib/docker

Filesystem            Size  Used Avail Use% Mounted on
zroot/var/lib/docker  664G   30M  664G   1% /var/lib/docker

Output of docker version

Client:
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.19.2
 Git commit:        baeda1f82a
 Built:             Thu Oct 27 21:30:31 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.19.2
  Git commit:       3056208812
  Built:            Thu Oct 27 21:29:34 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.10
  GitCommit:        770bd0108c32f3fb5c73ae1264f7e503fe7b2661.m
 runc:
  Version:          1.1.4
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0



@martinpal

The fix discussed in moby/buildkit#1758 works. Backporting to docker.io 20.10.12 is straightforward.


heyarne commented Dec 18, 2023

I'm hitting this error in Docker 24.0.5; is this a regression? Shouldn't this already be fixed?

# sudo docker info
Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.0
    Path:     /nix/store/fx7wfc4k0a60alm3b40qrcmxj7w0kgl1-docker-plugins/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.23.1
    Path:     /nix/store/fx7wfc4k0a60alm3b40qrcmxj7w0kgl1-docker-plugins/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 5
  Running: 1
  Paused: 0
  Stopped: 4
 Images: 3
 Server Version: 24.0.5
 Storage Driver: zfs
  Zpool: rpool
  Zpool Health: ONLINE
  Parent Dataset: rpool/nixos/var/lib
  Space Used By Parent: 19275714560
  Space Available: 91500867584
  Parent Quota: no
  Compression: zstd
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: v1.7.9
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.6.4
 Operating System: NixOS 24.05 (Uakari)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.24GiB
 Name: miso
 ID: edfcbd95-aec6-4f95-8b71-5fc9d05d9566
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true
