docker messes up /etc symlink in an imported filesystem #42706

Open
clime opened this issue Aug 2, 2021 · 4 comments
Labels
area/builder, area/runtime, kind/bug, version/20.10

Comments

@clime

clime commented Aug 2, 2021

Please see the reproducer in the image below (the second screen on the right is a continuation of the first one on the left). It seems docker cannot cope with /etc being a symlink to somewhere in an imported filesystem (e.g. /etc -> media/etc as in the reproducer). At some point it replaces the symlink with a normal directory containing a few generated files (hosts, hostname, resolv.conf, mtab).

[Image: strange-docker-etc-stuff-3]

(Please forgive the embarrassing permission problems along the way; I don't think they affected anything.)


Docker version 20.10.6, build 370c289 (tried also Docker version 20.10.7, build f0df350 with the same result)


More info:

$ docker version
Client:
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.16
 Git commit:        370c289
 Built:             Tue Apr 20 22:03:35 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16
  Git commit:       8728dd2
  Built:            Tue Apr 20 00:00:00 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.0~rc.1
  GitCommit:        
 runc:
  Version:          1.0.0-rc95
  GitCommit:        4c62ef789fd7a2963bf61ffbf421ce9646063648
 docker-init:
  Version:          0.19.0
  GitCommit:
$ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 23
  Running: 2
  Paused: 0
  Stopped: 21
 Images: 54
 Server Version: 20.10.6
 Storage Driver: btrfs
  Build Version: Btrfs v5.11.1 
  Library Version: 102
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version: 
 runc version: 4c62ef789fd7a2963bf61ffbf421ce9646063648
 init version: 
 Security Options:
  selinux
 Kernel Version: 5.12.15-300.fc34.x86_64
 Operating System: Fedora 34 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 15.07GiB
 Name: den.exe
 ID: 7SQ3:BJTV:NNJZ:374D:SGJL:FIB2:ECAZ:KWAJ:ARAZ:HAZ2:G5SE:PVDT
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true
@thaJeztah
Member

Thanks for reporting. Could you also add the full output of docker version and docker info to your description?

What I think happens here is that the image may have the symlink, but when a container is started from the image, the symlink is masked by the mounts set up for the container's networking:

func (container *Container) NetworkMounts() []Mount {

Each container (by default) gets a generated /etc/resolv.conf, /etc/hostname, and /etc/hosts. Those files are created on the host and bind-mounted into the container. If I'm not mistaken, they're bind-mounted for three reasons: to make sure that runtime configuration is not persisted in the container's filesystem, to allow updating them dynamically, and to keep these files writable even if the container is started with --read-only:

docker run --rm alpine sh -c 'mount | grep etc'
/dev/vda1 on /etc/resolv.conf type ext4 (rw,relatime)
/dev/vda1 on /etc/hostname type ext4 (rw,relatime)
/dev/vda1 on /etc/hosts type ext4 (rw,relatime)
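
To pick the /etc mounts out of a longer `mount` listing, a small filter can help. The sketch below is only an illustration: the helper name `etc_mount_targets` is made up here, the first three sample lines are copied from the output above, and the `proc` line is an extra line added to show that unrelated mounts are skipped.

```shell
# List the mount targets under /etc from `mount`-style output read on stdin.
# Expected line shape: "<source> on <target> type <fstype> (<options>)".
etc_mount_targets() {
  awk '$2 == "on" && $3 ~ /^\/etc\// { print $3 }'
}

# Sample input so the sketch runs without Docker. The first three lines come
# from the output above; the proc line is an added, unrelated example.
sample='/dev/vda1 on /etc/resolv.conf type ext4 (rw,relatime)
/dev/vda1 on /etc/hostname type ext4 (rw,relatime)
/dev/vda1 on /etc/hosts type ext4 (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)'

printf '%s\n' "$sample" | etc_mount_targets
```

With a running daemon, the same filter could be fed from a container, e.g. `docker run --rm alpine mount | etc_mount_targets`.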

The NetworkMounts() function linked above is responsible for defining those mounts in the container's configuration; the actual mounting is handled by runc, which creates the mountpoint (if missing); https://github.com/opencontainers/runc/blob/5547b5774f71f75a088e7432fa961778750a0fbd/libcontainer/rootfs_linux.go#L443, https://github.com/opencontainers/runc/blob/5547b5774f71f75a088e7432fa961778750a0fbd/libcontainer/rootfs_linux.go#L953

That said, I'd somewhat expect symlinks to be resolved when creating those mounts. Doing a quick test;

Create an image that has a symlinked directory /destination -> /actual-destination

docker build -t foo -<<EOF
FROM alpine
RUN mkdir /actual-destination && ln -s /actual-destination /destination
EOF

Run a container that bind-mounts a file within /destination/ inside the container. Looking at the output, the target is resolved before the mount is made, so the file is mounted at /actual-destination/somefile.sh:

docker run --rm -v $(pwd)/somefile.txt:/destination/somefile.sh foo sh -c 'ls -l /destination /destination/ /actual-destination/'
lrwxrwxrwx    1 root     root            19 Aug  3 09:42 /destination -> /actual-destination

/actual-destination/:
total 4
-rwxr-xr-x    1 root     root          1236 Jul 29 23:00 somefile.sh

/destination/:
total 4
-rwxr-xr-x    1 root     root          1236 Jul 29 23:00 somefile.sh
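
For comparison, ordinary path lookup on a host directory behaves the same way: a file created through a symlinked directory lands in the symlink's target. The sketch below involves no Docker and no bind mount, so it says nothing about how runc resolves mount destinations; it only shows the baseline behaviour that the test above matches. The directory names mirror the test image, but the symlink is relative here (an assumption made so the example stays inside a temporary directory).

```shell
# Host-only illustration: no Docker, no bind mount, just path lookup.
work=$(mktemp -d)
mkdir "$work/actual-destination"
ln -s actual-destination "$work/destination"

# Create a file through the symlinked path ...
echo 'hello' > "$work/destination/somefile.sh"

# ... and observe that it lands in the directory the symlink points to.
ls -l "$work/destination" "$work/actual-destination/"
```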

That said, it's possible the classic (non-BuildKit) builder creates a container for the ADD step, which potentially already has a mount set up for /etc. Do you see the same result if you build the image using BuildKit? (DOCKER_BUILDKIT=1 docker build ......)

@clime
Author

clime commented Aug 3, 2021

I have added output of docker version and docker info. DOCKER_BUILDKIT=1 docker build ... gives the same result (/etc/ becomes a normal directory with a few generated files). It's quite a strange problem, I admit.

@thaJeztah
Member

Did some quick tests to try to narrow down the issue (FWIW; still a bit curious what your use-case is, but I got a bit intrigued as to "why" these mounts don't use the symlink 😂)

Create a directory to test in, pull the centos:7 image, and docker save the image (I'm saving the image instead of the container's rootfs, just in case "creating the container" changes things);

mkdir -p test/temp && cd test
docker pull centos:7
docker save -o img.tar centos:7
tar -xf img.tar

With the image extracted, find the layer we're interested in (it's a single-layer image):

ls -l
total 206764
-rw-r--r-- 1 root root      2755 Nov 14  2020 8652b9f0cb4c0599575e5a003f5906876e10c1ceb2ab9fe1786712dac14a50cf.json
drwxr-xr-x 2 root root      4096 Nov 14  2020 cf6619f89e575099622d9f069d0d94f21460504754861e2125b36313bbd34188
-rw-r--r-- 1 root root        13 Aug  7  2019 Dockerfile
-rw------- 1 root root 211696640 Aug  4 16:31 img.tar
-rw-r--r-- 1 root root       197 Jan  1  1970 manifest.json
-rw-r--r-- 1 root root        84 Jan  1  1970 repositories
drwxr-xr-x 2 root root      4096 Aug  4 16:34 temp

ls -l cf6619f89e575099622d9f069d0d94f21460504754861e2125b36313bbd34188
total 206736
-rw-r--r-- 1 root root      1954 Nov 14  2020 json
-rw-r--r-- 1 root root 211685376 Nov 14  2020 layer.tar
-rw-r--r-- 1 root root         3 Nov 14  2020 VERSION

Extract the layer in the temp directory:

cd temp
tar -xvf ../cf6619f89e575099622d9f069d0d94f21460504754861e2125b36313bbd34188/layer.tar

Move the etc directory into media, and create a symlink. Run ls to verify what we just did:

mv etc/ media/
ln -s media/etc etc

ls -l
total 64
-rw-r--r--  1 root root 12114 Nov 13  2020 anaconda-post.log
lrwxrwxrwx  1 root root     7 Nov 13  2020 bin -> usr/bin
drwxr-xr-x  2 root root  4096 Nov 13  2020 dev
lrwxrwxrwx  1 root root     9 Aug  4 16:35 etc -> media/etc
drwxr-xr-x  2 root root  4096 Apr 11  2018 home
lrwxrwxrwx  1 root root     7 Nov 13  2020 lib -> usr/lib
lrwxrwxrwx  1 root root     9 Nov 13  2020 lib64 -> usr/lib64
drwxr-xr-x  3 root root  4096 Aug  4 16:35 media
drwxr-xr-x  2 root root  4096 Apr 11  2018 mnt
drwxr-xr-x  2 root root  4096 Apr 11  2018 opt
drwxr-xr-x  2 root root  4096 Nov 13  2020 proc
dr-xr-x---  2 root root  4096 Nov 13  2020 root
drwxr-xr-x 11 root root  4096 Nov 13  2020 run
lrwxrwxrwx  1 root root     8 Nov 13  2020 sbin -> usr/sbin
drwxr-xr-x  2 root root  4096 Apr 11  2018 srv
drwxr-xr-x  2 root root  4096 Nov 13  2020 sys
drwxrwxrwt  7 root root  4096 Nov 13  2020 tmp
drwxr-xr-x 13 root root  4096 Nov 13  2020 usr
drwxr-xr-x 18 root root  4096 Nov 13  2020 var

Create a tarball of the new rootfs, and save it as new-image.tgz:

tar -czf ../new-image.tgz .
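
The move, symlink, and tar steps above can be tried on a throwaway tree without pulling any image. The following is a minimal sketch under that assumption: the rootfs contents are a hypothetical stand-in (one file under etc), not the centos:7 layer. It also checks that the relative symlink survives a tar round trip.

```shell
# Build a tiny stand-in rootfs (hypothetical contents, not the centos:7 layer).
work=$(mktemp -d)
mkdir -p "$work/rootfs/etc" "$work/rootfs/media"
echo 'example' > "$work/rootfs/etc/os-release"

# Same two steps as above: move etc under media, then leave a relative symlink.
cd "$work/rootfs"
mv etc/ media/
ln -s media/etc etc

# Pack it, unpack it elsewhere, and check that the symlink survived.
tar -czf ../new-image.tgz .
mkdir ../unpacked
tar -xzf ../new-image.tgz -C ../unpacked
readlink ../unpacked/etc            # prints the link target
cat ../unpacked/etc/os-release      # reads the file through the symlink
```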

Build a FROM scratch image. I'm using BuildKit, so that it doesn't send all the other files to the daemon.

cd ../

DOCKER_BUILDKIT=1 docker build -t newimage -f- . <<EOF
FROM scratch
ADD new-image.tgz /
EOF

Run a container from the image, and (as reported), etc is now a directory, not a symlink:

docker run --rm newimage ls -l /
total 56
-rw-r--r--   1 0 0 12114 Nov 13  2020 anaconda-post.log
lrwxrwxrwx   1 0 0     7 Nov 13  2020 bin -> usr/bin
drwxr-xr-x   5 0 0   340 Aug  4 16:39 dev
drwxr-xr-x   2 0 0  4096 Aug  4 16:39 etc
drwxr-xr-x   2 0 0  4096 Apr 11  2018 home
lrwxrwxrwx   1 0 0     7 Nov 13  2020 lib -> usr/lib
lrwxrwxrwx   1 0 0     9 Nov 13  2020 lib64 -> usr/lib64
drwxr-xr-x   3 0 0  4096 Aug  4 16:35 media
drwxr-xr-x   2 0 0  4096 Apr 11  2018 mnt
drwxr-xr-x   2 0 0  4096 Apr 11  2018 opt
dr-xr-xr-x 109 0 0     0 Aug  4 16:39 proc
dr-xr-x---   2 0 0  4096 Nov 13  2020 root
drwxr-xr-x  11 0 0  4096 Nov 13  2020 run
lrwxrwxrwx   1 0 0     8 Nov 13  2020 sbin -> usr/sbin
drwxr-xr-x   2 0 0  4096 Apr 11  2018 srv
dr-xr-xr-x  13 0 0     0 Aug  4 16:38 sys
drwxrwxrwt   7 0 0  4096 Nov 13  2020 tmp
drwxr-xr-x  13 0 0  4096 Nov 13  2020 usr
drwxr-xr-x  18 0 0  4096 Nov 13  2020 var
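
Instead of reading the `ls -l` listing by eye, the type of /etc can be tested directly. The helper below, `path_kind`, is a hypothetical name introduced for this sketch; it is demonstrated on a throwaway host tree rather than inside the container.

```shell
# Classify a path without following a final symlink.
path_kind() {
  if [ -L "$1" ]; then
    printf 'symlink -> %s\n' "$(readlink "$1")"
  elif [ -d "$1" ]; then
    echo 'directory'
  elif [ -e "$1" ]; then
    echo 'other'
  else
    echo 'missing'
  fi
}

# Host-side demonstration on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/media/etc"
ln -s media/etc "$demo/etc"
path_kind "$demo/etc"        # symlink -> media/etc
path_kind "$demo/media/etc"  # directory
```

Inside the container, the same check could be run as `docker run --rm newimage sh -c '[ -L /etc ] && readlink /etc || echo "not a symlink"'`; given the listing above, it would be expected to report that /etc is not a symlink.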

As before in this thread, /etc has three mounts for the networking files:

docker run --rm newimage sh -c 'mount | grep /etc'
/dev/vda1 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hosts type ext4 (rw,relatime,data=ordered)

To narrow down where things go wrong, I had a look at the image we built: save the image to a tar (newimage-saved.tar), create a directory, and extract the image into that directory:

docker save -o newimage-saved.tar newimage:latest

mkdir newimage-extracted && cd newimage-extracted
tar -xf ../newimage-saved.tar

Find the location of the image's layer:

ls -l
total 16
drwxr-xr-x 2 root root 4096 Aug  4 16:38 64d5743bed13da26b52d0967fb64faecb9f3a4c6d9c1d5997ad17d00e1226a67
-rw-r--r-- 1 root root  451 Aug  4 16:38 aaa80cbebf7fd4e9582b45bd520040f2ad81186eb866f52656574ef41c2d0c6b.json
-rw-r--r-- 1 root root  204 Jan  1  1970 manifest.json
-rw-r--r-- 1 root root   91 Jan  1  1970 repositories

ls -l 64d5743bed13da26b52d0967fb64faecb9f3a4c6d9c1d5997ad17d00e1226a67
total 206732
-rw-r--r-- 1 root root       772 Aug  4 16:38 json
-rw-r--r-- 1 root root 211680768 Aug  4 16:38 layer.tar
-rw-r--r-- 1 root root         3 Aug  4 16:38 VERSION

Create a new directory, and extract the image layer into that directory:

mkdir layer-extracted && cd layer-extracted
tar -xf ../64d5743bed13da26b52d0967fb64faecb9f3a4c6d9c1d5997ad17d00e1226a67/layer.tar

List what's in there, and notice that the image itself does have the symlink:

ls -l
total 64
-rw-r--r--  1 root root 12114 Nov 13  2020 anaconda-post.log
lrwxrwxrwx  1 root root     7 Nov 13  2020 bin -> usr/bin
drwxr-xr-x  2 root root  4096 Nov 13  2020 dev
lrwxrwxrwx  1 root root     9 Aug  4 16:35 etc -> media/etc
drwxr-xr-x  2 root root  4096 Apr 11  2018 home
lrwxrwxrwx  1 root root     7 Nov 13  2020 lib -> usr/lib
lrwxrwxrwx  1 root root     9 Nov 13  2020 lib64 -> usr/lib64
drwxr-xr-x  3 root root  4096 Aug  4 16:35 media
drwxr-xr-x  2 root root  4096 Apr 11  2018 mnt
drwxr-xr-x  2 root root  4096 Apr 11  2018 opt
drwxr-xr-x  2 root root  4096 Nov 13  2020 proc
dr-xr-x---  2 root root  4096 Nov 13  2020 root
drwxr-xr-x 11 root root  4096 Nov 13  2020 run
lrwxrwxrwx  1 root root     8 Nov 13  2020 sbin -> usr/sbin
drwxr-xr-x  2 root root  4096 Apr 11  2018 srv
drwxr-xr-x  2 root root  4096 Nov 13  2020 sys
drwxrwxrwt  7 root root  4096 Nov 13  2020 tmp
drwxr-xr-x 13 root root  4096 Nov 13  2020 usr
drwxr-xr-x 18 root root  4096 Nov 13  2020 var
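
The same check can be made without extracting the layer, by reading the entry type from the tar listing. This is a rough sketch: `layer_entry_type` is a hypothetical helper, it assumes entry names contain no spaces, and it is demonstrated on a throwaway archive standing in for layer.tar.

```shell
# Print the type character ("l" symlink, "d" directory, "-" regular file)
# of an entry in a tar archive, based on `tar -tvf` output.
# Assumes entry names contain no spaces.
layer_entry_type() {
  tar -tvf "$1" | awk -v name="$2" '
    {
      line = $0
      sub(/ -> .*$/, "", line)      # drop the symlink target, if any
      n = split(line, f, " ")
      p = f[n]                      # last field is the entry path
      sub(/^\.\//, "", p)           # tolerate a leading "./"
      sub(/\/$/, "", p)             # tolerate a trailing "/" on directories
      if (p == name && !found) { print substr($0, 1, 1); found = 1 }
    }'
}

# Demonstration on a throwaway archive (a stand-in for layer.tar).
tmp=$(mktemp -d)
mkdir -p "$tmp/root/media/etc"
ln -s media/etc "$tmp/root/etc"
tar -cf "$tmp/layer.tar" -C "$tmp/root" .
layer_entry_type "$tmp/layer.tar" etc     # l
layer_entry_type "$tmp/layer.tar" media   # d
```

Pointed at the layer.tar from the saved image, it would be expected to print `l` for `etc`, matching the listing above.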

And the media/etc directory that the symlink points to contains the expected files;

ls -l etc/
total 1056
-rw-r--r--  1 root root     16 Nov 13  2020 adjtime.rpmsave
-rw-r--r--  1 root root   1529 Apr  1  2020 aliases
drwxr-xr-x  2 root root   4096 Nov 13  2020 alternatives
drwxr-xr-x  2 root root   4096 Nov 13  2020 bash_completion.d
...

So from the above, it looks like:

  • building the image works as expected
  • when running a container, the mounts that are mounted at /etc ignore the symlink, and instead of resolving the symlink's target, they're mounted at /etc/xxx (instead of /media/etc/xxx).
  • this is different from running a container with a custom bind-mount, which (as shown in my earlier comment on this issue, #42706) does not mask the symlink; it follows the symlink and uses its target as the mount destination.

So to resolve this, we would need to find out what's different between the two: do they follow different code paths? It's possible that these default mounts are created earlier in the container's lifecycle; perhaps they're created from the host's namespace before the container is started, then moved into the container's namespace.
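
To make the difference between the two outcomes concrete, the sketch below resolves a container path against a rootfs directory, following symlinks but never leaving the rootfs. It is an illustration only and not the code that Docker or runc actually use; `resolve_in_rootfs` is a hypothetical helper written for this example. For the rootfs in this issue, resolving /etc/hosts this way gives /media/etc/hosts, which is where the mount would be expected to land if the symlink were followed.

```shell
# Resolve PATH inside ROOTFS, following symlinks without escaping ROOTFS.
# Prints the resolved path relative to the rootfs root (e.g. /media/etc/hosts).
resolve_in_rootfs() {
  rootfs=$1
  rest=${2#/}     # path components still to process
  cur=""          # resolved prefix so far, e.g. "/media/etc"
  hops=0          # symlinks followed, to guard against loops
  while [ -n "$rest" ]; do
    part=${rest%%/*}
    case $rest in
      */*) rest=${rest#*/} ;;
      *)   rest="" ;;
    esac
    case $part in
      ""|.) continue ;;
      ..)   cur=${cur%/*}; continue ;;   # never climbs above the rootfs root
    esac
    if [ -L "$rootfs$cur/$part" ]; then
      hops=$((hops + 1))
      [ "$hops" -le 40 ] || return 1
      target=$(readlink "$rootfs$cur/$part")
      case $target in
        /*) cur="" ;;                    # absolute target: restart at rootfs root
      esac
      rest="${target#/}${rest:+/$rest}"
    else
      cur="$cur/$part"
    fi
  done
  printf '%s\n' "${cur:-/}"
}

# Throwaway rootfs with both kinds of symlink seen in this thread.
root=$(mktemp -d)
mkdir -p "$root/media/etc" "$root/actual-destination"
ln -s media/etc "$root/etc"
ln -s /actual-destination "$root/destination"
resolve_in_rootfs "$root" /etc/hosts                # /media/etc/hosts
resolve_in_rootfs "$root" /destination/somefile.sh  # /actual-destination/somefile.sh
```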

@clime
Author

clime commented Aug 5, 2021

Thank you for reproducing this as well. I suspected something was off with my system, but it seems not.

@sam-thibault added the kind/bug label and removed the status/more-info-needed label on May 5, 2023