Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong directory permissions with btrfs or devicemapper storage drivers #44106

Closed
vasiliy-ul opened this issue Sep 7, 2022 · 9 comments · Fixed by #44196
Closed

Wrong directory permissions with btrfs or devicemapper storage drivers #44106

vasiliy-ul opened this issue Sep 7, 2022 · 9 comments · Fixed by #44196
Labels
area/storage/btrfs area/storage/devicemapper area/storage kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage version/20.10

Comments

@vasiliy-ul
Copy link
Contributor

vasiliy-ul commented Sep 7, 2022

Description

Hello. We currently see an issue with one of our images. When we run a container, some directories end up to have 0600 permissions. This happens only when docker is configured with either btrfs or devicemapper storage drivers. When overlay2 is used, the directories are created with 0755 as expected.

The image itself does not define explicitly the permissions for the affected directories. But I guess when unpacking the files, they should be created with defaults 0755 (or at least be consistent no matter what storage driver is in use). Might it be some bug in the unpack code?

Reproduce

Run the commands below using docker configured with either btrfs or devicemapper storage drivers, and observe that /usr/lib64/qemu-kvm/ has 0600 permissions:

docker run --rm -ti --entrypoint /bin/bash quay.io/kubevirt/virt-launcher:v0.55.0
bash-4.4# ls -la /usr/lib64/qemu-kvm/
total 272
drw------- 1 root root   466 Sep  5 06:12 .
dr-xr-xr-x 1 root root 14218 Jan  1  1970 ..
-rwxr-xr-x 1 root root 11792 Jan  1  1970 accel-qtest-x86_64.so
-rwxr-xr-x 1 root root 24832 Jan  1  1970 accel-tcg-x86_64.so
-rwxr-xr-x 1 root root  7568 Jan  1  1970 hw-display-virtio-gpu-gl.so
-rwxr-xr-x 1 root root  7576 Jan  1  1970 hw-display-virtio-gpu-pci-gl.so
-rwxr-xr-x 1 root root 12688 Jan  1  1970 hw-display-virtio-gpu-pci.so
-rwxr-xr-x 1 root root 53792 Jan  1  1970 hw-display-virtio-gpu.so
-rwxr-xr-x 1 root root  7568 Jan  1  1970 hw-display-virtio-vga-gl.so
-rwxr-xr-x 1 root root 17368 Jan  1  1970 hw-display-virtio-vga.so
-rwxr-xr-x 1 root root 47688 Jan  1  1970 hw-usb-host.so
-rwxr-xr-x 1 root root 67584 Jan  1  1970 hw-usb-redirect.so

Expected behavior

/usr/lib64/qemu-kvm/ should have 0755 permissions like when docker is configured to use overlay2 storage:

docker run --rm -ti --entrypoint /bin/bash quay.io/kubevirt/virt-launcher:v0.55.0
bash-4.4# ls -la /usr/lib64/qemu-kvm/
total 300
drwxr-xr-x  2 root root  4096 Sep  5 06:12 .
dr-xr-xr-x 29 root root 20480 Jan  1  1970 ..
-rwxr-xr-x  1 root root 11792 Jan  1  1970 accel-qtest-x86_64.so
-rwxr-xr-x  1 root root 24832 Jan  1  1970 accel-tcg-x86_64.so
-rwxr-xr-x  1 root root  7568 Jan  1  1970 hw-display-virtio-gpu-gl.so
-rwxr-xr-x  1 root root  7576 Jan  1  1970 hw-display-virtio-gpu-pci-gl.so
-rwxr-xr-x  1 root root 12688 Jan  1  1970 hw-display-virtio-gpu-pci.so
-rwxr-xr-x  1 root root 53792 Jan  1  1970 hw-display-virtio-gpu.so
-rwxr-xr-x  1 root root  7568 Jan  1  1970 hw-display-virtio-vga-gl.so
-rwxr-xr-x  1 root root 17368 Jan  1  1970 hw-display-virtio-vga.so
-rwxr-xr-x  1 root root 47688 Jan  1  1970 hw-usb-host.so
-rwxr-xr-x  1 root root 67584 Jan  1  1970 hw-usb-redirect.so

docker version

Client:
 Version:           20.10.17-ce
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        a89b84221c85
 Built:             Wed Jun 29 00:00:00 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.17-ce
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b84221c85
  Built:            Wed Jun 29 00:00:00 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.3
  GitCommit:        v1.1.3-0-ga916309fff0f
 docker-init:
  Version:          0.1.7_catatonit
  GitCommit:

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 4
  Running: 3
  Paused: 0
  Stopped: 1
 Images: 38
 Server Version: 20.10.17-ce
 Storage Driver: btrfs
  Build Version: Btrfs v5.18.1
  Library Version: 102
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux oci runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.3-0-ga916309fff0f
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.19.2-1-default
 Operating System: openSUSE Tumbleweed
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 13.53GiB
 Name: <...>
 ID: <...>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  192.168.122.1:5000
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

This was discovered while debugging kubevirt/kubevirt#8195

@vasiliy-ul vasiliy-ul added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels Sep 7, 2022
@thaJeztah
Copy link
Member

Thanks for reporting! Thinking out loud; the difference here could be that both btrfs and devicemapper create a new device for the container, whereas overlayfs uses directories; my suspicion is that for the directory, the default umask would be used (resulting in 0755) whereas for the new device, different permissions?

/cc @cpuguy83 any thoughts?

@vasiliy-ul
Copy link
Contributor Author

vasiliy-ul commented Sep 7, 2022

Yeah, that might be... but I now see that in the description I've specified wrong 0700 perms. It's in fact 0600 (this can be seen in the commands output). And 0600 seem kind of strange for directories... My understanding is that a new device is created, mounted and the image layer is untared there. Or am I missing smth?

@vasiliy-ul
Copy link
Contributor Author

vasiliy-ul commented Sep 8, 2022

I am not very familiar with the sources yet, but after a quick look, the UnpackLayer function looks suspicious (it sets 0600 for the created dir):

moby/pkg/archive/diff.go

Lines 75 to 89 in 7860686

// Note as these operations are platform specific, so must the slash be.
if !strings.HasSuffix(hdr.Name, string(os.PathSeparator)) {
// Not the root directory, ensure that the parent directory exists.
// This happened in some tests where an image had a tarfile without any
// parent directories.
parent := filepath.Dir(hdr.Name)
parentPath := filepath.Join(dest, parent)
if _, err := os.Lstat(parentPath); err != nil && os.IsNotExist(err) {
err = system.MkdirAll(parentPath, 0600)
if err != nil {
return 0, err
}
}
}

Though, no idea why the result varies depending on the storage driver.

@thaJeztah
Copy link
Member

Thanks! I'll need to do further digging (or asking around); from the looks of it, and the comment, that code-path is meant to not be hit for the "root" path, but admittedly, I'd have to look at the full code to see if that means "root" of the container's rootfs path, or "root" of the host's drive.

Just to get a bit of context, I did do a quick search in history for that line;

@vasiliy-ul
Copy link
Contributor Author

Yeah, I am also not sure what root refers to in this context. But this code seems to be run for every unpacked file which does not have a trailing os.PathSeparator (which is /) in it's name. So literally for every file in the tarball it pre-creates a parent directory.

vasiliy-ul added a commit to vasiliy-ul/kubevirt that referenced this issue Sep 8, 2022
Workaround for moby/moby#44106

Need to create the directory upfront, otherwise it gets assigned wrong
permissions when unpacked.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
@vasiliy-ul
Copy link
Contributor Author

vasiliy-ul commented Sep 12, 2022

I think I have some more findings after having a closer look at the code in daemon/graphdriver.

Neither btrfs nor devicemapper implement the DiffDriver interface (the one that manages image layers/diffs). Instead, they only provide the basic methods required by the ProtoDriver interface and then rely on the 'generic' implementation of the diff operations provided by NaiveDiffDriver. Under the hood, NaiveDiffDriver uses chrootarchive.ApplyUncompressedLayer to untar files, which in turn is based on the archive.UnpackLayer function mentioned above. That leads to 0600 permissions set for directories that are not explicitly stated in the tarball.

Now if we look at the implementation of the overlay2, this is done differently. There is a condition useNaiveDiff and this specifies whether to use the NaiveDiffDriver or not. In case the driver is not used, overlay2 implements the ApplyDiff method using chrootarchive.UntarUncompressed. This one calls archive.Unpack which sets the default permissions similarly to archive.UnpackLayer but the value is 0755:

moby/pkg/archive/archive.go

Lines 1049 to 1062 in 924edb9

// After calling filepath.Clean(hdr.Name) above, hdr.Name will now be in
// the filepath format for the OS on which the daemon is running. Hence
// the check for a slash-suffix MUST be done in an OS-agnostic way.
if !strings.HasSuffix(hdr.Name, string(os.PathSeparator)) {
// Not the root directory, ensure that the parent directory exists
parent := filepath.Dir(hdr.Name)
parentPath := filepath.Join(dest, parent)
if _, err := os.Lstat(parentPath); err != nil && os.IsNotExist(err) {
err = idtools.MkdirAllAndChownNew(parentPath, 0755, rootIDs)
if err != nil {
return err
}
}
}

I've checked out the sources to the version 20.10.17... the latest master may have some new changes.

I am not sure why there are actually two similar implementations for unpacking tar files. But anyway, I wonder: would it be okay to just change 0600 to 0755 in the archive.UnpackLayer considering that overlay2 seems to already set those to 0755 in archive.Unpack? I haven't yet tried to build and test, but if that would be smth acceptable, I could open a PR with the fix.

UPD: docker info from a machine with overlay2 where the issue is not observed (perms are set to 0755):

 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false

Here Native Overlay Diff: true seems to indicate that it does not use the NaiveDiffDriver.

@vasiliy-ul
Copy link
Contributor Author

This problem seems to be relevant for the master branch as well. Did some builds and testing. Created a tentative PR: #44140

kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this issue Sep 14, 2022
Workaround for moby/moby#44106

Need to create the directory upfront, otherwise it gets assigned wrong
permissions when unpacked.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
@neersighted
Copy link
Member

neersighted commented Sep 23, 2022

We've had a fair amount of discussion around what the correct (or less surprising) behavior is here, among many of the maintainers/regular contributors. In short, we've concluded that the OCI spec does not define any default permissions for 'implied' directories of an image. An implied directory is a directory which must for the file layout of a tar file to make sense, but it does not have an entry in the tar file, and thus the permissions are unknown.

During a regular untar, mkdir(2) is called with 0777 and combined with the process umask. umask affecting the behavior of the daemon would be surprising, hence the explicit permissions currently in the code base.

However, as has been discovered here, those permissions are inconsistent by graph driver/extraction code path. While this is surprising, this is not necessarily a bug -- as this behavior is not defined by any spec or convention among container runtimes, the daemon may be surprising and inconsistent -- but not incorrect.

As a near term fix for surprising behavior, we should unify both code paths to use the same permissions. Given 0755 is the resulted produced by a default umask (0022) with no explicit permissions bits, it makes sense to use this as a sane default for Moby. However, there are no guarantees that Moby continues to do this in the future, or that an image will be portable across implementations/OCI runtimes if it relies on this behavior.

In the medium term, if directory permissions matter (and they usually do), they should be captured in the layer for portability. Since the OCI image spec currently does not define this behavior, any image that wants to be portable really should control this explicitly rather than relying on implementation-specific behavior. I understand that the reproducer image in question comes from a Bazel-based build process. While I am not familiar with the specifics of Bazel-based OCI image builds, I suspect that one would have to start there to make sure that the metadata is fully captured in the image.

In the longer term, it would make sense to try and standardize this across the whole ecosystem to increase the portability of images, reduce the ambiguity of the spec, and make it clear what image builders can expect when they strip directory metadata (rather than the current situation of "who knows?") -- a PR/improvement to the OCI spec (which, as I understand it, the Docker Image spec implicitly extends) would be the best way to start that discussion and process.

kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this issue Sep 27, 2022
Workaround for moby/moby#44106

Need to create the directory upfront, otherwise it gets assigned wrong
permissions when unpacked.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
@vasiliy-ul
Copy link
Contributor Author

Thanks to everyone for fixing that!

neersighted added a commit to neersighted/oci-image-spec that referenced this issue Nov 2, 2022
The image specification currently does not describe how conformant
implementations should handle the case of a layer that contains "implied
directories" -- entries that imply parent directories exist through
their path, without those parent directories having their own entires in
the archive.

As such, this behavior is currently implementation-defined and may not
be consistent, even in the same implementation (e.g. moby/moby#44106).

To resolve this, we explicitly define what behavior is expected in this
situation, selecting 'neutral' attributes (e.g. using the container
`USER`'s UID/GID, and using `0755` for mode, as derived from the default
`umask(2)` of 0022).

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
neersighted added a commit to neersighted/oci-image-spec that referenced this issue Nov 8, 2022
The image specification currently does not describe how conformant
implementations should handle the case of a layer that contains "implied
directories" -- entries that imply parent directories exist through
their path, without those parent directories having their own entires in
the archive.

As such, this behavior is currently implementation-defined and may not
be consistent, even in the same implementation (e.g. moby/moby#44106).

To resolve this, we explicitly define what behavior is expected in this
situation, selecting 'neutral' attributes (e.g. using the container
`USER`'s UID/GID, and using `0755` for mode, as derived from the default
`umask(2)` of 0022).

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage/btrfs area/storage/devicemapper area/storage kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage version/20.10
Projects
None yet
3 participants