Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tar files added in dockerfile don't use gname/uname attribute for gid/uid mapping #37777

Open
sjpotter opened this issue Sep 6, 2018 · 6 comments
Labels
area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. version/17.05

Comments

@sjpotter
Copy link

sjpotter commented Sep 6, 2018

Description

If one extracts a tar file with a version of tar that understands the gname/uname string attribute, if gname/uname exist in the group/user lookup, it will use that to set the gid/uid of the file instead of the gid/uid stored numerically in the tar file. However, if one adds a tar file within a Dockerfile, it ignores gname/uname, and only uses the gid/uid.

This would make sense if the claim was that a Dockerfile context cannot know the name to id mapping within a container image, but this doesn't seem to be true. the --chown argument to the same dockerfile directive can take a name and map it to an id if it's defined within the container image.

Steps to reproduce the issue:

  1. create a tarball with a file owned by you (note your uid, make sure its not the same uid as the first user that will be created in a debian system, i.e. 1000). In my case "spotter/1001"
tar cf test.tar <some file>
  1. create a simple dockerfile as such
FROM ubuntu:16.04
RUN useradd spotter
ADD ./test.tar
  1. see file ownership within a running container
docker run -it <image name> ls -l /<file you tar'ed up>

Describe the results you expected:

expect to see file owned by user spotter

Describe the results you received:

instead see it owned by user with id 1001.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:15:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:15:36 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 10
 Running: 2
 Paused: 0
 Stopped: 8
Images: 598
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 299
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-24-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.4GiB
Name: x1
ID: 7E3V:YIUY:MIXX:JIMW:XDHE:ENCS:H7LY:EU7E:FMGZ:ZGYC:AFMF:2RPV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: spotter
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 52.229.48.71:30002
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
@sjpotter
Copy link
Author

sjpotter commented Sep 6, 2018

The motivation for this is to include (for example) a mysql/postgres db within a container and not have to worry about what uid/gid either was assigned at installation time (say via apt-get). we know that a mysql or postgres (etc...) user/group will be created, so we know that the tarball we have will be installed with the correct set of ownership attributes, much like would happen with regular gnu tar if installing into a more traditional system.

while one could use --chown, this only works if all the files have the same exact ownership attribute. If some can be owned by root but some have to be owned by a different user (or multiple users), this doesn't work or becomes much more complicated as need multiple sets of tar files, one for each ownership attribute set.

@sjpotter
Copy link
Author

sjpotter commented Sep 6, 2018

this is a POC patch, lookupUser() and lookupGroup() are stolen from builder/dockerfile/internal_linux.go

diff --git a/pkg/archive/archive.go b/pkg/archive/archive.go
index 070dccb756..4dd44564a0 100644
--- a/pkg/archive/archive.go
+++ b/pkg/archive/archive.go
@@ -7,6 +7,7 @@ import (
        "compress/bzip2"
        "compress/gzip"
        "context"
+       "errors"
        "fmt"
        "io"
        "io/ioutil"
@@ -19,6 +20,8 @@ import (
        "syscall"
        "time"
 
+       lcUser "github.com/opencontainers/runc/libcontainer/user"
+
        "github.com/docker/docker/pkg/fileutils"
        "github.com/docker/docker/pkg/idtools"
        "github.com/docker/docker/pkg/ioutils"
@@ -650,7 +653,15 @@ func createTarFile(path, extractDir string, hdr *tar.Header, reader io.Reader, L
        // Lchown is not supported on Windows.
        if Lchown && runtime.GOOS != "windows" {
                if chownOpts == nil {
-                       chownOpts = &idtools.Identity{UID: hdr.Uid, GID: hdr.Gid}
+                       uid, err := lookupUser(hdr.Uname, "/etc/passwd")
+                       if err != nil {
+                               uid = hdr.Uid
+                       }
+                       gid, err := lookupGroup(hdr.Gname, "/etc/group")
+                       if err != nil {
+                               gid = hdr.Gid
+                       }
+                       chownOpts = &idtools.Identity{UID: uid, GID: gid}
                }
                if err := os.Lchown(path, chownOpts.UID, chownOpts.GID); err != nil {
                        return err
@@ -711,6 +722,45 @@ func createTarFile(path, extractDir string, hdr *tar.Header, reader io.Reader, L
        return nil
 }
 
+func lookupUser(userStr, filepath string) (int, error) {
+       // if the string is actually a uid integer, parse to int and return
+       // as we don't need to translate with the help of files
+       uid, err := strconv.Atoi(userStr)
+       if err == nil {
+               return uid, nil
+       }
+       users, err := lcUser.ParsePasswdFileFilter(filepath, func(u lcUser.User) bool {
+               return u.Name == userStr
+       })
+       if err != nil {
+               return 0, err
+       }
+       if len(users) == 0 {
+               return 0, errors.New("no such user: " + userStr)
+       }
+       return users[0].Uid, nil
+}
+
+func lookupGroup(groupStr, filepath string) (int, error) {
+       // if the string is actually a gid integer, parse to int and return
+       // as we don't need to translate with the help of files
+       gid, err := strconv.Atoi(groupStr)
+       if err == nil {
+               return gid, nil
+       }
+       groups, err := lcUser.ParseGroupFileFilter(filepath, func(g lcUser.Group) bool {
+               return g.Name == groupStr
+       })
+       if err != nil {
+               return 0, err
+       }
+       if len(groups) == 0 {
+               return 0, errors.New("no such group: " + groupStr)
+       }
+       return groups[0].Gid, nil
+}
+
+
 // Tar creates an archive from the directory at `path`, and returns it as a
 // stream of bytes.
 func Tar(path string, compression Compression) (io.ReadCloser, error) {

@sjpotter
Copy link
Author

sjpotter commented Sep 6, 2018

A thing I dislike about my patch is that lookupUser/Group are called on every single file. hitting the file system, even if its already looked up that uname/gname already. gnu tar caches all this data in a per execution cache, but not 100% sure how to do that in this case. i.e. don't want the cache leaking from one invocation to the next.

@AkihiroSuda
Copy link
Member

If we are going to add this feature to ADD, probably this should be an optional flag like ADD --honor-owner-name, but I think we have freezed ADD instruction.

Maybe we can just suggest using COPY and RUN tar, and call it a day?

@tonistiigi @tiborvass @dmcgowan @thaJeztah

@AkihiroSuda AkihiroSuda added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Sep 7, 2018
@sjpotter
Copy link
Author

sjpotter commented Sep 7, 2018

COPY/RUN are what we do now but it basically doubles the size of the image (we could squash it but that defeats the point of layers/cache)

The point of the poc code above isn't for PR purpose (I think it would actually break layer pulling due to diff.go calling the same code and only wanting to follow gid/uid for chown) more as proof that it can be implemented fairly simply.

@pan3793
Copy link

pan3793 commented Mar 14, 2024

COPY/RUN are what we do now but it basically doubles the size of the image (we could squash it but that defeats the point of layers/cache)

I encountered the same issue, do you have a better solution in 2024?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. version/17.05
Projects
None yet
Development

No branches or pull requests

4 participants