Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/tar: add FileInfoNames interface #50102

Closed
kolyshkin opened this issue Dec 11, 2021 · 42 comments
Closed

archive/tar: add FileInfoNames interface #50102

kolyshkin opened this issue Dec 11, 2021 · 42 comments

Comments

@kolyshkin
Copy link
Contributor

kolyshkin commented Dec 11, 2021

Note, Feb 2 2022: The current proposal is in #50102 (comment).


Abstract

archive/tar function FileInfoHeader does uid -> uname and gid -> name lookups,
which are not always necessary and can sometimes be problematic. A new function,
FileInfoHeaderNoNames, is proposed to address these issues.

Background

Change https://go-review.googlesource.com/59531
(which made its way to Go 1.10) implemented
user/group name lookups in tar/archive's FileInfoHeader.
It fills in tar file info header fields Uname and Gname,
looking up user and group names (from Uid and Gid)
via os/user.LookupId and LookupGroupId functions.

Doing that is not always desirable, and is sometimes problematic:

  1. In a chrooted environment, /etc/passwd and /etc/group may be
    absent, or their contents may be entirely different from that of the host.

  2. Failed name lookups are not currently cached, which may result in a
    considerable performance regression, caused by re-parsing of
    /etc/passwd and /etc/group for every file entry added to the tar.

  3. In case of static linking against glibc, the latter wants to dlopen
    some libraries that might either be unavailable (which results in
    a panic/crash) or (in case of untrusted chroot) a malicious library
    can be substituted by a bad actor.

  4. There may be a need to create a tarball without any user/group names
    (only with numeric uids/gids), akin to GNU tar's --numeric-owner option.

  5. There may be a need to use custom uid -> name and gid -> name
    lookup functions.

Now, problem 2 can be mitigated by using (indirectly, via os/user Lookup{,Group}Id)
a good C library that does caching, or by caching failed lookups as well.
Problem 3 can be solved by using osusergo build tag, but it's compile unit wide,
meaning it will also affect other os/user uses, not just archive/tar.
Yet it seems impossible to solve both 2 and 3 at the same time.

As far as I know, there are no easy solutions for problems 1 and 5.

In particular, this affects Docker, which performs image unpacking by re-executing
the main binary (dockerd) in the container context (essentially a chroot). As a workaround,
Docker maintains a fork of archive/tar with commit 0564e30 partially reverted
(see moby/moby#42402).

Proposal

Add a function similar to FileInfoHeader, which does not perform any id -> name lookups,
leaving it to a user. The proposed name is FileInfoHeaderNoNames (can also be *NoLookup,
*Num, etc).

Rationale

Adding a new function seems to be the most simple and elegant approach, with very little code to add, and yet solving all the issues raised above.

Alternatives are:

  • (for users) to maintain the fork of archive/tar
  • (for archive/tar) to implement a build tag which disable lookups (e.g. archivetarnolookups or archivetarnumeric)
  • (for archive/tar) to add a way to provide own name lookup function
  • (for archive/tar) to add another way to disable lookups (so a user can do tar.NameLookup = false or tar.NameLookup(false)
  • to unconditionally remove id -> name lookups from archive/tar (might bring compatibility issues)

Compatibility

Since this is a new API, and the existing functionality of FileInfoHeader is left intact,
there are no compatibility issues.

Implementation

See https://go-review.googlesource.com/c/go/+/371054 for the example code.

@gopherbot gopherbot added this to the Proposal milestone Dec 11, 2021
@kolyshkin
Copy link
Contributor Author

Cc @thaJeztah @tianon @cpuguy83 @tonistiigi @errordeveloper

@errordeveloper
Copy link

errordeveloper commented Dec 11, 2021

Alternatives are:

  • (for users) to maintain the fork of archive/tar

It is not quite feasible as modules don't actually allow vendoring standard library packages. That is only really possible with "more traditional" vendoring tools, some of which are not quite compatible with some features of modules either, so it is a major pain to use those.

Another alternative would be to have a new tar package, with a new import path and potentially some new functionality.

Forking and enforcing archive/tar import path is basically not really feasible as far as I can tell.

  • (for archive/tar) to implement a build tag which disable lookups (e.g. archivetarnolookups or archivetarnumeric)

Is it a common pattern in the standard library to control functionality with build tags? I thought tags would more suitable for toggling optimisations or implementation variants (like the netgo tag).

  • to unconditionally remove id -> name lookups from archive/tar (might bring compatibility issues)

It might makes sense to add a new function that does what is currently done, and simplify the existing function. But, yes, there is the compatibility question, however perhaps that can waived as the functionality is best effort and hereby proven problematic, so moving it into an optional secondary function might be quite reasonable.

@tonistiigi
Copy link
Contributor

Just throwing out some alternatives: If we want to keep the impact to Go's public API minimal, current FileInfoHeader could do an interface check against.

interface {
  LookupUserName(int) (string, error)
  LookupGroupName(int) (string, error)
}

If fs.FileInfo implements these methods then they would be used instead and the caller can control behavior. It would remain as a very advanced case though because of its hidden nature, but I guess there is no way to move the default path away from current behavior because of the backward compatibility guarantees anyway.

@ianlancetaylor
Copy link
Contributor

CC @dsnet

@errordeveloper
Copy link

It might makes sense to add a new function that does what is currently done, and simplify the existing function. But, yes, there is the compatibility question, however perhaps that can waived as the functionality is best effort and hereby proven problematic, so moving it into an optional secondary function might be quite reasonable.

I'm taking this suggestion back now, I've review this and see where compatibility issue is.

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

How often do you want all the file system-loaded info except the names?
Maybe code should clear the Sys field in the FileInfo and pass that to FileInfoHeader?

That is:

type FileInfoNoSys { fs.FileInfo }
func (FileInfoNoSys) Sys() interface{} { return nil }

blah blah blah FileInfoHeader(FileInfoNoSys(info)) 

It seems weird to carve out this one detail but leave all the other work that statUnix is doing.
Wouldn't Docker rather leave out all the extraneous system info?

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@kolyshkin
Copy link
Contributor Author

How often do you want all the file system-loaded info except the names?

Every time the tarball is unpacked in an environment where user/group names
resolution (id -> name lookup) is problematic.

Usually, such resolution is system-dependent and can rely on local plain text
/etc/passwd and /etc/group, or some other kind of local database, or a remote
distributed system such as NIS or LDAP.

When cgo is used, Go os/user relies on libc, which implements all these methods,
as configured on the system. Which is all right, except in case of static linking against
GNU libc, as in this case glibc wants to dlopen NSS libraries (meaning the static
binary will segfault on a system where these libraries are not present). While this
is a glibc and not golang issue per se, this makes it impossible to have Go static
binaries linked with glibc (only because archive/tar wants to look up user/group
names).

Another issue with name resolution using glibc is when chroot (or pivot root) into
an untrusted directory is used (such as, chrooting/entering into a user container),
calling FileInfoHeader may result in dlopen-ing some untrusted library. This is a
major security issue.

All that can be solved by ether disabling cgo or using osusergo tag (available
since Go 1.13 or so), in which case os/user switches to native implementation of
parsing plain text /etc/passwd and /etc/group. This is also problematic because

  • if ways of user/group name resolution other than /etc/passwd and /etc/group are
    configured on the system, they are not used;
  • if chroot is used (see above), the program will end up reading wrong (untrusted
    and potentially malicious) /etc/passwd and /etc/group files;
  • disabling cgo or using osusergo come with a price, and it can't be done selectively
    for archve/tar use only.

TL;DR: user and group name resolution is rather complicated subject, this is why
there is a need to make it optional (one way or another).

It seems weird to carve out this one detail but leave all the other work that statUnix is doing.

This was the way it worked before https://go-review.googlesource.com/59531 / Go 1.10.

The thing is, id -> name lookup is the only tricky part here, anything else in statUnix
is rather straightforward. Please see above.

@ianlancetaylor
Copy link
Contributor

Thanks, but I'm not sure that answers the question. We understand the problem with resolving the names. The question is: do you need the other information extracted by the statUnix function? That is, the uid, gid, access time, change time, and device major and minor fields? Is that information valuable when unpacking a tar file in a Docker container?

@kolyshkin
Copy link
Contributor Author

The question is: do you need the other information extracted by the statUnix function? That is, the uid, gid, access time, change time, and device major and minor fields

(sorry for being vague before)
Yes, we definitely do do. All that is needed in our use case (I'm only unsure about atime) and, I guess, in most other use cases of FileInfoHeader.

@ianlancetaylor
Copy link
Contributor

Thanks.

@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

Every time the tarball is unpacked in an environment where user/group names
resolution (id -> name lookup) is problematic.

Isn't this issue about packing the tar file, not unpacking it?

It is always possible to change the names in the returned Header.
Many use cases will be able to do that.

We're having a hard time coming up with a suitable way
to express 'don't do the name lookups' in a general way.

It seems like the issue here is fundamentally about glibc being broken in certain ways.
In that (fairly rare) case a copy of FileInfoHeader doesn't seem like the worst workaround.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371054 mentions this issue: archive/tar: add FileInfoHeaderNoNames

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Jan 6, 2022

Isn't this issue about packing the tar file, not unpacking it?

Yes, sorry, I mixed that up earlier. We're talking about creating a tar archive.

It is always possible to change the names in the returned Header.

Sure, but this is not solving issues (2) and (3) from the description. It is also not performance-wise to do something that you don't need, and those lookups might not be cheap.

We're having a hard time coming up with a suitable way
to express 'don't do the name lookups' in a general way.

I totally agree, and in this case this has to be package-specific (we do not want to disable the lookups globally).

Thinking out loud, the alternatives to what is proposed above (can be
a. a build tag (e.g archivetarnonamelookup);
b. a runtime switch (tar.NameLookup = false or tar.NameLookup(false));
c. a way to provide a custom function for name lookups (incontext of https://go-review.googlesource.com/c/go/+/371054 that would mean making sysNames public).

All these alternatives are essentially similar to FileInfoHeaderNoNames proposal, as they let a user to have their own name lookup functionality, while keeping the backward compatibility.

It seems like the issue here is fundamentally about glibc being broken in certain ways.

The glibc issue is mostly worked around by using osusergo build tag. Problem is, that tag also affects other functionality, and in case of a binary that does chroot we may want to keep using glibc for name resolution, unless we're in a chrooted or otherwise untrusted environment.

@rsc
Copy link
Contributor

rsc commented Jan 12, 2022

What about cases where you want to override the mapping?
It seems weird to choose between glibc or nothing.

What if we add an optional interface to package tar,
and if the fi.FileInfo implements this interface then FileInfoHeader will use the methods
in preference to its own cache?

type FileInfoNames interface {
    Uname() string
    Gname() string
}

FileInfoNames is maybe not the right name, but you get the idea.

@kolyshkin
Copy link
Contributor Author

@rsc yes, I guess that would work. @tianon @tonistiigi @thaJeztah WDYT?

@thaJeztah
Copy link
Contributor

Happy to hear @tonistiigi's thoughts as well

Some quick thoughts;

  • Will the proposed interface Uname() / Gname() work well on non-unix-y platforms? (I know Windows can be a pain, and I'm not sure exactly what's expected for tar headers on Windows but wondering if it's flexible enough)

  • There's already a package-level sysStat() for platform-specific bits, which gets set on Linux in init(). Wondering if it would work to either have a way to override that, or to have FileInfoHeader() accept (a) functional parameter(s) that can be set to override the default, e.g. something like;

    func FileInfoHeader(fi fs.FileInfo, link string, infoFn ...func(fi fs.FileInfo, h *Header) error) (*Header, error) {
        if len(infoFn) > 0 {
            ...
        } else if sysStat != nil {
            ...
        }
    }

    But perhaps that's too much (?)

@ianlancetaylor
Copy link
Contributor

Will the proposed interface Uname() / Gname() work well on non-unix-y platforms?

Currently we only set the Uid and Gid fields on Unix systems, so, no. If somebody changes non-Unix systems to somehow set the Uid and Gid fields, then it would certainly make sense to use the methods if available to set the Uname and Gname fields.

There's already a package-level sysStat() for platform-specific bits, which gets set on Linux in init(). Wondering if it would work to either have a way to override that,

You can already override the function's results by simply changing fields in the Header returned by FileInfoHeader. The only reason to override calling the function would be if we don't want to run it, but there isn't any reason not to run it--except for the case discussed in this issue: it's problematic to convert Uid and Gid to Uname and Gname.

or to have FileInfoHeader() accept (a) functional parameter(s) that can be set to override the default,

We can't change the signature of FileInfoHeader, that would break the Go 1 compatibility guarantee.

@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

It sounds like people are mostly happy with #50102 (comment), and I've come around to FileInfoNames as a name for that limited interface. We should embed FileInfo, though:

type FileInfoNames interface {
    fs.FileInfo
    Uname() (string, error)
    Gname() (string, error)
}

Does anyone object to this?

Edit, Jan 26: Added error result from Uname and Gname.

@tonistiigi
Copy link
Contributor

Maybe the functions should also return an error as in #50102 (comment) . The current implementations based on user.LookupId etc return error. The implementation will likely open /etc/passwd and user may want to control the behavior of what happens when file can't be read properly or no user can be found. Not a blocker though, I guess the user can pre-cache all the uname values while wrapping the FileInfo and handle the errors there as well.

@ianlancetaylor
Copy link
Contributor

The current code just ignores any error when converting Uid/Gid to Uname/Gname. But I suppose we could leave that determination up to the method. If the method returns a non-nil error, it will be returned by FileInfoHeader.

A different issue is that we should probably pass the uid/gid to the methods. Otherwise they have no simple way to retrieve them.

type FileInfoNames interface {
    fs.FileInfo
    Uname(uid int) (string, error)
    Gname(gid int) (string, error)
}

@rsc
Copy link
Contributor

rsc commented Jan 26, 2022

OK, does anyone have any objections to iant's #50102 (comment)?

@kolyshkin
Copy link
Contributor Author

OK, does anyone have any objections to iant's #50102 (comment)?

LGTM. @tonistiigi PTAL

@rsc rsc changed the title proposal: archive/tar: add FileInfoHeaderNoNames proposal: archive/tar: add FileInfoNames interface Feb 2, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/558355 mentions this issue: archive/tar: add FileInfoNames interface

@kolyshkin
Copy link
Contributor Author

@cherrymui @rsc @qiulaidongfeng can we please go back to the original issue? Which is the name lookups done by archive/tar can be very problematic in some cases (involving containers, chroots, etc).

While the solution in https://go.dev/cl/558355 does indeed prevent name lookups (by not calling statUnix function, which does those lookups since https://go.dev/cl/59531 in Go 1.10), it also has some bad side effects (because statUnix is not called) -- in particular, Devmajor, Devminor, AccessTime, and ChangeTime fields are no longer filled in when FileInfoNames is used.

What's really needed is a way to provide custom name lookup interface.

qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Jan 25, 2024
Fixes golang#50102

Change-Id: I8ec67a56f2ab61d78ae5890e5a80cd2e8acd9a38
@qiulaidongfeng
Copy link
Member

What's really needed is a way to provide custom name lookup interface.

Is this approach Unix-only? Or are all systems available?

@kolyshkin
Copy link
Contributor Author

What's really needed is a way to provide custom name lookup interface.

Is this approach Unix-only? Or are all systems available?

I think this is specific to unix (as defined by Go's unix build constraint), but I'm not an expert in other systems (Windows, Plan9 etc.) so I can't say it for sure.

@qiulaidongfeng
Copy link
Member

What's really needed is a way to provide custom name lookup interface.

Is this approach Unix-only? Or are all systems available?

I think this is specific to unix (as defined by Go's unix build constraint), but I'm not an expert in other systems (Windows, Plan9 etc.) so I can't say it for sure.

The behavior of a custom name lookup interface method is only used on unix, so what should such an interface do on a non-Unix system?

@rsc
Copy link
Contributor

rsc commented Feb 7, 2024

Devmajor, Devminor, AccessTime, and ChangeTime fields are no longer filled in when FileInfoNames is used.

That appears to have been a bug. The final 'return h, nil' in the conditional was wrong. Thanks for pointing that out.

In the original use case, the main thing we need is some way to disable the cgo-using lookup, and then code using FileInfoHeader can fill in whatever values it wants. For that it seems like maybe just updating FileInfoNames is fine as described above:

type FileInfoNames interface {
    FileInfo
    Uname() (string, error)
    Gname() (string, error)
}

Does that work for your use case?

@ianlancetaylor
Copy link
Contributor

I think that it would be helpful to pass in the UID and GID one way or another, as otherwise I think the method has to call stat again to fetch them again.

@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

Agreed we don't want multiple stat calls. The creator of the FileInfoNames is going to back it with an os.FileInfo and they can use fi.Sys().(*syscall.Stat_t) to access the fields, no new system calls required.

@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add

type FileInfoNames interface {
    FileInfo
    Uname() (string, error)
    Gname() (string, error)
}

and use it for computing the user and group name tar fields when a FileInfo implements FileInfoName. The Uname and Gname names match tar.Header.

ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
This reverts CL 514235. Also reverts CL 518056 which is a followup
fix.

Reason for revert: Proposal golang#50102 defined an interface that is
too specific to UNIX-y systems and also didn't make much sense.
The proposal is un-accepted, and we'll revisit in Go 1.23.

Fixes (via backport) golang#65245.
Updates golang#50102.

Change-Id: I41ba0ee286c1d893e6564a337e5d76418d19435d
Reviewed-on: https://go-review.googlesource.com/c/go/+/558295
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@rsc
Copy link
Contributor

rsc commented Mar 1, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to add

type FileInfoNames interface {
    FileInfo
    Uname() (string, error)
    Gname() (string, error)
}

and use it for computing the user and group name tar fields when a FileInfo implements FileInfoName. The Uname and Gname names match tar.Header.

@rsc rsc moved this from Active to Likely Accept in Proposals Mar 1, 2024
@rsc
Copy link
Contributor

rsc commented Mar 8, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to add

type FileInfoNames interface {
    FileInfo
    Uname() (string, error)
    Gname() (string, error)
}

and use it for computing the user and group name tar fields when a FileInfo implements FileInfoName. The Uname and Gname names match tar.Header.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Mar 8, 2024
@rsc rsc changed the title proposal: archive/tar: add FileInfoNames interface archive/tar: add FileInfoNames interface Mar 8, 2024
@rsc rsc modified the milestones: Proposal, Backlog Mar 8, 2024
qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Mar 8, 2024
An optional interface FileInfoNames has been added.

If the parameter fi of FileInfoHeader implements the interface
the Gname/Uname and Gid/Uid of the return value Header are
provided by the method of the interface.

Also added testing.

Fixes golang#50102

Change-Id: Ie0465303a406292d6d0f6df886e5fc135b9d3cc6
qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Mar 8, 2024
Fixes golang#50102

Change-Id: I8ec67a56f2ab61d78ae5890e5a80cd2e8acd9a38
@dmitshur dmitshur modified the milestones: Backlog, Go1.23 Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
10 participants