archive/tar: add FileInfoNames interface #50102
Comments
It is not quite feasible as modules don't actually allow vendoring standard library packages. That is only really possible with "more traditional" vendoring tools, some of which are not quite compatible with some features of modules either, so it is a major pain to use those. Another alternative would be to have a new tar package, with a new import path and potentially some new functionality. Forking and enforcing archive/tar import path is basically not really feasible as far as I can tell.
Is it a common pattern in the standard library to control functionality with build tags? I thought tags would more suitable for toggling optimisations or implementation variants (like the netgo tag).
It might make sense to add a new function that does what is currently done, and simplify the existing function. Yes, there is the compatibility question, but perhaps that can be waived, since the functionality is best effort and has now proven problematic, so moving it into an optional secondary function might be quite reasonable. |
Just throwing out some alternatives: If we want to keep the impact to Go's public API minimal, current
If |
CC @dsnet |
I'm taking this suggestion back now; I've reviewed this and see where the compatibility issue is. |
How often do you want all the file system-loaded info except the names? That is:
It seems weird to carve out this one detail but leave all the other work that statUnix is doing. |
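To make the trade-off in that question concrete, here is a minimal sketch of what "no names, and also none of the other stat-derived work" looks like with the existing API. It assumes a Unix system and relies on the observed behavior that FileInfoHeader only does its platform-specific step (uid/gid, device numbers, name lookups) when fi.Sys() is a *syscall.Stat_t; that is an implementation detail, not a documented guarantee. The names noSysInfo and headerNumericOwner are made up for this example.

```go
package main

import (
	"archive/tar"
	"fmt"
	"io/fs"
	"os"
	"syscall"
)

// noSysInfo wraps a FileInfo and hides its Sys() value, so that
// tar.FileInfoHeader cannot see the *syscall.Stat_t behind it.
// (Assumption: without it, no uid/gid or name lookup step runs.)
type noSysInfo struct{ fs.FileInfo }

func (noSysInfo) Sys() any { return nil }

// headerNumericOwner builds a header for path that carries numeric
// owner IDs but no user or group names.
func headerNumericOwner(path string) (*tar.Header, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return nil, err
	}
	h, err := tar.FileInfoHeader(noSysInfo{fi}, "")
	if err != nil {
		return nil, err
	}
	// Put the numeric owner back by hand. Device numbers and
	// access/change times are NOT restored in this sketch.
	if st, ok := fi.Sys().(*syscall.Stat_t); ok {
		h.Uid, h.Gid = int(st.Uid), int(st.Gid)
	}
	return h, nil
}

func main() {
	h, err := headerNumericOwner(".")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("uid=%d gid=%d uname=%q gname=%q\n", h.Uid, h.Gid, h.Uname, h.Gname)
}
```

The cost is visible in the comments: everything else that the stat-based step would have filled in has to be re-created by the caller, which is why carving out only the name lookup is what is being asked for.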
This proposal has been added to the active column of the proposals project |
Every time the tarball is unpacked in an environment where user/group names Usually, such resolution is system-dependent and can rely on local plain text When cgo is used, Go os/user relies on libc, which implements all these methods, Another issue with name resolution using glibc is when chroot (or pivot root) into All that can be solved by either disabling cgo or using
TL;DR: user and group name resolution is a rather complicated subject, this is why
This was the way it worked before https://go-review.googlesource.com/59531 / Go 1.10. The thing is, id -> name lookup is the only tricky part here, anything else in statUnix |
Thanks, but I'm not sure that answers the question. We understand the problem with resolving the names. The question is: do you need the other information extracted by the |
(sorry for being vague before) |
Thanks. |
Isn't this issue about packing the tar file, not unpacking it? It is always possible to change the names in the returned Header. We're having a hard time coming up with a suitable way It seems like the issue here is fundamentally about glibc being broken in certain ways. |
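For reference, the "change the names in the returned Header" approach looks roughly like the sketch below. headerWithNames is a hypothetical helper, not part of archive/tar. Note that this only changes the result: whatever lookup FileInfoHeader performs internally has already happened by the time the fields are overwritten.

```go
package main

import (
	"archive/tar"
	"fmt"
	"os"
)

// headerWithNames calls tar.FileInfoHeader and then overwrites the
// name fields. Passing empty strings gives a numeric-only header.
// Any lookup done inside FileInfoHeader has still taken place.
func headerWithNames(path, uname, gname string) (*tar.Header, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return nil, err
	}
	h, err := tar.FileInfoHeader(fi, "")
	if err != nil {
		return nil, err
	}
	h.Uname, h.Gname = uname, gname
	return h, nil
}

func main() {
	h, err := headerWithNames(".", "", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("uname=%q gname=%q\n", h.Uname, h.Gname)
}
```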
Change https://golang.org/cl/371054 mentions this issue: |
Yes, sorry, I mixed that up earlier. We're talking about creating a tar archive.
Sure, but this does not solve issues (2) and (3) from the description. It is also wasteful to do work that you don't need, and those lookups might not be cheap.
I totally agree, and in this case this has to be package-specific (we do not want to disable the lookups globally). Thinking out loud, the alternatives to what is proposed above (can be All these alternatives are essentially similar to
The glibc issue is mostly worked around by using |
What about cases where you want to override the mapping? What if we add an optional interface to package tar,
FileInfoNames is maybe not the right name, but you get the idea. |
@rsc yes, I guess that would work. @tianon @tonistiigi @thaJeztah WDYT? |
Happy to hear @tonistiigi's thoughts as well Some quick thoughts;
|
Currently we only set the
You can already override the function's results by simply changing fields in the
We can't change the signature of |
It sounds like people are mostly happy with #50102 (comment), and I've come around to FileInfoNames as a name for that limited interface. We should embed FileInfo, though:
Does anyone object to this? Edit, Jan 26: Added error result from Uname and Gname. |
Maybe the functions should also return an error as in #50102 (comment) . The current implementations based on |
The current code just ignores any error when converting A different issue is that we should probably pass the uid/gid to the methods. Otherwise they have no simple way to retrieve them.
type FileInfoNames interface {
	fs.FileInfo
	Uname(uid int) (string, error)
	Gname(gid int) (string, error)
}
|
OK, does anyone have any objections to iant's #50102 (comment)? |
LGTM. @tonistiigi PTAL |
Change https://go.dev/cl/558355 mentions this issue: |
@cherrymui @rsc @qiulaidongfeng can we please go back to the original issue? Namely, that the name lookups done by archive/tar can be very problematic in some cases (involving containers, chroots, etc). While the solution in https://go.dev/cl/558355 does indeed prevent name lookups (by not calling What's really needed is a way to provide a custom name lookup interface. |
Fixes golang#50102 Change-Id: I8ec67a56f2ab61d78ae5890e5a80cd2e8acd9a38
Is this approach Unix-only? Or is it available on all systems? |
I think this is specific to unix (as defined by Go's |
The behavior of a custom name lookup interface method is only used on unix, so what should such an interface do on a non-Unix system? |
That appears to have been a bug. The final 'return h, nil' in the conditional was wrong. Thanks for pointing that out. In the original use case, the main thing we need is some way to disable the cgo-using lookup, and then code using FileInfoHeader can fill in whatever values it wants. For that it seems like maybe just updating FileInfoNames is fine as described above:
Does that work for your use case? |
I think that it would be helpful to pass in the UID and GID one way or another, as otherwise I think the method has to call stat again to fetch them again. |
Agreed we don't want multiple stat calls. The creator of the FileInfoNames is going to back it with an os.FileInfo and they can use fi.Sys().(*syscall.Stat_t) to access the fields, no new system calls required. |
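A sketch of that pattern, assuming a Unix system where Sys() returns a *syscall.Stat_t with uint32 Uid and Gid fields (as on Linux). The wrapper embeds the FileInfo and exposes Uname and Gname methods with an error result, which is the method shape discussed in this thread; mappedInfo, newMappedInfo and the lookup tables are hypothetical. The uid and gid come from the stat data already held by the FileInfo, so no extra system call is made.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"syscall"
)

// mappedInfo wraps an os.FileInfo and answers name queries from
// caller-supplied tables instead of the system user database.
type mappedInfo struct {
	fs.FileInfo
	users  map[uint32]string
	groups map[uint32]string
}

// newMappedInfo stats path once and wraps the result.
func newMappedInfo(path string, users, groups map[uint32]string) (mappedInfo, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return mappedInfo{}, err
	}
	return mappedInfo{FileInfo: fi, users: users, groups: groups}, nil
}

// ids reads uid and gid from the stat data the FileInfo already holds.
func (m mappedInfo) ids() (uint32, uint32, bool) {
	st, ok := m.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, false
	}
	return st.Uid, st.Gid, true
}

// Uname returns the mapped user name; an unmapped uid yields "".
func (m mappedInfo) Uname() (string, error) {
	uid, _, ok := m.ids()
	if !ok {
		return "", fmt.Errorf("no stat data for %s", m.Name())
	}
	return m.users[uid], nil
}

// Gname returns the mapped group name; an unmapped gid yields "".
func (m mappedInfo) Gname() (string, error) {
	_, gid, ok := m.ids()
	if !ok {
		return "", fmt.Errorf("no stat data for %s", m.Name())
	}
	return m.groups[gid], nil
}

func main() {
	m, err := newMappedInfo(".", map[uint32]string{0: "root"}, map[uint32]string{0: "root"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	u, _ := m.Uname()
	g, _ := m.Gname()
	fmt.Printf("uname=%q gname=%q\n", u, g)
}
```

With empty tables the same wrapper produces numeric-only ownership, and with a populated table it covers the "override the mapping" case raised earlier.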
Have all remaining concerns about this proposal been addressed? The proposal is to add
and use it for computing the user and group name tar fields when a FileInfo implements FileInfoNames. The Uname and Gname names match tar.Header. |
This reverts CL 514235. Also reverts CL 518056 which is a followup fix. Reason for revert: Proposal golang#50102 defined an interface that is too specific to UNIX-y systems and also didn't make much sense. The proposal is un-accepted, and we'll revisit in Go 1.23. Fixes (via backport) golang#65245. Updates golang#50102. Change-Id: I41ba0ee286c1d893e6564a337e5d76418d19435d Reviewed-on: https://go-review.googlesource.com/c/go/+/558295 Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add
and use it for computing the user and group name tar fields when a FileInfo implements FileInfoNames. The Uname and Gname names match tar.Header. |
No change in consensus, so accepted. 🎉 The proposal is to add
and use it for computing the user and group name tar fields when a FileInfo implements FileInfoNames. The Uname and Gname names match tar.Header. |
An optional interface FileInfoNames has been added. If the parameter fi of FileInfoHeader implements the interface the Gname/Uname and Gid/Uid of the return value Header are provided by the method of the interface. Also added testing. Fixes golang#50102 Change-Id: Ie0465303a406292d6d0f6df886e5fc135b9d3cc6
Note, Feb 2 2022: The current proposal is in #50102 (comment).
Abstract
The archive/tar function FileInfoHeader does uid -> uname and gid -> gname lookups,
which are not always necessary and can sometimes be problematic. A new function,
FileInfoHeaderNoNames, is proposed to address these issues.
Background
Change https://go-review.googlesource.com/59531
(which made its way to Go 1.10) implemented
user/group name lookups in archive/tar's FileInfoHeader.
It fills in tar file info header fields Uname and Gname,
looking up user and group names (from Uid and Gid)
via os/user.LookupId and LookupGroupId functions.
Doing that is not always desirable, and is sometimes problematic:
1. In a chrooted environment, /etc/passwd and /etc/group may be
   absent, or their contents may be entirely different from that of the host.
2. Failed name lookups are not currently cached, which may result in a
   considerable performance regression, caused by re-parsing of
   /etc/passwd and /etc/group for every file entry added to the tar.
3. In case of static linking against glibc, the latter wants to dlopen
   some libraries that might either be unavailable (which results in
   a panic/crash) or (in case of untrusted chroot) a malicious library
   can be substituted by a bad actor.
4. There may be a need to create a tarball without any user/group names
   (only with numeric uids/gids), akin to GNU tar's --numeric-owner option.
5. There may be a need to use custom uid -> name and gid -> name
   lookup functions.
Now, problem 2 can be mitigated by using (indirectly, via os/user Lookup{,Group}Id)
a good C library that does caching, or by caching failed lookups as well.
Problem 3 can be solved by using the osusergo build tag, but it applies to the whole build,
meaning it will also affect other os/user uses, not just archive/tar.
Yet it seems impossible to solve both 2 and 3 at the same time.
As far as I know, there are no easy solutions for problems 1 and 5.
In particular, this affects Docker, which performs image unpacking by re-executing
the main binary (dockerd) in the container context (essentially a chroot). As a workaround,
Docker maintains a fork of archive/tar with commit 0564e30 partially reverted
(see moby/moby#42402).
Proposal
Add a function similar to FileInfoHeader, which does not perform any id -> name lookups,
leaving it to a user. The proposed name is FileInfoHeaderNoNames
(can also be *NoLookup, *Num, etc).
Rationale
Adding a new function seems to be the simplest and most elegant approach: it requires very little code and yet solves all the issues raised above.
Alternatives are:
a build tag (e.g. archivetarnolookups or archivetarnumeric)
a package-level switch (e.g. tar.NameLookup = false or tar.NameLookup(false))
Compatibility
Since this is a new API, and the existing functionality of FileInfoHeader is left intact,
there are no compatibility issues.
Implementation
See https://go-review.googlesource.com/c/go/+/371054 for the example code.