A tar file is a format for archiving a filesystem and is able to represent anything including regular files, directories, block and character devices, symlinks, and hardlinks.
Over the years, many incompatible extensions to tar have become common practice. Specifically, one them is the use of pseudo-files to encode additional information about a file or a set of files. A pseudo-file is encoded in the same way as a normal file, but the Header.Typeflag is set to some special value and the contents of the file are parsed according to some format determined by the typeflag. Both the GNU and PAX formats are heavy users of this approach.
The Go implementation of tar provides automatic parsing of the metadata contained in these pseudo-files for the real file it describes. For example, Go1.1 added support for parsing long filenames for the GNU format; Go1.3 added support for parsing local PAX headers; and Go1.3 added support for parsing GNU sparse headers.
Parsing the metadata for a single file is easy to represent in the API since we can just represent the additional information in the Header struct.
However, the PAX format adds a feature called "global PAX headers" which encodes metadata intended to modify all subsequent files (#22748 indicates that this feature is actually used; e.g., git archive --format=tgz produces them).
In order to provide the metadata contained within a global PAX header, Reader.Next returns a single Header representing the pseudo-file (in go1.10, we actually parse the PAX records into Header.PAXRecords, but that is orthogonal to this issue).
The problem with global headers (and really any typeflag that the Go implementation does not recognize) is that the following reports true:
This is the behavior on all versions of Go thus far.
The problem with IsRegular reporting true is:
It lies. This is not a regular file.
A user relying only on Header.FileInfo alone is unable to handle pseudo-files specially (e.g., ignoring them).
However, it is not trivial to have Header.FileInfo.Mode.IsRegular report false since the logic for os.FileMode.IsRegular only reports false for a very narrow set of types, none of which are an obvious choice for "special" files like metadata or unrecognized typeflags.
Thus, we need to make decision whether to keep or change the current behavior of Header.FileInfo.Mode.IsRegular for special files like TypeXGlobalHeader. If we change it, how do we make it report false? The only approach I see is to define a new constant (e.g., ModeSpecial) in the os package.
I'd be fine with an os.ModeSpecial defined vaguely.
We could also just document it in archive/tar, that those headers are there.
I'd actually be in favor of skipping such things by default in the tar.Reader and making even iterating over them opt-in. Might be too late to do that now. Adding an opt-out is kinda useless because if they're even aware of the tar format (I wasn't), people can do their own Header.Typeflag check and continue. But I assume most people don't know about these and also don't want to see them, so I'd be fine with trying an opt-in switch. We'd need to grep GitHub and google and see if the nobody-cares theory is correct.
I think the opt-in approach is probably a good idea as well, but I think it's orthogonal to ModeSpecial. I think there's still a value in letting the user do file processing using os.FileMode rather than forcing them to switch to Header.Typeflag unless strictly required.
There is currently no way for os.FileMode.IsRegular to report false
without being one of the following types:
ModeDir | ModeSymlink | ModeNamedPipe | ModeSocket | ModeDevice
This makes it difficult for custom implementations of os.FileInfo to return
a Mode that is explicitly not regular without resorting to setting one
of the types listed above. However, every one of the aforementioned types
are ill-suited as a general-purpose "not regular" file type.
Thus, add a ModeIrregular to serve exactly for that purpose.
The ModeIrregular type carries no information other than the fact that the
file is not regular.
Reviewed-by: Brad Fitzpatrick <firstname.lastname@example.org>