TarExtendedAttributes

Albert Lee edited this page Apr 12, 2018 · 1 revision

Storing Extended Attributes in Tar Files

Also see: TarNFS4ACLs and TarPosix1eACLs

Table of Contents

Introduction

Currently, I know of several different techniques for storing POSIX.1e-style extended attributes within tar files. Apple has extended GNU tar with support for a "resource file" stored as a separate entry. Joerg Schilling's _star_ and my own _libarchive_ library (used by _bsdtar_) use pax extensions for this purpose. Solaris and AIX added new tar entry types for this purpose.

libarchive

Libarchive stores each POSIX.1e extended attribute as a separate pax attribute. The pax attributes are named LIBARCHIVE.xattr.name, where name is the name of the POSIX.1e attribute. The name includes a namespace identifier (so an attribute "foo" in the user namespace from a FreeBSD system would get stored as LIBARCHIVE.xattr.user.foo). Non-ASCII bytes in the name get encoded as % followed by two uppercase hex digits. The value of the attribute is base-64 encoded. In particular, note that the pax attribute name and value are both entirely ASCII and hence are correct UTF-8. This was done to promote simple implementation for pax readers and writers that use strict UTF-8 parsing for all headers.

This approach works reasonably well for small attributes but could be cumbersome for very large attributes. It also does not support extended metadata (such as per-EA ownership and permissions) supported on some systems.

star

Joerg Schilling's star uses a very similar approach. It stores each attribute under SCHILY.xattr._name_ as above. I haven't found any documentation for how it handles non-ASCII bytes in the name or value.

The documentation included with star 1.5 (April, 2008) includes comments about the non-portability of this approach and suggests that this may be dropped in favor of a different approach that supports Solaris as well.

As of January 2009, GNU tar seems to be considering this approach, based on patches contributed by Red Hat.

Libarchive reads this format but does not currently write it.

Apple tar

Apple stores a separate resource file for each file written to a tar archive. This is based on the "`copyfile()`" library function. The resource file is stored as a full tar file entry just before the regular file in the archive and has the same name, except the last path element has "`._`" prepended. For example, the file "`some/dir/foo`" would have a resource file "`some/dir/._foo`".

Note that because the resource file has the same name, long filename extensions get unnecessarily duplicated with this arrangement. For example, using Apple's patched GNU tar distributed with Mac OS X through version 10.5, a long filename would generate the following sequence in the tar archive:

  • 'L' header for the long filename of the resource file
  • Body of the 'L' header with the filename for the resource file
  • '0' header for the resource file itself
  • Contents of the resource file
  • 'L' header for the long filename of the file itself
  • Body of the 'L' header with the filename for the file
  • '0' header for the file
  • Contents of the file.
Since Mac OS 10.6, Apple is using a patched version of bsdtar. Although this defaults to using pax extensions, it suffers from the same basic problem outlined above.

Starting with libarchive 3.0, the main libarchive distribution supports the Apple format on Mac OS. It does not parse the contents of the resource file or provide any support for Mac OS resources on other platforms.

AIX tar

AIX tar adds an 'E' entry following the regular file entry for each extended attribute attached to the file. The pathname of this entry is the name of the attribute. The body contains the value of the attribute. By using the standard ustar pathname and body, AIX avoids encoding concerns encoding for these fields.

The example I've seen seems to include some data beyond the end of the EA body but I'm not sure what that is for.

Unfortunately, the AIX approach requires a minimum of 1024 bytes in the archive for each EA. It also puts the EAs after the main entry. This is fine if you have few EAs and they tend to be large, but the applications of EAs that I've seen tend to use very small values. Adapting libarchive to support AIX EAs would also be difficult; libarchive assumes that all metadata (including EAs) precede the file data.

Solaris tar

I've seen references to an 'E' entry used by Solaris tar but haven't yet found any detailed documentation or examples of this entry.

From fsattr(5) in illumos:

       The extended attribute project extends the above header format by
       defining a new header type (for the Typeflag field). The type 'E' is
       defined to be used for all extended attribute files. Attribute files
       are stored in the tar archive as a sequence of two <header ,data>
       pairs. The first file contains the data necessary to locate and name
       the extended attribute in the file system. The second file contains the
       actual attribute file data.  Both files use an 'E' type header. The
       prefix and name fields in extended attribute headers are ignored,
       though they should be set to meaningful values for the benefit of
       archivers that do not process these headers. Solaris archivers set the
       prefix field to "'''/dev/null'''" to prevent archivers that do not understand
       the type 'E' header from trying to restore extended attribute files in
       inappropriate places.

usr/src/head/archives.h describes the structures in the first file:

/*
 * Define archive formats for extended attributes.
 *
 * Extended attributes are stored in two pieces.
 * 1. An attribute header which has information about
 *    what file the attribute is for and what the attribute
 *    is named.
 * 2. The attribute record itself.  Stored as a normal file type
 *    of entry.
 * Both the header and attribute record have special modes/typeflags
 * associated with them.
 *
 * The names of the header in the archive look like:
 * /dev/null/attr.hdr
 *
 * The name of the attribute looks like:
 * /dev/null/attr.
 *
 * This is done so that an archiver that doesn't understand these formats
 * can just dispose of the attribute records unless the user chooses to
 * rename them via cpio -r or pax -i
 *
 * The format is composed of a fixed size header followed
 * by a variable sized xattr_buf. If the attribute is a hard link
 * to another attribute, then another xattr_buf section is included
 * for the link.
 *
 * The xattr_buf is used to define the necessary "pathing" steps
 * to get to the extended attribute.  This is necessary to support
 * a fully recursive attribute model where an attribute may itself
 * have an attribute.
 *
 * The basic layout looks like this.
 *
 *     --------------------------------
 *     |                              |
 *     |         xattr_hdr            |
 *     |                              |
 *     --------------------------------
 *     --------------------------------
 *     |                              |
 *     |        xattr_buf             |
 *     |                              |
 *     --------------------------------
 *     --------------------------------
 *     |                              |
 *     |      (optional link info)    |
 *     |                              |
 *     --------------------------------
 *     --------------------------------
 *     |                              |
 *     |      attribute itself        |
 *     |      stored as normal tar    |
 *     |      or cpio data with       |
 *     |      special mode or         |
 *     |      typeflag                |
 *     |                              |
 *     --------------------------------
 *
 */
#define	XATTR_ARCH_VERS	"1.0"

/*
 * extended attribute fixed header
 *
 * h_version		format version.
 * h_size               size of header + variable sized data sections.
 * h_component_len      Length of entire pathing section.
 * h_link_component_len Length of link component section.  Again same definition
 *                      as h_component_len.
 */
struct xattr_hdr {
	char	h_version[7];
	char	h_size[10];
	char	h_component_len[10];	   /* total length of path component */
	char	h_link_component_len[10];
};

/*
 * The name is encoded like this:
 * filepathNULattrpathNUL[attrpathNULL]...
 */
struct xattr_buf {
	char	h_namesz[7];   /* length of h_names */
	char	h_typeflag;    /* actual typeflag of file being archived */
	char	h_names[1];	/* filepathNULattrpathNUL... */
};
Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.