Examples

Kyrre Ness Sjøbæk edited this page May 12, 2016 · 3 revisions

Short code examples illustrating usage of the library.

This page collects several short examples illustrating usage of libarchive. The intent here is to provide basic templates that can be copied and used to bootstrap your own development.

All of the code on this page is in the public domain. You may use it for any purpose whatsoever. (Please note that libarchive itself is not in the public domain; please read the license information in the source distribution for details.) If there is an example you'd like to see added to this page, please ask.

List contents of Archive stored in File

Libarchive is based around two types of "objects": archive objects (pointers to `struct archive`) and entry objects (pointers to `struct archive_entry`). These are opaque references: you cannot directly access the structure fields, you can only invoke libarchive functions that create, manipulate, and destroy these objects for you.

The basic lifecycle of an archive object is very simple:

  • Create one using archive_XXX_new()
  • Configure it using "support" or "set" calls. ("Support" calls allow libarchive to decide when to use a feature, "set" calls enable the feature unconditionally.)
  • "Open" a particular data source.
  • Iterate over the contents: ask alternately for "header" (which returns an entry object describing the next entry in the archive) and "data".
  • When you're done, you can "close", query any final information or statistics, then call "free" to free the archive object. (The "free" calls were named "finish" in earlier versions of libarchive.)

Writing an archive is very similar, except that you provide header and data to libarchive instead of asking for them from libarchive.

Here's a very basic example that simply opens a file and lists the contents of the archive:

struct archive *a;
struct archive_entry *entry;
int r;

a = archive_read_new();
archive_read_support_filter_all(a);
archive_read_support_format_all(a);
r = archive_read_open_filename(a, "archive.tar", 10240); // Note 1
if (r != ARCHIVE_OK)
  exit(1);
while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
  printf("%s\n", archive_entry_pathname(entry));
  archive_read_data_skip(a);  // Note 2
}
r = archive_read_free(a);  // Note 3
if (r != ARCHIVE_OK)
  exit(1);

Note 1: Beginning with libarchive 3.0, the `archive_read_open_filename()` function inspects the file before deciding how to handle the blocksize. If the filename provided refers to a tape device, for example, it will use exactly the blocksize you specify. For other devices, it may adjust the requested blocksize in order to obtain better performance.

Note 2: The call to `archive_read_data_skip()` here is not actually necessary, since libarchive will invoke it automatically if you request the next header without reading the data for the last entry.

Note 3: This function was called `archive_read_finish()` in earlier versions of libarchive. That name will remain available for at least a few years.

List contents of Archive stored in Memory

There are several variants of the "open" functions. The "filename" variant used above is intended to be simple to use in the common case, but you may find the "memory" variant more useful sometimes:

struct archive *a = archive_read_new();
int r;

archive_read_support_filter_gzip(a);
archive_read_support_format_tar(a);
/* buff is an array that holds the complete archive. */
r = archive_read_open_memory(a, buff, sizeof(buff));

Note that the "filename" variant accepts a block size indicating how large each read from disk will be. In the memory case, all that's needed is the total size of the archive stored in memory.

There are also variants to read from an already-opened file descriptor (which is useful if you need to skip the first part of a file before using libarchive to extract the rest) or `FILE *` pointer.

List contents of Archive with custom read functions

Sometimes, none of the packaged "open" functions will work for you. In that case, you can use the lower-level `archive_read_open` function. This accepts three callbacks and a pointer to your data:

  • An open callback. This is a legacy feature: it is never necessary and should not be used, so pass NULL.
  • A read callback.
  • A close callback.

For example, you could implement a custom read callback function that called into an HTTP library in order to extract data as it was downloaded from a web site.

All of the callbacks use certain common libarchive conventions:

  • The open and close functions return ARCHIVE_OK (zero) on success or a negative value on failure. The most common values here are ARCHIVE_WARN for something that was less than perfect or ARCHIVE_FATAL for a failure that cannot be retried or recovered.
  • The read callback returns the number of bytes read, zero for end-of-file, or a negative failure code as above. It also returns a pointer to the block of data read.
  • Libarchive does not care how big the blocks are. It will fully consume any block before asking for the next, so your callbacks do not need to handle partially read blocks. The only requirement is that each block must be at least one byte because a zero-byte return indicates end-of-file.

Here's a simple outline of using your own custom callbacks. For brevity, I've omitted a lot of error-handling here. In particular, note how the custom "mydata" pointer is passed back into your callbacks so you can manage your private data as you wish.

/* Private data shared by the callbacks. */
struct mydata {
  const char *name;
  int fd;
  char buff[10240];
};

ssize_t myread(struct archive *a, void *client_data, const void **buff);
int myclose(struct archive *a, void *client_data);

void
list_archive(const char *name)
{
  struct mydata *mydata;
  struct archive *a;
  struct archive_entry *entry;

  mydata = malloc(sizeof(struct mydata));
  a = archive_read_new();
  mydata->name = name;
  mydata->fd = open(mydata->name, O_RDONLY);
  archive_read_support_filter_all(a);
  archive_read_support_format_all(a);
  archive_read_open(a, mydata, NULL, myread, myclose);
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    printf("%s\n", archive_entry_pathname(entry));
  }
  /* Closing invokes myclose(), which frees mydata. */
  archive_read_free(a);
}

ssize_t
myread(struct archive *a, void *client_data, const void **buff)
{
  struct mydata *mydata = client_data;

  *buff = mydata->buff;
  return (read(mydata->fd, mydata->buff, sizeof(mydata->buff)));
}

int
myclose(struct archive *a, void *client_data)
{
  struct mydata *mydata = client_data;

  if (mydata->fd >= 0)
    close(mydata->fd);
  free(mydata);
  return (ARCHIVE_OK);
}

LibarchiveIO has more details about designing and implementing effective I/O callback functions.

A note about the skip callback

The `archive_read_open2` function is like `archive_read_open` but it accepts an additional "skip" callback function. The skip callback is never required. However, if it is available, libarchive can optimize certain reads. In particular, none of the examples above read the body of an archive entry. For some archive formats, libarchive can invoke the "skip" callback to quickly seek over the entire entry body.

The skip callback must satisfy the following:

  • It must return the number of bytes actually skipped, or a negative failure code if skipping cannot be done.
  • It can skip fewer bytes than requested but must never skip more.
  • Only positive/forward skips will ever be requested.
  • If skipping is not provided or fails, libarchive will call the read() function and simply ignore any data that it does not need.

Note: The signature of the skip callback has changed slightly in libarchive 3.0. Previous versions used the platform `off_t` data type to indicate offsets for skipping. Unfortunately, that is not entirely portable, so libarchive 3.0 uses `int64_t` data types instead. If you need to write code that can compile with both libarchive 3.0 and earlier versions, you may need to define different types of your callbacks depending on the `ARCHIVE_VERSION_NUMBER` macro. Ask on libarchive-discuss@groups.google.com if you have questions about this.

A note about the seek callback

Libarchive 3.0 and later support a _seek_ callback. The seek callback is used to read archive formats that are not amenable to streaming, such as 7-Zip and some Zip variants.

Be careful: If you provide a seek callback function, you are guaranteeing that it will work. In particular, the system `lseek()` or `fseek()` generally does not work with tape files, some raw disk devices, pipes, or network sockets. If you provide a seek callback that just invokes the system function on such a file, libarchive will likely get very confused.

A Universal Decompressor

Starting with libarchive 2.8, there is a "raw" format handler that treats arbitrary binary input as a single-element archive. This is basically a cheat to allow you to get the output of a libarchive filter chain, including files with multiple encodings such as `gz.uu` files:

int r;
ssize_t size;
struct archive_entry *ae;

struct archive *a = archive_read_new();
archive_read_support_filter_all(a);
archive_read_support_format_raw(a);
r = archive_read_open_filename(a, filename, 16384);
if (r != ARCHIVE_OK) {
  /* ERROR */
}
r = archive_read_next_header(a, &ae);
if (r != ARCHIVE_OK) {
  /* ERROR */
}

for (;;) {
  size = archive_read_data(a, buff, buffsize);
  if (size < 0) {
    /* ERROR */
  }
  if (size == 0)
    break;
  write(1, buff, size);
}

archive_read_free(a);

Note that the "raw" format is not enabled by `archive_read_support_format_all()`.

A Basic Write Example

The following is a very simple example of using libarchive to write a group of files into a gzipped tar archive. This is a little more complex than the read examples above because the write example actually does something with the file bodies.

void
write_archive(const char *outname, const char **filename)
{
  struct archive *a;
  struct archive_entry *entry;
  struct stat st;
  char buff[8192];
  int len;
  int fd;

  a = archive_write_new();
  archive_write_add_filter_gzip(a);
  archive_write_set_format_pax_restricted(a); // Note 1
  archive_write_open_filename(a, outname);
  while (*filename) {
    stat(*filename, &st);
    entry = archive_entry_new(); // Note 2
    archive_entry_set_pathname(entry, *filename);
    archive_entry_set_size(entry, st.st_size); // Note 3
    archive_entry_set_filetype(entry, AE_IFREG);
    archive_entry_set_perm(entry, 0644);
    archive_write_header(a, entry);
    fd = open(*filename, O_RDONLY);
    len = read(fd, buff, sizeof(buff));
    while ( len > 0 ) {
        archive_write_data(a, buff, len);
        len = read(fd, buff, sizeof(buff));
    }
    close(fd);
    archive_entry_free(entry);
    filename++;
  }
  archive_write_close(a); // Note 4
  archive_write_free(a); // Note 5
}

int main(int argc, const char **argv)
{
  const char *outname;
  argv++;
  outname = *argv++;
  write_archive(outname, argv);
  return 0;
}

Note 1: Libarchive's "pax restricted" format is a tar format that uses pax extensions only when absolutely necessary. Most of the time, it will write plain ustar entries. This is the recommended tar format for most uses. You should explicitly use ustar format only when you have to create archives that will be readable on older systems; you should explicitly request pax format only when you need to preserve as many attributes as possible.

Note 2: This example creates a fresh archive_entry object for each file. For better performance, you can reuse the same archive_entry object by using `archive_entry_clear()` to erase it after each use.

Note 3: Size, file type, and pathname are all required attributes here. You can also use `archive_entry_copy_stat()` to copy all information from the `struct stat` to the archive entry, including file type. To get even more complete information, look at the `archive_read_disk` API, which provides an easy way to get more extensive file metadata (including ACLs and extended attributes on some systems) than the `stat()` system call. It also works on platforms such as Windows where `stat()` either doesn't exist or is broken.

Note 4: The free/finish call will implicitly call `archive_write_close()` if necessary. However, the close call returns an error code and the free/finish call does not, so if you rely on the implicit close, you won't be able to detect any errors that happen with the final writes.

Note 5: Beginning with libarchive 3.0, this function is called `archive_write_free()`. The previous name was `archive_write_finish()`. If you want to write software compatible with libarchive 2.x and libarchive 3.x, you should use the old name, but be aware that it will be removed when libarchive 4.x is released.

Constructing Objects On Disk

Libarchive includes an `archive_write_disk` facility that works very much like `archive_write`, except that it constructs objects on disk instead of adding entries to an archive. The `archive_write_disk` facility knows how to construct directories, regular files, symlinks, hard links, and other types of disk objects. Here is a very simple example showing how you could use it to create a regular file on disk:

struct archive *a;
struct archive_entry *entry;

a = archive_write_disk_new();
archive_write_disk_set_options(a, ARCHIVE_EXTRACT_TIME);

entry = archive_entry_new();
archive_entry_set_pathname(entry, "my_file.txt");
archive_entry_set_filetype(entry, AE_IFREG); // Note 1
archive_entry_set_size(entry, 5);  // Note 2
archive_entry_set_mtime(entry, 123456789, 0);
archive_write_header(a, entry);
archive_write_data(a, "abcde", 5); // Note 3
archive_write_finish_entry(a);
archive_write_free(a);
archive_entry_free(entry);

Note 1: The `archive_entry.h` header defines a set of `AE_IFXXX` constants for various file types. These are the same as the `S_IFXXX` constants that are defined on many systems, except that not all systems have a complete set of `S_IFXXX` definitions in the standard headers.

Note 2: If you set a size in the entry, the `archive_write_disk` object will enforce that size.

Note 3: If you try to write more than the size set in the entry, your writes will be truncated; if you write fewer bytes than you promised, the file will be extended with zero bytes.

The pattern above can also be used to reconstruct directories, device nodes, and FIFOs. The same idea also works for restoring symlinks and hardlinks, but you do have to initialize the entry a little differently:

  • Symlinks have a file type `AE_IFLNK` and require a target to be set with `archive_entry_set_symlink()`
  • Hardlinks require a target to be set with `archive_entry_set_hardlink()`; if this is set, the regular filetype is ignored. If the entry describing a hardlink has a size, you must be prepared to write data to the linked files. If you don't want to overwrite the file, leave the size unset.

A Complete Extractor

Using the facilities described above, you can extract most archives to disk by simply copying entries from an `archive_read` object to an `archive_write_disk` object. The following code is slightly simplified from the `examples/untar.c` code included in the libarchive distribution:

static void
extract(const char *filename)
{
  struct archive *a;
  struct archive *ext;
  struct archive_entry *entry;
  int flags;
  int r;

  /* Select which attributes we want to restore. */
  flags = ARCHIVE_EXTRACT_TIME;
  flags |= ARCHIVE_EXTRACT_PERM;
  flags |= ARCHIVE_EXTRACT_ACL;
  flags |= ARCHIVE_EXTRACT_FFLAGS;

  a = archive_read_new();
  archive_read_support_format_all(a);
  archive_read_support_filter_all(a);
  ext = archive_write_disk_new();
  archive_write_disk_set_options(ext, flags);
  archive_write_disk_set_standard_lookup(ext);
  if ((r = archive_read_open_filename(a, filename, 10240)))
    exit(1);
  for (;;) {
    r = archive_read_next_header(a, &entry);
    if (r == ARCHIVE_EOF)
      break;
    if (r < ARCHIVE_OK)
      fprintf(stderr, "%s\n", archive_error_string(a));
    if (r < ARCHIVE_WARN)
      exit(1);
    r = archive_write_header(ext, entry);
    if (r < ARCHIVE_OK)
      fprintf(stderr, "%s\n", archive_error_string(ext));
    else if (archive_entry_size(entry) > 0) {
      r = copy_data(a, ext);
      if (r < ARCHIVE_OK)
        fprintf(stderr, "%s\n", archive_error_string(ext));
      if (r < ARCHIVE_WARN)
        exit(1);
    }
    r = archive_write_finish_entry(ext);
    if (r < ARCHIVE_OK)
      fprintf(stderr, "%s\n", archive_error_string(ext));
    if (r < ARCHIVE_WARN)
      exit(1);
  }
  archive_read_close(a);
  archive_read_free(a);
  archive_write_close(ext);
  archive_write_free(ext);
  exit(0);
}

The function above just reads headers from the input archive and writes them to disk. The key missing piece in the above is the `copy_data()` function to actually pull data from a read archive and write it to a write handle:

static int
copy_data(struct archive *ar, struct archive *aw)
{
  int r;
  const void *buff;
  size_t size;
  int64_t offset;  /* This was off_t before libarchive 3.0. */

  for (;;) {
    r = archive_read_data_block(ar, &buff, &size, &offset);
    if (r == ARCHIVE_EOF)
      return (ARCHIVE_OK);
    if (r < ARCHIVE_OK)
      return (r);
    r = archive_write_data_block(aw, buff, size, offset);
    if (r < ARCHIVE_OK) {
      fprintf(stderr, "%s\n", archive_error_string(aw));
      return (r);
    }
  }
}

Handling Errors

For simplicity, the examples above do not all demonstrate proper error handling. Most libarchive functions return a status code that should be acted on. The most important values are:

  • `ARCHIVE_EOF` is returned only from `archive_read_data_block()` when you reach the end of the data in an entry or from `archive_read_next_header()` when you reach the end of the archive. (`archive_read_data()` signals the end of an entry by returning zero.)
  • `ARCHIVE_OK` if the operation completed successfully.
  • `ARCHIVE_WARN` if the operation completed with some surprises. You may want to report the issue to your user. `archive_error_string` will return a suitable text message, `archive_errno` returns an associated system errno value. (Since not all errors are caused by failing system calls, `archive_errno` does not always return a meaningful value.)
  • `ARCHIVE_FAILED` if this operation failed. In particular, this means that further operations on this entry are impossible. This is returned, for example, if you try to write an entry type that's not supported by this archive format. Recovery usually consists of simply going on to the next entry.
  • `ARCHIVE_FATAL` if the archive object itself is no longer usable, typically because of an I/O failure or memory allocation failure. Generally, your only recovery in this case is to invoke `archive_read_free()` or `archive_write_free()` to release the archive object.

In certain extreme cases, libarchive will call `abort()` to terminate the program. This should generally only happen when libarchive's internal consistency checks detect serious bugs in libarchive itself. Prior to libarchive 3.0, these consistency checks also caused libarchive to call `abort()` when the API was misused, such as by calling `archive_write_header` before opening an archive or continuing to write data to an archive handle after it has previously returned `ARCHIVE_FATAL`. Starting with libarchive 3.0, this type of misuse instead causes `ARCHIVE_FATAL` to be returned.