pack-rs

An experiment to make something like Pack using Rust. Pack is an archiver/compressor that takes a novel approach to the problem that has largely been dominated by two formats for the past couple of decades, tar/gz and zip. Original idea by O with an implementation in Pascal and some very clever SQL.

Status

This was an experiment, and a very interesting one at that. If and when the original Pack has a specification, and other people take an interest in it, then I would be more than happy to continue working on this project. All of the objectives that I had in mind are written in the TODO.org file (an emacs org-mode file), including building a Rust crate with an API similar to that of tar.

Build and Run

Prerequisites

Rust 2021 edition
C/C++ toolchain:
- macOS: Clang probably?
- RockyLinux: sudo yum install gcc-c++ make
- Ubuntu: sudo apt-get install build-essential
- Windows: MSVC build tools and Windows SDK
SQLite 3 library:
- macOS: brew install sqlite
- RockyLinux: sudo yum install sqlite-devel
- Ubuntu: sudo apt-get install libsqlite3-dev
- Windows: it will be bundled with rusqlite

Running the tests

For the time being there are very few unit tests.

cargo test

Creating, listing, extracting archives

Start by creating an archive using the create subcommand. The example below assumes that you have downloaded something interesting into your ~/Downloads directory.

$ cargo run -- create pack.db3 ~/Downloads/httpd-2.4.59
...
Added 3138 files to pack.db3

Now that the pack.db3 file exists, you can list the contents like so:

$ cargo run -- list pack.db3 | head -20
...
httpd-2.4.59/.deps
httpd-2.4.59/.gdbinit
httpd-2.4.59/.gitignore
httpd-2.4.59/ABOUT_APACHE
httpd-2.4.59/Apache-apr2.dsw
httpd-2.4.59/Apache.dsw
httpd-2.4.59/BuildAll.dsp
httpd-2.4.59/BuildBin.dsp
...

Finally, run extract to unpack the contents of the archive into the current directory:

$ cargo run -- extract pack.db3
...
Extracted 3138 files from pack.db3

Specification

A pack file is an SQLite database with file data stored in large blobs compressed using Zstandard. There are three primary tables.

Note: The schema described here differs slightly from Pack but is largely the same for all intents and purposes.

item

Rows in the item table represent directories, files, and symbolic links. The kind for files is 0, the kind for directories is 1, and the kind for symbolic links is 2. The name is the final part of the file path, such as README.md or src. The parent refers to the directory that contains this entry on the file system, with 0 indicating the entry is at the "root" of the archive.

Name	Type	Description
`id`	`INTEGER PRIMARY KEY`	rowid for the item
`parent`	`INTEGER`	rowid in the `item` table for the directory that contains this
`kind`	`INTEGER`	`0` (file), `1` (directory), `2` (symlink)
`name`	`TEXT NOT NULL`	name of the directory or file

content

Rows in the content table are nothing more than huge blobs of compressed data that contain the file data within the archive. The size of these blobs can vary, anywhere from 8 to 32 MiB (mebibytes) with the idea being that larger blocks of contiguous content will compress better.

Name	Type	Description
`id`	`INTEGER PRIMARY KEY`	rowid for the content
`value`	`BLOB`	(compressed) file content

The content blobs are built up from the contents of as many files as it takes to fill the target blob size, at which point the entire block is compressed using Zstandard (without a dictionary). How the file contents are mapped to the content blobs is defined in the itemcontent table described below.

For symbolic links, the raw bytes are stored as if they were file content.

itemcontent

The itemcontent table is the glue that binds the rows from the item table to the huge blobs in the content table. Typically a blob is large enough to hold many files, as such this table will show which blob contains the data for a particular file, and where within the (decompressed) blob to read the data.

For very large files that are larger than the blob size, they will reference multiple rows from the content table. The itempos and contentpos values make it possible to accommodate both small files that fit within a blob and large files that do not.

Empty files will have a row in the itemcontent table with a size of zero to make it easier to write the extraction implementation.

Name	Type	Description
`id`	`INTEGER PRIMARY KEY`	rowid for the itemcontent
`item`	`INTEGER`	rowid in the `item` table for the file
`itempos`	`INTEGER`	position within the file for this chunk of content
`content`	`INTEGER`	rowid in the `content` table where this chunk is stored
`contentpos`	`INTEGER`	position within the chunk from the `content` table for this chunk
`size`	`INTEGER`	the size of the chunk

Performance Considerations

When writing to a database file on secondary storage, the majority of the running time (~90%) is spent in the allocation of the blob in SQLite using this statement:

conn.execute(
    "INSERT INTO content (value) VALUES (ZEROBLOB(?1))",
    [compressed_len as i32],
)?;

This may be a limitation in the current API of the rusqlite crate, or due to my lack of expertise in the usage of SQLite. As a work-around, the program creates an in-memory database and writes to disk when finished.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
doc/internal		doc/internal
src		src
test/fixtures		test/fixtures
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
TODO.org		TODO.org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pack-rs

Status

Build and Run

Prerequisites

Running the tests

Creating, listing, extracting archives

Specification

item

content

itemcontent

Performance Considerations

About

Releases

Packages

Languages

License

nlfiedler/pack-rs

Folders and files

Latest commit

History

Repository files navigation

pack-rs

Status

Build and Run

Prerequisites

Running the tests

Creating, listing, extracting archives

Specification

item

content

itemcontent

Performance Considerations

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages