chunkah

An OCI building tool that takes a flat rootfs and outputs a layered OCI image with content-based layers.

Note

chunkah is currently under heavy development and not yet ready for production. Experimental usage and feedback is much appreciated!

Motivation

Traditionally, images built using a Dockerfile result in a multi-layered image which model how the Dockerfile was written. For example, a separate layer is created for each RUN and COPY instructions. This can cause poor layer caching on clients pulling these images. A single package change may invalidate a layer much larger than the package itself, requiring re-pulling.

When splitting an image into content-based layers, it doesn't matter how the final contents of the image were derived. The image is "postprocessed" so that layers are created in a way that tries to maximize layer reuse. Commonly, this means grouping together related packages. This has benefits at various levels: at the network level (common layers do not need to be re-pulled), at the storage level (common layers are stored once), and at the runtime level (e.g. libraries are only mapped once).

chunkah allows you to keep building your image as you currently do, and then perform this content-based layer splitting.

Highlights

Content agnostic — Compatible with RPM-based images, but not only. Other package ecosystems can be supported, as well as fully custom content.
Container-native — Best used as a container image, either as part of a multi-staged build, or standalone.
Zero diff — Apart from modification time, content is never modified.
Reproducible — Supports SOURCE_DATE_EPOCH for reproducible layers.

It is a non-goal to support initial building of the root filesystem itself. Lots of tools for that exist already. It is also currently a non-goal to preprocess the rootfs to remove common sources of non-reproducibility (such as add-determinism). This can be done by the image build process itself.

Installation

While it's possible to install chunkah as a native CLI tool using cargo install, it's primarily intended to be used as a container image:

podman run -ti --rm quay.io/jlebon/chunkah --help

Usage

There are two main ways to use chunkah:

splitting an existing image
splitting an image at build time

Splitting an existing image

Using Podman/Buildah

When using Podman/Buildah, the most natural way to split an existing image is to use the Containerfile.splitter, passing the target image as the --from:

IMG=quay.io/fedora/fedora-minimal:latest
buildah build --skip-unused-stages=false --from $IMG \
  --build-arg CHUNKAH_CONFIG_STR="$(podman inspect $IMG)" \
  https://github.com/jlebon/chunkah/releases/download/latest/Containerfile.splitter

Additional arguments can be passed to chunkah using the CHUNKAH_ARGS build argument.

Note

You must add the --skip-unused-stages=false option (see also this buildah RFE).

For Buildah versions before v1.43, this also requires -v $(pwd):/run/src --security-opt=label=disable.

Another option is using the chunkah image directly and image mounts:

IMG=quay.io/fedora/fedora-minimal:latest
podman pull $IMG # image must be available locally
export CHUNKAH_CONFIG_STR="$(podman inspect $IMG)"
podman run --rm --mount=type=image,src=$IMG,dest=/chunkah \
  -e CHUNKAH_CONFIG_STR quay.io/jlebon/chunkah build | podman load

Using Docker/Moby

You can use the chunkah image directly using image mounts (requires v28+):

IMG=quay.io/fedora/fedora-minimal:latest
docker pull $IMG # image must be available locally
export CHUNKAH_CONFIG_STR="$(docker inspect $IMG)"
docker run --rm --mount=type=image,src=$IMG,destination=/chunkah \
  -e CHUNKAH_CONFIG_STR quay.io/jlebon/chunkah build > out.ociarchive
docker run --rm -ti -v $(pwd):/srv:z -w /srv quay.io/skopeo/stable \
  copy oci-archive:out.ociarchive docker-archive:out.dockerarchive
docker load -i out.dockerarchive

Note the conversion step using skopeo; chunkah currently only outputs an OCI archive, which docker load does not natively support.

Splitting an image at build time (buildah/podman only)

This uses a method called the "FROM oci-archive: trick", for lack of a better term. It has the massive advantage of being compatible with a regular buildah build flow and also makes it more natural to apply configs to the image, but is specific to the Podman ecosystem. This will not work with Docker.

FROM quay.io/fedora/fedora-minimal:latest AS builder
RUN dnf install -y git-core && dnf clean all

FROM quay.io/jlebon/chunkah AS chunkah
RUN --mount=from=builder,src=/,target=/chunkah,ro \
    --mount=type=bind,target=/run/src,rw \
        chunkah build > /run/src/out.ociarchive

FROM oci-archive:out.ociarchive
ENTRYPOINT ["git"]

Note

When building your image, you must also add the --skip-unused-stages=false option (see also this buildah RFE), and you cannot use the --jobs option.

For Buildah versions before v1.43, this also requires -v $(pwd):/run/src --security-opt=label=disable.

Advanced Usage

Understanding components

A component is a logical grouping of files that belong together. For example, all files from an RPM belong to the same component. Layers are created based on found components.

A component repo is a source of data from which components can be created. For example, the rpmdb is a component repo (one can imagine similar component repos for other distros). There is also an xattr-based component repo (see the section "Customizing the layers" below). Multiple component repos can be active at once.

Customizing the layers

It is possible to modify how components are assigned to layers by setting the user.component xattr on files/directories. This can be done using setfattr, e.g.:

RUN setfattr -n user.component -v "custom-apps" /usr/bin/my-app

This is compatible with rpm-ostree's support for the same feature.

Limiting the number of layers

By default, the maximum number of layers emitted is 64. This can be increased (up to 448) or decreased using the --max-layers option. If the number of components exceeds the maximum, chunkah will pack multiple components together. There is thus a tradeoff in deciding this. Fewer layers means losing the efficiency gains of content-based layers. Too many layers may mean excessive processing and overhead when pushing/pulling the image.

Building from a raw rootfs

For completeness, note it's of course also possible to split any arbitrary rootfs, regardless of where it comes from.

podman run --rm -v root:/chunkah:z -e CHUNKAH_CONFIG_STR="$(cat config.json)" \
  quay.io/jlebon/chunkah build > out.ociarchive

Note

The :z option will relabel all files for access by the container, which may be expensive for a large rootfs. You can use --security-opt=label=disable to avoid this, but it disables SELinux separation with the chunkah container.

Customizing the OCI image config and annotations

The OCI image config can be provided via the --config option (as a file) or --config-str/CHUNKAH_CONFIG_STR (inline). The primary format is the OCI image config spec as JSON:

{
    "Entrypoint": ["/bin/bash"],
    "Cmd": ["-c", "echo hi"],
    "WorkDir": "/root"
}

The output format of podman inspect and docker inspect are also supported, mostly for convenience when splitting an existing image, though it does also have the advantage of capturing annotations. Otherwise, it's also possible to set annotations directly using --annotation. Labels can also be added via --label.

Origins

chunkah is a generalized successor to rpm-ostree's build-chunked-oci command which does content-based layer splitting on RPM-based bootable container images. Unlike rpm-ostree, chunkah is not tied to bootable containers nor RPMs. The name is a nod to this ancestry and to buildah, with which it integrates well.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.claude/skills/extract-oci-archive		.claude/skills/extract-oci-archive
.github		.github
src		src
tests		tests
tools		tools
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Containerfile		Containerfile
Containerfile.splitter		Containerfile.splitter
Justfile		Justfile
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

chunkah

Motivation

Highlights

Installation

Usage

Splitting an existing image

Using Podman/Buildah

Using Docker/Moby

Splitting an image at build time (buildah/podman only)

Advanced Usage

Understanding components

Customizing the layers

Limiting the number of layers

Building from a raw rootfs

Customizing the OCI image config and annotations

Origins

About

Licenses found

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

Licenses found

jlebon/chunkah

Folders and files

Latest commit

History

Repository files navigation

chunkah

Motivation

Highlights

Installation

Usage

Splitting an existing image

Using Podman/Buildah

Using Docker/Moby

Splitting an image at build time (buildah/podman only)

Advanced Usage

Understanding components

Customizing the layers

Limiting the number of layers

Building from a raw rootfs

Customizing the OCI image config and annotations

Origins

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages