
Import rootless OCI image #66

Merged: jserv merged 1 commit into main from oci-image on Apr 29, 2026


@jserv commented on Apr 29, 2026

mkrootfs.sh has only ever extracted the bundled Alpine minirootfs tarball into a staging directory and fed that to mke2fs -d. This change adds a --image=docker://... mode so the same staging path can be populated from an arbitrary OCI image hosted on a v2 registry, and a --rewrite-uid mode that restores tar-header uid/gid/mode into the resulting ext4 inodes.

The implementation has three pieces:

scripts/oci-pull.py is a stdlib-only Python helper that resolves manifest lists by host arch, fetches layer blobs with bearer-token auth, applies them to a directory honoring whiteouts (.wh.NAME and .wh..wh..opq), hardlinks (archive-root or parent-relative ustar form), and OCI symlinks. Layer blobs are content-addressed and cached under $XDG_CACHE_HOME/kbox/oci-layers, with atomic-rename writes and verify-on-read so a corrupted cache self-heals.
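
The whiteout rules come from the OCI image-layer spec: .wh.NAME deletes NAME from the layers below, and .wh..wh..opq in a directory hides everything the lower layers put there, while entries from the same layer survive. A minimal sketch of that ordering follows, assuming member names were already validated (for example by safe_join); apply_whiteouts is a hypothetical helper, not the function in oci-pull.py.

```python
import os
import posixpath
import shutil

WH_PREFIX = ".wh."
WH_OPAQUE = ".wh..wh..opq"


def _remove(path):
    """Delete a file, symlink, or directory tree without following symlinks."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.unlink(path)


def apply_whiteouts(root, names):
    """Apply one layer's whiteouts to root, which holds the lower layers.

    Hypothetical helper. Whiteouts are handled before the layer's other
    entries are extracted, so they only hide lower-layer content. Names
    are assumed to be validated already (relative, no '..').
    Returns the member names that still need extracting.
    """
    regular = []
    for name in names:
        parent, base = posixpath.split(name.rstrip("/"))
        target_dir = os.path.join(root, parent)
        if base == WH_OPAQUE:
            # Opaque marker: empty the directory as the lower layers left it.
            if os.path.isdir(target_dir) and not os.path.islink(target_dir):
                for entry in os.listdir(target_dir):
                    _remove(os.path.join(target_dir, entry))
        elif base.startswith(WH_PREFIX):
            # Plain whiteout: delete the named sibling if a lower layer made it.
            victim = os.path.join(target_dir, base[len(WH_PREFIX):])
            if os.path.lexists(victim):
                _remove(victim)
        else:
            regular.append(name)
    return regular
```

Handling whiteouts before the rest of the layer is one simple way to keep them from deleting same-layer entries; a production version would also apply the realpath confinement described in the hardening rules of this PR.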

tools/oci-chown is an in-tree libext2fs helper that opens the ext4 image read-write, walks a NUL-separated manifest emitted by oci-pull.py (uid TAB gid TAB mode TAB path), and rewrites each inode's uid/gid hi+lo halves and i_mode permission bits via ext2fs_namei + ext2fs_read_inode/write_inode. The helper builds with its own Makefile (-lext2fs -lcom_err) and is built on demand by mkrootfs.sh; the kbox supervisor build is unchanged.
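
The manifest is the only contract between the two tools. As a sketch, assuming mode is written in octal and paths are relative to the image root (neither encoding detail is specified here), a record could be built from a tarfile.TarInfo as below; manifest_record and split_id are hypothetical names. split_id shows the "hi+lo halves" arithmetic: ext4 keeps the low 16 bits of a uid/gid in i_uid/i_gid and the high 16 bits in separate inode fields, so ids above 65535 need both halves written.

```python
import tarfile


def split_id(value):
    """Split a 32-bit uid/gid into the (lo, hi) 16-bit halves ext4 stores."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"id out of range: {value}")
    return value & 0xFFFF, value >> 16


def manifest_record(info):
    """Encode one tar member as uid TAB gid TAB mode TAB path NUL (bytes).

    Hypothetical encoding: octal mode, path relative to the image root.
    """
    path = info.name
    while path.startswith("./"):
        path = path[2:]
    path = path.rstrip("/")
    mode = info.mode & 0o7777  # permission bits only; the file type is already in the inode
    line = f"{info.uid}\t{info.gid}\t{mode:04o}\t{path}"
    return line.encode("utf-8", "surrogateescape") + b"\0"
```

For example, uid 1000 splits into (1000, 0), while uid 70000 splits into (4464, 1).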

scripts/mkrootfs.sh grows --image=URL, --size=MB, and --rewrite-uid flags. The positional SIZE_MB argument is still accepted for backward compatibility. When --rewrite-uid is set, oci-pull.py emits a manifest into a temp file, mke2fs runs as before, and oci-chown rewrites the inodes from the manifest before mkrootfs declares the image ready.

The layer-apply path is the principal attack surface and is hardened against malicious images:

  • safe_join rejects '..' and absolute paths in tar member names.
  • Each member's parent directory is realpath-checked against the staging root before any write/unlink/chmod, so an in-staging symlink (e.g. layer 1 creates etc -> /etc, layer 2 writes etc/passwd) cannot redirect onto the host.
  • File writes use O_NOFOLLOW; pre-existing symlinks at the destination are unlinked first.
  • Hardlink targets are resolved through safe_join (and a parent-relative variant for ustar tarballs that also runs through safe_join after normalization), absolute linknames are rejected, the resolved source is realpath-confined, and os.link is called with follow_symlinks=False so a staging-resident symlink cannot redirect the link to a host file.
  • Bearer tokens are stripped on cross-host redirects compared by netloc, not just hostname, so a port-change redirect on the same host also drops the token.
  • DoS caps: MAX_MANIFEST_BYTES=4 MB, MAX_BLOB_BYTES=8 GB, MAX_TAR_MEMBERS=500_000.
  • Every blob is sha256-verified in flight; cache reads re-verify.
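
The path-confinement and redirect rules above need only a few stdlib calls. This sketch follows the description of safe_join; check_parent_confined and keep_auth_on_redirect are hypothetical names, and the real oci-pull.py may structure the checks differently.

```python
import os
from urllib.parse import urlsplit


def safe_join(root, member):
    """Join a tar member name onto root, rejecting absolute and '..' paths."""
    if member.startswith("/"):
        raise ValueError(f"absolute member name: {member!r}")
    parts = [p for p in member.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"'..' in member name: {member!r}")
    return os.path.join(root, *parts)


def check_parent_confined(root, path):
    """Hypothetical: fail unless the realpath of path's parent stays in root.

    Catches an earlier layer planting e.g. etc -> /etc, which would
    otherwise let a later write to etc/passwd land on the host.
    """
    real_root = os.path.realpath(root)
    real_parent = os.path.realpath(os.path.dirname(path))
    if os.path.commonpath([real_root, real_parent]) != real_root:
        raise ValueError(f"{path!r} escapes staging root")


def keep_auth_on_redirect(original_url, redirect_url):
    """Hypothetical: keep the bearer token only if netloc (host and port) matches."""
    return urlsplit(original_url).netloc == urlsplit(redirect_url).netloc
```

Comparing netloc rather than hostname is what makes a redirect from r.example to r.example:8443 drop the token: the hostnames match, but the netlocs differ.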

oci-chown's manifest parser range-checks uid/gid (0..UINT32_MAX), rejects leading sign or whitespace, rejects mode bits outside 07777, and dies loudly on a manifest tail missing the trailing NUL.
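
oci-chown is C, but its parser rules can be stated compactly. The sketch below restates them in Python, assuming mode is encoded in octal; parse_manifest is a hypothetical name. Matching fields against an explicit digit pattern, rather than relying on int() or strtoul(), is what rejects a leading '+', '-', or whitespace, since both of those functions accept them.

```python
import re

UINT32_MAX = 0xFFFFFFFF
MODE_MASK = 0o7777


def _parse_id(field):
    """uid/gid: ASCII digits only (no sign, no whitespace), 0..UINT32_MAX."""
    if not re.fullmatch(rb"[0-9]+", field):
        raise ValueError(f"bad id field {field!r}")
    value = int(field)
    if value > UINT32_MAX:
        raise ValueError(f"id out of range: {field!r}")
    return value


def _parse_mode(field):
    """mode: octal digits only (an assumed encoding), no bits outside 07777."""
    if not re.fullmatch(rb"[0-7]+", field):
        raise ValueError(f"bad mode field {field!r}")
    mode = int(field, 8)
    if mode & ~MODE_MASK:
        raise ValueError(f"mode bits outside 07777: {field!r}")
    return mode


def parse_manifest(data):
    """Parse NUL-terminated uid TAB gid TAB mode TAB path records.

    Hypothetical helper: validates the whole manifest before returning,
    so a bad record aborts before any inode would be rewritten.
    """
    if data and not data.endswith(b"\0"):
        raise ValueError("manifest tail is missing its trailing NUL")
    entries = []
    for record in data.split(b"\0")[:-1]:
        fields = record.split(b"\t", 3)  # path is last, so it may contain tabs
        if len(fields) != 4 or not fields[3]:
            raise ValueError(f"malformed record {record!r}")
        uid, gid, mode, path = fields
        entries.append((_parse_id(uid), _parse_id(gid), _parse_mode(mode), path))
    return entries
```

Parsing the whole manifest up front means one bad record aborts the run without leaving the image half-rewritten; whether oci-chown itself validates eagerly like this is not stated in the description.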

A new oci-image-import CI job pulls nginx:alpine end-to-end on every PR, asserts oci-chown reports a non-zero rewrite count, and verifies that /etc/nginx/nginx.conf ends up User=0/Group=0 and /usr/sbin/nginx mode is 0755.

Verified on x86_64 (node1) and aarch64 (arm) with alpine:3.21 (185 inodes rewritten) and node:alpine (~2880 inodes; /home/node round-trips at User=1000/Group=1000). The integration suite shows parity with the baseline tarball rootfs.

Close #17

Change-Id: Ic260ccce778a1de0875fc466cc8fde4c14e301a8


Summary by cubic

Adds rootless OCI image import to the rootfs build. scripts/mkrootfs.sh can now pull docker://... images and optionally restore tar-header ownership into ext4 inodes.

  • New Features

    • scripts/mkrootfs.sh: adds --image=docker://..., --rewrite-uid, and --size=MB (positional size still supported); Alpine tarball remains default; builds tools/oci-chown on demand; validates size; requires python3 for --image.
    • scripts/oci-pull.py: stdlib-only puller with bearer auth; resolves multi-arch manifest lists by host arch; applies layers (whiteouts, hardlinks, symlinks); content-addressed cache under $XDG_CACHE_HOME/kbox/oci-layers with sha256 verify and prune.
    • tools/oci-chown: libext2fs helper that rewrites inode uid/gid/mode from a manifest; required for --root-id guests; supervisor build unchanged.
    • Hardened layer apply: safe path joins, realpath confinement, O_NOFOLLOW, strict hardlink handling (follow_symlinks=False), token stripping on cross-host redirects, DoS caps, and digest verification.
    • Docs: added docs/oci-image-import.md and README usage.
  • CI

    • New oci-image-import job pulls nginx:alpine, asserts non-zero inode rewrites, and verifies /etc/nginx/nginx.conf owner (0:0) and /usr/sbin/nginx mode (0755).

Written for commit 00008c5. Summary will update on new commits.

@jserv force-pushed the oci-image branch 2 times, most recently from 0000f94 to 000034e on April 29, 2026 08:30

@jserv merged commit 86df60a into main on Apr 29, 2026
6 checks passed
@jserv deleted the oci-image branch on April 29, 2026 08:49

Development: successfully merging this pull request may close the issue "Rootless OCI image import into kbox rootfs".
