Release v1: diffah container image delta export/import/inspect#1

Merged
leosocy merged 42 commits into master from design/diffah-v1-spec
Apr 20, 2026
leosocy commented Apr 20, 2026

Overview

diffah is a CLI for shipping container images as portable layer deltas
when registry-to-registry replication is unavailable
(air-gapped deployments, customer deliveries, offline mirrors). This PR
lands the complete v1 surface: export, import, inspect, and
version subcommands, all cross-tested against both OCI and Docker
schema 2 manifest formats.

A v2 image that shares base layers with a v1 baseline typically ships as
a delta archive that is 10% or less of the full image size — only
the layers that actually changed travel.

What v1 ships

CLI commands

  • diffah export — reads a target image and baseline manifest, computes
    which layers are new, and packages only the new blobs plus the target
    manifest and config into a portable .tar with a diffah.json
    sidecar describing which blobs the consumer must resolve from its
    local baseline.
  • diffah import — extracts the delta, opens the local baseline image,
    verifies every required baseline blob is reachable (fail-fast), and
    reconstructs the full target image in docker-archive, oci-archive,
    or dir format.
  • diffah inspect — previews the contents of a delta archive without
    writing anything: version, platform, manifest digests, shipped vs
    required blob counts, and the estimated size saving vs the full image.
  • diffah version — prints the build version.

Capabilities

  • --target and --baseline accept any containers-image transport:
    docker://, docker-archive:, oci-archive:, dir:.
  • --baseline-manifest accepts a standalone manifest.json when the
    original baseline image is no longer available but its manifest
    digest set is known.
  • --dry-run on export and import validates reachability and
    manifest structure without touching the filesystem.
  • Optional --compress=zstd on the outer archive for additional
    on-wire savings.
  • Cross-format round-trip: OCI ⇄ Docker schema 2 are handled
    transparently; manifest bytes are preserved verbatim in the internal
    dir: layout to keep target digests stable.

Architecture

Strict Interface → Service → Domain → Infrastructure layering:

  • cmd/ — Cobra CLI surface (interface layer).
  • pkg/exporter/, pkg/importer/ — service orchestration that wraps
    containers-image ImageSource / ImageDestination interfaces and
    delegates heavy lifting to go.podman.io/image/v5/copy.Image.
  • pkg/diff/ — pure domain types, sidecar schema, plan partition.
  • internal/imageio/, internal/archive/, internal/oci/ hold the
    infrastructure adapters for transports, tar/zstd packaging, and
    dir-layout helpers.

Testing

  • Unit tests for every package with coverage thresholds (≥ 80% for
    pkg/*, ≥ 60% for internal/*).
  • Cross-format integration matrix (OCI/schema 2 × docker-archive /
    oci-archive / dir) under pkg/importer/integration_test.go and
    cmd/*_integration_test.go.
  • Deterministic OCI and Docker schema 2 fixtures are committed under
    testdata/fixtures/ and verified via
    testdata/fixtures/CHECKSUMS. Regenerate with
    go run ./scripts/build_fixtures.
  • CI: lint and test run on every PR and push to master;
    integration runs nightly and on manual dispatch.

Build and release

  • Go 1.25 with build tag containers_image_openpgp.
  • GoReleaser produces linux_{amd64,arm64} and darwin_{amd64,arm64}
    binaries; the release workflow triggers on v* tags.
  • Install:
    go install -tags containers_image_openpgp github.com/leosocy/diffah@latest.

Usage snapshot

Producer:

diffah export \
  --target   docker://registry.example.com/app:v2 \
  --baseline docker://registry.example.com/app:v1 \
  --platform linux/amd64 \
  --output   ./app_v1_to_v2.tar

Consumer:

diffah import \
  --delta    ./app_v1_to_v2.tar \
  --baseline docker://registry.internal/app:v1 \
  --output   ./app_v2.tar

Preview:

diffah inspect ./app_v1_to_v2.tar

Design document

Full specification in
`docs/superpowers/specs/2026-04-20-diffah-design.md`:
archive format, export / import algorithms, error contracts, testing
strategy, and explicit non-goals.

Post-merge

  • Tag `v0.1.0` to trigger the `release` workflow and publish the
    first set of binaries on GitHub Releases.
  • Roadmap items for v2 (intra-layer diffing, direct push to registries,
    cosign signature verification) are tracked outside this PR.

Test plan

  • `go test -tags containers_image_openpgp -race -cover ./...`
    passes locally on darwin/arm64 (Go 1.25.4).
  • `golangci-lint run ./...` clean.
  • `diffah export` → `diffah inspect` → `diffah import`
    end-to-end smoke against local fixtures produces a byte-identical
    target image.
  • CI `lint` job green on this PR.
  • CI `test` job green on this PR (ubuntu-latest and macos-latest).

leosocy added 30 commits April 20, 2026 01:49
Captures the agreed scope, architecture, CLI surface, delta archive
format, and export/import algorithms before implementation begins,
so the upcoming implementation plan and code can be evaluated against
a stable contract.
7 stages, 26 TDD tasks with concrete file paths, failing tests, minimal
implementations, and per-task commits. Each containers-image API binding
task starts with a 'go doc' pin step so signatures are verified live
rather than recalled.
Pin versions:
- github.com/spf13/cobra@v1.8.1
- go.podman.io/image/v5@v5.39.2
- github.com/klauspost/compress@v1.18.5
- github.com/stretchr/testify@v1.11.1

Removed deprecated bakgo.mod with old github.com/containers/image reference.
go mod tidy strips pre-pinned deps that no code imports yet, so the
prior commit's go.mod plus empty go.sum was an inconsistent state.
Deps will land via go mod tidy as later tasks add real imports.
Bumps pre-commit-hooks to v4.6.0. Runs golangci-lint through a local
hook entry instead of pre-commit's hosted git-based install, so the
repo does not require contributors' machines to reach github.com on
each new clone (also works around SSL issues in restricted networks).
The originally written v1 config does not parse under the locally
installed golangci-lint v2.1.6. golangci-lint migrate produced this
v2 layout. gofmt and goimports are now expressed under the
formatters: section per the v2 schema. Linter selection is unchanged.

run.go is pinned to 1.24 to match the toolchain golangci-lint v2.1.6
itself was built with; bumping the binary to a Go 1.25 release will
let us drop this constraint.
The locally installed golangci-lint v2.1.6 was built with Go 1.24 and
panics when analysing source compiled against Go 1.25, so the local
hook cannot run reliably until contributors upgrade their binary.

CI pins a working golangci-lint version (added in Task 1.10), so move
the lint gate there. Local devs can run make lint when their toolchain
is current.
Provides the standard developer entry points called out in the spec
(§11.6). Build pins CGO_ENABLED=0 plus the containers_image_openpgp
tag so the binary stays static and free of GnuPG cgo. VERSION is
injected via -ldflags into cmd.version.
Wires the diffah binary so make build produces a usable CLI:
- main.go bootstraps cmd.Execute and propagates exit code 1 on error.
- cmd/root.go owns the cobra root, the version variable injected via
  -ldflags, the persistent --log-level flag, and a shared reportError
  helper for downstream subcommands.
- cmd/version.go prints the injected version.
- cmd/{export,import,inspect}.go are minimal stubs registered against
  the root so the help output matches the spec; later tasks replace
  the bodies.

Tests in cmd/root_test.go and cmd/version_test.go cover subcommand
registration, --help listing, and the version output. go.mod is
populated by go mod tidy now that real imports exist.
Both adapters wrap go.podman.io/image/v5 so service code can stay
independent of the upstream package. ParseReference normalises any
transport:reference string and wraps errors with the offending value;
DefaultPolicyContext returns an insecure-accept-any signature policy
appropriate for v1 (signing is out of scope per spec §2.2).
lint.yml runs golangci-lint via the official action against Go 1.25.4
on ubuntu. test.yml runs go test with -race -cover across ubuntu and
macos. Pinning version: latest in the lint action keeps us on a
golangci-lint build that targets the project's Go version.
pkg/diff hosts the domain layer with no framework dependencies.

errors.go defines the domain error types from spec section 9.1, all
implementing Error and (where useful) Unwrap so service callers can
chain context with fmt.Errorf and consumers can errors.As.
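
The Error/Unwrap pattern described above can be sketched as follows. This is a minimal illustration, not the contents of errors.go: the type name MissingBlobError, its fields, and the helpers probe and missingDigest are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// MissingBlobError is a hypothetical domain error; the real types live in
// pkg/diff/errors.go and may carry different fields.
type MissingBlobError struct {
	Digest string
	Cause  error
}

func (e *MissingBlobError) Error() string {
	return "baseline is missing blob " + e.Digest
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *MissingBlobError) Unwrap() error { return e.Cause }

// probe simulates a service call that adds context with fmt.Errorf and %w.
func probe(digest string) error {
	domainErr := &MissingBlobError{Digest: digest, Cause: errors.New("blob not found")}
	return fmt.Errorf("probe baseline: %w", domainErr)
}

// missingDigest recovers the domain error from a wrapped chain.
func missingDigest(err error) (string, bool) {
	var missing *MissingBlobError
	if errors.As(err, &missing) {
		return missing.Digest, true
	}
	return "", false
}

func main() {
	if digest, ok := missingDigest(probe("sha256:abc")); ok {
		fmt.Println("missing:", digest)
	}
}
```

Because the service layer wraps with %w rather than formatting the error into a string, the consumer can still reach the typed value through errors.As.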

plan.go owns BlobRef, Plan, and ComputePlan. ComputePlan partitions
target layer references into RequiredFromBaseline and ShippedInDelta
according to which digests already live in baseline, preserving the
target's original ordering.
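
A minimal sketch of the partition step, assuming a simplified BlobRef with only a digest and size. The real ComputePlan signature in plan.go is not shown in this PR description, so the function below is an illustrative stand-in.

```go
package main

import "fmt"

// BlobRef is a simplified, assumed shape of the domain type named above.
type BlobRef struct {
	Digest string
	Size   int64
}

// Plan holds the two partitions described in the commit message.
type Plan struct {
	RequiredFromBaseline []BlobRef
	ShippedInDelta       []BlobRef
}

// computePlan splits target layers by whether their digest already exists in
// the baseline. Iterating over target in order preserves the target's
// original ordering within each partition.
func computePlan(target []BlobRef, baselineDigests []string) Plan {
	known := make(map[string]struct{}, len(baselineDigests))
	for _, d := range baselineDigests {
		known[d] = struct{}{}
	}
	var p Plan
	for _, layer := range target {
		if _, ok := known[layer.Digest]; ok {
			p.RequiredFromBaseline = append(p.RequiredFromBaseline, layer)
		} else {
			p.ShippedInDelta = append(p.ShippedInDelta, layer)
		}
	}
	return p
}

func main() {
	target := []BlobRef{
		{Digest: "sha256:base", Size: 100},
		{Digest: "sha256:app-v2", Size: 20},
	}
	p := computePlan(target, []string{"sha256:base", "sha256:app-v1"})
	fmt.Println("required:", p.RequiredFromBaseline)
	fmt.Println("shipped: ", p.ShippedInDelta)
}
```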

sidecar.go defines the diffah.json v1 schema, atomic Marshal that
validates before encoding, and ParseSidecar that rejects unknown
versions and missing required fields. The schema matches spec section
6.2 verbatim.
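
The validation behaviour of ParseSidecar might look roughly like the sketch below. The field names (version, platform) and the "v1" version string are assumptions made for illustration; the authoritative schema is the one in spec section 6.2.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sidecar is a reduced, hypothetical view of diffah.json.
type sidecar struct {
	Version  string `json:"version"`
	Platform string `json:"platform"`
}

// parseSidecar decodes the sidecar, then rejects unknown versions and
// missing required fields before returning it to the caller.
func parseSidecar(raw []byte) (*sidecar, error) {
	var s sidecar
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("decode sidecar: %w", err)
	}
	if s.Version != "v1" {
		return nil, fmt.Errorf("unsupported sidecar version %q", s.Version)
	}
	if s.Platform == "" {
		return nil, errors.New("sidecar is missing required field \"platform\"")
	}
	return &s, nil
}

func main() {
	_, err := parseSidecar([]byte(`{"version":"v9","platform":"linux/amd64"}`))
	fmt.Println(err)
}
```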

go test ./pkg/diff/... -cover reports 94.4% coverage.
…e Go

Adds a Progress (resumable handoff) section pinning HEAD, completed
stages, and the local-environment caveats discovered during execution
so the next session can pick up without rediscovering them.

Replaces Task 3.4's bash + buildah recipe with a pure-Go generator
(scripts/build_fixtures/main.go) that uses go.podman.io/image/v5
itself for the destination write. The original recipe required buildah,
which is not available on the development host; the Go generator runs
anywhere go run does and keeps determinism control fully in our code.
Pack atomically writes a tar archive containing srcDir contents plus
sidecar JSON to outPath. Uses tmp file + rename pattern to guarantee
no observers see partial archives. Supports optional zstd compression.
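
The tmp file + rename pattern can be sketched with the standard library alone. This is not the Pack implementation: writeAtomic and roundTrip are hypothetical helpers that write a plain byte slice instead of a tar stream.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temporary file in the destination directory
// and renames it into place, so readers never observe a partial file.
func writeAtomic(outPath string, data []byte) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(outPath), ".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		// On any failure, drop the temporary file instead of leaving debris.
		if err != nil {
			_ = tmp.Close()
			_ = os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), outPath)
}

// roundTrip writes a payload into a scratch directory and reads it back.
func roundTrip(payload string) (string, error) {
	dir, err := os.MkdirTemp("", "pack-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	out := filepath.Join(dir, "delta.tar")
	if err := writeAtomic(out, []byte(payload)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(out)
	return string(got), err
}

func main() {
	got, err := roundTrip("delta bytes")
	fmt.Println(got, err)
}
```

Creating the temporary file in the same directory as the destination keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.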
…ection

Extract writes every entry of a delta archive into a destination directory
and returns the sidecar bytes. ReadSidecar returns only sidecar bytes without
extracting the full archive, allowing fast metadata inspection. Both functions
auto-detect zstd compression by sniffing the stream's magic bytes (0x28B52FFD).
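
Magic-byte sniffing can be done without consuming the stream by peeking through a buffered reader. The sketch below only detects the zstd frame header (bytes 28 B5 2F FD on the wire) and does not decompress; sniffZstd and sniffBytes are illustrative names, not functions from internal/archive.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// zstdMagic is the zstd frame magic number as it appears on the wire.
var zstdMagic = []byte{0x28, 0xB5, 0x2F, 0xFD}

// sniffZstd peeks at the first four bytes without consuming them and returns
// a reader that still yields the complete stream.
func sniffZstd(r io.Reader) (bool, io.Reader, error) {
	br := bufio.NewReader(r)
	head, err := br.Peek(len(zstdMagic))
	if err != nil && !errors.Is(err, io.EOF) {
		return false, nil, err
	}
	// A stream shorter than four bytes simply fails the comparison.
	return bytes.Equal(head, zstdMagic), br, nil
}

// sniffBytes is a demo helper: it reports the detection result and how many
// bytes remain readable afterwards.
func sniffBytes(data []byte) (bool, int, error) {
	compressed, rest, err := sniffZstd(bytes.NewReader(data))
	if err != nil {
		return false, 0, err
	}
	all, err := io.ReadAll(rest)
	return compressed, len(all), err
}

func main() {
	compressed, n, err := sniffBytes([]byte{0x28, 0xB5, 0x2F, 0xFD, 0x00})
	fmt.Println(compressed, n, err)
}
```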

All 5 new tests pass; combined coverage 70.5%.
…ed fixtures

Replaces the placeholder bash script with a self-contained Go program that
produces bit-identical OCI and Docker Schema 2 archives on every run.
Determinism is achieved by pinning all tar/gzip headers to fixed values,
and post-processing each outer archive via normalizeTar to sort entries and
zero out any variable fields written by the upstream transport library.
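
One way to approach the normalisation step is shown below, using only archive/tar. It is a sketch under assumptions: the choice of which header fields count as variable (timestamps, ownership) is a guess, and the real normalizeTar in scripts/build_fixtures may handle more cases.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"sort"
	"time"
)

type tarEntry struct {
	hdr  *tar.Header
	body []byte
}

// normalizeTar rewrites an archive with entries sorted by name and with
// timestamps and ownership reset to fixed values.
func normalizeTar(in []byte) ([]byte, error) {
	var entries []tarEntry
	tr := tar.NewReader(bytes.NewReader(in))
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		entries = append(entries, tarEntry{hdr: hdr, body: body})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].hdr.Name < entries[j].hdr.Name })

	var out bytes.Buffer
	tw := tar.NewWriter(&out)
	for _, e := range entries {
		// Copy only the stable fields; everything else stays at its zero value.
		hdr := &tar.Header{
			Typeflag: e.hdr.Typeflag,
			Name:     e.hdr.Name,
			Linkname: e.hdr.Linkname,
			Mode:     e.hdr.Mode,
			Size:     int64(len(e.body)),
			ModTime:  time.Unix(0, 0),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(e.body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// buildTar creates a small demo archive with the given entry order and
// modification time (seconds since the Unix epoch).
func buildTar(names []string, mtime int64) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		body := []byte("content of " + name)
		_ = tw.WriteHeader(&tar.Header{Name: name, Mode: 0o644, Size: int64(len(body)), ModTime: time.Unix(mtime, 0)})
		_, _ = tw.Write(body)
	}
	_ = tw.Close()
	return buf.Bytes()
}

func main() {
	a, _ := normalizeTar(buildTar([]string{"b.txt", "a.txt"}, 1000))
	b, _ := normalizeTar(buildTar([]string{"a.txt", "b.txt"}, 2000))
	fmt.Println("identical:", bytes.Equal(a, b))
}
```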

Shared base layer digest is identical between v1 and v2 archives, which
makes the fixtures useful for testing ComputePlan and delta distribution.
Wire the full export pipeline: open baseline, collect layer digests,
copy target into a temp dir via KnownBlobsDest (wrapped through a
knownBlobsRef to inject at the ImageReference level), build and pack
the sidecar, then verify digest round-trip.

Adds derivePlatformFromConfig to read os/arch from the config blob
written by copy.Image into the directory transport layout, so the
sidecar always has a non-empty Platform without requiring the caller
to pass --platform.
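
Reading the platform from a config blob comes down to decoding two JSON fields that the OCI image config defines, os and architecture. The sketch below is an illustration of that idea; platformFromConfig is a hypothetical name, and appending the optional variant field is an assumption about what the sidecar should record.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// imageConfig picks out the platform fields of an OCI image config blob.
type imageConfig struct {
	OS           string `json:"os"`
	Architecture string `json:"architecture"`
	Variant      string `json:"variant,omitempty"`
}

// platformFromConfig returns "os/arch" (plus "/variant" when present).
func platformFromConfig(raw []byte) (string, error) {
	var cfg imageConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("decode image config: %w", err)
	}
	if cfg.OS == "" || cfg.Architecture == "" {
		return "", errors.New("image config lacks os or architecture")
	}
	platform := cfg.OS + "/" + cfg.Architecture
	if cfg.Variant != "" {
		platform += "/" + cfg.Variant
	}
	return platform, nil
}

func main() {
	p, err := platformFromConfig([]byte(`{"os":"linux","architecture":"arm64","variant":"v8"}`))
	fmt.Println(p, err)
}
```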
- Add DryRunStats type and DryRun() function that computes the layer
  partition plan without invoking copy.Image or writing any output files
- Extract loadTargetManifest() helper to open the target image source,
  read the manifest, resolve manifest lists via platform selection, and
  return the parsed manifest — shared by both Export and DryRun
- Add TestExport_ManifestOnlyBaseline to prove the BaselineManifestPath
  code path works end-to-end through Export
- Add TestExport_DryRun_DoesNotWriteOutput and
  TestExport_DryRun_ManifestOnlyBaseline to verify DryRun behavior
Wire the full import pipeline (spec §8): extract delta archive, parse
sidecar, probe baseline, build CompositeSource, copy to output, rename
atomically, and verify. PreserveDigests is only set for dir output as
the spec requires — docker-archive/oci-archive need manifest rewriting
which PreserveDigests=true would refuse.
leosocy added 12 commits April 20, 2026 11:27
…fixture

- Extend scripts/build_fixtures to emit unrelated_oci.tar (1 layer,
  /unrelated.bin = 16 KiB of 0xFF), whose layer digest does not overlap
  with any v1/v2 layer, enabling fail-fast probe testing
- Add DryRunReport + DryRun to pkg/importer: runs steps 1-4 (extract,
  parse, open baseline, probe) without writing output, reports reachable
  vs. missing blobs
- Add TestImport_FailFast_MissingBaselineBlob, TestImport_DryRun_Reachable,
  and TestImport_DryRun_Missing; all three pass
Adds TestImport_Matrix, a table-driven integration test that exercises
the full export → import pipeline across source formats (OCI and Docker
Schema 2) and output formats (docker-archive, oci-archive, dir).
Five test cases cover all combinations with a helper buildDeltaS2() for
schema-2 deltas.
Replace stub with full implementation that reads sidecar metadata from delta
archives and displays platform info, manifest references, blob counts, and
compression savings percentage (required bytes / total bytes).
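
The savings figure is a simple ratio. In the sketch below, "total bytes" is assumed to mean shipped bytes plus required bytes; savingPercent is an illustrative helper, not the function used by diffah inspect.

```go
package main

import "fmt"

// savingPercent returns the share of the full image that does not travel in
// the delta: required bytes divided by total bytes, as a percentage.
func savingPercent(shippedBytes, requiredBytes int64) float64 {
	total := shippedBytes + requiredBytes
	if total == 0 {
		return 0 // avoid dividing by zero for an empty image
	}
	return float64(requiredBytes) / float64(total) * 100
}

func main() {
	// 25 MiB shipped, 75 MiB resolved from the local baseline.
	fmt.Printf("%.1f%%\n", savingPercent(25<<20, 75<<20))
}
```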
Keep dist/ (goreleaser output), bin/ (go build output), build_fixtures
(scripts helper binary), *.bck.yaml config backups, and .tool-versions
(asdf pin) out of the published repository. These are generated or
machine-local and should not be imposed on contributors.
…11.4

Action @v6 installs golangci-lint v1 by default, which rejects our v2
schema config (linters.settings, linters.exclusions.presets,
formatters). Action @v9 adopted the v2 binary series and v2.11.4 is
the first release built with Go 1.25 so it can analyse our module.
The local golangci-lint binary was built with Go 1.24 and could not
load this 1.25 module, so the real v2 lint output never surfaced during
development. Upgrading the local binary revealed 47 issues; this commit
clears all of them.

Config adjustments (.golangci.yaml):
- Disable gocritic hugeParam globally. Many flagged methods implement
  interfaces from the go.podman.io/image types package (PutBlob, GetBlob,
  TryReusingBlob) which take BlobInfo by value; switching to pointer
  parameters would break interface satisfaction.
- Bump gocyclo to 15 and exclude gosec from _test.go (0o644 on fixture
  paths is normal test practice; gosec G305 in reader.go is still
  checked).
- Sync run.go to 1.25 to match go.mod.

Code fixes:
- internal/archive/reader.go: wrap io.EOF checks with errors.Is, defend
  Extract against zip slip via safeJoin, split loop body into
  extractEntry for clarity.
- pkg/exporter/exporter.go, pkg/importer/importer.go: capture
  policyCtx.Destroy error via a defer closure.
- pkg/importer: promote "docker-archive", "oci-archive", and "dir" to
  exported Format* constants so external callers do not hardcode them.
- cmd/root.go: drop the unused reportError helper and its fmt/os imports.
- internal/archive/writer.go: mark the unused addFile parameter as _.
- Split overlong signatures in pkg/exporter and pkg/importer to keep
  every line under 120 columns.
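
Capturing a cleanup error through a defer closure relies on a named return value. The sketch below uses a stand-in policyContext type rather than the real signature.PolicyContext, and run and errDestroy are hypothetical; how the exporter and importer combine the two errors is not stated here, so errors.Join is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// policyContext is a stand-in for a resource whose Destroy can fail.
type policyContext struct{ destroyErr error }

func (p *policyContext) Destroy() error { return p.destroyErr }

var errDestroy = errors.New("destroy failed")

// run executes work and always destroys ctx. The named return lets the
// deferred closure fold a Destroy failure into the returned error instead of
// silently dropping it.
func run(ctx *policyContext, work func() error) (err error) {
	defer func() {
		if derr := ctx.Destroy(); derr != nil {
			err = errors.Join(err, fmt.Errorf("destroy policy context: %w", derr))
		}
	}()
	return work()
}

func main() {
	err := run(&policyContext{destroyErr: errDestroy}, func() error { return nil })
	fmt.Println(err)
}
```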
go.podman.io/storage pulls in btrfs and devicemapper drivers that need
system headers (btrfs/version.h, libdevmapper) present on the Linux
runner. The ubuntu-latest image does not ship them, and go test needs cgo for the
race detector, so the previous invocation failed at package load with
"btrfs/version.h: No such file or directory".

diffah never instantiates these drivers - the containers-image copy
path only uses directory/docker-archive/oci-archive transports - so
excluding them via build tags is strictly correct. Mirror the same set
in the Makefile test targets so make test matches CI.

goreleaser builds are unaffected (they already use CGO_ENABLED=0 and
the btrfs driver has a linux+cgo build constraint).
Two follow-ups to the v2 lint cleanup:

1. golangci-lint on CI still typechecked the podman/containers-image
   imports without our build tags, so it tried to compile the btrfs
   driver and gpgme-cgo path and failed with "btrfs/version.h: No such
   file or directory". Hoist the same tag set into run.build-tags so
   lint, test, and integration all agree. Unlocks 9 tag-guarded findings
   in scripts/build_fixtures/main.go that the reviewer flagged as latent;
   this commit clears them inline (errorlint io.EOF wrap, staticcheck
   QF1008 embedded-Header removal, two lll splits) and excludes
   gocyclo/funlen from scripts/ since the fixture builder is linear
   orchestration.

2. Add regression tests for safeJoin (zip-slip defense added in the
   prior commit with zero coverage). TestSafeJoin is table-driven across
   accept/reject cases; TestExtract_RejectsPathTraversal crafts a tar
   with a "../escape.txt" entry and confirms Extract rejects it before
   any file lands on disk. internal/archive coverage rises to 72.8%.
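
A zip-slip guard of the kind safeJoin provides can be sketched as follows. This is one common approach (join, then check the relative path), not necessarily the implementation in internal/archive/reader.go.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto root and rejects any result that
// would land outside root.
func safeJoin(root, name string) (string, error) {
	if filepath.IsAbs(name) {
		return "", fmt.Errorf("entry %q: absolute paths are not allowed", name)
	}
	joined := filepath.Join(root, name) // Join also cleans ".." segments
	rel, err := filepath.Rel(root, joined)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("entry %q escapes %q", name, root)
	}
	return joined, nil
}

func main() {
	for _, name := range []string{"blobs/layer.tar", "../escape.txt", "a/../../escape.txt"} {
		path, err := safeJoin("out", name)
		fmt.Printf("%-22q path=%q err=%v\n", name, path, err)
	}
}
```

The prefix check compares against ".." followed by a separator so that a legitimate entry such as "..data" is still accepted.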
leosocy merged commit de00704 into master on Apr 20, 2026
3 checks passed