Skip to content

ci(docker): cut release wall time from ~11m to ~3-4m #2

Merged
pratikbin merged 2 commits into main from ci/optimize-release-time on Apr 28, 2026

Conversation

@pratikbin
Owner

Summary

Five optimizations on the publish-docker workflow. Validated via two
workflow_dispatch runs on this branch.

| Run | Cache state | Wall time | Δ vs baseline |
| --- | --- | --- | --- |
| Baseline (avg of last 10 releases) | mixed gha-cache | 11m | n/a |
| Test #3 (25075229870) | cold registry cache | 4m19s | -61% |
| Test #4 (25075468315) | warm registry cache | 3m5s | -72% |

Changes

  1. Native arm64 runner (ubuntu-24.04-arm) — no QEMU emulation
  2. Single bake invocation per platform — builder stage shared
    in-memory across all 8 targets in one buildx run
  3. type=registry cache (<image>-buildcache:<arch>) — persists
    across releases, no per-run 10GB cap
  4. Builder pre-build subsumed by matrix collapse
  5. Cosign on root + slim only — other 6 share equivalent layers

Per-platform builds push by digest. The merge job creates multi-arch
manifest lists via docker buildx imagetools create: pure plumbing,
no rebuild.

Architecture

setup → build (amd64 + arm64 parallel)
       → merge (8 variants, parallel manifest lists)
       → sign (root + slim)  +  promote-latest

Security

All ${{ ... }} interpolations in run: blocks are routed via env:, so the
shell only ever sees plain environment variables; this prevents shell
injection through release tags, workflow_dispatch inputs, and action
outputs.

Test plan

  • actionlint .github/workflows/docker.yml clean
  • workflow_dispatch with version=0.0.0-test3 (cold) — green, 4m19s
  • workflow_dispatch with version=0.0.0-test4 (warm) — green, 3m5s
  • All 8 multi-arch manifests created (amd64 + arm64)
  • cosign sign succeeds for root + slim
  • :latest promotion succeeds with annotation timestamp
  • Real release on main after merge — verify against baseline

Cold-path detail (test #3)

buildArm 175s  buildAmd 140s   ← parallel; arm64 dominates wall
mergeMax  31s  signMax  29s    ← parallel; merge fan-out
promote   15s  setup     5s

Warm-path detail (test #4)

buildArm 114s  buildAmd 113s   ← cache hit on builder + deps
mergeMax  24s  signMax  22s
promote   13s  setup     6s

pratikbin merged commit 04e2912 into main on Apr 28, 2026
28 checks passed
pratikbin pushed a commit that referenced this pull request May 3, 2026
`headroom/transforms/tag_protector.py` was a regex-driven scan-and-
replace loop that ran on every kompress call from ContentRouter
(`content_router.py:1089`). The Python implementation had five real
bugs we now fix in the port — the most consequential being a
`str.replace(.., .., 1)` first-occurrence-replace bug that silently
collapsed two identical custom-tag blocks in the same input to a
single placeholder + a stray duplicate of the second block.

# Bug fixes (each pinned by a `fixed_in_3e4` test)

* **#1: O(n²) on nested custom tags.** Python's `while changed` loop
  restarted a full regex scan after every replacement. Rust walks
  once in linear time on input length.
* **#2: First-occurrence replace bug.** `result.replace(orig, ph, 1)`
  replaces the FIRST textual match, not the matched offset. Two
  identical custom-tag blocks collapsed to one placeholder + a stray
  duplicate of the second block. The Rust walker stitches output by
  offset so distinct blocks always get distinct placeholders (repro
  after this list).
* **#3: Silent 50-iteration cap.** Python had a hard `max_iterations
  = 50` safety limit that quietly truncated tag protection on deeply
  nested input. The Rust walker is bounded by input length only.
* **#4: Self-closing pass duplicate-replace risk.** Python ran a
  second loop with the same `replace_first` bug for self-closers.
  Rust handles self-closers in the same single pass.
* **#5: Placeholder collision.** If the input contained a literal
  `{{HEADROOM_TAG_…}}` substring, Python silently let the collision
  break restoration. Rust salts the prefix and reports it in stats.
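
For concreteness, a standalone Python repro of the #2 symptom. The control
flow below is a plausible reconstruction, not the original
`tag_protector.py` internals; the tag name, placeholder format, and
content-keyed mapping are illustrative:

```python
# Illustrative repro of bug #2 -- NOT the original tag_protector code.
# Two identical custom-tag blocks should get two distinct placeholders.
import re

text = "<note>x</note> mid <note>x</note>"
block_re = re.compile(r"<note>.*?</note>")

# Buggy shape: placeholders keyed by block TEXT, then replace-first.
result, mapping = text, {}
for block in block_re.findall(text):
    if block not in mapping:  # identical blocks collapse into one entry
        mapping[block] = f"{{{{HEADROOM_TAG_{len(mapping)}}}}}"
        result = result.replace(block, mapping[block], 1)

# One placeholder, second block left raw -- the collapse described above.
assert result == "{{HEADROOM_TAG_0}} mid <note>x</note>"
```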

# Architecture

Two-phase walker:
* Phase 1 (`identify_spans`): linear scan over input bytes, hand-
  rolled tag-open / tag-close lexer (no regex). Maintains a stack of
  open custom tags; on a matching close, collapses the inner span
  into a single `Span { start, end, Block }`. Self-closing custom
  tags become `Span { ..., SelfClosing }` immediately. Marker-only
  mode (`compress_tagged_content=true`) emits Open/CloseMarker spans
  instead. Orphan opens stay un-protected (matches Python behavior).
  Orphan closes are emitted verbatim and counted in stats.
* Phase 2 (`emit_output`): walks `text` once, splicing placeholders
  for span ranges and copying everything else verbatim. Offset-based,
  never `str.replace`.
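
A rough Python stand-in for the two-phase shape, for orientation only. The
real walker is Rust and lexes by hand (no regex), so the regex lexer,
abridged HTML-tag set, and plain placeholder format below are
simplifications; marker-only mode, prefix salting, and stats are omitted.

```python
# Python stand-in for the two-phase walker; the regex lexer only keeps
# the sketch short (the Rust original uses a hand-rolled lexer).
import re

TAG = re.compile(r"<(/?)([a-zA-Z][\w-]*)\s*(/?)>")
KNOWN_HTML = {"div", "span", "p", "b", "i", "code", "pre"}  # abridged

def identify_spans(text):
    spans, stack = [], []  # stack of (tag_name, open_offset)
    for m in TAG.finditer(text):
        closing, name, self_close = m.groups()
        if name.lower() in KNOWN_HTML:
            continue  # only custom tags are protected
        if self_close:
            spans.append((m.start(), m.end()))
        elif not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            # a matching close collapses the whole region into one block,
            # subsuming any spans already recorded inside it
            spans = [s for s in spans if s[0] < start]
            spans.append((start, m.end()))
        # orphan closes fall through verbatim; orphan opens stay on the
        # stack and end up un-protected, matching the Python behavior
    return sorted(spans)

def emit_output(text, spans):
    out, last = [], 0  # offset-based splice -- never str.replace
    for i, (start, end) in enumerate(spans):
        out += [text[last:start], f"{{{{HEADROOM_TAG_{i}}}}}"]
        last = end
    out.append(text[last:])
    return "".join(out)
```

The offset splice in `emit_output` is what makes bug #2 structurally
impossible: placeholders are assigned per span, never per block text.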

PyO3 surface: `protect_tags`, `restore_tags`, `is_html_tag`,
`known_html_tag_names`. The Python shim retires the regex internals
and re-exports `KNOWN_HTML_TAGS` (rebuilt from the Rust list) +
`_is_html_tag` for backwards compat with `content_router.py` and the
existing test surface.

# Test plan

* 25 Rust unit tests including 4 `fixed_in_3e4_*` bug-fix tests
* 27 Python tests (23 existing + 4 new `fixed_in_3e4` parity tests)
* 5 integration tests in `test_tag_protection_integration.py` pass
* `make ci-precheck` clean
pratikbin pushed a commit that referenced this pull request May 3, 2026
… OpenAI

Closes the second half of P0-6: once memory injects memory_save / memory_search
into body["tools"] for a session, every subsequent turn now injects
byte-identical definitions, even if memory is disabled mid-session, because
toggling the tool list mid-session busts the Anthropic prefix cache (guide
§6.3 #2).

Adds in headroom/proxy/helpers.py:

  * SessionToolTracker — bounded LRU keyed by (provider, session_id) storing
    GOLDEN tool-definition bytes from the first injection. Tracker is
    provider-aware so the same session_id under Anthropic and OpenAI keeps
    independent state. Reentrant lock for concurrent access; LRU eviction at
    HEADROOM_TOOL_TRACKER_MAX_SESSIONS (default 1000); see the sketch
    after this list.
  * apply_session_sticky_memory_tools — single coordination point with three
    paths: first-time inject (record golden bytes), sticky replay (always
    inject golden bytes regardless of inject_this_turn), and skip. Honors
    HEADROOM_TOOL_INJECTION_STICKY=disabled as a loud operator opt-in for
    rollback (NOT a fallback).
  * serialize_tool_definition_canonical — deterministic byte serialization
    via the same separators=(",",":")/ensure_ascii=False rules as
    serialize_body_canonical.
  * log_tool_injection_decision — structured per-decision log line; never
    logs the tool definition contents.
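
A minimal sketch of the tracker and canonical serializer shape
(illustrative; the real `helpers.py` code differs in detail):

```python
# Minimal sketch of SessionToolTracker + canonical serialization.
# Illustrative only; the real helpers.py code differs in detail.
import json
import threading
from collections import OrderedDict

def serialize_tool_definition_canonical(tool):
    # The rules named above: compact separators + ensure_ascii=False,
    # so equal definitions always serialize to byte-equal blobs.
    return json.dumps(
        tool, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

class SessionToolTracker:
    def __init__(self, max_sessions=1000):  # HEADROOM_TOOL_TRACKER_MAX_SESSIONS
        self._golden = OrderedDict()  # (provider, session_id) -> golden bytes
        self._lock = threading.RLock()  # reentrant, safe for nested calls
        self._max = max_sessions

    def record(self, provider, session_id, blob):
        """First-time inject: pin the golden bytes for this session."""
        with self._lock:
            self._golden[(provider, session_id)] = blob
            self._golden.move_to_end((provider, session_id))
            while len(self._golden) > self._max:
                self._golden.popitem(last=False)  # evict least-recently-used

    def golden(self, provider, session_id):
        """Sticky replay: return pinned bytes, refreshing LRU recency."""
        with self._lock:
            blob = self._golden.get((provider, session_id))
            if blob is not None:
                self._golden.move_to_end((provider, session_id))
            return blob
```

Provider isolation falls out of the tuple key: the same session_id under
Anthropic and OpenAI lands in two independent entries.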

Wires the helper into all four memory tool injection sites:
  * handlers/anthropic.py — /v1/messages
  * handlers/openai.py — /v1/chat/completions
  * handlers/openai.py — /v1/responses
  * handlers/openai.py — Codex WS path

memory_handler.MemoryHandler gains compute_memory_tool_definitions(provider) —
a pure builder that returns the tool definitions without mutating a tools
list, so the proxy can route through the sticky tracker. The legacy
inject_tools(...) is preserved for callers without a session_id.

Tests: tests/test_memory_tool_session_sticky.py — 29 unit + integration
cases covering: turn-1→turn-2 byte-equality (Anthropic + OpenAI), sticky
replay after memory disabled, golden-fixture pin, LRU eviction, provider
isolation under shared session_id, thread-safe concurrent access, env-var
contract, disabled-mode passthrough, dedupe with client tools.

Golden fixtures pin canonical bytes:
  * tests/fixtures/memory_tool_definitions/anthropic.json
  * tests/fixtures/memory_tool_definitions/openai.json

No regex. No hardcodes (env-configurable: HEADROOM_TOOL_INJECTION_STICKY,
HEADROOM_TOOL_TRACKER_MAX_SESSIONS). No silent fallbacks. Per-decision
structured logging. Realignment build constraints satisfied.
pratikbin pushed a commit that referenced this pull request May 3, 2026
Production incident (Finding #2 of HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md):
on this customer's deployment the Rust extension `headroom._core` was
never installed into the runtime Docker image. Diff compression failed
54 times in a single day; "Optimization failed: ModuleNotFoundError" hit
379 times. The failure rate climbed every day and reached ~223/day on
2026-05-03 — effectively 100% of requests on the Rust path. Every Rust
PR we'd merged (MessageScorer, ICM, DiffCompressor, etc.) was providing
zero customer value because the module wasn't loadable at all.

Root cause: the Dockerfile builder stage installed Python deps and the
in-tree `headroom-ai` package but never ran `maturin build` for the
`headroom-py` crate, so the runtime image shipped without `_core.so`.
The Python proxy continued to start because the extension's absence is
caught and routed through Python-only fallbacks that either silently
no-op or raise per-request.

This change makes that mode impossible by default:

* `headroom.proxy.server._check_rust_core()` runs as the first step of
  the FastAPI lifespan. If the import fails it prints a structured
  diagnostic, logs `event=rust_core_missing`, and calls `sys.exit(78)`
  (sysexits.h `EX_CONFIG`). Process supervisors (systemd / k8s /
  docker) treat this as a deliberate config error and stop restart
  loops (see the sketch after this list).
* `HEADROOM_REQUIRE_RUST_CORE=false` is the explicit opt-out for
  Python-only `pip install -e .` developer flows; lifespan logs
  `event=rust_core_disabled` and continues. Any other value (including
  unset) keeps the fail-loud default.
* `/health` now surfaces `rust_core: "loaded" | "disabled" | "missing"`
  (plus `rust_core_error` when non-loaded) so operators can alert on
  the degraded state rather than discovering it via a customer ticket.
* `scripts/build_rust_extension.sh` is the single dev-time path: build
  → install → import-verify with the same `hello()` marker the lifespan
  checks. Failures are loud at every step.
* `Makefile` exposes the script as `make verify-rust-core`.
* `Dockerfile` now installs `rustup` + `maturin`, builds the wheel from
  `crates/headroom-py`, force-installs it into site-packages, and runs
  the same `hello()` import-verify in the build image so a broken build
  fails the docker-build, not the next runtime restart.
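
A sketch of the check's contract under the env semantics above
(illustrative; the real `_check_rust_core` in `headroom.proxy.server` may
differ in logging and return shape):

```python
# Sketch of the fail-loud lifespan check; illustrative, not the real
# headroom.proxy.server._check_rust_core.
import logging
import os
import sys

EX_CONFIG = 78  # sysexits.h EX_CONFIG: deliberate configuration error

def _check_rust_core():
    """Return (state, error) for /health: loaded | disabled | missing."""
    opted_out = os.environ.get("HEADROOM_REQUIRE_RUST_CORE") == "false"
    try:
        from headroom._core import hello
        assert hello() == "headroom-core"  # same marker the build verifies
        return ("loaded", None)
    except Exception as exc:
        if opted_out:  # explicit Python-only developer flow
            logging.warning("event=rust_core_disabled error=%s", exc)
            return ("disabled", str(exc))
        logging.error("event=rust_core_missing error=%s", exc)
        sys.exit(EX_CONFIG)  # supervisors stop restart-looping on 78
```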

Tests:
* `tests/test_rust_core_smoke.py` pins all four contracts:
  - `_core.hello()` returns `"headroom-core"`
  - missing extension + default env → `SystemExit(78)`
  - missing extension + opt-out env → lifespan starts, `/health`
    returns `rust_core: "disabled"` with the underlying error
  - present extension + default env → `("loaded", None)`

Full detail in Finding #2: ~/Desktop/HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md.
pratikbin pushed a commit that referenced this pull request May 3, 2026
PR chopratejas#350 CI: docker-native-e2e's wheel install succeeded but the
build-stage verify (`from headroom._core import hello`) failed with
`ModuleNotFoundError: No module named 'headroom._core'`. Same failure
mode the customer hit in production (Finding #2) — but in CI we have
the full layer trace.

Root cause: the headroom-core-py wheel claims ownership of both
`headroom/__init__.py` (stub from maturin's python-source layout)
AND `headroom/_core.cpython-*.so`. The previous Dockerfile installed
headroom-ai FIRST (which laid down the real `headroom/` tree), then
the wheel SECOND with `--force-reinstall`. pip's --force-reinstall
uninstalls the wheel's previously installed files before reinstalling
— but the wheel's stub `__init__.py` had already overwritten
headroom-ai's at first install. Net result: pip deleted
`headroom/__init__.py` and `headroom/_core.so` ownership records
got into a state where the .so wasn't present after the install.

Fix: swap the order. Install the wheel first (lays down stub
`__init__.py` + `_core.so`), then install headroom-ai (overwrites the
stub with the real `__init__.py` and adds the rest of the
`headroom/` tree). `_core.so` survives because headroom-ai doesn't
claim ownership of it. Drop `--force-reinstall` from the wheel step
since nothing is installing the wheel before it.

This is the exact failure A0 was designed to catch — a deployment
that ships without `_core` working. CI is now serving as a
regression gate for the production install path.

The remaining 3 PR check failures (validate × 3 / Dev Containers)
are environmental: the runner's PyPI mirror (`pypi.netflix.net`)
times out fetching `cuda-bindings==12.9.4` /
`nvidia-cuda-cupti-cu12==12.8.90` / `safetensors==0.7.0`. These come
from `headroom-ai[dev]` → `sentence-transformers` → `torch` → CUDA
deps. Not caused by the realignment branch; the post-create script
needs a `--extra dev-light` profile or the mirror needs the packages
cached. Tracking separately.