ci(docker): cut release wall time from ~11m to ~3-4m #2
Merged
Conversation
Five optimizations applied to the publish-docker workflow:
1. Native arm64 runner (ubuntu-24.04-arm) instead of QEMU emulation.
2. Collapse 8-variant matrix into a single bake invocation per platform;
builder stage shared in-memory across targets within one buildx run.
3. Cross-run caching via type=registry (ref=<image>-buildcache:<arch>),
replacing type=gha (per-workflow-run scoped, 10GB capped).
4. Builder pre-build subsumed by matrix collapse; one bake call per
platform builds builder once and reuses it for all 8 variants.
5. Cosign signing limited to root + slim variants (public-facing tags).
The other 6 share builder/runtime layers with the signed variants, so
their content is cryptographically equivalent.
Architecture:
setup -> build (amd64 + arm64 parallel) -> merge (8 variants parallel,
manifest list creation only) -> sign (root + slim) + promote-latest
Per-platform builds push by digest. The merge job creates multi-arch manifest
lists via docker buildx imagetools create - no rebuild, just plumbing (see
the sketch below).
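To make the merge step concrete, here is a hedged Python sketch of the plumbing. The repository name and digests are placeholders invented for illustration; the real workflow invokes `docker buildx imagetools create` directly from YAML, not through a script like this.

```python
# Illustrative sketch of the merge job's plumbing: stitch per-platform
# digest pushes into one multi-arch manifest list. Nothing is rebuilt;
# `docker buildx imagetools create` only writes a manifest list that
# points at the already-pushed per-arch images.
import subprocess

IMAGE = "ghcr.io/example/app"  # placeholder repository, not from this PR


def create_manifest_list(tag: str, digests_by_arch: dict[str, str]) -> None:
    refs = [f"{IMAGE}@{digest}" for digest in digests_by_arch.values()]
    subprocess.run(
        ["docker", "buildx", "imagetools", "create", "-t", f"{IMAGE}:{tag}", *refs],
        check=True,
    )


# Example: digests produced by the amd64 and arm64 build jobs (made up here).
create_manifest_list("slim", {
    "amd64": "sha256:" + "a" * 64,
    "arm64": "sha256:" + "b" * 64,
})
```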
All dynamic interpolations in run: blocks are routed through env: to prevent
shell injection via ${{ }} expansion of release tags, workflow_dispatch
inputs, and action outputs.
Baseline avg: 11m wall (over 10 prior runs).
Expected: ~4-5m wall after warm cache.
pratikbin pushed a commit that referenced this pull request on May 3, 2026
`headroom/transforms/tag_protector.py` was a regex-driven scan-and-replace loop that ran on every kompress call from ContentRouter (`content_router.py:1089`). The Python implementation had five real bugs we now fix in the port — the most consequential being a `str.replace(.., .., 1)` first-occurrence-replace bug that silently collapsed two identical custom-tag blocks in the same input to a single placeholder + a stray duplicate of the second block.

# Bug fixes (each pinned by a `fixed_in_3e4` test)

* **#1: O(n²) on nested custom tags.** Python's `while changed` loop restarted a full regex scan after every replacement. Rust walks once in linear time on input length.
* **#2: First-occurrence replace bug.** `result.replace(orig, ph, 1)` replaces the FIRST textual match, not the matched offset. Two identical custom-tag blocks collapsed to one placeholder + a stray duplicate of the second block. The Rust walker stitches output by offset so distinct blocks always get distinct placeholders. (A minimal repro follows this message.)
* **#3: Silent 50-iteration cap.** Python had a hard `max_iterations = 50` safety limit that quietly truncated tag protection on deeply nested input. The Rust walker is bounded by input length only.
* **#4: Self-closing pass duplicate-replace risk.** Python ran a second loop with the same `replace_first` bug for self-closers. Rust handles self-closers in the same single pass.
* **#5: Placeholder collision.** If the input contained a literal `{{HEADROOM_TAG_…}}` substring, Python silently let the collision break restoration. Rust salts the prefix and reports it in stats.

# Architecture

Two-phase walker:

* Phase 1 (`identify_spans`): linear scan over input bytes, hand-rolled tag-open / tag-close lexer (no regex). Maintains a stack of open custom tags; on a matching close, collapses the inner span into a single `Span { start, end, Block }`. Self-closing custom tags become `Span { ..., SelfClosing }` immediately. Marker-only mode (`compress_tagged_content=true`) emits Open/CloseMarker spans instead. Orphan opens stay un-protected (matches Python behavior). Orphan closes are emitted verbatim and counted in stats.
* Phase 2 (`emit_output`): walks `text` once, splicing placeholders for span ranges and copying everything else verbatim. Offset-based, never `str.replace`.

PyO3 surface: `protect_tags`, `restore_tags`, `is_html_tag`, `known_html_tag_names`. The Python shim retires the regex internals and re-exports `KNOWN_HTML_TAGS` (rebuilt from the Rust list) + `_is_html_tag` for backwards compat with `content_router.py` and the existing test surface.

# Test plan

* 25 Rust unit tests including 4 `fixed_in_3e4_*` bug-fix tests
* 27 Python tests (23 existing + 4 new `fixed_in_3e4` parity tests)
* 5 integration tests in `test_tag_protection_integration.py` pass
* `make ci-precheck` clean
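To make bug #2 concrete, here is a minimal, hedged Python repro of the replace-by-value failure mode. The tag name and placeholder format are invented for illustration and are not the module's real output.

```python
# Hedged repro of bug #2 (illustrative tag and placeholder, not the
# module's real API). The scanner matched the SECOND block by offset,
# but replace-by-value rewrites the FIRST textual occurrence, so the
# placeholder lands on the wrong block.
text = "<keep>a</keep> and again <keep>a</keep>"
match_start, match_end = 25, 39            # offsets of the second block
matched = text[match_start:match_end]      # "<keep>a</keep>"

buggy = text.replace(matched, "{{TAG_0}}", 1)
print(buggy)   # {{TAG_0}} and again <keep>a</keep>   <- wrong block replaced

fixed = text[:match_start] + "{{TAG_0}}" + text[match_end:]
print(fixed)   # <keep>a</keep> and again {{TAG_0}}   <- offset-based splice
```

The Rust walker's Phase 2 is this offset-splice pattern generalized to an arbitrary list of spans.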
pratikbin pushed a commit that referenced this pull request on May 3, 2026
… OpenAI

Closes the second half of P0-6: once memory injects memory_save / memory_search into body["tools"] for a session, every subsequent turn injects the byte-equal same definitions — even if memory is disabled mid-session. Toggling the tool list mid-session busts the Anthropic prefix cache per guide §6.3 #2.

Adds in headroom/proxy/helpers.py:

* SessionToolTracker — bounded LRU keyed by (provider, session_id) storing GOLDEN tool-definition bytes from the first injection. The tracker is provider-aware, so the same session_id under Anthropic and OpenAI keeps independent state. Reentrant lock for concurrent access; LRU eviction at HEADROOM_TOOL_TRACKER_MAX_SESSIONS (default 1000). (A hedged sketch follows this message.)
* apply_session_sticky_memory_tools — single coordination point with three paths: first-time inject (record golden bytes), sticky replay (always inject golden bytes regardless of inject_this_turn), and skip. Honors HEADROOM_TOOL_INJECTION_STICKY=disabled as a loud operator opt-in for rollback (NOT a fallback).
* serialize_tool_definition_canonical — deterministic byte serialization via the same separators=(",",":") / ensure_ascii=False rules as serialize_body_canonical.
* log_tool_injection_decision — structured per-decision log line; never logs the tool definition contents.

Wires the helper into all four memory tool injection sites:

* handlers/anthropic.py — /v1/messages
* handlers/openai.py — /v1/chat/completions
* handlers/openai.py — /v1/responses
* handlers/openai.py — Codex WS path

memory_handler.MemoryHandler gains compute_memory_tool_definitions(provider) — a pure builder that returns the tool definitions without mutating a tools list, so the proxy can route through the sticky tracker. The legacy inject_tools(...) is preserved for callers without a session_id.

Tests: tests/test_memory_tool_session_sticky.py — 29 unit + integration cases covering: turn-1 → turn-2 byte-equality (Anthropic + OpenAI), sticky replay after memory disabled, golden-fixture pin, LRU eviction, provider isolation under shared session_id, thread-safe concurrent access, env-var contract, disabled-mode passthrough, dedupe with client tools.

Golden fixtures pin canonical bytes:

* tests/fixtures/memory_tool_definitions/anthropic.json
* tests/fixtures/memory_tool_definitions/openai.json

No regex. No hardcodes (env-configurable: HEADROOM_TOOL_INJECTION_STICKY, HEADROOM_TOOL_TRACKER_MAX_SESSIONS). No silent fallbacks. Per-decision structured logging. Realignment build constraints satisfied.
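A minimal sketch of the canonical-bytes + sticky-tracker idea described above. The names mirror the commit message, but the bodies are illustrative, not the shipped implementation; sort_keys in the serializer is an assumption for determinism that the message does not spell out.

```python
# Illustrative sketch (not the shipped code) of the two pieces named in
# the commit message: canonical byte serialization, and a bounded,
# provider-aware LRU that replays "golden" tool bytes for a session.
from __future__ import annotations

import json
import threading
from collections import OrderedDict


def serialize_tool_definition_canonical(tool: dict) -> bytes:
    # Same rules the message cites: compact separators, ensure_ascii=False.
    # sort_keys is assumed here so identical dicts always serialize equal.
    return json.dumps(tool, separators=(",", ":"), ensure_ascii=False,
                      sort_keys=True).encode("utf-8")


class SessionToolTracker:
    """Bounded LRU of golden tool-definition bytes keyed by (provider, session_id)."""

    def __init__(self, max_sessions: int = 1000):
        self._max = max_sessions
        self._lock = threading.RLock()  # reentrant, per the message
        self._golden: OrderedDict[tuple[str, str], bytes] = OrderedDict()

    def record_or_replay(self, provider: str, session_id: str,
                         fresh: bytes | None) -> bytes | None:
        key = (provider, session_id)
        with self._lock:
            if key in self._golden:            # sticky replay path
                self._golden.move_to_end(key)
                return self._golden[key]
            if fresh is None:                  # skip path: never injected
                return None
            self._golden[key] = fresh          # first-time inject path
            if len(self._golden) > self._max:  # LRU eviction
                self._golden.popitem(last=False)
            return fresh
```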
pratikbin pushed a commit that referenced this pull request on May 3, 2026
Production incident (Finding #2 of HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md): on this customer's deployment the Rust extension `headroom._core` was never installed into the runtime Docker image. Diff compression failed 54 times in a single day; "Optimization failed: ModuleNotFoundError" hit 379 times. The failure rate climbed every day and reached ~223/day on 2026-05-03 — effectively 100% of requests on the Rust path. Every Rust PR we'd merged (MessageScorer, ICM, DiffCompressor, etc.) was providing zero customer value because the module wasn't loadable at all.

Root cause: the Dockerfile builder stage installed Python deps and the in-tree `headroom-ai` package but never ran `maturin build` for the `headroom-py` crate, so the runtime image shipped without `_core.so`. The Python proxy continued to start because the extension's absence is caught and routed through Python-only fallbacks that either silently no-op or raise per-request.

This change makes that mode impossible by default:

* `headroom.proxy.server._check_rust_core()` runs as the first step of the FastAPI lifespan. If the import fails it prints a structured diagnostic, logs `event=rust_core_missing`, and calls `sys.exit(78)` (sysexits.h `EX_CONFIG`). Process supervisors (systemd / k8s / docker) treat this as a deliberate config error and stop restart loops. (A hedged sketch follows this message.)
* `HEADROOM_REQUIRE_RUST_CORE=false` is the explicit opt-out for Python-only `pip install -e .` developer flows; the lifespan logs `event=rust_core_disabled` and continues. Any other value (including unset) keeps the fail-loud default.
* `/health` now surfaces `rust_core: "loaded" | "disabled" | "missing"` (plus `rust_core_error` when non-loaded) so operators can alert on the degraded state rather than discovering it via a customer ticket.
* `scripts/build_rust_extension.sh` is the single dev-time path: build → install → import-verify with the same `hello()` marker the lifespan checks. Failures are loud at every step.
* `Makefile` exposes the script as `make verify-rust-core`.
* `Dockerfile` now installs `rustup` + `maturin`, builds the wheel from `crates/headroom-py`, force-installs it into site-packages, and runs the same `hello()` import-verify in the build image, so a broken build fails the docker build, not the next runtime restart.

Tests:

* `tests/test_rust_core_smoke.py` pins all four contracts:
  - `_core.hello()` returns `"headroom-core"`
  - missing extension + default env → `SystemExit(78)`
  - missing extension + opt-out env → lifespan starts, `/health` returns `rust_core: "disabled"` with the underlying error
  - present extension + default env → `("loaded", None)`

Per finding #2: ~/Desktop/HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md.
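A hedged sketch of what the fail-loud startup check plausibly looks like, reconstructed from the contracts above. The logger setup and exact message fields are assumptions; the behaviors (exit 78 by default, explicit opt-out, status tuple for /health) come from the commit message.

```python
# Illustrative sketch of the fail-loud startup contract described above.
from __future__ import annotations

import logging
import os
import sys

logger = logging.getLogger("headroom.proxy")


def _check_rust_core() -> tuple[str, str | None]:
    """Return (status, error) for /health: loaded | disabled | missing."""
    try:
        from headroom._core import hello  # the wheel's import-verify marker
        assert hello() == "headroom-core"
        return ("loaded", None)
    except Exception as exc:
        # Opt-out is only the literal value "false"; anything else
        # (including unset) keeps the fail-loud default.
        if os.environ.get("HEADROOM_REQUIRE_RUST_CORE") == "false":
            logger.warning("event=rust_core_disabled error=%r", exc)
            return ("disabled", repr(exc))
        logger.error("event=rust_core_missing error=%r", exc)
        print(f"headroom: Rust core missing: {exc!r}", file=sys.stderr)
        sys.exit(78)  # sysexits.h EX_CONFIG: supervisors stop restart loops
```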
pratikbin pushed a commit that referenced this pull request on May 3, 2026
PR chopratejas#350 CI: docker-native-e2e's wheel install succeeded, but the build-stage verify (`from headroom._core import hello`) failed with `ModuleNotFoundError: No module named 'headroom._core'`. Same failure mode the customer hit in production (Finding #2) — but in CI we have the full layer trace.

Root cause: the headroom-core-py wheel claims ownership of both `headroom/__init__.py` (a stub from maturin's python-source layout) AND `headroom/_core.cpython-*.so`. The previous Dockerfile installed headroom-ai FIRST (which laid down the real `headroom/` tree), then the wheel SECOND with `--force-reinstall`. pip's --force-reinstall uninstalls the wheel's previously installed files before reinstalling — but the wheel's stub `__init__.py` had already overwritten headroom-ai's at first install. Net result: pip deleted `headroom/__init__.py`, and the `headroom/_core.so` ownership records got into a state where the .so wasn't present after the install. (A hedged ownership-diagnostic sketch follows this message.)

Fix: swap the order. Install the wheel first (lays down the stub `__init__.py` + `_core.so`), then install headroom-ai (overwrites the stub with the real `__init__.py` and adds the rest of the `headroom/` tree). `_core.so` survives because headroom-ai doesn't claim ownership of it. Drop `--force-reinstall` from the wheel step since nothing installs the wheel before it.

This is the exact failure A0 was designed to catch — a deployment that ships without `_core` working. CI now serves as a regression gate for the production install path.

The remaining 3 PR check failures (validate × 3 / Dev Containers) are environmental: the runner's PyPI mirror (`pypi.netflix.net`) times out fetching `cuda-bindings==12.9.4` / `nvidia-cuda-cupti-cu12==12.8.90` / `safetensors==0.7.0`. These come from `headroom-ai[dev]` → `sentence-transformers` → `torch` → CUDA deps. Not caused by the realignment branch; the post-create script needs a `--extra dev-light` profile or the mirror needs the packages cached. Tracking separately.
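One way to see the ownership collision described above is to ask pip's installed-files metadata which distribution claims which paths. This is a hedged diagnostic sketch, not part of the fix; the collision shows up when two distributions both record `headroom/__init__.py`.

```python
# Diagnostic sketch: list which installed distributions claim ownership
# of headroom/__init__.py and the _core extension, using the RECORD
# metadata pip consults when it uninstalls or force-reinstalls.
from importlib.metadata import distributions

for dist in distributions():
    name = dist.metadata["Name"]
    for f in dist.files or []:
        path = str(f)
        if path.startswith("headroom/") and (
            path.endswith("__init__.py") or "_core" in path
        ):
            print(f"{name}: {path}")
```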
Summary
Five optimizations on the publish-docker workflow. Validated via two workflow_dispatch runs on this branch (25075229870, 25075468315).

Changes

* Native arm64 runner (ubuntu-24.04-arm) — no QEMU emulation
* Single bake invocation per platform; builder stage shared in-memory across all 8 targets in one buildx run
* type=registry cache (<image>-buildcache:<arch>) — persists across releases, no per-run 10GB cap
* Per-platform builds push by digest; the merge job creates multi-arch manifest lists via docker buildx imagetools create — pure plumbing, no rebuild
Architecture

setup -> build (amd64 + arm64 parallel) -> merge (8 variants parallel, manifest list creation only) -> sign (root + slim) + promote-latest
Security

All ${{ ... }} interpolations in run: blocks are routed via env: to prevent shell injection through release tags / workflow_dispatch inputs / action outputs.
Test plan

* actionlint .github/workflows/docker.yml clean
* workflow_dispatch with version=0.0.0-test3 (cold) — green, 4m19s
* workflow_dispatch with version=0.0.0-test4 (warm) — green, 3m5s
* cosign sign succeeds for root + slim
* :latest promotion succeeds with annotation timestamp
* main after merge — verify against baseline

Cold-path detail (test #3)
Warm-path detail (test #4)