Skip to content

ci(docker): cut release wall time from ~11m to ~3-4m #2

Merged
pratikbin merged 2 commits into main from ci/optimize-release-time on Apr 28, 2026

Conversation

@pratikbin
Owner

Summary

Five optimizations on the publish-docker workflow. Validated via two
workflow_dispatch runs on this branch.

| Run | Cache state | Wall time | Δ vs baseline |
| --- | --- | --- | --- |
| Baseline (avg of last 10 releases) | mixed gha-cache | 11m | n/a |
| Test #3 (25075229870) | cold registry cache | 4m19s | -61% |
| Test #4 (25075468315) | warm registry cache | 3m5s | -72% |

Changes

  1. Native arm64 runner (ubuntu-24.04-arm) — no QEMU emulation
  2. Single bake invocation per platform — builder stage shared
    in-memory across all 8 targets in one buildx run
  3. type=registry cache (<image>-buildcache:<arch>) — persists
    across releases, no per-run 10GB cap
  4. Builder pre-build subsumed by matrix collapse
  5. Cosign on root + slim only — other 6 share equivalent layers

Per-platform builds push by digest. The merge job creates multi-arch
manifest lists via docker buildx imagetools create: pure plumbing,
no rebuild.

Architecture

setup → build (amd64 + arm64 parallel)
       → merge (8 variants, parallel manifest lists)
       → sign (root + slim)  +  promote-latest

Security

All ${{ ... }} interpolations in run: blocks are routed via env:, so the
shell only ever sees plain environment variables; this prevents shell
injection through release tags, workflow_dispatch inputs, and action
outputs.

Test plan

  • actionlint .github/workflows/docker.yml clean
  • workflow_dispatch with version=0.0.0-test3 (cold) — green, 4m19s
  • workflow_dispatch with version=0.0.0-test4 (warm) — green, 3m5s
  • All 8 multi-arch manifests created (amd64 + arm64)
  • cosign sign succeeds for root + slim
  • :latest promotion succeeds with annotation timestamp
  • Real release on main after merge — verify against baseline

Cold-path detail (test #3)

buildArm 175s  buildAmd 140s   ← parallel; arm64 dominates wall
mergeMax  31s  signMax  29s    ← parallel; merge fan-out
promote   15s  setup     5s

Warm-path detail (test #4)

buildArm 114s  buildAmd 113s   ← cache hit on builder + deps
mergeMax  24s  signMax  22s
promote   13s  setup     6s

pratikbin merged commit 04e2912 into main on Apr 28, 2026
28 checks passed
pratikbin pushed a commit that referenced this pull request May 3, 2026
`headroom/transforms/tag_protector.py` was a regex-driven scan-and-
replace loop that ran on every kompress call from ContentRouter
(`content_router.py:1089`). The Python implementation had five real
bugs we now fix in the port — the most consequential being a
`str.replace(.., .., 1)` first-occurrence-replace bug that silently
collapsed two identical custom-tag blocks in the same input to a
single placeholder + a stray duplicate of the second block.

# Bug fixes (each pinned by a `fixed_in_3e4` test)

* **#1: O(n²) on nested custom tags.** Python's `while changed` loop
  restarted a full regex scan after every replacement. Rust walks
  once in linear time on input length.
* **#2: First-occurrence replace bug.** `result.replace(orig, ph, 1)`
  replaces the FIRST textual match, not the matched offset. Two
  identical custom-tag blocks collapsed to one placeholder + a stray
  duplicate of the second block. The Rust walker stitches output by
  offset so distinct blocks always get distinct placeholders (repro
  after this list).
* **#3: Silent 50-iteration cap.** Python had a hard `max_iterations
  = 50` safety limit that quietly truncated tag protection on deeply
  nested input. The Rust walker is bounded by input length only.
* **#4: Self-closing pass duplicate-replace risk.** Python ran a
  second loop with the same `replace_first` bug for self-closers.
  Rust handles self-closers in the same single pass.
* **#5: Placeholder collision.** If the input contained a literal
  `{{HEADROOM_TAG_…}}` substring, Python silently let the collision
  break restoration. Rust salts the prefix and reports it in stats.
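
For concreteness, a standalone Python repro of the #2 symptom. The control
flow below is a plausible reconstruction, not the original
`tag_protector.py` internals; the tag name, placeholder format, and
content-keyed mapping are illustrative:

```python
# Illustrative repro of bug #2 -- NOT the original tag_protector code.
# Two identical custom-tag blocks should get two distinct placeholders.
import re

text = "<note>x</note> mid <note>x</note>"
block_re = re.compile(r"<note>.*?</note>")

# Buggy shape: placeholders keyed by block TEXT, then replace-first.
result, mapping = text, {}
for block in block_re.findall(text):
    if block not in mapping:  # identical blocks collapse into one entry
        mapping[block] = f"{{{{HEADROOM_TAG_{len(mapping)}}}}}"
        result = result.replace(block, mapping[block], 1)

# One placeholder, second block left raw -- the collapse described above.
assert result == "{{HEADROOM_TAG_0}} mid <note>x</note>"
```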

# Architecture

Two-phase walker:
* Phase 1 (`identify_spans`): linear scan over input bytes, hand-
  rolled tag-open / tag-close lexer (no regex). Maintains a stack of
  open custom tags; on a matching close, collapses the inner span
  into a single `Span { start, end, Block }`. Self-closing custom
  tags become `Span { ..., SelfClosing }` immediately. Marker-only
  mode (`compress_tagged_content=true`) emits Open/CloseMarker spans
  instead. Orphan opens stay un-protected (matches Python behavior).
  Orphan closes are emitted verbatim and counted in stats.
* Phase 2 (`emit_output`): walks `text` once, splicing placeholders
  for span ranges and copying everything else verbatim. Offset-based,
  never `str.replace`.
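
A rough Python stand-in for the two-phase shape, for orientation only. The
real walker is Rust and lexes by hand (no regex), so the regex lexer,
abridged HTML-tag set, and plain placeholder format below are
simplifications; marker-only mode, prefix salting, and stats are omitted.

```python
# Python stand-in for the two-phase walker; the regex lexer only keeps
# the sketch short (the Rust original uses a hand-rolled lexer).
import re

TAG = re.compile(r"<(/?)([a-zA-Z][\w-]*)\s*(/?)>")
KNOWN_HTML = {"div", "span", "p", "b", "i", "code", "pre"}  # abridged

def identify_spans(text):
    spans, stack = [], []  # stack of (tag_name, open_offset)
    for m in TAG.finditer(text):
        closing, name, self_close = m.groups()
        if name.lower() in KNOWN_HTML:
            continue  # only custom tags are protected
        if self_close:
            spans.append((m.start(), m.end()))
        elif not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            # a matching close collapses the whole region into one block,
            # subsuming any spans already recorded inside it
            spans = [s for s in spans if s[0] < start]
            spans.append((start, m.end()))
        # orphan closes fall through verbatim; orphan opens stay on the
        # stack and end up un-protected, matching the Python behavior
    return sorted(spans)

def emit_output(text, spans):
    out, last = [], 0  # offset-based splice -- never str.replace
    for i, (start, end) in enumerate(spans):
        out += [text[last:start], f"{{{{HEADROOM_TAG_{i}}}}}"]
        last = end
    out.append(text[last:])
    return "".join(out)
```

The offset splice in `emit_output` is what makes bug #2 structurally
impossible: placeholders are assigned per span, never per block text.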

PyO3 surface: `protect_tags`, `restore_tags`, `is_html_tag`,
`known_html_tag_names`. The Python shim retires the regex internals
and re-exports `KNOWN_HTML_TAGS` (rebuilt from the Rust list) +
`_is_html_tag` for backwards compat with `content_router.py` and the
existing test surface.

# Test plan

* 25 Rust unit tests including 4 `fixed_in_3e4_*` bug-fix tests
* 27 Python tests (23 existing + 4 new `fixed_in_3e4` parity tests)
* 5 integration tests in `test_tag_protection_integration.py` pass
* `make ci-precheck` clean
pratikbin pushed a commit that referenced this pull request May 3, 2026
… OpenAI

Closes the second half of P0-6: once memory injects memory_save / memory_search
into body["tools"] for a session, every subsequent turn now injects
byte-identical definitions, even if memory is disabled mid-session, because
toggling the tool list mid-session busts the Anthropic prefix cache (guide
§6.3 #2).

Adds in headroom/proxy/helpers.py:

  * SessionToolTracker — bounded LRU keyed by (provider, session_id) storing
    GOLDEN tool-definition bytes from the first injection. Tracker is
    provider-aware so the same session_id under Anthropic and OpenAI keeps
    independent state. Reentrant lock for concurrent access; LRU eviction at
    HEADROOM_TOOL_TRACKER_MAX_SESSIONS (default 1000); see the sketch
    after this list.
  * apply_session_sticky_memory_tools — single coordination point with three
    paths: first-time inject (record golden bytes), sticky replay (always
    inject golden bytes regardless of inject_this_turn), and skip. Honors
    HEADROOM_TOOL_INJECTION_STICKY=disabled as a loud operator opt-in for
    rollback (NOT a fallback).
  * serialize_tool_definition_canonical — deterministic byte serialization
    via the same separators=(",",":")/ensure_ascii=False rules as
    serialize_body_canonical.
  * log_tool_injection_decision — structured per-decision log line; never
    logs the tool definition contents.
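
A minimal sketch of the tracker and canonical serializer shape
(illustrative; the real `helpers.py` code differs in detail):

```python
# Minimal sketch of SessionToolTracker + canonical serialization.
# Illustrative only; the real helpers.py code differs in detail.
import json
import threading
from collections import OrderedDict

def serialize_tool_definition_canonical(tool):
    # The rules named above: compact separators + ensure_ascii=False,
    # so equal definitions always serialize to byte-equal blobs.
    return json.dumps(
        tool, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

class SessionToolTracker:
    def __init__(self, max_sessions=1000):  # HEADROOM_TOOL_TRACKER_MAX_SESSIONS
        self._golden = OrderedDict()  # (provider, session_id) -> golden bytes
        self._lock = threading.RLock()  # reentrant, safe for nested calls
        self._max = max_sessions

    def record(self, provider, session_id, blob):
        """First-time inject: pin the golden bytes for this session."""
        with self._lock:
            self._golden[(provider, session_id)] = blob
            self._golden.move_to_end((provider, session_id))
            while len(self._golden) > self._max:
                self._golden.popitem(last=False)  # evict least-recently-used

    def golden(self, provider, session_id):
        """Sticky replay: return pinned bytes, refreshing LRU recency."""
        with self._lock:
            blob = self._golden.get((provider, session_id))
            if blob is not None:
                self._golden.move_to_end((provider, session_id))
            return blob
```

Provider isolation falls out of the tuple key: the same session_id under
Anthropic and OpenAI lands in two independent entries.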

Wires the helper into all four memory tool injection sites:
  * handlers/anthropic.py — /v1/messages
  * handlers/openai.py — /v1/chat/completions
  * handlers/openai.py — /v1/responses
  * handlers/openai.py — Codex WS path

memory_handler.MemoryHandler gains compute_memory_tool_definitions(provider) —
a pure builder that returns the tool definitions without mutating a tools
list, so the proxy can route through the sticky tracker. The legacy
inject_tools(...) is preserved for callers without a session_id.

Tests: tests/test_memory_tool_session_sticky.py — 29 unit + integration
cases covering: turn-1→turn-2 byte-equality (Anthropic + OpenAI), sticky
replay after memory disabled, golden-fixture pin, LRU eviction, provider
isolation under shared session_id, thread-safe concurrent access, env-var
contract, disabled-mode passthrough, dedupe with client tools.

Golden fixtures pin canonical bytes:
  * tests/fixtures/memory_tool_definitions/anthropic.json
  * tests/fixtures/memory_tool_definitions/openai.json

No regex. No hardcodes (env-configurable: HEADROOM_TOOL_INJECTION_STICKY,
HEADROOM_TOOL_TRACKER_MAX_SESSIONS). No silent fallbacks. Per-decision
structured logging. Realignment build constraints satisfied.
pratikbin pushed a commit that referenced this pull request May 3, 2026
Production incident (Finding #2 of HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md):
on this customer's deployment the Rust extension `headroom._core` was
never installed into the runtime Docker image. Diff compression failed
54 times in a single day; "Optimization failed: ModuleNotFoundError" hit
379 times. The failure rate climbed every day and reached ~223/day on
2026-05-03 — effectively 100% of requests on the Rust path. Every Rust
PR we'd merged (MessageScorer, ICM, DiffCompressor, etc.) was providing
zero customer value because the module wasn't loadable at all.

Root cause: the Dockerfile builder stage installed Python deps and the
in-tree `headroom-ai` package but never ran `maturin build` for the
`headroom-py` crate, so the runtime image shipped without `_core.so`.
The Python proxy continued to start because the extension's absence is
caught and routed through Python-only fallbacks that either silently
no-op or raise per-request.

This change makes that mode impossible by default:

* `headroom.proxy.server._check_rust_core()` runs as the first step of
  the FastAPI lifespan. If the import fails it prints a structured
  diagnostic, logs `event=rust_core_missing`, and calls `sys.exit(78)`
  (sysexits.h `EX_CONFIG`). Process supervisors (systemd / k8s /
  docker) treat this as a deliberate config error and stop restart
  loops (see the sketch after this list).
* `HEADROOM_REQUIRE_RUST_CORE=false` is the explicit opt-out for
  Python-only `pip install -e .` developer flows; lifespan logs
  `event=rust_core_disabled` and continues. Any other value (including
  unset) keeps the fail-loud default.
* `/health` now surfaces `rust_core: "loaded" | "disabled" | "missing"`
  (plus `rust_core_error` when non-loaded) so operators can alert on
  the degraded state rather than discovering it via a customer ticket.
* `scripts/build_rust_extension.sh` is the single dev-time path: build
  → install → import-verify with the same `hello()` marker the lifespan
  checks. Failures are loud at every step.
* `Makefile` exposes the script as `make verify-rust-core`.
* `Dockerfile` now installs `rustup` + `maturin`, builds the wheel from
  `crates/headroom-py`, force-installs it into site-packages, and runs
  the same `hello()` import-verify in the build image so a broken build
  fails the docker-build, not the next runtime restart.
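
A sketch of the check's contract under the env semantics above
(illustrative; the real `_check_rust_core` in `headroom.proxy.server` may
differ in logging and return shape):

```python
# Sketch of the fail-loud lifespan check; illustrative, not the real
# headroom.proxy.server._check_rust_core.
import logging
import os
import sys

EX_CONFIG = 78  # sysexits.h EX_CONFIG: deliberate configuration error

def _check_rust_core():
    """Return (state, error) for /health: loaded | disabled | missing."""
    opted_out = os.environ.get("HEADROOM_REQUIRE_RUST_CORE") == "false"
    try:
        from headroom._core import hello
        assert hello() == "headroom-core"  # same marker the build verifies
        return ("loaded", None)
    except Exception as exc:
        if opted_out:  # explicit Python-only developer flow
            logging.warning("event=rust_core_disabled error=%s", exc)
            return ("disabled", str(exc))
        logging.error("event=rust_core_missing error=%s", exc)
        sys.exit(EX_CONFIG)  # supervisors stop restart-looping on 78
```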

Tests:
* `tests/test_rust_core_smoke.py` pins all four contracts:
  - `_core.hello()` returns `"headroom-core"`
  - missing extension + default env → `SystemExit(78)`
  - missing extension + opt-out env → lifespan starts, `/health`
    returns `rust_core: "disabled"` with the underlying error
  - present extension + default env → `("loaded", None)`

Full detail in Finding #2: ~/Desktop/HEADROOM_PROXY_LOG_FINDINGS_2026_05_03.md.
pratikbin pushed a commit that referenced this pull request May 3, 2026
PR chopratejas#350 CI: docker-native-e2e's wheel install succeeded but the
build-stage verify (`from headroom._core import hello`) failed with
`ModuleNotFoundError: No module named 'headroom._core'`. Same failure
mode the customer hit in production (Finding #2) — but in CI we have
the full layer trace.

Root cause: the headroom-core-py wheel claims ownership of both
`headroom/__init__.py` (stub from maturin's python-source layout)
AND `headroom/_core.cpython-*.so`. The previous Dockerfile installed
headroom-ai FIRST (which laid down the real `headroom/` tree), then
the wheel SECOND with `--force-reinstall`. pip's --force-reinstall
uninstalls the wheel's previously installed files before reinstalling
— but the wheel's stub `__init__.py` had already overwritten
headroom-ai's at first install. Net result: pip deleted
`headroom/__init__.py` and `headroom/_core.so` ownership records
got into a state where the .so wasn't present after the install.

Fix: swap the order. Install the wheel first (lays down stub
`__init__.py` + `_core.so`), then install headroom-ai (overwrites the
stub with the real `__init__.py` and adds the rest of the
`headroom/` tree). `_core.so` survives because headroom-ai doesn't
claim ownership of it. Drop `--force-reinstall` from the wheel step
since nothing is installing the wheel before it.

This is the exact failure A0 was designed to catch — a deployment
that ships without `_core` working. CI is now serving as a
regression gate for the production install path.

The remaining 3 PR check failures (validate × 3 / Dev Containers)
are environmental: the runner's PyPI mirror (`pypi.netflix.net`)
times out fetching `cuda-bindings==12.9.4` /
`nvidia-cuda-cupti-cu12==12.8.90` / `safetensors==0.7.0`. These come
from `headroom-ai[dev]` → `sentence-transformers` → `torch` → CUDA
deps. Not caused by the realignment branch; the post-create script
needs a `--extra dev-light` profile or the mirror needs the packages
cached. Tracking separately.