feat(listen): HTTP header injection for passwordless gateway auth #5
Two bugs in the initial implementation broke byte-for-byte parity with
the Ruby Ztlp::HeaderVerifier and Elixir ZtlpGateway.HeaderSigner:
1. Non-ZTLP headers (Host, Content-Length, etc.) were being included
in the HMAC canonical string. The verifiers only consider X-ZTLP-*
headers, so every signature would fail verification on Rails.
2. The canonical string had a trailing '\n'. Ruby's .join("\n")
does not add a trailing newline — it only separates. This silent
one-byte drift would also fail HMAC verification.
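Here is a minimal sketch of the canonical-string rule that both fixes lead to. Some details are assumptions because this text does not state them. Each entry is rendered as `name:value`. The `X-ZTLP-Signature` header is excluded, which is inferred from the "three-header" test vector. The helper name `canonical_string` is hypothetical. Check the exact per-entry shape against `Ztlp::HeaderVerifier` before relying on it.

```rust
/// Builds the string that gets HMAC-signed. Hypothetical helper; the
/// `name:value` entry shape is an assumption.
fn canonical_string(headers: &[(&str, &str)]) -> String {
    let mut entries: Vec<String> = headers
        .iter()
        // Bug 1: only X-ZTLP-* headers participate; Host, Content-Length,
        // and every other header are ignored.
        .filter(|(name, _)| name.to_ascii_lowercase().starts_with("x-ztlp-"))
        // The signature header cannot sign itself (assumed from the
        // three-header test vector).
        .filter(|(name, _)| !name.eq_ignore_ascii_case("x-ztlp-signature"))
        .map(|(name, value)| format!("{}:{}", name.to_ascii_lowercase(), value))
        .collect();
    // Lowercased names, sorted.
    entries.sort();
    // Bug 2: join() only separates entries, like Ruby's Array#join("\n"),
    // so there is no trailing '\n'.
    entries.join("\n")
}
```

Rust's `join("\n")` behaves like Ruby's `Array#join("\n")`: it only places separators between entries. That is the no-trailing-newline behaviour the second fix restores.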
Adds two BDD-style tests that lock the contract down:
- test_canonical_string_matches_ruby_contract: verifies the exact
three-header canonical format against a hand-computed HMAC vector,
with non-ZTLP headers present in the input that MUST be excluded.
- test_partial_request_is_rejected: documents the safe-failure
contract — partial HTTP requests (no \r\n\r\n terminator) are
rejected rather than HMAC-signed, preventing signed-prefix attacks.
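The safe-failure contract can be sketched as one check: the `\r\n\r\n` header terminator must be present before anything is signed. `header_block_len` is a hypothetical name. The error text reuses the "Partial HTTP request received" message quoted in this PR's review.

```rust
/// Returns the length of the header block (through the blank line), or an
/// error if the terminator has not arrived yet. Hypothetical helper.
fn header_block_len(buf: &[u8]) -> Result<usize, String> {
    buf.windows(4)
        .position(|w| w == b"\r\n\r\n")
        // Include the 4 terminator bytes in the header block length.
        .map(|i| i + 4)
        // No terminator: refuse to sign a prefix of the request.
        .ok_or_else(|| "Partial HTTP request received".to_string())
}
```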
End-to-end passwordless auth for the per-tenant Bootstrap admin UI when
accessed through 'ztlp connect bootstrap.<zone>.ztlp -L 18080:127.0.0.1:3000'.
The Rust gateway (ztlp listen) now intercepts the first decrypted HTTP
request on each forwarded TCP connection, strips any inbound X-ZTLP-*
header spoofing attempts, and injects authoritative trust headers
(X-ZTLP-Authenticated / Admin-Email / Timestamp / Signature) signed
with HMAC-SHA256. Rails Ztlp::HeaderVerifier validates the signature
with the same shared secret and treats the request as pre-authenticated
for the listed admin pubkey.
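Here is a rough sketch of the strip-then-inject step on a complete header block, from the request line through the terminating blank line. It rests on a few assumptions. `rewrite_headers` is a hypothetical helper that works on `&str`. Any body bytes are forwarded separately. HMAC computation is out of scope, so `trusted` is expected to already include the computed `X-ZTLP-Signature`.

```rust
/// Drops every inbound X-ZTLP-* header (spoofing attempt) and appends the
/// gateway's own headers before the blank line. Returns None if `head` is
/// not a complete header block. Hypothetical helper.
fn rewrite_headers(head: &str, trusted: &[(&str, String)]) -> Option<String> {
    // The block must end with the blank line that terminates the headers.
    let block = head.strip_suffix("\r\n\r\n")?;
    let mut out = String::new();
    for (i, line) in block.split("\r\n").enumerate() {
        let is_ztlp = line
            .split_once(':')
            .map(|(name, _)| name.trim().to_ascii_lowercase().starts_with("x-ztlp-"))
            .unwrap_or(false);
        // Line 0 is the request line and is always kept.
        if i > 0 && is_ztlp {
            continue;
        }
        out.push_str(line);
        out.push_str("\r\n");
    }
    // Authoritative headers, already computed by the caller.
    for (name, value) in trusted {
        out.push_str(&format!("{}: {}\r\n", name, value));
    }
    out.push_str("\r\n");
    Some(out)
}
```

Stripping applies to every `X-ZTLP-*` name regardless of case. A client therefore cannot smuggle its own `x-ztlp-admin-email` past the gateway.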
proto/src/tunnel.rs:
* New HttpInjectionConfig + HttpInjectionState. lookup_email() resolves
the peer's Noise static pubkey hex to a known admin email; only known
admins get header rewrites — unknown peers pass through unmodified so
Rails can fall back to its normal password login flow.
* iso8601_utc_now() formats SystemTime as a stable 'YYYY-MM-DDTHH:MM:SSZ'
timestamp (no chrono dep — manual SystemTime math).
* New entry point run_bridge_demuxed_with_http_injection(). The injector
runs on the FIRST ordered FRAME_DATA payload only; once 'done' the
bridge falls back to verbatim forwarding. If http_injector returns Err
(e.g. partial request, signed-prefix attack vector), the connection is
dropped rather than forwarded unmodified.
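The three `tunnel.rs` pieces above can be sketched together. The shapes here are hypothetical: struct fields, method names, and the closure-based injector. Only the behaviour comes from the bullets:

- admin lookup keyed by lowercase hex;
- a chrono-free `YYYY-MM-DDTHH:MM:SSZ` formatter;
- first-payload-only injection, where an injector `Err` drops the connection.

The date math uses Howard Hinnant's `civil_from_days` algorithm. It may differ from the manual `SystemTime` math in the actual patch.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical shape of HttpInjectionConfig; field names are assumptions.
struct HttpInjectionConfig {
    // Lowercase Noise static pubkey hex -> admin email.
    admins: HashMap<String, String>,
}

impl HttpInjectionConfig {
    /// Some(email) only for a known admin; None means "forward untouched".
    fn lookup_email(&self, peer_pubkey_hex: &str) -> Option<&str> {
        self.admins.get(&peer_pubkey_hex.to_ascii_lowercase()).map(String::as_str)
    }
}

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}

/// Formats Unix seconds as 'YYYY-MM-DDTHH:MM:SSZ'.
fn format_unix_secs(secs: u64) -> String {
    let (y, m, d) = civil_from_days((secs / 86_400) as i64);
    let rem = secs % 86_400;
    format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z", y, m, d, rem / 3_600, (rem % 3_600) / 60, rem % 60)
}

fn iso8601_utc_now() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    format_unix_secs(secs)
}

/// Hypothetical stand-in for HttpInjectionState: only the first ordered
/// FRAME_DATA payload reaches the injector; later payloads pass verbatim.
struct HttpInjectionState {
    done: bool,
}

impl HttpInjectionState {
    fn new() -> Self {
        HttpInjectionState { done: false }
    }

    /// Ok(bytes) to forward, or Err meaning "drop the connection"; the
    /// unmodified first payload is never forwarded after an injector error.
    fn on_payload<F>(&mut self, payload: &[u8], inject: F) -> Result<Vec<u8>, String>
    where
        F: FnOnce(&[u8]) -> Result<Vec<u8>, String>,
    {
        if self.done {
            return Ok(payload.to_vec());
        }
        self.done = true;
        inject(payload)
    }
}
```

A caller would check `lookup_email()` first. For unknown peers it would skip the injector entirely, so Rails still sees an unmodified request.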
proto/src/bin/ztlp-cli.rs:
* Three new clap flags on 'ztlp listen':
--http-inject-headers
--header-hmac-secret <SECRET>
--admin-pubkey-email <HEX=EMAIL> (repeatable)
* Pairs are validated (exactly one '='), hex is lowercased to match
HandshakeContext::remote_static_hex() output, and the secret must be
non-empty when the feature flag is on.
* Plumbed through cmd_listen -> cmd_listen_multi_session ->
handle_new_session -> run_session_bridge as Arc<HttpInjectionConfig>.
* Both run_bridge_demuxed() call sites inside run_session_bridge (initial
and reconnect-after-RESET) now call the new _with_http_injection variant
with the peer's pubkey hex extracted from the Noise handshake context.
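Here is a sketch of the flag validation described in the bullets above, assuming plain `String` errors. `parse_admin_pair`, `check_secret`, and the error wording are hypothetical; this is not the clap code in `ztlp-cli.rs`.

```rust
/// Parses one `--admin-pubkey-email HEX=EMAIL` value. Exactly one '=' is
/// required, and the hex side is lowercased so it compares equal to
/// remote_static_hex() output. Hypothetical helper.
fn parse_admin_pair(entry: &str) -> Result<(String, String), String> {
    if entry.matches('=').count() != 1 {
        return Err(format!("invalid --admin-pubkey-email '{}' (expected HEX=EMAIL)", entry));
    }
    let (hex, email) = entry.split_once('=').expect("exactly one '=' checked above");
    let (hex, email) = (hex.trim(), email.trim());
    if hex.is_empty() || email.is_empty() {
        return Err(format!("invalid --admin-pubkey-email '{}' (empty side)", entry));
    }
    Ok((hex.to_ascii_lowercase(), email.to_string()))
}

/// The secret must be non-empty whenever --http-inject-headers is on.
fn check_secret(inject_enabled: bool, secret: Option<&str>) -> Result<(), String> {
    match (inject_enabled, secret) {
        (true, None) | (true, Some("")) => {
            Err("--http-inject-headers requires a non-empty --header-hmac-secret".to_string())
        }
        _ => Ok(()),
    }
}
```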
ztlp.net/launch_app/app.py:
* Gateway compose service now reads secrets.env + instance.env via
env_file so it sees the same ZTLP_GATEWAY_HEADER_SECRET Rails reads.
* The ztlp listen command line passes --http-inject-headers and
--header-hmac-secret "$ZTLP_GATEWAY_HEADER_SECRET" unconditionally.
* --admin-pubkey-email is rendered conditionally via shell '$([ -n ... ]
&& echo ...)' — only attached when ZTLP_ADMIN_PUBKEY_HEX is non-empty.
Initial provision writes it as empty in instance.env so passwordless
auth stays OFF until an operator binds an enrolled device's pubkey.
This means today's behaviour (password form) is preserved by default
and the rollout is a one-line edit per tenant.
Security notes:
* HMAC key never enters the source tree — orchestration-generated only.
* Canonical string format is byte-compatible with the Ruby/Elixir
HeaderVerifier reference implementations (lowercased name, sorted,
joined by '\n', no trailing newline — locked by
test_canonical_string_matches_ruby_contract).
* Partial HTTP requests are rejected with a connection drop, never
HMAC-signed — see test_partial_request_is_rejected.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
proto/src/bin/ztlp-cli.rs (1)
2952-2960:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`--http-inject-headers` is ignored in single-session listen mode.
The new config is only passed into `cmd_listen_multi_session`. If an operator runs `ztlp listen --forward ... --http-inject-headers --max-sessions 1`, this branch is skipped and the single-session path below still uses the plain bridge helpers, so the auth headers are never injected. Please either wire the config through the single-session bridge as well or reject that combination up front instead of silently disabling the feature.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@proto/src/bin/ztlp-cli.rs` around lines 2952 - 2960, The single-session listen path currently skips the http injection by only passing http_injection into cmd_listen_multi_session; update the single-session branch so the same http injection config is applied: either forward the http_injection value into the single-session bridge helper call (modify the called function signature for the plain bridge helper used in the single-session path and pass http_injection.clone()) or, if simpler, detect the unsupported combination early and return an error when --http-inject-headers is set together with --max-sessions 1; locate the single-session call site near cmd_listen_multi_session and adjust the bridge helper invocation or add a guard that checks http_injection and max_sessions to enforce the chosen behavior.
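If the review's second option is chosen, the guard could be as small as the helper below. It is hypothetical. It assumes the single-session path is taken exactly when `--max-sessions` is 1, as the review describes.

```rust
/// Hypothetical up-front guard: refuse the flag combination instead of
/// silently dropping header injection on the single-session bridge.
fn validate_listen_flags(http_inject_headers: bool, max_sessions: usize) -> Result<(), String> {
    if http_inject_headers && max_sessions == 1 {
        return Err(
            "--http-inject-headers is not supported with --max-sessions 1 (single-session bridge does not inject)"
                .to_string(),
        );
    }
    Ok(())
}
```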
🧹 Nitpick comments (1)
proto/src/tunnel.rs (1)
1024-1061: 💤 Low value
HTTP injection only works on single-packet HTTP requests.
The injection hook processes only the first `FRAME_DATA` payload. If the first HTTP request spans multiple packets (e.g., large headers or a POST body arriving before headers complete), `inject_headers` will receive a partial request, fail with "Partial HTTP request received", and drop the connection.
This is acceptable given:
- The PR documents this as expected behavior (first-packet injection only)
- Typical admin login GET requests fit in one ~1200-byte packet
- Failing closed (dropping connection) is the secure default
Consider documenting this constraint in the `run_bridge_demuxed_with_http_injection` docstring for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@proto/src/tunnel.rs` around lines 1024 - 1061, The HTTP injection currently only examines the first FRAME_DATA payload and will fail/close the connection if the initial packet contains a partial HTTP request; add documentation to the run_bridge_demuxed_with_http_injection function’s docstring explaining this limitation so future maintainers know injection is first-packet-only, that inject_headers may receive partial requests (producing "Partial HTTP request received"), and that large headers or multi-packet requests will not be rewritten and will cause the connection to be dropped; mention the relevant symbols FRAME_DATA, http_injection, and inject_headers and note that this behavior is intentional and documented rather than changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@proto/src/bin/ztlp-cli.rs`:
- Around line 2819-2838: The admin_pubkey_email parsing loop accepts any
non-empty LHS; instead validate that the left-hand side is valid hex and a
32-byte X25519 public key before inserting into the map. In the loop that
handles admin_pubkey_email (the code that currently extracts `hex` and `email`
and calls `map.insert(hex.to_lowercase(), email.to_string())`), attempt to
hex-decode `hex` (e.g. via hex::decode/from_hex) and check the decoded length
equals 32 bytes; if decoding fails or length != 32, return an Err with a clear
message like "invalid --admin-pubkey-email '<entry>' (expected 64-hex chars for
X25519 pubkey)". Only on successful validation insert the lowercased hex string
and email into the map so mappings will match `remote_static_hex()` later.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 76b03e29-f77e-4f45-85d6-37b74187d066
📒 Files selected for processing (6)
- proto/Cargo.toml
- proto/src/bin/ztlp-cli.rs
- proto/src/http_injector.rs
- proto/src/lib.rs
- proto/src/tunnel.rs
- ztlp.net/launch_app/app.py
The /start handler was stripping <>{}[]; from user values at input
time. That silently mutated legitimate inputs (e.g. "Alice <Admin>"
became "Alice Admin") AND masked the test that verifies the output
boundary html.escape() is wired up, because by the time esc() ran
there were no angle brackets left to escape.
XSS defense lives at the output boundary — every render site already
funnels values through esc() (= html.escape(quote=True)). Keep only
the whitespace-collapsing clean() at input.
Fixes pre-existing test_start_escapes_user_values failure that was
red on main.
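The Launch app is Python. To keep every sketch in this thread in one language, the same input/output split is shown here in Rust. `esc` mirrors Python's `html.escape(s, quote=True)`, which also escapes quotes (`'` becomes `&#x27;`). `clean` assumes the input step only trims and collapses whitespace runs; the real helper's exact rules are not shown here.

```rust
/// Output-boundary escaping, mirroring Python's html.escape(s, quote=True).
fn esc(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

/// Input-side cleanup (assumed behaviour): collapse whitespace runs only,
/// never strip characters such as < or >.
fn clean(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

The round trip shows the point of the change: `Alice <Admin>` is stored intact and is escaped only when rendered.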
Previously the parsing loop accepted any non-empty LHS and inserted it into the admin map. A typo, wrong-length, or non-hex value would silently never match remote_static_hex() at runtime, looking like 'the gateway just won't inject headers' with no error to point at. Now we hex::decode the LHS and assert decoded.len() == 32 up front, returning a clear error pointing at the offending entry. Also trim trailing whitespace off the email side (the hex side already trimmed). Addresses CodeRabbit review on PR #5.
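Here is a stdlib-only sketch of that validation. The commit uses `hex::decode`; this hand-rolled decoder exists only so the example stands alone. `decode_x25519_hex` is a hypothetical name, and the error text follows the wording suggested in the review.

```rust
/// Validates the LHS of --admin-pubkey-email as a 32-byte X25519 key.
/// Hypothetical helper.
fn decode_x25519_hex(entry: &str, hex: &str) -> Result<[u8; 32], String> {
    let err = || {
        format!("invalid --admin-pubkey-email '{}' (expected 64-hex chars for X25519 pubkey)", entry)
    };
    let hex = hex.trim();
    // Checking digits up front also rejects the leading '+' that
    // u8::from_str_radix would otherwise accept.
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(err());
    }
    let mut key = [0u8; 32];
    for (i, pair) in hex.as_bytes().chunks(2).enumerate() {
        // Every byte is an ASCII hex digit, so each pair is valid UTF-8.
        let pair = std::str::from_utf8(pair).map_err(|_| err())?;
        key[i] = u8::from_str_radix(pair, 16).map_err(|_| err())?;
    }
    Ok(key)
}
```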
…SION
update_config_default_values had a hard-coded SemVer::new(0, 26, 0)
expectation against a value that's parsed from CARGO_PKG_VERSION at
build time. The 0.26.0 -> 0.26.2 version bump silently rotted this
test — it's only caught by the Performance Gate job (which runs full
cargo test --release including integration tests), not by the normal
Rust (proto) CI job (which only runs --lib).
Derive the expected value from env!("CARGO_PKG_VERSION") so the test
stays in sync with proto/Cargo.toml automatically, and add a comment
explaining why so the next version bump doesn't re-break it.
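Here is a sketch of deriving the expectation from the build-time version instead of a literal. `SemVer` is a minimal hypothetical stand-in for the in-crate `updater::SemVer`. `option_env!` replaces `env!` only so the snippet compiles outside Cargo; under Cargo both yield the `proto/Cargo.toml` version.

```rust
/// Minimal stand-in for ztlp_proto::updater::SemVer (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SemVer {
    major: u64,
    minor: u64,
    patch: u64,
}

impl SemVer {
    fn parse(s: &str) -> Option<SemVer> {
        // Ignore any pre-release/build suffix such as "-rc.1" or "+abc".
        let core = s.split(|c: char| c == '-' || c == '+').next()?;
        let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
        let v = SemVer {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        if parts.next().is_some() { None } else { Some(v) }
    }
}

/// The expectation, derived from the crate version at build time instead
/// of a literal like SemVer::new(0, 26, 0).
fn expected_current_version() -> SemVer {
    let raw = option_env!("CARGO_PKG_VERSION").unwrap_or("0.0.0");
    SemVer::parse(raw).expect("CARGO_PKG_VERSION should be valid semver")
}
```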
…auth (#6) Followup #2 from PR #5: lets a tenant admin bind their Noise static pubkey to enable passwordless ZTLP gateway auth without SSHing into the host to hand-edit instance.env. POST /api/admin-pubkey body (x-www-form-urlencoded): token=<claim_token>&pubkey_hex=<64 lowercase hex> returns 200 application/json: {status:ok, slug, applied:true} returns 400 / 401 / 404 / 500 with {error,detail?} Auth model: holder of the original claim_token. The token was shown once to the admin at /start time and is stored only as an HMAC digest in the DB, so reusing it as the auth credential here does not widen the trust boundary — anyone with the raw token already has full control of this tenant. Validation mirrors the Rust gateway's --admin-pubkey-email check: exactly 64 lowercase hex chars decoding to a 32-byte X25519 public key. Uppercase hex is normalized; off-by-one lengths, non-hex chars, and missing/invalid tokens all fail fast with clear errors. Side effects on success: 1. Rewrites the ZTLP_ADMIN_PUBKEY_HEX= line in instance.env in place (preserves all other operator-set keys). 2. Runs 'docker compose up -d --force-recreate gateway' in the instance dir. --force-recreate is required — a plain 'up -d' is a no-op when the image+config hash is unchanged and changing an env_file value alone does not bust that hash for already- running containers. Followup #1: also document inline why the chmod 600 on secrets.env is compatible with the gateway's env_file mount. docker-compose v2 reads env_file in the CLI process (same user that wrote it), not in the container, so the gateway never needs read perms on the file — only the Launch process user invoking 'docker compose up' does. Refactors: extract _slug_for_row and _instance_dir_for_slug helpers so the new handler and _provision_zone_dockers compute paths the exact same way. The test fixture's _fake_run now records calls into self._subprocess_calls so we can assert on the docker recreate. 
Tests: 7 new cases (45 total, was 38) covering missing/invalid tokens, bad-hex shapes, uppercase normalization, env+recreate side effects, rebind overwriting prior value, and 404 when the admin hits the endpoint before clicking the claim link.
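The in-place `instance.env` rewrite can be sketched as a line-level replace that leaves every other line untouched. The Launch app is Python, so this Rust version is illustrative only. Two behaviours are assumptions rather than documented: appending the key when it is missing, and always ending with a newline.

```rust
/// Sets `key=value` in dotenv-style text, keeping all other lines (other
/// operator-set keys, comments) as they were. Hypothetical helper.
fn set_env_line(contents: &str, key: &str, value: &str) -> String {
    // The '=' in the prefix keeps e.g. KEY_OLD= from matching KEY=.
    let prefix = format!("{}=", key);
    let mut found = false;
    let mut lines: Vec<String> = contents
        .lines()
        .map(|line| {
            if line.starts_with(&prefix) {
                found = true;
                format!("{}{}", prefix, value)
            } else {
                line.to_string()
            }
        })
        .collect();
    // Assumption: append the key if it was missing.
    if !found {
        lines.push(format!("{}{}", prefix, value));
    }
    let mut out = lines.join("\n");
    out.push('\n');
    out
}
```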
Detailed description:
- Updated the handoff document to capture the state of the passwordless authentication project.
- Documented that PR #5 and PR #6 were completed and merged.
- Logged the outstanding blocker: the 'Nebula dumb-pipe' architecture is routing around the L7 HTTP Injection layer (), breaking auto-login.
- Established the immediate next steps: do not guess code fixes; instead, turn on aggressive trace logging / tcpdump on the AWS test server to measure exactly how TCP packets behave across the tunnel.
- Documented the uncommitted hot-patch variable escape remaining on the AWS server.
… with regression pins (#14) What - Bumps `gateway/mix.exs` and `ns/mix.exs` from 0.24.0 → 0.29.4. - Bumps `proto/Cargo.toml` from 0.29.3 → 0.29.4. - Adds the same "version reporting (regression pin)" describe block to `gateway/test/ztlp_gateway/release_test.exs` and `ns/test/ztlp_ns/release_test.exs` that PR #13 added for relay (semver shape + runtime-vs-declared drift + ≥0.29.4 floor guard). - Adds `proto/tests/version_pin_test.rs` with two equivalent Rust-side pins (parseable semver + ≥0.29.4 floor guard) using the in-crate `ztlp_proto::updater::SemVer` to avoid taking a new external dependency. Why - PR #13 fixed the relay's version-string drift but explicitly left the rest of the tree as follow-up scope (handoff Known Problems #6: "gateway/mix.exs and ns/mix.exs at 0.24.0; proto/Cargo.toml at 0.29.3"). Gateway and NS had been pinned at 0.24.0 for five minor versions, so `Application.spec(:ztlp_gateway, :vsn)` and `Application.spec(:ztlp_ns, :vsn)` were lying about which tag was actually deployed — the exact bug class PR #13 fixed for relay. Same lie, three more components. - Without regression pins on these other components, the same drift can silently recur on the next tag cut. Details - Floor guard uses `Version.compare(declared, "0.29.4") in [:gt, :eq]` (Elixir) / `actual.cmp(&floor)` (Rust) rather than asserting a literal version string. This is deliberate: literal-string assertions become a test-maintenance burden on every routine bump, whereas a floor guard only fails on accidental down-bumps below the v0.29.4 strict-routing tag. - Runtime-vs-declared drift test catches both directions of the bug PR #13 found: (a) mix.exs bumped but .app cache stale, and (b) tag cut without bumping mix.exs. - Rust-side test re-uses the in-crate `updater::SemVer` parser rather than pulling in the external `semver` crate. This exercises the same code path the self-updater uses, so if that parser ever regresses both will catch it.
- mix.exs comment blocks explain the bump inline so future readers see the on-call story at the point of change, not just in the test file or handoff. - No source-code logic changed; this is a version-string + test-only PR. Public API, wire format, and runtime behavior are untouched. Tests (TDD discipline followed) - RED step verified per-component before bump: * gateway: `mix test test/ztlp_gateway/release_test.exs --seed 0` → "mix.exs version 0.24.0 is older than the v0.29.4 strict-routing tag" (1/15 failures, all in the new floor-guard test, exactly as designed). * ns: `mix test test/ztlp_ns/release_test.exs --seed 0` → "mix.exs version 0.24.0 is older than the v0.29.4 strict-routing tag" (1/15 failures, identical message). * proto: `cargo test --test version_pin_test` → "proto/Cargo.toml version 0.29.3 is older than the v0.29.4 strict-routing tag" (1/2 failures, parseable-semver test still green so we know the fixture is sane). - GREEN step after bumps: * gateway: 15/15 in release_test.exs; 835/835 in full `mix test` (seed 1). Note: seed 55290 surfaces two pre-existing test-ordering flakes (TLS port reuse in tls_phase2_test.exs and a GenServer teardown race in crl_server_test.exs) — both also reproduce on plain `main` and are unrelated to this PR. Confirmed by stash + checkout `main`-only files + rerun on the same seed. * ns: 15/15 in release_test.exs; 729/729 in full `mix test`. * proto: 2/2 in version_pin_test; 858/858 in `cargo test --lib --release` (matches the pre-branch baseline). Validation (non-test) - `cargo check --release` clean (31 pre-existing dead-code warnings in `proto/src/bin/ztlp-cli.rs` unchanged — Known Problems #4 scope). - Relay suite re-run unchanged: 597/597 (confirms no cross-component collateral damage). - ztlp.net Python suite unchanged: 48/48 in `tests.test_launch_app`. - Infra untouched. No relay restart, no gateway restart, no NS restart. 
Live binaries still on v0.29.3 (relay) / v0.24.0 (gateway, ns) — actual deployment of the bumped versions is a separate, Steve-gated step. Follow-up - After merge, decide tag strategy with v0.29.4 already in the past: either (a) cut v0.29.5 that includes this + PR #13's mix.exs bump (cleanest), or (b) accept "tag is source of truth; mix.exs is best-effort" for the trailing v0.29.4 and reset on v0.30.0. - Task #3 (per-zone HMAC `Config.registration_secret/0`) is still the prod-readiness blocker and is the hard dependency for the Bootstrap workstream (handoff §"HARD DEPENDENCY uncovered while locking in #5"). - Known Problems #4 (`cargo fix` pass on 31 dead-code warnings in `proto/src/bin/ztlp-cli.rs`) still open; left out of scope here to keep this PR tight. Refs - PR #12 (v0.29.4 strict-routing): 829abdf - PR #13 (relay mix.exs vsn pin): d22afbf - Handoff: ~/hermes_session_handoff.md "Known Problems #6", "Open Question #1"
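The Rust-side floor guard can be sketched with lexicographic array comparison standing in for `actual.cmp(&floor)` on the in-crate `SemVer`. `check_version_floor` is hypothetical. In the real test, the declared string would come from `env!("CARGO_PKG_VERSION")`.

```rust
use std::convert::TryInto;

/// Fails only if `declared` is below `floor`, so routine bumps never need
/// a test edit. Hypothetical helper.
fn check_version_floor(declared: &str, floor: [u64; 3]) -> Result<(), String> {
    let parsed: Vec<u64> = declared
        .split(|c: char| c == '-' || c == '+') // drop pre-release/build suffix
        .next()
        .unwrap_or("")
        .split('.')
        .map(|p| p.parse::<u64>())
        .collect::<Result<_, _>>()
        .map_err(|e| format!("version {} is not parseable semver: {}", declared, e))?;
    let actual: [u64; 3] = parsed
        .try_into()
        .map_err(|_| format!("version {} is not MAJOR.MINOR.PATCH", declared))?;
    // Arrays compare lexicographically: major, then minor, then patch.
    if actual < floor {
        return Err(format!(
            "version {} is older than the v{}.{}.{} strict-routing tag",
            declared, floor[0], floor[1], floor[2]
        ));
    }
    Ok(())
}
```

Only `actual < floor` fails. A routine bump such as 0.29.4 → 0.30.0 therefore needs no test edit.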
…i.rs (#17) What: - Closes Task #5 from hermes_session_handoff.md: "cargo fix pass for 31 dead-code warnings in proto/src/bin/ztlp-cli.rs". Before this change `cargo build --release --bin ztlp` emitted 31 warnings on the bin; after, it emits 0. - 20 warnings are mechanical fixes auto-applied by `cargo fix --bin "ztlp" -p ztlp-proto --release` (one unused `use TcpStream`, three unused `mut` qualifiers, sixteen `_param` prefix renames for intentionally-unused function parameters). One additional cargo-fix landed in `proto/src/tunnel.rs:1512` for a `let mut cc` that's only read. - The remaining 11 warnings are an island of placeholder code for a multi-session listener path: `cmd_listen_multi_session` (literal `Ok(())` body), plus `complete_handshake_for_reject`, `handle_new_session`, `wait_for_reset_on_socket`, `run_session_bridge`, `wait_for_reset_buffered`, `ns_pubkey_lookup`, `UdpNsResolver` + `new`, `spawn_relay_registration`, and `HANDSHAKE_TIMEOUT` (which is only referenced by the stubs). - These call each other (handle_new_session → ns_pubkey_lookup + UdpNsResolver + run_session_bridge → wait_for_reset_on_socket; the stub mirrors the architecture commented at line 2456) so they form a connected disabled feature, not orphaned dead code. Why: - `#[allow(dead_code)]` over deletion: the comment block above `cmd_listen_multi_session` says it's the intended next-gen listener ("1. Multiplexes sessions… 2. ConnectionTracker… 4. Enforces max_sessions with REJECT(CAPACITY_FULL)") and the body is a one-line stub returning `Ok(())`. Deleting all of it now would discard ~700 LOC of scaffolding for a planned feature; annotating preserves it cheaply and silences the noise. Each annotation carries a comment pointing at `cmd_listen_multi_session` so the next person knows the island is intentional and where to start when the feature is picked up. 
- `_param` over deletion for function-signature unused params: parameters are part of the public/internal API surface and may be wired up by future implementation work. Prefix-with-underscore is the idiomatic Rust signal for "intentionally unused". Details: - Files changed: - `proto/src/bin/ztlp-cli.rs` — 20 cargo-fix mechanical edits + 11 `#[allow(dead_code)]` annotations with explanatory doc comments. - `proto/src/tunnel.rs:1512` — single `let mut cc → let cc` (the binding is never reassigned, only `.lock().await` is called once immutably). Tests: - `cd ~/ztlp/proto && cargo build --release --bin ztlp` → 0 warnings on the `ztlp` bin (previously 31). - `cd ~/ztlp/proto && cargo test --lib --release` → 858 passed, 0 failed, 12 ignored (unchanged baseline). - `cd ~/ztlp/proto && cargo test --test version_pin_test --release` → 2 passed (unchanged baseline). Validation: - No public API changed. Function bodies are byte-for-byte identical (cargo-fix only renames bindings in places where the binding wasn't used / didn't need `mut`). - The lib crate still has 7 warnings unchanged (`HANDSHAKE_TIMEOUT` in `proto/src/agent/proxy.rs:66` and three `diag_*` counters in `proto/src/ffi.rs:1200-1862`). Those are outside the Task #5 scope (bin only); tracking as follow-up. Follow-up: - Task #5 from `hermes_session_handoff.md` is now closed. Future PR can decide whether to (a) wire up the multi-session listener stubs and drop the `#[allow(dead_code)]` annotations or (b) outright delete the island if it's no longer the planned direction. - Lib-crate 7 warnings (diag_* + proxy HANDSHAKE_TIMEOUT) deferred — separate small PR if Steve wants them cleaned up. Refs: - `hermes_session_handoff.md` Task #5: "`cargo fix` pass for 31 dead-code warnings in `proto/src/bin/ztlp-cli.rs`" - v0.29.5 baseline (a2a2a03 on main) — no library code semantics changed; runtime behavior unchanged.
What: - After PR #17 cleaned up the 31 warnings on the `ztlp` bin, the lib crate still emitted 7 warnings under `cargo build --release --lib`: * 1 × `const HANDSHAKE_TIMEOUT` orphaned in `proto/src/agent/proxy.rs` * 3 × `let mut diag_*` "variable assigned to, but never used" * 3 × `diag_* += 1` / `diag_* = 0` "value assigned is never read" All 6 diag warnings come from the same root cause: three iOS diagnostic counters are only consumed inside the `diag_log!` and `trace_info!` macros, which expand to no-ops when the `diag` feature is off (the default for non-iOS builds). The increments still happen at runtime, but rustc correctly sees that the values are never read. - This PR annotates the affected sites with narrowly-scoped `#[allow]` attributes (with explanatory comments) instead of restructuring the macro system or `#[cfg]`-gating each counter individually. Zero behavioral change. Why this approach (`#[allow]` vs alternatives): 1. **`#[cfg(feature = "diag")]` per counter** — would need to gate each declaration AND each increment site (6+ edits), and would diverge the iOS NE diag build from the non-diag build in a way that's easy to break later. 2. **Rewrite the no-op macros to `let _ = format_args!(...)`** — would silence the warnings AND force type-checking of macro args even when diag is off. Tried this first; it surfaced two pre-existing typos (`pps`, `local_addr` referenced inside the diag block but not in scope) that would prevent the iOS diag build from compiling. Those are real latent bugs but fixing them widens the PR scope — tracking separately as a follow-up. 3. **`#[allow]`** (chosen) — surgical, no runtime cost, no risk of masking new bugs because the lints are still active everywhere else in the file/function. Details: - `proto/src/agent/proxy.rs:66` — `HANDSHAKE_TIMEOUT` is orphaned scaffolding. 
Annotated `#[allow(dead_code)]` with a doc comment pointing at the matching `bin/ztlp-cli.rs::HANDSHAKE_TIMEOUT` so a future agent-side handshake wrapper has a consistent value to pull from. - `proto/src/ffi.rs:1196..1217` — added a multi-line NOTE explaining the diag-counter / no-op-macro interaction, then annotated each of the three problem counters with `#[allow(unused_assignments, unused_variables)]` on their `let`. - `proto/src/ffi.rs:1136` — added `#[allow(unused_assignments)]` on the `recv_loop` fn (the only place that hits the counters' `+= 1` / reset-to-zero patterns rustc flags as unused-assignment). A function-level `#[allow]` is necessary here because per-statement attributes aren't stable in Rust. Tests: - `cargo build --release --lib` → 0 warnings (was 7). - `cargo build --release --bin ztlp` → still 0 warnings (PR #17 baseline preserved). - `cargo test --lib --release` → 858 passed, 0 failed, 12 ignored (unchanged). - `cargo test --test version_pin_test --release` → 2 passed (unchanged). Validation: - No code semantics change. The three iOS diag counters still increment and still get logged when the `diag` feature is enabled. The `HANDSHAKE_TIMEOUT` constant is byte-for-byte the same value. Follow-up (tracked): - The macro-rewrite path surfaced two latent compile errors in the diag branch (`pps` and `local_addr` references in `ffi.rs` recv_loop that aren't in scope). These would prevent a `cargo build --features diag` from succeeding on the lib. They are NOT triggered by this PR's changes — only exposed if someone tries to enable the diag feature. Filing as a follow-up cleanup PR. Adding a CI matrix entry for `--features diag` would catch this class of bug going forward. Refs: - `hermes_session_handoff.md` Task #5 follow-up — "Lib-crate 7 warnings (diag_* + proxy HANDSHAKE_TIMEOUT) deferred — separate small PR". - v0.29.5 baseline. Runtime behavior unchanged.
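Here is a minimal reproduction of the diag-counter pattern and the chosen fix. It assumes a `diag` Cargo feature and a `diag_log!` macro that compiles to nothing when the feature is off. The function and counter names are hypothetical; only the attribute placement mirrors the description above.

```rust
// With the `diag` feature on, diag_log! really prints, and so reads its args.
#[cfg(feature = "diag")]
macro_rules! diag_log {
    ($($arg:tt)*) => { eprintln!($($arg)*) };
}

// With it off (the default), the macro expands to nothing, so a variable
// referenced only inside it is written but never read.
#[cfg(not(feature = "diag"))]
macro_rules! diag_log {
    ($($arg:tt)*) => {};
}

// Function-level allow covers the `+=` and reset-to-zero writes below.
#[allow(unused_assignments)]
fn recv_loop(packets: &[&[u8]]) -> usize {
    // Hypothetical diag counter, read only inside diag_log!.
    #[allow(unused_assignments, unused_variables)]
    let mut diag_bytes: usize = 0;
    let mut delivered = 0;
    for p in packets {
        diag_bytes += p.len();
        delivered += 1;
        if delivered % 100 == 0 {
            diag_log!("recv_loop: {} bytes in last window", diag_bytes);
            diag_bytes = 0;
        }
    }
    delivered
}
```

With `diag` off, `unused_variables` and `unused_assignments` would otherwise flag the counter. With `diag` on, `eprintln!` reads it and the `#[allow]`s become inert.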
…ents (BS-PR-6) (#28) What ----- Customer-facing end-to-end documentation and ready-to-copy reference clients for any external system (initial target: Z2LS) that wants to mint enrollment tokens via the ZTLP-secured Bootstrap API. Three artifacts: 1. bootstrap/docs/z2ls_enrollment_runbook.md (574 lines) The single-source guide for integrators: pre-reqs, provisioning an api_clients row, the per-zone HMAC signing contract, the full POST /api/v1/enrollment_tokens flow, redemption paths (CLI / macOS / iOS), an end-to-end smoke procedure, and a comprehensive troubleshooting section keyed on the actual 401 reason codes the authenticator logs. 2. bootstrap/script/z2ls_request_token.py (153 lines) Self-contained Python reference client. Stdlib-only (urllib + hmac + hashlib). Drop into any Z2LS codebase and adapt. 3. bootstrap/script/z2ls_request_token.rb (128 lines) Self-contained Ruby reference client. Stdlib-only (net/http + openssl). Mirrors the Python reference behavior byte-for-byte. Why --- Steve's 2026-05-23 brief item #5/#6: * "Build an API endpoint where Z2LS can request an enrollment token for a new system." (shipped in BS-PR-3 / #26) * "[Workflow] Z2LS sends the computer name to ZTLP Bootstrap [...] ZTLP Bootstrap validates that Z2LS is allowed to communicate with the API using ZTLP-secured communication." BS-PR-3 shipped the server side. This PR ships the documentation + reference signers so customers (and our own Z2LS instances) can hit that endpoint correctly on the first try. The most common integration failure for HMAC-signed APIs is the canonical-message format: this runbook spells out the 6-line layout and the two reference clients demonstrate it concretely. 
Details ------- * Canonical 6-line signed message matches `Ztlp::ApiAuthenticator#canonical_message`: METHOD\nFULLPATH\nZONE\nCLIENT\nTIMESTAMP\nSHA256_HEX(body) * Per-zone secret env-var slug rule (`zone.upcase.gsub(/[^A-Z0-9]+/, "_")`) matches `ApiAuthenticator.slugify_zone/1` exactly — verified by replicating the rule in both reference clients. * 64-char pure-hex secret values are decoded to 32 raw bytes BEFORE signing (the Bootstrap authenticator does the same; mismatched decoding is the second most common failure mode and has a dedicated troubleshooting bullet). * Runbook documents the 503 case (api_clients row exists, no Network row for the zone) and forward-refs BS-PR-4 for the auto-creation fix. * Helper scripts use only stdlib so they can be dropped into any Z2LS host without dependency installation. Tests ----- No new automated tests — this is documentation + reference scripts. However, both helpers were validated to produce signatures byte-identical to the server-side `Ztlp::ApiAuthenticator`: python sig: c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201 ruby sig: c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201 reference: c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201 (64-char hex key, POST /api/v1/enrollment_tokens, zone=acme.ztlp, client=z2ls.acme, ts=1700000000, body='{"computer_name":"alice-laptop"}') Pre-existing test suite untouched: * bootstrap full suite still 1017/1020 (same 3 pre-existing SshProvisionerTest port-mismatch failures predating this work). * `api/v1/enrollment_tokens_controller_test.rb` — 13/13 pass. Validation ---------- * `python3 -c 'compile(...)'` — script parses. * `ruby -c script/z2ls_request_token.rb` — Syntax OK. * Cross-language signature equivalence verified (see Tests). * Runbook proofread for accuracy against the actual auth + token code paths (slugify rule, hex-decode rule, canonical-message order, fullpath, RFC1035 validator, audit log action names).
Follow-up --------- * BS-PR-4 (next): auto-create the per-tenant Network row + matching api_clients row during ztlp.net onboarding so customers don't hit the 503 path the runbook currently documents as a workaround. * If Steve wants a `bin/z2ls-request-token` packaged binary later we can roll the Python helper into a small CLI; for now keep it as a copy-paste reference.
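Here is a sketch of the two rules the reference clients must replicate: the zone slug and the 6-line canonical message. It rests on a few assumptions:

- the SHA-256 of the body is passed in precomputed, to stay stdlib-only;
- the timestamp is Unix seconds, as in the `ts=1700000000` vector;
- there is no trailing newline.

`slugify_zone` and `canonical_message` are hypothetical Rust names for the Ruby behaviour.

```rust
/// Ruby's `zone.upcase.gsub(/[^A-Z0-9]+/, "_")`: uppercase, then collapse
/// every run of characters outside A-Z0-9 into a single underscore.
fn slugify_zone(zone: &str) -> String {
    let mut out = String::new();
    let mut in_run = false;
    for c in zone.chars().flat_map(char::to_uppercase) {
        if c.is_ascii_uppercase() || c.is_ascii_digit() {
            out.push(c);
            in_run = false;
        } else if !in_run {
            out.push('_');
            in_run = true;
        }
    }
    out
}

/// METHOD\nFULLPATH\nZONE\nCLIENT\nTIMESTAMP\nSHA256_HEX(body)
fn canonical_message(
    method: &str,
    fullpath: &str,
    zone: &str,
    client: &str,
    timestamp: u64,
    body_sha256_hex: &str,
) -> String {
    format!("{}\n{}\n{}\n{}\n{}\n{}", method, fullpath, zone, client, timestamp, body_sha256_hex)
}
```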
Summary
Wires end-to-end passwordless authentication for the ZTLP SaaS UI (Bootstrap admin portal) when accessed through a per-tenant gateway via `ztlp connect`. The gateway (`ztlp listen`) becomes selectively HTTP-aware: it validates the peer's Noise public key against an admin map, strips inbound `X-ZTLP-*` spoofing attempts, and injects authoritative HMAC-signed trust headers that Rails verifies via `Ztlp::HeaderVerifier`.
Changes
- `proto/src/http_injector.rs` (new) — HTTP parser + HMAC-SHA256 header injector. Canonical string locked to the Ruby contract: lowercased, sorted, `\n`-joined, no trailing newline (BDD test enforces byte-for-byte parity).
- `proto/src/bin/ztlp-cli.rs` — new flags on `ztlp listen`: `--http-inject-headers`, `--header-hmac-secret`, `--admin-pubkey-email <HEX=EMAIL>` (repeatable).
- `proto/src/tunnel.rs` — new `HttpInjectionConfig` (Arc'd) and `run_bridge_demuxed_with_http_injection()`. Injection runs on the FIRST ordered FRAME_DATA payload only; subsequent bytes pass through verbatim. Connection drops on parse error (prevents signed-prefix attacks). Unknown peers pass through unmodified → Rails falls back to its normal password form.
- `ztlp.net/launch_app/app.py` — compose generation now templates `ZTLP_GATEWAY_HEADER_SECRET` (already in `secrets.env`) and conditionally renders `--admin-pubkey-email` only when the operator sets `ZTLP_ADMIN_PUBKEY_HEX` in `instance.env`. Today's password-form behaviour is preserved by default.
Header Contract
- `X-ZTLP-Authenticated: 1`
- `X-ZTLP-Admin-Email: <email>`
- `X-ZTLP-Timestamp: <ISO8601 UTC>`
- `X-ZTLP-Signature: <hex hmac-sha256>`
HMAC secret variable
`ZTLP_GATEWAY_HEADER_SECRET` is shared between gateway and Rails, already generated by the orchestrator at claim time (32-byte hex).
Test Plan
- `cargo test --release --lib` → 901 passed, 0 failed
- `cargo build --release --bin ztlp` → success (8.0MB binary)
- `ztlp listen --help` shows new flags
- `ztlp.net/launch_app/app.py` renders valid compose YAML in both modes (admin-pubkey set/unset)
- `test_canonical_string_matches_ruby_contract` and `test_partial_request_is_rejected`
Followups
- `secrets.env` (chmod 600) on a live tenant
- `POST /tenants/<slug>/admin_pubkey` endpoint on Launch app to flip on passwordless auth via UI instead of editing `instance.env`
Summary by CodeRabbit
Release Notes
New Features
- `ztlp listen` command-line options to enable header injection, configure authentication secrets, and map peer identities to admin email addresses.
Chores