feat(listen): HTTP header injection for passwordless gateway auth#5

Merged
priceflex merged 6 commits into main from feature/ztlp-listen-http-header-injection
May 20, 2026
@priceflex priceflex commented May 20, 2026

Summary

Wires end-to-end passwordless authentication for the ZTLP SaaS UI (Bootstrap admin portal) when accessed through a per-tenant gateway via ztlp connect. The gateway (ztlp listen) becomes selectively HTTP-aware: it validates the peer's Noise public key against an admin map, strips inbound X-ZTLP-* spoofing attempts, and injects authoritative HMAC-signed trust headers that Rails verifies via Ztlp::HeaderVerifier.

Changes

  • proto/src/http_injector.rs (new) — HTTP parser + HMAC-SHA256 header injector. Canonical string locked to Ruby contract: lowercased, sorted, \n-joined, no trailing newline (BDD test enforces byte-for-byte parity).
  • proto/src/bin/ztlp-cli.rs — new flags on ztlp listen: --http-inject-headers, --header-hmac-secret, --admin-pubkey-email <HEX=EMAIL> (repeatable).
  • proto/src/tunnel.rs — new HttpInjectionConfig (Arc'd) and run_bridge_demuxed_with_http_injection(). Injection runs on the FIRST ordered FRAME_DATA payload only; subsequent bytes pass through verbatim. Connection drops on parse error (prevents signed-prefix attacks). Unknown peers pass through unmodified → Rails falls back to its normal password form.
  • ztlp.net/launch_app/app.py — compose generation now templates ZTLP_GATEWAY_HEADER_SECRET (already in secrets.env) and conditionally renders --admin-pubkey-email only when the operator sets ZTLP_ADMIN_PUBKEY_HEX in instance.env. Today's password-form behaviour preserved by default.

Header Contract

  • X-ZTLP-Authenticated: 1
  • X-ZTLP-Admin-Email: <email>
  • X-ZTLP-Timestamp: <ISO8601 UTC>
  • X-ZTLP-Signature: <hex hmac-sha256>

HMAC secret variable ZTLP_GATEWAY_HEADER_SECRET is shared between gateway and Rails, already generated by the orchestrator at claim time (32-byte hex).
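
A minimal Python sketch of how the signature over this contract could be computed. It is not the gateway's Rust code. The `name:value` line format and the function names are assumptions made for illustration; the source only fixes that names are lowercased, lines are sorted and joined with a single newline, there is no trailing newline, and only X-ZTLP-* headers are considered.

```python
import hashlib
import hmac

SIGNED_PREFIX = "x-ztlp-"
SIGNATURE_HEADER = "x-ztlp-signature"


def canonical_string(headers: dict[str, str]) -> str:
    # Assumption: each canonical line is "name:value" with a lowercased name.
    lines = [
        f"{name.lower()}:{value}"
        for name, value in headers.items()
        # Only X-ZTLP-* headers are signed; the signature itself is excluded.
        if name.lower().startswith(SIGNED_PREFIX)
        and name.lower() != SIGNATURE_HEADER
    ]
    # join() separates lines and never appends a trailing newline.
    return "\n".join(sorted(lines))


def sign(headers: dict[str, str], secret: bytes) -> str:
    # Hex-encoded HMAC-SHA256 over the canonical string.
    message = canonical_string(headers).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

A caller would build the three trust headers, then set `X-ZTLP-Signature` to `sign(headers, secret)`, where `secret` is the value of ZTLP_GATEWAY_HEADER_SECRET.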

Test Plan

  • cargo test --release --lib → 901 passed, 0 failed
  • cargo build --release --bin ztlp → success (8.0MB binary)
  • ztlp listen --help shows new flags
  • ztlp.net/launch_app/app.py renders valid compose YAML in both modes (admin-pubkey set/unset)
  • 3/3 http_injector unit tests including test_canonical_string_matches_ruby_contract and test_partial_request_is_rejected

Followups

  • Verify the gateway container can read secrets.env (chmod 600) on a live tenant
  • Future: POST /tenants/<slug>/admin_pubkey endpoint on Launch app to flip on passwordless auth via UI instead of editing instance.env

Summary by CodeRabbit

Release Notes

  • New Features

    • Added HTTP header injection support for admin authentication on bridged TCP connections.
    • New ztlp listen command-line options to enable header injection, configure authentication secrets, and map peer identities to admin email addresses.
  • Chores

    • Added HTTP parsing and HMAC cryptography dependencies.

priceflex added 3 commits May 20, 2026 06:19
Two bugs in the initial implementation broke byte-for-byte parity with
the Ruby Ztlp::HeaderVerifier and Elixir ZtlpGateway.HeaderSigner:

1. Non-ZTLP headers (Host, Content-Length, etc.) were being included
   in the HMAC canonical string. The verifiers only consider X-ZTLP-*
   headers, so every signature would fail verification on Rails.

2. The canonical string had a trailing '\n'. Ruby's .join("\n")
   does not add a trailing newline — it only separates. This silent
   one-byte drift would also fail HMAC verification.

Adds two BDD-style tests that lock the contract down:
- test_canonical_string_matches_ruby_contract: verifies the exact
  three-header canonical format against a hand-computed HMAC vector,
  with non-ZTLP headers present in the input that MUST be excluded.
- test_partial_request_is_rejected: documents the safe-failure
  contract — partial HTTP requests (no \r\n\r\n terminator) are
  rejected rather than HMAC-signed, preventing signed-prefix attacks.
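
Both bugs can be reproduced in a few lines of Python. This is an illustrative sketch, not the project's test: the secret, the header values and the `name:value` line format are placeholders chosen for the example.

```python
import hashlib
import hmac


def digest(secret: bytes, canonical: str) -> str:
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"example-secret"  # placeholder, not a real key
lines = sorted([
    "x-ztlp-authenticated:1",
    "x-ztlp-admin-email:admin@example.com",
    "x-ztlp-timestamp:2026-05-20T06:19:00Z",
])

# What a verifier following the contract computes.
expected = digest(secret, "\n".join(lines))

# Bug 1: a non-ZTLP header leaks into the canonical string.
with_host = digest(secret, "\n".join(sorted(lines + ["host:bootstrap.example"])))

# Bug 2: one extra trailing newline.
with_trailing_newline = digest(secret, "\n".join(lines) + "\n")

print(hmac.compare_digest(expected, with_host))              # False
print(hmac.compare_digest(expected, with_trailing_newline))  # False
```

Either deviation changes the signed bytes, so the signature no longer verifies even though the headers look correct on the wire.
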
End-to-end passwordless auth for the per-tenant Bootstrap admin UI when
accessed through 'ztlp connect bootstrap.<zone>.ztlp -L 18080:127.0.0.1:3000'.

The Rust gateway (ztlp listen) now intercepts the first decrypted HTTP
request on each forwarded TCP connection, strips any inbound X-ZTLP-*
header spoofing attempts, and injects authoritative trust headers
(X-ZTLP-Authenticated / Admin-Email / Timestamp / Signature) signed
with HMAC-SHA256. Rails Ztlp::HeaderVerifier validates the signature
with the same shared secret and treats the request as pre-authenticated
for the listed admin pubkey.

proto/src/tunnel.rs:
  * New HttpInjectionConfig + HttpInjectionState. lookup_email() resolves
    the peer's Noise static pubkey hex to a known admin email; only known
    admins get header rewrites — unknown peers pass through unmodified so
    Rails can fall back to its normal password login flow.
  * iso8601_utc_now() formats SystemTime as a stable 'YYYY-MM-DDTHH:MM:SSZ'
    timestamp (no chrono dep — manual SystemTime math).
  * New entry point run_bridge_demuxed_with_http_injection(). The injector
    runs on the FIRST ordered FRAME_DATA payload only; once 'done' the
    bridge falls back to verbatim forwarding. If http_injector returns Err
    (e.g. partial request, signed-prefix attack vector), the connection is
    dropped rather than forwarded unmodified.
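
For readers curious what "manual SystemTime math" involves, here is a hedged Python sketch of formatting epoch seconds as 'YYYY-MM-DDTHH:MM:SSZ' without a date library. It uses the widely known civil-from-days calculation; whether the Rust iso8601_utc_now() uses the same algorithm is an assumption, and the function name below is illustrative.

```python
def iso8601_utc(epoch_seconds: int) -> str:
    # Split into whole days since 1970-01-01 and the time of day.
    days, rem = divmod(epoch_seconds, 86_400)
    hour, rem = divmod(rem, 3_600)
    minute, second = divmod(rem, 60)

    # Civil-from-days: shift the epoch to 0000-03-01 so leap days fall
    # at the end of the (shifted) year, then work in 400-year eras.
    z = days + 719_468
    era = z // 146_097
    doe = z - era * 146_097  # day of era, 0..146096
    yoe = (doe - doe // 1_460 + doe // 36_524 - doe // 146_096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of shifted year
    mp = (5 * doy + 2) // 153  # shifted month, 0 = March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)

    return (
        f"{year:04d}-{month:02d}-{day:02d}"
        f"T{hour:02d}:{minute:02d}:{second:02d}Z"
    )


print(iso8601_utc(0))              # 1970-01-01T00:00:00Z
print(iso8601_utc(1_000_000_000))  # 2001-09-09T01:46:40Z
```
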

proto/src/bin/ztlp-cli.rs:
  * Three new clap flags on 'ztlp listen':
      --http-inject-headers
      --header-hmac-secret <SECRET>
      --admin-pubkey-email <HEX=EMAIL>  (repeatable)
  * Pairs validate (exactly one '='), hex is lowercased to match
    HandshakeContext::remote_static_hex() output, secret must be non-empty
    when the feature flag is on.
  * Plumbed through cmd_listen -> cmd_listen_multi_session ->
    handle_new_session -> run_session_bridge as Arc<HttpInjectionConfig>.
  * Both run_bridge_demuxed() call sites inside run_session_bridge (initial
    and reconnect-after-RESET) now call the new _with_http_injection variant
    with the peer's pubkey hex extracted from the Noise handshake context.

ztlp.net/launch_app/app.py:
  * Gateway compose service now reads secrets.env + instance.env via
    env_file so it sees the same ZTLP_GATEWAY_HEADER_SECRET Rails reads.
  * The ztlp listen command line passes --http-inject-headers and
    --header-hmac-secret "$ZTLP_GATEWAY_HEADER_SECRET" unconditionally.
  * --admin-pubkey-email is rendered conditionally via shell '$([ -n ... ]
    && echo ...)' — only attached when ZTLP_ADMIN_PUBKEY_HEX is non-empty.
    Initial provision writes it as empty in instance.env so passwordless
    auth stays OFF until an operator binds an enrolled device's pubkey.
    This means today's behaviour (password form) is preserved by default
    and the rollout is a one-line edit per tenant.
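
The conditional rendering can be pictured as follows. This is a Python sketch of the decision only; the real compose file does it with a shell substitution, and both the function name and the ZTLP_ADMIN_EMAIL key are hypothetical, since the commit message does not say where the email half of the pair comes from.

```python
def gateway_listen_args(env: dict[str, str]) -> list[str]:
    # Always passed: feature flag plus the shared HMAC secret.
    args = [
        "ztlp", "listen",
        "--http-inject-headers",
        "--header-hmac-secret", env["ZTLP_GATEWAY_HEADER_SECRET"],
    ]
    # Only attached when the operator has bound a pubkey in instance.env.
    pubkey = env.get("ZTLP_ADMIN_PUBKEY_HEX", "").strip()
    if pubkey:
        # ZTLP_ADMIN_EMAIL is a hypothetical key used for illustration.
        args += ["--admin-pubkey-email", f"{pubkey}={env['ZTLP_ADMIN_EMAIL']}"]
    return args
```

With an empty ZTLP_ADMIN_PUBKEY_HEX the admin map stays empty, every peer is "unknown", and Rails shows its password form as before.
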

Security notes:
  * HMAC key never enters the source tree — orchestration-generated only.
  * Canonical string format is byte-compatible with the Ruby/Elixir
    HeaderVerifier reference implementations (lowercased name, sorted,
    joined by '\n', no trailing newline; locked by
    test_canonical_string_matches_ruby_contract).
  * Partial HTTP requests are rejected with a connection drop, never
    HMAC-signed — see test_partial_request_is_rejected.
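
The strip-then-inject step and the fail-closed rule for partial requests can be sketched in Python as below. This is a simplified illustration, not the Rust http_injector: it splits on CRLF by hand instead of using a real HTTP parser, and the `trusted` parameter is an assumed shape for the already-computed trust headers.

```python
TERMINATOR = b"\r\n\r\n"


def inject_headers(first_payload: bytes, trusted: list[tuple[str, str]]) -> bytes:
    head, sep, body = first_payload.partition(TERMINATOR)
    if not sep:
        # No end-of-headers marker: refuse to sign a prefix of a request.
        raise ValueError("Partial HTTP request received")

    request_line, *header_lines = head.split(b"\r\n")
    # Drop every client-supplied X-ZTLP-* header (case-insensitive).
    kept = [h for h in header_lines if not h.lower().startswith(b"x-ztlp-")]
    # Append the gateway's authoritative headers.
    injected = [f"{name}: {value}".encode("ascii") for name, value in trusted]

    return b"\r\n".join([request_line, *kept, *injected]) + TERMINATOR + body
```

On a ValueError the caller would drop the connection rather than forward the bytes unmodified, which matches the behaviour described for tunnel.rs.
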
coderabbitai Bot commented May 20, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca044a83-11af-4b4b-9fba-16895d1534dc

📥 Commits

Reviewing files that changed from the base of the PR and between ab18ba1 and 8352661.

📒 Files selected for processing (3)
  • proto/src/bin/ztlp-cli.rs
  • proto/tests/updater_test.rs
  • ztlp.net/launch_app/app.py
📝 Walkthrough

This PR adds passwordless admin authentication by injecting signed HTTP identity headers (X-ZTLP-*) into forwarded TCP connections. New httparse and hmac dependencies support the injector logic. The tunnel bridge applies one-time injection to the first decrypted request using Noise peer pubkey lookups and HMAC-SHA256 signatures. New ztlp listen CLI arguments enable the feature and configure the shared secret and peer-to-email mappings. Deployment wires these arguments into gateway containers via Docker environment variables.

Changes

HTTP header injection for passwordless admin authentication

| Layer / File(s) | Summary |
|---|---|
| HTTP header injector: contract, implementation, and tests (proto/Cargo.toml, proto/src/http_injector.rs, proto/src/lib.rs) | Adds httparse and hmac dependencies. Implements inject_headers function to parse HTTP requests, strip forged x-ztlp-* headers, append identity headers (X-ZTLP-Authenticated, X-ZTLP-Admin-Email, X-ZTLP-Timestamp), compute HMAC-SHA256 signatures over a Ruby-compatible canonical string, and return reconstructed request bytes. Tests validate header injection, forged-header stripping, signature correctness, Ruby contract compatibility, and partial request rejection. |
| Tunnel bridge injection plumbing and request rewriting (proto/src/tunnel.rs) | Introduces HttpInjectionConfig struct for peer pubkey→email mapping and HMAC secret. Adds per-bridge HttpInjectionState and RFC-3339 timestamp formatter. Implements run_bridge_demuxed_with_http_injection to look up peer email and wire injection state into the bridge. Extends run_bridge_inner signature to carry optional injection state and adds injection_failed flag. Implements the injection hook in handle_incoming_packet to rewrite the first decrypted HTTP request and terminate the connection on injection failure. Updates all existing bridge call sites and tests to pass the new parameter. |
| CLI argument parsing and multi-session pipeline wiring (proto/src/bin/ztlp-cli.rs) | Adds --http-inject-headers, --header-hmac-secret, and repeatable --admin-pubkey-email HEX=EMAIL arguments to ztlp listen. Extends cmd_listen to construct Arc<tunnel::HttpInjectionConfig> with validation. Threads the config through cmd_listen_multi_session and handle_new_session to per-session run_session_bridge. Updates bridge calls to use tunnel::run_bridge_demuxed_with_http_injection on initial and reconnect paths. Updates main wiring to parse and forward the arguments. Includes relay config line formatting fix. |
| Deployment environment and gateway service configuration (ztlp.net/launch_app/app.py) | Updates _provision_zone_dockers to write ZTLP_ADMIN_PUBKEY_HEX= to generated instance.env. Updates gateway service in docker-compose.yml to load secrets.env and instance.env, enables --http-inject-headers, passes HMAC secret via --header-hmac-secret, and conditionally appends --admin-pubkey-email when pubkey is non-empty. |

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 Headers signed with secrets so bold,
To route the admins, as stories are told,
No passwords needed when keys align,
One bridge, one injection, one auth design! 🔐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(listen): HTTP header injection for passwordless gateway auth' directly and clearly summarizes the main change: adding HTTP header injection functionality to the listen command for passwordless authentication. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
proto/src/bin/ztlp-cli.rs (1)

2952-2960: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

--http-inject-headers is ignored in single-session listen mode.

The new config is only passed into cmd_listen_multi_session. If an operator runs ztlp listen --forward ... --http-inject-headers --max-sessions 1, this branch is skipped and the single-session path below still uses the plain bridge helpers, so the auth headers are never injected. Please either wire the config through the single-session bridge as well or reject that combination up front instead of silently disabling the feature.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@proto/src/bin/ztlp-cli.rs` around lines 2952 - 2960, The single-session
listen path currently skips the http injection by only passing http_injection
into cmd_listen_multi_session; update the single-session branch so the same http
injection config is applied: either forward the http_injection value into the
single-session bridge helper call (modify the called function signature for the
plain bridge helper used in the single-session path and pass
http_injection.clone()) or, if simpler, detect the unsupported combination early
and return an error when --http-inject-headers is set together with
--max-sessions 1; locate the single-session call site near
cmd_listen_multi_session and adjust the bridge helper invocation or add a guard
that checks http_injection and max_sessions to enforce the chosen behavior.
🧹 Nitpick comments (1)
proto/src/tunnel.rs (1)

1024-1061: 💤 Low value

HTTP injection only works on single-packet HTTP requests.

The injection hook processes only the first FRAME_DATA payload. If the first HTTP request spans multiple packets (e.g., large headers or a POST body arriving before headers complete), inject_headers will receive a partial request, fail with "Partial HTTP request received", and drop the connection.

This is acceptable given:

  1. The PR documents this as expected behavior (first-packet injection only)
  2. Typical admin login GET requests fit in one ~1200-byte packet
  3. Failing closed (dropping connection) is the secure default

Consider documenting this constraint in the run_bridge_demuxed_with_http_injection docstring for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@proto/src/tunnel.rs` around lines 1024 - 1061, The HTTP injection currently
only examines the first FRAME_DATA payload and will fail/close the connection if
the initial packet contains a partial HTTP request; add documentation to the
run_bridge_demuxed_with_http_injection function’s docstring explaining this
limitation so future maintainers know injection is first-packet-only, that
inject_headers may receive partial requests (producing "Partial HTTP request
received"), and that large headers or multi-packet requests will not be
rewritten and will cause the connection to be dropped; mention the relevant
symbols FRAME_DATA, http_injection, and inject_headers and note that this
behavior is intentional and documented rather than changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@proto/src/bin/ztlp-cli.rs`:
- Around line 2819-2838: The admin_pubkey_email parsing loop accepts any
non-empty LHS; instead validate that the left-hand side is valid hex and a
32-byte X25519 public key before inserting into the map. In the loop that
handles admin_pubkey_email (the code that currently extracts `hex` and `email`
and calls `map.insert(hex.to_lowercase(), email.to_string())`), attempt to
hex-decode `hex` (e.g. via hex::decode/from_hex) and check the decoded length
equals 32 bytes; if decoding fails or length != 32, return an Err with a clear
message like "invalid --admin-pubkey-email '<entry>' (expected 64-hex chars for
X25519 pubkey)". Only on successful validation insert the lowercased hex string
and email into the map so mappings will match `remote_static_hex()` later.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76b03e29-f77e-4f45-85d6-37b74187d066

📥 Commits

Reviewing files that changed from the base of the PR and between a789b66 and ab18ba1.

📒 Files selected for processing (6)
  • proto/Cargo.toml
  • proto/src/bin/ztlp-cli.rs
  • proto/src/http_injector.rs
  • proto/src/lib.rs
  • proto/src/tunnel.rs
  • ztlp.net/launch_app/app.py

Comment thread proto/src/bin/ztlp-cli.rs Outdated
priceflex added 3 commits May 20, 2026 07:02
The /start handler was stripping <>{}[]; from user values at input
time. That silently mutated legitimate inputs (e.g. "Alice <Admin>"
became "Alice Admin") AND masked the test that verifies the output
boundary html.escape() is wired up, because by the time esc() ran
there were no angle brackets left to escape.

XSS defense lives at the output boundary — every render site already
funnels values through esc() (= html.escape(quote=True)). Keep only
the whitespace-collapsing clean() at input.

Fixes pre-existing test_start_escapes_user_values failure that was
red on main.
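
A short Python sketch of the split described in this commit. esc() as html.escape(quote=True) comes from the message above; the body of clean() is an assumed minimal whitespace collapse, not the Launch app's actual implementation.

```python
import html


def clean(value: str) -> str:
    # Input boundary: collapse runs of whitespace, never strip characters.
    return " ".join(value.split())


def esc(value: str) -> str:
    # Output boundary: escape at every render site.
    return html.escape(value, quote=True)


name = clean("  Alice   <Admin> ")
print(name)       # Alice <Admin>
print(esc(name))  # Alice &lt;Admin&gt;
```

The stored value keeps its angle brackets, so the escaping test exercises esc() on real input instead of passing vacuously.
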
Previously the parsing loop accepted any non-empty LHS and inserted it
into the admin map. A typo, wrong-length, or non-hex value would
silently never match remote_static_hex() at runtime, looking like
'the gateway just won't inject headers' with no error to point at.

Now we hex::decode the LHS and assert decoded.len() == 32 up front,
returning a clear error pointing at the offending entry. Also trim
trailing whitespace off the email side (the hex side already trimmed).

Addresses CodeRabbit review on PR #5.
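
The up-front validation can be sketched in Python as follows. It mirrors the rules stated above (exactly one '=', hex decoding to 32 bytes, lowercased key, trimmed email); the function name is illustrative and rejecting an empty email is an added assumption.

```python
def parse_admin_pubkey_email(entry: str) -> tuple[str, str]:
    error = ValueError(
        f"invalid --admin-pubkey-email '{entry}' "
        "(expected 64-hex chars for X25519 pubkey)"
    )
    if entry.count("=") != 1:
        raise error
    hex_part, email = (part.strip() for part in entry.split("="))

    # bytes.fromhex tolerates embedded whitespace, so pin the length first.
    if len(hex_part) != 64:
        raise error
    try:
        decoded = bytes.fromhex(hex_part)
    except ValueError:
        raise error from None
    # Assumption: an empty email is rejected as well.
    if len(decoded) != 32 or not email:
        raise error

    # Lowercase so the key matches remote_static_hex() output.
    return hex_part.lower(), email
```

Failing at startup turns a silent "headers are never injected" symptom into an error that names the offending entry.
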
…SION

update_config_default_values had a hard-coded SemVer::new(0, 26, 0)
expectation against a value that's parsed from CARGO_PKG_VERSION at
build time. The 0.26.0 -> 0.26.2 version bump silently rotted this
test — it's only caught by the Performance Gate job (which runs full
cargo test --release including integration tests), not by the normal
Rust (proto) CI job (which only runs --lib).

Derive the expected value from env!("CARGO_PKG_VERSION") so the test
stays in sync with proto/Cargo.toml automatically, and add a comment
explaining why so the next version bump doesn't re-break it.
@priceflex priceflex merged commit 35c5d48 into main May 20, 2026
14 of 16 checks passed
@priceflex priceflex deleted the feature/ztlp-listen-http-header-injection branch May 20, 2026 07:45
priceflex added a commit that referenced this pull request May 20, 2026
…auth (#6)

Followup #2 from PR #5: lets a tenant admin bind their Noise static
pubkey to enable passwordless ZTLP gateway auth without SSHing into
the host to hand-edit instance.env.

  POST /api/admin-pubkey
    body (x-www-form-urlencoded):
      token=<claim_token>&pubkey_hex=<64 lowercase hex>
    returns 200 application/json: {status:ok, slug, applied:true}
    returns 400 / 401 / 404 / 500 with {error,detail?}

Auth model: holder of the original claim_token. The token was shown
once to the admin at /start time and is stored only as an HMAC digest
in the DB, so reusing it as the auth credential here does not widen
the trust boundary — anyone with the raw token already has full
control of this tenant.

Validation mirrors the Rust gateway's --admin-pubkey-email check:
exactly 64 hex chars decoding to a 32-byte X25519 public key.
Uppercase hex is normalized to lowercase; off-by-one lengths, non-hex
chars, and missing/invalid tokens all fail fast with clear errors.

Side effects on success:
  1. Rewrites the ZTLP_ADMIN_PUBKEY_HEX= line in instance.env in
     place (preserves all other operator-set keys).
  2. Runs 'docker compose up -d --force-recreate gateway' in the
     instance dir. --force-recreate is required — a plain 'up -d'
     is a no-op when the image+config hash is unchanged and changing
     an env_file value alone does not bust that hash for already-
     running containers.
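
Side effect 1 can be sketched as a pure function over the file's text. This is a hedged Python illustration, not the Launch app's code; the function name is hypothetical, and appending the key when it is missing is an assumption.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    prefix = f"{key}="
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            # Rewrite only the matching line; every other key is preserved.
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        # Assumption: append the key if the file never had it.
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


before = "TENANT=demo\nZTLP_ADMIN_PUBKEY_HEX=\nLOG_LEVEL=info\n"
print(set_env_value(before, "ZTLP_ADMIN_PUBKEY_HEX", "ab" * 32))
```

After writing the result back, the handler would run the 'docker compose up -d --force-recreate gateway' command described in step 2 so the new value reaches the container.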

Followup #1: also document inline why the chmod 600 on secrets.env
is compatible with the gateway's env_file mount. docker-compose v2
reads env_file in the CLI process (same user that wrote it), not in
the container, so the gateway never needs read perms on the file —
only the Launch process user invoking 'docker compose up' does.

Refactors: extract _slug_for_row and _instance_dir_for_slug helpers
so the new handler and _provision_zone_dockers compute paths the
exact same way. The test fixture's _fake_run now records calls into
self._subprocess_calls so we can assert on the docker recreate.

Tests: 7 new cases (45 total, was 38) covering missing/invalid
tokens, bad-hex shapes, uppercase normalization, env+recreate
side effects, rebind overwriting prior value, and 404 when the
admin hits the endpoint before clicking the claim link.
priceflex added a commit that referenced this pull request May 20, 2026
Ships PRs #5 + #6:
- HTTP header injection on ztlp listen for passwordless gateway auth
- POST /api/admin-pubkey endpoint on Launch app to bind admin pubkeys
priceflex added a commit that referenced this pull request May 20, 2026
Detailed description:
- Updated the handoff document to capture the state of the passwordless authentication project.
- Documented that PR #5 and PR #6 were completed and merged.
- Logged the outstanding block: the 'Nebula dumb-pipe' architecture is routing around the L7 HTTP Injection layer (), breaking auto-login.
- Established the immediate next steps: do not guess code fixes; instead, turn on aggressive trace logging / tcpdump on the AWS test server to measure exactly how TCP packets behave across the tunnel.
- Documented the uncommitted hot-patch  variable escape remaining on the AWS server.
priceflex added a commit that referenced this pull request May 23, 2026
… with regression pins (#14)

What
- Bumps `gateway/mix.exs` and `ns/mix.exs` from 0.24.0 → 0.29.4.
- Bumps `proto/Cargo.toml` from 0.29.3 → 0.29.4.
- Adds the same "version reporting (regression pin)" describe block to
  `gateway/test/ztlp_gateway/release_test.exs` and
  `ns/test/ztlp_ns/release_test.exs` that PR #13 added for relay
  (semver shape + runtime-vs-declared drift + ≥0.29.4 floor guard).
- Adds `proto/tests/version_pin_test.rs` with two equivalent Rust-side pins
  (parseable semver + ≥0.29.4 floor guard) using the in-crate
  `ztlp_proto::updater::SemVer` to avoid taking a new external dependency.

Why
- PR #13 fixed the relay's version-string drift but explicitly left the rest
  of the tree as follow-up scope (handoff Known Problems #6: "gateway/mix.exs
  and ns/mix.exs at 0.24.0; proto/Cargo.toml at 0.29.3"). Gateway and NS had
  been pinned at 0.24.0 for five minor versions, so
  `Application.spec(:ztlp_gateway, :vsn)` and `Application.spec(:ztlp_ns, :vsn)`
  were lying about which tag was actually deployed — the exact bug class
  PR #13 fixed for relay. Same lie, three more components.
- Without regression pins on these other components, the same drift can
  silently recur on the next tag cut.

Details
- Floor guard uses `Version.compare(declared, "0.29.4") in [:gt, :eq]` (Elixir)
  / `actual.cmp(&floor)` (Rust) rather than asserting a literal version string.
  This is deliberate: literal-string assertions become test maintenance burden
  on every routine bump, whereas a floor guard only fails on accidental
  down-bumps below the v0.29.4 strict-routing tag.
- Runtime-vs-declared drift test catches both directions of the bug PR #13
  found: (a) mix.exs bumped but .app cache stale, and (b) tag cut without
  bumping mix.exs.
- Rust-side test re-uses the in-crate `updater::SemVer` parser rather than
  pulling in the external `semver` crate. This exercises the same code path
  the self-updater uses, so if that parser ever regresses both will catch it.
- mix.exs comment blocks explain the bump inline so future readers see the
  on-call story at the point of change, not just in the test file or handoff.
- No source-code logic changed; this is a version-string + test-only PR.
  Public API, wire format, and runtime behavior are untouched.
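
The floor-guard idea, sketched in Python for illustration only (the PR's tests are in Elixir and Rust). The helper names are hypothetical and the parser handles plain MAJOR.MINOR.PATCH strings, with no pre-release or build metadata.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    # Assumption: exactly three dot-separated integers.
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


FLOOR = parse_semver("0.29.4")


def meets_floor(declared: str) -> bool:
    # Tuple comparison is numeric per component, so 0.29.10 > 0.29.4.
    return parse_semver(declared) >= FLOOR


print(meets_floor("0.24.0"))   # False
print(meets_floor("0.29.4"))   # True
print(meets_floor("0.29.10"))  # True
```

A routine bump to a later version keeps the guard green, while an accidental down-bump below the floor fails it.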

Tests (TDD discipline followed)
- RED step verified per-component before bump:
  * gateway: `mix test test/ztlp_gateway/release_test.exs --seed 0` →
    "mix.exs version 0.24.0 is older than the v0.29.4 strict-routing tag"
    (1/15 failures, all in the new floor-guard test, exactly as designed).
  * ns: `mix test test/ztlp_ns/release_test.exs --seed 0` →
    "mix.exs version 0.24.0 is older than the v0.29.4 strict-routing tag"
    (1/15 failures, identical message).
  * proto: `cargo test --test version_pin_test` →
    "proto/Cargo.toml version 0.29.3 is older than the v0.29.4 strict-routing
    tag" (1/2 failures, parseable-semver test still green so we know the
    fixture is sane).
- GREEN step after bumps:
  * gateway: 15/15 in release_test.exs; 835/835 in full `mix test` (seed 1).
    Note: seed 55290 surfaces two pre-existing test-ordering flakes
    (TLS port reuse in tls_phase2_test.exs and a GenServer teardown race in
    crl_server_test.exs) — both also reproduce on plain `main` and are
    unrelated to this PR. Confirmed by stash + checkout `main`-only files +
    rerun on the same seed.
  * ns: 15/15 in release_test.exs; 729/729 in full `mix test`.
  * proto: 2/2 in version_pin_test; 858/858 in `cargo test --lib --release`
    (matches the pre-branch baseline).

Validation (non-test)
- `cargo check --release` clean (31 pre-existing dead-code warnings in
  `proto/src/bin/ztlp-cli.rs` unchanged — Known Problems #4 scope).
- Relay suite re-run unchanged: 597/597 (confirms no cross-component
  collateral damage).
- ztlp.net Python suite unchanged: 48/48 in `tests.test_launch_app`.
- Infra untouched. No relay restart, no gateway restart, no NS restart.
  Live binaries still on v0.29.3 (relay) / v0.24.0 (gateway, ns) — actual
  deployment of the bumped versions is a separate, Steve-gated step.

Follow-up
- After merge, decide tag strategy with v0.29.4 already in the past:
  either (a) cut v0.29.5 that includes this + PR #13's mix.exs bump
  (cleanest), or (b) accept "tag is source of truth; mix.exs is
  best-effort" for the trailing v0.29.4 and reset on v0.30.0.
- Task #3 (per-zone HMAC `Config.registration_secret/0`) is still the
  prod-readiness blocker and is the hard dependency for the Bootstrap
  workstream (handoff §"HARD DEPENDENCY uncovered while locking in #5").
- Known Problems #4 (`cargo fix` pass on 31 dead-code warnings in
  `proto/src/bin/ztlp-cli.rs`) still open; left out of scope here to keep
  this PR tight.

Refs
- PR #12 (v0.29.4 strict-routing): 829abdf
- PR #13 (relay mix.exs vsn pin): d22afbf
- Handoff: ~/hermes_session_handoff.md "Known Problems #6", "Open Question #1"
priceflex added a commit that referenced this pull request May 23, 2026
…i.rs (#17)

What:
- Closes Task #5 from hermes_session_handoff.md: "cargo fix pass for 31
  dead-code warnings in proto/src/bin/ztlp-cli.rs". Before this change
  `cargo build --release --bin ztlp` emitted 31 warnings on the bin;
  after, it emits 0.
- 20 warnings are mechanical fixes auto-applied by `cargo fix --bin "ztlp"
  -p ztlp-proto --release` (one unused `use TcpStream`, three unused `mut`
  qualifiers, sixteen `_param` prefix renames for intentionally-unused
  function parameters). One additional cargo-fix landed in
  `proto/src/tunnel.rs:1512` for a `let mut cc` that's only read.
- The remaining 11 warnings are an island of placeholder code for a
  multi-session listener path: `cmd_listen_multi_session` (literal `Ok(())`
  body), plus `complete_handshake_for_reject`, `handle_new_session`,
  `wait_for_reset_on_socket`, `run_session_bridge`, `wait_for_reset_buffered`,
  `ns_pubkey_lookup`, `UdpNsResolver` + `new`, `spawn_relay_registration`,
  and `HANDSHAKE_TIMEOUT` (which is only referenced by the stubs).
- These call each other (handle_new_session → ns_pubkey_lookup +
  UdpNsResolver + run_session_bridge → wait_for_reset_on_socket; the
  stub mirrors the architecture commented at line 2456) so they form a
  connected disabled feature, not orphaned dead code.

Why:
- `#[allow(dead_code)]` over deletion: the comment block above
  `cmd_listen_multi_session` says it's the intended next-gen listener
  ("1. Multiplexes sessions… 2. ConnectionTracker… 4. Enforces
  max_sessions with REJECT(CAPACITY_FULL)") and the body is a one-line
  stub returning `Ok(())`. Deleting all of it now would discard ~700 LOC
  of scaffolding for a planned feature; annotating preserves it cheaply
  and silences the noise. Each annotation carries a comment pointing at
  `cmd_listen_multi_session` so the next person knows the island is
  intentional and where to start when the feature is picked up.
- `_param` over deletion for function-signature unused params: parameters
  are part of the public/internal API surface and may be wired up by
  future implementation work. Prefix-with-underscore is the idiomatic
  Rust signal for "intentionally unused".

Details:
- Files changed:
  - `proto/src/bin/ztlp-cli.rs` — 20 cargo-fix mechanical edits + 11
    `#[allow(dead_code)]` annotations with explanatory doc comments.
  - `proto/src/tunnel.rs:1512` — single `let mut cc → let cc` (the
    binding is never reassigned, only `.lock().await` is called once
    immutably).

Tests:
- `cd ~/ztlp/proto && cargo build --release --bin ztlp` → 0 warnings on
  the `ztlp` bin (previously 31).
- `cd ~/ztlp/proto && cargo test --lib --release` → 858 passed, 0 failed,
  12 ignored (unchanged baseline).
- `cd ~/ztlp/proto && cargo test --test version_pin_test --release` →
  2 passed (unchanged baseline).

Validation:
- No public API changed. Apart from those mechanical edits, function
  bodies are byte-for-byte identical (cargo-fix only renames bindings
  that weren't used and drops `mut` where it wasn't needed).
- The lib crate still has 7 warnings unchanged (1 for `HANDSHAKE_TIMEOUT`
  in `proto/src/agent/proxy.rs:66` and 6 from three `diag_*` counters in
  `proto/src/ffi.rs:1200-1862`). Those are outside the Task #5 scope
  (bin only); tracking as follow-up.

Follow-up:
- Task #5 from `hermes_session_handoff.md` is now closed. Future PR can
  decide whether to (a) wire up the multi-session listener stubs and
  drop the `#[allow(dead_code)]` annotations or (b) outright delete the
  island if it's no longer the planned direction.
- Lib-crate 7 warnings (diag_* + proxy HANDSHAKE_TIMEOUT) deferred —
  separate small PR if Steve wants them cleaned up.

Refs:
- `hermes_session_handoff.md` Task #5: "`cargo fix` pass for 31
  dead-code warnings in `proto/src/bin/ztlp-cli.rs`"
- v0.29.5 baseline (a2a2a03 on main) — no library code semantics
  changed; runtime behavior unchanged.
priceflex added a commit that referenced this pull request May 23, 2026
What:
- After PR #17 cleaned up the 31 warnings on the `ztlp` bin, the lib
  crate still emitted 7 warnings under `cargo build --release --lib`:
    * 1 × `const HANDSHAKE_TIMEOUT` orphaned in `proto/src/agent/proxy.rs`
    * 3 × `let mut diag_*` "variable assigned to, but never used"
    * 3 × `diag_* += 1` / `diag_* = 0` "value assigned is never read"
  All 6 diag warnings come from the same root cause: three iOS diagnostic
  counters are only consumed inside the `diag_log!` and `trace_info!`
  macros, which expand to no-ops when the `diag` feature is off (the
  default for non-iOS builds). The increments still happen at runtime,
  but rustc correctly sees that the values are never read.
- This PR annotates the affected sites with narrowly-scoped `#[allow]`
  attributes (with explanatory comments) instead of restructuring the
  macro system or `#[cfg]`-gating each counter individually. Zero
  behavioral change.

Why this approach (`#[allow]` vs alternatives):
1. **`#[cfg(feature = "diag")]` per counter** — would need to gate each
   declaration AND each increment site (6+ edits), and would diverge the
   iOS NE diag build from the non-diag build in a way that's easy to
   break later.
2. **Rewrite the no-op macros to `let _ = format_args!(...)`** — would
   silence the warnings AND force type-checking of macro args even when
   diag is off. Tried this first; it surfaced two pre-existing typos
   (`pps`, `local_addr` referenced inside the diag block but not in
   scope) that would prevent the iOS diag build from compiling. Those
   are real latent bugs but fixing them widens the PR scope — tracking
   separately as a follow-up.
3. **`#[allow]`** (chosen) — surgical, no runtime cost, no risk of
   masking new bugs outside the annotated sites, because the lints stay
   active everywhere except the annotated items (the constant, the three
   counter bindings, and `recv_loop`).

Details:
- `proto/src/agent/proxy.rs:66` — `HANDSHAKE_TIMEOUT` is orphaned
  scaffolding. Annotated `#[allow(dead_code)]` with a doc comment
  pointing at the matching `bin/ztlp-cli.rs::HANDSHAKE_TIMEOUT` so a
  future agent-side handshake wrapper has a consistent value to pull
  from.
- `proto/src/ffi.rs:1196..1217` — added a multi-line NOTE explaining the
  diag-counter / no-op-macro interaction, then annotated each of the
  three problem counters with
  `#[allow(unused_assignments, unused_variables)]` on their `let`.
- `proto/src/ffi.rs:1136` — added `#[allow(unused_assignments)]` on the
  `recv_loop` fn (the only place that hits the counters' `+= 1` /
  reset-to-zero patterns rustc flags as unused-assignment). A function-
  level `#[allow]` is necessary here because per-statement attributes
  aren't stable in Rust.

Tests:
- `cargo build --release --lib` → 0 warnings (was 7).
- `cargo build --release --bin ztlp` → still 0 warnings (PR #17 baseline
  preserved).
- `cargo test --lib --release` → 858 passed, 0 failed, 12 ignored
  (unchanged).
- `cargo test --test version_pin_test --release` → 2 passed (unchanged).

Validation:
- No code semantics change. The three iOS diag counters still increment
  and still get logged when the `diag` feature is enabled. The
  `HANDSHAKE_TIMEOUT` constant is byte-for-byte the same value.

Follow-up (tracked):
- The macro-rewrite path surfaced two latent compile errors in the diag
  branch (`pps` and `local_addr` references in `ffi.rs` recv_loop that
  aren't in scope). These would prevent a `cargo build --features diag`
  from succeeding on the lib. They are NOT triggered by this PR's
  changes — only exposed if someone tries to enable the diag feature.
  Filing as a follow-up cleanup PR. Adding a CI matrix entry for
  `--features diag` would catch this class of bug going forward.

Refs:
- `hermes_session_handoff.md` Task #5 follow-up — "Lib-crate 7 warnings
  (diag_* + proxy HANDSHAKE_TIMEOUT) deferred — separate small PR".
- v0.29.5 baseline. Runtime behavior unchanged.
priceflex added a commit that referenced this pull request May 23, 2026
…ents (BS-PR-6) (#28)

What
-----
Customer-facing end-to-end documentation and ready-to-copy reference
clients for any external system (initial target: Z2LS) that wants to
mint enrollment tokens via the ZTLP-secured Bootstrap API.

Three artifacts:

  1. bootstrap/docs/z2ls_enrollment_runbook.md (574 lines)
     The single-source guide for integrators: pre-reqs, provisioning
     an api_clients row, the per-zone HMAC signing contract, the full
     POST /api/v1/enrollment_tokens flow, redemption paths
     (CLI / macOS / iOS), an end-to-end smoke procedure, and a
     comprehensive troubleshooting section keyed on the actual 401
     reason codes the authenticator logs.

  2. bootstrap/script/z2ls_request_token.py (153 lines)
     Self-contained Python reference client. Stdlib-only (urllib +
     hmac + hashlib). Drop into any Z2LS codebase and adapt.

  3. bootstrap/script/z2ls_request_token.rb (128 lines)
     Self-contained Ruby reference client. Stdlib-only (net/http +
     openssl). Mirrors the Python reference behavior byte-for-byte.

Why
---
Steve's 2026-05-23 brief item #5/#6:
  * "Build an API endpoint where Z2LS can request an enrollment token
     for a new system."  (shipped in BS-PR-3 / #26)
  * "[Workflow] Z2LS sends the computer name to ZTLP Bootstrap [...]
     ZTLP Bootstrap validates that Z2LS is allowed to communicate with
     the API using ZTLP-secured communication."

BS-PR-3 shipped the server side. This PR ships the documentation +
reference signers so customers (and our own Z2LS instances) can hit
that endpoint correctly on the first try.  The most common integration
failure for HMAC-signed APIs is the canonical-message format: this
runbook spells out the 6-line layout and the two reference clients
demonstrate it concretely.

Details
-------
* Canonical 6-line signed message matches
  `Ztlp::ApiAuthenticator#canonical_message`:
      METHOD\nFULLPATH\nZONE\nCLIENT\nTIMESTAMP\nSHA256_HEX(body)
* Per-zone secret env-var slug rule
  (`zone.upcase.gsub(/[^A-Z0-9]+/, "_")`) matches
  `ApiAuthenticator.slugify_zone/1` exactly — verified by replicating
  the rule in both reference clients.
* 64-char pure-hex secret values are decoded to 32 raw bytes BEFORE
  signing (the Bootstrap authenticator does the same; mismatched
  decoding is the second most common failure mode and has a dedicated
  troubleshooting bullet).
* Runbook documents the 503 case (api_clients row exists, no Network
  row for the zone) and forward-refs BS-PR-4 for the auto-creation
  fix.
* Helper scripts use only stdlib so they can be dropped into any
  Z2LS host without dependency installation.
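
The slug rule and the hex-decode rule above can be sketched in Python
as follows. This is an illustrative sketch, not the code shipped in
`z2ls_request_token.py`: the names `slugify_zone` and `decode_secret`
are chosen for this example, and two behaviours are assumptions the
text above does not state (uppercase hex digits are accepted, and any
value that is not 64 hex characters is used as raw UTF-8 bytes).

```python
import re

# Exactly 64 hex characters; accepting uppercase digits is an assumption.
_HEX_64 = re.compile(r"[0-9a-fA-F]{64}")


def slugify_zone(zone: str) -> str:
    """Python rendering of the Ruby rule zone.upcase.gsub(/[^A-Z0-9]+/, "_")."""
    return re.sub(r"[^A-Z0-9]+", "_", zone.upper())


def decode_secret(value: str) -> bytes:
    """Return the HMAC key bytes for a configured secret value."""
    if _HEX_64.fullmatch(value):
        # 64-char pure-hex values are decoded to 32 raw bytes before signing.
        return bytes.fromhex(value)
    # Assumption: any other value is used as-is, encoded as UTF-8.
    return value.encode("utf-8")


if __name__ == "__main__":
    print(slugify_zone("acme.ztlp"))      # ACME_ZTLP
    print(len(decode_secret("ab" * 32)))  # 32
```

Signing with the 64 ASCII characters instead of the 32 decoded bytes
yields a different HMAC, which is why the decoding step has to match on
both sides.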

Tests
-----
No new automated tests — this is documentation + reference scripts.
However, both helpers were validated to produce signatures
byte-identical to the server-side `Ztlp::ApiAuthenticator`:

    python sig: c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201
    ruby sig:   c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201
    reference:  c019132d66f52d85173ca9d8811c093b754999afcc8646dd00016e1954ee5201

(64-char hex key, POST /api/v1/enrollment_tokens, zone=acme.ztlp,
client=z2ls.acme, ts=1700000000, body='{"computer_name":"alice-laptop"}')
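
For anyone reproducing this check, the inputs above assemble into the
canonical message roughly as shown here. A minimal sketch, assuming
HMAC-SHA256 with a hex-encoded digest (consistent with the 64-character
signatures shown) and no trailing newline after the body digest. The key
behind the published signatures is not given in this message, so
`KEY_HEX` is a placeholder and the printed signature will not match
them. The helper names are hypothetical, not taken from the reference
clients.

```python
import hashlib
import hmac

# Placeholder 64-char hex key; NOT the key used for the signatures above.
KEY_HEX = "00" * 32


def canonical_message(method: str, fullpath: str, zone: str,
                      client: str, timestamp: int, body: bytes) -> str:
    """Build METHOD\\nFULLPATH\\nZONE\\nCLIENT\\nTIMESTAMP\\nSHA256_HEX(body)."""
    body_digest = hashlib.sha256(body).hexdigest()
    return "\n".join([method, fullpath, zone, client, str(timestamp), body_digest])


def sign(key: bytes, message: str) -> str:
    """Hex-encoded HMAC-SHA256 of the canonical message (assumed encoding)."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    body = b'{"computer_name":"alice-laptop"}'
    msg = canonical_message("POST", "/api/v1/enrollment_tokens",
                            "acme.ztlp", "z2ls.acme", 1700000000, body)
    key = bytes.fromhex(KEY_HEX)  # decode the hex key to 32 raw bytes first
    print(msg)
    print(sign(key, msg))
```

The digest is taken over the exact request body bytes, so the body that
is hashed has to be the same byte string that is sent on the wire.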

Pre-existing test suite untouched:
* bootstrap full suite still 1017/1020 passing (same 3 pre-existing
  SshProvisionerTest port-mismatch failures predating this work).
* `api/v1/enrollment_tokens_controller_test.rb` — 13/13 pass.

Validation
----------
* `python3 -c 'compile(...)'` — script parses.
* `ruby -c script/z2ls_request_token.rb` — Syntax OK.
* Cross-language signature equivalence verified (see Tests).
* Runbook proofread for accuracy against the actual auth + token code
  paths (slugify rule, hex-decode rule, canonical-message order,
  fullpath, RFC1035 validator, audit log action names).

Follow-up
---------
* BS-PR-4 (next): auto-create the per-tenant Network row + matching
  api_clients row during ztlp.net onboarding so customers don't hit
  the 503 path the runbook currently documents as a workaround.
* If Steve wants a `bin/z2ls-request-token` packaged binary later we
  can roll the Python helper into a small CLI; for now keep it as a
  copy-paste reference.