
feat(bootstrap): gateway-auth enrollment-token API for Z2LS (Option C)#40

Merged
priceflex merged 1 commit into main from feat/z2ls-gateway-auth-enrollment-api
May 24, 2026

priceflex commented May 24, 2026

Why this exists

The existing HMAC v1 API at POST /api/v1/enrollment_tokens is unusable in the current Launch-provisioned topology because the per-tenant ZTLP gateway is started with --http-inject-headers, which strips ALL inbound X-ZTLP-* headers as a defense against admin-auth spoofing. The HMAC API headers (X-ZTLP-Zone, X-ZTLP-Client, X-ZTLP-Timestamp, X-ZTLP-Signature) share that prefix, so they get nuked before reaching Rails.

Verified end-to-end against five freshly provisioned tenants today (hermes-sandbox, hermes-lab, hermes-probe, hermes-trial, hermes-try5; all *.ztlp): every signed request returns 401 with audit reason missing_header, regardless of how the headers are sent (curl, Python urllib, all case variations).
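A minimal Ruby model of the collision, assuming the gateway drops every inbound header whose name starts with X-ZTLP- regardless of case, and assuming the v1 controller reports missing_header when any of its four required headers is absent. The function names and header values are hypothetical; only the header names come from this PR.

```ruby
# Hypothetical model of the X-ZTLP-* header collision. Header names are from
# the PR; the functions and values are illustrative, not gateway/Rails code.

# Headers the HMAC v1 API expects the client to send.
V1_REQUIRED_HEADERS = %w[X-ZTLP-Zone X-ZTLP-Client X-ZTLP-Timestamp X-ZTLP-Signature].freeze

# Prefix-based strip: drop every inbound header whose name starts with
# "x-ztlp-" (case-insensitive), as described for --http-inject-headers.
def strip_ztlp_prefix(headers)
  headers.reject { |name, _| name.downcase.start_with?("x-ztlp-") }
end

# v1-style presence check: nil when all required headers are present,
# otherwise the audit reason observed in the failing requests.
def v1_audit_failure(headers)
  present = headers.keys.map(&:downcase)
  missing = V1_REQUIRED_HEADERS.reject { |h| present.include?(h.downcase) }
  missing.empty? ? nil : "missing_header"
end

client_headers = {
  "Content-Type" => "application/json",
  "X-ZTLP-Zone" => "hermes-try5.ztlp",
  "x-ztlp-client" => "z2ls",          # lower-case variant is stripped too
  "X-ZTLP-Timestamp" => "1716540000",
  "X-ZTLP-Signature" => "deadbeef"
}

after_gateway = strip_ztlp_prefix(client_headers)
p after_gateway.keys                  # prints ["Content-Type"]
p v1_audit_failure(client_headers)    # prints nil
p v1_audit_failure(after_gateway)     # prints "missing_header"
```

Because the strip keys on the prefix rather than on specific names, no choice of header casing or client library can get the v1 headers through.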

Per Steve's direction this PR implements Option C: skip HMAC entirely and use the same gateway-auth path the Bootstrap UI already uses. Z2LS becomes "an admin-equivalent client over ZTLP" rather than a separately-credentialed system — the trust boundary is the ZTLP device identity, not a shared HMAC secret.

Why CSRF is safe to skip on this endpoint

In-file justification, summarized:

  • The endpoint is only reachable when require_gateway_auth! confirms trusted_gateway_admin succeeded.
  • trusted_gateway_admin verifies the request carries a valid gateway HMAC signature (Ztlp::HeaderVerifier.verify_request) over X-ZTLP-Authenticated, X-ZTLP-Admin-Email, X-ZTLP-Timestamp, X-ZTLP-Signature.
  • Those headers can only be produced by the ZTLP gateway, which only injects them when the connecting device is a ZTLP-enrolled admin device for this zone.
  • Therefore the ZTLP device-identity check is strictly stronger than CSRF would be — there is no browser-driven cross-origin surface to attack.
  • Cookie-session admins (UI logins) are explicitly not allowed on this endpoint — gateway-auth-only by design.
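To make the trust argument concrete, here is a hypothetical sketch of what a gateway-header check of this kind does. It is not Ztlp::HeaderVerifier: the canonical string layout, the HMAC-SHA256 hex digest, the literal "true" for X-ZTLP-Authenticated and the 300-second freshness window are all assumptions.

```ruby
require "openssl"

# Hypothetical stand-in for Ztlp::HeaderVerifier.verify_request. Canonical
# string, digest, "true" value and skew window are assumptions.
module GatewayHeaderSketch
  MAX_SKEW_SECONDS = 300

  module_function

  def sign(secret, authenticated, email, timestamp)
    payload = [authenticated, email, timestamp].join("\n")
    OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  end

  # Constant-time comparison so a mismatch does not leak timing information.
  def secure_equal?(a, b)
    return false unless a.bytesize == b.bytesize
    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end

  # Returns the admin email when all four headers verify, otherwise nil.
  def verify(headers, secret, now: Time.now.to_i)
    return nil if secret.to_s.empty?
    auth, email, ts, sig = headers.values_at(
      "X-ZTLP-Authenticated", "X-ZTLP-Admin-Email", "X-ZTLP-Timestamp", "X-ZTLP-Signature"
    )
    return nil if [auth, email, ts, sig].any? { |v| v.to_s.empty? }
    return nil unless auth == "true"
    return nil unless ts.match?(/\A\d+\z/) && (now - ts.to_i).abs <= MAX_SKEW_SECONDS
    secure_equal?(sign(secret, auth, email, ts), sig) ? email : nil
  end
end

now = 1_700_000_000
headers = {
  "X-ZTLP-Authenticated" => "true",
  "X-ZTLP-Admin-Email" => "admin@example.com",
  "X-ZTLP-Timestamp" => now.to_s
}
headers["X-ZTLP-Signature"] =
  GatewayHeaderSketch.sign("s3cret", "true", "admin@example.com", now.to_s)

p GatewayHeaderSketch.verify(headers, "s3cret", now: now)  # prints "admin@example.com"
p GatewayHeaderSketch.verify(headers, "other", now: now)   # prints nil
```

Without the shared gateway secret a caller cannot produce a signature that passes, so a cross-origin browser request has nothing it could forge.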

Files changed

| File | Change | Notes |
|---|---|---|
| bootstrap/config/routes.rb | +10 lines | New namespace :admin under namespace :api with POST enrollment_tokens |
| bootstrap/app/controllers/api/admin/enrollment_tokens_controller.rb | new, 263 lines | JSON controller; gateway-auth-only, skips CSRF, same response shape as HMAC v1 |
| bootstrap/test/controllers/api/admin/enrollment_tokens_controller_test.rb | new, 194 lines | Full coverage of auth, happy path, defaults, metadata, validation, audit log |
| bootstrap/script/z2ls_gateway_auth_token_request.rb | new, 116 lines | Z2LS-side reference Ruby client (no HMAC, no shared secret) |
| bootstrap/docs/z2ls_gateway_auth_runbook.md | new, 305 lines | Customer-facing runbook for the gateway-auth path |
| bootstrap/docs/api_v1_ztlp_secured.md | +14 lines | "Preferred path" banner pointing callers to the new runbook |
| bootstrap/docs/z2ls_enrollment_runbook.md | +15 lines | Top-of-file note explaining the HMAC path is broken in Launch topology |

Test results

  • Ruby syntax check (host): all 4 Ruby files parse OK
  • Controller load test (production image, Rails 7.1):
    Api::Admin::EnrollmentTokensController.action_methods => [:create]
    api_admin_enrollment_tokens_path => /api/admin/enrollment_tokens
  • Full Rails test suite NOT run on host: the mocha/minitest gems are in the :test group of the Gemfile but are not bundled into the production image available locally. CI is expected to run bin/rails test test/controllers/api/admin/enrollment_tokens_controller_test.rb with the full test-group bundle.

Manual validation followup

I'll retest end-to-end via the hermes-try5.ztlp sandbox tunnel after merge & redeploy:

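```shell
# Open the tunnel: forward local port 18084 to port 3000 behind the tenant gateway
ztlp connect bootstrap.hermes-try5.ztlp --ns-server 35.91.88.177:23096 \
  --service gw-hermes-try5 -L 18084:127.0.0.1:3000

# Then request a token through the forwarded port (no HMAC headers needed)
curl -X POST http://127.0.0.1:18084/api/admin/enrollment_tokens \
  -H 'Content-Type: application/json' \
  -d '{"computer_name":"smoke-001"}'
```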

Expected: 201 Created with a ztlp://enroll/?... URI in the response body. If the gateway-auth headers are reaching Rails (which they are — passwordless dashboard sign-in works), this should succeed without any further changes.
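The same smoke test can be scripted in Ruby. This is a hypothetical sketch, not the bundled script/z2ls_gateway_auth_token_request.rb; it assumes the tunnel above is listening on 127.0.0.1:18084 and, because the response field names are not spelled out here, it searches the JSON body for any string beginning with ztlp://enroll/.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical smoke-test client mirroring the curl call. Assumes a local
# ztlp connect forward on port 18084; response field names are not assumed.
ENDPOINT = URI("http://127.0.0.1:18084/api/admin/enrollment_tokens")

def build_request(uri, computer_name)
  req = Net::HTTP::Post.new(uri)
  req["Content-Type"] = "application/json"
  # No X-ZTLP-* headers: the gateway injects its own auth headers.
  req.body = JSON.generate("computer_name" => computer_name)
  req
end

# Walk parsed JSON and return the first string that looks like an enroll URI.
def find_enroll_uri(node)
  case node
  when String
    node.start_with?("ztlp://enroll/") ? node : nil
  when Hash
    find_enroll_uri(node.values)
  when Array
    node.each do |v|
      found = find_enroll_uri(v)
      return found if found
    end
    nil
  end
end

def request_token(computer_name, uri: ENDPOINT)
  res = Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 15) do |http|
    http.request(build_request(uri, computer_name))
  end
  raise "expected 201 Created, got #{res.code}: #{res.body}" unless res.code == "201"

  uri_str = find_enroll_uri(JSON.parse(res.body))
  raise "no ztlp://enroll/ URI in response" unless uri_str
  uri_str
end

# Only hit the network when explicitly asked to.
if $PROGRAM_NAME == __FILE__ && ARGV.first == "--live"
  puts request_token(ARGV[1] || "smoke-001")
end
```

Saved as, say, smoke.rb, it runs as `ruby smoke.rb --live smoke-001` once the tunnel is up; without `--live` it only defines the helpers.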

Things deliberately NOT touched

  • The existing HMAC v1 controller and routes are left intact as a historical/secondary path. New integrations should use /api/admin/; existing v1 callers continue to work (if any survive in a different topology).
  • The Rust gateway code in proto/ is unchanged. This is a Bootstrap-Rails-only PR. A future PR could narrow the gateway's X-ZTLP-* strip-list from prefix to exact-name allowlist so the HMAC path also works again, but Option C makes that optional rather than required.

Threat-model notes for reviewers

  • The require_gateway_auth! before_action returns 401 if trusted_gateway_admin returns nil. trusted_gateway_admin ALREADY rejects when ZTLP_TRUST_GATEWAY_AUTH is unset/false, when ZTLP_GATEWAY_HEADER_SECRET is empty, or when the HMAC verification fails. So the new endpoint inherits all those checks for free.
  • skip_forgery_protection is the only CSRF bypass. null_session would have been wrong here because we want to keep the existing session; we just don't want the CSRF check.
  • The endpoint deliberately rejects cookie-session admins (the trusted_gateway_admin call short-circuits before the chain reaches session[:admin_user_id]-based lookup). This keeps the trust model crisp: only the gateway can authenticate Z2LS-style callers on this surface.
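The gate's fail-closed behaviour can be modelled in plain Ruby. This is a hypothetical sketch: the environment variable names come from this PR, while the accepted truthy values, the 401 body and the injected verifier lambda are assumptions standing in for the real Rails code.

```ruby
# Hypothetical model of require_gateway_auth! / trusted_gateway_admin.
# Env var names are from the PR; everything else is illustrative.
TRUTHY = %w[1 true yes].freeze

# Returns an admin identity or nil. Every precondition fails closed.
def trusted_gateway_admin(env, headers, verifier)
  return nil unless TRUTHY.include?(env["ZTLP_TRUST_GATEWAY_AUTH"].to_s.downcase)
  secret = env["ZTLP_GATEWAY_HEADER_SECRET"].to_s
  return nil if secret.empty?
  verifier.call(headers, secret)
end

# Before-action equivalent. No cookie session is consulted, so a UI login
# alone can never satisfy this gate.
def require_gateway_auth(env, headers, verifier)
  admin = trusted_gateway_admin(env, headers, verifier)
  return [401, { "error" => "gateway authentication required" }] if admin.nil?
  [:ok, admin]
end

# Stand-in verifier: accepts only the signature "good" (illustration only).
fake_verifier = ->(hdrs, _secret) { hdrs["X-ZTLP-Signature"] == "good" ? "admin@example.com" : nil }
env = { "ZTLP_TRUST_GATEWAY_AUTH" => "true", "ZTLP_GATEWAY_HEADER_SECRET" => "s3cret" }
signed = { "X-ZTLP-Signature" => "good" }

p require_gateway_auth(env, signed, fake_verifier)        # prints [:ok, "admin@example.com"]
p require_gateway_auth({}, signed, fake_verifier).first   # prints 401
p require_gateway_auth(env.merge("ZTLP_GATEWAY_HEADER_SECRET" => ""), signed, fake_verifier).first  # prints 401
```

Keeping the verifier injectable is only a convenience for the sketch; the point is that each listed precondition short-circuits to 401 before any signature work happens.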

Related diagnostic notes (untracked, NOT in this PR)

  • /home/trs/projects/ztlp/docs/findings/2026-05-23-v1-api-header-collision.md — full diagnosis of the HMAC blocker that motivated Option C
  • /home/trs/projects/ztlp/hermes-sandbox-zone.md — sandbox zone facts for Hermes-side retesting

Both are session notes; intentionally not committed.

Summary by CodeRabbit

  • New Features

    • Added a new gateway-authenticated enrollment token API endpoint (POST /api/admin/enrollment_tokens) as the preferred integration path.
  • Documentation

    • Added comprehensive runbook for the gateway-auth enrollment flow, including request/response contracts, failure handling, and a reference client implementation.
    • Updated existing documentation to clarify the recommended integration path and gateway header authentication contract for new deployments.


The existing HMAC v1 API at POST /api/v1/enrollment_tokens is unusable
in the current Launch-provisioned topology because the per-tenant ZTLP
gateway is started with --http-inject-headers, which strips ALL inbound
X-ZTLP-* headers as a defense against admin-auth spoofing. The HMAC API
headers (X-ZTLP-Zone, X-ZTLP-Client, X-ZTLP-Timestamp, X-ZTLP-Signature)
share that prefix, so they get nuked before reaching Rails. Verified
end-to-end against five freshly provisioned tenants
(hermes-sandbox.ztlp, hermes-lab.ztlp, hermes-probe.ztlp,
hermes-trial.ztlp, hermes-try5.ztlp) — every signed request 401s with
audit reason 'missing_header'.

Diagnosis: docs/findings/2026-05-23-v1-api-header-collision.md (left
untracked; not part of this PR).

This change side-steps the collision entirely by adding a new endpoint
that reuses the same gateway-auth path the Bootstrap UI already uses.
Z2LS becomes 'an admin-equivalent client over ZTLP' rather than a
separately-credentialed system — the trust boundary is the ZTLP device
identity, not a shared HMAC secret.

Changes:

  * routes.rb: add namespace :admin under namespace :api with
    POST /api/admin/enrollment_tokens
  * app/controllers/api/admin/enrollment_tokens_controller.rb (new):
    JSON-only controller protected by trusted_gateway_admin, skipping
    forgery_protection (justified in-file: the device-identity check
    is strictly stronger than CSRF). Returns the same response shape
    as the HMAC v1 controller. Cookie-session admins are explicitly
    NOT allowed — gateway-auth-only by design.
  * test/controllers/api/admin/enrollment_tokens_controller_test.rb
    (new): full coverage — auth (no headers, cookie-only, corrupted
    sig), happy path, max_uses/expires_in defaults + overrides,
    metadata storage, audit log row, validation (missing /
    malformed / oversized computer_name).
  * script/z2ls_gateway_auth_token_request.rb (new): Z2LS reference
    client. Plain HTTP POST through a local tunnel forward port; no
    HMAC, no CSRF, no shared secrets.
  * docs/z2ls_gateway_auth_runbook.md (new): customer-facing runbook
    for the gateway-auth path.
  * docs/api_v1_ztlp_secured.md: 'preferred path' banner pointing
    callers to the new runbook; HMAC contract kept as historical.
  * docs/z2ls_enrollment_runbook.md: top-of-file note explaining the
    HMAC path is broken in Launch topology and pointing at the new
    runbook.

Manual validation followup:

  Hermes will retest end-to-end via the hermes-try5.ztlp sandbox
  tunnel after merge & redeploy of the bootstrap image. The expected
  smoke test (controller and route verified to load in the production image at PR time):

    curl -X POST http://127.0.0.1:18084/api/admin/enrollment_tokens \
      -H 'Content-Type: application/json' \
      -d '{"computer_name":"smoke-001"}'

  with the tunnel up from a ZTLP-enrolled admin device should return
  201 + a valid ztlp://enroll/?... URI.

Test results:

  Ruby syntax check (host): all 4 files parse OK
  Controller load test (production image, Rails 7.1):
    Api::Admin::EnrollmentTokensController.action_methods => [:create]
    api_admin_enrollment_tokens_path => /api/admin/enrollment_tokens
  Full Rails test suite was NOT run on the host (mocha gem missing
  from production image bundle; test gems aren't installed). CI is
  expected to run the new test/controllers/api/admin/* file with the
  full test-group bundle.

Things deliberately NOT touched:

  * The existing HMAC v1 controller and routes are left intact as a
    historical/secondary path.
  * The Rust gateway code in proto/ is unchanged — this is a
    Bootstrap-Rails-only PR. A future PR could narrow the gateway's
    X-ZTLP-* strip-list from prefix to exact-name allowlist so the
    HMAC path also works again, but Option C makes that optional
    rather than required.

Co-authored-by: Steve Price <steve@techrockstars.com>
coderabbitai Bot commented May 24, 2026

📝 Walkthrough


This PR adds a new gateway-authenticated API endpoint (POST /api/admin/enrollment_tokens) for issuing enrollment tokens to Z2LS hosts. It supersedes, without removing, the HMAC v1 signing path, which fails under Launch-provisioned gateways because of header stripping. It also implements full validation and audit logging, provides comprehensive test coverage, and includes runbooks and a reference client.

Changes

Gateway-authenticated enrollment token endpoint

| Layer | File(s) | Summary |
|---|---|---|
| Route and authorization gate | bootstrap/config/routes.rb, bootstrap/app/controllers/api/admin/enrollment_tokens_controller.rb | Adds POST /api/admin/enrollment_tokens route and require_gateway_auth! guard that returns JSON 401 unless the request passes ZTLP signature validation via trusted_gateway_admin. |
| Token generation and request handling | bootstrap/app/controllers/api/admin/enrollment_tokens_controller.rb | Validates and constrains computer_name (DNS-label regex, length limits), resolves target Network from explicit zone or admin zone with fallback, parses max_uses (default 1) and expires_in (shorthand mappings), generates token via TokenGenerator, records audit entry, returns 201 JSON with token URI, expiration, and lifetime message. Maps TokenGenerator::TokenError to 503 JSON. |
| Integration test suite | bootstrap/test/controllers/api/admin/enrollment_tokens_controller_test.rb | Tests auth rejection (401) for requests missing gateway headers or with corrupted ZTLP signatures. Validates happy paths: token creation, 201 response shape, defaults, parameter overrides, metadata persistence, and audit logging. Asserts 422 validation failures for missing/blank/invalid/overlong computer_name. Includes helpers for signed ZTLP headers and gateway environment wrapping. |
| Documentation and reference client | bootstrap/docs/api_v1_ztlp_secured.md, bootstrap/docs/z2ls_enrollment_runbook.md, bootstrap/docs/z2ls_gateway_auth_runbook.md, bootstrap/script/z2ls_gateway_auth_token_request.rb | Updates existing docs with warnings that HMAC v1 headers are stripped by Launch gateways, directing new integrations to gateway-auth. Adds complete runbook for gateway-auth flow covering architecture shift, setup prerequisites (Z2LS as enrolled admin, pubkey binding, tunnel connection), contract specification, curl test, Ruby reference client, and troubleshooting. Reference client script implements HTTP POST to endpoint with JSON payload, error handling, and formatted output. |
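The parameter handling summarized above can be sketched in plain Ruby. From the PR: computer_name must be a DNS label with a length cap, max_uses defaults to 1, and bad input yields 422. Assumed here, not taken from the controller: the exact regex, the 63-character cap, the expires_in shorthand table and the 24-hour default.

```ruby
# Hypothetical sketch of the create-action parameter handling. Regex, cap,
# shorthands and default lifetime are assumptions for illustration.
DNS_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i
MAX_NAME_LENGTH = 63
EXPIRES_IN_SHORTHANDS = { "1h" => 3_600, "24h" => 86_400, "7d" => 604_800 }.freeze
DEFAULT_EXPIRES_IN = 86_400

# Raised for bad input; a controller would render it as HTTP 422.
class ParamError < StandardError; end

def parse_positive_int(value, field)
  n = Integer(value)
  raise ParamError, "#{field} must be positive" unless n.positive?
  n
rescue ArgumentError, TypeError
  raise ParamError, "#{field} must be an integer"
end

# Accept a known shorthand or a positive number of seconds.
def parse_expires_in(value)
  return DEFAULT_EXPIRES_IN if value.nil?
  EXPIRES_IN_SHORTHANDS.fetch(value.to_s) { parse_positive_int(value, "expires_in") }
end

def parse_token_params(params)
  name = params["computer_name"].to_s.strip
  raise ParamError, "computer_name is required" if name.empty?
  raise ParamError, "computer_name is too long" if name.length > MAX_NAME_LENGTH
  raise ParamError, "computer_name must be a DNS label" unless DNS_LABEL.match?(name)

  {
    computer_name: name,
    max_uses: parse_positive_int(params.fetch("max_uses", 1), "max_uses"),
    expires_in: parse_expires_in(params["expires_in"])
  }
end

p parse_token_params("computer_name" => "smoke-001")
```

Checking length before the regex is redundant for correctness but lets the caller distinguish "too long" from "malformed", which matches the separate oversized and malformed cases the test suite covers.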

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A token endpoint hops in from the gateway's side,
No HMAC headers lost—the auth runs inside!
Z2LS devices enroll with a trusty admin call,
Tests and docs light the way; the gateway-auth wins all! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a new gateway-authenticated enrollment-token API endpoint for Z2LS integration, with the 'Option C' designation indicating the choice among implementation approaches. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot left a comment

🧹 Nitpick comments (1)
bootstrap/docs/z2ls_gateway_auth_runbook.md (1)

49-75: ⚡ Quick win

Add language identifier to the fenced code block.

The architecture diagram starting at line 49 uses a fenced code block without a language specifier. Adding a language identifier (e.g., text or ascii-art) improves rendering consistency across Markdown processors.

📝 Proposed fix
```diff
-```
+```text
 Z2LS host (ZTLP-enrolled admin device for zone "acme.ztlp")
       │
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bootstrap/docs/z2ls_gateway_auth_runbook.md` around lines 49 - 75, The fenced
code block in bootstrap/docs/z2ls_gateway_auth_runbook.md that begins with the
diagram line "Z2LS host (ZTLP-enrolled admin device for zone \"acme.ztlp\")"
lacks a language identifier; update the opening fence from ``` to ```text (or
```ascii-art) so Markdown renderers treat it as plain text and preserve
formatting for the diagram and lines like "ZTLp connect..." and "POST
/api/admin/enrollment_tokens".

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01cf8148-2735-4503-a55c-254743a0f8db

📥 Commits

Reviewing files that changed from the base of the PR and between c57a866 and d81567f.

📒 Files selected for processing (7)
  • bootstrap/app/controllers/api/admin/enrollment_tokens_controller.rb
  • bootstrap/config/routes.rb
  • bootstrap/docs/api_v1_ztlp_secured.md
  • bootstrap/docs/z2ls_enrollment_runbook.md
  • bootstrap/docs/z2ls_gateway_auth_runbook.md
  • bootstrap/script/z2ls_gateway_auth_token_request.rb
  • bootstrap/test/controllers/api/admin/enrollment_tokens_controller_test.rb

@priceflex priceflex merged commit a5993ee into main May 24, 2026
7 checks passed
@priceflex priceflex deleted the feat/z2ls-gateway-auth-enrollment-api branch May 24, 2026 10:11
priceflex added a commit that referenced this pull request May 24, 2026
…t floors (#41)

What:
  Coordinated version bump across all four component manifests to match
  the v0.30.3 git tag cut from a5993ee (PR #40 — Z2LS gateway-auth
  enrollment API). Floors in release_test.exs / version_pin_test.rs
  ratcheted from 0.29.4 → 0.30.3 so a future v0.30.4 cut without bumping
  these files will fail CI loudly.

Why:
  v0.30.0 through v0.30.2 produced Docker image tags and a git tag but
  did not have a coordinated four-component manifest bump. That's the
  exact PR #13/#14 drift class — runtime services on v0.30.2 containers
  report Application.spec(:ztlp_ns, :vsn) == '0.30.0' (or some other
  stale value depending on when they were last compiled). The
  release-version-pinning skill prescribes ratcheting the floor +
  bumping the manifests in one PR after the tag so the next tag cut
  exercises the floor.

Files:
  proto/Cargo.toml                            0.30.0 → 0.30.3
  ns/mix.exs                                  0.30.0 → 0.30.3
  relay/mix.exs                               0.30.0 → 0.30.3
  gateway/mix.exs                             0.30.0 → 0.30.3
  proto/tests/version_pin_test.rs             floor 0.29.4 → 0.30.3
  ns/test/ztlp_ns/release_test.exs            floor 0.29.4 → 0.30.3
  relay/test/ztlp_relay/release_test.exs      floor 0.29.4 → 0.30.3
  gateway/test/ztlp_gateway/release_test.exs  floor 0.29.4 → 0.30.3
  .gitignore                                  + .ssh/ (defense against
                                                 accidental key commits)

Tests:
  TDD: RED → bumped manifests → GREEN.

  RED (manifests still at 0.30.0, floors ratcheted to 0.30.3):
    ns:      mix.exs version 0.30.0 is older than the v0.30.3 Z2LS gateway-auth tag
    relay:   mix.exs version 0.30.0 is older than the v0.30.3 Z2LS gateway-auth tag
    gateway: mix.exs version 0.30.0 is older than the v0.30.3 Z2LS gateway-auth tag
    (proto deferred to CI — local cargo 1.75 doesn't grok Cargo.lock v4)

  GREEN (after bump, full release_test.exs per component):
    ns:      15 tests, 0 failures
    relay:   15 tests, 0 failures
    gateway: 15 tests, 0 failures

  The runtime-vs-declared drift test (Application.spec/2 == mix.exs)
  also passes in GREEN, confirming the OTP .app cache was recompiled
  correctly after the bump.

Validation:
  - Full relay test suite running in background to confirm no collateral
    damage from the bump.
  - CI on the PR will exercise proto/tests/version_pin_test.rs (the
    floor guard there is the inverse direction — fails if Cargo.toml
    drops below the floor).

Follow-up:
  - Cut v0.30.4 from THIS commit so the tag and four-component manifests
    agree (the v0.30.3 tag is the 'tag points at pre-bump commit' case
    per the release-version-pinning skill).
  - Rebuild + redeploy ztlp-node and ztlp-bootstrap images tagged
    v0.30.4 so the on-disk SaaS state catches up to the source.
  - Bootstrap currently has no version manifest (Rails app, not a
    packaged artifact). Adding a VERSION file + initializer + test is
    listed in docs/plans/2026-05-24-z2ls-via-gateway-admin-auth.md as
    a follow-up — kept out of this PR to keep the scope tight.
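The floor guard this commit ratchets can be approximated in Ruby for illustration. This is a hypothetical analogue, not release_test.exs or version_pin_test.rs: it assumes the manifest declares a literal `version: "X.Y.Z"` (mix.exs) or `version = "X.Y.Z"` (Cargo.toml) at the start of a line and takes the first match, and it leans on RubyGems' Gem::Version for ordering.

```ruby
require "rubygems" # Gem::Version; already loaded in a default Ruby

# Hypothetical analogue of the version-floor guard. Assumes a literal
# version line; a mix.exs using an @version attribute would not match.
MANIFEST_VERSION = /^\s*version\s*[:=]\s*"(\d+\.\d+\.\d+)"/

def declared_version(manifest_text)
  m = manifest_text.match(MANIFEST_VERSION)
  raise ArgumentError, "no version found" unless m
  Gem::Version.new(m[1])
end

# nil when the manifest meets the floor, otherwise a RED-style message.
def floor_failure(component, file, manifest_text, floor)
  declared = declared_version(manifest_text)
  return nil if declared >= Gem::Version.new(floor)
  "#{component}: #{file} version #{declared} is older than the v#{floor} floor"
end

mix_exs = <<~EXS
  def project do
    [
      app: :ztlp_ns,
      version: "0.30.0",
    ]
  end
EXS

puts floor_failure("ns", "mix.exs", mix_exs, "0.30.3")
# prints: ns: mix.exs version 0.30.0 is older than the v0.30.3 floor
```

Comparing with Gem::Version rather than plain strings matters once patch numbers reach two digits: "0.30.10" sorts before "0.30.3" as a string but after it as a version.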