fix(stage 7): federation isolation silent-pass bug, doc + broker code together (#69)
Merged: hanwencheng merged 2 commits into `main` on May 7, 2026.
… Null operator, §4.4.1 inline cleanup, §4.5 cross-machine guidance)
hanwencheng added a commit that referenced this pull request on May 7, 2026.
`PHASE-0-CHECKPOINT.md` covers Phase 0 in isolation against localhost. This guide is the production equivalent: the full Stage 7 (Phases 0 + A.1 + A.2 + B + C-structural + D-rest + E) running on a real EC2 broker host with the AWS account from `cloud-setup.md`. Its sections walk an operator through:

- Two-machine layout (operator workstation vs. broker host), with inline `=== ON … ===` banners on every command block.
- Prerequisites checklist (`cloud-setup.md` §0–4 done, broker host bootstrapped, two `cast`-generated test wallets).
- `/healthz` + `/readyz` + OIDC discovery + JWKS + IAM-side OIDC provider cross-checks (including the invariant that the issuer must match byte for byte).
- SIWE wallet auth round-trip for both wallets, signing with `cast wallet sign` (without `--no-hash`).
- The manual path `/v1/mint-oidc-jwt` → `AssumeRoleWithWebIdentity`, decoding the `https://aws.amazon.com/tags` claim.
- Cloud-enforced isolation proof (the climax): wallet A reads its own prefix; wallet B's prefix returns `AccessDenied` from S3 itself, not from app code. Includes the diagnostic runbook for both failure modes: own prefix denied → JWT missing the tag claim; other prefix succeeds → `cloud-setup.md` §4.4.1 not applied (this is the silent-pass bug PR #69 fixed at the broker layer).
- `/v1/mint-aws-creds`, the daemon path, with the `audit_record_id` + `anchored` fields.
- Capability grants (create / list / revoke), wallet linking + unauthenticated recover/lookup, email-link + OAuth2/Google flows.
- Audit log inspection (columns of the sqlite `plugin_mint_log` table explained).
- Phase C EVM anchor (structural-only in v0; live alloy lands in V0.1-FOLLOWUPS hardening).
- Prometheus metrics + `Idempotency-Key` (hit / miss / 422 cases).
- `harness/stage-7-issue-64-done.sh` as the programmatic gate.
- Failure-mode walk-through: `BOOT_FAIL` anchor table, `InvalidIdentityToken` triage, `AccessDenied` on own prefix, 24h clean exit + `Restart=always`.
- A "What's intentionally not yet live" section pointing at `V0.1-FOLLOWUPS.md`, so operators know which structural features ship as stubs (live EVM anchor, TEE signer, fail-closed grants default, latency histograms).

860 lines. All 6 cross-referenced files exist (verified).
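The diagnostic runbook described above reduces to a two-input decision table. Below is a minimal Rust sketch of that mapping, assuming the operator has already observed whether wallet A's credentials can list its own prefix and wallet B's prefix. `ListResult` and `diagnose` are hypothetical names for illustration, not code from the guide.

```rust
/// Observed result of one S3 list call during the isolation proof.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ListResult {
    Allowed,
    AccessDenied,
}

/// Map the two observed outcomes (own prefix, foreign prefix) to a likely root cause.
fn diagnose(own_prefix: ListResult, foreign_prefix: ListResult) -> &'static str {
    use ListResult::{AccessDenied, Allowed};
    match (own_prefix, foreign_prefix) {
        // Expected state: S3 itself refuses the other wallet's prefix.
        (Allowed, AccessDenied) => "healthy: S3 itself enforces per-wallet isolation",
        // No session tag, so the bucket policy's PrincipalTag scoping matches nothing.
        (AccessDenied, _) => "own prefix denied: JWT is missing the https://aws.amazon.com/tags claim",
        // The silent pass: a broad identity-based grant still allows everything.
        (Allowed, Allowed) => "foreign prefix readable: cloud-setup.md §4.4.1 not applied",
    }
}
```

The match is exhaustive over all four combinations, so a new observed state cannot fall through without a diagnosis.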
Supersedes #67 (docs) and #68 (broker code). They are consolidated into one PR because the bugs each one fixes are interdependent: the federation only works when both land together.
## TL;DR

The OIDC federation path in `docs/cloud-setup.md` §4 had three independent bugs (two in the runbook, one in the broker) that combined to produce a silent false pass: end-to-end §4.5 returned success on both the own-prefix and the foreign-prefix list calls, even though the runbook reads as if AWS enforces per-tenant isolation. This PR fixes all three layers.

## Reproduction (current `main`, before this PR)

Step 4b should return `AccessDenied`. It doesn't. The cloud is enforcing nothing.

## Three bugs, in the order you'd discover them
### Bug A: §3 inline policy `agentkeys-data-role-inline` masks the bucket policy

`agentkeys-data-role-inline` (from §3.2) grants the role broad `s3:GetObject` + `s3:ListBucket` on the entire bucket. That is necessary in the static-IAM path but fatal here: IAM is union-of-allows, so this identity-based grant alone is enough to allow the request, bypassing §4.4's bucket-policy isolation. As long as it's attached, the bucket policy's PrincipalTag scoping never matters.

The runbook moves the operator from §3 to §4 but never says to delete or rewrite this policy. Doc-only fix.
### Bug B: §4.3 trust policy uses `StringNotEquals: ""` instead of `Null: "false"`

The original snippet:

This looks like it requires the tag to be non-empty. It doesn't. AWS IAM evaluates negated string operators on missing context keys as TRUE ("the missing key is not equal to anything"). So a JWT carrying no AWS tags claim silently bypasses the check: `assume-role-with-web-identity` succeeds, the session has no PrincipalTag, and the bucket policy's tag-based scoping has nothing to evaluate against.

Correct enforcement uses `Null: "false"` ("the key MUST exist"), which actually rejects sessions where the tag isn't set. Doc-only fix.

### Bug C (show-stopper): broker doesn't emit `https://aws.amazon.com/tags`

`crates/agentkeys-broker-server/src/handlers/oidc.rs:106-113` builds JWT claims as:

`agentkeys_user_wallet` appears as a top-level claim only. AWS does not auto-promote arbitrary OIDC claims to session tags; it specifically requires the magic-named `https://aws.amazon.com/tags` claim with a `principal_tags` (and optionally `transitive_tag_keys`) shape. Without it, no PrincipalTag is set on assumed sessions, and `${aws:PrincipalTag/agentkeys_user_wallet}` in the bucket policy expands to empty.

Spec: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html#oidc-session-tags
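For reference, here is a minimal Rust sketch of the claim shape AWS expects, following the session-tags spec linked above. It renders JSON with `format!` only to stay dependency-free (presumably the broker uses a JSON library; that is an assumption), and it assumes the wallet is a plain hex address needing no JSON escaping. `aws_tags_claim_json` and `claims_fragment_json` are hypothetical helpers, and the fragment omits standard claims such as `iss`, `aud`, and `exp`.

```rust
/// Claim name AWS reads session tags from.
const AWS_TAGS_CLAIM: &str = "https://aws.amazon.com/tags";
/// Tag key the bucket policy references as `${aws:PrincipalTag/agentkeys_user_wallet}`.
const WALLET_TAG: &str = "agentkeys_user_wallet";

/// Value of the AWS tags claim: each tag value is an array holding one string,
/// and `transitive_tag_keys` lets the tag survive role chaining.
/// Assumes `wallet` is a hex address that needs no JSON escaping.
fn aws_tags_claim_json(wallet: &str) -> String {
    format!(
        r#"{{"principal_tags":{{"{tag}":["{wallet}"]}},"transitive_tag_keys":["{tag}"]}}"#,
        tag = WALLET_TAG,
        wallet = wallet,
    )
}

/// Hypothetical claims fragment: the existing top-level claim plus the AWS one.
fn claims_fragment_json(wallet: &str) -> String {
    format!(
        r#"{{"{tag}":"{wallet}","{aws}":{tags}}}"#,
        tag = WALLET_TAG,
        wallet = wallet,
        aws = AWS_TAGS_CLAIM,
        tags = aws_tags_claim_json(wallet),
    )
}
```

The top-level `agentkeys_user_wallet` claim alone is what the broker emitted before; only the nested object under the URL-named key becomes a PrincipalTag.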
Code fix in `mint_oidc_jwt`:

`transitive_tag_keys` ensures the tag persists across role chaining.

## Why all three bugs hid in plain sight
The bugs are sequenced such that each masks the next:
Net result: end-to-end §4.5 prints what looks like success. Without explicitly testing 4b against a foreign prefix, an operator would ship Stage 7 thinking federation isolation works.
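To see why the end-to-end check passes silently, here is a simplified Rust model of the three layers. It is an illustration under stated assumptions, not AWS's evaluation engine: it ignores explicit denies, SCPs, and permission boundaries, treats the bucket policy as a plain `${tag}/` prefix check, and uses hypothetical names (`Setup`, `TagCheck`, `run`).

```rust
/// How the §4.3 trust policy checks the `agentkeys_user_wallet` session tag.
#[derive(Clone, Copy)]
enum TagCheck {
    /// `StringNotEquals: ""`: negated operator, so a missing key evaluates to true.
    StringNotEqualsEmpty,
    /// `Null: "false"`: true only when the key is present.
    NullFalse,
}

struct Setup {
    /// Bug C fixed: the JWT carries the https://aws.amazon.com/tags claim.
    broker_emits_tags: bool,
    /// Bug B fixed when this is `TagCheck::NullFalse`.
    trust_check: TagCheck,
    /// Bug A present: the broad `agentkeys-data-role-inline` grant is still attached.
    inline_broad_grant: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    AssumeDenied,
    Listed { own: bool, foreign: bool },
}

/// Model the end-to-end check: wallet `me` lists its own prefix and `other`'s prefix.
fn run(setup: &Setup, me: &str, other: &str) -> Outcome {
    // The session only gets a PrincipalTag if the JWT carried the AWS tags claim.
    let tag: Option<&str> = setup.broker_emits_tags.then_some(me);
    let trust_ok = match setup.trust_check {
        TagCheck::StringNotEqualsEmpty => tag.map_or(true, |v| !v.is_empty()),
        TagCheck::NullFalse => tag.is_some(),
    };
    if !trust_ok {
        return Outcome::AssumeDenied;
    }
    let can_list = |prefix: &str| {
        // Bucket policy: allow only the caller's own `${tag}/` prefix.
        let bucket_allows = tag.map_or(false, |t| prefix.starts_with(&format!("{t}/")));
        // Same-account union of allows: the inline grant alone is sufficient.
        setup.inline_broad_grant || bucket_allows
    };
    Outcome::Listed {
        own: can_list(&format!("{me}/")),
        foreign: can_list(&format!("{other}/")),
    }
}
```

With all three bugs present, both calls succeed, which is the silent pass. Fixing only the docs without the broker change makes `AssumeRoleWithWebIdentity` fail outright, which is one way to see why the two commits have to land together.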
## What this PR does

### Commit 1: `docs(cloud-setup): fix §4 federation runbook`

`docs/cloud-setup.md` only:

- §4.3: `Null: "false"` instead of `StringNotEquals: ""`, plus an explanatory paragraph on why negated string operators don't work for tag-presence enforcement.
- §4.4: splits the combined `s3:GetObject` + `s3:ListBucket` statement (which AWS rejects with `MalformedPolicy`) into `AllowDaemonListOwnPrefix` (`ListBucket` + `s3:prefix` `StringLike`) and `AllowDaemonGetOwnObjects` (`GetObject`; the resource ARN itself enforces the prefix).
- `StringLike "${tag}/*"` instead of `StringEquals "${tag}/"`, so sub-prefixes work.
- §4.4.1: rewrites `agentkeys-data-role-inline` to drop its broad-bucket grant. Preserves any non-S3 statements (`ses:SendRawEmail`); falls back to `delete-role-policy` if there are none.
- §4.5: cross-machine banners (`# === Run on your operator workstation ===` / `# === The rest runs inside the SSH session ===`). Base64url-aware JWT decoder (replaces `@base64d`, which fails with `Malformed BOM` when the payload's base64 happens to contain `-` or `_`). Diagnostic guide mapping observed outcomes to root causes.

Diff: 1 file, +122/-15.
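The decoder fix matters because a JWT payload segment is base64url (RFC 4648 §5 alphabet, using `-` and `_`, padding stripped) rather than standard base64. A minimal std-only Rust sketch of the same idea follows; it is not the runbook's shell implementation, and `decode_b64url` / `jwt_payload` are hypothetical names. It only decodes the payload and does not verify the signature.

```rust
/// Map one base64url character (RFC 4648 §5 alphabet) to its 6-bit value.
fn b64url_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some(u32::from(c - b'A')),
        b'a'..=b'z' => Some(u32::from(c - b'a') + 26),
        b'0'..=b'9' => Some(u32::from(c - b'0') + 52),
        b'-' => Some(62),
        b'_' => Some(63),
        _ => None, // '+' and '/' belong to standard base64, not base64url
    }
}

/// Decode unpadded (or padded) base64url; returns None on invalid input.
fn decode_b64url(input: &str) -> Option<Vec<u8>> {
    let data = input.trim_end_matches('=').as_bytes();
    if data.len() % 4 == 1 {
        return None; // a lone trailing character cannot encode a full byte
    }
    let mut out = Vec::with_capacity(data.len() * 3 / 4);
    let (mut acc, mut bits) = (0u32, 0u32);
    for &c in data {
        acc = (acc << 6) | b64url_value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Decode the payload (second segment) of a JWT without verifying its signature.
fn jwt_payload(token: &str) -> Option<String> {
    let payload = token.split('.').nth(1)?;
    String::from_utf8(decode_b64url(payload)?).ok()
}
```

For example, `decode_b64url("-_8")` yields the bytes `0xFB 0xFF`, exactly the kind of input where a standard-alphabet decoder would reject `-` and `_`.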
### Commit 2: `fix(broker): emit https://aws.amazon.com/tags claim`

`crates/agentkeys-broker-server/src/handlers/oidc.rs`:

- `mint_oidc_jwt` adds the `https://aws.amazon.com/tags` claim with `principal_tags.agentkeys_user_wallet = [session.wallet]` and `transitive_tag_keys = ["agentkeys_user_wallet"]`.
- `claims_supported` advertises `https://aws.amazon.com/tags`.

`crates/agentkeys-broker-server/tests/oidc_flow.rs`:

- `mint_oidc_jwt_signs_claims_for_session_wallet` asserts the JWT carries the AWS tags claim, with the wallet under `principal_tags` and `transitive_tag_keys` populated. Bug-regression guard.
- `discovery_returns_aws_compatible_shape` asserts `claims_supported` includes the tags claim.

`cargo test -p agentkeys-broker-server` → all 9 + 6 + 0 tests pass.

## Test plan

- `cargo test -p agentkeys-broker-server` locally: all green.
- `jq -n` (well-formed JSON, `${aws:PrincipalTag/...}` placeholders preserved).
- `https://aws.amazon.com/tags.principal_tags.agentkeys_user_wallet` populated.

## Deploy notes
After merge, the broker host needs a rebuild + restart for the JWT shape to actually change:
Then re-run §4.5 from the runbook.
## Related
🤖 Generated with Claude Code