docs(cloud-setup): fix §4 federation runbook — split bucket policy, enforce tag presence, drop §3 inline grant, cross-machine §4.5 guidance#67
Closed
hanwencheng wants to merge 1 commit intomainfrom
Conversation
0be90ec to
afd7213
Compare
… Null operator, §4.4.1 inline cleanup, §4.5 cross-machine guidance)
afd7213 to
0c51bc1
Compare
3 tasks
Member
Author
|
Consolidating with #68 into a single PR for the Stage 7 federation fix (docs + broker code together — same conceptual change). |
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four interrelated fixes to
docs/cloud-setup.md§4 (OIDC federation). End-to-end §4.5 against the deployed prod broker silently passed broad-access tests because three runbook bugs combined to mask each other. This PR addresses the doc-side bugs; broker code fix (the show-stopper) is out of scope and will be filed separately.What this PR fixes
1. §4.4 bucket policy — split into two statements (originally MalformedPolicy)
Old: one statement combining
s3:GetObject+s3:ListBucketunder ones3:prefixcondition. AWS rejected at policy-save time:s3:prefixis a request-time condition that only applies tos3:ListBucket(the prefix filter on listings).s3:GetObjectdoesn't carry a prefix parameter, so combining the two under ones3:prefixcondition is semantically incoherent.Fix: split into
AllowDaemonListOwnPrefix(ListBucket + s3:prefix StringLike) andAllowDaemonGetOwnObjects(GetObject; resource ARN itself enforces the prefix via${aws:PrincipalTag/...}expansion).Also changed
StringEquals "${tag}/"→StringLike "${tag}/*"so the daemon can list sub-prefixes (<wallet>/inbox/,<wallet>/sent/2026-05/), not just the root. Matches the shape already indocs/spec/ses-email-architecture.md§10.4 andwiki/tag-based-access.2. §4.3 trust policy —
Nulloperator instead ofStringNotEquals: ""Old:
Looks like it requires the tag to be non-empty. Doesn't. AWS IAM evaluates negated string operators on missing context keys as TRUE ("the missing key is not equal to anything"). So a JWT carrying no AWS tags claim silently bypasses the check —
assume-role-with-web-identitysucceeds, the session has no PrincipalTag, and the bucket policy's tag-based scoping has nothing to evaluate against.Fix:
Null: \"false\"means "the key MUST exist (be not-null)" — actually rejects sessions where the tag isn't set.3. New §4.4.1 — strip the §3 broad-bucket grant from
agentkeys-data-role-inline§3.2's inline policy grants
s3:GetObject+s3:ListBucketon the entire bucket — necessary in the static-IAM path (no PrincipalTag to scope on) but fatal in §4: IAM is union-of-allows, so this identity-based grant overrides §4.4's bucket-policy isolation. §4.5's 4b test silently succeeds instead of correctly returning AccessDenied — federation appears to work while the cloud is enforcing nothing.The runbook moves §3 → §4 but never tells the operator to delete or rewrite this inline policy. New section adds the cleanup commands (preserve
ses:SendRawEmail; drop the broad S3 grant) and a verification step.4. §4.5 — cross-machine guidance + base64url-aware decoder
Two doc bugs in the original §4.5 that bit me running it for real:
(a) Conflated environments. Step 1 hits
127.0.0.1:8090(operator's localhost) and step 2 hits\$OIDC_ISSUER(prod broker). These don't share session state — the prod broker validates against its own local backend on the EC2 host. The runbook reads as if everything is local; it isn't.Fix: split into Part A ("on the broker host — mint the JWT") and Part B ("on your operator workstation — assume role + verify"). Explicit SSH preamble; the operator copies the JWT across.
(b)
@base64ddoesn't handle base64url. JWT segments are base64url-encoded per RFC 7515 (-/_instead of+//, no=padding). jq's@base64dis strict base64. Most JWT payloads happen to not contain-or_in their base64 representation, so it works by accident — until it doesn't, with the cryptic errorjq: error (at <stdin>:1): Malformed BOM (while parsing '��').Fix: replace the decoder with one that converts base64url → base64 + adds padding before
@base64d:5. New diagnostic guide for §4.5 intermediate states
Added a "Diagnosing intermediate states" subsection that maps observed outcomes to root causes:
https://aws.amazon.com/tagsclaim (separate code bug, fix incoming)This is the bug pattern that hid Stage 7's broken federation in the original runbook.
Out of scope
The show-stopper bug —
crates/agentkeys-broker-server/src/handlers/oidc.rs:106-113doesn't emit thehttps://aws.amazon.com/tagsclaim — is a code change, not a doc fix. To be filed in a separate PR. Until that lands and is deployed, the corrected §4.5 will see both 4a and 4b deny (the documented intermediate state). The doc fixes here remove the masking layers so that bug actually surfaces.Verification
jq -nlocally (well-formed JSON output,\${aws:PrincipalTag/...}placeholders preserved).MalformedPolicyfrom the §4.4 combined statement,Malformed BOMfrom@base64don a real JWT, both list-objects-v2 calls succeeding because of the inline policy, JWT decode confirming no AWS tags claim.Test plan
put-bucket-policyagainst a fresh bucket — succeeds.update-assume-role-policy— succeeds.Statement[*].Actionreturns onlyses:SendRawEmail.🤖 Generated with Claude Code