Phase 4: Active delegation chains (delegation.grant production) #121

@hanwencheng

Context

In M1-M3 the cap-token system is flat: an actor gets a cap, uses it directly. Delegation is schema-only (returns not_implemented_in_v1). For M4, we light it up — a parent agent issues a narrowed cap to a child sub-agent, with explicit scope reduction, TTL inheritance, revocation cascade, and full audit chain reconstruction.

Per milestones-roadmap.md §5, this is the depth that lets the first enterprise customer (regulated B2B brand-owner) actually deploy AgentKeys for multi-agent scenarios. Without delegation, every agent in a fleet needs its own first-class cap-mint flow — which doesn't scale and doesn't audit.

The corrected design from agent-iam-strategy.md §3.3 is that delegation is implicit-in-cap-tokens by default, and explicit delegation activates only after a vendor proves M2-tier traction. This issue ships explicit delegation.

Scope (M4)

Cap-token format extensions

  • Add parent_cap_id: CapId (the cap this one was delegated from; null for root caps)
  • Add delegation_chain_depth: u8 (0 for root; +1 per delegation hop)
  • Add narrowed_scope: Scope (the explicit scope; MUST be ⊆ parent_cap's scope)
  • Existing fields (TTL, data_class, actor_omni, etc.) remain; delegation just adds parent linkage
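The field additions above could look roughly like the following. This is a minimal sketch, not the real token format: only the three field names and the `CapId`, `u8`, and `Scope` type names come from this issue. The representation of `Scope` as a set of permission strings, the `cap_id` field, and the `root` / `delegate` helpers are assumptions for illustration, and the existing fields are omitted.

```rust
use std::collections::BTreeSet;

/// Hypothetical newtype; the real CapId encoding is not specified here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CapId(pub String);

/// Assumption: a scope is modelled as a set of permission strings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Scope(pub BTreeSet<String>);

impl Scope {
    pub fn of(perms: &[&str]) -> Self {
        Scope(perms.iter().map(|p| p.to_string()).collect())
    }
}

#[derive(Debug, Clone)]
pub struct CapToken {
    pub cap_id: CapId,
    // Existing fields (TTL, data_class, actor_omni, ...) omitted in this sketch.
    /// The cap this one was delegated from; None for root caps.
    pub parent_cap_id: Option<CapId>,
    /// 0 for root; +1 per delegation hop.
    pub delegation_chain_depth: u8,
    /// The explicit scope; must be a subset of the parent's scope.
    pub narrowed_scope: Scope,
}

impl CapToken {
    /// Build a root cap: no parent, depth 0.
    pub fn root(cap_id: &str, scope: Scope) -> Self {
        CapToken {
            cap_id: CapId(cap_id.to_string()),
            parent_cap_id: None,
            delegation_chain_depth: 0,
            narrowed_scope: scope,
        }
    }

    /// Link a child to this cap. This only fills in the parent linkage;
    /// the subset, TTL, and depth checks belong to the broker.
    /// Returns None if the depth counter would overflow u8.
    pub fn delegate(&self, child_id: &str, narrowed_scope: Scope) -> Option<Self> {
        let depth = self.delegation_chain_depth.checked_add(1)?;
        Some(CapToken {
            cap_id: CapId(child_id.to_string()),
            parent_cap_id: Some(self.cap_id.clone()),
            delegation_chain_depth: depth,
            narrowed_scope,
        })
    }
}
```

Using `Option<CapId>` for the parent link keeps "null for root caps" explicit in the type rather than relying on a sentinel value.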

Broker enforcement

  • Every delegated cap-mint validates: narrowed_scope ⊆ parent_scope AND narrowed_ttl ≤ parent_ttl_remaining AND the parent's delegation_chain_depth + 1 ≤ MAX_DEPTH
  • MAX_DEPTH env-configurable (default 3)
  • Reject with DelegationPrivilegeEscalation / DelegationDepthExceeded if violated
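The three mint-time checks above can be sketched as a single function. The error names are the ones listed in this issue; everything else (`ParentCap`, `DelegationRequest`, the `scope` helper, TTLs in seconds, and `parse_max_depth`) is hypothetical. The issue does not say which error a TTL violation maps to, so this sketch assumes `DelegationPrivilegeEscalation`; treat that as an open question rather than a decision.

```rust
use std::collections::BTreeSet;

/// Error names taken from the issue text.
#[derive(Debug, PartialEq, Eq)]
pub enum DelegationError {
    DelegationPrivilegeEscalation,
    DelegationDepthExceeded,
}

/// Hypothetical view of the parent cap at mint time.
pub struct ParentCap {
    pub scope: BTreeSet<String>,
    pub ttl_remaining_secs: u64,
    pub delegation_chain_depth: u8,
}

/// Hypothetical request shape for a delegated cap-mint.
pub struct DelegationRequest {
    pub narrowed_scope: BTreeSet<String>,
    pub narrowed_ttl_secs: u64,
}

/// Helper for building scopes in examples (assumed string permissions).
pub fn scope(perms: &[&str]) -> BTreeSet<String> {
    perms.iter().map(|p| p.to_string()).collect()
}

/// Parse a configured MAX_DEPTH value, falling back to the default of 3.
/// The caller would pass something like `std::env::var("MAX_DEPTH").ok().as_deref()`;
/// the variable name is an assumption.
pub fn parse_max_depth(raw: Option<&str>) -> u8 {
    raw.and_then(|s| s.trim().parse::<u8>().ok()).unwrap_or(3)
}

/// Returns the child's delegation_chain_depth on success.
pub fn validate_delegation(
    parent: &ParentCap,
    req: &DelegationRequest,
    max_depth: u8,
) -> Result<u8, DelegationError> {
    // narrowed_scope must be a subset of parent_scope.
    if !req.narrowed_scope.is_subset(&parent.scope) {
        return Err(DelegationError::DelegationPrivilegeEscalation);
    }
    // narrowed_ttl must not exceed parent_ttl_remaining.
    // Assumption: reported as a privilege escalation.
    if req.narrowed_ttl_secs > parent.ttl_remaining_secs {
        return Err(DelegationError::DelegationPrivilegeEscalation);
    }
    // Parent depth + 1 must not exceed MAX_DEPTH; checked_add avoids u8 overflow.
    match parent.delegation_chain_depth.checked_add(1) {
        Some(depth) if depth <= max_depth => Ok(depth),
        _ => Err(DelegationError::DelegationDepthExceeded),
    }
}
```

Returning the child's depth from the validator means the broker can write `delegation_chain_depth` from the validated value instead of recomputing it.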

Revocation cascade

Audit chain

  • Every delegated cap-mint emits an audit row with the full delegation path: [root_cap_id, intermediate_cap_id_1, intermediate_cap_id_2, ...]
  • Audit replay (Phase 4: Policy versioning + audit replay #123) can reconstruct the delegation graph for any event and show "who delegated what to whom"
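One way to build the delegation path for an audit row is to walk `parent_cap_id` links from the newly minted cap up to the root. The sketch below assumes a hypothetical in-memory map from cap id to parent id; the real store, id types, and whether the path includes the minted cap itself are not specified in this issue (this version includes it as the last element).

```rust
use std::collections::HashMap;

/// Walk parent links from `cap_id` to the root and return the path in
/// root-first order: [root, intermediate_1, ..., cap_id].
/// `parents` maps each cap id to its parent id (None for root caps).
/// Returns None if a cap on the path is unknown or the links form a cycle.
pub fn delegation_path(
    parents: &HashMap<String, Option<String>>,
    cap_id: &str,
) -> Option<Vec<String>> {
    let mut path = Vec::new();
    let mut current = cap_id.to_string();
    // A well-formed chain visits each cap at most once, so more hops
    // than there are caps means the links are cyclic.
    for _ in 0..=parents.len() {
        let parent = parents.get(&current)?;
        path.push(current.clone());
        match parent {
            Some(p) => current = p.clone(),
            None => {
                path.reverse();
                return Some(path);
            }
        }
    }
    None
}
```

In practice MAX_DEPTH already bounds the walk; the hop limit here is only a guard against corrupted parent links.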

arch.md update (deliverable)

Delegation isn't in arch.md yet. Land a new arch.md §X "Delegation chains" section as part of this issue. Per CLAUDE.md architecture-as-source-of-truth policy: if it's an invariant, it lives in arch.md.

The new section covers:

  • Delegation grammar (scope ⊆ parent_scope etc.)
  • Maximum depth + rationale
  • Revocation cascade semantics
  • Audit chain reconstruction
  • Interaction with per-data-class isolation (delegation preserves data_class; can't cross classes)
  • Interaction with per-actor isolation (delegation preserves actor_omni; cross-actor delegation is a different ceremony, M5+)

Out of scope (defer)

  • Cross-actor delegation (M5+ — needs identity-ceremony work)
  • Delegation across data_classes (forbidden by design — preserves arch.md §17 invariants)
  • UI for delegation graph visualization (M5)
  • Delegation between vendors (M5+ — multi-tenant trust questions)

Acceptance criteria

  • A parent agent can delegate a narrowed cap to a child sub-agent: cap-mint succeeds with parent_cap_id set
  • Delegated cap with broader scope than parent is rejected with DelegationPrivilegeEscalation — verified by negative test
  • MAX_DEPTH enforced: delegation chain of depth N+1 (where N = MAX_DEPTH) rejected with DelegationDepthExceeded
  • Revoking a root cap revokes all descendants atomically (verified: worker rejects descendants with CapRevokedViaParent within 200ms of root revocation)
  • Audit replay (Phase 4: Policy versioning + audit replay #123) reconstructs the full delegation graph for any event in a known scenario
  • arch.md has a new "Delegation chains" section that the rest of the doc cross-references (per arch.md §5 canonical-names rule)
  • The 3 schema-only tools from Phase 1: AgentKeys MCP server — 7 active tools + 3 schema-only #107 (delegation.grant, delegation.revoke, approval.request) now return real results (not not_implemented_in_v1)
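For the revocation-cascade criterion above, the worker-side check might look like the following sketch. `CapRevokedViaParent` is the name used in this issue; `CapRevoked`, `UnknownCap`, `MalformedChain`, and the in-memory `parents` / `revoked` collections are assumptions for illustration. It walks every ancestor rather than consulting a cached root status, and it does not model the 200ms propagation bound.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum CapCheckError {
    /// Assumed name: the presented cap itself was revoked.
    CapRevoked,
    /// From the issue: an ancestor of the presented cap was revoked.
    CapRevokedViaParent,
    /// Assumed name: a cap on the chain is not known to the worker.
    UnknownCap,
    /// Assumed name: parent links form a cycle.
    MalformedChain,
}

/// Reject a cap if it, or any cap it was delegated from, is revoked.
/// `parents` maps each cap id to its parent id (None for root caps).
pub fn check_not_revoked(
    parents: &HashMap<String, Option<String>>,
    revoked: &HashSet<String>,
    cap_id: &str,
) -> Result<(), CapCheckError> {
    if revoked.contains(cap_id) {
        return Err(CapCheckError::CapRevoked);
    }
    let mut current = cap_id;
    for _ in 0..=parents.len() {
        match parents.get(current) {
            None => return Err(CapCheckError::UnknownCap),
            // Reached a root cap without meeting a revoked ancestor.
            Some(None) => return Ok(()),
            Some(Some(parent)) => {
                if revoked.contains(parent.as_str()) {
                    return Err(CapCheckError::CapRevokedViaParent);
                }
                current = parent.as_str();
            }
        }
    }
    // More hops than caps: fail closed on a cyclic chain.
    Err(CapCheckError::MalformedChain)
}
```

Because the check is evaluated on use, revoking a root takes effect for all descendants without touching their tokens; the latency trade-off of doing this on every use is the one listed under Risks.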

Risks

  • Risk: Revocation cascade latency too high (worker-side check on every use is slow)
    Mitigation: Worker caches root cap_id status with short TTL (~5s); cascade is bounded by cache TTL; documented as "bounded offline revocation", same as #110-pattern caps
  • Risk: Audit chain reconstruction at depth N is slow
    Mitigation: Index audit rows by delegation_root_cap_id; reconstruction is a single index scan
  • Risk: Delegation surface tempts operators to use it where straight cap-mint would be simpler
    Mitigation: Documentation: "use delegation when you need scope narrowing or cascade revocation; otherwise mint a fresh cap" (example matrix in arch.md §X)
  • Risk: Scope-subset check is wrong (allows privilege escalation in edge cases)
    Mitigation: Property test: random valid (parent, narrowed) pairs always pass subset; random invalid pairs always reject (minimum 1000 generated cases)
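The shape of the scope-subset property test can be sketched without any dependencies. This is a stand-in only: it uses a small hand-rolled generator instead of proptest, models scopes as sets of `u8` permission ids over a 16-element universe, and `scope_is_narrowing` is a placeholder for the real broker check. All of those are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Tiny deterministic generator (a 64-bit LCG), standing in for proptest.
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Hypothetical permission universe: ids 0..16.
const UNIVERSE: u8 = 16;

/// Placeholder for the check under test.
pub fn scope_is_narrowing(narrowed: &BTreeSet<u8>, parent: &BTreeSet<u8>) -> bool {
    narrowed.is_subset(parent)
}

/// Keep each item with probability 1/2.
fn random_subset<I: Iterator<Item = u8>>(rng: &mut Lcg, items: I) -> BTreeSet<u8> {
    items.filter(|_| rng.next_u32() & 1 == 1).collect()
}

/// Run `cases` generated cases; Err describes the first failing case.
pub fn run_cases(cases: u32, seed: u64) -> Result<(), String> {
    let mut rng = Lcg(seed);
    for i in 0..cases {
        let parent = random_subset(&mut rng, 0..UNIVERSE);
        // Valid pair: narrowed is drawn only from the parent's permissions.
        let valid = random_subset(&mut rng, parent.iter().copied());
        if !scope_is_narrowing(&valid, &parent) {
            return Err(format!("case {}: valid pair rejected", i));
        }
        // Invalid pair: add one permission the parent lacks (skipped when
        // the parent already holds the whole universe).
        if let Some(extra) = (0..UNIVERSE).find(|p| !parent.contains(p)) {
            let mut invalid = valid.clone();
            invalid.insert(extra);
            if scope_is_narrowing(&invalid, &parent) {
                return Err(format!("case {}: invalid pair accepted", i));
            }
        }
    }
    Ok(())
}
```

Calling `run_cases(1000, seed)` covers the 1000-case minimum; a proptest version would add shrinking, which matters when the real scope type is richer than a flat set.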

References

Effort

~2-3 weeks (includes arch.md design work). Sequencing:

  1. (Days 1-3) arch.md §X "Delegation chains" design pass + spec review
  2. (Days 3-7) Broker cap-token format extension + scope-subset enforcement + property tests
  3. (Days 7-11) Worker chain-verify updates + revocation cascade
  4. (Days 11-14) Audit chain wiring + replay integration
  5. (Days 14-18) MCP server tool graduation (the 3 schema-only tools from #107 start returning real results)
  6. (Days 18-21) Integration tests + demo scenarios + arch.md final review

Pickup notes for the next agent / developer

  • Read agent-iam-strategy.md §3.3 first — the corrected design (delegation is preview-only in v1) is the framing this issue ships
  • arch.md update is part of this issue, not a follow-up. If you ship the code without arch.md, future operators won't know the delegation grammar.
  • Per CLAUDE.md "Architecture-as-source-of-truth policy": re-read arch.md after every commit to verify it still matches the implementation
  • The scope-subset check is the most-likely-to-be-wrong piece. Property-based testing is non-negotiable here (use proptest for Rust; minimum 1000 cases).
  • Watch for: don't accidentally allow delegation across data_classes. The cap-token's data_class field is signed; delegation must preserve it. If you find yourself writing data_class: new_data_class, stop — that's a different ceremony (M5+).
  • Use the /agentkeys-issue-create skill for follow-up issues (e.g., delegation graph UI visualization, cross-actor delegation in M5)

    Labels

    area/broker: Broker server, cap-token issuance, OIDC issuance
    area/identity: HDKD actor tree, K-key inventory, identity ceremony
