docs: Agent IAM strategy reset — hooks-first wire architecture#140

Merged: hanwencheng merged 1 commit into main from claude/docs-agent-iam-reset on May 28, 2026
@hanwencheng (Member) commented May 28, 2026

Summary

Strategic reset for the AI-device wedge (issue #103 lineage). Docs-only PR; the companion implementation PR #141 ships agentkeys wire + agentkeys hook (and the operator runbook, since it documents those commands).

  • Move agent-iam-strategy.md from docs/research/ to docs/ (it is the strategic anchor, not third-party research); update all 13 references repo-wide.
  • arch.md §22d — IAM-guarantee delivery (hooks-first, proxy-fallback) + the agentkeys wire CLI surface.
  • New wiki glossary (docs/wiki/agent-iam-guarantee-glossary.md) — IAM tool vs IAM guarantee, hooks-vs-proxy trade-off table, verified hook-availability table across six runtimes (Claude Code, Codex, Hermes, OpenClaw, Kimiclaw, xiaozhi-server).
  • New plan (docs/spec/plans/phase-1-fresh-user-wire-onboarding.md) — the 7-step fresh-user journey + the manual-vs-automatic hybrid decision.
  • strategy doc §3.6 (IAM tool vs guarantee), §3.7 (wire decision), §4 Phase 1, §5 Phase 3/3b updates.
  • Archive the superseded Rust-runtime approach (demo runbook, verify script, setup script, sandbox Dockerfile) under docs/archived/*-rust-runtime-2026-05*.

Decision recorded

Option B (vendor → Task Host → both MCPs) with hooks-first IAM guarantees; OpenAI-compatible proxy as a lower-priority fallback for hosts without a hook surface. Anchored in strategy §2.1/§2.4 (Authority Host vs Task Host) and issue #133.

Companion PR

Implementation (agentkeys wire + agentkeys hook + Hermes adapter + the operator runbook) lands in #141 off main. The two are reviewable independently; the tree is consistent once both have merged. The plan doc's relative links to operator-runbook-wire.md resolve on main once both merge.

Test plan

  • Docs render; cross-links resolve (strategy doc move + 13 ref updates)
  • docs/archived/README.md lists the 4 archived artifacts with superseded-by pointers
  • arch.md §22d + wiki glossary + plan doc tell one consistent story
  • No stale docs/research/agent-iam-strategy.md references remain

🤖 Generated with Claude Code

@hanwencheng (Member, Author) commented:

Companion implementation PR: #141 (agentkeys wire + agentkeys hook).

Strategic reset for the AI-device wedge (issue #103 lineage):

- Move agent-iam-strategy.md from docs/research/ to docs/ (it is the
  strategic anchor, not third-party research); update all 13 references.
- arch.md §22d: IAM-guarantee delivery (hooks-first, proxy-fallback) +
  the agentkeys wire CLI surface.
- New wiki glossary: IAM tool vs IAM guarantee, hooks-vs-proxy trade-off,
  verified hook-availability table across six runtimes.
- New plan docs/spec/plans/phase-1-fresh-user-wire-onboarding.md (the
  7-step fresh-user journey + manual-vs-automatic hybrid decision).
- strategy doc §3.6/§3.7 + §4 Phase 1 + §5 Phase 3/3b updates.
- Archive the superseded Rust-runtime approach (demo runbook, verify
  script, setup script, sandbox Dockerfile) under docs/archived/.

The operator runbook (docs/operator-runbook-wire.md) ships in the
companion implementation PR #141, since it documents the agentkeys wire
+ hook commands that land there.
hanwencheng added a commit that referenced this pull request May 28, 2026
Phase 1.a of the fresh-user wire-onboarding plan. Turns the shipped MCP
tools (#107) into IAM guarantees the LLM cannot bypass, via Task-Host
lifecycle hooks (issue #133 track). Companion docs (strategy/arch/wiki/
plan) land in PR #140.

- `agentkeys hook check|audit|memory-inject` (src/hook.rs): thin MCP
  JSON-RPC clients invoked by the wire-generated hook scripts. Read the
  host stdin payload, call an AgentKeys MCP tool, emit host-shaped stdout
  JSON. `check` fails CLOSED; audit + memory-inject never block.
- `agentkeys wire <runtime>` (src/wire.rs): RuntimeAdapter trait +
  HermesAdapter. Detects Hermes, writes hook scripts to
  ~/.hermes/agent-hooks/, merges a sentinel-managed `hooks:` block into
  ~/.hermes/config.yaml (preserves other keys, refuses to clobber a
  foreign hooks:), sets hooks_auto_accept: true, verifies via
  `hermes hooks doctor`. Idempotent (ok/skip/fail per step); --check-only
  reports drift without writing.
- CLI wiring: Commands::Wire + Commands::Hook + HookAction in main.rs;
  pub mod hook/wire in lib.rs.
- Operator runbook (docs/operator-runbook-wire.md): the 7-step fresh-user
  flow + three-act demo verification — moved here from the docs PR since
  it documents these exact commands.
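
The fail-closed rule for `check`, and the never-block rule for `audit` and `memory-inject`, can be sketched as a small decision function. This is a minimal illustration, not the code in src/hook.rs: the `HookKind`, `HookOutcome`, and `decide` names, and the shape of the MCP result, are assumptions made for the example.

```rust
/// What a hook tells the host to do with the pending action (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum HookOutcome {
    Allow,
    Block(String),
}

/// Which hook subcommand is running (names follow the commit message).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HookKind {
    Check,
    Audit,
    MemoryInject,
}

/// Assumed result of the MCP call: Ok(true) = permitted, Ok(false) = denied,
/// Err = transport or server failure.
pub type McpResult = Result<bool, String>;

pub fn decide(kind: HookKind, result: McpResult) -> HookOutcome {
    match (kind, result) {
        // check: only an explicit "permitted" lets the tool call through.
        (HookKind::Check, Ok(true)) => HookOutcome::Allow,
        (HookKind::Check, Ok(false)) => HookOutcome::Block("denied by policy".to_string()),
        // Fail closed: an unreachable or failing server counts as a denial.
        (HookKind::Check, Err(e)) => HookOutcome::Block(format!("check unavailable: {}", e)),
        // audit and memory-inject never block, whatever the MCP call returned.
        (HookKind::Audit, _) | (HookKind::MemoryInject, _) => HookOutcome::Allow,
    }
}
```

The asymmetry is the point: an outage in the permission path stops the tool call, while an outage in the audit or memory path degrades the session without halting it.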

13 unit tests (6 hook + 7 wire). Smoke-tested end-to-end against the
in-memory MCP backend: Act 1 memory injection, Act 2 over-cap denial,
auto-audit, and the full wire apply->idempotent-rerun->check-only cycle.
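
The apply, idempotent re-run, and check-only cycle mentioned above can be illustrated with a single reconcile step. This is a sketch under assumptions: `StepStatus` and `reconcile` are hypothetical names, the "file" is modelled as an in-memory `Option<String>`, and reporting drift as `Fail` in check-only mode is a guess at how the real command surfaces it.

```rust
/// Per-step result, mirroring the ok/skip/fail wording of the commit message.
#[derive(Debug, PartialEq)]
pub enum StepStatus {
    Ok,
    Skip,
    Fail(String),
}

/// Bring `current` in line with `desired`.
/// `check_only` reports drift without writing (assumed to surface as Fail).
pub fn reconcile(current: &mut Option<String>, desired: &str, check_only: bool) -> StepStatus {
    // Already in the desired state: nothing to do, so re-runs are cheap.
    if current.as_deref() == Some(desired) {
        return StepStatus::Skip;
    }
    // Drift detected, but check-only mode must not modify anything.
    if check_only {
        return StepStatus::Fail("drift: content differs from desired".to_string());
    }
    *current = Some(desired.to_string());
    StepStatus::Ok
}
```

Running the step twice with `check_only = false` yields `Ok` and then `Skip`, which is the idempotence property the smoke test exercises.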
hanwencheng force-pushed the claude/docs-agent-iam-reset branch from 718362f to 0c6ce7f on May 28, 2026 09:00
hanwencheng merged commit e0318e3 into main on May 28, 2026
4 checks passed
hanwencheng added a commit that referenced this pull request May 31, 2026
* feat(cli): agentkeys wire + hook — IAM-guarantee hooks for Hermes

Phase 1.a of the fresh-user wire-onboarding plan. Turns the shipped MCP
tools (#107) into IAM guarantees the LLM cannot bypass, via Task-Host
lifecycle hooks (issue #133 track). Companion docs (strategy/arch/wiki/
plan) land in PR #140.

- `agentkeys hook check|audit|memory-inject` (src/hook.rs): thin MCP
  JSON-RPC clients invoked by the wire-generated hook scripts. Read the
  host stdin payload, call an AgentKeys MCP tool, emit host-shaped stdout
  JSON. `check` fails CLOSED; audit + memory-inject never block.
- `agentkeys wire <runtime>` (src/wire.rs): RuntimeAdapter trait +
  HermesAdapter. Detects Hermes, writes hook scripts to
  ~/.hermes/agent-hooks/, merges a sentinel-managed `hooks:` block into
  ~/.hermes/config.yaml (preserves other keys, refuses to clobber a
  foreign hooks:), sets hooks_auto_accept: true, verifies via
  `hermes hooks doctor`. Idempotent (ok/skip/fail per step); --check-only
  reports drift without writing.
- CLI wiring: Commands::Wire + Commands::Hook + HookAction in main.rs;
  pub mod hook/wire in lib.rs.
- Operator runbook (docs/operator-runbook-wire.md): the 7-step fresh-user
  flow + three-act demo verification — moved here from the docs PR since
  it documents these exact commands.

13 unit tests (6 hook + 7 wire). Smoke-tested end-to-end against the
in-memory MCP backend: Act 1 memory injection, Act 2 over-cap denial,
auto-audit, and the full wire apply->idempotent-rerun->check-only cycle.

* test(harness): phase1 wire end-to-end harness + session-bearer plumbing

Adds harness/phase1-wire-demo.sh — the idempotent end-to-end test for the
agentkeys wire + hook flow, reusing the setup-heima.sh account (master =
MacBook, agent = aiosandbox). Two modes: --light (in-memory MCP, fully
self-contained) and default (real broker + workers + Heima mainnet).
Agent-side steps run via the sandbox REST API (/v1/shell/exec,
/v1/file/upload); the aarch64-linux agent binary is cross-built in an
arm64 rust container. Manual gates: LLM key, real Touch ID at scope grant,
the Hermes surprise + confirm. ok/skip/fail per step; bash 3.2 portable.

Also closes the session-bearer gap the real-broker path needs (arch.md
§22b.4 "cap-mint daemon->broker auth: session JWT only"): the hook now
forwards X-AgentKeys-Session-Bearer (env AGENTKEYS_SESSION_BEARER), and
`agentkeys wire` bakes it into the generated hook scripts. The MCP server
already relays it to the broker cap-mint; in-memory backend ignores it.

Test plan + automation decisions: docs/spec/plans/phase1-wire-harness-test-plan.md.
CLAUDE.md: one-line note that harness/demo testing runs on Heima mainnet.

Validated: hook 6/6 + wire 7/7 unit tests green; --light Phase 0 + --skip
mechanism confirmed on bash 3.2. Full live in-sandbox run pending a
reachable rust registry (cross-build) + hermes in the sandbox.

* test(harness): auto-resolve operator_omni + session bearer in Phase 0

Phase 0 was failing for two reusable-account fields that don't need
operator input:

- operator_omni: read from the agent file (heima-agent-create.sh writes
  both actor_omni AND operator_omni), same as actor_omni.
- session bearer: read the JWT from the master session file
  ~/.agentkeys/<session-id>/session.json (.token), with a soft expiry
  warning (created_at + ttl_seconds < now) so a stale session is flagged
  before cap-mint 401s.

Both still accept --flag / env overrides. Verified: Phase 0 now passes
end-to-end in real mode against the reused heima account.
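
The soft expiry warning described above (created_at + ttl_seconds < now) amounts to the following check. This is a sketch only: the harness itself is a shell script, `SessionMeta` is a hypothetical struct, and both fields are assumed to be Unix seconds.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Fields assumed to be read from session.json (names as in the commit message).
pub struct SessionMeta {
    pub created_at: u64,
    pub ttl_seconds: u64,
}

/// True when the session has outlived its TTL: created_at + ttl_seconds < now.
pub fn is_expired(meta: &SessionMeta, now: u64) -> bool {
    // saturating_add avoids overflow on a corrupt or huge TTL.
    meta.created_at.saturating_add(meta.ttl_seconds) < now
}

/// Current Unix time in seconds (0 if the clock is before the epoch).
pub fn now_unix() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A caller would warn rather than abort when `is_expired(&meta, now_unix())` is true, since the check is meant to flag a stale session before cap-mint returns 401.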

* test(harness): use OPENROUTER_API_KEY env as the 0.6 LLM-key fallback

0.6 (LLM key) no longer always prompts: it falls back to OPENROUTER_API_KEY
/ LLM_API_KEY from the environment (export it in ~/.zshenv) and only prompts
when neither is set. The resolved key is used in Phase 4 to configure the
sandbox Hermes model (provider/base_url/default/api_key) so the surprise
chat works; if no key is available, Phase 4 is skipped cleanly.

Verified: Phase 0 now runs fully unattended in real mode (0.6 auto from
env, 0.7 from the master session). Never bakes the key into the repo.
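
The environment fallback for step 0.6 can be sketched as a lookup over candidate variable names. The harness does this in bash; the Rust below is only an illustration, `resolve_llm_key` is a hypothetical name, and the precedence (OPENROUTER_API_KEY before LLM_API_KEY) is an assumption.

```rust
/// Return the first non-empty key among the candidate variables, or None,
/// in which case the caller would prompt (or skip Phase 4).
/// `lookup` abstracts the environment so the function is easy to test.
pub fn resolve_llm_key<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumed precedence order; the commit message names both variables.
    const CANDIDATES: [&str; 2] = ["OPENROUTER_API_KEY", "LLM_API_KEY"];
    for name in CANDIDATES.iter() {
        if let Some(value) = lookup(*name) {
            // Treat an exported-but-empty variable as unset.
            if !value.trim().is_empty() {
                return Some(value);
            }
        }
    }
    None
}
```

Against the real environment this would be called as `resolve_llm_key(|name| std::env::var(name).ok())`.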

* test(harness): add --fast host mode + rewrite runbook harness-first

--fast: host-only inner loop (seconds) — in-memory MCP on the Mac + the
three hook acts directly. No sandbox, no aarch64 cross-build, no hermes,
no account. Verified: Act 1 memory, Act 2 deny/allow, audit all green.

operator-runbook-wire.md rewritten as the single "run the demo" doc:
harness-first (TL;DR one-command per mode), the three modes table
(--fast / --light / real), prerequisites per mode, the manual gates
(LLM key auto from OPENROUTER_API_KEY, Touch ID, the surprise), updated
troubleshooting (session refresh, RUST_BUILD_IMAGE), and the old manual
7-step flow kept as Appendix B.

* test(harness): remove --fast mode

Drop the host-only --fast path (run_fast() + flag + main branch + header).
Back to two modes: --light (sandbox, in-memory backend) and default (real
broker + workers + Heima mainnet). Runbook updated to match — --light is
the lighter inner loop.

* test(harness): cache cross-build + fix sandbox wiring end-to-end

Verifying the demo end-to-end surfaced a chain of bugs that silent
failures (|| true, multi-line ErrorObservation, existence-only checks)
had been masking. All fixed; a --light run is now fully green and the
Hermes surprise is memory-aware (references the Chengdu travel memory).

Build caching (re-runs fast + idempotent like a local cargo build):
- derived builder image (pkg-config + libssl-dev baked once; reqwest's
  default native-tls links libssl) instead of apt-get per build
- named docker volumes for the cargo registry + git cache (no crates.io
  re-fetch every run); target/ already bind-mounted
- source-aware rebuild gate (rebuild only when a tracked .rs/Cargo.toml/
  Cargo.lock is newer than the binary; else skip)
- restart the sandbox MCP server when a fresh binary is uploaded
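
The source-aware rebuild gate in the list above reduces to a modification-time comparison. The sketch below assumes the gate uses file mtimes and that a missing binary always triggers a build; `needs_rebuild` and `mtime` are hypothetical names, and the real gate lives in the bash harness.

```rust
use std::path::Path;
use std::time::SystemTime;

/// Rebuild when the binary is missing or any tracked source is newer than it.
pub fn needs_rebuild(binary_mtime: Option<SystemTime>, source_mtimes: &[SystemTime]) -> bool {
    match binary_mtime {
        // No binary yet: always build.
        None => true,
        // Otherwise build only if some source changed after the binary was produced.
        Some(bin) => source_mtimes.iter().any(|src| *src > bin),
    }
}

/// Modification time of a file, or None if it does not exist or is unreadable.
pub fn mtime(path: &Path) -> Option<SystemTime> {
    std::fs::metadata(path).and_then(|m| m.modified()).ok()
}
```

A caller would collect `mtime` for each tracked .rs, Cargo.toml, and Cargo.lock file and pass the results together with the binary's mtime.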

Sandbox wiring fixes:
- upload binaries to ~/.local/bin, not /usr/local/bin: the sandbox
  upload API runs non-root -> Errno 13 (resolve_sbx_paths resolves $HOME)
- runnability probe uses --help (the MCP server has no --version)
- MCP_PORT default 18088: 8088 collides with the sandbox gem-server
  (Address already in use)

Phase 4 (the surprise):
- write OPENROUTER_API_KEY to ~/.hermes/.env + set provider=openrouter
  (was provider=custom + never wrote the key); single-line commands
  (the sandbox rejects multi-line payloads); verified, not || true-masked
- wiring precheck fails loud when hooks/MCP are missing instead of
  printing open-chat instructions for a non-memory-aware chat
- non-fatal 4.1 model smoke surfaces 429/credential errors pre-surprise
- default LLM_MODEL=deepseek/deepseek-v4-flash (':free' is 429-throttled)

Runbook: troubleshooting rows for every failure above; env overrides for
the build-cache + LLM_MODEL/LLM_BASE_URL knobs.

* feat(wire): own the runtime hooks: key + audit memory ops + reproducible cross-build

agentkeys wire now OWNS the runtime's hooks: key. On (re)wire it REPLACES any
existing top-level hooks: block — whether that's our own block whose sentinel
comments a host re-serialization (`hermes config set`) dropped, or a hand-
authored one. The IAM guarantee requires the hooks be un-bypassable, and a YAML
config allows only one hooks: key, so coexistence isn't possible. Documented for
users in the new docs/user-manual.md (single home for user-facing behaviors),
linked from CLAUDE.md.

- wire.rs: strip_top_level_hooks() + merge_block now REPLACES (was: refused) a
  foreign/de-sentineled hooks: block; preserves other config keys; stays
  idempotent (sentineled re-runs skip). Unit test updated accordingly.
- mcp-server: audit-log every memory.get / memory.put (actor + namespace +
  bytes) at info — a server-side trail for memory reads/writes.
- harness robustness (three silent-failure traps found while iterating live):
  * pin the cross-build toolchain to the host's rustc — rust-toolchain.toml pins
    `channel = "stable"`, which FLOATS; a fresh container pulled 1.96 and broke
    clean builds of pre-release deps (crypto-common 0.2 / hybrid-array).
    Override via CROSS_RUST_TOOLCHAIN.
  * check the docker build EXIT CODE, not just file existence — a failed build
    no longer reports "ok" off a stale binary (warns + falls back, else fails).
  * check BOTH binaries for staleness — a stale mcp-server is no longer masked
    by an up-to-date cli.
- runbook: recovery row for "MCP died after a sandbox restart" (re-run Phases 0+1).
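
The replace-on-rewire behavior described for wire.rs can be approximated with a line-based pass over the config text. This is a simplified sketch, not the actual `strip_top_level_hooks` or `merge_block`: it assumes the children of `hooks:` are indented, it does not handle a sequence written at column 0, and it omits the sentinel-based skip path.

```rust
/// Remove any top-level `hooks:` key and its indented children.
pub fn strip_top_level_hooks(config: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut in_hooks = false;
    for line in config.lines() {
        // A top-level line is non-empty and starts at column 0.
        let top_level = !line.is_empty() && !line.starts_with(' ') && !line.starts_with('\t');
        if top_level {
            // `hooks_auto_accept:` does not match because of the colon.
            in_hooks = line.starts_with("hooks:");
        }
        if !in_hooks {
            kept.push(line);
        }
    }
    kept.join("\n")
}

/// Replace whatever hooks block exists with the managed one; other keys survive.
pub fn merge_block(config: &str, managed_block: &str) -> String {
    let mut merged = strip_top_level_hooks(config);
    if !merged.is_empty() && !merged.ends_with('\n') {
        merged.push('\n');
    }
    merged.push_str(managed_block);
    merged
}
```

Because the old block is always stripped before the managed one is appended, applying `merge_block` to its own output returns the same text.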

* fix(harness): make MCP step 1.4 mode/token-aware (idempotent restart)

Phase 1.4 reused ANY server answering :MCP_PORT/healthz, regardless of its
--backend or --vendor-tokens. So a leftover real-mode server (real broker,
harness-tok) from a default-mode run was reused for a --light run — the light
hook's demo-tok then hit it and every memory.get / permission.check 401'd
("bearer token not recognized"). This caused the recurring interactive-memory
flakiness.

Now 1.4 REUSES a running server only when its argv carries BOTH the intended
--backend AND --vendor-tokens; a mismatched / stale / leftover server is killed
and restarted with the correct config. Verified: a planted harness:STALE-TOK
server is detected and replaced with magiclick:demo-tok, after which all three
acts pass.
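
The reuse rule for step 1.4 (reuse only when the running server's argv carries both the intended backend and the intended vendor tokens) can be expressed as below. The harness implements this in bash; the Rust is only a restatement, `flag_value` and `can_reuse` are hypothetical names, and the `in-memory` backend value is an assumption.

```rust
/// Value following `flag` in an argv list, if present.
pub fn flag_value<'a>(argv: &'a [String], flag: &str) -> Option<&'a str> {
    argv.iter()
        .position(|arg| arg.as_str() == flag)
        .and_then(|i| argv.get(i + 1))
        .map(|s| s.as_str())
}

/// Reuse a running server only if BOTH flags match the intended config;
/// anything else should be killed and restarted.
pub fn can_reuse(argv: &[String], backend: &str, vendor_tokens: &str) -> bool {
    flag_value(argv, "--backend") == Some(backend)
        && flag_value(argv, "--vendor-tokens") == Some(vendor_tokens)
}
```

With the planted `harness:STALE-TOK` server from the verification above, `can_reuse` returns false for the intended `magiclick:demo-tok` config, so the server is replaced rather than reused.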

* fix(harness): stop mcp-server before re-uploading its binary (ETXTBSY)

Uploading a new agentkeys-mcp-server while the previous one is still running
failed with "Errno 26: Text file busy" — Linux refuses to overwrite a running
executable. 1.3 now pkills a live mcp-server before uploading its new binary
(1.4 restarts it after). Verified: a changed mcp-server now uploads cleanly,
the sandbox file matches the host, and the audit-logging build is live.

* feat(harness): respawn-loop MCP + log-append + explicit mode (no silent real)

Two robustness changes that close out the recurring demo flakiness.

1. Step 1.4 now starts the MCP server under a RESPAWN LOOP
   (`nohup bash -c 'while true; do <server>; echo [respawn]; sleep 1; done'`),
   so a crash self-heals without a harness re-run — the server kept dying under
   plain nohup, which is what left the wired hook hitting a dead :18088. The log
   uses >> (append) instead of > (truncate), so a restart no longer wipes the
   audit trail (that's why the memory log kept showing empty). Restart uses a
   double-pkill so the loop + any child it respawns mid-restart both go down.
   Verified: kill the server child → markers 1→2, healthz OK, token preserved.

2. Mode is now REQUIRED and explicit: `--light` OR `--real`, no silent default.
   Running the bare harness used to default to REAL mode, which flips the sandbox
   MCP to the live broker (harness-tok, no in-memory Chengdu fixture) and 401s the
   light-demo hook — the single biggest source of "no memory" confusion. The
   harness now errors with guidance if neither flag is given, and prints a loud
   MODE: banner so the active mode is never ambiguous. Added `--real`; --help +
   the runbook updated to {--light | --real}.
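
The "no silent default" rule for the mode flag looks roughly like this. It is a sketch: the harness parses flags in bash, `Mode` and `parse_mode` are hypothetical names, and rejecting both flags together is an assumption the commit message does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Light,
    Real,
}

/// Require exactly one of --light / --real; other flags are ignored here.
pub fn parse_mode(args: &[&str]) -> Result<Mode, String> {
    let light = args.contains(&"--light");
    let real = args.contains(&"--real");
    match (light, real) {
        (true, false) => Ok(Mode::Light),
        (false, true) => Ok(Mode::Real),
        // Assumed behavior: conflicting flags are an error, not a precedence rule.
        (true, true) => Err("pass only one of --light or --real".to_string()),
        // The bare invocation no longer falls through to real mode.
        (false, false) => Err("mode required: pass --light or --real".to_string()),
    }
}
```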

* docs(runbook): explicit --light vs --real comparison table

"light" was opaque (reads as "fewer features"). Replace the terse two-row table
with a full side-by-side: backend, memory data, the Chengdu fixture, broker/
chain, account, cap-mint, token, Touch ID, network, what it proves, cost/risk.
Lead with the one-liner: light = self-contained demo (pre-seeded, no external
deps); real = the live product wired to real infra.

* feat(cli+harness): agentkeys memory put + Mode-R memory seeding (step 1.5)

The --real demo had no Chengdu fixture (in-memory only), so the surprise found
nothing. Add a way to seed the real memory worker.

- New CLI: `agentkeys memory put --namespace <ns> --content <text>` (main.rs +
  hook::memory_put) — writes via agentkeys.memory.put, reusing the hook MCP
  client (env-configured actor/operator/token/session-bearer). Errors surface
  (Err) so a failed seed is loud. Verified end-to-end against a local in-memory
  MCP: put a new namespace → memory-inject reads it back; server logs the write.

- Harness step 1.5 (Mode R ONLY — in-memory auto-seeds): seeds the demo travel
  memory via `agentkeys memory put`. Idempotent (skips if the namespace already
  returns content), prompts before the write (auto with --yes), and fails loud
  with the heima-scope-set.sh --webauthn command if the cap-mint Store is
  rejected (memory scope not granted). The cap-mint uses the master session +
  the agent's device_key_hash (now resolved from the agent file and passed to
  the real-mode MCP via --default-device-key-hash). The authorizing Touch ID is
  the scope grant at 0.5. SEED_MEMORY_CONTENT overrides the default fixture.

- Runbook: the --real "Chengdu surprise" row now ✅ (1.5 seeds it) + a manual-
  gates entry describing the seed step and its scope/session prereqs.

* feat(harness): step 1.5 self-authorizes — WebAuthn grant then seed

Per request: 1.5 (Mode R) now grants the memory scope itself instead of just
failing-loud with instructions. Flow when the namespace is empty:
  1.5a  heima-scope-set.sh --webauthn --services memory  → real Touch ID
        (on-chain idempotent; only prompts because we gate on empty-memory, and
         SETS the full service list — override SEED_SCOPE_SERVICES to add more)
  1.5b  agentkeys memory put  → cap-mint Store → real memory worker
Still idempotent: if the namespace already returns content, the whole step
skips (no Touch ID). Detects a SKIPPED grant (K11 not webauthn-enrolled) and
fails loud with the `agentkeys k11 enroll --webauthn` command.

* docs(runbook): self-authorizing 1.5 seed + stale-binary trap

Match the runbook to the a6acef6 harness behavior:
- mode table: --real "Chengdu surprise" + "Touch ID" rows now say 1.5 SELF-
  grants the memory scope (heima-scope-set.sh --webauthn) then seeds — the
  Touch ID is at 1.5, not a separate 0.5 step.
- prerequisites: real-mode 1.5 seed needs the master's primary K11 enrolled in
  webauthn mode (else the grant skips); SEED_SCOPE_SERVICES sets the full list.
- manual gates: Touch ID is a hardware prompt --yes can't bypass; 1.5 is
  idempotent + self-authorizing (check → grant → seed, skip if populated).
- env overrides: SEED_MEMORY_CONTENT / SEED_SCOPE_SERVICES.
- troubleshooting: the stale-PATH-binary "unrecognized subcommand 'memory'"
  trap (rebuild + reinstall), 1.5a grant-skipped (k11 enroll), 1.5b seed fail.

* fix(harness): make --webauthn actually gate the 1.5 Touch ID grant

The --webauthn flag was vestigial — set but only shown in the log; 1.5
hardcoded heima-scope-set.sh --webauthn regardless, so `webauthn=false` in the
banner contradicted a Touch ID that would still fire. Now the flag means what
it says:

- 1.5 tries `agentkeys memory put` directly first — succeeds (no Touch ID) when
  the scope is already granted.
- If the put is scope-rejected: grant via real Touch ID ONLY when --webauthn
  was passed, then retry; without --webauthn, fail loud telling the operator to
  re-run with `--real --webauthn` (never triggers an unexpected Touch ID).
- This also avoids a wasted Touch ID when the scope already exists.

The REAL banner now warns when webauthn=false that 1.5 won't grant the scope.
Runbook: TL;DR shows `--real --webauthn`; mode table / Touch ID / seed gates
all state the grant is --webauthn-gated.
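
The gated flow for step 1.5 (put first, grant only with --webauthn, then retry) can be sketched with the two side effects passed in as closures. All names here (`PutResult`, `seed_memory`) are hypothetical, and the real flow is a bash harness calling `agentkeys memory put` and heima-scope-set.sh.

```rust
#[derive(Debug, PartialEq)]
pub enum PutResult {
    Stored,
    ScopeRejected,
}

/// Try the put; on a scope rejection, grant only if `webauthn` was requested.
pub fn seed_memory<P, G>(webauthn: bool, mut put: P, mut grant: G) -> Result<(), String>
where
    P: FnMut() -> PutResult,
    G: FnMut() -> Result<(), String>,
{
    // Scope already granted: no Touch ID is spent.
    if put() == PutResult::Stored {
        return Ok(());
    }
    // Never trigger an unexpected Touch ID prompt.
    if !webauthn {
        return Err("memory scope not granted; re-run with --real --webauthn".to_string());
    }
    grant()?;
    match put() {
        PutResult::Stored => Ok(()),
        PutResult::ScopeRejected => Err("put still rejected after scope grant".to_string()),
    }
}
```

Trying the put before the grant is what avoids the wasted Touch ID when the scope already exists.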

* fix(harness): drop bogus --session-id from 1.5 scope grant

heima-scope-set.sh has no --session-id flag (it signs via MASTER_KEY from the
env), so 1.5a failed with "unknown flag: --session-id". Removed it.

NOTE: real put/fetch is still blocked by infra, not the harness — the deployed
broker returns 404 for /v1/cap/memory-{get,put} and the audit worker 404s on
/v1/audit/append/v2 (the memory worker is fine, 422). Those routes need
redeploying (setup-broker-host.sh) before --real can be green.

* docs(CLAUDE): origin/evm deprecated → origin/main is the default + deploy branch

evm is frozen; all new work lands on the default branch main (feature branch →
PR → main). Rewrote the branch/deploy policy: the broker host deploys from
origin/main via `setup-broker-host.sh --ref main` (--upgrade is a back-compat
no-op; --ref drives the pull). Updated the land-the-fix policy's push target
evm → main. Left the EVM-the-VM references (pallet_evm / evm_version="london")
untouched — those are the Heima chain's EVM level, not the git branch.

* fix(harness): point real-mode cap-mint at the BROKER, not the signer

The cap-mint 404 was a harness config bug, not infra. The MCP server was
started with --broker-url ${BACKEND_URL}, and BACKEND_URL = $AGENTKEYS_SIGNER_URL
= signer.litentry.org — the dedicated SIGNER listener (:8092, key-derivation
only), which 404s /v1/cap/*. The cap-mint routes live on the BROKER, fronted at
OIDC_ISSUER = https://$BROKER_HOST = broker.litentry.org (verified live: cap
routes return 422 = exist; jwks 200).

- Resolve BROKER_URL = AGENTKEYS_BROKER_URL → OIDC_ISSUER (the broker), and use
  it for the MCP --broker-url AND the 0.2 broker-healthz check (was checking the
  signer's static /healthz stub).
- 1.4 reuse check now also matches --broker-url (real mode), so a server pointed
  at the wrong broker is replaced instead of silently reused. Empty in light
  mode (no-op grep).

Verified live after the operator's `setup-broker-host.sh --ref main`:
  broker.litentry.org/v1/cap/{cred-fetch,memory-put} → 422 (routes exist)
  audit.litentry.org/v1/audit/append/v2             → 422 (worker fixed)
  memory.litentry.org/v1/memory/{get,put}           → 422 (worker OK)

* feat(mcp): per-actor STS credential relay for worker S3 (issue #90)

The MCP http backend never forwarded per-actor STS creds, so the memory
worker fell back to the EC2 instance profile (SES-only, no S3) and every
memory put/get 502'd. Wire the relay end to end:

- mcp-server: http backend mints agent-tagged STS creds via the broker
  (mint-oidc-jwt -> AssumeRoleWithWebIdentity) and forwards them to the
  worker as X-Aws-* headers, AWS-scoping S3 to bots/<actor>/memory/.
  New config: --agent-session-bearer / --memory-role-arn /
  --vault-role-arn / --aws-region. Adds agentkeys-provisioner dep.
- harness: step 0.8 mints the agent session from agent_private_key
  (omni == actor_omni); 1.4 passes the relay args (real mode always
  restarts for a fresh bearer). Also 0.7 operator-session auto-mint
  (wallet_sig; fixes the expired/wrong-omni alice session), RUSTUP_VOL
  toolchain cache, sbx_exec --max-time, memory-inject </dev/null.
- cli: memory_inject no longer blocks on stdin.
- docs: operator-runbook-wire 502/0.7/0.8 rows; CLAUDE.md ssh-agentkeys.

Verified live: memory.put -> ok (s3_key bots/82a0.../memory/memory.enc),
memory.get -> content, hook memory-inject -> {context}; cross-actor S3
write -> AccessDenied (per-actor IAM isolation holds).
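
The per-actor S3 scoping verified above (writes outside bots/<actor>/memory/ are denied) is enforced by IAM, not by client code. The sketch below only illustrates the prefix rule; `memory_prefix` and `is_in_scope` are hypothetical helpers, and the actor IDs in any usage are made up.

```rust
/// Key prefix an actor's STS credentials are scoped to (layout from the commit message).
pub fn memory_prefix(actor: &str) -> String {
    format!("bots/{}/memory/", actor)
}

/// True when `s3_key` falls under the actor's own memory prefix.
pub fn is_in_scope(actor: &str, s3_key: &str) -> bool {
    // Reject empty or slash-containing actor IDs so the prefix cannot be widened.
    !actor.is_empty() && !actor.contains('/') && s3_key.starts_with(&memory_prefix(actor))
}
```

For a hypothetical actor `actor-a`, the key `bots/actor-a/memory/memory.enc` is in scope, while `bots/actor-b/memory/memory.enc` corresponds to the cross-actor write that returned AccessDenied.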

* feat(cli): agent device-session — in-sandbox keygen + wallet_sig (interim §10.2)

Fixes the "master bootstrap" violation: heima-agent-create.sh generated the
agent key on the operator laptop (cast wallet new) with a stub link code. The
agent's device key must be born on the agent machine. New keystone:

- `agentkeys agent device-session` (crates/agentkeys-cli/src/device_session.rs):
  generates/loads a secp256k1 device key IN THE SANDBOX (0600, never leaves),
  derives EVM address + actor_omni (sha256) + device_key_hash (keccak256), mints
  a broker session via wallet_sig SIWE, and emits {agent_address, actor_omni,
  device_key_hash, pop_sig, session_jwt} for the master to bind on-chain. EIP-191
  signing (k256 low-s, v∈{27,28}) matches the broker's ecrecover exactly. Shell-
  driven, no python / no new sandbox runtime. Adds k256+sha3 (already workspace
  deps). Verified live: fresh in-sandbox key → minted session omni == actor_omni;
  key present only in the sandbox.
- harness/phase1-wire-demo.sh: clean_slate step (1.2b) — kills orphaned hermes
  chats, removes stale fake-MCP launchers, clears Hermes session/state/native
  memory so each demo proves recall from the live worker (SKIP_CLEAN=1 opts out).
- CLAUDE.md: "Agent-side wire demo — REAL memory only" rule + ssh-agentkeys.

Full HDKD-literal §10.2 (broker link-code endpoints + daemon keygen + HDKD omni)
tracked in #144. Harness Phase P (orchestrate the ceremony + on-chain register +
scope grant) lands next.

* feat(harness): Phase P — fresh in-sandbox agent pairing (install + approve)

Wires the device-session keystone into the demo so every --real run does a real
arch.md §10.2-style pairing with the agent key born in the sandbox (interim;
full HDKD/broker ceremony = #144).

- heima-agent-create.sh: --from-pubkey mode (--agent-address/--actor-omni/
  --device-key-hash/--pop-sig) registers the SANDBOX-generated device without
  generating a key or funding on the master; writes a KEY-LESS metadata file so
  heima-scope-set can resolve the actor by label. Legacy self-gen path intact.
- phase1-wire-demo.sh Phase P (after 1.3, before 1.4): mint one-time link code →
  `agentkeys agent device-session --regen` in the sandbox → register device
  on-chain → heima-scope-set --webauthn (approve permissions, Touch ID) for the
  FRESH actor. Sets ACTOR_OMNI/AGENT_SESSION_BEARER/DEVICE_KEY_HASH from the
  pairing. Fresh omni each run → empty memory → seeded at 1.5 → recalled in Act 1.
- phase0: derive OPERATOR_OMNI from the master key (no agent-file dependency);
  0.4 defers actor_omni to Phase P; 0.8 skipped under fresh pairing (Phase P
  mints the agent session in-sandbox). --reuse-agent / AGENTKEYS_REUSE_AGENT=1
  restores the legacy master-side agent for fallback.

Verified (no Touch ID / no tx): device-session → from-pubkey --dry-run assembles
registerAgentDevice with the sandbox values + operator_omni from the master key;
metadata file is key-less. On-chain register + Touch ID scope grant are
operator-gated (run with --real --webauthn).

* docs(runbook): Phase P fresh-pairing walkthrough + how-to-run

Fold the §10.2 fresh-pairing flow into operator-runbook-wire.md:
- new "How to run — the --real --webauthn walkthrough" (step-by-step:
  start sandbox → run → Phase P install/pair → approve (Touch ID) → surprise).
- two-modes table, prerequisites, manual gates, flags, env, troubleshooting
  all updated: agent key is born in the sandbox (Phase P), Touch ID + on-chain
  register happen each run (fresh pairing), --reuse-agent for the legacy path.

* feat(harness): deterministic memory-injection check (hermes hooks test)

The chat "surprise" is a flaky success signal — an LLM may phrase a memory-aware
reply any way, treat a past-dated memory as "not this weekend", or DISOWN the
injected context as a hallucination (observed live). Add a deterministic,
no-inference check:

- Phase 4.2 (authoritative): `hermes hooks test pre_llm_call` fires the hook
  through Hermes' OWN config-wired dispatcher and asserts stdout carries a
  "context" block → proves the permissioned memory is injected into the LLM
  request. 4.2b: `hermes hooks doctor` (all 3 hooks exec + valid JSON).
- 4.3 (the chat surprise) demoted to an OPTIONAL live demo; note to run it WHILE
  the gate is open (Phase 5 teardown stops the MCP).
- runbook: "Verifying it worked — deterministically (no LLM inference)" section.

Validated in --light: 4.2 → {"context":"## Memory: travel\nChengdu …"} → ok.

* docs(runbook): clarify the deterministic check runs IN THE SANDBOX (standalone)

Make explicit: the harness (laptop) does setup; `hermes hooks test pre_llm_call`
is a standalone in-sandbox verify (docker exec … or the code-server terminal) —
no need to re-run the harness to re-check memory injection.

* docs(runbook): add the deterministic verify (hermes hooks test) to the TL;DR

Put `hermes hooks test pre_llm_call` front-and-center as a third TL;DR step —
the no-inference pass/fail for memory injection, alongside the run commands.

* fix(harness): 1.5 auto-seeds in fresh pairing (no redundant [y/N] after Touch ID)

Phase P (P.3) already approves the [memory] scope via Touch ID for the fresh
actor, and a brand-new actor's namespace is always empty — so step 1.5's legacy
"memory empty — seed? [y/N]" gate (and its own scope-grant/second-Touch-ID path)
was redundant friction right after the operator already approved. In fresh
pairing, 1.5 now seeds automatically with a single memory.put (succeeds because
P.3 granted the scope). The gated/scope-aware path is kept only for
--reuse-agent (legacy, where the agent wasn't freshly paired this run).

* fix: address Codex adversarial review (PR #141)

- [high] harness 1.3: a failed cross-build now FAILS (red) instead of skip, so
  the demo can't silently run STALE binaries and "pass" the deterministic 4.2
  verify. Explicit unsafe override: ALLOW_STALE_BINARY=1.
- [high] wire.rs: hook scripts 0o755 -> 0o700 + hooks dir 0o700. The scripts
  export the operator session bearer + vendor token; 0o700 closes the cross-user
  token-theft vector. (Same-user exposure is architectural — out-of-process
  custody tracked in #144.)
- [med] device_session.rs: enforce owner-only on an EXISTING key (reject
  symlink / non-regular / group-or-other perm bits), not just on create — so a
  copied/restored loose-perm key can't silently mint a session.
- [med] http_backend sts_headers: warn! on the legitimate no-relay downgrade
  (both absent) and ERROR on inconsistent config (exactly one of bearer/role-arn
  set) before calling the worker. Kept both-absent as a valid fallback
  (--reuse-agent / in-memory) rather than hard-failing.
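
The relay-config rule in the last finding (both absent is a valid downgrade, exactly one set is an error) is a four-way match. This is a sketch, not the http_backend code: `RelayDecision` and `relay_decision` are hypothetical names, and the real function also logs via warn! and error.

```rust
#[derive(Debug, PartialEq)]
pub enum RelayDecision {
    /// Both values present: mint and forward per-actor STS credentials.
    Relay,
    /// Both absent: legitimate no-relay fallback (caller should log a warning).
    Downgrade,
}

/// Validate the bearer / role-ARN pair before calling the worker.
pub fn relay_decision(bearer: Option<&str>, role_arn: Option<&str>) -> Result<RelayDecision, String> {
    match (bearer, role_arn) {
        (Some(_), Some(_)) => Ok(RelayDecision::Relay),
        (None, None) => Ok(RelayDecision::Downgrade),
        // Exactly one set is inconsistent config, not a fallback.
        (Some(_), None) => Err("agent session bearer set but role ARN missing".to_string()),
        (None, Some(_)) => Err("role ARN set but agent session bearer missing".to_string()),
    }
}
```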

Build-verified (cli + mcp-server). Findings #1/#2/#4 fixed; #3 partial by design.
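
The owner-only rule from the device_session.rs finding (reject symlinks, non-regular files, and any group or other permission bits) can be sketched with the standard library alone. The function names are hypothetical and this is not the code in device_session.rs; the file check is Unix-only.

```rust
use std::path::Path;

/// True when no group or other permission bits are set (e.g. 0o600, 0o700).
pub fn is_owner_only(mode: u32) -> bool {
    (mode & 0o077) == 0
}

/// Check an EXISTING key file before using it.
#[cfg(unix)]
pub fn check_key_file(path: &Path) -> Result<(), String> {
    use std::os::unix::fs::PermissionsExt;
    // symlink_metadata does not follow symlinks, so a symlink is seen as such.
    let meta = std::fs::symlink_metadata(path).map_err(|e| e.to_string())?;
    if !meta.file_type().is_file() {
        return Err("key path is not a regular file (symlinks are rejected)".to_string());
    }
    if !is_owner_only(meta.permissions().mode()) {
        return Err("key file has group or other permission bits set".to_string());
    }
    Ok(())
}
```

The same mask explains the 0o755 to 0o700 change for the hook scripts: 0o755 leaves read and execute bits set for group and other, which `is_owner_only` rejects.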

* style: cargo fmt + clippy fix for CI gate

rustfmt on the agentkeys-cli files touched this session (device_session.rs,
wire.rs) plus pre-existing drift in hook.rs/main.rs; and drop a redundant
explicit deref in wire.rs strip_top_level_hooks (clippy explicit_auto_deref).
Fixes the cargo-fmt CI job; clippy + build clean on cli and mcp-server.
hanwencheng added a commit that referenced this pull request May 31, 2026
Merged origin/main (#140 IAM strategy reset + #141 agentkeys wire/hook +
#137 audit-vector exporter + #138 CI hardening). The merge brought in the
Authority-Host / Task-Host model, which obsoletes the prior agent-onboarding
design (paste-a-pair-code into a remote sandbox). Redesigned the web-flow plan
to match.

What changed in the plan:

- stage3-agent-usage.md — full rewrite. Agent onboarding is now pair (agentkeys
  agent device-session — key born in the runtime, never on the master) → wire
  (agentkeys wire installs 3 IAM-guarantee hooks the LLM can't bypass:
  pre_tool_call→check, post_tool_call→audit, pre_llm_call→memory-inject) → the
  three acts (permissioned memory / deterministic denial / audit) + the
  memory-aware surprise (deterministically backed by `hermes hooks test
  pre_llm_call`, not a chat reply). Adds the hook-aware live dashboard +
  guarantee-health panel. Preserves the 16-step isolation health check.

- overview.md — new "two-host model" section up top (Authority Host vs Task
  Host; IAM tool vs IAM guarantee). Act-3 TODO reframed as "Phase 2 — the wire
  flow." Master onboarding (Phase 1) unchanged.

- data-model.md — replaced the bootstrap/* endpoints with the pair/wire/observe
  surface: /v1/agents/pair/{init,bind,approve-scope}, /v1/agents/:id/{wire,
  unwire,verify/memory-inject,guarantee-health}, hook-tagged /v1/audit/stream.
  Notes the per-actor STS relay config + MCP port 18088.

- input-discipline.md — §2.5: runtime choice + wire namespaces/payment-scope are
  Real inputs; the agent device key is born in the runtime (master never holds
  it); the runtime list reflects real adapter support, never faked.

- README.md — redesign banner + updated source-of-truth + file-map row.

- dev.sh — MCP_PORT default 8088 → 18088 (8088 collides with the sandbox
  gem-server, per #141); header comments aligned.

Plan only — no implementation. Master-onboarding docs (stage1/stage2) untouched.
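
The hook-to-subcommand mapping listed in the stage3 item above can be written as a lookup. The event names and subcommands come from the text; `subcommand_for` and `can_block` are hypothetical helpers for illustration.

```rust
/// Map a Task-Host lifecycle event to the `agentkeys hook` subcommand it runs.
pub fn subcommand_for(event: &str) -> Option<&'static str> {
    match event {
        "pre_tool_call" => Some("check"),
        "post_tool_call" => Some("audit"),
        "pre_llm_call" => Some("memory-inject"),
        _ => None,
    }
}

/// Only the check hook may block; audit and memory-inject never do.
pub fn can_block(event: &str) -> bool {
    subcommand_for(event) == Some("check")
}
```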