Phase 3: LLM-host hook integration (Claude Code, Codex/ChatGPT, etc.) #133

## Goal

Let AgentKeys participate in the LLM host's lifecycle through the host's native hook system, without modifying the host. This serves the Phase 3 / M3 multi-runtime story, in which Hermes, OpenClaw, Doubao, Claude Code, and Codex/ChatGPT all invoke the same AgentKeys MCP tools under their own runtime control.

Hooks complement the MCP tools we shipped in Phase 1 (`memory.get`, `permission.check`, `audit.append`, …). Tools are LLM-invoked on demand; hooks are *runtime*-invoked at lifecycle events. Hook integration runs in two directions, and both matter:

| Direction | Trigger | AgentKeys role |
|---|---|---|
| **Host → AgentKeys** (most useful) | Host's lifecycle hook fires (pre-tool, post-tool, stop, session-end) | AgentKeys tool gets called as the hook body |
| **AgentKeys → Host** (later) | `permission.check` denies, `cap.revoke` fires | Hook fires in the host to update its UI / refuse the user / clear context |
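The reverse direction has no defined payload yet. As a starting point for discussion, here is a minimal sketch of what an AgentKeys → Host event could carry. Every field name, the two event kinds, and the scope naming (`payment.spend`) are hypothetical; this is not a settled wire format.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional

# Hypothetical event kinds mirroring the table's trigger column.
EventKind = Literal["permission.denied", "cap.revoked"]
_KINDS = ("permission.denied", "cap.revoked")


@dataclass(frozen=True)
class ReverseHookEvent:
    """Hypothetical payload AgentKeys would push to a host hook."""
    kind: EventKind
    session_id: str
    scope: str                      # e.g. "payment.spend" (naming is an assumption)
    reason: str                     # human-readable text the host can show the user
    cap_id: Optional[str] = None    # only meaningful for cap.revoked
    issued_at: int = field(default_factory=lambda: int(time.time()))

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable for diffing or signing.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "ReverseHookEvent":
        data = json.loads(raw)
        # Reject unknown kinds up front so a host never acts on a typo.
        if data.get("kind") not in _KINDS:
            raise ValueError(f"unknown event kind: {data.get('kind')!r}")
        return ReverseHookEvent(**data)
```

A host-side hook would then branch on `kind`: refuse the user's pending request on `permission.denied`, and clear cached context or credentials on `cap.revoked`.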

## Why this matters

Today the LLM has to decide to call our tools. That works for memory queries but is wrong for guardrails — the LLM shouldn't be free to skip a `permission.check` before a payment, and shouldn't be free to skip `audit.append` after sensitive tool use. Hooks move those guarantees out of LLM discretion and into the runtime.

Concrete patterns this unlocks:

1. **Pre-payment gate.** A Claude Code `PreToolUse` hook matching any tool whose name contains `pay`, `order`, or `purchase` (Claude Code matchers are regexes, e.g. `.*(pay|order|purchase).*`, not globs) → call `agentkeys.permission.check` → block the tool call if verdict=deny. The LLM cannot bypass the check.
2. **Auto-audit.** `PostToolUse` hook → `agentkeys.audit.append` with the tool name, params hash, and result. Every tool use lands in the off-chain audit feed without LLM cooperation.
3. **Session summary.** `Stop` hook → `agentkeys.memory.put(namespace=profile, content=…)` to roll up what the user agreed to / learned / changed during the session.
4. **Cross-runtime parity.** Expose the same hook contract for Codex/ChatGPT, Cursor, and future agents. Each runtime's lifecycle vocabulary differs, but the AgentKeys tool surface stays the same.
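A minimal sketch of the hook bodies behind patterns 1 and 2, assuming Claude Code's command-hook convention: the event arrives as JSON on stdin with `tool_name` and `tool_input` (plus `tool_response` for `PostToolUse`), and exit code 2 from a `PreToolUse` hook blocks the call and feeds stderr back to the model. The `PermissionCheck` callable is a hypothetical stand-in for a client of `agentkeys.permission.check`, and the audit field names are illustrative.

```python
import hashlib
import json
import re
import sys
from typing import Callable, Tuple

# Substring match on the tool name, mirroring the pay/order/purchase pattern.
PAYMENT_TOOL = re.compile(r"pay|order|purchase")

# Hypothetical client signature: (tool_name, tool_input) -> "allow" | "deny".
PermissionCheck = Callable[[str, dict], str]


def pre_tool_use(event: dict, check: PermissionCheck) -> Tuple[int, str]:
    """Return (exit_code, stderr_message) for a PreToolUse hook."""
    tool = event.get("tool_name", "")
    if not PAYMENT_TOOL.search(tool):
        return 0, ""  # not a payment-like tool: let it run
    verdict = check(tool, event.get("tool_input", {}))
    # Fail closed: anything other than an explicit "allow" blocks the call.
    if verdict != "allow":
        return 2, f"AgentKeys permission.check returned {verdict!r} for {tool}"
    return 0, ""


def audit_record(event: dict) -> dict:
    """Build a PostToolUse audit entry: tool name, params hash, result."""
    # Canonical JSON so identical params hash identically regardless of key order.
    canonical = json.dumps(event.get("tool_input", {}), sort_keys=True, separators=(",", ":"))
    return {
        "tool": event.get("tool_name", ""),
        "params_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "result": event.get("tool_response"),
    }


def main(check: PermissionCheck) -> None:
    """Entry point when this script is registered as a PreToolUse command hook."""
    code, msg = pre_tool_use(json.load(sys.stdin), check)
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)
```

Failing closed is the deliberate choice here: if the AgentKeys server is unreachable or returns an unexpected verdict, a payment tool is blocked rather than silently allowed, which is the guarantee the hook is meant to provide over LLM discretion.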

## Phase 3 scope (proposed)

- [ ] **Reference hook configs** for Claude Code + Codex/ChatGPT. Ship as starter snippets in `docs/wiki/` showing PreToolUse / PostToolUse / Stop wired to AgentKeys MCP tools. Operator copies into their `~/.claude/settings.json` (Claude Code) or equivalent.
- [ ] **`agentkeys hook check`** CLI helper: wraps the host's hook stdin/stdout JSON convention. The operator just writes `command: 'agentkeys hook check --scope payment.spend'` in their settings; we handle the JSON parsing, the MCP call, and returning the block/allow response the host expects.
- [ ] **Cap-mint pre-warming for hook latency**: a hook adds latency, including p99 tail latency, to every tool call it matches. Pre-mint a short-TTL cap at session start so the per-call check is sub-50ms.
- [ ] **One e2e demo per runtime** — Claude Code, Codex/ChatGPT, xiaozhi (already partially covered by Phase 1) all running the same three-act storyboard via hooks instead of LLM-invoked tools. Demonstrates 'pick your runtime, AgentKeys behavior is identical.'
- [ ] **Reverse direction stub**: define the JSON payload the host hook receives when our server initiates a denial or revocation. Implementation deferred to M4.
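A sketch of two items from this scope, under stated assumptions. `claude_settings_snippet` builds a starter `hooks` block using Claude Code's settings nesting (event name → list of `matcher` + `hooks` entries, each with `"type": "command"`); the matcher regex is illustrative. `CapCache` shows one way to pre-warm caps per scope; `mint` is a hypothetical stand-in for the AgentKeys cap-mint call, and nothing here measures or guarantees the sub-50ms target.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


def claude_settings_snippet(scope: str = "payment.spend") -> dict:
    """Starter `hooks` block for ~/.claude/settings.json (PreToolUse only)."""
    return {
        "hooks": {
            "PreToolUse": [
                {
                    "matcher": ".*(pay|order|purchase).*",  # illustrative regex
                    "hooks": [
                        {"type": "command",
                         "command": f"agentkeys hook check --scope {scope}"}
                    ],
                }
            ]
        }
    }


@dataclass
class Cap:
    token: str          # opaque capability token (format is an assumption)
    expires_at: float   # clock value after which the cap is no longer valid


class CapCache:
    """Pre-mint short-TTL caps at session start; reuse them per tool call."""

    def __init__(self, mint: Callable[[str], Cap], skew_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._mint = mint      # hypothetical AgentKeys cap-mint client
        self._skew = skew_s    # re-mint this many seconds before expiry
        self._clock = clock
        self._caps: Dict[str, Cap] = {}

    def prewarm(self, scopes: List[str]) -> None:
        # Intended to run from a session-start hook so the first check is warm.
        for scope in scopes:
            self._caps[scope] = self._mint(scope)

    def get(self, scope: str) -> Cap:
        cap = self._caps.get(scope)
        # Cache hit unless the cap is missing or within `skew_s` of expiry.
        if cap is None or cap.expires_at - self._skew <= self._clock():
            cap = self._mint(scope)
            self._caps[scope] = cap
        return cap
```

One design consequence worth noting: Claude Code starts a fresh process for each command hook, so an in-memory cache like this only pays off if `agentkeys hook check` talks to a long-lived local process (or persists caps somewhere) rather than minting inside each short-lived invocation.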

## Out of scope (defer to M4)

- Full delegation-chain hooks (parent-agent → child-agent lifecycle binding)
- Real-time UI push when a hook denies (parent app notification)
- Cross-host hook portability spec (a vendor-neutral hook standard)

## References

- Claude Code hooks docs: `claude --help` → settings → hooks (PreToolUse, PostToolUse, Stop, etc.)
- Codex hook system: equivalent lifecycle events
- [`docs/spec/plans/milestones-roadmap.md`](docs/spec/plans/milestones-roadmap.md) §M3 — multi-runtime parity is the umbrella goal
- AgentKeys MCP tools shipped in Phase 1 (#107) — the surface the hooks call
