M1 foundation: strategy + roadmap + research docs + 20 refined issues by hanwencheng · Pull Request #130 · litentry/agentKeys

hanwencheng · 2026-05-24T13:44:41Z

Summary

This PR lands the strategic + roadmap + planning foundation for M1 work to begin. It absorbs ~3 sessions of business research, architecture corrections, PM automation tuning, and per-issue refinement into a coherent set of documents + issue bodies that the next developer/agent picks up cleanly.

The branch is 17 commits ahead; 5 were already merged via PR #127 + #128 (squashed onto main; their original commits still appear in git log here but produce no diff). The remaining ~12 commits are the new work below.

What lands

1. Strategic anchor — Agent IAM positioning

docs/research/agent-iam-strategy.md — full strategic positioning: Agent IAM as the category (Identity / Memory / Permissions / Capability tokens / Audit / Delegation / Revocation), Task Host vs Authority Host distinction, dual narrative (B2B vs consumer), 4 architecture corrections (bounded revocation, two-tier audit, delegation as preview-only, dual narrative), memory namespace model, 12-month roadmap, 7 strategic risks
4 architecture corrections from external critique already absorbed in the doc (immediate-online + bounded-offline revocation; real-time off-chain + 2-min on-chain batched audit; delegation as schema/preview only in v1; dual-narrative B2B + consumer separation)

2. Milestone roadmap — operational source of truth

docs/spec/plans/milestones-roadmap.md — M1-M7 detailed scope + post-M7 horizons + strategic risks table + how-to-use instructions. Replaces the archived v1/v2 stage plan. Companion to arch.md (architecture) and agent-iam-strategy.md (positioning).

3. Hardware research — MagicLick / xiaozhi-server / Volcano Ark / Tuya

docs/research/xiaozhi-esp32-magiclink.md — hardware identification (MagicLick 2.5 = ESP32-S3 + ES8311 + 128×128 GC9107 + WiFi/4G) and Option-1 (use xiaozhi-server) vs Option-2 (rewrite) decision
docs/research/xiaozhi-hermes-architecture.md — three ASCII diagrams: baseline xiaozhi / pivoted flow / per-turn sequence
docs/research/xiaozhi-hermes-risks.md — R1-R4 risk verification grounded in actual repo code with file:line citations
docs/research/volcano-ark-mcp-integration.md — Phase 3a integration architecture (Pattern B: hosted MCP server) + AgentKeys MCP tool inventory
docs/research/tuya-vs-xiaozhi.md — Tuya = different role (PaaS for brand-owners) vs xiaozhi (open firmware for makers); 17× star delta justifies xiaozhi-first; Tuya is M2 complement

4. Business research — wedge thesis

docs/research/ai-hardware-companion-wedge.md — full wedge analysis (FoloToy / Ropet / BubblePal as priority vendors; pricing structure; competitive landscape; Alipay vs Stripe rationale)
docs/research/ai-hardware-companion-office-hours.md — YC office-hours diagnostic with six forcing questions; kill criterion (0 paid pilots from 3 vendors in 6 months → pivot to MCP credential broker)
docs/research/ai-memory-systems-survey.md — competitive memory-layer survey

5. Demo plan + ESP32 firmware foundation

docs/spec/plans/issue-103-aiosandbox-hermes-esp32-demo.md: v0 demo plan (note: superseded by the xiaozhi-server pivot; #107-#112 are the M1 cohort that actually ships)
firmware/esp32s3-agentkeys/ — 17 files scaffolded (reference only; MagicLick uses xiaozhi-esp32 firmware instead)

6. Memory design

docs/plan/agentkeys-memory-design.md: 4-type memory taxonomy (profile/procedural/semantic/episodic) that composes with the namespace model in #108

7. Docs reorganization

Archived to docs/archived/:

docs/stage7-demo-and-verification.md (123KB, supplanted by scripts/setup-broker-host.sh + v2 demo orchestrators)
docs/operator-runbook-stage7.md (39KB, same)
docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
docs/spec/plans/development-stages.md (the 8-stage v1/v2 plan, replaced by milestones-roadmap.md)

Cross-refs updated in 9 active docs: arch.md, dev-setup.md, CLAUDE.md, v2-stage1-migration-and-demo.md, spec/threat-model-key-custody.md, spec/ses-email-architecture.md, spec/credential-backend-interface.md, spec/heima-gaps-vs-desired-architecture.md, wiki/upstream-backend-classes-exercise-vs-distribution.md.

8. PM automation simplification

Removed pm-workflow-audit.yml, pm-sync-fields-from-labels.yml, expected-workflows.json, check-workflows.sh, sync-fields-from-labels.sh (labels migrated to fields; sync no longer needed)
Kept pm-auto-archive-closed-pr.yml (auto-archive PRs on close)
Added pm/scripts/sync-size-from-effort.sh (one-shot Size population from issue body Effort estimates)
Deleted the unused "Blocked by" TEXT project field (use GitHub native issue relationships instead)
Updated ~/.claude/skills/agentkeys-issue-create/SKILL.md to use native GitHub relationships + set Kind/Priority/Size project fields directly via API

9. Issue refinement — 20 issues (#107-#126)

All 20 new issues created in earlier PM work refined with the standard template:

Context — why this matters now in the milestone plan
Scope — specific deliverables for the milestone
Out of scope — explicit deferrals to later milestones
Acceptance criteria — testable checkboxes
Risks + mitigations — what could go wrong + how we handle it
References — links to milestones-roadmap.md, arch.md, relevant research docs
Effort — day-level sequencing
Pickup notes — what to read first, where the code lives, what to watch for, when to use the /agentkeys-issue-create skill

Body sizes 5KB-8KB per issue (~130KB total of context information). Coverage:

Milestone	Issues
M1	#107 MCP server · #108 namespace · #109 audit · #110 parent UI · #111 demo runbook + pitch · #112 Volcano Ark
M2	#113 vendor portal · #114 Tuya connector · #115 audit dashboard · #116 FoloToy outreach · #126 consumer brand
M3	#117 Hermes-MCP · #118 OpenClaw-MCP · #119 Python SDK · #120 TypeScript SDK
M4	#121 delegation chains · #122 approval workflow · #123 policy versioning
M7	#124 MCP extensions · #125 OAuth-for-Agents

10. Project board state

All 38 open issues now have Kind + Priority + Size project field values populated (live, not just in this PR). The Phase + Estimate fields are deprecated; operator can delete via UI when ready.

Test plan

This PR doesn't ship any executable code — it lands strategy + research + plans + issue refinement. Verification is doc + state-level:

Verify docs/spec/plans/milestones-roadmap.md renders cleanly on GitHub (table of contents, all links resolve, no broken anchors)
Open 3 random issues from #107-#126 and confirm the body has all 8 template sections (Context / Scope / Out of scope / Acceptance / Risks / References / Effort / Pickup notes)
Confirm archived stage docs are at docs/archived/*_2026-04.md and not on active paths
Confirm no stale operator-runbook-stage7.md / stage7-demo-and-verification.md / stage8-wip.md refs in docs/arch.md, docs/dev-setup.md, CLAUDE.md
Confirm ~/.claude/skills/agentkeys-issue-create/SKILL.md is registered and invokable via /agentkeys-issue-create
(Optional) Visit litentry/projects/19 and confirm all open issues have populated Kind / Priority / Size fields

What's NOT in this PR

Executable M1 code (the actual MCP server, namespace plumbing, parent UI) — those are the M1 issues themselves
Native mobile apps (M5)
Standards-body engagement (M7 — #124 and #125 document the path but the work doesn't start until precondition is met)

Migration notes for the next agent / developer

After this PR merges, the next agent picking up M1 work should:

Read docs/spec/plans/milestones-roadmap.md §2 (M1 scope) — operational source of truth
Read docs/research/agent-iam-strategy.md §4 (Phase 1 storyboard) — strategic anchor
Read the refined body of #107 (AgentKeys MCP server) — that's THE critical-path M1 issue everything else plumbs through
Use the /agentkeys-issue-create skill for any follow-up issues

Add two business research artifacts under docs/research/:
- ai-hardware-companion-wedge.md (round 1+2): market sizing, competitive landscape, direct competitors, business model critique, 12 critical comments, naming, Stripe ACP / Alipay+ AMP integration path, WeChat feasibility, security-first demo storyboard.
- ai-hardware-companion-office-hours.md: YC-style office-hours diagnostic on the same wedge. Six forcing questions surfaced zero vendor conversations + no named buyer. P2 narrowed mid-session to memory portability + isolation + privacy. Approach D chosen: AgentKeys-native hosted sandbox (aiosandbox) with OpenClaw/Hermes agent runtime + per-actor isolation (issue #90) + cross-vendor memory consent model. Pricing pivoted to AWS-style elastic per-user (Free / Basic vendor-paid $2-3/active-device / Pro $10 user-paid with 30% lifetime acquirer revshare / future Compute usage-based). 8/10 quality after 2 spec-review iterations.

Both index entries added to docs/research/README.md.

End-to-end demo plan for the AgentKeys hardware-vendor wedge: ESP32 device + simple URL config → agent-infra/sandbox running Hermes (AgentKeys-native runtime) + agentkeys-daemon with mock memory injected from S3 MD blob at agent boot. 12-step implementation order. Reuses arch.md canonical primitives (sandbox runtime, supervisord lifecycle, memory bucket layout bots/<actor_omni>/memory/, agentkeys-daemon). v0 scope: single ESP32, single sandbox, single mock memory blob, text-mode chat. Voice mode, multi-tenancy, cap-token enforcement, cross-vendor portability, and payment rails are deferred to follow-up issues. 3-week effort estimate. Acceptance: reviewer can flash board + run setup script + see personalized response within 15 minutes.

Pivot canonical demo target from generic ESP32 to ESP32-S3-DevKitC-1:
- Native USB-OTG (single USB-C, no separate UART chip)
- PSRAM (8MB octal) for voice follow-up audio buffers
- Xtensa LX7 with AI vector instructions for on-device wake-word
- Still MCU-class authenticity (~$10-15 dev board, <$5 chip in BOM volume)

Stack: PlatformIO + ESP-IDF (not Arduino). Production AI-toy vendors use ESP-IDF, and S3-specific features (native USB CDC, PSRAM, ESP-DSP, secure boot, OTA) need IDF.

Scaffolded firmware foundation under firmware/esp32s3-agentkeys/:
- platformio.ini, CMakeLists.txt, sdkconfig.defaults, partitions.csv
- main.c spawns 4 FreeRTOS tasks (wifi/button/chat/led) coordinated via event group + queue
- wifi_sta.c: working STA mode + auto-reconnect
- button.c: working GPIO interrupt + 200ms debounce on BOOT (GPIO 0)
- led_status.c: stub blinker (real WS2812 RGB state machine is TODO)
- https_chat.c: stub echoing user input (real esp_http_client POST is TODO)
- config.h: NVS → secrets.h → hardcoded defaults priority order
- README.md: flash quickstart + troubleshooting

Foundation builds + flashes + boots into FreeRTOS loop today; chat returns mock '[mock] you said: ...' echo. Real HTTPS POST is the clear next step (esp_http_client + cJSON parse, ~100 lines).

Renamed plan file issue-102 → issue-103 to match actual issue number.

…on 1

Hardware on hand confirmed via the device display showing 'magiclink 2p5/1.9.4': MagicLick 2.5 running xiaozhi-esp32 v1.9.4 firmware.

xiaozhi-esp32 (github.com/78/xiaozhi-esp32, MIT, 26K stars) is the dominant Chinese open-source AI voice firmware for ESP32. Supports 70+ boards including ours. Full streaming voice pipeline already shipping: offline wake-word (ESP-SR) → ASR → LLM → TTS → OPUS over WebSocket or MQTT+UDP. MCP-based device + cloud control.

MagicLick 2.5 hardware specs reconstructed from boards/magiclick-2p5/config.h + board.cc:
- ESP32-S3 chip
- ES8311 audio codec (full-duplex I2S, 24kHz)
- 128x128 GC9107 SPI LCD with emoji rendering
- 3 buttons (main GPIO 21, left GPIO 0, right GPIO 47)
- 2 WS2812 LEDs on GPIO 38
- DualNetworkBoard: WiFi primary + ML307 Cat.1 4G fallback
- Battery + power manager with tickless idle

'Hermes agent' clarified to mean NousResearch/hermes-agent (MIT, Python, self-improving learning loop, multi-interface gateway, LLM-agnostic). NOT an internal AgentKeys runtime as the original plan §C4 mistakenly stated.

Strong recommendation: Option 1. Keep xiaozhi firmware unchanged, build cloud-side xiaozhi-hermes-bridge that speaks the xiaozhi WebSocket protocol while routing the agent loop to Hermes-agent (which pulls memory from agentkeys-daemon per §C3). Reduces v0 effort from ~3 months (custom firmware) to ~2-3 weeks (server-side adapter only). Forks from one of four existing reference server implementations (Python xinnan-tech, Go hackers365 with openclaw, Java joey-zhou, Go AnimeAIChat).

Hardware verification: 5 paths documented (visual / ROM bootloader via boot button hold / WiFi captive portal / vendor app / disassembly). USB doesn't enumerate by default because device is in normal firmware mode; hold LEFT button while connecting USB to drop into ESP32-S3 ROM bootloader for esptool access.

Added PIVOT banner at top of issue-103 plan flagging that C4/C5/C6 are superseded. Full new direction in docs/research/xiaozhi-esp32-magiclink.md. firmware/esp32s3-agentkeys/ stays in tree as reference scaffolding for future custom hardware (new product lines that need first-party firmware), not the path for the MagicLick demo.

Two new research docs supporting the issue #103 Option 1 direction:

docs/research/xiaozhi-hermes-architecture.md
Permanent architecture reference with three ASCII diagrams:
- Diagram A: baseline xiaozhi flow (device → cloud → LLM)
- Diagram B: our pivoted flow with changed layers highlighted (UNCHANGED firmware, NEW URL only on device side, fork + one-module-rewrite on cloud side, new memory layer)
- Diagram C: per-turn sequence with latency budget breakdown (~2.0-2.5s first-audio; ~+250-500ms delta vs baseline)

Precise diff table: 13 layers compared, only 4 actually change, 3 of those are NEW additions (not modifications). The actual code change is concentrated in ONE module of the bridge fork.

docs/research/xiaozhi-hermes-risks.md
Risk verification grounded in actual Hermes-agent + xinnan-tech/xiaozhi-esp32-server source code, NOT assumptions. Specific file paths + line numbers cited throughout.

R1 (Hermes HTTP gateway stateless-vs-session): REAL but mitigation is built-in. Gateway exposes /v1/chat/completions with three session modes (stateless per-call default, explicit continuation via X-Hermes-Session-Id, long-term memory scoping via X-Hermes-Session-Key). Bridge sets per-device session keys. Effort: 2-4 hours.

R2 (Latency stack): mostly NOT real. agent/conversation_loop.py line 4152 confirms learning loop runs as background task AFTER response delivery, OFF the turn path. With enabled_toolsets=[] + max_iterations=1 + streaming SSE, overhead is ~50-200ms. xiaozhi-performance-research baselines:
- ASR: 0.795s Xunfei / 0.85s Doubao
- LLM first-token: 0.434s Qwen-Flash / 0.774s Kimi-K2
- TTS: 0.488s CosyVoice / 0.667s Edge-TTS / 0.103s PaddleSpeech
Pipelined: 1.4-2.4s first-audio, within 2.0-2.5s target. Effort: 1 day (tune + measure).

R3 (Concurrent device handling): less bad than feared. Hermes gateway IS multi-tenant by design (serves Telegram + Discord + Slack + WhatsApp + Signal + CLI from one process). Per-request memory ~20-80MB; 100 devices ~2-8GB on one VPS. xiaozhi-esp32-server's documented '100+ devices per process' claim is unverified in repo; only a 6-concurrent demo is documented. For v0: 0 hours. For production scale: 1-2 weeks sticky-LB.

R4 (newly discovered during research): cold agent construction per request adds 50-300ms on every turn. _create_agent() called inside _handle_chat_completions for EVERY request, no pooling. Most impactful for voice UX (compounds turn-by-turn). Mitigation: fork-local agent pool (1 day) or upstream patch (2-4 days).

Net effect: v0 timeline revised from ~3 weeks to ~1-2 weeks.

Updated docs/research/README.md to index both new docs.
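The R2 latency arithmetic can be checked with a short sketch. It sums the quoted per-stage baselines plus the ~50-200ms gateway overhead, assuming the stages run strictly in sequence (an assumption; the real pipeline may overlap them) and excluding the R4 cold-construction cost. The function name is hypothetical.

```python
# Stage latencies in seconds, copied from the baselines quoted above.
ASR = {"Xunfei": 0.795, "Doubao": 0.85}
LLM_FIRST_TOKEN = {"Qwen-Flash": 0.434, "Kimi-K2": 0.774}
TTS = {"CosyVoice": 0.488, "Edge-TTS": 0.667, "PaddleSpeech": 0.103}
HERMES_OVERHEAD = (0.050, 0.200)  # ~50-200ms gateway overhead from R2


def first_audio_range():
    """Return (best, worst) first-audio latency in seconds.

    Assumption: stages are summed sequentially; R4 cold agent
    construction (50-300ms) is deliberately left out.
    """
    best = (min(ASR.values()) + min(LLM_FIRST_TOKEN.values())
            + min(TTS.values()) + HERMES_OVERHEAD[0])
    worst = (max(ASR.values()) + max(LLM_FIRST_TOKEN.values())
             + max(TTS.values()) + HERMES_OVERHEAD[1])
    return round(best, 3), round(worst, 3)


if __name__ == "__main__":
    best, worst = first_audio_range()
    print(f"first-audio: {best:.3f}s to {worst:.3f}s")
```

Under these assumptions the range comes out at roughly 1.38s to 2.49s, close to the quoted 1.4-2.4s and still inside the 2.5s upper target. Adding the R4 cost on top would push the slowest provider combination past it, which is consistent with R4 being flagged as the most impactful risk.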

Four updates following the risk-verification research:

1. docs/research/tuya-vs-xiaozhi.md (new)
Answers 'is Tuya the same role as xiaozhi?': DIFFERENT role, partial firmware overlap. Tuya = closed PaaS for brand-owners (NYSE: TUYA, $80.9M Q1 2026 revenue, 306 premium customers, 1.97M developers, 100+ countries). xiaozhi = open firmware for makers (MIT, 26.7K stars). TuyaOpen is a 1.6K-star defensive ESP32 SDK from Jan 2026, a 17x adoption gap. AgentKeys posture: complement both, never compete.
- Phase 1 (now): xiaozhi cloud-side bridge (issue #103)
- Phase 2 (3-6 mo): Tuya Cloud Development connector
- Sit above both rails (same pattern as Alipay+ AMP / Stripe ACP)

2. v0 demo timeline revised from ~3 weeks to ~1-2 weeks in issue-103-aiosandbox-hermes-esp32-demo.md:
- PIVOT banner at top of plan
- Effort estimate section (line 441)
The basis is xiaozhi-hermes-risks.md showing all four risks are smaller than originally feared (R1 built-in mitigation, R2 background loop, R3 multi-tenant by design, R4 cheap fork-local hack).

3. Fixed false cross-reference in xiaozhi-hermes-risks.md
The 'unverified 100+ devices' claim was incorrectly attributed to the office-hours doc. It actually circulated in earlier informal discussion, not in any committed doc. Reworded to remove the false attribution.

4. Added implementation update banner to office-hours doc pointing readers at the four xiaozhi research docs + the revised v0 timeline. The §Recommended Approach / Pricing / Cross-Vendor Memory Model below stay unchanged; only the firmware-and-runtime layer shifted.

…form

Earlier version of tuya-vs-xiaozhi.md claimed Phase 3 would add adapters for Xiaomi MIoT, Alibaba Smart Home, and Volcano AI Hub without verifying each platform's third-party developer surface. Research findings per platform:

Volcano Ark (ByteDance): VERIFIED FEASIBLE
- Open international developer signup, no PRC entity / ICP needed
- MCP-server marketplace launched 2026 (mcp.so/server/mcp-server/volcengine)
- AgentKeys publishes an MCP tool any Doubao-powered AI hardware can call
- Genuinely Tuya-equivalent for the AI-side rather than IoT-side
- ~1 week effort

AliGenie / Tmall Genie (Alibaba): FEASIBLE WITH PARTNERSHIP
- International Alibaba Cloud account works for sandbox + custom-skill webhook
- Production distribution onto Tmall Genie hardware requires Alibaba's skill review + de-facto PRC-domiciled brand
- ~1 week dev + partnership lead time

Xiaomi MIoT / XiaoAI: WEAKEST
- Brand-tier integration requires Mi Ecosystem partnership admission
- Publishable XiaoAI skills require PRC real-name verification
- Consumer-OAuth path (Home-Assistant-style) works today for foreign servers but is a narrower wedge than brand-tier
- Defer until partnership or scope to consumer-OAuth only

Rewrote Phase 3 section to split into 3a (Volcano open), 3b (AliGenie with partner), 3c (Xiaomi deferred). Added explicit 'Honest note on Phase 3 verification' acknowledging the original claim was hand-wavy. Added 15 source URLs to the Sources block.

New research doc with three ASCII diagrams showing how AgentKeys integrates with Volcano Ark (ByteDance's enterprise AI cloud hosting Doubao LLM) as a Phase 3a hosted MCP server registered in their 2026 MCP marketplace.

Pattern B (hosted by us, marketplace is discovery only):
- AgentKeys MCP server at mcp.agentkeys.io exposes 5-7 tools (memory get/put, cred fetch, cap mint, audit append, whoami, permission check) mapped to existing Stage 7+ backend RPCs
- Vendor Doubao agents call our MCP tools via HTTPS/SSE with per-vendor Bearer token + per-actor X-AgentKeys-Actor header
- No vendor firmware changes and no Doubao runtime changes, just marketplace registration + one-checkbox vendor opt-in

Diagram A: high-level architecture (device → RTC → Doubao → MCP → AgentKeys MCP server → backend)

Diagram B: per-call MCP tool sequence with ~200-400ms per-call latency budget (concern noted: multiple tool calls per turn can stack; mitigation via batched 'context.bootstrap' tool)

Diagram C: cross-vendor composition showing same user (O_kevin) with FoloToy (Doubao + MCP adapter) AND MagicLick (xiaozhi + Hermes bridge) both terminating at one AgentKeys backend with one memory namespace + one identity tree + one audit ledger. This is the cross-vendor portability moat materializing automatically per office-hours doc §Cross-Vendor Memory Model.

Effort: ~1-1.5 weeks (sibling to xiaozhi-hermes-bridge).

6 open risks called out + mitigations sketched:
- MCP latency stacking per turn
- Marketplace approval SLA
- Per-tenant auth model TBD
- Actor omni resolution pattern (vendor-side vs whoami call)
- MCP protocol version compat with Doubao runtime
- Cross-vendor cap-token consent (resolved: same office-hours consent ceremony applies)

Updated docs/research/README.md to index the new doc.
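A minimal sketch of what a Pattern B tool call could look like on the wire, using the two auth headers named above. The request body follows the JSON-RPC 2.0 `tools/call` shape that MCP uses. The `/mcp` path, the `memory.get` tool name, the argument shape, and the `build_tool_call` helper are all hypothetical; the research doc lists the per-tenant auth model as still TBD. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_tool_call(base_url, vendor_token, actor_omni, tool, arguments,
                    request_id=1):
    """Build (but do not send) an MCP tools/call request.

    Header names come from the commit message; the tool name and
    argument shape passed in by callers are illustrative assumptions.
    """
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {vendor_token}",  # per-vendor token
            "X-AgentKeys-Actor": actor_omni,            # per-actor identity
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tool_call(
        "https://mcp.agentkeys.io/mcp",  # path is a placeholder
        vendor_token="vendor-token-placeholder",
        actor_omni="O_kevin",
        tool="memory.get",               # hypothetical tool name
        arguments={"namespace": "family"},
    )
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
```

Keeping the vendor credential and the actor identity in separate headers means one vendor token can serve many actors, which matches the per-vendor / per-actor split described in Pattern B.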

New strategic anchor doc at docs/research/agent-iam-strategy.md captures the revised direction from multi-round discussion (original Agent IAM proposal → independent analysis → ChatGPT critique → synthesis).

Three-layer positioning, three audiences:
- AI Device Account (consumer/vendor BD pitch)
- Agent IAM (B2B/investor/CTO category)
- Trust Substrate (compliance/regulator/Web3 partner)

Five accepted strategic moves:
- Task Host vs Authority Host distinction (we are Authority)
- Agent IAM as the technical category (not key management / not memory MCP)
- MCP as integration surface, not product identity
- Zero orchestration in v1, a hard line
- Deploy → grow → standardize sequencing

Four architecture corrections that tighten commitments:
1. Revocation: 'immediate online, bounded TTL/cache offline' (NOT 'no propagation delay'). High-risk actions always online; low-risk reads use short-lived cached caps; offline mode denies sensitive actions by default.
2. Audit (two-tier): real-time off-chain feed in parent-control UI + 10-min batched Merkle root anchored to Heima. NOT real-time on-chain. Heima explorer is tamper-evidence proof, not the UX surface.
3. Delegation: agentkeys.delegation.grant is schema-documented but not active in v1. Returns not_implemented_in_v1. Active delegation lands in Phase 4.
4. Dual narrative: don't lead with 'Agent IAM' in consumer contexts; don't lead with 'memory portability' anywhere. Authority is the category; privacy/memory are benefits.

Phase 1 revised to three-act IAM demo (per office-hours doc §9.6 storyboard, now elevated to authoritative spec):
- Act 1 Permissioned Memory (scoped read, not 'smart')
- Act 2 Deterministic Denial (policy decides, no LLM)
- Act 3 Online Revocation (parent UI → device denies)

Implementation note: cap-token machinery is already shipped via Stage 7+ (broker, signer, K3/K10 HDKD, memory/cred/audit workers, per-actor isolation per issue #90). New Phase 1 work is the MCP server wrapper (~1 week), parent-control web UI (~3-4 days), two-tier audit wiring (~1 day), runbook (~half day). Total ~2 weeks.

12-month roadmap revised:
- Phase 0: shipped (Stage 7+)
- Phase 1 (0-2 wk): Agent IAM v0 demo
- Phase 2 (1-2 mo): vendor pilot + multi-rail (Volcano Ark, Tuya)
- Phase 3 (3-4 mo): runtime neutrality (Hermes/OpenClaw as MCP tools)
- Phase 4 (6 mo): delegation + approval + ACL depth
- Phase 5 (post-12mo): standards engagement (contingent on traction)

Updates to existing docs:
- docs/research/README.md: indexed new strategy doc as 'Strategic anchor'
- ai-hardware-companion-office-hours.md: positioning note pivoted from 'implementation update' to 'strategic update' pointing at strategy doc
- issue-103 plan: PIVOT banner expanded with three-act demo + four corrections; old §C4/C5/C6 marked superseded; cap-token shipped context made explicit; no implementation re-spec per user direction
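The revocation correction ('immediate online, bounded TTL/cache offline') reduces to a small decision rule. The sketch below is one possible reading of it, not the shipped broker logic: the `CachedCap` type, the two-level risk split, and the `online_check` callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedCap:
    """A locally cached capability with a short TTL (hypothetical shape)."""
    expires_at: float      # unix seconds
    revoked: bool = False  # last revocation state seen while online


def authorize(risk, online, cap, now, online_check=None):
    """Decide whether an action may proceed.

    risk: "high" or "low" (assumed two-level classification).
    online_check: callable returning True if the authority still
    allows the action; only usable when `online` is True.
    """
    # Online: always ask the authority, so revocation is immediate.
    if online and online_check is not None:
        return bool(online_check())
    # No live answer available: sensitive actions are denied by default.
    if risk == "high":
        return False
    # Low-risk reads fall back to the cached cap until its TTL expires,
    # which bounds how long a revoked cap can keep working offline.
    return cap is not None and not cap.revoked and now < cap.expires_at


if __name__ == "__main__":
    cap = CachedCap(expires_at=100.0)
    print(authorize("high", online=False, cap=cap, now=50.0))
    print(authorize("low", online=False, cap=cap, now=50.0))
    print(authorize("low", online=False, cap=cap, now=150.0))
```

The worst-case offline exposure is the cap TTL, which is the "bounded" part of the commitment; no LLM is involved in the decision, matching Act 2 of the demo.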

…espace model

Three nits from review:

1. Generic chain instead of Heima-specific positioning
The strategy doc shouldn't be Heima-locked; chain is a deployment config (arch.md describes 'Litentry parachain (or EVM L2 fallback)' so the design is already chain-agnostic at the contract layer). Updated all positioning text to 'audit chain' / 'on-chain' / 'chain explorer' instead of Heima-specific. Kept arch.md and runbook refs to Heima where they describe actual deployed infra (the 'currently Heima per arch.md, swappable' note in §Phase 0 captures the reality without committing the strategy to Heima).

2. 2-min batch instead of 10-min
Modern fast-finality chains with cheap gas make sub-block-time batching viable. 10 min was too conservative; set 2 min as the default cadence. Faster batch = better UX for parents watching audit feed; the cost per anchor is sub-cent at typical batch sizes.

3. Memory namespace model (new §3.5)
Read the memory research/design doc from main (commit 53ccc9f 'docs: AI memory worker design plan + agent-memory research survey'). It defines four STRUCTURAL types (profile / procedural / semantic / episodic) with specific S3 key derivation per type. For Agent IAM, namespaces are an ORTHOGONAL semantic dimension that composes with the 4 structural types. Memory item has BOTH a structural type AND a semantic namespace. Cap-tokens scope namespace access (namespaces_allowed claim, deterministic string-set membership check). v0 defaults: personal / family / work / travel (4 namespaces). kids/device/temp deferred to Phase 3-4.

Composition is non-conflicting: namespaces live in wire-format metadata, NOT in the S3 key derivation. Memory worker filters at retrieval. The 4-type S3 layout from memory-design §3.2a is preserved exactly. Future evolution path documented (path-prefixed layout if scale demands).

arch.md compatibility check: zero contradictions found.
- Memory data_class binding (§17.5) unchanged
- Per-actor PrincipalTag isolation (§17) unchanged
- Cap-token format extensible (namespaces_allowed is additive)
- Memory worker never calls LLM invariant preserved
- K3 epoch rotation unchanged
- Architecture-as-source-of-truth: future arch.md §17 + memory-design §3 get additive paragraphs when v0 ships, no canonical-name conflicts introduced.

Files updated:
- docs/research/agent-iam-strategy.md: §3.2 audit (2-min + chain-agnostic), §3.5 NEW memory namespace model with arch.md compat check, Phase 0 line (Heima → 'currently Heima per arch.md, swappable')
- docs/research/README.md: strategy doc summary updated with 2-min + namespace model
- docs/research/ai-hardware-companion-office-hours.md: implementation update banner reflects 2-min on-chain anchor
- docs/research/volcano-ark-mcp-integration.md: diagram boxes generic ('AWS S3, audit chain', 'off-chain + chain')
- docs/spec/plans/issue-103-aiosandbox-hermes-esp32-demo.md: PIVOT banner reflects 2-min chain-agnostic anchor; NOT-in-scope list generic 'on-chain audit anchoring'
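The namespace model described above (structural type and semantic namespace as independent dimensions, with the worker filtering at retrieval) can be sketched as follows. Only the `namespaces_allowed` claim name, the four structural types, and the four v0 namespaces come from the commit message; the `MemoryItem` fields, the sample key, and the fail-closed behaviour when the claim is absent are assumptions.

```python
from dataclasses import dataclass

STRUCTURAL_TYPES = frozenset({"profile", "procedural", "semantic", "episodic"})
V0_NAMESPACES = frozenset({"personal", "family", "work", "travel"})


@dataclass(frozen=True)
class MemoryItem:
    key: str              # storage key, derived from structural type only
    structural_type: str  # one of STRUCTURAL_TYPES
    namespace: str        # wire-format metadata, not part of the key
    body: str


def filter_by_namespace(items, cap_claims):
    """Return only the items whose namespace the cap-token allows.

    Deterministic string-set membership; no LLM in the path.
    Assumption: a token without the claim sees nothing (fail closed).
    """
    allowed = frozenset(cap_claims.get("namespaces_allowed", ()))
    return [item for item in items if item.namespace in allowed]


if __name__ == "__main__":
    items = [
        MemoryItem("k1", "profile", "family", "bedtime is 8pm"),
        MemoryItem("k2", "episodic", "work", "standup moved to 10am"),
    ]
    claims = {"namespaces_allowed": ["family", "personal"]}
    for item in filter_by_namespace(items, claims):
        print(item.key, item.structural_type, item.namespace)
```

Because the namespace never enters the key, adding or renaming a namespace does not move any stored object, which is why the 4-type layout can be preserved exactly.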

New pm/ subfolder for GitHub project management automation. Treats milestones / labels / issue categorization as code under version control with idempotent shell scripts that reconcile GitHub state to declarative JSON.

Files:
- pm/README.md: folder purpose + how to use
- pm/milestones.json: 7 roadmap milestones (M1-M7) source of truth
- pm/labels.json: 40-label taxonomy: area/ kind/ phase/ status/ priority/ + extras (needs-arch-review, vendor-blocker)
- pm/issue-assignments.json: categorization of all 23 pre-existing open issues with milestone + labels + notes
- pm/new-issues.json: 20 new Phase 1-7 issues to create
- pm/arch-md-verification-report.md: #5/#6/#9/#37 verification
- pm/PROJECT-DASHBOARD-GUIDE.md: how to use projects/19 board + CI integration patterns
- pm/scripts/sync-milestones.sh: idempotent, creates/updates from milestones.json
- pm/scripts/sync-labels.sh: idempotent, creates/updates from labels.json
- pm/scripts/sync-issues.sh: idempotent, assigns milestone+labels to each issue in issue-assignments.json
- pm/scripts/create-issues.sh: idempotent, creates new issues from new-issues.json, skips if title already exists
- pm/scripts/audit.sh: read-only, groups open issues by milestone, flags uncategorized + missing area/* labels
- pm/scripts/add-to-project.sh: adds issues to litentry/projects/19 (requires gh auth refresh -s project,read:project)

Executed in this session:
- Created 7 milestones (M1: First MCP demo + Volcano Ark PoC, M2: First vendor wedge, M3: Runtime neutrality, M4: Capability + revocation depth, M5: Native mobile + biometric, M6: TEE integration + security, M7: Standards + ecosystem)
- Created 40 labels across 5 namespaces (area, kind, phase, status, priority) + extras (needs-arch-review, vendor-blocker)
- Categorized 23 pre-existing open issues with milestones + labels
- Created 20 new issues (#107-#126) for Phase 1-7 work per the agent-iam-strategy.md roadmap
- Verified #5, #6, #9, #37 against arch.md. Verdicts: #5 partially aligned (closed; lives as tier A in §15.3), #6 needs design refresh against current K11+SidecarRegistry, #9 already implemented as K3 HDKD per §6.2 (recommend close), #37 superseded by K11 WebAuthn per §K11 (recommend close)

Final state: 43 open issues, 100% categorized to milestones, 100% labeled with area/*. No uncategorized issues.

Per user direction: did NOT merge / close #5/#6/#9/#37 even though recommendations are clear. User to make final close decisions.

…s-fields strategy

Three fixes responding to user feedback:

1. add-to-project.sh: replace mapfile (bash 4+) with while-read loop for macOS bash 3.2 portability per CLAUDE.md project standard. Verified working: 'bash pm/scripts/add-to-project.sh 103' now successfully adds the issue to litentry/projects/19.

2. NEW pm/scripts/setup-project-fields.sh: creates the canonical project-level fields (Priority, Phase, Estimate, Iteration, Risk, Notes) via gh project field-create. Solves the 'cluttered Labels column' UX pain by letting the user split single-value PM concerns (priority, phase, status) out of the multi-value labels pile into typed field columns.

3. PROJECT-DASHBOARD-GUIDE.md: added 'Labels vs Fields — when to use which' section explaining the split:
- Labels (repo-level, multi-value): area/*, kind/*, semantic flags like needs-arch-review, vendor-blocker
- Fields (project-level, single-value): Priority, Phase, Status, Estimate, Risk
Plus step-by-step instructions to migrate the cluttered Labels column to clean field-based grouping.

These don't change the strategic plan; they just fix the operational PM-board ergonomics the user surfaced from running the script live.

User pointed out the project board has 10 built-in workflows that replace much of what the scripts do. Updated guidance to prefer workflows; scripts are fallback/batch tools.

PROJECT-DASHBOARD-GUIDE.md updates:
- Replaced the brief 'Recommended workflows' section with a full table of the 10 built-in workflows + their default state + what to configure
- New 'Script ↔ workflow split' table making clear which jobs use workflows vs scripts (workflows for runtime project events; scripts for repo-level state, batch creation, field definitions)
- One-time workflow configuration checklist (3 steps to get the Auto-add filter set, verify other green workflows, optionally enable Auto-archive)

add-to-project.sh updates:
- Header now flags this as PRIMARILY A BACKFILL / FALLBACK TOOL
- Lists three legit use cases: backfilling pre-existing issues, fallback when Auto-add workflow is misconfigured, adding from a different repo via PM_REPO override
- Pointer to PROJECT-DASHBOARD-GUIDE.md for workflow setup

No script behavior changes; only documentation tightens to match the workflow-first reality.

… stay manual)

User asked if workflows can be programmatically checked. Partial yes: GitHub's public GraphQL ProjectV2Workflow type exposes only: id, name, number, enabled, createdAt, updatedAt, project, fullDatabaseId. It does NOT expose the filter expression or action configuration (UI-only, not in the public API). So we get:
✅ 'is the workflow enabled' check
❌ 'does the workflow do the right thing' check (filter/action body)

New files:
- pm/expected-workflows.json: declarative source of truth for what workflows should be enabled + what each one's filter/action should do (free-text 'verify_in_ui' field that engineers cross-check against the UI)
- pm/scripts/check-workflows.sh: audits live workflows on litentry/projects/19 vs expected-workflows.json
  - Confirms enabled state matches
  - Flags unexpected workflows that exist but aren't in our list
  - Prints all per-workflow expected filter/action notes for manual UI verification
  - Exits 0 when all expectations match, 1 on mismatch (CI-friendly)

Live audit result (verified on litentry/projects/19): 7 expected workflows enabled (Auto-add to project, Auto-add sub-issues to project, Item added/closed, Auto-close issue, PR linked/merged), 4 optional workflows correctly disabled (Auto-archive, Code review approved, Code changes requested, Item reopened). 11/11 match.

This script can be wired into a future CI workflow to alert on drift if anyone disables Auto-add to project or similar.
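The audit that check-workflows.sh performs is essentially a comparison of two name-to-enabled maps. The sketch below shows that comparison in Python rather than bash. The dict shapes and the `audit_workflows` name are hypothetical, and treating an unexpected workflow as a warning that does not change the exit code is an assumption; the commit message only says such workflows are flagged.

```python
def audit_workflows(expected, live):
    """Compare expected vs live workflow enabled-state.

    expected, live: dicts mapping workflow name -> bool (enabled).
    Returns a report dict with a CI-friendly exit code.
    """
    # Expected workflows that are absent from the live board.
    missing = sorted(name for name in expected if name not in live)
    # Present on both sides but with a different enabled flag.
    mismatched = sorted(
        name for name, enabled in expected.items()
        if name in live and live[name] != enabled
    )
    # Live workflows nobody declared; flagged for a human to review.
    unexpected = sorted(name for name in live if name not in expected)
    exit_code = 1 if (missing or mismatched) else 0
    return {
        "missing": missing,
        "mismatched": mismatched,
        "unexpected": unexpected,
        "exit_code": exit_code,
    }


if __name__ == "__main__":
    expected = {"Auto-add to project": True, "Auto-archive": False}
    live = {"Auto-add to project": False, "Auto-archive": False}
    report = audit_workflows(expected, live)
    print(report)
    raise SystemExit(report["exit_code"])
```

Since the public API exposes only the enabled flag, this is the full extent of what can be verified automatically; the filter and action contents still need the manual UI cross-check.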

Adds two GitHub Actions and one supporting script to push project automation to its API ceiling. After this change, label-to-field sync and workflow drift detection both run on every event / daily schedule instead of as manual scripts.

What landed:
- .github/workflows/pm-sync-fields-from-labels.yml: triggers on issues labeled/unlabeled/opened/transferred. Calls sync-fields-from-labels.sh to mirror priority/p* + phase/v* labels into the project's Priority + Phase single-select fields. workflow_dispatch variant for backfill.
- .github/workflows/pm-workflow-audit.yml: daily cron + push trigger. Runs check-workflows.sh against expected-workflows.json and opens (or comments on) a tracking issue when drift is detected.
- pm/scripts/sync-fields-from-labels.sh: backing script for the sync workflow. Forgiving mode (warns + skips when a field is missing rather than aborting), bash 3.2 portable, uses -f for option-ID strings to avoid gh api numeric coercion.
- pm/scripts/setup-project-fields.sh: now detects + rebuilds empty-placeholder single-select fields (GitHub's built-in Priority/Size ship with zero options) and cleans up "Project <Name>" zombie fields left behind when deleteProjectV2Field renames instead of deleting system-reserved names. Fully idempotent.
- pm/PROJECT-DASHBOARD-GUIDE.md: new "What's automated vs UI-only" verdict table (built-in workflow filter/action contents + custom views are 100% UI-only; no API mutation exists for either). New "Known gotcha" section on Priority-field zombies. Script-vs-workflow split rewritten as three-tier matrix (built-in / our GH Action / bash script).

Verification: tested live against litentry/projects/19. Backfilled 40+ issues onto board, synced Priority + Phase from labels on every one, zero zombie fields remain. setup-project-fields.sh second-run shows all skips.

API ceiling discovered via GraphQL introspection: ProjectV2Workflow has no create/update mutation (only delete). ProjectV2View has no create/update mutation at all. Both are read-only via API, UI-only to configure.

Required repo secret for CI: PM_PROJECT_TOKEN (fine-grained PAT with Projects=read+write, Issues=read+write). Documented in dashboard guide.
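The label-to-field mirroring and the "forgiving mode" described for sync-fields-from-labels.sh can be sketched in Python as follows. This is an illustration under assumptions, not the script itself: the function name is hypothetical, and using the label suffix (for example `p1`) directly as the field value is a simplification, since the real mapping from labels to single-select option names is not shown in this PR.

```python
# Label prefixes named in the commit message, mapped to the project
# fields they mirror into.
LABEL_PREFIX_TO_FIELD = {"priority/": "Priority", "phase/": "Phase"}


def fields_from_labels(labels, available_fields):
    """Derive project field updates from an issue's labels.

    labels:           iterable of label names on the issue.
    available_fields: set of field names that exist on the project.
    Returns (updates, warnings). A label whose target field is missing
    produces a warning and is skipped instead of aborting the run.
    """
    updates = {}
    warnings = []
    for label in labels:
        for prefix, field in LABEL_PREFIX_TO_FIELD.items():
            if not label.startswith(prefix):
                continue
            if field not in available_fields:
                warnings.append(f"skip {label}: project has no {field} field")
            else:
                # Assumption: the suffix after the prefix is the value to set.
                updates[field] = label[len(prefix):]
    return updates, warnings
```

Returning warnings next to the updates, rather than raising, mirrors the warn-and-skip behavior the commit message describes for missing fields.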

…ub native

User feedback after live use of the migration:
- The label→field sync workflow is no longer needed (labels were deleted in PR #129; fields are now the source of truth, set via the issue-create skill or manually in UI).
- The workflow-drift audit workflow added noise without value (built-in workflows rarely drift, and the operator manages them in UI anyway).
- The Blocked-by TEXT project field duplicates GitHub's native issue relationships ("Mark as blocked by" / "Mark as blocking" in the UI side panel, keyboard `B B` / `B X`). Use the native feature.

## Removed
- .github/workflows/pm-workflow-audit.yml (drift detection; operator handles in UI)
- .github/workflows/pm-sync-fields-from-labels.yml (labels-to-fields sync; labels are gone)
- pm/expected-workflows.json (declarative expectation for the audit)
- pm/scripts/check-workflows.sh (called by the audit)
- pm/scripts/sync-fields-from-labels.sh (called by the sync workflow)
- "Blocked by" project field (deleted via API; setup-project-fields.sh no longer creates it)

## Kept / added
- .github/workflows/pm-auto-archive-closed-pr.yml: auto-archives PRs from the board on close (built-in Auto-archive only fires after 30 days)
- pm/scripts/sync-size-from-effort.sh (NEW): one-shot bulk-populate of the Size project field by parsing each issue's "## Effort" body section. Idempotent (skips already-sized items). Defaults to M when no parseable effort line found.
- ~/.claude/skills/agentkeys-issue-create: updated to:
  - Set Kind/Priority/Size project fields directly via API (replaces deleted label-sync workflow)
  - Use GitHub native relationships for blocked-by (replaces removed field)

## Live state after this change
39 open issues all have complete Kind + Priority + Size field values (36 mapped from explicit "## Effort" bodies; 3 defaulted to M for issues without parseable effort).

## What stays UI-only
- The deprecated "Phase" project field still exists with v0..v4 data on issues; operator can delete in UI when ready.
- The deprecated "Estimate" project field (duplicate of GitHub's built-in Size) still exists; same UI-cleanup-later.
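The parse-then-default behavior of sync-size-from-effort.sh described above can be sketched in Python. This is a minimal illustration, not the script: the function names are hypothetical, and the assumption that an effort section contains one of the tokens XS/S/M/L/XL is made for the example, since the actual effort-line format is not shown in this PR.

```python
import re

DEFAULT_SIZE = "M"  # fallback named in the commit message

# Section body: everything after the "## Effort" heading up to the next
# "## " heading or the end of the issue body.
_EFFORT_SECTION = re.compile(
    r"^## Effort[ \t]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL
)
# Assumed size vocabulary; the longer tokens are listed first so "XS" is
# not read as "S".
_SIZE_TOKEN = re.compile(r"\b(XS|XL|S|M|L)\b")


def size_from_body(body):
    """Return the size found in the "## Effort" section, else the default."""
    section = _EFFORT_SECTION.search(body)
    if section is None:
        return DEFAULT_SIZE
    token = _SIZE_TOKEN.search(section.group(1))
    return token.group(1) if token else DEFAULT_SIZE


def plan_size_updates(items):
    """Map issue number -> size for items whose Size is still unset.

    items: list of dicts with "number", "body" and "size" (None if unset).
    Items that already carry a size are skipped, which is what makes a
    second run a no-op.
    """
    return {
        item["number"]: size_from_body(item["body"])
        for item in items
        if item["size"] is None
    }
```

Separating the planning step from any API call keeps the idempotence property easy to check: once every item has a size, the plan is empty.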

The v1/v2 staged plan framing retires after v2-stage3 ships green. Going forward, milestone-level work (M1-M7) is tracked against the new docs/spec/plans/milestones-roadmap.md, the operational companion to agent-iam-strategy.md.

## Archived (moved to docs/archived/ with _2026-04 suffix)
- docs/stage7-demo-and-verification.md (123KB, the big stage-7 end-to-end demo doc)
- docs/operator-runbook-stage7.md (39KB, supplanted by scripts/setup-broker-host.sh)
- docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
- docs/spec/plans/development-stages.md (the 8-stage v2 plan, replaced by milestones-roadmap.md)

Per CLAUDE.md docs policy: archive, never delete; archived files are never read in normal dev.

## Added
- docs/spec/plans/milestones-roadmap.md: M1-M7 detail + post-M7 horizons + strategic risks table + how-to-use-this-doc. Cross-references arch.md for invariants and agent-iam-strategy.md for positioning. This becomes the authoritative milestone plan from M1 onward.

## Cross-refs updated (active docs only)
- docs/arch.md: §24 + §25 cross-refs now point at scripts/setup-broker-host.sh (canonical idempotent runbook) + archived stage-7 commentary for history
- docs/dev-setup.md: 5 stage7/dev-stages refs → setup-broker-host.sh + milestones-roadmap.md
- docs/v2-stage1-migration-and-demo.md: 4 stage7 refs → archive locations + status banner noting v1/v2 retirement after v2-stage3
- CLAUDE.md: 3 refs (build plan, runbook policy, harness workflow) → milestones-roadmap.md
- docs/spec/{threat-model-key-custody,ses-email-architecture,credential-backend-interface}.md: stage8-wip refs → archive
- docs/spec/heima-gaps-vs-desired-architecture.md: stage7 demo §4 → archive
- docs/wiki/upstream-backend-classes-exercise-vs-distribution.md: stage7 demo refs → archive (wiki auto-publishes to GitHub Wiki via publish-wiki.yml)

## What's NOT updated (intentional)
Issue-specific plan files under docs/spec/plans/issue-64/ + issue-74-* + issue-credential-storage-* still reference the archived docs by name. These are themselves historical issue-deliverable records; the references are timestamped artifacts of when those issues were planned, not active operational links. They stay as-is.

Merge origin/main into claude/hopeful-mccarthy-15e5ba to resolve conflicts opened by PR #127's squash-merge landing on main. Plus cleanup of pm/ files that no longer fit the post-migration workflow.

## Conflict resolution
4 files had conflicting versions (main's pre-migration state vs our post-migration state):
- pm/PROJECT-DASHBOARD-GUIDE.md → OURS (post-migration narrative)
- pm/README.md → OURS, then rewritten to drop refs to deleted files
- pm/labels.json → OURS (recolored area/*, red reserved for human attention)
- pm/scripts/setup-project-fields.sh → OURS (Urgent/High/Medium/Low + Kind)

## Re-deletions (main re-added; we re-removed)
Files we deleted in 47d503f got re-added by the merge because they were on main from #127. Re-removed:
- .github/workflows/pm-sync-fields-from-labels.yml
- .github/workflows/pm-workflow-audit.yml
- pm/expected-workflows.json
- pm/scripts/check-workflows.sh
- pm/scripts/sync-fields-from-labels.sh

## Unused pm/ files removed (per user request)
- pm/scripts/sync-issues.sh: actively broken. It references labels (priority/p*, kind/*, phase/v*) that were deleted in the migration, so running it today fails.
- pm/scripts/create-issues.sh: one-shot tool that created issues from new-issues.json. All 20 issues already created (#107-#126); running again would attempt to duplicate.
- pm/issue-assignments.json: historical record of pre-migration label assignments. Data references deleted labels.
- pm/new-issues.json: historical record of which 20 issues to create. All created. The refined-issue bodies are the source of truth now.
- pm/arch-md-verification-report.md: one-off arch.md compatibility verification for #5/#6/#9/#37. Job done.

## What remains in pm/

| Path | Status |
|---|---|
| pm/PROJECT-DASHBOARD-GUIDE.md | Active: dashboard usage |
| pm/README.md | Active: folder intro (rewritten) |
| pm/labels.json | Active: sync-labels.sh source |
| pm/milestones.json | Active: sync-milestones.sh source |
| pm/scripts/add-to-project.sh | Active: backfill tool |
| pm/scripts/audit.sh | Active: read-only state audit |
| pm/scripts/setup-project-fields.sh | Active: project field bootstrap |
| pm/scripts/sync-labels.sh | Active: applies labels.json |
| pm/scripts/sync-milestones.sh | Active: applies milestones.json |
| pm/scripts/sync-size-from-effort.sh | Active: one-shot Size population |

hanwencheng added 18 commits May 23, 2026 23:15

hanwencheng merged commit f132a7c into main May 24, 2026

hanwencheng merged 18 commits into main from claude/hopeful-mccarthy-15e5ba
