M1 foundation: strategy + roadmap + research docs + 20 refined issues #130

Merged
hanwencheng merged 18 commits into main from claude/hopeful-mccarthy-15e5ba
May 24, 2026

@hanwencheng

Summary

This PR lands the strategy, roadmap, and planning foundation needed before M1 work can begin. It consolidates ~3 sessions of business research, architecture corrections, PM automation tuning, and per-issue refinement into a coherent set of documents and issue bodies that the next developer or agent can pick up cleanly.

The branch is 17 commits ahead; 5 were already merged via PR #127 + #128 (squashed onto main; their original commits still appear in git log here but produce no diff). The remaining ~12 commits are the new work below.

What lands

1. Strategic anchor — Agent IAM positioning

  • docs/research/agent-iam-strategy.md — full strategic positioning: Agent IAM as the category (Identity / Memory / Permissions / Capability tokens / Audit / Delegation / Revocation), Task Host vs Authority Host distinction, dual narrative (B2B vs consumer), 4 architecture corrections (bounded revocation, two-tier audit, delegation as preview-only, dual narrative), memory namespace model, 12-month roadmap, 7 strategic risks
  • 4 architecture corrections from external critique already absorbed in the doc (immediate-online + bounded-offline revocation; real-time off-chain + 2-min on-chain batched audit; delegation as schema/preview only in v1; dual-narrative B2B + consumer separation)

2. Milestone roadmap — operational source of truth

  • docs/spec/plans/milestones-roadmap.md — M1-M7 detailed scope + post-M7 horizons + strategic risks table + how-to-use instructions. Replaces the archived v1/v2 stage plan. Companion to arch.md (architecture) and agent-iam-strategy.md (positioning).

3. Hardware research — MagicLick / xiaozhi-server / Volcano Ark / Tuya

4. Business research — wedge thesis

5. Demo plan + ESP32 firmware foundation

6. Memory design

7. Docs reorganization

Archived to docs/archived/:

  • docs/stage7-demo-and-verification.md (123KB, supplanted by scripts/setup-broker-host.sh + v2 demo orchestrators)
  • docs/operator-runbook-stage7.md (39KB, same)
  • docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
  • docs/spec/plans/development-stages.md (the 8-stage v1/v2 plan, replaced by milestones-roadmap.md)

Cross-refs updated in 9 active docs: arch.md, dev-setup.md, CLAUDE.md, v2-stage1-migration-and-demo.md, spec/threat-model-key-custody.md, spec/ses-email-architecture.md, spec/credential-backend-interface.md, spec/heima-gaps-vs-desired-architecture.md, wiki/upstream-backend-classes-exercise-vs-distribution.md.

8. PM automation simplification

  • Removed pm-workflow-audit.yml, pm-sync-fields-from-labels.yml, expected-workflows.json, check-workflows.sh, sync-fields-from-labels.sh (labels migrated to fields; sync no longer needed)
  • Kept pm-auto-archive-closed-pr.yml (auto-archive PRs on close)
  • Added pm/scripts/sync-size-from-effort.sh (one-shot Size population from issue body Effort estimates)
  • Deleted the unused "Blocked by" TEXT project field (use GitHub native issue relationships instead)
  • Updated ~/.claude/skills/agentkeys-issue-create/SKILL.md to use native GitHub relationships + set Kind/Priority/Size project fields directly via API

9. Issue refinement — 20 issues (#107-#126)

All 20 new issues created in earlier PM work were refined using the standard template:

  • Context — why this matters now in the milestone plan
  • Scope — specific deliverables for the milestone
  • Out of scope — explicit deferrals to later milestones
  • Acceptance criteria — testable checkboxes
  • Risks + mitigations — what could go wrong + how we handle it
  • References — links to milestones-roadmap.md, arch.md, relevant research docs
  • Effort — day-level sequencing
  • Pickup notes — what to read first, where the code lives, what to watch for, when to use the /agentkeys-issue-create skill

Body sizes are 5KB-8KB per issue (~130KB of context in total). Coverage:

M1: #107 MCP server · #108 namespace · #109 audit · #110 parent UI · #111 demo runbook + pitch · #112 Volcano Ark
M2: #113 vendor portal · #114 Tuya connector · #115 audit dashboard · #116 FoloToy outreach · #126 consumer brand
M3: #117 Hermes-MCP · #118 OpenClaw-MCP · #119 Python SDK · #120 TypeScript SDK
M4: #121 delegation chains · #122 approval workflow · #123 policy versioning
M7: #124 MCP extensions · #125 OAuth-for-Agents

10. Project board state

All 38 open issues now have Kind + Priority + Size project field values populated (live on the board, not just in this PR). The Phase + Estimate fields are deprecated; the operator can delete them via the UI when ready.

Test plan

This PR doesn't ship any executable code — it lands strategy + research + plans + issue refinement. Verification is doc + state-level:

What's NOT in this PR

  • Executable M1 code (the actual MCP server, namespace plumbing, parent UI) — those are the M1 issues themselves
  • Native mobile apps (M5)
  • Standards-body engagement (M7: #124 and #125 document the path, but the work doesn't start until the precondition is met)

Migration notes for the next agent / developer

After this PR merges, the next agent picking up M1 work should:

  1. Read docs/spec/plans/milestones-roadmap.md §2 (M1 scope) — operational source of truth
  2. Read docs/research/agent-iam-strategy.md §4 (Phase 1 storyboard) — strategic anchor
  3. Read the refined body of #107 (AgentKeys MCP server) — that's THE critical-path M1 issue everything else plumbs through
  4. Use the /agentkeys-issue-create skill for any follow-up issues

Add two business research artifacts under docs/research/:

- ai-hardware-companion-wedge.md (round 1+2): market sizing, competitive
  landscape, direct competitors, business model critique, 12 critical
  comments, naming, Stripe ACP / Alipay+ AMP integration path, WeChat
  feasibility, security-first demo storyboard.
- ai-hardware-companion-office-hours.md: YC-style office-hours
  diagnostic on the same wedge. Six forcing questions surfaced zero
  vendor conversations + no named buyer. P2 narrowed mid-session to
  memory portability + isolation + privacy. Approach D chosen:
  AgentKeys-native hosted sandbox (aiosandbox) with OpenClaw/Hermes
  agent runtime + per-actor isolation (issue #90) + cross-vendor
  memory consent model. Pricing pivoted to AWS-style elastic
  per-user (Free / Basic vendor-paid $2-3/active-device / Pro $10
  user-paid with 30% lifetime acquirer revshare / future Compute
  usage-based). 8/10 quality after 2 spec-review iterations.

Both index entries added to docs/research/README.md.

End-to-end demo plan for the AgentKeys hardware-vendor wedge:
ESP32 device + simple URL config → agent-infra/sandbox running
Hermes (AgentKeys-native runtime) + agentkeys-daemon with mock
memory injected from S3 MD blob at agent boot.

12-step implementation order. Reuses arch.md canonical primitives
(sandbox runtime, supervisord lifecycle, memory bucket layout
bots/<actor_omni>/memory/, agentkeys-daemon). v0 scope: single
ESP32, single sandbox, single mock memory blob, text-mode chat.

Voice mode, multi-tenancy, cap-token enforcement, cross-vendor
portability, and payment rails are deferred to follow-up issues.

3-week effort estimate. Acceptance: a reviewer can flash the board, run the
setup script, and see a personalized response within 15 minutes.

Pivot canonical demo target from generic ESP32 to ESP32-S3-DevKitC-1:
- Native USB-OTG (single USB-C, no separate UART chip)
- PSRAM (8MB octal) for voice follow-up audio buffers
- Xtensa LX7 with AI vector instructions for on-device wake-word
- Still MCU-class authenticity (~$10-15 dev board, <$5 chip in BOM volume)

Stack: PlatformIO + ESP-IDF (not Arduino). Production AI-toy vendors
use ESP-IDF, and the S3-specific features (native USB CDC, PSRAM,
ESP-DSP, secure boot, OTA) require IDF.

Scaffolded firmware foundation under firmware/esp32s3-agentkeys/:
- platformio.ini, CMakeLists.txt, sdkconfig.defaults, partitions.csv
- main.c spawns 4 FreeRTOS tasks (wifi/button/chat/led) coordinated
  via event group + queue
- wifi_sta.c: working STA mode + auto-reconnect
- button.c: working GPIO interrupt + 200ms debounce on BOOT (GPIO 0)
- led_status.c: stub blinker (real WS2812 RGB state machine is TODO)
- https_chat.c: stub echoing user input (real esp_http_client POST is TODO)
- config.h: settings resolved in priority order NVS → secrets.h → hardcoded defaults
- README.md: flash quickstart + troubleshooting

Foundation builds + flashes + boots into FreeRTOS loop today; chat
returns mock '[mock] you said: ...' echo. Real HTTPS POST is the
clear next step (esp_http_client + cJSON parse, ~100 lines).

Renamed plan file issue-102 → issue-103 to match actual issue number.

…on 1

Hardware on hand confirmed via the device display showing 'magiclink
2p5/1.9.4': MagicLick 2.5 running xiaozhi-esp32 v1.9.4 firmware.

xiaozhi-esp32 (github.com/78/xiaozhi-esp32, MIT, 26K stars) is the
dominant Chinese open-source AI voice firmware for ESP32. Supports
70+ boards including ours. Full streaming voice pipeline already
shipping: offline wake-word (ESP-SR) → ASR → LLM → TTS → OPUS over
WebSocket or MQTT+UDP. MCP-based device + cloud control.

MagicLick 2.5 hardware specs reconstructed from
boards/magiclick-2p5/config.h + board.cc:
- ESP32-S3 chip
- ES8311 audio codec (full-duplex I2S, 24kHz)
- 128x128 GC9107 SPI LCD with emoji rendering
- 3 buttons (main GPIO 21, left GPIO 0, right GPIO 47)
- 2 WS2812 LEDs on GPIO 38
- DualNetworkBoard: WiFi primary + ML307 Cat.1 4G fallback
- Battery + power manager with tickless idle

'Hermes agent' clarified to mean NousResearch/hermes-agent (MIT,
Python, self-improving learning loop, multi-interface gateway,
LLM-agnostic). NOT an internal AgentKeys runtime as the original
plan §C4 mistakenly stated.

Strong recommendation: Option 1 — keep xiaozhi firmware unchanged,
build cloud-side xiaozhi-hermes-bridge that speaks the xiaozhi
WebSocket protocol while routing the agent loop to Hermes-agent
(which pulls memory from agentkeys-daemon per §C3). Reduces v0
effort from ~3 months (custom firmware) to ~2-3 weeks (server-side
adapter only). The bridge forks from one of four existing reference
server implementations (Python xinnan-tech, Go hackers365 with
openclaw, Java joey-zhou, Go AnimeAIChat).

Hardware verification: 5 paths documented (visual / ROM bootloader
via boot button hold / WiFi captive portal / vendor app / disassembly).
USB doesn't enumerate by default because the device is in normal firmware
mode; hold the LEFT button while connecting USB to drop into the ESP32-S3
ROM bootloader for esptool access.

Added PIVOT banner at top of issue-103 plan flagging that C4/C5/C6
are superseded. Full new direction in
docs/research/xiaozhi-esp32-magiclink.md.

firmware/esp32s3-agentkeys/ stays in tree as reference scaffolding
for future custom hardware (new product lines that need first-party
firmware), not the path for the MagicLick demo.

Two new research docs supporting the issue #103 Option 1 direction:

docs/research/xiaozhi-hermes-architecture.md
  Permanent architecture reference with three ASCII diagrams:
  - Diagram A: baseline xiaozhi flow (device → cloud → LLM)
  - Diagram B: our pivoted flow with changed layers highlighted
    (UNCHANGED firmware, NEW URL only on device side, fork +
    one-module-rewrite on cloud side, new memory layer)
  - Diagram C: per-turn sequence with latency budget breakdown
    (~2.0-2.5s first-audio; ~+250-500ms delta vs baseline)
  Precise diff table: 13 layers compared, only 4 actually change,
  3 of those are NEW additions (not modifications). The actual code
  change is concentrated in ONE module of the bridge fork.

docs/research/xiaozhi-hermes-risks.md
  Risk verification grounded in actual Hermes-agent +
  xinnan-tech/xiaozhi-esp32-server source code, NOT assumptions.
  Specific file paths + line numbers cited throughout.

  R1 (Hermes HTTP gateway stateless-vs-session): REAL but
  mitigation is built-in. Gateway exposes /v1/chat/completions
  with three session modes (stateless per-call default, explicit
  continuation via X-Hermes-Session-Id, long-term memory scoping
  via X-Hermes-Session-Key). Bridge sets per-device session keys.
  Effort: 2-4 hours.

  R2 (Latency stack): mostly NOT real. agent/conversation_loop.py
  line 4152 confirms learning loop runs as background task AFTER
  response delivery, OFF the turn path. With enabled_toolsets=[]
  + max_iterations=1 + streaming SSE, overhead is ~50-200ms.
  xiaozhi-performance-research baselines:
  - ASR: 0.795s Xunfei / 0.85s Doubao
  - LLM first-token: 0.434s Qwen-Flash / 0.774s Kimi-K2
  - TTS: 0.488s CosyVoice / 0.667s Edge-TTS / 0.103s PaddleSpeech
  Pipelined: 1.4-2.4s first-audio, within 2.0-2.5s target.
  Effort: 1 day (tune + measure).

  R3 (Concurrent device handling): less bad than feared. Hermes
  gateway IS multi-tenant by design (serves Telegram + Discord +
  Slack + WhatsApp + Signal + CLI from one process). Per-request
  memory ~20-80MB; 100 devices ~2-8GB on one VPS. xiaozhi-esp32-
  server's documented '100+ devices per process' claim is
  unverified in the repo; only a 6-device concurrent demo is documented.
  Effort for v0: 0 hours. For production scale: 1-2 weeks to add
  sticky load balancing.

  R4 (newly discovered during research): cold agent construction
  per request adds 50-300ms on every turn. _create_agent() called
  inside _handle_chat_completions for EVERY request, no pooling.
  Most impactful for voice UX (compounds turn-by-turn).
  Mitigation: fork-local agent pool (1 day) or upstream patch
  (2-4 days).
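
  The R2 and R3 figures can be sanity-checked with a few lines of
  arithmetic. This is a rough sketch, not a measurement. It assumes the
  stages add serially to first audio (ASR finishes before the LLM starts,
  and TTS starts on the first token). It pairs the fastest and slowest
  quoted provider for each stage and adds the R2 Hermes overhead estimate.
  The memory figure uses decimal GB.

```python
# Rough first-audio budget for one voice turn, from the R2 baselines (seconds).
# Assumption: stages add serially (ASR -> LLM first token -> TTS first packet),
# and best/worst case pair the fastest/slowest quoted provider per stage.
ASR_S = {"Xunfei": 0.795, "Doubao": 0.85}
LLM_FIRST_TOKEN_S = {"Qwen-Flash": 0.434, "Kimi-K2": 0.774}
TTS_S = {"CosyVoice": 0.488, "Edge-TTS": 0.667, "PaddleSpeech": 0.103}
HERMES_OVERHEAD_S = (0.050, 0.200)  # R2 estimate: no toolsets, 1 iteration, SSE


def first_audio_range():
    """Return (best, worst) first-audio latency in seconds."""
    stages = (ASR_S, LLM_FIRST_TOKEN_S, TTS_S)
    best = sum(min(s.values()) for s in stages) + HERMES_OVERHEAD_S[0]
    worst = sum(max(s.values()) for s in stages) + HERMES_OVERHEAD_S[1]
    return round(best, 3), round(worst, 3)


def fleet_memory_gb(devices, per_request_mb):
    """R3 sizing: concurrent devices x per-request memory, in decimal GB."""
    return devices * per_request_mb / 1000


print(first_audio_range())                                 # (1.382, 2.491)
print(fleet_memory_gb(100, 20), fleet_memory_gb(100, 80))  # 2.0 8.0
```

  This lands at roughly 1.38-2.49s, consistent with the 2.0-2.5s target.
  R4's 50-300ms cold-construction cost comes on top of this until agents
  are pooled.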

  Net effect: v0 timeline revised from ~3 weeks to ~1-2 weeks.

Updated docs/research/README.md to index both new docs.
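
A minimal sketch of the R1 and R4 mitigations, written in Python
(Hermes-agent's language). The header names X-Hermes-Session-Key and
X-Hermes-Session-Id come from the R1 notes. Everything else is
hypothetical: the 'xiaozhi:<device_id>' key format, the AgentPool class,
and its factory callable (a stand-in for the per-request agent
construction the fork would wrap). None of these are Hermes APIs. The
pool is not thread-safe and assumes a single asyncio event loop or an
external lock.

```python
from collections import OrderedDict
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


def hermes_headers(device_id: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Per-device headers for the Hermes /v1/chat/completions gateway (R1).

    The 'xiaozhi:<device_id>' key format is hypothetical.
    """
    # Session key scopes long-term memory to one device.
    headers = {"X-Hermes-Session-Key": f"xiaozhi:{device_id}"}
    if session_id is not None:
        # Explicit continuation of an in-progress conversation.
        headers["X-Hermes-Session-Id"] = session_id
    return headers


class AgentPool(Generic[T]):
    """Hypothetical fork-local LRU pool (R4): build an agent once per
    session key instead of on every request. Not thread-safe."""

    def __init__(self, factory: Callable[[str], T], max_size: int = 128) -> None:
        self._factory = factory
        self._max_size = max_size
        self._agents: "OrderedDict[str, T]" = OrderedDict()

    def get(self, session_key: str) -> T:
        if session_key in self._agents:
            self._agents.move_to_end(session_key)  # mark most recently used
            return self._agents[session_key]
        agent = self._factory(session_key)  # cold construction, paid once
        self._agents[session_key] = agent
        if len(self._agents) > self._max_size:
            self._agents.popitem(last=False)  # evict least recently used
        return agent
```

On each turn, the fork would look the agent up by the session key and
reuse it. The 50-300ms construction cost is then paid on a device's first
turn (and after eviction) rather than on every turn.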

Three updates following the risk-verification research:

1. docs/research/tuya-vs-xiaozhi.md (new)
   Answers 'is Tuya the same role as xiaozhi?': DIFFERENT role,
   partial firmware overlap. Tuya = closed PaaS for brand-owners
   (NYSE: TUYA, $80.9M Q1 2026 revenue, 306 premium customers,
   1.97M developers, 100+ countries). xiaozhi = open firmware
   for makers (MIT, 26.7K stars). TuyaOpen is a 1.6K-star
   defensive ESP32 SDK from Jan 2026 — 17x adoption gap.

   AgentKeys posture: complement both, never compete.
   - Phase 1 (now): xiaozhi cloud-side bridge (issue #103)
   - Phase 2 (3-6 mo): Tuya Cloud Development connector
   - Sit above both rails (same pattern as Alipay+ AMP / Stripe ACP)

2. v0 demo timeline revised from ~3 weeks to ~1-2 weeks
   in issue-103-aiosandbox-hermes-esp32-demo.md:
   - PIVOT banner at top of plan
   - Effort estimate section (line 441)
   The basis is xiaozhi-hermes-risks.md showing all four risks
   are smaller than originally feared (R1 built-in mitigation,
   R2 background loop, R3 multi-tenant by design, R4 cheap
   fork-local hack).

3. Fixed false cross-reference in xiaozhi-hermes-risks.md
   The 'unverified 100+ devices' claim was incorrectly
   attributed to the office-hours doc. It actually circulated
   in earlier informal discussion — not in any committed doc.
   Reworded to remove the false attribution.

4. Added implementation update banner to office-hours doc
   pointing readers at the four xiaozhi research docs + the
   revised v0 timeline. The §Recommended Approach / Pricing /
   Cross-Vendor Memory Model below stay unchanged — only the
   firmware-and-runtime layer shifted.

…form

Earlier version of tuya-vs-xiaozhi.md claimed Phase 3 would add
adapters for Xiaomi MIoT, Alibaba Smart Home, and Volcano AI Hub
without verifying each platform's third-party developer surface.
Research findings per platform:

Volcano Ark (ByteDance) — VERIFIED FEASIBLE
- Open international developer signup, no PRC entity / ICP needed
- MCP-server marketplace launched 2026 (mcp.so/server/mcp-server/volcengine)
- AgentKeys publishes an MCP tool any Doubao-powered AI hardware can call
- Genuinely Tuya-equivalent for the AI-side rather than IoT-side
- ~1 week effort

AliGenie / Tmall Genie (Alibaba) — FEASIBLE WITH PARTNERSHIP
- International Alibaba Cloud account works for sandbox + custom-skill webhook
- Production distribution onto Tmall Genie hardware requires Alibaba's
  skill review + de-facto PRC-domiciled brand
- ~1 week dev + partnership lead time

Xiaomi MIoT / XiaoAI — WEAKEST
- Brand-tier integration requires Mi Ecosystem partnership admission
- Publishable XiaoAI skills require PRC real-name verification
- Consumer-OAuth path (Home-Assistant-style) works today for foreign
  servers but is a narrower wedge than brand-tier
- Defer until partnership or scope to consumer-OAuth only

Rewrote Phase 3 section to split into 3a (Volcano open), 3b
(AliGenie with partner), 3c (Xiaomi deferred). Added explicit
'Honest note on Phase 3 verification' acknowledging the original
claim was hand-wavy. Added 15 source URLs to the Sources block.

New research doc with three ASCII diagrams showing how AgentKeys
integrates with Volcano Ark (ByteDance's enterprise AI cloud
hosting Doubao LLM) as a Phase 3a hosted MCP server registered in
their 2026 MCP marketplace.

Pattern B (hosted by us, marketplace is discovery only):
- AgentKeys MCP server at mcp.agentkeys.io exposes 5-7 tools
  (memory get/put, cred fetch, cap mint, audit append, whoami,
  permission check) mapped to existing Stage 7+ backend RPCs
- Vendor Doubao agents call our MCP tools via HTTPS/SSE with
  per-vendor Bearer token + per-actor X-AgentKeys-Actor header
- No vendor firmware changes; no Doubao runtime changes — just
  marketplace registration + one-checkbox vendor opt-in
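
A minimal sketch of how the MCP server could resolve the caller on each
request, assuming the per-vendor Bearer token and X-AgentKeys-Actor header
described above. The token table, CallerContext, and AuthError are
hypothetical. A production server would keep tokens in a secret store and
compare them in constant time.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical registry: per-vendor bearer token -> vendor id.
VENDOR_TOKENS = {"tok-folotoy-demo": "folotoy"}


class AuthError(Exception):
    """Request lacks a valid vendor token or actor header."""


@dataclass(frozen=True)
class CallerContext:
    vendor: str  # who is calling (from the bearer token)
    actor: str   # on whose behalf (from X-AgentKeys-Actor)


def authenticate(headers: Mapping[str, str]) -> CallerContext:
    # HTTP header names and the auth scheme are case-insensitive; normalize.
    h = {k.lower(): v for k, v in headers.items()}
    scheme, _, token = h.get("authorization", "").partition(" ")
    vendor = VENDOR_TOKENS.get(token) if scheme.lower() == "bearer" else None
    if vendor is None:
        raise AuthError("missing or unknown vendor bearer token")
    actor = h.get("x-agentkeys-actor", "").strip()
    if not actor:
        raise AuthError("missing X-AgentKeys-Actor header")
    return CallerContext(vendor=vendor, actor=actor)
```

Tool handlers would then receive the resolved context. How the
per-tenant auth model is finalized is still listed below as an open risk.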

Diagram A: high-level architecture (device → RTC → Doubao →
  MCP → AgentKeys MCP server → backend)
Diagram B: per-call MCP tool sequence with ~200-400ms per-call
  latency budget (concern noted: multiple tool calls per turn
  can stack — mitigation via batched 'context.bootstrap' tool)
Diagram C: cross-vendor composition showing same user (O_kevin)
  with FoloToy (Doubao + MCP adapter) AND MagicLick (xiaozhi +
  Hermes bridge) both terminating at one AgentKeys backend with
  one memory namespace + one identity tree + one audit ledger.
  This is the cross-vendor portability moat materializing
  automatically per office-hours doc §Cross-Vendor Memory Model.
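
The latency-stacking concern in Diagram B comes down to multiplication.
Assume each MCP tool call is a dependent, sequential round trip within the
~200-400ms budget. Three calls per turn then cost 0.6-1.2s, while a single
batched 'context.bootstrap' call costs one round trip. The helper name is
illustrative.

```python
PER_CALL_S = (0.200, 0.400)  # per-call MCP budget from Diagram B


def tool_latency_range(calls: int, batched: bool) -> tuple:
    """Added latency per turn, assuming tool calls run one after another."""
    round_trips = 1 if batched else calls
    return tuple(round(round_trips * t, 3) for t in PER_CALL_S)


print(tool_latency_range(3, batched=False))  # (0.6, 1.2)
print(tool_latency_range(3, batched=True))   # (0.2, 0.4)
```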

Effort: ~1-1.5 weeks (sibling to xiaozhi-hermes-bridge).

6 open risks called out + mitigations sketched:
- MCP latency stacking per turn
- Marketplace approval SLA
- Per-tenant auth model TBD
- Actor omni resolution pattern (vendor-side vs whoami call)
- MCP protocol version compat with Doubao runtime
- Cross-vendor cap-token consent (resolved: same office-hours
  consent ceremony applies)

Updated docs/research/README.md to index the new doc.

New strategic anchor doc at docs/research/agent-iam-strategy.md
captures the revised direction from multi-round discussion
(original Agent IAM proposal → independent analysis → ChatGPT
critique → synthesis).

Three-layer positioning, three audiences:
- AI Device Account (consumer/vendor BD pitch)
- Agent IAM (B2B/investor/CTO category)
- Trust Substrate (compliance/regulator/Web3 partner)

Five accepted strategic moves:
- Task Host vs Authority Host distinction (we are Authority)
- Agent IAM as the technical category (not key management / not
  memory MCP)
- MCP as integration surface, not product identity
- Zero orchestration in v1 — hard line
- Deploy → grow → standardize sequencing

Four architecture corrections that tighten commitments:

1. Revocation: 'immediate online, bounded TTL/cache offline'
   (NOT 'no propagation delay'). High-risk actions always
   online; low-risk reads use short-lived cached caps; offline
   mode denies sensitive actions by default.

2. Audit (two-tier): real-time off-chain feed in parent-control
   UI + 10-min batched Merkle root anchored to Heima. NOT
   real-time on-chain. Heima explorer is tamper-evidence proof,
   not the UX surface.

3. Delegation: agentkeys.delegation.grant is schema-documented
   but not active in v1. Returns not_implemented_in_v1. Active
   delegation lands in Phase 4.

4. Dual narrative — don't lead with 'Agent IAM' in consumer
   contexts; don't lead with 'memory portability' anywhere.
   Authority is the category; privacy/memory are benefits.
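
Corrections 1 and 3 can be read as a small decision function. This is one
reading of correction 1, sketched under assumptions:
- Online, every decision consults live revocation state.
- Offline, high-risk actions are denied and low-risk reads trust a cached
  verification for a bounded TTL.

The 60-second TTL, the 'high'/'low' risk split, the online_check callable
(None meaning offline), and the error-dict shape are all hypothetical.
Only the not_implemented_in_v1 code comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CACHE_TTL_S = 60.0  # hypothetical bound on how stale an offline cap may be


@dataclass
class CachedCap:
    cap_id: str
    verified_at: float  # time of last successful online validity check


def authorize(risk: str, cap: CachedCap, now: float,
              online_check: Optional[Callable[[str], bool]] = None) -> bool:
    """online_check(cap_id) returns True if the cap is still valid;
    passing None models offline mode."""
    if online_check is not None:
        # Online: consult live revocation state, so a revoke takes
        # effect on the very next request.
        ok = online_check(cap.cap_id)
        if ok:
            cap.verified_at = now  # refresh cache for a later offline window
        return ok
    if risk == "high":
        return False  # offline: sensitive actions denied by default
    # Offline low-risk read: trust the cache only within the TTL, which
    # bounds how long a revoked cap can keep working.
    return now - cap.verified_at <= CACHE_TTL_S


def delegation_grant(request: dict) -> dict:
    """agentkeys.delegation.grant: schema-documented, inactive in v1."""
    return {"error": "not_implemented_in_v1"}
```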

Phase 1 revised to three-act IAM demo (per office-hours doc
§9.6 storyboard, now elevated to authoritative spec):
- Act 1 Permissioned Memory (scoped read, not 'smart')
- Act 2 Deterministic Denial (policy decides, no LLM)
- Act 3 Online Revocation (parent UI → device denies)

Implementation note: cap-token machinery is already shipped via
Stage 7+ (broker, signer, K3/K10 HDKD, memory/cred/audit workers,
per-actor isolation per issue #90). New Phase 1 work is the
MCP server wrapper (~1 week), parent-control web UI (~3-4 days),
two-tier audit wiring (~1 day), runbook (~half day). Total ~2 weeks.
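
For the chain tier, a 'batched Merkle root' can be as simple as the
sketch below. Each audit event in a batch window is serialized
canonically and hashed as a leaf. The leaves are then folded pairwise
into one 32-byte root, which is what gets anchored on-chain. None of the
following choices is specified in the text; all are assumptions:
- SHA-256 as the hash.
- Sorted-key JSON serialization.
- RFC 6962-style 0x00/0x01 leaf/node prefixes.
- Odd nodes promoted unpaired rather than duplicated.

```python
import hashlib
import json
from typing import Iterable, List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Merkle root with 0x00 leaf / 0x01 node prefixes; an odd node at any
    level is promoted unpaired rather than duplicated."""
    if not leaves:
        raise ValueError("cannot anchor an empty batch")
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # carry the unpaired node up unchanged
        level = nxt
    return level[0]


def batch_root_hex(events: Iterable[dict]) -> str:
    """Canonical JSON per event (sorted keys, no whitespace) -> root as hex."""
    leaves = [json.dumps(e, sort_keys=True, separators=(",", ":")).encode()
              for e in events]
    return merkle_root(leaves).hex()
```

Promoting odd nodes instead of duplicating them means a batch cannot be
padded with a repeated final event without changing the root. The parent
UI keeps reading the off-chain feed, and any event can later be proven
against an anchored root using its sibling path.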

12-month roadmap revised:
- Phase 0: shipped (Stage 7+)
- Phase 1 (0-2 wk): Agent IAM v0 demo
- Phase 2 (1-2 mo): vendor pilot + multi-rail (Volcano Ark, Tuya)
- Phase 3 (3-4 mo): runtime neutrality (Hermes/OpenClaw as MCP tools)
- Phase 4 (6 mo): delegation + approval + ACL depth
- Phase 5 (post-12mo): standards engagement (contingent on traction)

Updates to existing docs:
- docs/research/README.md: indexed new strategy doc as 'Strategic anchor'
- ai-hardware-companion-office-hours.md: positioning note pivoted from
  'implementation update' to 'strategic update' pointing at strategy doc
- issue-103 plan: PIVOT banner expanded with three-act demo + four
  corrections; old §C4/C5/C6 marked superseded; cap-token shipped
  context made explicit; no implementation re-spec per user direction

…espace model

Three nits from review:

1. Generic chain instead of Heima-specific positioning
   The strategy doc shouldn't be Heima-locked — chain is a deployment
   config (arch.md describes 'Litentry parachain (or EVM L2 fallback)'
   so the design is already chain-agnostic at the contract layer).
   Updated all positioning text to 'audit chain' / 'on-chain' /
   'chain explorer' instead of Heima-specific. Kept arch.md and
   runbook refs to Heima where they describe actual deployed infra
   (the 'currently Heima per arch.md, swappable' note in §Phase 0
   captures the reality without committing the strategy to Heima).

2. 2-min batch instead of 10-min
   Modern fast-finality chains with cheap gas make sub-block-time
   batching viable. 10 min was too conservative — set 2 min as the
   default cadence. Faster batch = better UX for parents watching
   audit feed; the cost per anchor is sub-cent at typical batch sizes.

3. Memory namespace model (new §3.5)
   Read the memory research/design doc from main (commit 53ccc9f
   'docs: AI memory worker design plan + agent-memory research survey').
   It defines four STRUCTURAL types (profile / procedural / semantic /
   episodic) with specific S3 key derivation per type.

   For Agent IAM, namespaces are an ORTHOGONAL semantic dimension
   that composes with the 4 structural types. Memory item has BOTH
   a structural type AND a semantic namespace. Cap-tokens scope
   namespace access (namespaces_allowed claim, deterministic
   string-set membership check).

   v0 defaults: personal / family / work / travel (4 namespaces).
   kids/device/temp deferred to Phase 3-4.

   Composition is non-conflicting: namespaces live in wire-format
   metadata, NOT in the S3 key derivation. Memory worker filters
   at retrieval. The 4-type S3 layout from memory-design §3.2a is
   preserved exactly. Future evolution path documented (path-prefixed
   layout if scale demands).

   arch.md compatibility check: zero contradictions found.
   - Memory data_class binding (§17.5) unchanged
   - Per-actor PrincipalTag isolation (§17) unchanged
   - Cap-token format extensible (namespaces_allowed is additive)
   - Memory worker never calls LLM invariant preserved
   - K3 epoch rotation unchanged
   - Architecture-as-source-of-truth: future arch.md §17 + memory-
     design §3 get additive paragraphs when v0 ships, no canonical-
     name conflicts introduced.
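
A minimal sketch of the namespace rule above:
- The namespace rides in item metadata.
- The S3 key stays derived from the structural type alone.
- The memory worker filters fetched items with a plain set-membership
  check against the cap-token's namespaces_allowed claim, with no LLM
  involved.

MemoryItem and the example keys are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

V0_NAMESPACES = frozenset({"personal", "family", "work", "travel"})


@dataclass(frozen=True)
class MemoryItem:
    s3_key: str           # derived from structural type only (layout unchanged)
    structural_type: str  # profile / procedural / semantic / episodic
    namespace: str        # semantic namespace, carried in metadata


def namespace_allowed(namespaces_allowed: FrozenSet[str], namespace: str) -> bool:
    # Deterministic string-set membership; no model in the loop.
    return namespace in namespaces_allowed


def filter_for_cap(items: Iterable[MemoryItem],
                   namespaces_allowed: FrozenSet[str]) -> List[MemoryItem]:
    """Retrieval-time filter applied by the memory worker."""
    return [it for it in items if namespace_allowed(namespaces_allowed, it.namespace)]
```

This is also the shape of the Act 2 'deterministic denial': a cap scoped
to {"family"} simply never receives work items.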

Files updated:
- docs/research/agent-iam-strategy.md: §3.2 audit (2-min + chain-
  agnostic), §3.5 NEW memory namespace model with arch.md compat
  check, Phase 0 line (Heima → 'currently Heima per arch.md,
  swappable')
- docs/research/README.md: strategy doc summary updated with 2-min
  + namespace model
- docs/research/ai-hardware-companion-office-hours.md: implementation
  update banner reflects 2-min on-chain anchor
- docs/research/volcano-ark-mcp-integration.md: diagram boxes
  generic ('AWS S3, audit chain', 'off-chain + chain')
- docs/spec/plans/issue-103-aiosandbox-hermes-esp32-demo.md:
  PIVOT banner reflects 2-min chain-agnostic anchor; NOT-in-scope
  list generic 'on-chain audit anchoring'

New pm/ subfolder for GitHub project management automation. Treats
milestones / labels / issue categorization as code under version
control with idempotent shell scripts that reconcile GitHub state
to declarative JSON.
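
The reconcile pattern these scripts follow can be sketched as a pure
planning step. It compares the declarative JSON against live GitHub state
and emits create / update / skip actions. Running it again after the
actions are applied yields only skips, which is what makes the scripts
idempotent. The label shape (name, color, description) is an assumption
about labels.json. The real scripts are bash and call gh.

```python
from typing import Dict, List, Tuple

Label = Dict[str, str]  # assumed shape: {"name", "color", "description"}


def plan_label_sync(desired: List[Label],
                    current: Dict[str, Label]) -> List[Tuple[str, str]]:
    """Compare desired labels to live ones (keyed by name) and return
    (action, name) pairs; action is 'create', 'update' or 'skip'."""
    plan = []
    for label in desired:
        live = current.get(label["name"])
        if live is None:
            plan.append(("create", label["name"]))
        elif any(live.get(f, "") != label.get(f, "") for f in ("color", "description")):
            plan.append(("update", label["name"]))
        else:
            plan.append(("skip", label["name"]))  # already converged
    return plan
```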

Files:
- pm/README.md — folder purpose + how to use
- pm/milestones.json — 7 roadmap milestones (M1-M7) source of truth
- pm/labels.json — 40-label taxonomy: area/ kind/ phase/ status/
  priority/ + extras (needs-arch-review, vendor-blocker)
- pm/issue-assignments.json — categorization of all 23 pre-existing
  open issues with milestone + labels + notes
- pm/new-issues.json — 20 new Phase 1-7 issues to create
- pm/arch-md-verification-report.md — #5/#6/#9/#37 verification
- pm/PROJECT-DASHBOARD-GUIDE.md — how to use projects/19 board +
  CI integration patterns
- pm/scripts/sync-milestones.sh — idempotent: creates/updates from
  milestones.json
- pm/scripts/sync-labels.sh — idempotent: creates/updates from
  labels.json
- pm/scripts/sync-issues.sh — idempotent: assigns milestone+labels
  to each issue in issue-assignments.json
- pm/scripts/create-issues.sh — idempotent: creates new issues from
  new-issues.json, skips if title already exists
- pm/scripts/audit.sh — read-only: groups open issues by milestone,
  flags uncategorized + missing area/* labels
- pm/scripts/add-to-project.sh — adds issues to litentry/projects/19
  (requires gh auth refresh -s project,read:project)

Executed in this session:
- Created 7 milestones (M1: First MCP demo + Volcano Ark PoC, M2:
  First vendor wedge, M3: Runtime neutrality, M4: Capability +
  revocation depth, M5: Native mobile + biometric, M6: TEE
  integration + security, M7: Standards + ecosystem)
- Created 40 labels across 5 namespaces (area, kind, phase,
  status, priority) + extras (needs-arch-review, vendor-blocker)
- Categorized 23 pre-existing open issues with milestones + labels
- Created 20 new issues (#107-#126) for Phase 1-7 work per the
  agent-iam-strategy.md roadmap
- Verified #5, #6, #9, #37 against arch.md — verdicts: #5 partially
  aligned (closed; lives as tier A in §15.3), #6 needs design
  refresh against current K11+SidecarRegistry, #9 already
  implemented as K3 HDKD per §6.2 (recommend close), #37 superseded
  by K11 WebAuthn per §K11 (recommend close)

Final state: 43 open issues, 100% categorized to milestones, 100%
labeled with area/*. No uncategorized issues.

Per user direction: did NOT merge / close #5/#6/#9/#37 even though
recommendations are clear. User to make final close decisions.

…s-fields strategy

Three fixes responding to user feedback:

1. add-to-project.sh: replace mapfile (bash 4+) with while-read loop
   for macOS bash 3.2 portability per CLAUDE.md project standard.
   Verified working: 'bash pm/scripts/add-to-project.sh 103' now
   successfully adds the issue to litentry/projects/19.

2. NEW pm/scripts/setup-project-fields.sh: creates the canonical
   project-level fields (Priority, Phase, Estimate, Iteration, Risk,
   Notes) via gh project field-create. Solves the 'cluttered Labels
   column' UX pain by letting the user split single-value PM
   concerns (priority, phase, status) out of the multi-value labels
   pile into typed field columns.

3. PROJECT-DASHBOARD-GUIDE.md: added 'Labels vs Fields — when to
   use which' section explaining the split:
   - Labels (repo-level, multi-value): area/*, kind/*, semantic
     flags like needs-arch-review, vendor-blocker
   - Fields (project-level, single-value): Priority, Phase, Status,
     Estimate, Risk
   Plus step-by-step instructions to migrate the cluttered Labels
   column to clean field-based grouping.

These don't change the strategic plan; they just fix the operational
PM-board ergonomics the user surfaced from running the script live.

User pointed out the project board has 10 built-in workflows that
replace much of what the scripts do. Updated guidance to prefer
workflows; scripts are fallback/batch tools.

PROJECT-DASHBOARD-GUIDE.md updates:
- Replaced the brief 'Recommended workflows' section with a full
  table of the 10 built-in workflows + their default state + what
  to configure
- New 'Script ↔ workflow split' table making clear which jobs use
  workflows vs scripts (workflows for runtime project events; scripts
  for repo-level state, batch creation, field definitions)
- One-time workflow configuration checklist (3 steps to get the
  Auto-add filter set, verify other green workflows, optionally
  enable Auto-archive)

add-to-project.sh updates:
- Header now flags this as PRIMARILY A BACKFILL / FALLBACK TOOL
- Lists three legit use cases: backfilling pre-existing issues,
  fallback when Auto-add workflow is misconfigured, adding from
  a different repo via PM_REPO override
- Pointer to PROJECT-DASHBOARD-GUIDE.md for workflow setup

No script behavior changes; only documentation tightens to match
the workflow-first reality.

… stay manual)

User asked whether workflows can be checked programmatically. Partially:
GitHub's public GraphQL ProjectV2Workflow type exposes only:
  id, name, number, enabled, createdAt, updatedAt, project, fullDatabaseId
NOT the filter expression or action configuration (UI-only, not in
the public API).

So we get:
  ✅ 'is the workflow enabled' check
  ❌ 'does the workflow do the right thing' check (filter/action body)

New files:
- pm/expected-workflows.json: declarative source of truth for what
  workflows should be enabled + what each one's filter/action should
  do (free-text 'verify_in_ui' field that engineers cross-check
  against the UI)
- pm/scripts/check-workflows.sh: audits live workflows on
  litentry/projects/19 vs expected-workflows.json
  - Confirms enabled state matches
  - Flags unexpected workflows that exist but aren't in our list
  - Prints all per-workflow expected filter/action notes for
    manual UI verification
  - Exits 0 when all expectations match, 1 on mismatch (CI-friendly)
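
The check-workflows.sh comparison reduces to a dictionary diff over the
two fields the API does expose (name and enabled). A sketch, assuming
expected-workflows.json can be flattened into a name → enabled map. It
leaves out printing the verify_in_ui notes, and the message wording is
illustrative.

```python
from typing import Dict, List, Tuple


def audit_workflows(expected: Dict[str, bool],
                    live: Dict[str, bool]) -> Tuple[int, List[str]]:
    """Return (exit_code, problems): 0 when live state matches, else 1."""
    problems = []
    for name, want in sorted(expected.items()):
        if name not in live:
            problems.append(f"missing: {name}")
        elif live[name] != want:
            problems.append(f"enabled={live[name]}, expected {want}: {name}")
    for name in sorted(set(live) - set(expected)):
        problems.append(f"unexpected: {name}")
    return (1 if problems else 0), problems
```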

Live audit result (verified on litentry/projects/19): 7 expected
workflows enabled (Auto-add to project, Auto-add sub-issues to
project, Item added/closed, Auto-close issue, PR linked/merged),
4 optional workflows correctly disabled (Auto-archive, Code review
approved, Code changes requested, Item reopened). 11/11 match.

This script can be wired into a future CI workflow to alert on
drift if anyone disables Auto-add to project or similar.

Adds two GitHub Actions and one supporting script to push project automation
to its API ceiling. After this change, label-to-field sync and workflow drift
detection both run on every event / daily schedule instead of as manual scripts.

What landed:

- .github/workflows/pm-sync-fields-from-labels.yml: triggers on issues
  labeled/unlabeled/opened/transferred. Calls sync-fields-from-labels.sh
  to mirror priority/p* + phase/v* labels into the project's Priority + Phase
  single-select fields. workflow_dispatch variant for backfill.

- .github/workflows/pm-workflow-audit.yml: daily cron + push trigger.
  Runs check-workflows.sh against expected-workflows.json and opens (or
  comments on) a tracking issue when drift is detected.

- pm/scripts/sync-fields-from-labels.sh: backing script for the sync workflow.
  Forgiving mode (warns + skips when a field is missing rather than aborting),
  bash 3.2 portable, uses -f for option-ID strings to avoid gh api numeric
  coercion.

- pm/scripts/setup-project-fields.sh: now detects + rebuilds empty-placeholder
  single-select fields (GitHub's built-in Priority/Size ship with zero options)
  and cleans up "Project <Name>" zombie fields left behind when
  deleteProjectV2Field renames instead of deleting system-reserved names.
  Fully idempotent.

- pm/PROJECT-DASHBOARD-GUIDE.md: new "What's automated vs UI-only" verdict
  table (built-in workflow filter/action contents + custom views are 100%
  UI-only — no API mutation exists for either). New "Known gotcha" section
  on Priority-field zombies. Script-vs-workflow split rewritten as three-tier
  matrix (built-in / our GH Action / bash script).
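
The field check in setup-project-fields.sh can be illustrated as a classification over the project's existing fields. This is a minimal sketch. It assumes the fields have already been fetched as (name, data type, option count), and that zombies can be recognized by the "Project <Name>" pattern applied to reserved names alone. `Field`, `classify_fields`, and the `RESERVED_NAMES` set are hypothetical.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    data_type: str     # e.g. "SINGLE_SELECT", "TEXT"
    option_count: int  # number of select options; 0 for non-select fields


# Hypothetical: built-in names whose placeholders ship with zero options.
RESERVED_NAMES = {"Priority", "Size"}
ZOMBIE_PATTERN = re.compile(r"^Project (?P<base>.+)$")


def classify_fields(fields: list[Field]) -> dict[str, list[str]]:
    """Split fields into empty placeholders to rebuild and zombies to delete."""
    placeholders, zombies = [], []
    for f in fields:
        match = ZOMBIE_PATTERN.match(f.name)
        if match and match.group("base") in RESERVED_NAMES:
            # Left behind when deleting a reserved field renamed it instead.
            zombies.append(f.name)
        elif f.data_type == "SINGLE_SELECT" and f.option_count == 0:
            placeholders.append(f.name)
    return {"rebuild": placeholders, "delete": zombies}
```

On a second run, once options are populated and the zombies are gone, both lists come back empty. That matches the all-skips, idempotent behavior described above.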

Verification: tested live against litentry/projects/19. Backfilled 40+
issues onto the board, synced Priority + Phase from labels on every one, and
left zero zombie fields. A second run of setup-project-fields.sh reports all skips.

API ceiling discovered via GraphQL introspection: ProjectV2Workflow has
no create/update mutation (only delete). ProjectV2View has no create/update
mutation at all. Both are read-only via API, UI-only to configure.
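
The introspection check can be reproduced with the standard GraphQL `__schema` query, for example by sending it through `gh api graphql -f query=...`. The sketch below only inspects mutation names. It assumes mutations follow a `<verb><TypeName>` naming pattern. The helper functions and the sample response used to exercise them are hypothetical, not a copy of GitHub's schema.

```python
VERBS = ("create", "update", "delete")

# Standard GraphQL introspection query that lists every root mutation field.
MUTATION_NAMES_QUERY = "query { __schema { mutationType { fields { name } } } }"


def names_from_response(response: dict) -> list[str]:
    """Pull mutation names out of a JSON response to MUTATION_NAMES_QUERY."""
    fields = response["data"]["__schema"]["mutationType"]["fields"]
    return [f["name"] for f in fields]


def mutation_coverage(mutation_names: list[str], type_name: str) -> dict[str, bool]:
    """Report which of create/update/delete exist as `<verb><type_name>` mutations."""
    names = set(mutation_names)
    return {verb: f"{verb}{type_name}" in names for verb in VERBS}
```

A delete-only type reports `{"create": False, "update": False, "delete": True}`. A type with no mutations at all reports `False` for every verb. Because the check is name-based, a mutation with an irregular name would be missed.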

Required repo secret for CI: PM_PROJECT_TOKEN (fine-grained PAT with
Projects=read+write, Issues=read+write). Documented in dashboard guide.

…ub native

User feedback after live use of the migration:
- The label→field sync workflow is no longer needed (labels were deleted in
  PR #129; fields are now the source of truth, set via the issue-create skill
  or manually in UI).
- The workflow-drift audit workflow added noise without value (built-in
  workflows rarely drift, and the operator manages them in UI anyway).
- The Blocked-by TEXT project field duplicates GitHub's native issue
  relationships ("Mark as blocked by" / "Mark as blocking" in the UI side
  panel, keyboard `B B` / `B X`). Use the native feature.

## Removed

- .github/workflows/pm-workflow-audit.yml (drift detection — operator handles in UI)
- .github/workflows/pm-sync-fields-from-labels.yml (labels-to-fields sync — labels are gone)
- pm/expected-workflows.json (declarative expectation for the audit)
- pm/scripts/check-workflows.sh (called by the audit)
- pm/scripts/sync-fields-from-labels.sh (called by the sync workflow)
- "Blocked by" project field (deleted via API; setup-project-fields.sh no longer creates it)

## Kept / added

- .github/workflows/pm-auto-archive-closed-pr.yml — auto-archives PRs from the
  board on close (built-in Auto-archive only fires after 30 days)
- pm/scripts/sync-size-from-effort.sh (NEW) — one-shot bulk-populate of the
  Size project field by parsing each issue's "## Effort" body section.
  Idempotent (skips already-sized items). Defaults to M when no parseable
  effort line is found.
- ~/.claude/skills/agentkeys-issue-create — updated to:
  - Set Kind/Priority/Size project fields directly via API (replaces deleted
    label-sync workflow)
  - Use GitHub native relationships for blocked-by (replaces removed field)
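
The parsing rule in sync-size-from-effort.sh can be sketched as follows. The exact format of the `## Effort` section is not described here, so this sketch assumes the section contains a T-shirt size token (XS/S/M/L/XL) somewhere in its text. `effort_section`, `size_for_issue`, and the token pattern are hypothetical.

```python
import re
from typing import Optional

DEFAULT_SIZE = "M"
# Standalone size token, e.g. "L" in "L (about a week)"; XS/XL tried before S/L.
SIZE_TOKEN = re.compile(r"\b(XS|XL|S|M|L)\b")
# Text under "## Effort", up to the next "## " heading or the end of the body.
EFFORT_SECTION = re.compile(r"^## Effort[ \t]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def effort_section(body: str) -> str:
    match = EFFORT_SECTION.search(body)
    return match.group(1) if match else ""


def size_for_issue(body: str, current_size: Optional[str]) -> Optional[str]:
    """Return the Size to set, or None to skip an already-sized item (idempotent)."""
    if current_size:
        return None
    token = SIZE_TOKEN.search(effort_section(body))
    return token.group(1) if token else DEFAULT_SIZE
```

Returning `None` for items that already have a Size keeps reruns harmless. The `M` default mirrors the fallback described above.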

## Live state after this change

39 open issues all have complete Kind + Priority + Size field values
(36 mapped from explicit "## Effort" bodies; 3 defaulted to M for issues
without parseable effort).

## What stays UI-only

- The deprecated "Phase" project field still exists with v0..v4 data on
  issues — operator can delete in UI when ready.
- The deprecated "Estimate" project field (duplicate of GitHub's built-in
  Size) still exists; it can likewise be deleted in the UI later.

The v1/v2 staged plan framing retires after v2-stage3 ships green. Going
forward, milestone-level work (M1-M7) is tracked against the new
docs/spec/plans/milestones-roadmap.md — the operational companion to
agent-iam-strategy.md.

## Archived (moved to docs/archived/ with _2026-04 suffix)

- docs/stage7-demo-and-verification.md (123KB, the big stage-7 end-to-end demo doc)
- docs/operator-runbook-stage7.md (39KB, supplanted by scripts/setup-broker-host.sh)
- docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
- docs/spec/plans/development-stages.md (the 8-stage v2 plan, replaced by milestones-roadmap.md)

Per CLAUDE.md docs policy: archive, never delete; archived files are never
read in normal dev.
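
The archive convention can be expressed as a path transform. The exact naming is not spelled out above, so this sketch assumes two things: archived files are flattened into docs/archived/, and the `_2026-04` suffix goes before the file extension. Treat `archived_path` as a hypothetical helper.

```python
from pathlib import PurePosixPath

ARCHIVE_DIR = PurePosixPath("docs/archived")


def archived_path(doc_path: str, suffix: str = "_2026-04") -> str:
    """Return the archive location for an active doc (hypothetical naming rule)."""
    path = PurePosixPath(doc_path)
    # e.g. docs/stage8-wip.md -> docs/archived/stage8-wip_2026-04.md
    return str(ARCHIVE_DIR / f"{path.stem}{suffix}{path.suffix}")
```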

## Added

- docs/spec/plans/milestones-roadmap.md — M1-M7 detail + post-M7 horizons
  + strategic risks table + how-to-use-this-doc. Cross-references arch.md
  for invariants and agent-iam-strategy.md for positioning. This becomes
  the authoritative milestone plan from M1 onward.

## Cross-refs updated (active docs only)

- docs/arch.md: §24 + §25 cross-refs now point at scripts/setup-broker-host.sh
  (canonical idempotent runbook) + archived stage-7 commentary for history
- docs/dev-setup.md: 5 stage7/dev-stages refs → setup-broker-host.sh +
  milestones-roadmap.md
- docs/v2-stage1-migration-and-demo.md: 4 stage7 refs → archive locations +
  status banner noting v1/v2 retirement after v2-stage3
- CLAUDE.md: 3 refs (build plan, runbook policy, harness workflow) →
  milestones-roadmap.md
- docs/spec/{threat-model-key-custody,ses-email-architecture,credential-backend-interface}.md:
  stage8-wip refs → archive
- docs/spec/heima-gaps-vs-desired-architecture.md: stage7 demo §4 → archive
- docs/wiki/upstream-backend-classes-exercise-vs-distribution.md: stage7
  demo refs → archive (wiki auto-publishes to GitHub Wiki via publish-wiki.yml)

## What's NOT updated (intentional)

Issue-specific plan files under docs/spec/plans/issue-64/ + issue-74-* +
issue-credential-storage-* still reference the archived docs by name.
These are themselves historical issue-deliverable records; the references
are timestamped artifacts of when those issues were planned, not active
operational links. They stay as-is.
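
The cross-ref sweep above comes down to searching active docs for the archived filenames, while skipping the archive itself and the historical issue-plan files. This is a minimal sketch. It assumes the docs are available as a path → text mapping. The exclusion globs mirror the paths named above, and `find_stale_refs` is hypothetical.

```python
from fnmatch import fnmatchcase

ARCHIVED_NAMES = (
    "stage7-demo-and-verification.md",
    "operator-runbook-stage7.md",
    "stage8-wip.md",
    "development-stages.md",
)
# Historical records that intentionally keep their old references.
EXCLUDED_GLOBS = (
    "docs/archived/*",
    "docs/spec/plans/issue-64/*",
    "docs/spec/plans/issue-74-*",
    "docs/spec/plans/issue-credential-storage-*",
)


def find_stale_refs(docs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, archived_name) pairs for active docs that still mention archived files."""
    hits = []
    for path, text in sorted(docs.items()):
        # fnmatch's "*" also matches "/", so nested files under each prefix are skipped.
        if any(fnmatchcase(path, pattern) for pattern in EXCLUDED_GLOBS):
            continue
        for name in ARCHIVED_NAMES:
            if name in text:
                hits.append((path, name))
    return hits
```

Links that already point at the archived copies do not match, provided those copies carry the `_2026-04` suffix. That suffix placement is an assumption about the naming.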

Merge origin/main into claude/hopeful-mccarthy-15e5ba to resolve conflicts
introduced when PR #127's squash-merge landed on main, plus cleanup of pm/
files that no longer fit the post-migration workflow.

## Conflict resolution

4 files had conflicting versions (main's pre-migration state vs our
post-migration state):
- pm/PROJECT-DASHBOARD-GUIDE.md → OURS (post-migration narrative)
- pm/README.md → OURS, then rewritten to drop refs to deleted files
- pm/labels.json → OURS (recolored area/*, red reserved for human attention)
- pm/scripts/setup-project-fields.sh → OURS (Urgent/High/Medium/Low + Kind)

## Re-deletions (main re-added; we re-removed)

Files we deleted in 47d503f got re-added by the merge because they
were on main from #127. Re-removed:
- .github/workflows/pm-sync-fields-from-labels.yml
- .github/workflows/pm-workflow-audit.yml
- pm/expected-workflows.json
- pm/scripts/check-workflows.sh
- pm/scripts/sync-fields-from-labels.sh

## Unused pm/ files removed (per user request)

- pm/scripts/sync-issues.sh: actively broken. It references labels
  (priority/p*, kind/*, phase/v*) that were deleted in the migration, so
  running it today fails.
- pm/scripts/create-issues.sh: one-shot tool that created issues from
  new-issues.json. All 20 issues already exist (#107-#126); running it
  again would create duplicates.
- pm/issue-assignments.json: historical record of pre-migration label
  assignments. Data references deleted labels.
- pm/new-issues.json: historical record of which 20 issues to create.
  All created. The refined-issue bodies are the source of truth now.
- pm/arch-md-verification-report.md: one-off arch.md compatibility
  verification for #5/#6/#9/#37. Job done.

## What remains in pm/

| Path | Status |
|---|---|
| pm/PROJECT-DASHBOARD-GUIDE.md | Active — dashboard usage |
| pm/README.md | Active — folder intro (rewritten) |
| pm/labels.json | Active — sync-labels.sh source |
| pm/milestones.json | Active — sync-milestones.sh source |
| pm/scripts/add-to-project.sh | Active — backfill tool |
| pm/scripts/audit.sh | Active — read-only state audit |
| pm/scripts/setup-project-fields.sh | Active — project field bootstrap |
| pm/scripts/sync-labels.sh | Active — applies labels.json |
| pm/scripts/sync-milestones.sh | Active — applies milestones.json |
| pm/scripts/sync-size-from-effort.sh | Active — one-shot Size population |
@hanwencheng merged commit f132a7c into main on May 24, 2026