M1 foundation: strategy + roadmap + research docs + 20 refined issues#130
Merged
Add two business research artifacts under docs/research/:
- ai-hardware-companion-wedge.md (round 1+2): market sizing, competitive landscape, direct competitors, business model critique, 12 critical comments, naming, Stripe ACP / Alipay+ AMP integration path, WeChat feasibility, security-first demo storyboard.
- ai-hardware-companion-office-hours.md: YC-style office-hours diagnostic on the same wedge. Six forcing questions surfaced zero vendor conversations + no named buyer. P2 narrowed mid-session to memory portability + isolation + privacy. Approach D chosen: AgentKeys-native hosted sandbox (aiosandbox) with OpenClaw/Hermes agent runtime + per-actor isolation (issue #90) + cross-vendor memory consent model. Pricing pivoted to AWS-style elastic per-user (Free / Basic vendor-paid $2-3/active-device / Pro $10 user-paid with 30% lifetime acquirer revshare / future Compute usage-based). 8/10 quality after 2 spec-review iterations.
Both index entries added to docs/research/README.md.
End-to-end demo plan for the AgentKeys hardware-vendor wedge: ESP32 device + simple URL config → agent-infra/sandbox running Hermes (AgentKeys-native runtime) + agentkeys-daemon with mock memory injected from S3 MD blob at agent boot. 12-step implementation order. Reuses arch.md canonical primitives (sandbox runtime, supervisord lifecycle, memory bucket layout bots/<actor_omni>/memory/, agentkeys-daemon). v0 scope: single ESP32, single sandbox, single mock memory blob, text-mode chat. Voice mode, multi-tenancy, cap-token enforcement, cross-vendor portability, and payment rails are deferred to follow-up issues. 3-week effort estimate. Acceptance: reviewer can flash board + run setup script + see personalized response within 15 minutes.
Pivot canonical demo target from generic ESP32 to ESP32-S3-DevKitC-1:
- Native USB-OTG (single USB-C, no separate UART chip)
- PSRAM (8MB octal) for voice follow-up audio buffers
- Xtensa LX7 with AI vector instructions for on-device wake-word
- Still MCU-class authenticity (~$10-15 dev board, <$5 chip in BOM volume)
Stack: PlatformIO + ESP-IDF (not Arduino). Production AI-toy vendors use ESP-IDF, and S3-specific features (native USB CDC, PSRAM, ESP-DSP, secure boot, OTA) need IDF.
Scaffolded firmware foundation under firmware/esp32s3-agentkeys/:
- platformio.ini, CMakeLists.txt, sdkconfig.defaults, partitions.csv
- main.c spawns 4 FreeRTOS tasks (wifi/button/chat/led) coordinated via event group + queue
- wifi_sta.c: working STA mode + auto-reconnect
- button.c: working GPIO interrupt + 200ms debounce on BOOT (GPIO 0)
- led_status.c: stub blinker (real WS2812 RGB state machine is TODO)
- https_chat.c: stub echoing user input (real esp_http_client POST is TODO)
- config.h: NVS → secrets.h → hardcoded defaults priority order
- README.md: flash quickstart + troubleshooting
Foundation builds + flashes + boots into FreeRTOS loop today; chat returns mock '[mock] you said: ...' echo. Real HTTPS POST is the clear next step (esp_http_client + cJSON parse, ~100 lines).
Renamed plan file issue-102 → issue-103 to match actual issue number.
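The NVS → secrets.h → hardcoded defaults priority order noted for config.h amounts to a first-match lookup across three sources. A minimal Python sketch of that resolution logic, for illustration only: the firmware itself is C under ESP-IDF, and the function name, key names, and values here are hypothetical.

```python
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    nvs: Mapping[str, str],
    secrets: Mapping[str, str],
    defaults: Mapping[str, str],
) -> Optional[str]:
    """Return the first value found, checking NVS, then secrets, then defaults."""
    for source in (nvs, secrets, defaults):
        value = source.get(key)
        if value is not None:
            return value
    return None  # key is not configured anywhere


# Hypothetical usage: a value provisioned in NVS wins over a compiled-in secret.
ssid = resolve_setting(
    "wifi_ssid",
    nvs={"wifi_ssid": "home-net"},
    secrets={"wifi_ssid": "lab-net"},
    defaults={"wifi_ssid": "changeme"},
)
```

Putting the runtime store first means a board can be re-provisioned without a rebuild, while the compiled-in values still give a working fallback.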
…on 1
Hardware on hand confirmed via the device display showing 'magiclink 2p5/1.9.4': MagicLick 2.5 running xiaozhi-esp32 v1.9.4 firmware.
xiaozhi-esp32 (github.com/78/xiaozhi-esp32, MIT, 26K stars) is the dominant Chinese open-source AI voice firmware for ESP32. Supports 70+ boards including ours. Full streaming voice pipeline already shipping: offline wake-word (ESP-SR) → ASR → LLM → TTS → OPUS over WebSocket or MQTT+UDP. MCP-based device + cloud control.
MagicLick 2.5 hardware specs reconstructed from boards/magiclick-2p5/config.h + board.cc:
- ESP32-S3 chip
- ES8311 audio codec (full-duplex I2S, 24kHz)
- 128x128 GC9107 SPI LCD with emoji rendering
- 3 buttons (main GPIO 21, left GPIO 0, right GPIO 47)
- 2 WS2812 LEDs on GPIO 38
- DualNetworkBoard: WiFi primary + ML307 Cat.1 4G fallback
- Battery + power manager with tickless idle
'Hermes agent' clarified to mean NousResearch/hermes-agent (MIT, Python, self-improving learning loop, multi-interface gateway, LLM-agnostic). NOT an internal AgentKeys runtime as the original plan §C4 mistakenly stated.
Strong recommendation: Option 1. Keep xiaozhi firmware unchanged, build cloud-side xiaozhi-hermes-bridge that speaks the xiaozhi WebSocket protocol while routing the agent loop to Hermes-agent (which pulls memory from agentkeys-daemon per §C3). Reduces v0 effort from ~3 months (custom firmware) to ~2-3 weeks (server-side adapter only). Forks from one of four existing reference server implementations (Python xinnan-tech, Go hackers365 with openclaw, Java joey-zhou, Go AnimeAIChat).
Hardware verification: 5 paths documented (visual / ROM bootloader via boot button hold / WiFi captive portal / vendor app / disassembly). USB doesn't enumerate by default because device is in normal firmware mode; hold LEFT button while connecting USB to drop into ESP32-S3 ROM bootloader for esptool access.
Added PIVOT banner at top of issue-103 plan flagging that C4/C5/C6 are superseded.
Full new direction in docs/research/xiaozhi-esp32-magiclink.md. firmware/esp32s3-agentkeys/ stays in tree as reference scaffolding for future custom hardware (new product lines that need first-party firmware), not the path for the MagicLick demo.
Two new research docs supporting the issue #103 Option 1 direction:
docs/research/xiaozhi-hermes-architecture.md
Permanent architecture reference with three ASCII diagrams:
- Diagram A: baseline xiaozhi flow (device → cloud → LLM)
- Diagram B: our pivoted flow with changed layers highlighted (UNCHANGED firmware, NEW URL only on device side, fork + one-module-rewrite on cloud side, new memory layer)
- Diagram C: per-turn sequence with latency budget breakdown (~2.0-2.5s first-audio; ~+250-500ms delta vs baseline)
Precise diff table: 13 layers compared, only 4 actually change, 3 of those are NEW additions (not modifications). The actual code change is concentrated in ONE module of the bridge fork.
docs/research/xiaozhi-hermes-risks.md
Risk verification grounded in actual Hermes-agent + xinnan-tech/xiaozhi-esp32-server source code, NOT assumptions. Specific file paths + line numbers cited throughout.
R1 (Hermes HTTP gateway stateless-vs-session): REAL but mitigation is built-in. Gateway exposes /v1/chat/completions with three session modes (stateless per-call default, explicit continuation via X-Hermes-Session-Id, long-term memory scoping via X-Hermes-Session-Key). Bridge sets per-device session keys. Effort: 2-4 hours.
R2 (Latency stack): mostly NOT real. agent/conversation_loop.py line 4152 confirms learning loop runs as background task AFTER response delivery, OFF the turn path. With enabled_toolsets=[] + max_iterations=1 + streaming SSE, overhead is ~50-200ms. xiaozhi-performance-research baselines:
- ASR: 0.795s Xunfei / 0.85s Doubao
- LLM first-token: 0.434s Qwen-Flash / 0.774s Kimi-K2
- TTS: 0.488s CosyVoice / 0.667s Edge-TTS / 0.103s PaddleSpeech
Pipelined: 1.4-2.4s first-audio, within 2.0-2.5s target. Effort: 1 day (tune + measure).
R3 (Concurrent device handling): less bad than feared. Hermes gateway IS multi-tenant by design (serves Telegram + Discord + Slack + WhatsApp + Signal + CLI from one process). Per-request memory ~20-80MB; 100 devices ~2-8GB on one VPS. xiaozhi-esp32-server's documented '100+ devices per process' claim is unverified in repo; only a 6-concurrent demo is documented. For v0: 0 hours. For production scale: 1-2 weeks sticky-LB.
R4 (newly discovered during research): cold agent construction per request adds 50-300ms on every turn. _create_agent() called inside _handle_chat_completions for EVERY request, no pooling. Most impactful for voice UX (compounds turn-by-turn). Mitigation: fork-local agent pool (1 day) or upstream patch (2-4 days).
Net effect: v0 timeline revised from ~3 weeks to ~1-2 weeks.
Updated docs/research/README.md to index both new docs.
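As a sanity check on the R2 numbers, the quoted baselines can be summed into naive best-case and worst-case first-audio bounds. This is only a sketch: it assumes the stages run strictly in sequence and adds the ~50-200ms Hermes overhead on top, so it ignores pipelining overlap and the R4 cold-construction cost. The resulting bounds land close to, but not exactly on, the quoted 1.4-2.4s range.

```python
# Per-stage baselines in seconds, copied from the R2 notes above.
ASR = {"Xunfei": 0.795, "Doubao": 0.85}
LLM_FIRST_TOKEN = {"Qwen-Flash": 0.434, "Kimi-K2": 0.774}
TTS = {"CosyVoice": 0.488, "Edge-TTS": 0.667, "PaddleSpeech": 0.103}
HERMES_OVERHEAD = (0.050, 0.200)  # ~50-200ms with toolsets disabled

TARGET_CEILING = 2.5  # upper end of the 2.0-2.5s first-audio target


def first_audio_bounds():
    """Naive sequential sum: fastest provider per stage vs slowest per stage."""
    stages = (ASR, LLM_FIRST_TOKEN, TTS)
    best = sum(min(s.values()) for s in stages) + HERMES_OVERHEAD[0]
    worst = sum(max(s.values()) for s in stages) + HERMES_OVERHEAD[1]
    return best, worst


best, worst = first_audio_bounds()
print(f"best={best:.3f}s worst={worst:.3f}s within_target={worst <= TARGET_CEILING}")
```

Even the slowest combination stays under the 2.5s ceiling in this simple model, which is consistent with the "mostly NOT real" verdict; measured numbers from the 1-day tuning pass would supersede it.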
Four updates following the risk-verification research:
1. docs/research/tuya-vs-xiaozhi.md (new)
Answers 'is Tuya the same role as xiaozhi?': DIFFERENT role, partial firmware overlap. Tuya = closed PaaS for brand-owners (NYSE: TUYA, $80.9M Q1 2026 revenue, 306 premium customers, 1.97M developers, 100+ countries). xiaozhi = open firmware for makers (MIT, 26.7K stars). TuyaOpen is a 1.6K-star defensive ESP32 SDK from Jan 2026, a 17x adoption gap.
AgentKeys posture: complement both, never compete.
- Phase 1 (now): xiaozhi cloud-side bridge (issue #103)
- Phase 2 (3-6 mo): Tuya Cloud Development connector
- Sit above both rails (same pattern as Alipay+ AMP / Stripe ACP)
2. v0 demo timeline revised from ~3 weeks to ~1-2 weeks in issue-103-aiosandbox-hermes-esp32-demo.md:
- PIVOT banner at top of plan
- Effort estimate section (line 441)
The basis is xiaozhi-hermes-risks.md showing all four risks are smaller than originally feared (R1 built-in mitigation, R2 background loop, R3 multi-tenant by design, R4 cheap fork-local hack).
3. Fixed false cross-reference in xiaozhi-hermes-risks.md
The 'unverified 100+ devices' claim was incorrectly attributed to the office-hours doc. It actually circulated in earlier informal discussion, not in any committed doc. Reworded to remove the false attribution.
4. Added implementation update banner to office-hours doc pointing readers at the four xiaozhi research docs + the revised v0 timeline. The §Recommended Approach / Pricing / Cross-Vendor Memory Model below stay unchanged; only the firmware-and-runtime layer shifted.
…form
Earlier version of tuya-vs-xiaozhi.md claimed Phase 3 would add adapters for Xiaomi MIoT, Alibaba Smart Home, and Volcano AI Hub without verifying each platform's third-party developer surface.
Research findings per platform:
Volcano Ark (ByteDance): VERIFIED FEASIBLE
- Open international developer signup, no PRC entity / ICP needed
- MCP-server marketplace launched 2026 (mcp.so/server/mcp-server/volcengine)
- AgentKeys publishes an MCP tool any Doubao-powered AI hardware can call
- Genuinely Tuya-equivalent for the AI-side rather than IoT-side
- ~1 week effort
AliGenie / Tmall Genie (Alibaba): FEASIBLE WITH PARTNERSHIP
- International Alibaba Cloud account works for sandbox + custom-skill webhook
- Production distribution onto Tmall Genie hardware requires Alibaba's skill review + de-facto PRC-domiciled brand
- ~1 week dev + partnership lead time
Xiaomi MIoT / XiaoAI: WEAKEST
- Brand-tier integration requires Mi Ecosystem partnership admission
- Publishable XiaoAI skills require PRC real-name verification
- Consumer-OAuth path (Home-Assistant-style) works today for foreign servers but is a narrower wedge than brand-tier
- Defer until partnership or scope to consumer-OAuth only
Rewrote Phase 3 section to split into 3a (Volcano open), 3b (AliGenie with partner), 3c (Xiaomi deferred). Added explicit 'Honest note on Phase 3 verification' acknowledging the original claim was hand-wavy. Added 15 source URLs to the Sources block.
New research doc with three ASCII diagrams showing how AgentKeys integrates with Volcano Ark (ByteDance's enterprise AI cloud hosting Doubao LLM) as a Phase 3a hosted MCP server registered in their 2026 MCP marketplace. Pattern B (hosted by us, marketplace is discovery only): - AgentKeys MCP server at mcp.agentkeys.io exposes 5-7 tools (memory get/put, cred fetch, cap mint, audit append, whoami, permission check) mapped to existing Stage 7+ backend RPCs - Vendor Doubao agents call our MCP tools via HTTPS/SSE with per-vendor Bearer token + per-actor X-AgentKeys-Actor header - No vendor firmware changes; no Doubao runtime changes — just marketplace registration + one-checkbox vendor opt-in Diagram A: high-level architecture (device → RTC → Doubao → MCP → AgentKeys MCP server → backend) Diagram B: per-call MCP tool sequence with ~200-400ms per-call latency budget (concern noted: multiple tool calls per turn can stack — mitigation via batched 'context.bootstrap' tool) Diagram C: cross-vendor composition showing same user (O_kevin) with FoloToy (Doubao + MCP adapter) AND MagicLick (xiaozhi + Hermes bridge) both terminating at one AgentKeys backend with one memory namespace + one identity tree + one audit ledger. This is the cross-vendor portability moat materializing automatically per office-hours doc §Cross-Vendor Memory Model. Effort: ~1-1.5 weeks (sibling to xiaozhi-hermes-bridge). 6 open risks called out + mitigations sketched: - MCP latency stacking per turn - Marketplace approval SLA - Per-tenant auth model TBD - Actor omni resolution pattern (vendor-side vs whoami call) - MCP protocol version compat with Doubao runtime - Cross-vendor cap-token consent (resolved: same office-hours consent ceremony applies) Updated docs/research/README.md to index the new doc.
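The Pattern B auth shape described above (per-vendor Bearer token plus a per-actor X-AgentKeys-Actor header on each MCP call) can be sketched as a small header builder. This is a minimal illustration, not the AgentKeys implementation: the function name and the token value are hypothetical, and the open-risks list above still marks the per-tenant auth model as TBD.

```python
from typing import Dict


def build_mcp_headers(vendor_token: str, actor_omni: str) -> Dict[str, str]:
    """Build the two auth headers named in the Pattern B description.

    vendor_token identifies the vendor; actor_omni identifies the end user
    (actor) on whose behalf the tool call is made.
    """
    if not vendor_token:
        raise ValueError("vendor_token is required")
    if not actor_omni:
        raise ValueError("actor_omni is required")
    return {
        "Authorization": f"Bearer {vendor_token}",
        "X-AgentKeys-Actor": actor_omni,
    }


# Hypothetical usage: placeholder token, actor name taken from Diagram C.
headers = build_mcp_headers("vendor-token-placeholder", "O_kevin")
```

Keeping the actor in its own header, separate from the vendor credential, is what would let one vendor token serve many users while isolation stays per actor.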
New strategic anchor doc at docs/research/agent-iam-strategy.md captures the revised direction from multi-round discussion (original Agent IAM proposal → independent analysis → ChatGPT critique → synthesis).
Three-layer positioning, three audiences:
- AI Device Account (consumer/vendor BD pitch)
- Agent IAM (B2B/investor/CTO category)
- Trust Substrate (compliance/regulator/Web3 partner)
Five accepted strategic moves:
- Task Host vs Authority Host distinction (we are Authority)
- Agent IAM as the technical category (not key management / not memory MCP)
- MCP as integration surface, not product identity
- Zero orchestration in v1 (hard line)
- Deploy → grow → standardize sequencing
Four architecture corrections that tighten commitments:
1. Revocation: 'immediate online, bounded TTL/cache offline' (NOT 'no propagation delay'). High-risk actions always online; low-risk reads use short-lived cached caps; offline mode denies sensitive actions by default.
2. Audit (two-tier): real-time off-chain feed in parent-control UI + 10-min batched Merkle root anchored to Heima. NOT real-time on-chain. Heima explorer is tamper-evidence proof, not the UX surface.
3. Delegation: agentkeys.delegation.grant is schema-documented but not active in v1. Returns not_implemented_in_v1. Active delegation lands in Phase 4.
4. Dual narrative: don't lead with 'Agent IAM' in consumer contexts; don't lead with 'memory portability' anywhere. Authority is the category; privacy/memory are benefits.
Phase 1 revised to three-act IAM demo (per office-hours doc §9.6 storyboard, now elevated to authoritative spec):
- Act 1 Permissioned Memory (scoped read, not 'smart')
- Act 2 Deterministic Denial (policy decides, no LLM)
- Act 3 Online Revocation (parent UI → device denies)
Implementation note: cap-token machinery is already shipped via Stage 7+ (broker, signer, K3/K10 HDKD, memory/cred/audit workers, per-actor isolation per issue #90). New Phase 1 work is the MCP server wrapper (~1 week), parent-control web UI (~3-4 days), two-tier audit wiring (~1 day), runbook (~half day). Total ~2 weeks.
12-month roadmap revised:
- Phase 0: shipped (Stage 7+)
- Phase 1 (0-2 wk): Agent IAM v0 demo
- Phase 2 (1-2 mo): vendor pilot + multi-rail (Volcano Ark, Tuya)
- Phase 3 (3-4 mo): runtime neutrality (Hermes/OpenClaw as MCP tools)
- Phase 4 (6 mo): delegation + approval + ACL depth
- Phase 5 (post-12mo): standards engagement (contingent on traction)
Updates to existing docs:
- docs/research/README.md: indexed new strategy doc as 'Strategic anchor'
- ai-hardware-companion-office-hours.md: positioning note pivoted from 'implementation update' to 'strategic update' pointing at strategy doc
- issue-103 plan: PIVOT banner expanded with three-act demo + four corrections; old §C4/C5/C6 marked superseded; cap-token shipped context made explicit; no implementation re-spec per user direction
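The revocation correction above ('immediate online, bounded TTL/cache offline') reads naturally as a small deterministic decision function. The following is a sketch under stated assumptions, not the shipped broker logic: the CachedCap type, the authorize name, and the use of None to mean "authority unreachable" are all hypothetical, and the simplification here is that any online answer from the authority is final.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedCap:
    action: str        # action this cached capability covers
    expires_at: float  # unix seconds; short TTL bounds the revocation delay


def authorize(
    action: str,
    high_risk: bool,
    authority_decision: Optional[bool],  # None means the authority is unreachable
    cached_cap: Optional[CachedCap],
    now: float,
) -> bool:
    # Online: the authority's answer wins, so revocation takes effect immediately.
    if authority_decision is not None:
        return authority_decision
    # Offline: sensitive actions are denied by default.
    if high_risk:
        return False
    # Offline low-risk read: allowed only while a matching cached cap is unexpired.
    return (
        cached_cap is not None
        and cached_cap.action == action
        and now < cached_cap.expires_at
    )
```

The worst-case propagation delay for a revoked low-risk capability is then the cache TTL, which is the 'bounded' part of the commitment. No LLM is involved in the decision, matching the Act 2 requirement that policy decides.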
…espace model
Three nits from review:
1. Generic chain instead of Heima-specific positioning
The strategy doc shouldn't be Heima-locked. Chain is a deployment config (arch.md describes 'Litentry parachain (or EVM L2 fallback)' so the design is already chain-agnostic at the contract layer). Updated all positioning text to 'audit chain' / 'on-chain' / 'chain explorer' instead of Heima-specific. Kept arch.md and runbook refs to Heima where they describe actual deployed infra (the 'currently Heima per arch.md, swappable' note in §Phase 0 captures the reality without committing the strategy to Heima).
2. 2-min batch instead of 10-min
Modern fast-finality chains with cheap gas make sub-block-time batching viable. 10 min was too conservative; set 2 min as the default cadence. Faster batch = better UX for parents watching audit feed; the cost per anchor is sub-cent at typical batch sizes.
3. Memory namespace model (new §3.5)
Read the memory research/design doc from main (commit 53ccc9f 'docs: AI memory worker design plan + agent-memory research survey'). It defines four STRUCTURAL types (profile / procedural / semantic / episodic) with specific S3 key derivation per type.
For Agent IAM, namespaces are an ORTHOGONAL semantic dimension that composes with the 4 structural types. Memory item has BOTH a structural type AND a semantic namespace. Cap-tokens scope namespace access (namespaces_allowed claim, deterministic string-set membership check). v0 defaults: personal / family / work / travel (4 namespaces). kids/device/temp deferred to Phase 3-4.
Composition is non-conflicting: namespaces live in wire-format metadata, NOT in the S3 key derivation. Memory worker filters at retrieval. The 4-type S3 layout from memory-design §3.2a is preserved exactly. Future evolution path documented (path-prefixed layout if scale demands).
arch.md compatibility check: zero contradictions found.
- Memory data_class binding (§17.5) unchanged
- Per-actor PrincipalTag isolation (§17) unchanged
- Cap-token format extensible (namespaces_allowed is additive)
- Memory worker never calls LLM invariant preserved
- K3 epoch rotation unchanged
- Architecture-as-source-of-truth: future arch.md §17 + memory-design §3 get additive paragraphs when v0 ships, no canonical-name conflicts introduced.
Files updated:
- docs/research/agent-iam-strategy.md: §3.2 audit (2-min + chain-agnostic), §3.5 NEW memory namespace model with arch.md compat check, Phase 0 line (Heima → 'currently Heima per arch.md, swappable')
- docs/research/README.md: strategy doc summary updated with 2-min + namespace model
- docs/research/ai-hardware-companion-office-hours.md: implementation update banner reflects 2-min on-chain anchor
- docs/research/volcano-ark-mcp-integration.md: diagram boxes generic ('AWS S3, audit chain', 'off-chain + chain')
- docs/spec/plans/issue-103-aiosandbox-hermes-esp32-demo.md: PIVOT banner reflects 2-min chain-agnostic anchor; NOT-in-scope list generic 'on-chain audit anchoring'
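The namespace model from §3.5 can be illustrated with a short sketch: each memory item carries both a structural type and a semantic namespace, and the worker filters at retrieval with a plain set-membership check against the cap-token's namespaces_allowed claim. The MemoryItem class and function name below are hypothetical; only the four structural types, the four v0 namespaces, and the claim name come from the notes above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

STRUCTURAL_TYPES = frozenset({"profile", "procedural", "semantic", "episodic"})
V0_NAMESPACES = frozenset({"personal", "family", "work", "travel"})


@dataclass(frozen=True)
class MemoryItem:
    structural_type: str  # drives storage layout (S3 key derivation)
    namespace: str        # wire-format metadata only; not part of the key
    content: str

    def __post_init__(self) -> None:
        if self.structural_type not in STRUCTURAL_TYPES:
            raise ValueError(f"unknown structural type: {self.structural_type}")
        if self.namespace not in V0_NAMESPACES:
            raise ValueError(f"unknown namespace: {self.namespace}")


def filter_by_namespaces(
    items: Iterable[MemoryItem], namespaces_allowed: FrozenSet[str]
) -> List[MemoryItem]:
    """Deterministic string-set membership check; no LLM in the path."""
    return [item for item in items if item.namespace in namespaces_allowed]


items = [
    MemoryItem("profile", "family", "likes dinosaurs"),
    MemoryItem("episodic", "work", "standup notes"),
    MemoryItem("semantic", "travel", "passport renewal due"),
]
visible = filter_by_namespaces(items, frozenset({"family", "travel"}))
```

Because the namespace never enters the storage key, adding or renaming namespaces later would not force a data migration, which is the non-conflicting composition the compatibility check relies on.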
New pm/ subfolder for GitHub project management automation. Treats milestones / labels / issue categorization as code under version control with idempotent shell scripts that reconcile GitHub state to declarative JSON.
Files:
- pm/README.md: folder purpose + how to use
- pm/milestones.json: 7 roadmap milestones (M1-M7) source of truth
- pm/labels.json: 40-label taxonomy: area/ kind/ phase/ status/ priority/ + extras (needs-arch-review, vendor-blocker)
- pm/issue-assignments.json: categorization of all 23 pre-existing open issues with milestone + labels + notes
- pm/new-issues.json: 20 new Phase 1-7 issues to create
- pm/arch-md-verification-report.md: #5/#6/#9/#37 verification
- pm/PROJECT-DASHBOARD-GUIDE.md: how to use projects/19 board + CI integration patterns
- pm/scripts/sync-milestones.sh: idempotent; creates/updates from milestones.json
- pm/scripts/sync-labels.sh: idempotent; creates/updates from labels.json
- pm/scripts/sync-issues.sh: idempotent; assigns milestone+labels to each issue in issue-assignments.json
- pm/scripts/create-issues.sh: idempotent; creates new issues from new-issues.json, skips if title already exists
- pm/scripts/audit.sh: read-only; groups open issues by milestone, flags uncategorized + missing area/* labels
- pm/scripts/add-to-project.sh: adds issues to litentry/projects/19 (requires gh auth refresh -s project,read:project)
Executed in this session:
- Created 7 milestones (M1: First MCP demo + Volcano Ark PoC, M2: First vendor wedge, M3: Runtime neutrality, M4: Capability + revocation depth, M5: Native mobile + biometric, M6: TEE integration + security, M7: Standards + ecosystem)
- Created 40 labels across 5 namespaces (area, kind, phase, status, priority) + extras (needs-arch-review, vendor-blocker)
- Categorized 23 pre-existing open issues with milestones + labels
- Created 20 new issues (#107-#126) for Phase 1-7 work per the agent-iam-strategy.md roadmap
- Verified #5, #6, #9, #37 against arch.md. Verdicts: #5 partially aligned (closed; lives as tier A in §15.3), #6 needs design refresh against current K11+SidecarRegistry, #9 already implemented as K3 HDKD per §6.2 (recommend close), #37 superseded by K11 WebAuthn per §K11 (recommend close)
Final state: 43 open issues, 100% categorized to milestones, 100% labeled with area/*. No uncategorized issues.
Per user direction: did NOT merge / close #5/#6/#9/#37 even though recommendations are clear. User to make final close decisions.
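The "reconcile GitHub state to declarative JSON" pattern behind the sync scripts can be sketched as a pure planning step that diffs desired state against live state. The actual scripts are shell and call the GitHub CLI; this Python sketch only illustrates the idempotency property, and the "title" / "description" field names are assumptions because the milestones.json schema is not shown here.

```python
from typing import Dict, List, Tuple


def plan_sync(desired: List[Dict[str, str]], live: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (action, title) pairs: create, update, or skip for each desired entry."""
    live_by_title = {entry["title"]: entry for entry in live}
    plan = []
    for want in desired:
        have = live_by_title.get(want["title"])
        if have is None:
            plan.append(("create", want["title"]))
        elif any(have.get(key) != value for key, value in want.items()):
            plan.append(("update", want["title"]))
        else:
            plan.append(("skip", want["title"]))
    return plan


# Hypothetical data: one milestone exists with a stale description, one is missing.
desired = [
    {"title": "M1", "description": "First MCP demo + Volcano Ark PoC"},
    {"title": "M2", "description": "First vendor wedge"},
]
live = [{"title": "M1", "description": "old text"}]

first_run = plan_sync(desired, live)
second_run = plan_sync(desired, desired)  # state after the first run was applied
```

A second run against already-reconciled state yields only skips, which is the property that makes such scripts safe to re-run.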
…s-fields strategy
Three fixes responding to user feedback:
1. add-to-project.sh: replace mapfile (bash 4+) with while-read loop
for macOS bash 3.2 portability per CLAUDE.md project standard.
Verified working: 'bash pm/scripts/add-to-project.sh 103' now
successfully adds the issue to litentry/projects/19.
2. NEW pm/scripts/setup-project-fields.sh: creates the canonical
project-level fields (Priority, Phase, Estimate, Iteration, Risk,
Notes) via gh project field-create. Solves the 'cluttered Labels
column' UX pain by letting the user split single-value PM
concerns (priority, phase, status) out of the multi-value labels
pile into typed field columns.
3. PROJECT-DASHBOARD-GUIDE.md: added 'Labels vs Fields — when to
use which' section explaining the split:
- Labels (repo-level, multi-value): area/*, kind/*, semantic
flags like needs-arch-review, vendor-blocker
- Fields (project-level, single-value): Priority, Phase, Status,
Estimate, Risk
Plus step-by-step instructions to migrate the cluttered Labels
column to clean field-based grouping.
These don't change the strategic plan; they just fix the operational
PM-board ergonomics the user surfaced from running the script live.
User pointed out the project board has 10 built-in workflows that replace much of what the scripts do. Updated guidance to prefer workflows; scripts are fallback/batch tools.
PROJECT-DASHBOARD-GUIDE.md updates:
- Replaced the brief 'Recommended workflows' section with a full table of the 10 built-in workflows + their default state + what to configure
- New 'Script ↔ workflow split' table making clear which jobs use workflows vs scripts (workflows for runtime project events; scripts for repo-level state, batch creation, field definitions)
- One-time workflow configuration checklist (3 steps to get the Auto-add filter set, verify other green workflows, optionally enable Auto-archive)
add-to-project.sh updates:
- Header now flags this as PRIMARILY A BACKFILL / FALLBACK TOOL
- Lists three legit use cases: backfilling pre-existing issues, fallback when Auto-add workflow is misconfigured, adding from a different repo via PM_REPO override
- Pointer to PROJECT-DASHBOARD-GUIDE.md for workflow setup
No script behavior changes; only documentation tightens to match the workflow-first reality.
… stay manual)
User asked if workflows can be programmatically checked. Partial yes:
GitHub's public GraphQL ProjectV2Workflow type exposes only:
id, name, number, enabled, createdAt, updatedAt, project, fullDatabaseId
NOT the filter expression or action configuration (UI-only, not in
the public API).
So we get:
✅ 'is the workflow enabled' check
❌ 'does the workflow do the right thing' check (filter/action body)
New files:
- pm/expected-workflows.json: declarative source of truth for what
workflows should be enabled + what each one's filter/action should
do (free-text 'verify_in_ui' field that engineers cross-check
against the UI)
- pm/scripts/check-workflows.sh: audits live workflows on
litentry/projects/19 vs expected-workflows.json
- Confirms enabled state matches
- Flags unexpected workflows that exist but aren't in our list
- Prints all per-workflow expected filter/action notes for
manual UI verification
- Exits 0 when all expectations match, 1 on mismatch (CI-friendly)
Live audit result (verified on litentry/projects/19): 7 expected
workflows enabled (Auto-add to project, Auto-add sub-issues to
project, Item added/closed, Auto-close issue, PR linked/merged),
4 optional workflows correctly disabled (Auto-archive, Code review
approved, Code changes requested, Item reopened). 11/11 match.
This script can be wired into a future CI workflow to alert on
drift if anyone disables Auto-add to project or similar.
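The enabled-state comparison that check-workflows.sh performs can be sketched as a pure function over two name-to-enabled maps. The real script is shell and reads live state from the GraphQL ProjectV2Workflow fields listed above; this Python version is an illustration only, and treating an unexpected workflow as a failure (rather than a warning) is an assumption made here.

```python
from typing import Dict, List, Tuple


def audit_workflows(expected: Dict[str, bool], live: Dict[str, bool]) -> Tuple[List[str], int]:
    """Compare expected vs live enabled flags; return (problems, exit_code)."""
    problems = []
    for name in sorted(expected):
        if name not in live:
            problems.append(f"missing: {name}")
        elif live[name] != expected[name]:
            problems.append(
                f"enabled mismatch: {name} (expected {expected[name]}, got {live[name]})"
            )
    for name in sorted(set(live) - set(expected)):
        problems.append(f"unexpected: {name}")
    # CI-friendly: 0 when everything matches, 1 on any drift.
    return problems, (1 if problems else 0)


# Hypothetical excerpt of the expectation file and a drifted live state.
expected = {"Auto-add to project": True, "Auto-archive": False}
drifted = {"Auto-add to project": False, "Auto-archive": False, "Mystery": True}

clean_problems, clean_code = audit_workflows(expected, dict(expected))
drift_problems, drift_code = audit_workflows(expected, drifted)
```

Only the enabled flag is comparable this way; the filter and action bodies still need the manual UI check described above.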
Adds two GitHub Actions and one supporting script to push project automation to its API ceiling. After this change, label-to-field sync and workflow drift detection both run on every event / daily schedule instead of as manual scripts.
What landed:
- .github/workflows/pm-sync-fields-from-labels.yml: triggers on issues labeled/unlabeled/opened/transferred. Calls sync-fields-from-labels.sh to mirror priority/p* + phase/v* labels into the project's Priority + Phase single-select fields. workflow_dispatch variant for backfill.
- .github/workflows/pm-workflow-audit.yml: daily cron + push trigger. Runs check-workflows.sh against expected-workflows.json and opens (or comments on) a tracking issue when drift is detected.
- pm/scripts/sync-fields-from-labels.sh: backing script for the sync workflow. Forgiving mode (warns + skips when a field is missing rather than aborting), bash 3.2 portable, uses -f for option-ID strings to avoid gh api numeric coercion.
- pm/scripts/setup-project-fields.sh: now detects + rebuilds empty-placeholder single-select fields (GitHub's built-in Priority/Size ship with zero options) and cleans up "Project <Name>" zombie fields left behind when deleteProjectV2Field renames instead of deleting system-reserved names. Fully idempotent.
- pm/PROJECT-DASHBOARD-GUIDE.md: new "What's automated vs UI-only" verdict table (built-in workflow filter/action contents + custom views are 100% UI-only; no API mutation exists for either). New "Known gotcha" section on Priority-field zombies. Script-vs-workflow split rewritten as three-tier matrix (built-in / our GH Action / bash script).
Verification: tested live against litentry/projects/19. Backfilled 40+ issues onto board, synced Priority + Phase from labels on every one, zero zombie fields remain. setup-project-fields.sh second-run shows all skips.
API ceiling discovered via GraphQL introspection: ProjectV2Workflow has no create/update mutation (only delete). ProjectV2View has no create/update mutation at all. Both are read-only via API, UI-only to configure.
Required repo secret for CI: PM_PROJECT_TOKEN (fine-grained PAT with Projects=read+write, Issues=read+write). Documented in dashboard guide.
…ub native
User feedback after live use of the migration:
- The label→field sync workflow is no longer needed (labels were deleted in PR #129; fields are now the source of truth, set via the issue-create skill or manually in UI).
- The workflow-drift audit workflow added noise without value (built-in workflows rarely drift, and the operator manages them in UI anyway).
- The Blocked-by TEXT project field duplicates GitHub's native issue relationships ("Mark as blocked by" / "Mark as blocking" in the UI side panel, keyboard `B B` / `B X`). Use the native feature.
## Removed
- .github/workflows/pm-workflow-audit.yml (drift detection; operator handles in UI)
- .github/workflows/pm-sync-fields-from-labels.yml (labels-to-fields sync; labels are gone)
- pm/expected-workflows.json (declarative expectation for the audit)
- pm/scripts/check-workflows.sh (called by the audit)
- pm/scripts/sync-fields-from-labels.sh (called by the sync workflow)
- "Blocked by" project field (deleted via API; setup-project-fields.sh no longer creates it)
## Kept / added
- .github/workflows/pm-auto-archive-closed-pr.yml: auto-archives PRs from the board on close (built-in Auto-archive only fires after 30 days)
- pm/scripts/sync-size-from-effort.sh (NEW): one-shot bulk-populate of the Size project field by parsing each issue's "## Effort" body section. Idempotent (skips already-sized items). Defaults to M when no parseable effort line found.
- ~/.claude/skills/agentkeys-issue-create, updated to:
  - Set Kind/Priority/Size project fields directly via API (replaces deleted label-sync workflow)
  - Use GitHub native relationships for blocked-by (replaces removed field)
## Live state after this change
39 open issues all have complete Kind + Priority + Size field values (36 mapped from explicit "## Effort" bodies; 3 defaulted to M for issues without parseable effort).
## What stays UI-only
- The deprecated "Phase" project field still exists with v0..v4 data on issues; operator can delete in UI when ready.
- The deprecated "Estimate" project field (duplicate of GitHub's built-in Size) still exists; same UI-cleanup-later.
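The sync-size-from-effort.sh behavior (parse the "## Effort" section of an issue body, default to M when nothing parseable is found) can be sketched as follows. The real script is shell; the effort-line format, the unit conversions, and the S/M/L thresholds below are all hypothetical, chosen only to show the parse-then-default shape.

```python
import re

# Hypothetical conversion to working days; the real script's rules are not shown here.
DAYS_PER_UNIT = {"hour": 1 / 8, "day": 1.0, "week": 5.0}


def size_from_body(body: str) -> str:
    """Map an issue body's '## Effort' section to a Size value, defaulting to M."""
    section = re.search(r"^## Effort\s*\n(.*?)(?=^## |\Z)", body, flags=re.M | re.S)
    if section is None:
        return "M"  # no Effort section at all
    match = re.search(
        r"(\d+(?:\.\d+)?)\s*(hour|day|week)s?", section.group(1), flags=re.I
    )
    if match is None:
        return "M"  # section present but no parseable effort line
    days = float(match.group(1)) * DAYS_PER_UNIT[match.group(2).lower()]
    if days <= 1:
        return "S"
    if days <= 5:
        return "M"
    return "L"


sizes = [
    size_from_body("## Effort\n4 hours\n"),
    size_from_body("## Summary\nx\n\n## Effort\n~2 days\n\n## Notes\ny\n"),
    size_from_body("## Effort\n3 weeks"),
    size_from_body("## Effort\nTBD\n"),
    size_from_body("No effort section here"),
]
```

For a range such as "1-2 weeks" this regex picks up the upper bound ("2 weeks"), which errs toward the larger size.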
The v1/v2 staged plan framing retires after v2-stage3 ships green. Going
forward, milestone-level work (M1-M7) is tracked against the new
docs/spec/plans/milestones-roadmap.md — the operational companion to
agent-iam-strategy.md.
## Archived (moved to docs/archived/ with _2026-04 suffix)
- docs/stage7-demo-and-verification.md (123KB, the big stage-7 end-to-end demo doc)
- docs/operator-runbook-stage7.md (39KB, supplanted by scripts/setup-broker-host.sh)
- docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
- docs/spec/plans/development-stages.md (the 8-stage v2 plan, replaced by milestones-roadmap.md)
Per CLAUDE.md docs policy: archive, never delete; archived files are never
read in normal dev.
## Added
- docs/spec/plans/milestones-roadmap.md — M1-M7 detail + post-M7 horizons
+ strategic risks table + how-to-use-this-doc. Cross-references arch.md
for invariants and agent-iam-strategy.md for positioning. This becomes
the authoritative milestone plan from M1 onward.
## Cross-refs updated (active docs only)
- docs/arch.md: §24 + §25 cross-refs now point at scripts/setup-broker-host.sh
(canonical idempotent runbook) + archived stage-7 commentary for history
- docs/dev-setup.md: 5 stage7/dev-stages refs → setup-broker-host.sh +
milestones-roadmap.md
- docs/v2-stage1-migration-and-demo.md: 4 stage7 refs → archive locations +
status banner noting v1/v2 retirement after v2-stage3
- CLAUDE.md: 3 refs (build plan, runbook policy, harness workflow) →
milestones-roadmap.md
- docs/spec/{threat-model-key-custody,ses-email-architecture,credential-backend-interface}.md:
stage8-wip refs → archive
- docs/spec/heima-gaps-vs-desired-architecture.md: stage7 demo §4 → archive
- docs/wiki/upstream-backend-classes-exercise-vs-distribution.md: stage7
demo refs → archive (wiki auto-publishes to GitHub Wiki via publish-wiki.yml)
## What's NOT updated (intentional)
Issue-specific plan files under docs/spec/plans/issue-64/ + issue-74-* +
issue-credential-storage-* still reference the archived docs by name.
These are themselves historical issue-deliverable records; the references
are timestamped artifacts of when those issues were planned, not active
operational links. They stay as-is.
Merge origin/main into claude/hopeful-mccarthy-15e5ba to resolve conflicts opened by PR #127's squash-merge landing on main. Plus cleanup of pm/ files that no longer fit the post-migration workflow. ## Conflict resolution 4 files had conflicting versions (main's pre-migration state vs our post-migration state): - pm/PROJECT-DASHBOARD-GUIDE.md → OURS (post-migration narrative) - pm/README.md → OURS, then rewritten to drop refs to deleted files - pm/labels.json → OURS (recolored area/*, red reserved for human attention) - pm/scripts/setup-project-fields.sh → OURS (Urgent/High/Medium/Low + Kind) ## Re-deletions (main re-added; we re-removed) Files we deleted in 47d503f got re-added by the merge because they were on main from #127. Re-removed: - .github/workflows/pm-sync-fields-from-labels.yml - .github/workflows/pm-workflow-audit.yml - pm/expected-workflows.json - pm/scripts/check-workflows.sh - pm/scripts/sync-fields-from-labels.sh ## Unused pm/ files removed (per user request) - pm/scripts/sync-issues.sh: actively broken — references deleted labels (priority/p*, kind/*, phase/v*) that were removed in the migration. Running it today fails. - pm/scripts/create-issues.sh: one-shot tool that created issues from new-issues.json. All 20 issues already created (#107-#126); running again would attempt to duplicate. - pm/issue-assignments.json: historical record of pre-migration label assignments. Data references deleted labels. - pm/new-issues.json: historical record of which 20 issues to create. All created. The refined-issue bodies are the source of truth now. - pm/arch-md-verification-report.md: one-off arch.md compatibility verification for #5/#6/#9/#37. Job done. 
## What remains in pm/

| Path | Status |
|---|---|
| pm/PROJECT-DASHBOARD-GUIDE.md | Active — dashboard usage |
| pm/README.md | Active — folder intro (rewritten) |
| pm/labels.json | Active — sync-labels.sh source |
| pm/milestones.json | Active — sync-milestones.sh source |
| pm/scripts/add-to-project.sh | Active — backfill tool |
| pm/scripts/audit.sh | Active — read-only state audit |
| pm/scripts/setup-project-fields.sh | Active — project field bootstrap |
| pm/scripts/sync-labels.sh | Active — applies labels.json |
| pm/scripts/sync-milestones.sh | Active — applies milestones.json |
| pm/scripts/sync-size-from-effort.sh | Active — one-shot Size population |
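The commit message says pm/scripts/sync-issues.sh broke because it still referenced labels removed in the migration (priority/p*, kind/*, phase/v*). A check of that kind could look roughly like the sketch below. The three glob patterns are taken from the message; the function name `deleted_label_refs` and the token regex are assumptions made for illustration, not code from the repository.

```python
import re
from fnmatch import fnmatchcase

# Label families removed in the migration, per the commit message.
DELETED_LABEL_PATTERNS = ("priority/p*", "kind/*", "phase/v*")

# Assumption: a label-like token is slash-separated word characters.
TOKEN_RE = re.compile(r"[\w.-]+(?:/[\w.*-]+)+")


def deleted_label_refs(script_text: str) -> list[str]:
    """Return deleted-label tokens found in a script, in first-seen order."""
    found = []
    for token in TOKEN_RE.findall(script_text):
        matches = any(fnmatchcase(token, pat) for pat in DELETED_LABEL_PATTERNS)
        if matches and token not in found:
            found.append(token)
    return found
```

Retained families such as area/* are not flagged, and file paths like pm/labels.json do not match any of the deleted patterns, so a script that only touches surviving labels returns an empty list.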
Summary
This PR lands the strategic + roadmap + planning foundation for M1 work to begin. It absorbs ~3 sessions of business research, architecture corrections, PM automation tuning, and per-issue refinement into a coherent set of documents + issue bodies that the next developer/agent picks up cleanly.
The branch is 17 commits ahead; 5 were already merged via PR #127 + #128 (squashed onto main; their original commits still appear in git log here but produce no diff). The remaining ~12 commits are the new work below.

What lands
1. Strategic anchor — Agent IAM positioning
- docs/research/agent-iam-strategy.md — full strategic positioning: Agent IAM as the category (Identity / Memory / Permissions / Capability tokens / Audit / Delegation / Revocation), Task Host vs Authority Host distinction, dual narrative (B2B vs consumer), 4 architecture corrections (bounded revocation, two-tier audit, delegation as preview-only, dual narrative), memory namespace model, 12-month roadmap, 7 strategic risks

2. Milestone roadmap — operational source of truth
- docs/spec/plans/milestones-roadmap.md — M1-M7 detailed scope + post-M7 horizons + strategic risks table + how-to-use instructions. Replaces the archived v1/v2 stage plan. Companion to arch.md (architecture) and agent-iam-strategy.md (positioning).

3. Hardware research — MagicLick / xiaozhi-server / Volcano Ark / Tuya
- docs/research/xiaozhi-esp32-magiclink.md — hardware identification (MagicLick 2.5 = ESP32-S3 + ES8311 + 128×128 GC9107 + WiFi/4G) and Option-1 (use xiaozhi-server) vs Option-2 (rewrite) decision
- docs/research/xiaozhi-hermes-architecture.md — three ASCII diagrams: baseline xiaozhi / pivoted flow / per-turn sequence
- docs/research/xiaozhi-hermes-risks.md — R1-R4 risk verification grounded in actual repo code with file:line citations
- docs/research/volcano-ark-mcp-integration.md — Phase 3a integration architecture (Pattern B: hosted MCP server) + AgentKeys MCP tool inventory
- docs/research/tuya-vs-xiaozhi.md — Tuya = different role (PaaS for brand-owners) vs xiaozhi (open firmware for makers); 17× star delta justifies xiaozhi-first; Tuya is M2 complement

4. Business research — wedge thesis
- docs/research/ai-hardware-companion-wedge.md — full wedge analysis (FoloToy / Ropet / BubblePal as priority vendors; pricing structure; competitive landscape; Alipay vs Stripe rationale)
- docs/research/ai-hardware-companion-office-hours.md — YC office-hours diagnostic with six forcing questions; kill criterion (0 paid pilots from 3 vendors in 6 months → pivot to MCP credential broker)
- docs/research/ai-memory-systems-survey.md — competitive memory-layer survey

5. Demo plan + ESP32 firmware foundation
- docs/spec/plans/issue-103-aiosandbox-hermes-esp32-demo.md — v0 demo plan (note: superseded by the xiaozhi-server pivot; #107 "Phase 1: AgentKeys MCP server — 7 active tools + 3 schema-only" through #112 "Phase 1: Volcano Ark MCP marketplace registration (PoC)" are the M1 cohort that actually ships)
- firmware/esp32s3-agentkeys/ — 17 files scaffolded (reference only; MagicLick uses xiaozhi-esp32 firmware instead)

6. Memory design
- docs/plan/agentkeys-memory-design.md — 4-type memory taxonomy (profile/procedural/semantic/episodic) that composes with the namespace model in #108 "Phase 1: Memory namespace model — wire to cap-token + worker filter"

7. Docs reorganization
Archived to docs/archived/:
- docs/stage7-demo-and-verification.md (123KB, supplanted by scripts/setup-broker-host.sh + v2 demo orchestrators)
- docs/operator-runbook-stage7.md (39KB, same)
- docs/stage8-wip.md (15KB, off-chain vault design now in arch.md + threat-model)
- docs/spec/plans/development-stages.md (the 8-stage v1/v2 plan, replaced by milestones-roadmap.md)

Cross-refs updated in 9 active docs: arch.md, dev-setup.md, CLAUDE.md, v2-stage1-migration-and-demo.md, spec/threat-model-key-custody.md, spec/ses-email-architecture.md, spec/credential-backend-interface.md, spec/heima-gaps-vs-desired-architecture.md, wiki/upstream-backend-classes-exercise-vs-distribution.md.

8. PM automation simplification
- pm-workflow-audit.yml, pm-sync-fields-from-labels.yml, expected-workflows.json, check-workflows.sh, sync-fields-from-labels.sh (labels migrated to fields; sync no longer needed)
- pm-auto-archive-closed-pr.yml (auto-archive PRs on close)
- pm/scripts/sync-size-from-effort.sh (one-shot Size population from issue body Effort estimates)
- ~/.claude/skills/agentkeys-issue-create/SKILL.md to use native GitHub relationships + set Kind/Priority/Size project fields directly via API

9. Issue refinement — 20 issues (#107-#126)
All 20 new issues created in earlier PM work were refined with the standard template:
- milestones-roadmap.md, arch.md, relevant research docs
- /agentkeys-issue-create skill

Body sizes 5KB-8KB per issue (~130KB total of context information). Coverage:
10. Project board state
All 38 open issues now have Kind + Priority + Size project field values populated (live, not just in this PR). The Phase + Estimate fields are deprecated; the operator can delete them via the UI when ready.
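One way to confirm that every open issue carries all three field values is a small completeness check like the sketch below. The field names Kind, Priority, and Size come from the text; the input shape (a list of dicts with `number` and `fields` keys) and the function name `issues_missing_fields` are assumptions for illustration and do not reflect the actual GitHub Projects API response.

```python
# Project fields that every open issue is expected to have populated.
REQUIRED_FIELDS = ("Kind", "Priority", "Size")


def issues_missing_fields(issues: list[dict]) -> dict[int, list[str]]:
    """Map issue number to the required fields that are empty or absent."""
    missing = {}
    for issue in issues:
        fields = issue.get("fields", {})
        # Treat absent keys, None, and empty strings as unpopulated.
        absent = [name for name in REQUIRED_FIELDS if not fields.get(name)]
        if absent:
            missing[issue["number"]] = absent
    return missing
```

An empty result means the board state matches the claim above; any entry names the issue and the fields still to be filled in.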
Test plan
This PR doesn't ship any executable code — it lands strategy + research + plans + issue refinement. Verification is doc + state-level:
- docs/spec/plans/milestones-roadmap.md renders cleanly on GitHub (table of contents, all links resolve, no broken anchors)
- docs/archived/*_2026-04.md and not on active paths
- operator-runbook-stage7.md / stage7-demo-and-verification.md / stage8-wip.md refs in docs/arch.md, docs/dev-setup.md, CLAUDE.md
- ~/.claude/skills/agentkeys-issue-create/SKILL.md is registered and invokable via /agentkeys-issue-create
- litentry/projects/19 and confirm all open issues have populated Kind / Priority / Size fields

What's NOT in this PR
- #124 and #125 document the path but the work doesn't start until precondition is met)

Migration notes for the next agent / developer
After this PR merges, the next agent picking up M1 work should:
- docs/spec/plans/milestones-roadmap.md §2 (M1 scope) — operational source of truth
- docs/research/agent-iam-strategy.md §4 (Phase 1 storyboard) — strategic anchor
- #107 (AgentKeys MCP server) — that's THE critical-path M1 issue everything else plumbs through
- /agentkeys-issue-create skill for any follow-up issues