docs: reflect phases A/B/D across the md files by michaelzwang13 · Pull Request #26 · michaelzwang13/AgentOS

michaelzwang13 · 2026-05-24T05:32:55Z

Docs reconciliation — what's built vs what was aspirational

The trust moat (Phase B) and memory layer (Phase D) had been described in PROJECT_CONTEXT.md and ROADMAP.md as defensible-layer ambitions when they are now real for the Code Review Engineer. This PR brings the docs in line with what's actually shipped through PRs #19 / #20 / #25.

Per-file

CLAUDE.md — Status line names the three shipped phases; architecture bullets for template-driven runtime / agent-token auth / memory + audit log; backend layout includes policy.py, agent_auth.py, and the new agent-scoped models; test count 97 → 125; new "Trust moat" and "Memory" conventions sections so future edits stay inside the established patterns.
README.md — Tagline mentions the depth work; routers/services/models trees refreshed; agent-runtime section rewritten around the template-shaped container; test count 78 → 125; "Scope and status" rewritten as Foundation + Hardening + 3-of-4 phases shipped with issue/PR references.
ROADMAP.md — Each of the five Enforced Specialization layers annotated ✅/partial/not-built; "What's Been Built" reorganized as Foundation / Hardening / Code Review Engineer epic with PR numbers; "What Needs Doing Next" replaced with the current backlog (AWS deployment via CDK #11, Frontend do-over #12, Hire & offboard flow missing from the UI #13, Add CI to run backend + frontend tests on PRs #14, Add backend/.env.example #15, Customer Support & Secretary are undifferentiated shells #17, Stop passing secrets as plaintext Docker env vars #18, Agent memory compaction via LLM reflection #23).
PROJECT_CONTEXT.md — OAuth gateway bullets annotated with which steps Phases B + D made real; each Defensible Layer marked partial with a pointer to what is and isn't built.
LOCAL_SETUP.md — Drift notice flagging the stale Next.js + Compose sections (deferred to AWS deployment via CDK #11); migration step switched to schema.sql (the consolidated fresh-install snapshot); rebuild-image guidance after skill changes strengthened; test count 68 → 125.

Removed

AGENT_SYSTEM_PROMPT.md — superseded by per-template SOUL.md written at container boot (Phase A).
HANDOFF.md — a one-time frontend handoff brief; context now lives in git history.

References to both scrubbed from README.md and CLAUDE.md.

Deliberately not in scope

Full LOCAL_SETUP.md rewrite (Next.js → Vite, Compose → native start scripts) — that work is tracked in #11 (AWS deployment via CDK).
Other-role docs (Customer Support / Secretary) — they still run the generic container, and their docs reflect that. Bringing them to A/B/D parity is tracked in #17 (Customer Support & Secretary are undifferentiated shells).

🤖 Generated with Claude Code

Closes the memory loop and persists the audit trail. Phase B left a deny-only log line in policy.require_action; Phase D promotes that stub to a persisted agent_action_log row covering both allow and deny. Adds a per-agent key/value memory store the agent writes via a new update-memory skill and reads back as injected role_context on every dispatch — knowledge now survives container restarts. Pulls Phase C's reviewed_prs dedup index forward so the autonomous PR-review loop can build on a stable base.

Schema (migration 004 + schema.sql):
- agent_memory (agent_id, key, value, updated_at) — unique(agent_id, key)
- agent_action_log (agent_id, action, outcome, metadata jsonb, created_at) with a CHECK constraint pinning outcome to allowed | denied
- reviewed_prs (agent_id, owner, repo, pr_number, reviewed_at) with unique(agent_id, owner, repo, pr_number)
- RLS closed-by-default policies, mirrored across both files

Models:
- AgentMemoryModel.get / upsert / list_by_agent / delete
- ActionLogModel.record / list_by_agent
- ReviewedPRModel.exists / record / list_by_agent

Policy:
- require_action persists allow + deny rows. Audit writes are best-effort — a DB hiccup logs locally and lets the request proceed, trading audit completeness for availability; the policy decision itself never depends on the DB.

Dispatcher:
- dispatch_task now loads the calling agent's memory and injects it into role_context (best-effort; a memory-load failure does not block dispatch).

Gateway:
- GET / POST /gateway/memory — agent-authed, policy-gated via agent.memory.{read,write}. The agent's own row only — memory is scoped per agent_id.
- POST /github/review now inserts a reviewed_prs row on success, guarded by exists() so re-reviews don't double-write.

Skill + template:
- New update-memory skill — agent POSTs key/value pairs via the agent bearer token. Server-side persistence is what lets preferences survive container restart (SOUL.md cannot).
- code-review-engineer.yaml: skills += update-memory; allowed_actions += agent.memory.read, agent.memory.write.

Tests: 124/124 pass. 11 new — policy audit allow + deny + audit-failure swallowed; memory write/read/auth/policy/empty-key; reviewed_prs records on success + skips on dup; dispatcher injects memory + best-effort failure.

Live smoke test against the running backend (12/12 green): write/read round-trip, upsert idempotent on key, 401 on bad/missing token, 403 when a secretary role hits /memory, agent_action_log carries the allow + deny rows with role metadata.

Closes #8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
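
The best-effort audit behaviour described under "Policy" in the commit message above can be sketched roughly as follows. This is an illustration only, not the code in policy.py: the function signature, the `PolicyDenied` exception, and the `record` callback (standing in for ActionLogModel.record) are all assumptions made for the example.

```python
import logging
from typing import Callable, Set

logger = logging.getLogger("policy_sketch")


class PolicyDenied(Exception):
    """Hypothetical error raised when an action is not allowed."""


def require_action(
    agent_id: str,
    action: str,
    allowed_actions: Set[str],
    record: Callable[[str, str, str], None],
) -> None:
    # The decision uses only in-memory data, so it never depends on the DB.
    outcome = "allowed" if action in allowed_actions else "denied"

    # Best-effort audit write: a failure is logged locally and swallowed,
    # trading audit completeness for availability.
    try:
        record(agent_id, action, outcome)
    except Exception:
        logger.warning(
            "audit write failed for %s / %s", agent_id, action, exc_info=True
        )

    if outcome == "denied":
        raise PolicyDenied(f"{agent_id} may not perform {action}")
```

The point of the ordering is that the audit write happens before the deny is raised, so both outcomes are recorded, while an exception from the audit store never changes whether the request is allowed.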

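The memory path in the Phase D commit message (upsert on unique(agent_id, key), then injection into role_context with a best-effort load) might look something like the sketch below. It uses an in-memory SQLite database purely as a stand-in for the real store, and the function names, the role_context text layout, and the error handling are assumptions for illustration rather than the project's actual models or dispatcher.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # SQLite stand-in; the column set mirrors the agent_memory table
    # named in the commit message.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE agent_memory (
            agent_id   TEXT NOT NULL,
            key        TEXT NOT NULL,
            value      TEXT NOT NULL,
            updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (agent_id, key)
        )
        """
    )
    return conn


def upsert(conn: sqlite3.Connection, agent_id: str, key: str, value: str) -> None:
    # Writing the same key twice updates the row instead of adding one,
    # which is what makes the write idempotent on key.
    conn.execute(
        "INSERT INTO agent_memory (agent_id, key, value) VALUES (?, ?, ?) "
        "ON CONFLICT (agent_id, key) DO UPDATE SET "
        "value = excluded.value, updated_at = CURRENT_TIMESTAMP",
        (agent_id, key, value),
    )


def list_by_agent(conn: sqlite3.Connection, agent_id: str) -> dict:
    # Memory is scoped per agent_id: only the caller's rows are returned.
    rows = conn.execute(
        "SELECT key, value FROM agent_memory WHERE agent_id = ? ORDER BY key",
        (agent_id,),
    )
    return dict(rows.fetchall())


def build_role_context(base: str, conn: sqlite3.Connection, agent_id: str) -> str:
    # Best-effort load: a store failure must not block dispatch.
    try:
        memory = list_by_agent(conn, agent_id)
    except sqlite3.Error:
        memory = {}
    if not memory:
        return base
    lines = [f"- {k}: {v}" for k, v in memory.items()]
    return base + "\n\nRemembered:\n" + "\n".join(lines)
```

Because the rows live in the database rather than in the container, a restarted agent would see the same injected context on its next dispatch.
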
CLAUDE.md, README.md, ROADMAP.md, PROJECT_CONTEXT.md, and LOCAL_SETUP.md all now describe what's actually built rather than the original hackathon-only scope. The drift was substantial because the trust moat (Phase B) and the memory layer (Phase D) were being described in PROJECT_CONTEXT.md and ROADMAP.md as defensible-layer ambitions when they are now real for the Code Review Engineer.

CLAUDE.md gains a Status line that names the three shipped phases, architecture bullets for template-driven runtime / agent-token auth / memory + audit log, an updated backend layout including policy.py and agent_auth.py and the new agent-scoped models, the actual backend test count (97 -> 125), a note on migration 004, and two new conventions sections (trust moat, memory) so future edits stay inside the established patterns.

README.md updates the tagline to mention the depth work, refreshes the routers/services/models trees, rewrites the agent-runtime section around the template-shaped container, bumps the test count (78 -> 125), and rewrites Scope and Status as Foundation + Hardening + 3-of-4 phases shipped with issue and PR references.

ROADMAP.md annotates each of the five Enforced Specialization layers with built / partial / not-built status, rewrites What's Been Built as Foundation / Hardening / Code Review Engineer epic with PR numbers, and replaces What Needs Doing Next with the current backlog (referencing #11, #12, #13, #14, #15, #17, #18, #23).

PROJECT_CONTEXT.md annotates the OAuth gateway bullets with which steps Phases B + D made real, and marks each Defensible Layer as partial with a pointer to what is and isn't built.

LOCAL_SETUP.md gets a drift notice flagging the stale Next.js + Compose sections (tracked in #11), switches the migration step to schema.sql (the consolidated fresh-install snapshot), strengthens the rebuild-image guidance after skill changes, and bumps the test count.

Also removes AGENT_SYSTEM_PROMPT.md and HANDOFF.md (now superseded by the per-template SOUL.md and by git history, respectively) and scrubs their references from README.md and CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

michaelzwang13 and others added 2 commits May 23, 2026 16:36

michaelzwang13 merged commit 5df8281 into main May 24, 2026

michaelzwang13 merged 2 commits into main from docs/reflect-phases-abd