
feat: skills editor, memory files editor, dashboard shell, five built-in skills #56

Merged
mcheemaa merged 6 commits into main from feat/pr1-skills-memory-dashboard
Apr 14, 2026


@mcheemaa
Member

Summary

Ships PR1 of Project 3: the operator dashboard for Phantom.

Two tabs are live and production-grade in this PR:

  • Skills: create, read, update, delete, and lint markdown skills under the user-scope .claude/skills/ tree. Structured YAML frontmatter form plus a large monospace body textarea with keyboard save, dirty-state tracking, atomic writes, and every edit audited in SQLite.
  • Memory files: the same CRUD story for arbitrary .md files under the user-scope .claude/ tree (excluding skills, plugins, agents, and settings JSON). CLAUDE.md, rules, and free-form memory all live here.

Six additional tabs (sessions, cost, scheduler, evolution, memory explorer, settings) ship as Coming Soon placeholders in the same dashboard shell.

Architecture

  • Storage (src/skills/, src/memory-files/): path validation, Zod-validated YAML frontmatter, linter, atomic tmp-then-rename writes, audit log tables.
  • API (src/ui/api/skills.ts, src/ui/api/memory-files.ts): JSON CRUD routes wired behind the existing cookie auth check in src/ui/serve.ts. Every mutating call records a row in skill_audit_log or memory_file_audit_log.
  • Reflective tools (src/agent/in-process-reflective-tools.ts): a new in-process MCP server (phantom-reflective) that exposes phantom_memory_search (semantic + temporal) and phantom_list_sessions directly to the agent, so the built-in reflective skills can actually fire.
  • Dashboard awareness (src/agent/prompt-blocks/dashboard-awareness.ts): a short block added to the environment section of the system prompt so the agent knows the dashboard exists and can direct the operator to it.
  • Dashboard UI (public/dashboard/): a single static HTML shell with a sidebar, hash router, and two JS modules. Vanilla JS, no React, no build step. Tailwind is not loaded; the shell declares its own phantom-* classes from the same token values as the existing phantom design vocabulary.
  • Built-in skills (skills-builtin/): mirror, thread, echo, overheard, ritual, show-my-tools. Seeded into the user-scope skills volume on container first boot; existing edits are preserved.
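
A minimal sketch of the path check behind the Memory files tab, assuming POSIX paths and the /home/phantom/.claude/ root named later in this PR. validateMemoryPath and its return shape are hypothetical, not the code in src/memory-files/.

```typescript
import { posix as path } from "node:path";

// Root taken from the PR description; treated as a constant for this sketch.
const MEMORY_ROOT = "/home/phantom/.claude";
// Top-level directories owned by other subsystems.
const EXCLUDED_DIRS = new Set(["skills", "plugins", "agents"]);

type PathCheck = { ok: true; absolute: string } | { ok: false; reason: string };

function validateMemoryPath(relative: string): PathCheck {
  // Only markdown is editable, which also keeps the settings JSON files out.
  if (!relative.endsWith(".md")) return { ok: false, reason: "only .md files are editable" };
  if (path.isAbsolute(relative)) return { ok: false, reason: "path must be relative" };
  // Resolve, then confirm ".." segments did not escape the root.
  const absolute = path.resolve(MEMORY_ROOT, relative);
  if (!absolute.startsWith(MEMORY_ROOT + "/")) {
    return { ok: false, reason: "path escapes the memory root" };
  }
  const top = path.relative(MEMORY_ROOT, absolute).split("/")[0];
  if (EXCLUDED_DIRS.has(top)) return { ok: false, reason: `${top}/ is managed elsewhere` };
  return { ok: true, absolute };
}
```

Resolving before checking the prefix is what stops inputs like `../etc/passwd.md` from slipping past a naive string check.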

Test plan

  • bun test passes (1040 pass, 0 fail, +62 new tests vs main)
  • bun run lint clean
  • bun run typecheck clean
  • Manual walk-through of the dashboard in a browser: create, edit, and delete a skill; create, edit, and delete a memory file; verify the beforeunload guard fires on dirty state; verify theme toggle; verify Coming Soon placeholders render
  • Deploy a container build and verify the six built-in skills land in ~/.claude/skills/ on first boot
  • Send a Slack message that triggers one of the reflective skills and confirm it loads memory via the in-process tool

Rollback

Single commit-range revert on the branch. No schema rollback is needed because the new migrations only add the two audit-log tables and their indices; leaving them in place on a deployment that no longer uses them is safe. The new dashboard is not coupled to the pre-existing /ui/ surface beyond a quick-link on the landing page, so removing the new /ui/dashboard/ tree, the new API routes, and that link is a clean undo.

Add the on-disk storage layers for the PR1 dashboard's skills and memory
files tabs. Both subsystems validate paths, parse and serialize content,
write atomically via tmp-then-rename, cap content size, and record every edit
in a new SQLite audit log.
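
A sketch of the tmp-then-rename write, assuming Node-compatible fs APIs (Bun implements them). atomicWrite and the 64 KiB MAX_BYTES cap are illustrative; the real cap is not stated in this PR.

```typescript
import { mkdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";
import { randomUUID } from "node:crypto";

// Assumed cap for the sketch.
const MAX_BYTES = 64 * 1024;

function atomicWrite(target: string, content: string): void {
  const bytes = new TextEncoder().encode(content).length;
  if (bytes > MAX_BYTES) throw new Error(`content is ${bytes} bytes; cap is ${MAX_BYTES}`);
  const dir = dirname(target);
  mkdirSync(dir, { recursive: true }); // nested memory paths get their parents created
  // Same directory as the target, so the rename never crosses filesystems.
  const tmp = join(dir, `.${basename(target)}.${randomUUID()}.tmp`);
  try {
    writeFileSync(tmp, content, "utf8");
    renameSync(tmp, target); // readers see the old file or the new one, never a partial write
  } catch (err) {
    rmSync(tmp, { force: true });
    throw err;
  }
}
```

Keeping the temp file next to the target matters: rename is only atomic within a single filesystem.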

Skills at /home/phantom/.claude/skills/<name>/SKILL.md are parsed with a
Zod-validated strict YAML frontmatter schema (name, description,
when_to_use required; allowed-tools, argument-hint, arguments, context,
disable-model-invocation optional) and linted for missing fields, body
size, and shell red-list patterns. Memory files are any markdown under
/home/phantom/.claude/ excluding skills/, plugins/, agents/, and the
settings.json pair.
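
A sketch of the three lint checks named above. The body limit and the red-list regexes are assumptions for illustration; the PR names the checks but not their thresholds or patterns.

```typescript
interface SkillFrontmatter {
  name?: string;
  description?: string;
  when_to_use?: string;
  "allowed-tools"?: string[];
}

interface LintHint { level: "error" | "warning"; message: string; }

const REQUIRED = ["name", "description", "when_to_use"] as const;
const MAX_BODY_CHARS = 20_000; // assumed limit
// Illustrative shell red-list, not the project's list.
const SHELL_RED_LIST: RegExp[] = [/\brm\s+-rf\s+\//, /curl[^|\n]*\|\s*(ba)?sh/, /\bsudo\b/];

function lintSkill(fm: SkillFrontmatter, body: string): LintHint[] {
  const hints: LintHint[] = [];
  // Missing or blank required fields are errors.
  for (const key of REQUIRED) {
    const value = fm[key];
    if (typeof value !== "string" || value.trim() === "") {
      hints.push({ level: "error", message: `missing required field: ${key}` });
    }
  }
  // Oversized bodies and risky shell snippets are warnings the editor shows under the body.
  if (body.length > MAX_BODY_CHARS) {
    hints.push({ level: "warning", message: `body is ${body.length} chars (limit ${MAX_BODY_CHARS})` });
  }
  for (const pattern of SHELL_RED_LIST) {
    if (pattern.test(body)) {
      hints.push({ level: "warning", message: `body matches red-listed shell pattern ${pattern}` });
    }
  }
  return hints;
}
```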

Adds skill_audit_log and memory_file_audit_log tables with indices.
Updates the migration test for the four new migrations.

Wire /ui/api/skills and /ui/api/memory-files into the existing serve.ts
request pipeline. Both route sets live behind the phantom_session cookie
check and return JSON bodies. Every mutating call records a row in the
appropriate audit log table.

Routes:
- GET    /ui/api/skills                       list
- POST   /ui/api/skills                       create
- GET    /ui/api/skills/:name                 read
- PUT    /ui/api/skills/:name                 update
- DELETE /ui/api/skills/:name                 delete
- GET    /ui/api/memory-files                 list
- POST   /ui/api/memory-files                 create (body includes path)
- GET    /ui/api/memory-files/<encoded-path>  read
- PUT    /ui/api/memory-files/<encoded-path>  update
- DELETE /ui/api/memory-files/<encoded-path>  delete
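
A sketch of how the routes above could be matched, assuming memory-file paths arrive as a single URL-encoded segment. matchRoute and the Route shape are hypothetical, not the dispatcher in src/ui/api/.

```typescript
type Action = "list" | "create" | "read" | "update" | "delete";
interface Route { resource: "skills" | "memory-files"; action: Action; id?: string; }

const COLLECTION: Record<string, Action> = { GET: "list", POST: "create" };
const ITEM: Record<string, Action> = { GET: "read", PUT: "update", DELETE: "delete" };

function matchRoute(method: string, pathname: string): Route | null {
  const m = pathname.match(/^\/ui\/api\/(skills|memory-files)(?:\/(.+))?$/);
  if (!m) return null;
  const resource = m[1] as Route["resource"];
  const rest = m[2];
  if (rest === undefined) {
    // Collection routes: list and create.
    const action = COLLECTION[method];
    return action ? { resource, action } : null;
  }
  const action = ITEM[method];
  if (!action) return null;
  // Skill names and encoded memory paths are one segment; a raw "/" means a malformed URL.
  if (rest.includes("/")) return null;
  try {
    return { resource, action, id: decodeURIComponent(rest) };
  } catch {
    return null; // malformed percent-encoding
  }
}
```

Encoding the whole memory path as one segment keeps `rules/style.md` from colliding with the route structure.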

Adds setDashboardDb() to pass the database handle into the API dispatcher so the
audit log writes go through the already-open connection used elsewhere.
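
A sketch of the setter pattern plus one audit write. SqlRunner is a stand-in for the open SQLite handle, and the skill_audit_log column names are assumptions; the PR names the table but not its columns.

```typescript
// Minimal stand-in for the already-open SQLite connection.
interface SqlRunner { run(sql: string, params: unknown[]): void; }

let dashboardDb: SqlRunner | null = null;

// Called once at startup with the shared connection.
function setDashboardDb(db: SqlRunner): void {
  dashboardDb = db;
}

// Hypothetical columns; parameters are bound, never interpolated into the SQL.
function recordSkillAudit(
  skill: string,
  action: "create" | "update" | "delete",
  before: string | null,
  after: string | null,
): void {
  if (!dashboardDb) throw new Error("setDashboardDb() has not been called");
  dashboardDb.run(
    "INSERT INTO skill_audit_log (skill_name, action, previous_body, new_body, created_at) VALUES (?, ?, ?, ?, ?)",
    [skill, action, before, after, new Date().toISOString()],
  );
}
```

Injecting the handle avoids opening a second connection just for the dashboard routes.
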

Expose phantom_memory_search and phantom_list_sessions to the agent
itself as a new in-process MCP server (phantom-reflective). This is what
makes the five reflective built-in skills actually fireable: they can
query memory with temporal filters and enumerate recent sessions without
having to round-trip through the external MCP endpoint at /mcp.

phantom_memory_search wraps MemorySystem.recallEpisodes and recallFacts
with an optional days_back filter that maps to RecallOptions.timeRange
and the 'temporal' strategy. phantom_list_sessions reads the sessions
SQLite table directly with channel and days_back filters.
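
A sketch of the days_back mapping for both tools. The RecallOptions shape and the sessions column names are assumptions; the real type lives in the memory subsystem. Per the review fix later in this PR, the same options object should go to recallEpisodes and recallFacts.

```typescript
// Assumed shape of RecallOptions for this sketch.
interface RecallOptions {
  limit: number;
  strategy?: "temporal";
  timeRange?: { from: Date; to: Date };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// days_back present: switch to the temporal strategy with a [now - days, now] window.
function buildRecallOptions(limit: number, daysBack?: number, now: Date = new Date()): RecallOptions {
  if (daysBack === undefined) return { limit };
  return {
    limit,
    strategy: "temporal",
    timeRange: { from: new Date(now.getTime() - daysBack * DAY_MS), to: now },
  };
}

// Parameterized query for phantom_list_sessions; column names are hypothetical.
function buildSessionsQuery(
  channel?: string,
  daysBack?: number,
  now: Date = new Date(),
): { sql: string; params: string[] } {
  const where: string[] = [];
  const params: string[] = [];
  if (channel) {
    where.push("channel = ?");
    params.push(channel);
  }
  if (daysBack !== undefined) {
    where.push("created_at >= ?");
    params.push(new Date(now.getTime() - daysBack * DAY_MS).toISOString());
  }
  const clause = where.length ? ` WHERE ${where.join(" AND ")}` : "";
  return { sql: `SELECT * FROM sessions${clause} ORDER BY created_at DESC`, params };
}
```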

Also adds a small 'dashboard awareness' prompt block wired into the
environment section of the system prompt. The agent now knows the
dashboard exists at /ui/dashboard/, that its skills and memory files
are editable there, and that it should point the operator at those URLs
when asked 'what can I edit' or similar.
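
A sketch of what the awareness block could look like when rendered. The wording and the dashboardAwarenessBlock helper are hypothetical; the PR describes the block's intent, not its text.

```typescript
// Builds the short block appended to the environment section of the system prompt.
function dashboardAwarenessBlock(baseUrl: string): string {
  const root = `${baseUrl.replace(/\/+$/, "")}/ui/dashboard/`; // avoid a double slash
  return [
    "## Dashboard",
    `The operator dashboard lives at ${root}.`,
    "Your skills and memory files are editable there, on the Skills and Memory files tabs.",
    "When the operator asks what they can edit, point them at this URL.",
  ].join("\n");
}
```
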

The hand-crafted operator dashboard at /ui/dashboard/. A single static
HTML shell with a sticky nav, a sidebar of eight tabs, and a main
content area that hash-routes between live tabs and Coming Soon
placeholders.

Two tabs are live in PR1:

- Skills: left column of skill cards grouped by source (built-in vs
  yours) with substring search. Right column is a full-fidelity
  editor with a structured YAML frontmatter form (name, description,
  when_to_use, allowed-tools chip input with autocomplete, argument
  hint, context select, user-invoke-only toggle) and a large
  JetBrains Mono body textarea. Tab inserts two spaces, Cmd/Ctrl+S
  saves, dirty-state dot pulses next to the title, lint hints render
  under the body, delete has a confirm modal. New skill modal offers
  blank or duplicate-from-mirror templates.

- Memory files: same split layout over any .md file under
  /home/phantom/.claude/. New file modal accepts any path ending in
  .md, creates nested directories automatically, and opens the
  editor on the new file. CLAUDE.md gets a small info banner noting
  it is the top-level memory loaded every session.
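
The two editor keybindings can be written as pure helpers so they are testable without a DOM. This sketch is in TypeScript for consistency; the shell itself is vanilla JS, so the types would be dropped. Both function names are hypothetical.

```typescript
interface EditResult { value: string; cursor: number; }

// Tab inserts two spaces, replacing any selected text, and moves the caret past them.
function insertSoftTab(value: string, selStart: number, selEnd: number): EditResult {
  const indent = "  ";
  return {
    value: value.slice(0, selStart) + indent + value.slice(selEnd),
    cursor: selStart + indent.length,
  };
}

// Cmd+S on macOS, Ctrl+S elsewhere; the caller should preventDefault() before saving.
function isSaveChord(e: { key: string; metaKey: boolean; ctrlKey: boolean }): boolean {
  return e.key.toLowerCase() === "s" && (e.metaKey || e.ctrlKey);
}
```

In the shell, a keydown listener on the textarea would call these and then write `value` back and set `selectionStart`/`selectionEnd` to `cursor`.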

Six Coming Soon placeholders (sessions, cost, scheduler, evolution,
memory explorer, settings) render a serif headline, the expected PR,
and a link back to skills.

No React, no build step. Vanilla JS with one namespaced helper per
tab. Tailwind and DaisyUI tokens are not loaded on this shell; the
dashboard mirrors the phantom- vocabulary by declaring
its own phantom-nav, phantom-chip, phantom-mono, phantom-meta, and
phantom-muted classes from the same token values as the base
template. Light and dark themes share the same primary indigo with
cream or warm-deep-dark surfaces.

Also adds a Dashboard quick-link to the existing /ui/ landing page.
beforeunload guards unsaved edits and the router respects the dirty
state on navigation.
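
A DOM-free sketch of the hash router's dirty-state guard. The tab ids are assumptions; in the real shell confirmDiscard would wrap window.confirm, navigate would run on hashchange, and shouldBlockUnload would back the beforeunload listener.

```typescript
const TABS = ["skills", "memory-files", "sessions", "cost", "scheduler",
  "evolution", "memory-explorer", "settings"] as const; // hypothetical ids
type TabId = (typeof TABS)[number];
const LIVE = new Set<TabId>(["skills", "memory-files"]);

class DashboardRouter {
  current: TabId = "skills";
  dirty = false;
  private readonly confirmDiscard: () => boolean;

  constructor(confirmDiscard: () => boolean) {
    this.confirmDiscard = confirmDiscard;
  }

  // Returns the view to render, or null when the operator chose to keep unsaved edits.
  navigate(hash: string): { tab: TabId; view: "live" | "coming-soon" } | null {
    const raw = hash.replace(/^#\/?/, "");
    const tab: TabId = (TABS as readonly string[]).includes(raw) ? (raw as TabId) : "skills";
    if (tab !== this.current && this.dirty && !this.confirmDiscard()) return null;
    if (tab !== this.current) this.dirty = false; // edits were discarded
    this.current = tab;
    return { tab, view: LIVE.has(tab) ? "live" : "coming-soon" };
  }

  shouldBlockUnload(): boolean {
    return this.dirty;
  }
}
```
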

Ship a small catalog of genuinely novel reflective skills that make a
fresh phantom feel alive from message one. Each is a full SKILL.md with
strict YAML frontmatter, a Goal, numbered Steps with per-step success
criteria, and Rules.

- mirror: weekly self-audit playback. Pulls the last 7 days of memory,
  anchors observations to specific episodes, renders three sections
  (what I noticed, what I am unsure about, one question for you).

- thread: evolution of thinking on a topic. Takes a topic, clusters
  mentions chronologically, identifies turning points, renders a short
  narrative with callouts.

- echo: prior-answer surfacer. Before deriving a new answer to a
  substantive question, checks memory for semantically similar prior
  questions and surfaces the conclusion if the match is strong.

- overheard: promises audit. Scans the last 14 days for commitment
  phrases, checks for follow-through, surfaces the top 3-5 open
  promises with draft follow-up offers.

- ritual: latent patterns to scheduled jobs. Finds recurring behaviors
  in 60 days of sessions, verifies them against memory, proposes
  formalization as phantom_schedule jobs.

- show-my-tools: utility skill that lists current skills, memory
  files, and the dashboard URLs. The discovery path for everything
  the operator can edit.

All five reflective skills list the new in-process reflective tools
(mcp__phantom-reflective__phantom_memory_search,
mcp__phantom-reflective__phantom_list_sessions) and the scheduler tool
(mcp__phantom-scheduler__phantom_schedule) in their allowed-tools field
so they can actually fire.
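
The allowed-tools entries above follow the mcp__<server>__<tool> naming pattern. A small sketch that formats and parses that shape; both helpers are hypothetical, not part of the PR.

```typescript
function mcpToolName(server: string, tool: string): string {
  // A "__" inside a segment would make the name ambiguous to parse.
  if (server.includes("__") || tool.includes("__")) throw new Error("segments must not contain '__'");
  return `mcp__${server}__${tool}`;
}

function parseMcpToolName(name: string): { server: string; tool: string } | null {
  const m = name.match(/^mcp__(.+?)__(.+)$/); // lazy server match stops at the first "__"
  return m ? { server: m[1], tool: m[2] } : null;
}

const REFLECTIVE_ALLOWED_TOOLS = [
  mcpToolName("phantom-reflective", "phantom_memory_search"),
  mcpToolName("phantom-reflective", "phantom_list_sessions"),
  mcpToolName("phantom-scheduler", "phantom_schedule"),
];
```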

The skills ship in /app/skills-builtin/ inside the image. The docker
entrypoint copies each directory to /home/phantom/.claude/skills/ on
first boot only. Existing directories are preserved, so operator edits
survive container rebuilds. Dockerfile copies the skills-builtin tree
in both the builder and runtime stages.
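
The first-boot rule, copy a skill directory only if it is missing, is sketched below in TypeScript for illustration; the PR does this in the docker entrypoint, and seedBuiltinSkills is a hypothetical name. It assumes a Node-compatible runtime with fs.cpSync.

```typescript
import { cpSync, existsSync, mkdirSync, readdirSync } from "node:fs";
import { join } from "node:path";

// e.g. seedBuiltinSkills("/app/skills-builtin", "/home/phantom/.claude/skills")
function seedBuiltinSkills(srcRoot: string, destRoot: string): string[] {
  mkdirSync(destRoot, { recursive: true });
  const seeded: string[] = [];
  for (const entry of readdirSync(srcRoot, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const dest = join(destRoot, entry.name);
    if (existsSync(dest)) continue; // never overwrite: operator edits win
    cpSync(join(srcRoot, entry.name), dest, { recursive: true });
    seeded.push(entry.name);
  }
  return seeded;
}
```

Checking per directory rather than per volume means a skill added to the image in a later release still lands on existing deployments.
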
mcheemaa marked this pull request as ready for review April 14, 2026 01:30

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96a018f9f7


results.episodes = await memory.recallEpisodes(input.query, opts).catch(() => []);
}
if (input.memory_type === "semantic" || input.memory_type === "all") {
results.facts = await memory.recallFacts(input.query, { limit }).catch(() => []);

P2: Pass temporal filters to semantic fact recall

When days_back is provided, phantom_memory_search builds a temporal RecallOptions for episodes, but semantic facts are still fetched with { limit } only, so facts are not time-bounded. In reflective skills that request recent windows (for example weekly reviews), this leaks older facts into the result set and contradicts the tool contract that days_back limits returned memory items.


context: SkillContextSchema.optional(),
"disable-model-invocation": z.boolean().optional(),
})
.strict();

P2: Permit built-in source marker in frontmatter schema

The schema is strict, so any extra key is rejected at parse time, but source detection later relies on an optional x-phantom-source marker (detectSource in src/skills/storage.ts) to classify built-in/agent skills. As written, that marker can never be present without causing a parse failure, so skills are always treated as user-sourced and built-in-specific UI behavior (grouping/guardrails) cannot work.


1. Pass temporal filters to semantic fact recall in the in-process
   reflective MCP server. phantom_memory_search was building a temporal
   RecallOptions for episodes when days_back was set, but recallFacts
   was being called with only { limit }. Reflective skills like mirror
   that ask for a 7-day window were leaking older facts into the result
   set. Fix is one word: pass the same opts object to recallFacts. The
   downstream semantic.recall already honors timeRange via a Qdrant
   range filter on valid_from.

2. Permit the x-phantom-source provenance marker in the SKILL.md
   frontmatter schema. The Zod schema was strict and detectSource() in
   src/skills/storage.ts read frontmatter['x-phantom-source'] without
   the field being declared, so any built-in skill that set the marker
   would have been rejected at parse time. Added the field to the
   schema as an optional enum of "built-in" | "agent" | "user", added
   the marker to all six built-in SKILL.md files (echo, mirror,
   overheard, ritual, show-my-tools, thread), and added tests for both
   the schema acceptance and the source classification.
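
A dependency-free stand-in for the strict-schema behaviour and the source classification, not the project's Zod code. In Zod the fix amounts to declaring something like `"x-phantom-source": z.enum(["built-in", "agent", "user"]).optional()` on the strict object; KNOWN_KEYS mirrors the fields listed in this PR.

```typescript
type SkillSource = "built-in" | "agent" | "user";
const SOURCES: readonly SkillSource[] = ["built-in", "agent", "user"];

// Keys a strict schema accepts, including the marker added by the fix.
const KNOWN_KEYS = new Set([
  "name", "description", "when_to_use", "allowed-tools", "argument-hint",
  "arguments", "context", "disable-model-invocation", "x-phantom-source",
]);

// Like .strict(): any undeclared key is an error, as is an out-of-enum marker.
function validateFrontmatter(fm: Record<string, unknown>): string[] {
  const errors = Object.keys(fm)
    .filter((k) => !KNOWN_KEYS.has(k))
    .map((k) => `unrecognized key: ${k}`);
  const marker = fm["x-phantom-source"];
  if (marker !== undefined && !SOURCES.includes(marker as SkillSource)) {
    errors.push(`invalid x-phantom-source: ${String(marker)}`);
  }
  return errors;
}

// Skills without the marker default to user-sourced.
function detectSource(fm: Record<string, unknown>): SkillSource {
  const marker = fm["x-phantom-source"];
  return SOURCES.includes(marker as SkillSource) ? (marker as SkillSource) : "user";
}
```

Before the fix, the marker would have failed the first filter, which is why built-in grouping could never trigger.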

Quality gates: bun test 1044 pass / 0 fail (+4 new tests), bun run
lint clean, bun run typecheck clean.
mcheemaa merged commit 98f09b2 into main Apr 14, 2026
1 check passed