v0.20 PR 5: Evolution (Phase A) + Memory dashboard tabs #75

Merged
mcheemaa merged 7 commits into main from v0.20-pr-05-memory-evolution
Apr 17, 2026

@mcheemaa

Summary

Two dashboard tabs shipped together, both built on the shared primitives that landed with PRs 2-4.

Evolution (Phase A, read-only)

  • Timeline of version bumps over phantom-config/meta/evolution-log.jsonl, newest first.
  • Expand-in-place cards with per-file diffs (summary, rationale, current content preview capped at 64 KB).
  • Metric strip: current version, total sessions, success rate 7d, drains with tier mix, reflection cost, invariant fails.
  • Poison-pile warning banner when the queue has rows.
  • Sparkline of drains per day derived from the loaded timeline window.
  • Cross-tab link: every session id becomes a pill that navigates to #/sessions/<key>.
  • Phase A chip in the header so the operator sees rollback is a later PR.
  • No rollback endpoint. No snapshot storage. No writes. Those are Phase B.

Memory explorer

  • Split layout: type tabs (Episodes / Facts / Procedures) + recency-ordered list on the left, detail pane on the right. Collapses to a single pane below 720px with a back button.
  • Hybrid recall when the search box has a query; Qdrant scroll when empty.
  • Type-specific row and detail layouts. Contradicted facts greyed out and sorted last.
  • Copy-as-JSON button.
  • Delete action with an explicit confirmation modal (ARIA dialog, Cancel has initial focus, Enter on Cancel does not delete).
  • Cross-tab link: source episode ids become pills that navigate to #/sessions/<key>.
  • A global / key shortcut focuses the search input whenever the hash starts with #/memory.

New infrastructure

  • QdrantClient.scroll(collection, { limit, offset?, filter?, orderBy?, withPayload? }) returns { points, nextOffset }.
  • QdrantClient.countPoints(collection) returns an exact count for the health strip.
  • Per-store scroll, getById, deleteById, and count methods on Episodic / Semantic / Procedural stores.
  • MemorySystem.scroll* / get*ById / delete* / count* facade for the handler.
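
To make the `{ points, nextOffset }` contract concrete, here is a minimal sketch of the request-building and normalization steps. It assumes the Qdrant REST scroll wire shape (`result.points`, `result.next_page_offset`, snake_case body keys). The helper names `buildScrollBody` and `normalizeScroll` and the type names are hypothetical, not taken from qdrant-client.ts.

```typescript
// Hypothetical option and result types mirroring the signature in the list above.
interface ScrollOptions {
  limit: number;
  offset?: string | number;
  filter?: Record<string, unknown>;
  orderBy?: { key: string; direction?: "asc" | "desc" };
  withPayload?: boolean;
}

interface ScrollPoint {
  id: string | number;
  payload: Record<string, unknown>;
}

interface ScrollResult {
  points: ScrollPoint[];
  nextOffset: string | number | null;
}

// Translate camelCase options into the snake_case body Qdrant expects.
function buildScrollBody(opts: ScrollOptions): Record<string, unknown> {
  const body: Record<string, unknown> = {
    limit: opts.limit,
    with_payload: opts.withPayload ?? true,
  };
  if (opts.offset !== undefined) body["offset"] = opts.offset;
  if (opts.filter !== undefined) body["filter"] = opts.filter;
  if (opts.orderBy !== undefined) body["order_by"] = opts.orderBy;
  return body;
}

// Collapse the wire response so callers never touch the raw Qdrant shape.
function normalizeScroll(raw: unknown): ScrollResult {
  const wire = raw as {
    result?: { points?: unknown[]; next_page_offset?: unknown };
  } | null;
  const result = wire?.result;
  const points = (result?.points ?? []).map((p) => {
    const point = p as {
      id: string | number;
      payload?: Record<string, unknown> | null;
    };
    return { id: point.id, payload: point.payload ?? {} };
  });
  const next = result?.next_page_offset;
  const nextOffset =
    typeof next === "string" || typeof next === "number" ? next : null;
  return { points, nextOffset };
}
```

A missing or null `next_page_offset` normalizes to `nextOffset: null`, so callers need only one check to decide whether another page exists.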

Files

| Area | Target | Actual |
| --- | --- | --- |
| public/dashboard/evolution.js | 500 | 634 |
| public/dashboard/memory.js | 450 | 582 |
| src/ui/api/evolution.ts | 220 | 283 |
| src/ui/api/memory.ts | 280 | 212 |
| src/ui/api/__tests__/evolution.test.ts | 300 | 585 |
| src/ui/api/__tests__/memory.test.ts | 250 | 380 |
| src/memory/qdrant-client.ts (delta) | 60 | 42 |
| src/memory/__tests__/qdrant-client.test.ts (delta) | 80 | 171 |
| CSS additions | 180 | 440 |
| Wiring (index.ts, serve.ts, index.html, dashboard.js) | 30 | 81 |
| Total | ~2,330 | ~3,410 |

The CSS delta is larger than the target because the two tabs share several new, genuinely reusable primitives (.dash-tab-switcher, .dash-session-pill, .dash-chip), and .dash-timeline* / .dash-diff* / .dash-memory-* / .dash-split-pane are all first-class layouts with responsive rules. Nothing fits under the 280 CSS ceiling individually, so the blocks were promoted to shared territory.

Both JS modules are under their individual ceilings (700 / 600). Both backend handlers are well under their ceilings (350 / 400).

Test plan

  • bun test src/memory/__tests__/qdrant-client.test.ts - 17 pass (8 new scroll/count tests, 9 prior)
  • bun test src/ui/api/__tests__/evolution.test.ts - 20 pass
  • bun test src/ui/api/__tests__/memory.test.ts - 23 pass
  • bun test full suite - 1744 pass, 0 new failures. One pre-existing flake in cost.test.ts unrelated to this PR (date-boundary test, reproducible on main).
  • bun run lint - clean
  • bun run typecheck - clean
  • XSS audit: every operator-controlled field flows through ctx.esc() (inline) or textContent (multi-KB previews inside <pre>). innerHTML is only used to stamp trusted template strings.
  • Delete modal: Cancel has initial focus, Enter does not delete, Escape closes via ctx.openModal's handler.
  • Session pills navigate to #/sessions/<url-encoded-key> on both tabs.
  • Dark theme: every color flows through var(--color-*) tokens.
  • Mobile 720px: split pane collapses; back button appears and works.
  • Keyboard: / focuses memory search, Enter/Space toggles evolution cards, Escape clears search.
  • Empty, loading, skeleton, and error states all present.

What's deferred

Phase B for Evolution: writeSnapshot / readSnapshot / rollbackTo in versioning.ts, POST /ui/api/evolution/rollback endpoint, rollback confirmation modal in the UI. Keeping them in a separate PR because they cross the "dashboard changes files on disk" boundary and deserve their own review pass.

CSS classes .dash-sidebar-item-soon and .dash-sidebar-soon-pill are now unused in markup but retained in dashboard.css in case future tabs need the soon-label affordance.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Adds scroll and countPoints primitives to QdrantClient. Mirrors the existing
request pattern for timeout, headers, and error handling. Scroll returns
normalized { points, nextOffset } so callers never see raw Qdrant wire shape.

EpisodicStore, SemanticStore, and ProceduralStore each gain scroll, getById,
deleteById, and count methods that wrap the new Qdrant primitives and handle
payload mapping via their existing payloadToX helpers.

MemorySystem exposes scrollEpisodes / scrollFacts / scrollProcedures,
getEpisodeById / getFactById / getProcedureById, deleteEpisode / deleteFact /
deleteProcedure, and countEpisodes / countFacts / countProcedures so the
Memory explorer API can talk to MemorySystem instead of reaching into each
store or the Qdrant client.

Tests cover scroll happy path, pagination, filter and orderBy passthrough,
Qdrant error propagation, and countPoints shape.

Three read-only handlers for the Evolution dashboard tab (Phase A). Wraps
EvolutionEngine.getCurrentVersion/getEvolutionLog/getMetrics/getEvolutionConfig
plus EvolutionQueue.listPoisonPile for the poison banner. Reads diff file
content from phantom-config/ with a 64 KB preview cap; current_size is the
real byte length so the UI can flag truncated previews.
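
The preview cap described above could be sketched as follows. This is an illustration only: `buildPreview`, `PREVIEW_CAP_BYTES`, and the `truncated` flag are hypothetical names, and the actual handler may read and decode the file differently.

```typescript
// 64 KB cap, as stated in the commit message.
const PREVIEW_CAP_BYTES = 64 * 1024;

interface FilePreview {
  content: string; // decoded text, at most `cap` bytes of the file
  current_size: number; // real byte length of the whole file
  truncated: boolean; // hypothetical convenience flag for the UI
}

function buildPreview(
  bytes: Uint8Array,
  cap: number = PREVIEW_CAP_BYTES,
): FilePreview {
  const truncated = bytes.length > cap;
  const head = truncated ? bytes.subarray(0, cap) : bytes;
  return {
    // Cutting inside a multi-byte UTF-8 sequence yields a U+FFFD
    // replacement character at the end of the preview.
    content: new TextDecoder("utf-8").decode(head),
    current_size: bytes.length,
    truncated,
  };
}
```

Because `current_size` always carries the full byte length, a client can detect truncation by comparing it with the cap even without a dedicated flag.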

No rollback endpoint. No snapshot storage. Those ship in Phase B.

Tests cover the three endpoints including the metrics reshape, poison queue
integration, historical version synthesis of parent = n - 1, large-file
preview truncation, and every error path (400, 404, 405, 422).

Four handlers under /ui/api/memory. When the list endpoint has a query it
runs hybrid recall via MemorySystem.recall*; without a query it falls back
to the new Qdrant scroll ordered by recency. Detail and delete route through
the new MemorySystem get*ById / delete* helpers. Detail is checked before
delete so unknown ids 404 instead of silently succeeding.
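
The detail-before-delete ordering can be sketched as below. The `RecordStore` interface, the `deleteWithPrecheck` name, and the response shape are assumptions for illustration; the real handler goes through the MemorySystem get*ById / delete* helpers.

```typescript
// Hypothetical minimal store surface; stands in for the MemorySystem facade.
interface RecordStore<T> {
  getById(id: string): Promise<T | null>;
  deleteById(id: string): Promise<void>;
}

interface DeleteResponse {
  status: 200 | 404;
  body: Record<string, unknown>;
}

async function deleteWithPrecheck<T>(
  store: RecordStore<T>,
  id: string,
): Promise<DeleteResponse> {
  // Look the record up first: a delete on an unknown id would otherwise
  // report success even though nothing was removed.
  const existing = await store.getById(id);
  if (existing === null) {
    return { status: 404, body: { error: "Not found" } };
  }
  await store.deleteById(id);
  return { status: 200, body: { deleted: true, id } };
}
```

The lookup and the delete are two separate calls, so a concurrent delete between them can still slip through; for a single-operator dashboard that window is likely acceptable.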

Ids are validated to reject control characters. Health reports counts per
type and tolerates individual count failures via Promise.allSettled so one
bad collection doesn't blank the whole strip.
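
Both behaviours in the paragraph above are small enough to sketch. `isValidId` and `collectCounts` are hypothetical names, and reporting a failed count as `null` is an assumption about how the health strip might represent it.

```typescript
// Reject empty ids and ids containing ASCII control characters (incl. DEL).
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

function isValidId(id: string): boolean {
  return id.length > 0 && !CONTROL_CHARS.test(id);
}

type CountFn = () => Promise<number>;

// Run every counter; a rejected counter yields null instead of failing
// the whole health response.
async function collectCounts(
  counters: Record<string, CountFn>,
): Promise<Record<string, number | null>> {
  const entries = Object.entries(counters);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    entries.map(async ([, fn]) => fn()),
  );
  const out: Record<string, number | null> = {};
  for (let i = 0; i < entries.length; i++) {
    const entry = entries[i];
    const result = settled[i];
    if (!entry || !result) continue;
    out[entry[0]] = result.status === "fulfilled" ? result.value : null;
  }
  return out;
}
```

With `Promise.all` a single rejected counter would reject the whole batch; `Promise.allSettled` keeps the successful counts available.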

Tests cover health variants (qdrant down, ollama down, count failure), list
scroll vs recall, type validation, offset passthrough, detail happy path
and 404, delete happy path and pre-check 404, and method restrictions.

Evolution timeline tab for the operator dashboard. Version cards expand in
place to show the per-file diff (summary, rationale, current content
preview, source-session pills that link back to the Sessions tab). Metrics
strip surfaces current version, total sessions, success rate 7d, drains
with tier mix, reflection cost, and invariant fails. Poison banner
appears when the queue has rows. Sparkline of drains per day over the
loaded timeline window.
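
Deriving the sparkline series from the loaded window amounts to a group-by-day count. A minimal sketch, assuming each timeline entry carries an ISO-8601 `timestamp` field (a hypothetical field name) and that days are bucketed in UTC:

```typescript
// Hypothetical entry shape; only the timestamp matters for the sparkline.
interface TimelineEntry {
  timestamp: string;
}

interface DayCount {
  day: string; // YYYY-MM-DD in UTC
  count: number;
}

function drainsPerDay(entries: readonly TimelineEntry[]): DayCount[] {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    const date = new Date(entry.timestamp);
    if (Number.isNaN(date.getTime())) continue; // skip unparseable rows
    const day = date.toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  // Oldest day first so the sparkline reads left to right.
  return Array.from(counts.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, count]) => ({ day, count }));
}
```

This sketch does not emit zero-count entries for days without drains, so a renderer that wants a continuous axis would need to fill those gaps itself.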

Phase A is pure read. No rollback button, no write actions. The "Phase A
read-only" chip sits in the header so the operator knows rollback is a
future PR.

Wires EvolutionEngine / EvolutionQueue into src/ui/serve.ts via new
setEvolutionEngine / setEvolutionQueue seams, wired from src/index.ts
after the engine and queue are constructed. Route name is now live in
dashboard.js and index.html. Shared CSS primitives .dash-timeline*,
.dash-diff*, .dash-poison-banner, .dash-session-pill, .dash-tab-switcher,
.dash-chip land in dashboard.css.

All operator-controlled fields flow through esc() for inline text and
textContent for file-content previews inside <pre> nodes.

Memory explorer tab: split pane over episodes, facts, and procedures. Left
rail has the type switcher (tabs: Episodes / Facts / Procedures), a
debounced search input, and a recency-ordered scroll list with type-
specific row layouts. Right pane paints the full record with a copy-as-JSON
button and a delete button that routes through an explicit confirmation
modal. Cancel has initial focus so Enter never deletes by accident.

Hybrid recall when the search box has a query, Qdrant scroll when it's
empty. Contradicted facts are greyed out and sorted last. Source episode
IDs become cross-tab session pills that navigate to #/sessions/<key>.
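
The "contradicted facts sorted last" ordering can be expressed as a two-key comparator. This is a sketch under assumed field names (`contradicted`, `updatedAt`); the actual fact payload may use different keys.

```typescript
// Hypothetical row shape for the Facts list.
interface FactRow {
  id: string;
  contradicted: boolean;
  updatedAt: number; // epoch milliseconds
}

function orderFacts(rows: readonly FactRow[]): FactRow[] {
  // Copy first so the caller's array is not mutated.
  return [...rows].sort((a, b) => {
    // Primary key: live facts before contradicted ones.
    if (a.contradicted !== b.contradicted) return a.contradicted ? 1 : -1;
    // Secondary key: most recent first within each group.
    return b.updatedAt - a.updatedAt;
  });
}
```

Recency is only the secondary key, so a freshly contradicted fact still sinks below every live fact.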

Responsive: below 720px the split collapses to a single full-width pane
with a back button that returns to the list. Global / focuses the search
input whenever the hash starts with #/memory. Delete confirmation uses
ctx.openModal with aria-dialog semantics.

Every operator-authored text field (summary, detail, trigger, natural
language, lessons, step action, expected outcome) renders via textContent
inside <pre class="dash-memory-text"> nodes because payloads may contain
any characters. Shorter identifiers flow through esc().
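
For readers unfamiliar with the inline-escaping half of this split, here is a minimal HTML-escaping helper of the kind esc() presumably provides. `escapeHtml` is a hypothetical stand-in; the dashboard's actual ctx.esc() implementation is not shown in this PR.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape the five characters that can break out of HTML text or a quoted
// attribute value. Suitable for short identifiers stamped into templates.
function escapeHtml(value: unknown): string {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

Multi-KB bodies skip escaping altogether: assigning to `textContent` on a `<pre>` node means the browser never parses the payload as markup.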

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a41b4f77da

Comment thread src/ui/api/evolution.ts
// entries as the outer window so `before_version` pagination works without a
// full disk walk.
function readTimelineWindow(engine: EvolutionEngine): EvolutionLogEntry[] {
	const log = engine.getEvolutionLog(TIMELINE_SCAN_CAP);

P1: Page timeline against full evolution log

buildTimeline computes pagination from readTimelineWindow, but that helper only reads the last 500 rows. In deployments with more than 500 evolution entries, before_version requests eventually return has_more=false even though older rows still exist, so the dashboard cannot load older generations and history is silently truncated.

Comment thread src/ui/api/evolution.ts
Comment on lines +212 to +216
const allLog = deps.engine.getEvolutionLog(TIMELINE_SCAN_CAP);
const match = allLog.find((e) => e.version === versionNumber) ?? null;

if (!match && versionNumber !== current.version) {
	return json({ error: "Version not found" }, { status: 404 });

P1: Resolve version lookups beyond the 500-entry window

The version detail route searches only getEvolutionLog(TIMELINE_SCAN_CAP) and then returns 404 when the target version is not in that slice. Once the log grows beyond 500 rows, valid historical versions outside the latest window are reported as missing, breaking direct links and inspection of older generations.

Comment thread src/ui/api/evolution.ts Outdated
Comment on lines +191 to +192
const absolute = join(configDir, relPath);
if (!existsSync(absolute)) return { content: "", size: 0 };

P2: Constrain diff preview paths to config_dir

readFilePreview joins config_dir with a log-provided relative path without checking for traversal. Because details[].file is accepted from evolution-log.jsonl rows, a malformed or poisoned row containing ../ can escape config_dir and expose arbitrary local files through /ui/api/evolution/version/:n previews.

…ination caps)

P1 (reviewer): CSS duplicate .dash-chip at dashboard.css:2589 overrode
the primary-tinted chip at :748 used by Settings/Hooks/Skills/Subagents.
Renamed the memory-specific neutral pill to .dash-memory-chip and
updated the four call sites in memory.js so the shared primitive is not
clobbered.

P1 (reviewer): path traversal in evolution.ts readFilePreview. Log
entries carry an agent-written 'file' path which was joined onto
config_dir without normalization, so a poisoned row like
'../../etc/passwd' could disclose arbitrary host files up to 64 KB
through GET /ui/api/evolution/version/:n. Now resolve both sides and
reject when the relative path escapes config_dir. Added a regression
test that plants a sentinel file outside config_dir and confirms it is
not returned.
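
The "resolve both sides and reject" check could look like the sketch below. `resolveInside` is a hypothetical helper name and the actual fix in evolution.ts may be structured differently; only `node:path` functions are used.

```typescript
import * as path from "node:path";

// Returns the absolute path when relPath stays inside configDir,
// or null when it escapes (or points at configDir itself).
function resolveInside(configDir: string, relPath: string): string | null {
  const root = path.resolve(configDir);
  const target = path.resolve(root, relPath);
  const rel = path.relative(root, target);
  const escapes =
    rel === "" || // the directory itself is not a file to preview
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  return escapes ? null : target;
}
```

This check is purely lexical. It does not follow symlinks, so a symlink inside config_dir that points outside it would still pass; closing that gap would require comparing real paths (for example via `fs.realpathSync`).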

P1 (Codex): timeline pagination and version lookup were capped at 500
entries (TIMELINE_SCAN_CAP). After ~500 drains, before_version
requests silently returned has_more=false and historical versions
outside the window 404'd. Raised to 100_000 since entries are ~1-2 KB
JSONL rows; an agent would take years to cross this ceiling. If it
ever does, switch to a streaming reader.
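
To illustrate why a capped scan window breaks `has_more`, here is a minimal sketch of before_version pagination over a log. `pageTimeline` and the entry shape are hypothetical; the real buildTimeline may differ.

```typescript
interface LogEntry {
  version: number;
}

interface TimelinePage<T> {
  entries: T[];
  has_more: boolean;
}

function pageTimeline<T extends LogEntry>(
  log: readonly T[],
  limit: number,
  beforeVersion?: number,
): TimelinePage<T> {
  // Newest first, matching the timeline's display order.
  const newestFirst = [...log].sort((a, b) => b.version - a.version);
  const eligible =
    beforeVersion === undefined
      ? newestFirst
      : newestFirst.filter((e) => e.version < beforeVersion);
  return {
    entries: eligible.slice(0, limit),
    // Only correct if `log` really contains every older entry.
    has_more: eligible.length > limit,
  };
}
```

If `log` holds only the newest N rows, `eligible` runs dry once `beforeVersion` reaches the oldest row in that window, and `has_more` reports false even though older entries exist on disk. Raising the cap pushes that point further out; a streaming reader would remove it.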

P1 (reviewer): memory scroll Load More never rendered because Qdrant's
order_by disables the cursor API (next_page_offset always null). Raised
LIST_DEFAULT_LIMIT from 30 to 100 (matches LIST_MAX_LIMIT) so the
browse view surfaces the freshest 100 without requiring Load More.
Proper cursor-style pagination over order_by is a documented follow-up.

CI caught a pre-existing flake that only fires when the test runs
near midnight UTC: hoursAgo(3) at 02:53 UTC resolves to 23:53 the
previous day, so SQLite's date(created_at) = date('now') buckets it
as yesterday and the 'today sum' assertion receives 1.0 instead of
3.0.

Clamp hoursAgo to 5 minutes ago whenever h < 24 would cross into a
different UTC date. daysAgo intentionally crosses the boundary and
stays as-is.
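
A sketch of the clamped helper, under the assumption that the reference time can be injected as a `now` parameter for testing (the real test helper may simply call `new Date()`):

```typescript
const HOUR_MS = 3_600_000;
const FIVE_MINUTES_MS = 5 * 60_000;

// UTC calendar day as YYYY-MM-DD, the same bucketing date('now') uses.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function hoursAgo(h: number, now: Date = new Date()): Date {
  const candidate = new Date(now.getTime() - h * HOUR_MS);
  // If a sub-day offset would land on the previous UTC date, pull the
  // timestamp forward to five minutes ago so it stays in "today".
  if (h < 24 && utcDay(candidate) !== utcDay(now)) {
    return new Date(now.getTime() - FIVE_MINUTES_MS);
  }
  return candidate;
}
```

In this sketch the five-minute fallback can itself cross midnight during the first five minutes of a UTC day, so a narrow window of flakiness would remain.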
@mcheemaa mcheemaa merged commit 3b8ad61 into main Apr 17, 2026
1 check passed
mcheemaa added a commit that referenced this pull request Apr 17, 2026
Bumps the version to 0.20.0 in every place it's referenced:
- package.json (1)
- src/core/server.ts VERSION constant
- src/mcp/server.ts MCP server identity
- src/cli/index.ts phantom --version output
- README.md version + tests badges
- CLAUDE.md tagline + bun test count
- CONTRIBUTING.md test count

Tests: 1,799 pass / 10 skip / 0 fail. Typecheck and lint clean. No
0.19.1 or 1,584-tests references remain in source, docs, or badges.

v0.20 shipped eight PRs on top of v0.19.1:
  #71 entrypoint dashboard sync + / redirect + /health HTML
  #72 Sessions dashboard tab
  #73 Cost dashboard tab
  #74 Scheduler tab + create-job + Sonnet describe-assist
  #75 Evolution Phase A + Memory explorer tabs
  #76 Settings page restructure (phantom.yaml, 6 sections)
  #77 Agent avatar upload across 14 identity surfaces
  #79 Landing page redesign (hero, starter tiles, live pages list)