first cut of promotion web ui#1

Merged: emp3thy merged 49 commits into main from webui on Apr 19, 2026
@emp3thy emp3thy commented Apr 19, 2026

Adds a basic web UI and some of the initial panels.


Note

Medium Risk
Introduces a new Flask-based local web server that reads/writes the SQLite memory.db (approve/reject/edit/retire/demote), plus process-exit shutdown/inactivity logic; mistakes could cause unintended data/status changes or unexpected termination, though exposure is limited to localhost with Origin/Referer checks.

Overview
Adds a new Flask+HTMX management UI (python -m better_memory.ui) that binds to localhost on a random port, writes the discovered URL to $BETTER_MEMORY_HOME/ui.url, and supports /shutdown, /healthz, Origin/Referer enforcement for non-GET, and an inactivity watchdog.
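
A minimal sketch of the port discovery and `ui.url` handoff described above, using only the standard library. The helper names `pick_free_port` and `write_ui_url` are hypothetical, and the real server may simply let Flask/werkzeug bind to port 0 itself; probing first, as done here, leaves a small window in which another process could take the port.

```python
import socket
from pathlib import Path


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a free TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def write_ui_url(home: Path, port: int, host: str = "127.0.0.1") -> Path:
    """Write the discovered URL to <home>/ui.url and return the file path."""
    home.mkdir(parents=True, exist_ok=True)
    url_file = home / "ui.url"
    url_file.write_text(f"http://{host}:{port}/\n", encoding="utf-8")
    return url_file
```

A caller such as a test fixture can then read `ui.url` from the home directory to find the running server.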

Implements a Pipeline kanban view backed by new read-only SQL query helpers and InsightService updates, enabling UI actions to approve/reject candidates, edit titles/content, retire/demote insights, view linked source observations, and show promotion/merge/consolidation as Phase-stubbed flows. Also vendors static assets/templates/CSS, promotes resolve_home() for cross-module use, adds a GitHub Actions workflow to run tests/ui, and documents manual UI launch in the README.

Reviewed by Cursor Bugbot for commit be08e25. Bugbot is set up for automated code reviews on this repo.

emp3thy and others added 30 commits April 18, 2026 23:10
Covers the web skeleton, Pipeline Kanban, Sweep Review, Knowledge
Editor, Promotion Workflow, Audit Timeline, Graph View, and
memory.start_ui() tool. Flask + HTMX, services-backed, 127.0.0.1
with Origin checks, HTMX polling, Cytoscape for the graph view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- POST /shutdown defers os._exit via threading.Timer so Flask flushes
- Cross-fragment refresh uses HX-Trigger header + hx-trigger from:body
- Graph layout switches to fcose (handles 500 nodes better than cose)
- Graph side-panel spells out per-type fragments (observation, insight,
  knowledge doc)
- Only-one-expanded card behavior implemented via data-expanded + hx-on
- Audit date filter uses from/to date inputs plus preset shortcuts
- Consolidation single-job enforcement uses module-level threading.Lock
- Styling is hand-written CSS; no framework
- Directory layout lists the new fragments and fcose vendored file

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
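
A rough sketch of the deferred-shutdown idea from the first bullet above: the handler schedules the exit on a `threading.Timer` and returns immediately, so the response can be flushed before the process ends. The function name `schedule_exit` and the 0.2 s delay are assumptions, and the exit callable is injected so the sketch can be run without killing the interpreter (the real route would pass something like `lambda: os._exit(0)`).

```python
import threading
from typing import Callable


def schedule_exit(exit_fn: Callable[[], None], delay: float = 0.2) -> threading.Timer:
    """Run exit_fn after `delay` seconds on a background timer thread."""
    timer = threading.Timer(delay, exit_fn)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```

Returning the timer lets a test cancel or join it instead of waiting for a real exit.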
Covers spec §2 (tech/process) and §3 (layout shell, routes). Eleven
tasks from flask dep through __main__ subprocess integration, with
TDD at every step. Does not wire services or the memory.start_ui()
MCP tool — those come in Phase 2 and Phase 10 respectively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- HTMX pin bumped to 2.0.8 (latest stable; 2.0.4 works but is older)
- Watchdog refactored: idle-check helper exposed on app.config so
  tests can invoke it synchronously. No real-time sleeps in tests.
  Adds start_watchdog=False kwarg for tests that don't need the thread.
- Subprocess integration tests use stdout=DEVNULL to remove any
  Windows pipe-buffer deadlock risk.
- Badge template renders empty string when count=0 so the CSS
  :empty rule can hide it (previously emitted "0", defeating the hide).
- Badge test switches from view-monkeypatching to direct
  render_template in app_context — cleaner.

All validated empirically against werkzeug 3.1.8 / flask 3.1.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
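
One way the synchronous idle check from the watchdog bullet above could look, assuming an injectable clock so tests never sleep. `IdleTracker` and its method names are hypothetical; the commit only says that an idle-check helper is exposed on `app.config`.

```python
import time
from typing import Callable


class IdleTracker:
    """Tracks the last request time and reports whether the timeout has elapsed."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def touch(self) -> None:
        """Record activity; a request hook would call this on every request."""
        self._last_seen = self._clock()

    def is_idle(self) -> bool:
        """True once at least timeout_s seconds have passed since the last touch."""
        return self._clock() - self._last_seen >= self._timeout_s
```

A watchdog thread would poll `is_idle()` in a loop, while tests call it directly with a fake clock.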
Task 1 now renames _resolve_home → resolve_home in config.py and
verifies no stale references remain. __main__.py and app.py in
Task 9 import the public name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ffold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ub, docstring)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also exempt HEAD from the Origin check so curl -sI smoke tests pass;
HEAD is a safe read-only method equivalent to GET.
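
A sketch of the check this commit describes, written as a plain function rather than a Flask hook so it can be run in isolation. The name `request_allowed`, the fallback from Origin to Referer, and the rejection of requests that carry neither header are assumptions about the policy, not a copy of the implementation.

```python
from typing import Optional
from urllib.parse import urlsplit

# GET and HEAD are read-only, so they skip the Origin/Referer check.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def request_allowed(
    method: str, host: str, origin: Optional[str], referer: Optional[str]
) -> bool:
    """Allow safe methods, and other methods only when they come from our own host."""
    if method.upper() in SAFE_METHODS:
        return True
    source = origin or referer
    if not source:
        return False  # assumption: a state-changing request with no source is rejected
    return urlsplit(source).netloc == host
```

With this shape, `curl -sI` (a HEAD request with no Origin header) passes, while a cross-site POST does not.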
15 tasks implementing spec §4: summary bar + drill-in panel, compact
and expanded cards for each stage, click-to-expand with only-one-open
script, per-stage action routes (approve/reject/edit for candidates;
retire/edit/demote/view-sources for insights), stub-outs for
ConsolidationService (Phase 3) and Promotion workflow (Phase 7),
and a jobs.py with threading.Lock ready for Phase 3 to fill in.

Smoke tests for Approve/Reject/Retire/Demote against consolidation-
generated data are deferred to Phase 3, since Phase 3 ships the
data generator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Task 3: promote _row_to_insight to public row_to_insight (same
  pattern as Phase 1's resolve_home). 88% -> 95%.
- Task 5: tighten brittle ">1<" assertion to a regex on
  <span class="count">N</span>. 85% -> 92%.
- Task 8/11: replace hx-swap="beforeend" for View sources with a
  dedicated #sources-<id> container + innerHTML swap. Fixes double-
  append on repeat click. 82%/85% -> 92%.
- Task 9: add a regression test that asserts base.html includes the
  htmx:beforeRequest listener and the DOM markers it walks. 80% -> 85%.
- Task 12: replace inline onclick with hx-on:click to stay within
  HTMX's blessed pattern. 88% -> 93%.
- Task 13: test now extracts job_id from response fragment regex
  instead of calling current_job_id() (which the stub clears). 70%
  -> 90%. (already committed in dd59fa2 amendment)
- Task 15: explicit cd to project root in smoke script so
  Path.cwd().name matches the user's real observations. 85% -> 92%.

Remaining <90%: Task 7 (80%), Task 9 (85%). Both are structural risks
the plan itself cannot fully eliminate — caught at execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
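
For the Task 5 change above, a small sketch of why a regex on the count span is less brittle than searching for `">1<"`: it matches only digits inside `<span class="count">`, not any element that happens to contain `1`. The helper name `extract_counts` is hypothetical.

```python
import re

# Matches only the digits inside <span class="count">N</span>.
COUNT_SPAN = re.compile(r'<span class="count">(\d+)</span>')


def extract_counts(html: str) -> list[int]:
    """Return every pill count found in the rendered fragment, in document order."""
    return [int(value) for value in COUNT_SPAN.findall(html)]
```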
Task 16 ships a Playwright browser-test harness. A tmp_path fixture
spawns the UI subprocess with an isolated BETTER_MEMORY_HOME, applies
migrations, reads ui.url, and hands it plus the home dir to the test.
One representative test: expanding a second card must auto-collapse
the first (spec §4). Future UI phases extend with their own browser
tests. Task 9's confidence rises to 95% with real behavioral
verification.

Task 17 adds .github/workflows/ui-tests.yml running pytest tests/ui/
on every PR that touches UI paths. Uses astral-sh/setup-uv with cache
and playwright install --with-deps chromium. Full-repo tests excluded
for now because tests/mcp/test_server_integration.py needs a running
Ollama instance that CI doesn't provide; separate workflow later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- create_app() accepts db_path kwarg; resolves to resolve_home()/memory.db if omitted
- Opens sqlite connection via connect() and registers it in app.extensions["db_connection"]
- Instantiates InsightService and registers it in app.extensions["insight_service"]
- Adds no-op teardown_appcontext hook (connection lives for app lifetime, not per-request)
- conftest: replaces client fixture with tmp_db + migrated-DB-backed client
- conftest: patches threading.Timer in client fixture to prevent real os._exit during tests
- test_app: adds TestServiceWiring (2 tests) verifying both extensions are present and usable
emp3thy and others added 18 commits April 19, 2026 15:58
Replace Phase 1 placeholder with summary bar (4 pills with live counts),
panel container with load+10s+job-complete HTMX triggers defaulting to
candidates, and stub routes for pipeline_panel and pipeline_consolidate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Task 11: replaces 5 stub routes (retire, demote, edit GET/POST,
compact, sources) with real InsightService-backed implementations; adds
list_insight_sources query helper; creates insight_sources.html fragment;
adds TestInsightActions (5 tests). 59/59 pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread better_memory/ui/jobs.py
Comment thread better_memory/ui/jobs.py Outdated
Two issues flagged by Cursor Bugbot on PR #1:

1. Race (Medium): _current_job_id = None was assigned AFTER
   _lock.release() in the finally block. A second caller could
   acquire the lock, set _current_job_id to its own id, and have
   the first caller's stale finally overwrite it back to None.
   Swap the order so the clear happens while the lock is still held.

2. Dead code (Low): current_job_id() was defined but never imported
   or called. YAGNI — remove. Phase 3 can add it back when a real
   consumer needs it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


<button hx-get="{{ url_for('insight_edit', id=i.id) }}"
hx-target="closest .card" hx-swap="outerHTML">Edit</button>
<button hx-get="{{ url_for('insight_sources', id=i.id) }}"
hx-target="#sources-{{ i.id }}" hx-swap="innerHTML">View sources</button>
Expanded promoted card shows broken Promote instead of Demote

Medium Severity

The insight_card route serves both confirmed and promoted insights, but the insight_card_expanded.html template unconditionally renders a "Promote" button and never renders a "Demote" button. When a user clicks a promoted card (from promoted_card_compact.html), it expands into this template showing a "Promote" button that returns a 404 (since insight_promote requires status == "confirmed"), while the "Demote" functionality available on the compact card is lost entirely. The insight_compact_card route correctly distinguishes templates by status, but insight_card does not.
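
One possible shape for a fix, sketched as a status-to-template lookup that the `insight_card` route could consult before rendering. `insight_card_expanded.html` is named in this comment; `promoted_card_expanded.html` and the helper `expanded_template_for` are hypothetical and only illustrate the dispatch.

```python
# Maps insight status to the expanded-card template to render.
EXPANDED_TEMPLATES = {
    "confirmed": "insight_card_expanded.html",
    "promoted": "promoted_card_expanded.html",  # hypothetical template with a Demote button
}


def expanded_template_for(status: str) -> str:
    """Pick the expanded template for a status, failing loudly on unknown values."""
    try:
        return EXPANDED_TEMPLATES[status]
    except KeyError:
        raise ValueError(f"no expanded card template for status {status!r}") from None
```

An alternative with the same effect would be a single template that branches on the status to render either Promote or Demote.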


@emp3thy emp3thy merged commit 372e2d5 into main Apr 19, 2026
2 checks passed
@emp3thy emp3thy deleted the webui branch April 19, 2026 18:18
emp3thy added a commit that referenced this pull request May 12, 2026
…edupe exposure list

Three BugBot review findings on PR #52:

1. HIGH: session_close.py wrote the session_end spool marker
   unconditionally after emitting decision:block, causing premature
   synthesis runs and duplicate marker writes on the second Stop fire.
   Fix: _emit_rating_directive_if_unrated now returns bool; main()
   exits early when a block was emitted. Marker lands only on the
   final (non-blocking) Stop fire.

2. MEDIUM (dup): same root cause as #1.

3. MEDIUM: memory.list_session_exposures and the session_close
   directive query both returned duplicate (kind, id) entries when a
   memory had two exposure rows (bootstrap + retrieve). The LLM would
   submit two ratings for the same (kind, id) pair and
   apply_session_ratings would reject the whole batch.
   Fix: GROUP BY (memory_kind, memory_id) with MIN(exposed_at) in
   both queries. Matches _apply_one's contract (one rating stamps all
   unrated rows for a (kind, id)).

Tests:
- Updated test_non_empty_unrated_emits_decision_block to assert no
  marker is written when block is emitted.
- New test_multi_row_exposure_dedupes_in_directive verifies dedupe
  in session_close directive.
- New test_dedupes_multi_row_exposure verifies dedupe in MCP
  list_session_exposures tool.

862 tests pass (was 860, +2 new).
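
The dedupe in item 3 can be illustrated with an in-memory SQLite table. The table name `exposures`, the `source` column, and the sample rows are assumptions made up for the sketch; only `memory_kind`, `memory_id`, `exposed_at`, and the `GROUP BY` with `MIN(exposed_at)` come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposures ("
    "memory_kind TEXT, memory_id INTEGER, source TEXT, exposed_at TEXT)"
)
conn.executemany(
    "INSERT INTO exposures VALUES (?, ?, ?, ?)",
    [
        # The same memory exposed twice: once at bootstrap, once on retrieve.
        ("insight", 1, "bootstrap", "2026-05-12T10:00:00"),
        ("insight", 1, "retrieve", "2026-05-12T10:05:00"),
        ("observation", 7, "retrieve", "2026-05-12T10:06:00"),
    ],
)

# One row per (memory_kind, memory_id), keeping the earliest exposure time.
rows = conn.execute(
    """
    SELECT memory_kind, memory_id, MIN(exposed_at) AS first_exposed_at
    FROM exposures
    GROUP BY memory_kind, memory_id
    ORDER BY first_exposed_at
    """
).fetchall()
```

With one row per pair, the rating prompt lists each memory once, so a batch never contains two ratings for the same `(kind, id)`.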