Conversation Search

Sia edited this page May 31, 2026 · 3 revisions

Conversation Search & Archive

Three related features on top of the conversation_turns table (see Conversation History for the storage): search, export/import, and auto-archive.

Per-project search

/projects/{id}/history and /chat/history filter inside a single project's turns:

  • Session dropdown.
  • Role filter (user, assistant, tool_use, tool_result, etc.).
  • Tool name filter.
  • ISO-timestamp from / to.
  • content_tsv @@ plainto_tsquery('simple', q) text search.
  • Agent filter — main only / all / @<name>.

Pagination is 100 turns/page, ts ASC (oldest first).
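
The filters above compose into a single parameterized query. The following is a minimal sketch of that composition, not the project's actual code: HistoryFilter, buildHistoryQuery, and the column names project_id, session_id, and tool_name are assumptions made for illustration, and plain positional ? placeholders stand in for whatever binding mechanism the server uses. The agent filter is left out because its storage is not described here.

```kotlin
// Hypothetical filter model; field and column names are assumptions,
// not the project's actual schema.
data class HistoryFilter(
    val projectId: String,
    val sessionId: String? = null,
    val role: String? = null,
    val toolName: String? = null,
    val from: String? = null,   // ISO-8601 timestamp, inclusive
    val to: String? = null,     // ISO-8601 timestamp, inclusive
    val q: String? = null,      // free-text query
    val page: Int = 0,          // zero-based page index
)

const val PAGE_SIZE = 100

// Returns the SQL text plus the values to bind, in placeholder order.
fun buildHistoryQuery(f: HistoryFilter): Pair<String, List<Any>> {
    val where = mutableListOf("project_id = ?")
    val args = mutableListOf<Any>(f.projectId)

    // Adds a clause only when the corresponding filter is set.
    fun add(clause: String, value: Any?) {
        if (value != null) {
            where += clause
            args += value
        }
    }
    add("session_id = ?", f.sessionId)
    add("role = ?", f.role)
    add("tool_name = ?", f.toolName)
    add("ts >= ?::timestamptz", f.from)
    add("ts <= ?::timestamptz", f.to)
    add("content_tsv @@ plainto_tsquery('simple', ?)", f.q?.takeIf { it.isNotBlank() })

    val sql = "SELECT * FROM conversation_turns WHERE " + where.joinToString(" AND ") +
        " ORDER BY ts ASC LIMIT ? OFFSET ?"
    args += PAGE_SIZE
    args += f.page * PAGE_SIZE
    return sql to args
}
```

Unset filters contribute no clause at all, so the same function serves both an unfiltered page and a fully narrowed search.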

Cross-project search

/history (no project id) walks all projects in one query.

  Visit /history
  q: "settings screen"
  role: (all)
  → 200-hit hard cap, ts DESC (newest first)

For each match:

  • Project id + role pill in the left column.
  • Match preview = ±100 chars around the matched word.
  • The matched substring is wrapped in <mark style="background:#facc15">…</mark> for visual scan.
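
The preview described above can be sketched as follows. This is an illustration under assumptions, not the page's actual rendering code: matchPreview and escapeHtml are hypothetical names, the match is located with a plain case-insensitive search for a single term, and the HTML escaping is an added precaution that the text above does not mention.

```kotlin
// Minimal HTML escaping so turn content cannot inject markup into the preview.
fun escapeHtml(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

// Returns up to `radius` characters on each side of the first case-insensitive
// occurrence of `term`, with the match wrapped in <mark>.
// Returns null when `term` is blank or does not occur in `content`.
fun matchPreview(content: String, term: String, radius: Int = 100): String? {
    if (term.isBlank()) return null
    val start = content.indexOf(term, ignoreCase = true)
    if (start < 0) return null
    val end = start + term.length
    val before = content.substring(maxOf(0, start - radius), start)
    val match = content.substring(start, end)
    val after = content.substring(end, minOf(content.length, end + radius))
    return escapeHtml(before) +
        "<mark style=\"background:#facc15\">" + escapeHtml(match) + "</mark>" +
        escapeHtml(after)
}
```

Each segment is escaped separately, so the only markup in the result is the <mark> element itself.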

User input is bound via QueryBuilder.registerArgument, so it never becomes part of the SQL text and injection is ruled out by construction. plainto_tsquery adds a second layer: it tokenizes the input into AND-combined lexemes, so metacharacters and quotes are treated as plain text. An empty query returns an empty result page, which prevents accidental "dump all turns" requests.
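
A minimal sketch of the two safeguards in this paragraph, the empty-query short-circuit and parameter binding, together with the hard cap. BoundQuery, crossProjectSearch, and the selected column names are hypothetical, and a positional ? placeholder is used in place of QueryBuilder.registerArgument, whose signature is not shown on this page.

```kotlin
const val HARD_CAP = 200

// SQL text plus the values to bind; the user's text never appears in the SQL itself.
data class BoundQuery(val sql: String, val args: List<Any>)

// Returns null for a blank query so the caller can render an empty result page
// without touching the database.
fun crossProjectSearch(q: String?, role: String? = null): BoundQuery? {
    val text = q?.trim().orEmpty()
    if (text.isEmpty()) return null

    val where = mutableListOf("content_tsv @@ plainto_tsquery('simple', ?)")
    val args = mutableListOf<Any>(text)
    if (!role.isNullOrBlank()) {
        where += "role = ?"
        args += role
    }

    // The cap is a compile-time constant, so interpolating it is safe.
    val sql = "SELECT project_id, role, ts, content FROM conversation_turns WHERE " +
        where.joinToString(" AND ") + " ORDER BY ts DESC LIMIT $HARD_CAP"
    return BoundQuery(sql, args)
}
```

Because the search text travels only in the argument list, a hostile string such as `x'; DROP TABLE conversation_turns; --` reaches PostgreSQL as data and is tokenized like any other input.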

Limits

  • 200-row hard cap: a very common search term cannot inflate the page to several megabytes.
  • Search algorithm: PostgreSQL tsvector + GIN. EXPLAIN ANALYZE shows sub-millisecond execution on hundreds of thousands of rows. Matching is token-level, so a substring inside a word does not match; CJK text is handled best-effort by the simple tokenizer.

Export / import

Each project's /history page has two new buttons:

  • 📥 JSON 다운로드 (Download JSON): GET /projects/{id}/history/export streams every turn in the project as application/json in the following envelope:

    {
      "schemaVersion": 1,
      "projectId": "my-app",
      "exportedAt": "2026-05-24T15:00:00Z",
      "turnCount": 1234,
      "turns": [ { "sessionId": "...", "turnIdx": 0, "ts": "...", "role": "user", "content": "...", ... }, ... ]
    }
  • 📤 JSON 가져오기 (Import JSON): multipart upload to POST .../history/import. Dry-run is on by default: it counts what would be inserted without writing to the database. Uncheck the dry-run option to actually import.

Idempotency

The import is idempotent at the session-id level:

  • If a session id already exists in the target project, the whole session is skipped (no partial merge).
  • This avoids the complexity of row-by-row dedup with re-issued turnIdx sequences.

A warning string in the response lists how many sessions were skipped and why.
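
The skip-whole-session rule and the dry-run default can be sketched as below. Turn, ImportPlan, planImport, and runImport are hypothetical names, the warning wording is invented for illustration, and the database write is represented by a callback.

```kotlin
// Hypothetical shape mirroring a turn in the export envelope; only the fields used here.
data class Turn(val sessionId: String, val turnIdx: Int, val role: String, val content: String)

data class ImportPlan(
    val toInsert: List<Turn>,          // turns from sessions not yet in the target project
    val skippedSessions: Set<String>,  // sessions skipped as a whole
    val warning: String?,              // null when nothing was skipped
)

// A session that already exists is skipped entirely; there is no per-row merge.
fun planImport(existingSessionIds: Set<String>, incoming: List<Turn>): ImportPlan {
    val (skipped, fresh) = incoming.partition { it.sessionId in existingSessionIds }
    val skippedIds = skipped.map { it.sessionId }.toSet()
    val warning = if (skippedIds.isEmpty()) {
        null
    } else {
        "Skipped ${skippedIds.size} session(s) that already exist in the target project"
    }
    return ImportPlan(fresh, skippedIds, warning)
}

// With dryRun = true (the default) the plan is computed but `insert` is never called.
fun runImport(
    existingSessionIds: Set<String>,
    incoming: List<Turn>,
    dryRun: Boolean = true,
    insert: (List<Turn>) -> Unit,
): ImportPlan {
    val plan = planImport(existingSessionIds, incoming)
    if (!dryRun) insert(plan.toInsert)
    return plan
}
```

Running the same import twice is harmless under this rule: on the second run every session id already exists, so the plan contains nothing to insert.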

Use cases

  • Migrate to a new vibe-coder host. Export per project on the old host, import on the new one. PostgreSQL data dir doesn't need to be copied.
  • Cross-instance backup. Export to S3 / git lfs / wherever.
  • Hand off to a downstream analysis tool (LangChain trace inspectors understand the same shape).

Auto-archive

ConversationArchiver runs every 24 h in the background. Each tick:

  1. Finds every (projectId, sessionId) whose max ts is older than archiveAfterDays (default 30 days).
  2. Builds the same JSON envelope as the manual export (single-session variant).
  3. Writes it to <workspace>/.vibecoder/<projectId>/archive/session-<sid>.json.
  4. Only if the dump succeeded, deletes those rows from conversation_turns.

The dump and the delete are not transactional. If the server crashes between the two steps, the next tick re-detects the session, sees that the file already exists (and skips the dump), and retries the delete. A single crash therefore never loses data.
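
The dump-then-delete ordering can be sketched with an in-memory stand-in for conversation_turns. ArchiverSketch, StoredTurn, and the placeholder render function are hypothetical and are not the real ConversationArchiver. Writing to a temporary file and moving it into place is an extra precaution assumed here, not something the text above describes; it keeps a half-written dump from being mistaken for a finished one.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption
import java.time.Duration
import java.time.Instant

// Hypothetical in-memory stand-in for a row of conversation_turns.
data class StoredTurn(val projectId: String, val sessionId: String, val ts: Instant, val content: String)

class ArchiverSketch(
    private val workspace: Path,
    private val archiveAfterDays: Long = 30,
    private val turns: MutableList<StoredTurn>,
) {
    fun tick(now: Instant = Instant.now()) {
        val cutoff = now.minus(Duration.ofDays(archiveAfterDays))
        // Step 1: sessions whose newest turn is older than the cutoff.
        val stale = turns.groupBy { it.projectId to it.sessionId }
            .filterValues { rows -> rows.maxOf { it.ts } < cutoff }

        for ((key, rows) in stale) {
            val (projectId, sessionId) = key
            val file = workspace.resolve(".vibecoder").resolve(projectId)
                .resolve("archive").resolve("session-$sessionId.json")
            // Steps 2-3: dump, unless an earlier (possibly interrupted) tick already did.
            if (!Files.exists(file)) {
                Files.createDirectories(file.parent)
                // Assumption: write to a temp file, then move it into place.
                val tmp = file.resolveSibling(file.fileName.toString() + ".tmp")
                Files.write(tmp, render(rows).toByteArray())
                Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE)
            }
            // Step 4: reached only when the dump exists; an exception above skips the delete.
            turns.removeAll { it.projectId == projectId && it.sessionId == sessionId }
        }
    }

    // Placeholder serializer (no JSON escaping); the real dump uses the export envelope.
    private fun render(rows: List<StoredTurn>): String =
        rows.joinToString(prefix = "[", postfix = "]") { "\"${it.content}\"" }
}
```

If a tick stops after the file is written but before the rows are removed, the next tick finds the same stale session, skips the dump because the file exists, and only performs the delete.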

Restoring archived sessions

There's no "restore from archive" button. The workflow:

  1. Find the JSON: vibe-coder-data/workspace/.vibecoder/<projectId>/archive/session-<sid>.json.
  2. Upload it through the 📤 import button on /projects/{projectId}/history (the archive file uses the same envelope schema as the manual export).

Tuning

ConversationArchiver(workspace, archiveAfterDays = 90)

This constructor call in ServerMain.kt is the wiring point. archiveAfterDays is a constructor parameter with a default of 30; the call above raises it to 90 (use 7 for aggressive cleanup). Changing it requires rebuilding the image.

Audit log

Action                    | Audited as
--------------------------|-------------------------------------------
Manual export (download)  | Not logged (read-only, frequent)
Manual import             | Not logged
Auto-archive tick         | Server stdout log only (not in audit_log)
