feat(data): FTS-powered context enrichment for LLM chat by cpcloud · Pull Request #933 · micasa-dev/micasa

cpcloud · 2026-04-14T22:28:15Z

Summary

FTS5 entities_fts table indexing 7 entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs), wired into the chat pipeline so the LLM gets entity context in both SQL and summary prompts.
Entity search hardened: per-type quotas guarantee cross-type representation (three-tier window-function query), NL-tolerant query (lowercase, stopword filtering, OR-join of content words as quoted prefix phrases) so questions like "what's the status of the kitchen project?" match, rank threshold plumbing, stable entity_id tiebreaker.
entities_fts stays current via SQLite triggers: own-row INSERT/UPDATE/DELETE per entity plus cascading refresh of children — quote rows refresh when their parent project or vendor changes; service_log rows refresh when their parent maintenance_item changes.
BuildFTSContext escapes delimiters to protect against prompt injection. EntitySummary returns tri-state (found/stale/missing) so the caller can revalidate before using cached results.
micasa eval fts subcommand for chat-quality evaluation: seeded fixture, 8 default questions, FTS-on/off A/B arms, deterministic regex rubric plus optional LLM judge (tolerant parser for real-world model output — markdown decoration, :/= separators, <think> blocks), table/markdown/JSON reports (table default on TTYs), --strict exit code on per-question rubric regression, Nix app wrapper.
AGENTS.md rules against reinventing stdlib helpers and passing conceptual-zero values (nil/empty/0) to third-party functions without checking godoc.

Closes #707

Adds plans/707-fts-context-enrichment.md (spec) and plans/707-fts-context-enrichment-plan.md (implementation plan) for the first FTS feature: injecting entity-search context into the chat pipeline's SQL-generation, summary, and fallback prompt builders. Refs micasa-dev#707.

Wires FTS entity search into all three chat prompt builders (BuildSQLPrompt, BuildSummaryPrompt, BuildSystemPrompt fallback). Each chat turn runs SearchEntities against an entities_fts virtual table indexing every text-bearing entity type, fetches a live one-line summary for each hit via EntitySummary (which revalidates against the source table), fences the results with a clearly-labeled BEGIN/END block, and escapes the fence delimiter + code-fence tokens in user-controlled entity text to prevent prompt-injection breakout. Stage 1 gets the context for better SQL (entity IDs for disambiguation, better WHERE filters, fewer fragile LIKEs). Stage 2 gets it as disambiguation-only background ("use solely for disambiguation, not as a source of additional facts") so summaries are still grounded in the SQL results. Fallback gets it to improve the no-SQL answer path. Error handling: any failure in SearchEntities or EntitySummary short-circuits to empty context, matching pre-FTS behavior exactly -- enrichment degrades gracefully. Regression test covers the query-error wrapping path (corrupted FTS schema is forwarded as a wrapped "search entities:" error). Refs micasa-dev#707.

Adds plans/707-fts-eval-and-hardening.md: follow-up plan for the FTS context-enrichment work. Three pieces: A. A `micasa eval fts` subcommand that runs a benchmark question set through the live chat pipeline against a fixture DB (or the user's own DB), grades each answer with a deterministic rubric plus an optional LLM judge, and reports FTS-on vs FTS-off deltas. B. SearchEntities becomes a single window-function query with per-entity-type quotas and a BM25 rank threshold. C. setupEntitiesFTS installs AFTER INSERT/UPDATE/DELETE triggers on every source table. UPDATE triggers on parents whose text is embedded in a child's FTS row cascade a refresh to those children. Covers acceptance criteria, privacy warnings for non-fixture runs, partial-failure taxonomy, and JudgeScore sentinel semantics (-1 = "not run"; 0 = genuine all-criteria-failed grade). Refs micasa-dev#707.

setupEntitiesFTS now installs AFTER INSERT / UPDATE / DELETE triggers on every source table that contributes rows to entities_fts (projects, vendors, appliances, maintenance_items, incidents, service_log_entries, quotes). Parent tables whose text is embedded in a child's entity_name (project.title and vendor.name in quote, maintenance_item.name in SLE) get companion _au_cascade triggers that rebuild the child's FTS row when the parent is updated. Cascade JOINs filter on parent.deleted_at IS NULL so a parent soft-delete degrades the child's entity_name (project title disappears from the quote; vendor name disappears; SLE name blanks out) instead of leaving stale text in the index. The populate path carries the same filter so initial rebuilds on app open match the trigger invariant. Trigger installation is idempotent (DROP IF EXISTS + CREATE), so schema drift across app versions heals on the next Store.Open. FK constraints (RESTRICT on quote parents, CASCADE on SLE parents) continue to govern hard-delete feasibility; parent _ad triggers are plain single-table cleanups, no cascade blocks needed. Tests cover: insert, rename, soft-delete, parent-rename cascade for all three relationships, parent-soft-delete cascade via raw DML (the app gates soft-delete with live children, so the cascade path is exercised by sync in production; raw DML matches that scenario in tests), FK cascade on maintenance_item hard-delete, and initial rebuild preserving the soft-delete filter for both SLE and quote joins. Refs micasa-dev#707.

Replaces the flat LIMIT 20 in SearchEntities with a three-tier window-function query and adds natural-language query tolerance. Ranking: - Tier 1 takes exactly one row per matching entity type (guarantees cross-type representation). - Tier 2 raises each type up to ftsEntityKPerType rows so single noisy types can't dominate. - Tier 3 fills any remaining room up to ftsEntityTotalCap from whatever's left, globally ranked. Single-type searches use the full cap this way. Package-level tuning constants (not user-configurable -- the eval harness is the tuning channel): ftsEntityKPerType = 5 ftsEntityRankCeiling = 0.0 // permissive; eval will tighten ftsEntityTotalCap = 20 entity_id tiebreaks rank in every ORDER BY so results are stable when BM25 produces identical ranks on similarly-shaped rows. Query tolerance: - prepareFTSEntityQuery lowercases, strips non-alphanum, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases. - Returns early when no content words survive so a pure-stopword question like "what is it?" doesn't hammer FTS with an empty MATCH. Tests cover per-type quota preservation under a flood of first-class matches, single-type searches using the full cap, every matching type surfacing when 5+ types share a token, total cap enforcement, rank threshold plumbing, stable ordering across runs, the query builder directly, and the end-to-end regression that "what's the status of the kitchen project?" now surfaces the Kitchen Remodel project. Refs micasa-dev#707.

Wires the chat-quality eval described in the plan: - internal/ftseval/ package with typed Config, Question, ArmResult, RunResult; Run() drives each question through both FTS arms against a pre-built store, grades with a deterministic regex rubric plus an optional LLM judge, and returns the per-question results. - Fixture seed (SeedFixture) populating projects, vendors, appliances, maintenance items, incidents, one service log, and one quote that ties kitchen to Pacific Plumbing (with the "permit delays" vendor note the long-tail-note question relies on). - Default question set covering disambiguation, cross-entity joins, service-log lookup, aggregate (FTS-neutral), basement incidents, nonexistent entity, long-tail note, and brand filter. - Judge-score sentinel -1 when the judge didn't run (--skip-judge, no summary, parse failure, or judge error); 0-5 when it did. Judge parser tolerates real-world model output: markdown-decorated rubric lines, `:` vs `=` separators, mixed case, leading <think>/<thinking>/ <reasoning> blocks, and "Rationale" as an alias for "Reason". The judge_reason surfaces in Notes when the score is the sentinel. - Table report (default on TTYs, via lipgloss), markdown (default when piping or writing to a file), and JSON. JSON writes a redactedConfig that excludes APIKey so the key never leaks to stdout, --output, or CI artifacts. Judge-score aggregates exclude sentinel rows. - --strict exits 1 on per-question FTS-on rubric regression over questions completed on both arms (sql_error still counts as completed per production behavior; stage-1/stage-2 provider errors do not). - Empty ExpectedEntityIDs are skipped in entity-hit scoring so --db runs (which have a zero-valued SeededFixture) don't false-positive. CLI: `micasa eval fts` with --db, --provider, --model, --judge-model, --questions, --skip-judge, --no-ab, --format, --output, --strict. Default fixture is built in a tempdir that cleans up on exit; --db points at an existing store. Privacy warning printed on stderr when running against a non-fixture DB on a non-local provider. Nix: `nix run '.#fts-eval'` wraps the subcommand. Refactor: moves buildFTSContext and buildTableInfoFrom out of internal/app/chat.go into internal/llm as exported BuildFTSContextFromStore and BuildTableInfo so the eval harness reproduces exactly the prompt-building logic chat uses. Refs micasa-dev#707.

cpcloud · 2026-04-20T12:20:24Z

Closed in favor of five separately reviewable PRs:

feat(data): FTS5 entity search engine #960 — FTS5 entity search engine (foundation)
feat(data): entities_fts triggers with cascading refresh #961 — entities_fts triggers with cascading refresh (stacked on feat(data): FTS5 entity search engine #960)
feat(data): per-type quotas, rank threshold, NL-tolerant entity FTS #962 — per-type quotas, rank threshold, NL-tolerant entity FTS (stacked on feat(data): entities_fts triggers with cascading refresh #961)
feat(cli): add micasa eval fts subcommand #963 — micasa eval fts subcommand (stacked on feat(data): per-type quotas, rank threshold, NL-tolerant entity FTS #962)
docs(agents): forbid reinventing stdlib helpers and nil-passing #964 — AGENTS.md stdlib/nil-passing rules (independent)

The chat pipeline wiring from this PR is held back pending a stronger eval signal. #963 ships the eval infra so that evaluation can happen on main.

cpcloud added enhancement New feature or request data Data layer, models, database llm LLM and chat features labels Apr 14, 2026

cpcloud force-pushed the worktree-agent-a839b6cf branch 3 times, most recently from 212d280 to cdcc204 Compare April 19, 2026 11:47

cpcloud added 4 commits April 20, 2026 07:28

cpcloud force-pushed the worktree-agent-a839b6cf branch from cdcc204 to 9680f38 Compare April 20, 2026 11:34

cpcloud added 3 commits April 20, 2026 07:52

docs(agents): forbid reinventing stdlib helpers and nil-passing

daa025b

cpcloud force-pushed the worktree-agent-a839b6cf branch from 9680f38 to daa025b Compare April 20, 2026 11:52

cpcloud closed this Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(data): FTS-powered context enrichment for LLM chat#933

feat(data): FTS-powered context enrichment for LLM chat#933
cpcloud wants to merge 7 commits intomicasa-dev:mainfrom
cpcloud:worktree-agent-a839b6cf

cpcloud commented Apr 14, 2026 •

edited

Loading

Uh oh!

cpcloud commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cpcloud commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

cpcloud commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cpcloud commented Apr 14, 2026 •

edited

Loading