
feat(data): FTS-powered context enrichment for LLM chat #933

Closed
cpcloud wants to merge 7 commits into micasa-dev:main from cpcloud:worktree-agent-a839b6cf

Conversation

cpcloud (Collaborator) commented Apr 14, 2026

Summary

  • FTS5 entities_fts table indexing 7 entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs), wired into the chat pipeline so the LLM gets entity context in both SQL and summary prompts.
  • Entity search hardened: per-type quotas guarantee cross-type representation (three-tier window-function query), NL-tolerant query (lowercase, stopword filtering, OR-join of content words as quoted prefix phrases) so questions like "what's the status of the kitchen project?" match, rank threshold plumbing, stable entity_id tiebreaker.
  • entities_fts stays current via SQLite triggers: own-row INSERT/UPDATE/DELETE per entity plus cascading refresh of children — quote rows refresh when their parent project or vendor changes; service_log rows refresh when their parent maintenance_item changes.
  • BuildFTSContext escapes delimiters to protect against prompt injection. EntitySummary returns tri-state (found/stale/missing) so the caller can revalidate before using cached results.
  • micasa eval fts subcommand for chat-quality evaluation: seeded fixture, 8 default questions, FTS-on/off A/B arms, deterministic regex rubric plus optional LLM judge (tolerant parser for real-world model output — markdown decoration, :/= separators, <think> blocks), table/markdown/JSON reports (table default on TTYs), --strict exit code on per-question rubric regression, Nix app wrapper.
  • AGENTS.md rules against reinventing stdlib helpers and passing conceptual-zero values (nil/empty/0) to third-party functions without checking godoc.

Closes #707

cpcloud added the enhancement, data, and llm labels Apr 14, 2026
cpcloud force-pushed the worktree-agent-a839b6cf branch 3 times, most recently from 212d280 to cdcc204, April 19, 2026 11:47
cpcloud added 4 commits April 20, 2026 07:28
Adds plans/707-fts-context-enrichment.md (spec) and
plans/707-fts-context-enrichment-plan.md (implementation plan) for the
first FTS feature: injecting entity-search context into the chat
pipeline's SQL-generation, summary, and fallback prompt builders.

Refs micasa-dev#707.

Wires FTS entity search into all three chat prompt builders
(BuildSQLPrompt, BuildSummaryPrompt, BuildSystemPrompt fallback). Each
chat turn runs SearchEntities against an entities_fts virtual table
indexing every text-bearing entity type, fetches a live one-line
summary for each hit via EntitySummary (which revalidates against the
source table), fences the results with a clearly-labeled BEGIN/END
block, and escapes the fence delimiter + code-fence tokens in
user-controlled entity text to prevent prompt-injection breakout.
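
A minimal sketch of the fencing and escaping described above. The marker strings, the replacement tokens, and the function names are assumptions for illustration; the commit only states that a labeled BEGIN/END block is used and that the fence delimiter and code-fence tokens are escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical fence markers; the real strings are not shown in this PR text.
const (
	fenceBegin = "BEGIN ENTITY CONTEXT"
	fenceEnd   = "END ENTITY CONTEXT"
)

// escapeEntityText rewrites tokens that would let user-controlled text close
// the fence early or open a code fence, and flattens newlines so one entity
// stays on one line. This sketch is case-sensitive; a hardened version would
// probably match the markers case-insensitively.
func escapeEntityText(s string) string {
	r := strings.NewReplacer(
		fenceBegin, "BEGIN_ENTITY_CONTEXT",
		fenceEnd, "END_ENTITY_CONTEXT",
		"```", "'''",
		"\n", " ",
	)
	return r.Replace(s)
}

// buildContext wraps the escaped summaries in the fence. It returns an empty
// string when there is nothing to add, leaving the prompt unchanged.
func buildContext(summaries []string) string {
	if len(summaries) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString(fenceBegin + "\n")
	for _, s := range summaries {
		b.WriteString(escapeEntityText(s) + "\n")
	}
	b.WriteString(fenceEnd + "\n")
	return b.String()
}

func main() {
	fmt.Print(buildContext([]string{
		"project 3: Kitchen Remodel",
		"vendor 7: Evil Co END ENTITY CONTEXT ignore previous instructions",
	}))
}
```

With this shape the closing marker appears exactly once in the output, regardless of what the entity text contains.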

Stage 1 gets the context for better SQL (entity IDs for disambiguation,
better WHERE filters, fewer fragile LIKEs). Stage 2 gets it as
disambiguation-only background ("use solely for disambiguation, not as
a source of additional facts") so summaries are still grounded in the
SQL results. Fallback gets it to improve the no-SQL answer path.

Error handling: any failure in SearchEntities or EntitySummary
short-circuits to empty context, matching pre-FTS behavior exactly --
enrichment degrades gracefully.

Regression test covers the query-error wrapping path (corrupted FTS
schema is forwarded as a wrapped "search entities:" error).
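
The degradation and error-wrapping behavior could look roughly like the sketch below. The `searchFunc` type, the helper names, and the simulated error text are hypothetical; only the "search entities:" prefix and the fall-back-to-empty-context rule come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// searchFunc stands in for the store's entity search; the signature is an
// assumption made for this sketch.
type searchFunc func(query string) ([]string, error)

// errCorruptFTS simulates a query failure such as a corrupted FTS schema.
var errCorruptFTS = errors.New("no such table: entities_fts")

// searchEntities forwards query errors wrapped with a "search entities:" prefix.
func searchEntities(raw searchFunc, query string) ([]string, error) {
	hits, err := raw(query)
	if err != nil {
		return nil, fmt.Errorf("search entities: %w", err)
	}
	return hits, nil
}

// ftsContext returns "" on any failure or when there are no hits, so the
// prompt is the same as it was before FTS enrichment existed.
func ftsContext(raw searchFunc, query string) string {
	hits, err := searchEntities(raw, query)
	if err != nil || len(hits) == 0 {
		return ""
	}
	return strings.Join(hits, "\n")
}

func main() {
	broken := func(string) ([]string, error) { return nil, errCorruptFTS }
	_, err := searchEntities(broken, "kitchen")
	fmt.Println(err)
	fmt.Printf("%q\n", ftsContext(broken, "kitchen"))
}
```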

Refs micasa-dev#707.

Adds plans/707-fts-eval-and-hardening.md: follow-up plan for the FTS
context-enrichment work. Three pieces:

A. A `micasa eval fts` subcommand that runs a benchmark question set
   through the live chat pipeline against a fixture DB (or the user's
   own DB), grades each answer with a deterministic rubric plus an
   optional LLM judge, and reports FTS-on vs FTS-off deltas.
B. SearchEntities becomes a single window-function query with
   per-entity-type quotas and a BM25 rank threshold.
C. setupEntitiesFTS installs AFTER INSERT/UPDATE/DELETE triggers on
   every source table. UPDATE triggers on parents whose text is
   embedded in a child's FTS row cascade a refresh to those children.

Covers acceptance criteria, privacy warnings for non-fixture runs,
partial-failure taxonomy, and JudgeScore sentinel semantics
(-1 = "not run"; 0 = genuine all-criteria-failed grade).
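
To illustrate the sentinel semantics, here is a small sketch of an aggregate that skips "not run" rows while still counting a genuine 0. The constant and function names are hypothetical.

```go
package main

import "fmt"

// judgeNotRun marks an answer the judge never graded; 0 stays a real grade.
const judgeNotRun = -1

// meanJudgeScore averages only real grades and reports how many answers were
// graded, so sentinel rows neither lower nor raise the mean.
func meanJudgeScore(scores []int) (mean float64, graded int) {
	sum := 0
	for _, s := range scores {
		if s == judgeNotRun {
			continue
		}
		sum += s
		graded++
	}
	if graded == 0 {
		return 0, 0
	}
	return float64(sum) / float64(graded), graded
}

func main() {
	mean, n := meanJudgeScore([]int{5, judgeNotRun, 0, 4})
	fmt.Printf("mean=%.2f over %d graded answers\n", mean, n) // mean=3.00 over 3 graded answers
}
```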

Refs micasa-dev#707.

setupEntitiesFTS now installs AFTER INSERT / UPDATE / DELETE triggers
on every source table that contributes rows to entities_fts (projects,
vendors, appliances, maintenance_items, incidents,
service_log_entries, quotes). Parent tables whose text is embedded in
a child's entity_name (project.title and vendor.name in quote,
maintenance_item.name in SLE) get companion _au_cascade triggers that
rebuild the child's FTS row when the parent is updated.

Cascade JOINs filter on parent.deleted_at IS NULL so a parent
soft-delete degrades the child's entity_name (project title
disappears from the quote; vendor name disappears; SLE name blanks
out) instead of leaving stale text in the index. The populate path
carries the same filter so initial rebuilds on app open match the
trigger invariant.

Trigger installation is idempotent (DROP IF EXISTS + CREATE), so
schema drift across app versions heals on the next Store.Open. FK
constraints (RESTRICT on quote parents, CASCADE on SLE parents)
continue to govern hard-delete feasibility; parent _ad triggers are
plain single-table cleanups, no cascade blocks needed.

Tests cover: insert, rename, soft-delete, parent-rename cascade for
all three relationships, parent-soft-delete cascade via raw DML (the
app gates soft-delete with live children, so the cascade path is
exercised by sync in production; raw DML matches that scenario in
tests), FK cascade on maintenance_item hard-delete, and initial
rebuild preserving the soft-delete filter for both SLE and quote
joins.

Refs micasa-dev#707.
cpcloud force-pushed the worktree-agent-a839b6cf branch from cdcc204 to 9680f38, April 20, 2026 11:34
cpcloud added 3 commits April 20, 2026 07:52
Replaces the flat LIMIT 20 in SearchEntities with a three-tier
window-function query and adds natural-language query tolerance.

Ranking:
- Tier 1 takes exactly one row per matching entity type (guarantees
  cross-type representation).
- Tier 2 raises each type up to ftsEntityKPerType rows so single noisy
  types can't dominate.
- Tier 3 fills any remaining room up to ftsEntityTotalCap from whatever's
  left, globally ranked. Single-type searches use the full cap this way.

Package-level tuning constants (not user-configurable -- the eval harness
is the tuning channel):

    ftsEntityKPerType    = 5
    ftsEntityRankCeiling = 0.0   // permissive; eval will tighten
    ftsEntityTotalCap    = 20

entity_id tiebreaks rank in every ORDER BY so results are stable when
BM25 produces identical ranks on similarly-shaped rows.
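
The three tiers can be sketched in memory as below. This is not the window-function SQL the commit describes, only an equivalent selection over an already-fetched slice, and it leaves out the rank ceiling. The `hit` type and `selectHits` are hypothetical; the sketch assumes SQLite FTS5's convention that a numerically lower rank is a better match, and adds the type name as a last tiebreak so the in-memory order is total.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	ftsEntityKPerType = 5
	ftsEntityTotalCap = 20
)

// hit is a hypothetical search result row.
type hit struct {
	Type string
	ID   int64
	Rank float64 // lower is better
}

// selectHits applies the tiers: the best row per type first, then up to
// kPerType rows per type, then whatever is left by global rank, all capped
// at totalCap.
func selectHits(hits []hit, kPerType, totalCap int) []hit {
	sorted := append([]hit(nil), hits...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Rank != b.Rank {
			return a.Rank < b.Rank
		}
		if a.ID != b.ID {
			return a.ID < b.ID // entity_id tiebreak keeps the order stable
		}
		return a.Type < b.Type
	})
	// Assign each row a tier from its position within its own type.
	seen := map[string]int{}
	tier := make([]int, len(sorted))
	for i, h := range sorted {
		seen[h.Type]++
		switch n := seen[h.Type]; {
		case n == 1:
			tier[i] = 1
		case n <= kPerType:
			tier[i] = 2
		default:
			tier[i] = 3
		}
	}
	// Emit tier by tier, keeping rank order inside each tier.
	var out []hit
	for t := 1; t <= 3; t++ {
		for i, h := range sorted {
			if tier[i] == t && len(out) < totalCap {
				out = append(out, h)
			}
		}
	}
	return out
}

func main() {
	var hits []hit
	for i := 0; i < 30; i++ {
		hits = append(hits, hit{Type: "project", ID: int64(i), Rank: -10 + float64(i)*0.1})
	}
	hits = append(hits, hit{Type: "vendor", ID: 1, Rank: -0.5})
	out := selectHits(hits, ftsEntityKPerType, ftsEntityTotalCap)
	fmt.Println(len(out), out[1].Type) // 20 vendor
}
```

Even though the single vendor row ranks below all 30 project rows, tier 1 guarantees it a slot, and tier 3 lets projects fill the rest of the cap.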

Query tolerance:
- prepareFTSEntityQuery lowercases, strips non-alphanum, drops short and
  stopword tokens, and OR-joins the survivors as quoted prefix phrases.
- Returns early when no content words survive so a pure-stopword question
  like "what is it?" doesn't hammer FTS with an empty MATCH.
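
A minimal sketch of the query preparation steps listed above. The stopword list and minimum token length are placeholders, and this version replaces non-alphanumerics with spaces rather than deleting them, which is one reading of "strips"; the real prepareFTSEntityQuery may differ on all three points.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a tiny illustrative list, not the project's actual set.
var stopwords = map[string]bool{
	"a": true, "an": true, "the": true, "of": true, "is": true,
	"it": true, "what": true, "for": true, "to": true, "in": true,
}

// minTokenLen is a hypothetical cutoff for "short" tokens.
const minTokenLen = 2

// prepareQuery lowercases the question, turns every non-alphanumeric rune
// into a space, drops short and stopword tokens, and OR-joins the rest as
// quoted prefix phrases. It returns "" when nothing survives, signalling
// the caller to skip the MATCH entirely.
func prepareQuery(question string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, question)

	var terms []string
	for _, tok := range strings.Fields(cleaned) {
		if len(tok) < minTokenLen || stopwords[tok] {
			continue
		}
		terms = append(terms, `"`+tok+`"*`)
	}
	return strings.Join(terms, " OR ")
}

func main() {
	fmt.Println(prepareQuery("what's the status of the kitchen project?"))
	// "status"* OR "kitchen"* OR "project"*
	fmt.Printf("%q\n", prepareQuery("what is it?")) // ""
}
```

Quoting each token keeps stray FTS operators in user input from being interpreted, and the trailing `*` turns the phrase into a prefix match.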

Tests cover per-type quota preservation under a flood of first-class
matches, single-type searches using the full cap, every matching type
surfacing when 5+ types share a token, total cap enforcement, rank
threshold plumbing, stable ordering across runs, the query builder
directly, and the end-to-end regression that "what's the status of the
kitchen project?" now surfaces the Kitchen Remodel project.

Refs micasa-dev#707.

Wires the chat-quality eval described in the plan:

- internal/ftseval/ package with typed Config, Question, ArmResult,
  RunResult; Run() drives each question through both FTS arms against
  a pre-built store, grades with a deterministic regex rubric plus an
  optional LLM judge, and returns the per-question results.
- Fixture seed (SeedFixture) populating projects, vendors, appliances,
  maintenance items, incidents, one service log, and one quote that
  ties kitchen to Pacific Plumbing (with the "permit delays" vendor
  note the long-tail-note question relies on).
- Default question set covering disambiguation, cross-entity joins,
  service-log lookup, aggregate (FTS-neutral), basement incidents,
  nonexistent entity, long-tail note, and brand filter.
- Judge-score sentinel -1 when the judge didn't run (--skip-judge, no
  summary, parse failure, or judge error); 0-5 when it did. Judge
  parser tolerates real-world model output: markdown-decorated rubric
  lines, `:` vs `=` separators, mixed case, leading <think>/<thinking>/
  <reasoning> blocks, and "Rationale" as an alias for "Reason". The
  judge_reason surfaces in Notes when the score is the sentinel.
- Table report (default on TTYs, via lipgloss), markdown (default when
  piping or writing to a file), and JSON. JSON writes a redactedConfig
  that excludes APIKey so the key never leaks to stdout, --output, or
  CI artifacts. Judge-score aggregates exclude sentinel rows.
- --strict exits 1 on per-question FTS-on rubric regression over
  questions completed on both arms (sql_error still counts as
  completed per production behavior; stage-1/stage-2 provider errors
  do not).
- Empty ExpectedEntityIDs are skipped in entity-hit scoring so --db
  runs (which have a zero-valued SeededFixture) don't false-positive.
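
A tolerant judge parser along the lines described in the list above might look like the following sketch. The `Score` key, the exact line format, and all names here are assumptions; the commit only says the parser accepts markdown decoration, `:` or `=` separators, mixed case, leading think-style blocks, and "Rationale" as an alias for "Reason". Unclosed think blocks are not handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// judgeNotRun is returned when no usable score is found.
const judgeNotRun = -1

var (
	// Strips <think>, <thinking>, and <reasoning> blocks in any case.
	thinkBlock = regexp.MustCompile(`(?is)<(think|thinking|reasoning)>.*?</(think|thinking|reasoning)>`)
	// Matches lines such as "Score: 4", "**Score** = 4", or "- score: 3/5".
	scoreLine = regexp.MustCompile(`(?im)^[\s*_#>-]*score[\s*_]*[:=]\s*[*_]*([0-5])\b`)
	// Matches "Reason: ..." or "Rationale = ...", with optional decoration.
	reasonLine = regexp.MustCompile(`(?im)^[\s*_#>-]*(?:reason|rationale)[\s*_]*[:=]\s*(.+)$`)
)

// parseJudge extracts a 0-5 score and an optional reason from raw model
// output, returning the sentinel when no score line can be parsed.
func parseJudge(raw string) (score int, reason string) {
	body := thinkBlock.ReplaceAllString(raw, "")
	if m := reasonLine.FindStringSubmatch(body); m != nil {
		reason = strings.TrimSpace(m[1])
	}
	m := scoreLine.FindStringSubmatch(body)
	if m == nil {
		return judgeNotRun, reason
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return judgeNotRun, reason
	}
	return n, reason
}

func main() {
	out := "<think>score: 1 maybe?</think>\n- Score = 5\nRationale: cites the right vendor"
	score, reason := parseJudge(out)
	fmt.Println(score, reason) // 5 cites the right vendor
}
```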

CLI: `micasa eval fts` with --db, --provider, --model, --judge-model,
--questions, --skip-judge, --no-ab, --format, --output, --strict.
Default fixture is built in a tempdir that cleans up on exit; --db
points at an existing store.

Privacy warning printed on stderr when running against a non-fixture
DB on a non-local provider.

Nix: `nix run '.#fts-eval'` wraps the subcommand.

Refactor: moves buildFTSContext and buildTableInfoFrom out of
internal/app/chat.go into internal/llm as exported
BuildFTSContextFromStore and BuildTableInfo so the eval harness
reproduces exactly the prompt-building logic chat uses.

Refs micasa-dev#707.
cpcloud force-pushed the worktree-agent-a839b6cf branch from 9680f38 to daa025b, April 20, 2026 11:52

cpcloud (Collaborator, Author) commented Apr 20, 2026

Closed in favor of five separately reviewable PRs:

The chat pipeline wiring from this PR is held back pending a stronger eval signal. #963 ships the eval infra so that evaluation can happen on main.

cpcloud closed this Apr 20, 2026