Gate prediction-error flags on recurrence and fit; fix contextual_mismatch by raphasouthall · Pull Request #47 · raphasouthall/neurostack

raphasouthall · 2026-06-03T14:48:51Z

Problem

Triaging all 46 live prediction-error flags on LXC 122 found ~0 genuinely actionable note defects. The subsystem was measuring query difficulty, not note health.

low_overlap fired on any single query whose #1 result had low cosine. 35 of 36 flags came from one ad-hoc query each: bare-identifier lookups ("azvmsqlp02") that the keyword path matched correctly, abstract/meta sweeps ("open decisions verification needed TODO"), and cross-domain least-bad hits ("spline 3D shader" → neurostack.md).
contextual_mismatch flagged correct retrievals: it fired whenever the #1 note was absent from the recall-limited in_context_notes boost set, including exact-title hits with strong cosine (m365-copilot-mcp at sim 0.63, reddit-engagement-daemon at 0.69). Root cause: the caller context label ("nyk-azure") isn't even a substring of the folder (nyk-europe-azure), so the set leans on a brittle tag/folder-cosine heuristic that excludes correctly-domiciled notes. Precision ≈ 0%.

These false flags weren't inert — the prediction-error demotion stage down-weighted the (correct) flagged notes in later retrieval.
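The substring observation in the root cause above is easy to check in isolation. The snippet below is only an illustration using the two strings quoted in this description; the token-wise comparison is shown for contrast and is not something this PR implements.

```python
context_label = "nyk-azure"
folder = "nyk-europe-azure"

# Literal substring match fails: "europe-" sits between "nyk-" and "azure".
print(context_label in folder)  # False

# For contrast only (not part of this PR): a token-wise comparison would match.
label_tokens = set(context_label.split("-"))
folder_tokens = set(folder.split("-"))
print(label_tokens <= folder_tokens)  # True
```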

Changes

contextual_mismatch now also requires the top note to be a weak fit (sim < CONTEXTUAL_MISMATCH_MAX_SIM = 0.45). A strong hit outside the boost set is not a mismatch.
Surfacing (vault_prediction_errors MCP + CLI) and the retrieval demotion now require ≥ PREDICTION_ERROR_MIN_OCCURRENCES (2) distinct events. Single flags still accumulate toward the threshold but neither surface nor demote.
New tests/test_prediction_errors.py exercises the detection branch end-to-end with a real in-memory sqlite DB (not MagicMock): low_overlap fires below threshold, no flag above, FTS-only hits skip detection, only deduped[0] is checked, contextual_mismatch fires in-band and is suppressed for strong hits, and the occurrence gate surfaces only recurrent notes.
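A minimal sketch of the two gates described above, not the code in this PR. The two constants and their values come from this description; the helper names, the `prediction_errors` table layout, the reading of "distinct events" as distinct query strings, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

CONTEXTUAL_MISMATCH_MAX_SIM = 0.45    # value stated in this PR
PREDICTION_ERROR_MIN_OCCURRENCES = 2  # value stated in this PR


def is_contextual_mismatch(top_note, sim, in_context_notes):
    """Hypothetical helper: flag only a weak top hit that is outside the boost set."""
    return top_note not in in_context_notes and sim < CONTEXTUAL_MISMATCH_MAX_SIM


def surfaced_notes(conn):
    """Hypothetical query: notes flagged by enough distinct queries to surface.

    Assumes "distinct events" means distinct query strings per note.
    """
    rows = conn.execute(
        """
        SELECT note_path
        FROM prediction_errors
        GROUP BY note_path
        HAVING COUNT(DISTINCT query) >= ?
        ORDER BY note_path
        """,
        (PREDICTION_ERROR_MIN_OCCURRENCES,),
    ).fetchall()
    return [row[0] for row in rows]


# Assumed schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prediction_errors (note_path TEXT, query TEXT, error_type TEXT)"
)
conn.executemany(
    "INSERT INTO prediction_errors VALUES (?, ?, ?)",
    [
        ("third-parties.md", "placeholder query a", "low_overlap"),
        ("third-parties.md", "placeholder query b", "low_overlap"),
        ("neurostack.md", "spline 3D shader", "low_overlap"),
    ],
)

# A strong hit outside the boost set is not a mismatch; a weak one is.
print(is_contextual_mismatch("m365-copilot-mcp", 0.63, set()))  # False
print(is_contextual_mismatch("some-note", 0.30, set()))         # True

# Only the note flagged by two distinct queries surfaces.
print(surfaced_notes(conn))  # ['third-parties.md']
```

In this sketch a single flag stays in the table and counts toward the threshold, which matches the stated behaviour that lone flags accumulate but neither surface nor demote.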

Impact

On the live DB this collapses 46 surfaced flags → 1 (third-parties.md, which genuinely surprised two distinct CSP/AOBO queries). Full suite: 575 passed, ruff clean.

…match

Triaging the 46 live flags on LXC 122 showed ~0 actionable note defects. The subsystem was measuring query difficulty, not note health:

- low_overlap fired on any single query whose top hit had low cosine: bare-identifier lookups ("azvmsqlp02") that keyword-matched correctly, abstract/meta sweeps, and cross-domain least-bad hits. 35 of 36 flags came from one ad-hoc query each.
- contextual_mismatch flagged *correct* retrievals: it fired whenever the #1 note was absent from the recall-limited in_context_notes boost set, including exact-title hits with strong cosine (m365-copilot-mcp at sim 0.63, reddit-engagement-daemon at 0.69). Precision ~0%.

These false flags also demoted the correct notes in later searches via the prediction-error demotion stage.

Changes:

- contextual_mismatch now requires the top note to also be a weak fit (sim < CONTEXTUAL_MISMATCH_MAX_SIM = 0.45). A strong hit outside the boost set is not a mismatch.
- Surfacing (vault_prediction_errors, CLI) and the retrieval demotion now require >= PREDICTION_ERROR_MIN_OCCURRENCES (2) distinct events. Single flags still accumulate toward the threshold but neither surface nor demote.
- New tests/test_prediction_errors.py exercises the detection branch end-to-end (real in-memory sqlite): low_overlap fires below threshold, no flag above, FTS-only hits skip detection, only deduped[0] is checked, contextual_mismatch fires in-band and is suppressed for strong hits, and the occurrence gate surfaces only recurrent notes.

On the live DB this collapses 46 surfaced flags to 1 (third-parties.md, which genuinely surprised two distinct CSP/AOBO queries).

raphasouthall merged commit 942917d into main Jun 3, 2026
5 checks passed

raphasouthall deleted the fix/prediction-error-precision branch June 3, 2026 14:55

raphasouthall merged 1 commit into main from fix/prediction-error-precision
