fix(search): recall non-first-token matches via full-text indexes #2140
@momothemage is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
Codex review: needs real behavior proof before merge.

Summary
Reproducibility: yes. Source-reproducible from current main: the direct indexed path only covers full-string and first-token prefixes, while the fallback is bounded by recent digest windows, so a stable older

Review details
Best possible solution: Land one reviewed full-text digest search implementation that preserves soft-delete and suspicious filters, includes real behavior proof, and supersedes the duplicate parallel branches.
Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible from current main: the direct indexed path only covers full-string and first-token prefixes, while the fallback is bounded by recent digest windows, so a stable older
Is this the best way to solve the issue? Yes, the updated Phase 1 approach is a narrow, maintainable fix direction because it adds full-text digest indexes alongside the existing prefix paths and post-filters candidates through the existing all-token matcher. Merge should wait for real behavior proof and a canonical-branch choice among the duplicate PRs.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 75e04d2c9145.
/clawsweeper automerge
e600dbb to b597454
Merged via squash.
Thanks @momothemage!
Summary
Added two `searchIndex`es to `skillSearchDigest` in `convex/schema.ts`: `search_by_display_name` and `search_by_slug`, both with `filterFields: ["softDeletedAt", "isSuspicious"]`.
Added two `withSearchIndex(...)` queries to the existing `Promise.all` in `directPrefixSkillMatches` (`convex/search.ts`) so non-first-token matches are now recalled. The four legacy prefix-index queries are kept untouched and merged through the existing `skillId`-based dedup.
Updated `convex/search.test.ts` to model Convex's token-level inverted-index semantics, and added 4 regression tests (non-first-token slug, non-first-token displayName, `nonSuspiciousOnly` safety, cross-path dedup).
Previously, search returned nothing when the query matched a token in `slug`/`displayName` that was not the first token. For example, searching `yijian` or `vision` against `baidu-yijian-vision` produced an empty list even though the skill was active and non-suspicious.
The legacy prefix queries in `directPrefixSkillMatches` are anchored to the start of either the full normalized field or its first alphanumeric token, so non-first-token queries miss every path simultaneously. The lexical fallback only scans the most-recently-updated/created N rows, which explains the "publish a new version, becomes findable, then disappears again" time-decay symptom. Vector search has weak recall for transliterated tokens (e.g. Chinese pinyin), so it cannot reliably backfill the gap.
Once the new indexes report `ready` in the Convex dashboard, a follow-up PR will retire the redundant prefix indexes.

Linked Issue
Screenshots
For website/UI changes, attach screenshots or recordings from the real app. Include mobile/narrow views when layout changes.
N/A (backend-only change to Convex query layer; no UI surface modified)

Security / Trust Impact
Notes (kept for reviewer context, not a separate impact):
The new `searchIndex`es declare `softDeletedAt` and `isSuspicious` as `filterFields`, and every new query passes `.eq("softDeletedAt", undefined)` plus (when `nonSuspiciousOnly` is set) `.eq("isSuspicious", false)`, matching the existing prefix-query behavior.
The `isSkillSuspicious(skill)` and `isSkillHighlighted(skill)` guards in `directPrefixSkillMatches` are unchanged, so suspicious or non-highlighted skills cannot leak through the new path.

Data / Deploy Impact
No data/deploy impact
Data/deploy impact explained
Schema change: two new `searchIndex` entries on `skillSearchDigest`. Convex will automatically backfill all existing documents in the background once the schema is deployed; no migration script is required.
Index readiness can be observed in the Convex dashboard. During backfill, queries against the new indexes still execute and progressively return more matches as the backfill advances; the legacy prefix paths run in parallel, so user-visible recall never regresses below the pre-PR baseline.
No writes to existing fields, no row migrations, no data deletions.
Rollback is safe: removing the schema entries and the two `withSearchIndex` calls restores the previous behavior with no data fix-up needed.

Verification
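As a rough illustration of the regression scenario from the Summary (searching `yijian` against `baidu-yijian-vision`), the difference between prefix-anchored and token-level matching can be modeled in plain TypeScript. This is only a sketch: `tokenize`, `prefixAnchoredMatch`, and `tokenLevelMatch` are hypothetical names, not functions from `convex/search.ts`, and the token-level rule is a simplification that does not reproduce Convex's actual term matching or ranking.

```typescript
// Hypothetical helpers; they model matching semantics only and are not the
// code in convex/search.ts.

/** Lowercase the value and split it on runs of non-alphanumeric characters. */
function tokenize(value: string): string[] {
  return value
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

/** Legacy-style rule: match only at the start of the full field or of its first token. */
function prefixAnchoredMatch(field: string, query: string): boolean {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return false;
  const normalized = field.toLowerCase();
  const firstToken = tokenize(field)[0] ?? "";
  return normalized.startsWith(q) || firstToken.startsWith(q);
}

/** Simplified token-level rule: every query token must prefix-match some field token. */
function tokenLevelMatch(field: string, query: string): boolean {
  const fieldTokens = tokenize(field);
  const queryTokens = tokenize(query);
  return (
    queryTokens.length > 0 &&
    queryTokens.every((q) => fieldTokens.some((t) => t.startsWith(q)))
  );
}
```

Under this model, `prefixAnchoredMatch("baidu-yijian-vision", "yijian")` is false while `tokenLevelMatch("baidu-yijian-vision", "yijian")` is true, which is the gap the PR describes.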
`bun run format:check` (verified via `bunx oxfmt --check` on `convex/schema.ts`, `convex/search.ts`, `convex/search.test.ts`): "All matched files use the correct format."
`bun run lint` (verified via `bunx oxlint` on the same three files): 0 warnings, 0 errors across 119 rules.
`bun run test`: `bunx vitest run convex/` gave 96 files / 1176 tests passed; `bunx vitest run convex/search.test.ts` gave 38 / 38 passed (4 newly added).
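The `nonSuspiciousOnly` safety and cross-path dedup cases named in the Summary can be sketched as a small in-memory model. The `DigestRow` shape and the helpers `applySafetyFilters` and `mergeBySkillId` are assumptions made for illustration; the real code applies the filters through Convex index `filterFields` and reuses its existing `skillId`-based dedup, whose exact implementation is not shown in this PR page.

```typescript
// Hypothetical row shape and helper names, for illustration only.
interface DigestRow {
  skillId: string;
  softDeletedAt?: number; // undefined means the row is live
  isSuspicious: boolean;
}

/** Drop soft-deleted rows, and also suspicious rows when nonSuspiciousOnly is set. */
function applySafetyFilters(rows: DigestRow[], nonSuspiciousOnly: boolean): DigestRow[] {
  return rows.filter(
    (row) => row.softDeletedAt === undefined && (!nonSuspiciousOnly || !row.isSuspicious),
  );
}

/** Merge result lists from several query paths, keeping the first row seen per skillId. */
function mergeBySkillId(...paths: DigestRow[][]): DigestRow[] {
  const seen = new Set<string>();
  const merged: DigestRow[] = [];
  for (const path of paths) {
    for (const row of path) {
      if (seen.has(row.skillId)) continue; // already returned by an earlier path
      seen.add(row.skillId);
      merged.push(row);
    }
  }
  return merged;
}
```

Because the merge is a union keyed on `skillId`, adding a new query path in this model can only add rows to the result, never remove rows the legacy paths already returned.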