fix: Overhaul search relevance — prioritize exact matches, eliminate false positives#266
Closed
shellcorpnet wants to merge 1 commit into
- Require ALL query tokens to match (was: only ONE), preventing false positives
- Add scoreTokenMatch for granular lexical scoring (exact > prefix > partial)
- Double+ lexical boost weights for slug/name matches
- Add summary text matching to scoring pipeline
- Update tests for new matching behavior

Fixes openclaw#15
@shellcorpnet is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
Closing this one to keep fix scoped.

Needed behavior:
- short exact-name queries only (1-2 tokens): require all query tokens
- longer queries: keep semantic/vector recall; no strict lexical gate
- skip summary-weight + broader scoring rebalance in this PR

Please re-open with a narrow patch for #15.
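The gating rule requested in the comment above could be sketched roughly as follows. This is only an illustration: `tokenize`, `passesLexicalGate`, and `SHORT_QUERY_MAX_TOKENS` are hypothetical names that do not come from the repository, and prefix matching is assumed from the PR description.

```typescript
// Hypothetical sketch; none of these names come from the repository.
const SHORT_QUERY_MAX_TOKENS = 2;

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Short queries (1-2 tokens) must match every query token lexically
// (prefix match assumed). Longer queries skip the gate entirely so that
// semantic/vector recall is preserved.
function passesLexicalGate(query: string, candidateText: string): boolean {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0 || queryTokens.length > SHORT_QUERY_MAX_TOKENS) {
    return true;
  }
  const candidateTokens = tokenize(candidateText);
  return queryTokens.every((q) => candidateTokens.some((c) => c.startsWith(q)));
}
```

Under these assumptions, a two-token query such as "Remind Me" is gated strictly, while a five-token natural-language query passes through to vector ranking untouched.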
Problem
Searching for a skill by its exact name returns it far down the results list. For example, searching "Remind Me" shows the actual "Remind Me" skill at position #71 — the #1 result doesn't even mention "Remind Me" anywhere in its name or description.
Reported in #15.
Root Cause
Three compounding issues in the search pipeline:
1. matchesExactTokens only required ONE query token to match, so "Remind Me" matched any skill containing the word "me" (or any word starting with "me"). Since "me" is extremely common, nearly every skill passed the "exact match" filter, defeating its purpose entirely.
2. Lexical boosts were too weak relative to vector similarity scores: even when a skill's name exactly matched the query, the boost was only +1.1 to +1.4. This was easily overwhelmed by high vector cosine similarity scores from semantically-adjacent but differently-named skills.
3. Summary/description text wasn't factored into lexical scoring: only displayName and slug were checked for lexical boosts, missing cases where the query terms appear prominently in the skill's summary.

Changes
convex/lib/searchText.ts: Core matching logic

matchesExactTokens: now requires ALL tokens (was: any one)

Before: "Remind Me" → matches anything with "remind" OR "me"
After: "Remind Me" → only matches skills containing BOTH "remind" AND "me"

This single change eliminates the vast majority of false positives.
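The before/after difference can be illustrated with a small sketch. `matchesAnyToken` and `matchesAllTokens` are simplified, hypothetical stand-ins for the old and new behavior of matchesExactTokens, not the actual code in convex/lib/searchText.ts; prefix matching is assumed from the root-cause description.

```typescript
// Illustrative stand-ins only; the real matchesExactTokens may differ.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// A query token matches when some candidate token starts with it.
function tokenMatches(queryToken: string, candidateTokens: string[]): boolean {
  return candidateTokens.some((c) => c.startsWith(queryToken));
}

// Before: a single matching query token was enough.
function matchesAnyToken(query: string, text: string): boolean {
  const candidateTokens = tokenize(text);
  return tokenize(query).some((q) => tokenMatches(q, candidateTokens));
}

// After: every query token must match.
function matchesAllTokens(query: string, text: string): boolean {
  const queryTokens = tokenize(query);
  const candidateTokens = tokenize(text);
  return (
    queryTokens.length > 0 &&
    queryTokens.every((q) => tokenMatches(q, candidateTokens))
  );
}
```

With these definitions, a skill named "Message Scheduler" passes the any-token check for "Remind Me" (because "message" starts with "me") but fails the all-token check, which is the class of false positive described above.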
New: scoreTokenMatch (granular lexical scoring)

A new scoring function that returns a numeric relevance score instead of a boolean, ranking exact matches above prefix matches above partial matches.
This enables ranked results within the set of matching skills.
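A graded scorer of this kind might look like the sketch below. The tier values (1.0, 0.6, 0.3), the averaging over query tokens, and the name `scoreTokenMatchSketch` are all assumptions made for illustration; the actual scoreTokenMatch signature, values, and threshold behavior are not shown on this page.

```typescript
// Hypothetical tier values; the real ones are not given in the PR text.
const EXACT_SCORE = 1.0;
const PREFIX_SCORE = 0.6;
const PARTIAL_SCORE = 0.3;

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Best tier reached by one query token against all candidate tokens.
function scoreOneToken(queryToken: string, candidateTokens: string[]): number {
  let best = 0;
  for (const c of candidateTokens) {
    if (c === queryToken) return EXACT_SCORE;
    if (c.startsWith(queryToken)) best = Math.max(best, PREFIX_SCORE);
    else if (c.includes(queryToken)) best = Math.max(best, PARTIAL_SCORE);
  }
  return best;
}

// Average of the per-token scores (averaging is an assumption).
function scoreTokenMatchSketch(query: string, text: string): number {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const candidateTokens = tokenize(text);
  const total = queryTokens.reduce(
    (sum, q) => sum + scoreOneToken(q, candidateTokens),
    0,
  );
  return total / queryTokens.length;
}
```

Returning a number rather than a boolean lets the caller sort candidates that all pass the match filter, so an exact name match can outrank a prefix or substring match.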
convex/search.ts: Search scoring weights

Lexical boost weights increased ~2x
These higher weights ensure that a skill literally named "Remind Me" will rank above a skill that's merely semantically similar (e.g., a "Notifications" skill with high vector similarity but no lexical match).
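The interaction between vector similarity and lexical boosts can be sketched as below. The `Candidate` shape, the boost constants, and the additive formula are assumptions chosen for illustration; they are not the weights or the scoring logic in convex/search.ts.

```typescript
// Hypothetical candidate shape and weights, for illustration only.
interface Candidate {
  displayName: string;
  slug: string;
  vectorScore: number; // e.g. cosine similarity from the vector index
}

const NAME_EXACT_BOOST = 2.0;
const SLUG_EXACT_BOOST = 2.0;

// Lowercase and collapse punctuation so "remind-me" equals "Remind Me".
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Vector score plus additive boosts for exact name/slug matches.
function finalScore(query: string, c: Candidate): number {
  const q = normalize(query);
  let score = c.vectorScore;
  if (normalize(c.displayName) === q) score += NAME_EXACT_BOOST;
  if (normalize(c.slug) === q) score += SLUG_EXACT_BOOST;
  return score;
}

// Sort a copy of the candidates by descending final score.
function rank(query: string, candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) => finalScore(query, b) - finalScore(query, a),
  );
}
```

In this sketch a lexically exact candidate with a modest vector score still sorts ahead of a semantically similar candidate with no lexical match, which is the ordering the higher weights are meant to guarantee.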
New: Summary text matching
Added SUMMARY_MATCH_WEIGHT = 0.3: the skill's summary is now scored using scoreTokenMatch and contributes to the final ranking. This helps surface skills where the query terms appear in the description even if the name/slug don't match exactly.

convex/lib/searchText.test.ts: Updated tests

- matchesExactTokens tests to verify ALL-token matching behavior
- scoreTokenMatch tests verifying exact > prefix scoring and threshold behavior

convex/search.test.ts: All existing tests pass unchanged

The scoreSkillResult function's new summary parameter is optional, so all 10 existing search tests continue to pass without modification.

Impact
Before (searching "Remind Me")
After
Test Results
All 393 tests pass, including 16 search-specific tests.
Fixes #15
Greptile Overview
Greptile Summary
This PR tightens lexical matching and rebalances scoring so exact-name queries rank as expected.
- Updates matchesExactTokens to require all query tokens to prefix-match across a skill's displayName/slug/summary, reducing false positives for common tokens.
- Adds scoreTokenMatch for graded lexical scoring and uses it in searchSkills as an additional (lightweight) summary-based boost.
- Updates convex/lib/searchText.test.ts to reflect the stricter token matching and validate scoreTokenMatch behavior.

No functional regressions or runtime errors were found in the changed code paths; existing search tests remain compatible with the optional summary parameter in scoreSkillResult.

Confidence Score: 5/5
Last reviewed commit: 025f665