fix(search): recall non-first-token matches via full-text indexes by momothemage · Pull Request #2140 · openclaw/clawhub

momothemage · 2026-05-11T06:50:58Z

Summary

What changed:
- Added two Convex full-text searchIndexes to skillSearchDigest in convex/schema.ts: search_by_display_name and search_by_slug, both with filterFields: ["softDeletedAt", "isSuspicious"].
- Appended two withSearchIndex(...) queries to the existing Promise.all in directPrefixSkillMatches (convex/search.ts) so non-first-token tokens are now recalled. The four legacy prefix-index queries are kept untouched and merged through the existing skillId-based dedup.
- Extended the test mock in convex/search.test.ts to model Convex's token-level inverted-index semantics, and added 4 regression tests (non-first-token slug, non-first-token displayName, nonSuspiciousOnly safety, cross-path dedup).
Why:
- Skill search systematically returned zero results when the user's query was not a string prefix of the skill's slug / displayName and was not the first token. For example, searching yijian or vision against baidu-yijian-vision produced an empty list even though the skill was active and non-suspicious.
- The four prefix indexes used by directPrefixSkillMatches are anchored to the start of either the full normalized field or its first alphanumeric token, so non-first-token queries miss every path simultaneously. The lexical fallback only scans the most-recently-updated/created N rows, which explains the "publish a new version, becomes findable, then disappears again" time-decay symptom. Vector search has weak recall for transliterated tokens (e.g. Chinese pinyin), so it cannot reliably backfill the gap.
- This is Phase 1 of a two-phase zero-downtime rollout: the legacy prefix paths stay live during the index backfill, so worst case the result set is no smaller than today; once the new search indexes show ready in the Convex dashboard, a follow-up PR will retire the redundant prefix indexes.

Linked Issue

Screenshots

For website/UI changes, attach screenshots or recordings from the real app. Include mobile/narrow views when layout changes.

Screenshots/recordings attached, or N/A — N/A (backend-only change to Convex query layer; no UI surface modified)

Security / Trust Impact

No security/trust impact
Security/trust impact explained

Notes (kept for reviewer context, not a separate impact):

The new searchIndexes declare softDeletedAt and isSuspicious as filterFields, and every new query passes .eq("softDeletedAt", undefined) plus (when nonSuspiciousOnly is set) .eq("isSuspicious", false), matching the existing prefix-query behavior.
The post-hydration isSkillSuspicious(skill) and isSkillHighlighted(skill) guards in directPrefixSkillMatches are unchanged, so suspicious or non-highlighted skills cannot leak through the new path.

Data / Deploy Impact

No data/deploy impact
Data/deploy impact explained
Schema change: two new searchIndex entries on skillSearchDigest. Convex will automatically backfill all existing documents in the background once the schema is deployed; no migration script is required.
Index readiness can be observed in the Convex dashboard. During backfill, queries against the new indexes still execute and progressively return more matches as the backfill advances; the legacy prefix paths run in parallel, so user-visible recall never regresses below the pre-PR baseline.
No writes to existing fields, no row migrations, no data deletions.
Rollback is safe: removing the schema entries and the two withSearchIndex calls restores the previous behavior with no data fix-up needed.

Verification

bun run format:check — verified via bunx oxfmt --check on convex/schema.ts, convex/search.ts, convex/search.test.ts: "All matched files use the correct format."
bun run lint — verified via bunx oxlint on the same three files: 0 warnings, 0 errors across 119 rules.
bun run test — bunx vitest run convex/: 96 files / 1176 tests passed; bunx vitest run convex/search.test.ts: 38 / 38 passed (4 newly added).

vercel · 2026-05-11T06:51:04Z

@momothemage is attempting to deploy a commit to the Amantus Machina Team on Vercel.

A member of the Team first needs to authorize it.

clawsweeper · 2026-05-11T07:46:54Z

Codex review: needs real behavior proof before merge.

Summary
The PR adds skillSearchDigest full-text search indexes for displayName and slug, queries them from directPrefixSkillMatches, and adds regression tests for middle-token recall and filtering.

Reproducibility: yes. source-reproducible from current main: the direct indexed path only covers full-string and first-token prefixes, while the fallback is bounded by recent digest windows, so a stable older baidu-yijian-vision match can miss yijian or vision. I did not run a live Convex deployment search in this read-only review.

Real behavior proof
Needs real behavior proof before merge: Needs real behavior proof before merge: the PR body contains tests and CI-style output, but no redacted live Convex/search output, logs, terminal screenshot, linked artifact, or recording showing the after-fix behavior. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
This active external PR needs contributor real behavior proof and maintainer selection among duplicate fix branches, not an automated cleanup close or repair lane.

Security
Cleared: No concrete supply-chain, secret-handling, or code-execution concern was found; the new search paths keep softDeletedAt and isSuspicious filters plus existing post-hydration guards.

Review details

Best possible solution:

Land one reviewed full-text digest search implementation that preserves soft-delete and suspicious filters, includes real behavior proof, and supersedes the duplicate parallel branches.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible from current main: the direct indexed path only covers full-string and first-token prefixes, while the fallback is bounded by recent digest windows, so a stable older baidu-yijian-vision match can miss yijian or vision. I did not run a live Convex deployment search in this read-only review.

Is this the best way to solve the issue?

Yes, the updated Phase 1 approach is a narrow maintainable fix direction because it adds full-text digest indexes alongside the existing prefix paths and post-filters candidates through the existing all-token matcher. Merge should wait for real behavior proof and a canonical-branch choice among the duplicate PRs.

What I checked:

current-main direct search path: Current main's directPrefixSkillMatches queries only normalized full-string and first-token prefix indexes and dedupes those four result sets; it has no withSearchIndex path for middle-token recall. (convex/search.ts:348, 75e04d2c9145)
current-main schema: Current main's skillSearchDigest table defines ordinary indexes through by_nonsuspicious_installs and does not define search_by_display_name or search_by_slug. (convex/schema.ts:720, 75e04d2c9145)
current-main fallback window: The fallback search path still reads bounded updated/created digest windows before token filtering, so it is not a full-corpus deterministic lookup for older middle-token matches. (convex/search.ts:591, 75e04d2c9145)
latest PR diff: The latest PR head adds two schema searchIndex definitions, appends two withSearchIndex queries to directPrefixSkillMatches, and adds an all-query-token post-filter plus regression tests after the earlier overrecall review. (convex/search.ts:345, e97ea4665b9c)
proof and checks: The PR body reports format, lint, and Vitest runs, and GitHub CI checks are mostly successful, but the body and comments still do not include live Convex/search output, logs, a recording, or another after-fix real behavior artifact. (e97ea4665b9c)
parallel fix branches: Three open PRs currently target the same search-underrecall issue, including fix(search): add full-text search indexes to fix non-first-token skill underrecall #2139 and fix: add full-text skill digest search #2142, so maintainers should choose one canonical branch rather than merge duplicates.

Likely related people:

vincentkoc: Recent commits added normalized skill prefix recall, first-token recall, and bounded lexical fallback behavior in the central search path this PR changes. (role: recent search implementation owner; confidence: high; commits: 881514f444e4, c9fe6db34df8, 21abd07672e0; files: convex/search.ts, convex/schema.ts, convex/lib/skillSearchDigest.ts)
sethconvex: PR history shows sethconvex authored the merged skill search digest feature that introduced the table and search hydration structure this PR extends. (role: digest feature-history owner; confidence: high; commits: 9861da02d5fb, 872045681ba8, 5433c6620076; files: convex/schema.ts, convex/search.ts, convex/lib/skillSearchDigest.ts)
momothemage: Before this PR, momothemage merged adjacent unified-search suspicious-result disclosure work, relevant to the new path's nonSuspiciousOnly behavior. (role: recent adjacent contributor; confidence: medium; commits: df7e0f77a6bb, b116ef7aa175, 1477ca7120ea; files: src/lib/useUnifiedSearch.ts, src/routes/search.tsx, src/__tests__/search-route.test.tsx)

Remaining risk / open question:

No after-fix live Convex/search proof is attached yet; current evidence is tests and CI output only.
There are parallel open PRs for the same linked issue, so duplicate implementations need maintainer selection before merge.
The linked issue carries a security label, so the moderation-filter behavior should receive explicit maintainer attention even though this diff preserves the relevant filters.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 75e04d2c9145.

momothemage · 2026-05-11T09:42:06Z

/clawsweeper automerge

…dexes

…l path

momothemage · 2026-05-11T10:36:45Z

Merged via squash.

Prepared head SHA: b597454bc242dae60bc004915cd0aaf360a3960e
Merge commit: 190ce3769f47335f68264cf2bc734bdaf7cfc756

Thanks @momothemage!

momothemage requested review from a team and Patrick-Erichsen as code owners May 11, 2026 06:51

momothemage added 6 commits May 11, 2026 17:56

fix(search): recall non-first-token matches via full-text indexes

63a6fcd

fix(search): recall skills by non-first-token via Convex full-text in…

d3bd15c

…dexes

fix(search): recall skills by non-first-token via Convex full-text in…

220768a

…dexes

fix(search): require all query tokens to match on direct prefix recal…

0770425

…l path

docs(changelog): add search recall note

b0ac64d

docs(changelog): place search recall note in active release

b597454

momothemage force-pushed the feature/fix_search branch from e600dbb to b597454 Compare May 11, 2026 09:56

momothemage merged commit 190ce37 into openclaw:main May 11, 2026
16 of 17 checks passed

momothemage deleted the feature/fix_search branch May 11, 2026 10:36

This was referenced May 12, 2026

fix(search): add full-text search indexes to fix non-first-token skill underrecall #2139

Closed

Skill search indexing issue and ownership transfer request: realestate-deep-research #1575

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(search): recall non-first-token matches via full-text indexes#2140

fix(search): recall non-first-token matches via full-text indexes#2140
momothemage merged 6 commits into
openclaw:mainfrom
momothemage:feature/fix_search

momothemage commented May 11, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 11, 2026

Uh oh!

clawsweeper Bot commented May 11, 2026 •

edited

Loading

Uh oh!

momothemage commented May 11, 2026

Uh oh!

Uh oh!

momothemage commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

momothemage commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Linked Issue

Screenshots

Security / Trust Impact

Data / Deploy Impact

Verification

Uh oh!

vercel Bot commented May 11, 2026

Uh oh!

clawsweeper Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

momothemage commented May 11, 2026

Uh oh!

Uh oh!

momothemage commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

momothemage commented May 11, 2026 •

edited

Loading

clawsweeper Bot commented May 11, 2026 •

edited

Loading