Add vector embedding support and hybrid search to ToolStore #3808
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation
Alternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3808      +/-   ##
==========================================
+ Coverage   66.84%   66.91%   +0.07%
==========================================
  Files         439      441       +2
  Lines       43509    43660     +151
==========================================
+ Hits        29083    29216     +133
- Misses      12175    12182       +7
- Partials     2251     2262      +11
jerm-dro left a comment:
The diff is quite large due to the conflicts with the base branch. The structure looks good, but I will take a closer look when the base branch is merged.
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.
Extend the SQLite-backed ToolStore with optional embedding-based semantic search. When an EmbeddingClient is provided, Search runs FTS5 and cosine similarity in parallel via errgroup, merges and deduplicates results by keeping the lower distance score, and caps output at 4 results.

Key additions:
- EmbeddingClient interface in internal/types (Embed, EmbedBatch, Dimension, Close)
- FakeEmbeddingClient using SHA-256 seeded RNG with L2-normalized vectors
- Pure Go cosine similarity/distance in internal/similarity package
- Binary encode/decode helpers for embedding BLOB storage
- normalizeBM25 rescaled to [0, 2) range to align with cosine distance
- Comprehensive unit tests, concurrency tests, and benchmarks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
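For readers unfamiliar with the scoring and storage pieces named above, here is a minimal Go sketch of cosine distance and an embedding BLOB round trip. The function names, the little-endian float32 layout, and the choice to return the maximum distance for empty, zero, or mismatched vectors are all assumptions made for illustration; the helpers in this PR may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(a, b), a value in [0, 2].
// Assumption: empty, zero-norm, or mismatched vectors get the maximum distance 2.
func cosineDistance(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 2
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 2
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

// encodeEmbedding packs a vector into bytes, 4 bytes per float32.
// Assumption: little-endian byte order.
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding and rejects truncated blobs.
func decodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("embedding blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	v := []float32{0.6, 0.8}
	blob := encodeEmbedding(v)
	back, err := decodeEmbedding(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(blob), back, cosineDistance(v, back))
}
```

Because cosine distance spans [0, 2], a lexical score rescaled into a similar range (as normalizeBM25 is described as doing) can be compared against it when results are merged.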
Closes: #3732
Extend the SQLite-backed ToolStore with optional embedding-based semantic search. When an EmbeddingClient is provided, Search runs FTS5 and cosine similarity in parallel via errgroup, merges and deduplicates results by keeping the lower distance score, and caps output at 4 results.
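The merge behavior described here can be sketched roughly as follows. This is not the PR's code: the `match` type, `searchFn`, `hybridSearch`, and `mergeResults` names are hypothetical, the sort by ascending distance before capping is an assumption, and `sync.WaitGroup` stands in for errgroup so the example depends only on the standard library.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// match is a hypothetical search hit; a lower Distance means a better match.
type match struct {
	ToolName string
	Distance float64
}

// maxResults mirrors the cap of 4 results stated in the description.
const maxResults = 4

// searchFn is a hypothetical stand-in for the FTS5 and embedding searches.
type searchFn func(query string) ([]match, error)

// hybridSearch runs both searches concurrently and merges their results.
// sync.WaitGroup is used here in place of errgroup.
func hybridSearch(query string, fts, semantic searchFn) ([]match, error) {
	var (
		wg             sync.WaitGroup
		ftsRes, semRes []match
		ftsErr, semErr error
	)
	wg.Add(2)
	go func() { defer wg.Done(); ftsRes, ftsErr = fts(query) }()
	go func() { defer wg.Done(); semRes, semErr = semantic(query) }()
	wg.Wait()
	if ftsErr != nil {
		return nil, ftsErr
	}
	if semErr != nil {
		return nil, semErr
	}
	return mergeResults(ftsRes, semRes), nil
}

// mergeResults deduplicates by tool name, keeping the lower distance,
// then sorts ascending (an assumption) and caps the output at maxResults.
func mergeResults(lists ...[]match) []match {
	best := make(map[string]float64)
	for _, list := range lists {
		for _, m := range list {
			if d, seen := best[m.ToolName]; !seen || m.Distance < d {
				best[m.ToolName] = m.Distance
			}
		}
	}
	out := make([]match, 0, len(best))
	for name, d := range best {
		out = append(out, match{ToolName: name, Distance: d})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Distance != out[j].Distance {
			return out[i].Distance < out[j].Distance
		}
		return out[i].ToolName < out[j].ToolName // deterministic tie-break
	})
	if len(out) > maxResults {
		out = out[:maxResults]
	}
	return out
}

func main() {
	fts := func(string) ([]match, error) {
		return []match{{"fetch", 0.9}, {"grep", 0.2}}, nil
	}
	semantic := func(string) ([]match, error) {
		return []match{{"fetch", 0.1}, {"curl", 0.3}}, nil
	}
	results, err := hybridSearch("download a page", fts, semantic)
	if err != nil {
		panic(err)
	}
	fmt.Println(results)
}
```

Keeping the lower distance on duplicates means a tool found by both searches is ranked by whichever signal matched it best, which only works if both scores live on a comparable scale.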
Key additions:
Large PR Justification