@aponcedeleonch

@aponcedeleonch aponcedeleonch commented Feb 12, 2026

Closes: #3732

Extend the SQLite-backed ToolStore with optional embedding-based semantic search. When an EmbeddingClient is provided, Search runs FTS5 and cosine similarity in parallel via errgroup, merges and deduplicates results by keeping the lower distance score, and caps output at 4 results.
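The merge step described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Result` type, the `mergeResults` function, and the `maxResults` constant are hypothetical names, and the parallel FTS5 and cosine queries are replaced by two pre-built slices so the example stays within the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a hypothetical search hit; a lower Distance means a better match.
type Result struct {
	ToolName string
	Distance float64
}

// maxResults mirrors the cap of 4 mentioned in the description (assumed name).
const maxResults = 4

// mergeResults deduplicates hits by tool name, keeping the lower distance,
// then sorts ascending by distance and truncates to maxResults.
func mergeResults(lists ...[]Result) []Result {
	best := make(map[string]float64)
	for _, list := range lists {
		for _, r := range list {
			if d, ok := best[r.ToolName]; !ok || r.Distance < d {
				best[r.ToolName] = r.Distance
			}
		}
	}
	merged := make([]Result, 0, len(best))
	for name, d := range best {
		merged = append(merged, Result{ToolName: name, Distance: d})
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Distance != merged[j].Distance {
			return merged[i].Distance < merged[j].Distance
		}
		return merged[i].ToolName < merged[j].ToolName // deterministic tie-break
	})
	if len(merged) > maxResults {
		merged = merged[:maxResults]
	}
	return merged
}

func main() {
	// Stand-ins for the FTS5 and semantic result sets.
	fts := []Result{{"read_file", 0.40}, {"list_dir", 0.90}, {"grep", 1.20}}
	sem := []Result{{"read_file", 0.25}, {"write_file", 0.60}, {"stat", 0.70}}
	for _, r := range mergeResults(fts, sem) {
		fmt.Printf("%s %.2f\n", r.ToolName, r.Distance)
	}
}
```

Keeping the lower distance per tool only makes sense if both scores share a scale, which is presumably why the BM25 score is rescaled to the cosine-distance range.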

Key additions:

  • EmbeddingClient interface in internal/types (Embed, EmbedBatch, Dimension, Close)
  • FakeEmbeddingClient using SHA-256 seeded RNG with L2-normalized vectors
  • Pure Go cosine similarity/distance in internal/similarity package
  • Binary encode/decode helpers for embedding BLOB storage
  • normalizeBM25 rescaled to [0, 2) range to align with cosine distance
  • Comprehensive unit tests, concurrency tests, and benchmarks

Large PR Justification

  • Multiple related changes that would break if separated. This PR introduces semantic search and a way to combine results from full-text search and semantic search.

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Feb 12, 2026

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@aponcedeleonch aponcedeleonch marked this pull request as draft February 12, 2026 17:20
@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from c06effa to 5d93afe Compare February 12, 2026 17:21
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Feb 12, 2026
@codecov
codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 86.30952% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.91%. Comparing base (f1772c6) to head (09908e0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...cp/optimizer/internal/sqlite_store/sqlite_store.go | 84.42% | 9 Missing and 10 partials ⚠️ |
| pkg/vmcp/optimizer/fake_embedding.go | 86.20% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3808      +/-   ##
==========================================
+ Coverage   66.84%   66.91%   +0.07%     
==========================================
  Files         439      441       +2     
  Lines       43509    43660     +151     
==========================================
+ Hits        29083    29216     +133     
- Misses      12175    12182       +7     
- Partials     2251     2262      +11     

@aponcedeleonch aponcedeleonch force-pushed the issue-3731/sqlite-fts5-store branch from 631f1d6 to 85a4c84 Compare February 12, 2026 20:00

@jerm-dro jerm-dro left a comment

The diff is quite large due to the conflicts with the base branch. The structure looks good, but I will take a closer look when the base branch is merged.

Base automatically changed from issue-3731/sqlite-fts5-store to main February 13, 2026 06:33
@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from 5d93afe to 0fa9e53 Compare February 13, 2026 09:01
@aponcedeleonch aponcedeleonch marked this pull request as ready for review February 13, 2026 09:01
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Feb 13, 2026
@github-actions github-actions bot dismissed their stale review February 13, 2026 09:01

Large PR justification has been provided. Thank you!

@github-actions

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from 0fa9e53 to d17bbb0 Compare February 13, 2026 09:21
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Feb 13, 2026
@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from d17bbb0 to 53657a3 Compare February 13, 2026 09:24
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Feb 13, 2026
@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from 53657a3 to 39ad0e2 Compare February 13, 2026 09:27
@github-actions github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Feb 13, 2026
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Feb 13, 2026
aponcedeleonch and others added 2 commits February 13, 2026 17:56
Extend the SQLite-backed ToolStore with optional embedding-based semantic
search. When an EmbeddingClient is provided, Search runs FTS5 and cosine
similarity in parallel via errgroup, merges and deduplicates results by
keeping the lower distance score, and caps output at 4 results.

Key additions:
- EmbeddingClient interface in internal/types (Embed, EmbedBatch, Dimension, Close)
- FakeEmbeddingClient using SHA-256 seeded RNG with L2-normalized vectors
- Pure Go cosine similarity/distance in internal/similarity package
- Binary encode/decode helpers for embedding BLOB storage
- normalizeBM25 rescaled to [0, 2) range to align with cosine distance
- Comprehensive unit tests, concurrency tests, and benchmarks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aponcedeleonch aponcedeleonch force-pushed the issue-3732/semantic-search-store branch from 39ad0e2 to 09908e0 Compare February 13, 2026 15:59
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Feb 13, 2026
Labels

size/XL Extra large PR: 1000+ lines changed

Development

Successfully merging this pull request may close these issues.

Add vector embedding support and hybrid similarity search to the optimizer ToolStore

3 participants