libSQL: add brute-force vector search fallback for missing index#88
Conversation
|
Note Reviews pausedUse the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughState that libSQL workspace search now attempts indexed Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HybridSearch as Hybrid Search
participant VectorIndex as Vector Index Query
participant BruteForce as Brute‑Force Cosine
participant FTS as FTS5 Search
participant RRF as Reciprocal Rank Fusion
Client->>HybridSearch: hybrid_search(query, embedding)
par Vector Search Path
HybridSearch->>VectorIndex: vector_top_k('idx_memory_chunks_embedding', vector(?), k)
alt Index Available
VectorIndex-->>HybridSearch: indexed results
else Index Unavailable
VectorIndex-->>HybridSearch: query fails (index missing)
HybridSearch->>BruteForce: load embeddings, compute cosine distances
BruteForce-->>HybridSearch: brute‑force results
end
and Full‑Text Search Path
HybridSearch->>FTS: fts_search(query)
FTS-->>HybridSearch: FTS results
end
HybridSearch->>RRF: combine vector + FTS results
RRF-->>Client: ranked merged results
Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 7 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📋 Issue PlannerBuilt with CodeRabbit's Coding Plans for faster development and fewer bugs. View plan used: ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Reviewer's GuideImplements a robust libSQL hybrid search fallback that uses brute-force cosine similarity when the vector index is unavailable, and simultaneously hardens the hosted remote-tool contract by refactoring worker/orchestrator transport types, adding comprehensive schema-fidelity and routing tests, and adjusting webhook/server helpers and docs to reflect the new behavior and completed roadmap item 1.1.4. Sequence diagram for libSQL hybrid search with vector index fallback to brute-forcesequenceDiagram
actor Caller
participant LibSqlBackend
participant LibsqlConnection
participant VectorIndex
participant RustBruteForceEngine
participant RRFEngine
Caller->>LibSqlBackend: hybrid_search(HybridSearchParams)
LibSqlBackend->>LibSqlBackend: read SearchConfig
alt config.use_vector and embedding present
LibSqlBackend->>LibsqlConnection: vector_ranked_results(user_id, agent_id, embedding, pre_limit)
alt vector index available
LibsqlConnection->>VectorIndex: vector_top_k(idx_memory_chunks_embedding, embedding, pre_limit)
VectorIndex-->>LibsqlConnection: rows
LibsqlConnection-->>LibSqlBackend: VectorSearchOutcome::Indexed(results)
LibSqlBackend->>LibSqlBackend: vector_results = results
else index unavailable or query fails
LibsqlConnection-->>LibSqlBackend: VectorSearchOutcome::IndexUnavailable
LibSqlBackend->>RustBruteForceEngine: brute_force_vector_search(user_id, agent_id, embedding, pre_limit)
RustBruteForceEngine-->>LibSqlBackend: Vec_RankedResult_
LibSqlBackend->>LibSqlBackend: vector_results = brute_force_results
end
else no vector search
LibSqlBackend->>LibSqlBackend: vector_results = empty
end
LibSqlBackend->>LibsqlConnection: fts_ranked_results(user_id, agent_id, query, pre_limit)
LibsqlConnection-->>LibSqlBackend: fts_results
LibSqlBackend->>RRFEngine: fuse(vector_results, fts_results)
RRFEngine-->>LibSqlBackend: hybrid_results
LibSqlBackend-->>Caller: hybrid_results
Class diagram for libSQL hybrid search fallback and vector search outcomeclassDiagram
class LibSqlBackend {
+hybrid_search(params: HybridSearchParams) Result~Vec_HybridSearchResult_~
+brute_force_vector_search(user_id: &str, agent_id: Option_&str_, embedding: &[f32], limit: usize) Result~Vec_RankedResult_~
}
class VectorSearchOutcome {
<<enum>>
Indexed(results: Vec_RankedResult_)
IndexUnavailable
}
class RankedResult {
+chunk_id: i64
+document_id: i64
+similarity: f32
+rank: u32
}
class Candidate {
+chunk_id: i64
+document_id: i64
+similarity: f32
}
class LibsqlConnection {
+query(sql: &str, params: &[Value]) Result~Rows~
}
class WorkspaceError {
+reason: String
}
class HybridSearchParams {
+user_id: String
+agent_id: Option_String_
+query: String
+embedding: Option_Vec_f32_
+config: SearchConfigRef
}
class SearchConfigRef {
+use_vector: bool
+limit: u32
}
LibSqlBackend --> LibsqlConnection : uses
LibSqlBackend --> RankedResult : returns
LibSqlBackend --> VectorSearchOutcome : handles
Candidate --> RankedResult : converted_to
%% free functions represented as utility classes
class VectorSearchFunctions {
+vector_ranked_results(conn: &LibsqlConnection, user_id: &str, agent_id: Option_&str_, embedding: &[f32], limit: i64) Result~VectorSearchOutcome~
+rank_candidates(candidates: Vec_Candidate_, limit: usize) Vec_RankedResult_
}
VectorSearchFunctions --> LibsqlConnection : queries
VectorSearchFunctions --> VectorSearchOutcome : returns
VectorSearchFunctions --> WorkspaceError : error_type
VectorSearchFunctions --> Candidate : ranks
VectorSearchFunctions --> RankedResult : produces
LibSqlBackend ..> VectorSearchFunctions : calls
Flow diagram for libSQL semantic search path after V9 migrationflowchart TD
A[Start hybrid_search
use_vector true
embedding present] --> B[Call vector_ranked_results]
B --> C{vector_top_k available
and rows stream OK?}
C -->|Yes| D[VectorSearchOutcome::Indexed
with ranked results]
D --> E[Use indexed vector results
as vector_results]
C -->|No
index missing or query error| F[VectorSearchOutcome::IndexUnavailable]
F --> G[Log brute-force fallback]
G --> H[Run brute_force_vector_search
cosine similarity in Rust]
H --> I[Use brute-force results
as vector_results]
E --> J[Fetch FTS results via FTS5]
I --> J
J --> K[Apply RRF fusion
over vector_results and fts_results]
K --> L[Return fused hybrid results]
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
368458a to
7e4ec1b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36a8cf61a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/channels/webhook_server.rs (1)
64-71:⚠️ Potential issue | 🟠 MajorLog the actual bound address, not the configured address.
The code binds to
self.config.addrbut logs that same value rather than queryinglistener.local_addr(). If the configured port is0(ephemeral), the log message will misleadingly show port 0 instead of the actual bound port. Additionally,current_addr()will return the wrong value.The AI summary confirms prior behaviour was to overwrite
self.config.addrwith the resolved address—this removal is a regression for ephemeral port usage.🐛 Proposed fix to capture the actual bound address
async fn bind_and_spawn(&mut self, app: Router) -> Result<(), ChannelError> { let listener = tokio::net::TcpListener::bind(self.config.addr) .await .map_err(|e| ChannelError::StartupFailed { name: "webhook_server".to_string(), reason: format!("Failed to bind to {}: {}", self.config.addr, e), })?; - tracing::info!("Webhook server listening on {}", self.config.addr); + let addr = listener.local_addr().map_err(|e| ChannelError::StartupFailed { + name: "webhook_server".to_string(), + reason: format!("Failed to get local address: {}", e), + })?; + self.config.addr = addr; + tracing::info!("Webhook server listening on {}", addr); let (shutdown_tx, shutdown_rx) = oneshot::channel();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/webhook_server.rs` around lines 64 - 71, The log currently prints self.config.addr (and current_addr()) which is wrong for ephemeral ports; after binding with tokio::net::TcpListener::bind and obtaining listener, call listener.local_addr() (or use listener.local_addr().ok()) to get the actual bound SocketAddr, update self.config.addr (or the struct field used by current_addr()) with that resolved address, and change the tracing::info message to log the resolved address instead of the original configured value so ephemeral ports are reported correctly; locate the logic around listener, self.config.addr, tracing::info, and current_addr() to apply this change.docs/database-integrations.md (1)
370-371:⚠️ Potential issue | 🟡 MinorRemove or update the stale caveat at line 370–371.
This section states "libSQL migration and schema comments still describe a brute-force vector fallback that the current code does not implement." However, this PR implements exactly that fallback. Either remove this bullet point or update it to reflect that the fallback is now implemented.
Proposed fix
3. libSQL migration and schema comments still describe a brute-force vector - fallback that the current code does not implement. + fallback; this is now implemented and documented above.Alternatively, remove the bullet entirely if it no longer serves as a caveat.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/database-integrations.md` around lines 370 - 371, Update the docs to remove or revise the stale caveat that claims "libSQL migration and schema comments still describe a brute-force vector fallback that the current code does not implement": either delete that bullet point entirely from the document or reword it to state that the brute-force vector fallback has been implemented in this PR (mentioning the implemented fallback behavior), ensuring the documentation matches the current implementation and removes the misleading warning.src/db/libsql/workspace.rs (1)
234-304:⚠️ Potential issue | 🟠 MajorDistinguish missing-index errors from other database faults in
vector_ranked_results.Line 271–298: The function returns
VectorSearchOutcome::IndexUnavailablefor any indexed-search failure—query errors, row-fetch errors, and missing-index errors alike. This conflation masks genuine database faults (malformed SQL, connection drops, corrupted data) as "index unavailable", misaligns the fallback log messages, and diverges from PostgreSQL, which propagates vector search errors via the?operator. Inspect libSQL error messages or types to distinguish a missing index from other failures, propagate non-index errors, and returnIndexUnavailableonly when the vector index is genuinely absent.Additionally, line 816–821 (
hybrid_search): brute-force errors are silently swallowed by.unwrap_or_else(), returning empty results with only a warn-level log. Propagate brute-force search failures instead of masking them.The test at line 902 uses
SearchConfig::default().vector_only()which disables FTS entirely, so it never exercises reciprocal rank fusion with both FTS and vector results. Add a test case that enables both FTS and vector search to verify hybrid fusion succeeds when the vector index is unavailable and brute-force fallback is applied.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db/libsql/workspace.rs` around lines 234 - 304, vector_ranked_results currently treats any query or row-fetch error as VectorSearchOutcome::IndexUnavailable; change it to inspect the libSQL error (from conn.query and rows.next) and only return IndexUnavailable when the error clearly indicates the missing vector index/function (e.g., error message contains "vector_top_k", "no such function", or the specific index name 'idx_memory_chunks_embedding'); for all other errors propagate them as a WorkspaceError (i.e., return Err(...)/use ?). Apply the same discrimination to the rows.next() Err branch. In hybrid_search replace the .unwrap_or_else() that swallows brute-force failures with proper error propagation so brute-force search errors bubble up. Finally, add a test (instead of SearchConfig::default().vector_only()) that enables both FTS and vector search to verify reciprocal rank fusion behavior when the vector index is absent and brute-force fallback runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 858-861: The markdown entry incorrectly attributes the three new
tests to remote_tools.rs; update the inventory line to reference
catalogue_fidelity.rs instead (replace the filename token `remote_tools.rs` with
`catalogue_fidelity.rs`) while keeping the three test names
(`remote_tool_catalog_preserves_full_tool_definition_payload`,
`remote_tool_catalog_version_is_deterministic_and_sensitive_to_content`,
`orchestrator_responses_deserialize_into_worker_shared_types`) intact so the
ExecPlan accurately points to the changed test file.
- Around line 907-934: Update the "Validation evidence" paragraph so it no
longer claims "all validation gates have been run and passed successfully" while
also noting the documented Markdown lint failures in docs/roadmap.md; edit the
Validation evidence section (the paragraph and the Markdown linting bullet) to
either mark the markdown gate as "partial/blocked" or to explicitly narrow the
success claim to the specific checks that passed (format check, git whitespace,
and full test suite) and state that Markdown linting reported pre-existing
issues in docs/roadmap.md which prevent the docs gate from being fully green.
In `@src/channels/webhook_server.rs`:
- Around line 255-258: The current assertion on old_result is hard to read due
to nested .ok().and_then(...). Replace it with a matches!-based assertion that
directly matches either an Err variant or an Ok variant whose inner result
yields no value (i.e., Err(_) OR Ok(inner) where inner.ok().is_none()), keeping
the original assertion message; update the assert! call that references
old_result to use matches! for clarity.
- Around line 173-176: The current test fixture performs a TOCTOU by binding a
StdTcpListener to "127.0.0.1:0", dropping it, and then starting the server which
can lead to the port being taken; change the approach so the server is the one
that binds to port 0 and then you read back the actual bound port after start()
(or make bind_and_spawn return the actual bound SocketAddr when it creates the
listener), then use server.current_addr() to get the real port for assertions;
update both the code block that creates port with StdTcpListener and the similar
block at lines 222–225 to stop extracting the port from a dropped listener and
instead obtain the bound address from the running server (or have bind_and_spawn
store/return the listener’s local_addr()).
In `@src/db/libsql/workspace.rs`:
- Around line 822-829: In the VectorSearchOutcome::IndexUnavailable branch,
don't swallow errors from self.brute_force_vector_search — replace the current
unwrap_or_else(...) that returns Vec::new() with propagating the error (e.g.,
call .await, map_err to log the warning with the error, then use ? to return the
error) so brute_force_vector_search failures bubble up instead of silently
returning an empty Vec; this touches the VectorSearchOutcome::IndexUnavailable
match arm and the self.brute_force_vector_search(...) call.
- Around line 931-945: The test currently disables FTS by using
SearchConfig::default().vector_only(), so it never hits the hybrid/RRF path in
LibSqlBackend::hybrid_search; change the config to enable both vector and FTS
(e.g., use SearchConfig::default().with_limit(5) instead of .vector_only()), run
hybrid_search, and update the assertions to verify the fused result exposes both
ranks (check results[0].vector_rank and results[0].fts_rank are
present/expected) so the reciprocal-rank-fusion branch is covered.
In `@src/orchestrator/api/tests/catalogue_fidelity.rs`:
- Around line 65-93: The test
remote_tool_catalog_version_is_deterministic_and_sensitive_to_content currently
changes tool identity when comparing versions; instead ensure the tool name
remains identical while only the payload differs: keep registry_a and registry_b
registering the same tool fixture (ToolFixture::CatalogAlpha) and change
registry_c to register a variant that preserves the same tool name but mutates
its description/parameters (e.g., create or use a fixture like
CatalogAlphaWithDifferentPayload or modify build_tool_fixture to accept an
override for description/parameters) before calling hosted_remote_tool_catalog
for version_c; assert version_a == version_b and version_a != version_c so the
catalog version is sensitive to payload changes but not tool name changes.
In `@src/test_support.rs`:
- Line 12: Replace the US spelling "serialization" with en-GB "serialisation" in
the comment that reads "JSON Schema features to validate that tool definitions
survive serialization" so it becomes "JSON Schema features to validate that tool
definitions survive serialisation" to conform to the en-GB-oxendict spelling
guideline; update that comment text wherever the exact phrase appears in the
file (e.g., above the JSON Schema features description).
In `@src/workspace/README.md`:
- Line 87: The markdown bullet "libSQL: FTS5 plus vector search; uses
`vector_top_k(...)` when a compatible fixed-dimension index exists, otherwise
brute-force cosine similarity in Rust" exceeds the 80-column wrapping rule; edit
the README bullet (the line referencing libSQL, FTS5, and `vector_top_k(...)`)
to insert line breaks so the paragraph is wrapped to 80 columns (split the
sentence into multiple wrapped lines at natural breaks, e.g., after "FTS5 plus
vector search;" and before "otherwise brute-force...") while preserving the same
text and inline code.
---
Outside diff comments:
In `@docs/database-integrations.md`:
- Around line 370-371: Update the docs to remove or revise the stale caveat that
claims "libSQL migration and schema comments still describe a brute-force vector
fallback that the current code does not implement": either delete that bullet
point entirely from the document or reword it to state that the brute-force
vector fallback has been implemented in this PR (mentioning the implemented
fallback behavior), ensuring the documentation matches the current
implementation and removes the misleading warning.
In `@src/channels/webhook_server.rs`:
- Around line 64-71: The log currently prints self.config.addr (and
current_addr()) which is wrong for ephemeral ports; after binding with
tokio::net::TcpListener::bind and obtaining listener, call listener.local_addr()
(or use listener.local_addr().ok()) to get the actual bound SocketAddr, update
self.config.addr (or the struct field used by current_addr()) with that resolved
address, and change the tracing::info message to log the resolved address
instead of the original configured value so ephemeral ports are reported
correctly; locate the logic around listener, self.config.addr, tracing::info,
and current_addr() to apply this change.
In `@src/db/libsql/workspace.rs`:
- Around line 234-304: vector_ranked_results currently treats any query or
row-fetch error as VectorSearchOutcome::IndexUnavailable; change it to inspect
the libSQL error (from conn.query and rows.next) and only return
IndexUnavailable when the error clearly indicates the missing vector
index/function (e.g., error message contains "vector_top_k", "no such function",
or the specific index name 'idx_memory_chunks_embedding'); for all other errors
propagate them as a WorkspaceError (i.e., return Err(...)/use ?). Apply the same
discrimination to the rows.next() Err branch. In hybrid_search replace the
.unwrap_or_else() that swallows brute-force failures with proper error
propagation so brute-force search errors bubble up. Finally, add a test (instead
of SearchConfig::default().vector_only()) that enables both FTS and vector
search to verify reciprocal rank fusion behavior when the vector index is absent
and brute-force fallback runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 63144539-4d98-48f0-a98b-d8ad356f84d1
📒 Files selected for processing (33)
FEATURE_PARITY.mddocs/configuration-guide.mddocs/contents.mddocs/database-integrations.mddocs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.mddocs/rfcs/0001-expose-mcp-tool-definitions.mddocs/roadmap.mdsrc/channels/webhook_server.rssrc/db/CLAUDE.mdsrc/db/libsql/workspace.rssrc/lib.rssrc/orchestrator/api/tests.rssrc/orchestrator/api/tests/catalogue_fidelity.rssrc/orchestrator/api/tests/fixtures/remote_tool_mocks.rssrc/orchestrator/api/tests/remote_tools.rssrc/orchestrator/api/tests/transport_parity.rssrc/test_support.rssrc/tools/builtin/worker_remote_tool_proxy.rssrc/tools/builtin/worker_remote_tool_proxy/tests.rssrc/tools/tool/traits.rssrc/worker/api/tests.rssrc/worker/api/tests/finish_reason.rssrc/worker/api/tests/fixtures.rssrc/worker/api/tests/mod.rssrc/worker/api/tests/remote_tool_catalog.rssrc/worker/api/tests/remote_tool_execute.rssrc/worker/api/tests/transport_types.rssrc/worker/api/tests/url_construction.rssrc/worker/api/types.rssrc/worker/container/tests/hosted_fidelity.rssrc/worker/container/tests/mod.rssrc/worker/container/tests/remote_tools.rssrc/workspace/README.md
💤 Files with no reviewable changes (1)
- src/worker/api/tests.rs
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/channels/webhook_server.rs (2)
98-105:⚠️ Potential issue | 🟠 MajorAdd direct behavioural coverage for the
start()ephemeral-bind path.Exercise the exact change at Line 100 through
start()with127.0.0.1:0, then assertcurrent_addr().port() != 0and/healthresponds on that resolved address. Keep this separate fromstart_with_listenertests so the production bind path is validated.Patch sketch
+ #[rstest] + #[tokio::test] + async fn test_start_resolves_ephemeral_addr() -> Result<(), Box<dyn std::error::Error + Send + Sync>> { + let mut server = WebhookServer::new(WebhookServerConfig { + addr: "127.0.0.1:0".parse()?, + }); + server.add_routes(Router::new().route( + "/health", + axum::routing::get(|| async { Json(json!({"status": "ok"})) }), + )); + + server.start().await?; + let addr = server.current_addr(); + assert_ne!(addr.port(), 0, "Bound port must be resolved from :0"); + + let response = reqwest::Client::new() + .get(format!("http://{}/health", addr)) + .send() + .await?; + assert_eq!(response.status(), 200); + + server.shutdown().await; + Ok(()) + }As per coding guidelines, "All new functionality or changes must be guarded by unit tests" and "All new functionality or changes must be guarded by behavioural tests where applicable".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/webhook_server.rs` around lines 98 - 105, Add a behavioural test that calls WebhookServer::start() with config.addr set to "127.0.0.1:0" (ephemeral bind), then assert that server.current_addr().port() != 0 and that an HTTP GET to "{current_addr()}/health" returns a healthy response; keep this test separate from existing start_with_listener tests and ensure it actually runs the production bind path exercised by the code that resolves listener.local_addr() in start().
277-313:⚠️ Potential issue | 🟡 MinorAlign test naming and assertion text with the exercised API.
Rename the test and message to
restart_with_listenersemantics. Keep diagnostics precise.Patch sketch
- async fn test_restart_with_addr_rebinds_listener( + async fn test_restart_with_listener_rebinds_listener( @@ - "Address should change after restart_with_addr" + "Address should change after restart_with_listener"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/webhook_server.rs` around lines 277 - 313, Rename the test function test_restart_with_addr_rebinds_listener to reflect the exercised API (e.g., test_restart_with_listener) and update the assertion messages to reference restart_with_listener semantics: change the initial address assertion message to mention "before restart_with_listener", update the health-check assertion message to "Server should respond to health check before restart_with_listener", and change the final address-change assertion message from "Address should change after restart_with_addr" to "Address should change after restart_with_listener" so diagnostics match the actual method under test (server.restart_with_listener).
♻️ Duplicate comments (2)
src/db/libsql/workspace.rs (2)
823-829:⚠️ Potential issue | 🟠 MajorPropagate brute-force fallback failures.
Line 826 swallows fallback errors and returns an empty vector, which reintroduces silent semantic degradation and hides operational faults.
🔧 Bubble the fallback error up
VectorSearchOutcome::IndexUnavailable => { tracing::info!("Using brute-force vector search (no vector index)"); self.brute_force_vector_search(user_id, agent_id, emb, pre_limit as usize) .await - .unwrap_or_else(|e| { - tracing::warn!("Brute-force vector search failed: {e}"); - Vec::new() - }) + .map_err(|e| { + tracing::warn!("Brute-force vector search failed: {e}"); + e + })? }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db/libsql/workspace.rs` around lines 823 - 829, The fallback branch currently swallows errors from self.brute_force_vector_search and returns an empty Vec, masking failures; instead propagate the error to the caller by removing the unwrap_or_else that returns Vec::new and use proper error propagation (e.g., await and ? or map_err) so that brute_force_vector_search failures bubble up from the current function (the call site using self.brute_force_vector_search(user_id, agent_id, emb, pre_limit as usize)). Ensure the surrounding function's signature returns a compatible Result so the propagated error is handled by callers.
940-948: 🛠️ Refactor suggestion | 🟠 MajorExercise the hybrid/RRF branch in this fallback test.
Line 940 uses
vector_only(), so the test never validates fused ranking behaviour after fallback. Enable both FTS and vector in this test and assert both rank channels on the fused result.As per coding guidelines: "All new functionality or changes must be guarded by behavioural tests where applicable."🧪 Extend coverage to fusion semantics
embedding: Some(&[1.0, 0.0, 0.0]), - config: &SearchConfig::default().vector_only().with_limit(5), + config: &SearchConfig::default().with_limit(5), }) .await .expect("failed to execute hybrid search"); @@ assert_eq!(results.len(), 1); assert_eq!(results[0].document_path, "notes/search.md"); assert_eq!(results[0].vector_rank, Some(1)); - assert!(results[0].fts_rank.is_none()); + assert_eq!(results[0].fts_rank, Some(1));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db/libsql/workspace.rs` around lines 940 - 948, The test currently forces vector-only by calling SearchConfig::default().vector_only(), so it never exercises the hybrid/RRF fusion path; remove or replace vector_only() so the SearchConfig enables both FTS and vector (e.g., use SearchConfig::default().with_limit(5) or an explicit hybrid config), then update the assertions on results (the results variable) to check that the fused result contains both rank channels (assert results[0].vector_rank.is_some() and assert results[0].fts_rank.is_some(), and optionally check their expected values) so the fused ranking behaviour after fallback is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/channels/webhook_server.rs`:
- Around line 176-179: Update the Rustdoc for the current_addr method on the
webhook server to explicitly state its contract: before the first successful
start/restart it returns the configured address (self.config.addr) and not a
live bound address; after a successful start/restart it will reflect the actual
bound address if different. Edit the doc comment above pub fn
current_addr(&self) -> SocketAddr to remove ambiguity and make this behavior
explicit, referencing self.config.addr and the start/restart lifecycle so
readers know when the value is only the configured address versus a bound
address.
In `@src/db/libsql/workspace.rs`:
- Around line 299-304: The current catch-all converts any vector query error
into VectorSearchOutcome::IndexUnavailable (seen at the tracing::debug(...)
block), which masks real DB failures; change the error handling to inspect and
match the query error's concrete kind/variant (e.g., index-not-present /
capability-missing variants returned by the underlying query call) and only map
those specific cases to VectorSearchOutcome::IndexUnavailable, while propagating
all other errors (return Err(e) or use ? with context) so genuine DB faults
surface instead of being downgraded; update the tracing::debug call to run only
for the index-unavailable branch.
---
Outside diff comments:
In `@src/channels/webhook_server.rs`:
- Around line 98-105: Add a behavioural test that calls WebhookServer::start()
with config.addr set to "127.0.0.1:0" (ephemeral bind), then assert that
server.current_addr().port() != 0 and that an HTTP GET to
"{current_addr()}/health" returns a healthy response; keep this test separate
from existing start_with_listener tests and ensure it actually runs the
production bind path exercised by the code that resolves listener.local_addr()
in start().
- Around line 277-313: Rename the test function
test_restart_with_addr_rebinds_listener to reflect the exercised API (e.g.,
test_restart_with_listener) and update the assertion messages to reference
restart_with_listener semantics: change the initial address assertion message to
mention "before restart_with_listener", update the health-check assertion
message to "Server should respond to health check before restart_with_listener",
and change the final address-change assertion message from "Address should
change after restart_with_addr" to "Address should change after
restart_with_listener" so diagnostics match the actual method under test
(server.restart_with_listener).
---
Duplicate comments:
In `@src/db/libsql/workspace.rs`:
- Around line 823-829: The fallback branch currently swallows errors from
self.brute_force_vector_search and returns an empty Vec, masking failures;
instead propagate the error to the caller by removing the unwrap_or_else that
returns Vec::new and use proper error propagation (e.g., await and ? or map_err)
so that brute_force_vector_search failures bubble up from the current function
(the call site using self.brute_force_vector_search(user_id, agent_id, emb,
pre_limit as usize)). Ensure the surrounding function's signature returns a
compatible Result so the propagated error is handled by callers.
- Around line 940-948: The test currently forces vector-only by calling
SearchConfig::default().vector_only(), so it never exercises the hybrid/RRF
fusion path; remove or replace vector_only() so the SearchConfig enables both
FTS and vector (e.g., use SearchConfig::default().with_limit(5) or an explicit
hybrid config), then update the assertions on results (the results variable) to
check that the fused result contains both rank channels (assert
results[0].vector_rank.is_some() and assert results[0].fts_rank.is_some(), and
optionally check their expected values) so the fused ranking behaviour after
fallback is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 931f61b6-ce53-45d7-8bef-74546c405b79
📒 Files selected for processing (2)
src/channels/webhook_server.rssrc/db/libsql/workspace.rs
535a2a0 to
2083989
Compare
Add colocated module tests for the libSQL workspace helpers and\nexpand the developer guide with the internal patterns that now\nshape this backend.\n\nThis closes the coverage gap called out in review without growing the\ncentral integration test file further, and it documents the shared\ndatabase handle, workspace module layout, and parameter-object pattern\nthat recent refactors introduced.
Add the missing central workspace-store tests for invalid chunk\nUUID handling, negative chunk indices, and document-not-found\nlookups.\n\nThis keeps the review-requested coverage in workspace/tests.rs\nwithout changing production code, and documents that the existing\nvector-index fallback test also covers the IndexUnavailable path.
Reduce large assertion blocks in the central libSQL workspace tests\nby extracting shared assertion helpers for embedding comparisons\nand single-result hybrid-search checks. This keeps the test\nbehaviour unchanged while addressing the CodeScene finding.
Reduce duplication in the central libSQL workspace tests by\nextracting shared backend setup and DocumentNotFound\nassertion helpers. This keeps the existing test coverage and\nbehaviour intact while addressing the repeated test boilerplate.
Restore the libsql backend module shape that this branch expects after\nrebasing onto origin/main, and add the missing terminal persistence\nforwarder required by the updated NativeDatabase trait.\n\nAlso expose the webhook listener test helpers under the test-helpers\nfeature so integration tests can keep using the pre-bound listener path\nafter the rebase.
4ae5d07 to
ee23d0d
Compare
|
@coderabbitai Please investigate the following test failure in the Windows branch of the CI matrix, identify a fix and provide an AI coding agent prompt for the fix: |
This comment was marked as resolved.
This comment was marked as resolved.
Make start_with_listener and restart_with_listener available\nwithout the test-helpers cfg gate so integration-test support code\ncan call them reliably when ironclaw is compiled as a library\ndependency, including on the Windows CI matrix.
|
@coderabbitai Please suggest a fix for the following concern and provide an AI coding agent prompt for the fix: ❌ Failed checks (2 warnings)
|
This comment was marked as resolved.
This comment was marked as resolved.
Expand the developers guide with concrete examples showing how the\nArc<libsql::Database> to Arc<LibSqlDatabase> change propagates\nthrough store constructors and shared_db() consumers.
Expand the central libsql workspace test module with round-trip\nchunk operations and document-store success-path coverage, using\nsetup_backend for all async cases.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. src/db/libsql/workspace/tests.rs Comment on lines +339 to +379 async fn insert_chunk_and_delete_chunks_round_trip() {
let backend = setup_backend().await;
let document = backend
.get_or_create_document_by_path("default", None, "notes/chunks.md")
.await
.expect("failed to create document");
let chunk_id = backend
.insert_chunk(InsertChunkParams {
document_id: document.id,
chunk_index: 0,
content: "round-trip chunk",
embedding: None,
})
.await
.expect("failed to insert chunk");
let before = backend
.get_chunks_without_embeddings("default", None, 10)
.await
.expect("failed to list chunks before delete");
assert!(
before.iter().any(|c| c.id == chunk_id),
"inserted chunk must appear in get_chunks_without_embeddings"
);
backend
.delete_chunks(document.id)
.await
.expect("failed to delete chunks");
let after = backend
.get_chunks_without_embeddings("default", None, 10)
.await
.expect("failed to list chunks after delete");
assert!(
after.iter().all(|c| c.id != chunk_id),
"deleted chunk must not appear after delete_chunks"
);
}❌ New issue: Code Duplication |
This comment was marked as resolved.
This comment was marked as resolved.
Reduce duplicated document setup in the central libsql workspace\ntests by routing the repeated default-scope creation path through a\nprivate create_test_document helper.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. Code Duplicationsrc/db/libsql/workspace/tests.rs: What lead to degradation?The module contains 4 functions with similar structure: insert_chunk_and_delete_chunks_round_trip,list_all_paths_returns_inserted_document_path,list_documents_returns_inserted_document,update_chunk_embedding_is_reflected_in_chunks_list Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Add a unit test for embedding_to_vector_json and expose the helper to sibling workspace tests via pub(super). The requested cargo test target names a non-existent package in this workspace, so I also validated the focused test against the real ironclaw crate.
There was a problem hiding this comment.
Gates Passed
6 Quality Gates Passed
See analysis details in CodeScene
Absence of Expected Change Pattern
- axinite/src/db/mod.rs is usually changed with: axinite/src/db/postgres.rs
Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 3102
No further work is required for either check. |
Summary
Implements a robust fallback path for libSQL semantic search when a fixed-dimension vector index is unavailable after the V9 migration. If the indexed vectorTopK query cannot run, we fall back to a Rust-based brute-force cosine similarity over stored embeddings and feed results through the existing RRF fusion path. This aligns libSQL behavior with the current off-dallback parity while preserving semantic retrieval.
Changes
Core functionality
libSQL backend changes
Tests
Documentation
Test plan
Why this is needed
◳ Generated by DevBoxer ◰
ℹ️ Tag @devboxerhub to ask questions and address PR feedback
📎 Task: https://www.devboxer.com/task/cf1e74de-dca1-45eb-95be-c5ba909697a5
📝 Closes #5
Summary by Sourcery
Add a brute-force cosine similarity fallback for libSQL vector search when the fixed-dimension index is unavailable, strengthen remote tool proxy and worker-orchestrator transport tests for schema and routing fidelity, and update documentation and roadmap to reflect the new behavior and completed test plan.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests: