Conversation
Adds an optional listIds() method to the SearchProvider interface, implemented in the sqlite driver and wired through createRetriv. Returns all document IDs stored in the index, enabling consumers to diff incoming docs against what's already indexed and only process the delta.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds a new async listIds(): Promise<string[]> method to the SearchProvider interface, implements it in the SQLite provider (querying documents_meta), and exposes it at the top-level retriv API which delegates to the first driver that provides listIds and post-processes chunk IDs when a chunker is configured. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Retriv as Retriv API
participant Driver as SearchProvider (SQLite)
participant DB as Database (documents_meta)
Client->>Retriv: listIds()
Retriv->>Driver: listIds() on first supporting driver
Driver->>DB: SELECT id FROM documents_meta
DB-->>Driver: rows with id values
Driver-->>Retriv: string[] of IDs
Retriv-->>Client: Promise<string[]>
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/retriv.ts">
<violation number="1" location="src/retriv.ts:229">
P2: When chunking is enabled, `listIds()` returns chunk IDs (e.g. `doc1#chunk-0`) rather than the original document IDs consumers pass to `index()`. Since `createRetriv` is the layer that creates chunk IDs in `prepareDocs`, it should also be responsible for mapping them back to parent IDs here — otherwise the stated use case of diffing incoming docs against the stored set won't work.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/retriv.ts (1)
229-232: Prefer aggregating IDs from all supporting drivers in hybrid mode.Current logic returns IDs from only the first driver with
listIds, which can mask backend drift. Union + dedupe is safer.Suggested refactor
async listIds() { - const driver = drivers.find(d => d.listIds) - return driver?.listIds?.() ?? [] + const providers = drivers.filter(d => d.listIds) + if (!providers.length) + return [] + + const sets = await Promise.all(providers.map(d => d.listIds!())) + return Array.from(new Set(sets.flat())) },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/retriv.ts` around lines 229 - 232, The current listIds method only calls the first driver with listIds, which hides mismatches; update the retriv.ts async listIds() to call listIds on all drivers in the drivers array that implement it (e.g., filter drivers by d.listIds), await all results (Promise.all), flatten and union/dedupe the ID arrays, and return that aggregated list instead of the first result so hybrid mode returns the combined set from all supporting drivers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/db/sqlite.ts`:
- Around line 336-339: listIds() currently returns chunk IDs from documents_meta
(e.g., "<doc>#chunk-..."); update listIds in src/db/sqlite.ts to return
canonical/original document IDs by extracting the prefix before any "#chunk"
suffix and deduplicating the results: run the same SELECT id FROM
documents_meta, map each row to r.id.split('#chunk')[0] (or strip the "#chunk"
part if present), collect unique values (Set) and return the array of canonical
IDs so delta-sync uses original document IDs.
---
Nitpick comments:
In `@src/retriv.ts`:
- Around line 229-232: The current listIds method only calls the first driver
with listIds, which hides mismatches; update the retriv.ts async listIds() to
call listIds on all drivers in the drivers array that implement it (e.g., filter
drivers by d.listIds), await all results (Promise.all), flatten and union/dedupe
the ID arrays, and return that aggregated list instead of the first result so
hybrid mode returns the combined set from all supporting drivers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d3961bdd-9aab-4f1f-9b8d-39a67417dcb4
📒 Files selected for processing (3)
src/db/sqlite.tssrc/retriv.tssrc/types.ts
| async listIds() { | ||
| const rows = db.prepare('SELECT id FROM documents_meta').all() as Array<{ id: string }> | ||
| return rows.map(r => r.id) | ||
| }, |
There was a problem hiding this comment.
listIds() leaks internal chunk IDs instead of canonical document IDs.
At Line 337, SELECT id FROM documents_meta returns chunk IDs (<doc>#chunk-n) when chunking is enabled, which breaks delta-sync based on original document IDs.
Proposed fix
async listIds() {
- const rows = db.prepare('SELECT id FROM documents_meta').all() as Array<{ id: string }>
- return rows.map(r => r.id)
+ const rows = db.prepare(`
+ SELECT DISTINCT
+ COALESCE(json_extract(metadata, '$._parentId'), id) AS id
+ FROM documents_meta
+ ORDER BY id
+ `).all() as Array<{ id: string }>
+ return rows.map(r => r.id)
},🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/db/sqlite.ts` around lines 336 - 339, listIds() currently returns chunk
IDs from documents_meta (e.g., "<doc>#chunk-..."); update listIds in
src/db/sqlite.ts to return canonical/original document IDs by extracting the
prefix before any "#chunk" suffix and deduplicating the results: run the same
SELECT id FROM documents_meta, map each row to r.id.split('#chunk')[0] (or strip
the "#chunk" part if present), collect unique values (Set) and return the array
of canonical IDs so delta-sync uses original document IDs.
When chunking is enabled, the driver returns chunk IDs (e.g. doc1#chunk-0). Since createRetriv owns the chunking layer, it should map these back to canonical parent document IDs so consumers can diff against the original IDs they passed to index().
🔗 Linked issue
Related to skilld-dev/skilld#28
❓ Type of change
📚 Description
The
SearchProviderinterface had no way to list existing document IDs, which forced consumers into all-or-nothing rebuilds. This adds an optionallistIds()method to the interface, implements it in the sqlite driver (SELECT id FROM documents_meta), and wires it throughcreateRetriv. Consumers can now diff incoming docs against the stored set and only chunk/embed the delta.Summary by CodeRabbit