feat: add listIds() to SearchProvider by harlan-zw · Pull Request #11 · skilld-dev/retriv

harlan-zw · 2026-03-21T04:46:51Z

🔗 Linked issue

❓ Type of change

📚 Description

The SearchProvider interface had no way to list existing document IDs, which forced consumers into all-or-nothing rebuilds. This adds an optional listIds() method to the interface, implements it in the sqlite driver (SELECT id FROM documents_meta), and wires it through createRetriv. Consumers can now diff incoming docs against the stored set and only chunk/embed the delta.

Summary by CodeRabbit

New Features
- Added a method to list all indexed document IDs, allowing inspection and management of the full document set.
- When content is stored in chunks, the method collapses chunk-scoped IDs into their parent document IDs and returns each parent once.
- If no provider exposes IDs, the method safely returns an empty list.

Adds an optional listIds() method to the SearchProvider interface, implemented in the sqlite driver and wired through createRetriv. Returns all document IDs stored in the index, enabling consumers to diff incoming docs against what's already indexed and only process the delta.
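The delta workflow described above can be sketched as follows. This is a minimal illustration, not code from the PR: the `Doc` and `Delta` shapes and the `diffAgainstIndex` helper are hypothetical names, and the stored IDs are assumed to be whatever `listIds()` returned.

```typescript
// Hypothetical shape of an incoming document; only `id` matters for the diff.
interface Doc {
  id: string
  content: string
}

// Hypothetical result type for the comparison.
interface Delta {
  toIndex: Doc[] // incoming docs whose IDs are not stored yet
  toRemove: string[] // stored IDs that no longer appear in the incoming set
}

// Compare incoming docs against the IDs reported by listIds().
function diffAgainstIndex(incoming: Doc[], storedIds: string[]): Delta {
  const stored = new Set(storedIds)
  const incomingIds = new Set(incoming.map(d => d.id))
  return {
    toIndex: incoming.filter(d => !stored.has(d.id)),
    toRemove: storedIds.filter(id => !incomingIds.has(id)),
  }
}
```

Note that an ID-only diff cannot tell whether the content of an already-indexed document has changed; a consumer that needs that would have to track something extra, such as a content hash, which is outside what this PR adds.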

coderabbitai · 2026-03-21T04:47:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cab32f08-1145-40e2-82c6-bb7fc6ed6a17

📥 Commits

Reviewing files that changed from the base of the PR and between 7835485 and 3b2ac23.

📒 Files selected for processing (1)

src/retriv.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/retriv.ts

📝 Walkthrough

Walkthrough

This PR adds an optional async `listIds(): Promise<string[]>` method to the `SearchProvider` interface and implements it in the SQLite provider by querying `documents_meta`. It also exposes `listIds()` on the top-level retriv API, which delegates to the first driver that provides `listIds` and, when a chunker is configured, post-processes chunk IDs into parent document IDs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Definition `src/types.ts` | Added optional `listIds?: () => Promise<string[]>` to `SearchProvider`. |
| SQLite Provider `src/db/sqlite.ts` | Implemented `listIds()` to run `SELECT id FROM documents_meta` and return `string[]` of ids. |
| Retriv API `src/retriv.ts` | Added top-level `listIds()` that calls the first driver exposing `listIds`, falls back to `[]`, and collapses `#chunk-` IDs to parent document IDs when a `chunker` is configured. |
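The delegation and fallback behaviour summarised in the table might look roughly like the sketch below. The `SearchProvider` interface here is a stripped-down stand-in that models only the optional method from this PR, and `listIdsFromFirst` is a hypothetical helper name rather than the function in `src/retriv.ts`.

```typescript
// Minimal stand-in for SearchProvider; only the optional method is modelled.
interface SearchProvider {
  listIds?: () => Promise<string[]>
}

// Return IDs from the first driver that implements listIds,
// or an empty list when no driver does.
async function listIdsFromFirst(drivers: SearchProvider[]): Promise<string[]> {
  const driver = drivers.find(d => d.listIds)
  return (await driver?.listIds?.()) ?? []
}
```

Because the method is optional on the interface, existing drivers that do not implement it keep type-checking, and callers get `[]` instead of an exception.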

Sequence Diagram

sequenceDiagram
    participant Client
    participant Retriv as Retriv API
    participant Driver as SearchProvider (SQLite)
    participant DB as Database (documents_meta)

    Client->>Retriv: listIds()
    Retriv->>Driver: listIds() on first supporting driver
    Driver->>DB: SELECT id FROM documents_meta
    DB-->>Driver: rows with id values
    Driver-->>Retriv: string[] of IDs
    Retriv-->>Client: Promise<string[]>

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I dug through rows both near and far,
Found every id like a shiny star,
Chunk-bits stitched back to parent ground,
One small hop and all are found,
🥕📜 Hop, retriv — list them round.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a listIds() method to the SearchProvider interface. It is specific, directly related to the primary objective, and uses an appropriate Conventional Commits prefix (feat:). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

cubic-dev-ai

1 issue found across 3 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/retriv.ts">

<violation number="1" location="src/retriv.ts:229">
P2: When chunking is enabled, `listIds()` returns chunk IDs (e.g. `doc1#chunk-0`) rather than the original document IDs consumers pass to `index()`. Since `createRetriv` is the layer that creates chunk IDs in `prepareDocs`, it should also be responsible for mapping them back to parent IDs here — otherwise the stated use case of diffing incoming docs against the stored set won't work.</violation>
</file>
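A minimal sketch of the mapping this finding asks for is shown below. It assumes chunk IDs have the shape `<parentId>#chunk-<n>`, which is inferred from the `doc1#chunk-0` example in the comment; the `collapseChunkIds` name and the exact pattern are illustrative and may not match what `createRetriv` actually does.

```typescript
// Assumed chunk-ID shape, inferred from the review example: "<parentId>#chunk-<n>".
const CHUNK_SUFFIX = /#chunk-\d+$/

// Strip the chunk suffix from each ID and return every parent ID once,
// in order of first appearance.
function collapseChunkIds(ids: string[]): string[] {
  const parents = new Set<string>()
  for (const id of ids)
    parents.add(id.replace(CHUNK_SUFFIX, ''))
  return Array.from(parents)
}
```

Anchoring the pattern at the end of the string means an ID that merely contains `#chunk` somewhere in the middle is left intact, which a plain split on `#chunk` would not guarantee.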

src/retriv.ts

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/retriv.ts (1)

229-232: Prefer aggregating IDs from all supporting drivers in hybrid mode.

Current logic returns IDs from only the first driver with listIds, which can mask backend drift. Union + dedupe is safer.

Suggested refactor

 async listIds() {
-  const driver = drivers.find(d => d.listIds)
-  return driver?.listIds?.() ?? []
+  const providers = drivers.filter(d => d.listIds)
+  if (!providers.length)
+    return []
+
+  const sets = await Promise.all(providers.map(d => d.listIds!()))
+  return Array.from(new Set(sets.flat()))
 },

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/retriv.ts` around lines 229 - 232, The current listIds method only calls
the first driver with listIds, which hides mismatches; update the retriv.ts
async listIds() to call listIds on all drivers in the drivers array that
implement it (e.g., filter drivers by d.listIds), await all results
(Promise.all), flatten and union/dedupe the ID arrays, and return that
aggregated list instead of the first result so hybrid mode returns the combined
set from all supporting drivers.
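To make the "backend drift" concern concrete, the sketch below unions IDs across all supporting drivers and also reports IDs that are missing from at least one of them. The drift report goes beyond what the suggestion proposes; `IdReport` and `listIdsAcrossDrivers` are hypothetical names, and `SearchProvider` is again reduced to the one optional method.

```typescript
// Minimal stand-in for SearchProvider; only the optional method is modelled.
interface SearchProvider {
  listIds?: () => Promise<string[]>
}

// Hypothetical report type, not part of the PR.
interface IdReport {
  all: string[] // union of IDs across every supporting driver
  drifted: string[] // IDs missing from at least one supporting driver
}

async function listIdsAcrossDrivers(drivers: SearchProvider[]): Promise<IdReport> {
  const lists = await Promise.all(
    drivers.filter(d => d.listIds).map(d => d.listIds!()),
  )
  // Count how many drivers report each ID.
  const counts = new Map<string, number>()
  for (const list of lists) {
    // Dedupe within one driver so an ID counts at most once per backend.
    for (const id of Array.from(new Set(list)))
      counts.set(id, (counts.get(id) ?? 0) + 1)
  }
  const all = Array.from(counts.keys())
  return { all, drifted: all.filter(id => counts.get(id)! < lists.length) }
}
```

With first-driver delegation, an ID that exists only in a second backend is never reported; the union surfaces it, and the `drifted` list shows which IDs the backends disagree on.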

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/db/sqlite.ts`:
- Around line 336-339: listIds() currently returns chunk IDs from documents_meta
(e.g., "<doc>#chunk-..."); update listIds in src/db/sqlite.ts to return
canonical/original document IDs by extracting the prefix before any "#chunk"
suffix and deduplicating the results: run the same SELECT id FROM
documents_meta, map each row to r.id.split('#chunk')[0] (or strip the "#chunk"
part if present), collect unique values (Set) and return the array of canonical
IDs so delta-sync uses original document IDs.

---

Nitpick comments:
In `@src/retriv.ts`:
- Around line 229-232: The current listIds method only calls the first driver
with listIds, which hides mismatches; update the retriv.ts async listIds() to
call listIds on all drivers in the drivers array that implement it (e.g., filter
drivers by d.listIds), await all results (Promise.all), flatten and union/dedupe
the ID arrays, and return that aggregated list instead of the first result so
hybrid mode returns the combined set from all supporting drivers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3961bdd-9aab-4f1f-9b8d-39a67417dcb4

📥 Commits

Reviewing files that changed from the base of the PR and between e8837c2 and 7835485.

📒 Files selected for processing (3)

src/db/sqlite.ts
src/retriv.ts
src/types.ts

coderabbitai · 2026-03-21T04:50:14Z

src/db/sqlite.ts

+    async listIds() {
+      const rows = db.prepare('SELECT id FROM documents_meta').all() as Array<{ id: string }>
+      return rows.map(r => r.id)
+    },


⚠️ Potential issue | 🟠 Major

listIds() leaks internal chunk IDs instead of canonical document IDs.

At Line 337, SELECT id FROM documents_meta returns chunk IDs (<doc>#chunk-n) when chunking is enabled, which breaks delta-sync based on original document IDs.

Proposed fix

 async listIds() {
-  const rows = db.prepare('SELECT id FROM documents_meta').all() as Array<{ id: string }>
-  return rows.map(r => r.id)
+  const rows = db.prepare(`
+    SELECT DISTINCT
+      COALESCE(json_extract(metadata, '$._parentId'), id) AS id
+    FROM documents_meta
+    ORDER BY id
+  `).all() as Array<{ id: string }>
+  return rows.map(r => r.id)
 },

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/db/sqlite.ts` around lines 336 - 339, listIds() currently returns chunk IDs from documents_meta (e.g., "<doc>#chunk-..."); update listIds in src/db/sqlite.ts to return canonical/original document IDs by extracting the prefix before any "#chunk" suffix and deduplicating the results: run the same SELECT id FROM documents_meta, map each row to r.id.split('#chunk')[0] (or strip the "#chunk" part if present), collect unique values (Set) and return the array of canonical IDs so delta-sync uses original document IDs.

When chunking is enabled, the driver returns chunk IDs (e.g. doc1#chunk-0). Since createRetriv owns the chunking layer, it should map these back to canonical parent document IDs so consumers can diff against the original IDs they passed to index().

cubic-dev-ai bot reviewed Mar 21, 2026

src/retriv.ts (resolved)

coderabbitai bot reviewed Mar 21, 2026

harlan-zw merged commit cc70d68 into main Mar 21, 2026
3 checks passed

coderabbitai bot mentioned this pull request Mar 30, 2026

fix: expand chunk IDs in remove() when chunking is enabled #13

Merged

harlan-zw merged 2 commits into main from feat/list-ids
