Skip to content

fix: address adversarial review findings for data query#1035

Merged
stack72 merged 1 commit intomainfrom
fix/data-query-review
Apr 1, 2026
Merged

fix: address adversarial review findings for data query#1035
stack72 merged 1 commit intomainfrom
fix/data-query-review

Conversation

@adamhjk
Copy link
Copy Markdown
Contributor

@adamhjk adamhjk commented Apr 1, 2026

Summary

Fixes three medium-severity findings from the adversarial review on #1029:

  • Paged iteration: CatalogStore.iterate() now fetches rows in batches of 1000 via LIMIT/OFFSET instead of stmt.all() which loaded all rows into memory
  • Serve shutdown cleanup: Close CatalogStore after server.finished so the SQLite connection isn't leaked for long-running processes
  • Debug logging: Log skipped rows at debug level in DataQueryService instead of silently swallowing evaluation errors, visible with --verbose

Also incorporates UX review suggestions from the same PR: CLI examples, improved --select description, and JSON envelope for projected output.

Test Plan

  • All 3940 existing tests pass
  • deno fmt, deno lint, deno check all clean
  • Manual: swamp data query 'size > 0' --verbose shows debug output for type-mismatch rows

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Three medium-severity issues from the adversarial review:

1. CatalogStore.iterate() now pages with LIMIT/OFFSET (1000 rows per
   batch) instead of stmt.all() which loaded all rows into memory.

2. serve command closes CatalogStore on graceful shutdown so the SQLite
   connection is not leaked for long-running processes.

3. DataQueryService logs skipped rows at debug level instead of silently
   swallowing evaluation errors, so users can diagnose with --verbose.

Also incorporates UX review suggestions: CLI examples, improved --select
description, and JSON envelope for projected output.
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

  1. JSON shape inconsistency between projected and non-projected paths (src/presentation/renderers/data_query.ts): The non-projected path dumps the full DataQueryData object ({ predicate, select, results, projected, total, limited }), while the projected path emits a stripped envelope ({ results, total, limited }) without predicate or select. A script toggling --select on/off gets differently shaped JSON. Both shapes contain the critical fields, but aligning them (e.g., always including predicate and select in the projected envelope) would make scripting more predictable.

  2. --limit description (src/cli/commands/data_query.ts, line 41): "Maximum results" is terse — something like "Maximum number of results to return (default: 100)" would match the level of detail now given to --select and make the default visible in help text without users having to run the command.

Verdict

PASS — The projected JSON envelope fix (adding total and limited) is a solid improvement for scripting use cases, the --select description upgrade and added examples are clear wins, and debug-level logging for skipped rows is the right behavior. No blocking issues.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

All five files reviewed. The changes are well-scoped and correctly address the three findings from the adversarial review.

Blocking Issues

None.

Suggestions

  1. catalog_store.ts:148 — Add ORDER BY rowid to the paged query. The current SELECT * FROM catalog LIMIT ? OFFSET ? relies on SQLite's implicit rowid ordering, which is practical but not guaranteed by the SQL spec. Adding an explicit ORDER BY rowid makes the pagination deterministic and costs nothing on the primary key index. Without it, concurrent inserts (e.g. a backfill in another connection) could theoretically cause skipped or duplicated rows.

  2. catalog_store_test.ts — Consider a paging boundary test. The existing tests exercise iterate() with 0–3 rows. A test inserting slightly more than ITERATE_PAGE_SIZE rows (e.g. 1001) would directly verify the multi-batch logic and the rows.length < ITERATE_PAGE_SIZE early-exit path.

Notes

  • Import boundary: Presentation renderer correctly imports from libswamp/mod.ts. CLI command imports from libswamp/mod.ts for application-layer types. No violations.
  • DDD: DataQueryService remains a well-placed domain service. CatalogStore stays in infrastructure persistence. The debug logging addition doesn't leak domain concerns — it uses the standard LogTape logger at an appropriate level.
  • JSON envelope change (data_query.ts renderer): Wrapping projected output in {results, total, limited} is a good consistency improvement for JSON consumers.
  • Serve shutdown cleanup: The optional-chaining repoContext.catalogStore?.close() is correct — catalogStore may be undefined if initialization failed.
  • License headers: All modified files have the AGPLv3 header.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found.

Medium

  1. src/infrastructure/persistence/catalog_store.ts:148LIMIT/OFFSET pagination without ORDER BY is undefined behavior in SQL.

    The paged iteration query is:

    SELECT * FROM catalog LIMIT ? OFFSET ?

    Without ORDER BY, the SQL standard (and SQLite docs) make no guarantee about row ordering between successive queries. This means across page boundaries, rows could theoretically be skipped or duplicated if the optimizer chooses a different access path for different offset values, or if any concurrent write (e.g., a backfill triggered by another process) mutates the table between page fetches.

    Why it doesn't blow up today: The generator is consumed synchronously in DataQueryService.executeQuery(), and no writes happen to the catalog during iteration. SQLite in practice tends to return rows in rowid order for full table scans. So this is safe by accident, not by contract.

    Breaking example (theoretical): If a future caller iterates the generator asynchronously with interleaved catalog writes, or if SQLite's optimizer changes behavior (e.g., after adding a covering index), rows could be missed or yielded twice.

    Suggested fix:

    SELECT * FROM catalog ORDER BY type_normalized, model_id, data_name LIMIT ? OFFSET ?

    This uses the primary key, so it's essentially free (no sort needed — the PK index already provides this order).

Low

  1. src/presentation/renderers/data_query.ts:179-192 — JSON envelope for projected output is a breaking change to the --json contract.

    Previously, --json --select emitted a bare array. Now it emits { results, total, limited }. Any downstream consumer parsing the old format would break. Since this feature shipped in the immediately preceding PR (#1029), the blast radius is likely zero, but worth noting for changelog/documentation purposes.

Verdict

PASS — The changes are straightforward fixes. The missing ORDER BY is a correctness gap that should be addressed but is not exploitable in the current synchronous call path, so it does not block merge.

@stack72 stack72 merged commit ff0dc1e into main Apr 1, 2026
9 checks passed
@stack72 stack72 deleted the fix/data-query-review branch April 1, 2026 18:21
adamhjk added a commit that referenced this pull request Apr 1, 2026
The paged LIMIT/OFFSET query in CatalogStore.iterate() relied on
SQLite's implicit rowid ordering, which is not guaranteed by the SQL
spec. Add explicit ORDER BY rowid to make pagination deterministic
across page boundaries.

Also adds a pagination boundary test that inserts ITERATE_PAGE_SIZE+1
rows and verifies all rows are returned in order across two fetches.

Addresses review feedback on PR #1035.
adamhjk added a commit that referenced this pull request Apr 1, 2026
The paged LIMIT/OFFSET query in CatalogStore.iterate() relied on
SQLite's implicit rowid ordering, which is not guaranteed by the SQL
spec. Add explicit ORDER BY rowid to make pagination deterministic
across page boundaries.

Also adds a pagination boundary test that inserts ITERATE_PAGE_SIZE+1
rows and verifies all rows are returned in order across two fetches.

Addresses review feedback on PR #1035.
adamhjk added a commit that referenced this pull request Apr 1, 2026
The paged LIMIT/OFFSET query in CatalogStore.iterate() relied on
SQLite's implicit rowid ordering, which is not guaranteed by the SQL
spec. Add explicit ORDER BY rowid to make pagination deterministic
across page boundaries.

Also adds a pagination boundary test that inserts ITERATE_PAGE_SIZE+1
rows and verifies all rows are returned in order across two fetches.

Addresses review feedback on PR #1035.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants