Skip to content

perf(es): warn when --json is used with unbounded --max-docs in scroll-search#249

Merged
MattDevy merged 4 commits into
mainfrom
max-docs-limit
May 1, 2026
Merged

perf(es): warn when --json is used with unbounded --max-docs in scroll-search#249
MattDevy merged 4 commits into
mainfrom
max-docs-limit

Conversation

@ssh-esh
Copy link
Copy Markdown
Contributor

@ssh-esh ssh-esh commented Apr 28, 2026

Closes #211

When --json is active, scroll-search buffers all documents in memory. Since --max-docs defaults to unlimited, this can OOM on large indices. The NDJSON streaming path (default, no --json) is unaffected since it writes each doc to stdout immediately.

Options considered

  • Option 1: Cap --max-docs
    • Approach: default to 10,000 in --json mode
    • Downside: silent data truncation — user gets incomplete results that look correct
  • Option 2: Warn on stderr
    • Approach: emit a warning when --json + unbounded --max-docs
    • Downside: user could still OOM if they ignore the warning
  • Option 3: Both
    • Approach: warn + cap, allow explicit override
    • Downside: same truncation risk as option 1; two behaviors for one flag depending on --json

Decision: option 2 (warn only)

I went with a stderr warning because silent truncation is worse than a loud failure. An OOM is at least obvious, while truncated results look correct but aren't. This is especially important for agents, which parse stdout and would silently act on incomplete data.

This aligns with how other CLIs handle unbounded JSON output:

Recent writing on agent-focused CLI design also recommends NDJSON for streaming (which we already do) and explicit user-set limits over silent caps:

@ssh-esh ssh-esh marked this pull request as ready for review April 28, 2026 22:08
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

MegaLinter analysis: Success

Descriptor Linter Files Fixed Errors Warnings Elapsed time
✅ COPYPASTE jscpd yes no no 9.12s
✅ REPOSITORY gitleaks yes no no 94.75s
✅ REPOSITORY git_diff yes no no 0.42s
✅ REPOSITORY secretlint yes no no 12.56s
✅ REPOSITORY trivy yes no no 17.67s
✅ TYPESCRIPT eslint 2 0 0 3.86s

See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security
Show us your support by starring ⭐ the repository

@ssh-esh ssh-esh requested a review from JoshMock April 28, 2026 22:08
@MattDevy MattDevy self-assigned this Apr 30, 2026
MattDevy and others added 2 commits May 1, 2026 13:07
… warning

Adds env injection to ScrollSearchDeps so tests can verify suppression
without touching process.env directly.
@MattDevy MattDevy enabled auto-merge (squash) May 1, 2026 12:21
@MattDevy MattDevy merged commit 9cfa147 into main May 1, 2026
17 of 18 checks passed
@MattDevy MattDevy deleted the max-docs-limit branch May 1, 2026 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: cap or warn when --json is used with unbounded --max-docs in scroll-search

2 participants