
fix(replay): preserve real sessionId + surface 200-file scan cap (#202, #203) #207

Merged
rohitg00 merged 3 commits into main from fix/202-203-jsonl-import
Apr 27, 2026

Conversation

rohitg00 (Owner) commented Apr 26, 2026

Summary

Two bugs in the JSONL import path combined to make `agentmemory import-jsonl` against a real `~/.claude/projects` tree quietly mis-import: only ~10% of files were scanned, and each file landed under a fresh random session id even when the file's own header named one.

#202 — `parseJsonlText` ignores the file's sessionId

The parser pre-populated `sessionId` from `fallbackSessionId` and then guarded with `if (entry.sessionId && !sessionId)`, so the file's real id was never adopted. Each call returned a freshly generated session, breaking merge-on-reimport (`Session.observationCount` accumulation, `endedAt` extension, the `jsonl-import` tag set) and inflating session counts ~4× on real trees.

Fix: don't pre-populate `sessionId`; fall back to `fallbackSessionId` then `generateId('sess')` only after the loop.
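
The fixed precedence can be sketched as follows. This is a minimal illustration with hypothetical names (`resolveSessionId`, `generateId`), not the actual `parseJsonlText` implementation:

```typescript
// Hypothetical sketch of the corrected sessionId precedence; names are
// illustrative, not the real parser's.
function generateId(prefix: string): string {
  return `${prefix}_${Math.random().toString(36).slice(2, 10)}`;
}

function resolveSessionId(lines: string[], fallbackSessionId?: string): string {
  let sessionId: string | undefined; // NOT pre-populated from the fallback
  for (const line of lines) {
    try {
      const entry = JSON.parse(line);
      // Adopt the file's own id the first time one appears.
      if (entry.sessionId && !sessionId) sessionId = entry.sessionId;
    } catch {
      // skip malformed lines
    }
  }
  // Fallbacks apply only after the loop has had a chance to find the file's id.
  return sessionId ?? fallbackSessionId ?? generateId("sess");
}
```

Because the loop runs before any fallback is consulted, two parses of the same file now yield the same id, which is what merge-on-reimport depends on.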

#203 — `import-jsonl` CLI silently caps at 200 files

`findJsonlFiles` had a hard 200-file cap (sensible for the 30s function timeout) but the CLI exposed no flag to override it and never warned when files were skipped. On a 2167-file tree users got a silent ~10% import.

Fix:

  • `findJsonlFiles` now reports `{files, truncated, discovered}` so the handler can count beyond the cap without retaining the paths
  • Handler returns `discovered`, `truncated`, `maxFiles` in the response
  • CLI accepts `--max-files <N>` and `--max-files=<N>`
  • CLI warns on truncation, suggests a larger cap
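
A rough sketch of the new return shape, operating on a pre-listed set of paths instead of the real filesystem walk (the function name and signature here are illustrative):

```typescript
// Sketch of the {files, truncated, discovered} contract; the real
// findJsonlFiles walks ~/.claude/projects recursively.
interface ScanResult {
  files: string[];     // first `limit` .jsonl paths, in discovery order
  discovered: number;  // total .jsonl files seen, including skipped ones
  truncated: boolean;  // true when discovered > files.length
}

function scanJsonl(paths: string[], limit = 200): ScanResult {
  const files: string[] = [];
  let discovered = 0;
  for (const p of paths) {
    if (!p.endsWith(".jsonl")) continue;
    discovered++;
    // Count past the cap but retain only `limit` paths.
    if (files.length < limit) files.push(p);
  }
  return { files, discovered, truncated: discovered > files.length };
}
```

Counting past the cap without retaining paths is what lets the handler report an accurate skipped count without holding thousands of strings in memory.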

Tests

  • 3 new regression cases in `test/replay.test.ts`: file sessionId beats fallback, repeated parses produce the same id, fallback only when file has no id
  • Full suite: 830/830 passing

Thanks @bloodcarter for the precise repros and root-cause pointers in #202/#203.

Closes #202
Closes #203

Test plan

  • `agentmemory import-jsonl ~/.claude/projects` on a tree >200 files — must surface a warning + skipped count
  • `agentmemory import-jsonl --max-files 5000 ~/.claude/projects` — must scan beyond 200
  • Re-run import twice on the same file — must merge into one Session (no duplicates)

Summary by CodeRabbit

  • New Features
    • CLI: added optional --max-files (both forms); imports now report discovered/skipped counts, truncation/traversal caps, and recommend batching or a higher max within the server-provided upper bound.
  • Bug Fixes
    • Fixed CLI argument parsing so long-form flags consume their values correctly.
    • Enhanced server-side validation of maxFiles with clearer range errors; invalid CLI values emit warnings and are ignored.
  • Tests
    • Added tests covering session ID precedence and fallback behavior.


coderabbitai Bot commented Apr 26, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 400b57d and a0b8cd2.

📒 Files selected for processing (3)
  • src/cli.ts
  • src/functions/replay.ts
  • src/triggers/api.ts

📝 Walkthrough

Parser now prefers a JSONL-embedded sessionId over any fallback; CLI import-jsonl accepts and validates an optional --max-files flag, includes it in the import request, and surfaces server-provided scan metadata (discovered, truncated, traversalCapped, maxFiles, maxFilesUpperBound) to warn when directory scanning was capped.

Changes

  • CLI enhancements (src/cli.ts): Adds parsing for --max-files (both --max-files N and --max-files=N), validates positive integers (warns on invalid), ensures other long flags consume their values, includes maxFiles in the POST payload, and interprets new response metadata to emit cap/truncation warnings and suggestions.
  • Import scanner & API handler (src/functions/replay.ts, src/triggers/api.ts): findJsonlFiles now returns { files, discovered, truncated, traversalCapped }; mem::replay::import-jsonl clamps/normalizes maxFiles to [MAX_FILES_DEFAULT, MAX_FILES_UPPER_BOUND] and propagates discovered, truncated, traversalCapped, maxFiles, and maxFilesUpperBound in all success responses; the API endpoint validates maxFiles against the upper bound and stores the validated numeric value.
  • JSONL parser sessionId fix (src/replay/jsonl-parser.ts): Defers fallback assignment until after parsing so an embedded entry.sessionId is preferred; effectiveSessionId considers entry.sessionId, then fallbackSessionId, then a generated id when none is found; parsed observations and the returned sessionId reflect that precedence.
  • Tests (test/replay.test.ts): Adds three Vitest cases verifying sessionId precedence and stability: the embedded id overrides the fallback, the embedded id is stable across different fallback inputs, and the fallback is used when the input lacks a sessionId.
  • Exports / constants (src/functions/replay.ts): Introduces exported MAX_FILES_DEFAULT and MAX_FILES_UPPER_BOUND constants and includes maxFiles metadata in the function's success response shape.

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant API as Import API
    participant Scanner as findJsonlFiles
    participant Parser as parseJsonlText
    participant KV as KV Storage

    CLI->>API: POST /agentmemory/replay/import-jsonl\n{ path, maxFiles? }
    API->>Scanner: scan(path, maxFiles || DEFAULT)
    Scanner->>Scanner: walk filesystem\ncount .jsonl, collect up to limit
    Scanner-->>API: { files[], discovered, truncated, traversalCapped, limit }
    loop per file
        API->>Parser: parse(file.content, fallbackSessionId?)
        Parser->>Parser: prefer file's sessionId\nelse generate or use fallback
        Parser-->>API: ParsedTranscript { sessionId, observations }
        API->>KV: upsert session & observations
    end
    API-->>CLI: { imported, sessionIds, discovered, truncated, traversalCapped, maxFiles, maxFilesUpperBound }
    alt truncated == true or traversalCapped == true
        CLI->>CLI: warn user about cap,\nshow discovered/skipped and suggest batching or higher --max-files (≤ upper bound)
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A hop, a parse, a careful test,
Session IDs now choose the best,
Files counted, caps made known,
Imports merge — no more unknown,
Hooray — the burrow's neatly blessed!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the two main fixes: preserving the real sessionId (#202) and surfacing the 200-file scan cap (#203), with clear issue references.
  • Linked Issues check: ✅ Passed. All coding requirements from #202 and #203 are met: parseJsonlText now respects file sessionIds [#202], truncation metadata is tracked and returned [#203], the CLI accepts --max-files with validation [#203], and regression tests were added [#202].
  • Out of Scope Changes check: ✅ Passed. All changes directly address the linked issues. Modifications span CLI argument parsing, JSONL parsing logic, function return types, API validation, and tests, all scoped to the #202/#203 objectives.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/functions/replay.ts (1)

209-234: ⚠️ Potential issue | 🟡 Minor

Walk no longer short-circuits at the cap.

The previous 200-file limit acted as a hard upper bound on directory traversal, which the comment in #203 cites as protection against the 30s function timeout. Now that discovered is counted past limit, very large trees (e.g. millions of .jsonl files via accidental scan of a non-Claude directory) will be walked in full even though only limit paths are retained. Consider either documenting this trade-off or capping the walk at e.g. max(limit * 10, 10_000) so truncated is reported but the function still returns within the timeout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/replay.ts` around lines 209 - 234, The recursive walker
function walk currently continues traversing the entire tree even after out has
reached limit because discovered keeps counting past limit, risking timeouts;
modify walk (and its use of discovered, out, limit) to short‑circuit traversal
once a configurable traversal cap is reached (e.g. const traversalCap =
Math.max(limit * 10, 10_000)) or stop when discovered >= traversalCap, so you
still report truncated = discovered > out.length but avoid walking millions of
files; update references to discovered, out, limit and root in walk to check
traversalCap before recursing or counting files.
🧹 Nitpick comments (3)
src/cli.ts (2)

39-39: Help text only documents the space-separated form.

The CLI accepts both --max-files <N> and --max-files=<N>, but only the former is mentioned. Worth listing the = form too — especially since (per the comment above) it is currently the only form that works reliably alongside a positional path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli.ts` at line 39, The help text for the CLI only documents the
space-separated form of the --max-files option; update the help string where the
option description is defined (the text "Use --max-files <N> to override the
200-file scan cap (default: 200)") to include the equals form as well (e.g.
"--max-files <N> or --max-files=<N>") so both "--max-files <N>" and
"--max-files=<N>" are documented for the option --max-files in src/cli.ts.

967-978: Invalid --max-files values fall back silently.

--max-files abc, --max-files 0, or --max-files -5 all leave maxFiles undefined and the server applies the default 200 with no feedback to the user. A short p.log.warn("ignoring invalid --max-files value: <x>") would prevent confused users from re-running with the same broken flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli.ts` around lines 967 - 978, The parsing for --max-files (variable
maxFiles, flagIdx, eqArg) currently ignores invalid values silently; update the
branches that parse parsed (both the separate arg path using flagIdx and the
--max-files=... path using eqArg) to emit a warning via p.log.warn with the
original invalid token (e.g., args[flagIdx+1] or eqArg.slice(...)) when parsed
is NaN, <= 0, or missing, and only assign to maxFiles when parsed is a positive
integer; include the offending value in the warning message so users see why the
flag was ignored.
src/functions/replay.ts (1)

302-302: Minor: consider validating maxFiles is an integer here too.

src/triggers/api.ts already rejects non-integer maxFiles at the HTTP boundary, but a direct SDK caller (sdk.trigger) bypasses that check and could pass 1.5 or Number.MAX_SAFE_INTEGER. A small Number.isInteger(data.maxFiles) && data.maxFiles > 0 guard with a sane upper bound would defense-in-depth this entry point.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/replay.ts` at line 302, Validate and clamp the incoming
data.maxFiles before assigning maxFiles: ensure you only accept integer values
via Number.isInteger(data.maxFiles) && data.maxFiles > 0 and enforce a sane
upper bound (e.g., <= 1000) so SDK callers can't pass fractions or extremely
large numbers; if the check fails, fall back to the existing default (200).
Update the assignment that defines maxFiles (the const maxFiles = ... expression
in replay.ts) to use this guard and upper-bound logic referencing data.maxFiles
and the constant default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli.ts`:
- Around line 964-978: The positional-arg collection lets standalone flag values
(e.g., "500") through so pathArg becomes the numeric value; update how
nonFlagArgs is built to skip flag values for the --max-files form. Instead of
filtering only by a.startsWith("-"), iterate the args slice and when you
encounter the flag token "--max-files" skip that token and the immediately
following token (the numeric value), and also skip tokens that start with "-" or
match the "--max-files=..." form; then set pathArg from the remaining non-flag
tokens. This change should reference the existing variables args, nonFlagArgs,
pathArg and the flag name "--max-files".



📥 Commits

Reviewing files that changed from the base of the PR and between 18e3257 and 39d4ca2.

📒 Files selected for processing (4)
  • src/cli.ts
  • src/functions/replay.ts
  • src/replay/jsonl-parser.ts
  • test/replay.test.ts

… parsing

Five follow-ups from CodeRabbit on the import-jsonl patch:

1. **Walker no longer short-circuits** — after dropping the in-loop
   limit return, `findJsonlFiles` walked the entire tree even when
   the user only wanted 200 files, risking 30s timeouts on million-file
   trees. Added a `traversalCap = max(limit * 10, 10_000)` so
   `discovered`/`truncated` stay accurate but the walk stops well
   short of pathological depths.

2. **`maxFiles` accepted floats and unbounded values** — guard the
   handler with `Number.isInteger` + upper bound (1000) so SDK callers
   can't pass 1.5 or 1_000_000. Out-of-range falls back to default 200.

3. **CLI flag value contaminated `pathArg`** — the old
   `filter(a => !a.startsWith('-'))` left the numeric value of
   `--max-files 500` as a positional, so `agentmemory import-jsonl
   --max-files 500` resolved `pathArg` to `"500"`. Replaced the
   filter with a single pass that consumes the flag's value and skips
   both space-form and equals-form.

4. **CLI swallowed invalid `--max-files` values silently** — emit a
   `p.log.warn` with the offending token when parsing fails so the
   user sees why their flag was ignored.

5. **Help text only documented the space form** — added the equals
   form (`--max-files=<N>`) so both invocations are visible in
   `--help`.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
src/functions/replay.ts (1)

203-242: Note: discovered is capped at traversalCap, not a true total.

The walk stops when discovered >= traversalCap (e.g. 2,000 for a 200-file limit), so for very large trees the reported discovered is an undercount. This is fine for the safety property (bounded latency), but the CLI uses discovered to suggest a new --max-files value, so on huge trees the suggestion will be too small. Worth a brief comment so future readers don't assume discovered is exhaustive, and/or returning a separate flag (e.g. traversalCapped) so the CLI can switch its message to "batch by subdirectory" unconditionally in that case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/replay.ts` around lines 203 - 242, The findJsonlFiles function
currently caps the discovered counter at traversalCap (variable traversalCap)
which makes discovered an undercount on very large trees; update findJsonlFiles
so callers can tell when the walk was truncated by adding and returning a
traversalCapped boolean (e.g. traversalCapped = discovered >= traversalCap)
alongside files/truncated/discovered, and update any callers to check
traversalCapped to change CLI messaging; also add a short inline comment near
traversalCap and the walk stop condition explaining that discovered is bounded
and may be incomplete when traversalCapped is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli.ts`:
- Line 39: Update the CLI help text for the --max-files option to state the
upper bound and fallback behavior: mention MAX_FILES_UPPER_BOUND (1000) as the
maximum allowed value and that values outside the allowed range will be clamped
or fall back to the default (200); edit the help string currently shown in the
--max-files description (the line containing "Use --max-files <N> or
--max-files=<N> to override the 200-file scan cap (default: 200)") so it
references MAX_FILES_UPPER_BOUND and explains that e.g. values >1000 will be
clamped and out-of-range inputs revert to the default, keeping the text concise
and consistent with the logic in the replay handler that enforces
MAX_FILES_UPPER_BOUND.
- Around line 1087-1095: The warning suggests an unbounded --max-files value
that can exceed the server hard cap; update the logic that builds the suggested
value in the json.truncated branch (where p.log.warn is called) to clamp the
proposed max to the server-side cap (use the same MAX_FILES_UPPER_BOUND constant
from replay.ts or a shared constant) and, if discovered > MAX_FILES_UPPER_BOUND,
recommend batching by subdirectory instead of a larger single --max-files; keep
the existing message structure but replace Math.max((json.discovered ?? cap) +
100, cap * 2) with a clamped value no greater than MAX_FILES_UPPER_BOUND and
mention batching when that cap would still be insufficient.

In `@src/functions/replay.ts`:
- Around line 309-316: The code currently silently falls back to
MAX_FILES_DEFAULT when data.maxFiles is non-integer or > MAX_FILES_UPPER_BOUND;
change the logic around MAX_FILES_DEFAULT / MAX_FILES_UPPER_BOUND and maxFiles
so that valid integer inputs >=1 are clamped to the upper bound instead of
defaulting (e.g., if Number.isInteger(data.maxFiles) && data.maxFiles > 0 then
maxFiles = Math.min(data.maxFiles, MAX_FILES_UPPER_BOUND) else maxFiles =
MAX_FILES_DEFAULT), and ensure any API response or caller-visible return uses
this computed maxFiles value so callers can see the clamped value;
alternatively, if you prefer rejecting out-of-range values, tighten validation
where requests are accepted to enforce <= MAX_FILES_UPPER_BOUND and return an
error instead of silently changing the value.



📥 Commits

Reviewing files that changed from the base of the PR and between 39d4ca2 and 400b57d.

📒 Files selected for processing (2)
  • src/cli.ts
  • src/functions/replay.ts

@qodo-ai-reviewer

Hi, findJsonlFiles() enforces its traversal cap using discovered, but discovered is only incremented for .jsonl files, so trees with many non-.jsonl entries can still be walked almost entirely and hit function timeouts.

Severity: action required | Category: reliability

How to fix: Cap by visited entries

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

findJsonlFiles() attempts to bound filesystem traversal via traversalCap, but it uses discovered (count of .jsonl files) to enforce that cap. In directories with many non-.jsonl files, the traversal can still run extremely long and may exceed function timeouts.

Issue Context

We still want:

  • discovered: number of .jsonl files encountered (for reporting)
  • files: first limit .jsonl files
  • a hard bound on total work regardless of file extension

Fix Focus Areas

  • src/functions/replay.ts[203-242]
    • Introduce a separate counter (e.g., visitedEntries / walked) incremented per directory entry processed (or per lstat attempt), and use THAT for the traversalCap checks.
    • Keep discovered++ for .jsonl only so reporting remains correct.
    • Consider setting truncated true when the traversal cap is hit even if discovered <= out.length (i.e., stopped early for safety).

Found by Qodo. Free code review for open-source maintainers.
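
The suggested fix can be sketched roughly as follows (hypothetical names, with a flat entry list standing in for the recursive walk):

```typescript
// Sketch of "cap by visited entries": the walk is bounded by a `walked`
// counter incremented per entry, .jsonl or not, while `discovered` keeps
// counting only .jsonl files for accurate reporting.
interface WalkResult {
  files: string[];
  discovered: number;       // .jsonl files seen before the walk stopped
  truncated: boolean;
  traversalCapped: boolean; // stopped early for safety; results incomplete
}

function walkEntries(entries: string[], limit: number, traversalCap: number): WalkResult {
  const files: string[] = [];
  let discovered = 0;
  let walked = 0;
  let traversalCapped = false;
  for (const entry of entries) {
    if (walked >= traversalCap) {
      traversalCapped = true;
      break;
    }
    walked++; // counts every entry visited, regardless of extension
    if (!entry.endsWith(".jsonl")) continue;
    discovered++;
    if (files.length < limit) files.push(entry);
  }
  return {
    files,
    discovered,
    // a capped walk counts as truncated even if every seen file was kept
    truncated: discovered > files.length || traversalCapped,
    traversalCapped,
  };
}
```

The key property: a node_modules-heavy tree with almost no .jsonl files still terminates at `traversalCap` visited entries, which the discovered-based check could not guarantee.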

@qodo-ai-reviewer

Hi, runImportJsonl() treats any non-dash token as a positional path, so values belonging to other flags (e.g. --port 3112 or --tools core) can be incorrectly consumed as the import path.

Severity: remediation recommended | Category: correctness

How to fix: Skip values for known flags

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

runImportJsonl() builds positional args by skipping only tokens that start with -. This misclassifies flag values (like the 3112 after --port) as positional arguments and can set pathArg incorrectly.

Issue Context

The CLI already supports global flags like --port and --tools by scanning args earlier and setting env vars, but those tokens remain in args and are re-parsed by runImportJsonl().

Fix Focus Areas

  • src/cli.ts[963-994]
    • When encountering known flags that take a value (e.g., --port, --tools), increment i to skip the next token.
    • Alternatively, refactor to a small shared arg parser that returns { positionals, flags } and properly consumes flag values.
    • Add a regression test (if CLI has tests) or at least manual note in PR description with example invocations.

Found by Qodo code review
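
The suggested shared parser could look roughly like this; the `VALUE_FLAGS` membership and helper name are assumptions for illustration, not the actual src/cli.ts code:

```typescript
// Sketch of a small shared arg parser that consumes values for known
// value-taking flags so they never leak into positionals.
const VALUE_FLAGS = new Set(["--max-files", "--port", "--tools"]);

function splitArgs(args: string[]): { positionals: string[]; flags: Map<string, string> } {
  const positionals: string[] = [];
  const flags = new Map<string, string>();
  for (let i = 0; i < args.length; i++) {
    const a = args[i];
    const eq = a.indexOf("=");
    if (eq > 0 && VALUE_FLAGS.has(a.slice(0, eq))) {
      flags.set(a.slice(0, eq), a.slice(eq + 1)); // --flag=value form
    } else if (VALUE_FLAGS.has(a)) {
      flags.set(a, args[++i] ?? "");              // consume the next token as the value
    } else if (!a.startsWith("-")) {
      positionals.push(a);                        // only true positionals remain
    }
  }
  return { positionals, flags };
}
```

With this shape, `agentmemory --port 3112 import-jsonl ~/.claude/projects` no longer resolves the path to `"3112"`.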

…ries visited

Five follow-ups from CodeRabbit and qodo on the import-jsonl patch:

1. **Out-of-range maxFiles silently degraded to default** — caller
   passing maxFiles > 1000 got a 202 with maxFiles: 200 in the
   response, no error. The HTTP layer in src/triggers/api.ts only
   validated `Number.isInteger && >= 1` (no ceiling), so the contract
   drifted between transport and handler. Now:
   - HTTP returns 400 with the bound in the error message
   - The SDK-callable handler clamps valid integers to MAX_FILES_UPPER_BOUND
     (safety net for non-HTTP callers)
   - Constants exported from src/functions/replay.ts so api.ts and
     downstream callers share one source of truth

2. **Walker traversal bounded by `discovered`** — `discovered` only
   counts .jsonl files. Trees dominated by non-jsonl files
   (node_modules, lockfiles) could still walk past the function
   timeout. Switched to a separate `walked` counter incremented per
   directory entry visited; threshold raised to max(limit*50, 50_000)
   to give legitimate trees room. Returns `traversalCapped: boolean`
   so callers can distinguish "found more than we showed" from
   "stopped walking early for safety".

3. **CLI swallowed flag values from --port and --tools** — the
   import-jsonl arg loop only consumed `--max-files`'s value token.
   `agentmemory --port 3112 import-jsonl` left "3112" as a positional,
   resolving pathArg to a numeric string. Added a VALUE_FLAGS set
   (`--port`, `--tools`) that consumes the next token alongside the
   flag.

4. **Suggested --max-files in warning could exceed 1000** — the old
   message said "Re-run with --max-files=5100" for a 5000-file tree,
   but the server caps at 1000, so the user would see the same
   warning with no progress. Now clamps the suggestion to
   maxFilesUpperBound, and when discovered > upper bound (or
   traversalCapped fired), recommends batching by subdirectory
   instead.

5. **Help text omitted the upper bound** — added the 1000 ceiling
   and the "batch by subdirectory" guidance for trees larger than
   the cap.
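
Items 1 and 4 above can be sketched roughly like this; the constants match the values quoted in the thread (default 200, hard cap 1000), and the function names are hypothetical:

```typescript
// Sketch of the handler-side clamp and the CLI-side retry suggestion.
const MAX_FILES_DEFAULT = 200;
const MAX_FILES_UPPER_BOUND = 1000;

// Safety net for non-HTTP callers: accept positive integers, clamp to the
// upper bound, and fall back to the default for anything else.
function normalizeMaxFiles(input: unknown): number {
  if (typeof input === "number" && Number.isInteger(input) && input > 0) {
    return Math.min(input, MAX_FILES_UPPER_BOUND);
  }
  return MAX_FILES_DEFAULT;
}

// Truncation-warning suggestion: never propose a value the server would
// reject; past the cap, batching by subdirectory is the only way forward.
function suggestRetry(discovered: number, cap: number): string {
  if (discovered > MAX_FILES_UPPER_BOUND) {
    return "batch the import by subdirectory";
  }
  const suggested = Math.min(Math.max(discovered + 100, cap * 2), MAX_FILES_UPPER_BOUND);
  return `re-run with --max-files=${suggested}`;
}
```

The clamp keeps the response's reported maxFiles honest for SDK callers, while the HTTP layer can still reject out-of-range values with a 400 as described above.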
rohitg00 merged commit aef1dbb into main, Apr 27, 2026
3 of 4 checks passed
rohitg00 deleted the fix/202-203-jsonl-import branch April 27, 2026 09:50
