fix(replay): preserve real sessionId + surface 200-file scan cap (#202, #203)#207
Two bugs in the JSONL import path combined to make `agentmemory import-jsonl` against a real `~/.claude/projects` tree quietly mis-import: only ~10% of files were scanned, and each landed under a fresh random session id even when the file's own header named one.

### #202 — `parseJsonlText` ignores the file's sessionId

The parser pre-populated `sessionId` from `fallbackSessionId` and then guarded with `if (entry.sessionId && !sessionId)`, so the file's real id was never adopted. Each call returned a freshly generated session, breaking merge-on-reimport (Session.observationCount accumulation, endedAt extension, the jsonl-import tag set) and inflating the session count ~4× on real trees.

Fix: don't pre-populate `sessionId`; fall back to `fallbackSessionId` and then `generateId('sess')` only after the loop has had a chance to find the file's id.

### #203 — `import-jsonl` CLI silently caps at 200 files

`findJsonlFiles` had a hard 200-file safety cap (sensible given the 30s function timeout), but the CLI exposed no flag to override it and the post-import summary never warned that files were skipped. On a 2167-file tree, users got a silent ~10% import.

Fix:

- `findJsonlFiles` now returns `{ files, truncated, discovered }` so the handler can keep counting beyond the cap without retaining paths
- The handler returns `discovered`, `truncated`, and `maxFiles` in the response
- The CLI accepts `--max-files <N>` (and `--max-files=<N>`)
- The CLI warns on truncation: how many files were skipped, plus a suggested re-run with a larger cap

Adds three regression tests in `test/replay.test.ts` covering: the file's sessionId beats the fallback, two parses of the same file produce the same id, and the fallback is used only when the file has no id.

Thanks @bloodcarter for the precise repros and root-cause pointers.

Closes #202
Closes #203
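The corrected precedence (file-embedded id, then fallback, then a generated id) can be sketched as below. `parseSessionId` and this `generateId` are illustrative stand-ins for the parser internals, not the project's actual signatures:

```typescript
// Sketch only: the real parseJsonlText returns a full ParsedTranscript;
// this reduces it to the sessionId-selection logic described above.
function generateId(prefix: string): string {
  return `${prefix}_${Math.random().toString(36).slice(2, 10)}`;
}

function parseSessionId(jsonlText: string, fallbackSessionId?: string): string {
  let sessionId: string | undefined; // NOT pre-populated from the fallback
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    try {
      const entry = JSON.parse(line);
      // Adopt the file's own id the first time one appears.
      if (entry.sessionId && !sessionId) sessionId = entry.sessionId;
    } catch {
      // Malformed lines are skipped, as in a lenient JSONL reader.
    }
  }
  // Fallbacks apply only after the loop had a chance to find the file's id.
  return sessionId ?? fallbackSessionId ?? generateId("sess");
}
```

Because the fallback is consulted only after the loop, re-importing the same file always yields the same id, which is what makes merge-on-reimport possible.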
📝 Walkthrough

Parser now prefers a JSONL-embedded sessionId over any fallback; CLI …
Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant API as Import API
    participant Scanner as findJsonlFiles
    participant Parser as parseJsonlText
    participant KV as KV Storage
    CLI->>API: POST /agentmemory/replay/import-jsonl\n{ path, maxFiles? }
    API->>Scanner: scan(path, maxFiles || DEFAULT)
    Scanner->>Scanner: walk filesystem\ncount .jsonl, collect up to limit
    Scanner-->>API: { files[], discovered, truncated, traversalCapped, limit }
    loop per file
        API->>Parser: parse(file.content, fallbackSessionId?)
        Parser->>Parser: prefer file's sessionId\nelse generate or use fallback
        Parser-->>API: ParsedTranscript { sessionId, observations }
        API->>KV: upsert session & observations
    end
    API-->>CLI: { imported, sessionIds, discovered, truncated, traversalCapped, maxFiles, maxFilesUpperBound }
    alt truncated == true or traversalCapped == true
        CLI->>CLI: warn user about cap,\nshow discovered/skipped and suggest batching or higher --max-files (≤ upper bound)
    end
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/functions/replay.ts (1)
209-234: ⚠️ Potential issue | 🟡 Minor

Walk no longer short-circuits at the cap.

The previous 200-file limit acted as a hard upper bound on directory traversal, which the comment in #203 cites as protection against the 30s function timeout. Now that `discovered` is counted past `limit`, very large trees (e.g. millions of `.jsonl` files via an accidental scan of a non-Claude directory) will be walked in full even though only `limit` paths are retained. Consider either documenting this trade-off or capping the walk at e.g. `max(limit * 10, 10_000)` so `truncated` is reported but the function still returns within the timeout.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/functions/replay.ts` around lines 209-234, the recursive walker `walk` currently continues traversing the entire tree even after `out` has reached `limit`, because `discovered` keeps counting past `limit`, risking timeouts. Modify `walk` to short-circuit traversal once a configurable cap is reached (e.g. `const traversalCap = Math.max(limit * 10, 10_000)`), stopping when `discovered >= traversalCap`, so you still report `truncated = discovered > out.length` but avoid walking millions of files.
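The reviewer's suggested short-circuit might look like the following sketch. The filesystem is mocked as a nested object and the cap formula comes straight from the comment above; names and return shape are assumptions, not the shipped code:

```typescript
// Mock directory tree: leaves are "file", everything else is a subdirectory.
type Dir = { [name: string]: Dir | "file" };

function findJsonl(root: Dir, limit: number) {
  // Suggested bound: stop walking well before pathological tree sizes.
  const traversalCap = Math.max(limit * 10, 10_000);
  const out: string[] = [];
  let discovered = 0;

  const walk = (dir: Dir, prefix: string): void => {
    for (const [name, node] of Object.entries(dir)) {
      if (discovered >= traversalCap) return; // short-circuit: bounded walk
      if (node === "file") {
        if (!name.endsWith(".jsonl")) continue;
        discovered++;
        if (out.length < limit) out.push(`${prefix}/${name}`); // retain only `limit` paths
      } else {
        walk(node, `${prefix}/${name}`);
      }
    }
  };
  walk(root, "");
  // truncated stays accurate as long as the cap was not hit.
  return { files: out, discovered, truncated: discovered > out.length };
}
```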
🧹 Nitpick comments (3)

src/cli.ts (2)

39-39: Help text only documents the space-separated form.

The CLI accepts both `--max-files <N>` and `--max-files=<N>`, but only the former is mentioned. Worth listing the `=` form too — especially since (per the comment above) it is currently the only form that works reliably alongside a positional path.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/cli.ts` at line 39, update the help string for `--max-files` (currently "Use --max-files <N> to override the 200-file scan cap (default: 200)") to document the equals form as well, e.g. "--max-files <N> or --max-files=<N>", so both invocations are visible.

967-978: Invalid `--max-files` values fall back silently.

`--max-files abc`, `--max-files 0`, or `--max-files -5` all leave `maxFiles` undefined, and the server applies the default 200 with no feedback to the user. A short `p.log.warn("ignoring invalid --max-files value: <x>")` would prevent confused users from re-running with the same broken flag.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/cli.ts` around lines 967-978, the parsing for `--max-files` currently ignores invalid values silently; update both branches (the separate-arg path using `flagIdx` and the `--max-files=...` path using `eqArg`) to emit a warning via `p.log.warn` that includes the original invalid token when `parsed` is NaN, <= 0, or missing, and only assign to `maxFiles` when `parsed` is a positive integer.

src/functions/replay.ts (1)

302-302: Minor: consider validating `maxFiles` is an integer here too.

`src/triggers/api.ts` already rejects non-integer `maxFiles` at the HTTP boundary, but a direct SDK caller (`sdk.trigger`) bypasses that check and could pass `1.5` or `Number.MAX_SAFE_INTEGER`. A small `Number.isInteger(data.maxFiles) && data.maxFiles > 0` guard with a sane upper bound would add defense in depth at this entry point.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/functions/replay.ts` at line 302, validate and clamp the incoming `data.maxFiles` before assigning: accept only values passing `Number.isInteger(data.maxFiles) && data.maxFiles > 0`, enforce a sane upper bound (e.g. <= 1000), and fall back to the existing default (200) if the check fails.
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `src/cli.ts`:
- Around lines 964-978: The positional-arg collection lets standalone flag values (e.g. "500") through, so `pathArg` becomes the numeric value. Update how `nonFlagArgs` is built to skip flag values for the `--max-files` form: instead of filtering only by `a.startsWith("-")`, iterate the args slice and, when you encounter the flag token `--max-files`, skip that token and the immediately following token (the numeric value); also skip tokens that start with `-` or match the `--max-files=...` form. Then set `pathArg` from the remaining non-flag tokens.
… parsing
Five follow-ups from CodeRabbit on the import-jsonl patch:

1. **Walker no longer short-circuits** — after dropping the in-loop limit return, `findJsonlFiles` walked the entire tree even when the user only wanted 200 files, risking 30s timeouts on million-file trees. Added a `traversalCap = max(limit * 10, 10_000)` so `discovered`/`truncated` stay accurate but the walk stops well short of pathological depths.
2. **`maxFiles` accepted floats and unbounded values** — guard the handler with `Number.isInteger` plus an upper bound (1000) so SDK callers can't pass 1.5 or 1_000_000. Out-of-range values fall back to the default 200.
3. **CLI flag value contaminated `pathArg`** — the old `filter(a => !a.startsWith('-'))` left the numeric value of `--max-files 500` as a positional, so `agentmemory import-jsonl --max-files 500` resolved `pathArg` to `"500"`. Replaced the filter with a single pass that consumes the flag's value and handles both the space form and the equals form.
4. **CLI swallowed invalid `--max-files` values silently** — emit a `p.log.warn` with the offending token when parsing fails so the user sees why their flag was ignored.
5. **Help text only documented the space form** — added the equals form (`--max-files=<N>`) so both invocations are visible in `--help`.
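Items 3 and 4 can be sketched as a single-pass scan; `splitArgs` is a made-up helper name, and the real CLI handles more flags and logs through `p.log.warn` rather than collecting strings:

```typescript
// One pass that consumes --max-files's value token (space and equals forms)
// and records a warning for invalid values instead of ignoring them.
function splitArgs(args: string[]) {
  const positionals: string[] = [];
  const warnings: string[] = [];
  let maxFiles: number | undefined;

  const accept = (token: string | undefined) => {
    const parsed = Number(token);
    if (Number.isInteger(parsed) && parsed > 0) maxFiles = parsed;
    else warnings.push(`ignoring invalid --max-files value: ${token}`);
  };

  for (let i = 0; i < args.length; i++) {
    const a = args[i];
    if (a === "--max-files") accept(args[++i]); // consume the value token too
    else if (a.startsWith("--max-files=")) accept(a.slice("--max-files=".length));
    else if (!a.startsWith("-")) positionals.push(a);
  }
  return { positionals, maxFiles, warnings };
}
```

Consuming the value token in the same pass is what keeps `"500"` out of the positionals, fixing the `pathArg` contamination from item 3.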
Actionable comments posted: 3
🧹 Nitpick comments (1)

src/functions/replay.ts (1)

203-242: Note: `discovered` is capped at `traversalCap`, not a true total.

The walk stops when `discovered >= traversalCap` (e.g. 2,000 for a 200-file limit), so for very large trees the reported `discovered` is an undercount. This is fine for the safety property (bounded latency), but the CLI uses `discovered` to suggest a new `--max-files` value, so on huge trees the suggestion will be too small. Worth a brief comment so future readers don't assume `discovered` is exhaustive, and/or returning a separate flag (e.g. `traversalCapped`) so the CLI can switch its message to "batch by subdirectory" unconditionally in that case.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/functions/replay.ts` around lines 203-242, `findJsonlFiles` currently caps the `discovered` counter at `traversalCap`, which makes `discovered` an undercount on very large trees. Add and return a `traversalCapped` boolean (e.g. `traversalCapped = discovered >= traversalCap`) alongside `files`/`truncated`/`discovered`, update callers to check it when choosing CLI messaging, and add a short inline comment near `traversalCap` explaining that `discovered` may be incomplete when `traversalCapped` is true.
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `src/cli.ts`:
- Line 39: Update the CLI help text for the `--max-files` option to state the upper bound and fallback behavior: mention `MAX_FILES_UPPER_BOUND` (1000) as the maximum allowed value, and that values outside the allowed range will be clamped or fall back to the default (200). Edit the help string shown in the `--max-files` description so it references `MAX_FILES_UPPER_BOUND` and explains that e.g. values > 1000 will be clamped, keeping the text concise and consistent with the logic in the replay handler that enforces the bound.
- Around lines 1087-1095: The warning suggests an unbounded `--max-files` value that can exceed the server hard cap. Update the logic that builds the suggested value in the `json.truncated` branch (where `p.log.warn` is called) to clamp the proposed max to the server-side cap (use the same `MAX_FILES_UPPER_BOUND` constant from replay.ts, or a shared constant), and if `discovered > MAX_FILES_UPPER_BOUND`, recommend batching by subdirectory instead of a larger single `--max-files`. Keep the existing message structure, but replace `Math.max((json.discovered ?? cap) + 100, cap * 2)` with a clamped value no greater than `MAX_FILES_UPPER_BOUND`, and mention batching when even that cap would be insufficient.

In `src/functions/replay.ts`:
- Around lines 309-316: The code currently falls back silently to `MAX_FILES_DEFAULT` when `data.maxFiles` is non-integer or greater than `MAX_FILES_UPPER_BOUND`. Change the logic so valid integer inputs >= 1 are clamped to the upper bound instead of defaulting (e.g. if `Number.isInteger(data.maxFiles) && data.maxFiles > 0` then `maxFiles = Math.min(data.maxFiles, MAX_FILES_UPPER_BOUND)`, else `maxFiles = MAX_FILES_DEFAULT`), and ensure the API response uses this computed `maxFiles` so callers can see the clamped value. Alternatively, if you prefer rejecting out-of-range values, tighten validation where requests are accepted to enforce `<= MAX_FILES_UPPER_BOUND` and return an error instead of silently changing the value.
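The clamped-suggestion logic described for lines 1087-1095 could be sketched as follows; the constant value and message wording are assumptions for illustration:

```typescript
const MAX_FILES_UPPER_BOUND = 1000; // server-side cap referenced in the comment

// Build the truncation hint: clamp any suggested --max-files to the server
// cap, and recommend batching when even the cap cannot cover the tree.
function truncationHint(discovered: number, cap: number): string {
  if (discovered > MAX_FILES_UPPER_BOUND) {
    return "tree exceeds the server cap; batch the import by subdirectory";
  }
  const suggested = Math.min(Math.max(discovered + 100, cap * 2), MAX_FILES_UPPER_BOUND);
  return `re-run with --max-files=${suggested}`;
}
```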
Hi, `findJsonlFiles()` enforces its traversal cap using …

Severity: action required | Category: reliability

How to fix: Cap by visited entries
Found by Qodo. Free code review for open-source maintainers. |
Hi, `runImportJsonl()` treats any non-dash token as a positional path, so values belonging to other flags (e.g. …

Severity: remediation recommended | Category: correctness

How to fix: Skip values for known flags
Found by Qodo code review |
…ries visited
Five follow-ups from CodeRabbit and Qodo on the import-jsonl patch:
1. **Out-of-range maxFiles silently degraded to default** — caller
passing maxFiles > 1000 got a 202 with maxFiles: 200 in the
response, no error. The HTTP layer in src/triggers/api.ts only
validated `Number.isInteger && >= 1` (no ceiling), so the contract
drifted between transport and handler. Now:
- HTTP returns 400 with the bound in the error message
- The SDK-callable handler clamps valid integers to MAX_FILES_UPPER_BOUND
(safety net for non-HTTP callers)
- Constants exported from src/functions/replay.ts so api.ts and
downstream callers share one source of truth
2. **Walker traversal bounded by `discovered`** — `discovered` only
counts .jsonl files. Trees dominated by non-jsonl files
(node_modules, lockfiles) could still walk past the function
timeout. Switched to a separate `walked` counter incremented per
directory entry visited; threshold raised to max(limit*50, 50_000)
to give legitimate trees room. Returns `traversalCapped: boolean`
so callers can distinguish "found more than we showed" from
"stopped walking early for safety".
3. **CLI swallowed flag values from --port and --tools** — the
import-jsonl arg loop only consumed `--max-files`'s value token.
`agentmemory --port 3112 import-jsonl` left "3112" as a positional,
resolving pathArg to a numeric string. Added a VALUE_FLAGS set
(`--port`, `--tools`) that consumes the next token alongside the
flag.
4. **Suggested --max-files in warning could exceed 1000** — the old
message said "Re-run with --max-files=5100" for a 5000-file tree,
but the server caps at 1000, so the user would see the same
warning with no progress. Now clamps the suggestion to
maxFilesUpperBound, and when discovered > upper bound (or
traversalCapped fired), recommends batching by subdirectory
instead.
5. **Help text omitted the upper bound** — added the 1000 ceiling
and the "batch by subdirectory" guidance for trees larger than
the cap.
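Follow-up 2's walked-entries bound can be sketched with a mocked directory tree; the shape and thresholds mirror the description above, not the shipped source:

```typescript
// Mock tree: leaves are "file", everything else is a subdirectory.
type Tree = { [name: string]: Tree | "file" };

function scan(root: Tree, limit: number) {
  // Bound by entries visited, not .jsonl files found, so trees dominated by
  // non-jsonl files (node_modules, lockfiles) still terminate in time.
  const walkCap = Math.max(limit * 50, 50_000);
  const files: string[] = [];
  let discovered = 0; // .jsonl files seen (may undercount when capped)
  let walked = 0;     // every directory entry visited
  let traversalCapped = false;

  const walk = (dir: Tree, prefix: string): void => {
    for (const [name, node] of Object.entries(dir)) {
      if (++walked > walkCap) {
        traversalCapped = true; // "stopped early for safety", not just "found more"
        return;
      }
      if (node === "file") {
        if (!name.endsWith(".jsonl")) continue;
        discovered++;
        if (files.length < limit) files.push(`${prefix}/${name}`);
      } else {
        walk(node, `${prefix}/${name}`);
      }
    }
  };
  walk(root, "");
  return { files, discovered, truncated: discovered > files.length, traversalCapped };
}
```

The separate `traversalCapped` flag is what lets the CLI distinguish "found more than we showed" (`truncated`) from "stopped walking early", and switch its advice to batching by subdirectory in the latter case.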