fix(summarize): chunk large sessions to fit LLM context window#472
Conversation
JSONL-imported sessions can have far more observations than the 500-cap MAX_OBS_PER_SESSION that constrains native sessions. mem::summarize previously built one prompt containing every observation and shipped it as a single LLM call, which exceeded the provider's context window for sessions >~7,000 observations and returned an unhelpful 400 from upstream — silently leaving large bulk-imported sessions out of the semantic tier. Approach: map-reduce inside mem::summarize. - Sessions ≤ SUMMARIZE_CHUNK_SIZE (default 400) take the legacy single-call path with no overhead - Larger sessions are split into chunks, each summarized with the existing per-session prompt in parallel batches of SUMMARIZE_CHUNK_CONCURRENCY (default 6), and partial summaries merged via a new REDUCE_SYSTEM prompt - Per-chunk retry-once on transient parse / provider errors - Persistently-failing chunks are skipped (not propagated) so a flaky chunk doesn't waste 30+ already-completed LLM calls on the same session - Bail with too_many_chunks_skipped only if >50% of chunks fail Companion operator tool: scripts/backfill-imported-sessions.sh walks jsonl-imported sessions and POSTs mem::summarize per session, with project / agent / obs-count filters, cost estimation, and per-failure payload dumping for debugging provider rejections. Validated locally against a real corpus: - 5,392-obs session (14 chunks, c=6): 39s - 10,704-obs session (27 chunks, c=6): 34s - 105,966-obs session (265 chunks, c=50): handler completes server-side and persists - 52-session bulk backfill → 25 new semantic facts + 6 new reflect insights produced by consolidate-pipeline Known limit: iii-engine has a hardcoded 180s function-invocation timeout. Sessions large enough that chunked summarize wallclock exceeds that will return a timeout/500 to the HTTP client even though the handler completes and persists server-side. High-RPM providers (Novita / DeepInfra / DeepSeek typically allow 100+ concurrent) can raise SUMMARIZE_CHUNK_CONCURRENCY to push the cliff well past any realistic session size. True fix is an async-job pattern; left as follow-up. - src/prompts/summary.ts: add REDUCE_SYSTEM + buildReducePrompt - src/functions/summarize.ts: chunking, retry, skip, parallelism - test/summarize.test.ts: 9 cases covering single-call path, chunking, env-override, retry-then-success, persistent skip, too-many-skipped bail, provider error after retry, concurrency - .env.example: document SUMMARIZE_CHUNK_SIZE / _CONCURRENCY - .gitignore: agentmemory-debug/, data-*/ (operator artefacts) - scripts/backfill-imported-sessions.sh: bulk-import backfill tool 9/9 new tests pass; existing tests untouched.
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds map-reduce chunked summarization (configurable chunk size/concurrency, per-chunk retry/skip, reduce merge), integrates it into mem::summarize, adds comprehensive tests, a backfill script to process existing sessions, and documents env/gitignore updates. ChangesChunked summarization with map-reduce
Sequence DiagramsequenceDiagram
participant Handler as mem::summarize handler
participant Chunker as produceSummaryXml
participant Provider as LLM provider
participant Reduce as reduce phase
Handler->>Chunker: compressed observations
alt observations < chunk threshold
Chunker->>Provider: single summary call
Provider-->>Chunker: summary XML
else observations >= chunk threshold
Chunker->>Provider: chunk 1 summarize (with retry)
Chunker->>Provider: chunk 2 summarize (parallel)
Chunker->>Provider: chunk N summarize (parallel batches)
Provider-->>Chunker: partial summaries (skip on failure)
Chunker->>Reduce: partial summaries list
Reduce->>Provider: reduce prompt (merge partials)
Provider-->>Reduce: merged final summary
Reduce-->>Chunker: final summary XML
end
Chunker-->>Handler: response + mode + chunks metadata
Handler->>Handler: persist summary and metadata
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
src/functions/summarize.ts (1)
168-179: 💤 Low valueConsider tracking original indices alongside partials to avoid O(n²) lookup.
Using
partialByIdx.indexOf(p)to recover original indices is O(n) per partial, yielding O(n²) for large sessions with many chunks. For a session with ~250 chunks (100k observations at default chunk size), this becomes ~62k comparisons.♻️ Suggested fix
- const partialByIdx: Array<SessionSummary | null> = new Array(chunks.length).fill(null); + const partialByIdx: Array<{ summary: SessionSummary; idx: number } | null> = new Array(chunks.length).fill(null); for (let batchStart = 0; batchStart < chunks.length; batchStart += concurrency) { const batch = chunks.slice(batchStart, batchStart + concurrency); await Promise.all( batch.map(async (chunk, j) => { const idx = batchStart + j; - partialByIdx[idx] = await summarizeChunkWithRetry( + const summary = await summarizeChunkWithRetry( provider, chunk, sessionId, project, idx, chunks.length, ); + partialByIdx[idx] = summary ? { summary, idx } : null; }), ); } - const skipped = partialByIdx.filter((p) => p === null).length; - const partials = partialByIdx.filter((p): p is SessionSummary => p !== null); + const skipped = partialByIdx.filter((p) => p === null).length; + const partialsWithIdx = partialByIdx.filter((p): p is { summary: SessionSummary; idx: number } => p !== null); // ... skip ratio check unchanged ... - const reduceInput = partials.map((p) => { - const originalIdx = partialByIdx.indexOf(p); + const reduceInput = partialsWithIdx.map(({ summary: p, idx: originalIdx }) => { return { title: p.title, // ... rest unchanged }; });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 168 - 179, The current reduceInput builds obsRangeStart/End by calling partialByIdx.indexOf(p), which is O(n²); instead, preserve each chunk's original index when you create or transform partials and use that stored index here—e.g., ensure each partial object includes an originalIdx (or build partials as tuples [partial, originalIdx]) and then compute obsRangeStart as originalIdx * chunkSize + 1 and obsRangeEnd with Math.min((originalIdx + 1) * chunkSize, compressed.length) using that stored originalIdx; update the code paths that construct partials so they attach originalIdx rather than relying on partialByIdx.indexOf in reduceInput.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/backfill-imported-sessions.sh`:
- Line 151: The script currently constructs local file paths and URLs using
sessionId directly (local file="$DEBUG_DIR/${id}.json" and the URL query param
build), which allows path traversal and breaks queries for special characters;
update the code that sets local file and any URL that uses sessionId to (1)
sanitize or slugify/hex-encode the sessionId before joining with DEBUG_DIR
(e.g., remove/replace path separators and dots or use a safe basename/hex
function) so the resulting filename cannot escape DEBUG_DIR, and (2) URL-encode
the sessionId when building query strings so special characters are
percent-encoded; apply these changes to the variables/expressions that define
local file and the URL/query construction that reference sessionId/id.
- Around line 71-74: The curl calls that check endpoints (e.g., the one hitting
"$URL/agentmemory/livez") must include connection and total timeouts to avoid
indefinite hangs; update each curl invocation in this script (the occurrences
around the livez check and the other spots noted) to add options like
--connect-timeout <seconds> and --max-time <seconds> (or -m) with sensible
defaults (e.g., 5s connect, 15-30s total) so the script fails fast on network
issues and prints the existing error message/exit code as before.
- Around line 188-193: The script currently runs jq unconditionally on resp (the
curl result for POST to /agentmemory/summarize), which can cause set -e to abort
if resp is not valid JSON; fix by first validating resp is JSON (e.g., use jq -e
. <<<"$resp" >/dev/null 2>&1) and if that check fails, set status="false",
err="invalid_json" (or include raw resp), and title="" before attempting any jq
extraction; otherwise proceed to run the existing jq calls that populate status,
err, and title from resp.
In `@test/summarize.test.ts`:
- Around line 28-30: The vi.mock call currently points at "./audit.js" (creating
a test/audit.js mock) but should mock the real source module; update the vi.mock
invocation in test/summarize.test.ts to reference the actual module path used
elsewhere (../src/functions/audit.js) and keep the mock factory (safeAudit:
vi.fn()) intact so imports of safeAudit from the source module are correctly
stubbed.
---
Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 168-179: The current reduceInput builds obsRangeStart/End by
calling partialByIdx.indexOf(p), which is O(n²); instead, preserve each chunk's
original index when you create or transform partials and use that stored index
here—e.g., ensure each partial object includes an originalIdx (or build partials
as tuples [partial, originalIdx]) and then compute obsRangeStart as originalIdx
* chunkSize + 1 and obsRangeEnd with Math.min((originalIdx + 1) * chunkSize,
compressed.length) using that stored originalIdx; update the code paths that
construct partials so they attach originalIdx rather than relying on
partialByIdx.indexOf in reduceInput.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55ab9501-0221-47f4-8e5c-402a33a83668
📒 Files selected for processing (6)
.env.example.gitignorescripts/backfill-imported-sessions.shsrc/functions/summarize.tssrc/prompts/summary.tstest/summarize.test.ts
Four nits flagged by the automated reviewer, all worth fixing:
- scripts/backfill: add curl --connect-timeout + --max-time profiles
(META_CURL_OPTS vs WORK_CURL_OPTS). Metadata reads fail fast and
retry on transient blips; LLM-backed work calls get a wide 30-min
cap and no retry (retrying a half-finished LLM job double-spends).
- scripts/backfill: sanitize sessionId before joining with DEBUG_DIR
in dump_failure() (otherwise a session id containing `/` or `..`
could escape the debug dir). UUIDs in practice, but the server
doesn't enforce that.
- scripts/backfill: switch the observations query to
`--get --data-urlencode "sessionId=$id"` so special characters
can't corrupt the query string.
- scripts/backfill: guard `jq` on summarize + consolidate responses
with `jq -e . </dev/null 2>&1` first. iii's HTTP layer occasionally
returns non-JSON (HTML 5xx, empty body on timeout). Without the
guard, `set -e` aborts the whole backfill loop on a single bad
response — now it logs `invalid_json_response` and moves on.
- test/summarize.test.ts: fix `vi.mock("./audit.js", ...)` path to
`"../src/functions/audit.js"`. The old path resolved to
`test/audit.js` (nonexistent), so the mock was a silent no-op.
Tests passed anyway because `safeAudit` writes to a mocked KV.
9/9 tests still pass; backfill dry-run still resolves the corpus
cleanly.
|
Thanks for the review — pushed 86c0622 addressing all four points:
9/9 tests still pass. |
|
Merged + shipping in v0.9.21. Thanks @efenex — sessions >7k obs were silently failing at the LLM context limit, so this restores the consolidation pipeline for power users. Backfill script under scripts/ is a one-time operator helper for users hitting the pre-fix bug. |
Quality + integration wave. Bundles 11 PRs since v0.9.20: Contributor feature: - #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer) Bug fixes (9): - #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440) - #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456) - #487 Windows hook path quoting (@honor2030, closes #477) - #517 viewer IME composition guard (@jonathanzhan1975) - #472 chunk large sessions for LLM context window (@efenex) - #473 surface lessons in smart-search + diagnose tally (@efenex) - #486 declare all Hermes plugin hooks (@honor2030) - #500 rebuildIndex non-blocking on boot (@efenex) - #504 batched embed in rebuildIndex (25h -> 3h) (@efenex) - #491 cli skip onboarding without tty (@honor2030) Upstream-installer revert: - #546 drop --next workaround now that iii-hq/iii#1660 shipped 1067/1067 tests pass across 95 files.
Summary
JSONL-imported sessions can have far more observations than the 500-cap
MAX_OBS_PER_SESSIONthat constrains native sessions.mem::summarizepreviously built one prompt containing every observation and shipped it as a single LLM call, which exceeded the provider's context window for sessions >~7,000 observations and returned an unhelpful 400 from upstream - silently leaving large bulk-imported sessions out of the semantic tier.This PR makes
mem::summarizechunk + map-reduce when needed, with per-chunk retry, skip on persistent failure, and parallel batching.Problem
MAX_OBS_PER_SESSION=500viamem::observe's guard (src/functions/observe.ts:128).mem::replay::import-jsonlinsrc/functions/replay.ts) writes directly to KV, bypassing the guard. Real-world imports produce sessions with 5k–100k+ observations.mem::summarizethen builds one prompt with every observation and ships it viaprovider.summarize(...). Sessions above ~7k obs exceed the provider's context window; the upstream returns a generic 400; the summary never lands; the session is invisible toconsolidate-pipeline → semantic / reflect.Approach
Map-reduce inside
mem::summarize:SUMMARIZE_CHUNK_SIZE(default 400) take the legacy single-call path with zero behavior change and zero overhead.SUMMARIZE_CHUNK_SIZESUMMARIZE_CHUNK_CONCURRENCY(default 6)REDUCE_SYSTEM/buildReducePromptRobustness
too_many_chunks_skippedonly if >50% of chunks fail — guards against producing a misleadingly thin merged summaryConfiguration
SUMMARIZE_CHUNK_SIZESUMMARIZE_CHUNK_CONCURRENCYValidation (live, against a real bulk-imported corpus)
c=50)consolidate-pipelineKnown limit
The iii-engine has a hardcoded ~180s function-invocation timeout (in the Rust binary; not exposed in
iii-config.yaml). Sessions large enough that chunked summarize wallclock exceeds that will see the HTTP client receive a timeout/500 response, even though the handler completes and persists server-side. RaisingSUMMARIZE_CHUNK_CONCURRENCYto 50+ pushes the cliff past any realistic session size on a generous-RPM provider.The true fix is an async-job pattern (return jobId, poll for result). Out of scope here.
Operator tool
scripts/backfill-imported-sessions.shwalksjsonl-import-tagged sessions and POSTsmem::summarizeper session, with:--project-pattern REGEX,--skip-agents,--tag NAME(filter)--max-obs N,--limit N(scope)--dry-run(preview cost, no POSTs)--debug-on-error(dump failed-session metadata + obs + provider response for diagnosis)Test plan
npx vitest run test/summarize.test.ts- 9 casesSUMMARIZE_CHUNK_SIZEenv override respectedtoo_many_chunks_skippedbailoutSUMMARIZE_CHUNK_CONCURRENCYFiles
src/prompts/summary.ts— addREDUCE_SYSTEM+buildReducePromptsrc/functions/summarize.ts— chunking, retry, skip, parallelism (single-call path unchanged)test/summarize.test.ts— new.env.example— document the two new env vars.gitignore— ignoreagentmemory-debug/anddata-*/(operator artefacts)scripts/backfill-imported-sessions.sh— new, companion operator toolSummary by CodeRabbit