Conversation
Tune compaction config for 2000-token context to support speculative compaction during multi-turn chatbot conversations:

- Add speculativeStartRatio (0.75) to trigger background compaction at 750 tokens
- Set explicit thresholdRatio (0.5) for blocking compaction at 1000 tokens
- Lower reserveTokens from 500 to 400 (chatbot responses are shorter)
- Lower keepRecentTokens from 500 to 350 (~3-4 turns preserved)
- Add contextLimit to the compaction config directly
- Add benchmark script (benchmark.ts) for automated 30-turn memory retention testing with probe questions every 5 turns, a metrics table, and an ASCII context usage chart
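The tuning above can be sketched as a plain config object; the field names mirror the commit message, but the interface shape is an assumption, not the harness's actual `CompactionConfig` type. Note that the speculative trigger (750) is a fraction of the blocking threshold (1000), not of the full context limit.

```typescript
// Illustrative shape only — field names follow the commit message,
// not necessarily the real CompactionConfig type in the harness.
interface CompactionTuning {
  contextLimit: number;
  thresholdRatio: number; // fraction of contextLimit for blocking compaction
  speculativeStartRatio: number; // fraction of the blocking threshold
  reserveTokens: number;
  keepRecentTokens: number;
}

const tuning: CompactionTuning = {
  contextLimit: 2000,
  thresholdRatio: 0.5,
  speculativeStartRatio: 0.75,
  reserveTokens: 400,
  keepRecentTokens: 350,
};

// Derived trigger points: blocking at 0.5 * 2000 = 1000 tokens,
// speculative at 0.75 * 1000 = 750 tokens.
const blockingAt = tuning.thresholdRatio * tuning.contextLimit;
const speculativeAt = tuning.speculativeStartRatio * blockingAt;
```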
Important: Review skipped. Auto reviews are disabled on this repository. Configuration used: Organization UI Review profile (CHILL), Plan: Pro.
Code Review
This pull request introduces a new benchmark script for testing compaction strategies in the minimal agent, along with updated configuration parameters for compaction tuning. The review identified redundant calls to setContextLimit in both the agent and the benchmark script, as this value is already configured during CheckpointHistory instantiation. Additionally, it was noted that compaction parameters are currently duplicated between the agent and the benchmark script, and should be centralized to ensure consistency.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 915d862903
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
2 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/benchmark.ts">
<violation number="1" location="packages/minimal-agent/benchmark.ts:456">
P3: These compaction thresholds (`thresholdRatio`, `speculativeRatio`, `reserveTokens`, `keepRecentTokens`) are duplicated from `index.ts`. Since this benchmark exists to validate the agent's compaction behavior, the values should be imported from a shared location — otherwise a tuning change in `index.ts` silently makes this benchmark test the wrong config.</violation>
<violation number="2" location="packages/minimal-agent/benchmark.ts:603">
P2: Validate `--context-limit` before use. Small/invalid values can make chart `repeat()` counts negative and crash the benchmark.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
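Violation 2 above (small or invalid `--context-limit` values making chart `repeat()` counts negative) can be addressed with a guard before the value reaches any layout math. A minimal sketch — `parseContextLimit` is a hypothetical helper, not code from benchmark.ts:

```typescript
// Hypothetical CLI guard: reject anything that is not a positive integer
// before it can feed negative counts into String.prototype.repeat().
function parseContextLimit(raw: string): number {
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`--context-limit must be a positive integer, got "${raw}"`);
  }
  return value;
}
```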
Increase context budget from 2000 to 4096 tokens and retune thresholds to minimize compaction cycles:

- contextLimit: 2000 → 4096
- reserveTokens: 400 → 512
- keepRecentTokens: 350 → 800 (~8-10 turns preserved)
- thresholdRatio: 0.5 → 0.65 (blocking at 2662 tokens)
- speculativeStartRatio: 0.75 → 0.8 (speculative at 2130 tokens)

Benchmark results (30-turn chatbot, GLM-5):

- Memory retention: 41% → 82% (14/17 probes passed)
- Compaction cycles: 3 → 0 (all 30 turns fit in context)
- All targeted probes (turns 5-20, 30) pass at 100%
- Turn 25 comprehensive probe: 3/6 (model response quality, not context loss)
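The quoted trigger points check out arithmetically, assuming (as in the earlier tuning) that the speculative ratio applies to the blocking threshold rather than to the raw context limit. A worked check, with illustrative names:

```typescript
// Worked arithmetic for the retuned 4096-token config.
const contextLimit = 4096;
const thresholdRatio = 0.65;
const speculativeStartRatio = 0.8;

// Blocking compaction: 0.65 * 4096 ≈ 2662 tokens.
const blockingTokens = Math.round(thresholdRatio * contextLimit);

// Speculative compaction: 0.8 * 2662 ≈ 2130 tokens — a fraction of the
// blocking threshold, not of the full context limit.
const speculativeTokens = Math.round(speculativeStartRatio * blockingTokens);
```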
…esults

Set temperature to 0 in generateText calls to eliminate model nondeterminism. Verified that 82% (14/17) retention at 4096 context is reproducible across runs.
Add --output flag to benchmark for JSON result export. Add visualize.py (matplotlib) that generates 4 charts from JSON results:

- retention_curve: memory retention % vs context size
- token_usage: context token usage over 30 turns
- probe_heatmap: per-probe recall scores
- summary: 3-panel overview

Usage: python3 visualize.py results/*.json --output charts/
Benchmark Visualization — 5-Point Context Size Sweep

Ran the benchmark at 5 context sizes (1500, 2000, 2500, 3000, 4096).

How to reproduce:

```sh
# Run benchmarks
for ctx in 1500 2000 2500 3000 4096; do
  pnpm --filter @plugsuits/minimal-agent benchmark --context-limit $ctx --output results/$ctx.json
done

# Generate charts
python3 packages/minimal-agent/visualize.py results/*.json --output charts/
```

Raw Data
Key Insights
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/visualize.py">
<violation number="1" location="packages/minimal-agent/visualize.py:284">
P2: Validate that `--output` has a following directory argument before indexing, otherwise this crashes with `IndexError` on malformed CLI input.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…+ friendli)

Add --provider flag to benchmark supporting 'anthropic' and 'friendli'. Refactor callModel to accept the LanguageModel interface for provider-agnostic benchmarking. Add @ai-sdk/anthropic dependency.

Opus benchmark result at 4096 context: 94% (16/17) vs GLM-5's 82% (14/17). Key difference: turn 25 comprehensive recall is 6/6 (vs 3/6 with GLM-5), confirming the 82% ceiling was model response quality, not compaction.
Model Comparison: GLM-5 vs Claude Opus 4

Results

Analysis

The 82% ceiling was indeed model quality, not compaction. With the same 4096-token context and same compaction config:

Key Insight

Opus's longer, more thorough responses (~2483 peak tokens) actually triggered speculative compaction at turn 25, while GLM-5's shorter responses (~2291 peak) never triggered it. Despite this compaction handicap, Opus still scored 12pp higher — proving that model capability, not context management, is the dominant factor in memory retention.

Benchmark command:

```sh
pnpm --filter @plugsuits/minimal-agent benchmark \
  --provider anthropic --model claude-opus-4-20250514 \
  --output results/opus-4096.json
```
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/benchmark.ts">
<violation number="1" location="packages/minimal-agent/benchmark.ts:653">
P2: Validate and normalize `--provider` before branching; invalid values currently default silently to Friendli.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
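The violation above (invalid `--provider` values silently defaulting to Friendli) suggests an allowlist check before branching. A sketch under the assumption that only 'friendli' and 'anthropic' are supported; names are illustrative, not the benchmark's actual code:

```typescript
// Hypothetical allowlist guard for the --provider CLI flag.
const SUPPORTED_PROVIDERS = ["friendli", "anthropic"] as const;
type Provider = (typeof SUPPORTED_PROVIDERS)[number];

function parseProvider(raw: string): Provider {
  const normalized = raw.trim().toLowerCase();
  if (!(SUPPORTED_PROVIDERS as readonly string[]).includes(normalized)) {
    // Fail loudly instead of silently falling back to a default provider.
    throw new Error(
      `--provider must be one of ${SUPPORTED_PROVIDERS.join(", ")}, got "${raw}"`,
    );
  }
  return normalized as Provider;
}
```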
… chatbot

Replace the generic system prompt with a fact-retention-aware version that instructs the model to remember personal information and list ALL known facts when asked to recall.

Replace the code-agent-oriented compaction prompt (Files & Changes, Technical Discoveries) with a chatbot-specific version that prioritizes:

- User Profile extraction (all personal details as bullet points)
- Conversation Highlights (topics, advice, decisions)
- Current Topic (for continuity)

Impact on memory retention (GLM-5):

- 2000 tokens: 53% → 71% (+18pp, compaction preserves user facts)
- 4096 tokens: 82% → 82% (no change, compaction not triggered)
System Prompt + Compaction Prompt Optimization

Changes

System prompt — added fact-retention instructions. Compaction prompt — replaced the code-agent template (Files & Changes, Technical Discoveries) with a chatbot-specific template.

Results

Analysis

The +18pp improvement at 2000 tokens is significant — it proves the chatbot compaction prompt preserves user facts much better than the code-agent default. The compaction summary now produces a "User Profile" section that explicitly lists all personal details. At 4096 tokens there is no change because compaction never triggers (all 30 turns fit in context). The improvement only manifests when compaction actually runs.

Cumulative improvement path
…rison

Extend the conversation from 30 to 50 turns (10 probes total) to force compaction at 4096 context. Add --baseline flag to benchmark for A/B testing against the default code-agent compaction prompt.

Key result at 4096 context, 50 turns:

- Chatbot prompt: turn 35 (post-compaction) scores 4/4 on pet recall
- Baseline prompt: turn 35 scores 0/4 — pet info lost in compaction
- Overall: 54% vs 51% (chatbot vs baseline)
…nalysis

Upgrade CHATBOT_COMPACTION_PROMPT with techniques learned from Claude Code:

1. Analysis scratchpad: `<analysis>` block for think-before-summarize (the harness already strips it; only `<summary>` content is kept)
2. All User Messages list: explicit section preserving the user intent trail
3. Previous-summary fact preservation: carry forward ALL facts from the prior compaction; never drop information across cycles
4. Partial-compact awareness: focus the summary on older messages, since recent ones are preserved separately via keepRecentTokens

50-turn benchmark at 4096 context (GLM-5):

- Before: 54% (20/37)
- After: 62% (23/37) — +8pp improvement
- Turn 40 (post-compaction recall): 2/4 → 4/4 (perfect)
3 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/index.ts">
<violation number="1" location="packages/minimal-agent/index.ts:30">
P2: Requiring an `<analysis>` section wastes the summarizer’s limited output budget and can reduce the amount of useful summary content retained.</violation>
<violation number="2" location="packages/minimal-agent/index.ts:44">
P1: The prompt asks to list every user message, but compaction instructions are injected as a user turn, so internal control text can be incorrectly persisted into summaries.</violation>
</file>
<file name="packages/minimal-agent/benchmark.ts">
<violation number="1" location="packages/minimal-agent/benchmark.ts:35">
P2: Exclude the internal compaction instruction from the “All User Messages” list; otherwise the model is likely to summarize the compaction prompt itself (it is sent as a user message), which bloats and pollutes the summary.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Three new shared compaction modules inspired by Claude Code's context management architecture:

1. CompactionCircuitBreaker (compaction-circuit-breaker.ts)
   - Tracks consecutive compaction failures; opens after 3 (configurable)
   - Auto-closes after a cooldown period (default 60s)
   - Prevents infinite retry loops on irrecoverable context overflow
2. microCompactMessages (micro-compact.ts)
   - Pre-compaction step that shrinks old long assistant responses
   - Protects recent messages (configurable token window)
   - Preserves user messages and summary messages
   - Immutable: returns a new array without modifying the input
3. SessionMemoryTracker (session-memory.ts)
   - Structured key-value memory persisting across compaction cycles
   - Categorized facts: identity, preferences, relationships, context
   - getStructuredState() callback for CompactionConfig integration
   - extractFactsFromSummary() to parse the User Profile from summaries
   - JSON serialization for persistence (toJSON/fromJSON)

All modules exported from @ai-sdk-tool/harness. 21 test files, 399 tests passing.
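The circuit-breaker behavior described in item 1 — open after N consecutive failures, auto-close after a cooldown — can be sketched as follows. This is a minimal illustration, not the real `CompactionCircuitBreaker` class; the injectable clock is an assumption added to make the cooldown testable.

```typescript
// Minimal sketch of the described behavior: opens after N consecutive
// failures, auto-closes after a cooldown. Not the harness implementation.
class CircuitBreakerSketch {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxConsecutiveFailures = 3,
    private readonly cooldownMs = 60_000,
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: auto-close and reset the failure count.
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxConsecutiveFailures) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `isOpen()` before attempting compaction and skip the attempt while the circuit is open, preventing the infinite retry loop the commit message mentions.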
7 issues found across 12 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/charts/1500.json">
<violation number="1" location="packages/minimal-agent/charts/1500.json:255">
P2: This benchmark artifact is mislabeled or generated with the wrong context setting: `peakTokens` and per-turn `contextAfter` values exceed the declared `contextLimit` (1500), so the chart data is unreliable.</violation>
</file>
<file name="packages/harness/src/session-memory.ts">
<violation number="1" location="packages/harness/src/session-memory.ts:286">
P2: Guard non-object entries before normalizing imported facts to prevent runtime crashes when JSON contains `null` or other invalid items.</violation>
</file>
<file name="packages/harness/src/compaction-circuit-breaker.ts">
<violation number="1" location="packages/harness/src/compaction-circuit-breaker.ts:28">
P2: `maxConsecutiveFailures` can become `0` for fractional inputs (e.g. `0.5`), which breaks threshold semantics and can open the circuit prematurely.</violation>
</file>
<file name="packages/minimal-agent/charts/2000.json">
<violation number="1" location="packages/minimal-agent/charts/2000.json:158">
P2: The compaction delta text is inconsistent with the recorded context values (`1775` to `123`), so this chart entry reports an incorrect compaction magnitude.</violation>
</file>
<file name="packages/minimal-agent/charts/3000.json">
<violation number="1" location="packages/minimal-agent/charts/3000.json:254">
P2: `summary.compactionCycles` is inconsistent with the recorded compaction events/token reset and should not be 0.</violation>
</file>
<file name="packages/minimal-agent/charts/2500.json">
<violation number="1" location="packages/minimal-agent/charts/2500.json:172">
P2: The compaction delta annotation is inconsistent with the recorded context totals (`1991 -> 206`), so the `(-285)` value is incorrect/misleading.</violation>
<violation number="2" location="packages/minimal-agent/charts/2500.json:254">
P2: `summary.compactionCycles` is inconsistent with the recorded compaction events and should not be `0`.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Connect all 3 Phase 1 modules into the compaction pipeline:

Circuit Breaker → CompactionOrchestrator:
- New optional circuitBreaker param in the constructor
- checkAndCompact() skips when the circuit is open
- recordSuccess/recordFailure on compaction outcome
- manualCompact() ignores the circuit breaker (user intent)
- getState() exposes circuitBreakerOpen status

MicroCompact → CheckpointHistory:
- New microCompact option in CompactionConfig (boolean or options)
- Pre-compaction step: shrinks old assistant responses before summarization
- Reduces summarizer input tokens → better summary quality
- COMPACTION_DEBUG logging for tokensSaved/messagesModified

Session Memory → minimal-agent:
- SessionMemoryTracker instance wired via the getStructuredState callback
- extractFactsFromSummary called on compaction completion
- Structured user profile injected into every compaction prompt

TUI + Headless compactionCallbacks:
- Both runners now accept compactionCallbacks in config
- Chains external callbacks with internal ones (both fire)
- Enables minimal-agent to hook into the compaction lifecycle

Adaptive Thresholds → harness (from CEA):
- Moved computeAdaptiveThresholdRatio, computeCompactionMaxTokens, computeSpeculativeStartRatio from CEA to harness/compaction-policy
- CEA now imports from harness (backwards-compatible re-exports)
Phase 3 benchmarks complete. 2k: 57% (21/37, 4 compactions). 4k: 62% (23/37). 8k: 62% (23/37). Circuit Breaker + MicroCompact + Session Memory all verified working.
2 issues found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/index.ts">
<violation number="1" location="packages/minimal-agent/index.ts:160">
P2: Reset the SessionMemoryTracker on "new-session"; otherwise facts extracted from a previous session will keep being injected into compaction prompts for the new session.</violation>
</file>
<file name="packages/harness/src/compaction-policy.ts">
<violation number="1" location="packages/harness/src/compaction-policy.ts:147">
P2: Cap the compaction max tokens to the usable budget; the current 1024 minimum can exceed small context lengths or over-reserved budgets.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
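The compaction-policy violation above suggests capping the compaction max tokens at the usable budget so a fixed 1024-token floor cannot exceed small or over-reserved contexts. A sketch of that cap — the function name, the halving heuristic, and the 1024 floor are illustrative assumptions, not the harness's actual `computeCompactionMaxTokens` logic:

```typescript
// Hypothetical cap: keep a 1024-token floor for normal contexts, but
// never allow the result to exceed what the budget can actually hold.
function cappedCompactionMaxTokens(
  contextLimit: number,
  reserveTokens: number,
): number {
  const usableBudget = Math.max(0, contextLimit - reserveTokens);
  const floored = Math.max(1024, Math.floor(usableBudget / 2));
  return Math.min(floored, usableBudget); // the cap the review asks for
}
```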
Enable all 3 harness compaction features in minimal-agent:

- CircuitBreaker: passed to TUI/headless via a new config option
- MicroCompact: enabled via microCompact: true in the compaction config
- SessionMemory: already wired (getStructuredState + extractFactsFromSummary)

Also expose the circuitBreaker option in the TUI and headless runner configs so any consuming agent can pass one through.
…tool result clearing

BackgroundMemoryExtractor system (Claude Code's #1 missing feature):

- BackgroundMemoryExtractor class: periodic LLM-based memory extraction with configurable thresholds (token growth + turn count), a single-flight guard, and getStructuredState() for compaction integration
- MemoryStore interface with InMemoryStore and FileMemoryStore impls
- Two built-in presets: CHAT_MEMORY_PRESET (user facts) and CODE_MEMORY_PRESET (Claude Code-style session notes)

Tool Result MicroCompact extension:

- clearToolResults option to replace old tool_result content
- keepRecentToolResults to preserve the N most recent results
- clearableToolNames filter for selective clearing
- Complements the existing assistant text shrinking

24 test files, 418 tests passing.
1 issue found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/harness/src/micro-compact.ts">
<violation number="1" location="packages/harness/src/micro-compact.ts:243">
P2: Tool result clearing ignores `protectRecentTokens`, so recent tool outputs can be wiped even though they’re supposed to be preserved. Consider scoping tool-result clearing to the same protected window used for assistant compaction.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…agents

Integration of new harness modules into consuming packages:

minimal-agent:
- Replace SessionMemoryTracker with BackgroundMemoryExtractor (chat preset)
- Aggressive thresholds for small contexts (300 tokens, 2 turns)
- Fire-and-forget onTurnComplete for non-blocking extraction

benchmark:
- Same BME integration as the agent for fair comparison
- Each turn triggers an extraction check

CEA:
- Enable tool result clearing: microCompact.clearToolResults = true
- Keep the 5 most recent tool results intact

TUI + headless:
- New onTurnComplete callback in the config interface
- Called after each model turn with messages + usage
- Non-blocking: doesn't delay the main agent loop
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/index.ts">
<violation number="1" location="packages/minimal-agent/index.ts:168">
P2: `/new-session` only resets the session ID, but the new BackgroundMemoryExtractor + InMemoryStore are process‑global. That keeps previous user facts in structured state and can leak memory into the next session. Reset or recreate the extractor/store when a new session starts.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Fix BackgroundMemoryExtractor returning template text via getStructuredState before any extraction has occurred. This wasted tokens in small contexts (2000 tokens: 65% → 46% regression).

Changes:
- getStructuredState returns undefined until the first successful extraction
- Raise default thresholds: minTokenGrowth 300 → 500, minTurns 2 → 5
- Cap maxExtractionTokens at 500 for the chat preset
- Update test expectation for the pre-extraction state
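The core of the fix — report no structured state until something has actually been extracted — can be sketched in a few lines. Class and method names here are illustrative, not the real `BackgroundMemoryExtractor` API:

```typescript
// Minimal sketch of the undefined-until-first-extraction guard.
class ExtractorSketch {
  private lastExtraction: string | undefined;

  // Returning undefined (rather than an empty template) keeps template
  // boilerplate out of compaction prompts, which is what caused the
  // 65% → 46% regression at 2000 tokens.
  getStructuredState(): string | undefined {
    return this.lastExtraction;
  }

  recordExtraction(summary: string): void {
    if (summary.trim().length > 0) this.lastExtraction = summary;
  }
}
```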
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/harness/src/background-memory-extractor.ts">
<violation number="1" location="packages/harness/src/background-memory-extractor.ts:89">
P2: `getStructuredState()` now hides previously persisted memory after startup because it depends on a flag that is only set by new extractions.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
@minpeter I have started the AI code review. It will take a few minutes to complete.
@codex review
…iles

The ultracite formatter requires trailing newlines on JSON files; CI lint was failing on 10 chart files missing them.
@crb review |
18 issues found across 105 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/minimal-agent/charts/3000.json">
<violation number="1" location="packages/minimal-agent/charts/3000.json:193">
P2: `spec applied` is attached to the wrong turn: the context reset happens at turn 26, but the event is logged at turn 27, which makes this benchmark trace inconsistent for downstream analysis/visualization.</violation>
</file>
<file name="packages/headless/src/stream-processor.ts">
<violation number="1" location="packages/headless/src/stream-processor.ts:232">
P1: `tool-call` events can be lost because calls are only recorded when a pending entry already exists.</violation>
</file>
<file name="packages/minimal-agent/package.json">
<violation number="1" location="packages/minimal-agent/package.json:18">
P3: `@ai-sdk/anthropic` is benchmark-only usage but was added to runtime `dependencies`; move it to `devDependencies` to avoid unnecessary production dependency footprint.</violation>
</file>
<file name="packages/cea/benchmark/tasks/compaction-stress-long/tests/test.sh">
<violation number="1" location="packages/cea/benchmark/tasks/compaction-stress-long/tests/test.sh:2">
P2: Enable `pipefail` so failures in piped commands are not silently ignored during verification.</violation>
</file>
<file name="packages/cea/benchmark/cea-memory-bench.sh">
<violation number="1" location="packages/cea/benchmark/cea-memory-bench.sh:44">
P2: Do not swallow benchmark execution failures with `|| true`; it masks real run errors and can produce invalid summary data.</violation>
</file>
<file name="packages/minimal-agent/benchmark.ts">
<violation number="1" location="packages/minimal-agent/benchmark.ts:1035">
P2: Use the package env schema instead of reading `process.env` directly; this file currently bypasses the repository’s required validated env pattern.</violation>
</file>
<file name="packages/minimal-agent/BENCHMARK-RESULTS.md">
<violation number="1" location="packages/minimal-agent/BENCHMARK-RESULTS.md:7">
P3: The benchmark configuration is internally inconsistent: it says 16 probes, but reported results use 17 probe/scoring items. Align this count so readers can correctly reproduce and interpret retention metrics.</violation>
</file>
<file name="packages/minimal-agent/visualize.py">
<violation number="1" location="packages/minimal-agent/visualize.py:286">
P2: Validate that `--output` has a following directory argument before reading `args[idx + 1]` to avoid an unhandled IndexError.</violation>
</file>
<file name="packages/harness/src/compaction-orchestrator.ts">
<violation number="1" location="packages/harness/src/compaction-orchestrator.ts:86">
P3: The JSDoc default value is incorrect (`3`), but runtime default is `10`.</violation>
<violation number="2" location="packages/harness/src/compaction-orchestrator.ts:437">
P2: `maxAcceptedCompactionsPerTurn` can become 0 for fractional inputs, which immediately blocks all auto/speculative compactions.</violation>
</file>
<file name="packages/harness/src/compaction-prompts.ts">
<violation number="1" location="packages/harness/src/compaction-prompts.ts:269">
P2: Task-intent extraction sends unbounded user history to the model, which can overflow context and silently disable task-aware compaction on long conversations.</violation>
<violation number="2" location="packages/harness/src/compaction-prompts.ts:281">
P2: Task-intent extraction is scoped to all historical user messages instead of the latest user turn, which can reintroduce stale goals into the summary prompt.</violation>
</file>
<file name="packages/cea/benchmark/harbor_agent.py">
<violation number="1" location="packages/cea/benchmark/harbor_agent.py:24">
P2: `populate_context_post_run` looks for `trajectory.json` in the wrong location relative to where the run command writes it, so metrics extraction can silently stop working.</violation>
</file>
<file name="packages/headless/src/runner.ts">
<violation number="1" location="packages/headless/src/runner.ts:740">
P2: Synchronous throws from `onTurnComplete` bypass the `.catch()` handler and crash the runner. `Promise.resolve(expr)` evaluates `expr` eagerly — a sync throw propagates before the promise is created. Use `Promise.resolve().then(...)` to defer evaluation into a microtask, making both sync throws and async rejections non-fatal.</violation>
</file>
<file name="packages/harness/src/compaction-types.ts">
<violation number="1" location="packages/harness/src/compaction-types.ts:261">
P3: `CompactionRejectionReason` includes unreachable variants that the current acceptance logic can never produce.</violation>
</file>
<file name="packages/minimal-agent/charts/2500.json">
<violation number="1" location="packages/minimal-agent/charts/2500.json:172">
P2: The compaction delta annotation is incorrect for turn 24 (`-285` does not match the `contextAfter` drop from 1991 to 206).</violation>
</file>
<file name="packages/harness/src/compaction-circuit-breaker.ts">
<violation number="1" location="packages/harness/src/compaction-circuit-breaker.ts:27">
P2: `maxConsecutiveFailures` validation allows values like `0.5`, which are floored to `0` and produce an invalid failure threshold.</violation>
</file>
<file name="packages/cea/benchmark/install-agent.sh.j2">
<violation number="1" location="packages/cea/benchmark/install-agent.sh.j2:8">
P2: Validate Node major version, not just presence, so unsupported preinstalled versions are upgraded.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
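The `runner.ts` violation above pins down a subtle pattern: `Promise.resolve(expr)` evaluates `expr` eagerly, so a synchronous throw escapes before any `.catch()` attaches, while `Promise.resolve().then(...)` defers evaluation into a microtask where both sync throws and async rejections become rejections. A sketch of the recommended shape — `fireAndForget` is a hypothetical helper, not the runner's actual code:

```typescript
type TurnCallback = () => void | Promise<void>;

// Deferred invocation: the callback runs inside .then(), so a synchronous
// throw becomes a rejection and lands in .catch() instead of crashing
// the caller.
function fireAndForget(
  cb: TurnCallback,
  onError: (err: unknown) => void,
): void {
  Promise.resolve()
    .then(() => cb())
    .catch(onError);
}
```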
…xture markers

The compaction-stress-search benchmark task seeds a fake codebase for the agent to explore. Two placeholder credentials looked realistic enough to trigger GitGuardian's secret scanner:

- JWT_SECRET = "super-secret-jwt-key-2024-prod"
- ADMIN_DEFAULT_PASSWORD = "admin123!@#"

Replace them with BENCHMARK_FIXTURE_FAKE_* markers that make the fixture nature obvious to secret scanners and future readers.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd79b397c4
CKPT, works well.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9e8f55f2f
…add input validation

- Remove redundant setContextLimit() calls in minimal-agent index.ts and benchmark.ts (the CheckpointHistory constructor already sets contextLimit from the compaction config)
- Extract shared compaction constants into compaction-config.ts to prevent silent drift
- Add --context-limit positive-integer validation in the benchmark CLI
- Add --provider allowlist validation (friendli | anthropic) in the benchmark CLI
- Normalize thresholdRatio in computeContextBudget() to guard against invalid values
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9612710fd8
…y fixes

- install-agent.sh.j2: replace curl pipe with download-then-execute to avoid masking failures
- compact.ts: wrap compact() in try/catch to propagate failure status
- main.ts: use logical OR for ATIF_OUTPUT_PATH to handle empty strings
- cea-memory-bench.sh: remove the || true that swallows benchmark failures; fix grep double-zero
- package.json: move @ai-sdk/anthropic to devDependencies (benchmark-only)
- BENCHMARK-RESULTS.md: fix probe count (16 → 17)
- compaction-orchestrator.ts: fix JSDoc default (3 → 10)
- compaction-types.ts: document currently-unused rejection reason variants
- benchmark/AGENTS.md: fix uniq -c expected output format
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13fde5ff83
@cubic-dev-ai review |
@minpeter I have started the AI code review. It will take a few minutes to complete.
6 issues found across 107 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/harness/src/compaction-types.ts">
<violation number="1" location="packages/harness/src/compaction-types.ts:244">
P3: The acceptance criteria comment contradicts the actual behavior (only `fitsBudget` is enforced). Update the doc so it doesn’t imply belowTriggerThreshold/meetsMinSavings are hard gates.</violation>
</file>
<file name="packages/cea/src/context/system-prompt.ts">
<violation number="1" location="packages/cea/src/context/system-prompt.ts:113">
P2: This new rule contradicts the earlier “preserve path format / never change absolute vs. relative” guidance. If a task specifies a relative path, the prompt now simultaneously requires keeping it relative and converting it to absolute, which is inconsistent.</violation>
</file>
<file name="packages/minimal-agent/index.ts">
<violation number="1" location="packages/minimal-agent/index.ts:90">
P2: Avoid defaulting FRIENDLI_TOKEN to an empty string; fail fast or require a real token so missing credentials are caught early.</violation>
<violation number="2" location="packages/minimal-agent/index.ts:185">
P2: Handle history compaction shrinking the messages list before slicing; otherwise new user messages added in the same turn after compaction can be skipped and their facts never extracted.</violation>
</file>
<file name="packages/minimal-agent/env.ts">
<violation number="1" location="packages/minimal-agent/env.ts:9">
P2: Make FRIENDLI_TOKEN required in the schema so the agent fails fast with a validation error instead of running with an empty API key.</violation>
</file>
<file name="packages/harness/src/checkpoint-history.ts">
<violation number="1" location="packages/harness/src/checkpoint-history.ts:1459">
P1: `collectToolResultEntries` iterates over all messages (`this.messages`) but the token budget is calculated from active messages only. Truncating a pre-summary tool result decreases `remaining` without actually reducing active-context token usage, so the loop may exit early while active tokens still exceed `hardCeiling`.
Scope the scan to active messages, e.g. by computing an active-message start offset and skipping earlier indices.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
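The P1 scoping issue in `checkpoint-history.ts` can be illustrated with a small sketch. The message shape, the token heuristic, and all names except the scan-scoping idea itself are hypothetical; the point is that the scan starts at the active-window offset instead of index 0:

```typescript
// Hypothetical sketch of the suggested fix: collect tool-result entries
// only from the active (post-summary) message window, so truncation
// savings are measured against the same messages as the token budget.
interface Message {
  role: string;
  content: string;
}

interface ToolResultEntry {
  index: number;
  tokens: number;
}

// Assumed rough estimate: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function collectActiveToolResultEntries(
  messages: Message[],
  activeStartIndex: number, // first index of the active context window
): ToolResultEntry[] {
  const entries: ToolResultEntry[] = [];
  // Skip pre-summary messages entirely; truncating them would shrink
  // `remaining` without reducing active-context token usage.
  for (let i = activeStartIndex; i < messages.length; i++) {
    const msg = messages[i];
    if (msg.role === "tool") {
      entries.push({ index: i, tokens: estimateTokens(msg.content) });
    }
  }
  return entries;
}
```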
- checkpoint-history: scope collectToolResultEntries to active messages only
- compaction-types: fix CompactionEffectiveness doc to match actual behavior
- system-prompt: clarify path rule exception for generated scripts
- minimal-agent: guard onTurnComplete slice against compaction-shrunk history
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27f67f2a71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@cubic-dev-ai review

@minpeter I have started the AI code review. It will take a few minutes to complete.
4 issues found across 107 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/cea/benchmark/scorer.py">
<violation number="1" location="packages/cea/benchmark/scorer.py:40">
P2: Guard against `total_prompt_tokens` being 0 before dividing; otherwise this can crash on empty/error trajectories.</violation>
</file>
<file name="packages/harness/src/background-memory-extractor.ts">
<violation number="1" location="packages/harness/src/background-memory-extractor.ts:311">
P1: In incremental mode, when the LLM response contains no `<update>` or `<memory>` tags, `resolveExtractedMemory` falls through to `parseMemoryFromResponse` which accepts the raw LLM text. This overwrites the structured notes store with unstructured conversational text (e.g., "No updates needed."), corrupting the memory. In the incremental path, return `undefined` when no recognized tags are present to preserve the existing notes.</violation>
</file>
<file name="packages/minimal-agent/benchmark.ts">
<violation number="1" location="packages/minimal-agent/benchmark.ts:907">
P2: Don’t record zero-token usage when the provider omits usage data — it forces actual usage to 0 and makes compaction/metrics ignore the real context size. Guard updateActualUsage so it only runs when usage is present (or provide an estimated fallback).</violation>
</file>
<file name="packages/cea/benchmark/tasks/compaction-stress-search/instruction.md">
<violation number="1" location="packages/cea/benchmark/tasks/compaction-stress-search/instruction.md:49">
P2: The output file path in the instructions doesn’t match the verifier. The test expects `/agent/work/audit_report.txt`, but the instructions tell the agent to write to `/work/audit_report.txt`, which will fail validation.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
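The P1 in `background-memory-extractor.ts` amounts to refusing untagged responses in incremental mode rather than falling through to the raw-text parser. A minimal sketch, with the `<update>`/`<memory>` tag format taken from the review comment and everything else assumed:

```typescript
// Hypothetical sketch of the suggested fix: in incremental mode, only
// accept responses carrying a recognized tag; otherwise return undefined
// so the caller preserves the existing structured notes.
function resolveIncrementalMemory(response: string): string | undefined {
  const update = response.match(/<update>([\s\S]*?)<\/update>/);
  if (update) return update[1].trim();

  const memory = response.match(/<memory>([\s\S]*?)<\/memory>/);
  if (memory) return memory[1].trim();

  // No recognized tags: do NOT accept the raw LLM text, which would
  // overwrite the notes store with filler like "No updates needed."
  return undefined;
}
```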
- scorer.py: guard against divide-by-zero when total_prompt_tokens is 0
- env.ts: add ATIF_OUTPUT_PATH to validated env schema
- main.ts: read ATIF_OUTPUT_PATH from env instead of process.env directly
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 219c7923c9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| f"logs_dir contents: {list(self.logs_dir.glob('**/*')) if self.logs_dir.exists() else 'dir not found'}" | ||
| ) | ||
| d = self.logs_dir | ||
| path = Path(os.getenv("ATIF_OUTPUT_PATH", str(d / "trajectory.json"))) |
Read trajectory from the generated agent subdirectory
populate_context_post_run defaults to logs_dir/trajectory.json, but the run command writes trajectories to /logs/agent/trajectory.json via ATIF_OUTPUT_PATH. Since that env var is only set on the executed agent command, it may not be present when this post-run Python method executes, and there is no fallback to logs_dir/agent/trajectory.json; in that common layout the file is missed and benchmark token metrics are silently reported as zero.
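The missing fallback Codex describes could be handled by trying candidate locations in order. A sketch of that logic (the function name, candidate ordering, and injected `exists` predicate are assumptions; the actual hook lives in Python, but the idea is the same):

```typescript
// Hypothetical sketch: resolve the trajectory file by preferring
// ATIF_OUTPUT_PATH, then logs_dir/trajectory.json, then the agent
// subdirectory layout the run command actually produces.
// `exists` is injected so the logic is testable without a filesystem.
function resolveTrajectoryPath(
  logsDir: string,
  envOutputPath: string | undefined,
  exists: (path: string) => boolean,
): string | undefined {
  const candidates: string[] = [];
  if (envOutputPath) candidates.push(envOutputPath);
  candidates.push(`${logsDir}/trajectory.json`);
  candidates.push(`${logsDir}/agent/trajectory.json`); // layout written via ATIF_OUTPUT_PATH
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return undefined; // caller can then report metrics as missing, not zero
}
```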
```typescript
Promise.resolve()
  .then(() =>
    config.onTurnComplete?.(
      config.messageHistory.getAll(),
      normalizedUsage
```
Await turn-complete hook before triggering compaction
This callback is launched in a detached promise and execution immediately continues into speculative/blocking compaction. In this PR, CEA wires onTurnComplete to collect read-file restoration items used by compaction-complete restoration; when context pressure triggers immediate compaction, that race lets compaction run before tracking finishes, so the freshly read context is omitted from restoration and memory continuity degrades.
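The ordering fix Codex suggests is simply to await the hook before any compaction decision runs. A sketch under assumed types (only `onTurnComplete` comes from the PR; the surrounding shape is hypothetical):

```typescript
// Hypothetical sketch: await the turn-complete hook before compaction so
// restoration tracking finishes before compaction can discard freshly
// read context.
type Usage = { promptTokens: number; completionTokens: number };

interface TurnConfig {
  onTurnComplete?: (messages: string[], usage: Usage) => Promise<void> | void;
}

async function finishTurn(
  config: TurnConfig,
  messages: string[],
  usage: Usage,
  maybeCompact: () => Promise<void>,
): Promise<void> {
  // Previously launched as a detached promise; awaiting removes the race.
  await config.onTurnComplete?.(messages, usage);
  await maybeCompact();
}
```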
Post-merge follow-up items

Tracked in #97: test coverage gaps, code quality cleanup, and architecture concerns identified during the final merge review (Oracle + automated agents). None are blocking; all CI green, 100/100 threads resolved. See issue for full context and checklist.
Summary
Large-scope stability release for the compaction subsystem. Resolves infinite compaction loops in small-context scenarios, eliminates process crashes caused by unhandled stream rejections, and lands numerous harness/TUI stability fixes. Also ships minimal-agent speculative compaction tuning and new benchmark tasks (the work that originally opened this PR).
Key fixes
🔁 Infinite compaction loop prevention (harness)
- Per-turn compaction cap (`maxAcceptedCompactionsPerTurn`, default 10): combines accepted + ineffective compactions. When the cap is hit, no further compaction runs that turn.
- Only `fitsBudget` failures reject a compaction attempt; `belowTriggerThreshold`/`meetsMinSavings` are observability-only signals.
- `notifyNewUserTurn()` wired from TUI and headless runtime.
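The cap behavior can be sketched with a simple counter; only `maxAcceptedCompactionsPerTurn` and `notifyNewUserTurn()` come from the PR, the class shape is an assumption:

```typescript
// Minimal sketch of per-turn compaction loop prevention.
class CompactionBudget {
  private attemptsThisTurn = 0;

  constructor(private readonly maxAcceptedCompactionsPerTurn = 10) {}

  // Accepted and ineffective compactions both count against the cap.
  recordAttempt(): void {
    this.attemptsThisTurn++;
  }

  canCompact(): boolean {
    return this.attemptsThisTurn < this.maxAcceptedCompactionsPerTurn;
  }

  // Called from the TUI / headless runtime when a new user turn starts.
  notifyNewUserTurn(): void {
    this.attemptsThisTurn = 0;
  }
}
```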
Crashes with `Unhandled rejection: NoOutputGeneratedError` when the provider returns an empty stream (or the network fails):
🔧 Harness compaction stability (numerous commits from earlier work)
📊 minimal-agent speculative compaction (original PR scope)
Enable speculative compaction in minimal-agent with thresholds tuned for 30-turn chatbot conversations within a 4096-token context window:
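As a sketch of the config shape, the values below come from an earlier tuning commit in this PR that targeted a 2000-token context (the final benchmark ran with a 4096 window); the interface shape, and the reading that `speculativeStartRatio` composes against the blocking threshold, are assumptions inferred from the 750 / 1000 trigger points in that commit:

```typescript
// Sketch of the tuned compaction config; field names come from the PR.
interface CompactionConfig {
  contextLimit: number;
  speculativeStartRatio: number; // background trigger, relative to blocking threshold (assumed)
  thresholdRatio: number; // blocking trigger, relative to context limit
  reserveTokens: number;
  keepRecentTokens: number;
}

const compaction: CompactionConfig = {
  contextLimit: 2000, // earlier tuning pass; final benchmark used 4096
  speculativeStartRatio: 0.75,
  thresholdRatio: 0.5,
  reserveTokens: 400, // chatbot responses are short
  keepRecentTokens: 350, // roughly 3-4 recent turns preserved
};

// Derived trigger points.
const blockingAt = compaction.contextLimit * compaction.thresholdRatio;
const speculativeAt = blockingAt * compaction.speculativeStartRatio;
```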
🎨 TUI & headless
📦 Benchmark additions
Version bumps (via changesets)
CEA gets the minor bump because the user-facing feature set (task-aware compaction, loop prevention, crash fixes) surfaces through CEA's runtime. Harness/tui/headless changes are internal enhancements that preserve their public API contracts.
Benchmark results (minimal-agent, 4096 context)
At 4096, all 30 turns fit without compaction (~2300 tokens peak) — probe scores reflect model response quality, not information loss.
Testing
Related