Merged
64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,70 @@

All notable changes to `@inceptionstack/roundhouse` are documented here.

## [0.5.38] — 2026-05-16

### Fixed
- **Soft-reset pre-turn gap.** Idle sessions that grew via background work
(cron jobs, boot turn, sub-agent results) could cross the provider's context
limit without ever tripping the proactive `softTokens`/`hardTokens`
thresholds while live. The next user turn called `agent.prompt()` directly
and overflowed; the gateway catch posted the raw `prompt is too long: N >
200000` error with no classification or recovery, and the loop persisted
until manual surgery on the jsonl. The v0.5.29–v0.5.32 soft-reset machinery
only fired from `flushMemoryThenCompact`'s catch (i.e. when *compact itself*
overflowed), not from a normal user-prompt overflow. Concrete evidence on
the maintainer's machine: `~/.roundhouse/sessions/main` jsonl reached 2.8 MB
with zero `"main"` entries in `compact-timing.jsonl` between
2026-05-14 and 2026-05-16.
- **Fix:** classify `agent.prompt()` / `agent.promptStream()` exceptions in the
gateway catch via the existing `isContextOverflowError`. On overflow, call
`agent.softReset(threadId)` (extracted into a shared
`recoverFromContextOverflow` helper, also used by the v0.5.32 compact-time
path). On success, set `forceInjectReason="after-soft-reset"` and clear
`pendingCompact`; on no-op or failure with `agent.compact` available, arm
`pendingCompact="emergency"` so the existing pre-turn branch fires on the
user's next message. UX is deferred-retry only — same-turn replay would
duplicate streamed text and re-execute side-effecting tools. Background
turns (boot/subagent) get distinct copy that doesn't ask a user to
resend. Telemetry: one line per gateway-side recovery in
`compact-timing.jsonl` with `level: "gateway-overflow"`.
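  The deferred-retry flow above can be sketched as follows. `isContextOverflowError`, `softReset`, and `pendingCompact` mirror names from this release; the overflow regex, the `SessionState` shape, and the return labels are illustrative assumptions, not the actual implementation:

  ```typescript
  // Sketch of the v0.5.38 gateway-catch classification (deferred retry only).

  interface SessionState {
    forceInjectReason?: string;
    pendingCompact?: "emergency";
  }

  interface RecoverableAgent {
    softReset?: (threadId: string) => Promise<{ reset: boolean }>;
    compact?: (threadId: string) => Promise<void>;
  }

  // Assumed classifier: providers report overflow as "prompt is too long: N > M".
  function isContextOverflowError(err: unknown): boolean {
    const msg = err instanceof Error ? err.message : String(err);
    return /prompt is too long/i.test(msg);
  }

  async function handleTurnError(
    err: unknown,
    threadId: string,
    state: SessionState,
    agent: RecoverableAgent,
  ): Promise<"not-overflow" | "recovered" | "deferred-compact" | "unsupported"> {
    if (!isContextOverflowError(err)) return "not-overflow";

    const report = await agent.softReset?.(threadId);
    if (report?.reset) {
      // Recovery succeeded: durable memory re-injects on the next turn.
      state.forceInjectReason = "after-soft-reset";
      delete state.pendingCompact;
      return "recovered";
    }
    if (agent.compact) {
      // Soft reset was a no-op or unavailable: arm the pre-turn emergency
      // compact so the user's NEXT message triggers it (deferred retry).
      state.pendingCompact = "emergency";
      return "deferred-compact";
    }
    return "unsupported";
  }
  ```

  Same-turn replay is deliberately avoided here: retrying inside the catch would duplicate already-streamed text and re-run side-effecting tools.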
- **Streaming path coverage (post-review F1).** pi-ai's streaming surfaces
provider errors as `model_error` *events*, not thrown exceptions — so the
initial fix above only caught synchronous-throw overflow. On Telegram
(streaming-default), streamed `prompt is too long` still bypassed recovery:
`gateway/streaming.ts` posted the raw error and the for-await loop returned
normally. Per codex-cli design (option a, refined): classify the
`model_error` message in `streaming.ts`. Non-overflow keeps today's inline
`⚠️ Agent error:` post + continue-loop. Overflow flushes, suppresses the
inline raw post, and throws a typed `StreamModelOverflowError` so the
gateway catch routes through `recoverFromAgentTurnOverflow` exactly like
synchronous-throw overflow. Single recovery surface, no duplicate posts,
no flag plumbed through the `StreamResult` contract.
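  A minimal sketch of this routing, with the stream-event shape and the error class modeled on the description above (both are assumptions, not the actual pi-ai types):

  ```typescript
  // Typed error so the gateway catch can route streamed overflow through the
  // same recovery path as a synchronous throw (single recovery surface).
  class StreamModelOverflowError extends Error {
    constructor(message: string, public readonly hadVisibleText: boolean) {
      super(message);
      this.name = "StreamModelOverflowError";
    }
  }

  // Simplified stand-in for the streaming event union.
  type StreamEvent =
    | { type: "text_delta"; text: string }
    | { type: "model_error"; message: string };

  function isContextOverflowMessage(msg: string): boolean {
    return /prompt is too long/i.test(msg); // assumed provider wording
  }

  async function consumeStream(
    stream: AsyncIterable<StreamEvent>,
    post: (text: string) => void,
  ): Promise<{ hadVisibleText: boolean }> {
    let hadVisibleText = false;
    for await (const ev of stream) {
      if (ev.type === "text_delta") {
        hadVisibleText = hadVisibleText || ev.text.length > 0;
        post(ev.text);
      } else if (isContextOverflowMessage(ev.message)) {
        // Overflow: suppress the inline raw post and escalate as a typed
        // exception for the gateway-side recovery helper.
        throw new StreamModelOverflowError(ev.message, hadVisibleText);
      } else {
        // Non-overflow keeps today's inline post and continues the loop.
        post(`⚠️ Agent error: ${ev.message}`);
      }
    }
    return { hadVisibleText };
  }
  ```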
- **Code-review polish (F2–F6).** Removed dead `"cron"` from the `TurnSource`
union (cron jobs run via `cron/runner.ts` in their own session and never
reach `Gateway.handleAgentTurn`). Replaced the raw provider error in the
`unsupported`-recovery branch with explicit guidance (`⚠️ Session full —
adapter doesn't support automatic recovery. Run /compact manually or
restart session.`). Extracted `appendCompactLog` + `CompactLogEntry` to a
new `src/memory/telemetry.ts` to remove the gateway→memory cross-domain
import; `lifecycle.ts` re-exports for back-compat. De-duplicated
`MAX_ERROR_PREVIEW = 200` (gateway.ts copy was unused after the v0.5.38
catch refactor; deleted). Replaced bare `slice(0, 100)` magic number with
`MAX_FAILURE_REASON_PREVIEW`.
- 26 new tests across `test/overflow-recovery.test.ts` (helper-level
classify/recover/no-op/failed/cause-chain),
`test/gateway-overflow-recovery.test.ts` (gateway-level state writes,
pendingCompact arming, streaming partial-text branch, background-turn
copy, post-throw resilience, F3 unsupported-guidance regression), and
`test/streaming-overflow.test.ts` (F1: model_error overflow throws,
non-overflow inline post regression, end-to-end streaming→recovery for
both clean and partial-text turns). **591 tests passing** (+26 net).
- Design doc: `docs/design/v0.5.38-soft-reset-pre-turn-gap.md` (codex-cli
Alternative D — shared reactive recovery helper, deferred retry,
pendingCompact fallback).
- F1 design: `~/.roundhouse/workspace/softreset-f1-codex-design.md`
(codex-cli option (a) refined — typed `StreamModelOverflowError`).

## [0.5.37] — 2026-05-16

### Fixed
290 changes: 290 additions & 0 deletions docs/design/v0.5.38-soft-reset-pre-turn-gap.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@inceptionstack/roundhouse",
"version": "0.5.37",
"version": "0.5.38",
"type": "module",
"description": "Multi-platform chat gateway that routes messages through a configured AI agent",
"license": "MIT",
83 changes: 83 additions & 0 deletions src/agents/shared/overflow-recovery.ts
@@ -0,0 +1,83 @@
/**
* agents/shared/overflow-recovery.ts — Reactive context-overflow recovery helper
*
* Used by:
* - src/memory/lifecycle.ts: catch in flushMemoryThenCompact (compact itself overflowed)
* - src/gateway/gateway.ts: catch around agent.prompt/agent.promptStream
* (the live session was already past the limit before this turn even started — typically
* after idle background growth via cron/boot/sub-agents)
*
* Pure agent-error → agent-action helper. Memory-state effects
* (forceInjectReason="after-soft-reset", clearing pendingCompact, arming
* pendingCompact="emergency") are the CALLER's responsibility, because the two
* call sites need different fallback semantics:
* - lifecycle: re-arms pendingCompact at whatever level was failing
* - gateway: only arms pendingCompact="emergency" when agent.compact exists
* and softReset didn't recover (so the next pre-turn branch fires)
*
* Returns a discriminated outcome rather than {attempted, succeeded} so callers
* can branch precisely.
*/

import type { AgentAdapter } from "../../types";
import type { SoftResetReport } from "./session-soft-reset";
import { isContextOverflowError } from "./error-classifiers";

export type OverflowRecoveryOutcome =
  | { kind: "not-overflow" }
  | { kind: "unsupported" }                         // agent.softReset undefined
  | { kind: "recovered"; report: SoftResetReport }  // softReset returned reset:true
  | { kind: "noop"; reason: string }                // softReset returned reset:false
  | { kind: "failed"; error: string };              // softReset itself threw

/** Max characters of resetErr.message we surface in `failed.error` and onProgress. */
const MAX_RESET_ERROR_PREVIEW = 200;

/**
 * Classify err and, on context-overflow, run agent.softReset to trim the
 * on-disk session jsonl.
 *
 * Emits onProgress("♻️ Session overflowed — soft-resetting to recent turns...")
 * when entering recovery, and one of the v0.5.32 trio (✅/⚠️/❌) on outcome.
 *
 * Does NOT mutate memory state. Caller is responsible for state writes.
 */
export async function recoverFromContextOverflow(
  err: unknown,
  threadId: string,
  agent: AgentAdapter,
  onProgress?: (step: string) => void | Promise<void>,
): Promise<OverflowRecoveryOutcome> {
  if (!isContextOverflowError(err)) {
    return { kind: "not-overflow" };
  }

  if (!agent.softReset) {
    return { kind: "unsupported" };
  }

  try {
    await onProgress?.("♻️ Session overflowed — soft-resetting to recent turns...");
    const report = await agent.softReset(threadId);

    if (report?.reset) {
      console.warn(`[overflow-recovery] soft-reset recovered ${threadId} from overflow`);
      const { entriesBefore, entriesAfter } = report as SoftResetReport;
      const detail =
        typeof entriesBefore === "number" && typeof entriesAfter === "number"
          ? ` (${entriesBefore} → ${entriesAfter} entries)`
          : "";
      await onProgress?.(`✅ Soft-reset complete${detail}. Durable memory will re-inject on next turn.`);
      return { kind: "recovered", report: report as SoftResetReport };
    }

    const reason = (report as { reason?: string } | null)?.reason ?? "unknown";
    console.warn(`[overflow-recovery] soft-reset returned no-op for ${threadId} (${reason})`);
    await onProgress?.(`⚠️ Soft-reset no-op (${reason}). Will retry compact next turn.`);
    return { kind: "noop", reason };
  } catch (resetErr) {
    const msg = resetErr instanceof Error ? resetErr.message : String(resetErr);
    console.error(`[overflow-recovery] soft-reset failed for ${threadId}:`, msg);
    await onProgress?.(`❌ Soft-reset failed: ${msg.slice(0, MAX_RESET_ERROR_PREVIEW)}. Will retry next turn.`);
    return { kind: "failed", error: msg };
  }
}
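
For illustration, a caller-side branch over the discriminated outcome might look like the following. The state writes mirror the gateway semantics described in the header comment; the local type declarations are simplified stand-ins so the sketch is self-contained:

```typescript
// Simplified stand-in for the exported union (report payload omitted).
type OverflowRecoveryOutcome =
  | { kind: "not-overflow" }
  | { kind: "unsupported" }
  | { kind: "recovered" }
  | { kind: "noop"; reason: string }
  | { kind: "failed"; error: string };

// Illustrative memory-state shape.
interface MemoryState {
  forceInjectReason?: string;
  pendingCompact?: "emergency";
}

// Gateway-side state writes: only this caller mutates memory state,
// per the "pure helper, caller owns effects" split above.
function applyGatewayStateWrites(
  outcome: OverflowRecoveryOutcome,
  state: MemoryState,
  hasCompact: boolean,
): void {
  switch (outcome.kind) {
    case "recovered":
      // Durable memory re-injects on the next turn.
      state.forceInjectReason = "after-soft-reset";
      delete state.pendingCompact;
      break;
    case "noop":
    case "failed":
      // Soft reset didn't recover: defer to the existing pre-turn
      // emergency-compact branch, if the adapter can compact at all.
      if (hasCompact) state.pendingCompact = "emergency";
      break;
    case "unsupported":
      // Adapter has no softReset: post manual-recovery guidance instead.
      break;
    case "not-overflow":
      // Fall through to the generic error post.
      break;
  }
}
```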
27 changes: 17 additions & 10 deletions src/gateway/gateway.ts
@@ -15,6 +15,7 @@ import { ROUNDHOUSE_DIR, ROUNDHOUSE_VERSION } from "../config";
import { CronSchedulerService } from "../cron/scheduler";
import { IpcServer, createIpcHandler } from "../ipc";
import { prepareMemoryForTurn, finalizeMemoryForTurn, flushMemoryThenCompact } from "../memory/lifecycle";
import { recoverFromAgentTurnOverflow, type TurnSource } from "./overflow";
import { maxPressure } from "../memory/policy";
import type { PressureLevel } from "../memory/types";
// progress messages now flow through the transport via this.transport.progress().
@@ -44,10 +45,12 @@ import { injectToolsSection } from "./tools-inject";
import { injectPersonaSection, loadPersona } from "./persona-inject";
import { checkVersionChange } from "./whats-new";

/** Origin of an agent turn — used for recovery copy and telemetry. */
export type { TurnSource };

/** Limits */
const MAX_SUBAGENT_STDOUT_CHARS = 3000;
const MAX_MESSAGE_CHUNK = 4000;
const MAX_ERROR_PREVIEW = 200;

/** Bot username for command suffix validation (set during gateway init) */
let _botUsername = "";
@@ -375,6 +378,7 @@ export class Gateway {
private async handleAgentTurn(
thread: any, agentThreadId: string, userText: string, rawAttachments: any[],
verboseThreads: Set<string>, threadLocks: Map<string, Promise<void>>, abortControllers: Map<string, AbortController>,
turnSource: TurnSource = "user",
): Promise<void> {
// Prepare message (save attachments, build AgentMessage)
const result = await this.prepareAgentMessage(thread, agentThreadId, userText, rawAttachments);
@@ -462,6 +466,10 @@
// so the post-finally `if (deferredSoftFlush) { ... }` block (line ~530)
// can still see it. Earlier versions declared it here, then read it
// OUTSIDE the enclosing try block — a scoping bug that tsc surfaced.
// streamHadVisibleText is hoisted for the same reason: the catch below
// needs to know whether the user already saw partial assistant text
// when picking gateway-overflow recovery copy.
let streamHadVisibleText = false;
try {
let turnUsedTools = false;
if (agent.promptStream) {
@@ -470,6 +478,7 @@
try {
const streamResult = await this.handleStreaming(thread, agent.promptStream(agentThreadId, agentMessage), verboseThreads.has(agentThreadId), ac.signal);
turnUsedTools = streamResult.usedTools;
streamHadVisibleText = streamResult.hadVisibleText;
**Review comment on lines 479 to +481 (P2): Preserve visible-stream state when overflow aborts streaming.**

`streamHadVisibleText` is only updated from `streamResult` after `handleStreaming(...)` returns, but `handleStreaming` now throws `StreamModelOverflowError` on overflow `model_error` events. In the common case where some `text_delta` was already emitted and then overflow occurs, this assignment never runs, so the catch path always receives `hadVisibleText: false` and posts the "Please resend your last message" copy. That is misleading after partial output and can prompt duplicate side effects if the user resends a message whose tools already started running.

} finally {
abortControllers.delete(agentThreadId);
}
@@ -506,12 +515,10 @@
console.error(`[roundhouse] memory finalize error:`, (err as Error).message);
}
} catch (err) {
const errMsg = err instanceof Error ? err.message : String(err);
const safeMsg = errMsg.split('\n')[0].slice(0, MAX_ERROR_PREVIEW);
console.error(`[roundhouse] agent error:`, err);
try {
await thread.post(`⚠️ Error: ${safeMsg}`);
} catch {}
await recoverFromAgentTurnOverflow(thread, agentThreadId, agent, err, {
turnSource,
hadVisibleText: streamHadVisibleText,
});
} finally {
if (stopTyping) stopTyping();
}
@@ -840,7 +847,7 @@
];
}

private async handleStreaming(thread: any, stream: AsyncIterable<AgentStreamEvent>, verbose: boolean, signal?: AbortSignal): Promise<{ usedTools: boolean }> {
private async handleStreaming(thread: any, stream: AsyncIterable<AgentStreamEvent>, verbose: boolean, signal?: AbortSignal): Promise<{ usedTools: boolean; hadVisibleText: boolean }> {
return _handleStream(stream, {
thread,
verbose,
@@ -1014,7 +1021,7 @@
const bootPrompt = "You just came online after a restart. Say a brief hello in-character (1–2 sentences max). Check your workspace for any pending tasks.";

try {
await this.handleAgentTurn(syntheticThread, agentThreadId, bootPrompt, [], verboseThreads, threadLocks, abortControllers);
await this.handleAgentTurn(syntheticThread, agentThreadId, bootPrompt, [], verboseThreads, threadLocks, abortControllers, "boot");
} catch (err) {
console.error("[roundhouse] boot turn failed:", (err as Error).message);
}
@@ -1070,7 +1077,7 @@
: `[Sub-agent ${status.role} ${status.status} — no output]`;

const syntheticThread = this.transport.createThread(chatId);
await this.handleAgentTurn(syntheticThread, "main", resultText, [], this.verboseThreads, this.threadLocks, this.abortControllers);
await this.handleAgentTurn(syntheticThread, "main", resultText, [], this.verboseThreads, this.threadLocks, this.abortControllers, "subagent");
} catch (err) {
console.error("[roundhouse] sub-agent result injection failed:", err);
}