Session resume fails when session.compaction_complete writes negative tokensRemoved (schema requires >= 0) #3598

@corelli18512

Description

Describe the bug

Sessions become unloadable on /resume because the CLI's own writer emits session.compaction_complete events with negative tokensRemoved values, which the bundled schema constrains to >= 0. This is the same writer↔validator-mismatch pattern as #3454 (exitCode<0), #3432 (totalPremiumRequests float), and #3520 (ephemeral missing).

Error message

Failed to resume session: Error: Request session.resume failed with message:
Session file is corrupted (line 46745: data.tokensRemoved: Number must be greater than or equal to 0)

The line number varies per session, and the error recurs at multiple lines in a file once compaction has run a few times.

Offending event in events.jsonl

{
  "type": "session.compaction_complete",
  "data": {
    "tokensRemoved": -2098,
    ...
  }
}

A single 3-hour slice of one running session accumulated 44 new events with negative tokensRemoved between two resume attempts (after I had previously sanitized that file to zero).

Affected versions

  • Session created with: GitHub Copilot CLI 1.0.48 (via @github/copilot-sdk 0.3.0)
  • Resume attempted with: same version (and any prior version that ran compaction on the same session)
  • OS: macOS 26 (also reproduced on the same session resumed on a Linux relay)
  • Affects long-running sessions disproportionately: one session at about 48.6k events had 28 bad entries, and a session at about 11.6k events had 44.

Sample real-world data (across 7 broken sessions on one user, one day)

totalRuns  events.jsonl lines  negative tokensRemoved
      119               47715                      10
        7               48594                      28
        1                1042                       2
        1                 135                       1
       22               49287                      26
        8               11582                      44
      119                5698                       2

Negatives accumulate continuously during normal use, not just under unusual conditions. Two of these sessions had been patched to zero earlier the same day and acquired new negatives within 3 hours of active conversation.
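
Counts like those in the table above can be gathered with a short scan of each events.jsonl. The following is a minimal sketch, not part of the CLI: the function name is hypothetical, and it assumes one JSON event per line with the field at data.tokensRemoved, as in the offending event shown earlier. It reports 1-based line numbers, which should match the line in the error message if the loader counts lines the same way.

```python
import json


def scan_negative_tokens_removed(path):
    """Return (total_lines, offending_line_numbers) for one events.jsonl file."""
    total = 0
    offending = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            total += 1
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # a different kind of corruption; not counted here
            if not isinstance(event, dict):
                continue
            if event.get("type") != "session.compaction_complete":
                continue
            data = event.get("data")
            removed = data.get("tokensRemoved") if isinstance(data, dict) else None
            if isinstance(removed, (int, float)) and removed < 0:
                offending.append(lineno)
    return total, offending
```

Calling it on ~/.copilot/session-state/<sessionId>/events.jsonl (with the path expanded) gives the line count and the list of lines a resume would reject for this reason.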

Steps to reproduce

  1. Run any long-lived Copilot CLI session that triggers context compaction (large context windows, big tool outputs).
  2. After several compactions, exit the session.
  3. copilot --resume <id> → "Session file is corrupted".

Easier reproduction: open one of the sessions listed above with a fresh CLI build and observe the schema rejection on the first line containing a negative tokensRemoved.

Expected behavior

  • Either the writer should never emit a negative tokensRemoved. A compaction can free at most prevTokenCount tokens and at least 0; it is unclear how a negative value arises, but token-counting drift between the pre- and post-compaction estimates is the likely cause.
  • Or the schema should accept the full integer range and treat negatives as a hint of accounting drift, since tokensRemoved is metadata only and doesn't affect session reconstruction.

Suggested fixes

  1. Clamp on write: Math.max(0, computedTokensRemoved). Cheapest fix, schema unchanged.
  2. Widen the schema to drop the >= 0 constraint. Matches what the writer actually produces; preserves the diagnostic info that "the estimator drifted by N tokens" if anyone wants to use it later.
  3. Graceful loader degradation (also requested in #3454): skip-and-warn rather than abort the whole session on per-event validation failure.

Option 1 is a one-line fix in the compaction code path and would prevent any user-visible regression.
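
To make options 1 and 3 concrete, here is a rough sketch in Python (the same language as the workaround script; the CLI itself is not written in it). Every name is hypothetical, the event carries only the one field relevant here, and validate is a stand-in for the bundled schema check rather than the real validator.

```python
import json
import warnings


def compaction_complete_event(tokens_before, tokens_after):
    """Option 1: build the event, clamping tokensRemoved so it is never negative."""
    removed = tokens_before - tokens_after
    return {
        "type": "session.compaction_complete",
        "data": {"tokensRemoved": max(0, removed)},
    }


def validate(event):
    """Stand-in for the schema check: raise ValueError on an invalid event."""
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    data = event.get("data")
    removed = data.get("tokensRemoved") if isinstance(data, dict) else None
    if isinstance(removed, (int, float)) and removed < 0:
        raise ValueError(
            "data.tokensRemoved: Number must be greater than or equal to 0"
        )


def load_events(lines):
    """Option 3: skip and warn on a bad event instead of aborting the load."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            event = json.loads(line)
            validate(event)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            warnings.warn(f"skipping line {lineno}: {exc}")
            continue
        events.append(event)
    return events
```

With the clamp in place the loader never sees a negative value from new writes. The skip-and-warn path still matters for session files written before the fix.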

Workaround

# Count bad entries in a session's events.jsonl
grep -c '"tokensRemoved":-' ~/.copilot/session-state/<sessionId>/events.jsonl

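# In-place fix: clamp negative tokensRemoved to 0 (run while the session is not active)
python3 -c '
import json, os
p = os.path.expanduser("~/.copilot/session-state/<sessionId>/events.jsonl")
with open(p, encoding="utf-8") as f:
    lines = f.read().split("\n")
fixed = 0
for i, line in enumerate(lines):
    if not line:
        continue
    try:
        d = json.loads(line)
    except json.JSONDecodeError:
        continue  # leave unparseable lines untouched
    data = d.get("data") if isinstance(d, dict) else None
    tr = data.get("tokensRemoved") if isinstance(data, dict) else None
    if isinstance(tr, (int, float)) and tr < 0:
        data["tokensRemoved"] = 0
        lines[i] = json.dumps(d, separators=(",", ":"))
        fixed += 1
# Write to a temp file first, then swap it in atomically
tmp = p + ".tmp"
with open(tmp, "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
os.replace(tmp, p)
print(f"fixed {fixed}")
'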

For sessions that are also broken via #3454 / #3432 / #3520, run a combined sanitizer — the patterns are all "clamp/coerce to nearest schema-valid value".
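
A combined sanitizer could be organised as a small rule table. This is a sketch under assumptions: the function names are hypothetical, and it only covers the two fields whose locations appear in this report (data.tokensRemoved, and data.kind.exitCode from the #3454 error text). Rules for #3432 and #3520 would need the field paths from those issues. Note that clamping a negative exit code to 0 makes the file loadable but loses the original value.

```python
import json


def is_negative_number(value):
    return isinstance(value, (int, float)) and value < 0


# Each rule: (key path into the event, predicate for "invalid", replacement).
RULES = [
    (("data", "tokensRemoved"), is_negative_number, 0),
    (("data", "kind", "exitCode"), is_negative_number, 0),
]


def sanitize_event(event):
    """Apply every rule in place; return how many fields were changed."""
    changed = 0
    for path, is_invalid, replacement in RULES:
        parent = event
        for key in path[:-1]:
            parent = parent.get(key) if isinstance(parent, dict) else None
        leaf = path[-1]
        if isinstance(parent, dict) and leaf in parent and is_invalid(parent[leaf]):
            parent[leaf] = replacement
            changed += 1
    return changed


def sanitize_line(line):
    """Return (possibly rewritten line, number of fields changed)."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return line, 0
    changed = sanitize_event(event) if isinstance(event, dict) else 0
    if changed == 0:
        return line, 0  # keep untouched lines byte-for-byte
    return json.dumps(event, separators=(",", ":")), changed
```

Only lines that actually change are re-serialised, so the rest of the file stays exactly as the CLI wrote it.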

Additional context

Reporting this from a third-party wrapper around the SDK (@github/copilot-sdk) that runs Copilot CLI as a long-lived daemon for many concurrent sessions, where session resume is on the hot path of every daemon restart. We've shipped a defensive sanitizer in our own code as a workaround, but the upstream fix would benefit every consumer of the SDK.
