Backfill Pi model into checkpoint metadata from transcript#1298
Merged
Conversation
Pi's hook events (session_start, before_agent_start, agent_end) carry no model field, so Event.Model was never set and checkpoint metadata recorded an empty "model" — even though every Pi assistant message records message.model (e.g. "gpt-5.5"), right next to the usage we already read. Add an optional agent.ModelExtractor interface (resolved via the ungated AsModelExtractor helper, mirroring AsSessionBaseDirProvider) and implement it for Pi by reading message.model from the most recent active-branch assistant message — handling mid-session model changes. Condensation calls sessionStateBackfillModel to fill state.ModelName when it's empty, so hook-reported models (Claude Code, Gemini) still take precedence. This mirrors the existing transcript-based token-usage backfill. While here, extract pijsonl.ForEachActiveMessage to own the active-branch scan skeleton that CalculateTokenUsage, ExtractModifiedFilesFromOffset, ExtractPrompts, and the new ExtractModel had each hand-rolled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 66e7cddb010c
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a transcript-based model backfill for Pi sessions whose hook events don't carry a model field, so checkpoint metadata records the actual LLM used (e.g. gpt-5.5) instead of an empty string. Also refactors the four Pi transcript-walk loops to share a single helper.
Changes:
- New optional
agent.ModelExtractorinterface + ungatedAsModelExtractorhelper; Pi implements it by readingmessage.modelfrom the most recent active-branch assistant message. CondenseSessioncallssessionStateBackfillModelto populatestate.ModelNameonly when empty (hook-reported models keep precedence).- Extracts
pijsonl.ForEachActiveMessageand routesCalculateTokenUsage,ExtractModifiedFilesFromOffset,ExtractPrompts, andExtractModelthrough it.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/agent/agent.go | Defines new ModelExtractor optional interface. |
| cmd/entire/cli/agent/capabilities.go | Adds ungated AsModelExtractor helper and documents built-in-only exclusions from DeclaredCaps. |
| cmd/entire/cli/agent/capabilities_test.go | Tests for AsModelExtractor (implemented / not / nil). |
| cmd/entire/cli/agent/pi/pijsonl/pijsonl.go | Adds Message.Model field and shared ForEachActiveMessage iterator. |
| cmd/entire/cli/agent/pi/pijsonl/pijsonl_test.go | Tests ForEachActiveMessage filtering, offset, and empty cases. |
| cmd/entire/cli/agent/pi/transcript.go | Implements ExtractModel; refactors three existing analyzers onto the shared iterator. |
| cmd/entire/cli/agent/pi/transcript_test.go | Unit tests for ExtractModel (linear, mid-session change, branching, empty, missing field). |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Adds sessionStateBackfillModel and calls it during condensation when ModelName is empty. |
| cmd/entire/cli/strategy/manual_commit_condensation_test.go | Tests backfill for Pi, empty transcript, and unsupported agent. |
| docs/architecture/agent-guide.md | Documents Pi model backfill and updated active-branch scope. |
alishakawaguchi
approved these changes
May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://entire.io/gh/entireio/cli/trails/454
Problem
Pi checkpoint metadata records an empty model:
Pi's hook events (
session_start,before_agent_start,agent_end) carry no model field, soEvent.Modelis never set andstate.ModelNamestays empty — unlike Claude Code (reports model onSessionStart) or Gemini (BeforeModel). But every Pi assistant message in the JSONL recordsmessage.model(e.g.gpt-5.5) andmessage.provider(e.g.openai-codex), sitting right next to theusagewe already parse.Fix
Backfill the model from the transcript, mirroring the existing token-usage backfill:
agent.ModelExtractorinterface, resolved via an ungatedAsModelExtractorhelper (modeled onAsSessionBaseDirProvider, since external agents report model through their own hook protocol).ExtractModel, readingmessage.modelfrom the most recent active-branch assistant message so mid-session model changes are reflected.CondenseSessioncallssessionStateBackfillModelto fillstate.ModelNameonly when empty — hook-reported models always win.Refactor (from
/simplify)ExtractModelwould have been the 4th hand-rolled copy of the "resolve active branch → skip lines → scan → unmarshal → filter" loop. Extractedpijsonl.ForEachActiveMessage, routingCalculateTokenUsage,ExtractModifiedFilesFromOffset,ExtractPrompts, andExtractModelthrough it (net reduction intranscript.godespite the new method). Also removed a deadProviderstruct field and a redundant empty-transcript guard.Scope note
Fills the model on new condensations going forward; already-condensed metadata is not retroactively rewritten (same behavior as the token-usage backfill).
Tests
ExtractModel: linear, mid-session model change (most recent wins), branching (active branch only), empty, no-message.model.ForEachActiveMessage: message filtering, abandoned-branch + offset handling, empty no-op.AsModelExtractor: implemented / not-implemented / nil.sessionStateBackfillModel: Pi reads model, empty transcript, unsupported agent (Cursor) no-op.mise run fmt+mise run lintclean; agent, strategy, and transcript/compact suites green.🤖 Generated with Claude Code
Note
Low Risk
Metadata-only enrichment with hook precedence preserved; scoped to Pi transcript parsing and condensation backfill, covered by unit tests.
Overview
Pi checkpoint metadata could record an empty model because Pi hooks never set
Event.Model, even though assistant lines in the JSONL includemessage.model.This PR adds a built-in-only
ModelExtractorpath (ungatedAsModelExtractor, likeSessionBaseDirProvider) andsessionStateBackfillModelduringCondenseSession, which setsstate.ModelNamefrom the transcript only when it is still empty so hook-reported models win. Pi’sExtractModeluses the latest active-branch assistantmessage.model(branching and mid-session model switches included).Pi transcript parsing is refactored through shared
pijsonl.ForEachActiveMessage(full-data active-branch resolution, offset-aware scan), replacing duplicated loops in token usage, file extraction, prompts, and the new model path;Message.Modelis parsed on the shared struct. Tests and the agent guide document the backfill behavior; new condensations get the model—existing metadata is not rewritten.Reviewed by Cursor Bugbot for commit 8bbe414. Configure here.