Skip to content

Backfill Pi model into checkpoint metadata from transcript#1298

Merged
Soph merged 1 commit into
mainfrom
soph/pi-model-backfill
May 29, 2026
Merged

Backfill Pi model into checkpoint metadata from transcript#1298
Soph merged 1 commit into
mainfrom
soph/pi-model-backfill

Conversation

@Soph
Copy link
Copy Markdown
Collaborator

@Soph Soph commented May 29, 2026

https://entire.io/gh/entireio/cli/trails/454

Problem

Pi checkpoint metadata records an empty model:

"agent": "Pi",
"model": "",

Pi's hook events (session_start, before_agent_start, agent_end) carry no model field, so Event.Model is never set and state.ModelName stays empty — unlike Claude Code (reports model on SessionStart) or Gemini (BeforeModel). But every Pi assistant message in the JSONL records message.model (e.g. gpt-5.5) and message.provider (e.g. openai-codex), sitting right next to the usage we already parse.

Fix

Backfill the model from the transcript, mirroring the existing token-usage backfill:

  • New optional agent.ModelExtractor interface, resolved via an ungated AsModelExtractor helper (modeled on AsSessionBaseDirProvider, since external agents report model through their own hook protocol).
  • Pi implements ExtractModel, reading message.model from the most recent active-branch assistant message so mid-session model changes are reflected.
  • CondenseSession calls sessionStateBackfillModel to fill state.ModelName only when empty — hook-reported models always win.

Refactor (from /simplify)

ExtractModel would have been the 4th hand-rolled copy of the "resolve active branch → skip lines → scan → unmarshal → filter" loop. Extracted pijsonl.ForEachActiveMessage, routing CalculateTokenUsage, ExtractModifiedFilesFromOffset, ExtractPrompts, and ExtractModel through it (net reduction in transcript.go despite the new method). Also removed a dead Provider struct field and a redundant empty-transcript guard.

Scope note

Fills the model on new condensations going forward; already-condensed metadata is not retroactively rewritten (same behavior as the token-usage backfill).

Tests

  • ExtractModel: linear, mid-session model change (most recent wins), branching (active branch only), empty, no-message.model.
  • ForEachActiveMessage: message filtering, abandoned-branch + offset handling, empty no-op.
  • AsModelExtractor: implemented / not-implemented / nil.
  • sessionStateBackfillModel: Pi reads model, empty transcript, unsupported agent (Cursor) no-op.

mise run fmt + mise run lint clean; agent, strategy, and transcript/compact suites green.

🤖 Generated with Claude Code


Note

Low Risk
Metadata-only enrichment with hook precedence preserved; scoped to Pi transcript parsing and condensation backfill, covered by unit tests.

Overview
Pi checkpoint metadata could record an empty model because Pi hooks never set Event.Model, even though assistant lines in the JSONL include message.model.

This PR adds a built-in-only ModelExtractor path (ungated AsModelExtractor, like SessionBaseDirProvider) and sessionStateBackfillModel during CondenseSession, which sets state.ModelName from the transcript only when it is still empty so hook-reported models win. Pi’s ExtractModel uses the latest active-branch assistant message.model (branching and mid-session model switches included).

Pi transcript parsing is refactored through shared pijsonl.ForEachActiveMessage (full-data active-branch resolution, offset-aware scan), replacing duplicated loops in token usage, file extraction, prompts, and the new model path; Message.Model is parsed on the shared struct. Tests and the agent guide document the backfill behavior; new condensations get the model—existing metadata is not rewritten.

Reviewed by Cursor Bugbot for commit 8bbe414. Configure here.

Pi's hook events (session_start, before_agent_start, agent_end) carry no
model field, so Event.Model was never set and checkpoint metadata recorded
an empty "model" — even though every Pi assistant message records
message.model (e.g. "gpt-5.5"), right next to the usage we already read.

Add an optional agent.ModelExtractor interface (resolved via the ungated
AsModelExtractor helper, mirroring AsSessionBaseDirProvider) and implement
it for Pi by reading message.model from the most recent active-branch
assistant message — handling mid-session model changes. Condensation calls
sessionStateBackfillModel to fill state.ModelName when it's empty, so
hook-reported models (Claude Code, Gemini) still take precedence. This
mirrors the existing transcript-based token-usage backfill.

While here, extract pijsonl.ForEachActiveMessage to own the active-branch
scan skeleton that CalculateTokenUsage, ExtractModifiedFilesFromOffset,
ExtractPrompts, and the new ExtractModel had each hand-rolled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 66e7cddb010c
Copilot AI review requested due to automatic review settings May 29, 2026 16:40
@Soph Soph requested a review from a team as a code owner May 29, 2026 16:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a transcript-based model backfill for Pi sessions whose hook events don't carry a model field, so checkpoint metadata records the actual LLM used (e.g. gpt-5.5) instead of an empty string. Also refactors the four Pi transcript-walk loops to share a single helper.

Changes:

  • New optional agent.ModelExtractor interface + ungated AsModelExtractor helper; Pi implements it by reading message.model from the most recent active-branch assistant message.
  • CondenseSession calls sessionStateBackfillModel to populate state.ModelName only when empty (hook-reported models keep precedence).
  • Extracts pijsonl.ForEachActiveMessage and routes CalculateTokenUsage, ExtractModifiedFilesFromOffset, ExtractPrompts, and ExtractModel through it.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cmd/entire/cli/agent/agent.go Defines new ModelExtractor optional interface.
cmd/entire/cli/agent/capabilities.go Adds ungated AsModelExtractor helper and documents built-in-only exclusions from DeclaredCaps.
cmd/entire/cli/agent/capabilities_test.go Tests for AsModelExtractor (implemented / not / nil).
cmd/entire/cli/agent/pi/pijsonl/pijsonl.go Adds Message.Model field and shared ForEachActiveMessage iterator.
cmd/entire/cli/agent/pi/pijsonl/pijsonl_test.go Tests ForEachActiveMessage filtering, offset, and empty cases.
cmd/entire/cli/agent/pi/transcript.go Implements ExtractModel; refactors three existing analyzers onto the shared iterator.
cmd/entire/cli/agent/pi/transcript_test.go Unit tests for ExtractModel (linear, mid-session change, branching, empty, missing field).
cmd/entire/cli/strategy/manual_commit_condensation.go Adds sessionStateBackfillModel and calls it during condensation when ModelName is empty.
cmd/entire/cli/strategy/manual_commit_condensation_test.go Tests backfill for Pi, empty transcript, and unsupported agent.
docs/architecture/agent-guide.md Documents Pi model backfill and updated active-branch scope.

@Soph Soph merged commit 848f03a into main May 29, 2026
11 checks passed
@Soph Soph deleted the soph/pi-model-backfill branch May 29, 2026 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants