feat: align ep logs chat view with tokenized rollout prompts by benjibc · Pull Request #431 · eval-protocol/python-sdk

benjibc · 2026-03-07T00:56:38Z

Summary

- Merge tool declarations into the first system message so the chat pane matches the tokenized prompt more closely.
- Hide raw assistant tool-call payload text when tool_calls are already rendered structurally.
- Keep the generic token-debug UI from main, while making the message transcript more useful for multimodal rollout review.

Validation

- npm run build
- Verified in Chromium against real FrozenLake visual rollouts from Kimi K2.5 VL and Qwen3 VL.

Screenshots

Kimi K2.5 VL

Qwen3 VL

Notes

Supersedes #430 (feat: add token debug view for rollout rows in ep logs), which was opened from an outdated local branch.

Note

Low Risk
Primarily UI/transcript rendering changes; the main risk is accidentally altering how prompts/tool calls are displayed or copied during rollout review, not core execution.

Overview
Aligns the eval log chat transcript with tokenized rollout prompts by synthesizing a # Tools declaration block from row.tools and prepending it to the first system message (or inserting a new one) before rendering ChatInterface.
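The merge step described above could look roughly like the following sketch. The `ChatMessage` and `ToolSpec` shapes, the helper names, and the exact layout of the `# Tools` block are assumptions for illustration, not the code in this PR; the sketch also assumes string-valued message content, which multimodal rows may not satisfy.

```typescript
// Hypothetical minimal shapes; the real types in vite-app may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ToolSpec {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
}

// Render a "# Tools" block. The layout used here is an assumption.
function renderToolsBlock(tools: ToolSpec[]): string {
  const entries = tools.map((t) => JSON.stringify(t.function, null, 2));
  return ["# Tools", ...entries].join("\n\n");
}

// Return a new message list in which the tools block is prepended to the
// first system message, or inserted as a new leading system message when
// the transcript has none. The input array is not mutated.
function withToolsDeclaration(
  messages: ChatMessage[],
  tools: ToolSpec[] | undefined
): ChatMessage[] {
  if (!tools || tools.length === 0) return messages;
  const block = renderToolsBlock(tools);
  const idx = messages.findIndex((m) => m.role === "system");
  if (idx === -1) {
    return [{ role: "system", content: block }, ...messages];
  }
  return messages.map((m, i) =>
    i === idx ? { ...m, content: `${block}\n\n${m.content}` } : m
  );
}
```

Returning a new array rather than mutating the row keeps the stored messages unchanged, so only the rendered transcript is affected.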

Updates MessageBubble to hide raw assistant message content when tool_calls are present (since tool calls are rendered structurally), while keeping copy/expand behavior for non-hidden content and adding per-tool-call copy actions.

Adds an inline “add filter” control next to invocation_id in EvaluationRow to quickly append a matching filter to the current filter config (with tooltip feedback), and tweaks model display to use JSONTooltip for object-valued models.

Written by Cursor Bugbot for commit 5073a8c.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5073a8cafa


chatgpt-codex-connector · 2026-03-07T05:01:34Z

vite-app/src/components/MessageBubble.tsx

  const isTool = message.role === "tool";
  const hasToolCalls = message.tool_calls && message.tool_calls.length > 0;
  const hasFunctionCall = message.function_call;
+  const hideMessageContent = message.role === "assistant" && hasToolCalls;


Preserve assistant text when tool_calls are present

This unconditionally suppresses assistant content for any message that has tool_calls, so renderContent() returns null even when the content contains meaningful text. For example, many recorded trajectories include <think>...</think> in assistant turns alongside tool calls, such as examples/cliff_walking_mcp/tests/recordings/production_trajectory.jsonl. In those cases the chat transcript loses the assistant's reasoning and context entirely, which makes rollout review inaccurate. The hide logic should apply only to payload-only duplicates, not to all tool-call turns.
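One possible shape for a narrower check is sketched below. The `AssistantMessage` and `ToolCall` types and the helper names are hypothetical, and the "payload-only" heuristic (content is empty, or is JSON equal to one tool call's arguments) is an assumption about what a duplicate looks like, not something this PR defines.

```typescript
// Hypothetical minimal shapes; the real message type in vite-app may differ.
interface ToolCall {
  id?: string;
  function: { name: string; arguments: string };
}

interface AssistantMessage {
  role: string;
  content?: string | null;
  tool_calls?: ToolCall[];
}

// Normalize a JSON string so whitespace differences do not matter.
// Non-JSON input is returned trimmed. Key order is not normalized.
function normalizeJson(text: string): string {
  try {
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text.trim();
  }
}

// Assumed heuristic: content is a payload-only duplicate when it is empty
// or when it matches the arguments of one of the tool calls.
function isPayloadOnly(content: string, calls: ToolCall[]): boolean {
  const trimmed = content.trim();
  if (trimmed === "") return true;
  const normalized = normalizeJson(trimmed);
  return calls.some((c) => normalizeJson(c.function.arguments) === normalized);
}

// Hide content only for assistant tool-call turns whose text adds nothing
// beyond the structurally rendered tool calls.
function shouldHideContent(message: AssistantMessage): boolean {
  if (message.role !== "assistant") return false;
  const calls = message.tool_calls ?? [];
  if (calls.length === 0) return false;
  return isPayloadOnly(message.content ?? "", calls);
}
```

With this check, a turn whose content is `<think>...</think>` reasoning stays visible, while a turn whose content merely repeats the call arguments is hidden.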


chatgpt-codex-connector · 2026-03-07T05:01:34Z

vite-app/src/components/EvaluationRow.tsx

+        `  action: ${actionEnum},`,
+        "  [k: string]: never",


Generate tool declaration from actual function parameters

The synthesized tool signature is hard-coded to an action argument and then forbids all other keys via [k: string]: never, regardless of each tool’s real schema. For tools whose parameters are not action-based (e.g. get_weather(location, unit) in tests/pytest/data/function_calling.jsonl), the displayed declaration is incorrect and can’t represent valid calls, which defeats the new “prompt-faithful” transcript behavior.
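A schema-driven alternative might look like the sketch below, which derives each field from the tool's JSON Schema `parameters` instead of assuming an `action` argument. The `JsonSchema` subset, the helper names, and the `type name = (_: {...}) => any;` output layout are all assumptions for illustration; the real declaration format should follow whatever the tokenized prompt uses.

```typescript
// A small, assumed subset of JSON Schema, enough for flat tool parameters.
interface JsonSchema {
  type?: string;
  enum?: unknown[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
}

interface FunctionSpec {
  name: string;
  description?: string;
  parameters?: JsonSchema;
}

// Map a schema node to a TypeScript-like type string.
function schemaToType(schema: JsonSchema | undefined): string {
  if (!schema) return "unknown";
  if (schema.enum && schema.enum.length > 0) {
    return schema.enum.map((v) => JSON.stringify(v)).join(" | ");
  }
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array":
      return `Array<${schemaToType(schema.items)}>`;
    case "object":
      return "object";
    default:
      return "unknown";
  }
}

// Build a declaration from the tool's own properties; keys that are not
// listed in `required` are marked optional with "?".
function declareTool(fn: FunctionSpec): string {
  const props = fn.parameters?.properties ?? {};
  const required = new Set(fn.parameters?.required ?? []);
  const lines = Object.entries(props).map(
    ([key, schema]) =>
      `  ${key}${required.has(key) ? "" : "?"}: ${schemaToType(schema)},`
  );
  return [`type ${fn.name} = (_: {`, ...lines, "}) => any;"].join("\n");
}
```

For a `get_weather` tool with a required `location` string and an optional `unit` enum, this yields `location` and `unit?` fields rather than a fixed `action` key.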


feat: align ep logs chat view with tokenized rollout prompts

5073a8c

benjibc mentioned this pull request Mar 7, 2026

feat: add token debug view for rollout rows in ep logs #430

Closed

benjibc merged commit 9f0f5e4 into main Mar 7, 2026
16 of 19 checks passed

benjibc deleted the benjibc/ep-logs-transcript-alignment branch March 7, 2026 01:12

benjibc mentioned this pull request Mar 7, 2026

feat: unify training rendering and harden FrozenLake VL fw-ai/cookbook#176

Merged

chatgpt-codex-connector bot reviewed Mar 7, 2026
