feat: align ep logs chat view with tokenized rollout prompts #431

Merged
benjibc merged 1 commit into main from benjibc/ep-logs-transcript-alignment on Mar 7, 2026

Conversation

@benjibc (Contributor) commented Mar 7, 2026

Summary

  • merge tool declarations into the first system message so the chat pane matches the tokenized prompt more closely
  • hide raw assistant tool-call payload text when tool_calls are already rendered structurally
  • keep the generic token-debug UI from main, while making the message transcript more useful for multimodal rollout review

Validation

  • npm run build
  • verified in Chromium against real FrozenLake visual rollouts from Kimi K2.5 VL and Qwen3 VL

Screenshots

Kimi K2.5 VL

[Screenshot: Kimi ep logs token debug]

Qwen3 VL

[Screenshot: Qwen3 VL ep logs token debug]

Notes

Low Risk
Primarily UI/transcript rendering changes; the main risk is accidentally altering how prompts/tool calls are displayed or copied during rollout review, not core execution.

Overview
Aligns the eval log chat transcript with tokenized rollout prompts by synthesizing a # Tools declaration block from row.tools and prepending it to the first system message (or inserting a new one) before rendering ChatInterface.
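The prepend-or-insert step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ChatMessage` and `ToolSpec` shapes, the helper names, and the text layout of the `# Tools` block are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real row and message types in the repo may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ToolSpec {
  type: "function";
  function: { name: string; description?: string };
}

// Render a "# Tools" block. The line format below is an assumption,
// not the format the PR actually emits.
function renderToolsBlock(tools: ToolSpec[]): string {
  const lines = tools.map((t) =>
    t.function.description
      ? `- ${t.function.name}: ${t.function.description}`
      : `- ${t.function.name}`
  );
  return ["# Tools", ...lines].join("\n");
}

// Prepend the block to the first system message, or insert a new system
// message at the front when none exists. The input array is not mutated.
function withToolsInSystemMessage(
  messages: ChatMessage[],
  tools: ToolSpec[] | undefined
): ChatMessage[] {
  if (!tools || tools.length === 0) return messages;
  const block = renderToolsBlock(tools);
  const idx = messages.findIndex((m) => m.role === "system");
  if (idx === -1) {
    const systemMessage: ChatMessage = { role: "system", content: block };
    return [systemMessage, ...messages];
  }
  return messages.map((m, i) =>
    i === idx ? { ...m, content: `${block}\n\n${m.content}` } : m
  );
}
```

Returning a new array rather than mutating `row.messages` keeps the stored row untouched, so only the rendered transcript changes.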

Updates MessageBubble to hide raw assistant message content when tool_calls are present (since tool calls are rendered structurally), while keeping copy/expand behavior for non-hidden content and adding per-tool-call copy actions.

Adds an inline “add filter” control next to invocation_id in EvaluationRow to quickly append a matching filter to the current filter config (with tooltip feedback), and tweaks model display to use JSONTooltip for object-valued models.
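As a rough illustration of the "add filter" behavior, the sketch below appends an equality clause to a filter list. The `FilterClause` shape, the `appendFilter` name, and the `invocation_id` field path are hypothetical; the actual filter config in the repo may be structured differently.

```typescript
// Hypothetical filter clause; not the repo's actual filter config type.
interface FilterClause {
  field: string;
  operator: "==";
  value: string;
}

// Append an equality filter unless an identical clause is already present.
// Skipping duplicates is a design choice made for this sketch.
function appendFilter(
  filters: FilterClause[],
  field: string,
  value: string
): FilterClause[] {
  const exists = filters.some(
    (f) => f.field === field && f.operator === "==" && f.value === value
  );
  if (exists) return filters;
  const clause: FilterClause = { field, operator: "==", value };
  return [...filters, clause];
}
```

Returning the same array when nothing changes lets a caller skip a state update on repeated clicks.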

Written by Cursor Bugbot for commit 5073a8c.

@benjibc merged commit 9f0f5e4 into main on Mar 7, 2026
16 of 19 checks passed
@benjibc deleted the benjibc/ep-logs-transcript-alignment branch on March 7, 2026 01:12
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5073a8cafa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const isTool = message.role === "tool";
const hasToolCalls = message.tool_calls && message.tool_calls.length > 0;
const hasFunctionCall = message.function_call;
const hideMessageContent = message.role === "assistant" && hasToolCalls;

P1: Preserve assistant text when tool_calls are present

This unconditionally suppresses assistant content for any message that has tool_calls, so renderContent() returns null even when the content contains meaningful text (for example, many recorded trajectories include <think>...</think> in assistant turns alongside tool calls, such as examples/cliff_walking_mcp/tests/recordings/production_trajectory.jsonl). In those cases the chat transcript loses the assistant’s reasoning/context entirely, which makes rollout review inaccurate; the hide logic should only apply to payload-only duplicates, not all tool-call turns.
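One possible way to narrow the hide logic to payload-only duplicates is sketched below. It assumes string content and assumes the raw payload appears as `<tool_call>...</tool_call>` blocks; other chat templates use different markers and would need their own pattern. The type and function names are hypothetical, not taken from the PR.

```typescript
// Hypothetical message shape; multimodal (array) content is out of scope here.
interface ToolCall {
  function: { name: string; arguments: string };
}

interface AssistantLike {
  role: string;
  content?: string | null;
  tool_calls?: ToolCall[];
}

// Assumption: the serialized tool-call payload is wrapped in <tool_call> tags.
const TOOL_CALL_BLOCK = /<tool_call>[\s\S]*?<\/tool_call>/g;

function hasToolCalls(message: AssistantLike): boolean {
  return (message.tool_calls || []).length > 0;
}

// Text that should stay visible: for assistant turns with tool calls, strip
// only the duplicated payload blocks and keep everything else (e.g. <think>).
function visibleAssistantText(message: AssistantLike): string {
  const content = message.content || "";
  if (message.role !== "assistant" || !hasToolCalls(message)) return content;
  return content.replace(TOOL_CALL_BLOCK, "").trim();
}

// Hide the content area only when nothing but the payload was present.
function shouldHideContent(message: AssistantLike): boolean {
  return (
    message.role === "assistant" &&
    hasToolCalls(message) &&
    visibleAssistantText(message) === ""
  );
}
```

With this shape, `hideMessageContent` could be derived from `shouldHideContent(message)`, and the bubble would render `visibleAssistantText(message)` instead of the raw content.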

Comment on lines +353 to +354
` action: ${actionEnum},`,
" [k: string]: never",

P2: Generate tool declaration from actual function parameters

The synthesized tool signature is hard-coded to an action argument and then forbids all other keys via [k: string]: never, regardless of each tool’s real schema. For tools whose parameters are not action-based (e.g. get_weather(location, unit) in tests/pytest/data/function_calling.jsonl), the displayed declaration is incorrect and can’t represent valid calls, which defeats the new “prompt-faithful” transcript behavior.
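A schema-driven declaration could look roughly like the sketch below, which walks a small subset of JSON Schema (`type`, `enum`, `properties`, `required`, `items`) instead of hard-coding `action`. The `JsonSchema` and `FunctionSpec` types, the helper names, and the `type name = (_: {...}) => any;` output style are assumptions for illustration; the format each model's chat template really uses may differ.

```typescript
// Minimal JSON Schema subset handled by this sketch (an assumption).
interface JsonSchema {
  type?: string;
  enum?: unknown[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
}

interface FunctionSpec {
  name: string;
  description?: string;
  parameters?: JsonSchema;
}

// Convert a schema node to a TypeScript-like type string.
function schemaToType(schema: JsonSchema | undefined): string {
  if (!schema) return "any";
  if (schema.enum && schema.enum.length > 0) {
    return schema.enum.map((v) => JSON.stringify(v)).join(" | ");
  }
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array": {
      const item = schemaToType(schema.items);
      // Parenthesize unions so `("a" | "b")[]` is not read as `"a" | "b"[]`.
      return item.indexOf(" | ") >= 0 ? `(${item})[]` : `${item}[]`;
    }
    case "object": {
      const props = schema.properties || {};
      const required = schema.required || [];
      const fields = Object.keys(props).map((key) => {
        const optional = required.indexOf(key) >= 0 ? "" : "?";
        return `${key}${optional}: ${schemaToType(props[key])}`;
      });
      return fields.length > 0 ? `{ ${fields.join(", ")} }` : "{}";
    }
    default:
      return "any";
  }
}

// Render one declaration line, preceded by the description as a comment.
function renderDeclaration(fn: FunctionSpec): string {
  const params = fn.parameters || { type: "object", properties: {} };
  const lines: string[] = [];
  if (fn.description) lines.push(`// ${fn.description}`);
  lines.push(`type ${fn.name} = (_: ${schemaToType(params)}) => any;`);
  return lines.join("\n");
}
```

For a hypothetical `get_weather` schema with a required string `location` and an optional `unit` enum of `celsius` and `fahrenheit`, this produces `type get_weather = (_: { location: string, unit?: "celsius" | "fahrenheit" }) => any;`, which can represent calls that the `action`-only declaration cannot.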
