feat: align ep logs chat view with tokenized rollout prompts#431
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5073a8cafa
```ts
const isTool = message.role === "tool";
const hasToolCalls = message.tool_calls && message.tool_calls.length > 0;
const hasFunctionCall = message.function_call;
const hideMessageContent = message.role === "assistant" && hasToolCalls;
```
Preserve assistant text when tool_calls are present
This unconditionally suppresses assistant content for any message that has tool_calls, so renderContent() returns null even when the content contains meaningful text (for example, many recorded trajectories include <think>...</think> in assistant turns alongside tool calls, such as examples/cliff_walking_mcp/tests/recordings/production_trajectory.jsonl). In those cases the chat transcript loses the assistant’s reasoning/context entirely, which makes rollout review inaccurate; the hide logic should only apply to payload-only duplicates, not all tool-call turns.
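One way to narrow the hide logic, as the review suggests, is to hide only content that duplicates the tool-call payload. The sketch below assumes a minimal `ChatMessage`/`ToolCall` shape (hypothetical; the PR's real types may differ) and treats assistant content as a duplicate only when it is empty or is a JSON encoding of the same `tool_calls` array. The helper name `shouldHideAssistantContent` is illustrative, not from the PR.

```ts
// Hypothetical minimal shapes; the real message types in the PR may differ.
interface ToolCall {
  id?: string;
  type?: string;
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content?: string | null;
  tool_calls?: ToolCall[];
}

// Returns true only when an assistant turn's content adds nothing beyond the
// structured tool calls: it is empty/whitespace, or it is a JSON serialization
// of the same tool_calls payload.
function shouldHideAssistantContent(message: ChatMessage): boolean {
  const hasToolCalls = (message.tool_calls?.length ?? 0) > 0;
  if (message.role !== "assistant" || !hasToolCalls) return false;

  const text = (message.content ?? "").trim();
  if (text === "") return true;

  try {
    const parsed = JSON.parse(text);
    // Treat as a duplicate only if it re-encodes exactly the tool_calls array.
    return JSON.stringify(parsed) === JSON.stringify(message.tool_calls);
  } catch {
    // Not JSON, so it is real prose (e.g. <think>...</think> reasoning): keep it.
    return false;
  }
}
```

Comparing `JSON.stringify` output is sensitive to key order, so a real implementation may prefer a structural comparison; the key property is that free text such as `<think>...</think>` falls through to `false` and stays visible.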
```ts
` action: ${actionEnum},`,
" [k: string]: never",
```
Generate tool declaration from actual function parameters
The synthesized tool signature is hard-coded to an action argument and then forbids all other keys via [k: string]: never, regardless of each tool’s real schema. For tools whose parameters are not action-based (e.g. get_weather(location, unit) in tests/pytest/data/function_calling.jsonl), the displayed declaration is incorrect and can’t represent valid calls, which defeats the new “prompt-faithful” transcript behavior.
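A sketch of deriving each declaration from the tool's own JSON Schema `parameters`, rather than a fixed `action` argument. It assumes OpenAI-style tool entries (`{ type: "function", function: { name, parameters } }`) in `row.tools`, handles only a small subset of JSON Schema, and emits a TypeScript-like `type name = (_: {...}) => any;` line; the exact declaration format a given model's chat template expects may differ. `renderToolDeclaration`, `schemaToTs`, and the `get_weather` schema below are hypothetical illustrations, not the contents of the test fixture.

```ts
// Hypothetical subset of JSON Schema; real tool schemas may use more keywords
// (oneOf, $ref, nested objects), which this sketch maps to `any`.
interface JsonSchemaProp {
  type?: string;
  enum?: (string | number)[];
  items?: JsonSchemaProp;
  description?: string;
}

interface ToolSpec {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: {
      type?: string;
      properties?: Record<string, JsonSchemaProp>;
      required?: string[];
    };
  };
}

// Map a JSON Schema property to a TypeScript-like type string.
function schemaToTs(prop: JsonSchemaProp): string {
  if (prop.enum && prop.enum.length > 0) {
    return prop.enum.map((v) => JSON.stringify(v)).join(" | ");
  }
  switch (prop.type) {
    case "string":
      return "string";
    case "integer":
    case "number":
      return "number";
    case "boolean":
      return "boolean";
    case "array": {
      const inner = prop.items ? schemaToTs(prop.items) : "any";
      // Parenthesize unions so `("a" | "b")[]` keeps the right precedence.
      return inner.includes(" | ") ? `(${inner})[]` : `${inner}[]`;
    }
    default:
      return "any";
  }
}

// Render one tool from its real parameters; optional keys get a `?`.
function renderToolDeclaration(tool: ToolSpec): string {
  const { name, description, parameters } = tool.function;
  const props = parameters?.properties ?? {};
  const required = new Set(parameters?.required ?? []);
  const lines: string[] = [];
  if (description) lines.push(`// ${description}`);
  lines.push(`type ${name} = (_: {`);
  for (const [key, prop] of Object.entries(props)) {
    if (prop.description) lines.push(`  // ${prop.description}`);
    const optional = required.has(key) ? "" : "?";
    lines.push(`  ${key}${optional}: ${schemaToTs(prop)},`);
  }
  lines.push("}) => any;");
  return lines.join("\n");
}

// Illustrative schema (hypothetical, not copied from function_calling.jsonl).
const getWeather: ToolSpec = {
  type: "function",
  function: {
    name: "get_weather",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location"],
    },
  },
};

console.log(renderToolDeclaration(getWeather));
// type get_weather = (_: {
//   location: string,
//   unit?: "celsius" | "fahrenheit",
// }) => any;
```

Because `unit` is not in `required`, it renders as optional, so a call that passes only `location` still matches the displayed declaration.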
Summary
`tool_calls` are already rendered structurally
`main`, while making the message transcript more useful for multimodal rollout review

Validation
`npm run build`

Screenshots
Kimi K2.5 VL
Qwen3 VL
Notes
Note
Low Risk
Primarily UI/transcript rendering changes; the main risk is accidentally altering how prompts/tool calls are displayed or copied during rollout review, not core execution.
Overview
Aligns the eval log chat transcript with tokenized rollout prompts by synthesizing a `# Tools` declaration block from `row.tools` and prepending it to the first `system` message (or inserting a new one) before rendering `ChatInterface`.

Updates `MessageBubble` to hide raw assistant message content when `tool_calls` are present (since tool calls are rendered structurally), while keeping copy/expand behavior for non-hidden content and adding per-tool-call copy actions.

Adds an inline "add filter" control next to `invocation_id` in `EvaluationRow` to quickly append a matching filter to the current filter config (with tooltip feedback), and tweaks model display to use `JSONTooltip` for object-valued models.

Written by Cursor Bugbot for commit 5073a8c.
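The prepend-or-insert step described in the overview can be sketched as a pure function over the message list. This assumes string `content` (multimodal messages with content-part arrays would need extra handling) and a blank-line separator between the tools block and the existing system prompt; both the `Message` type and `withToolsBlock` are hypothetical names, not taken from the PR.

```ts
// Hypothetical minimal message shape; the PR's real Message type may differ.
interface Message {
  role: string;
  content: string;
}

// Return a new message list in which `toolsBlock` (e.g. a "# Tools" section)
// is prepended to the first system message or, if there is no system message,
// inserted as a new system message at the front. The input is not mutated.
function withToolsBlock(messages: Message[], toolsBlock: string): Message[] {
  if (toolsBlock.trim() === "") return messages;
  const idx = messages.findIndex((m) => m.role === "system");
  if (idx === -1) {
    return [{ role: "system", content: toolsBlock }, ...messages];
  }
  return messages.map((m, i) =>
    i === idx ? { ...m, content: `${toolsBlock}\n\n${m.content}` } : m,
  );
}
```

Returning a new array instead of mutating the input keeps the stored rollout untouched, so only the rendered transcript changes.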