
Python: Flatten hyperlight execute_code output #5333

Merged
eavanvalkenburg merged 2 commits into microsoft:main from eavanvalkenburg:hyperlight_fix on Apr 20, 2026

@eavanvalkenburg
Member

Motivation and Context

Hyperlight's execute_code tool previously wrapped its stdout and file outputs in a single code_interpreter_tool_result content item. Content.from_function_result only collects text from top-level items, so the nested text never reached function_result.result. As a result, the OpenAI Responses client (and any other client that reads .result) sent an empty string to the model, even though the data was still available on .items.
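To make the failure mode concrete, here is a toy model in plain Python. `Content` and `result_text` are hypothetical stand-ins, not the agent_framework API; `result_text` only mimics the behavior described above, where text is read from top-level items and anything nested inside a wrapper item is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Content:
    """Hypothetical stand-in for a content item (not the agent_framework class)."""
    type: str
    text: str | None = None
    items: list[Content] = field(default_factory=list)


def result_text(contents: list[Content]) -> str:
    """Mimic the described behavior: join text from top-level text items only."""
    return "\n".join(c.text for c in contents if c.type == "text" and c.text)


stdout = Content(type="text", text="42")

# Old shape: stdout nested inside a single wrapper item.
wrapped = [Content(type="code_interpreter_tool_result", items=[stdout])]
# New shape: the same item returned at the top level.
flat = [stdout]

print(repr(result_text(wrapped)))  # ''
print(repr(result_text(flat)))     # '42'
```

The data is present in both shapes; only its position in the list decides whether a top-level collector sees it.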

Description

  • Flatten _build_execution_contents to return the output list directly (text, data, and error contents), removing the code_interpreter_tool_result wrapper. This keeps the fix isolated to the hyperlight package; no core changes are needed.
  • Preserve user-supplied result_parser in _make_sandbox_callback; only auto-assign the passthrough parser when none is set.
  • Make build_codeact_instructions aware of filesystem_enabled, and add guidance to end code with print(...) and to write larger artifacts to /output/<file>.
  • Update unit tests and the codeact_context_provider sample to consume the flattened output shape.
  • Add a codeact_benchmark sample under packages/hyperlight/samples/ that compares traditional tool calling and CodeAct side by side using FoundryChatClient, covering timing, token counts, and structured output.
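The result_parser change is a "default only when unset" pattern. The sketch below assumes a simple tool record with an optional `result_parser` field; `SandboxTool`, `passthrough_parser`, and `make_sandbox_callback` are hypothetical names loosely modeled on `_make_sandbox_callback`, not the package's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

ResultParser = Callable[[Any], Any]


def passthrough_parser(result: Any) -> Any:
    """Hypothetical default parser: hand the raw tool output back unchanged."""
    return result


@dataclass
class SandboxTool:
    """Hypothetical tool record; only the field relevant here is modeled."""
    name: str
    result_parser: Optional[ResultParser] = None


def make_sandbox_callback(tool: SandboxTool) -> SandboxTool:
    # Keep a parser the user already configured; only fill in the default when none is set.
    # An explicit `is None` check avoids overriding any user-supplied callable.
    if tool.result_parser is None:
        tool.result_parser = passthrough_parser
    return tool


def shout(result: Any) -> str:
    """Example user-supplied parser."""
    return str(result).upper()


custom = make_sandbox_callback(SandboxTool("custom", result_parser=shout))
default = make_sandbox_callback(SandboxTool("default"))
```

Here `custom.result_parser` stays `shout`, while `default.result_parser` becomes `passthrough_parser`.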

Copilot AI review requested due to automatic review settings April 17, 2026 14:25
@github-actions github-actions bot changed the title Flatten hyperlight execute_code output Python: Flatten hyperlight execute_code output Apr 17, 2026
@moonbox3
Contributor

moonbox3 commented Apr 17, 2026

Python Test Coverage

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/hyperlight/agent_framework_hyperlight | | | | |
| _execute_code_tool.py | 478 | 85 | 82% | 64, 100–101, 120, 122, 135, 153, 163, 188, 193, 200, 206, 214, 222–224, 226–231, 271, 276, 278, 280, 297–298, 307–310, 337–340, 346, 348, 358–359, 391–392, 395–396, 403, 431, 469–470, 476, 484–485, 488, 492, 503–509, 561–562, 588, 651, 687, 693–695, 724–728, 732–733, 738, 755–759, 763–764, 821–822 |
| _instructions.py | 47 | 6 | 87% | 14, 33, 45–46, 56, 86 |
| TOTAL | 28331 | 3300 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 5653 | 30 💤 | 0 ❌ | 0 🔥 | 1m 29s ⏱️ |


Copilot AI left a comment


Pull request overview

This PR updates the Hyperlight execute_code tool output shape so stdout/files/errors are returned as a flat list of Content items, ensuring Content.from_function_result can populate .result (and rich .items) correctly for downstream clients.
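A flat output list can be built by appending each kind of output as its own top-level item. This is a minimal sketch, assuming outputs arrive as captured stdout, a list of file paths written under /output/, and an optional error string; `Item` and `build_execution_contents` are hypothetical and only loosely modeled on `_build_execution_contents`, whose real signature is not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical flat content item; kind is 'text', 'data', or 'error'."""
    kind: str
    value: str


def build_execution_contents(
    stdout: str, output_files: list[str], error: str | None = None
) -> list[Item]:
    """Return stdout, output files, and any error as one flat list (no wrapper item)."""
    contents: list[Item] = []
    if stdout:
        contents.append(Item("text", stdout))
    # Each artifact becomes its own top-level entry, in a stable order.
    for path in sorted(output_files):
        contents.append(Item("data", path))
    if error:
        contents.append(Item("error", error))
    return contents


ok = build_execution_contents("hello\n", ["/output/report.csv"])
failed = build_execution_contents("", [], "NameError: name 'x' is not defined")
```

Because nothing is nested, any consumer that only inspects top-level items sees stdout, artifacts, and errors alike.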

Changes:

  • Flatten _build_execution_contents to return list[Content] directly (no code_interpreter_tool_result wrapper).
  • Preserve user-provided result_parser in _make_sandbox_callback (only default to passthrough when unset).
  • Update CodeAct instructions to include filesystem-aware output guidance; update tests and samples for the flattened output shape; add a new benchmark sample.
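The filesystem-aware guidance amounts to conditionally appending text to the prompt. A minimal sketch, where `build_instructions` is a hypothetical stand-in for `build_codeact_instructions` and the wording is illustrative rather than the package's actual prompt text:

```python
def build_instructions(filesystem_enabled: bool) -> str:
    """Hypothetical sketch of CodeAct guidance that adapts to filesystem access."""
    lines = [
        "Use the execute_code tool to run Python in the sandbox.",
        # Always steer the model toward surfacing results via stdout.
        "End your code with print(...) so the result appears in the tool output.",
    ]
    if filesystem_enabled:
        # Only mention /output when the sandbox can actually write files there.
        lines.append("For larger artifacts, write them to /output/<file> instead of printing them.")
    return "\n".join(lines)
```

Gating the /output hint on the flag avoids telling the model to write files in a sandbox that has no filesystem.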

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py | Flattens execute_code outputs and adjusts sandbox callback result_parser behavior. |
| python/packages/hyperlight/agent_framework_hyperlight/_instructions.py | Adds filesystem-aware guidance and encourages print(...) for surfacing results. |
| python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py | Updates assertions/helpers to consume flattened tool outputs. |
| python/packages/hyperlight/samples/codeact_context_provider.py | Updates sample logging to iterate flattened execute_code results. |
| python/packages/hyperlight/samples/codeact_benchmark.py | Adds a new sample comparing traditional tool-calling vs CodeAct. |

Comment thread python/packages/hyperlight/samples/codeact_benchmark.py
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Apr 20, 2026
Merged via the queue into microsoft:main with commit 69894ed Apr 20, 2026
31 checks passed