Python: Flatten hyperlight execute_code output by eavanvalkenburg · Pull Request #5333 · microsoft/agent-framework

eavanvalkenburg · 2026-04-17T14:25:31Z

Motivation and Context

Hyperlight's execute_code tool previously wrapped its stdout/file outputs in a single code_interpreter_tool_result content item. Content.from_function_result collects text only from top-level items, so the nested text never reached function_result.result. As a result, the OpenAI Responses client (and any other client that reads .result) sent an empty string to the model even though the data was available on items.
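
To illustrate the failure mode, here is a minimal sketch that does not use the real agent_framework types. `Item` and `collect_top_level_text` are hypothetical stand-ins, assumed only to mimic "collect text from top-level items" as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a content item; not the real Content class."""

    type: str
    text: str | None = None
    outputs: list[Item] = field(default_factory=list)


def collect_top_level_text(items: list[Item]) -> str:
    """Join text from top-level text items only; nested outputs are ignored."""
    return "\n".join(
        item.text for item in items if item.type == "text" and item.text is not None
    )


stdout = Item(type="text", text="42")

# Old shape: stdout is nested inside a single wrapper item.
wrapped = [Item(type="code_interpreter_tool_result", outputs=[stdout])]

# New shape: the outputs are returned as a flat list.
flat = [stdout]

print(repr(collect_top_level_text(wrapped)))  # ''
print(repr(collect_top_level_text(flat)))  # '42'
```

With the wrapper in place the collected text is empty, which matches the symptom described above; the flat list exposes the text at the top level.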

Description

Flatten _build_execution_contents to return the output list directly (text + data + error contents), removing the code_interpreter_tool_result wrapper. Keeps the fix isolated to the hyperlight package - no core changes needed.
Preserve user-supplied result_parser in _make_sandbox_callback; only auto-assign the passthrough parser when none is set.
Teach build_codeact_instructions about filesystem_enabled and add guidance to print(...) at the end of code and use /output/<file> for larger artifacts.
Update unit tests and the codeact_context_provider sample to consume the flattened output shape.
Add a codeact_benchmark sample under packages/hyperlight/samples/ that compares traditional tool-calling and CodeAct side-by-side with FoundryChatClient (timing + token counts + structured output).
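
The result_parser change follows a common "default only when unset" pattern. The sketch below shows that pattern in isolation; `ToolStub`, `passthrough`, and `ensure_result_parser` are hypothetical names for illustration and are not taken from the hyperlight package.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


def passthrough(result: Any) -> Any:
    """Return the result unchanged (assumed behaviour of a passthrough parser)."""
    return result


@dataclass
class ToolStub:
    """Hypothetical tool object carrying an optional result parser."""

    name: str
    result_parser: Optional[Callable[[Any], Any]] = None


def ensure_result_parser(tool: ToolStub) -> ToolStub:
    """Assign the passthrough parser only if the user did not supply one."""
    if tool.result_parser is None:
        tool.result_parser = passthrough
    return tool


# A user-supplied parser is preserved.
custom = ensure_result_parser(ToolStub("weather", result_parser=str.upper))
# A tool without a parser receives the default.
default = ensure_result_parser(ToolStub("clock"))

print(custom.result_parser("sunny"))  # SUNNY
print(default.result_parser("12:00"))  # 12:00
```

Checking for `None` rather than overwriting unconditionally is what keeps a caller's custom parser intact.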

moonbox3 · 2026-04-17T14:28:05Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/hyperlight/agent_framework_hyperlight
_execute_code_tool.py	478	85	82%	64, 100–101, 120, 122, 135, 153, 163, 188, 193, 200, 206, 214, 222–224, 226–231, 271, 276, 278, 280, 297–298, 307–310, 337–340, 346, 348, 358–359, 391–392, 395–396, 403, 431, 469–470, 476, 484–485, 488, 492, 503–509, 561–562, 588, 651, 687, 693–695, 724–728, 732–733, 738, 755–759, 763–764, 821–822
_instructions.py	47	6	87%	14, 33, 45–46, 56, 86
TOTAL	28331	3300	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5653	30 💤	0 ❌	0 🔥	1m 29s ⏱️

Copilot

Pull request overview

This PR updates the Hyperlight execute_code tool output shape so stdout/files/errors are returned as a flat list of Content items, ensuring Content.from_function_result can populate .result (and rich .items) correctly for downstream clients.

Changes:

Flatten _build_execution_contents to return list[Content] directly (no code_interpreter_tool_result wrapper).
Preserve user-provided result_parser in _make_sandbox_callback (only default to passthrough when unset).
Update CodeAct instructions to include filesystem-aware output guidance; update tests and samples for the flattened output shape; add a new benchmark sample.
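
As a rough illustration of filesystem-aware guidance, the sketch below builds an instruction string that always asks for a final `print(...)` and mentions `/output/<file>` only when a filesystem is available. `build_instructions_sketch` and its wording are assumptions for illustration; the actual build_codeact_instructions signature and text are not shown in this PR page.

```python
def build_instructions_sketch(filesystem_enabled: bool) -> str:
    """Hypothetical instruction builder; not the real build_codeact_instructions."""
    lines = [
        "Write Python code that calls the available tools.",
        "End your code with print(...) so the result is returned to you.",
    ]
    if filesystem_enabled:
        # Only mention the output directory when the sandbox has a filesystem.
        lines.append("Write larger artifacts to /output/<file> instead of printing them.")
    return "\n".join(lines)


print(build_instructions_sketch(filesystem_enabled=True))
```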

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File	Description
python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py	Flattens execute_code outputs and adjusts sandbox callback result_parser behavior.
python/packages/hyperlight/agent_framework_hyperlight/_instructions.py	Adds filesystem-aware guidance and encourages `print(...)` for surfacing results.
python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py	Updates assertions/helpers to consume flattened tool outputs.
python/packages/hyperlight/samples/codeact_context_provider.py	Updates sample logging to iterate flattened execute_code results.
python/packages/hyperlight/samples/codeact_benchmark.py	Adds a new sample comparing traditional tool-calling vs CodeAct.

small fix for hyperlight

ddd363b

Copilot AI review requested due to automatic review settings April 17, 2026 14:25

moonbox3 added the python label Apr 17, 2026

github-actions bot changed the title ~~Flatten hyperlight execute_code output~~ Python: Flatten hyperlight execute_code output Apr 17, 2026

Copilot started reviewing on behalf of eavanvalkenburg April 17, 2026 14:26

Copilot AI reviewed Apr 17, 2026

Comment thread python/packages/hyperlight/samples/codeact_benchmark.py

Comment thread python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py

Comment thread python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py

eavanvalkenburg enabled auto-merge April 17, 2026 14:46

moonbox3 approved these changes Apr 20, 2026

improved sandbox dependency

98c1e96

westey-m approved these changes Apr 20, 2026

eavanvalkenburg added this pull request to the merge queue Apr 20, 2026

Merged via the queue into microsoft:main with commit 69894ed Apr 20, 2026
31 checks passed

eavanvalkenburg merged 2 commits into microsoft:main from eavanvalkenburg:hyperlight_fix

4 participants
