Skip to content

Python: Add file_ids support and per-message file attachment for Azure AI code interpreter#4201

Open
giles17 wants to merge 10 commits intomicrosoft:mainfrom
giles17:azure_ai_code_int
Open

Python: Add file_ids support and per-message file attachment for Azure AI code interpreter#4201
giles17 wants to merge 10 commits intomicrosoft:mainfrom
giles17:azure_ai_code_int

Conversation

@giles17
Copy link
Contributor

@giles17 giles17 commented Feb 23, 2026

Summary

Adds file_ids and data_sources support to get_code_interpreter_tool() on both V1 (AzureAIAgentClient) and V2 (AzureAIClient) clients, and adds per-message file attachment support via Content.from_hosted_file() for the V1 client.

Problem

  1. get_code_interpreter_tool() accepted no parameters on either client — users had no way to attach files for code interpreter analysis through the framework's factory method.
  2. AzureAIAgentClient._prepare_messages() had no hosted_file handling, so Content.from_hosted_file() was silently dropped. This meant per-message file scoping (via MessageAttachment) was not possible.

Changes

Tool-level file_ids (both clients)

  • _chat_client.py (V1): get_code_interpreter_tool() now accepts file_ids: list[str | Content] and data_sources: list[VectorStoreDataSource], forwarding to the SDK's CodeInterpreterTool.
  • _client.py (V2): Same file_ids: list[str | Content] widening, using CodeInterpreterToolAuto(file_ids=...).
  • _shared.py: Added resolve_file_ids() helper to extract file IDs from both str and Content.from_hosted_file() objects.
  • file_ids accepts Content.from_hosted_file() objects alongside plain strings for ergonomic usage.

Per-message file attachment (V1 client)

  • _chat_client.py: Added hosted_file case in _prepare_messages() that converts Content.from_hosted_file() into MessageAttachment on ThreadMessageOptions. This matches the underlying Azure AI Agents SDK MessageAttachment pattern for per-message file scoping.

Tests

  • test_azure_ai_agent_client.py: Added tests for file_ids, data_sources, mutual exclusivity, Content objects, mixed types, unsupported Content types, and 3 tests for hosted_file → MessageAttachment conversion.
  • test_azure_ai_client.py: Added Content tests for V2 client.
  • test_agent_provider.py: Added tests for tool_resources flow.

Sample

  • azure_ai_with_code_interpreter_file_upload.py: New sample demonstrating per-message CSV file attachment with V1 client using Content.from_hosted_file().
  • employees.csv: Test data for the sample.

Usage

Tool-level (file shared across thread)

tool = client.get_code_interpreter_tool(file_ids=["file-abc123"])
agent = await provider.create_agent(tools=[tool], ...)
response = await agent.run("Analyze the CSV.")

Per-message (file scoped to single message)

file_content = Content.from_hosted_file(file_id=uploaded.id)
message = Message(role="user", contents=["Analyze the CSV.", file_content])
response = await agent.run(message)

Fixes #4050

…et_code_interpreter_tool()

Update the factory method to accept file_ids and data_sources keyword
arguments, matching the underlying azure.ai.agents SDK CodeInterpreterTool
constructor. This enables users to attach uploaded files for code
interpreter analysis.

Fixes microsoft#4050

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 23, 2026 23:34
@giles17 giles17 added bug Something isn't working python labels Feb 23, 2026
@giles17 giles17 changed the title Python: Add file_ids and data_sources support to AzureAIAgentClient.get_code_interpreter_tool() Python: Add file_ids support to AzureAIAgentClient.get_code_interpreter_tool() Feb 23, 2026
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Feb 23, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/azure-ai/agent_framework_azure_ai
   _chat_client.py4827584%416–417, 419, 603, 608–609, 611–612, 615, 618, 620, 625, 886–887, 889, 892, 895, 898–903, 906, 908, 916, 928–930, 934, 937–938, 946–949, 959, 967–970, 972–973, 975–976, 983, 991–992, 1000–1001, 1006–1007, 1011–1018, 1023, 1026, 1034, 1040, 1048–1050, 1053, 1075–1076, 1209, 1237, 1252, 1377, 1499
   _client.py3674587%388, 390, 433, 441–453, 466, 526, 541–546, 589, 624, 626, 647–648, 651, 692, 695, 697, 788, 819, 861, 1070, 1073, 1076–1077, 1079–1082, 1125
   _shared.py2472291%110, 143, 179, 227, 229–231, 263, 306, 318–320, 349, 351, 353, 357, 422, 453, 494, 501, 513, 532
TOTAL22193276387% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
4686 247 💤 0 ❌ 0 🔥 1m 20s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds file_ids and data_sources support to AzureAIAgentClient.get_code_interpreter_tool(), enabling users to attach uploaded files to the code interpreter for analysis. This addresses issue #4050 where users couldn't provide file attachments through the framework's factory method.

Changes:

  • Extended get_code_interpreter_tool() to accept file_ids and data_sources keyword arguments that are forwarded to the Azure Agents SDK's CodeInterpreterTool
  • Added VectorStoreDataSource import to support the new parameter type
  • Added comprehensive test coverage for basic instantiation, file_ids forwarding, and integration with the tool preparation pipeline

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py Added file_ids and data_sources parameters to get_code_interpreter_tool() method with updated docstring and examples; imported VectorStoreDataSource type
python/packages/azure-ai/tests/test_azure_ai_agent_client.py Added three test cases: basic tool creation, file_ids parameter forwarding, and tool_resources population in run options

giles17 and others added 5 commits February 24, 2026 11:20
Add hosted_file handling in _prepare_messages() to convert
Content.from_hosted_file() into MessageAttachment on ThreadMessageOptions.
This enables per-message file scoping for code interpreter, matching the
underlying Azure AI Agents SDK MessageAttachment pattern.

- Add hosted_file case in _prepare_messages() match statement
- Import MessageAttachment from azure.ai.agents.models
- Add sample for per-message CSV file attachment with code interpreter
- Add employees.csv test data file
- Add 3 unit tests for hosted_file attachment conversion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giles17 giles17 changed the title Python: Add file_ids support to AzureAIAgentClient.get_code_interpreter_tool() Python: Add file_ids support and per-message file attachment for Azure AI code interpreter Feb 25, 2026
Copy link
Contributor

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 3 | Confidence: 82%

✗ Correctness

The change correctly adds Content object support to resolve_file_ids and propagates file attachments via MessageAttachment in _prepare_messages. However, there are two correctness bugs: a test assertion that checks whether a plain string is contained in a list of VectorStoreDataSource objects (will always be False/fail), and a missing null-guard on content.file_id before constructing MessageAttachment. Additionally, CodeInterpreterTool().definitions is hardcoded as the tool type for every hosted-file attachment, which silently overrides the user's intent if they intended file_search.

✗ Security Reliability

The diff introduces Content object support for code interpreter file_ids and per-message file attachments via MessageAttachment. The resolve_file_ids helper correctly validates Content types and raises on missing file_id. The main security/reliability concerns are: (1) hosted_file attachments in _prepare_messages always attach CodeInterpreterTool definitions without validating whether the agent actually has code interpreter enabled, which could cause silent runtime failures or unexpected tool activation; (2) the sample leaks the uploaded file ID if agent creation fails before the try block but after upload (though the finally block covers the nominal path); (3) empty string file_ids are accepted without validation and would be forwarded to the SDK, likely causing obscure errors; (4) the _client.py path that reads file_ids from a container dict does so before resolve_file_ids is called, so Content objects extracted from a dict would bypass resolution—though in practice dicts wouldn't contain Content objects, this is a subtle inconsistency.

✗ Test Coverage

The diff introduces resolve_file_ids in _shared.py, adds hosted_file attachment support in _prepare_messages, and extends both get_code_interpreter_tool implementations to accept Content objects. Test coverage is generally solid: the happy paths (plain strings, Content objects, mixed lists, unsupported Content types, data_sources, mutually exclusive error) are all covered in both client test files. The _prepare_messages path for hosted_file is also well tested including edge cases like file-only messages and multiple files. However, several specific code paths in resolve_file_ids and related logic are not tested: (1) the Content.file_id is None branch that raises ValueError has no test, (2) resolve_file_ids called with an empty list [] returns None — this behavior has no explicit test and may surprise callers, (3) the _client.py path where file_ids is extracted from a container dict and then passed through resolve_file_ids now accepts Content objects in that dict, but there is no test covering Content objects inside a container dict, and (4) there are no tests for the AzureAIClient.get_code_interpreter_tool path where file_ids=None and an empty list is passed (verifying None is returned from resolve_file_ids).

Blocking Issues

  • Test test_azure_ai_chat_client_get_code_interpreter_tool_with_data_sources asserts "test-asset-id" in tool.data_sources, but data_sources is a list of VectorStoreDataSource objects, not strings. The membership check will always be False, making the test pass vacuously or fail depending on collection semantics—it does not actually verify the data source was forwarded.
  • _prepare_messages reads content.file_id without checking for None. If a Content object with hosted_file type somehow has a None file_id (e.g., constructed directly rather than via Content.from_hosted_file), a MessageAttachment(file_id=None, ...) is silently created and sent to the Azure SDK, which will likely produce a confusing API-level error rather than a clear Python exception.
  • In _prepare_messages, hosted_file content unconditionally attaches CodeInterpreterTool definitions to the message, regardless of whether the agent was configured with a code interpreter tool. If the agent lacks that tool, the Azure AI API will reject the request or the tool invocation will silently fail, making this a reliability hazard at a trust boundary where user-supplied content drives tool activation.
  • No test covers the Content.file_id is None branch in resolve_file_ids, which raises ValueError. This error path is explicitly documented and should be validated.

Suggestions

  • The tools list in the hosted_file attachment case is hardcoded to CodeInterpreterTool().definitions. If the caller intends to attach the file for file_search, this silently misconfigures the attachment. Consider accepting an optional attachment_tools parameter in _prepare_messages, or at least documenting that hosted file attachments are always scoped to code_interpreter.
  • In test_azure_ai_chat_client_get_code_interpreter_tool_basic, assert len(tool.file_ids) == 0 assumes the Azure SDK normalizes file_ids=None to []. If the SDK preserves None, this raises TypeError. It would be safer to assert not tool.file_ids or check the SDK's documented default.
  • When only attachments are present and message_contents is empty, ThreadMessageOptions(content=[], attachments=[...]) is constructed. Verify the Azure AI Agents SDK accepts an empty content list; some API versions reject messages with no content blocks.
  • In resolve_file_ids, add a guard to reject empty string file IDs (e.g. if not item: raise ValueError(...)) to surface the error early rather than forwarding an empty string to the SDK.
  • The sample uploads the file before entering the try block, so if AzureAIAgentClient() or provider.create_agent() raises before the finally, the uploaded file is not cleaned up. Move the upload inside the try block or restructure so the finally always covers it.
  • Consider accepting an optional tools parameter in the hosted_file attachment path (or reading it from the agent's configured tools) rather than hardcoding CodeInterpreterTool, to avoid coupling message preparation to a specific tool type.
  • Add a test for resolve_file_ids when passed an empty list [] — currently it returns None, which may be unexpected and is not explicitly tested.
  • Add a test for AzureAIClient.get_code_interpreter_tool when container is a dict containing Content objects in file_ids — the new resolve_file_ids call now applies there too but it's untested.
  • Consider adding a test asserting that get_code_interpreter_tool() with no arguments produces a tool with file_ids as None (or empty), to guard against regressions in the default no-args case for both clients.
  • The _prepare_messages test for hosted_file checks msg.attachments[0]["file_id"] using dict access — if MessageAttachment is a model object, this may work via __getitem__ but is fragile; consider asserting .file_id attribute directly if the SDK supports it.

Automated review by moonbox3's agents

from azure.ai.agents.models import CodeInterpreterTool, VectorStoreDataSource

ds = VectorStoreDataSource(asset_identifier="test-asset-id", asset_type="id_asset")
tool = AzureAIAgentClient.get_code_interpreter_tool(data_sources=[ds])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tool.data_sources is a list of VectorStoreDataSource objects, so "test-asset-id" in tool.data_sources will always be False. Assert on the actual object or one of its fields instead.

Suggested change
tool = AzureAIAgentClient.get_code_interpreter_tool(data_sources=[ds])
assert any(ds.asset_identifier == "test-asset-id" for ds in tool.data_sources)

case "hosted_file":
attachments.append(
MessageAttachment(
file_id=content.file_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content.file_id is not validated before use. If it is None, a MessageAttachment with file_id=None is sent to the SDK, producing an opaque API error. Add an explicit guard.

Suggested change
file_id=content.file_id,
file_id=content.file_id if content.file_id is not None else (_ for _ in ()).throw(ValueError(f"hosted_file Content object is missing a file_id: {content!r}")),

message_contents.append(MessageInputTextBlock(text=content.text)) # type: ignore[arg-type]
case "hosted_file":
attachments.append(
MessageAttachment(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditionally attaches CodeInterpreterTool definitions for every hosted_file content item, regardless of whether the agent was actually configured with a code interpreter. If the agent doesn't have this tool, the API will reject the request. Consider passing the agent's active tools into _prepare_messages and using them here, or at least documenting that hosted_file content requires code interpreter to be enabled.

Suggested change
MessageAttachment(
case "hosted_file":
attachments.append(
MessageAttachment(
file_id=content.file_id,
tools=CodeInterpreterTool().definitions,
)
)

for item in file_ids:
if isinstance(item, str):
resolved.append(item)
elif isinstance(item, Content):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty string file IDs pass through without validation and will be forwarded to the SDK, likely causing cryptic errors. Add an explicit check.

Suggested change
elif isinstance(item, Content):
if isinstance(item, str):
if not item:
raise ValueError("file_ids must not contain empty strings.")
resolved.append(item)

purpose=FilePurpose.AGENTS,
)
print(f"Uploaded file, file ID: {uploaded.id}\n")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is uploaded before the try/finally block, so if anything between the upload and the try raises, the file will not be cleaned up. Move the upload inside the try block.

Suggested change
try:
# 1. Upload the CSV file
csv_file_path = Path(__file__).parents[3] / "shared" / "resources" / "employees.csv"
print(f"Uploading file from: {csv_file_path}")
uploaded = await agents_client.files.upload_and_poll(
file_path=str(csv_file_path),
purpose=FilePurpose.AGENTS,
)
print(f"Uploaded file, file ID: {uploaded.id}\n")
# 2. Create a code interpreter tool (no file_ids — files attached per-message)
client = AzureAIAgentClient(credential=credential)


ds = VectorStoreDataSource(asset_identifier="test-asset-id", asset_type="id_asset")
with pytest.raises(ValueError, match="mutually exclusive"):
AzureAIAgentClient.get_code_interpreter_tool(file_ids=["file-abc"], data_sources=[ds])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test for the case where a Content object has file_id=None. The resolve_file_ids function explicitly raises ValueError in this branch, but it is never exercised. Add a test like: Content.from_hosted_file(None) (or construct one with file_id=None) and assert ValueError is raised.

Suggested change
AzureAIAgentClient.get_code_interpreter_tool(file_ids=["file-abc"], data_sources=[ds])
async def test_azure_ai_chat_client_get_code_interpreter_tool_content_missing_file_id() -> None:
"""Test get_code_interpreter_tool raises ValueError when Content.file_id is None."""
from agent_framework import Content
content = Content.from_hosted_file(None) # or construct with file_id=None
with pytest.raises(ValueError, match="missing a file_id"):
AzureAIAgentClient.get_code_interpreter_tool(file_ids=[content])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: Not able to read attached files when carryout simple analysis with csv files

5 participants