Python: Add file_ids support and per-message file attachment for Azure AI code interpreter#4201
Python: Add file_ids support and per-message file attachment for Azure AI code interpreter#4201giles17 wants to merge 10 commits intomicrosoft:mainfrom
Conversation
…et_code_interpreter_tool() Update the factory method to accept file_ids and data_sources keyword arguments, matching the underlying azure.ai.agents SDK CodeInterpreterTool constructor. This enables users to attach uploaded files for code interpreter analysis. Fixes microsoft#4050 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
file_ids support to AzureAIAgentClient.get_code_interpreter_tool()
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
This PR adds file_ids and data_sources support to AzureAIAgentClient.get_code_interpreter_tool(), enabling users to attach uploaded files to the code interpreter for analysis. This addresses issue #4050 where users couldn't provide file attachments through the framework's factory method.
Changes:
- Extended
get_code_interpreter_tool()to acceptfile_idsanddata_sourceskeyword arguments that are forwarded to the Azure Agents SDK'sCodeInterpreterTool - Added
VectorStoreDataSourceimport to support the new parameter type - Added comprehensive test coverage for basic instantiation, file_ids forwarding, and integration with the tool preparation pipeline
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py |
Added file_ids and data_sources parameters to get_code_interpreter_tool() method with updated docstring and examples; imported VectorStoreDataSource type |
python/packages/azure-ai/tests/test_azure_ai_agent_client.py |
Added three test cases: basic tool creation, file_ids parameter forwarding, and tool_resources population in run options |
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py
Outdated
Show resolved
Hide resolved
Add hosted_file handling in _prepare_messages() to convert Content.from_hosted_file() into MessageAttachment on ThreadMessageOptions. This enables per-message file scoping for code interpreter, matching the underlying Azure AI Agents SDK MessageAttachment pattern. - Add hosted_file case in _prepare_messages() match statement - Import MessageAttachment from azure.ai.agents.models - Add sample for per-message CSV file attachment with code interpreter - Add employees.csv test data file - Add 3 unit tests for hosted_file attachment conversion Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
file_ids support to AzureAIAgentClient.get_code_interpreter_tool()
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 82%
✗ Correctness
The change correctly adds Content object support to
resolve_file_idsand propagates file attachments via MessageAttachment in_prepare_messages. However, there are two correctness bugs: a test assertion that checks whether a plain string is contained in a list ofVectorStoreDataSourceobjects (will always be False/fail), and a missing null-guard oncontent.file_idbefore constructingMessageAttachment. Additionally,CodeInterpreterTool().definitionsis hardcoded as the tool type for every hosted-file attachment, which silently overrides the user's intent if they intended file_search.
✗ Security Reliability
The diff introduces Content object support for code interpreter file_ids and per-message file attachments via MessageAttachment. The resolve_file_ids helper correctly validates Content types and raises on missing file_id. The main security/reliability concerns are: (1) hosted_file attachments in _prepare_messages always attach CodeInterpreterTool definitions without validating whether the agent actually has code interpreter enabled, which could cause silent runtime failures or unexpected tool activation; (2) the sample leaks the uploaded file ID if agent creation fails before the try block but after upload (though the finally block covers the nominal path); (3) empty string file_ids are accepted without validation and would be forwarded to the SDK, likely causing obscure errors; (4) the _client.py path that reads file_ids from a container dict does so before resolve_file_ids is called, so Content objects extracted from a dict would bypass resolution—though in practice dicts wouldn't contain Content objects, this is a subtle inconsistency.
✗ Test Coverage
The diff introduces
resolve_file_idsin_shared.py, addshosted_fileattachment support in_prepare_messages, and extends bothget_code_interpreter_toolimplementations to acceptContentobjects. Test coverage is generally solid: the happy paths (plain strings, Content objects, mixed lists, unsupported Content types, data_sources, mutually exclusive error) are all covered in both client test files. The_prepare_messagespath forhosted_fileis also well tested including edge cases like file-only messages and multiple files. However, several specific code paths inresolve_file_idsand related logic are not tested: (1) theContent.file_id is Nonebranch that raisesValueErrorhas no test, (2)resolve_file_idscalled with an empty list[]returnsNone— this behavior has no explicit test and may surprise callers, (3) the_client.pypath wherefile_idsis extracted from acontainerdict and then passed throughresolve_file_idsnow acceptsContentobjects in that dict, but there is no test coveringContentobjects inside acontainerdict, and (4) there are no tests for theAzureAIClient.get_code_interpreter_toolpath wherefile_ids=Noneand an empty list is passed (verifyingNoneis returned fromresolve_file_ids).
Blocking Issues
- Test
test_azure_ai_chat_client_get_code_interpreter_tool_with_data_sourcesasserts"test-asset-id" in tool.data_sources, butdata_sourcesis a list ofVectorStoreDataSourceobjects, not strings. The membership check will always beFalse, making the test pass vacuously or fail depending on collection semantics—it does not actually verify the data source was forwarded. -
_prepare_messagesreadscontent.file_idwithout checking forNone. If aContentobject withhosted_filetype somehow has aNonefile_id (e.g., constructed directly rather than viaContent.from_hosted_file), aMessageAttachment(file_id=None, ...)is silently created and sent to the Azure SDK, which will likely produce a confusing API-level error rather than a clear Python exception. - In _prepare_messages, hosted_file content unconditionally attaches CodeInterpreterTool definitions to the message, regardless of whether the agent was configured with a code interpreter tool. If the agent lacks that tool, the Azure AI API will reject the request or the tool invocation will silently fail, making this a reliability hazard at a trust boundary where user-supplied content drives tool activation.
- No test covers the
Content.file_id is Nonebranch inresolve_file_ids, which raisesValueError. This error path is explicitly documented and should be validated.
Suggestions
- The
toolslist in thehosted_fileattachment case is hardcoded toCodeInterpreterTool().definitions. If the caller intends to attach the file for file_search, this silently misconfigures the attachment. Consider accepting an optionalattachment_toolsparameter in_prepare_messages, or at least documenting that hosted file attachments are always scoped to code_interpreter. - In
test_azure_ai_chat_client_get_code_interpreter_tool_basic,assert len(tool.file_ids) == 0assumes the Azure SDK normalizesfile_ids=Noneto[]. If the SDK preservesNone, this raisesTypeError. It would be safer to assertnot tool.file_idsor check the SDK's documented default. - When only attachments are present and
message_contentsis empty,ThreadMessageOptions(content=[], attachments=[...])is constructed. Verify the Azure AI Agents SDK accepts an emptycontentlist; some API versions reject messages with no content blocks. - In resolve_file_ids, add a guard to reject empty string file IDs (e.g.
if not item: raise ValueError(...)) to surface the error early rather than forwarding an empty string to the SDK. - The sample uploads the file before entering the try block, so if AzureAIAgentClient() or provider.create_agent() raises before the finally, the uploaded file is not cleaned up. Move the upload inside the try block or restructure so the finally always covers it.
- Consider accepting an optional
toolsparameter in the hosted_file attachment path (or reading it from the agent's configured tools) rather than hardcoding CodeInterpreterTool, to avoid coupling message preparation to a specific tool type. - Add a test for
resolve_file_idswhen passed an empty list[]— currently it returnsNone, which may be unexpected and is not explicitly tested. - Add a test for
AzureAIClient.get_code_interpreter_toolwhencontaineris a dict containingContentobjects infile_ids— the newresolve_file_idscall now applies there too but it's untested. - Consider adding a test asserting that
get_code_interpreter_tool()with no arguments produces a tool withfile_idsasNone(or empty), to guard against regressions in the default no-args case for both clients. - The
_prepare_messagestest forhosted_filechecksmsg.attachments[0]["file_id"]using dict access — ifMessageAttachmentis a model object, this may work via__getitem__but is fragile; consider asserting.file_idattribute directly if the SDK supports it.
Automated review by moonbox3's agents
| from azure.ai.agents.models import CodeInterpreterTool, VectorStoreDataSource | ||
|
|
||
| ds = VectorStoreDataSource(asset_identifier="test-asset-id", asset_type="id_asset") | ||
| tool = AzureAIAgentClient.get_code_interpreter_tool(data_sources=[ds]) |
There was a problem hiding this comment.
tool.data_sources is a list of VectorStoreDataSource objects, so "test-asset-id" in tool.data_sources will always be False. Assert on the actual object or one of its fields instead.
| tool = AzureAIAgentClient.get_code_interpreter_tool(data_sources=[ds]) | |
| assert any(ds.asset_identifier == "test-asset-id" for ds in tool.data_sources) |
| case "hosted_file": | ||
| attachments.append( | ||
| MessageAttachment( | ||
| file_id=content.file_id, |
There was a problem hiding this comment.
content.file_id is not validated before use. If it is None, a MessageAttachment with file_id=None is sent to the SDK, producing an opaque API error. Add an explicit guard.
| file_id=content.file_id, | |
| file_id=content.file_id if content.file_id is not None else (_ for _ in ()).throw(ValueError(f"hosted_file Content object is missing a file_id: {content!r}")), |
| message_contents.append(MessageInputTextBlock(text=content.text)) # type: ignore[arg-type] | ||
| case "hosted_file": | ||
| attachments.append( | ||
| MessageAttachment( |
There was a problem hiding this comment.
This unconditionally attaches CodeInterpreterTool definitions for every hosted_file content item, regardless of whether the agent was actually configured with a code interpreter. If the agent doesn't have this tool, the API will reject the request. Consider passing the agent's active tools into _prepare_messages and using them here, or at least documenting that hosted_file content requires code interpreter to be enabled.
| MessageAttachment( | |
| case "hosted_file": | |
| attachments.append( | |
| MessageAttachment( | |
| file_id=content.file_id, | |
| tools=CodeInterpreterTool().definitions, | |
| ) | |
| ) |
| for item in file_ids: | ||
| if isinstance(item, str): | ||
| resolved.append(item) | ||
| elif isinstance(item, Content): |
There was a problem hiding this comment.
Empty string file IDs pass through without validation and will be forwarded to the SDK, likely causing cryptic errors. Add an explicit check.
| elif isinstance(item, Content): | |
| if isinstance(item, str): | |
| if not item: | |
| raise ValueError("file_ids must not contain empty strings.") | |
| resolved.append(item) |
| purpose=FilePurpose.AGENTS, | ||
| ) | ||
| print(f"Uploaded file, file ID: {uploaded.id}\n") | ||
|
|
There was a problem hiding this comment.
The file is uploaded before the try/finally block, so if anything between the upload and the try raises, the file will not be cleaned up. Move the upload inside the try block.
| try: | |
| # 1. Upload the CSV file | |
| csv_file_path = Path(__file__).parents[3] / "shared" / "resources" / "employees.csv" | |
| print(f"Uploading file from: {csv_file_path}") | |
| uploaded = await agents_client.files.upload_and_poll( | |
| file_path=str(csv_file_path), | |
| purpose=FilePurpose.AGENTS, | |
| ) | |
| print(f"Uploaded file, file ID: {uploaded.id}\n") | |
| # 2. Create a code interpreter tool (no file_ids — files attached per-message) | |
| client = AzureAIAgentClient(credential=credential) |
|
|
||
| ds = VectorStoreDataSource(asset_identifier="test-asset-id", asset_type="id_asset") | ||
| with pytest.raises(ValueError, match="mutually exclusive"): | ||
| AzureAIAgentClient.get_code_interpreter_tool(file_ids=["file-abc"], data_sources=[ds]) |
There was a problem hiding this comment.
There is no test for the case where a Content object has file_id=None. The resolve_file_ids function explicitly raises ValueError in this branch, but it is never exercised. Add a test like: Content.from_hosted_file(None) (or construct one with file_id=None) and assert ValueError is raised.
| AzureAIAgentClient.get_code_interpreter_tool(file_ids=["file-abc"], data_sources=[ds]) | |
| async def test_azure_ai_chat_client_get_code_interpreter_tool_content_missing_file_id() -> None: | |
| """Test get_code_interpreter_tool raises ValueError when Content.file_id is None.""" | |
| from agent_framework import Content | |
| content = Content.from_hosted_file(None) # or construct with file_id=None | |
| with pytest.raises(ValueError, match="missing a file_id"): | |
| AzureAIAgentClient.get_code_interpreter_tool(file_ids=[content]) |
Summary
Adds
file_idsanddata_sourcessupport toget_code_interpreter_tool()on both V1 (AzureAIAgentClient) and V2 (AzureAIClient) clients, and adds per-message file attachment support viaContent.from_hosted_file()for the V1 client.Problem
get_code_interpreter_tool()accepted no parameters on either client — users had no way to attach files for code interpreter analysis through the framework's factory method.AzureAIAgentClient._prepare_messages()had nohosted_filehandling, soContent.from_hosted_file()was silently dropped. This meant per-message file scoping (viaMessageAttachment) was not possible.Changes
Tool-level file_ids (both clients)
_chat_client.py(V1):get_code_interpreter_tool()now acceptsfile_ids: list[str | Content]anddata_sources: list[VectorStoreDataSource], forwarding to the SDK'sCodeInterpreterTool._client.py(V2): Samefile_ids: list[str | Content]widening, usingCodeInterpreterToolAuto(file_ids=...)._shared.py: Addedresolve_file_ids()helper to extract file IDs from bothstrandContent.from_hosted_file()objects.file_idsacceptsContent.from_hosted_file()objects alongside plain strings for ergonomic usage.Per-message file attachment (V1 client)
_chat_client.py: Addedhosted_filecase in_prepare_messages()that convertsContent.from_hosted_file()intoMessageAttachmentonThreadMessageOptions. This matches the underlying Azure AI Agents SDKMessageAttachmentpattern for per-message file scoping.Tests
test_azure_ai_agent_client.py: Added tests for file_ids, data_sources, mutual exclusivity, Content objects, mixed types, unsupported Content types, and 3 tests for hosted_file → MessageAttachment conversion.test_azure_ai_client.py: Added Content tests for V2 client.test_agent_provider.py: Added tests for tool_resources flow.Sample
azure_ai_with_code_interpreter_file_upload.py: New sample demonstrating per-message CSV file attachment with V1 client usingContent.from_hosted_file().employees.csv: Test data for the sample.Usage
Tool-level (file shared across thread)
Per-message (file scoped to single message)
Fixes #4050