Python: CosmosHistoryProvider Code interpreter tool calls are saved chunk by chunk #5793

@moonbox3

Discussed in #5715

Originally posted by CristinaStn May 8, 2026
When using the code interpreter with streaming, the Cosmos history provider receives multiple code interpreter content items, each with the same call_id but a different sequence_number value. Custom aggregation logic is then required to avoid storing hundreds of redundant chunks. In addition, the complete code interpreter script is stored as a final item alongside the chunk-by-chunk items.
Here is an example of a CosmosDB session message:

{
    "id": "...",
    "session_id": "...",
    "sort_key": 1778239085946190300,
    "source_id": "azure_cosmos_history",
    "message": {
        "type": "message",
        "role": "assistant",
        "contents": [
            {
                "type": "text_reasoning",
                "text": "",
                "id": "rs_0f37a82e9edb89710069fdc661205c8190a71ebbc318e9b3b8",
                "additional_properties": {}
            },
            {
                "type": "code_interpreter_tool_result",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "outputs": [],
                "additional_properties": {}
            },
            {
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": "import",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 6,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 6,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            },
            {
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": " pandas",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 7,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 7,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            },

... 5000 more JSON lines ...
{
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": "import pandas as pd\r\n\r\ndata = [\r\n    {\"Color\": \"Black\", \"Age\": 2, \"name\": \"Luna\"},\r\n    {\"Color\": \"White\", \"Age\": 4, \"name\": \"Snowball\"},\r\n    {\"Color\": \"Calico\", \"Age\": 1, \"name\": \"Patches\"},\r\n    {\"Color\": \"Tabby\", \"Age\": 5, \"name\": \"Tiger\"},\r\n    {\"Color\": \"Gray\", \"Age\": 3, \"name\": \"Smokey\"},\r\n    {\"Color\": \"Orange\", \"Age\": 7, \"name\": \"Marmalade\"},\r\n    {\"Color\": \"Tortoiseshell\", \"Age\": 2, \"name\": \"Pebbles\"},\r\n    {\"Color\": \"Brown\", \"Age\": 6, \"name\": \"Mocha\"},\r\n    {\"Color\": \"Cream\", \"Age\": 8, \"name\": \"Biscuit\"},\r\n    {\"Color\": \"Blue\", \"Age\": 10, \"name\": \"Misty\"},\r\n]\r\n\r\ndf = pd.DataFrame(data, columns=[\"Color\", \"Age\", \"name\"])\r\nfile_path = \"/mnt/data/cats.xlsx\"\r\ndf.to_excel(file_path, index=False)\r\nfile_path",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 261,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 261,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            }
]

Expected Behaviour:

By the time save_messages() is called on the history provider, each code interpreter tool call/result should appear as a single content item with the complete, aggregated text.
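The expectation above can be phrased as a checkable invariant. The following is a minimal sketch that works on the serialized `contents` list shown in the example (plain dicts), not on agent-framework objects. The function name `duplicated_call_ids` is hypothetical and is not part of agent-framework.

```python
from collections import Counter

# Content types that are expected to appear once per call_id.
CODE_INTERPRETER_TYPES = {
    "code_interpreter_tool_call",
    "code_interpreter_tool_result",
}


def duplicated_call_ids(contents):
    """Return sorted (type, call_id) pairs that occur more than once.

    `contents` is assumed to be a list of dicts shaped like the
    "contents" array in the stored Cosmos DB message above.
    """
    counts = Counter(
        (item["type"], item["call_id"])
        for item in contents
        if item.get("type") in CODE_INTERPRETER_TYPES and "call_id" in item
    )
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty return value means the message already satisfies the expected behaviour. Any returned pair identifies a call that was stored chunk by chunk.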

Workaround:

I have implemented a CustomCosmosHistoryProvider that overrides save_messages and aggregates all code_interpreter_tool_call items that share the same call_id. However, this approach has several drawbacks: maintenance debt as agent-framework releases new features, a testing burden, and the risk of breaking changes.
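For illustration, the aggregation step of such a workaround could look roughly like the sketch below. It is not the actual CustomCosmosHistoryProvider code. It operates on the serialized dict form shown in the example rather than on agent-framework content objects, and the names `merge_code_interpreter_calls` and `_combine` are hypothetical. It also assumes that the item with the highest sequence_number repeats the full script only when its text matches the concatenated earlier chunks (ignoring line-ending differences). That matches the example above but is not confirmed behaviour.

```python
def _normalize(text):
    # Compare scripts without caring about \r\n versus \n.
    return text.replace("\r\n", "\n")


def _combine(texts):
    """Join ordered chunk texts into one script.

    Assumption: if the last chunk equals the concatenation of all
    earlier chunks, it is the complete script and is used as-is.
    """
    if len(texts) > 1:
        deltas = "".join(texts[:-1])
        if _normalize(texts[-1]) == _normalize(deltas):
            return texts[-1]
    return "".join(texts)


def merge_code_interpreter_calls(contents):
    """Collapse code_interpreter_tool_call chunks into one item per call_id.

    Other content items are passed through unchanged and keep their order.
    Per-chunk additional_properties are dropped in this sketch.
    """
    merged = []
    position = {}  # call_id -> index of the merged item in `merged`
    chunks = {}    # call_id -> list of (sequence_number, text)

    for item in contents:
        if item.get("type") != "code_interpreter_tool_call":
            merged.append(item)
            continue
        call_id = item["call_id"]
        if call_id not in chunks:
            chunks[call_id] = []
            position[call_id] = len(merged)
            merged.append({
                "type": "code_interpreter_tool_call",
                "call_id": call_id,
                "inputs": [],
                "additional_properties": {},
            })
        for part in item.get("inputs", []):
            seq = part.get("additional_properties", {}).get("sequence_number", 0)
            chunks[call_id].append((seq, part.get("text", "")))

    for call_id, parts in chunks.items():
        parts.sort(key=lambda p: p[0])  # order chunks by sequence_number
        script = _combine([text for _, text in parts])
        merged[position[call_id]]["inputs"] = [
            {"type": "text", "text": script, "additional_properties": {}}
        ]
    return merged
```

Applied to the example message, this would reduce the several hundred code_interpreter_tool_call items to a single item per call_id, placed where the first chunk appeared.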

Code sample:

Use sample CosmosHistoryProvider and enable streaming: https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/conversations/cosmos_history_provider_conversation_persistence.py

Labels: bug, python