Skip to content

hydrate_from_state stores id() of ephemeral objects in sent_items, causing false-positive dedup after GC #2798

@pietrog30

Description

@pietrog30

Please read this first

  • Have you read the docs? Yes — Agents SDK docs
  • Have you searched for related issues? Yes — no existing issues cover this.

Describe the bug

OpenAIServerConversationTracker.hydrate_from_state stores id() of temporary dict objects in self.sent_items. When original_input is a string, ItemHelpers.input_to_new_input_list wraps it in a temporary dict. The id() of that dict is added to sent_items, but no strong reference is kept, so the dict is immediately eligible for garbage collection.

When CPython reuses the same memory address for a later allocation (e.g. a function_call_output dict created during HITL rejection), that new object's id() collides with the stale entry in sent_items. prepare_input then considers it "already sent" and drops it from the API payload, producing:

400 - "No tool output found for function call <call_id>."

The bug is in oai_conversation.py, inside hydrate_from_state:

for item in ItemHelpers.input_to_new_input_list(normalized_input):
    if item is None:
        continue
    self.sent_items.add(id(item))  # ← id() of ephemeral dict; no reference kept

sent_initial_input / remaining_initial_input already control whether the original user input is replayed, so these transient IDs don't appear necessary for correctness. The cleanest fix would be to not add them to sent_items in the first place.

Debug information

  • Agents SDK version: v0.13.1
  • Python version: Python 3.12.8

Repro steps

Self-contained script — no API calls or network required, runs in < 1 second:

"""
Deterministic repro: OpenAIServerConversationTracker.hydrate_from_state stores
id() of ephemeral objects in sent_items, causing false-positive dedup after GC.

No API calls or network required. Runs in < 1 second.

Bug: When original_input is a string, hydrate_from_state converts it via
ItemHelpers.input_to_new_input_list into a temporary dict, stores id(dict) in
sent_items, then lets the dict be garbage collected. Any later dict allocated at
the same address (e.g. the rejection function_call_output) is wrongly considered
"already sent" and dropped from the API payload.

SDK version: openai-agents 0.13.1
File: agents/run_internal/oai_conversation.py, hydrate_from_state
"""

import gc
import sys

from openai.types.responses import ResponseFunctionToolCall, ResponseReasoningItem
from openai.types.responses.response_reasoning_item import Summary

from agents.items import (
    ModelResponse,
    ReasoningItem,
    ToolApprovalItem,
    ToolCallItem,
    ToolCallOutputItem,
)
from agents.usage import Usage
from agents.run_internal.oai_conversation import OpenAIServerConversationTracker


class FakeAgent:
    name = "fake"


agent = FakeAgent()

reasoning_obj = ResponseReasoningItem(
    id="rs_001", type="reasoning",
    summary=[Summary(text="thinking", type="summary_text")],
)
function_call_obj = ResponseFunctionToolCall(
    id="fc_001", type="function_call", call_id="call_ABC",
    name="my_tool", arguments='{"x": 1}', status="completed",
)

reasoning_raw = {
    "type": "reasoning", "id": "rs_001",
    "summary": [{"text": "thinking", "type": "summary_text"}],
}
function_call_raw = {
    "type": "function_call", "id": "fc_001", "call_id": "call_ABC",
    "name": "my_tool", "arguments": '{"x": 1}', "status": "completed",
}
function_call_raw_copy = dict(function_call_raw)

generated_items = [
    ReasoningItem(agent=agent, raw_item=reasoning_obj),
    ToolCallItem(agent=agent, raw_item=function_call_raw),
    ToolApprovalItem(agent=agent, raw_item=function_call_raw_copy, tool_name="my_tool"),
]

model_response = ModelResponse(
    output=[reasoning_obj, function_call_obj],
    usage=Usage(),
    response_id="resp_001",
)

# --- Step 1: Hydrate tracker (simulates a resumed run) ---
tracker = OpenAIServerConversationTracker(previous_response_id="resp_001")
tracker.hydrate_from_state(
    original_input="Do something",
    generated_items=generated_items,
    model_responses=[model_response],
)

print(f"sent_items after hydrate: {tracker.sent_items}")
print(f"  count: {len(tracker.sent_items)}")

# --- Step 2: Find the stale id (from the GC'd temp dict) ---
known_raw_ids = {id(reasoning_obj), id(function_call_raw), id(function_call_raw_copy)}
stale_ids = tracker.sent_items - known_raw_ids
print(f"\nStale ids (from GC'd temp objects): {stale_ids}")
if not stale_ids:
    print("No stale ids found (unexpected).")
    sys.exit(1)
stale_id = stale_ids.pop()
print(f"  stale id = {stale_id}")

# --- Step 3: Force-allocate a rejection dict at that exact address ---
gc.collect()
rejection_raw = None
pool = []
for i in range(500_000):
    candidate = {
        "type": "function_call_output",
        "call_id": "call_ABC",
        "output": "Rejected.",
    }
    if id(candidate) == stale_id:
        rejection_raw = candidate
        break
    pool.append(candidate)

if rejection_raw is None:
    print(f"Could not reproduce id() reuse after {i+1} attempts.")
    sys.exit(0)

print(f"\nReproduced id() reuse after {i+1} attempts")
print(f"  rejection_raw id = {id(rejection_raw)}")
print(f"  id in sent_items: {id(rejection_raw) in tracker.sent_items}")

del pool
gc.collect()

# --- Step 4: Build items as they'd appear after HITL rejection ---
rejection_item = ToolCallOutputItem(
    agent=agent, raw_item=rejection_raw, output="Rejected.",
)
items_after_resolve = [
    generated_items[0],  # reasoning
    generated_items[1],  # tool_call
    rejection_item,      # rejection output (replaces tool_approval)
]

# --- Step 5: Observe the bug ---
result = tracker.prepare_input("Do something", items_after_resolve)

print(f"\nprepare_input returned {len(result)} items")
if len(result) == 0:
    print("\n*** BUG REPRODUCED ***")
    print("The rejection function_call_output was dropped because its id()")
    print("matched a stale entry in sent_items from a GC'd temporary dict.")
    print("The API would receive 0 input items and return:")
    print('  400 - "No tool output found for function call call_ABC."')
else:
    print("\nBug not reproduced — items included correctly.")
    for idx, item in enumerate(result):
        t = item.get("type") if isinstance(item, dict) else "?"
        print(f"  [{idx}] {t}")

Expected behavior

prepare_input should include the rejection function_call_output in the API payload. The item is new and was never sent to the API, so it should not be filtered by dedup.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions