Description
Please read this first
- Have you read the docs? Yes — Agents SDK docs
- Have you searched for related issues? Yes — no existing issues cover this.
Describe the bug
OpenAIServerConversationTracker.hydrate_from_state stores id() of temporary dict objects in self.sent_items. When original_input is a string, ItemHelpers.input_to_new_input_list wraps it in a temporary dict. The id() of that dict is added to sent_items, but no strong reference is kept, so the dict is immediately eligible for garbage collection.
When CPython reuses the same memory address for a later allocation (e.g. a function_call_output dict created during HITL rejection), that new object's id() collides with the stale entry in sent_items. prepare_input then considers it "already sent" and drops it from the API payload, producing:
400 - "No tool output found for function call <call_id>."
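The address-reuse behavior described above can be demonstrated in isolation. This is a minimal sketch (CPython-specific, using hypothetical stand-in dicts), independent of the SDK: an `id()` recorded for a dead object can collide with a later allocation.

```python
import gc

def find_id_reuse(max_tries: int = 1_000_000) -> bool:
    """Show that a new dict can land at the address of a dead one."""
    temp = {"role": "user", "content": "hi"}  # stand-in for the ephemeral input dict
    stale = id(temp)                          # address recorded, but object dropped next
    del temp
    gc.collect()
    pool = []
    for _ in range(max_tries):
        candidate = {"type": "function_call_output", "call_id": "c1"}
        if id(candidate) == stale:
            return True                       # address recycled: id() collision
        pool.append(candidate)                # keep alive so the allocator moves on
    return False

print(find_id_reuse())
```

On CPython this returns `True` almost immediately, because freed dict storage is recycled for the next similarly sized dict; `id()` is only guaranteed unique among objects that are simultaneously alive.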
The bug is in oai_conversation.py, inside hydrate_from_state:
for item in ItemHelpers.input_to_new_input_list(normalized_input):
    if item is None:
        continue
    self.sent_items.add(id(item))  # ← id() of ephemeral dict; no reference kept

sent_initial_input / remaining_initial_input already control whether the original user input is replayed, so these transient IDs don't appear necessary for correctness. The cleanest fix would be to not add them to sent_items in the first place.
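If recording these ids ever turns out to be necessary, a defensive alternative is to pin a strong reference alongside each id so the address cannot be recycled. `SentItemRegistry` below is a hypothetical helper for illustration only, not SDK API:

```python
# Hypothetical helper (not part of the SDK): pinning a strong reference
# next to each recorded id() keeps the address valid for the set's lifetime.
class SentItemRegistry:
    def __init__(self) -> None:
        self._refs: list = []      # strong refs keep the objects alive
        self._ids: set[int] = set()

    def add(self, item) -> None:
        self._refs.append(item)
        self._ids.add(id(item))

    def __contains__(self, item) -> bool:
        return id(item) in self._ids

registry = SentItemRegistry()
sent = {"type": "message", "content": "hi"}
registry.add(sent)
fresh = {"type": "function_call_output", "call_id": "c1"}
print(sent in registry, fresh in registry)  # → True False
```

The trade-off is that pinned items live as long as the tracker, so simply not recording the ephemeral ids (as suggested above) remains the cleaner option.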
Debug information
- Agents SDK version: v0.13.1
- Python version: 3.12.8
Repro steps
Self-contained script — no API calls or network required, runs in < 1 second:
"""
Deterministic repro: OpenAIServerConversationTracker.hydrate_from_state stores
id() of ephemeral objects in sent_items, causing false-positive dedup after GC.
No API calls or network required. Runs in < 1 second.
Bug: When original_input is a string, hydrate_from_state converts it via
ItemHelpers.input_to_new_input_list into a temporary dict, stores id(dict) in
sent_items, then lets the dict be garbage collected. Any later dict allocated at
the same address (e.g. the rejection function_call_output) is wrongly considered
"already sent" and dropped from the API payload.
SDK version: openai-agents 0.13.1
File: agents/run_internal/oai_conversation.py, hydrate_from_state
"""
import gc
import sys
from openai.types.responses import ResponseFunctionToolCall, ResponseReasoningItem
from openai.types.responses.response_reasoning_item import Summary
from agents.items import (
    ModelResponse,
    ReasoningItem,
    ToolApprovalItem,
    ToolCallItem,
    ToolCallOutputItem,
)
from agents.usage import Usage
from agents.run_internal.oai_conversation import OpenAIServerConversationTracker
class FakeAgent:
    name = "fake"
agent = FakeAgent()
reasoning_obj = ResponseReasoningItem(
    id="rs_001", type="reasoning",
    summary=[Summary(text="thinking", type="summary_text")],
)
function_call_obj = ResponseFunctionToolCall(
    id="fc_001", type="function_call", call_id="call_ABC",
    name="my_tool", arguments='{"x": 1}', status="completed",
)
reasoning_raw = {
    "type": "reasoning", "id": "rs_001",
    "summary": [{"text": "thinking", "type": "summary_text"}],
}
function_call_raw = {
    "type": "function_call", "id": "fc_001", "call_id": "call_ABC",
    "name": "my_tool", "arguments": '{"x": 1}', "status": "completed",
}
function_call_raw_copy = dict(function_call_raw)
generated_items = [
    ReasoningItem(agent=agent, raw_item=reasoning_obj),
    ToolCallItem(agent=agent, raw_item=function_call_raw),
    ToolApprovalItem(agent=agent, raw_item=function_call_raw_copy, tool_name="my_tool"),
]
model_response = ModelResponse(
    output=[reasoning_obj, function_call_obj],
    usage=Usage(),
    response_id="resp_001",
)
# --- Step 1: Hydrate tracker (simulates a resumed run) ---
tracker = OpenAIServerConversationTracker(previous_response_id="resp_001")
tracker.hydrate_from_state(
    original_input="Do something",
    generated_items=generated_items,
    model_responses=[model_response],
)
print(f"sent_items after hydrate: {tracker.sent_items}")
print(f" count: {len(tracker.sent_items)}")
# --- Step 2: Find the stale id (from the GC'd temp dict) ---
known_raw_ids = {id(reasoning_obj), id(function_call_raw), id(function_call_raw_copy)}
stale_ids = tracker.sent_items - known_raw_ids
print(f"\nStale ids (from GC'd temp objects): {stale_ids}")
if not stale_ids:
    print("No stale ids found (unexpected).")
    sys.exit(1)
stale_id = stale_ids.pop()
print(f" stale id = {stale_id}")
# --- Step 3: Force-allocate a rejection dict at that exact address ---
gc.collect()
rejection_raw = None
pool = []
for i in range(500_000):
    candidate = {
        "type": "function_call_output",
        "call_id": "call_ABC",
        "output": "Rejected.",
    }
    if id(candidate) == stale_id:
        rejection_raw = candidate
        break
    pool.append(candidate)
if rejection_raw is None:
    print(f"Could not reproduce id() reuse after {i+1} attempts.")
    sys.exit(0)
print(f"\nReproduced id() reuse after {i+1} attempts")
print(f" rejection_raw id = {id(rejection_raw)}")
print(f" id in sent_items: {id(rejection_raw) in tracker.sent_items}")
del pool
gc.collect()
# --- Step 4: Build items as they'd appear after HITL rejection ---
rejection_item = ToolCallOutputItem(
    agent=agent, raw_item=rejection_raw, output="Rejected.",
)
items_after_resolve = [
    generated_items[0],  # reasoning
    generated_items[1],  # tool_call
    rejection_item,      # rejection output (replaces tool_approval)
]
# --- Step 5: Observe the bug ---
result = tracker.prepare_input("Do something", items_after_resolve)
print(f"\nprepare_input returned {len(result)} items")
if len(result) == 0:
    print("\n*** BUG REPRODUCED ***")
    print("The rejection function_call_output was dropped because its id()")
    print("matched a stale entry in sent_items from a GC'd temporary dict.")
    print("The API would receive 0 input items and return:")
    print('    400 - "No tool output found for function call call_ABC."')
else:
    print("\nBug not reproduced — items included correctly.")
    for idx, item in enumerate(result):
        t = item.get("type") if isinstance(item, dict) else "?"
        print(f"    [{idx}] {t}")

Expected behavior
prepare_input should include the rejection function_call_output in the API payload. The item is new and was never sent to the API, so it should not be filtered by dedup.
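For reference, the item that should survive dedup has this shape (taken from the repro script above; a sketch of the expected payload entry, not captured SDK output):

```python
# The function_call_output entry that prepare_input should keep in the payload,
# matching the call_id of the pending function call from the resumed run.
expected_rejection = {
    "type": "function_call_output",
    "call_id": "call_ABC",
    "output": "Rejected.",
}
print(expected_rejection["call_id"])  # → call_ABC
```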