Initial Checks
- I confirm that I'm using the latest version of Pydantic AI
- I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
Description
Summary
Structured output validation fails intermittently when message history is provided to a pydantic-ai agent, causing either validation errors after the maximum number of retries or invalid citation formats in the output. The same failure does not occur when using vanilla OpenAI tool calling with the plain OpenAI client and the same message history.
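For comparison, a minimal sketch of the vanilla path (illustrative only: JSON mode stands in for the tool-calling setup, the messages are abbreviated, and this is not the exact comparison code):
from openai import AsyncOpenAI
client = AsyncOpenAI()
async def vanilla_run() -> str | None:
    # Same conversation shape through the plain OpenAI client: the JSON comes back well-formed.
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": 'Reply as JSON: {"answer": "...", "citations": {"doc1": "..."}}'},
            {"role": "user", "content": "Hello, I want to review a contract."},
            {"role": "assistant", "content": "Hello! What would you like to know?"},
            {"role": "user", "content": "What are the payment terms in this contract?"},
        ],
    )
    return response.choices[0].message.content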
Expected Behavior
When using an agent with structured output (output_type set to a Pydantic model), the agent should (see the sketch after this list):
- Return valid structured output matching the Pydantic model schema
- Work consistently regardless of whether message history is provided
- Pass validation on first attempt or retry successfully
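A sketch of that contract, reusing agent, query, deps, message_history, and ContractResponse from the example script at the bottom of this report:
async def expected_behavior() -> None:
    result = await agent.run(query, deps=deps, message_history=message_history)
    assert isinstance(result.output, ContractResponse)
    # Every citation value should be a full document-name string.
    assert all(isinstance(v, str) for v in result.output.citations.values())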
Affected Versions
- Tested on pydantic-ai 1.1.0 + openai 1.108.1 - BUG PRESENT
- Tested on pydantic-ai 1.7.0 + openai 2.6.1 - BUG PRESENT
Actual Behavior
WITHOUT Message History (Works Correctly)
result = await agent.run(query, deps=deps)
# Returns citation: {"doc1": "HEALTHGATE_CONTRACT.pdf.md-2"} (correct)
WITH Message History (Fails Intermittently)
result = await agent.run(query, deps=deps, message_history=message_history)
# Results in one of:
# 1. Error: "Exceeded maximum retries (3) for output validation"
# 2. Invalid output: {"doc1": 1} Number instead of string
# 3. Invalid output: {"doc1": "source"} Placeholder instead of document name
# 4. Invalid output: {"0": "doc1"} Wrong key format
How to Reproduce
Use the example code provided below:
# Install dependencies
pip install pydantic-ai==1.7.0 openai python-dotenv
# Set environment variable
export OPENAI_API_KEY=your_key_here
# Run reproduction script
python examples/pydantic_ai_bug_report.py
# Run multiple times (bug is intermittent)
for i in {1..5}; do python examples/pydantic_ai_bug_report.py; sleep 2; done
Reproduction Rate
- Intermittent: Bug appears in ~30-50% of runs with message history
- Consistent: WITHOUT message history always works (0% failure rate)
- With message history: fails validation after 3 retries or returns invalid formats (a small tally harness is sketched below)
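The tally harness mentioned above, a sketch reusing create_agent, AgentDependencies, and DummySearchService from the example script at the bottom of this report:
from pydantic_ai.messages import ModelRequest, ModelResponse, TextPart, UserPromptPart
async def tally(runs: int = 10) -> None:
    failures = 0
    for _ in range(runs):
        agent = create_agent()
        deps = AgentDependencies(search_service=DummySearchService())
        history = [
            ModelRequest(parts=[UserPromptPart(content="Hello, I want to review a contract.")]),
            ModelResponse(parts=[TextPart(content="Hello! What would you like to know?")]),
        ]
        try:
            result = await agent.run(
                "What are the payment terms in this contract?",
                deps=deps,
                message_history=history,
            )
            if not all(isinstance(v, str) and len(v) >= 10 for v in result.output.citations.values()):
                failures += 1  # run "succeeded" but produced an invalid citation format
        except Exception:
            failures += 1  # exhausted retries (UnexpectedModelBehavior)
    print(f"{failures}/{runs} runs failed with message history")
# run with: asyncio.run(tally())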
When the run does not succeed, the following error appears:
Traceback (most recent call last):
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3697, in run_code
await eval(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/tn/l_62dr1n6hzgchkrr465lqz40000gn/T/ipykernel_13858/2484678115.py", line 1, in <module>
result = await agent.run(
^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/agent/abstract.py", line 235, in run
async for node in agent_run:
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/run.py", line 148, in __anext__
task = await anext(self._graph_run)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 410, in __anext__
self._next = await self._iterator.asend(self._next)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 497, in iter_graph
with _unwrap_exception_groups():
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 866, in _unwrap_exception_groups
raise exception
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 638, in _run_tracked_task
result = await self._run_task(t_)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 667, in _run_task
output = await node.call(step_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/step.py", line 253, in _call_node
return await node.run(GraphRunContext(state=ctx.state, deps=ctx.deps))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 540, in run
async with self.stream(ctx):
^^^^^^^^^^^^^^^^
File "/Users/datapsycho/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/contextlib.py", line 217, in __aexit__
await anext(self.gen)
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 554, in stream
async for _event in stream:
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 668, in _run_stream
async for event in self._events_iterator:
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 629, in _run_stream
async for event in self._handle_tool_calls(ctx, tool_calls):
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 684, in _handle_tool_calls
async for event in process_tool_calls(
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 821, in process_tool_calls
ctx.state.increment_retries(
File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 122, in increment_retries
raise exceptions.UnexpectedModelBehavior(message) from error
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (3) for output validation
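For anyone triaging: the raw exchange behind this traceback can be inspected with pydantic-ai's capture_run_messages hook. A sketch, where agent, query, deps, and message_history are the objects from the example script below:
from pydantic_ai import capture_run_messages
from pydantic_ai.exceptions import UnexpectedModelBehavior
async def debug_run() -> None:
    with capture_run_messages() as messages:
        try:
            await agent.run(query, deps=deps, message_history=message_history)
        except UnexpectedModelBehavior:
            for m in messages:  # shows what the model returned on each failed attempt
                print(m)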
Additional Notes
- The bug appears to be related to how pydantic-ai handles structured output when message history context is provided
- The underlying LLM (GPT-4.1 mini) doesn't have issues with message history in the vanilla implementation
- The issue seems to be in the structured output validation/coercion layer of pydantic-ai (a workaround sketch follows these notes)
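As a stopgap rather than a fix, an output validator can turn the bad values into a more targeted retry prompt. A sketch against the example agent below, using pydantic-ai's output_validator hook and ModelRetry (agent, ContractResponse, and AgentDependencies come from the script):
from pydantic_ai import ModelRetry, RunContext
@agent.output_validator
async def check_citations(
    ctx: RunContext[AgentDependencies], output: ContractResponse
) -> ContractResponse:
    bad = {k: v for k, v in output.citations.items() if not isinstance(v, str) or len(v) < 10}
    if bad:
        raise ModelRetry(
            f"Invalid citation values {bad!r}: each value must be the full "
            "Document field string from the search results."
        )
    return output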
Example Code
"""
Minimal Reproducible Example: pydantic-ai Message History Bug
This is a standalone script demonstrating a bug in pydantic-ai where structured output
validation fails when message history is provided.
Bug Summary:
- WITHOUT message history: Citations are properly formatted as {"doc1": "document_name.pdf"}
- WITH message history: Validation fails after 3 retries, or returns invalid formats like
{"doc1": 1}, {"doc1": "source"}, or {"0": "doc1"}
Tested Versions:
- pydantic-ai: 1.1.0 - BUG PRESENT ❌
- pydantic-ai: 1.7.0 - BUG PRESENT ❌ (tested 2025-10-29)
This example can be run independently without any contramate infrastructure.
Requirements:
pip install pydantic-ai==1.7.0 openai==2.6.1 python-dotenv
Environment:
OPENAI_API_KEY=your_key_here
"""
import asyncio
import os
from typing import Dict, Any, List, Optional
from dataclasses import dataclass
from pydantic import BaseModel, Field, field_validator
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.messages import ModelMessage, ModelRequest, ModelResponse, UserPromptPart, TextPart
from openai import AsyncOpenAI
from dotenv import load_dotenv
# Load environment variables
load_dotenv(".envs/local.env")
# ============================================================================
# MOCK DATA - Simulates OpenSearch vector search results
# ============================================================================
MOCK_SEARCH_RESULTS = {
"payment terms": """# Search Results for: payment terms
**Total Results:** 6 (of 6 total)
**Search Type:** hybrid
---
# Search Result 1
| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2 |
| Contract Type | Service Agreement |
| Section | Payment Terms > Milestone Payments |
**Content:**
The payment terms specified in the Hosting and Management Agreement include:
- $100,000 due on 30 January 1998
- $150,000 due on 6 February 1998
- $150,000 due on acceptance of the Specification or 27 February 1998
- $150,000 due on acceptance of System launch
- $150,000 due on system completion date
- $175,000 each due on 1 January, 1 April, 1 July, and 1 September 1999
Invoices are payable within 60 days of receipt. Interest on late payments is charged at 2% above the base rate of Barclays Bank plc in England.
---
# Search Result 2
| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-3 |
| Contract Type | Service Agreement |
| Section | Payment Terms > Use Fees |
**Content:**
Use Fees are payable based on the "Use" of content, defined as retrieval or download of full-text articles by subscribers. Use Fees are billed monthly with payment due by the end of the month following the invoice date.
---
""",
"liability limitations": """# Search Results for: liability limitations
**Total Results:** 5 (of 5 total)
**Search Type:** hybrid
---
# Search Result 1
| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2 |
| Contract Type | Service Agreement |
| Section | Liability and Indemnification |
**Content:**
The Agreement excludes cover for indirect, consequential, special, or punitive damages. HealthGate provides indemnification for product defects and intellectual property infringement claims. Maximum aggregate liability is capped at the total payments made under this agreement.
---
"""
}
# ============================================================================
# MOCK SEARCH SERVICE - Replaces OpenSearch dependency
# ============================================================================
@dataclass
class DummySearchService:
"""Mock search service that returns fixed results."""
def search(self, query: str) -> str:
"""Return mock search results based on query keywords."""
query_lower = query.lower()
if "payment" in query_lower:
return MOCK_SEARCH_RESULTS["payment terms"]
elif "liability" in query_lower or "limitation" in query_lower:
return MOCK_SEARCH_RESULTS["liability limitations"]
else:
# Default response
return MOCK_SEARCH_RESULTS["payment terms"]
@dataclass
class AgentDependencies:
"""Dependencies for the agent."""
search_service: DummySearchService
filters: Optional[Dict[str, Any]] = None
# ============================================================================
# SYSTEM PROMPT - Simplified version focusing on citation formatting
# ============================================================================
SYSTEM_PROMPT = """
## Role
You are a contract analysis assistant that answers questions about contracts using search results.
## Search Result Format
When you receive search results, they will be formatted like this:
# Search Results for: [query]
**Total Results:** X (of Y total)
**Search Type:** hybrid
---
# Search Result 1
| Field | Value |
|-------|-------|
| Document | [document_name.pdf.md-N] |
| Contract Type | [type] |
| Section | [section_hierarchy] |
**Content:**
[actual content text...]
---
# Search Result 2
...
## CRITICAL Citation Rules
**IMPORTANT CITATION MAPPING:**
- Citations are INDEPENDENT of the "# Search Result N" numbering
- You only create citations for documents you ACTUALLY USE in your answer
- Citation numbers are assigned dynamically based on the ORDER you use documents
- The heading "# Search Result 1", "# Search Result 2" is NOT the citation number
**Example:**
If you use information from "Search Result 3" first and "Search Result 7" second:
- Information from Search Result 3 → gets citation [doc1] → maps to Document field from Search Result 3
- Information from Search Result 7 → gets citation [doc2] → maps to Document field from Search Result 7
## Response Format
**REQUIRED OUTPUT STRUCTURE:**
You MUST return a JSON response with TWO fields:
1. `answer` (string, required): Your complete answer with citations
2. `citations` (object, required): A dictionary mapping citation keys to FULL document names
**Example Response:**
{
"answer": "Payment terms are net 30 days [doc1].\\n\\nLate fees apply at 2% interest [doc2].",
"citations": {
"doc1": "CONTRACT_2024-EX-10.1-SERVICE_AGREEMENT.pdf.md-7",
"doc2": "CONTRACT_2024-EX-10.2-PAYMENT_TERMS.pdf.md-1"
}
}
## Citation Value Requirements
**CRITICAL - Citation Values Must Be:**
- ✅ FULL document names from the "Document" field (e.g., "HEALTHGATE_CONTRACT.pdf.md-2")
- ❌ NOT numbers: {"doc1": 1} or {"doc1": 2}
- ❌ NOT placeholders: {"doc1": "source"} or {"doc1": "document"}
- ❌ NOT citation keys: {"doc1": "doc1"}
- ❌ NOT search result numbers: {"doc1": "3"}
**Correct Example:**
{
"citations": {
"doc1": "HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2",
"doc2": "SERVICE_AGREEMENT_2024.pdf.md-5"
}
}
**Wrong Examples:**
{"doc1": 1} ❌ Number instead of string
{"doc1": "source"} ❌ Placeholder instead of document name
{"doc1": "doc1"} ❌ Citation key instead of document name
{"0": "document.pdf"} ❌ Wrong key format
## Tool Available
### hybrid_search
Performs hybrid search combining semantic and text search.
**Args:**
- query: The search query text
**Returns:**
- Dictionary with "success": True and "context": formatted search results
## Instructions
1. Use the `hybrid_search` tool to find relevant information
2. Extract the Document field values from search results
3. Write your answer with inline citations like [doc1], [doc2]
4. Map each citation to its FULL Document field value in the citations dictionary
5. ALWAYS provide both `answer` and `citations` fields - both are REQUIRED
"""
# ============================================================================
# RESPONSE MODEL - Pydantic validation for structured output
# ============================================================================
class ContractResponse(BaseModel):
"""Response model with cited answer."""
answer: str = Field(
...,
description="The complete answer with inline citations like [doc1], [doc2]. "
"MANDATORY: Must contain at least one citation in [docN] format."
)
citations: Dict[str, str] = Field(
...,
description="Dictionary mapping citation keys to FULL DOCUMENT NAMES (strings). "
"Keys: Use 'doc1', 'doc2', 'doc3' (strings, not numbers). "
"Values: Use complete Document field from search results - ALWAYS a string like 'CONTRACT.pdf.md-2'. "
"NEVER use numbers (1, 2, 3) as values. "
"NEVER use placeholders like 'source' or 'document'. "
"NEVER use citation keys ('doc1', 'doc2') as values. "
"Example CORRECT: {'doc1': 'HEALTHGATE_CONTRACT.pdf.md-2', 'doc2': 'SERVICE_AGREEMENT.pdf.md-5'}. "
"Example WRONG: {'doc1': 1} or {'doc1': 'doc1'} or {1: 'document.pdf'}."
)
@field_validator('citations')
@classmethod
def validate_citations_are_strings(cls, v: Dict[str, str]) -> Dict[str, str]:
"""Ensure all citation values are strings, not integers or other types."""
for key, value in v.items():
if not isinstance(value, str):
raise ValueError(
f"Citation value for '{key}' must be a string (document name), "
f"got {type(value).__name__}: {value}. "
f"Use the Document field from search results."
)
if value.isdigit():
raise ValueError(
f"Citation value for '{key}' cannot be just a number: '{value}'. "
f"Use the full Document field from search results (e.g., 'CONTRACT.pdf.md-2')."
)
if len(value) < 10:
raise ValueError(
f"Citation value for '{key}' seems too short: '{value}'. "
f"Use the full Document field from search results, not a placeholder."
)
return v
# ============================================================================
# AGENT CREATION - Minimal pydantic-ai setup
# ============================================================================
def create_agent() -> Agent[AgentDependencies, ContractResponse]:
"""Create the pydantic-ai agent with OpenAI client."""
# Get API key from environment
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
raise ValueError("OPENAI_API_KEY not found in environment")
# Create OpenAI client and provider
openai_client = AsyncOpenAI(api_key=api_key)
# Import provider
from pydantic_ai.providers.openai import OpenAIProvider
provider = OpenAIProvider(openai_client=openai_client)
# Create model with provider
model = OpenAIChatModel("gpt-4.1-mini-2025-04-14", provider=provider)
# Create agent
agent = Agent(
model=model,
system_prompt=SYSTEM_PROMPT,
output_type=ContractResponse,
deps_type=AgentDependencies,
retries=3, # Will retry 3 times on validation errors
)
# Register tool
@agent.tool
async def hybrid_search(
ctx: RunContext[AgentDependencies],
query: str,
) -> Dict[str, Any]:
"""
Perform hybrid search combining semantic and text search.
Args:
query: The search query text
Returns:
Dictionary containing search results and metadata
"""
print(f"🔍 Tool called: hybrid_search(query='{query[:50]}...')")
# Use mock search service
context = ctx.deps.search_service.search(query)
return {
"success": True,
"context": context
}
return agent
# ============================================================================
# TEST FUNCTIONS
# ============================================================================
async def test_without_message_history():
"""Test WITHOUT message history (expected to work)."""
print("\n" + "="*80)
print("TEST 1: WITHOUT Message History")
print("="*80)
# Create agent and dependencies
agent = create_agent()
search_service = DummySearchService()
deps = AgentDependencies(search_service=search_service)
# Run query WITHOUT message history
query = "What are the payment terms in this contract?"
print(f"\nQuery: {query}")
print("Message History: None")
try:
result = await agent.run(query, deps=deps)
print(f"\n✅ SUCCESS!")
print(f"\nAnswer: {result.output.answer[:200]}...")
print(f"\nCitations:")
for key, value in result.output.citations.items():
print(f" {key}: {value}")
print(f" Type: {type(value).__name__}")
# Validation check
if isinstance(value, str) and not value.isdigit() and len(value) > 10:
print(f" ✅ Valid citation format")
else:
print(f" ❌ Invalid citation format")
except Exception as e:
print(f"\n❌ ERROR: {e}")
async def test_with_message_history():
"""Test WITH message history (expected to fail with pydantic-ai bug)."""
print("\n" + "="*80)
print("TEST 2: WITH Message History (Bug Reproduction)")
print("="*80)
# Create agent and dependencies
agent = create_agent()
search_service = DummySearchService()
deps = AgentDependencies(search_service=search_service)
# Build message history
message_history: List[ModelMessage] = [
ModelRequest(
parts=[UserPromptPart(content="Hello, I want to review a contract.")],
),
ModelResponse(
parts=[TextPart(content="Hello! I'm here to help you review the contract. What would you like to know?")],
),
]
# Run query WITH message history
query = "What are the payment terms in this contract?"
print(f"\nQuery: {query}")
print(f"Message History: {len(message_history)} previous messages")
try:
result = await agent.run(
query,
deps=deps,
message_history=message_history
)
print(f"\n✅ Agent completed (unexpected - bug may be fixed?)")
print(f"\nAnswer: {result.output.answer[:200]}...")
print(f"\nCitations:")
for key, value in result.output.citations.items():
print(f" {key}: {value}")
print(f" Type: {type(value).__name__}")
# Validation check
if isinstance(value, str) and not value.isdigit() and len(value) > 10:
print(f" ✅ Valid citation format (full document name)")
elif isinstance(value, str) and value.isdigit():
print(f" ❌ BUG: Citation value is a number as string: '{value}'")
elif isinstance(value, int):
print(f" ❌ BUG: Citation value is an integer: {value}")
elif isinstance(value, str) and len(value) < 10:
print(f" ❌ BUG: Citation value is a placeholder: '{value}'")
else:
print(f" ❌ Invalid citation format")
except Exception as e:
print(f"\n❌ BUG CONFIRMED: Agent failed with validation error")
print(f"Error: {e}")
print(f"\nThis demonstrates the pydantic-ai bug where structured output")
print(f"validation fails when message history is provided.")
# ============================================================================
# MAIN EXECUTION
# ============================================================================
async def main():
"""Run both tests to demonstrate the bug."""
print("\n" + "="*80)
print("PYDANTIC-AI MESSAGE HISTORY BUG - MINIMAL REPRODUCTION")
print("="*80)
print("\nThis script demonstrates a bug in pydantic-ai where structured output")
print("validation fails when message history is provided.")
print("\nExpected results:")
print(" Test 1 (no history): ✅ Proper citations")
print(" Test 2 (with history): ❌ Validation error or invalid citations")
print("="*80)
# Test 1: Without message history
await test_without_message_history()
# Test 2: With message history
await test_with_message_history()
print("\n" + "="*80)
print("SUMMARY")
print("="*80)
print("If Test 2 shows validation errors or invalid citations,")
print("the pydantic-ai message history bug is confirmed.")
print("\nExpected bug behavior:")
print(" - Citations have wrong types: {'doc1': 1} instead of {'doc1': 'document.pdf'}")
print(" - Citations have placeholders: {'doc1': 'source'}")
print(" - Validation fails after 3 retries")
print("="*80)
if __name__ == "__main__":
    asyncio.run(main())
Python, Pydantic AI & LLM client version
## Environment
- Python: 3.12.9
- OpenAI model: gpt-4.1-mini
- pydantic-ai: 1.7.0 (latest)
- openai: 2.6.1 (latest)
- OS: macOS (Darwin 24.6.0)