
Validation Failure and Agent Reach Maximum History when Message History Is Passed #3277

@MrDataPsycho

Description


Summary

Structured output validation fails intermittently when message history is provided to pydantic-ai agents, causing either validation errors after the maximum number of retries or invalid citation formats in the output. This does not happen when doing the same tool calling with the vanilla OpenAI client and the same message history.
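
For comparison, the vanilla client path that works looks roughly like the sketch below. This is illustrative only: the model name, the history contents, and the manual validation step are assumptions, and the tool-calling loop is omitted.

import json

from openai import AsyncOpenAI
from pydantic import BaseModel


class ContractResponse(BaseModel):
    answer: str
    citations: dict[str, str]


async def vanilla_run() -> ContractResponse:
    client = AsyncOpenAI()
    messages = [
        {"role": "system", "content": "Answer the question and reply as JSON with 'answer' and 'citations'."},
        # Prior turns are passed as plain chat history:
        {"role": "user", "content": "Hello, I want to review a contract."},
        {"role": "assistant", "content": "Happy to help. What would you like to know?"},
        {"role": "user", "content": "What are the payment terms in this contract?"},
    ]
    completion = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=messages,
        response_format={"type": "json_object"},
    )
    # Validate the raw JSON ourselves instead of relying on an output tool.
    return ContractResponse.model_validate(json.loads(completion.choices[0].message.content))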

Expected Behavior

When using an agent with structured output (output_type with Pydantic model), the agent should:

  1. Return valid structured output matching the Pydantic model schema
  2. Work consistently regardless of whether message history is provided
  3. Pass validation on first attempt or retry successfully

Affected Versions

  • Tested on pydantic-ai 1.1.0 + openai 1.108.1 - BUG PRESENT
  • Tested on pydantic-ai 1.7.0 + openai 2.6.1 - BUG PRESENT

Actual Behavior

WITHOUT Message History (Works Correctly)

result = await agent.run(query, deps=deps)
# Returns citation: {"doc1": "HEALTHGATE_CONTRACT.pdf.md-2"}  Correct

WITH Message History (Fails Intermittently)

result = await agent.run(query, deps=deps, message_history=message_history)
# Results in one of:
# 1. Error: "Exceeded maximum retries (3) for output validation"
# 2. Invalid output: {"doc1": 1}  Number instead of string
# 3. Invalid output: {"doc1": "source"}  Placeholder instead of document name
# 4. Invalid output: {"0": "doc1"}  Wrong key format

How to reproduce (using the example code provided below):

# Install dependencies
pip install pydantic-ai==1.7.0 openai python-dotenv

# Set environment variable
export OPENAI_API_KEY=your_key_here

# Run reproduction script
python examples/pydantic_ai_bug_report.py

# Run multiple times (bug is intermittent)
for i in {1..5}; do python examples/pydantic_ai_bug_report.py; sleep 2; done

Reproduction Rate

  • Intermittent: Bug appears in ~30-50% of runs with message history
  • Consistent: WITHOUT message history always works (0% failure rate)
  • With message history: Fails validation after 3 retries or returns invalid formats

When the run does not succeed, the following error appears:

Traceback (most recent call last):
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3697, in run_code
    await eval(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/tn/l_62dr1n6hzgchkrr465lqz40000gn/T/ipykernel_13858/2484678115.py", line 1, in <module>
    result = await agent.run(
             ^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/agent/abstract.py", line 235, in run
    async for node in agent_run:
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/run.py", line 148, in __anext__
    task = await anext(self._graph_run)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 410, in __anext__
    self._next = await self._iterator.asend(self._next)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 497, in iter_graph
    with _unwrap_exception_groups():
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 866, in _unwrap_exception_groups
    raise exception
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 638, in _run_tracked_task
    result = await self._run_task(t_)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/graph.py", line 667, in _run_task
    output = await node.call(step_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_graph/beta/step.py", line 253, in _call_node
    return await node.run(GraphRunContext(state=ctx.state, deps=ctx.deps))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 540, in run
    async with self.stream(ctx):
               ^^^^^^^^^^^^^^^^
  File "/Users/datapsycho/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/contextlib.py", line 217, in __aexit__
    await anext(self.gen)
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 554, in stream
    async for _event in stream:
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 668, in _run_stream
    async for event in self._events_iterator:
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 629, in _run_stream
    async for event in self._handle_tool_calls(ctx, tool_calls):
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 684, in _handle_tool_calls
    async for event in process_tool_calls(
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 821, in process_tool_calls
    ctx.state.increment_retries(
  File "/Users/datapsycho/PythonProjects/AgentEngBootCamp/contramate/.venv/lib/python3.12/site-packages/pydantic_ai/_agent_graph.py", line 122, in increment_retries
    raise exceptions.UnexpectedModelBehavior(message) from error
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (3) for output validation

Additional Notes

  • The bug appears to be related to how pydantic-ai handles structured output when message history context is provided
  • The underlying LLM (GPT-4.1 mini) has no issues with message history in the vanilla implementation
  • The issue seems to be in the structured output validation/coercion layer of pydantic-ai; a sketch for inspecting the raw exchange is shown below
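
To see what the model actually emits before validation, the raw exchange can be captured with pydantic-ai's capture_run_messages context manager. A minimal sketch (the debug_run wrapper name is illustrative):

from pydantic_ai import capture_run_messages
from pydantic_ai.exceptions import UnexpectedModelBehavior


async def debug_run(agent, query, deps, message_history):
    # Record every request/response exchanged during the run so the raw
    # arguments of the final output tool call can be compared with and
    # without message history.
    with capture_run_messages() as messages:
        try:
            return await agent.run(query, deps=deps, message_history=message_history)
        except UnexpectedModelBehavior:
            for message in messages:
                print(message)
            raise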

Example Code

"""
Minimal Reproducible Example: pydantic-ai Message History Bug

This is a standalone script demonstrating a bug in pydantic-ai where structured output
validation fails when message history is provided.

Bug Summary:
- WITHOUT message history: Citations are properly formatted as {"doc1": "document_name.pdf"}
- WITH message history: Validation fails after 3 retries, or returns invalid formats like
  {"doc1": 1}, {"doc1": "source"}, or {"0": "doc1"}

Tested Versions:
- pydantic-ai: 1.1.0 - BUG PRESENT ❌
- pydantic-ai: 1.7.0 - BUG PRESENT ❌ (tested 2025-10-29)

This example can be run independently without any contramate infrastructure.

Requirements:
  pip install pydantic-ai==1.7.0 openai==2.6.1 python-dotenv

Environment:
  OPENAI_API_KEY=your_key_here
"""

import asyncio
import os
from typing import Dict, Any, List, Optional
from dataclasses import dataclass
from pydantic import BaseModel, Field, field_validator
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.messages import ModelMessage, ModelRequest, ModelResponse, UserPromptPart, TextPart
from openai import AsyncOpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv(".envs/local.env")


# ============================================================================
# MOCK DATA - Simulates OpenSearch vector search results
# ============================================================================

MOCK_SEARCH_RESULTS = {
    "payment terms": """# Search Results for: payment terms
**Total Results:** 6 (of 6 total)
**Search Type:** hybrid

---

# Search Result 1

| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2 |
| Contract Type | Service Agreement |
| Section | Payment Terms > Milestone Payments |

**Content:**

The payment terms specified in the Hosting and Management Agreement include:
- $100,000 due on 30 January 1998
- $150,000 due on 6 February 1998
- $150,000 due on acceptance of the Specification or 27 February 1998
- $150,000 due on acceptance of System launch
- $150,000 due on system completion date
- $175,000 each due on 1 January, 1 April, 1 July, and 1 September 1999

Invoices are payable within 60 days of receipt. Interest on late payments is charged at 2% above the base rate of Barclays Bank plc in England.

---

# Search Result 2

| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-3 |
| Contract Type | Service Agreement |
| Section | Payment Terms > Use Fees |

**Content:**

Use Fees are payable based on the "Use" of content, defined as retrieval or download of full-text articles by subscribers. Use Fees are billed monthly with payment due by the end of the month following the invoice date.

---
""",
    "liability limitations": """# Search Results for: liability limitations
**Total Results:** 5 (of 5 total)
**Search Type:** hybrid

---

# Search Result 1

| Field | Value |
|-------|-------|
| Document | HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2 |
| Contract Type | Service Agreement |
| Section | Liability and Indemnification |

**Content:**

The Agreement excludes cover for indirect, consequential, special, or punitive damages. HealthGate provides indemnification for product defects and intellectual property infringement claims. Maximum aggregate liability is capped at the total payments made under this agreement.

---
"""
}


# ============================================================================
# MOCK SEARCH SERVICE - Replaces OpenSearch dependency
# ============================================================================

@dataclass
class DummySearchService:
    """Mock search service that returns fixed results."""

    def search(self, query: str) -> str:
        """Return mock search results based on query keywords."""
        query_lower = query.lower()

        if "payment" in query_lower:
            return MOCK_SEARCH_RESULTS["payment terms"]
        elif "liability" in query_lower or "limitation" in query_lower:
            return MOCK_SEARCH_RESULTS["liability limitations"]
        else:
            # Default response
            return MOCK_SEARCH_RESULTS["payment terms"]


@dataclass
class AgentDependencies:
    """Dependencies for the agent."""
    search_service: DummySearchService
    filters: Optional[Dict[str, Any]] = None


# ============================================================================
# SYSTEM PROMPT - Simplified version focusing on citation formatting
# ============================================================================

SYSTEM_PROMPT = """
## Role
You are a contract analysis assistant that answers questions about contracts using search results.

## Search Result Format
When you receive search results, they will be formatted like this:


# Search Results for: [query]
**Total Results:** X (of Y total)
**Search Type:** hybrid

---

# Search Result 1

| Field | Value |
|-------|-------|
| Document | [document_name.pdf.md-N] |
| Contract Type | [type] |
| Section | [section_hierarchy] |

**Content:**

[actual content text...]

---

# Search Result 2
...


## CRITICAL Citation Rules

**IMPORTANT CITATION MAPPING:**
- Citations are INDEPENDENT of the "# Search Result N" numbering
- You only create citations for documents you ACTUALLY USE in your answer
- Citation numbers are assigned dynamically based on the ORDER you use documents
- The heading "# Search Result 1", "# Search Result 2" is NOT the citation number

**Example:**
If you use information from "Search Result 3" first and "Search Result 7" second:
- Information from Search Result 3 → gets citation [doc1] → maps to Document field from Search Result 3
- Information from Search Result 7 → gets citation [doc2] → maps to Document field from Search Result 7

## Response Format

**REQUIRED OUTPUT STRUCTURE:**
You MUST return a JSON response with TWO fields:
1. `answer` (string, required): Your complete answer with citations
2. `citations` (object, required): A dictionary mapping citation keys to FULL document names

**Example Response:**

{
  "answer": "Payment terms are net 30 days [doc1].\\n\\nLate fees apply at 2% interest [doc2].",
  "citations": {
    "doc1": "CONTRACT_2024-EX-10.1-SERVICE_AGREEMENT.pdf.md-7",
    "doc2": "CONTRACT_2024-EX-10.2-PAYMENT_TERMS.pdf.md-1"
  }
}


## Citation Value Requirements

**CRITICAL - Citation Values Must Be:**
- ✅ FULL document names from the "Document" field (e.g., "HEALTHGATE_CONTRACT.pdf.md-2")
- ❌ NOT numbers: {"doc1": 1} or {"doc1": 2}
- ❌ NOT placeholders: {"doc1": "source"} or {"doc1": "document"}
- ❌ NOT citation keys: {"doc1": "doc1"}
- ❌ NOT search result numbers: {"doc1": "3"}

**Correct Example:**

{
  "citations": {
    "doc1": "HEALTHGATEDATACORP_11_24_1999-EX-10.1-HOSTING AND MANAGEMENT AGREEMENT (1).pdf.md-2",
    "doc2": "SERVICE_AGREEMENT_2024.pdf.md-5"
  }
}


**Wrong Examples:**

{"doc1": 1}                    ❌ Number instead of string
{"doc1": "source"}             ❌ Placeholder instead of document name
{"doc1": "doc1"}               ❌ Citation key instead of document name
{"0": "document.pdf"}          ❌ Wrong key format


## Tool Available

### hybrid_search
Performs hybrid search combining semantic and text search.

**Args:**
- query: The search query text

**Returns:**
- Dictionary with "success": True and "context": formatted search results

## Instructions

1. Use the `hybrid_search` tool to find relevant information
2. Extract the Document field values from search results
3. Write your answer with inline citations like [doc1], [doc2]
4. Map each citation to its FULL Document field value in the citations dictionary
5. ALWAYS provide both `answer` and `citations` fields - both are REQUIRED
"""


# ============================================================================
# RESPONSE MODEL - Pydantic validation for structured output
# ============================================================================

class ContractResponse(BaseModel):
    """Response model with cited answer."""

    answer: str = Field(
        ...,
        description="The complete answer with inline citations like [doc1], [doc2]. "
        "MANDATORY: Must contain at least one citation in [docN] format."
    )

    citations: Dict[str, str] = Field(
        ...,
        description="Dictionary mapping citation keys to FULL DOCUMENT NAMES (strings). "
        "Keys: Use 'doc1', 'doc2', 'doc3' (strings, not numbers). "
        "Values: Use complete Document field from search results - ALWAYS a string like 'CONTRACT.pdf.md-2'. "
        "NEVER use numbers (1, 2, 3) as values. "
        "NEVER use placeholders like 'source' or 'document'. "
        "NEVER use citation keys ('doc1', 'doc2') as values. "
        "Example CORRECT: {'doc1': 'HEALTHGATE_CONTRACT.pdf.md-2', 'doc2': 'SERVICE_AGREEMENT.pdf.md-5'}. "
        "Example WRONG: {'doc1': 1} or {'doc1': 'doc1'} or {1: 'document.pdf'}."
    )

    @field_validator('citations')
    @classmethod
    def validate_citations_are_strings(cls, v: Dict[str, str]) -> Dict[str, str]:
        """Ensure all citation values are strings, not integers or other types."""
        for key, value in v.items():
            if not isinstance(value, str):
                raise ValueError(
                    f"Citation value for '{key}' must be a string (document name), "
                    f"got {type(value).__name__}: {value}. "
                    f"Use the Document field from search results."
                )
            if value.isdigit():
                raise ValueError(
                    f"Citation value for '{key}' cannot be just a number: '{value}'. "
                    f"Use the full Document field from search results (e.g., 'CONTRACT.pdf.md-2')."
                )
            if len(value) < 10:
                raise ValueError(
                    f"Citation value for '{key}' seems too short: '{value}'. "
                    f"Use the full Document field from search results, not a placeholder."
                )
        return v


# ============================================================================
# AGENT CREATION - Minimal pydantic-ai setup
# ============================================================================

def create_agent() -> Agent[AgentDependencies, ContractResponse]:
    """Create the pydantic-ai agent with OpenAI client."""

    # Get API key from environment
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY not found in environment")

    # Create OpenAI client and provider
    openai_client = AsyncOpenAI(api_key=api_key)

    # Import provider
    from pydantic_ai.providers.openai import OpenAIProvider
    provider = OpenAIProvider(openai_client=openai_client)

    # Create model with provider
    model = OpenAIChatModel("gpt-4.1-mini-2025-04-14", provider=provider)

    # Create agent
    agent = Agent(
        model=model,
        system_prompt=SYSTEM_PROMPT,
        output_type=ContractResponse,
        deps_type=AgentDependencies,
        retries=3,  # Will retry 3 times on validation errors
    )

    # Register tool
    @agent.tool
    async def hybrid_search(
        ctx: RunContext[AgentDependencies],
        query: str,
    ) -> Dict[str, Any]:
        """
        Perform hybrid search combining semantic and text search.

        Args:
            query: The search query text

        Returns:
            Dictionary containing search results and metadata
        """
        print(f"🔍 Tool called: hybrid_search(query='{query[:50]}...')")

        # Use mock search service
        context = ctx.deps.search_service.search(query)

        return {
            "success": True,
            "context": context
        }

    return agent


# ============================================================================
# TEST FUNCTIONS
# ============================================================================

async def test_without_message_history():
    """Test WITHOUT message history (expected to work)."""
    print("\n" + "="*80)
    print("TEST 1: WITHOUT Message History")
    print("="*80)

    # Create agent and dependencies
    agent = create_agent()
    search_service = DummySearchService()
    deps = AgentDependencies(search_service=search_service)

    # Run query WITHOUT message history
    query = "What are the payment terms in this contract?"
    print(f"\nQuery: {query}")
    print("Message History: None")

    try:
        result = await agent.run(query, deps=deps)

        print(f"\n✅ SUCCESS!")
        print(f"\nAnswer: {result.output.answer[:200]}...")
        print(f"\nCitations:")
        for key, value in result.output.citations.items():
            print(f"  {key}: {value}")
            print(f"  Type: {type(value).__name__}")

            # Validation check
            if isinstance(value, str) and not value.isdigit() and len(value) > 10:
                print(f"  ✅ Valid citation format")
            else:
                print(f"  ❌ Invalid citation format")

    except Exception as e:
        print(f"\n❌ ERROR: {e}")


async def test_with_message_history():
    """Test WITH message history (expected to fail with pydantic-ai bug)."""
    print("\n" + "="*80)
    print("TEST 2: WITH Message History (Bug Reproduction)")
    print("="*80)

    # Create agent and dependencies
    agent = create_agent()
    search_service = DummySearchService()
    deps = AgentDependencies(search_service=search_service)

    # Build message history
    message_history: List[ModelMessage] = [
        ModelRequest(
            parts=[UserPromptPart(content="Hello, I want to review a contract.")],
        ),
        ModelResponse(
            parts=[TextPart(content="Hello! I'm here to help you review the contract. What would you like to know?")],
        ),
    ]

    # Run query WITH message history
    query = "What are the payment terms in this contract?"
    print(f"\nQuery: {query}")
    print(f"Message History: {len(message_history)} previous messages")

    try:
        result = await agent.run(
            query,
            deps=deps,
            message_history=message_history
        )

        print(f"\n✅ Agent completed (unexpected - bug may be fixed?)")
        print(f"\nAnswer: {result.output.answer[:200]}...")
        print(f"\nCitations:")
        for key, value in result.output.citations.items():
            print(f"  {key}: {value}")
            print(f"  Type: {type(value).__name__}")

            # Validation check
            if isinstance(value, str) and not value.isdigit() and len(value) > 10:
                print(f"  ✅ Valid citation format (full document name)")
            elif isinstance(value, str) and value.isdigit():
                print(f"  ❌ BUG: Citation value is a number as string: '{value}'")
            elif isinstance(value, int):
                print(f"  ❌ BUG: Citation value is an integer: {value}")
            elif isinstance(value, str) and len(value) < 10:
                print(f"  ❌ BUG: Citation value is a placeholder: '{value}'")
            else:
                print(f"  ❌ Invalid citation format")

    except Exception as e:
        print(f"\n❌ BUG CONFIRMED: Agent failed with validation error")
        print(f"Error: {e}")
        print(f"\nThis demonstrates the pydantic-ai bug where structured output")
        print(f"validation fails when message history is provided.")


# ============================================================================
# MAIN EXECUTION
# ============================================================================

async def main():
    """Run both tests to demonstrate the bug."""
    print("\n" + "="*80)
    print("PYDANTIC-AI MESSAGE HISTORY BUG - MINIMAL REPRODUCTION")
    print("="*80)
    print("\nThis script demonstrates a bug in pydantic-ai where structured output")
    print("validation fails when message history is provided.")
    print("\nExpected results:")
    print("  Test 1 (no history):  ✅ Proper citations")
    print("  Test 2 (with history): ❌ Validation error or invalid citations")
    print("="*80)

    # Test 1: Without message history
    await test_without_message_history()

    # Test 2: With message history
    await test_with_message_history()

    print("\n" + "="*80)
    print("SUMMARY")
    print("="*80)
    print("If Test 2 shows validation errors or invalid citations,")
    print("the pydantic-ai message history bug is confirmed.")
    print("\nExpected bug behavior:")
    print("  - Citations have wrong types: {'doc1': 1} instead of {'doc1': 'document.pdf'}")
    print("  - Citations have placeholders: {'doc1': 'source'}")
    print("  - Validation fails after 3 retries")
    print("="*80)


if __name__ == "__main__":
    asyncio.run(main())

Python, Pydantic AI & LLM client version

## Environment
- Python: 3.12.9
- OpenAI model: gpt-4.1-mini
- pydantic-ai: 1.7.0 (latest)
- openai: 2.6.1 (latest)
- OS: macOS (Darwin 24.6.0)
