
Incompatibility with vLLM Responses API #3245

@pamelafox

Description


The vLLM implementation of the Responses API is currently stricter than other implementations: it produces an error if a request is missing any field it expects, such as the "status": None field on function call items.

Here's complete working code with the low-level openai package:
https://github.com/Azure-Samples/nim-on-azure-serverless-gpus-demos/blob/main/examples/openai_functioncalling_loop.py#L122

In other frameworks, we have resolved this by adding "status": None to the function call response, like in this PR for microsoft agent-framework:
https://github.com/microsoft/agent-framework/pull/1509/files
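For illustration, here is a minimal sketch of that workaround: when echoing a function call item back to a strict Responses API server, include the "status" key explicitly even when its value is None. The call_id and arguments below are hypothetical placeholders, not taken from a real response.

```python
# Hypothetical function_call item echoed back to the Responses API.
# Strict servers (e.g. vLLM) reject the item if "status" is absent,
# so it is set explicitly to None before resending.
function_call_item = {
    "type": "function_call",
    "call_id": "call_123",  # placeholder id, normally from the model response
    "name": "get_weather",
    "arguments": '{"city": "Seattle"}',
    "status": None,  # the field strict implementations require
}
```

The fix in the agent-framework PR above follows the same idea: the key is always present in the serialized payload, so strict validators accept it.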

I asked the vLLM maintainers whether they can make the validation less strict, and they are working on it, but I don't know how long the fix will take or when it will reach all the services that use vLLM.

I replicated the issue with a gpt-oss model deployed via NVIDIA NIM (which wraps vLLM), using the code below. If you ping me on Slack/LinkedIn/Twitter, I can share the endpoint URL.

import asyncio
import logging
import os
import random

from dotenv import load_dotenv
from openai import AsyncOpenAI
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel
from pydantic_ai.providers.openai import OpenAIProvider
from rich.logging import RichHandler

logging.basicConfig(level=logging.DEBUG, format="%(message)s", datefmt="[%X]", handlers=[RichHandler()])
logger = logging.getLogger("weekend_planner")

load_dotenv(override=True)
client = AsyncOpenAI(
    base_url=os.environ["NIM_ENDPOINT"],
    api_key="none")

model = OpenAIResponsesModel(os.environ["NIM_MODEL"], provider=OpenAIProvider(openai_client=client))

def get_weather(city: str) -> dict:
    """Returns the weather for the given city."""
    logger.info(f"Getting weather for {city}")
    if random.random() < 0.05:
        return {
            "city": city,
            "temperature": 72,
            "description": "Sunny",
        }
    else:
        return {
            "city": city,
            "temperature": 60,
            "description": "Rainy",
        }

agent = Agent(
    model,
    system_prompt="You are a helpful weather assistant.",
    tools=[get_weather],
)


async def main():
    result = await agent.run("what's the weather in Seattle?")
    print(result.output)


if __name__ == "__main__":
    logger.setLevel(logging.INFO)
    asyncio.run(main())

Python, Pydantic AI & LLM client version

pydantic==2.11.10
pydantic-ai==1.0.8
pydantic-ai-slim==1.0.8
pydantic-evals==1.0.8
pydantic-graph==1.0.8
pydantic-settings==2.11.0
pydantic_core==2.33.2
