
Conversation

@dsfaccini (Contributor)

Summary

Implements a dump_messages() @classmethod on the VercelAIAdapter that converts Pydantic AI messages to Vercel AI format, enabling export of agent conversation history from Pydantic AI to the Vercel AI protocol.

This is the reverse operation of the existing load_messages() method.

Implementation

dump_messages():

  • Accepts a sequence of ModelMessage objects (ModelRequest/ModelResponse)
  • Returns a list of UIMessage objects in Vercel AI format
  • Maintains message grouping and ordering
  • Quirk: accepts an _id_generator() function to generate message IDs
    • used primarily by the test suite to keep UUIDs constant in the snapshots

Cases Covered

The implementation handles:

  1. Basic message types: System prompts, user messages, assistant responses
  2. Text content: Single and multi-part text with proper concatenation
  3. Tool calls: Regular tool calls with state tracking (in-progress, output-available, output-error)
  4. Builtin tools: Provider-executed tools with proper ID prefixing and metadata
  5. Tool returns: Automatic matching of tool calls with their returns to set proper states
  6. Thinking/reasoning: ThinkingPart conversion to ReasoningUIPart
  7. File attachments (see the sketch after this list):
    • BinaryContent conversion to data URIs
    • URL-based files (ImageUrl, AudioUrl, VideoUrl, DocumentUrl)
  8. Retry prompts: Tool errors converted to output-error state
  9. Multi-modal content: Mixed text and file content in user prompts

Design Considerations

  1. Tool return handling: Tool returns in ModelRequest are used to determine the state of tool calls in ModelResponse messages, not emitted as separate user messages.

  2. Text concatenation: Consecutive TextPart instances are concatenated. When interrupted by non-text parts (tools, files, reasoning), subsequent text is separated with \n\n (illustrated after this list).

  3. Builtin tool IDs: Uses the existing BUILTIN_TOOL_CALL_ID_PREFIX pattern for consistency with other adapters.

  4. Message grouping: System and user parts within a ModelRequest are split into separate UIMessage objects when needed.

Caveats

  1. Tool call input reconstruction: When tool returns appear in a ModelRequest without the original tool call in the same message history, the input field is set to an empty object {} since the original arguments are not available (see the sketch after this list).

  2. No perfect roundtrip: Due to timestamp generation and UUID assignment, dump_messages(load_messages(ui_msgs)) will not produce identical objects, but will preserve semantic equivalence.

  3. Builtin tool return location: The implementation checks both the same ModelResponse (for builtin tools) and subsequent ModelRequest messages (for regular tools) to find tool returns.

Tests

Added tests for the following cases in test_vercel_ai.py:

  • Basic message dumping
  • Tool calls with and without returns
  • Builtin tools with provider metadata
  • Thinking/reasoning parts
  • File attachments (data URIs and URLs)
  • Retry prompts and errors
  • Consecutive text concatenation
  • Text with interruptions
  • Roundtrip conversion (dump then load)

@dsfaccini (Contributor, Author)

There's a missing for loop in the method implementation; I'll push a fix tomorrow!

Comment on lines 157 to +163

if isinstance(args, str):
    try:
        args = json.loads(args)
    except json.JSONDecodeError:
        pass

@dsfaccini (Contributor, Author)

Tests showed load_messages converted args from a dict to a string, so I added this to parse them back; args supports both.

Collaborator

Our args is a str | dict[str, Any] while input is Any. Can you make this code a bit more robust in making sure we end up with a valid string or dict on our part? For example, if args is JSON, parsing it could also give us a list or something, in which case we'd rather keep the string. And if it's not a string or already a dict, perhaps we should raise an error.

Comment on lines +2340 to +2344
parts=[
    TextPart(content='Response text'),
    ToolCallPart(tool_name='tool1', args={'key': 'value'}, tool_call_id='tc1'),
],
timestamp=IsDatetime(),
@dsfaccini (Contributor, Author)

And this is the test that caught the asymmetry (dict != str).

        return builder.messages

    @classmethod
    def dump_messages(  # noqa: C901
Collaborator

Let's add this to the superclass, and maybe you can look at integrating #3068 into this new framework as well, to make sure our decisions/assumptions hold up against two standards, not just one.

@dsfaccini (Contributor, Author), Nov 16, 2025

On the args front:

  if isinstance(args, str):
      # if args is a list, we'll keep it as a string, otherwise try to parse as dict
      try:
          args = json.loads(args) if args.strip()[:1] != '[' else args
      except json.JSONDecodeError:
          pass
  elif not isinstance(args, dict | list | None):  # pragma: no branch
      raise UserError(f'Unsupported tool call args type: {type(args)}')

On the superclass front:

  @classmethod
  @abstractmethod
  def dump_messages(cls, message: ModelMessage) -> MessageT:
      """Transform Pydantic AI messages into protocol-specific messages."""
      raise NotImplementedError

On the #3068 front, you mean integrating the changes from that PR into this one?

Collaborator

On the args front, I'd rather test the type of the parsed result.

On the superclass front, yeah, let's include that PR.

        cls,
        messages: Sequence[ModelMessage],
        *,
        _id_generator: Callable[[], str] | None = None,
Collaborator

As we don't have this elsewhere, I'd rather not have it here; use IsStr() and IsSameStr() in the tests instead.

for msg in messages:
    if isinstance(msg, ModelRequest):
        for part in msg.parts:
            if isinstance(part, ToolReturnPart | BuiltinToolReturnPart):
Collaborator

BuiltinToolReturnPart only exists on ModelResponse, as it all happens on the server side.

if builtin_return:
    content = builtin_return.model_response_str()
call_provider_metadata = (
    {'pydantic_ai': {'provider_name': part.provider_name}}
Collaborator

Right, we were already doing this for builtin tools in the event stream :)

I suppose the stuff I said above also applies to the other events we output, like for reasoning. But since we get the data chunk by chunk and have to stream it using particular Vercel AI events, we may not always have the information a Vercel AI event needs at the time we get the Pydantic AI event. Still, for best behavior we really should store this kind of info in provider_metadata more as well. Feel free to look into that here or in a new PR.

had_interruption = True

if isinstance(part, BuiltinToolCallPart):
    prefixed_id = _make_builtin_tool_call_id(part.provider_name, part.tool_call_id)
Collaborator

What do we need this for?

@dsfaccini (Contributor, Author)

I honestly just read it in your review of #3068 and thought I'd need a helper as well, but I'm only calling it once.

# Load back to Pydantic AI format
reloaded_messages = VercelAIAdapter.load_messages(ui_messages)

# Can't use `assert reloaded_messages == original_messages` because the timestamps will be different
Collaborator

What if we use a helper function that walks the list and resets the timestamps? `assert reloaded_messages == original_messages` would be much less brittle than having to manually review these two snapshots for equality.

@dsfaccini (Contributor, Author)

Yep, I was thinking the same. The same patch can be applied to the UUIDs to get rid of the _id_generator.

…dumping and IsStr - add dump_messages to base adapter class
