---
date: "2025-02-14T14:00:00.00Z"
description: "PydanticAI + Amazon Bedrock now supports images and documents"
published: true
tags:
  - python
  - llm
  - bedrock
  - pydantic
  - multimodal
time_to_read: 5
title: "Multimodal PydanticAI + Amazon Bedrock"
type: post
image: "/public/images/pydantic-ai/multimodal.png"
---

Following on from my [previous post](https://stephenhib.com/posts/pydantic-agents) introducing PydanticAI + Amazon Bedrock, the PydanticAI team have released full support for images and documents with Bedrock in the `0.0.38` release. The examples below were run against `0.0.39`.

## Setup

```python
%%capture
%uv pip install 'pydantic-ai-slim[bedrock]==0.0.39' 'pydantic-graph==0.0.39'
```

```python
import nest_asyncio
nest_asyncio.apply()  # This allows for nested event loops in Jupyter Notebooks
```

## ImageUrl
First, let's see how we can pass an image URL directly in the request.

```python
from pydantic_ai import Agent, ImageUrl

agent = Agent(
    # 'bedrock:' selects the Bedrock provider; the 'us.' prefix is a cross-region inference profile
    model='bedrock:us.amazon.nova-pro-v1:0',
    system_prompt='Be concise, reply with one sentence.'
)

# A user prompt can be a list mixing text and multimodal parts
result = agent.run_sync(
    [
        'What is this logo from?',
        ImageUrl(url='https://www.python.org/static/img/python-logo.png'),
    ]
)
print(result.data)
```

```
The logo is from Python, a popular programming language known for its simplicity and versatility.
```


## DocumentUrl
A lot of my work with customers is in back-office intelligent document processing, where information is encoded in one or more documents and must be extracted before software can use it. This is a deep and complex field, highly domain specific and full of edge cases. A pragmatic approach goes a long way, and starting simple is usually the right first step. Anthropic provides [a nice starting point](https://docs.anthropic.com/en/docs/build-with-claude/pdf-support#how-pdf-support-works): PDF support built into the API. The caller provides a PDF URL and Claude decodes the PDF into a format the model can consume, so you don't need your own parsing step. PydanticAI wraps this with the [DocumentUrl](https://ai.pydantic.dev/api/messages/#pydantic_ai.messages.DocumentUrl) dataclass.

```python
from pydantic_ai import Agent, DocumentUrl

agent = Agent(
    model='bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0',
    system_prompt='Be concise, reply with one sentence.'
)

result = agent.run_sync(
    [
        'What is the main content of this document?',
        # The document is referenced by URL instead of being embedded by hand
        DocumentUrl(url='https://docs.python.org/3.12/whatsnew/3.12.html'),
    ]
)
print(result.data)
```

```
The document provides a detailed overview of the new features and changes in Python 3.12, including language improvements, module updates, and API modifications.
```
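
Note that the example points `DocumentUrl` at an HTML page rather than a PDF. HTML is one of the formats Bedrock's Converse API accepts in a document block, alongside PDF, CSV, Word, Excel, plain text and Markdown. As a minimal sketch, a pre-flight check could reject unsupported extensions before any request is sent. `bedrock_document_format` is a hypothetical helper (not part of PydanticAI), it only looks at the file extension, and the format list is taken from the AWS `DocumentBlock` reference at the time of writing, so check the current docs before relying on it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats accepted by the Bedrock Converse API's DocumentBlock, per the AWS
# API reference at the time of writing (assumption: verify against current docs).
BEDROCK_DOCUMENT_FORMATS = frozenset(
    {"pdf", "csv", "doc", "docx", "xls", "xlsx", "html", "txt", "md"}
)


def bedrock_document_format(url: str) -> str:
    """Hypothetical pre-flight check: map a document URL to a Bedrock format.

    Only the file extension is inspected; the content itself is not validated.
    """
    path = urlparse(url).path  # drop scheme, host and query string
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    if suffix not in BEDROCK_DOCUMENT_FORMATS:
        raise ValueError(f"Unsupported document format {suffix!r} for {url}")
    return suffix


print(bedrock_document_format("https://docs.python.org/3.12/whatsnew/3.12.html"))  # html
```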


However, there are limitations, such as the maximum request size and the maximum number of pages per request. It's also not possible to inspect the intermediate format. For example, imagine your PDF contains tables and the LLM output is incorrect: was the table badly decoded from the PDF, or was it decoded correctly and then misread by the LLM? Without seeing the intermediate format, you can't tell. I'd suggest starting with the simplest option and convincing yourself …
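
One cheap way to stay within the request-size limit is to check the payload before sending it. This is a minimal sketch under two assumptions: that the document travels base64-encoded in the request (as it does in Anthropic's Messages API), which inflates it by roughly a third, and that the caller supplies the limit rather than it being hard-coded. `base64_size` and `check_request_size` are hypothetical names.

```python
def base64_size(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes of raw data."""
    return 4 * ((num_bytes + 2) // 3)


def check_request_size(data: bytes, max_request_bytes: int) -> None:
    """Hypothetical pre-flight check: fail fast if the encoded document is too big.

    max_request_bytes comes from the caller; look up the current limit for your
    provider and model instead of hard-coding it. Only the document is counted,
    so leave headroom for the prompt and the rest of the request.
    """
    encoded = base64_size(len(data))
    if encoded > max_request_bytes:
        raise ValueError(
            f"Encoded document is {encoded} bytes, over the {max_request_bytes}-byte limit"
        )
```

Failing locally gives a clear error message instead of a rejected API call, and keeps the limit in one place if it changes.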

## BinaryContent
If the image or document is only available locally, we can pass its raw bytes directly with `BinaryContent`. Here a PDF downloaded with `httpx` stands in for a local file.

```python
import httpx

from pydantic_ai import Agent, BinaryContent

agent = Agent(
    model='bedrock:us.anthropic.claude-3-5-haiku-20241022-v1:0',
    system_prompt='Be concise, reply with one sentence.'
)

# Download a PDF to stand in for a file on disk
r = httpx.get('https://astropgh.github.io/astropgh-boot-camp-2020/seminars/coding_best_practices_2020-06-03.pdf')
r.raise_for_status()
print(r.content[0:100])  # a PDF starts with the %PDF- header

result = agent.run_sync(
    [
        'What is this?',
        # media_type tells the model how to interpret the raw bytes
        BinaryContent(data=r.content, media_type='application/pdf'),
    ]
)
print(result.data)
```

```
b'%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n<</Type/Catalog/Pages 2 0 R/Lang(en-US) /StructTreeRoot 153 0 R/MarkInfo<<'
This is a presentation about good coding practices in Python, covering topics like PEP 8 style guidelines, documentation conventions, project organization, version control, and virtual environments.
```
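
For a file that really is on disk, the download step can be replaced by reading the bytes yourself. The sketch below is a hypothetical helper (not part of PydanticAI) that uses the standard library's `mimetypes` module to infer the `media_type` from the file extension; that inference is an assumption worth double-checking for unusual file types.

```python
import mimetypes
from pathlib import Path
from typing import Union


def load_local_document(path: Union[str, Path]) -> tuple[bytes, str]:
    """Hypothetical helper: read a local file and guess its media type.

    The media type is inferred from the file extension only, so a mislabelled
    file will not be caught here.
    """
    path = Path(path)
    media_type, _encoding = mimetypes.guess_type(path.name)
    if media_type is None:
        raise ValueError(f"Cannot infer a media type for {path.name!r}")
    return path.read_bytes(), media_type
```

The returned pair slots into the same call as above: `data, media_type = load_local_document('invoice.pdf')` followed by `BinaryContent(data=data, media_type=media_type)`, where `invoice.pdf` is a placeholder path.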


## Summary
That's all for today, happy multimodal building!