# How to pass multimodal data to models

Here we demonstrate how to pass [multimodal](/docs/concepts/multimodality/) input directly to models.

LangChain supports multimodal data as input to chat models in two ways:

1. Following provider-specific formats;
2. Adhering to a cross-provider standard (see [how-to guides](/docs/how_to/#multimodal) for detail).

Below, we demonstrate the cross-provider standard. See [chat model integrations](/docs/integrations/chat/) for detail
on native formats for specific providers.

:::note

Most chat models that support multimodal **image** inputs also accept those values in
OpenAI's [Chat Completions format](https://platform.openai.com/docs/guides/images?api-mode=chat):

```python
{
    "type": "image_url",
    "image_url": {"url": image_url},
}
```
:::
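As a minimal sketch of this format with in-line data: the Chat Completions `image_url` field also accepts a base64 data URL, so raw image bytes can be wrapped as shown here. The helper name `to_image_url_block` is hypothetical and not part of LangChain, and the three bytes used below are only the start of a JPEG signature, not a real image.

```python
import base64


def to_image_url_block(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an OpenAI-style ``image_url`` content block."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        # A data URL carries the MIME type and the base64 payload in one string.
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes for illustration only.
block = to_image_url_block(b"\xff\xd8\xff", mime_type="image/jpeg")
```

Whether a given provider accepts data URLs in this field is worth checking against its integration page.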

## Images

Many providers accept images passed in-line as base64 data. Some additionally accept an image from a URL directly.

### Images from base64 data

To pass images in-line, format them as content blocks of the following form:

```python
{
    "type": "image",
    "source_type": "base64",
    "mime_type": "image/jpeg",  # or image/png, etc.
    "data": "<base64 data string>",
}
```

Example:

```python
import base64

import httpx
from langchain.chat_models import init_chat_model

# Fetch image data
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")


# Pass to LLM
llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the weather in this image:",
        },
        # highlight-start
        {
            "type": "image",
            "source_type": "base64",
            "data": image_data,
            "mime_type": "image/jpeg",
        },
        # highlight-end
    ],
}
response = llm.invoke([message])
print(response.text())
```

The image shows a beautiful clear day with bright blue skies and wispy cirrus clouds stretching across the horizon. The clouds are thin and streaky, creating elegant patterns against the blue backdrop. The lighting suggests it's during the day, possibly late afternoon given the warm, golden quality of the light on the grass. The weather appears calm with no signs of wind (the grass looks relatively still) and no indication of rain. It's the kind of perfect, mild weather that's ideal for walking along the wooden boardwalk through the marsh grass.


See [LangSmith trace](https://smith.langchain.com/public/eab05a31-54e8-4fc9-911f-56805da67bef/r) for more detail.
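
When the image is a local file rather than a download, the same content block can be built from its bytes. The sketch below is one way to do that, assuming the MIME type can be guessed from the file name. `image_block_from_bytes` is a hypothetical helper, not a LangChain API.

```python
import base64
import mimetypes


def image_block_from_bytes(image_bytes: bytes, filename: str) -> dict:
    """Build a base64 image content block, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {filename!r}")
    return {
        "type": "image",
        "source_type": "base64",
        "mime_type": mime_type,
        "data": base64.b64encode(image_bytes).decode("utf-8"),
    }


# Placeholder bytes for illustration; in practice, read them with Path(...).read_bytes().
block = image_block_from_bytes(b"\x89PNG\r\n\x1a\n", "boardwalk.png")
```

The resulting `block` can be placed in a message's `content` list exactly like the highlighted block in the example above.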

### Images from a URL

Some providers (including [OpenAI](/docs/integrations/chat/openai/),
[Anthropic](/docs/integrations/chat/anthropic/), and
[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will also accept images from URLs directly.

To pass images as URLs, format them as content blocks of the following form:

```python
{
    "type": "image",
    "source_type": "url",
    "url": "https://...",
}
```

Example:

```python
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the weather in this image:",
        },
        {
            "type": "image",
            # highlight-start
            "source_type": "url",
            "url": image_url,
            # highlight-end
        },
    ],
}
response = llm.invoke([message])
print(response.text())
```

The weather in this image appears to be pleasant and clear. The sky is mostly blue with a few scattered, light clouds, and there is bright sunlight illuminating the green grass and plants. There are no signs of rain or stormy conditions, suggesting it is a calm, likely warm day—typical of spring or summer.


We can also pass in multiple images:

```python
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Are these two images the same?"},
        {"type": "image", "source_type": "url", "url": image_url},
        {"type": "image", "source_type": "url", "url": image_url},
    ],
}
response = llm.invoke([message])
print(response.text())
```

Yes, these two images are the same. They depict a wooden boardwalk going through a grassy field under a blue sky with some clouds. The colors, composition, and elements in both images are identical.
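
For a variable number of images, the message can be assembled programmatically. This is a minimal sketch. `images_message` is a hypothetical helper, and the URLs are placeholders.

```python
def images_message(prompt: str, image_urls: list) -> dict:
    """Build a user message with one text block followed by one image block per URL."""
    content = [{"type": "text", "text": prompt}]
    content.extend(
        {"type": "image", "source_type": "url", "url": url} for url in image_urls
    )
    return {"role": "user", "content": content}


# Placeholder URLs for illustration only.
message = images_message(
    "Are these two images the same?",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
```

The result has the same shape as the hand-written message above and can be passed to `llm.invoke([message])`.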


## Documents (PDF)

Some providers (including [OpenAI](/docs/integrations/chat/openai/),
[Anthropic](/docs/integrations/chat/anthropic/), and
[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will accept PDF documents.

### Documents from base64 data

To pass documents in-line, format them as content blocks of the following form:

```python
{
    "type": "file",
    "source_type": "base64",
    "mime_type": "application/pdf",
    "data": "<base64 data string>",
}
```

Example:

```python
import base64

import httpx
from langchain.chat_models import init_chat_model

# Fetch PDF data
pdf_url = "https://pdfobject.com/pdf/sample.pdf"
pdf_data = base64.b64encode(httpx.get(pdf_url).content).decode("utf-8")


# Pass to LLM
llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the document:",
        },
        # highlight-start
        {
            "type": "file",
            "source_type": "base64",
            "data": pdf_data,
            "mime_type": "application/pdf",
        },
        # highlight-end
    ],
}
response = llm.invoke([message])
print(response.text())
```

This document appears to be a sample PDF file that contains Lorem ipsum placeholder text. It begins with a title "Sample PDF" followed by the subtitle "This is a simple PDF file. Fun fun fun."

The rest of the document consists of several paragraphs of Lorem ipsum text, which is a commonly used placeholder text in design and publishing. The text is formatted in a clean, readable layout with consistent paragraph spacing. The document appears to be a single page containing four main paragraphs of this placeholder text.

The Lorem ipsum text, while appearing to be Latin, is actually scrambled Latin-like text that is used primarily to demonstrate the visual form of a document or typeface without the distraction of meaningful content. It's commonly used in publishing and graphic design when the actual content is not yet available but the layout needs to be demonstrated.

The document has a professional, simple layout with generous margins and clear paragraph separation, making it an effecti
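
Because `httpx.get` returns whatever the server sends, a failed download could end up base64-encoded as if it were a PDF. The following is a minimal sketch of a guard, assuming that checking for the `%PDF-` signature at the start of the data is sufficient for your use case. `pdf_block_from_bytes` is a hypothetical helper, not a LangChain API.

```python
import base64


def pdf_block_from_bytes(pdf_bytes: bytes) -> dict:
    """Build a base64 file content block after a basic PDF signature check."""
    # PDF files begin with the marker "%PDF-"; other data is probably an error page.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("Data does not look like a PDF (missing %PDF- header)")
    return {
        "type": "file",
        "source_type": "base64",
        "mime_type": "application/pdf",
        "data": base64.b64encode(pdf_bytes).decode("utf-8"),
    }


# Placeholder bytes for illustration; not a complete PDF document.
block = pdf_block_from_bytes(b"%PDF-1.4\n% placeholder\n")
```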

### Documents from a URL

Some providers (specifically [Anthropic](/docs/integrations/chat/anthropic/))
will also accept documents from URLs directly.

To pass documents as URLs, format them as content blocks of the following form:

```python
{
    "type": "file",
    "source_type": "url",
    "url": "https://...",
}
```

Example:

```python
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the document:",
        },
        {
            "type": "file",
            # highlight-start
            "source_type": "url",
            "url": pdf_url,
            # highlight-end
        },
    ],
}
response = llm.invoke([message])
print(response.text())
```

This document appears to be a sample PDF file with both text and an image. It begins with a title "Sample PDF" followed by the text "This is a simple PDF file. Fun fun fun." The rest of the document contains Lorem ipsum placeholder text arranged in several paragraphs. The content is shown both as text and as an image of the formatted PDF, with the same content displayed in a clean, formatted layout with consistent spacing and typography. The document consists of a single page containing this sample text.


## Provider-specific parameters

Some providers support or require additional fields on content blocks containing multimodal data.
For example, Anthropic lets you specify [caching](/docs/integrations/chat/anthropic/#prompt-caching) of
specific content to reduce token consumption.

To use these fields, you can:

1. Store them on the [metadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.content_blocks.BaseDataContentBlock.html#langchain_core.messages.content_blocks.BaseDataContentBlock.metadata) of the content block; or
2. Use the native format supported by each provider (see [chat model integrations](/docs/integrations/chat/) for detail).

We show three examples below.
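
All three attach the extra fields under a `metadata` key on the content block. As a sketch of that pattern, a small helper can add the fields without mutating the original block. `with_metadata` is a hypothetical name, and the URL is a placeholder.

```python
def with_metadata(block: dict, **fields) -> dict:
    """Return a copy of a content block with extra provider-specific metadata."""
    merged = {**block.get("metadata", {}), **fields}
    return {**block, "metadata": merged}


# Placeholder URL for illustration only.
image_block = {"type": "image", "source_type": "url", "url": "https://example.com/a.jpg"}
cached_block = with_metadata(image_block, cache_control={"type": "ephemeral"})
```

Returning a copy keeps the base block reusable across providers that expect different fields.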

### Example: Anthropic prompt caching

```python
llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the weather in this image:",
        },
        {
            "type": "image",
            "source_type": "url",
            "url": image_url,
            # highlight-next-line
            "metadata": {"cache_control": {"type": "ephemeral"}},
        },
    ],
}
response = llm.invoke([message])
print(response.text())
```

The image shows beautiful, fair weather conditions with a vibrant blue sky dotted with wispy cirrus clouds stretched across the horizon. The clouds are white and feathery, creating streaks across the bright blue background. The lighting suggests it's during daytime, possibly late afternoon, with good visibility and no signs of rain or storms. The bright sunlight is illuminating the green grass and wooden boardwalk, indicating clear, pleasant conditions. It appears to be a warm season, likely spring or summer, given the lush vegetation and ideal weather conditions shown in the scene.


```python
response.usage_metadata
```

```
{'input_tokens': 1586,
 'output_tokens': 121,
 'total_tokens': 1707,
 'input_token_details': {'cache_read': 0, 'cache_creation': 1582}}
```

```python
next_message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Summarize that in 5 words.",
        }
    ],
}
response = llm.invoke([message, next_message])
print(response.text())
print(response.usage_metadata)
```

```
The image shows a clear, sunny day with wispy cirrus clouds streaking across a vibrant blue sky. The lighting suggests it's during the middle of the day, and the weather appears calm with no signs of precipitation or strong winds.

5-word summary:
Sunny, blue skies, light clouds
{'input_tokens': 1597, 'output_tokens': 68, 'total_tokens': 1665, 'input_token_details': {'cache_read': 1582, 'cache_creation': 0}}
```
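
The `cache_creation` and `cache_read` counts show the first call writing the image to the cache and the second call reading it back. As a rough sketch, the cached share of input tokens can be computed from `usage_metadata`. This assumes `input_tokens` includes the cached tokens, as the totals above suggest. `cached_fraction` is a hypothetical helper.

```python
def cached_fraction(usage_metadata: dict) -> float:
    """Fraction of input tokens that were served from the provider's cache."""
    input_tokens = usage_metadata.get("input_tokens", 0)
    details = usage_metadata.get("input_token_details", {})
    cache_read = details.get("cache_read", 0)
    return cache_read / input_tokens if input_tokens else 0.0


# Usage metadata copied from the two calls above.
first_call = {
    "input_tokens": 1586,
    "input_token_details": {"cache_read": 0, "cache_creation": 1582},
}
second_call = {
    "input_tokens": 1597,
    "input_token_details": {"cache_read": 1582, "cache_creation": 0},
}
first_share = cached_fraction(first_call)
second_share = cached_fraction(second_call)
```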


### Example: Anthropic citations

```python
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Generate a 5 word summary of this document.",
        },
        {
            "type": "file",
            "source_type": "base64",
            "data": pdf_data,
            "mime_type": "application/pdf",
            # highlight-next-line
            "metadata": {"citations": {"enabled": True}},
        },
    ],
}
response = llm.invoke([message])
response.content
```

```
[{'citations': [{'cited_text': 'Sample PDF\r\nThis is a simple PDF file. Fun fun fun.\r\n',
    'document_index': 0,
    'document_title': None,
    'end_page_number': 2,
    'start_page_number': 1,
    'type': 'page_location'}],
  'text': 'Simple PDF file with fun text',
  'type': 'text'}]
```
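
With citations enabled, `response.content` is a list of blocks rather than a plain string. The following sketch pulls the cited passages out of content shaped like the output above. `cited_passages` is a hypothetical helper, and it assumes every citation is a dictionary with `cited_text` and `start_page_number` keys, which may not hold for other citation types.

```python
def cited_passages(content: list) -> list:
    """Collect (cited_text, start_page_number) pairs from a list of content blocks."""
    passages = []
    for block in content:
        if not isinstance(block, dict):
            continue  # Skip plain-string content entries.
        for citation in block.get("citations") or []:
            passages.append(
                (citation.get("cited_text", ""), citation.get("start_page_number"))
            )
    return passages


# Content copied from the output above.
content = [
    {
        "citations": [
            {
                "cited_text": "Sample PDF\r\nThis is a simple PDF file. Fun fun fun.\r\n",
                "document_index": 0,
                "document_title": None,
                "end_page_number": 2,
                "start_page_number": 1,
                "type": "page_location",
            }
        ],
        "text": "Simple PDF file with fun text",
        "type": "text",
    }
]
passages = cited_passages(content)
```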

### Example: OpenAI file names

OpenAI requires that PDF documents be associated with file names:

```python
llm = init_chat_model("openai:gpt-4.1")

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Describe the document:",
        },
        {
            "type": "file",
            "source_type": "base64",
            "data": pdf_data,
            "mime_type": "application/pdf",
            # highlight-next-line
            "metadata": {"filename": "my-file"},
        },
    ],
}
response = llm.invoke([message])
print(response.text())
```

The document is titled "Sample PDF" and appears to be a single-page file. Its purpose seems to be for demonstration or testing, as indicated by the title and introductory sentence: "This is a simple PDF file. Fun fun fun."

Content overview:
- The majority of the page consists of placeholder text, known as "Lorem ipsum," which is commonly used in publishing and web design as filler text.
- The text consists mostly of generic Latin-like sentences, with no meaningful subject matter or specific information.
- There are no charts, tables, diagrams, images, or specific data present in the document—just paragraphs of text formatted to show typical document structure.

In summary, this document serves as a basic text sample, with no substantive information beyond what’s typical for sample or test files.


## Tool calls

Some multimodal models support [tool calling](/docs/concepts/tool_calling) features as well. To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).

```python
from typing import Literal

from langchain_core.tools import tool


@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass


llm_with_tools = llm.bind_tools([weather_tool])

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the weather in this image:"},
        {"type": "image", "source_type": "url", "url": image_url},
    ],
}
response = llm_with_tools.invoke([message])
print(response.tool_calls)
```

```
[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_01RcMxBrsQgwMu4ttrXEi43s', 'type': 'tool_call'}]
```
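
The tool calls come back as plain dictionaries, so acting on them amounts to a lookup by name. Here is a minimal sketch, assuming each tool name maps to an ordinary Python function. `run_tool_calls` and `describe_weather` are hypothetical and stand in for whatever your application does with the result.

```python
def run_tool_calls(tool_calls: list, registry: dict) -> list:
    """Call the registered function for each tool call and collect the results."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            raise KeyError(f"No function registered for tool {call['name']!r}")
        # Keep the call id so each result can be matched back to its request.
        results.append({"id": call["id"], "result": func(**call["args"])})
    return results


def describe_weather(weather: str) -> str:
    """Hypothetical handler for the weather_tool call."""
    return f"The weather is {weather}."


# Tool calls copied from the output above.
tool_calls = [
    {
        "name": "weather_tool",
        "args": {"weather": "sunny"},
        "id": "toolu_01RcMxBrsQgwMu4ttrXEi43s",
        "type": "tool_call",
    }
]
results = run_tool_calls(tool_calls, {"weather_tool": describe_weather})
```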
