# How to use multimodal prompts

This guide shows how to use prompt templates to format multimodal inputs to models.

In this example, we ask a model to describe an image. We first download the image and base64-encode it so that it can be embedded in the prompt as a data URL.

In [1]:
import base64

import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
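# Download the image, fail early on an HTTP error, then base64-encode the bytes.
image_response = httpx.get(image_url)
image_response.raise_for_status()
image_data = base64.b64encode(image_response.content).decode("utf-8")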
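
The encoding step does not depend on where the bytes come from. As a minimal sketch, the helper below base64-encodes raw bytes and wraps them in the `data:<mime type>;base64,<payload>` form used by the prompt template in this guide. The name `to_data_url` and the sample bytes are illustrative assumptions, not part of any library.

```python
import base64


def to_data_url(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Return a data URL that embeds `raw` as base64 text."""
    payload = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime_type};base64,{payload}"


# Stand-in bytes; in practice these would be the contents of an image file.
sample = b"hello"
print(to_data_url(sample))
```

For a PNG or other format, pass the matching MIME type (for example `"image/png"`) so that the prefix agrees with the actual image data.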

In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

In [3]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                }
            ],
        ),
    ]
)
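
To make the substitution concrete, here is a rough plain-Python stand-in for what formatting this template produces. It is not LangChain's implementation; the function name `fill_image_blocks` is hypothetical, and it only mimics the end result, in which each `{variable}` inside a block's URL is replaced by the value supplied at invocation time.

```python
def fill_image_blocks(blocks: list[dict], **variables: str) -> list[dict]:
    """Return copies of `blocks` with the placeholders in each image URL filled."""
    filled = []
    for block in blocks:
        url_template = block["image_url"]["url"]
        filled.append(
            {
                "type": block["type"],
                "image_url": {"url": url_template.format(**variables)},
            }
        )
    return filled


template_blocks = [
    {
        "type": "image_url",
        "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
    }
]

# A short string stands in for the real base64 payload.
result = fill_image_blocks(template_blocks, image_data="aGVsbG8=")
print(result[0]["image_url"]["url"])
```

The template itself is left unchanged, so the same prompt can be reused with a different image on each call.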

In [4]:
chain = prompt | model

In [5]:
response = chain.invoke({"image_data": image_data})
print(response.content)

The image depicts a scenic landscape with a wooden boardwalk path leading through a lush green field. The sky is a vivid blue with some scattered, wispy clouds. The boardwalk extends straight into the distance, bordered by tall, vibrant green grass on both sides. In the background, there are trees and shrubs, adding to the natural and serene atmosphere of the scene. The lighting suggests it is either early morning or late afternoon, casting a warm and inviting glow over the entire landscape.


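We can also pass in multiple images by giving each one its own template variable. To keep the example simple, we pass the same image for both variables, so the model should report that the two pictures are identical.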

In [6]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Compare the two pictures provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data1}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data2}"},
                },
            ],
        ),
    ]
)
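
With more than two images, writing each block by hand gets repetitive. One option is to generate the content blocks and the matching input dictionary. This is a sketch under two assumptions: every image is a JPEG, and the variable names follow the `image_data1`, `image_data2`, ... pattern used above. The helper names `image_blocks` and `image_inputs` are hypothetical.

```python
def image_blocks(count: int) -> list[dict]:
    """Build `count` image_url blocks with placeholders image_data1..N."""
    blocks = []
    for index in range(1, count + 1):
        # Doubled braces emit literal braces, so the placeholder survives
        # for the prompt template to fill in later.
        url = f"data:image/jpeg;base64,{{image_data{index}}}"
        blocks.append({"type": "image_url", "image_url": {"url": url}})
    return blocks


def image_inputs(payloads: list[str]) -> dict[str, str]:
    """Map base64 payloads to the variable names used by image_blocks."""
    return {f"image_data{i}": data for i, data in enumerate(payloads, start=1)}


blocks = image_blocks(2)
inputs = image_inputs(["first-payload", "second-payload"])
print(blocks[1]["image_url"]["url"])
print(sorted(inputs))
```

Passing `image_blocks(2)` as the content of the `"user"` message yields the same list of blocks as the hand-written version, and `image_inputs` builds the dictionary to hand to `chain.invoke`.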

In [7]:
chain = prompt | model

In [8]:
response = chain.invoke({"image_data1": image_data, "image_data2": image_data})
print(response.content)

The two provided pictures appear to be identical. Here are the points of comparison:

1. **Pathway**: Both images feature a wooden pathway extending into the distance, centered in the image.
2. **Vegetation**: The surrounding area is characterized by green grass and bushes, with the same distribution and density in both images.
3. **Sky**: The sky in both images is a clear blue with some scattered clouds, showing similar patterns.
4. **Lighting and Shadows**: The lighting and shadows in both images are consistent, indicating the same time of day and weather conditions.

There are no observable differences between the two images; they are duplicates of each other.
