# Image Description Extraction using Mistral's Pixtral API

In this notebook, we'll use the `Mistral` API to extract structured image descriptions as JSON with the `pixtral-12b-2409` model. We'll send an image URL and prompt the model to return the image's key elements, each with a short description.

## Prerequisites
You need an API key for the Mistral AI platform. The Setup section shows how to load it from an environment variable.

In [1]:
# Install the Mistral Python SDK
!pip install mistralai

Collecting mistralai
  Downloading mistralai-1.1.0-py3-none-any.whl.metadata (23 kB)
Collecting httpx<0.28.0,>=0.27.0 (from mistralai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jsonpath-python<2.0.0,>=1.0.6 (from mistralai)
  Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)
Collecting typing-inspect<0.10.0,>=0.9.0 (from mistralai)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->mistralai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->mistralai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<0.10.0,>=0.9.0->mistralai)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Downloading mistralai-1.1.0-py3-none-any.whl (229 kB)

## Setup
We'll load the Mistral API key from an environment variable and initialize the client. The key must be available as `MISTRAL_API_KEY`. The `%env` cell below is left blank on purpose: fill in your own key there, or export the variable before starting the notebook.


In [6]:
%env MISTRAL_API_KEY=

env: MISTRAL_API_KEY=


In [3]:
import os
from mistralai import Mistral

# Load Mistral API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]

# Model specification
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)
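
`os.environ["MISTRAL_API_KEY"]` raises a bare `KeyError` when the variable is missing, and it silently returns an empty string when the variable is set but blank (as in the `%env` cell above). A minimal sketch of a guard that fails early with a readable message might look like the following. The helper name `require_api_key` is hypothetical and not part of the Mistral SDK.

```python
import os


def require_api_key(env=os.environ, name="MISTRAL_API_KEY"):
    """Return the key stored under `name`, or raise a readable error.

    `require_api_key` is an illustrative helper, not an SDK function.
    `env` is any mapping, which makes the check easy to test.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set or is empty; set it before running this notebook."
        )
    return value
```

With this in place, `api_key = require_api_key()` could stand in for the direct `os.environ` lookup.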


## Sending Image URL for Description
We'll prompt the model to describe an image given its URL. The system message spells out the JSON structure we want back, and `response_format={"type": "json_object"}` asks the API to return a JSON object, so the response can be parsed directly.


In [4]:
# Define the messages for the chat API
messages = [
    {
        "role": "system",
        "content": "Return the answer in a JSON object with the following structure: "
                   "{\"elements\": [{\"element\": \"some name of element1\", "
                   "\"description\": \"some description of element 1\"}, "
                   "{\"element\": \"some name of element2\", \"description\": "
                   "\"some description of element 2\"}]}"
    },
    {
        "role": "user",
        "content": "Describe the image"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
            }
        ]
    }
]

# Call the Mistral API to complete the chat
chat_response = client.chat.complete(
    model=model,
    messages=messages,
    response_format={
        "type": "json_object",
    }
)

# Get the content of the response
content = chat_response.choices[0].message.content

# Output the raw JSON response
print(content)


 {
    "elements": [
        {
            "element": "Eiffel Tower",
            "description": "A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape."
        },
        {
            "element": "Snow-covered Trees",
            "description": "Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene."
        },
        {
            "element": "Snow",
            "description": "A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere."
        },
        {
            "element": "Lamppost",
            "description": "A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance."
        }
    ]
}
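
The request above passes a public URL and splits the instruction and the image across two user messages. As a hedged variant, the sketch below builds a single user message that carries both parts, and adds a helper that turns a local file into a base64 data URL. Both helper names (`image_to_data_url`, `build_messages`) are hypothetical, and the assumption that the `image_url` field accepts a `data:` URL should be checked against the Mistral vision documentation for your SDK version.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL (illustrative helper)."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(image_ref, system_prompt, instruction="Describe the image"):
    """Build a chat payload with one user message holding text and image.

    `image_ref` is either an http(s) URL or a data URL (assumed to be
    accepted by the API; verify before relying on it).
    """
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": image_ref},
            ],
        },
    ]
```

The returned list can be passed as `messages=` to `client.chat.complete` in place of the hand-written list, for example `build_messages(image_to_data_url("photo.jpg"), system_prompt)`, where `photo.jpg` and `system_prompt` are placeholders for your own file and schema prompt.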


## Parsing the JSON Response
We'll now parse the JSON response from the API and print the elements and their corresponding descriptions.


In [5]:
import json

# Parse the JSON content
json_object = json.loads(content)
elements = json_object["elements"]

# Print each element and its description
for element in elements:
    print("Element:", element["element"])
    print("Description:", element["description"])
    print()


Element: Eiffel Tower
Description: A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.

Element: Snow-covered Trees
Description: Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.

Element: Snow
Description: A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.

Element: Lamppost
Description: A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.
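
The parsing cell above assumes the model always returns exactly the requested structure. JSON mode yields a JSON object, but the keys inside it still depend on the model following the system prompt, so a more defensive version can be useful. The sketch below is one possible approach; `parse_elements` is a hypothetical helper name, and skipping malformed entries (rather than raising) is a design choice you may want to reverse.

```python
import json


def parse_elements(content):
    """Return a list of (element, description) pairs from the model output.

    Raises ValueError when the payload is not JSON or lacks an
    "elements" list; entries missing either key are skipped.
    """
    try:
        payload = json.loads(content)
    except (json.JSONDecodeError, TypeError) as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object at the top level")

    items = payload.get("elements")
    if not isinstance(items, list):
        raise ValueError('Expected an "elements" list in the response')

    pairs = []
    for item in items:
        # Keep only entries that carry both expected keys.
        if isinstance(item, dict) and "element" in item and "description" in item:
            pairs.append((str(item["element"]), str(item["description"])))
    return pairs
```

Looping over `parse_elements(content)` then gives the same element and description pairs as the cell above, with clearer errors when the response drifts from the schema.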



## Conclusion
In this notebook, we used Mistral's Pixtral model to describe an image: we sent an image URL together with a system prompt that defines the output structure, received a JSON response, and parsed it into element and description pairs. The same pattern applies to other images by changing the URL.
