In [None]:
import base64
import json
import requests

# Mistral Document AI Annotations

In addition to basic OCR, Mistral Document AI supports annotations, which let you extract information from documents and images as structured JSON in a single API call. It offers two types of annotation, which can be used independently or together:

- `bbox_annotation`: annotates each bounding box extracted by the OCR model (charts, figures, etc.) according to a structure that you define in the request. An annotation is returned for every image on every page of the document.
- `document_annotation`: like `bbox_annotation`, but applied to the text extracted from the entire document.

> **Note**: Document annotations are currently limited to 8 pages. Please see the model card for the most up-to-date information on limits.

## 0. Setup

In [None]:
AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT = ""
AZURE_MISTRAL_DOCUMENT_AI_KEY = ""
REQUEST_HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {AZURE_MISTRAL_DOCUMENT_AI_KEY}",
}
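
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the same two values are exported as environment variables. The variable names and the `build_request_headers` helper are illustrative and not part of the service:

```python
import os


def build_request_headers(api_key: str) -> dict:
    """Build the same headers as the cell above (hypothetical helper)."""
    if not api_key:
        raise ValueError("API key is empty; set AZURE_MISTRAL_DOCUMENT_AI_KEY first.")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# Assumed environment variable names, chosen to mirror the constants above.
endpoint = os.environ.get("AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT", "")
api_key = os.environ.get("AZURE_MISTRAL_DOCUMENT_AI_KEY", "")

# Only build the headers when a key is actually available.
if api_key:
    headers = build_request_headers(api_key)
```

Failing early on an empty key avoids sending a request that can only come back as an authentication error.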

In [None]:
!wget https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/mistral7b.pdf

## 1. Helper Functions

In [None]:
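from typing import Optional


def encode_image(image_path: str) -> Optional[str]:
    """Return the file's contents as a base64 string, or None if the file is missing."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None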
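
Despite its name, `encode_image` works on any file, including the PDF used in the following sections, and the request cells wrap its output in a `data:application/pdf;base64,...` URL by hand. A small sketch of doing both steps at once, with the MIME type guessed from the file extension. `to_data_url` is a hypothetical helper, not part of any SDK:

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local file as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot guess the MIME type of {path!r}")
    # read_bytes raises FileNotFoundError if the file is missing, so a bad
    # path fails here instead of producing a URL that contains "None".
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Raising on a missing file is a deliberate difference from `encode_image`, which prints a message and returns `None`.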

## 2. Bounding Box Annotations

In this example, we define a `bbox_annotation_format` in the request using JSON schema. The schema has two fields: the type of the image and a short description of it.

In [None]:
# The wget cell above saves the PDF to the current working directory.
encodedDocument = encode_image("mistral7b.pdf")

In [None]:
bboxannotationPayload = {
    "model": "mistral-document-ai-2505",
    "document": {
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{encodedDocument}",
    },
    "include_image_base64": True,
    "bbox_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "string",
            "description": "string",
            "schema": {
                "properties": {
                    "image_type": {
                        "description": "The type of the image.",
                        "title": "Image Type",
                        "type": "string",
                    },
                    "short_description": {
                        "description": "A description in english describing the image.",
                        "title": "Short Description",
                        "type": "string",
                    },
                }
            },
        },
    },
}

In [None]:
bb1Response = requests.post(
    url=AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT,
    json=bboxannotationPayload,
    headers=REQUEST_HEADERS,
)
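
The cell above stores the response without checking whether the call succeeded, so an authentication or quota error only surfaces later, when the response is parsed. A minimal sketch of a check that can run first, assuming a successful call returns HTTP 200 with a top-level `pages` list, as the next cell expects. `parse_ocr_response` is a hypothetical helper:

```python
import json


def parse_ocr_response(status_code: int, body: str) -> dict:
    """Validate a raw HTTP response and return its parsed JSON body."""
    if status_code != 200:
        # Include the start of the body, which usually carries the error message.
        raise RuntimeError(f"OCR request failed with HTTP {status_code}: {body[:200]}")
    data = json.loads(body)
    if "pages" not in data:
        raise ValueError("Response JSON has no 'pages' key.")
    return data
```

With the response above, this would be called as `parse_ocr_response(bb1Response.status_code, bb1Response.text)`.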

In [None]:
for page in bb1Response.json()["pages"]:
    for image in page["images"]:
        print(f"page {page['index']}")
        # image_annotation is returned as a JSON string, so parse it first.
        iaj = json.loads(image["image_annotation"])
        print("Image type: " + iaj["properties"]["image_type"])
        print("Short description: " + iaj["properties"]["short_description"])

The response includes an `image_annotation` for each image in the document. It contains the fields we defined in the request, which makes the extraction predictable.
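
The loop above assumes that every image carries an annotation. A slightly more defensive sketch, assuming the response shape used in this notebook (`pages[].index` and `pages[].images[].image_annotation` as a JSON string). `collect_image_annotations` is a hypothetical helper, and `sample` is a hand-written stand-in rather than real service output:

```python
import json


def collect_image_annotations(response_json: dict) -> list:
    """Return (page_index, annotation) pairs, skipping images without an annotation."""
    results = []
    for page in response_json.get("pages", []):
        for image in page.get("images", []):
            raw = image.get("image_annotation")
            if not raw:
                continue  # no annotation was returned for this image
            annotation = json.loads(raw)
            # The cell above reads fields under "properties"; unwrap that
            # level when present so callers get the fields directly.
            if isinstance(annotation, dict) and isinstance(annotation.get("properties"), dict):
                annotation = annotation["properties"]
            results.append((page.get("index"), annotation))
    return results


# Hand-written stand-in for a response, not real service output.
sample = {
    "pages": [
        {
            "index": 0,
            "images": [
                {"image_annotation": json.dumps({"properties": {"image_type": "chart"}})},
                {"image_annotation": None},
            ],
        },
        {"index": 1, "images": []},
    ]
}

for page_index, annotation in collect_image_annotations(sample):
    print(page_index, annotation)
```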

## 3. Document Annotations

Building on the bounding box annotations, we can add a `document_annotation_format` to the request. Like `bbox_annotation_format`, it is defined using JSON schema.
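
Because both formats share the same wrapper, the schema can be generated rather than written out twice. A small sketch under that assumption: `string_field` and `make_annotation_format` are hypothetical helpers that reproduce the dictionary layout used in this notebook, including its placeholder `name` and `description` values.

```python
def string_field(title: str, description: str) -> dict:
    """Describe one string-valued field of the annotation schema."""
    return {"title": title, "type": "string", "description": description}


def make_annotation_format(fields: dict, name: str = "string", description: str = "string") -> dict:
    """Wrap field definitions in the json_schema envelope used in this notebook."""
    return {
        "type": "json_schema",
        "json_schema": {
            # The defaults mirror the placeholder values in the payloads here.
            "name": name,
            "description": description,
            "schema": {"properties": dict(fields)},
        },
    }


bbox_format = make_annotation_format(
    {
        "image_type": string_field("Image Type", "The type of the image."),
        "short_description": string_field(
            "Short Description", "A description in english describing the image."
        ),
    }
)
```

The resulting `bbox_format` can be assigned to the `bbox_annotation_format` key of a payload, and the same helper can build the `document_annotation_format` value.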

In [None]:
comboAnnotationPayload = {
    "model": "mistral-document-ai-2505",
    "pages": [0, 1, 2, 3],
    "document": {
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{encodedDocument}",
    },
    "include_image_base64": True,
    "bbox_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "string",
            "description": "string",
            "schema": {
                "properties": {
                    "image_type": {
                        "description": "The type of the image.",
                        "title": "Image Type",
                        "type": "string",
                    },
                    "short_description": {
                        "description": "A description in english describing the image.",
                        "title": "Short Description",
                        "type": "string",
                    },
                }
            },
        },
    },
    "document_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "string",
            "description": "string",
            "schema": {
                "properties": {
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "The language of the document.",
                    },
                    "summary": {
                        "title": "Summary",
                        "type": "string",
                        "description": "A brief summary of the document in English.",
                    },
                    "chapter_titles": {
                        "title": "Chapter_Titles",
                        "type": "string",
                        "description": "The titles of the chapters in the document.",
                    },
                    "urls": {
                        "title": "urls",
                        "type": "string",
                        "description": "The urls in the document.",
                    },
                    "translated_summary": {
                        "title": "Translation",
                        "type": "string",
                        "description": "The French translation of the document summary.",
                    },
                },
            },
        },
    },
}

In [None]:
comboResponse = requests.post(
    url=AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT,
    json=comboAnnotationPayload,
    headers=REQUEST_HEADERS,
)

In [None]:
docAnnotation = json.loads(comboResponse.json()["document_annotation"])
print("Language: " + docAnnotation["properties"]["language"])
print("Summary: " + docAnnotation["properties"]["summary"])
print("Chapter Titles: " + docAnnotation["properties"]["chapter_titles"])
print("URLs: " + docAnnotation["properties"]["urls"])
print("Translated Summary: " + docAnnotation["properties"]["translated_summary"])

The returned document annotation contains the fields we specified in the request.
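
Document annotations are subject to the page limit noted at the top of this notebook, and the combined request selects pages through its `pages` field. For longer documents, one option is to send several requests, each covering a slice of pages. A sketch of the slicing only, assuming zero-based page indices as in the request above and a limit of 8 (check the model card for the current value). `page_batches` is a hypothetical helper:

```python
def page_batches(total_pages: int, limit: int = 8) -> list:
    """Split page indices 0..total_pages-1 into consecutive batches of at most `limit`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        list(range(start, min(start + limit, total_pages)))
        for start in range(0, total_pages, limit)
    ]


print(page_batches(10, limit=4))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each batch can be passed as the `pages` value of a separate request. Fields such as `summary` would then describe only that slice, so the per-batch results still need to be merged afterwards.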

## 4. Wrap-up

Extracting text and images from documents is powerful on its own. Combined with structured extraction and enrichment, it lets you build capable document processing and intelligence pipelines. We hope you found this notebook useful, and we look forward to seeing what you build with Mistral Document AI.