[BUG CLIENT]: OCR data extraction example gives a ValidationError: 2 validation errors for Unmarshaller (Azure Foundry model) #367

@jdvala-hash

Description

Python -VV

Python 3.12.12 (main, Dec 17 2025, 21:07:08) [Clang 21.1.4]

Pip Freeze

alembic==1.18.3
annotated-doc==0.0.4
annotated-types==0.7.0
anyio==4.12.1
appnope==0.1.4
argon2-cffi==25.1.0
argon2-cffi-bindings==25.1.0
arrow==1.4.0
asgiref==3.11.0
asttokens==3.0.1
async-lru==2.1.0
attrs==25.4.0
azure-ai-ml==1.24.0
azure-common==1.1.28
azure-core==1.38.0
azure-core-tracing-opentelemetry==1.0.0b12
azure-identity==1.19.0
azure-mgmt-core==1.6.0
azure-monitor-opentelemetry==1.8.4
azure-monitor-opentelemetry-exporter==1.0.0b46
azure-storage-blob==12.28.0
azure-storage-file-datalake==12.23.0
azure-storage-file-share==12.24.0
babel==2.18.0
bandit==1.8.0
beautifulsoup4==4.14.3
bidict==0.23.1
bleach==6.3.0
boolean.py==5.0
boto3==1.42.41
botocore==1.42.41
bpemb==0.3.6
build==1.4.0
CacheControl==0.14.4
certifi==2026.1.4
cffi==2.0.0
cfgv==3.5.0
charset-normalizer==3.4.4
cleanco==2.3
click==8.3.1
cloudpathlib==0.23.0
colorama==0.4.6
comm==0.2.3
coverage==7.13.1
cramjam==2.11.0
cryptography==46.0.5
cyclonedx-python-lib==7.6.2
debugpy==1.8.20
decorator==5.2.1
deepparse==0.9.14
defusedxml==0.7.1
detect-secrets==1.5.0
distlib==0.4.0
dnspython==2.8.0
DoubleMetaphone==1.2
email-validator==2.3.0
et_xmlfile==2.0.0
eval_type_backport==0.3.1
Events==0.5
executing==2.2.1
fastapi==0.128.0
fastapi-cli==0.0.20
fastapi-cloud-cli==0.11.0
fastar==0.8.0
fasteners==0.20
fastjsonschema==2.21.2
fastparquet==2025.12.0
fasttext-wheel==0.9.2
filelock==3.20.3
fqdn==1.5.1
fsspec==2026.1.0
gensim==4.4.0
google-api-core==2.29.0
google-auth==2.48.0
google-cloud-core==2.5.0
google-cloud-storage==3.9.0
google-crc32c==1.8.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
granian==2.7.0
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
html5lib==1.1
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface_hub==1.4.0
identify==2.6.16
idna==3.11
importlib_metadata==8.7.1
iniconfig==2.3.0
invoke==2.2.1
ipdb==0.13.13
ipykernel==7.2.0
ipython==9.10.0
ipython_pygments_lexers==1.1.1
ipywidgets==8.1.8
isodate==0.7.2
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
jmespath==1.1.0
json5==0.13.0
jsonpointer==3.0.0
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
jupyter-events==0.12.0
jupyter-lsp==2.3.0
jupyter_client==8.8.0
jupyter_core==5.9.1
jupyter_server==2.17.0
jupyter_server_terminals==0.5.4
jupyterlab==4.5.3
jupyterlab_pygments==0.3.0
jupyterlab_server==2.28.0
jupyterlab_widgets==3.0.16
lark==1.3.1
license-expression==30.4.4
lightning-utilities==0.15.2
lz4==4.4.5
Mako==1.3.10
markdown-it-py==4.0.0
MarkupSafe==3.0.3
marshmallow==4.2.0
matplotlib-inline==0.2.1
mdurl==0.1.2
mistralai==1.12.3
mistune==3.2.0
mpmath==1.3.0
msal==1.34.0
msal-extensions==1.3.1
msgpack==1.1.2
msrest==0.7.1
mypy==1.13.0
mypy_extensions==1.1.0
nbclient==0.10.4
nbconvert==7.17.0
nbformat==5.10.4
nbstripout==0.8.1
nest-asyncio==1.6.0
networkx==3.6.1
nodeenv==1.10.0
notebook_shim==0.2.4
numpy==2.4.1
oauthlib==3.3.1
openpyxl==3.1.5
opensearch-protobufs==0.19.0
opensearch-py==3.1.0
opentelemetry-api==1.39.1
opentelemetry-exporter-otlp-proto-common==1.39.1
opentelemetry-exporter-otlp-proto-http==1.39.1
opentelemetry-instrumentation==0.60b0
opentelemetry-instrumentation-asgi==0.60b0
opentelemetry-instrumentation-dbapi==0.60b0
opentelemetry-instrumentation-django==0.60b0
opentelemetry-instrumentation-fastapi==0.60b0
opentelemetry-instrumentation-flask==0.60b0
opentelemetry-instrumentation-psycopg2==0.60b0
opentelemetry-instrumentation-requests==0.60b0
opentelemetry-instrumentation-urllib==0.60b0
opentelemetry-instrumentation-urllib3==0.60b0
opentelemetry-instrumentation-wsgi==0.60b0
opentelemetry-proto==1.39.1
opentelemetry-resource-detector-azure==0.1.5
opentelemetry-sdk==1.39.1
opentelemetry-semantic-conventions==0.60b1
opentelemetry-util-http==0.60b0
packageurl-python==0.17.6
packaging==25.0
pandas==2.3.3
pandocfilters==1.5.1
parso==0.8.5
pexpect==4.9.0
pip-api==0.0.34
pip-requirements-parser==32.0.1
pip-tools==7.5.2
pip_audit==2.7.3
platformdirs==4.5.1
pluggy==1.6.0
postal==1.1.11
Poutyne==1.17.4
pre_commit==4.0.1
probableparsing==0.0.1
probablepeople==0.5.6
prometheus_client==0.24.1
prompt_toolkit==3.0.52
proto-plus==1.27.1
protobuf==6.33.5
psutil==7.2.1
ptyprocess==0.7.0
pure_eval==0.2.3
py-serializable==1.1.2
pyasn1==0.6.2
pyasn1_modules==0.4.2
pybind11==3.0.1
pycparser==2.23
pydantic==2.12.5
pydantic-extra-types==2.11.0
pydantic-settings==2.12.0
pydantic_core==2.41.5
pydash==8.0.5
pydocstyle==6.3.0
Pygments==2.19.2
PyJWT==2.10.1
pymagnitude-light==0.1.147
pyparsing==3.3.1
pyproject_hooks==1.2.0
pytest==8.3.4
pytest-cov==6.0.0
python-crfsuite==0.9.12
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-engineio==4.13.0
python-json-logger==4.0.0
python-multipart==0.0.22
python-socketio==5.16.0
pytz==2025.2
PyYAML==6.0.3
pyzmq==27.1.0
RapidFuzz==3.14.3
redis==7.1.0
referencing==0.37.0
reflex==0.8.26
reflex-hosting-cli==0.1.61
regex==2026.1.15
requests==2.32.4
requests-oauthlib==2.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rfc3987-syntax==1.1.0
rich==14.2.0
rich-toolkit==0.17.1
rignore==0.7.6
rpds-py==0.30.0
rsa==4.9.1
ruff==0.8.4
s3transfer==0.16.0
safetensors==0.7.0
scipy==1.17.0
Send2Trash==2.1.0
sentencepiece==0.2.1
sentry-sdk==2.50.0
setuptools==80.9.0
shellingham==1.5.4
simple-websocket==1.1.0
six==1.17.0
smart_open==7.5.0
snowballstemmer==3.0.1
sortedcontainers==2.4.0
soupsieve==2.8.3
SQLAlchemy==2.0.46
sqlmodel==0.0.32
stack-data==0.6.3
starlette==0.50.0
stevedore==5.6.0
strictyaml==1.7.3
structlog==25.5.0
sympy==1.14.0
terminado==0.18.1
tinycss2==1.4.0
tokenizers==0.22.2
toml==0.10.2
torch==2.10.0
torchmetrics==1.8.2
tornado==6.5.4
tqdm==4.67.1
traitlets==5.14.3
transformers==5.0.0
typer==0.21.1
typer-slim==0.21.1
typing-inspection==0.4.2
typing_extensions==4.15.0
tzdata==2025.3
uri-template==1.3.0
urllib3==2.6.3
uvicorn==0.40.0
uvloop==0.22.1
virtualenv==20.36.1
watchfiles==1.1.1
wcwidth==0.5.3
webcolors==25.10.0
webencodings==0.5.1
websocket-client==1.9.0
websockets==16.0
wheel==0.45.1
widgetsnbextension==4.0.15
wrapt==1.17.3
wsproto==1.3.2
xxhash==3.6.0
zipp==3.23.0

Reproduction Steps

  1. Try the PDF extraction example from the cookbook (https://github.com/mistralai/cookbook/blob/main/mistral/ocr/data_extraction.ipynb).
  2. Use the MistralAzure client.
  3. Run the following snippet:
from mistralai.extra import response_format_from_pydantic_model
from pydantic import BaseModel, Field
from mistralai_azure import MistralAzure

# endpoint, api_key, and b64 (the base64-encoded file contents) are defined elsewhere
client = MistralAzure(azure_endpoint=endpoint, azure_api_key=api_key)

class Item(BaseModel):
    item_1: str
    item_2: str

annotations_response = client.ocr.process(
    model="mistral-document-ai-2505",
    pages=[1],
    document={
        "type": "document_url",
        "document_url": f"data:image/png;base64,{b64}"
    },
    document_annotation_format=response_format_from_pydantic_model(Item),
    include_image_base64=True
)
  4. I get the following error:
ValidationError: 2 validation errors for Unmarshaller
body.nullable[ResponseFormat]
  Input should be a valid dictionary or instance of ResponseFormat [type=model_type, input_value=ResponseFormat(type='json...n=Unset(), strict=True)), input_type=ResponseFormat]
    For further information visit https://errors.pydantic.dev/2.12/v/model_type
body.Unset
  Input should be a valid dictionary or instance of Unset [type=model_type, input_value=ResponseFormat(type='json...n=Unset(), strict=True)), input_type=ResponseFormat]
    For further information visit https://errors.pydantic.dev/2.12/v/model_type
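
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the input is reported as an instance of ResponseFormat and is still rejected, which is what pydantic does when two different classes share the same name (for example, one from mistralai and one from mistralai_azure). The sketch below reproduces that behaviour with plain pydantic only. The names make_response_format_class, PackageAFormat, PackageBFormat, and Request are hypothetical stand-ins, not part of either SDK.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


def make_response_format_class():
    """Return a new class named ResponseFormat; every call yields a distinct type."""

    class ResponseFormat(BaseModel):
        type: str
        strict: bool = True

    return ResponseFormat


# Hypothetical stand-ins for two packages that each ship their own ResponseFormat.
PackageAFormat = make_response_format_class()
PackageBFormat = make_response_format_class()


class Request(BaseModel):
    # Stand-in for a request body that is validated against package B's class.
    body: Optional[PackageBFormat] = None


# An instance built with package A's class, as a helper from package A would return.
foreign = PackageAFormat(type="json_schema")

error_types = set()
try:
    Request(body=foreign)
    rejected = False
except ValidationError as exc:
    # Same class name, different class object: pydantic reports model_type.
    rejected = True
    error_types = {err["type"] for err in exc.errors()}

# A plain dict is validated field by field, so the class identity no longer matters.
accepted = Request(body=foreign.model_dump())

print(rejected, error_types, accepted.body)
```

If this is indeed the cause, passing a plain dict or a ResponseFormat built from the same package as the client might avoid the error. That has not been tested against the real MistralAzure client here.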

Expected Behavior

The call should behave the same as in the cookbook.

P.S. I did not test this against Mistral's own API.

Additional Context

No response
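
The snippet in the reproduction steps uses b64 without defining it. A minimal sketch of how such a value might be built into the data URL with the standard library is shown here. The helper name to_data_url and the default MIME type are assumptions for illustration, not taken from the cookbook or the SDK.

```python
import base64


def to_data_url(raw: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL of the form used in the snippet."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Small in-memory payload; in practice the bytes would come from reading the file.
example_url = to_data_url(b"hi")
print(example_url)
```

The resulting string would take the place of the f-string passed as "document_url" in the reproduction snippet.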

Suggested Solutions

No response
