Closed
Description
The pydantic_ai.messages module currently doesn't support legacy Microsoft Office MIME types, which causes a KeyError when attempting to process these documents:
- application/msword - Legacy Word .doc files
- application/vnd.ms-excel - Legacy Excel .xls files
When passing a document with these legacy media types, the following error occurs:
Traceback (most recent call last):
  File ".../pydantic_ai/messages.py", line 581, in format
    return _document_format_lookup[self.media_type]
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'application/msword'
# or
KeyError: 'application/vn
AWS Bedrock supports both legacy formats according to its documentation:
- application/msword (.doc)
- application/vnd.ms-excel (.xls)
The modern equivalents work fine:
- application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (.xlsx)
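The gap described above comes down to two missing entries in a media-type lookup table. The following is a minimal sketch of that idea, not the library's actual code: `DOCUMENT_FORMAT_LOOKUP`, `LEGACY_OFFICE_FORMATS`, and `document_format` are hypothetical names introduced for illustration, and the real table in `pydantic_ai/messages.py` may contain other entries. The short names `doc` and `xls` are assumed to match the document format identifiers Bedrock accepts.

```python
# Hypothetical stand-in for the library's internal media-type table;
# the real table in pydantic_ai/messages.py may differ.
DOCUMENT_FORMAT_LOOKUP: dict[str, str] = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': 'docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'xlsx',
}

# Proposed additions for the legacy Office formats (assumed short names).
LEGACY_OFFICE_FORMATS: dict[str, str] = {
    'application/msword': 'doc',
    'application/vnd.ms-excel': 'xls',
}


def document_format(media_type: str, lookup: dict[str, str]) -> str:
    """Return the short format name for a media type, or raise ValueError."""
    try:
        return lookup[media_type]
    except KeyError as e:
        # A descriptive error is easier to act on than a bare KeyError.
        raise ValueError(f'Unknown document media type: {media_type}') from e


# Merging the legacy entries into the table resolves both media types.
extended_lookup = {**DOCUMENT_FORMAT_LOOKUP, **LEGACY_OFFICE_FORMATS}
print(document_format('application/msword', extended_lookup))
print(document_format('application/vnd.ms-excel', extended_lookup))
```

With the original table alone, the same call fails for `application/msword`, which mirrors the `KeyError` in the traceback.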
Many users still have legacy Office files, particularly in enterprise/recruiting/business scenarios. Supporting legacy formats improves backward compatibility without requiring users to convert file
Reproduction
from pydantic_ai.messages import BinaryContent

# Placeholder bytes; load the real file contents here.
docx_bytes = xlsx_bytes = doc_bytes = xls_bytes = b''

# Binary documents are wrapped in BinaryContent, which carries the media type.
# These work fine (modern formats)
docx_content = BinaryContent(
    data=docx_bytes,
    media_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document',
)
xlsx_content = BinaryContent(
    data=xlsx_bytes,
    media_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
)
print(docx_content.format, xlsx_content.format)

# These raise KeyError when .format is read (legacy formats)
doc_content = BinaryContent(
    data=doc_bytes,
    media_type='application/msword',  # Legacy .doc format
)
xls_content = BinaryContent(
    data=xls_bytes,
    media_type='application/vnd.ms-excel',  # Legacy .xls format
)
doc_content.format  # KeyError: 'application/msword'
xls_content.format  # KeyError: 'application/vnd.ms-excel'
References
No response