Add support for legacy Microsoft Office document formats (application/msword and application/vnd.ms-excel) #3317

@DenysMoskalenko

Description

The pydantic_ai.messages module currently doesn't support legacy Microsoft Office MIME types, which causes a KeyError when attempting to process these documents:

  • application/msword - Legacy Word .doc files
  • application/vnd.ms-excel - Legacy Excel .xls files

When passing a document with these legacy media types, the following error occurs:

Traceback (most recent call last):
  File ".../pydantic_ai/messages.py", line 581, in format
    return _document_format_lookup[self.media_type]
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'application/msword'
# or
KeyError: 'application/vn

AWS Bedrock does support both legacy formats according to their documentation:

  • application/msword (.doc)
  • application/vnd.ms-excel (.xls)

The modern equivalents work fine:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (.xlsx)

Many users still have legacy Office files, particularly in enterprise, recruiting, and business scenarios. Supporting legacy formats improves backward compatibility without requiring users to convert files.
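One possible shape for the fix, as a minimal sketch rather than the library's actual code: the traceback shows a plain dict index, so registering the two legacy MIME types in that table should be enough. The names `document_format_lookup`, `document_format`, and `DocumentFormat` are hypothetical stand-ins for the private table in pydantic_ai.messages, and the target values 'doc' and 'xls' are assumed from the file extensions listed above.

```python
from typing import Literal

# Hypothetical stand-in for the library's private lookup table; the real
# table in pydantic_ai.messages may contain different entries.
DocumentFormat = Literal['doc', 'docx', 'xls', 'xlsx']

document_format_lookup: dict[str, DocumentFormat] = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': 'docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'xlsx',
}


def document_format(media_type: str) -> DocumentFormat:
    """Mirror the failing lookup from the traceback: a plain dict index."""
    return document_format_lookup[media_type]


# Before the change, a legacy type is missing, so the lookup raises KeyError.
try:
    document_format('application/msword')
    raised_before = False
except KeyError:
    raised_before = True

# Proposed change (assumed format names): register the two legacy MIME types.
document_format_lookup.update({
    'application/msword': 'doc',
    'application/vnd.ms-excel': 'xls',
})
```

With the two entries registered, `document_format('application/msword')` and `document_format('application/vnd.ms-excel')` resolve instead of raising, and the modern types keep their existing values.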

Reproduction

from pydantic_ai.messages import BinaryContent, UserPromptPart

# Placeholder bytes for illustration; in practice, read these from real files.
docx_bytes = xlsx_bytes = doc_bytes = xls_bytes = b''

# These work fine (modern formats)
docx_message = UserPromptPart(content=[
    BinaryContent(
        data=docx_bytes,
        media_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    ),
])

xlsx_message = UserPromptPart(content=[
    BinaryContent(
        data=xlsx_bytes,
        media_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    ),
])

# These raise KeyError (legacy formats) once the `format` property is read,
# as in the traceback above
doc_message = UserPromptPart(content=[
    BinaryContent(data=doc_bytes, media_type='application/msword'),  # Legacy .doc format
])

xls_message = UserPromptPart(content=[
    BinaryContent(data=xls_bytes, media_type='application/vnd.ms-excel'),  # Legacy .xls format
])
