Description
Problem
Many business documents are password-protected for security, but MarkItDown currently fails to convert them, throwing errors instead of gracefully handling the situation.
Use Case
Users often need to process password-protected documents through automated pipelines. Currently, they must either:
- Manually remove passwords before processing, or
- Use external tools to strip passwords
Otherwise, the tool simply fails with an unhelpful error.
Proposed Solution
Add optional password parameters to the MarkItDown class and CLI:
Python API
```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("protected.pdf", password="mypassword")
result = md.convert("protected.docx", password="mypassword")
result = md.convert("protected.xlsx", password="mypassword")
```
CLI
```bash
markitdown protected.pdf --password mypassword
markitdown protected.docx -p mypassword
```
Implementation Suggestions
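- Add an optional `password` parameter to the `convert()` method
- Add a `--password`/`-p` CLI flag
- For PDF: use pypdf's decryption support (`PdfReader(..., password=...)` or `reader.decrypt(password)`); PyPDF2 is deprecated in favor of pypdf
- For DOCX: python-docx and mammoth cannot open encrypted files, so decrypt the file first (for example with msoffcrypto-tool) and pass the decrypted bytes on
- For XLSX: openpyxl's workbook protection settings do not cover file encryption, so decrypt the file first in the same way, then load the workbook
- Return a clear error message if the password is incorrect or missing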
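Several of these suggestions depend on knowing whether a file is encrypted before it reaches a parser. A minimal stdlib-only sketch, assuming detection by file signature is acceptable: an encrypted .docx/.xlsx/.pptx is stored as an OLE compound file instead of a ZIP package, and an encrypted PDF carries an `/Encrypt` entry. The names `looks_encrypted` and `check_file` are hypothetical and not part of MarkItDown. Legacy .doc/.xls files are compound files even when unencrypted, so they are deliberately left out.

```python
from pathlib import Path

# Signature of an OLE compound file (CFB). An ordinary .docx/.xlsx/.pptx
# is a ZIP package and starts with b"PK\x03\x04" instead.
CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
OOXML_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def looks_encrypted(data: bytes, extension: str) -> bool:
    """Heuristically decide whether a document is encrypted.

    Hypothetical helper: covers only PDF and OOXML formats.
    """
    ext = extension.lower()
    if ext in OOXML_EXTENSIONS:
        # An encrypted OOXML file is not a ZIP; it is a compound file that
        # holds the EncryptionInfo and EncryptedPackage streams.
        return data.startswith(CFB_MAGIC)
    if ext == ".pdf":
        # Encrypted PDFs declare an /Encrypt entry in the trailer. This byte
        # search is a heuristic, and a PDF with only an owner password still
        # opens without one, so True means "a password may be needed".
        return b"/Encrypt" in data
    # Other formats are out of scope for this sketch.
    return False


def check_file(path: str) -> bool:
    """Read a file from disk and apply looks_encrypted to its bytes."""
    p = Path(path)
    return looks_encrypted(p.read_bytes(), p.suffix)
```

A converter could run this check first and only then decide whether to call a decryption library or report that a password is required.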
Behavior
- If the file is password-protected and no password is provided: show a clear error message
- If the password is wrong: show an "incorrect password" error
- If the password is correct: process the file normally
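The three behaviors above, together with the proposed `-p`/`--password` flag, can be sketched with `argparse`. This is a hypothetical illustration, not MarkItDown's actual CLI: the exception classes, exit codes, and the `convert(path, password)` callable are all assumptions standing in for whatever the real converter would expose.

```python
import argparse
import sys
from typing import Callable, Optional, Sequence

# Hypothetical exit codes; distinct values let a pipeline tell the cases apart.
EXIT_OK = 0
EXIT_PASSWORD_REQUIRED = 3
EXIT_WRONG_PASSWORD = 4


class PasswordRequiredError(Exception):
    """Hypothetical: the document is encrypted and no password was supplied."""


class IncorrectPasswordError(Exception):
    """Hypothetical: the supplied password does not unlock the document."""


# Hypothetical converter signature: (path, password) -> Markdown text.
Converter = Callable[[str, Optional[str]], str]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="markitdown")
    parser.add_argument("filename")
    parser.add_argument(
        "-p",
        "--password",
        default=None,
        help="password for an encrypted PDF, DOCX, or XLSX file",
    )
    return parser


def main(argv: Sequence[str], convert: Converter) -> int:
    args = build_parser().parse_args(argv)
    try:
        markdown = convert(args.filename, args.password)
    except PasswordRequiredError:
        # Case 1: encrypted file, no password given.
        print(f"{args.filename} is password-protected; pass --password.", file=sys.stderr)
        return EXIT_PASSWORD_REQUIRED
    except IncorrectPasswordError:
        # Case 2: a password was given but it is wrong.
        print(f"Incorrect password for {args.filename}.", file=sys.stderr)
        return EXIT_WRONG_PASSWORD
    # Case 3: unencrypted file, or the password was correct.
    print(markdown)
    return EXIT_OK
```

Separate exit codes let an automated pipeline tell a missing password from a wrong one without parsing stderr. A password passed on the command line can appear in shell history and process listings, which may matter in the enterprise settings this request targets.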
This would make MarkItDown much more useful in enterprise environments where document security is common.