related: #19
I'm still having issues with character encoding in a Windows + Japanese environment.
Running markitdown example.pdf works fine, but redirecting the output with markitdown example.pdf > example.md gives the following error:
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\kyosh\anaconda3\Scripts\markitdown.exe\__main__.py", line 7, in <module>
File "C:\Users\kyosh\anaconda3\Lib\site-packages\markitdown\__main__.py", line 16, in main
print(result.text_content)
UnicodeEncodeError: 'cp932' codec can't encode character '\ufb03' in position 509: illegal multibyte sequence
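For context, a minimal sketch of what the traceback seems to be reporting. The character '\ufb03' is the "ffi" ligature, which is common in PDF text and has no mapping in cp932. My assumption is that when stdout is redirected to a file, Python falls back to the locale code page (cp932 here) instead of the Unicode console API, which would explain why the unredirected command works. The variable names below are only for illustration.

```python
# U+FB03 is LATIN SMALL LIGATURE FFI, often produced by PDF text extraction.
ligature = "\ufb03"

# Strict encoding to cp932 fails, which is what print() hits when stdout is cp932.
try:
    ligature.encode("cp932")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace" the unmappable character becomes "?" instead of raising.
replaced = ligature.encode("cp932", errors="replace")

# UTF-8 can represent the character without loss.
utf8_bytes = ligature.encode("utf-8")

print(strict_ok, replaced, utf8_bytes)
```

If this is the cause, setting the PYTHONIOENCODING environment variable to utf-8 before running the command may be a workaround until the tool handles it.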
Perhaps __main__.py could handle Unicode encoding errors when printing? Something like this:
import sys

from markitdown import MarkItDown

result = None  # stays None if conversion fails before any output
try:
    if len(sys.argv) == 1:
        markitdown = MarkItDown()
        result = markitdown.convert_stream(sys.stdin.buffer)
        print(result.text_content)
    elif len(sys.argv) == 2:
        markitdown = MarkItDown()
        result = markitdown.convert(sys.argv[1])
        print(result.text_content)
    else:
        sys.stderr.write("Usage message here\n")
except UnicodeEncodeError:
    # Fallback: re-encode with the output stream's own codec, replacing
    # characters it cannot represent. Round-tripping through UTF-8 would
    # not help, because print() would still encode to cp932 afterwards.
    try:
        if result is not None and isinstance(result.text_content, str):
            encoding = sys.stdout.encoding or "utf-8"
            print(result.text_content.encode(encoding, errors="replace").decode(encoding))
    except Exception as e:
        sys.stderr.write(f"Encoding error: {e}\n")
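An alternative to catching the exception might be to change the error handler on the output stream once at startup, so that every later print() substitutes "?" for unmappable characters. This is only a sketch: use_replacement_errors is a hypothetical helper, not part of markitdown, and it relies on TextIOWrapper.reconfigure(), which needs Python 3.7 or later. The demo uses an in-memory cp932 stream in place of the real redirected stdout.

```python
import io
import sys


def use_replacement_errors(stream):
    """Switch a text stream to errors='replace' if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(errors="replace")
    return stream


# Demo: an in-memory stream encoded as cp932, standing in for redirected stdout.
# newline="" disables newline translation so the bytes are the same on every OS.
buffer = io.BytesIO()
fake_stdout = io.TextIOWrapper(buffer, encoding="cp932", newline="")
use_replacement_errors(fake_stdout)

# This would raise UnicodeEncodeError under the default strict error handler.
print("e\ufb03cient", file=fake_stdout)
fake_stdout.flush()
demo_bytes = buffer.getvalue()
```

In the CLI this would be a single call, use_replacement_errors(sys.stdout), before any printing. The trade-off is that output becomes lossy; reconfiguring with encoding="utf-8" instead would keep every character but change the encoding of the redirected file.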