trouble with writing out markdown file #78

@kentaroy47

related: #19

Still having some issues with character encoding when working in a Windows + Japanese environment.

Running `markitdown example.pdf` works fine, but redirecting the output with `markitdown example.pdf > example.md` gives the following error:

  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\kyosh\anaconda3\Scripts\markitdown.exe\__main__.py", line 7, in <module>
  File "C:\Users\kyosh\anaconda3\Lib\site-packages\markitdown\__main__.py", line 16, in main
    print(result.text_content)
UnicodeEncodeError: 'cp932' codec can't encode character '\ufb03' in position 509: illegal multibyte sequence
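
The failure looks reproducible without markitdown at all. A minimal sketch, assuming the redirected stdout ends up using the cp932 codec as the traceback indicates (`can_encode` is a hypothetical helper name, not part of markitdown):

```python
# U+FB03 is the "ffi" ligature, which often appears in text extracted from PDFs.
LIGATURE = "\ufb03"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # cp932 has no mapping for the ligature, UTF-8 does.
    print("cp932:", can_encode(LIGATURE, "cp932"))
    print("utf-8:", can_encode(LIGATURE, "utf-8"))
```

So the conversion itself seems fine; the error only comes from encoding the result for the output stream.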

Perhaps add some fallback handling for Unicode errors in `__main__.py`?

    try:
        if len(sys.argv) == 1:
            markitdown = MarkItDown()
            result = markitdown.convert_stream(sys.stdin.buffer)
            print(result.text_content)
        elif len(sys.argv) == 2:
            markitdown = MarkItDown()
            result = markitdown.convert(sys.argv[1])
            print(result.text_content)
        else:
            sys.stderr.write("Usage message here\n")
    except UnicodeEncodeError:
        # Fallback handling if encoding still fails
        try:
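            # Re-encode with the output stream's own codec (cp932 here) so that
            # unencodable characters become '?'. A UTF-8 encode/decode round trip
            # would return the same string and fail again in print().
            if isinstance(result.text_content, str):
                encoding = sys.stdout.encoding or 'utf-8'
                print(result.text_content.encode(encoding, errors='replace').decode(encoding))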
        except Exception as e:
            sys.stderr.write(f"Encoding error: {str(e)}\n")
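
An alternative to catching the exception might be to relax the error handler on the output stream before printing. This is only a sketch, assuming Python 3.7+ where `io.TextIOWrapper.reconfigure` is available; `make_lenient` is a hypothetical helper, and the cp932 stream below just simulates a redirected stdout:

```python
import io


def make_lenient(stream) -> None:
    """Keep the stream's encoding but replace characters it cannot encode."""
    # Only text streams backed by io.TextIOWrapper offer reconfigure().
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors="replace")


if __name__ == "__main__":
    # Simulate stdout redirected to a file on a cp932 system.
    raw = io.BytesIO()
    fake_stdout = io.TextIOWrapper(raw, encoding="cp932", newline="\n")
    make_lenient(fake_stdout)
    fake_stdout.write("e\ufb03cient\n")  # contains the U+FB03 ligature
    fake_stdout.flush()
    print(raw.getvalue())
```

In the CLI this would presumably be a single `make_lenient(sys.stdout)` call before `print(result.text_content)`. It trades the crash for `?` placeholders; to keep the characters intact, setting `PYTHONIOENCODING=utf-8` before running the command may be the better workaround.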

Labels: bug, question