When markitdown-ocr processes a PDF that has no text layer, such as one converted from images or produced by scanning, it generates an MD document containing only page numbers.
The cause is in the file "pdf_converter_with_ocr.py". The code first adds a page heading by calling markdown_content.append(f"\n## Page {page_num}\n"),
but the subsequent logic decides whether to perform full-page OCR based on whether the content is empty.
Because markdown_content already holds the page heading, it is never empty, so full-page OCR is skipped and the resulting MD document contains only page numbers.
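
The following is a minimal sketch of the behavior described above, not the actual code from "pdf_converter_with_ocr.py". The function names `convert_page_buggy`, `convert_page_fixed`, and `fake_ocr`, as well as the exact form of the emptiness check, are assumptions made for illustration. It shows why the heading keeps the list non-empty, and one possible fix: track the page body separately from the heading.

```python
from typing import Callable, List


def convert_page_buggy(page_num: int, extracted_text: str,
                       ocr: Callable[[int], str]) -> str:
    """Hypothetical reproduction of the reported behavior."""
    markdown_content: List[str] = []
    # The page heading is appended before any content is known.
    markdown_content.append(f"\n## Page {page_num}\n")
    if extracted_text.strip():
        markdown_content.append(extracted_text)
    # Assumed check: the list already holds the heading, so it is never
    # empty and the full-page OCR fallback is never reached.
    if not markdown_content:
        markdown_content.append(ocr(page_num))
    return "".join(markdown_content)


def convert_page_fixed(page_num: int, extracted_text: str,
                       ocr: Callable[[int], str]) -> str:
    """One possible fix: keep the heading out of the emptiness check."""
    header = f"\n## Page {page_num}\n"
    page_content: List[str] = []
    if extracted_text.strip():
        page_content.append(extracted_text)
    # Only the page body is inspected, so a page without a text layer
    # falls through to full-page OCR.
    if not page_content:
        page_content.append(ocr(page_num))
    return header + "".join(page_content)


def fake_ocr(page_num: int) -> str:
    """Stand-in for a real OCR call (hypothetical)."""
    return f"OCR text for page {page_num}"


if __name__ == "__main__":
    # A scanned page has no extractable text, so extracted_text is empty.
    print(repr(convert_page_buggy(1, "", fake_ocr)))
    print(repr(convert_page_fixed(1, "", fake_ocr)))
```

An equivalent alternative would be to keep the single list and append the heading only after the OCR decision has been made.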