-
cnp1.pdf
Line 2 in the attached PDF is wrongly cropped, and the result is as follows.
-
This is largely a problem introduced by the standard text flags. Things will improve if you only use […]. Otherwise, you may be interested in looking at the new package / repository pymupdf4llm.

import pathlib
import pymupdf4llm

# Convert each PDF to Markdown and write the result next to it
data = pymupdf4llm.to_markdown("cnp1.pdf")
pathlib.Path("cnp1.pdf.md").write_bytes(data.encode())
data = pymupdf4llm.to_markdown("p2.pdf")
pathlib.Path("p2.pdf.md").write_bytes(data.encode())

You will receive these output files:
-
p2.pdf
'''
doc = fitz.open(path)
'''
An example line result is '【 請 求 項 1 】'.
When copying directly with Adobe, the result is "【請求項1 】", with no additional spaces.
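Until the extraction itself is sorted out, one possible workaround is to clean the extracted lines afterwards. The following is only a sketch using the Python standard library, not part of PyMuPDF: the helper names `is_wide` and `strip_cjk_spaces` are hypothetical, and it assumes that any ASCII space next to a wide (East Asian) character is an extraction artifact, which may not hold for every document.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """Return True for East Asian wide or fullwidth characters."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def strip_cjk_spaces(line: str) -> str:
    """Drop ASCII spaces whose nearest non-space neighbor is a wide character.

    Assumption (not from PyMuPDF): such spaces are extraction artifacts.
    Spaces between two narrow characters (e.g. Latin words) are kept.
    """
    out = []
    for i, ch in enumerate(line):
        if ch == " ":
            # Nearest non-space character on each side ("" if none)
            prev = next((c for c in reversed(line[:i]) if c != " "), "")
            nxt = next((c for c in line[i + 1:] if c != " "), "")
            if (prev and is_wide(prev)) or (nxt and is_wide(nxt)):
                continue  # skip the spurious space
        out.append(ch)
    return "".join(out)


cleaned = strip_cjk_spaces("【 請 求 項 1 】")
print(cleaned)
```

Note that this also removes the single space that Adobe leaves before the closing bracket, so the output is not byte-identical to the Adobe copy.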