Wrong text bboxes for some Type 3 font #1433
Comments
The following info is missing:
The second example file cannot be downloaded because of security issues!
Thank you for your quick reply.
The bug occurs for the whole document, and page.get_text("words") also returns wrong coordinates.
PyMuPDF cannot yet deal with fonts having a bbox of [0 0 0 0] - which is the case here.
Great. Look forward to the new version.
Done. Download from here.
Thank you! The new wheel works for most text lines, but there are still some bad bboxes, e.g., in the last several lines ("the total world's population ...") of page 3 of the above PDF. PS: page.get_text_words() returns the correct bboxes, but span['bbox'] does not.
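For anyone hitting the same mismatch: since the word-level boxes were reported as correct while span['bbox'] was not, one possible stopgap is to rebuild line rectangles from the word tuples. The sketch below assumes the documented word tuple layout (x0, y0, x1, y1, word, block_no, line_no, word_no). The function name `line_bboxes_from_words` and the sample coordinates are hypothetical, made up for illustration. Note that it yields line-level boxes, not span-level ones, because a line may contain several spans.

```python
from collections import defaultdict


def line_bboxes_from_words(words):
    """Union word rectangles per (block_no, line_no).

    `words` is a list of tuples shaped like the ones returned by
    page.get_text("words"): (x0, y0, x1, y1, word, block_no, line_no, word_no).
    Returns a dict mapping (block_no, line_no) to (x0, y0, x1, y1).
    """
    grouped = defaultdict(list)
    for x0, y0, x1, y1, _text, block_no, line_no, _word_no in words:
        grouped[(block_no, line_no)].append((x0, y0, x1, y1))

    result = {}
    for key, rects in grouped.items():
        # The enclosing rectangle is the min of the top-left corners
        # and the max of the bottom-right corners.
        result[key] = (
            min(r[0] for r in rects),
            min(r[1] for r in rects),
            max(r[2] for r in rects),
            max(r[3] for r in rects),
        )
    return result


# Hypothetical sample data, NOT taken from the PDF discussed here.
sample_words = [
    (72.0, 700.0, 90.0, 712.0, "the", 3, 0, 0),
    (93.0, 700.0, 120.0, 712.0, "total", 3, 0, 1),
    (72.0, 714.0, 110.0, 726.0, "world's", 3, 1, 0),
]
line_boxes = line_bboxes_from_words(sample_words)
print(line_boxes)
```

With PyMuPDF installed, the input would come from page.get_text("words") instead of the sample list.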
BTW: this actually is an upstream bug (MuPDF): if using xml output
I will look into this.
Found the problem:
I do recommend that you submit a bug report to MuPDF!
New wheels are now ready to download.
When you do that, you must include an example PDF page and a script (not Python PyMuPDF!). Best use the CLI mutool example I mentioned in a previous post.
Thank you. I will report this bug to MuPDF later as suggested.
@LaiSongxuan - please let me have the bug id assigned by MuPDF's bug tracker. I want to put myself on the CC list there.
@JorjMcKie The bug id is 704747.
New version 1.19.3 is being uploaded to PyPI.
Hello, JorjMcKie~
PyMuPDF extracts wrong bbox for the following pdf:
https://influencermarketinghub.com/ebooks/IMH_SOCIAL_BENCHMARK_REPORT_2021.pdf
The bbox is like this:
[(177.83700561523438, -85899337728.0, 287.6545104980469, 85899345920.0, 'Social', 0, 0, 0), ...]
(However, pdfplumber gives correct results for this document)
Another example with wrong bbox:
http://iapsop.com/ssoc/1902__ione___food_studies.pdf
Could you please give a workaround for this bug?
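Until a fixed version is available, a rough way to at least detect the affected words is to check each bbox against the page rectangle. The sketch below does not repair the coordinates, it only separates plausible boxes from implausible ones. The helper name `split_plausible`, the tolerance value, the 612 x 792 page size, and the second sample word are assumptions for illustration. The first sample tuple is the one quoted above.

```python
def split_plausible(words, page_width, page_height, tolerance=1.0):
    """Split word tuples into (good, bad) by bbox plausibility.

    A bbox counts as plausible when it is non-inverted and lies within
    the page rectangle, allowing `tolerance` points of slack on each side.
    """
    good, bad = [], []
    for word in words:
        x0, y0, x1, y1 = word[:4]
        inside = (
            -tolerance <= x0 <= x1 <= page_width + tolerance
            and -tolerance <= y0 <= y1 <= page_height + tolerance
        )
        (good if inside else bad).append(word)
    return good, bad


words = [
    # Tuple quoted in this report: the y-coordinates are clearly bogus.
    (177.83700561523438, -85899337728.0, 287.6545104980469, 85899345920.0,
     "Social", 0, 0, 0),
    # Hypothetical well-formed word, added for contrast.
    (72.0, 100.0, 120.0, 112.0, "hypothetical", 0, 1, 0),
]

# Assumed page size (US Letter in points); the real page may differ.
good, bad = split_plausible(words, page_width=612.0, page_height=792.0)
print("suspicious:", [w[4] for w in bad])
```

With PyMuPDF, the inputs would be page.get_text("words"), page.rect.width and page.rect.height.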