False result when finding bounding boxes for lines in blocks. #3581
Comments
ADDED: I have now used rawdict instead of dict as the parameter for
I think there is a basic misconception here: IAW your red arrows are no bug.
It looks like you want to locate / extract text from table cells.
Thank you so much, now it is clearer to me what you mean.
Description of the bug
Hi, I am using the fitz module to extract the bounding boxes around text. I extracted them and then plotted them with matplotlib to see how they look and how correct they are for my use case.
I opened the same PDF file in Foxit Editor, toggled the object editor for text, and selected all text to see how the bounding boxes are shown there. They are exactly what a human reader would expect. I then compared that with the result from matplotlib and fitz and found that the bounding boxes from fitz are almost the same, but there are wrong cases:
1- Where two columns are handled as if they were a single column (illustrated by one big and one small red arrow).
2- Where three cells are handled as if they were a single cell (illustrated by three equal red arrows in the image).
Please see the illustration for the comparison. I tried to annotate it with a legend and a description of my problem; if any further information is needed, please ask.
Note: the non-Latin words in the document are in a right-to-left (RTL) language, specifically Arabic.
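To make the description above concrete, here is a minimal sketch of how line bounding boxes can be collected from the dictionary returned by `page.get_text("dict")`. It is not the script attached to this issue; the helper names `line_bboxes` and `line_bboxes_from_pdf` are hypothetical and only illustrate the block → line → span nesting that the output follows.

```python
def line_bboxes(page_dict):
    """Return (bbox, text) for every text line in a page.get_text("dict") result.

    Hypothetical helper for illustration; it only walks the nested dictionary.
    """
    results = []
    for block in page_dict.get("blocks", []):
        # Block type 0 is a text block; image blocks (type 1) carry no "lines".
        if block.get("type") != 0:
            continue
        for line in block.get("lines", []):
            # A line may hold several spans; join their text in stored order.
            text = "".join(span["text"] for span in line.get("spans", []))
            results.append((tuple(line["bbox"]), text))
    return results


def line_bboxes_from_pdf(path, page_number=0):
    """Open a PDF with PyMuPDF and collect line boxes for one page.

    Hypothetical wrapper; the import is local so that line_bboxes() can be
    used on an already extracted dictionary without PyMuPDF installed.
    """
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        page = doc[page_number]
        return line_bboxes(page.get_text("dict"))
```

Each returned bbox is the `(x0, y0, x1, y1)` rectangle of one line as PyMuPDF grouped it, so a line spanning several table cells shows up here as a single rectangle.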
How to reproduce the bug
That is the pdf file:
North 02_Minieh_Record 01.pdf
This is my Python script, with the illustration below for comparison:
PyMuPDF version
1.24.1
Operating system
Windows
Python version
3.9