AssertionError: Invalid interval #99
Comments
The font encoding mapping to Unicode used in the PDF has issues. Please share the file so I can investigate. The encoding CMaps have ranges lo:hi defined in them. It seems that, for some reason, the mapping in your file has a high value lower than the low value, hence this assertion error.
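To illustrate the kind of check that produces this error, here is a minimal sketch in Python. It is not the library's actual code: the names `check_interval` and `parse_range` are hypothetical, and it assumes the range bounds arrive as hex strings such as `0020` and `007E`.

```python
def check_interval(lo: int, hi: int) -> tuple[int, int]:
    """Return (lo, hi) if it is a valid closed interval, else raise."""
    # A CMap range lo:hi only makes sense when hi is not below lo.
    if hi < lo:
        raise AssertionError(f"Invalid interval: <{lo:04X}> is above <{hi:04X}>")
    return lo, hi


def parse_range(lo_hex: str, hi_hex: str) -> tuple[int, int]:
    """Parse two hex strings (hypothetical input format) into a checked range."""
    return check_interval(int(lo_hex, 16), int(hi_hex, 16))
```

With this sketch, `parse_range("0020", "007E")` is accepted, while `parse_range("0100", "00FF")` raises an `AssertionError`, which mirrors the situation described above where the high value is lower than the low value.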
A few comments:
"U ovoj je knjizi riječ pretežito o hobitima i iz nje će čitatelj doznati štošta o njihovu" but the extract function seems to have a problem with the accent marks. I get this: "U ovoj je knjizi rijee", which doesn't have accent marks and skips a bunch of text. Thoughts?
I believe there are issues with the font encoding in the file. If I open the file in Adobe Reader and select and copy the text, I get exactly the text below, which is close to what you are observing. This happens when the font ToUnicode CMaps are not properly transferred. Text extraction works on the same principle as copying and pasting text from a PDF file.
1 O hobitima
I will need to investigate the original file with the CMap to understand why the text does not come through properly. Please share it here, if possible. If there are security concerns, you can mail me at: sambitdash at gmail
Email sent.
@bdeonovic Sorry for my delay in looking into the file. The CMap in the PDF does not conform to the spec (see Figure-6 in the attached spec). That is why some readers behave differently. While I will try to repair the CMap as a special case, this is not the correct approach. Codespace ranges are rectangular regions in the byte plane, not numeric intervals.
is the CMap. As per the CMap spec the codespace range should have 2 elements.
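To make the "rectangular region" point concrete, here is a minimal Python sketch. The function `in_codespace` is a hypothetical helper, not part of the library, and the range `<8140> <9FFC>` is only an illustrative pair of bounds, not the one from the file discussed here. The assumption is that a codespace range is given as two byte strings of equal length and that a code matches only when every byte lies within the corresponding byte bounds.

```python
def in_codespace(code: bytes, lo: bytes, hi: bytes) -> bool:
    """Test a code against a codespace range byte by byte (hypothetical helper)."""
    # The code and both bounds must all have the same number of bytes.
    if not (len(code) == len(lo) == len(hi)):
        return False
    # Each byte is checked against its own bounds, giving a rectangle
    # in the byte plane rather than an interval on the number line.
    return all(l <= c <= h for c, l, h in zip(code, lo, hi))


lo, hi = bytes.fromhex("8140"), bytes.fromhex("9FFC")
code = bytes.fromhex("8230")

# Compared as numbers, 0x8230 lies between 0x8140 and 0x9FFC.
numeric = int.from_bytes(lo, "big") <= int.from_bytes(code, "big") <= int.from_bytes(hi, "big")
# Compared byte by byte, the second byte 0x30 is below 0x40, so it is outside.
rectangular = in_codespace(code, lo, hi)
```

Here `numeric` is `True` but `rectangular` is `False`, which is why treating the two bounds as plain numbers can give a different answer from the byte-wise reading.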
On Page-6 of the document you shared, I get:
This is what you are expecting. While I have introduced a workaround in the code, the workaround is not behavior prescribed by the spec.
9ed161f fixes this now.
What does this error mean?