Search strings are not always found correctly #184

Closed
sschuberth opened this issue May 3, 2015 · 2 comments

sschuberth commented May 3, 2015

I'm using Sumatra PDF 3.0 (32-bit) on Windows 7 (64-bit). For some PDFs, the search does not find all occurrences of a string that is obviously present. For example, if you search for "Kund" in this PDF, you'll find the occurrences in "Privatkundengeschäft" and "Geschäftskunden", but not the ones in "Kundin", "Kunde", and "Kunden".

If I save the PDF as a text file from Sumatra, it becomes more or less obvious why. For example, the line that should say

Sehr geehrte Postbank Kundin, sehr geehrter Postbank Kunde,

instead says

peÜr geeÜrte mostÄank hundinI seÜr geeÜrter mostÄank hundeI

To me this looks like some OCR gone mad. To double-check that the text is not stored as an image, I've installed Adobe Reader 11.0.10, which is able to search the PDF just fine.
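
A minimal sketch of why the search misses these occurrences, using only the two lines quoted above. It assumes the search is a case-insensitive substring match over the extracted text, which is an assumption that fits the observed results, not something confirmed from Sumatra's source. The helper name `count_hits` is made up for illustration.

```python
# The line as it should read, and the line as extracted (both quoted above).
expected = "Sehr geehrte Postbank Kundin, sehr geehrter Postbank Kunde,"
extracted = "peÜr geeÜrte mostÄank hundinI seÜr geeÜrter mostÄank hundeI"


def count_hits(text: str, needle: str) -> int:
    """Count case-insensitive substring matches (assumed search behavior)."""
    return text.lower().count(needle.lower())


# Both lines have the same length, so they can be compared position by
# position to collect every character that was extracted incorrectly.
substitutions = {e: x for e, x in zip(expected, extracted) if e != x}

hits_expected = count_hits(expected, "Kund")    # "Kundin" and "Kunde"
hits_extracted = count_hits(extracted, "Kund")  # "K" was extracted as "h"

print(substitutions)
print(hits_expected, hits_extracted)
```

Only a handful of characters are substituted, and the lowercase "k" is not among them. That would explain why "Privatkundengeschäft" and "Geschäftskunden" are still found while the words starting with a capital "K" are not.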

PS: Xpdf's pdftotext seems to have the same issue: it generates (almost) the same extracted text.

Thanks for fixing this so quickly! For anyone interested, a snapshot build is available here.
