Text Extraction (Too many / less spaces in text) #7327
Comments
I saw that @speedplane and @yurydelendik did some work in this direction. Maybe they can tell me which functions are relevant?
The error is also directly inspectable in Firefox:
Maybe I found a quick fix for this issue:
I'm seeing the same issue with extra spaces in words. An example PDF is available at https://github.com/jasonparallel/pdf.js-issues/blob/master/webSnapshot.pdf. For example, at the bottom of the first page, several spaces are inserted into n148584.rar, and a space is inserted into the word "energy" right before that. This causes problems when copying text out of the PDF and when searching for text in it.
Hi, have you solved this problem? I have the same issue: my search query has one space between each word, but the PDF expects two spaces. Do you have a solution? Thank you.
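A possible workaround on the caller's side, while the extraction itself is unfixed, is to normalize whitespace in both the extracted text and the search query before comparing them. The sketch below is only an illustration in Python and is not part of pdf.js. The function names `normalize_spaces`, `find_ignoring_spacing` and `find_without_spaces` are hypothetical, and the input is assumed to be plain text that has already been extracted from the page.

```python
import re


def normalize_spaces(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_ignoring_spacing(page_text: str, query: str) -> bool:
    """Match the query even if the page has doubled spaces between words."""
    return normalize_spaces(query) in normalize_spaces(page_text)


def find_without_spaces(page_text: str, query: str) -> bool:
    """Stricter fallback: drop all whitespace so that spaces wrongly
    inserted inside a word do not prevent a match."""
    def strip_all(s: str) -> str:
        return re.sub(r"\s+", "", s)

    return strip_all(query) in strip_all(page_text)
```

The first function covers the "one space versus two spaces" case. The second also matches words that were split by stray spaces, such as a file name broken into pieces, but because it ignores word boundaries entirely it can produce false positives, so it is probably best used only as a fallback.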
Fixed by #13257 and possibly other patches.
Link to PDF file (or attach file here):
http://dipbt.bundestag.de/dip21/btp/18/18145.pdf
Configuration:
Steps to reproduce the problem:
What is the expected behavior? (add screenshot)
Output = "." or ". "
What went wrong? (add screenshot)
Output is " . "
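For readers who only need cleaner output and cannot change the extraction step, the difference between the expected and actual output can be patched up afterwards. The following is a minimal post-processing sketch in Python, not a fix for pdf.js itself. The helper name `tighten_punctuation` is hypothetical, and the set of punctuation characters is an assumption that may need adjusting for a given document.

```python
import re


def tighten_punctuation(text: str) -> str:
    """Remove spaces or tabs that directly precede closing punctuation."""
    return re.sub(r"[ \t]+([.,;:!?])", r"\1", text)
```

This turns `" . "` into `". "`, which is one of the two expected forms. It is a blunt rule: it would also join a legitimately separated token such as ` .5` to the preceding word, so it should be treated as a workaround rather than a general solution.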