Web browser and its version: Chrome Version 80.0.3987.42 (Official Build) beta (64-bit)
Operating system and its version: MacBook Air version 10.14.4 (18E226)
PDF.js version: latest online
Is a browser extension: no
Steps to reproduce the problem:
load page 5
notice you can select text that is invisible to the eye (Preview also has this issue)
What is the expected behavior? (add screenshot)
The TextLayer should not contain text that is not visible on the page
What went wrong? (add screenshot)
How could we detect that some text is not drawn with ctx.fillText?
I checked several things, such as the fill color (black) and the text matrix (looks correct), but found no apparent way to determine whether a ctx.fillText call actually draws something.
If we could, we would be able to keep in the TextLayer only the Unicode characters that produce something on the screen.
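One possible probe, sketched here under assumptions rather than taken from PDF.js: read the affected canvas region back with ctx.getImageData before and after the ctx.fillText call and compare the pixels. The helper names (pixelsDiffer, fillTextDrawsSomething) and the region argument are hypothetical. Reading pixels back for every text run is slow, and text painted in the background color would be reported as invisible.

```javascript
// Hypothetical helpers, not part of PDF.js.

// Returns true if any byte differs between two pixel buffers of equal length.
function pixelsDiffer(before, after) {
  if (before.length !== after.length) {
    throw new Error("pixel buffers must have the same length");
  }
  for (let i = 0; i < before.length; i++) {
    if (before[i] !== after[i]) {
      return true;
    }
  }
  return false;
}

// Draws `text` at (x, y) and reports whether any pixel inside `region`
// ({x, y, width, height} in canvas pixels) changed as a result.
// Assumes `region` is large enough to cover the glyphs being drawn.
function fillTextDrawsSomething(ctx, text, x, y, region) {
  const { x: rx, y: ry, width, height } = region;
  const before = ctx.getImageData(rx, ry, width, height).data;
  ctx.fillText(text, x, y);
  const after = ctx.getImageData(rx, ry, width, height).data;
  return pixelsDiffer(before, after);
}
```

If a clip is already active when fillText runs, nothing changes inside the region and the function returns false. It cannot tell why the text failed to appear.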
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
Do you understand why during rendering the text is not visible?
Is it a scaling of a matrix?
Is it the font used that contains empty glyphs?
There must be a way to determine if the text is painted or not, and thus use this for the text layer.
Digging a bit more into these examples, I noticed that the text ends with a clip.
So basically, characters are drawn normally on the canvas, but then are clipped out.
I modified pdf.js to render all text in red, and you'll see that the PDF contains extra text when I remove the call to this.pendingClip in the consumePath method.
So basically one would need to keep the locations of previously painted text, and determine whether they get clipped out by subsequent clips.
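A rough sketch of that bookkeeping, assuming every painted text run and every clip can be approximated by an axis-aligned rectangle in device space. Real PDF clip paths can be arbitrary shapes, and the TextVisibilityTracker class and its methods are hypothetical names, not PDF.js APIs.

```javascript
// Hypothetical helper, not part of PDF.js.
// Rectangles are {x, y, width, height} in device space.

// True if the two rectangles overlap with non-zero area.
function intersects(a, b) {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

class TextVisibilityTracker {
  constructor() {
    this.items = [];
  }

  // Remember where a text run was painted.
  recordText(str, box) {
    this.items.push({ str, box, visible: true });
  }

  // A clip arrived: hide every previously recorded run that lies
  // entirely outside the clip rectangle.
  applyClip(clipRect) {
    for (const item of this.items) {
      if (item.visible && !intersects(item.box, clipRect)) {
        item.visible = false;
      }
    }
  }

  // Strings that should still be exposed in the text layer.
  visibleText() {
    return this.items.filter((item) => item.visible).map((item) => item.str);
  }
}
```

Usage would be along the lines of calling recordText for each painted run, applyClip whenever a clip is consumed, and building the text layer from visibleText(). Partially clipped runs are kept whole in this sketch; splitting them per glyph would need per-character boxes.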
Probably not impossible, but most likely not a high priority for the PDF.js project.
Attach (recommended) or Link to PDF file here:
materials-12-00322.pdf