text selection omits letters #4004

Closed
dk opened this issue Jan 5, 2024 · 4 comments

Comments


dk commented Jan 5, 2024

Version: 3.5.2

Hello all,

This is most probably not a bug per se, but I have this weird problem that I cannot pinpoint.

The problem: I am testing my PDF-generating code, and when I select text with the mouse in the files I produce, Sumatra does not select every letter. This is fairly consistent; with a larger text it also happens in almost every word of a paragraph. I'd be extremely curious to know whether this is my own bug (I can't for the life of me figure out why) or Sumatra's. I'd also be open to recommendations if the way I'm generating text is not the best way to do it, for one reason or another.

Anyway, I'm attaching the example pdf, where the text section is rather straightforward:

<00> Tj
10.15 0 Td <01> Tj
7.81 0 Td <02> Tj
3.12 0 Td <02> Tj
3.12 0 Td <03> Tj
7.81 0 Td <04> Tj
3.91 0 Td <05> Tj
10.14 0 Td <03> Tj
7.82 0 Td <06> Tj
4.68 0 Td <02> Tj
3.12 0 Td <07> Tj
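
For reference, here is a minimal Python sketch of how the offsets above add up into glyph positions. It assumes what the PDF operators define: each `Td` is relative to the start of the current line, and `Tj` does not move that start. The name `glyph_origins` is made up for illustration, and the regular expression only handles the simple one-operation-per-line layout quoted above.

```python
import re

# The text section quoted above, verbatim.
STREAM = """\
<00> Tj
10.15 0 Td <01> Tj
7.81 0 Td <02> Tj
3.12 0 Td <02> Tj
3.12 0 Td <03> Tj
7.81 0 Td <04> Tj
3.91 0 Td <05> Tj
10.14 0 Td <03> Tj
7.82 0 Td <06> Tj
4.68 0 Td <02> Tj
3.12 0 Td <07> Tj
"""

# Optional "tx ty Td" followed by a hex string shown with Tj.
LINE = re.compile(
    r"\s*(?:(-?[\d.]+)\s+(-?[\d.]+)\s+Td\s+)?<([0-9A-Fa-f]+)>\s*Tj\s*"
)

def glyph_origins(stream):
    """Return (glyph_hex, x_origin) pairs for each show operation."""
    x = 0.0
    origins = []
    for line in stream.splitlines():
        match = LINE.fullmatch(line)
        if match is None:
            continue  # not a line this sketch understands
        if match.group(1) is not None:
            x += float(match.group(1))  # Td offsets accumulate
        origins.append((match.group(3), round(x, 2)))
    return origins

for glyph, x in glyph_origins(STREAM):
    print(glyph, x)
```

The last origin is where the final glyph starts, so the visible extent of the word is that value plus whatever width the font reports for the last glyph.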

and this is how the selection works:

image

out.pdf

and this is how it looks on a larger text:

image

Any advice (or a confirmation that this is Sumatra's bug) would be much appreciated.

Regards,
Dmitry

Collaborator

GitHubRulesOK commented Jan 5, 2024

Technically you have 11 lines of spaced-out text; these can be concatenated into one line:
issue 4004 out-overwrite.pdf

image

image
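
One way to read "concatenated into one line" is to join all the glyph codes into a single string shown by one `Tj`. The Python sketch below does only that; whether the attached file was produced this way is an assumption, and `merge_show_ops` is a hypothetical helper name. Note that dropping the `Td` offsets means the glyph advances then come from the widths stored in the font rather than from the explicit values.

```python
import re

# The text section from the original report.
STREAM = """\
<00> Tj
10.15 0 Td <01> Tj
7.81 0 Td <02> Tj
3.12 0 Td <02> Tj
3.12 0 Td <03> Tj
7.81 0 Td <04> Tj
3.91 0 Td <05> Tj
10.14 0 Td <03> Tj
7.82 0 Td <06> Tj
4.68 0 Td <02> Tj
3.12 0 Td <07> Tj
"""

def merge_show_ops(stream):
    """Join every hex string shown with Tj into one Tj operation.

    The Td offsets are discarded on purpose, so positioning is left
    to the font's own glyph widths.
    """
    codes = re.findall(r"<([0-9A-Fa-f]+)>\s*Tj", stream)
    return "<" + "".join(codes) + "> Tj"

print(merge_show_ops(STREAM))  # <0001020203040503060207> Tj
```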

Author

dk commented Jan 5, 2024

You have a point, but the PDF file you attached has the same problem: the last letter "d" is not selected either.

Collaborator

GitHubRulesOK commented Jan 5, 2024

The size of a selection is a guess based on nominal spacing, as there is no clue other than what the writer provides. I accepted the given values, which will be computed by MuPDF (and its font-reading dependencies).
The permutations for such a placement amount to thousands of possibilities, since the letters can be placed in any order and still be counted as one word that contains one or more space characters.
If I switch the font to MS Windows Arial, it works:
image
issue 4004 out-MSArial.pdf

However, playing around with those values in the original, it does look like the total width calculated is less than the visible placements, so perhaps there is some oddity in MuPDF's assessment of PS Arial character widths. It would need to be compared with the MuPDF reader to see which one is reducing (not increasing) the selection, but basically the custom font is either not providing the correct widths or they are not being read correctly.
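
A rough way to check this kind of mismatch is to compare the advance each glyph is actually given in the content stream with the width the font declares for it. The Python sketch below is only an illustration: `implied_advances` and `width_gaps` are made-up names, the offsets are the ones from the original report, and the declared widths and font size must come from the actual font and `Tf` operator, which are not shown in this thread.

```python
# (x offset applied before the glyph, glyph code), from the original report.
OPS = [
    (0.0, "00"), (10.15, "01"), (7.81, "02"), (3.12, "02"),
    (3.12, "03"), (7.81, "04"), (3.91, "05"), (10.14, "03"),
    (7.82, "06"), (4.68, "02"), (3.12, "07"),
]

def implied_advances(ops):
    """Map each glyph code to the offsets that follow it in the stream.

    The last glyph has no following offset, so it gets no entry.
    """
    advances = {}
    for (_, glyph), (next_offset, _) in zip(ops, ops[1:]):
        advances.setdefault(glyph, []).append(next_offset)
    return advances

def width_gaps(advances, declared_widths, font_size):
    """Placed advance minus declared width, per glyph, in text-space units.

    declared_widths uses 1/1000 em units, as in a PDF /Widths array.
    Glyphs without a declared width are skipped.
    """
    gaps = {}
    for glyph, values in advances.items():
        if glyph not in declared_widths:
            continue
        declared = declared_widths[glyph] * font_size / 1000.0
        placed = sum(values) / len(values)
        gaps[glyph] = round(placed - declared, 2)
    return gaps

print(implied_advances(OPS)["03"])  # [7.81, 7.82]
```

A positive gap would mean the glyphs are placed further apart than the font says they are wide, which is consistent with a selection rectangle that ends before the last visible letter.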

Author

dk commented Jan 5, 2024

Ah, I wasn't aware that Sumatra is based on MuPDF. Just tried it, and it shows the same issue. I shall try to talk to the MuPDF devs then -- thank you!

dk closed this as completed Jan 5, 2024