Find words with soft hyphens #1189

Kristinita · 2019-08-16T16:03:32Z

1. Related issues

2. Summary

It would be nice if SumatraPDF could find words that contain soft hyphens in searchable documents.

3. Data

KiraSuperhero.pdf — PDF file with correct OCR layer.

4. Argumentation

I often need to find something in PDF files. In SumatraPDF I may not find the word I need because it wraps onto the next line with a soft hyphen, and I don't know where the soft hyphens are in the document. Hyphens are required in Russian for wrapping words; I see them in every scanned Russian paper book. This problem is important to me; therefore, I use other tools (see section 7 of this issue) instead of SumatraPDF.

5. Additional information

A soft hyphen (also known as an "optional hyphen") is a character that marks where a word may be broken at the end of a line.
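As a rough illustration of the requested behavior, here is a minimal Python sketch that removes soft hyphens (Unicode U+00AD) and any line break directly after them before comparing text. The function names are hypothetical, and this is not how SumatraPDF or MuPDF implement search; it only shows the idea under the assumption that the extracted text layer keeps the soft hyphen character.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN

import re

# Assumption: a soft hyphen may be followed by spaces/tabs and one line break
# when the word was wrapped; all of that is dropped to rejoin the word.
_SOFT_BREAK = re.compile(SOFT_HYPHEN + r"[ \t]*\r?\n?")


def strip_soft_hyphens(text: str) -> str:
    """Return text with soft hyphens (and a directly following line break) removed."""
    return _SOFT_BREAK.sub("", text)


def contains_ignoring_soft_hyphens(page_text: str, query: str) -> bool:
    """Hypothetical helper: case-insensitive search that ignores soft hyphens."""
    haystack = strip_soft_hyphens(page_text).casefold()
    needle = strip_soft_hyphens(query).casefold()
    return needle in haystack


if __name__ == "__main__":
    page = "Kira is a super\u00ad\nhero in this book."
    print(contains_ignoring_soft_hyphens(page, "superhero"))
```

With this normalization the user does not need to know where the soft hyphen sits in the document; the query is typed as an ordinary word.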

6. Actual behavior

SumatraPDF doesn't recognize soft hyphens:

- the soft hyphen symbol must be included in the search query:

7. Expected behavior

Free versions of programs that have the requested feature:

Foxit Reader:

PDF X-Change Editor:

Okular (open source):

Thanks.


user1823 · 2023-12-01T13:35:37Z

This feature has been implemented by MuPDF in 2020:
https://git.ghostscript.com/?p=mupdf.git;h=2185f16814074f024800a8bcc2dcf2f68ffcb07e

So, in my opinion, all that needs to be done is to enable this in SumatraPDF.

GitHubRulesOK · 2023-12-01T16:20:27Z

Agreed, there is a difference between MuPDF 1.20 and current SumatraPDF, as shown below. You can see that MuPDF skips over the end-of-line hyphen break. However, that is not the same as searching for other forms, such as hyphenated numbers. In the second view, Sumatra finds the hyphenated glyphs. They are of mixed types, but at the end of a line they are generally plain text (-)Tj (not soft, although it varies). MuPDF fails to see them, since they are classed as non-existent parts of a word.

user1823 · 2023-12-01T16:58:00Z

I don't understand Russian, but Google Translate shows the same translation for the word whether or not the hyphen is included.

It may be a problem, but I don't know.

I think that the best solution for this issue would be to add a setting controlling whether to ignore such hyphens or not so that the users can decide on their own what works best for them.

I don't think that doing this would require a significant amount of effort (I am not a dev though).
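To make the proposed setting concrete, here is a minimal Python sketch of a search with an on/off switch for ignoring line-break hyphens. It is only an illustration under stated assumptions: `find_spans` and `ignore_hyphens` are hypothetical names, not SumatraPDF or MuPDF settings, and the rule used (skip a soft hyphen anywhere, skip a plain `-` only when a line break follows it) is one possible choice. Because the match runs on the original text, hyphenated forms in the middle of a line, such as hyphenated numbers, are left alone.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN

# Optional gap allowed between two query characters:
#   - a soft hyphen plus any whitespace after it, or
#   - a plain hyphen that is directly followed by a line break.
_GAP = "(?:" + SOFT_HYPHEN + r"\s*|-[ \t]*\r?\n[ \t]*)?"


def build_pattern(query: str, ignore_hyphens: bool) -> "re.Pattern[str]":
    """Compile the query; with ignore_hyphens, tolerate line-break hyphens inside it."""
    query = query.replace(SOFT_HYPHEN, "")
    if not ignore_hyphens:
        return re.compile(re.escape(query))
    return re.compile(_GAP.join(re.escape(ch) for ch in query))


def find_spans(text: str, query: str, ignore_hyphens: bool = True):
    """Return (start, end) offsets of matches in the original, unmodified text."""
    return [m.span() for m in build_pattern(query, ignore_hyphens).finditer(text)]


if __name__ == "__main__":
    text = "a well-known super-\nhero and a super\u00adhero"
    for start, end in find_spans(text, "superhero", ignore_hyphens=True):
        print(repr(text[start:end]))
```

Returning offsets into the original text means a viewer could still highlight the match across the line break, and turning the switch off restores exact matching.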

Kristinita mentioned this issue May 17, 2020

feature_request(ebooks): words with hyphens in the ends of lines phiresky/ripgrep-all#44

Closed

kjk changed the title ~~feature_request(find): words with soft hyphens~~ Find words with soft hyphens Jun 12, 2020

GitHubRulesOK mentioned this issue Jun 29, 2020

Search over line breaks and syllabification #571

Closed

GitHubRulesOK mentioned this issue Sep 5, 2022

Advanced Search Feature #2935

Closed
