Combine hyphenation patterns for Serbian Cyrillic and Latin scripts #566

eevan78 · 2024-05-29T07:48:58Z

This pull request continues the work from pull request #372.
As the Serbian language uses two scripts with different code points, it is safe to combine the patterns into one file. That way it doesn't matter which script is used, and even texts that mix both scripts will be hyphenated properly. Only the primary part of the language tag in (X)HTML should be consulted to load the appropriate patterns, so sr, sr-Cyrl, sr-Latn, and regional versions of these (like sr_RS) should all load the same pattern file.
This approach is already successfully implemented in ConTeXt.
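
A minimal sketch of the tag handling described above, written in Python for brevity (crengine itself is C++). The names `primary_subtag`, `pattern_file_for`, and the lookup table are hypothetical, and `Serbian.pattern` is taken from the commit title in this PR; this is not crengine's actual lookup code.

```python
import re

# Hypothetical table: primary language subtag -> hyphenation pattern file.
PATTERN_FILES = {"sr": "Serbian.pattern"}


def primary_subtag(lang_tag: str) -> str:
    """Return the lowercased primary subtag of a tag such as 'sr-Cyrl' or 'sr_RS'."""
    return re.split(r"[-_]", lang_tag.strip(), maxsplit=1)[0].lower()


def pattern_file_for(lang_tag: str):
    """Look up the pattern file by primary subtag only, ignoring script and region."""
    return PATTERN_FILES.get(primary_subtag(lang_tag))


for tag in ("sr", "sr-Cyrl", "sr-Latn", "sr_RS", "sr-Latn-RS"):
    print(tag, "->", pattern_file_for(tag))
```

Because the script and region subtags are dropped before the lookup, a book tagged with any of these variants ends up with the same combined pattern file.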

Patterns have been converted from https://devbase.net/dict-sr/, the same ones used in the LibreOffice extension Serbian Spellchecker.


Frenzie · 2024-05-29T07:52:21Z

> As Serbian language uses two scripts with different codepoints, it is safe to combine the patterns into one file.

You mean the Latin one is currently completely absent I presume? As phrased it sounds a bit like you forgot to delete it. :-)

poire-z · 2024-05-29T07:58:13Z

Pinging @strn @roshavagarga who contributed to #372 for thoughts and approval.

roshavagarga · 2024-05-29T08:07:42Z

@poire-z I'd say @strn would be able to give a more valid opinion on whether this is something that should be done, as my understanding of Serbian and of the cultural connotations of the above change is fairly basic.

If it works out-of-the-box and there aren't any cultural reasons not to do this, I don't see an issue.

I would note, however, that I'm not sure how the source(s) used for this compare to the one we currently use for Serbian, so possibly something to compare and/or test? (Taken from here)

eevan78 · 2024-05-29T08:08:25Z

> > As Serbian language uses two scripts with different codepoints, it is safe to combine the patterns into one file.

> You mean the Latin one is currently completely absent I presume? As phrased it sounds a bit like you forgot to delete it. :-)

You are right, they are currently absent. When I read a Serbian book written in Latin script, I have to change the language to Croatian. That loads the Croatian patterns, which are based on the same Latin script. Otherwise, there is no hyphenation.

eevan78 · 2024-05-29T08:11:47Z

> I would note, however, that I'm not sure how the source(s) used for this compare to the one we currently use for Serbian, so possibly something to compare and/or test? (Taken from here)

Those are the same patterns, made by Dejan Muhamedagić, used in TeX.
I just had to convert the encoding to UTF-8, as these patterns use ISO 8859-2 (for the Latin patterns) and ISO 8859-5 (for the Cyrillic patterns).

> Serbian hyphenation patterns are derived from official TeX patterns for Serbocroatian language (Cyrillic and Latin) created by Dejan Muhamedagić, version 2.02 from 22 June 2008 adopted for usage with Hyphen hyphenation library and released under GNU LGPL version 2.1 or later.
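
The conversion described above can be sketched in Python as follows. The sample pattern strings are made up for illustration (they are not real entries from the TeX files), and `decode_patterns` and `merge_patterns` are hypothetical helpers; the conversion actually used for this PR may have been done differently.

```python
# Made-up sample patterns, encoded the way the original files are described:
# ISO 8859-2 for Latin, ISO 8859-5 for Cyrillic. These are not real entries.
latin_bytes = "a1č\n1ša\n".encode("iso8859_2")
cyrillic_bytes = "ж1д\n1шч\n".encode("iso8859_5")


def decode_patterns(raw: bytes, encoding: str) -> list[str]:
    """Decode a legacy-encoded pattern file into a list of Unicode patterns."""
    return raw.decode(encoding).split()


def merge_patterns(latin: list[str], cyrillic: list[str]) -> list[str]:
    """Concatenate both sets after checking that their letters do not overlap."""
    latin_letters = {ch for p in latin for ch in p if ch.isalpha()}
    cyrillic_letters = {ch for p in cyrillic for ch in p if ch.isalpha()}
    shared = latin_letters & cyrillic_letters
    if shared:
        raise ValueError(f"scripts share letters: {sorted(shared)}")
    return latin + cyrillic


merged = merge_patterns(
    decode_patterns(latin_bytes, "iso8859_2"),
    decode_patterns(cyrillic_bytes, "iso8859_5"),
)
utf8_output = "\n".join(merged).encode("utf-8")  # bytes ready to write to one file
```

The overlap check illustrates why combining is safe: once decoded to Unicode, Latin and Cyrillic letters occupy different code points, so a pattern from one script can never match a word written in the other.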

poire-z · 2024-06-09T19:08:38Z

Pinging again @strn - please give us some feedback.

strn · 2024-06-09T22:10:56Z

@poire-z , sorry for the late reply.

Yes, if the patterns are the same, then they should be used for hyphenating texts in the Serbian language, regardless of which script it is written in.

However, let me just emphasize and remind you once again that only Serbian Cyrillic is a valid Serbian language alphabet. Usage of the Croatian Latin alphabet comes from the Yugoslav era and is best left there.

eevan78 · 2024-06-10T05:32:25Z

As I've already said, this is just a technical matter that removes the need to change languages when reading books typeset in the Latin script.

@strn Can you please point to some valid reference that supports your claims?
Are you saying that, for example, these are Croatian books? Cyrillic script is defined as an official script in the Constitution, and both scripts are used in daily correspondence, media, newspapers, and publishing, whether we like it or not.
Personally, I'm using Cyrillic script, but many other people that I know are not.
That's the only reason I'm proposing to unify the patterns in one file, purely as a convenience to the user.

Includes:
- Russian hyphenation: revert "allow hyphens after не" koreader/crengine#568
- Serbian hyphenation: combine patterns for Cyrillic and Latin scripts koreader/crengine#566
- writeNodeEx(): fix handling of multilines attribute values koreader/crengine#569
  See #12004 (comment).
- Add getBalancedHTML() helper

Also includes:
- kobo: add missing blitbuffer library koreader/koreader-base#1823

Update Serbian.pattern and combine patterns for Cyrillic and Latin script (34c558c)

Combine the patterns for Cyrillic and Latin scripts.

poire-z merged commit ab1d541 into koreader:master Jun 15, 2024
1 check passed

This was referenced Jun 16, 2024

bump crengine: update Russian and Serbian hyphenation koreader/koreader-base#1824

Merged

bump crengine: update Russian and Serbian hyphenation koreader/koreader#12036

Merged
