Add typography rules for Russian #557
Russian typography prohibits having one and two letter words hanging at the end of the line. There are many Russian resources discussing this, I've linked one of the better known ones. If anyone knows if this is true for Ukrainian and Belarusian as well, please let me know.
Pinging @pkb @virxkane @hius07 @mergen3107 @ssvb to confirm this is the right thing to do, and whether it also holds for Ukrainian and Belarusian. (And if it is the right thing to do, why have you been fine without it for so long? :) Because it's only a minor expectation? How does the risk of more hyphenation or wider spacing between words compare to getting this nicety?)
I stopped using KOReader for a while after a device upgrade. Now I'm back to it, and this is pretty noticeable. As an afterthought, I think it's better to scale down this change and prohibit one-letter words only, so we'll have a balance between too much spacing and nice typography.
This will balance between too much spacing and nice typography.
In English and possibly many other languages it'd be preferable to avoid it when reasonably possible as well (not to be confused with prohibiting it :-) but it's a fairly rare occurrence. |
Okay, perhaps “prohibits” is too strong a word. I updated the PR description.
The Belarusian hyphenation rules can be found here. Basically, in layman terms:
I was more curious about whether it's okay to leave single letter prepositions at the end of the line — e.g. "з", "ў", etc. |
Ah, sorry, I somehow thought that it was a question about hyphenation. I don't remember any rules regulating one-letter words left hanging at the beginning or end of a line. Probably nobody really cares. It's just a matter of aesthetic style, and if your patch makes the text look better, then go for it.
@poire-z As time went on, I became less and less picky, up to the point where I am probably illiterate in Russian hyphenation :D so I stopped recognizing these patterns, because I got what I wanted from hyphenation: saved space and straight text boxes. But thank you @dmalinovsky for bringing this up, I'll revise all of these again and dig up my old notes with complaints :D
Just some warnings, as I can't judge what's preferable, not reading Russian: this is not about hyphenation, but about where not to wrap a line at a normal space that would usually allow a wrap. Translated to English, https://www.artlebedev.ru/kovodstvo/sections/62/ says (and I think that is all there is about this topic):
That's quite little, and hardly reads as "Russian typography frowns upon having one letter words hanging at the end of the line." :)
And to ensure that, the code may need to more often increase spacing between words:
or hyphenate the following word:
So it's not a free benefit that automatically looks better. It also shouldn't be just a question of taste, or it should be a taste shared by many. The best way to be sure it's worth doing is to check a few books by good Russian publishers, and
In Russian, initials will have a period added, so it'll be "Joseph K." and won't be affected by my change. As far as I know, only prepositions and conjunctions are one letter long. Not all books abide by these rules, unfortunately, but it's considered a sign of good typography. For example, one of the biggest Russian ebook sellers, LitRes, is using non-breaking spaces in the FB2 files it produces. They also have an English website, litres.com.
Here's a sample FB2 file from LitRes: Note that it uses character code 160 (U+00A0, NO-BREAK SPACE; not actually ASCII) for non-breaking spaces, so you'll have to use a hex editor or something similar to see them. They're added after 1 or 2 letter prepositions and short conjunctions ("а" and "и").
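For illustration, the kind of preprocessing described above could look like the sketch below. This is an assumption about what such a step does, not LitRes's actual pipeline: `glueShortWords` and `isShortFunctionWord` are hypothetical names, and the letter list (в, к, о, с, у, а, и) is only a plausible one-letter subset. The 2-letter prepositions are left out for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical list for illustration: one-letter Russian prepositions and
// conjunctions (в, к, о, с, у, а, и), lower case then upper case.
static bool isShortFunctionWord(char32_t c) {
    static const std::u32string kLetters =
        U"\u0432\u043A\u043E\u0441\u0443\u0430\u0438"
        U"\u0412\u041A\u041E\u0421\u0423\u0410\u0418";
    return kLetters.find(c) != std::u32string::npos;
}

// Replace the plain space that follows a standalone one-letter function word
// with U+00A0 (NO-BREAK SPACE), so a line breaker cannot wrap at that spot.
std::u32string glueShortWords(std::u32string text) {
    const char32_t kNbsp = 0x00A0;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        // "Standalone" means the letter starts the text or follows a space.
        bool standalone = (i == 0 || text[i - 1] == U' ' || text[i - 1] == kNbsp);
        if (standalone && isShortFunctionWord(text[i]) && text[i + 1] == U' ')
            text[i + 1] = kNbsp;
    }
    return text;
}
```

A word that merely ends in one of these letters is left alone, because the letter must also be preceded by a space (or start the text) to count as a word of its own.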
Otherwise, a particle "б" will be also included, and it should stay at the end of the line.
Fair enough. If others feel the same, I'll close this PR. |
TL;DR: Cyrillic
Then most publishers (can't even think of a counter-example) add a non-breaking space after prepositions, so that was never an issue in my case. Also, dangling prepositions are frowned upon, but not considered errors, in Polish
Here's a recommendation from another well known resource about Russian language, its grammar, etc.: https://gramota.ru/spravka/vopros/294020
I wish there was a way to make it a local only change with custom hyphenation rules, but alas... |
We can try it: there are a few weeks until the next KOReader release to see how good or bad it makes things. Maybe keep it for "ru" only. Will it be OK to switch the typography to Ukrainian or Belarusian to compare? Or are other typography rules, like hyphenation, different enough that other things will be at play and we won't really be able to compare?
Can you fix the indentation for
Also, for the culture of us non-Cyrillic readers, could you add the English meaning of these prepositions, as was kindly done for Polish: crengine/crengine/src/textlang.cpp Lines 591 to 597 in e4426ac
@ptrm: I was just asking about any false positives (not only about Joseph K. :) and maybe there are people whose last name is just a single letter). And maybe, if there are false positives, they just go unnoticed.
Also improved indentation and reworded the comment.
Note that we have 3 lang tags we can force-set for Russian, so you could apply your tweaks to only one of ru-GB or ru-US (dunno if these reach deep down to our lang_tag here), but then it won't really be tested. Not advising that, just mentioning it in case it gives you other ideas.
Sure, I think I was too hasty to suggest extending it to other languages as well.
Done.
Done. Also, is it okay to specify Cyrillic letters as UTF-8? Do I need to do something special for the encoding? |
Thanks.
I guess it's fine: it reads fine in the GitHub web UI, and I guess you compiled and tested it and it works. I may push a PR in the coming days, so I'll merge this one then, if nobody else stops us here, and bump everything into KOReader.
To be on the safe side, I've replaced raw letters with UTF-32 sequences. The same way is used for the quotes in the file anyway. |
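A minimal sketch of why escapes are the safe choice: the literal names the code point directly, so the result no longer depends on how the compiler decodes the source file. The exact spelling used in textlang.cpp is not shown in this thread, so the `\u` form, the `kLetters` table and `isListedLetter` below are assumptions for illustration only.

```cpp
#include <algorithm>
#include <array>

// Hypothetical table: the letters are written as \u escapes, so the source
// file stays pure ASCII. The trailing comment keeps it readable.
constexpr std::array<char32_t, 3> kLetters = {
    U'\u0432', U'\u043A', U'\u0441'  // в, к, с
};

// The escape is the code point itself, whatever the file encoding is.
static_assert(U'\u0432' == 0x0432, "escape names the code point directly");

// True if `c` is one of the letters in the table above.
bool isListedLetter(char32_t c) {
    return std::find(kLetters.begin(), kLetters.end(), c) != kLetters.end();
}
```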
Yeah, and I think I answered about other single letters too, and still can't think of any cases. I think those false positives would be extremely rare, but sure, foreign names may cause such cases :) And since we're talking corner cases, I guess checking if an uppercase letter is at the beginning of a sentence (otherwise not a preposition for sure) would be hard to do? |
Or I dunno. I remember seeing some non-ASCII char literal quoted with
oh, I see you just did that. I think |
I looked at line 304, for example, and copied it. Looks like there are 2 styles in the file. :) |
I think so - I also don't want to put too much (any :)) heuristics about grammar (what is a sentence start) and how much to look ahead/behind, what to skip, etc... in that low level code :). |
The changes are made in koreader/crengine#557.
Maybe I copied that list from elsewhere, or I thought this list of quotes could be ready for multi-codepoint quotes when needed, even if there are only single-codepoint quotes in it currently.
Makes sense. I've updated the style to match. |
@poire-z, thank you! Can you please also merge koreader/koreader#11570 later? It's a cosmetic change. |
Yes I will, just after the PR I'll make to bump all this crengine stuff, to keep things in a logical order. |
I'm not sure you tested these changes, so please do :)
Russian typography frowns upon having one letter prepositions and conjunctions hanging at the end of the line.
There are many Russian resources discussing this, I've linked some of the better known ones:
https://www.artlebedev.ru/kovodstvo/sections/62/
https://gramota.ru/spravka/vopros/294020
https://gramota.ru/spravka/vopros/219773
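To make the rule concrete, here is a rough sketch of a greedy wrapper that refuses to end a line on a one-letter preposition or conjunction. It is not crengine's actual code: `wrapWords`, `isShortFunctionWord` and the lower-case letter list are hypothetical, and the sketch simply carries the short word to the next line without re-checking the width, where a real engine would have to widen spacing or hyphenate.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical list (lower case only, for brevity): в, к, о, с, у, а, и.
static bool isShortFunctionWord(const std::u32string& w) {
    static const std::u32string kLetters =
        U"\u0432\u043A\u043E\u0441\u0443\u0430\u0438";
    return w.size() == 1 && kLetters.find(w[0]) != std::u32string::npos;
}

// Greedy wrap of pre-split words into lines of at most `width` code points.
// If a finished line would end with a one-letter function word, that word is
// moved to the start of the next line instead.
std::vector<std::u32string> wrapWords(const std::vector<std::u32string>& words,
                                      std::size_t width) {
    std::vector<std::u32string> lines;
    std::vector<std::u32string> cur;  // words on the line being built
    std::size_t curLen = 0;           // length of `cur` joined by single spaces

    auto flush = [&]() {
        std::u32string out;
        for (std::size_t i = 0; i < cur.size(); ++i) {
            if (i) out += U' ';
            out += cur[i];
        }
        lines.push_back(out);
        cur.clear();
        curLen = 0;
    };

    for (const auto& w : words) {
        std::size_t needed = cur.empty() ? w.size() : curLen + 1 + w.size();
        if (!cur.empty() && needed > width) {
            std::u32string carried;
            if (cur.size() > 1 && isShortFunctionWord(cur.back())) {
                carried = cur.back();  // hold the short word back
                cur.pop_back();
            }
            flush();
            if (!carried.empty()) {
                cur.push_back(carried);
                curLen = carried.size();
            }
            needed = cur.empty() ? w.size() : curLen + 1 + w.size();
        }
        cur.push_back(w);
        curLen = needed;
    }
    if (!cur.empty()) flush();
    return lines;
}
```

With a width of 8, "он жил в доме" wraps as "он жил" / "в доме" rather than "он жил в" / "доме". The first line ends up shorter, which is exactly the extra-spacing cost discussed in the review.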