Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow providing and using multiple fallback fonts #339

Merged
merged 4 commits into from
Apr 24, 2020

Conversation

poire-z
Copy link
Contributor

@poire-z poire-z commented Apr 24, 2020

Simplify libunibreak includes
See #337 (comment)

Text: fix read/write outside array bounds
Fix crash that could happen (in rare situations) since #337.
(The large diff is just wrapping that section with if ( pos > 0 ) {).
Witnessed with an embedded float (so, malloc'ed dynamic buffers), being a <bdi> which made text start with U+2068 (FIRST STRONG ISOLATE, which happens to be breakable after a leading space), leading to |= LCHAR_ALLOW_WRAP_AFTER modifying byte at -1, causing a double free (while
&= ~LCHAR_ALLOW_WRAP_AFTER, done when no break allowed, did not cause any harm.)
Also switch to using a Zero-Width Joiner instead of a space as the initial libunibreak char.

lvtextfm: dont adjust space after initial quotation mark/dash (rework)
Rework f41517d so it works even when there are leading collapsed spaces, and probably too on BiDi/RTL text (not tested).
Didn't work when there were a collapsed space at start before the dash, like with the 2nd line here (before | after):
space_dash_space

Fonts: allow providing and using multiple fallback fonts
As our use of the fallback font was a recursive calls, we can provide chained fallback fonts.
As one of the fallback font could be the main font, we need to use first getFallbackFont() (to get the first fallback font, even on a font among these fallback fonts), and getNextFallbackFont() to get the chained fallback font.
Caveat: we might try rendering/drawing with the main font twice if it is among the fallback fonts (but avoiding that wouldn't make this change as simple).
Wished for in #307.

We'll have in frontend something like that (can be discussed, we already had a first round of discussion for the UI fallback fonts in koreader/koreader#3942):

    -- Reasons for the fallback font ordering:
    -- - Noto Sans CJK SC before FreeSans/Serif, as it has nice and larger symbol for Wikipedia EPUB headings)
    -- - FreeSerif after most, has it has good coverage but smaller glyphs (and most other fonts are better looking)
    -- - FreeSans has some coverage that FreeSerif has not - and is usually fine along other fonts (even serif fonts)
    -- - Noto Serif & Sans at the end (just in case). A user can select them as fallback to move them at front
    --   (depending whether his main font is serif or sans). Moving them before FreeSans would need one to set
    --   FreeSans as fallback to get its nice glyphes, which would override Noto Sans CJK good symbol glyphs
    --   with smaller ones.
    fallback_fonts = {
        "Noto Sans CJK SC",
        "Noto Sans Arabic UI",
        "Noto Sans Devanagari UI",
        "FreeSans",
        "FreeSerif",
        "Noto Serif",
        "Noto Sans",
    },

Users will still be able to only set one fallback font, which will be put at start of this list. If the font is already in that list, it will be moved first. This allows some bit of customisation, while always providing large coverage.
Might allow closing koreader/koreader#5277.

We get complete coverage for https://r12a.github.io/scripts/tutorial/summaries/wrapping :
image
and quite a bit more for https://r12a.github.io/scripts/phrases, except:
Balinese, Khmer, Mongolian, Myanmar, Telugu, Tibetan (and hyeroglyphs for Ancient egyptian :)
And possibly Urdu which should be a bit tilted arabic and expects a "nastaliq" font we don't provide - and that it seems we/HB wouldn't render very well if provided.


This change is Reviewable

with the recently added libunibreak changes.
Witnessed with an embedded float (so, malloc'ed dynamic
buffers), being a <bdi> which made text start with U+2068
(FIRST STRONG ISOLATE, which happens to be breakable after
a leading space), leading to "|= LCHAR_ALLOW_WRAP_AFTER"
modifying byte at -1, causing a double free (while
"&= ~LCHAR_ALLOW_WRAP_AFTER", done when no break
allowed, did not cause any harm.)
Also switch to using a Zero-Width Joiner instead of
a space as the initial libunibreak char.
Rework f41517d so it works even when there are leading
collapsed spaces, and probably too on BiDi/RTL text.
As our use of the fallback font was a recursive calls, we can
provide chained fallback fonts.
As one of the fallback font could be the main font, we need to
use first getFallbackFont() (to get the first fallback font, even
on a font among these fallback fonts), and getNextFallbackFont()
to get the chained fallback font.
Caveat: we might try rendering/drawing with the main font twice
if it is among the fallback fonts (but avoiding that wouldn't
make this change as simple).
@NiLuJe
Copy link
Member

NiLuJe commented Apr 24, 2020

LGTM, at a glance ;).

Yay for cascading fallbacks (not that I'll ever use it, but, still, yay ;p).

Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm too

@virxkane
Copy link
Contributor

virxkane commented Jul 5, 2020

@poire-z Multiple fallback fonts are a great idea, but maybe you are making crengine more and more complex. You add new argument to few of LVFont functions: (bool is_fallback), add new field to LVFreeTypeFace (LVFontRef _nextFallbackFont) and corresponding getter/setter. Perhaps I didn’t fully understand the essence of these changes, but maybe it was just necessary to allow the recursion of the fallback font, i.e. font, if necessary, delegates their functions to _fallbackFont recursively without restrictions (but while _fallbackFont not NULL, of course).

@poire-z
Copy link
Contributor Author

poire-z commented Jul 5, 2020

Well, adding new features is doomed to make the code more complex, sorry :)
I hope you find a bit of valuable help with my verbose comments and commit messages, and my descriptions and screenshots in the PRs here !

This fallback font stuff took me a long time thinking about how to do it. And, for once, I found the end result rather clean and lean :)

maybe it was just necessary to allow the recursion of the fallback font, i.e. font, if necessary, delegates their functions to _fallbackFont recursively without restrictions

That was of course my first idea, but then, the main issue with a single chain was that:

As one of the fallback font could be the main font, we need to use first getFallbackFont() (to get the first fallback font, even on a font among these fallback fonts), and getNextFallbackFont() to get the chained fallback font.

So, let's say you set as fallbacks fonts, this chain: Font1, Font2, Font3.

You set Font4 as the main font for the book. Easy:
Font4 > Font1 > Font2 > Font3 [STOP]

But, if you set Font2 as the main font (or you find some node with <span style="font-family: Font2">, you then get to use either:
Font2 > Font3 [STOP] if you take the falback from the font itself, and you won't be trying Font1.
or
Font2 > Font1 > Font2 > Font1 > Font2 > Font1 > Font2 > ... [no stop, infinite recursion] if you associate the first fallback font in the provided list to the font, and this will break the easy case above which will do:
Font4 > Font1 > Font2 > Font1 > Font2 > Font1 > Font2 > Font1 > Font2

So, my trick to have a _nextFallbackFont when it is used as a fallback font, so, there, for Font2:

Font2._fallbackFont = Font1
Font2._nextFallbackFont = Font3

which will cause the above test case to do:
Font2 > Font1 > Font2 > Font3 [STOP]
as the first time we meet Font2, it's with is_fallback_font=false, and the 2nd time with is_fallback_font=true
(with the caveat we'll be trying Font2 twice - but I found that acceptable :)

@virxkane
Copy link
Contributor

virxkane commented Jul 5, 2020

Yes, I suspected that it’s not so simple.

@virxkane
Copy link
Contributor

virxkane commented Jul 5, 2020

What you think about plain list of the fallback fonts instead of single (couple) fallback font reference? And some wrapper functions to delegate font functions to this list. This would make the code more understandable.

@poire-z
Copy link
Contributor Author

poire-z commented Jul 5, 2020

I don't know, I can't really picture your proposal :/

But in my mind when thinking may be about such a main method to ask for a fallback font, you still need to know (in the recursive calls of font rendering, like with harfbuzz) if you are the first font, or already a next/fallback font - so you would need to keep passing is_fallback_font I guess - and pass it to your wrapper/manager function so it knows what fallback font to give you. My code makes the fonts object themselves know and store which fallback font to use depending on context (first if is_fallback_font=false, other if is_fallback_font=true)

I don't find it that complex: d980059
Even if you had splitted lvfntman.cpp in multiple files, it shouldn't be too hard to put these bits here and there.

Note that in KOReader, we allow the user to set its first prefered fallback font, and we build the list of fallback fonts dynamically from that. As of today, we pass:
Noto Sans CJK SC|Noto Naskh Arabic|Noto Sans Devanagari UI|FreeSans|FreeSerif|Noto Serif|Noto Sans
but I can put FreeSerif first if I prefer its arabic:
FreeSerif|Noto Sans CJK SC|Noto Naskh Arabic|Noto Sans Devanagari UI|FreeSans|Noto Serif|Noto Sans

But I made it compatible so your existing code can still pass a single font name.

@poire-z
Copy link
Contributor Author

poire-z commented Jul 5, 2020

But I made it compatible so your existing code can still pass a single font name.

Well, ok, I renamed SetFallbackFontFace to SetFallbackFontFaces :/ so it needs some frontend update on your side.
edit: well, may be not if you're using PROP_FALLBACK_FONT_FACE :)

@poire-z
Copy link
Contributor Author

poire-z commented Jul 17, 2020

@virxkane : I had a look at your fallback font rework https://github.com/virxkane/coolreader/commit/634438c743ebcc712e58a10dbadedf09a31ded0f
Of course when reading somebodyelse code, I probably had the same reaction you had to mine: I found it quite complex too :) But somehow equivalents in complexity: we pass that additional parameter to all functions.

But I like your idea of using a lUInt32 fallbackPassMask to indicate which font has been already used - that could replace my bool is_fallback_font to skip re-trying that initial font if it happens to be one of the fallback fonts.

I'm not sure I get all what's happening in https://github.com/virxkane/coolreader/blob/634438c743ebcc712e58a10dbadedf09a31ded0f/crengine/src/private/lvfreetypeface.cpp#L321 right, but it feels there are 2 caveats (that I explicitely wanted to avoid):

  • it looks like you're instantiating the full chain of all fallback fonts for the initial font size - even if you're never going to use them (I think - that's quite old to me now - that I do that only when needed)
  • it looks like if the initial font (Font2) is one of the fallback font (Font1 > Font2 > Font3), you might be using the fonts in that order: Font2 > Font3 > Font1 [STOP] - while I expect it should be in that order: Font2 > Font1 > Font3 [STOP]

Also, in your added lString8Collection == operator, and how you use it for checking if you got a different new fallback fonts set, or not, it looks like you don't care about the order of the fallback fonts - generally, and in KOReader case, I think it should matter (as we allow setting a prefered one, so shuffling the order a bit).

Btw, about https://github.com/virxkane/coolreader/commit/f299ac5015651d9b18aab8e2219f2b2fda38a819 , what were the compilation warnings?

@virxkane
Copy link
Contributor

But somehow equivalents in complexity: we pass that additional parameter to all functions.

My version seems more clear to me :)
I could not figure out how your version works in an hour, so I made mine for several days :)

But I like your idea of using a lUInt32 fallbackPassMask to indicate which font has been already used

I thought to replace lUInt32 with LVHashTable<lString8, bool> so as not to limit the number of fonts to 32. :)

  • it looks like you're instantiating the full chain of all fallback fonts for the initial font size - even if you're never going to use them (I think - that's quite old to me now - that I do that only when needed)

Yes, this is a drawback, but it's more reliable, but maybe I'll fix it.

  • it looks like if the initial font (Font2) is one of the fallback font (Font1 > Font2 > Font3), you might be using the fonts in that order: Font2 > Font3 > Font1 [STOP] - while I expect it should be in that order: Font2 > Font1 > Font3 [STOP]

To be honest, I didn't think about it at all. I thought that the main task was to display the glyph with some of the fallback fonts, instead of the question mark. Is it really that important?

Also, in your added lString8Collection == operator, and how you use it for checking if you got a different new fallback fonts set, or not, it looks like you don't care about the order of the fallback fonts - generally, and in KOReader case, I think it should matter (as we allow setting a prefered one, so shuffling the order a bit).

The same thing, but it's easy to fix.

Btw, about virxkane/coolreader@f299ac5 , what were the compilation warnings?

Many times this warning:

In file included from /<cut>/coolreader/crengine/src/../include/lvtinydom.h:36,
                 from /<cut>/coolreader/crengine/src/crtest.cpp:2:
/<cut>/coolreader/crengine/src/../include/lvstsheet.h:141:21: warning: ‘css_pseudo_elements’ defined but not used [-Wunused-variable]
  141 | static const char * css_pseudo_elements[] =
      |                     ^~~~~~~~~~~~~~~~~~~
/<cut>/coolreader/crengine/src/../include/lvstsheet.h:116:21: warning: ‘css_pseudo_classes’ defined but not used [-Wunused-variable]
  116 | static const char * css_pseudo_classes[] =
      |                     ^~~~~~~~~~~~~~~~~~

But it is actually used, but only in one file.

@poire-z
Copy link
Contributor Author

poire-z commented Jul 17, 2020

I thought that the main task was to display the glyph with some of the fallback fonts, instead of the question mark. Is it really that important?

It was for me - but that's for you to judge on the CoolReader side :) depending on whether you allow the user to set the fallback fonts and their order or not - and what fonts you ship - and the risk to find them referenced by publishers in EPUBs via style="font-family: ..." (so, might not be an issue with FB2).

Let's say you ship NotoSansArabic (a good arabic font) and FreeSerif (a good fallback font with many unicode scripts, but smaller glyphs and so-so arabic and let's say so-so cyrillic - dunno if it really has any cyrillic) and you hardcode these as fallback in that order.
A user selects a nice cyrillic font as his main font - read some russian book with some arabic sections like <p lang="ar" style="font-family: NotoSansArabic">some arabic with some dates, numbers but with month names in cyrillic, or people names in cyrillic</p>. He'll have the months and people names picked from the ugly FreeSerif instead of its nice main font.
(OK, couldn't think of a better example, this one wouldn't be really bothering I guess :)

Or let's say you have the same hardcoded fallback fonts, and the user reads Arabic and set NotoSansArabic as his main font. He'll have everything else picked from FreeSerif instead of from the other nicer fallback fonts you set before FreeSerif in your hardcoded list.

So, these might be edge cases you can decide to ignore :)

Many times this warning:

OK, I see. (I did mostly like for the existing enum LVCssSelectorRuleType that follows, except I forgot to move the strings in lvstsheet.cpp. But LVCssSelectorRuleType is used in lvstsheet.h, unlike my added enum LVCssSelectorPseudoElement and enum LVCssSelectorPseudoClass - so ok, they can be in lvstsheet.cpp.)

@virxkane
Copy link
Contributor

It was for me - but that's for you to judge on the CoolReader side :) depending on whether you allow the user to set the fallback fonts and their order or not - and what fonts you ship - and the risk to find them referenced by publishers in EPUBs via style="font-family: ..." (so, might not be an issue with FB2).

«I'll think about it tomorrow» ©

A user selects a nice cyrillic font as his main font - read some russian book with some arabic sections...

Sarcasm mode: good example, funny :) Although, an Arabic textbook in Russian?

In general, I plan to add a selection of fallback fonts to the interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants