Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary definition with mixed LTR and RTL text does not render correctly #9320

Open
jayjf opened this issue Jul 11, 2022 · 9 comments
Open
Labels

Comments

@jayjf
Copy link

jayjf commented Jul 11, 2022

  • KOReader version: Latest
  • Device: Android and Kobo

Dictionary definition with mixed LTR and RTL text does not render correctly

  1. Install Jastrow Dictionary
  2. Look up definition of כותבת

Incorrect rendering:
incorrect

Correct rendering in the Alpus app for Android:
correct

@Frenzie Frenzie added the bug label Jul 11, 2022
@poire-z
Copy link
Contributor

poire-z commented Jul 11, 2022

What is happening is that the definition starts with a RTL word, and there's nothing anywhere saying the full content is mainly LTR and that the paragraph should be laid out LTR.

What got me a bit worried on the first screenshot is that, even though it all looks like it is rendered as a RTL paragraph, the last line is left aligned, while it should be right aligned.

image

But it's rendered by MuPDF as it is a HTML dictionary.
If I change in the .ifo: sametypesequence=h (html) to sametypesequence=t (not html), it gets rendered by our own TextBoxWidget and the RTL paragraph is fully correct:

image

So, in the text rendering, indeed, the first word being RTL, and nothing anywhere saying it is a LTR definition, it makes the whole paragraph assumed to be RTL.
We don't have yet an option to tag dictionaries as having their content mainly LTR or RTL, and there is nothing in the startdict .ifo stuff for that I think.

That's the HTML output we get from the dict:
<strong dir=\"rtl\">¿¿¿¿</strong> f. (<a dir=\"rtl\" class=\"refLink\" href=\"/Jastrow,_¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿ I 1\">¿¿¿</a>) <i>morning star</i> (in b. h. <i>a jewel</i>, v. <a class=\"refLink\" href=\"/Jastrow,_¿¿¿¿¿.1\" data-ref=\"Jastrow, ¿¿¿¿¿ 1\">next w.</a>). Y. Yoma III, beg. 40¿; Y. R. Hash. II, beg. 57¿, expl. <a dir=\"rtl\" class=\"refLink\" href=\"/Jastrow,_¿¿¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿¿¿ I 1\">¿¿¿¿¿</a>; v. <a class=\"refLink\" href=\"/Jastrow,_¿¿¿ I.1\" data-ref=\"Jastrow, ¿¿¿ I 1\"><span dir=\"rtl\">¿¿¿</span> I</a>.

So, with this, MuPDF (like our text rendering) decides the whole paragrpah is to be RTL (even if it does bad on the last line).
You can help it consider the content is LTR by creating a microscript that will adjust the html output, by creating alongside the Jastrow.ifo a Jastrow.lua containing:

return function(html)
    -- return "<div dir='ltr'>" .. html .. "</div>" -- somehow, this does not work
    return "<span dir='ltr'>" .. html .. "</span>" -- but this does
end

and you will get:
image

(If you may get multiple paragraphs from the dict, you may need to tweak this microscript to split by line/para, and wrap each of them in a <span dir='ltr'>.)

I'd say it's not a KOReader bug, but the HTML dict could have help itself by wrapping each result in such a <span dir='ltr'>.

@jayjf
Copy link
Author

jayjf commented Jul 12, 2022

Thanks!
But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...

@jayjf
Copy link
Author

jayjf commented Jul 12, 2022

Where are these microscripts documented? Can one be used to change the font of the dictionary popup?

@mergen3107
Copy link
Contributor

@jayjf
You can create a .css file with the same name next to the .ifo file in the dictionary folder. Then, I am not sure, maybe apply font via css to body?

@Frenzie
Copy link
Member

Frenzie commented Jul 12, 2022

@poire-z

I'd say it's not a KOReader bug, but the HTML dict could have help itself by wrapping each result in such a <span dir='ltr'>.

It is (well, a MuPDF bug). The default document direction is LTR so the wrapping you refer to is there by default for the entire document unless otherwise specified with dir=auto or dir=rtl. Even with dir=auto I don't think this would be the correct behavior due to the <strong> wrapping, but that's more subtle.

@jayjf

Where are these microscripts documented?

https://github.com/koreader/koreader/wiki/Dictionary-support#html-encoding-within-stardict-dictionaries-supported

But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...

Then it uses something other than MuPDF. ;-) (The most obvious guess would be Android WebView.)

@poire-z
Copy link
Contributor

poire-z commented Jul 12, 2022

The default document direction is LTR

Oh, right.
Well, after a quick look at (our version of) MuPDF, I couldn't really figure out how its bidi handling works...
It doesn't seem to parse body {direction: ltr}, it doesn't work as expected if in htmlboxwidget.lua we wrap with <body dir='ltr'> instead of just <body>... Neither wrapping the inner content with <div dir='ltr'>. Only thing working is <span dir='ltr'>, but we can't make the whole content inline.

But it still leaves me wondering, how does the Alpus app get it right? It's pointing to the same dictionary...

May be because it handles the default direction correctly. Or it doesn't handle RTL at all. I think even with main Hebrew>He or Arabic>Ar dictionaries, that may not wrap their output in a <div dir='rtl'>, it would have to guess/force some behaviour, that may not always be the expected one.

Can one be used to change the font of the dictionary popup?

Only per dictionary, and for HTML dictionaries where we can indeed fix the html with a .lua script, or tweak the styles with a .css file.

You can create a .css file

See here how you could force your preferred font with the CSS: #9229 (comment)

I guess you could then use:

* { font-family: myLTRFont }
*[dir=rtl] { font-family: myHebrewFont }

(looks like MuPDF does support [attr=value])

@Frenzie
Copy link
Member

Frenzie commented Jul 12, 2022

(our version of) MuPDF

The current version (MuPDF 1.20) behaves the same ftr. :-)

@jayjf
Copy link
Author

jayjf commented Jul 12, 2022

Ok, I got the definition in my preferred font, but how about the word itself that I'm looking up, at the top of the popup. That font doesn't seem to change.

@poire-z
Copy link
Contributor

poire-z commented Jul 12, 2022

That is part of the UI, unlike the definition which is rendered by MuPDF and that you could easily tweak.
So, this word will stick with the font and fallbacks set of the UI - so Hebrew probably comes from our FreeSans font.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants