Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Right to left support (Arabic, Hebrew, ...) #5359

Closed
hgrain86 opened this issue Sep 11, 2019 · 169 comments · Fixed by #5667
Closed

Right to left support (Arabic, Hebrew, ...) #5359

hgrain86 opened this issue Sep 11, 2019 · 169 comments · Fixed by #5667

Comments

@hgrain86
Copy link

@hgrain86 hgrain86 commented Sep 11, 2019

  • Device: kobo aura one

Issue

Duplicate topic but I wrote for the importance
Will we see support for Arabic soon?
In the keyboard
And in writing the names of files and folders
Finally in the menus and settings
Thank you very much for this wonderful support and this extraordinary effort

@pazos

This comment has been minimized.

Copy link
Contributor

@pazos pazos commented Sep 11, 2019

From the README:

KOReader is developed and supported by volunteers [...] you can create a bounty for the specific bug or feature request you want and motivate others to do the work.

Duplicate of #5048, which was a duplicate of #1426.
Also related to #4709, #2944, #1767

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 11, 2019

In the keyboard

A basic keyboard is actually a great place for just about anyone to contribute. You can copy the English layout (or one closer to your goals if applicable) and start changing characters.

https://github.com/koreader/koreader/tree/4da512ce4e61df40a2c07f79d23ed21c9748f68f/frontend/ui/data/keyboardlayouts

And in writing the names of files and folders

Unless someone's "secretly" working on it, all we've got right now are a few promising libraries (FriBiDi, raqm).

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 11, 2019

Well, I'm currently playing with fribidi/harfbuzz on the crengine side (EPUB rendering). Technical notes in koreader/crengine#307.

I can't promise anything on the UI/filename side, but I'll probably have a look at it.
On the surface, it feels it might not be super complicated by just using the libraries to reorder the chars and correctly shape arabic, so displaying arabic/bi-directionnal filenames should be achievable.

But deeper, there are many more tedious things. A keyboard might be easy (but to be done by some arabic users), but the textbox editing/cursor/insertion/wrapping feels like a nightmare, and it would fragilize the already complicated code we have for left-to-right...
So, we'll see.

Anyway, @hgrain86 (and others reading this, that read arabic or hebrew), just a few questions so I get to understand the importance of each step - I don't read at all arabic, I'm just playing with libraries that are supposed to do things right - and visually comparing with how Firefox displays the same Wikipedia page.

First, just to have an early confirmation, are these screenshots readable/correct/perfect (with various western fonts I have, might not have the best arabic glyphs - and discarding the fact that some titles and list bullets should be right aligned, that will be fixed later):

image
image
image
image
(source)
(edit: replaced that 4th screenshot, it didn't have harfbuzz enabled :) moved the original one below).

A) are these readable, and how would you rate the quality of that on a scale of 0 to 10 :)
B) is the hyphenation of english words (in red) in the middle of a line OK/expected? If not, what's the alternative? No hyphenation at all?
C) what are these little www ww in blue, that I see only on this word and not on any others? (They are on the Wikipedia page too, so probably not a bug.)
D) On the last screenshot (a Quran.epub I found), there are these little dots in green above all words, that I don't see in any Wikipedia page. Are these expected? I see that we would need some huge interline space (more than the 130% max we have already) to get them displayed and not overriden by the next line. Would you need more interline space with arabic?

That nice arabic (I hope :) is achieved thanks to Harfbuzz, that shapes the individual characters into correct cursive display.
When we are not using it, but still reordering the chars right-to-left, we would get that:
image
image
image

E) Even if it's probably not as nice as the first screenshots, are these still readable?

This is where it gets complicated: this is the current UI code, that doesn't re-order chars, so this is probably reversed and unreadable:
image

F) Is that totally unreadable? Or is your brain able to re-order it and decypher it ? :)
G) Would the reordering without shaping (I asked about in E) be enough for these kind of text editing?

And about the UI (menu, filenames):
H) would having them displayed correctly ordered, but still left aligned, in our menu, file lists be ok? Would cursive/shaping be needed?
I) how important is it that menu items/filenames be right aligned, as you would expect - because that feels like another nightmare as each of our widgets would need to have some different options/code to align the subwidgets differently... so, alternatives to test each time we make some widget change... which would be painful for many of us.

J) would you be willing to handle the strings (english to arabic) translation work ? :)

In case some hebrew reader passes by, here's some rendering of a hebrew wikipedia epub page (source). same questions as above :) is it readable/perfect?
image
image
image

@hgrain86

This comment has been minimized.

Copy link
Author

@hgrain86 hgrain86 commented Sep 11, 2019

I will try to read your reply leisurely and answer the questions that I can answer because I am not a developer or experience in programming at all, but I am a regular user of this special program Thank you very much for all this interest, even if I can support you financially for your gift all I have

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 14, 2019

Another question (still just thinking how much complexity this would add):

K) How should the following UI sentence be rendered when translated to arabic:

do you want to delete book title (2017).epub ?

In the following, assume the uppercase are the arabic letters making the translated words from their lowercase counterpart in english:

a) ? BOOK TITLE (2017).epub ETELED OT TNAW UOY OD
b) ? ELTIT KOOB (2017).epub ETELED OT TNAW UOY OD
c) ? (2017) ELTIT KOOB.epub ETELED OT TNAW UOY OD
d) ? epub.(2017) ELTIT KOOB ETELED OT TNAW UOY OD
e) ? bupe.(2017) ELTIT KOOB ETELED OT TNAW UOY OD

K1) when book title is english
K2) when book title is arabic

@Frenzie : related technical question: in Transiflex, or other translation tools, are there tools to isolate/override the directionality, or would the translators have to insert the unicode chars mentionned at http://unicode.org/reports/tr9/#Directional_Formatting_Characters - or would us developpers need to take care of that in our translatable strings (?!).
If you see where I'm going with the sample sentence above (a filename direction might be different from the surrounding text (I guess Windows goes with b)

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 14, 2019

Wouldn't that depend on the filename? It doesn't really strike me as something you can translate in advance.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 14, 2019

I think all the strings/translation/templace substitutions would be done with the logical order of all strings.
So the arabic translated string would (in logical order) be DO YOU WANT TO DELETE %1 ?
T(_()) applied, we would get DO YOU WANT TO DELETE book title (2017).epub ? with some english book title and DO YOU WANT TO DELETE BOOK TITLE (2017).epub ? with some arabic book title - all that in logical order.

Give these to fribidi, and we would get the visual order:
with english book title: ? book title (2017).epub ETELED OT TNAW UOY OD
with arabic book title: ? epub.(2017) ELTIT KOOB ETELED OT TNAW UOY OD

I guess the .epub suffix would still need to be on the right of the filename.
So, either we would add to our code (as we know it's a filename):
T(_("do you want to delete <LEFT‑TO‑RIGHT ISOLATE>%1<POP DIRECTIONAL ISOLATE>?"), filename)
or:
T(_("do you want to delete %1?"), ltr_isolate(filename))

Or the translators would need to add them:
DO YOU WANT TO DELETE <LEFT‑TO‑RIGHT ISOLATE>%1<POP DIRECTIONAL ISOLATE> ?

(or LEFT-TO-RIGHT OVERRIDE, the distinction between isolate and override was clear to me for about a few minutes last week, but I've already forgotten... :)

Really nothing specific/additional for arabic in translation tools?

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 14, 2019

There is bidirectional isolation in https://projectfluent.org/ They list it as an advantage over gettext.

In our case I would think T(_("do you want to delete %1?"), ltr_isolate(filename)) looks like it makes the most sense, but maybe that should just be part of the template function?

Edit: interestingly, just the other day someone posted about a Lua implementation https://discourse.mozilla.org/t/work-started-on-a-lua-implementation/44963

Edit 2: whoa, Pontoon works quite well.

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 18, 2019

@poire-z
i am a little late to the party, but would like to help
speaking for hebrew RTL here:

And about the UI (menu, filenames):
H) would having them displayed correctly ordered, but still left aligned, in our menu, file lists be ok? Would cursive/shaping be needed?

yes, right aligned is better but left aligned is ok.
hebrew has (optional) diacritics so if they are present in the text and not aligned/positioned to the right letter would look a little funny.(this happens in crengine when not using best (harfbuzz) kerning) most filenames probably wont have the diacritics. see pictures below.

I) how important is it that menu items/filenames be right aligned, as you would expect - because that feels like another nightmare as each of our widgets would need to have some different options/code to align the subwidgets differently... so, alternatives to test each time we make some widget change... which would be painful for many of us.

depends, for example TOC would be more expected to be right aligned in a totally hebrew book than the file browser with hebrew and english entries

J) would you be willing to handle the strings (english to arabic) translation work ? :)

i don't mind helping out with integrating hebrew support in strings/code, I just need someone to point me in the right direction as i am not familiar with the codebase/lua

In case some hebrew reader passes by, here's some rendering of a hebrew wikipedia epub page same questions as above :) is it readable/perfect?

the render is readable

for comparison i will show good screenshots courtesy of RTL in crengine
Reader_2019-Sep-18_174104
this one has no diacritics.
Reader_2019-Sep-18_180916
this one with harfbuzz/best has the diacritics( dots) in the right places.
Reader_2019-Sep-18_180930
this one with freetype/good has them going into the next letter/ missing.

do you need me to test any specific features, it is quite easy for me to generate a hebrew epub with specific features, lists bullets footnotes etc.

Another question (still just thinking how much complexity this would add):

K) How should the following UI sentence be rendered when translated to arabic:

do you want to delete book title (2017).epub ?

In the following, assume the uppercase are the arabic letters making the translated words from their lowercase counterpart in english:

a) ? BOOK TITLE (2017).epub ETELED OT TNAW UOY OD
b) ? ELTIT KOOB (2017).epub ETELED OT TNAW UOY OD
c) ? (2017) ELTIT KOOB.epub ETELED OT TNAW UOY OD
d) ? epub.(2017) ELTIT KOOB ETELED OT TNAW UOY OD
e) ? bupe.(2017) ELTIT KOOB ETELED OT TNAW UOY OD

K1) when book title is english
K2) when book title is arabic

K1) a
K2) c or d, regular BIDI (d), unless you make an exception that the file suffix is always on the right (c)

i use an english locale (english speaker), but read hebrew books also
so for me.
K1) do you want to delete BOOK TITLE (2017).epub ie. regular LTR
K2) do you want to delete ELTIT KOOB (2017).epub would be proper BIDI, the hebrew part gets reversed
exactly how you said in #5359 (comment)

please implement this, (you will make a great program much better)
if you have any more questions or need someone to validate screenshots please ping me.
also are there any specific things i can add in the code to help implement this. (where are the UI text layout functions located)

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 19, 2019

Thanks for the nice feedback!

is it readable/perfect?

the render is readable

So, not perfect ? :) anything missing to make it perfect?

when not using best (harfbuzz) kerning

Well, you probably have to use "best" for anything a bit complex to show correctly. "good" is really just a hack that may work on some simple western text. "fast" is pure freetype, and similar to how our UI renders text (but without the RTL support).
Even for simple latin ê, Freetype is fine with the single unicode codepoint for ê, but when using e+^, the ^ is a bit miscentered and too near the top of the e. Only best (harfbuzz) is able to decide that this is bad and correct the offsets, or switch to use the glyph for ê if it exists in the font.
(Or may be I messed up with fast and good - but it looks it wasn't better in previous crengine versions).

That's why we'd need to use Harfbuzz for the UI too... (Hebrew without diacritics might be fine with Freetype, but we might as well want cursive like arabic and scripts with heavy glyph substitution like indic to work).

do you need me to test any specific features, it is quite easy for me to generate a hebrew epub with specific features, lists bullets footnotes etc.

Thanks, but for now, I'm ok with EPUBs made out of HE or AR wikipedia articles.
List item bullets and block stuff may have to wait a bit. For now, you might want to use a specific style tweak to workaround that, like:
body, p, li, h1, h2, h3, h4, h5, h6 { text-align: right !important; }
(Or without p if you prefer them justified:)
body, li, h1, h2, h3, h4, h5, h6 { text-align: right !important; }

i don't mind helping out with integrating hebrew support in strings/code, I just need someone to point me in the right direction as i am not familiar with the codebase/lua

Well, that's on Transiflex, and @Frenzie knows more about that than I do. But best to wait before starting translations until we have at least an idea on how to do it and have started the work :)
Because for now, it looks quite tedious (see #3904 (comment)).

where are the UI text layout functions located

It's quite all contained in a few modules:

font.lua font list and simple selection, and wrapper to Freetype
freetype.lua interface with FreeType, simple wrappers to Freetype object
rendertext.lua the real API to draw/size/truncate text, that we should keep to avoid changing all the other widgets - and glyph cache
textwidget.lua Single line text widget (used for menu items and single line file browser filenames)
textboxwidget.lua Multi lines text widget (used when filenames are put on 2 lines, and by mostly all our widgets, like displaying an InfoMessage).

textboxwidget.lua would be the more complex to adapt to using something more complex than simple Freetype (that maps 1 char => 1 glyph and just stack them on the line) - and it currently needs a lot of memory with long text (like long dict entries or Wikipedia articles shown from the UI).
So, still thinking :)

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 19, 2019

Transifex is just a workflow aid; at the core we simply work with GetText PO/POT files. So you can either use the Transifex online interface or download the relevant translation file locally, work on it in your favorite editor, and upload it. You can do that manually through the web interface or assisted by the tx command line tool.

Nice ways to edit PO files:

  • Poedit
  • Virtaal
  • Lokalize
  • Qt Linguist
  • Gtranslator
  • Text editor (Geany, Kate, etc.)
  • OmegaT

Transifex automatically switches to an RTL interface when appropriate (which can be disabled); I don't know the details with regard to the programs I mentioned.

From here:

Does Transifex support Right-To-Left (RTL) languages?

Yes! When translating to a RTL language such as Arabic or Hebrew, the Editor will automatically switch the input box to RTL for you. You can override this and use LTR by clicking RTL in the translation box.

The primary advantage of working on Transifex is when you've got a lot of strings to localize and you're working on it with multiple people simultaneously. In a regular Git scenario, it'd be easy to accidentally duplicate efforts, leading to doubly wasted time because then there would also be merge conflicts to resolve.

@poire-z For implementation considerations, this post about wxWidgets might be of interest.

Wordpress has an is_rtl() function.

@NiLuJe

This comment has been minimized.

Copy link
Member

@NiLuJe NiLuJe commented Sep 19, 2019

Would something like raqm help for the UI side of things?

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 19, 2019

I had a look at raqm again, and even if I like its simplicity, I don't think we'd be able to use it as is. Had thoughts (because of its simplicity) about how much we would need to patch/extend it (and get involved in upstream development) - but there are quite a few things that we would need to tie to how our frontend does things, that relate to:

  1. line breaking: libraqm does all its stuff on a single line, and have bidi runs all along - while we would need the 2 steps that crengine does: measuring in logical order, then line breaking, then bidi visual ordering and reshape of each line
  2. font fallback: libraqm allows providing on each input char the font to use for that input char - while we would prefer auto reshaping of notdef segments with a provided set of fallback fonts (like I added to crengine - which has only 2, but nothing prevent from having a chain of fallback fonts, like our frontend has).

(One thing I noticed in libraqm and that I missed/forgot in the crengine bidi stuff is that we should split measureText() segment on "unicode script" change, and not only on direction change (so mixed hebrew and arabic in a single text node are split into different segments - while they are currently only one and harfbuzz uses the script of the first kind of text it meets).

So, although there are wishes for these features on libraqm's github, and many closed and not merged PRs that tried at them (and none that went using libunibreak) , it might be easier and quicker to just go at doing something a bit similar to how it is/what it does, but really just targeted at what we need.
And what we need is all in textboxwidget :| which is huge, but contemplating it, it's quite interesting to find a way to have most of that delegated to a C library.

Mainly, the C code would be fed the Lua text string, and we would get back a ShapedText object, with enough methods to get the cumulative width, send current line wished width, get the slice that fit in, set slices as lines, shape a line, get the glyphs (face, glyph index and positions) for that line back, so our Lua side can cache and blit the glyphs from that.
All the datastructures would be malloc'ed from C and kept as long as the ShaptedText object is not gc'ed, so I expect we would gain a lot on the memory usage (and Lua gc) issues we have.

Anyway, could be fun, many of the tough harfbuzz (clusters/glyph/chars walking) and fribidi stuff is already done and could be copied&pasted from crengine - but I would miss the help all the lvArray/lvString/lv*... helpers that crengine provides (*).

Not sure yet how to start on that... Lua library (like cre.cpp), or some plain C lib wrapped by ffi?
I'd like to avoid having to do too much (so no Freetype drawing, no Glyph caching, no blitting, no font management), mainly because I'd feel naked out of crengine LVworld :) All that would still be done nearly as it already is by our frontend code. So, we'd just have a text shaper/layout C lib (just like libraqm is...) helper that would depend on harfbuzz/fribidi/libunibreak.

(*) just wondering, if I would need some array/hash/collection facilities, what's the most easily available? Depending on GLib would suck I think - and I guess all the std:: stuff requires libstdc++.so, that it seems none of our thirdparty is using (!). So, what else?

@NiLuJe

This comment has been minimized.

Copy link
Member

@NiLuJe NiLuJe commented Sep 19, 2019

We do pull glib for sdcv already, so you could theoretically use it.

(I don't recall, we might currently be building it static, but that can easily be corrected).

The std is indeed C++, not C ;).

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 19, 2019

@poire-z

libstdc++.so, that it seems none of our thirdparty is using

djvulibre and k2pdfopt are the most important users; there's some other stuff that uses it too.

https://github.com/koreader/koreader-base/blob/b9d95c73718fde01f02b4b9200fc36ea8ddc8e9c/Makefile.defs#L262-L272

koreader/koreader-base@dd4d31c

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 22, 2019

is it readable/perfect?

the render is readable

So, not perfect ? :) anything missing to make it perfect?

i just don't like that font, nothing on your end.

however after using hebrew for a few days i noticed harfbuzz can be aggressive on the kerning, bring letters too close together, they usually don't touch but are too close for comfortable reading

Is there any way i can tweak how aggressively harfbuzz does kerning, in the setting or code and i can test what is the best value for hebrew kerning?

a screenshot with tight kerning highlighted. (they are the same, one with highlights)

Reader_2019-Sep-22_162128
Reader_2019-Sep-22_162128

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 22, 2019

Is there any way i can tweak how aggressively harfbuzz does kerning, in the setting or code and i can test what is the best value for hebrew kerning?

I don't think there are any tweaks about that, except a on/off toggle by enabling kerning or not. You would need to recompile crengine by setting -kern here:
https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L1038-L1041

Because I think Harfbuzz makes no decision: it just follows the instructions the font creator has put in his font. We can just tell Harfbuzz to follow or not these instructions.

You could try to hack your prefered font with FontForge and tweak/kill the kerning table (I know really nothing about how that work...) Or try another font, see if that aggresive happens with it too.
(Might be some bug on our side thus, dunno).

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 22, 2019

Is there any way i can tweak how aggressively harfbuzz does kerning, in the setting or code and i can test what is the best value for hebrew kerning?

I don't think there are any tweaks about that, except a on/off toggle by enabling kerning or not. You would need to recompile crengine by setting -kern here:
https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L1038-L1041

did not help.

Because I think Harfbuzz makes no decision: it just follows the instructions the font creator has put in his font. We can just tell Harfbuzz to follow or not these instructions.

You could try to hack your prefered font with FontForge and tweak/kill the kerning table (I know really nothing about how that work...) Or try another font, see if that aggresive happens with it too.
(Might be some bug on our side thus, dunno).

i will try another font later, but i don't think it is only the font.

when using the stock kindle reader with the font hack (for details what it is doing ask @NiLuJe ) i get very nice hebrew:
screenshot_2019_09_22T17_08_28-0401

compared to koreader with auto hinting, i get the cramped letters:
Reader_2019-Sep-22_173130

and koreader with native hinting:
Reader_2019-Sep-22_173145

however the stock kindle reader without the font hack totally fails with the diacritics (i don't have a screenshot)

@NiLuJe

This comment has been minimized.

Copy link
Member

@NiLuJe NiLuJe commented Sep 23, 2019

I assume that's on a KF8, not a KFX, right?

Checking with the exact same font would be a helpful comparison, because, yeah, IIRC, the KF8 renderer uses pango/cairo, but I don't completely recall how well it actually honors kerning/ligatures.

(i.e., I don't read the language, but it almost looks unkerned...).

EDIT: Oh. Also, try with no hinting, instead of auto/native ;).

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 23, 2019

I assume that's on a KF8, not a KFX, right?

yes

Checking with the exact same font would be a helpful comparison, because, yeah, IIRC, the KF8 renderer uses pango/cairo, but I don't completely recall how well it actually honors kerning/ligatures.

(i.e., I don't read the language, but it almost looks unkerned...).

i am using the same font for both (SBL_hebrew), i agree the kindle one looks unkerned, however turning off kerning with harfbuzz does not solve the problem in koreader

EDIT: Oh. Also, try with no hinting, instead of auto/native ;).

no hinting does the same for the spacing

harfbuzz has this issue (both regular and light), freetype and no kerning have good spacing but the diacritics are off

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 23, 2019

@poire-z is there a way to tweek letter_spacing, where is it controlled?

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 23, 2019

i will try another font later, but i don't think it is only the font.

So, what's the result with other fonts?

is there a way to tweek letter_spacing, where is it controlled?

Via CSS. Just create a style tweak containing body { letter-spacing: 1px; }

You would need to recompile crengine by setting -kern here

You can try to disable other opentype features, see some examples in the next section used for good/harfbuzz light at https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L1043-L1074
and if it changes something, try to pointpoint the exact feature that need to be disabled.

Also, does it happen with the same letters when there are no diacritics around?
Can you share a HTML snippet with just a few words where this happen?
(With such a snippet, you can enable HB debugging here: https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L27-L29 and see some x/y that may show something, or not...)

Also, can you check how that snippet would do with other renderers that use harfbuzz? I think Chrome/Chromium use harfbuzz.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Sep 23, 2019

@hgrain86 (and others reading this, that read arabic): could you comment about the arabic screenshots above? Are they ok, or do you notice some letter spacing issues too?

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Sep 23, 2019

i will try another font later, but i don't think it is only the font.

So, what's the result with other fonts?

similar, some fonts more than others, some fonts naturally have more spacing so it is less of an issue.

is there a way to tweek letter_spacing, where is it controlled?

Via CSS. Just create a style tweak containing body { letter-spacing: 1px; }

thanks, it helps alleviate the symptoms but on the other hand some letters get too spaced out, so not a real solution

You would need to recompile crengine by setting -kern here

You can try to disable other opentype features, see some examples in the next section used for good/harfbuzz light at https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L1043-L1074
and if it changes something, try to pointpoint the exact feature that need to be disabled.

i tried that, but it happens with harfbuzz light also.
it is not an issue with the features rather with

harfbuzz in general.

Also, does it happen with the same letters when there are no diacritics around?

yes

Can you share a HTML snippet with just a few words where this happen?

attached

<html>
<body>
<p>ני לששל יש אמ וז</p>
</body>
</html>

(With such a snippet, you can enable HB debugging here: https://github.com/koreader/crengine/blob/fe6efab20a759df26e2823d55be4b56ec3ad879a/crengine/src/lvfntman.cpp#L27-L29 and see some x/y that may show something, or not...)

i did that and did not see anything helpful, also that only works with harfbuzz so does not help me for comparing to freetype.

Also, can you check how that snippet would do with other renderers that use harfbuzz? I think Chrome/Chromium use harfbuzz.

firefox:
1

libreofffice:
Untitled 1

the block sytle fonts (noto) dont kern as much but are not very nice to read, they do not capture the full letter shapes.

the end result so far: harfbuzz has this issue (both regular and light), freetype and no kerning have better spacing but the diacritics are off

@Frenzie

This comment has been minimized.

Copy link
Member

@Frenzie Frenzie commented Sep 24, 2019

thanks, it helps alleviate the symptoms but on the other hand some letters get too spaced out, so not a real solution

Not sure if crengine supports it properly, but on higher DPI displays you'll need a value like .5px or less for one physical pixel.

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 18, 2019

#5359 (comment) was without, #5359 (comment) was with.

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 18, 2019

I was pasteing them in from a character map to test it. definitely not for public consumption.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 18, 2019

this is probably following the standard and might not want to change but at the same time if the UI is LTR it might be more intuitive that the folder / file structure should also be.

Yes, it follows the standard, because we haven't yet tell it not to :) We can and should, where we know that some bit is a file path, give it hints. We're not yet done with bidi :)

First, we need to update many place to wrap filenames, file paths, directory paths, urls... with changes like:

     self.menu_items.open_previous_document = {
         text_func = function()
             local previous_file = self:getPreviousFile()
             if not G_reader_settings:isTrue("open_last_menu_show_filename") or not previous_file then
                 return _("Open previous document")
             end
             local path, file_name = util.splitFilePathName(previous_file) -- luacheck: no unused
-            return T(_("Previous: %1"), file_name)
+            return T(_("Previous: %1"), BD.filename(file_name))
         end,
         enabled_func = function()
             return self:getPreviousFile() ~= nil
         end,
         callback = function()
             self.ui:switchDocument(self:getPreviousFile())
         end,
         hold_callback = function()
             local previous_file = self:getPreviousFile()
             UIManager:show(ConfirmBox:new{
-                text = T(_("Would you like to open the previous document: %1?"), previous_file),
+                text = T(_("Would you like to open the previous document: %1?"), BD.filepath(previous_file)),
                 ok_text = _("OK"),
                 ok_callback = function()
                     self.ui:switchDocument(previous_file)
                 end,
             })

(I have started updating our code with that, but it's quite a lot, and I'm not testing each of them, and there's risk of crash if typo or wrong parens, so that's for after next release :)

Then, we need to implement all these BD.filepath & alike wrappers. In the yesterday commits, I somehow forgot LTR might show hebrew and would need wrapping too :| and made most of them no-op:

-- Optimise some wrappers by aliasing them to the right wrappers
if Bidi._rtl_ui_text then
Bidi.default = Bidi.rtl
Bidi.wrap = Bidi.rtl
Bidi.filename = Bidi._filename_rtl
Bidi.filepath = Bidi.ltr -- see if we need to split and _filename_rtl() the filename part
Bidi.directory = Bidi.ltr
Bidi.dirpath = Bidi.ltr
Bidi.path = Bidi.ltr
Bidi.url = Bidi.ltr
else
Bidi.default = Bidi.ltr
Bidi.wrap = Bidi.nowrap
Bidi.filename = Bidi.nowrap
Bidi.filepath = Bidi.nowrap
Bidi.directory = Bidi.nowrap
Bidi.dirpath = Bidi.nowrap
Bidi.path = Bidi.nowrap
Bidi.url = Bidi.nowrap
end

So, we need to implement these wrappers, and may be even use them with LTR.
I thought Bidi.ltr() would be enough for paths and urls, in RTL UI. But may be they aren't, and we need the processing suggested by @dov.
Or may be just prefix (or suffix, that one is confusing) each / with LRM (Left-to-Right Mark)?
Or it also needs the / wrapped in LRI / PDI.

But we need to be more careful than I thought when using these wrappers. They are to be used very late, just before giving them to TextBoxWidget - if you use them too early, some other code may be confused, e.g. when looking for 3-letters suffixes, a PDI at the end would make the suffix 6 bytes.
Also, something I'll have to test again, is when truncating left or right a long filename or directory: I saw the ... elipsis being added on the wrong side.
(Little question for @dov, if you know the answer before I really investigate that more: how fribidi would do when there are nested ISOLATES, with PDI at the end, and the string is truncated on its logical order right, and the PDI are discarded/lost: would fribidi implicitely pop any open isolate at the end if it doesn't see any explicite one? Or would it ignore the open because there's no closing? Or is that undefined behaviour :| ?)

and when typing into the koreader searchbar i get:

Well, InputText is even more tedious, so don't test things with it... I think we won't be able to do any BD.wrapping in it. Even if we know it should accept a filepath, we could wrap it, but when typing, you'll meet these bidi marks, you could delete them and mess with the display... and when getting back what the user typed, we'd need to remove them and make sure the input is sane before using it... Quite more tricky :|

@dov

This comment has been minimized.

Copy link

@dov dov commented Dec 18, 2019

(Little question for @dov, if you know the answer before I really investigate that more: how fribidi would do when there are nested ISOLATES, with PDI at the end, and the string is truncated on its logical order right, and the PDI are discarded/lost: would fribidi implicitely pop any open isolate at the end if it doesn't see any explicite one? Or would it ignore the open because there's no closing? Or is that undefined behaviour :| ?)

Yes, the Unicode algo defines that all states are reset at the end of he paragraphs. I.e. all LRO, RLO, RLE, LRE, LRI, RLI, and FSI are implicitly closed in reverse order as if you had explicit PDF's, and PDI's.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 18, 2019

Thanks.
OK, some quick test with LRM:

-        Bidi.filepath = Bidi.nowrap
-        Bidi.directory = Bidi.nowrap
-        Bidi.dirpath = Bidi.nowrap
-        Bidi.path = Bidi.nowrap
-        Bidi.url = Bidi.nowrap
+        Bidi.filepath = Bidi._path
+        Bidi.directory = Bidi._path
+        Bidi.dirpath = Bidi._path
+        Bidi.path = Bidi._path
+        Bidi.url = Bidi._path
[...]
+local LRM = "\xE2\x80\x8E"     -- U+200E LRM / LEFT-TO-RIGHT MARK
+local RLM = "\xE2\x80\x8F"     -- U+200F RLM / RIGHT-TO-LEFT MARK
+function Bidi._path(path)
+    return path:gsub("/", LRM.."/")
+end

LTR UI (good I guess?)
image

RTL UI:
image
Quite messy... Thought it was line wrapping that may make us shape lines while losing some bidi marks, or Harfbuzz not getting them, but same issue with a smaller font:
image

It gets a bit better if I wrap the whole path in LTR isolate:

function Bidi._path(path)
    return Bidi.ltr(path:gsub("/", LRM.."/"))
end

image

I wondered why Harfbuzz was mirroring that question mark ? - but it is not, it's a ؟ https://graphemica.com/%D8%9F arabic question mark , part of the translated string :)

Or may be RTL users would even prefer (I don't think so, but I'm not them):
image

function Bidi._path_rtl(path)
    local dirpath, filename = util.splitFilePathName(path)
    return Bidi.ltr(dirpath:gsub("/", LRM.."/") .. Bidi._filename_rtl(filename))
end

OK, that's with LRM. But I guess wrapping with FSI..PDI is better at keeping things ordered if a subdirectory contains hebrew mixed with english...
@dov: any general opinion/advice on prefering/deciding between wrapping with isolate, or just inserting LRM/RLM marks (that don't need wrapping with open/close marks, so may be less risky when truncating/line wrapping - or is that just me overthinking it)?

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 18, 2019

@WaseemAlkurdi

This comment has been minimized.

Copy link
Contributor

@WaseemAlkurdi WaseemAlkurdi commented Dec 18, 2019

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 18, 2019

i was not expecting inputtext to change anything, i typed it in with the control characters

Oh, allright. Have you added these bidi control chars to your hebrew keyboard?

if you want any help replaceing all the filename instances in code etc, or a tester for your changes I am more than happy to help.

Thanks. I just finished a first pass around all our code, wrapping files and directories paths and names.
I'll make a "WIP" PR (I'll be soon off for a week), so you can test and play with it, and try to decide and improve how to bidi-wrap stuff.

Appending "(London 2017)", to your directory with that hebrew word, with FSI/auto, I would get:
image
but I had to cheat a bit for this case by appending a space to the last "/" to allow a wrap.

If I hadn't, we'd have a wrap at a unlucky position (because there's no wrap opportunity in the bottom part), and we would get this, that I find totally confusing (is it?):
image

So, may be we need to add zero width space before or after the /, to allow a break on them.

With a LRM, we'd get this (which feels less bogus, but I guess bogus display could happen too if some hebrew was in/around the parens):
image

You'll also have to decide how to do that for the filename (LTR: פרשת_וישב (London 2017).epub or auto: (London 2017) פרשת_וישב.epub.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 18, 2019

@yparitcher : my changes are in #5696. Feel free to play with it and update/fix it.

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 18, 2019

@WaseemAlkurdi
Gnome teminal got RTL/Bidi support in 3.33.3 in july 2019 https://mail.gnome.org/archives/gnome-announce-list/2019-July/msg00003.html

Oh, allright. Have you added these bidi control chars to your hebrew keyboard?

no, I was pasteing them into the emulator.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 19, 2019

@yparitcher #5696 updated - keep it up to date if you're already playing with it (I'll probably add some other fixes later today.)
Updated with @WaseemAlkurdi feedback on what's good in RTL UI.

There's still some strange things:
image
On this one, the string is not translated, so it's wrapped in LTR by GetText.
And we build that not the correct way, by concatenating translatable strings:
text = _("Are you sure that you want to delete this file?\n") .. BD.filepath(item.file) .. ("\n") .. _("If you delete a file, it is permanently lost."),
so, it's LTR .. LTR .. LTR in a TextBoxWidget with a para direction of RTL.
(And my GetText tweak lose the trailing \n...)

It should be a single thing wrapped in LTR
text = T(_("Are you sure that you want to delete this file?\n%1\nIf you delete a file, it is permanently lost."), BD.filepath(item.file)),
image

edit: or not :) may be it would be all fine if I didn't lose the trailing \n

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 19, 2019

may be it would be all fine if I didn't lose the trailing \n

Yes, it is. Discard my advice ^. Adding a fix to #5696.

Another confusing one in RTL UI, with FSI/auto for the filename name part:
image

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 19, 2019

@dov: sorry to bother you, but may be you can enlighten me about what/why these I1 & I2 Unicode rules are there: http://www.unicode.org/reports/tr9/tr9-39.html#Resolving_Implicit_Levels. (No problem with Fribidi: it does exactly as the specs and the reference tool - and Harfbuzz mirrors as it should the one quotation mark that ends up being in a RTL level.)
I think they are the cause of what follows:

With this pure LTR input Joe : « L'écriture est » without any wrapping, in a RTL base direction context/paragraph, we'd get the quotation marks shuffled...
https://unicode.org/cldr/utility/bidic.jsp?s=Joe+%3A+%C2%AB+L%27%C3%A9criture+est+%C2%BB&b=1&u=110&d=2
image

It looks like it's also the reason for my ellipsis when text is truncated to be on the wrong side:
https://unicode.org/cldr/utility/bidic.jsp?s=Joe+%3A+%C2%AB+L%27%C3%A9criture+est%E2%80%A6&b=1&u=110&d=2
image

Why do you think that is, and doesn't it look to you wrong?
And how do you think this should be worked around? Wraping all text where this may happen (mostly everywhere :) in FSI...PDI ? Adding some Bidi mark to the ellipsis, or at end of input text?...
Thank you for your Bidi wisdom :)

@dov

This comment has been minimized.

Copy link

@dov dov commented Dec 20, 2019

@poire-z The reason round, square, and parenthesis are shown in the "correct" order is due to a new bracket property and rule since Unicode v6.3 . But the « » brackets don' t have the open/close bracket properties. My guess it is because they are sometimes (some languages?) used in the swapped order, i.e. » opens the pair and « closes.

I think the most reasonable behaviour is always to put paths in FSI/PDI pairs so that they are rendered independantly of the surrounding direction. Or possibly to put each sub-path in its own FSI/PDI pair so that that the sub-path order follow the surrounding direction, but each sub-path is rendered in isolation. (As I suggested in my previous comment.) Any of these options will also take care of the ellipsis.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 20, 2019

Thank you.
Indeed, no issue when using parens: https://unicode.org/cldr/utility/bidic.jsp?s=Joe+%3A+%28+L%27%C3%A9criture+est+%29&b=1&u=110&d=2

they are sometimes (some languages?) used in the swapped order, i.e. » opens the pair and « closes.

Yes, e.g. German. koreader/crengine#307 (comment)

I guess it's out of scope for Fribidi (no hook currently for that?) - but chars categorization could be tweaked according to the language (but I guess Fribidi shouldn't care about a language). libunibreak has something like that for a few languages in https://github.com/adah1972/libunibreak/blob/master/src/linebreakdef.c.

I guess the fact that ending punctuation (period, ellipsis) is put at the end of the base direction has some logic: to properly end the base direction text - even if it may look wrong when there is only other-direction text, at least for the other-direction reader I am :)

Note to self: when displaying book metadata (title, authors...), provide the language from the book metadata to XText.

I think the most reasonable behaviour is always to put paths in FSI/PDI pairs

Yes, we'll be doing that for paths (done in some of the screenshots above).

But my examples above would apply to book titles, and other small bits of text that are not paths.
One solution that fixes both cases is just to prepend a FSI to the string (and just don't bother with the ending PDI, as it might be truncated out anyway). I'll have to think how/when to do that, and if it should be done by default, or just when using external text like metadata.

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 24, 2019

another thought:

when reading a Hebrew book in LTR UI the table of contents is LTR.
Reader_2019-Dec-24_182419

it would look better if it matches the book language/direction ie: RTL
Reader_2019-Dec-24_182454

however then it is not clear which way the arrows on bottom should go.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Dec 29, 2019

I dunno how edge-case your case is :) But we should be able to handle it (may be after all the Bidi/RTL normal UI work has settled).

A quick try at mirroring just the items, not the bottom arrows (a bit ugly, as I had considered _mirroredUI private :):

--- a/frontend/apps/reader/modules/readertoc.lua
+++ b/frontend/apps/reader/modules/readertoc.lua
@@ -90,4 +90,12 @@ function ReaderToc:fillToc()
     end
     self.toc = self.ui.document:getToc()
+
+    self.mirror_items_layout = nil -- use nil to keep using UI default
+    -- Tweak that condition (global or per book setting, detected book language direction
+    -- different from UI one...):
+    if true then
+        self.mirror_items_layout = true
+        -- set to false if in RTL UI with english LTR TOC
+    end
 end

@@ -285,7 +293,11 @@ function ReaderToc:onShowToc()

     -- update collapsible state
+    local mirrored_items = BD.mirroredUILayout()
+    if self.mirror_items_layout ~= nil then
+        mirrored_items = self.mirror_items_layout
+    end
     self.expand_button = Button:new{
         icon = "resources/icons/appbar.control.expand.png",
-        icon_rotation_angle = BD.mirroredUILayout() and 180 or 0,
+        icon_rotation_angle = mirrored_items and 180 or 0,
         width = Screen:scaleBySize(30),
         bordersize = 0,
@@ -323,4 +335,5 @@ function ReaderToc:onShowToc()
         item_table = self.collapsed_toc,
         state_size = button_size,
+        mirror_items_layout = self.mirror_items_layout,
         ui = self.ui,
         is_borderless = true,
@@ -357,5 +370,5 @@ function ReaderToc:onShowToc()
         local do_toggle_state = false
         if item.state and pos and pos.x then
-            if BD.mirroredUILayout() then
+            if mirrored_items then
                 do_toggle_state = pos.x > 0.7
             else
--- a/frontend/ui/widget/menu.lua
+++ b/frontend/ui/widget/menu.lua
@@ -137,4 +137,5 @@ local MenuItem = InputContainer:new{
     text = nil,
     bidi_wrap_func = nil,
+    mirror_layout = nil,
     show_parent = nil,
     detail = nil,
@@ -218,5 +219,7 @@ function MenuItem:init()
     local state_container = LeftContainer:new{
         dimen = Geom:new{w = self.content_width/2, h = self.dimen.h},
+        _mirroredUI = self.mirror_layout,
         HorizontalGroup:new{
+            _mirroredUI = self.mirror_layout,
             TextWidget:new{
                 text = state_indent,
@@ -272,4 +275,5 @@ function MenuItem:init()
             bold = self.bold,
             fgcolor = self.dim and Blitbuffer.COLOR_DARK_GRAY or nil,
+            para_direction_rtl = self.mirror_layout,
         }
         local w = item_name:getWidth()
@@ -373,5 +377,7 @@ function MenuItem:init()
     local text_container = LeftContainer:new{
         dimen = Geom:new{w = self.content_width, h = self.dimen.h},
+        _mirroredUI = self.mirror_layout,
         HorizontalGroup:new{
+            _mirroredUI = self.mirror_layout,
             HorizontalSpan:new{
                 width = self.state_size.w,
@@ -383,4 +389,5 @@ function MenuItem:init()
     local mandatory_container = RightContainer:new{
         dimen = Geom:new{w = self.content_width, h = self.dimen.h},
+        _mirroredUI = self.mirror_layout,
         mandatory_widget,
     }
@@ -397,6 +404,8 @@ function MenuItem:init()
         HorizontalGroup:new{
             align = "center",
+            _mirroredUI = self.mirror_layout,
             OverlapGroup:new{
                 dimen = Geom:new{w = self.content_width, h = self.dimen.h},
+                _mirroredUI = self.mirror_layout,
                 state_container,
                 text_container,
@@ -407,4 +416,5 @@ function MenuItem:init()
     local hgroup = HorizontalGroup:new{
         align = "center",
+        _mirroredUI = self.mirror_layout,
         HorizontalSpan:new{ width = Size.padding.fullscreen },
     }
@@ -422,4 +432,5 @@ function MenuItem:init()
     self[1] = FrameContainer:new{
         bordersize = 0,
+        _mirroredUI = self.mirror_layout,
         padding = 0,
         hgroup,
@@ -1011,4 +1022,5 @@ function Menu:updateItems(select_number)
                 text = Menu.getMenuText(self.item_table[i]),
                 bidi_wrap_func = self.item_table[i].bidi_wrap_func,
+                mirror_layout = self.mirror_items_layout,
                 mandatory = self.item_table[i].mandatory,
                 bold = self.item_table.current == i or self.item_table[i].bold == true,

About the condition for inverting the default mirroring from the UI (first chunk above):
Dunno if we should trust the book language, that may be absent.
Or have it toggle'able from a menu item (may be only shown when there is a book language with an other direction from the UI)?
(It would be the occasion to add a menu TOC settings>, where we could put the hidden "alternative TOC from headings" we get when holding on the TOC menu item.)

Or have that just follow the inverse_reading_order setting? I dunno if these should be orthogonal, or if a person deciding to inverse page turn gestures would want the TOC display and navigation inverted too. I naively think yes.
@WaseemAlkurdi : with the Arabic UI, reading an english book: would you "invert page turn taps and swipes"? And if yes, how would you want TOC display and navigation? Still RTL or as with the LTR UI?

And what about Bookmarks ? :)

@yparitcher

This comment has been minimized.

Copy link
Contributor

@yparitcher yparitcher commented Dec 29, 2019

I dunno how edge-case your case is :) But we should be able to handle it (may be after all the Bidi/RTL normal UI work has settled).

great

A quick try at mirroring just the items, not the bottom arrows (a bit ugly, as I had considered _mirroredUI private :):

works for me

About the condition for inverting the default mirroring from the UI (first chunk above):
Dunno if we should trust the book language, that may be absent.
Or have it toggle'able from a menu item (may be only shown when there is a book language with an other direction from the UI)?
(It would be the occasion to add a menu TOC settings>, where we could put the hidden "alternative TOC from headings" we get when holding on the TOC menu item.)

Or have that just follow the inverse_reading_order setting? I dunno if these should be orthogonal, or if a person deciding to inverse page turn gestures would want the TOC display and navigation inverted too. I naively think yes.

this is the hard question :)
i would say have a ladder with inverse_reading_order first and then the book language if set.

@WaseemAlkurdi : with the Arabic UI, reading an english book: would you "invert page turn taps and swipes"? And if yes, how would you want TOC display and navigation? Still RTL or as with the LTR UI?

And what about Bookmarks ? :)

i dont really use bookmarks so not sure, but after a quick test it uses the Page and date/time of the UI so it looks fine consistent with the UI and i dont see a need to flip it. (using the arabic UI to simulate RTL)
Reader_2019-Dec-29_113716
Reader_2019-Dec-29_114004

however this test shows a unrelated quirk/bug in bookmarks Bidi. the chapter name ( הלכות שבת - פרק שמיני) sould be isolated from the string so it apears itn the right order
Reader_2019-Dec-29_120109

this can be done by

--- a/frontend/apps/reader/modules/readerbookmark.lua
+++ b/frontend/apps/reader/modules/readerbookmark.lua
@@ -210,7 +210,7 @@ function ReaderBookmark:onShowBookmark()
             page = self.ui.document:getPageFromXPointer(page)
         end
         if v.text == nil or v.text == "" then
-            v.text = T(_("Page %1 %2 @ %3"), page, v.notes, v.datetime)
+            v.text = T(_("Page %1 %2 @ %3"), BD.default(page), BD.auto(v.notes), BD.default(v.datetime))
         end
     end
 
@@ -438,9 +438,9 @@ function ReaderBookmark:updateBookmark(item)
             self.bookmarks[i].pos0 = item.updated_highlight.pos0
             self.bookmarks[i].pos1 = item.updated_highlight.pos1
             self.bookmarks[i].notes = item.updated_highlight.text
-            self.bookmarks[i].text = T(_("Page %1 %2 @ %3"), page,
-                                        new_text,
-                                        item.updated_highlight.datetime)
+            self.bookmarks[i].text = T(_("Page %1 %2 @ %3"), BD.default(page),
+                                        BD.auto(new_text),
+                                        BD.default(item.updated_highlight.datetime))
             self.bookmarks[i].datetime = item.updated_highlight.datetime
             self:onSaveSettings()
         end
@@ -459,7 +459,7 @@ function ReaderBookmark:renameBookmark(item, from_highlight)
                     if not self.ui.document.info.has_pages then
                         page = self.ui.document:getPageFromXPointer(page)
                     end
-                    item.text = T(_("Page %1 %2 @ %3"), page, item.notes, item.datetime)
+                    item.text = T(_("Page %1 %2 @ %3"), BD.default(page), BD.auto(item.notes), BD.default(item.datetime))
                 end
                 break
             end

however i do not know how this would affect editing bookmarks etc.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 3, 2020

however i do not know how this would affect editing bookmarks etc.

Yeah, this is a bit tedious. I don't know how these added FSI/PDI to item.notes could affect other tools that use our settings (some discussions/samples at #3415 (comment) and #1095 (comment) ).
The sad things is that we've been saving item.text with the final concatenated bits - if would have been better to just save the individual bits, and build the visual strings only for display - but that's a harder thing to fix if trying to stay backward compatible (keeping past highlights working).

(The BD.default() in your patch might be not needed.)

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 3, 2020

It looks like it's also the reason for my ellipsis when text is truncated to be on the wrong side:

A correlated reason for that is that, in xtext.cpp, when substituting the text char with an ellipsis, I keep the original chat bidi type and level. So, if the ellipsis is replacing a letter it would stay near the previous letter - but if it is replacing a space, it may be send away to start or end of string by the Unicode bidi rules L1 (or I1, I2, L1 to L4...).
And when adding an ellipsis to a TextBoxWidget, we always add it on a breakable character (so, a space), so the issue is always there - unlike for TextWidget where we drop the libunibreak rules and can add the ellipsis at any place.

For this text string, starting in logical order with that hebrew word, we'd either get: when replacing a letter:
image
but when replacing a space:
image
Same in RTL UI:
image
image

I'll have this handled in xtext.cpp (getting the right bidi type of the ellipsis, and assigning to it the bidi level of the previous char), so we get:
image
image

(We could also, when truncating file names, try to truncate text before the extension, and then add the extension so we always see it - we don't see its a .epub in the screenshots above - but well... looks not obvious, and we were not even doing that for non bidi filenames, so... dropping that idea:)

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 3, 2020

Ok, made that xtext PR koreader/koreader-base#1025.
Looking again at the screenshots, I think that decision might be questionable:
The previous behaviour, giving oscillating results, is wrong - but an alternative would be to put the ellipsis always on the right in LTR, and always on the left in RTL.

But I think the way I have it done with that PR improves readability, and might solves some ambiguity, like this one:
image
Does it end or does it start with that hebrew word?

@WaseemAlkurdi

This comment has been minimized.

Copy link
Contributor

@WaseemAlkurdi WaseemAlkurdi commented Jan 5, 2020

@poire-z Can you do one with Arabic? ;-)

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 5, 2020

Previously, before koreader/koreader-base#1025, in LTR UI:
image
in RTL UI:
image

With the final stuff commited, in LTR UI:
image
in RTL UI:
image

@WaseemAlkurdi

This comment has been minimized.

Copy link
Contributor

@WaseemAlkurdi WaseemAlkurdi commented Jan 5, 2020

With the final stuff commited, in LTR UI:
image

Looks great!

But ...

in RTL UI:
image

Wouldn't it be better if the RTL UI filenames are mirrored around, with English on the right, the dots in the middle, and the Arabic/Hebrew on the left, as an RTL UI is expected to be read from right to left?

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 5, 2020

^ These 2 filenames start with the hebrew word / the 2 arabic words.
Ok, my capitalized This made that confusing: the filename in logical order is:
ARABIC WORDS This is a very long file name, quite long indeed, very very long indeed, very very super long.epub
So, in RTL UI, you start from the arabic words, as you normally do, read the arabic, see some latin, fast forward to start of latin, read until the ellipsis: that's where it ends truncated.

In LTR UI, that's where it's more naturally ambiguous.
You could start from the left, meet the ellipsis, but there's something you haven't read after: this is wrong.
So, start from the right: same as with RTL UI.

When the start is ambiguous, I think the ellipsis, pointing to where it ends (notice it is stuck to the english word, not to the hebrew or arabic word), give some help: follow the flow back from the ellipsis, and you'll reach the start :)

(Or I am overthinking it ? :)

@WaseemAlkurdi

This comment has been minimized.

Copy link
Contributor

@WaseemAlkurdi WaseemAlkurdi commented Jan 5, 2020

``
So, in RTL UI, you start from the arabic words, as you normally do, read the arabic, see some latin, fast forward to start of latin, read until the ellipsis: that's where it ends truncated.

Exactly! Are you sure you don't speak Arabic? ;-P

So with this in mind:

ARABIC WORDS This is a very long file name, quite long indeed, very very long indeed, very very super long.epub

that would make the RTL UI screenshot correct, and the LTR UI screenshot broken.

In LTR UI, that's where it's more naturally ambiguous.
You could start from the left, meet the ellipsis, but there's something you haven't read after: this is wrong.
So, start from the right: same as with RTL UI.

Let's take a real example where one might start with Arabic and end with Latin: suppose that there's a file called:
"Biography of Thomas Edison.epub".
but that file is in Arabic, and for some reason, the name of Thomas Edison is kept in Latin characters (some people prefer to keep Western names in Latin characters, while the majority of people "Arabicize" them by using Arabic characters to represent Latin sounds).
That would give us:
سيرة Thomas Edison.epub
Starting the wrong way (from Thomas Edison, then ending in "biography" is wrong in Arabic and grammatically wrong in English, where you should have a possessive 's).
So perhaps we should align the filename to make it start from the right (right after the size in LTR, and from the rightmost end in RTL) wherever the filename starts with an Arabic character?

When the start is ambiguous, I think the ellipsis, pointing to where it ends (notice it is stuck to the english word, not to the hebrew or arabic word), give some help: follow the flow back from the ellipsis, and you'll reach the start :)

It's an elaborate solution, and I like it myself ... but are KOReader users going to intuitively understand this?
Let me fire up my Arabic Windows instance and see how Windows does it.
EDIT: Damn, it's missing. I'll recreate it.

@poire-z

This comment has been minimized.

Copy link
Contributor

@poire-z poire-z commented Jan 5, 2020

and the LTR UI screenshot broken

I think we decided somewhere (can't find it above) with @yparitcher that as a rule, in the LTR UI (but also in the RTL UI), a filename will be laid out according to its first strong character: BD.auto(name part) + extension.
So, a hebrew filename name part will be isolated will be laid our from right to left, any traling latin being on the left of the hebrew start, but the extension still always on the right.
In the RTL UI, the difference is the extension follows the direction decided from the first char.

So perhaps we should align the filename to make it start from the right (right after the size in LTR, and from the rightmost end in RTL) wherever the filename starts with an Arabic character?

So, that's what we do - except for changing the alignment (for easthetics, I think we should keep all files aligned on the same good side of the UI).
I guess it might be confusing too :| Dunno if @yparitcher would wand the extension on the left, as a sign of where to not start reading :)

KOReader users going to intuitively understand this?

Most koreader users won't have any bidi filenames :) That's mainly for you and @yparitcher , and the few like you :)
(Also see #5359 (comment) :)

Anyway, you can experiment on solutions by playing with:

-- Optimise some wrappers by aliasing them to the right wrappers
if Bidi._rtl_ui_text then
Bidi.default = Bidi.rtl
Bidi.wrap = Bidi.rtl
Bidi.filename = Bidi._filename_rtl
Bidi.filepath = Bidi._filepath_rtl -- filename auto, but with extension on the right
Bidi.directory = Bidi._path -- will keep any trailing / on the right
Bidi.dirpath = Bidi._path
Bidi.path = Bidi._path
Bidi.url = Bidi._path
else
Bidi.default = Bidi.ltr
Bidi.wrap = Bidi.nowrap
Bidi.filename = Bidi._filename_ltr
Bidi.filepath = Bidi._filepath_ltr
Bidi.directory = Bidi._path -- will keep any trailing / on the right
Bidi.dirpath = Bidi._path
Bidi.path = Bidi._path
Bidi.url = Bidi._path
end

The implementation of each wrapper is at the bottom of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.