Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keyboard for Arabic and languages with Arabic script, initial release #5569

Merged
merged 45 commits into from
Jan 18, 2020
Merged

Keyboard for Arabic and languages with Arabic script, initial release #5569

merged 45 commits into from
Jan 18, 2020

Conversation

WaseemAlkurdi
Copy link
Contributor

@WaseemAlkurdi WaseemAlkurdi commented Nov 6, 2019

This is an Arabic character keyboard layout. It is based in part on the PC keyboard layout, and in part on the Google keyboard.
I have had to add some keys to the first and second rows.
I haven't tested this, because attempting to compile the build from source results in an error:

make[1]: *** No rule to make target 'luafilesystem/src/lfs.c', needed by 'build/x86_64-linux-gnu-debug/libs/libkoreader-lfs.so'.  Stop.

Diacritic marks haven't been added, but they could be added in a couple of minutes, should this be tested and proven to work.


This change is Reviewable

@WaseemAlkurdi WaseemAlkurdi changed the title Added Arabic keyboard, initial release Keyboard for Arabic and languages with Arabic script, initial release Nov 6, 2019
@Frenzie Frenzie added this to the 2019.11 milestone Nov 6, 2019
@Frenzie Frenzie added the UX label Nov 6, 2019
@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

On which system is that? You can use the extracted AppImage (for just testing Lua stuff) or the Docker image to have a tailored build environment, see https://github.com/koreader/koreader/blob/master/doc/Building.md#setting-up-a-build-environment-for-koreader

Also on systems other than Android it's generally simple to just change a few Lua files.

@WaseemAlkurdi
Copy link
Contributor Author

On which system is that? You can use the extracted AppImage (for just testing Lua stuff) or the Docker image to have a tailored build environment, see https://github.com/koreader/koreader/blob/master/doc/Building.md#setting-up-a-build-environment-for-koreader

Also on systems other than Android it's generally simple to just change a few Lua files.

I'm building on desktop Linux.
So is it okay to just copy the source tree over to the /mnt/us/koreader directory on my Kindle?
And do I have to edit options to enable the Arabic keyboard?

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

You still need to add it to this table:

lang_to_keyboard_layout = {
el = "el_keyboard",
en = "en_keyboard",
es = "es_keyboard",
fr = "fr_keyboard",
he = "he_keyboard",
ja = "ja_keyboard",
pt_BR = "pt_keyboard",
ko_KR = "ko_KR_keyboard",
},

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

So is it okay to just copy the source tree over to the /mnt/us/koreader directory on my Kindle?

Sort of. As long as you have the libraries in place that should be fine, but it's normally faster and easier to just overwrite the couple of Lua files you edited.

@WaseemAlkurdi
Copy link
Contributor Author

@Frenzie I did that ... KOReader now crashes. Why is that?
The Lua scripts are error-free, except for warnings about unused variables.

@@ -60,7 +60,7 @@ local alef = ar_popup.alef
local baa = ar_popup.baa
local jeem = ar_popup.jeem
local daal = ar_popup.daal
local h-aa = ar_popup.h-aa
local haa2 = ar_popup.haa2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, underscore (_) is fine in a variable name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Frenzie I know, but I removed them in order to isolate an issue. I'll return them in the next round of cleanups. But for now, the keyboard as is would crash KOReader for some reason.

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

I did that ... KOReader now crashes. Why is that?

I have no idea, because all you've got here is two completely independent files that literally can't affect anything at all. ^_^ The crash.log will tell you more if you want to find out, or you can just overwrite the whole thing with the latest nightly from http://build.koreader.rocks/download/nightly/v2019.10-33-gf1f75c5_2019-11-06/ so you've got a working copy.

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

Anyway, I can push a few fix-up commits to your branch if you like.

@WaseemAlkurdi
Copy link
Contributor Author

Anyway, I can push a few fix-up commits to your branch if you like.

It would be great if you do!

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

@WaseemAlkurdi See the two individual commits for the (minor) changes.

Screenshot_2019-11-06_14-28-20

@@ -377,6 +377,7 @@ function VirtualKeyPopup:init()
virtual_key.onHoldReleaseKey = function()
virtual_key:onTapSelect(true)
UIManager:close(self)
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I didn't mean to commit this line but that's okay. This solves the worst part of #5555.

@poire-z
Copy link
Contributor

poire-z commented Nov 6, 2019

KOReader now crashes. Why is that? The Lua scripts are error-free,

See your crash.log, you should have a stack trace that will tell you (or us) why it crashes, and what must have happened for it to crash even if your lua files are error-free (truncated transfer, other file touched...)

About AppImage: https://github.com/koreader/koreader/wiki/Installation-on-desktop-linux
(@Frenzie : Looks like this #4304 got lost in the Wiki reorganisation.)

Also, we'll have to think about these HE/AR keyboards: is it fine that you start with querty, and you have to it the little Globa icon key to have the HE/AR keys? Or should we start with these, and get qwerty with that Globe icon key instead. See #5414 (comment) and around.

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

@poire-z Not lost, that's at the top of https://github.com/koreader/koreader/blob/master/doc/Building.md#setting-up-a-build-environment-for-koreader

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

@WaseemAlkurdi Now you'll need to git pull my commits into your branch before you continue working.

-- fourth row
{
{ "Sym", "Sym", "ABC", "ABC", "Sym", "رمز", "رمز", "رمز", "Sym", "Sym", "ABC", "ABC",
width = 1.5},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could try reducing this width to = 1 to not have these keys overflow the keyboard :)

@NiLuJe
Copy link
Member

NiLuJe commented Nov 6, 2019

The original build error vaguely looks like you haven't pulled the submodules? One kodev step should take care of that, see the docs ;). Otherwise, I'm going to need more context.

Also, I've posted the one-liner I use to scp updated files to my devices a couple of times around. I'm unfortunately not on my desktop right now, but it's basically based on the list that git diff --name-only outputs ;).

@Frenzie
Copy link
Member

Frenzie commented Nov 6, 2019

A ./kodev run should do that automatically though.

},
-- third row
{ -- 1 2 3 4 5 6 7 8 9 10 11 12
{ label = "بَدِّل",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you need to leave it as "Shift" for it to work.

{ ",", ",", "’", "↓", "ֽ ", waw, "Ю", "ю", "Ç", "ç", "…", "¨", },
{ ".", ".", "’", "↓", "ֽ ", zay, "Ю", "ю", "Ç", "ç", "…", "¨", },
{ "/", "/", "’", "↓", "ֽ ", thaa, "Ю", "ю", "Ç", "ç", "…", "¨", },
{ label = "مَسح",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: must stay Backspace (that's not what displayed, the icon= takes precedence, but I guess the label (albeit displayed if no icon) serves as recognizing it should do shift or backspace.

},
-- fourth row
{
{ "Sym", "Sym", "ABC", "ABC", "Sym", "رمز", "رمز", "رمز", "Sym", "Sym", "ABC", "ABC",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these رمز", "رمز", "رمز" (raaah, so hard to cut and paste and wrap, bidi hell!) need to be added to:
symbolmode_keys = {["Sym"] = true, ["ABC"] = true, ["رمز"] = true}, on line 91, for them to work.

(That's going to be my new swear word : that bidi-mary is bidi disgusting on this bidi sunday :)

Comment on lines 64 to 65
{ diacritic_tanween_damm, qaf, "#", "3", },
{ "ﻹ", fah, "+", _eq, },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might keep them all aligned the same way with the 1 2 3 4, for readability :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z Can I do that from the web interface, or do I have to push again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z Turns out that I can, but it would be considered another commit. Not a problem!

{ "ﻹ", fah, "+", _eq, },
{ "إ", ghayn, "€", "(", },
{ "`", ayin, "‰", ")", },
-- h_aa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can indent that comment (including the --) up to where the {"ه", label="ه‍"} (I assume it says what's that thing is :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z Fixing that!

"ز",
north = "ژ",
},
ha = { -- الحاء
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could translate that arabic word to english -- الحاء (english translation) if that means something other than "ha" :) so we don't have to wonder if we're missing something, and to improve the culture of non arabic readers :)
(I wonder the same above for h_aa, but it's shorter, so it would just be h_aa in arabic :)

Copy link
Contributor Author

@WaseemAlkurdi WaseemAlkurdi Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z I didn't know how to "transcribe" the two letters, so I used 'h_aa' for هـ and 'ha' for ح. Honestly, this could be improved, and I myself mix up the two "transciptions".
Please do listen to the first 5 seconds of this YouTube video: https://www.youtube.com/watch?v=YeyVjaWD6-U. The first letter that the announcer pronounces is 'هـ' (my 'h_aa') and the second letter is 'ح' (my ha).
Edit: Had them in reverse.

Copy link
Contributor Author

@WaseemAlkurdi WaseemAlkurdi Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word 'الحاء' is simply the name of the letter, roughly meaning "the letter ha". like how "double u" is sort of a name of the letter W in English.

Comment on lines 146 to 147
-- الحركات
diacritics = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that arabic word means "diacritics" :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z Precisely.

@WaseemAlkurdi
Copy link
Contributor Author

WaseemAlkurdi commented Jan 18, 2020

@WaseemAlkurdi Can these just be removed?

    frontend/ui/data/keyboardlayouts/ar_AA_keyboard.lua:11:7: unused variable 'h_aa'
    frontend/ui/data/keyboardlayouts/ar_AA_keyboard.lua:38:7: unused variable 'taamarbouta'

There's a slight problem here. I had to hardcode these two letters (instead of defining them as variables) because I had to use label=, as you can see in ar_AA_keyboard.lua).
Do you suggest removing these variables, or is there a way in which I could use label= with ar_popup.lua?

Never mind. I tried using label= directly in ar_AA_popup.lua directly, and it worked. I mean, it should, since what we were doing is defining a Lua table inline, so why should it not work if importing that table as a variable from another file?

Fixed the two missing variables by moving the `label=` part from the main keyboard to this file.
- Removed the label.
- Fixed the hardcoded labels (and the two missing variables).
Translated label.
Fixed the missing variables ... for good?
@poire-z
Copy link
Contributor

poire-z commented Jan 18, 2020

Regarding the trick with the zero-width-joiner/non-joiner stuff mentionned in #5569 (comment) :
The sad thing is that we don't see it in the code :)
I had the same issue some weeks ago with zero width space, and I used \x utf8 bytes instead. You could do that, with a comment explaining why.
Or just add a comment mentionning there is an invisible ZWNJ or ZWJ.
Or, I think arabic has another range in unicode for "arabic presentational forms", which I think are the pre-shaped glyphs, so may be the one you want might be there:
https://en.wikipedia.org/wiki/Arabic_Presentation_Forms-A
https://en.wikipedia.org/wiki/Arabic_Presentation_Forms-B

@WaseemAlkurdi
Copy link
Contributor Author

@poire-z Sheesh, I've mistyped it while moving. It was supposed to have a ZWJ, but the ZWJ isn't directly accessible on the keyboard, unlike a non-zero width joiner (an easy Shift-J away). Will fix this instant.

Replaced the non-zero width joiner with a zero width joiner.
@WaseemAlkurdi
Copy link
Contributor Author

The sad thing is that we don't see it in the code :)

GitHub's web UI, at least here on Firefox on Linux, uses a red dot to indicate it instead of the shaped character.

@WaseemAlkurdi
Copy link
Contributor Author

WaseemAlkurdi commented Jan 18, 2020

Or, I think arabic has another range in unicode for "arabic presentational forms", which I think are the pre-shaped glyphs, so may be the one you want might be there.

No, these (from what I see here: https://en.wikipedia.org/wiki/Arabic_Presentation_Forms-A) are the Arabic characters used by non-Arabic languages with Arabic scripts, which are not "legal Arabic". Something like how some French and German characters, while using Latin script, are not "legal Latin".
But the B block (https://en.wikipedia.org/wiki/Arabic_Presentation_Forms-B) is in fact legal Arabic, and is probably the block you're referring to. These are the shaped ligatures ... I could see where you're going with this! Amazing.

Replaced the two letters that have a non-zero width joiner with their preformed ligature forms as per @poire-z's suggestion. Really neat.
@WaseemAlkurdi
Copy link
Contributor Author

Replaced as per your suggestion. It's a much neater solution, having the whole thing as one character, instead of bothering with invisible characters.
Looks like this is done then! The build tests check out.

A small note though. The Wikipedia article (https://en.wikipedia.org/wiki/Arabic_Presentation_Forms-B) says:

The presentation forms are present only for compatibility with older standards, and are not currently needed for coding text.

Does this mean that these might be deprecated? If so, wouldn't this be a "hack" that might stop working with unpredictable consequences?

@poire-z
Copy link
Contributor

poire-z commented Jan 18, 2020

And in B ? :) Isn't it this one:

Uni­codecode point char­acter UTF-8en­co­ding(hex) Uni­co­de char­ac­ter name Uni­co­de 1.0 char­act­er name (de­pre­ca­ted)
U+FEE9 ef bb a9 ARABIC LET­TER HEH ISOLATED FORM GLYPH FOR ISOLATE ARABIC HA
U+FEEA ef bb aa ARABIC LET­TER HEH FINAL FORM GLYPH FOR FINAL ARABIC HA
U+FEEB ef bb ab ARABIC LET­TER HEH INITIAL FORM GLYPH FOR INITIAL ARABIC HA
U+FEEC ef bb ac ARABIC LET­TER HEH MEDIAL FORM GLYPH FOR MEDIAL ARABIC HA

from https://unicode-search.net/unicode-namesearch.pl?term=HEH

@poire-z
Copy link
Contributor

poire-z commented Jan 18, 2020

Oh, allright, you found it :)
I don't think Unicode can deprecate or remove codepoints :) by contract, I guess, they can just add stuff.

Is it ok that it does not look the same in GH UI ?

@poire-z
Copy link
Contributor

poire-z commented Jan 18, 2020

Looks like this is done then! The build tests check out.

Stil the alignment with 1 2 3 4 to fix with GH UI :)

- Fixed keyboard row spacing ... should be more organized now.
- Added a couple of labels to explain some Arabic strings and transliteration choices.
@WaseemAlkurdi
Copy link
Contributor Author

Looks like this is done then! The build tests check out.

Stil the alignment with 1 2 3 4 to fix with GH UI :)

Done! Should look really tidy now.

Fixed the missing space
@WaseemAlkurdi
Copy link
Contributor Author

I think you accidentally lost a space here.

What's a space between friends? :-D

Copy link
Contributor

@poire-z poire-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% conform to my knowledge of Arabic ! (which would give 0, but hey, approved anyway :)

Comment on lines 147 to 148
-- Diacritics (al-Harakat الحركات)
diacritics = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nitpick: have the comment indented as what follows it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z Done.

Indented a comment.
@NiLuJe
Copy link
Member

NiLuJe commented Jan 18, 2020

I'd possibly have gone with IPA for those latest (interesting!) comments, but that's just me ;).

@Frenzie
Copy link
Member

Frenzie commented Jan 18, 2020

No, I definitely would've gone with IPA. But I didn't know IPA prior to the university, so I wouldn't really expect anybody else to unless they had a degree in something language-related.

@Frenzie Frenzie merged commit c8b3942 into koreader:master Jan 18, 2020
@WaseemAlkurdi WaseemAlkurdi deleted the arabic_keyboard branch January 19, 2020 08:45
mwoz123 pushed a commit to mwoz123/koreader that referenced this pull request Mar 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants