[BUG] The compose support works only for sequences starting with Multi_key or dead_ keys, it ignores lines starting with other keys #379

Closed
mike-fabian opened this issue Sep 7, 2022 · 11 comments

Comments

@mike-fabian

Discovered while researching https://bugzilla.redhat.com/show_bug.cgi?id=2122899

On the AB05 key (that is where the b is on an English US layout), the standard Arabic xkb layout produces U+FEFB:

$ grep -i fefb /usr/share/X11/xkb/symbols/ara 
    key <AB05> {  [           UFEFB,                UFEF5,                  NoSymbol,            NoSymbol ]};  // ‎ﻻ‎ ‎ﻵ‎
    key <AB05> {  [           UFEFB,                UFEF5,                     U06AB,               U06AD ]};  // ‎ﻻ‎ ‎ﻵ‎     ‎ګ‎ ‎ڭ‎

That is not desired; the desired result is U+0644 U+0627.

But with xkb layouts, it is not possible to output more than one keysym per keystroke.

https://www.freedesktop.org/wiki/Software/XKeyboardConfig/XKB2Dreams/ talks about

3. Support for scenarios "multiple keypresses - one keysym" and "single keypress - multiple keysyms".

If xkb were improved like this, one could output U0644 U0627 when pressing the AB05 key.

But that is a “dream” and might never happen.

So I suggested that the user use ibus-m17n or ibus-typing-booster with ar-kbd.mim instead. ar-kbd.mim emulates the Arabic keyboard on top of a US English keyboard layout using m17n-lib and can output any string for any keypress. It outputs the desired U+0644 U+0627 when typing b.

But the user had also noticed that it just happens to work in Qt apps. I could not figure out why; it appeared very mysterious to me.

Today, after more discussion with the user avidseeker, we finally stumbled on the reason.
It is because of Compose support!

$ grep 'ARABIC LIGATURE' /usr/share/X11/locale/en_US.UTF-8/Compose
<UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7>	:   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9>	:   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5>	:   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

So even though the xkb keyboard for Arabic produces U+FEFB, the Compose support then replaces this by U+0644 U+0627.
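
The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how libX11 implements it: the regular expression, the `parse_compose_line` helper and the `table` dictionary are names made up for this sketch, and it ignores escapes, includes and modifiers that real Compose files may contain.

```python
import re

# Simplified pattern for lines such as:
#   <UFEFB> : "..." # ARABIC LIGATURE LAM WITH ALEF
# Assumption: no escaped quotes inside the result string.
COMPOSE_LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"([^"]*)"')


def parse_compose_line(line):
    """Return (tuple of keysym names, result string) or None."""
    match = COMPOSE_LINE.match(line)
    if match is None:
        return None
    keysyms = tuple(re.findall(r'<([^>]+)>', match.group(1)))
    return keysyms, match.group(2)


# The four lines from the grep output, with the results written as escapes.
lines = [
    '<UFEFB>\t:   "\u0644\u0627" # ARABIC LIGATURE LAM WITH ALEF',
    '<UFEF7>\t:   "\u0644\u0623" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE',
    '<UFEF9>\t:   "\u0644\u0625" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW',
    '<UFEF5>\t:   "\u0644\u0622" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE',
]

table = {}
for line in lines:
    parsed = parse_compose_line(line)
    if parsed is not None:
        table[parsed[0]] = parsed[1]

# A sequence of length one: the single keysym UFEFB maps to two code points.
result = table[('UFEFB',)]
print([f'U+{ord(ch):04X}' for ch in result])
```

The point of the sketch is that these sequences have length one and start with an ordinary keysym, neither Multi_key nor a dead key.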

That can be tested by starting xterm like this:

env XMODIFIERS=@im=none xterm &

This makes sure that the Compose support of X11 is used and not the Compose support of ibus.

Then in the xterm, type

 echo -n b | iconv -f utf8 -t utf16le | od -x
0000000 0062
0000002

and we see that the b produces U+0062, which is correct.

Switch to the Arabic keyboard,

setxkbmap  ara

type “arrow up” to get the echo -n b | iconv -f utf8 -t utf16le | od -x line back, go back to the b with “arrow left”, type b and now one gets:

echo -n لا | iconv -f utf8 -t utf16le | od -x 
0000000 0644 0627
0000004

I.e. even though the keyboard surely outputs only U+FEFB, the Compose support of Xorg transforms this into U+0644 U+0627.
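
For readers without xterm at hand, the same check as the `iconv -f utf8 -t utf16le | od -x` pipeline can be approximated in Python. This is a small sketch; `utf16_units` is a helper name invented here.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as 4-digit hex strings."""
    data = text.encode('utf-16-le')
    return [
        f'{int.from_bytes(data[i:i + 2], "little"):04x}'
        for i in range(0, len(data), 2)
    ]


print(utf16_units('b'))             # US layout: one code unit, 0062
print(utf16_units('\u0644\u0627'))  # after Compose: 0644 0627
print(utf16_units('\ufefb'))        # without Compose: the single unit fefb
```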

@mike-fabian mike-fabian added the bug label Sep 7, 2022
@mike-fabian mike-fabian self-assigned this Sep 7, 2022
@mike-fabian mike-fabian added this to To do in Mike’s github project board via automation Sep 7, 2022
@mike-fabian

Peek.2022-09-07.17-08.mp4

@mike-fabian

However, when the same test is done in an xterm started like

env XMODIFIERS=@im=ibus xterm &

the test fails and one gets only U+FEFB:

Peek.2022-09-07.17-11.mp4

@mike-fabian

I.e. even though ibus has compose support, the compose support in ibus apparently does not handle these lines:

$ grep 'ARABIC LIGATURE' /usr/share/X11/locale/en_US.UTF-8/Compose
<UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7>	:   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9>	:   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5>	:   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

@mike-fabian

The Compose support in Gtk3 and Gtk4 does not support this either.

If I have a ~/.XCompose file containing:

$ cat ~/.XCompose
# %H  expands to the user's home directory (the $HOME environment variable)
# %L  expands to the name of the locale specific Compose file (i.e.,
#     "/usr/share/X11/locale/<localename>/Compose")
# %S  expands to the name of the system directory for Compose files (i.e.,
#     "/usr/share/X11/locale")
           
#include "%L"
include "/%L"

and then test the Gtk3 Compose support in

env GTK_IM_MODULE=gtk-im-context-simple gedit

and the Gtk4 Compose support in

env GTK_IM_MODULE=gtk-im-context-simple gnome-text-editor

and type the b key while the Arabic keyboard layout is active, I get U+FEFB.

So it looks like the Compose support in Gtk3/Gtk4 does not support this either.

@mike-fabian

Another compose implementation is in ibus-typing-booster and this does not support this either (that’s why I opened this issue here).

In case of ibus-typing-booster, I can obviously see why:

https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/hunspell_table.py#L5533

            return False
        if (not self._typed_compose_sequence
            and not key.name == 'Multi_key'
            and not key.name.startswith('dead_')):
            if DEBUG_LEVEL > 1:
                LOGGER.debug('Not in a compose sequence.')
            return False

i.e. ibus-typing-booster currently rejects every key that is not Multi_key or a dead_ key as the start of a compose sequence, so single-keysym sequences such as <UFEFB> are never looked up.
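
One way to lift that restriction is to ask the compose table itself whether the typed keys form a prefix of any known sequence, instead of hard-coding Multi_key and dead_. The following is only a sketch of that idea and not the code that went into ibus-typing-booster: `COMPOSE_TABLE`, `old_check` and `starts_compose_sequence` are hypothetical names, and the table holds just three example entries.

```python
# Hypothetical compose table: keys are tuples of keysym names.
COMPOSE_TABLE = {
    ('Multi_key', 'a', 'e'): '\u00e6',
    ('dead_acute', 'e'): '\u00e9',
    ('UFEFB',): '\u0644\u0627',
}


def old_check(typed, key_name):
    """Mirror of the quoted condition: only Multi_key or dead_* may start."""
    return bool(typed) or key_name == 'Multi_key' or key_name.startswith('dead_')


def starts_compose_sequence(typed, key_name, table=COMPOSE_TABLE):
    """True if the keys typed so far plus key_name are a prefix of a sequence."""
    candidate = tuple(typed) + (key_name,)
    return any(seq[:len(candidate)] == candidate for seq in table)


print(old_check([], 'UFEFB'))                      # rejected by the old rule
print(starts_compose_sequence([], 'UFEFB'))        # accepted by the prefix rule
print(starts_compose_sequence([], 'b'))            # ordinary key, no sequence
print(starts_compose_sequence(['Multi_key'], 'a')) # still works for Multi_key
```

A linear scan over all sequences is fine for a sketch; a real implementation would more likely use a trie or a nested dictionary so that each keystroke is a single lookup.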

@mike-fabian

I will try to fix this in typing booster, but that will only make the Arabic xkb keyboard work correctly in typing-booster of course.

To make it work correctly elsewhere, the Compose support in ibus and in Gtk3/Gtk4 needs to be fixed as well.

@mike-fabian mike-fabian moved this from To do to In progress in Mike’s github project board Sep 8, 2022
@mike-fabian

This commit in libX11 from 2008-06-20 added the Arabic compose sequences:

https://gitlab.freedesktop.org/xorg/lib/libx11/-/commit/21e464ec682ab23ba20ddf6bd72c6db214cfbe01

commit 21e464ec682ab23ba20ddf6bd72c6db214cfbe01
Author: Khaled Hosny <khaledhosny@eglug.org>
Date:   Thu Jun 19 18:26:11 2008 -0400

    NLS: Add Arabic Lam-Alef ligature compose sequences (bug #16426)
    
    Add some Arabic digraphs to utf-8 locales with a Compose.pre
    
    Signed-off-by: James Cloos <cloos@jhcloos.com>

@mike-fabian

Here is the original bug to add these Arabic compose sequences:

https://bugs.freedesktop.org/show_bug.cgi?id=16426

Khaled Hosny 2008-06-19 05:50:06 UTC

Created [attachment 17228](https://bugs.freedesktop.org/attachment.cgi?id=17228) [[details]](https://bugs.freedesktop.org/attachment.cgi?id=17228&action=edit)
Arabic Compose rules

Arabic keyboard needs the ability to have one key stroke producing two code points, see #8195.
Adding the attached rules to Compose files of UTF-8 locales is needed in order to fix this.

@mike-fabian

Original discussion about the problem:
https://bugs.freedesktop.org/show_bug.cgi?id=8195
migrated to gitlab:
https://gitlab.freedesktop.org/xorg/xserver/-/issues/346

@mike-fabian

mike-fabian commented Sep 9, 2022

The test builds of ibus-typing-booster >= 2.18.17 at
https://copr.fedorainfracloud.org/coprs/mfabian/ibus-typing-booster/builds/
have a fix for this problem and now also support compose sequences that do not start with Multi_key or dead keys.

This video shows that it works for the special Arabic compose sequence:

  • In the video, ibus-typing-booster is set up to use only “NoIME”, i.e. only the current keyboard layout and no other input methods.
  • I first select 'en' in the Gnome panel to select the “English (US)” keyboard layout and type the b key: A “b” appears in gedit.
  • Then I select the “Arabic” keyboard layout in the Gnome panel and again type the b key into gedit: The character ﻻ U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM appears. One can confirm that it is a single character by deleting it again with Backspace: a single Backspace is enough to delete it.
  • Then I select “Typing Booster” in the Gnome panel and again type the b key into gedit. Now the Arabic keyboard layout is still used, but the Compose support comes from ibus-typing-booster and not from ibus or Gtk. In gedit the string "لا" U+0644 ARABIC LETTER LAM U+0627 ARABIC LETTER ALEF appears. This looks very similar to what we got before with the Arabic keyboard layout used without ibus-typing-booster, but we can confirm that there are actually two characters by deleting again with Backspace. After the first Backspace, ل U+0644 ARABIC LETTER LAM remains, and a second Backspace is needed to delete the complete string.
Peek.2022-09-09.10-08.mp4
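
Because the two results look almost identical on screen, it can help to inspect the text programmatically rather than counting Backspaces. The following is a minimal sketch using Python's standard `unicodedata` module; `describe` is a helper name invented here, and removing the last code point only models the Backspace behaviour seen in the video.

```python
import unicodedata


def describe(text):
    """List each code point of text with its Unicode name."""
    return [f'U+{ord(ch):04X} {unicodedata.name(ch)}' for ch in text]


ligature = '\ufefb'        # what the xkb layout produces on its own
sequence = '\u0644\u0627'  # what the Compose rule produces

print(len(ligature), describe(ligature))
print(len(sequence), describe(sequence))

# Removing the last code point of the two-character string leaves LAM.
print(describe(sequence[:-1]))

# The ligature's compatibility decomposition is the same two letters,
# which is why the two renderings look alike.
print(unicodedata.normalize('NFKC', ligature) == sequence)
```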

Mike’s github project board automation moved this from In progress to Done Sep 9, 2022