Tones for IMEs such as Jyutping? #6

Closed
Pi-Cla opened this issue Jul 5, 2022 · 15 comments

@Pi-Cla

Pi-Cla commented Jul 5, 2022

Hi, is there any way to make it so systems such as Jyutping can also use tones? That is something I miss from using Jyutping on my phone.

@mike-fabian
Owner

Currently, jyutping.txt contains only lines like this:

https://github.com/mike-fabian/ibus-table-chinese/blob/main/tables/cantonese/jyutping.txt#L14658

jyut    粵      000183

I.e. without the tones.

Would it be sufficient if I just add all the tones, i.e. change all lines to look like this:

jyut6    粵      000183

If numbers like 6 are to be used as input, they cannot be used to select candidates by number anymore.
But one could use F1-F9 instead. Would that be OK?

@Pi-Cla
Author

Pi-Cla commented Jul 10, 2022

I would be ok with doing it like that, though I wonder if an approach such as the one used by rime-cantonese could also be used here. They represent the tones by ending a word with v, x, q, vv, xx, or qq.

@mike-fabian
Owner

> I would be ok with doing it like that, though I wonder if an approach such as the one done by rime-cantonese could also be used here. Where they represent the tones by ending a word with: v, x, q, vv, xx, or qq.

How does it work on your phone?

@mike-fabian
Owner

> I would be ok with doing it like that, though I wonder if an approach such as the one done by rime-cantonese could also be used here. Where they represent the tones by ending a word with: v, x, q, vv, xx, or qq.

That looks like a very nice approach because it doesn’t conflict with selecting candidates by number.
Do you prefer that approach over using numbers?
If you think this is a good way of doing it, I will try to change the jyutping.txt table in that way.

@Pi-Cla
Author

Pi-Cla commented Jul 12, 2022

The way it works on my phone is that I can type one of those letter combinations after a word and then it displays the tone as a number after the jyutping.

I would be happy with the letter combinations just being appended to

@Pi-Cla Pi-Cla closed this as completed Jul 12, 2022
@Pi-Cla Pi-Cla reopened this Jul 12, 2022
@Pi-Cla
Author

Pi-Cla commented Jul 12, 2022

The way it works on my phone is that I type in the tone letter combination after a jyutping word and then it displays the tone as the corresponding number rather than the letter(s).

If that is too complicated to implement, I would be happy with just appending the tone letter combination to the end of the jyutping in the .txt file.

Either way, there should probably be documentation in the jyutping part of this repo that describes this tone input as well.
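
The display behaviour described above (typing the letter markers but seeing the tone as a number) could be sketched roughly as follows. This is only an illustration: the function name `display_form` is hypothetical and is not part of ibus-table or rime-cantonese. It relies on the assumption that no Jyutping syllable itself ends in v, x, or q.

```python
# Tonal markers from the rime-cantonese README, mapped back to tone digits.
# Two-letter markers come first so that "qq" is not mistaken for "q".
MARKER_TO_TONE = {"vv": "4", "xx": "5", "qq": "6", "v": "1", "x": "2", "q": "3"}


def display_form(typed: str) -> str:
    """Show a trailing tonal marker as its tone digit (hypothetical helper)."""
    for marker, tone in MARKER_TO_TONE.items():
        if typed.endswith(marker):
            return typed[: -len(marker)] + tone
    return typed  # no marker typed: show the input unchanged


for typed in ("jyutqq", "siq", "sikv", "jyut"):
    print(typed, "->", display_form(typed))
```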

@mike-fabian
Owner

The same letter combinations as in the link to rime-cantonese you showed?

I will use these letter combinations then!

mike-fabian added a commit that referenced this issue Jul 12, 2022
Resolves: #6

Uses a new python script `improve_jyutping.py` which reads cantonese.txt,
Unihan_Readings.txt, *and* the old jyutping.txt as input and writes
an improved version jyutping.txt.new.

New serial number:

SERIAL_NUMBER = 20220712

Add a few frequencies from cantonese.txt:

    gok 咯 0 -> 2
    kwan 焜 0 -> 1
    fai 狒 0 -> 1
    bung 甭 0 -> 2
    kong 礦 0 -> 11

Add tonal markers according to

https://github.com/rime/rime-cantonese/blob/main/README-en.md#tonal-markers

    1. v: High level, e.g. siv → 詩; High level checked, e.g. sikv → 色
    2. x: Medium rising, e.g. six → 史
    3. q: Medium level, e.g. siq → 試; Medium level checked, e.g. sekq → 錫
    4. vv: Low falling, e.g. sivv → 時
    5. xx: Low rising, e.g. sixx → 市
    6. qq: Low level, e.g. siqq → 事; Low level checked, e.g. sikqq → 食

For example, change these old entries

        aa	㝞	000000
        aa	亞	000194

to these new entries:

        aav	㝞	0
        aaq	亞	000194

according to these lines from Unihan_Readings.txt:

        㝞: U+375E	kCantonese	aa1
        亞: U+4E9E	kCantonese	aa3

22267 entries have been changed by adding tones like that.
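
The conversion described in this commit message, from a Unihan `kCantonese` reading such as aa1 to an input code such as aav, can be sketched as below. This is a minimal illustration and not the actual code of `improve_jyutping.py`; the function name `add_tone_marker` is an assumption made for the example.

```python
# Tone digit (as used in Unihan kCantonese readings) -> tonal marker,
# following the list from the rime-cantonese README quoted above.
TONE_MARKERS = {"1": "v", "2": "x", "3": "q", "4": "vv", "5": "xx", "6": "qq"}


def add_tone_marker(reading: str) -> str:
    """Replace a trailing tone digit with its letter marker (hypothetical helper)."""
    if reading and reading[-1] in TONE_MARKERS:
        return reading[:-1] + TONE_MARKERS[reading[-1]]
    return reading  # no tone digit: leave the reading unchanged


for reading in ("aa1", "aa3", "jyut6", "sik1"):
    print(reading, "->", add_tone_marker(reading))
```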
@mike-fabian mike-fabian self-assigned this Jul 12, 2022
@mike-fabian mike-fabian added this to To do in Mike’s github project board via automation Jul 12, 2022
@mike-fabian
Owner

I have an improved version of the table. Can you please test it before I make a new release?

The new version is on the release-candidate-1.8.9 branch:

https://github.com/mike-fabian/ibus-table-chinese/blob/release-candidate-1.8.9/tables/cantonese/jyutping.txt

You can compile it to binary with

ibus-table-createdb -s jyutping.txt

Then copy the created binary database to this place:

sudo cp  jyutping.db /usr/share/ibus-table/tables/jyutping.db

Then run `ibus restart`, or log out of your desktop and log in again.

@mike-fabian
Owner

mike-fabian commented Jul 12, 2022

Three screenshots showing which candidates appear when typing aa, aav, and aaq. The candidates shown when typing aa are the same as with the old table, but when typing aav or aaq, only candidates with the right tone are shown:

Screenshot
Screenshot-new-aav
Screenshot-new-aaq
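
The behaviour in the screenshots can be modelled with a simple prefix filter. This is a deliberately simplified sketch, assuming that candidates are the characters whose input code starts with the typed text; it is not ibus-table's actual lookup code, and the tiny `TABLE` below is made up from the aa / aav / aaq entries mentioned in this thread.

```python
# Hypothetical miniature table: (input code, character).
TABLE = [
    ("aa", "㝞"), ("aav", "㝞"),
    ("aa", "亞"), ("aaq", "亞"),
]


def candidates(typed: str) -> list[str]:
    """Return characters whose code starts with the typed text, without duplicates."""
    seen: list[str] = []
    for code, char in TABLE:
        if code.startswith(typed) and char not in seen:
            seen.append(char)
    return seen


for typed in ("aa", "aav", "aaq"):
    print(typed, candidates(typed))
```

In this model, typing aa still matches every entry, while adding a tonal marker narrows the list to the characters with that tone.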

@Pi-Cla
Author

Pi-Cla commented Jul 12, 2022

Thanks, I tested it out and it looks good to me

@mike-fabian
Owner

I will make a new release soon then.

By the way, I thought theoretically I could add the entries with the numbers as well into the database, i.e. the database now has

jyut    粵      183
jyutqq  粵      183

The first entry, jyut 粵 183, might look redundant, but it isn’t: it makes typing just jyut an exact match already, which improves sorting when typing without tones. So both entries are helpful.

I could add a third one like this:

jyut    粵      183
jyutqq  粵      183
jyut6   粵      183

and then one could type jyutqq or jyut6 however one prefers. It would make the database bigger and thus a bit slower, but it seems fast enough to me.

The only disadvantage would be that numbers could no longer be used to commit candidates. One would have to use F1-F9 instead, or just use the arrow-up and arrow-down keys or the mouse to select candidates.

What do you think about that?

I tend to not add it and leave it as in the version you tested to keep committing by number key working.

If letters like qq for tone 6 are well known to most users, then using them is probably good enough. Since https://github.com/rime/rime-cantonese/blob/main/README-en.md#tonal-markers and your phone both use these letters, I guess it is a well-known standard, right?
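
The two-entry and three-entry layouts discussed above can be generated from a single reading as in the following sketch. The helper `table_rows` and its `with_digit` flag are hypothetical names used only for illustration; the released table uses the two-entry layout.

```python
# Tone digit -> tonal marker, as in the rime-cantonese README.
TONE_MARKERS = {"1": "v", "2": "x", "3": "q", "4": "vv", "5": "xx", "6": "qq"}


def table_rows(syllable: str, tone: str, char: str, freq: int,
               with_digit: bool = False) -> list[tuple[str, str, int]]:
    """Build the table rows for one character reading (hypothetical helper)."""
    rows = [
        (syllable, char, freq),                       # toneless: exact match for "jyut"
        (syllable + TONE_MARKERS[tone], char, freq),  # letter marker: "jyutqq"
    ]
    if with_digit:
        rows.append((syllable + tone, char, freq))    # optional digit form: "jyut6"
    return rows


for code, char, freq in table_rows("jyut", "6", "粵", 183, with_digit=True):
    print(f"{code}\t{char}\t{freq}")
```

Leaving `with_digit` off keeps the digit keys free for committing candidates by number, which is the trade-off weighed above.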

@mike-fabian mike-fabian moved this from To do to In progress in Mike’s github project board Jul 13, 2022
@Pi-Cla
Author

Pi-Cla commented Jul 13, 2022

I am fine with just doing the two entries. Note, though, that on my phone I am using a modification of rime-cantonese for Android; some other Jyutping keyboards, such as Gboard, don't have any tone input at all.

Mike’s github project board automation moved this from In progress to Done Jul 13, 2022
@mike-fabian
Owner

OK, so I released https://github.com/mike-fabian/ibus-table-chinese/releases/tag/1.8.9 with the version of the jyutping.txt table you tested yesterday.

@Pi-Cla
Author

Pi-Cla commented Jul 13, 2022

I notice that you have been maintaining the openSUSE package ibus-typing-booster, but not the ibus-table* packages. Do you wanna do it yourself or can I do that?

@mike-fabian
Owner

Oh, it would be great if you do that!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 23, 2022
Fixes build.

Release 1.8.12

- Add appdata.xml files
- Convert license tags to SPDX format
- Add .svg icon files for use in appdata.xml files

Release 1.8.11

- Improve punctuation support in jyutping.txt, cantonese.txt, cantonhk.txt, cantonyale.txt
  (Resolves: mike-fabian/ibus-table-chinese#7)
- Improve “improve_jyutping.py” to allow comments in the table
- Update of jyutping.txt for Unicode 15.0.0 final release

Release 1.8.10

- Improve punctuation support in cangjie5.txt, cangjie3.txt, cangjie-big.txt,
  quick5.txt, quick3.txt, quick-classic.txt
  (Resolves: kaio/ibus-table#73)
  (Resolves: mike-fabian/ibus-table#121)

Release 1.8.9

- Add tones to Jyutping.txt table
  (Resolves: mike-fabian/ibus-table-chinese#6)
  Tonal markers according to
  https://github.com/rime/rime-cantonese/blob/main/README-en.md#tonal-markers
  were added:
  1. v: High level, e.g. siv → 詩; High level checked, e.g. sikv → 色
  2. x: Medium rising, e.g. six → 史
  3. q: Medium level, e.g. siq → 試; Medium level checked, e.g. sekq → 錫
  4. vv: Low falling, e.g. sivv → 時
  5. xx: Low rising, e.g. sixx → 市
  6. qq: Low level, e.g. siqq → 事; Low level checked, e.g. sikqq → 食

Release 1.8.8

- Add PINYIN_MODE = TRUE to cangjie-big.txt, quick-classic.txt, and erbi.txt
- Make “Traditional Chinese only” the default for quick5
- Improve the quick5.txt table in a similar way the cangjie5.txt
  table was recently improved
  (Resolves: mike-fabian/ibus-table-chinese#4)
- Build outside of the source tree
  (Resolves: mike-fabian/ibus-table-chinese#2)

Release 1.8.7

- Make “Traditional Chinese only” the default for cangjie5
  (Resolves mike-fabian/ibus-table-chinese#2)

Release 1.8.6

- Increase serial number of cangjie5.txt and erbi-qs.txt

Release 1.8.5

- add table_extra tag for auxiliary code
  (Resolves: definite/ibus-table-chinese#18)

Release 1.8.4

- Another improvement for cangjie5.txt
  (Resolves: https://github.com/mike-fabian/ibus-table/issues/87)
- Updated README
  (Includes: definite/ibus-table-chinese#17)
- Correct a misplaced non-alphabetic symbol
  (Includes: definite/ibus-table-chinese#16)
- Simplify CMakeLists.txt to avoid requiring cmake-fedora