Trad/Simp characters marked as Simp only in Unihan_Variants.txt #88

Closed
ghost opened this issue Jan 7, 2022 · 36 comments

@ghost

ghost commented Jan 7, 2022

This is an umbrella issue, opened (pending @mike-fabian's permission) to collect characters that are used in both Trad and Simp Chinese but are marked as Simp-only in Unihan_Variants.txt. It expands on ibus/ibus#2323.

@mike-fabian I came across another such character a couple of days ago, but I didn't open an issue right away and have since forgotten which one it was. I'm sure there are more such characters lurking around, so I suggest we keep this issue open so peeps can add such characters when/if they come across them, and you can update Unihan_Variants.txt naturally as you do maintenance on this fork.

@ghost
Author

ghost commented Jan 7, 2022

I will post a comment in this issue once I find that character again.

@ghost ghost changed the title Trad characters marked as Simp only by cangjie5 Trad/Simp characters marked as Simp only by cangjie5 Jan 7, 2022
@ghost ghost changed the title Trad/Simp characters marked as Simp only by cangjie5 Trad/Simp characters marked as Simp only in Unihan_Variants.txt Jan 7, 2022
@mike-fabian
Owner

mike-fabian commented Jan 11, 2022

https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains

“Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”

commit 4debaa5
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Oct 5 11:30:31 2021 +0200

Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-08-06 Unicode 14.0.0 final”

- *All* our fixes which are now included upstream.
- Because of the new Unihan_Variants.txt, the following 48 characters
  are added to the VARIANTS_TABLE in chinese_variants.py:
  u'䓖': 1, u'了': 1, u'伙': 1, u'借': 1, u'傢': 2, u'冬': 1, u'千': 1,
  u'卜': 1, u'卷': 1, u'吁': 1, u'合': 1, u'回': 1, u'夥': 2, u'姜': 1,
  u'家': 1, u'峃': 1, u'嶨': 2, u'庼': 1, u'廎': 2, u'懞': 2, u'才': 1,
  u'折': 1, u'捲': 2, u'摺': 2, u'旋': 1, u'朱': 1, u'濛': 2, u'灶': 1,
  u'瞭': 3, u'矇': 2, u'硃': 2, u'秋': 1, u'竈': 2, u'籲': 2, u'纔': 2,
  u'蒙': 1, u'蔑': 1, u'蔔': 2, u'薑': 2, u'藉': 3, u'藭': 2, u'衊': 2,
  u'迴': 2, u'霉': 1, u'鞦': 2, u'黴': 2, u'鼕': 2,

  1 = simplified Chinese
  2 = traditional Chinese
  3 = used both in simplified *and* traditional Chinese
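The legend above reads naturally as a bitmask: 3 is 1 | 2, i.e. both the simplified and the traditional bit are set. A minimal sketch of how such a table could drive a “traditional Chinese only” filter, assuming a plain dict holding a few entries copied from the list above and assuming that characters missing from the table count as usable in both. The names `category` and `filter_candidates` are hypothetical, not ibus-table's actual API:

```python
# A few entries copied from the list above:
# 1 = simplified, 2 = traditional, 3 = both.
VARIANTS_TABLE = {
    '冬': 1, '鼕': 2, '秋': 1, '鞦': 2, '瞭': 3, '藉': 3,
}

SIMPLIFIED = 1
TRADITIONAL = 2
BOTH = SIMPLIFIED | TRADITIONAL  # == 3


def category(char: str) -> int:
    """Return the variant bitmask for a single character.

    Assumption (hypothetical): characters missing from the table are
    treated as usable in both simplified and traditional Chinese.
    """
    return VARIANTS_TABLE.get(char, BOTH)


def filter_candidates(candidates: list[str], wanted: int) -> list[str]:
    """Keep only candidates whose bitmask shares a bit with `wanted`."""
    return [c for c in candidates if category(c) & wanted]


if __name__ == '__main__':
    # With 冬 marked as 1 (simplified only), traditional mode drops it.
    print(filter_candidates(['冬', '鼕', '瞭'], TRADITIONAL))  # ['鼕', '瞭']
```

Once an entry says 1, a traditional-only filter of this kind hides the character, which is the effect discussed for 冬 further down; an entry of 3 would keep it visible in both modes.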

@mike-fabian mike-fabian self-assigned this Jan 11, 2022
@mike-fabian
Owner

Looking at the above changes coming from the new Unihan_Variants.txt, I have some doubts whether all of them are correct.

For example u'秋': 1 and u'冬': 1. Aren’t they used in traditional Chinese as well? The same goes for u'千': 1 and a few others.

It would be nice if somebody could go through the whole list above and check it.

@ghost
Author

ghost commented Jan 12, 2022

Yes, all the characters that you mentioned are used in Trad. I will check this list.

@ghost
Author

ghost commented Jan 12, 2022

I believe all of the chars from that list are used in Trad except for these two:
u'峃'
u'庼'

Edit:
u'䓖'
this one is also unused in Trad
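If the review above were applied mechanically, every entry marked 1 that is also used in traditional Chinese would gain the traditional bit (1 | 2 = 3), except for the three characters listed. A minimal sketch of that step on a subset of the entries from the list; `apply_review` is a hypothetical helper, and this is not necessarily how the table ended up being corrected:

```python
SIMPLIFIED = 1
TRADITIONAL = 2

# Subset of the entries added from Unicode 14.0.0 (1 = simp, 2 = trad, 3 = both).
NEW_ENTRIES = {
    '冬': 1, '秋': 1, '千': 1, '峃': 1, '庼': 1, '䓖': 1, '鼕': 2, '瞭': 3,
}

# Characters reported above as NOT used in traditional Chinese.
NOT_TRADITIONAL = {'峃', '庼', '䓖'}


def apply_review(table: dict[str, int], not_traditional: set[str]) -> dict[str, int]:
    """Return a copy in which simplified-only entries also get the traditional
    bit, unless the character was reported as not used in traditional Chinese."""
    fixed = {}
    for char, value in table.items():
        if value == SIMPLIFIED and char not in not_traditional:
            value |= TRADITIONAL  # 1 -> 3: used in both
        fixed[char] = value
    return fixed


if __name__ == '__main__':
    for char, value in apply_review(NEW_ENTRIES, NOT_TRADITIONAL).items():
        print(char, value)
```

Entries already marked 2 are left alone here, because the review only says which characters are used in Trad, not whether the traditional forms are also used in Simp.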

@ghost
Author

ghost commented Jan 13, 2022

Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.

> I will post a comment in this issue once I find that character again.

@ghost
Author

ghost commented Jan 18, 2022

Hi, btw, how can I try the new release on Fedora? I tried doing it the same way I did last time and nothing changed...

@mike-fabian
Owner

> Hi, btw, how can I try the new release on Fedora? I tried doing it the same way I did last time and nothing changed...

Which release do you mean?

@mike-fabian
Owner

https://bodhi.fedoraproject.org/updates/FEDORA-2022-8a9b689712
https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0

which contains:

  • Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py

Is that the one you mean?

@mike-fabian
Owner

mike-fabian commented Jan 18, 2022

With 1.15.0, if you use cangjie5, switch to “traditional Chinese only” mode and type hey, you no longer get 冬 (because it is now marked as simplified only).

@mike-fabian
Owner

Typing hey in cangjie5 with ibus-table-1.14.1 in traditional Chinese mode gives 冬:

Peek.2022-01-18.07-27.mp4

@mike-fabian
Owner

Typing hey in cangjie5 with ibus-table-1.15.0 in traditional Chinese mode does not give 冬 anymore:

Peek.2022-01-18.07-56.mp4

@ghost
Author

ghost commented Jan 18, 2022

Sorry, I meant the release with your improvements regarding 顥/顯.
Not the one with the weird new Unihan variants where 冬 isn't considered traditional anymore (by the way, what are we gonna do about that?)

@mike-fabian
Owner

> Not the one with the weird new Unihan variants where 冬 isn't considered traditional anymore (by the way, what are we gonna do about that?)

We are going to fix it of course.

@mike-fabian
Owner

> Sorry, I meant the release with your improvements regarding 顥/顯.

That was an improvement in the cangjie5 table, so it is in the ibus-table-chinese update, not in the ibus-table update.

I.e. it is here:

https://bodhi.fedoraproject.org/updates/FEDORA-2022-76daebdbee
https://github.com/mike-fabian/ibus-table-chinese/releases/tag/1.8.4

@ghost
Author

ghost commented Jan 18, 2022

Hm, well that's the problem. Running that dnf command doesn't do anything:

$ sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee
[sudo] password for igor:
Fedora 35 - x86_64 - Updates 3.6 kB/s | 5.7 kB 00:01
Fedora 35 - x86_64 - Updates 215 kB/s | 2.0 MB 00:09
Fedora Modular 35 - x86_64 - Updates 8.0 kB/s | 5.9 kB 00:00
Fedora 35 - x86_64 - Test Updates 16 kB/s | 4.2 kB 00:00
Fedora 35 - x86_64 - Test Updates 714 kB/s | 2.5 MB 00:03
Dependencies resolved.
Nothing to do.
Complete!

@mike-fabian
Owner

Do you have ibus-table-chinese-1.8.4 already?

@ghost
Author

ghost commented Jan 18, 2022

I have the version I installed before through a similar command, the one with the 知/佑 improvements. This command doesn't seem to change anything.

@ghost
Author

ghost commented Jan 18, 2022

https://bodhi.fedoraproject.org/updates/FEDORA-2021-8808d4f69f

I updated it through this command.

@mike-fabian
Owner

What happens if you try

sudo dnf clean all

before trying

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee

@ghost
Author

ghost commented Jan 18, 2022

It remade all the metadata, but then the result is still the same: dependencies resolved, nothing to do.
By the way, how do I check the version of ibus-table-chinese? Is there a terminal command for that?

@ghost
Author

ghost commented Jan 18, 2022

Oh I just checked the version.
I have ibus-table-chinese-1.8.4-1.fc35.
So I actually have the latest one. Yet it still has the problem with 顯/顥 mentioned above.

@ghost
Author

ghost commented Jan 18, 2022

afmbc brings up 顥 instead of 顯 as the first choice.

@mike-fabian
Owner

OK, I found what happened.

You have to delete

rm ~/.local/share/ibus-table/tables/cangjie5.cache

while ibus-table is not running, then restart it.

This cache still has the older version of the cangjie5 table.

@mike-fabian
Owner

This cache would have been invalidated automatically if I had increased the serial number of the cangjie5 table, but I forgot to do that when releasing 1.8.4.

Like this:

mike-fabian/ibus-table-chinese@95a4df5

Which will be in 1.8.6
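The serial-number mechanism described above can be sketched roughly like this. It is a hypothetical illustration, not ibus-table's code: the JSON cache format, the `serial` key and the `load_table` helper are all assumptions. The point is that the cache is trusted whenever its stored serial number equals the one of the installed table, so changed table contents with an unchanged serial number go unnoticed:

```python
import json
from pathlib import Path
from typing import Callable


def load_table(serial: int, cache_path: Path, build: Callable[[], dict]) -> dict:
    """Return table data from the cache if its serial number matches,
    otherwise rebuild it from the table source and rewrite the cache."""
    if cache_path.exists():
        try:
            cached = json.loads(cache_path.read_text(encoding='utf-8'))
        except (OSError, ValueError):
            cached = None  # unreadable or corrupt cache: rebuild below
        if isinstance(cached, dict) and cached.get('serial') == serial:
            # Only the serial number is compared, not the table contents.
            return cached['data']
    data = build()
    cache_path.write_text(
        json.dumps({'serial': serial, 'data': data}, ensure_ascii=False),
        encoding='utf-8',
    )
    return data
```

Deleting the cache file, as suggested above for cangjie5.cache, simply forces the rebuild branch for one user; bumping the serial number forces it for everyone automatically.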

@ghost
Author

ghost commented Jan 18, 2022

Great, now everything is working as expected. Thanks!

@mike-fabian
Owner

What do you think about the test packages for #85?

ibus-table-1.15.1 packages with that new feature for Fedora are here:

https://copr.fedorainfracloud.org/coprs/mfabian/ibus-table/builds/

@mike-fabian
Owner

You can get this update with the commands:

sudo dnf copr enable mfabian/ibus-table 
sudo dnf update ibus-table

@mike-fabian
Owner

I opened a new issue about CJK compatibility ideographs in the cangjie5 table:

mike-fabian/ibus-table-chinese#1

Because an issue about these was reported for the wubi table(s): kaio#76

The user who reported the issue for the wubi tables wanted them removed because “The user can never differentiate them visually.” (That depends on the font; I guess in most cases it is true that they are very hard to distinguish or may even look completely identical.)

So I wanted to check whether the cangjie5 table has a similar problem.

Can you please check mike-fabian/ibus-table-chinese#1?
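For reference, one way to spot such characters: most CJK compatibility ideographs have a canonical singleton decomposition, so NFC normalization replaces them with the corresponding unified ideograph (for example U+F900 becomes U+8C48 豈). A minimal sketch using Python's `unicodedata`, assuming the table is available as hypothetical (input code, phrase) pairs. Note that the small group of characters in the compatibility block that are themselves unified ideographs (such as U+FA0E) is not flagged by this check:

```python
import unicodedata


def find_compat_ideographs(entries):
    """Return (code, original char, NFC replacement) for every single
    character in the phrases that NFC normalization would change."""
    hits = []
    for code, phrase in entries:
        for char in phrase:
            normalized = unicodedata.normalize('NFC', char)
            if normalized != char:
                hits.append((code, char, normalized))
    return hits


# Escapes are used because editors may silently normalize the literal characters.
sample = [('a', '\u8C48'), ('b', '\uF900'), ('c', '\u51AC')]
print(find_compat_ideographs(sample))
```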

@ghost
Author

ghost commented Jan 19, 2022

Okay

@mike-fabian
Owner

> Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.
>
> I will post a comment in this issue once I find that character again.

I opened a new issue for that one:

#95

Better more issues than one issue collecting characters which can never be closed.

@mike-fabian
Owner

37 new characters were added by updating Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”:

c1c39a3

Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”

  • Because of the new Unihan_Variants.txt, the following 37 characters
    are added to the VARIANTS_TABLE in chinese_variants.py:
    u'䓨': 1, u'沄': 1, u'潕': 2, u'澐': 2, u'罃': 2, u'鮗': 2, u'龻': 2,
    u'鿟': 1, u'鿠': 2, u'鿰': 1, u'鿲': 1, u'鿳': 2, u'鿴': 1, u'鿵': 1,
    u'鿶': 1, u'鿷': 1, u'鿸': 1, u'鿹': 1, u'鿺': 1, u'𣲘': 1, u'𤇾': 2,
    u'𤪤': 2, u'𦥯': 2, u'𧰎': 2, u'𩷓': 2, u'𩷕': 2, u'𩹎': 2, u'𪄳': 2,
    u'𪛞': 1, u'𫇦': 1, u'𬉧': 2, u'𬵨': 2, u'𰀡': 1, u'𰀢': 1, u'𰁜': 1,
    u'𰃮': 1, u'𰯲': 2,

    1 = simplified Chinese
    2 = traditional Chinese
    3 = used both in simplified and traditional Chinese
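Lists like the one in this commit message are essentially a diff between the old and the new variants table. A minimal sketch of producing one, assuming both tables are plain dicts; `added_entries` and `format_entries` are hypothetical helpers, not necessarily how the commit message was generated. Sorting by code point appears to match the order used in the lists above:

```python
def added_entries(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Entries whose character is in `new` but not in `old`, by code point."""
    return {char: new[char] for char in sorted(new) if char not in old}


def format_entries(entries: dict[str, int], per_line: int = 7) -> str:
    """Render entries in the u'X': N, style used in the commit message."""
    items = [f"u'{char}': {value}," for char, value in entries.items()]
    return '\n'.join(
        ' '.join(items[i:i + per_line]) for i in range(0, len(items), per_line)
    )


old = {'潕': 2}
new = {'潕': 2, '澐': 2, '沄': 1}
print(format_entries(added_entries(old, new)))  # u'沄': 1, u'澐': 2,
```

Entries whose value changed between the two versions are not reported by this sketch; a second comprehension comparing `new[char] != old[char]` for shared keys would cover those.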

@mike-fabian
Owner

> 37 new characters were added by updating Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”:

New issue for this part:

#97

@mike-fabian
Owner

> https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains
>
> “Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”

New issue for this part:

#96

@mike-fabian
Owner

We have 3 new issues now for the remaining stuff here and can close this one.

@mike-fabian mike-fabian added this to To do in Mike’s github project board via automation Jan 20, 2022
@mike-fabian mike-fabian moved this from To do to Done in Mike’s github project board Jan 20, 2022
@ghost
Author

ghost commented Jan 20, 2022

> Better more issues than one issue collecting characters which can never be closed.

Gotcha!
