Missing Kangxi radicals (as characters) #13

Closed
hugolpz opened this issue Apr 21, 2021 · 13 comments

@hugolpz

hugolpz commented Apr 21, 2021

Kangxi radicals are encoded twice in Unicode (danger: do not mix them up!):

  • as radical points : ⼀⼁⼂⼃⼄⼅⼆⼇⼈⼉⼊⼋⼌⼍⼎⼏⼐⼑⼒⼓⼔⼕⼖⼗⼘⼙⼚⼛⼜⼝⼞⼟⼠⼡⼢⼣⼤⼥⼦⼧⼨⼩⼪⼫⼬⼭⼮⼯⼰⼱⼲⼳⼴⼵⼶⼷⼸⼹⼺⼻⼼⼽⼾⼿⽀⽁⽂⽃⽄⽅⽆⽇⽈⽉⽊⽋⽌⽍⽎⽏⽐⽑⽒⽓⽔⽕⽖⽗⽘⽙⽚⽛⽜⽝⽞⽟⽠⽡⽢⽣⽤⽥⽦⽧⽨⽩⽪⽫⽬⽭⽮⽯⽰⽱⽲⽳⽴⽵⽶⽷⽸⽹⽺⽻⽼⽽⽾⽿⾀⾁⾂⾃⾄⾅⾆⾇⾈⾉⾊⾋⾌⾍⾎⾏⾐⾑⾒⾓⾔⾕⾖⾗⾘⾙⾚⾛⾜⾝⾞⾟⾠⾡⾢⾣⾤⾥⾦⾧⾨⾩⾪⾫⾬⾭⾮⾯⾰⾱⾲⾳⾴⾵⾶⾷⾸⾹⾺⾻⾼⾽⾾⾿⿀⿁⿂⿃⿄⿅⿆⿇⿈⿉⿊⿋⿌⿍⿎⿏⿐⿑⿒⿓⿔⿕
  • as true characters : 一丨丶丿乙亅二亠人儿入八冂冖冫几凵刀力勹匕匚匸十卜卩厂厶又口囗土士夂夊夕大女子宀寸小尢尸屮山巛工己巾干幺广廴廾弋弓彐彡彳心戈戶手支攴文斗斤方无日曰月木欠止歹殳毋比毛氏气水火爪父爻爿片牙牛犬玄玉瓜瓦甘生用田疋疒癶白皮皿目矛矢石示禸禾穴立竹米糸缶网羊羽老而耒耳聿肉臣自至臼舌舛舟艮色艸虍虫血行衣襾見角言谷豆豕豸貝赤走足身車辛辰辵邑酉釆里金長門阜隶隹雨靑非面革韋韭音頁風飛食首香馬骨高髟鬥鬯鬲鬼魚鳥鹵鹿麥麻黃黍黑黹黽鼎鼓鼠鼻齊齒龍龜龠.
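Every character in the Kangxi Radicals block (U+2F00 to U+2FD5, 214 code points) has a Unicode compatibility decomposition to its unified-ideograph counterpart. NFKC normalization therefore converts each radical code point in the first list into the "true character" in the second. Here is a minimal JavaScript sketch for Node.js or a modern browser. The function names are illustrative and are not part of AnimCJK.

```javascript
// Kangxi Radicals block: U+2F00 (radical 1) .. U+2FD5 (radical 214).
const KANGXI_FIRST = 0x2f00;
const KANGXI_LAST = 0x2fd5;

// True if the first code point of `s` is a Kangxi radical code point.
function isKangxiRadicalPoint(s) {
  const cp = s.codePointAt(0);
  return cp >= KANGXI_FIRST && cp <= KANGXI_LAST;
}

// Map a radical code point to the corresponding "true character" via NFKC.
// Anything that is not a radical code point is returned unchanged.
function radicalToCharacter(s) {
  return isKangxiRadicalPoint(s) ? s.normalize("NFKC") : s;
}

// Build the 214 "true character" radicals from the radical code points.
function allRadicalCharacters() {
  const out = [];
  for (let cp = KANGXI_FIRST; cp <= KANGXI_LAST; cp++) {
    out.push(String.fromCodePoint(cp).normalize("NFKC"));
  }
  return out;
}

console.log(radicalToCharacter("⼀")); // 一 (U+4E00)
console.log(allRadicalCharacters().length); // 214
```

Note the flip side: any NFKC normalization applied to input text will silently turn radical code points into characters. Going back from a character to its radical code point has no normalization form and needs an explicit lookup table.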

AnimCJK works on true characters only, as do many fonts.
I used AnimCJK to produce .gif files for the Kangxi radicals as characters.
AnimCJK covers 168 radicals as characters.
The following 46 items are missing (i.e. absent from all locales):

  1. 丿

(I'm especially interested in the last 20.)

Questions

Note

I compared two sorted files, each with one character per line and no empty lines:

    ls -1 *.gif | sed 's/-sbs\.gif$//' | sort > ./exist-from-animCJK.md   # list my GIFs derived from AnimCJK, strip the suffix, sort
    comm -23 radicals-all.md ./exist-from-animCJK.md > ./missing.md       # lines in the full list but not in the subset (both sorted)
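The same comparison can be written as a set difference, which does not depend on both files being sorted the same way. This is a minimal JavaScript sketch. It assumes both lists are available as strings with one character per line, and that the GIF names end in "-sbs.gif" as in the sed command above. The function names are hypothetical.

```javascript
// Characters present in `allText` but absent from `existText`:
// the equivalent of `comm -23 all exist`, without requiring sorted input.
function missingCharacters(allText, existText) {
  const toList = (text) =>
    text.split(/\r?\n/).map((line) => line.trim()).filter((line) => line !== "");
  const existing = new Set(toList(existText));
  return toList(allText).filter((ch) => !existing.has(ch));
}

// Strip the "-sbs.gif" suffix from GIF file names, like the sed command.
function baseNames(fileNames) {
  const suffix = "-sbs.gif";
  return fileNames
    .filter((name) => name.endsWith(suffix))
    .map((name) => name.slice(0, -suffix.length));
}

const all = "一\n丨\n丶\n丿\n";
const exist = baseNames(["一-sbs.gif", "丨-sbs.gif", "丶-sbs.gif"]).join("\n");
console.log(missingCharacters(all, exist)); // [ '丿' ]
```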
@hugolpz
Author

hugolpz commented Apr 21, 2021

@skishore

@parsimonhi
Owner

I didn't include all the radicals in the official animCJK release because I didn't finish the job properly.

However, I have already done most of the work. See for instance http://gooo.free.fr/animCJK/all.php and enter the missing radicals in the input field to check whether they would work for you (you can enter up to 40 characters at once).

What is missing for the moment is:

  1. Checking the Japanese stroke order (which means I have to make two versions of the characters).
  2. Adding the brush effect at the beginning and end of the strokes. I made an algorithm to do it automatically, but I need to verify the result before adding the corresponding characters to the official release of animCJK, because it is not always perfect. Check the brush checkbox in http://gooo.free.fr/animCJK/all.php to see what my algorithm does.
  3. I don't know when I will complete these tasks.

@hugolpz
Author

hugolpz commented Apr 23, 2021

  1. I see. So maybe I should plug my fork into your data. 👍🏼 EDIT: I see 齒 / 40786.svg on your site but can't find 40786.svg in parsimonhi/animCJK. The XHR request from my HTTPS GitHub Pages site fails due to mixed content. See "Plug on latest AnimCJK data" hugolpz/animCJK#3.

  2. Brush: you are reconstructing the hidden parts; that's impressive indeed. I had to do it by hand.

  3. Open source = no deadlines. Document well, follow your own happy flow.

Note: Sometimes, for strategic reasons, it can be helpful to keep some open content unpublished until certain strategic objectives are met. This can help the project. If we are in such a situation, please let me know, and I will just wait for your later, official release.

@hugolpz
Author

hugolpz commented Apr 25, 2021

I see you have APIs that return a mix of HTML and SVG via getSvg.php and others.

Do you have the usual svgsZhHans, svgsJa, and svgsZhHant folders served over HTTPS, with cross-domain queries allowed? If so, I could change my local query:

    file = svgsDir + "/" + dec + ".svg";
    xhr.open("GET", file, true);

into a cross-domain query against your data:

    file = apiUrl + "/" + svgsDir + "/" + dec + ".svg";
    xhr.open("GET", file, true);

Note: If not, that's OK. My project is stable, so I can leave it as it is.
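The mixed-content failure mentioned earlier can be detected before sending the request. A page served over HTTPS may not issue XHR/fetch requests to plain-HTTP URLs. This is a hedged sketch using the standard `URL` class. `apiUrl`, `svgsDir`, and `dec` follow the snippets above, and the example URLs are placeholders, not real animCJK endpoints.

```javascript
// Build the SVG URL the same way as the snippets above:
// <apiUrl>/<svgsDir>/<dec>.svg, where dec is the decimal code point.
function svgUrl(apiUrl, svgsDir, dec) {
  return apiUrl + "/" + svgsDir + "/" + dec + ".svg";
}

// Browsers block XHR/fetch requests to http: URLs issued from an
// https: page ("mixed content"), so compare the two protocols first.
function isMixedContent(pageUrl, resourceUrl) {
  return new URL(pageUrl).protocol === "https:" &&
    new URL(resourceUrl).protocol === "http:";
}

// Placeholder URLs, for illustration only.
const pageUrl = "https://example.github.io/animCJK/";
const file = svgUrl("http://example.org/animCJK", "svgsZhHans", 40786);
console.log(file); // http://example.org/animCJK/svgsZhHans/40786.svg
console.log(isMixedContent(pageUrl, file)); // true
```

When the check returns true, the fix has to come from the server side (serving the files over HTTPS, plus an `Access-Control-Allow-Origin` header for cross-domain access), or the files have to be copied into the project.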

@parsimonhi
Owner

Hello,

The "experimental" area of AnimCJK is experimental. :-) As a result, it contains many errors, so take care.

If you want to get some characters from the experimental area, it is sometimes complicated because characters are stored in more than ten different folders.

To keep things simple, you can try to get radical characters from http://gooo.free.fr/animCJK/svgsZh/ folder only. In this folder, the file name of each character is suffixed by a "z" (this suffix means the character strokes are not yet "brushed").

For instance, 齒 is in http://gooo.free.fr/animCJK/svgsZh/40786z.svg. Display it in a browser, copy the source code into a text file, and name it "40786.svg" (without the "z" suffix).
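The naming scheme can be computed directly. The file name is the decimal Unicode code point of the character (齒 is U+9F52, which is 40786), and the experimental svgsZh folder adds a "z" before the extension. Here is a small sketch. The helper names are hypothetical; only the URL pattern and the samples/svgs folder come from this thread.

```javascript
// animCJK names each SVG after the decimal Unicode code point of its character,
// e.g. 齒 = U+9F52 = 40786 -> "40786.svg".
function decimalCodePoint(ch) {
  return ch.codePointAt(0);
}

// Hypothetical helper: where to download a not-yet-brushed character from the
// experimental svgsZh folder ("z" suffix), and what to call the local copy
// in samples/svgs (same number, without the "z").
function experimentalSvgPaths(ch) {
  const dec = decimalCodePoint(ch);
  return {
    sourceUrl: "http://gooo.free.fr/animCJK/svgsZh/" + dec + "z.svg",
    localName: "samples/svgs/" + dec + ".svg",
  };
}

const paths = experimentalSvgPaths("齒");
console.log(paths.sourceUrl); // http://gooo.free.fr/animCJK/svgsZh/40786z.svg
console.log(paths.localName); // samples/svgs/40786.svg
```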

Then you can put this file in the samples/svgs folder of "your" animCJK project (this folder is used to contain any additional characters that are not already in the official release of animCJK).

Finally, run samples/imageFactory.html of "your" animCJK project in a browser, select the "svgs" radio button, enter 齒 in the character field, and click the "Create" button. The brushed characters will be generated as GIF images.

@parsimonhi
Owner

parsimonhi commented Apr 27, 2021

Hello,

You cannot get the SVG sources of the experimental part of animCJK using Ajax, but you can get them using curl.

@parsimonhi
Owner

Hello,

I added the missing radicals. Note that many of them are not identical in svgsJa and svgsZhHans (because they do not have the same stroke order, the same glyph, or the same number of strokes).

@hugolpz
Author

hugolpz commented Nov 30, 2021

Nice :D I will have some git merging to do, and then it will unlock my workflow. 😄
Likely in 2022. Thank you @parsimonhi!

Before closing this Kangxi radicals issue:

  • svgsZhHant 廴: 3 strokes; svgsZhHans 廴: 2 strokes; svgsJa 廴: 2 strokes (not sure, since it's not my area).
  • Radical code points (see the top of this ticket) should always use Kangxi-inspired glyphs, whatever the locale, combined with the stroke order of each locale.

@parsimonhi
Owner

parsimonhi commented Nov 30, 2021

Hello,

First, to be sure we are talking about the same thing: by ja, zhHans, or zhHant, I mean something that corresponds to the language code of a web page, such as ja, zh-hans, or zh-hant (the one you put in <html lang="ja">, for instance).
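The correspondence between a page's language code and the animCJK folders named in this thread (svgsJa, svgsZhHans, svgsZhHant) can be sketched as a lookup. This is an illustration, not animCJK code. It deliberately handles only the three codes discussed here. Regional tags such as zh-TW return null, since the thread notes small differences between zhHant and zhTw.

```javascript
// Map a language tag (as used in <html lang="...">) to an animCJK folder.
// Only the three codes discussed in this thread are handled; anything else
// returns null so the caller can choose its own fallback.
const FOLDER_BY_LANG = {
  "ja": "svgsJa",
  "zh-hans": "svgsZhHans",
  "zh-hant": "svgsZhHant",
};

function folderForLang(lang) {
  const key = String(lang).trim().toLowerCase(); // language tags are case-insensitive
  return Object.prototype.hasOwnProperty.call(FOLDER_BY_LANG, key)
    ? FOLDER_BY_LANG[key]
    : null;
}

console.log(folderForLang("zh-Hans")); // svgsZhHans
console.log(folderForLang("zh-TW")); // null
```

In a browser, the input would typically be `document.documentElement.lang`.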

In svgsJa, 廴 has 3 strokes (I am sure of that). When in doubt about Japanese, see https://kakijun.jp (not https://kakijun.com).

About the glyphs: one cannot just reproduce the Kangxi glyphs as is in kaisho (楷書/楷书) style (which is the style used in animCJK and on Wikimedia). One should conform as much as possible to each country's customs for this style. Kangxi glyphs are very close to (or the same as) the style displayed with the zh-hant lang code. But in ja or zhHans, several radicals have (slightly) different glyphs. Note that this is not a question of simplified characters: when a character is really simplified in ja or zhHans, it has a different Unicode code point.

For instance, 龜:
In zhHans (when 龜, which is a traditional character, is used in a simplified Chinese text): https://www.zhihu.com/question/20317770
In Ja (龜 is an uncommon character in Japanese): https://kakijun.jp/page/kame16200.html
In zhTw: https://stroke-order.learningweb.moe.edu.tw/practice.do?lang=en&word=%E9%BE%9C
In zhHk: https://www.edbchinese.hk/lexlist_en/ then enter 龜 in the "Direct input character" field and click the "Show" button
Kangxi: https://www.kangxizidian.com/kangxi/1537.gif

Simplified in ja (different code point), 亀: https://kakijun.jp/page/11235200.html
Simplified in zhHans (different code point), 龟: https://www.archchinese.com/chinese_english_dictionary.html?find=%E9%BE%9F

Sometimes, it is just the shape of one or two strokes which is different, as for 黹 (check the 6th and 7th strokes):
ja: https://kakijun.jp/page/chi12200.html
zhTw: https://stroke-order.learningweb.moe.edu.tw/practice.do?lang=en&word=%E9%BB%B9 (same as in ja)
kangxi: https://www.kangxizidian.com/kangxi/1522.gif (same as in ja)
zhHans: https://www.archchinese.com/chinese_english_dictionary.html?find=%E9%BB%B9 (different from ja, zhTw and kangxi).

As a result, the difference between ja, zhHans, and zhHant for a given code point cannot be just a question of stroke order. It is also a question of glyph, even if the glyphs are close to each other in all languages. And sometimes it is also a question of the number of strokes, as for 廴 (with no glyph alteration) or 禸 (with glyph alteration).

You can also compare the glyphs using Noto fonts (warning: there are several Noto font sets). Most of the time, you will see the same kind of differences if you use the correct Noto font set for a given lang code. But there are of course exceptions: for some characters, Noto and the kaisho style give different results. For instance, characters with the 糹/糸 radical, such as 紙, have the same glyph as in Chinese in the kaisho style but a different one in Noto (and for these characters, the stroke order in the ja kaisho style also differs from a "normal" ja style).

In summary, one cannot use the Kangxi glyphs as is in ja or zhHans (even for a traditional character used in a simplified Chinese text). However, one can probably use them as is in zhHant. I am not strong enough in zhHant to be sure of that, but there are certainly small differences between zhHant and zhTw.

@parsimonhi
Owner

Hello,

I just checked some radicals, comparing Kangxi against Noto with zh-hant and against the Taiwanese hanzi (https://stroke-order.learningweb.moe.edu.tw/characters.do?lang=en). There are glyph differences between Kangxi and the other two (which seem to give the same result). Check for instance 禸 (3rd stroke), 骨 (9th and 10th strokes), 雨 (6th and 7th strokes), 舟 (6th stroke), 角 (7th stroke).

Note that Japanese glyphs seem closer to Kangxi than traditional Chinese glyphs!

@hugolpz
Author

hugolpz commented Dec 2, 2021

That's why I suggested having both the radical code points and the character code points in your project.

  • The characters have been slightly modified according to each country's policies over the past 100 (400) years.
  • The radical code points are more anchored in tradition.

While I don't know the details, some items have slight differences in glyph shape and number of strokes between their Kangxi radical code point and their localized character code point.

@parsimonhi
Owner

Hello,

You said:

While i don't know the details, some items have slight differences in glyph shapes and number of strokes between their kangxi radicals points and their character unicode point.

I see what you mean. Interesting viewpoint. And it gives me another idea.

Besides "CJK UNIFIED IDEOGRAPH" (what i am currently using in animCJK) and "CJK RADICAL" (what you suggested), there are also some characters (with different unicode codes) called "CJK COMPATIBILITY IDEOGRAPH" that are designed to show the glyph in another language for a given character. There are already some samples of that in animCJK in the svgsJa. See for instance 勉 (21193.svg) and 勉 (64051.svg) in svgsJa. And for Kangxi radicals, there are some additional characters (with other unicode codes) called "KANGXI RADICAL". See for instance https://en.wiktionary.org/wiki/%E9%BE%9C (龜).

However, it seems like a total mess (I haven't figured out what to do yet)! But perhaps one day we will see in animCJK the radical glyphs as in Kangxi, whatever the language in use, using one of these other code points.
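Telling these encodings apart in code comes down mostly to Unicode block ranges. The hedged sketch below classifies a character by block. It covers only the main BMP blocks and ignores the CJK extensions. It also shows a pitfall relevant to the two 勉 files in svgsJa, 64051.svg (U+FA33, a compatibility ideograph) and 21193.svg (U+52C9, the unified ideograph). Ordinary NFC normalization replaces U+FA33 with U+52C9, so file names derived from normalized text would lose the distinction.

```javascript
// Main BMP blocks relevant to this discussion (CJK extensions are not covered).
const BLOCKS = [
  [0x2e80, 0x2eff, "CJK Radicals Supplement"],
  [0x2f00, 0x2fdf, "Kangxi Radicals"],
  [0x4e00, 0x9fff, "CJK Unified Ideographs"],
  [0xf900, 0xfaff, "CJK Compatibility Ideographs"],
];

// Name of the block containing the first code point of `ch`, or "other".
function blockOf(ch) {
  const cp = ch.codePointAt(0);
  for (const [first, last, name] of BLOCKS) {
    if (cp >= first && cp <= last) return name;
  }
  return "other";
}

const compat = String.fromCodePoint(64051); // U+FA33, as in svgsJa/64051.svg
const unified = String.fromCodePoint(21193); // U+52C9, as in svgsJa/21193.svg

console.log(blockOf(compat)); // CJK Compatibility Ideographs
console.log(blockOf(unified)); // CJK Unified Ideographs
console.log(blockOf(String.fromCodePoint(0x2f00))); // Kangxi Radicals
// NFC replaces the compatibility ideograph with its unified equivalent.
console.log(compat.normalize("NFC").codePointAt(0)); // 21193
```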

@hugolpz
Author

hugolpz commented Dec 26, 2021

@parsimonhi, I think you have a clearer understanding of this issue, and I can't help much with my current understanding. I've been out of this field (not doing proper reading and character analysis) for a decade. It's best that you lead as you see fit. :)
