update schema and dictionary #2

Merged: 2 commits merged on Oct 1, 2019

@leimaau (Collaborator) commented on Oct 1, 2019

1. Updated the schema: added reverse lookup by five-stroke (五筆畫) input and revised overly long words.

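2. Updated the dict: added the words from the Jyutping word list (粵拼詞表) data that were missing (recorded in the log file) and corrected them. Onomatopoeia, single-character readings, and words containing English were not included.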

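3. For words with more than one pronunciation, I suggest setting the recommended reading according to the standards of 《中英對照香港學校中文學習基礎字詞》 and 《香港小學學習字詞表》 (hereafter "the word list").
For example, the LSHK Jyutping word list gives four readings for 「刻不容緩」: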

刻不容緩 haak1 bat1 jung4 wun4
刻不容緩 haak1 bat1 jung4 wun6
刻不容緩 hak1 bat1 jung4 wun4
刻不容緩 hak1 bat1 jung4 wun6

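Under the character 「刻」, the word list recommends 「刻不容緩 hak1 bat1 jung4 wun6」, so the frequencies of the other readings can be lowered; I set the entries with wun4 to 5%. Likewise, under the character 「雕」, the word list gives the compound 「雕刻」 the reading tiu1 hak1 (marked as common), which I have added. Other words can be adjusted in the same way.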

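4. For single-character readings, my idea is to add character frequencies so that cases like 「打 daa1」 being ranked ahead of 「打 daa2」 are avoided. The frequency could be set by counting how many of the five reference sources list a given reading of a character. For example, 「購」 has two readings, kau3 and gau3: kau3 appears in 5 of 5 sources and gau3 in 4 of 5, so kau3 gets a higher frequency than gau3. If the LSHK character pronunciation list marks a reading with a flag such as *, its frequency could be raised further. The specific principles and details are open for further discussion.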
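
A minimal sketch of the weighting described in point 3, assuming the dict uses Rime's tab-separated `text<TAB>code<TAB>weight` line layout and that the weight column accepts a percentage such as `5%` (as the 5% figure above implies). The helper `weighted_lines` and its `demote` predicate are hypothetical, for illustration only, and not part of the repository; here the predicate mirrors the choice above of lowering every reading that ends in wun4.

```python
# Hypothetical helper: emit dict lines for one word, giving a reduced
# weight to readings the word list does not recommend.
# Assumes Rime's tab-separated "text<TAB>code<TAB>weight" layout and that
# a percentage such as "5%" is accepted in the weight column.
from typing import Callable, List


def weighted_lines(word: str, readings: List[str],
                   demote: Callable[[str], bool], low: str = "5%") -> List[str]:
    """Return one dict line per reading; demoted readings get the low weight."""
    lines = []
    for reading in readings:
        if demote(reading):
            lines.append(f"{word}\t{reading}\t{low}")
        else:
            # No explicit weight column is written for this reading.
            lines.append(f"{word}\t{reading}")
    return lines


if __name__ == "__main__":
    readings = [
        "haak1 bat1 jung4 wun4",
        "haak1 bat1 jung4 wun6",
        "hak1 bat1 jung4 wun4",
        "hak1 bat1 jung4 wun6",
    ]
    # Lower every reading whose last syllable is wun4, as in point 3.
    for line in weighted_lines("刻不容緩", readings,
                               lambda r: r.split()[-1] == "wun4"):
        print(line)
```

The predicate is kept separate from the line formatting so that other words can be adjusted the same way with their own rule.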

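The counting rule in point 4 can be sketched as follows. The five sources are not named here, so each one is modelled simply as the set of readings it gives for a single character. The functions `reading_scores` and `ranked`, the size of the bonus for a reading flagged with * in the LSHK list, and the alphabetical tie-break are all hypothetical choices, not a settled rule.

```python
# Hypothetical sketch of point 4: rank the readings of one character by how
# many reference sources list them, with an optional bonus for readings
# flagged with "*" in the LSHK character list. The bonus size is an assumption.
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def reading_scores(sources: List[Set[str]],
                   starred: FrozenSet[str] = frozenset(),
                   bonus: int = 1) -> Dict[str, int]:
    """Count in how many sources each reading appears, plus any star bonus."""
    # Each source is a set, so a reading counts at most once per source.
    counts = Counter(reading for source in sources for reading in source)
    return {r: n + (bonus if r in starred else 0) for r, n in counts.items()}


def ranked(sources: List[Set[str]],
           starred: FrozenSet[str] = frozenset()) -> List[str]:
    """Readings ordered from highest to lowest score (ties: alphabetical)."""
    scores = reading_scores(sources, starred)
    return sorted(scores, key=lambda r: (-scores[r], r))


if __name__ == "__main__":
    # 「購」: kau3 is listed by all five sources, gau3 by four of them.
    sources = [{"kau3", "gau3"}] * 4 + [{"kau3"}]
    print(ranked(sources))  # ['kau3', 'gau3']
```

How these scores would map onto actual dict weights is left open, in line with the note that the details are still to be discussed.
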
@laubonghaudoi merged commit 9d8d71b into rime:master on Oct 1, 2019
@laubonghaudoi mentioned this pull request on Oct 2, 2019
leimaau added a commit that referenced this pull request on Oct 3, 2019
hfhchan added a commit that referenced this pull request on Jun 17, 2021