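1. Update the schema: add stroke-input (五筆畫) reverse lookup and revise overly long words.
2. Update the dict: add the words that were missing from the Jyutping word list data (see the log file) and correct them. Onomatopoeia, single-character readings, and words mixed with English are not included.
3. For words with more than one pronunciation, I suggest setting the recommended reading according to the norms of 中英對照香港學校中文學習基礎字詞 and 香港小學學習字詞表.
For example, 「刻不容緩」 has four readings in the LSHK Jyutping word list.
The entry for 「刻」 in the word list gives the recommended reading 「刻不容緩 hak1 bat1 jung4 wun6」, so the frequency of the other readings can be lowered; I set the entries containing wun4 to 5%. Similarly, under 「雕」 in the word list, the compound 「雕刻」 has the reading tiu1 hak1 (common), which I have added. Other words can be adjusted in the same way.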
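The demotion described in item 3 could be sketched as follows. This is only a minimal sketch, assuming dictionary entries are tab-separated `word`, `reading`, `weight` lines and that a percentage weight such as `5%` is accepted there. The helper name `weight_readings` and the `wun4` variant spelled out in the example are illustrative assumptions, not values taken from the LSHK data.

```python
def weight_readings(word, readings, demoted_syllable, demoted_weight="5%"):
    """Build dictionary lines for one word.

    Readings that contain `demoted_syllable` get `demoted_weight`;
    all other readings are emitted without an explicit weight.
    """
    lines = []
    for reading in readings:
        if demoted_syllable in reading.split():
            lines.append(f"{word}\t{reading}\t{demoted_weight}")
        else:
            lines.append(f"{word}\t{reading}")
    return lines


# Hypothetical input: the recommended reading plus one assumed wun4 variant.
example = weight_readings(
    "刻不容緩",
    ["hak1 bat1 jung4 wun6", "hak1 bat1 jung4 wun4"],
    demoted_syllable="wun4",
)
for line in example:
    print(line)
```

Matching on whole syllables (after `split()`) rather than on substrings avoids demoting a reading just because another syllable happens to contain the same letters.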
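4. For single-character readings, my idea is to add character frequencies, to avoid cases such as 「打 daa1」 being ranked ahead of 「打 daa2」. The frequency can be set by counting in how many of the five sources a given reading of a character appears. For example, 「購」 has two readings, kau3 and gau3; kau3 appears in 5/5 sources and gau3 in 4/5, so kau3 gets a higher frequency than gau3. If the LSHK character pronunciation table marks a reading with a symbol such as *, its frequency can be raised further. The specific principles and details can be studied and discussed further.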
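The counting idea in item 4 might look like the sketch below. It assumes each of the five sources has already been reduced to a collection of `(character, reading)` pairs; the function names, the `starred` set, and the `star_bonus` value are hypothetical, since the exact weighting is left open for discussion.

```python
from collections import Counter


def reading_scores(sources, starred=frozenset(), star_bonus=1):
    """Score each (character, reading) pair by the share of sources listing it.

    `starred` holds pairs carrying a mark such as * in the LSHK table;
    each gets `star_bonus` extra counts (an assumed rule, not a fixed one).
    """
    counts = Counter()
    for entries in sources:
        for pair in set(entries):  # count each pair once per source
            counts[pair] += 1
    total = len(sources)
    return {
        pair: (n + (star_bonus if pair in starred else 0)) / total
        for pair, n in counts.items()
    }


def rank_readings(scores, char):
    """Return the readings of `char`, highest score first."""
    readings = [(r, s) for (c, r), s in scores.items() if c == char]
    return [r for r, _ in sorted(readings, key=lambda item: -item[1])]


# Five placeholder sources reproducing the 5/5 versus 4/5 case for 「購」.
sources = [
    [("購", "kau3"), ("購", "gau3")],
    [("購", "kau3"), ("購", "gau3")],
    [("購", "kau3"), ("購", "gau3")],
    [("購", "kau3"), ("購", "gau3")],
    [("購", "kau3")],
]
scores = reading_scores(sources)
print(rank_readings(scores, "購"))
```

Dividing by the number of sources keeps the scores comparable if a sixth source is added later; whether the bonus for marked readings should be additive or multiplicative is one of the details still to be decided.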