Expand the dictionary #10

Closed
laubonghaudoi opened this issue Oct 29, 2019 · 3 comments

Comments

@laubonghaudoi
Member

laubonghaudoi commented Oct 29, 2019

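I previously found this repository: https://github.com/ziloeng/rime-jyut6ping3
It contains a rich collection of Cantonese vocabulary, and I have obtained the author's permission to add its data to our code table. So my next step is to integrate this vocabulary. The repository contains 5 dictionary files: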

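  1. jyut6ping3.dict.yaml: a single-character pronunciation table, e.g. Unihan's kCantonese. We have already handled this part, so it can be ignored.
  2. jyut6ping3.dict.yaml: a small emoji code table; can be ignored.
  3. jyut6ping3.vocabulary.dict.yaml: a large Cantonese vocabulary. About 11,000 entries are annotated with Jyutping; the remaining 90,000-odd entries are bare phrases without Jyutping. This is the part we need to focus on.
  4. jyut6ping3.vocabulary.emoji.dict.yaml: 3,000-odd Cantonese vocabulary entries without Jyutping; these can be added too (I am not sure how they differ from the above, or why they were split out).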
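
To make the split in item 3 concrete, here is a minimal sketch of separating entries that carry Jyutping from bare phrases. It assumes the usual Rime `*.dict.yaml` layout (a YAML header closed by a `...` line, then tab-separated `text`, `code`, `weight` columns in the default order). The function name `split_by_jyutping` and the sample entries are hypothetical and are not taken from either repository.

```python
from typing import Iterable, List, Tuple


def split_by_jyutping(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split Rime dictionary body lines into (with Jyutping, without Jyutping)."""
    with_code: List[str] = []
    without_code: List[str] = []
    in_body = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_body:
            # The YAML header ends at a line containing only "..."
            if line.strip() == "...":
                in_body = True
            continue
        # Skip blank lines and comments in the entry section
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        # Assumption: default column order, so column 2 holds the Jyutping code
        if len(columns) >= 2 and columns[1].strip():
            with_code.append(line)
        else:
            without_code.append(line)
    return with_code, without_code


if __name__ == "__main__":
    # Hypothetical sample data for illustration only
    sample = [
        "---\n",
        "name: example.vocabulary\n",
        'version: "0.1"\n',
        "...\n",
        "\n",
        "廣東話\tgwong2 dung1 waa2\n",
        "粵語\n",
        "香港\thoeng1 gong2\t100\n",
    ]
    annotated, bare = split_by_jyutping(sample)
    print(len(annotated), "annotated;", len(bare), "bare")
```

The bare phrases could then be reviewed separately, since they are the ones that would need Jyutping assigned.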

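So I plan to add this vocabulary first. One problem is that there are too many entries to check by hand in one pass, so I suggest putting them in a separate file, jyut6ping3.vocabulary.dict.yaml, together with the 20,000 entries that @leimaau added earlier in commit bd8349b, and then revising and maintaining that file as feedback comes in. Does that sound good?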
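
As a rough illustration of the consolidation proposed here, the sketch below merges entry lines from several sources into one list, drops exact duplicates of the same word and Jyutping, and renders a Rime-style dictionary file. The helper names `merge_entries` and `render_dict`, the version string, and the `sort: by_weight` header value are assumptions for illustration, not the project's actual tooling or settings.

```python
from typing import Iterable, List, Set, Tuple


def merge_entries(*sources: Iterable[str]) -> List[str]:
    """Merge tab-separated entry lines, keeping the first copy of each (text, code)."""
    seen: Set[Tuple[str, str]] = set()
    merged: List[str] = []
    for source in sources:
        for raw in source:
            line = raw.rstrip("\n")
            # Skip blank lines and comments
            if not line.strip() or line.startswith("#"):
                continue
            columns = line.split("\t")
            text = columns[0]
            # An entry without a second column has no Jyutping yet
            code = columns[1].strip() if len(columns) > 1 else ""
            key = (text, code)
            if key in seen:
                continue
            seen.add(key)
            merged.append(line)
    return merged


def render_dict(name: str, version: str, entries: List[str]) -> str:
    """Render a YAML header followed by the entry lines."""
    # Assumption: a minimal header; a real dictionary may need more fields
    header = f'---\nname: {name}\nversion: "{version}"\nsort: by_weight\n...\n\n'
    return header + "\n".join(entries) + "\n"


if __name__ == "__main__":
    # Hypothetical entries standing in for the two vocabulary sources
    imported = ["粵語\tjyut6 jyu5\n", "香港\n"]
    existing = ["粵語\tjyut6 jyu5\n", "香港\thoeng1 gong2\n"]
    merged = merge_entries(imported, existing)
    print(render_dict("jyut6ping3.vocabulary", "0.1", merged))
```

Keeping the first occurrence means whichever source is passed first takes precedence for weights, which is one possible policy among several.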

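One last point: if we add all of this vocabulary, we may be able to turn off the setting that uses the bundled 八股文 (essay) vocabulary, because the vocabulary here is already large enough and doing so would avoid producing some Mandarin words. Of course, we will only know once we try it.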

@chaaklau
Collaborator

chaaklau commented Oct 30, 2019

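  • jyut6ping3.dict.yaml: a small emoji code table; can be ignored.

(Is the file name a typo?) Emoji are very useful for promoting the input method, and in the long run they should be expanded substantially :) If we have the capacity, every emoji should get eight to ten related words.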

@leimaau
Collaborator

leimaau commented Oct 30, 2019

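@laubonghaudoi Splitting into two dictionaries is fine as long as they can be installed automatically on the user's side; if users still have to install two dictionaries and OpenCC by hand, it is a hassle. As for the remaining 90,000 words in jyut6ping3.vocabulary.dict.yaml: if they are not Cantonese words, most of them are Mandarin words, and those are already included in 八股文 and the simplified-character 八股文, so they probably do not need Jyutping assigned.

@chaaklau Emoji can be added through OpenCC_Emoji, in the same way as Traditional/Simplified conversion; this can be added in the next update.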

laubonghaudoi reopened this Oct 30, 2019
@laubonghaudoi
Member Author

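  • jyut6ping3.dict.yaml: a small emoji code table; can be ignored.

(Is the file name a typo?) Emoji are very useful for promoting the input method, and in the long run they should be expanded substantially :) If we have the capacity, every emoji should get eight to ten related words.

Emoji support can be installed separately, using this repository: https://github.com/rime/rime-emoji

Specifically, run the command below and then redeploy, and you will be able to type emoji.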

bash rime-install emoji:customize:schema=jyut6ping3
