Commit
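2.dict: add supplementary vocabulary sourced from the Open Cantonese Dictionary (開放粵語詞典) and CC-Canto, excluding entries already present in 八股文. Many homophone substitute characters in the Open Cantonese Dictionary entries have been restored to their original characters (本字). The CC-Canto data is written by internet users and may contain rare or substitute characters; these have been repaired as far as possible.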
bd8349b
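The twenty-thousand-odd entries newly added from the Open Cantonese Dictionary are all placed at the end of the code table. Is that because proofreading isn't finished yet? Once it is, it would be best to merge them with the existing entries and re-sort everything, which would make the table easier to manage.
Also, I noticed that many entries from the Open Cantonese Dictionary seem rather odd, for example
丫挺 aa1 ting5
I have no idea what that is. One more thing: are these entries the same as the entries without pronunciation annotations at the very end of the current rime-jyutping toneless code table?
bd8349b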
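There are far too many entries to proofread perfectly. I can only proofread as best I can, fixing whichever entries I notice are problematic, and the proofreading may never be finished. Merging them with the rest would also be a lot of trouble, so I put them at the very end of the code table.
From the Open Cantonese Dictionary I took only the more useful and clearer entries, so this is a selection of its main part plus the CC-Canto vocabulary. I did not consult the entries at the end of the rime-jyutping toneless code table, so the two sets may contain both overlapping and non-overlapping entries.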
bd8349b
OK. When these entries were added, were they deduplicated against the earlier entries? There could be a lot of duplicates.
bd8349b
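I have already deduplicated them using a database, so there should be no duplicates, unless an extra entry was inadvertently added by hand earlier in the first half (the LSHK word list part); that is less likely, though.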
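For anyone who wants to re-check this without a database, here is a minimal sketch of the same idea in plain Python. It is not the procedure actually used for this commit: it assumes entries are tab-separated `word<TAB>code` lines, treats two entries as duplicates only when both word and code match, and the function names `parse_entries` and `drop_duplicates` as well as the sample entries are hypothetical.

```python
def parse_entries(lines):
    """Parse tab-separated `word<TAB>code` lines, skipping blanks and # comments."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no code column, so there is nothing to compare against
        entries.append((fields[0], fields[1]))
    return entries


def drop_duplicates(existing, additions):
    """Return the additions whose (word, code) pair is not already known, in order."""
    seen = set(existing)
    unique = []
    for entry in additions:
        if entry not in seen:
            seen.add(entry)  # also catches repeats inside the additions themselves
            unique.append(entry)
    return unique


# Illustrative sample data only, not taken from the real code table.
existing = parse_entries(["你好\tnei5 hou2", "多謝\tdo1 ze6"])
additions = parse_entries(["多謝\tdo1 ze6", "丫挺\taa1 ting5", "丫挺\taa1 ting5"])
result = drop_duplicates(existing, additions)
print(result)
```

Comparing on the (word, code) pair rather than on the word alone keeps legitimate entries that share a written form but differ in pronunciation; whether that is the right key for this table is a judgment call.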
bd8349b
「丫挺: Beijing dialect, a vulgarism. It is a slurred contraction of "丫头养的".」
bd8349b
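Actually, I have been wondering whether it is necessary to separate the Northern Chinese words that seem to be in there from the Cantonese words, but that looks like too much trouble.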