Remove phrases without polyphonic characters from the phrase pinyin dictionary #23

Closed
hotoo opened this issue Sep 18, 2014 · 7 comments

Comments

@hotoo
Owner

hotoo commented Sep 18, 2014

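One possible optimization: the phrase pinyin dictionary could keep only the phrases that contain polyphonic characters. Phrases without any polyphonic character can be handled by the ordinary character-by-character pinyin conversion.
The workload is fairly large, though, so I hope everyone can help identify the phrases that contain polyphonic characters. :)
#19 #20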
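
To make the proposal concrete, here is a minimal sketch of the lookup order it implies: consult the phrase dictionary first, and fall back to per-character conversion otherwise. The names `CHAR_DICT`, `PHRASE_DICT`, and `convert` are hypothetical, and the tiny dictionaries are illustrative data, not the project's real data files or API.

```python
# Hypothetical sample data (assumption): each character maps to its readings,
# with the most common reading first.
CHAR_DICT = {
    "中": ["zhōng", "zhòng"],
    "国": ["guó"],
    "重": ["zhòng", "chóng"],
    "庆": ["qìng"],
}

# Only phrases that contain a polyphonic character need an entry here.
PHRASE_DICT = {
    "重庆": ["chóng", "qìng"],
}


def convert(phrase):
    """Return one reading per character of `phrase`."""
    # A phrase entry wins, because it fixes the reading of polyphonic characters.
    if phrase in PHRASE_DICT:
        return list(PHRASE_DICT[phrase])
    # Otherwise use the first listed reading of each character;
    # unknown characters are passed through unchanged.
    return [CHAR_DICT[ch][0] if ch in CHAR_DICT else ch for ch in phrase]


print(convert("重庆"))
print(convert("中国"))
```

In this sketch "中国" needs no phrase entry only because the first reading of each of its characters happens to be the right one.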

@TooBug

TooBug commented Dec 12, 2014

I'll give it a try.

@TooBug

TooBug commented Dec 12, 2014

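Could you briefly describe the structure of the tools directory? What does each tool and directory do?

To find the phrases that contain polyphonic characters, should I start directly from dict and generate a new dictionary? (That is, is dict the "source" or a generated result?)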

@hotoo
Owner Author

hotoo commented Dec 12, 2014

👍

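The tools directory is used to scrape character and phrase dictionaries from the web; dict holds the scraped results. See the Makefile for how they are used.

For this requirement, consider writing a tool that automatically extracts all polyphonic characters and then, from those, all phrases that contain a polyphonic character.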
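
The first half of such a tool might look like the sketch below. It assumes a character dictionary that maps each character to a comma-separated string of readings. That format and the name `find_polyphonic_chars` are assumptions for illustration, not the actual layout of the files under dict.

```python
def find_polyphonic_chars(char_dict):
    """Return the set of characters that have more than one distinct reading.

    `char_dict` is assumed to map a character to a comma-separated string
    of readings, e.g. {"中": "zhōng,zhòng"} (hypothetical format).
    """
    polyphonic = set()
    for char, readings in char_dict.items():
        # Split, drop empty fragments, and de-duplicate before counting.
        distinct = {r.strip() for r in readings.split(",") if r.strip()}
        if len(distinct) > 1:
            polyphonic.add(char)
    return polyphonic


# Illustrative data only.
sample = {"中": "zhōng,zhòng", "国": "guó", "行": "xíng,háng", "好": "hǎo,hǎo"}
print(sorted(find_polyphonic_chars(sample)))
```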

@TooBug

TooBug commented Dec 12, 2014

I just read the Makefile and roughly understand it now. I'll try to find time soon to see whether I can put this together.

@devon

devon commented Dec 15, 2014

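The result of an NPM install is large, as much as 15M, whereas the Web version installed through SPM is small. Besides the dictionaries themselves, the NPM install also contains many other files. Which of those files or directories are not needed and can simply be deleted?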

@devon

devon commented Dec 15, 2014

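Phrases without polyphonic characters could probably be found by a program, couldn't they? Build a library of polyphonic characters, then iterate over the phrases and drop the ones that contain no polyphonic character.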
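
A rough sketch of that filtering pass, assuming the set of polyphonic characters has already been built and that the phrase dictionary maps each phrase to its pinyin. The function name `split_phrases` and the sample entries are hypothetical.

```python
def split_phrases(phrase_dict, polyphonic_chars):
    """Partition a phrase dictionary by whether a phrase has a polyphonic character.

    Returns (kept, dropped): `kept` is a dict of the phrases that contain at
    least one polyphonic character, `dropped` is a list of the other phrases.
    """
    kept = {}
    dropped = []
    for phrase, pinyin in phrase_dict.items():
        if any(ch in polyphonic_chars for ch in phrase):
            kept[phrase] = pinyin
        else:
            dropped.append(phrase)
    return kept, dropped


# Illustrative data only.
polyphonic_chars = {"重", "行"}
phrases = {"重庆": "chóng qìng", "银行": "yín háng", "你好": "nǐ hǎo"}
kept, dropped = split_phrases(phrases, polyphonic_chars)
print(kept)
print(dropped)
```

Comparing `len(kept)` with the size of the full dictionary shows how much space the pruning would actually save.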

@hotoo
Owner Author

hotoo commented Aug 3, 2015

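Judging from mozillazg/python-pinyin#12 and mozillazg/python-pinyin@299a283, the vast majority of phrases contain polyphonic characters, so this optimization does not seem very necessary.

Looking forward to luckykaiyi/nodejieba#29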

@hotoo hotoo closed this as completed Aug 3, 2015
luzhen328 pushed a commit to luzhen328/pinyin.js that referenced this issue May 5, 2019
3 participants