Problem with jieba.load_userdict() #399

Open
super1-chen opened this issue Sep 29, 2016 · 4 comments

Comments

@super1-chen

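I built my own dictionaries following the documented format. For business reasons I created two of them, a.txt and b.txt, which contain the data from two database tables respectively. In my own testing, loading the two files as shown below does not seem to make both take effect at the same time: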

import jieba
jieba.load_userdict('a.txt')
jieba.load_userdict('b.txt')

Can jieba load two user dictionaries at the same time?
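
For reference, here is a minimal sketch of two ways to get several dictionaries loaded. `jieba.load_userdict` takes one file per call, so the straightforward route is one call per file; each call should add to the entries already loaded rather than replace them, but treat that as an assumption to verify on your jieba version. The alternative is to merge the files first and load the result once. The helper names `merge_userdicts` and `load_all` are hypothetical, and the files are assumed to be UTF-8.

```python
def merge_userdicts(paths, out_path):
    """Concatenate several user dictionary files into one file.

    Blank lines are dropped and exact duplicate lines are written once.
    (Hypothetical helper, not part of jieba.)
    """
    seen = set()
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    entry = line.strip()
                    if entry and entry not in seen:
                        seen.add(entry)
                        out.write(entry + "\n")
    return out_path


def load_all(paths):
    """Call jieba.load_userdict once per file (hypothetical helper)."""
    import jieba  # third-party package, imported lazily: pip install jieba

    for path in paths:
        jieba.load_userdict(path)
```

Usage would be either `load_all(['a.txt', 'b.txt'])`, or `merge_userdicts(['a.txt', 'b.txt'], 'merged.txt')` followed by a single `jieba.load_userdict('merged.txt')`.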

@super1-chen
Author

Today I ran into another problem. My Flask web app uses jieba's custom dictionary loading, and I start it with the following command:

gunicorn -w 4 -p gevent -b 0.0.0.0:9999 --reload run:app
jieba then keeps printing the messages below. I think this is caused by gunicorn starting multiple threads. How can I solve this?

loading model from cache /tmp/jieba.cache
loading model cost 2.44023799896 seconds.
Trie has been built succesfully.
[2016-09-29 17:05:37 +0000] [32528] [INFO] Booting worker with pid: 32528
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache
loading model cost 2.28571200371 seconds.
Trie has been built succesfully.
[2016-09-29 17:06:06 +0000] [32556] [INFO] Booting worker with pid: 32556
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache
loading model cost 2.27150511742 seconds.
Trie has been built succesfully.
[2016-09-29 17:06:10 +0000] [32560] [INFO] Booting worker with pid: 32560
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache

@fxsjy
Owner

fxsjy commented Sep 30, 2016

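gunicorn forks multiple processes, but jieba loads its dictionary lazily. You can call jieba.initialize() right after import jieba. That way the dictionary will not be loaded multiple times.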
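
The effect described above can be illustrated with a toy model. This is only a sketch: `LazyDictionary`, `simulate_workers`, and `LOAD_LOG` are hypothetical names, not jieba's code, and a deep copy stands in for what a forked worker inherits from its parent. It shows why loading once before the fork avoids repeated loads in the workers.

```python
import copy

LOAD_LOG = []  # records each expensive load, for illustration only


class LazyDictionary:
    """Hypothetical stand-in for a lazily loaded dictionary (not jieba's code)."""

    def __init__(self, name):
        self.name = name
        self.words = None  # nothing is loaded until first use

    def initialize(self):
        # Perform the expensive load only if it has not happened yet.
        if self.words is None:
            LOAD_LOG.append(self.name)
            self.words = {"example"}

    def lookup(self, word):
        self.initialize()  # lazy: load on first use
        return word in self.words


def simulate_workers(parent, count):
    """Mimic fork(): each worker starts with a copy of the parent's state."""
    workers = [copy.deepcopy(parent) for _ in range(count)]
    for worker in workers:
        worker.lookup("example")
    return workers


# Lazy parent: nothing is loaded before the copy, so every worker loads.
LOAD_LOG.clear()
simulate_workers(LazyDictionary("lazy"), 4)
lazy_loads = len(LOAD_LOG)

# Eager parent: load once before copying; workers inherit the result.
LOAD_LOG.clear()
eager = LazyDictionary("eager")
eager.initialize()
simulate_workers(eager, 4)
eager_loads = len(LOAD_LOG)

print(lazy_loads, eager_loads)  # prints "4 1"
```

One caveat, stated as an assumption about the deployment rather than something confirmed in this thread: the workers only inherit the loaded dictionary if the module that calls jieba.initialize() is imported in the gunicorn master before it forks, which is what gunicorn's `--preload` option is for. Without it, each worker imports the application on its own and still loads the dictionary once per worker.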

@natsuapo

natsuapo commented Oct 3, 2016

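This is also a problem with loading a jieba dictionary. I added a word to my dictionary with its parameters, for example: 萌萌哒 50 a, but segmenting with posseg gives 萌萌哒 x. Is this a version problem, or is some other setting involved?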
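
The entry quoted above follows the word, frequency, part-of-speech layout separated by spaces. When a tag does not come through, it can help to first confirm that each line of the dictionary file really splits into those fields. The sketch below is a stdlib-only check under that assumption; `parse_userdict_line` is a hypothetical helper and not jieba's own parser, so it says nothing about how a particular jieba version treats the tag.

```python
def parse_userdict_line(line):
    """Split one dictionary entry into (word, freq, tag).

    freq and tag are None when the entry omits them.
    Returns None for a blank line. (Hypothetical helper.)
    """
    fields = line.strip().split()
    if not fields:
        return None
    word, freq, tag = fields[0], None, None
    for field in fields[1:]:
        if freq is None and field.isdigit():
            freq = int(field)   # first numeric field is the frequency
        elif tag is None:
            tag = field         # first non-numeric field is the tag
    return word, freq, tag


entry = parse_userdict_line("萌萌哒 50 a")
print(entry)
```

Running this over every line of the file and listing the entries whose tag is None is a quick way to rule out a formatting problem before looking at versions or settings.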

@super1-chen
Author

@fxsjy
The actual code uses these parts:
import jieba
jieba.initialize()  # build the main dictionary eagerly

import os
if os.path.exists('cbi360.txt'):
    jieba.load_userdict('cbi360.txt')  # add the custom dictionary
import jieba.posseg as peg

Here cbi360.txt is my own dictionary, and I also use jieba.posseg. What is the correct order for these steps?
