pyhanlp multiprocessing problem #1645

Closed
1 task done
luoqishuai opened this issue Apr 21, 2021 · 3 comments

luoqishuai commented Apr 21, 2021

Describe the bug
pyhanlp misbehaves under multiprocessing. It cannot fully use the CPU, and the code appears to stop or "hang".

Code to reproduce the issue

!pip3 install pyhanlp
from multiprocessing import Pool
from tqdm import tqdm
from pyhanlp import HanLP
print(HanLP.segment('hello'))

def test_process(tmp: int):
    for i in range(10000):
        HanLP.segment("商品和服务")

pool_ = Pool(2)
result = pool_.map(test_process, tqdm(range(10)))
pool_.close()
pool_.join()
# print(result)
print('END')

Describe the current behavior
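The same multiprocessing code works without problems when only the segmentation call is swapped for another segmenter:
HanLP.segment -> jieba.cut
With HanLP, however, CPU usage sits at around 130% while it runs (the machine has two E5-2620 v4 CPUs, each with 8 cores and 16 threads, and 30 GB of free memory).
I can't tell whether it is really hung or just slow.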

Expected behavior
I want to run HanLP segmentation at high speed across multiple processes.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.6.1810 (Core)
  • Python version: 3.6.5
  • HanLP version:
    hanlp 2.1.0a36
    hanlp-common 0.0.6
    hanlp-downloader 0.0.20
    hanlp-trie 0.0.2
    pyhanlp 0.1.77

Other info / logs
This is the result of running it on
https://play.hanlp.ml/run/hanlp-zh

CPU usage is still 0; I waited a long time and it just kept spinning.
Screenshot link:
https://sm.ms/image/acSxlnBwpG49ehJ

The code is adapted from #1625.

A possibly related question I found online:
https://bbs.hankcs.com/t/topic/2128

  • I've completed this form and searched the web for solutions.

zsweet commented Apr 21, 2021

I've run into the same problem. Roughly when will it be fixed?

hankcs (Owner) commented Apr 21, 2021

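This is because there is only one JVM, and it cannot be called from multiple Python processes.

The correct approach is a Java thread pool, which takes advantage of Java's true multithreaded concurrency and easily outperforms Python.

Example: https://github.com/hankcs/pyhanlp/blob/master/tests/demos/demo_multi_thread.py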
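
The linked demo is the authoritative example and is not reproduced here. As a rough sketch of the general idea (stay in one process, and therefore one JVM, and parallelise with threads instead of `multiprocessing.Pool`), the snippet below uses Python's `ThreadPoolExecutor`. That is a substitute for the Java thread pool recommended above, not the same thing. `placeholder_segment` and `segment_all` are hypothetical names introduced for illustration. Passing `HanLP.segment` as the `segment` argument is an assumption: whether Python threads actually run the Java calls in parallel depends on the Python-to-Java bridge releasing the GIL and on the segmenter being thread-safe, and neither is confirmed in this thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def placeholder_segment(text: str) -> List[str]:
    """Hypothetical stand-in for a JVM-backed call such as HanLP.segment.

    It only splits on whitespace, so the sketch runs without a JVM.
    """
    return text.split()


def segment_all(
    texts: Sequence[str],
    segment: Callable[[str], List[str]] = placeholder_segment,
    workers: int = 4,
) -> List[List[str]]:
    """Segment every text inside ONE process, fanning out over threads."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(segment, texts))


if __name__ == "__main__":
    print(segment_all(["hello world", "foo bar baz"]))
```

To try it with pyhanlp, one would pass the real segmenter, for example `segment_all(texts, segment=HanLP.segment)`, and compare timings against a plain loop before relying on it.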

hankcs closed this as completed Apr 21, 2021
hankcs added question and removed bug labels Apr 21, 2021
hanlpbot (Collaborator) commented

This issue has been mentioned on Butterfly Effect. There might be relevant details there:

https://bbs.hankcs.com/t/topic/2128/4
