This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

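Can new-word discovery be used to find long phrases? For example: 生物医药板块 (biopharmaceutical sector), 新冠疫苗板块 (COVID-19 vaccine sector) #56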

Open
zsp042 opened this issue Jul 20, 2020 · 2 comments

Comments

@zsp042

zsp042 commented Jul 20, 2020

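Can new-word discovery be used to find long phrases? For example: 生物医药板块 (biopharmaceutical sector), 新冠疫苗板块 (COVID-19 vaccine sector).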

@ZXR-v2

ZXR-v2 commented Aug 21, 2020

Same question!

@victorzhrn
Collaborator

def extract_phrase(corpus,
                   top_k: float = 200,
                   chunk_size: int = 1000000,
                   min_n: int = 2,
                   max_n: int = 4,
                   min_freq: int = 5):

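In the source code there are actually two parameters, min_n and max_n, which set the minimum and maximum character length of the extracted terms. The cases mentioned in the issue, 生物医药板块 and 新冠疫苗板块, are each 6 characters long, so the default max_n=4 excludes them. If you specifically want 6-character phrases, try setting min_n=6 and max_n=6 directly.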
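
To illustrate why the length bounds matter, here is a minimal sketch of length-bounded candidate generation. It is not the library's implementation: `candidate_ngrams` is a hypothetical helper written for this example, it only counts raw character n-grams, and it does no scoring and does not split on punctuation. The parameter names min_n, max_n and min_freq are borrowed from the signature above purely for readability.

```python
from collections import Counter


def candidate_ngrams(text, min_n=2, max_n=4, min_freq=1):
    """Hypothetical helper: count character n-grams with min_n <= n <= max_n.

    Only n-grams that occur at least min_freq times are returned.
    """
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the text.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


text = "生物医药板块上涨，新冠疫苗板块上涨，生物医药板块回调"

# With lengths 2..4, a 6-character phrase can never be a candidate.
default_candidates = candidate_ngrams(text)
print("生物医药板块" in default_candidates)  # False

# Restricting the window to exactly 6 characters makes it a candidate.
six_char = candidate_ngrams(text, min_n=6, max_n=6, min_freq=2)
print(six_char)  # {'生物医药板块': 2}
```

The same reasoning suggests that a range such as min_n=2, max_n=6 would keep shorter words as candidates alongside the longer phrases, at the cost of a larger candidate set.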
