This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

Question about word-frequency counting #70

Open
zjw271208550 opened this issue May 24, 2021 · 0 comments

Comments

@zjw271208550

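Hello, I have a question about get_ngram_freq_info in ngram_utils:
Why is the check of whether a word's frequency exceeds min_freq performed inside _process_corpus_chunk?
Suppose there are 10 chunks and each one contains exactly one occurrence of X. Then X is never counted, even with min_freq set to 2, although it occurs 10 times across the corpus.
Is min_freq only applied to the frequency counts of the current chunk? Shouldn't it apply to the whole corpus?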
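
To make the concern concrete, here is a minimal sketch of the two filtering strategies. It is not the actual ngram_utils code: the function names `filter_per_chunk` and `filter_after_merge` are hypothetical, plain word lists stand in for corpus chunks, and the `>=` comparison is an assumption (the library may use a strict `>`).

```python
from collections import Counter

def filter_per_chunk(chunks, min_freq):
    """Apply min_freq inside each chunk, then merge (the behaviour questioned above)."""
    total = Counter()
    for chunk in chunks:
        local = Counter(chunk)
        # Words below min_freq in THIS chunk are dropped before merging.
        total.update({w: c for w, c in local.items() if c >= min_freq})
    return total

def filter_after_merge(chunks, min_freq):
    """Merge all chunk counts first, then apply min_freq to the corpus-wide totals."""
    total = Counter()
    for chunk in chunks:
        total.update(Counter(chunk))
    return Counter({w: c for w, c in total.items() if c >= min_freq})

# Ten chunks, each containing one "X" and two "a".
chunks = [["X", "a", "a"] for _ in range(10)]

per_chunk = filter_per_chunk(chunks, min_freq=2)
merged = filter_after_merge(chunks, min_freq=2)

print("X" in per_chunk)  # False: X never reaches min_freq inside a single chunk
print(merged["X"])       # 10: X passes once counts are summed over the corpus
```

Filtering per chunk keeps each worker's intermediate result small, which may be why it is done there, but it silently drops words that are rare within every chunk yet frequent overall. Filtering after the merge avoids that at the cost of holding the full counts in memory.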
