Which statistics does the code use? #5
Comments
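It has been a while, and I no longer remember that paper's terminology very well. As far as I recall there were only two things: the frequency of each word, and the left and right entropy. Once left and right entropy are taken into account, "我的" ("my") is not a word, because it is a fairly random combination and not a fixed expression in everyday use. How is the cohesion statistic you mentioned computed? I would like to check whether it matches what I implemented in the code.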
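For reference, the left and right entropy discussed in this thread can be sketched roughly as below. This is a minimal illustration and not the code from this repository: the function name `neighbor_entropy` and the choice to scan a single raw string are assumptions made for the example. It collects the characters that appear immediately before and after each occurrence of a candidate and takes the Shannon entropy (natural log) of each distribution.

```python
import math
from collections import Counter


def neighbor_entropy(text, candidate):
    """Return (left_entropy, right_entropy) of `candidate` in `text`.

    Hypothetical helper for illustration; not taken from the project.
    """
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left[text[start - 1]] += 1   # character just before the match
        if end < len(text):
            right[text[end]] += 1        # character just after the match
        start = text.find(candidate, start + 1)

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    return entropy(left), entropy(right)


# "ab" is always preceded by "x" but followed by either "y" or "z".
left_h, right_h = neighbor_entropy("xabyxabz", "ab")
print(left_h, right_h)
```

A candidate whose smaller entropy is low is always stuck to the same neighbour on that side, which suggests it is a fragment of a longer word and not a word on its own.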
Cohesion (凝固度) also comes from that Matrix67 article; it is computed from the joint probability.
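One common way to turn "computed from the joint probability" into code is sketched below. It rests on assumptions that are not confirmed in this thread: probabilities are estimated as occurrence count divided by text length, and the score is the minimum, over every two-way split of the candidate, of p(whole) / (p(left part) × p(right part)). The names `count_occurrences` and `cohesion` are hypothetical.

```python
def count_occurrences(text, s):
    """Count occurrences of `s` in `text`, allowing overlaps."""
    count, start = 0, text.find(s)
    while start != -1:
        count += 1
        start = text.find(s, start + 1)
    return count


def cohesion(text, candidate):
    """Minimum ratio p(whole) / (p(left) * p(right)) over all splits.

    Illustrative sketch under the assumptions stated above.
    """
    if len(candidate) < 2:
        raise ValueError("candidate must have at least two characters")
    n = len(text)
    p_whole = count_occurrences(text, candidate) / n
    if p_whole == 0:
        return 0.0  # candidate never occurs in the text
    ratios = []
    for i in range(1, len(candidate)):
        p_left = count_occurrences(text, candidate[:i]) / n
        p_right = count_occurrences(text, candidate[i:]) / n
        ratios.append(p_whole / (p_left * p_right))
    return min(ratios)


print(cohesion("abababxy", "ab"))
print(cohesion("abababxy", "aba"))
```

A ratio well above 1 means the parts occur together far more often than independence would predict. Taking the minimum over splits keeps a candidate from scoring well just because one particular split happens to look strong.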
hi jannson
Thanks!
Question
hi jannson
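In the README of another project you mention that this project's algorithm comes from the Matrix67 article. After reading your code, however, I found that the statistic you rely on is mainly left and right adjacency entropy, and I did not see cohesion being used. So I added a cohesion statistic on top of it, which filters out some more "pseudo new words". Even so, on some corpora the tool's ability to discover new words is still not very good.
So, which statistics does your code mainly use? Based on your experience, which directions would you suggest for further improvement? Thanks!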