Negative cost #64

Closed
kota7 opened this issue Nov 22, 2019 · 2 comments

kota7 commented Nov 22, 2019

First of all, thanks for the great database.

Motivation

I found that some words in the data are assigned negative costs.

$ cat mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20191111/mecab-user-dict-seed.20191111.csv | grep "ファニチャーロウ"
ファニチャーロウレーシング,1288,1288,-5111,名詞,固有名詞,一般,*,*,*,ファニチャー・ロウ・レーシング,ファニチャーロウレーシング,ファニチャーロウレーシング
ファニチャー・ロウ・レーシング,1288,1288,-9029,名詞,固有名詞,一般,*,*,*,ファニチャー・ロウ・レーシング,ファニチャーロウレーシング,ファニチャーロウレーシング

Costs are lower for more frequent words, but the examples above do not seem frequent enough to warrant such a low cost. I suspect this could be the result of an integer overflow or a sorting issue.
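
To see how widespread negative costs are, the grep check above could be generalized with a small script. This is only a sketch: it assumes the usual MeCab dictionary CSV layout (surface, left context ID, right context ID, cost, then feature columns), and `negative_cost_entries` is a hypothetical helper name. The sample rows are the two lines from the grep output; a real run would read the seed CSV file instead.

```python
import csv
import io

# Two rows copied from the grep output above; a real run would open the seed CSV.
SAMPLE = """\
ファニチャーロウレーシング,1288,1288,-5111,名詞,固有名詞,一般,*,*,*,ファニチャー・ロウ・レーシング,ファニチャーロウレーシング,ファニチャーロウレーシング
ファニチャー・ロウ・レーシング,1288,1288,-9029,名詞,固有名詞,一般,*,*,*,ファニチャー・ロウ・レーシング,ファニチャーロウレーシング,ファニチャーロウレーシング
"""


def negative_cost_entries(lines):
    """Yield (surface, cost) for every row whose cost column is below zero.

    Assumes the cost is the fourth column, as in the rows shown above.
    """
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip empty or malformed rows
        cost = int(row[3])
        if cost < 0:
            yield row[0], cost


entries = list(negative_cost_entries(io.StringIO(SAMPLE)))
for surface, cost in entries:
    print(surface, cost)
```

To scan the full file, pass an open file object (opened with `encoding="utf-8"` and `newline=""`) instead of the `io.StringIO` sample.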

Goal

I would like to know:
(1) if this is a correct/intended result or a bug
(2) if correct/intended, how negative costs should be interpreted.

Can someone help me with this?

Owner

neologd commented Nov 25, 2019

Thank you for your frank question.

In short, we consider this case correct and not a bug.
Using negative integers within the range of a 2-byte integer as cost values conforms to the IPADIC specification.
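
As a quick sanity check on the 2-byte range, the two costs from the question can be tested against the signed 16-bit limits. The sketch below also shows what a genuine 16-bit wraparound would look like for an out-of-range value, for comparison with the overflow suspicion. `fits_int16` and `wrap_int16` are hypothetical helper names, and 40000 is an arbitrary example value, not something taken from the dictionary.

```python
import struct

# Signed 16-bit limits: -32768 .. 32767
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def fits_int16(cost):
    """Return True if cost is representable as a signed 2-byte integer."""
    return INT16_MIN <= cost <= INT16_MAX


def wrap_int16(value):
    """Simulate storing value in a signed 16-bit slot with wraparound."""
    return struct.unpack("<h", struct.pack("<H", value & 0xFFFF))[0]


costs = [-5111, -9029]  # the two costs quoted in the question
print([fits_int16(c) for c in costs])
print(wrap_int16(40000))  # an out-of-range value wraps to a negative number
```

Both quoted costs lie inside the range. That alone does not show how they were computed, only that they are valid values for a 2-byte cost field.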

Also, the cost value given to each word is not necessarily based on how frequently the word is observed in the real world or in a corpus.
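
One way to read this: a MeCab-style analyzer picks the segmentation with the lowest total cost, so a low (even negative) cost on a long entry can keep it from being split into shorter words, regardless of how often it appears. The toy sketch below illustrates that idea only. The costs for the shorter pieces and the value 12000 are made up, and a single flat `connection_cost` stands in for the real connection matrix based on context IDs. `best_segmentation` is a hypothetical helper, not part of MeCab.

```python
def best_segmentation(text, word_costs, connection_cost=0):
    """Return (total_cost, words) for the minimum-cost segmentation of text.

    word_costs maps each surface form to its word cost. connection_cost is
    a flat penalty per word boundary (a simplification of a real
    connection matrix).
    """
    n = len(text)
    inf = float("inf")
    # best[i] holds the cheapest way found so far to cover text[:i].
    best = [(inf, [])] * (n + 1)
    best[0] = (0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word not in word_costs or best[start][0] == inf:
                continue
            cost = best[start][0] + word_costs[word]
            if start > 0:
                cost += connection_cost  # pay for the boundary before this word
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [word])
    return best[n]


text = "ファニチャーロウレーシング"
pieces = {"ファニチャー": 3000, "ロウ": 4000, "レーシング": 3000}  # made-up costs

# With a high cost on the compound, the three-way split is cheaper.
high = best_segmentation(text, {**pieces, text: 12000}, connection_cost=500)
# With the negative cost from the question, the compound entry wins.
low = best_segmentation(text, {**pieces, text: -5111}, connection_cost=500)
print(high)
print(low)
```

With these toy numbers the split path costs 3000 + 4000 + 3000 plus two boundaries at 500 each, so the compound is only selected once its own cost drops below 11000.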

Chapter 5 (p. 79 onward) of the following book will help you understand how the different cost values are used in the analysis process.

https://www.amazon.co.jp/dp/B07J1NBNYW/ref=tmm_kin_swatch_0

If you don't have this book, we strongly recommend reading it.

The following slides (p. 9 onward) by the same author are also very helpful.

https://www.jtpa.org/wp-content/uploads/2014/06/MeCab.pdf

Thank you very much.

neologd closed this as completed Nov 25, 2019

Author

kota7 commented Nov 25, 2019

Thanks for the answer. This helps a lot.
