Better morphemizer dict (ipadic-neologd) #37
Comments
Would you be interested in opening a PR about this?
I've been considering adding this, but not bundling it natively because of its size. Would an option to select an alternate MeCab dictionary be suitable? @LucasMIA
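For illustration, a dictionary option could boil down to passing MeCab's `-d` (dictionary directory) flag when the user has set a path. This is only a minimal sketch: `build_mecab_cmd` and its parameters are hypothetical names, not the extension's actual code, and the example path is made up.

```python
from typing import List, Optional


def build_mecab_cmd(dic_dir: Optional[str] = None,
                    mecab_bin: str = "mecab") -> List[str]:
    """Build the MeCab argument list, optionally pointing at another dictionary.

    Hypothetical helper: the names here are assumptions for illustration.
    """
    cmd = [mecab_bin]
    if dic_dir:
        # -d selects the system dictionary directory in the mecab CLI.
        cmd += ["-d", dic_dir]
    return cmd


# No option set: fall back to the bundled/default dictionary.
default_cmd = build_mecab_cmd()
# Option set: use the user's dictionary directory (example path only).
custom_cmd = build_mecab_cmd("/path/to/mecab-ipadic-neologd")
```

The resulting list could be handed to `subprocess.Popen` wherever MeCab is launched, so the large dictionary stays outside the extension.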
I looked into this. The code is tightly coupled to the format of the MeCab output it uses, so adapting it to the extended output of the Ipadic one would require several changes. The easiest way would be to create a filter that makes Ipadic behave like the currently used MeCab. This does not have to be done inside the extension; it can be a drop-in replacement.
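A rough sketch of such a filter, assuming MeCab's default line layout (`surface<TAB>comma-separated features`, with `EOS` closing each sentence) and assuming the existing parser expects a fixed number of feature fields. `EXPECTED_FIELDS = 9` is a guess and would need to be checked against what the extension actually parses; the function names are hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: the number of feature fields the existing parser expects.
EXPECTED_FIELDS = 9


def normalize_line(line: str, n_fields: int = EXPECTED_FIELDS) -> str:
    """Trim or pad the feature list of one MeCab output line to n_fields."""
    line = line.rstrip("\n")
    # Pass through sentence terminators and anything without a feature part.
    if line == "EOS" or "\t" not in line:
        return line
    surface, features = line.split("\t", 1)
    # Naive split: does not handle quoted features that contain commas.
    fields = features.split(",")
    # Drop extra trailing fields, or pad missing ones with MeCab's "*" filler.
    fields = fields[:n_fields] + ["*"] * (n_fields - len(fields))
    return surface + "\t" + ",".join(fields)


def normalize_stream(lines: Iterable[str]) -> Iterator[str]:
    """Apply normalize_line to every line of a MeCab output stream."""
    for line in lines:
        yield normalize_line(line)
```

Run between the MeCab process and the existing parser (for example over the lines read from MeCab's stdout), this would leave the extension's parsing code untouched.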
There's a better dictionary for MeCab called neologd, although it's quite bulky (1 GB, versus 50 MB for the currently bundled dict and 250 MB for the ipadic MorphMan addon).
https://github.com/neologd/mecab-ipadic-neologd
It would solve a lot of accuracy issues with readings that are popping up. Also, even the latest mecab-ipadic is 8 years old (!) while neologd is a new corpus that has been updated as recently as a few weeks ago.