Better morphemizer dict (ipadic-neologd) #37

Open
setreadygo opened this issue Oct 12, 2019 · 3 comments

@setreadygo setreadygo commented Oct 12, 2019

There's a better dictionary for MeCab called NEologd, although it's quite bulky (about 1 GB, versus 50 MB for the currently bundled dict and 250 MB for the ipadic MorphMan addon).

https://github.com/neologd/mecab-ipadic-neologd

It would solve a lot of the accuracy issues with readings that keep popping up. Also, even the latest mecab-ipadic is 8 years old (!), while neologd is a new corpus that was updated as recently as a few weeks ago.

Collaborator

@LucasMIA LucasMIA commented Feb 20, 2020

Would you be interested in opening a PR for this?

@jamesrwaugh jamesrwaugh commented Apr 6, 2020

I've been considering adding this, but not bundling it natively because of its size. Would an option to select an alternate MeCab dictionary be suitable? @LucasMIA
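For illustration only, such an option could boil down to passing MeCab's `-d` (dictionary directory) flag when a path is configured. This is a minimal sketch, not the add-on's actual code: `build_mecab_command`, its parameters, and the example path are all hypothetical.

```python
import os


def build_mecab_command(mecab_bin="mecab", dicdir=None, extra_args=()):
    """Build the argument list for launching MeCab.

    `dicdir` is a hypothetical user setting. When it is empty, MeCab
    falls back to whatever dictionary it is configured to use by default.
    """
    cmd = [mecab_bin]
    if dicdir:
        # -d tells MeCab which system dictionary directory to load.
        cmd += ["-d", os.path.expanduser(dicdir)]
    cmd += list(extra_args)
    return cmd


# Example: point MeCab at a separately installed dictionary (assumed path).
print(build_mecab_command(dicdir="/opt/mecab-ipadic-neologd"))
```

Keeping the dictionary path as a setting would let users install the large dictionary themselves instead of shipping it with the add-on.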

@dmgerman dmgerman commented Apr 28, 2020

I looked into this. The code is tightly coupled to the format of the MeCab output it uses, so it would require several changes to adapt to the extended output of the ipadic one. The easiest way would be to create a filter that makes ipadic behave like the currently used MeCab. This does not have to be done inside the extension; it can be a drop-in replacement.
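A rough sketch of what such a filter might look like, assuming MeCab's default `surface<TAB>comma-separated features` line format and assuming the consumer expects a fixed number of feature columns. The column count, `normalize_line`, and `filter_lines` are hypothetical names chosen for illustration; the real layout the extension parses would have to be checked against its code.

```python
# Hypothetical: number of feature columns the existing parser expects.
EXPECTED_FEATURES = 9


def normalize_line(line, expected=EXPECTED_FEATURES):
    """Trim or pad the feature columns of one MeCab output line.

    Lines without a tab (such as the "EOS" sentence marker) pass through
    unchanged. Naive comma splitting is used, so a feature value that
    itself contains a comma would be split incorrectly.
    """
    line = line.rstrip("\n")
    if "\t" not in line:
        return line
    surface, features = line.split("\t", 1)
    cols = features.split(",")
    # Drop extra columns, then pad missing ones with MeCab's "*" placeholder.
    cols = cols[:expected] + ["*"] * (expected - len(cols))
    return surface + "\t" + ",".join(cols)


def filter_lines(lines, expected=EXPECTED_FEATURES):
    """Lazily normalize an iterable of MeCab output lines."""
    for line in lines:
        yield normalize_line(line, expected)
```

Run as a standalone script between MeCab and the extension, it could feed `sys.stdin` through `filter_lines` and write each result to `sys.stdout`, which would keep the change outside the extension as suggested.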
