Extend MeCab plug-in to support morpheme n-gram #1054

kmaehashi · 2015-10-07T03:20:30Z

Currently, the MeCab plugin just splits the input string and uses the surface form of each morpheme as a feature.
It would be better to support morpheme n-grams so that the extracted features capture more of the natural-language context.

I'm thinking of adding a new "ngram" parameter (which is optional and defaults to 1) to the mecab plug-in configuration as follows:

"string_types": {
  "mecab": {
    "method": "dynamic",
    "path": "libmecab_splitter.so",
    "function": "create",
    "arg": "-d /usr/lib64/mecab/dic/ipadic",
    "ngram": 2
  }
}

For instance, when 本日は晴天 is given as an input, {"本日", "は", "晴天"} is extracted when "ngram" is 1, whereas {"本日|は", "は|晴天"} is extracted when "ngram" is 2.
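
To make the expected output concrete, here is a minimal Python sketch of the joining step only. It is not the plugin's implementation: the function name `morpheme_ngrams`, the `sep` parameter, and the hard-coded morpheme list are assumptions for illustration, and a real run would take the surfaces from MeCab. Returning nothing when the input has fewer than `n` morphemes is also an assumption, since the proposal does not specify that case.

```python
def morpheme_ngrams(surfaces, n=1, sep="|"):
    """Join every run of n consecutive morpheme surfaces with sep.

    Hypothetical helper: the name and the sep parameter are not part of
    the plugin; "|" is taken from the example in this issue.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: inputs shorter than n produce no features at all.
    return [sep.join(surfaces[i:i + n]) for i in range(len(surfaces) - n + 1)]


# Surfaces for 本日は晴天, hard-coded here instead of calling MeCab.
surfaces = ["本日", "は", "晴天"]

unigrams = morpheme_ngrams(surfaces, 1)  # same as the current behavior
bigrams = morpheme_ngrams(surfaces, 2)   # what "ngram": 2 would extract
```

With "ngram" left at its default of 1 the result is identical to the current output, so existing configurations would be unaffected.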


kmaehashi · 2015-12-15T01:39:40Z

Updated docs for this: jubatus/website#236

support morpheme n-gram in mecab plugin (fix #1054)

kmaehashi · 2015-12-21T05:09:34Z

Fixed via #1070

kmaehashi added feature plugin labels Oct 7, 2015

kmaehashi added this to the Far Future milestone Oct 7, 2015

kmaehashi modified the milestones: Near Future, Far Future Nov 9, 2015

kmaehashi modified the milestones: 0.8.6, Near Future Dec 6, 2015

kmaehashi self-assigned this Dec 7, 2015

kmaehashi added a commit that referenced this issue Dec 14, 2015

support morpheme n-gram in mecab plugin (fix #1054)

ba174c0

kmaehashi mentioned this issue Dec 14, 2015

support morpheme n-gram in mecab plugin (fix #1054) #1070

Merged

kmaehashi added a commit that referenced this issue Dec 15, 2015

support morpheme n-gram in mecab plugin (fix #1054)

2e052e4

kmaehashi mentioned this issue Dec 15, 2015

add documentation for ngram option in mecab plugin jubatus/website#236

Merged

TkrUdagawa added a commit that referenced this issue Dec 18, 2015

Merge pull request #1070 from jubatus/mecab-ngram

ac64c72

support morpheme n-gram in mecab plugin (fix #1054)

kmaehashi closed this as completed Dec 21, 2015
