Add "ja_stop" filter #48

johtani · 2014-10-21T09:16:26Z

* Can use the predefined "_japanese_" stop words
* Cannot use other predefined stop words

Closes #45
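
The behavior described above can be sketched in plain Java without any Lucene or Elasticsearch dependency. This is only an illustration, not the plugin's code: the class name `JaStopWordsSketch`, the three placeholder stop words, and the rule that unrecognized entries are kept as literal words are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of how a "stopwords" setting could be resolved.
class JaStopWordsSketch {
    // Placeholder entries only; the real defaults come from JapaneseAnalyzer.getDefaultStopSet().
    static final Set<String> JAPANESE_DEFAULTS =
            Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList("の", "に", "は")));

    // Only "_japanese_" is registered, so no other predefined set can be referenced.
    static final Map<String, Set<String>> NAMED =
            Collections.singletonMap("_japanese_", JAPANESE_DEFAULTS);

    /** Resolves the configured "stopwords" entries; null means the setting is absent. */
    static Set<String> resolve(List<String> configured) {
        if (configured == null) {
            return JAPANESE_DEFAULTS;
        }
        Set<String> resolved = new LinkedHashSet<>();
        for (String entry : configured) {
            Set<String> named = NAMED.get(entry);
            if (named != null) {
                resolved.addAll(named);   // expand the predefined name
            } else {
                resolved.add(entry);      // assumption: anything else is a literal stop word
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve(Arrays.asList("_japanese_", "です")));
        System.out.println(resolve(Arrays.asList("_english_")));
    }
}
```

Under these assumptions, a name such as `_english_` is not expanded because it is not in the map of named sets.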

johtani · 2014-10-21T09:22:06Z

I have added the ja_stop filter to this plugin. The implementation is copied from the core stop filter.

Would it be useful if an analyzer plugin could add predefined stop words to the Elasticsearch core stop filter?

johtani · 2014-10-22T16:01:00Z

I have two options for implementing this.

1. Implement a stop filter in each analyzer plugin
   - This is what this PR does: it creates a new stop filter for the plugin.
   - If possible, we could change the properties of StopTokenFilterFactory to "protected" and extend it. Then the implementation would be simple.
2. Extend the Elasticsearch core stop filter via the analyzer plugin
   - Currently, the predefined stop words list is immutable: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/analysis/Analysis.java#L122-122
   - If an analyzer plugin could change that list, would it be useful, or would it be dangerous?
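
To make the trade-off between the two options concrete, here is a rough sketch in plain Java. None of these classes exist in Elasticsearch: `StopFactorySketch`, `JaStopFactorySketch`, and `NamedStopWordsSketch` are hypothetical stand-ins, and the stop words are placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Option 1 (hypothetical): the core factory keeps its state protected,
// so a plugin subclass only has to supply different defaults.
class StopFactorySketch {
    protected final Set<String> stopWords;

    protected StopFactorySketch(Set<String> configured, Set<String> defaults) {
        this.stopWords = configured != null ? configured : defaults;
    }

    boolean isStopWord(String token) {
        return stopWords.contains(token);
    }
}

class JaStopFactorySketch extends StopFactorySketch {
    // Placeholder entries, not the real Japanese stop word list.
    static final Set<String> JA_DEFAULTS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("の", "に", "は")));

    JaStopFactorySketch(Set<String> configured) {
        super(configured, JA_DEFAULTS);
    }
}

// Option 2 (hypothetical): a shared registry of named sets that plugins may extend.
// Every filter sees every registration, which is where the "is it dangerous?" concern comes from.
class NamedStopWordsSketch {
    private static final Map<String, Set<String>> NAMED = new ConcurrentHashMap<>();

    /** Returns false if the name is already taken; existing entries are never replaced. */
    static boolean register(String name, Set<String> words) {
        Set<String> copy = Collections.unmodifiableSet(new HashSet<>(words));
        return NAMED.putIfAbsent(name, copy) == null;
    }

    static Set<String> lookup(String name) {
        return NAMED.get(name);
    }
}
```

In this sketch option 1 keeps the change local to the plugin's own filter, while option 2 changes what every stop filter can resolve. Refusing to overwrite an existing name is just one conceivable safeguard, not something proposed in this thread.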

dadoonet · 2014-11-26T13:40:40Z

src/main/java/org/elasticsearch/index/analysis/JapaneseStopTokenFilterFactory.java

+            .immutableMap();
+        this.stopWords = Analysis.parseWords(env, settings, "stopwords", JapaneseAnalyzer.getDefaultStopSet(), namedStopWords, version, ignoreCase);
+        this.enablePositionIncrements = settings.getAsBoolean("enable_position_increments", true);
+        if (!enablePositionIncrements && version.onOrAfter(Version.LUCENE_44)) {


Do we need that? This PR will probably go into es-1.4+, which ships with Lucene 4.10.
Is it for backward compatibility?
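
For readers unfamiliar with this kind of check, here is a minimal sketch of a version-gated setting. The quoted diff is cut off at the `if`, so what the branch actually does is not shown in this thread; the sketch assumes it rejects the setting. The class name, the integer version encoding, and the exception message are all hypothetical.

```java
// Hypothetical illustration of a version-gated analysis setting.
class PositionIncrementGuardSketch {
    // Stand-in for a Lucene version constant, encoded as major * 100 + minor.
    static final int LUCENE_4_4 = 404;

    /**
     * Resolves "enable_position_increments" (null means the setting is absent).
     * Assumption: disabling it is only tolerated for match versions before 4.4.
     */
    static boolean resolve(Boolean configured, int matchVersion) {
        boolean enable = configured == null ? true : configured;
        if (!enable && matchVersion >= LUCENE_4_4) {
            throw new IllegalArgumentException(
                    "enable_position_increments=false is not supported from Lucene 4.4 onward");
        }
        return enable;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, 410));   // setting absent, modern version
        System.out.println(resolve(false, 403));  // legacy match version
    }
}
```

If the real check behaves like this, the condition only changes the outcome for a match version of 4.3 or earlier, which is the backward-compatibility case the question refers to.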

dadoonet · 2014-11-26T13:43:19Z

Left a small comment.
I think you should also update the README file to reflect that change.

johtani · 2014-11-26T14:58:51Z

@dadoonet Thanks for your comment! I fixed it.

johtani · 2014-11-26T16:52:47Z

Updated the README
Upgraded to Lucene 5-SNAPSHOT

johtani · 2015-03-16T04:49:04Z

Merged by 0a0d6fd

dadoonet reviewed Nov 26, 2014
Add "ja_stop" filter

4ebd6fb

* can use a predefined "_japanese_" stop words
* can not use other predefined stop words

Closes #45

Add "ja_stop" filter

21bfe65

* can use a predefined "_japanese_" stop words
* can not use other predefined stop words
* upgrade to lucene 5
* add ja_stop to README

Closes #45

johtani force-pushed the add-predefined-ja-stopwords-set branch from 2881cc0 to 21bfe65 Compare November 26, 2014 16:51

johtani added the "new" and "2.4.3" labels Mar 16, 2015

johtani added this to the 2.4.3 milestone Mar 16, 2015

johtani self-assigned this Mar 16, 2015

johtani closed this Mar 16, 2015
