Add "ja_stop" filter #48

Closed


Contributor

@johtani johtani commented Oct 21, 2014

  • supports the predefined "_japanese_" stop word set
  • does not support other predefined stop word sets

Closes #45
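
To make the two bullets above concrete, here is a minimal sketch in plain Java (no Lucene or elasticsearch dependency) of how a configured stop word list might be resolved when "_japanese_" is the only known name. The class name, the three-word sample set, and the rule that unknown names are kept as literal words are assumptions for illustration, not the plugin's actual code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the plugin.
public class JaStopWordsSketch {

    // Tiny sample only (hiragana "no", "ni", "ha"); NOT the real default list.
    public static final Set<String> SAMPLE_JAPANESE_STOP_WORDS =
            Collections.unmodifiableSet(new LinkedHashSet<>(
                    Arrays.asList("\u306e", "\u306b", "\u306f")));

    // The only predefined name this filter knows about.
    public static final Map<String, Set<String>> NAMED_STOP_WORDS =
            Collections.singletonMap("_japanese_", SAMPLE_JAPANESE_STOP_WORDS);

    /**
     * Expands known names into their word sets. Any other entry, including
     * names such as "_english_", is kept as a literal stop word (assumption).
     * A null list stands for "no stopwords setting" and yields the default set.
     */
    public static Set<String> resolve(List<String> configured) {
        if (configured == null) {
            return SAMPLE_JAPANESE_STOP_WORDS;
        }
        Set<String> resolved = new LinkedHashSet<>();
        for (String word : configured) {
            Set<String> named = NAMED_STOP_WORDS.get(word);
            if (named != null) {
                resolved.addAll(named);   // known name: expand it
            } else {
                resolved.add(word);       // anything else: literal word
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // "_japanese_" is expanded, "example" is added as-is.
        System.out.println(resolve(Arrays.asList("_japanese_", "example")));
        // "_english_" is not expanded because this filter does not know it.
        System.out.println(resolve(Arrays.asList("_english_")));
    }
}
```

Keeping the name map local to the filter is what makes "_japanese_" available without touching the core's predefined sets.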

@johtani
Contributor Author

johtani commented Oct 21, 2014

I have added a ja_stop filter to this plugin. The implementation is copied from the stop filter.

If an analyzer plugin could add predefined stop words to the elasticsearch core stop filter, would that be useful?

@johtani
Contributor Author

johtani commented Oct 22, 2014

I have two options for implementing this.

  1. Implement a stop filter for each analyzer
    This is what this PR does: it creates a new stop filter for the plugin.
    If possible, we could change StopTokenFilterFactory's properties to "protected" and extend it.
    Then the implementation would be simple.
  2. Extend the elasticsearch core stop filter via the analyzer plugin
    Currently, the predefined stop words list is immutable. https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/analysis/Analysis.java#L122-122
    If the list could be changed via an analyzer plugin, would that be useful? Or would it be dangerous?
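
The trade-off in option 2 can be sketched in plain Java. The example below uses a read-only map as a stand-in for the core's predefined list, shows that registering a new name in place is rejected, and shows a plugin-local copy as one possible alternative. All names here (NamedStopWordsSketch, CORE_NAMED_STOP_WORDS, withExtra) and the sample English words are hypothetical, not the elasticsearch API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the elasticsearch Analysis class.
public class NamedStopWordsSketch {

    // Stand-in for the core's predefined, read-only map of named sets.
    public static final Map<String, Set<String>> CORE_NAMED_STOP_WORDS;
    static {
        Map<String, Set<String>> m = new HashMap<>();
        // Sample words only, not the full English list.
        m.put("_english_", Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("a", "an", "the"))));
        CORE_NAMED_STOP_WORDS = Collections.unmodifiableMap(m);
    }

    /** Tries to register a set in place; returns true if the map rejects it. */
    public static boolean registerInPlaceFails(String name, Set<String> words) {
        try {
            CORE_NAMED_STOP_WORDS.put(name, words);
            return false;
        } catch (UnsupportedOperationException e) {
            return true; // read-only view: mutation is not allowed
        }
    }

    /** Builds a plugin-local map: core entries plus one extra named set. */
    public static Map<String, Set<String>> withExtra(String name, Set<String> words) {
        Map<String, Set<String>> copy = new HashMap<>(CORE_NAMED_STOP_WORDS);
        copy.put(name, Collections.unmodifiableSet(new HashSet<>(words)));
        return Collections.unmodifiableMap(copy); // the core map is untouched
    }

    public static void main(String[] args) {
        Set<String> ja = new HashSet<>(Arrays.asList("\u306e")); // hiragana "no"
        System.out.println(registerInPlaceFails("_japanese_", ja));
        System.out.println(withExtra("_japanese_", ja).keySet());
        System.out.println(CORE_NAMED_STOP_WORDS.keySet());
    }
}
```

A shared mutable list would let any plugin change what a name means for every index, which is the danger raised above; a local copy avoids that but leaves the new name invisible to the core stop filter.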

.immutableMap();
this.stopWords = Analysis.parseWords(env, settings, "stopwords", JapaneseAnalyzer.getDefaultStopSet(), namedStopWords, version, ignoreCase);
this.enablePositionIncrements = settings.getAsBoolean("enable_position_increments", true);
if (!enablePositionIncrements && version.onOrAfter(Version.LUCENE_44)) {
Member


Do we need that? This PR will probably go into es-1.4+, which uses Lucene 4.10.
Is it for backward compatibility?

@dadoonet
Member

Left a small comment.
I think you should also update the README file to reflect that change.

 * can use a predefined "_japanese_" stop words
 * can not use other predefined stop words

Closes #45
@johtani
Contributor Author

johtani commented Nov 26, 2014

@dadoonet Thanks for your comment! I fixed it.

 * can use a predefined "_japanese_" stop words
 * can not use other predefined stop words
 * upgrade to lucene 5
 * add ja_stop to README

  Closes #45
@johtani johtani force-pushed the add-predefined-ja-stopwords-set branch from 2881cc0 to 21bfe65 Compare November 26, 2014 16:51
@johtani
Contributor Author

johtani commented Nov 26, 2014

  • update README
  • upgrade to Lucene 5-SNAPSHOT

@johtani johtani added this to the 2.4.3 milestone Mar 16, 2015
@johtani johtani self-assigned this Mar 16, 2015
@johtani
Contributor Author

johtani commented Mar 16, 2015

Merged by 0a0d6fd

@johtani johtani closed this Mar 16, 2015