
Commit

remove lowercase filter
The tokenizer already transforms text to lowercase in its normalizer.
thphuong committed Sep 4, 2019
1 parent e938b39 commit ac8895b
Showing 2 changed files with 2 additions and 4 deletions.
README.md: 2 changes (1 addition & 1 deletion)
@@ -3,7 +3,7 @@
 Vietnamese Analysis plugin integrates Vietnamese language analysis into Elasticsearch.
 The plugin provides the following functions:

-Analyzer: `vi_analyzer`. Tokenizer: `vi_tokenizer`. Filter: `vi_stop`. The `vi_analyzer` itself is composed of the `vi_tokenizer`, the `lowercase` and the `vi_stop` filter.
+Analyzer: `vi_analyzer`. Tokenizer: `vi_tokenizer`. Filter: `vi_stop`. The `vi_analyzer` itself is composed of the `vi_tokenizer` and the `vi_stop` filter.

 The tokenizer uses [coccoc-tokenizer](https://github.com/coccoc/coccoc-tokenizer) for tokenization.

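For context, after this change the `vi_analyzer` chain is simply `vi_tokenizer` followed by `vi_stop`. A minimal, hedged sketch of exercising that analyzer through the `_analyze` API with the Elasticsearch low-level REST client; the host, port, class name, and sample text are assumptions for illustration, not taken from the plugin docs:

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ViAnalyzeExample {
    public static void main(String[] args) throws Exception {
        // Assumes a local Elasticsearch node with the Vietnamese Analysis plugin installed.
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("GET", "/_analyze");
            // vi_analyzer = vi_tokenizer + vi_stop after this commit; no separate lowercase filter.
            request.setJsonEntity("{\"analyzer\": \"vi_analyzer\", \"text\": \"Công nghệ thông tin Việt Nam\"}");
            Response response = client.performRequest(request);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}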
@@ -5,7 +5,6 @@
 import com.coccoc.Tokenizer.Mode;

 import org.apache.lucene.analysis.CharArraySet;
-import org.apache.lucene.analysis.LowerCaseFilter;
 import org.apache.lucene.analysis.StopwordAnalyzerBase;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.Tokenizer;
@@ -38,8 +37,7 @@ private static class DefaultSetHolder {
     @Override
     protected TokenStreamComponents createComponents(String fieldName) {
         Tokenizer tokenizer = new VietnameseTokenizer(mode);
-        TokenStream stream = new LowerCaseFilter(tokenizer);
-        stream = new StopFilter(stream, stopwords);
+        TokenStream stream = new StopFilter(tokenizer, stopwords);

         return new TokenStreamComponents(tokenizer, stream);
     }
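Because the `vi_tokenizer` already lowercases terms during normalization, the stream handed to `StopFilter` is lowercase before any further filtering, which is what makes the `LowerCaseFilter` stage redundant. A small hedged sketch for inspecting the terms an `Analyzer` emits (the analyzer instance and helper class are assumed; only standard Lucene token-stream APIs are used):

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class TokenDump {
    // Prints each term the analyzer produces for the given text. With the analyzer in
    // this commit, terms arrive already lowercased by the tokenizer's normalizer.
    static void printTerms(Analyzer analyzer, String text) throws IOException {
        try (TokenStream ts = analyzer.tokenStream("field", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}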
