ChrisMcKee
Contributor

NGram Filter TokenChars query (Unknown if required) @Mpdreamz

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html doesn't list token_chars as an option, but it's used in various articles and comes up in various questions on Stack Overflow.

A full copy of the raw query being pushed is here: https://gist.github.com/ChrisMcKee/c4cab45080e52e35190a#file-elasticsearch-ngram-tokenchar-poc

The resulting mapping is here: https://gist.github.com/ChrisMcKee/c4cab45080e52e35190a#file-elasticsearch-ngram-tokenchar-poc-resultant-mapping

Obviously the tokenizer http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html#analysis-ngram-tokenizer has it, so this may simply be a common misunderstanding that ES seems to accept and store in the mapping?

analysis: {
    filter: {
        nGram_filter: {
            max_gram: 20,
            min_gram: 2,
            type: nGram,
            token_chars: [
                letter,
                digit,
                punctuation,
                symbol
            ]
        }
    }
}

Seems right though. Luckily, Martijn, you have the advantage of being able to ask :)

@Mpdreamz
Member

The ngram tokenizer has it:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

but the ngram token filter does not:

https://github.com/elasticsearch/elasticsearch/search?utf8=%E2%9C%93&q=token_chars

see also:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/analysis/NGramTokenFilterFactory.java#L35-L55

The ngram tokenizer allows you to specify token chars so that blah in qwerty@blah is a separate ngram in its own right. You can specify only one tokenizer per analyzer, which is why this matters: by the time token filters kick in, the text has already been cut up into logical portions.
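
To make the difference concrete, here is a rough Python sketch. The settings dict shows token_chars moved onto an nGram tokenizer rather than the filter; the names ngram_tokenizer and ngram_analyzer are made up for illustration and are not taken from the gist. The ngrams function is only a simplified model of the behaviour, not the Lucene implementation, and it treats "letter" and "digit" as str.isalnum, which is an assumption.

```python
# Hypothetical index settings: token_chars lives on the tokenizer, where it
# is documented. "ngram_tokenizer" and "ngram_analyzer" are made-up names.
settings = {
    "analysis": {
        "tokenizer": {
            "ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 2,
                "max_gram": 20,
                "token_chars": ["letter", "digit"],
            }
        },
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "ngram_tokenizer",
                "filter": ["lowercase"],
            }
        },
    }
}


def ngrams(text, min_gram, max_gram, keep=str.isalnum):
    """Simplified model of an ngram tokenizer with token_chars.

    Characters for which keep() is False act as split points, then
    grams are emitted for each piece separately.
    """
    pieces, current = [], ""
    for ch in text:
        if keep(ch):
            current += ch
        elif current:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)

    grams = []
    for piece in pieces:
        for start in range(len(piece)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(piece):
                    grams.append(piece[start:start + size])
    return grams


# With letter/digit token chars, "@" splits the input before n-gramming.
print(ngrams("qwerty@blah", 4, 4))  # ['qwer', 'wert', 'erty', 'blah']

# Keeping every character models the case with no token_chars:
# grams such as "y@bl" now straddle the "@".
print(ngrams("qwerty@blah", 4, 4, keep=lambda ch: True))
```

In this model a token filter would only ever see the pieces the tokenizer has already produced, which is why a token_chars setting on the filter has nothing to act on.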

@Mpdreamz Mpdreamz closed this Oct 24, 2014