position inconsistency when using _analyze API with or without index name #29021
Comments
Pinging @elastic/es-search-aggs
The reason for the difference in The code that makes this decision is here. Lucene defaults positionIncrementGap to 0 in all analyzers, so if we want non-default Lucene behavior for all analyzers, the code for setupAnalyzers might need to change so that it returns not vanilla Lucene analyzers but ones that override getPositionIncrementGap. I am not very familiar with this code, so I am not sure whether what I suggest is even desired or correct behavior.
When no index is specified on an analyze request, the code that builds the analysis chain for that request goes directly to the analysis registry to check for pre-built analyzers. This can cause inconsistencies because the Elasticsearch defaults for various analysis settings (e.g. the position increment gap) differ from the Lucene defaults. To remove these inconsistencies, this commit builds a one-off IndexAnalyzers object when none is provided. This means that all analyzers accessed through an analysis request will use Elasticsearch defaults.
Fixes elastic#29021
Pinging @elastic/es-search (Team:Search)
Elasticsearch version (bin/elasticsearch --version): 6.2.2
Steps to reproduce:
Consider the following
It gives:
Position of the 1st term of the second text is 101, which is correct because we don't want to be able to match a phrase like "a b".
Now the same _analyze but with no index name:
It gives:
Position of the 1st term of the second text is 1, which is incorrect. It leaves the impression that a match phrase could work on "a b".
This is not important, though, since indexing a document always happens within an index, but I feel that this _analyze API is giving inconsistent results. Maybe we should fix it?
cc @jpountz
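To make the 101 versus 1 difference concrete, here is a minimal sketch in plain Python of how a position increment gap shifts token positions across multiple text values. It is a simplified model, not Elasticsearch or Lucene code: the function names, the whitespace tokenization, and the input values `["a", "b"]` are assumptions chosen for illustration, since the original request bodies are not preserved here. The gap of 100 stands for the Elasticsearch default implied by the reported position, and 0 is the Lucene default mentioned in the comments.

```python
from typing import List, Tuple


def analyze_positions(texts: List[str], position_increment_gap: int) -> List[Tuple[str, int]]:
    """Assign a position to every token across several text values.

    Simplified model (assumption): whitespace tokenization, and every
    token advances the position by exactly 1.
    """
    tokens = []
    last_position = -1
    for text in texts:
        for term in text.split():
            last_position += 1
            tokens.append((term, last_position))
        # Once a value is consumed, the gap is added before the next value.
        last_position += position_increment_gap
    return tokens


def phrase_could_match(tokens: List[Tuple[str, int]], first: str, second: str) -> bool:
    """Return True if `second` appears exactly one position after `first`."""
    first_positions = {pos for term, pos in tokens if term == first}
    return any(term == second and pos - 1 in first_positions for term, pos in tokens)


# Hypothetical inputs: two single-token text values.
with_gap = analyze_positions(["a", "b"], position_increment_gap=100)
without_gap = analyze_positions(["a", "b"], position_increment_gap=0)

print(with_gap)     # second value starts at position 101
print(without_gap)  # second value starts at position 1
print(phrase_could_match(with_gap, "a", "b"))
print(phrase_could_match(without_gap, "a", "b"))
```

With a gap of 100, the terms from the two values are never adjacent, so a phrase query for "a b" cannot span them. With a gap of 0 they sit at consecutive positions, which is what makes the index-less output misleading.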