Shingle filters that produce shingles of different size can create gigantic queries #23918

Closed
jimczi opened this issue Apr 5, 2017 · 2 comments
Labels
blocker >bug :Search/Search Search-related issues that do not fall into other categories v5.3.1

Comments

@jimczi
Contributor

jimczi commented Apr 5, 2017

Shingle filters create a graph token stream that the query parser is now able to consume.
However, when shingles of different sizes are produced, the number of paths in the graph can explode.
This is also the case when output_unigrams is set to true.
In 5.3 all paths are generated before building the query, so a single large input query can easily cause a node to run out of memory. In 5.4 and beyond we detect the explosion earlier, but we fail the entire request.
Instead, we should detect the problematic token filters and disable graph analysis for these fields.
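
To give a feel for how quickly the number of paths grows, here is a rough Python sketch. It uses a simplified model that is an assumption of this example, not Lucene's actual algorithm: each shingle of size k is treated as spanning k token positions, so a path through the graph is one way of covering the input with shingles of the allowed sizes. The function name `count_graph_paths` is hypothetical.

```python
from typing import Iterable


def count_graph_paths(num_tokens: int, shingle_sizes: Iterable[int]) -> int:
    """Count start-to-end paths in a simplified shingle graph.

    Assumption: a shingle of size k spans exactly k positions, so a path
    is a sequence of shingles that covers all num_tokens positions.
    """
    sizes = sorted(set(shingle_sizes))
    if not sizes or sizes[0] < 1:
        raise ValueError("shingle sizes must be positive integers")

    # paths[i] = number of ways to cover the first i tokens
    paths = [0] * (num_tokens + 1)
    paths[0] = 1  # one way to cover nothing: the empty path
    for i in range(1, num_tokens + 1):
        paths[i] = sum(paths[i - k] for k in sizes if k <= i)
    return paths[num_tokens]


if __name__ == "__main__":
    for n in (10, 20, 40):
        fixed = count_graph_paths(n, [2])        # single shingle size
        mixed = count_graph_paths(n, [1, 2, 3])  # unigrams plus sizes 2 and 3
        print(f"{n} tokens: fixed size -> {fixed}, mixed sizes -> {mixed}")
```

Under this model a single shingle size yields at most one path, while mixing sizes gives a count that grows exponentially with the input length, which is consistent with the behaviour described above.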

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories blocker >bug v5.3.1 labels Apr 5, 2017
@clintongormley

We should also update the docs to better explain the configuration.

I'd consider removing the min/max shingle size settings in favour of a single size, and removing output_unigrams too?

jimczi added a commit that referenced this issue Apr 6, 2017
…ucing tokens of different size (#23920)

This change disables graph analysis of token streams containing a shingle or CJK filter that produces shingles or n-grams of different sizes. Graph analysis is disabled for phrase and boolean queries.

Closes #23918
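
For readers checking their own analyzers, the condition described in this commit can be approximated with a small Python sketch. This is not the code from the commit (which lives in Elasticsearch itself); `produces_mixed_sizes` is a hypothetical helper that inspects a shingle filter definition written as a Python dict. It assumes the documented defaults of `min_shingle_size: 2`, `max_shingle_size: 2` and `output_unigrams: true`, assumes `output_unigrams` is given as a Python boolean, and ignores every other option.

```python
from typing import Any, Mapping


def produces_mixed_sizes(filter_config: Mapping[str, Any]) -> bool:
    """Return True if a shingle filter definition would emit tokens of different sizes."""
    if filter_config.get("type") != "shingle":
        return False  # this sketch only looks at shingle filters

    # Defaults assumed from the shingle token filter documentation.
    min_size = int(filter_config.get("min_shingle_size", 2))
    max_size = int(filter_config.get("max_shingle_size", 2))
    output_unigrams = bool(filter_config.get("output_unigrams", True))

    # Mixed sizes arise when the size range is not a single value, or when
    # unigrams are emitted alongside the shingles.
    return min_size != max_size or output_unigrams


if __name__ == "__main__":
    fixed_size = {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 2,
        "output_unigrams": False,
    }
    default_filter = {"type": "shingle"}
    print(produces_mixed_sizes(fixed_size))
    print(produces_mixed_sizes(default_filter))
```

A filter with a single shingle size and `output_unigrams` disabled is the kind of configuration that avoids the mixed-size case discussed in this issue.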
@jeantil

jeantil commented Apr 10, 2017

@clintongormley 👍 for a doc on what a better config would be (if possible without altering the outcome of a request)
