Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FastVectorHighlighter fails with StackOverflow on terms with large TermFrequency #3486

Closed
s1monw opened this Issue Aug 12, 2013 · 0 comments

Comments

Projects
None yet
1 participant
@s1monw
Copy link
Contributor

s1monw commented Aug 12, 2013

FVH deploys some recursive logic to extract terms from documents that need to highlighted. For documents that have terms with super large term frequency like a document that repeats a terms very very often this can produce some very large stacks when extracting the terms. Taken to an extreme this causes stack overflow errors when this grow beyond a term frequency >= 6000.
The ultimate solution is a iterative implementation of the extract logic but until then we should protect users from these massive term extractions which might be not very useful in the first place.

I will attach a possible fix and a test case that reproduces the problem in a bit.

@ghost ghost assigned s1monw Aug 12, 2013

s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 12, 2013

Limit the number of extracted token instance per query token.
FVH deploys some recursive logic to extract terms from documents
that need to highlighted. For documents that have terms with super
large term frequency like a document that repeats a terms very
very often this can produce some very large stacks when extracting
the terms. Taken to an extreme this causes stack overflow errors
when this grow beyond a term frequency >= 6000.

The ultimate solution is a iterative implementation of the extract
logic but until then we should protect users from these massive
term extractions which might be not very useful in the first place.

Closes elastic#3486

@s1monw s1monw closed this in 8a876ea Aug 12, 2013

s1monw added a commit that referenced this issue Aug 12, 2013

Limit the number of extracted token instance per query token.
FVH deploys some recursive logic to extract terms from documents
that need to highlighted. For documents that have terms with super
large term frequency like a document that repeats a terms very
very often this can produce some very large stacks when extracting
the terms. Taken to an extreme this causes stack overflow errors
when this grow beyond a term frequency >= 6000.

The ultimate solution is a iterative implementation of the extract
logic but until then we should protect users from these massive
term extractions which might be not very useful in the first place.

Closes #3486

s1monw added a commit that referenced this issue Aug 15, 2013

Limit the number of extracted token instance per query token.
FVH deploys some recursive logic to extract terms from documents
that need to highlighted. For documents that have terms with super
large term frequency like a document that repeats a terms very
very often this can produce some very large stacks when extracting
the terms. Taken to an extreme this causes stack overflow errors
when this grow beyond a term frequency >= 6000.

The ultimate solution is a iterative implementation of the extract
logic but until then we should protect users from these massive
term extractions which might be not very useful in the first place.

Closes #3486

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Limit the number of extracted token instance per query token.
FVH deploys some recursive logic to extract terms from documents
that need to highlighted. For documents that have terms with super
large term frequency like a document that repeats a terms very
very often this can produce some very large stacks when extracting
the terms. Taken to an extreme this causes stack overflow errors
when this grow beyond a term frequency >= 6000.

The ultimate solution is a iterative implementation of the extract
logic but until then we should protect users from these massive
term extractions which might be not very useful in the first place.

Closes elastic#3486
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.