Add checks in term and terms queries that input terms are not too long #99818

Merged: romseygeek merged 6 commits into elastic:main from bug/long-term-queries on Sep 25, 2023

romseygeek (Contributor)

Lucene indexes do not allow terms longer than 32k bytes. A query containing a term that
exceeds this length can never match anything, because no such term can exist in the
index, and it can cause cluster instability by consuming large amounts of heap. Such
queries are also almost always the result of a user error (for example, a terms query
that concatenates all of its inputs into a single string rather than passing them as a
JSON array).

This commit adds checks to the Term and Terms query builders that throw an exception
if any of their input terms exceeds the maximum length allowed by Lucene's IndexWriter
(IndexWriter.MAX_TERM_LENGTH).

Fixes #99802
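The check described above can be sketched as follows. This is a minimal, hypothetical illustration, not the Elasticsearch code: the class `TermLengthCheck` and its methods are invented names, the terms are plain Java `String`s rather than Lucene `BytesRef`s so the sketch runs without Lucene, and the constant assumes the value of Lucene's `IndexWriter.MAX_TERM_LENGTH` (32766 bytes). The key point is that the limit applies to the UTF-8 encoded bytes, not to the number of characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical sketch of the term length check described in this PR.
 * Uses Strings instead of Lucene BytesRef so it runs without Lucene.
 */
final class TermLengthCheck {
    // Assumption: mirrors Lucene's IndexWriter.MAX_TERM_LENGTH (32766 bytes).
    static final int MAX_TERM_LENGTH = 32766;

    private TermLengthCheck() {}

    /** Returns the term unchanged, or throws if its UTF-8 form is too long to index. */
    static String checkIndexableLength(String term) {
        // The limit is on encoded bytes, not chars, so measure the UTF-8 length.
        int byteLength = term.getBytes(StandardCharsets.UTF_8).length;
        if (byteLength > MAX_TERM_LENGTH) {
            throw new IllegalArgumentException(
                "term of [" + byteLength + "] bytes exceeds the maximum indexable length of ["
                    + MAX_TERM_LENGTH + "] bytes");
        }
        return term;
    }

    /** Checks every value of a terms query; a single oversized value rejects the request. */
    static List<String> checkAll(List<String> terms) {
        for (String term : terms) {
            checkIndexableLength(term);
        }
        return terms;
    }
}
```

Rejecting the request up front trades a clear client-side error for the old behavior, where the query would be executed, could never match, and might still consume a large amount of heap.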

romseygeek added the >enhancement, :Search/Search, and v8.11.0 labels on Sep 22, 2023
romseygeek self-assigned this on Sep 22, 2023
elasticsearchmachine added the Team:Search label on Sep 22, 2023
elasticsearchmachine (Collaborator)

Hi @romseygeek, I've created a changelog YAML for you.

elasticsearchmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

mayya-sharipova (Contributor) left a comment

Thanks @romseygeek for a quick fix, LGTM!

martijnvg (Member) left a comment

LGTM

* @param input an input BytesRef
* @return a String prefix
*/
public static String safeStringPrefix(BytesRef input, int prefixLength) {

martijnvg (Member)

Is this method intended to be used elsewhere or only a helper method for checkIndexableLength(...)?
In the latter case, maybe make this a private helper method?

romseygeek (Contributor Author)

Yep, that's a good call, will make this private.
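The exchange above turns `safeStringPrefix` into a private helper whose only job is to produce a readable prefix of an oversized term for the error message. A minimal sketch of what such a helper could look like follows, under stated assumptions: it takes a UTF-8 `byte[]` instead of a `BytesRef`, the prefix length is measured in bytes, and `TermErrors`, `tooLongMessage`, and `PREFIX_BYTES` are hypothetical names. "Safe" is interpreted here as "never cuts a multi-byte UTF-8 character in half"; the real Elasticsearch helper may define it differently.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds an error message showing only a short prefix of a huge term. */
final class TermErrors {
    // Assumption: how many bytes of the offending term to echo back.
    private static final int PREFIX_BYTES = 10;

    private TermErrors() {}

    static String tooLongMessage(byte[] utf8Term, int maxLength) {
        return "term starting with [" + safeStringPrefix(utf8Term, PREFIX_BYTES)
            + "] is longer than " + maxLength + " bytes";
    }

    // Private, as agreed in the review: only the message builder needs it.
    // Returns at most prefixLength bytes of the term decoded as UTF-8,
    // without splitting a multi-byte character.
    private static String safeStringPrefix(byte[] utf8, int prefixLength) {
        int end = Math.min(prefixLength, utf8.length);
        // If the first excluded byte is a continuation byte (10xxxxxx), the cut
        // falls inside a character: step back so that character is dropped whole.
        if (end < utf8.length) {
            while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
                end--;
            }
        }
        // This constructor replaces malformed input with U+FFFD instead of throwing.
        return new String(utf8, 0, end, StandardCharsets.UTF_8);
    }
}
```

Backing off to a character boundary keeps the message readable: decoding a half character would otherwise show up as a replacement character at the end of the prefix.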

martijnvg (Member)

Should this change be backported to the 8.10 branch?

romseygeek (Contributor Author)

> Should this change be backported to the 8.10 branch?

Yes, good idea.

romseygeek added the v8.10.3 and auto-backport-and-merge labels on Sep 25, 2023
romseygeek merged commit a5fa195 into elastic:main on Sep 25, 2023
12 checks passed
romseygeek deleted the bug/long-term-queries branch on September 25, 2023 at 09:32
elasticsearchmachine (Collaborator)

💚 Backport successful

Branch: 8.10

romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request on Sep 25, 2023: Add checks in term and terms queries that input terms are not too long (elastic#99818)
elasticsearchmachine pushed a commit that referenced this pull request on Sep 25, 2023: Add checks in term and terms queries that input terms are not too long (#99818) (#99863)
Labels: auto-backport-and-merge (automatically create backport pull requests and merge when ready), >enhancement, :Search/Search (search-related issues that do not fall into other categories), Team:Search (meta label for search team), v8.10.3, v8.11.0
Development: linked issue closed by merging this pull request: Limit term query values to IndexWriter.MAX_TERM_LENGTH
4 participants