Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address concurrency issue in top hits aggregation #106990

Merged
merged 7 commits into from Apr 4, 2024

Conversation

javanna
Copy link
Member

@javanna javanna commented Apr 2, 2024

Top hits aggregation runs the fetch phase concurrently when the query phase is executed across multiple slices. This is problematic as the fetch phase does not support concurrent execution yet.

The core of the issue is that the search execution context is shared across slices, which call setLookupProviders against it concurrently, setting each time different instances of preloaded source and field lookup providers. This makes us cross streams between slices, and hit lucene assertions that ensure that stored fields loaded from a certain thread are not read from a different thread.

We have not hit this before because the problem revolves around SearchLookup which is used by runtime fields. TopHitsIT is the main test we have for top hits agg, but it uses a mock script engine which bypasses painless and SearchLookup.

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@salvatore-campagna
Copy link
Contributor

LGTM, thank you. I will close PR #106910 as a result of having this one.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Would it be able to add a test to TopHitsAggregatorTests that would otherwise fail? My understanding is that if stored fields or doc values fields are fetched via the top hits agg we could run into the concurrency issue that this PR tries to fix?

@javanna
Copy link
Member Author

javanna commented Apr 2, 2024

@martijnvg I already updated TopHitsIT to fail. It needed the additional subfetch phases added as a plugin, that relies on retrieving stored fields from SearchLookup. I am not sure how to repro the issue from TopHitsAggregatorTests because it revolves around sharing SearchExecutionContext in the fetch phase across slices.

@javanna
Copy link
Member Author

javanna commented Apr 4, 2024

Thanks all for the reviews and help!

@javanna javanna merged commit d6582cf into elastic:main Apr 4, 2024
14 checks passed
@javanna javanna deleted the fix/top_hits_fetch_phase_lookup branch April 4, 2024 10:42
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 4, 2024
Top hits aggregation runs the fetch phase concurrently when the query phase is executed across multiple slices. This is problematic as the fetch phase does not support concurrent execution yet.

The core of the issue is that the search execution context is shared across slices, which call setLookupProviders against it concurrently, setting each time different instances of preloaded source and field lookup providers. This makes us cross streams between slices, and hit lucene assertions that ensure that stored fields loaded from a certain thread are not read from a different thread.

We have not hit this before because the problem revolves around SearchLookup which is used by runtime fields. TopHitsIT is the main test we have for top hits agg, but it uses a mock script engine which bypasses painless and SearchLookup.
javanna added a commit that referenced this pull request Apr 4, 2024
Top hits aggregation runs the fetch phase concurrently when the query phase is executed across multiple slices. This is problematic as the fetch phase does not support concurrent execution yet.

The core of the issue is that the search execution context is shared across slices, which call setLookupProviders against it concurrently, setting each time different instances of preloaded source and field lookup providers. This makes us cross streams between slices, and hit lucene assertions that ensure that stored fields loaded from a certain thread are not read from a different thread.

We have not hit this before because the problem revolves around SearchLookup which is used by runtime fields. TopHitsIT is the main test we have for top hits agg, but it uses a mock script engine which bypasses painless and SearchLookup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.13.2 v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants