Skip to content

Conversation

original-brownbear
Copy link
Contributor

The approach taken by ExpressionList becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations.
The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
Also, the inlined logic has been optimised slightly to avoid needlessly copying lists in some spots.

The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targetting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
@original-brownbear original-brownbear added >non-issue :Data Management/Indices APIs APIs to create and manage indices and templates labels Oct 24, 2024
@original-brownbear original-brownbear marked this pull request as ready for review October 24, 2024 01:13
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team v9.0.0 labels Oct 24, 2024
return ExplicitResourceNameFilter.filterUnavailable(
context,
DateMathExpressionResolver.resolve(context, List.of(expressions))
DateMathExpressionResolver.resolve(context, Arrays.asList(expressions))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big deal in some cases, List.of will copy the expressions list which is quite heavy once we get to very long lists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…) on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…) in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could Collections.unmodifiableList(...) the list from Arrays.asList if you wanted. It's an allocation and adds some overhead to the list operations, but I don't think it's nearly as much work as copying the entire list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking about dropping the Arrays wrapping in a follow-up :P that would make this perform yet a little nicer actually :D

AtomicBoolean emptyWildcardExpansion = context.getOptions().allowNoIndices() ? null : new AtomicBoolean();
for (int i = firstWildcardIndex; i < expressions.size(); i++) {
String expression = expressions.get(i);
boolean isExclusion = i > firstWildcardIndex && expression.charAt(0) == '-';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little lazy, could special case the first wildcard but it's even more code and not a measurable speedup given how complicated the loop body is, so I went for this. Either way, inlining the logic is about 3x faster than the existing logic from a quick experiment with the many-shards benchmark track.

if (expression.isExclusion()) {
matchingOpenClosedNames.forEachOrdered(result::remove);
if (isExclusion) {
matchingOpenClosedNames.forEach(result::remove);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it matters much, but no need to do these removals/additions in order.

return true;
}

private static void validateAliasOrIndex(ExpressionList.Expression expression) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No point in having a single use method like that not inside the loop body, it's just harder on the compiler a well as the reader of the code.

* Used to iterate expression lists and work out which expression item is a wildcard or an exclusion.
*/
public static final class ExpressionList implements Iterable<ExpressionList.Expression> {
private final List<Expression> expressionsList;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for this intermediary list, inlining the logic and doing the checks on the fly is at least twice as fast as this approach and O(1) instead of O(n) in heap use.

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one comment, but it's only about mentioning safety in a comment somewhere.

return ExplicitResourceNameFilter.filterUnavailable(
context,
DateMathExpressionResolver.resolve(context, List.of(expressions))
DateMathExpressionResolver.resolve(context, Arrays.asList(expressions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…) on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…) in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).

@original-brownbear original-brownbear added v8.17.0 auto-backport Automatically create backport pull requests when merged labels Oct 24, 2024
@original-brownbear
Copy link
Contributor Author

Thanks everyone!

@original-brownbear original-brownbear merged commit d5265be into elastic:main Oct 24, 2024
16 checks passed
@original-brownbear original-brownbear deleted the drop-expression-list branch October 24, 2024 21:17
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 24, 2024
…gic (elastic#115487)

The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targetting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
elasticsearchmachine pushed a commit that referenced this pull request Oct 24, 2024
…gic (#115487) (#115602)

The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targetting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
…gic (elastic#115487)

The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targetting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
…gic (elastic#115487)

The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targetting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
@original-brownbear original-brownbear restored the drop-expression-list branch November 30, 2024 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Data Management/Indices APIs APIs to create and manage indices and templates >non-issue Team:Data Management Meta label for data/management team v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants