-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Replace IndexNameExpressionResolver.ExpressionList with imperative logic #115487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace IndexNameExpressionResolver.ExpressionList with imperative logic #115487
Conversation
The approach taken by `ExpressionList` becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration. Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations. The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
Pinging @elastic/es-data-management (Team:Data Management) |
return ExplicitResourceNameFilter.filterUnavailable( | ||
context, | ||
DateMathExpressionResolver.resolve(context, List.of(expressions)) | ||
DateMathExpressionResolver.resolve(context, Arrays.asList(expressions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big deal in some cases, List.of
will copy the expressions list which is quite heavy once we get to very long lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…)
on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…)
in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could Collections.unmodifiableList(...)
the list from Arrays.asList
if you wanted. It's an allocation and adds some overhead to the list operations, but I don't think it's nearly as much work as copying the entire list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking about dropping the Arrays
wrapping in a follow-up :P that would make this perform yet a little nicer actually :D
AtomicBoolean emptyWildcardExpansion = context.getOptions().allowNoIndices() ? null : new AtomicBoolean(); | ||
for (int i = firstWildcardIndex; i < expressions.size(); i++) { | ||
String expression = expressions.get(i); | ||
boolean isExclusion = i > firstWildcardIndex && expression.charAt(0) == '-'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little lazy, could special case the first wildcard but it's even more code and not a measurable speedup given how complicated the loop body is, so I went for this. Either way, inlining the logic is about 3x faster than the existing logic from a quick experiment with the many-shards benchmark track.
if (expression.isExclusion()) { | ||
matchingOpenClosedNames.forEachOrdered(result::remove); | ||
if (isExclusion) { | ||
matchingOpenClosedNames.forEach(result::remove); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it matters much, but no need to do these removals/additions in order.
return true; | ||
} | ||
|
||
private static void validateAliasOrIndex(ExpressionList.Expression expression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point in having a single use method like that not inside the loop body, it's just harder on the compiler a well as the reader of the code.
* Used to iterate expression lists and work out which expression item is a wildcard or an exclusion. | ||
*/ | ||
public static final class ExpressionList implements Iterable<ExpressionList.Expression> { | ||
private final List<Expression> expressionsList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for this intermediary list, inlining the logic and doing the checks on the fly is at least twice as fast as this approach and O(1) instead of O(n) in heap use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left one comment, but it's only about mentioning safety in a comment somewhere.
return ExplicitResourceNameFilter.filterUnavailable( | ||
context, | ||
DateMathExpressionResolver.resolve(context, List.of(expressions)) | ||
DateMathExpressionResolver.resolve(context, Arrays.asList(expressions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…)
on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…)
in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).
Thanks everyone! |
💚 Backport successful
|
…gic (elastic#115487) The approach taken by `ExpressionList` becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration. Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations. The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
…gic (#115487) (#115602) The approach taken by `ExpressionList` becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration. Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations. The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
…gic (elastic#115487) The approach taken by `ExpressionList` becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration. Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations. The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
…gic (elastic#115487) The approach taken by `ExpressionList` becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration. Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations. The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
The approach taken by
ExpressionList
becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration.Removing the intermediary list and inlining the logic brings down the latency of searches targetting many shards/indices at once and allows for subsequent optimizations.
The removed tests appear redundant as they tested an implementation detail of the IndexNameExpressionResolver which itself is well covered by its own tests.
Also, the inlined logic has been optimised slightly to avoid needlessly copying lists in some spots.