Replace IndexNameExpressionResolver.ExpressionList with imperative logic #115487

original-brownbear · 2024-10-24T01:13:04Z

The approach taken by ExpressionList becomes very expensive for large numbers of indices/datastreams. It implies that large lists of concrete names (as they are passed down from the transport layer via e.g. security) are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches targeting many shards/indices at once and allows for subsequent optimizations.
The removed tests appear redundant, as they tested an implementation detail of the IndexNameExpressionResolver, which itself is well covered by its own tests.
Also, the inlined logic has been optimised slightly to avoid needlessly copying lists in some spots.

elasticsearchmachine · 2024-10-24T01:13:57Z

Pinging @elastic/es-data-management (Team:Data Management)

original-brownbear · 2024-10-24T01:14:34Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

                return ExplicitResourceNameFilter.filterUnavailable(
                    context,
-                    DateMathExpressionResolver.resolve(context, List.of(expressions))
+                    DateMathExpressionResolver.resolve(context, Arrays.asList(expressions))


This is a pretty big deal in some cases: List.of copies the expressions array, which gets quite heavy once we get to very long lists.

With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…) on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…) in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).

You could Collections.unmodifiableList(...) the list from Arrays.asList if you wanted. It's an allocation and adds some overhead to the list operations, but I don't think it's nearly as much work as copying the entire list.

I was actually thinking about dropping the Arrays wrapping in a follow-up :P that would make this perform yet a little nicer actually :D
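To make the three options from this thread concrete, here is a minimal sketch of how they differ. The class and method names are hypothetical and not part of the PR; only `List.of`, `Arrays.asList` and `Collections.unmodifiableList` are the real JDK calls being discussed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not taken from the Elasticsearch code base.
public class ExpressionListViews {

    /** Copies the array into a new immutable list: O(n) time and memory per call. */
    static List<String> defensiveCopy(String[] expressions) {
        return List.of(expressions);
    }

    /** Wraps the array without copying; set(...) on the list writes through to the array. */
    static List<String> mutableView(String[] expressions) {
        return Arrays.asList(expressions);
    }

    /** Adds a read-only wrapper around the array view; the elements are still not copied. */
    static List<String> readOnlyView(String[] expressions) {
        return Collections.unmodifiableList(Arrays.asList(expressions));
    }

    public static void main(String[] args) {
        String[] expressions = { "logs-*", "-logs-old" };

        List<String> view = mutableView(expressions);
        view.set(0, "changed");              // writes through to the backing array
        System.out.println(expressions[0]);  // prints "changed"

        List<String> copy = defensiveCopy(expressions);
        expressions[1] = "mutated";          // the copy does not see later array writes
        System.out.println(copy.get(1));     // prints "-logs-old"

        try {
            readOnlyView(expressions).set(0, "x");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view rejects set(...)");
        }
    }
}
```

The read-only wrapper costs one small allocation per call regardless of array length, which is why it was suggested as a cheaper safeguard than the full copy.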

original-brownbear · 2024-10-24T01:16:30Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

+            AtomicBoolean emptyWildcardExpansion = context.getOptions().allowNoIndices() ? null : new AtomicBoolean();
+            for (int i = firstWildcardIndex; i < expressions.size(); i++) {
+                String expression = expressions.get(i);
+                boolean isExclusion = i > firstWildcardIndex && expression.charAt(0) == '-';


This is a little lazy, could special case the first wildcard but it's even more code and not a measurable speedup given how complicated the loop body is, so I went for this. Either way, inlining the logic is about 3x faster than the existing logic from a quick experiment with the many-shards benchmark track.
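For readers who want to see the shape of that check in isolation, here is a rough sketch. It is not the resolver's code: `isWildcard`, `firstWildcardIndex` and `describe` are hypothetical helpers, the wildcard test is simplified to "contains `*`", and expressions are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the Elasticsearch code base.
public class InlineExpressionScan {

    /** Simplified stand-in for the resolver's wildcard check (assumption). */
    static boolean isWildcard(String expression) {
        return expression.indexOf('*') >= 0;
    }

    /** Returns the index of the first wildcard expression, or -1 if there is none. */
    static int firstWildcardIndex(List<String> expressions) {
        for (int i = 0; i < expressions.size(); i++) {
            if (isWildcard(expressions.get(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Classifies every expression from the first wildcard onwards, without building a wrapper list. */
    static List<String> describe(List<String> expressions) {
        List<String> out = new ArrayList<>();
        int first = firstWildcardIndex(expressions);
        if (first < 0) {
            return out; // no wildcard, so nothing can be an exclusion
        }
        for (int i = first; i < expressions.size(); i++) {
            String expression = expressions.get(i);
            // Same condition as in the diff: the first wildcard itself is never an exclusion.
            boolean isExclusion = i > first && expression.charAt(0) == '-';
            String name = isExclusion ? expression.substring(1) : expression;
            out.add((isExclusion ? "exclude " : "include ") + name);
        }
        return out;
    }
}
```

With the input `["-idx", "logs-*", "-logs-old", "metrics"]` the scan starts at `logs-*`, so `-idx` is skipped by this loop and only `-logs-old` is classified as an exclusion.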

original-brownbear · 2024-10-24T01:17:01Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

-                    if (expression.isExclusion()) {
-                        matchingOpenClosedNames.forEachOrdered(result::remove);
+                    if (isExclusion) {
+                        matchingOpenClosedNames.forEach(result::remove);


Not sure it matters much, but no need to do these removals/additions in order.
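A small sketch of why ordering is irrelevant here, assuming the target is a set and the operation is a plain removal: the final contents are the same whichever order the elements are visited in. The class and method names below are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the Elasticsearch code base.
public class UnorderedRemoval {

    /** Removes every streamed name from a copy of the given set, optionally in encounter order. */
    static Set<String> removeAll(Set<String> result, Stream<String> matching, boolean ordered) {
        Set<String> copy = new HashSet<>(result);
        if (ordered) {
            matching.forEachOrdered(copy::remove); // enforces encounter order
        } else {
            matching.forEach(copy::remove);        // no ordering guarantee requested
        }
        return copy;
    }
}
```

On a sequential stream both calls behave the same in practice; `forEach` simply does not ask the stream for a guarantee that the removal does not need.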

original-brownbear · 2024-10-24T01:18:01Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

            return true;
        }

-        private static void validateAliasOrIndex(ExpressionList.Expression expression) {


No point in having a single-use method like that outside the loop body; it's just harder on the compiler as well as the reader of the code.

original-brownbear · 2024-10-24T01:20:24Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

-     * Used to iterate expression lists and work out which expression item is a wildcard or an exclusion.
-     */
-    public static final class ExpressionList implements Iterable<ExpressionList.Expression> {
-        private final List<Expression> expressionsList;


There is no need for this intermediary list; inlining the logic and doing the checks on the fly is at least twice as fast as this approach and O(1) instead of O(n) in heap use.

dakrone

LGTM, I left one comment, but it's only about mentioning safety in a comment somewhere.

dakrone · 2024-10-24T17:41:26Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

                return ExplicitResourceNameFilter.filterUnavailable(
                    context,
-                    DateMathExpressionResolver.resolve(context, List.of(expressions))
+                    DateMathExpressionResolver.resolve(context, Arrays.asList(expressions))


With this, I think we moved from a defensive copy of the expression array into a mutable view. Are there any concerns about later code using .set(…) on this list and potentially mutating the expressions array? Is that a danger we're willing to accept? (For what it's worth, I don't see any instances of .set(…) in the code here that is mutating the expressions list, so we may just want to document not to modify it for the future).

original-brownbear · 2024-10-24T21:16:59Z

Thanks everyone!

elasticsearchmachine · 2024-10-24T21:18:25Z

💚 Backport successful

Status	Branch	Result
✅	8.x

original-brownbear added the >non-issue and :Data Management/Indices APIs labels Oct 24, 2024

original-brownbear marked this pull request as ready for review October 24, 2024 01:13

original-brownbear requested review from dakrone and gmarouli October 24, 2024 01:13

elasticsearchmachine added the Team:Data Management and v9.0.0 labels Oct 24, 2024

original-brownbear commented Oct 24, 2024

original-brownbear requested a review from albertzaharovits October 24, 2024 01:20

dakrone approved these changes Oct 24, 2024

original-brownbear added 2 commits October 24, 2024 21:39

Merge remote-tracking branch 'elastic/main' into drop-expression-list

3c01201

doc

6466118

original-brownbear added the v8.17.0 and auto-backport labels Oct 24, 2024

original-brownbear merged commit d5265be into elastic:main Oct 24, 2024
16 checks passed

original-brownbear deleted the drop-expression-list branch October 24, 2024 21:17

original-brownbear mentioned this pull request Oct 24, 2024

[8.x] Replace IndexNameExpressionResolver.ExpressionList with imperative logic (#115487) #115602

Merged

original-brownbear restored the drop-expression-list branch November 30, 2024 10:08
