Reuse previous indices lookup when possible #79004

martijnvg · 2021-10-12T16:03:17Z

When indices, aliases and data streams aren't modified, the indices
lookup can be reused.

For example in:

* The IndexMetadataUpdater#applyChanges(...) method builds a new metadata
  instance, but only primary term or in-sync allocations may be updated.
  No new indices, aliases or data streams are added, so rebuilding the
  indices lookup is not necessary.
* MasterService#patchVersions

Additionally, the logic that checks whether the indices lookup can be
reused also checks the hidden and system flags of indices/data streams.

In clusters with many indices the cost of building the indices lookup is
non-negligible and should be avoided in these cases.

Closes #78980
Partially addresses #77888
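
To illustrate the idea, here is a minimal sketch of the reuse check. It is not the actual Elasticsearch code: `IndexMeta`, `buildLookup` and `lookupFor` are hypothetical, simplified stand-ins, and the set of "lookup-relevant" properties (name, hidden flag, system flag, aliases) is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

final class LookupReuseSketch {

    /** Hypothetical, simplified stand-in for IndexMetadata. */
    static final class IndexMeta {
        final String name;
        final boolean hidden;
        final boolean system;
        final Set<String> aliases;
        final long primaryTerm; // changes often, but the lookup does not depend on it

        IndexMeta(String name, boolean hidden, boolean system, Set<String> aliases, long primaryTerm) {
            this.name = name;
            this.hidden = hidden;
            this.system = system;
            this.aliases = aliases;
            this.primaryTerm = primaryTerm;
        }

        /** True when every property this sketch treats as lookup-relevant is unchanged. */
        boolean sameLookupProperties(IndexMeta other) {
            return name.equals(other.name)
                && hidden == other.hidden
                && system == other.system
                && aliases.equals(other.aliases);
        }
    }

    /** The expensive step: maps every index name and alias to the backing index names. */
    static SortedMap<String, List<String>> buildLookup(Map<String, IndexMeta> indices) {
        SortedMap<String, List<String>> lookup = new TreeMap<>();
        for (IndexMeta index : indices.values()) {
            lookup.computeIfAbsent(index.name, k -> new ArrayList<>()).add(index.name);
            for (String alias : index.aliases) {
                lookup.computeIfAbsent(alias, k -> new ArrayList<>()).add(index.name);
            }
        }
        return lookup;
    }

    /**
     * Returns the previous lookup when nothing it depends on has changed,
     * otherwise rebuilds it. Both maps are assumed to be keyed by index name.
     */
    static SortedMap<String, List<String>> lookupFor(
            Map<String, IndexMeta> previous,
            SortedMap<String, List<String>> previousLookup,
            Map<String, IndexMeta> current) {
        if (previous.keySet().equals(current.keySet())) {
            boolean unchanged = true;
            for (IndexMeta index : current.values()) {
                if (!index.sameLookupProperties(previous.get(index.name))) {
                    unchanged = false;
                    break;
                }
            }
            if (unchanged) {
                return previousLookup; // e.g. only a primary term was bumped
            }
        }
        return buildLookup(current);
    }
}
```

In this sketch a primary term bump returns the same lookup instance, while flipping the hidden flag forces a rebuild.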

elasticmachine · 2021-10-12T16:03:20Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2021-10-12T18:21:55Z

server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java

        }

-        public Builder(Metadata metadata) {
+        Builder(Metadata metadata, boolean reuseIndicesLookup) {


I wonder, could we have the Builder itself determine whether it has to recompute the lookup or not? We should be able to keep track of whether we've added or removed an index or changed any of its relevant settings (hidden/closed/aliases) and similarly for data streams.

martijnvg · 2021-10-13

Yes, I think that can work. By checking whether any of the following methods have been invoked: put(IMD.Builder), put(IMD, boolean), remove(...), removeAllIndices(), indices(...), put(DataStream), dataStreams(...), put(aliasName, dataStream, isWriteDataStream, filter), removeDataStream(...), removeDataStreamAlias(...), the build() method can determine whether the indices lookup can be reused.
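
A rough sketch of that tracking approach, under simplifying assumptions: `Meta`, `Builder` and `updatePrimaryTerm` are hypothetical stand-ins (not the real `Metadata.Builder` API), and only `put` and `remove` are modelled as invalidating methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class TrackingBuilderSketch {

    /** Hypothetical, simplified stand-in for Metadata. */
    static final class Meta {
        final Map<String, Long> primaryTerms;          // index name -> primary term
        final SortedMap<String, String> indicesLookup; // name -> kind of abstraction

        Meta(Map<String, Long> primaryTerms, SortedMap<String, String> indicesLookup) {
            this.primaryTerms = primaryTerms;
            this.indicesLookup = indicesLookup;
        }
    }

    static final class Builder {
        private final Map<String, Long> primaryTerms;
        private final SortedMap<String, String> previousLookup;
        private boolean lookupInvalidated = false;

        Builder(Meta previous) {
            this.primaryTerms = new HashMap<>(previous.primaryTerms);
            this.previousLookup = previous.indicesLookup;
        }

        /** Adds (or replaces) an index: the lookup must be rebuilt. */
        Builder put(String index, long primaryTerm) {
            primaryTerms.put(index, primaryTerm);
            lookupInvalidated = true;
            return this;
        }

        /** Removes an index: the lookup must be rebuilt. */
        Builder remove(String index) {
            primaryTerms.remove(index);
            lookupInvalidated = true;
            return this;
        }

        /** Changes a property the lookup does not depend on: no invalidation. */
        Builder updatePrimaryTerm(String index, long primaryTerm) {
            if (primaryTerms.replace(index, primaryTerm) == null) {
                throw new IllegalArgumentException("unknown index [" + index + "]");
            }
            return this;
        }

        /** Rebuilds the lookup only if an invalidating method was called. */
        Meta build() {
            SortedMap<String, String> lookup = lookupInvalidated ? buildLookup(primaryTerms) : previousLookup;
            return new Meta(primaryTerms, lookup);
        }

        private static SortedMap<String, String> buildLookup(Map<String, Long> indices) {
            SortedMap<String, String> lookup = new TreeMap<>();
            for (String name : indices.keySet()) {
                lookup.put(name, "concrete index");
            }
            return lookup;
        }
    }
}
```

The trade-off is that every mutating builder method has to remember to set the flag, which is easy to miss when new methods are added.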

martijnvg · 2021-10-13

This approach does seem to be working.

However, another issue occurred to me: IndexMetadata instances can be returned via the indicesLookup. This is OK now, but if we selectively reuse previous indicesLookup instances, the IndexMetadata returned by the indicesLookup can differ from the one in the regular indices map (in Metadata) for the same index. Some tests are failing because of that.

Looking at the usages of IndexAbstraction (value in indicesLookup), most of the production usages just get the index name. So I think we should refactor IndexAbstraction's getIndices() and getWriteIndex() methods to just return a string (instead of IndexMetadata). I think for the cases where IndexAbstraction is used to fetch other properties of IndexMetadata, Metadata should be used alongside indicesLookup to fetch the required information.
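
A small sketch of what that refactoring could look like, with hypothetical simplified types (`IndexMeta`, `Abstraction`, `Meta` and `writeIndexOf` are illustrative names, not the real classes). The abstraction stores only index names, so a reused lookup can never hand out a stale metadata instance; callers resolve the name against the current metadata.

```java
import java.util.List;
import java.util.Map;

final class NameOnlyAbstractionSketch {

    /** Hypothetical, simplified stand-in for IndexMetadata. */
    static final class IndexMeta {
        final String name;
        final long primaryTerm;

        IndexMeta(String name, long primaryTerm) {
            this.name = name;
            this.primaryTerm = primaryTerm;
        }
    }

    /** Stand-in for IndexAbstraction: holds index names only, no metadata instances. */
    static final class Abstraction {
        private final List<String> indices;
        private final String writeIndex;

        Abstraction(List<String> indices, String writeIndex) {
            this.indices = indices;
            this.writeIndex = writeIndex;
        }

        List<String> getIndices() {
            return indices;
        }

        String getWriteIndex() {
            return writeIndex;
        }
    }

    /** Stand-in for Metadata: the indices map is the single source of truth. */
    static final class Meta {
        final Map<String, IndexMeta> indices;
        final Map<String, Abstraction> indicesLookup;

        Meta(Map<String, IndexMeta> indices, Map<String, Abstraction> indicesLookup) {
            this.indices = indices;
            this.indicesLookup = indicesLookup;
        }

        /** Resolves the write index name through the lookup, then reads current metadata. */
        IndexMeta writeIndexOf(String abstractionName) {
            Abstraction abstraction = indicesLookup.get(abstractionName);
            return indices.get(abstraction.getWriteIndex());
        }
    }
}
```

Two metadata versions can then share one lookup instance and still each return their own `IndexMeta` for the same index name.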

original-brownbear · 2021-10-13

> Looking at the usages of IndexAbstraction (value in indicesLookup), most of the production usages just get the index name. So I think we should refactor IndexAbstraction's getIndices() and getWriteIndex() methods to just return a string (instead of IndexMetadata). I think for the cases where IndexAbstraction is used to fetch other properties of IndexMetadata, Metadata should be used alongside indicesLookup to fetch the required information.

++ I think that would be a clever move. That would allow us to stay consistent here more easily :)

> By checking whether any of the following methods have been invoked:

I wonder, if we do the above, maybe we can just add a copy constructor to Metadata like Metadata.withShardRoutingUpdates(Map<>)? Then we don't even have to bother with the builder and checking for changes manually in any way? And it's more obviously safe and correct as well isn't it?
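
A hedged sketch of what such a copy method might look like. The contents of the `Map<>` argument are not specified above, so this assumes a map from index name to new primary term; `Meta` and `withPrimaryTerms` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class CopyWithUpdatesSketch {

    /** Hypothetical, simplified stand-in for Metadata. */
    static final class Meta {
        private final Map<String, Long> primaryTerms;          // index name -> primary term
        private final SortedMap<String, String> indicesLookup; // name -> kind of abstraction

        /** Regular construction: pays the cost of building the lookup. */
        Meta(Map<String, Long> primaryTerms) {
            this(new HashMap<>(primaryTerms), buildLookup(primaryTerms));
        }

        private Meta(Map<String, Long> primaryTerms, SortedMap<String, String> indicesLookup) {
            this.primaryTerms = primaryTerms;
            this.indicesLookup = indicesLookup;
        }

        /**
         * Copy method that can only touch primary terms of existing indices,
         * so sharing the previous lookup is safe by construction.
         */
        Meta withPrimaryTerms(Map<String, Long> updates) {
            if (!primaryTerms.keySet().containsAll(updates.keySet())) {
                throw new IllegalArgumentException("updates must only reference existing indices");
            }
            Map<String, Long> merged = new HashMap<>(primaryTerms);
            merged.putAll(updates);
            return new Meta(merged, indicesLookup);
        }

        long primaryTerm(String index) {
            return primaryTerms.get(index);
        }

        SortedMap<String, String> indicesLookup() {
            return indicesLookup;
        }

        private static SortedMap<String, String> buildLookup(Map<String, Long> indices) {
            SortedMap<String, String> lookup = new TreeMap<>();
            for (String name : indices.keySet()) {
                lookup.put(name, "concrete index");
            }
            return lookup;
        }
    }
}
```

Compared with tracking invalidating calls in the builder, this narrows reuse to one well-defined kind of update, which makes it easier to reason about but covers fewer cases.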

martijnvg · 2021-10-13

> ++ I think that would be a clever move. That would allow us to stay consistent here more easily :)

I will work on this separately. I've opened a draft PR #79080 and I'm trying to get everything working.

> I wonder, if we do the above, maybe we can just add a copy constructor to Metadata like Metadata.withShardRoutingUpdates(Map<>)?

The current approach does allow for more cases for the indices lookup to be reused (e.g. when changing most index settings). But this approach does feel safer. Having builder methods that allow the reuse of indices lookup for specific cases, is maybe the way to go?

original-brownbear · 2021-10-13

> Having builder methods that allow the reuse of indices lookup for specific cases, is maybe the way to go?

Fair point indeed :) You're right that this might indirectly help a bunch of cases ... never mind me :)

Most users of an `IndexAbstraction` instance don't need the `IndexMetadata` instances that `getIndices()` and `getWriteIndex()` return. Cluster state variables/parameters can be used in places where access to `IndexMetadata` is required. By changing the `getIndices()` and `getWriteIndex()` methods to return `Index` instances, the indices lookup can be reused across different cluster states. This should be possible in cases that don't change an index's hidden status, open or close indices, or add/remove aliases, data streams or indices. This change should allow for elastic#79004

Backport of elastic#79080 to 7.x branch. Most users of an `IndexAbstraction` instance don't need the `IndexMetadata` instances that `getIndices()` and `getWriteIndex()` return. Cluster state variables/parameters can be used in places where access to `IndexMetadata` is required. By changing the `getIndices()` and `getWriteIndex()` methods to return `Index` instances, the indices lookup can be reused across different cluster states. This should be possible in cases that don't change an index's hidden status, open or close indices, or add/remove aliases, data streams or indices. This change should allow for elastic#79004

The IndexMetadataUpdater#applyChanges(...) method builds a new metadata instance, but only primary term or in-sync allocations may be updated. No new indices, aliases or data streams are added, so rebuilding the indices lookup is not necessary. In clusters with many indices the cost of building the indices lookup is non-negligible and should be avoided in this case. Closes elastic#78980

martijnvg · 2021-10-15T14:45:10Z

Accidentally rebased master into this branch instead of merging...

original-brownbear

LGTM this seems safe to me and performed great during benchmarks.
=> All good from my end, thanks Martijn!

elasticsearchmachine · 2021-10-26T12:07:57Z

💔 Backport failed

Status	Branch	Result
❌	7.16	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 79004`

Backporting elastic#79004 to 7.x branch.

When indices, aliases and data streams aren't modified, the indices lookup can be reused. For example in:

* The IndexMetadataUpdater#applyChanges(...) method builds a new metadata instance, but only primary term or in-sync allocations may be updated. No new indices, aliases or data streams are added, so rebuilding the indices lookup is not necessary.
* MasterService#patchVersions

Additionally, the logic that checks whether the indices lookup can be reused also checks the hidden and system flags of indices/data streams.

In clusters with many indices the cost of building the indices lookup is non-negligible and should be avoided in these cases.

Closes elastic#78980
Partially addresses elastic#77888

Backporting #79004 to 7.16 branch.

When indices, aliases and data streams aren't modified, the indices lookup can be reused. For example in:

* The IndexMetadataUpdater#applyChanges(...) method builds a new metadata instance, but only primary term or in-sync allocations may be updated. No new indices, aliases or data streams are added, so rebuilding the indices lookup is not necessary.
* MasterService#patchVersions

Additionally, the logic that checks whether the indices lookup can be reused also checks the hidden and system flags of indices/data streams.

In clusters with many indices the cost of building the indices lookup is non-negligible and should be avoided in these cases.

Closes #78980
Partially addresses #77888

* upstream/master: (209 commits)
  Enforce license expiration (elastic#79671)
  TSDB: Automatically add timestamp mapper (elastic#79136)
  [DOCS] `_id` is required for bulk API's `update` action (elastic#79774)
  EQL: Add optional fields and limit joining keys on non-null values only (elastic#79677)
  [DOCS] Document range enrich policy (elastic#79607)
  [DOCS] Fix typos in 8.0 security migration (elastic#79802)
  Allow listing older repositories (elastic#78244)
  [ML] track inference model feature usage per node (elastic#79752)
  Remove IncrementalClusterStateWriter & related code (elastic#79738)
  Reuse previous indices lookup when possible (elastic#79004)
  Reduce merging in PersistedClusterStateService (elastic#79793)
  SQL: Adjust JDBC docs to use milliseconds for timeouts (elastic#79628)
  Remove endpoint for freezing indices (elastic#78918)
  [ML] add timeout parameter for DELETE trained_models API (elastic#79739)
  [ML] wait for .ml-state-write alias to be readable (elastic#79731)
  [Docs] Update edgengram-tokenizer.asciidoc (elastic#79577)
  [Test][Transform] fix UpdateTransformActionRequestTests failure (elastic#79787)
  Limit CS Update Task Description Size (elastic#79443)
  Apply the reader wrapper on can_match source (elastic#78988)
  [DOCS] Adds new transform limitation item and a note to the tutorial (elastic#79479)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexMode.java
#	server/src/test/java/org/elasticsearch/index/TimeSeriesModeTests.java

martijnvg added the >enhancement, :Distributed/Cluster Coordination, v8.0.0 and v7.16.0 labels Oct 12, 2021

martijnvg requested a review from original-brownbear October 12, 2021 16:03

elasticmachine added the Team:Distributed label Oct 12, 2021

DaveCTurner reviewed Oct 12, 2021

martijnvg mentioned this pull request Oct 13, 2021

Refactor IndexAbstraction to not use IndexMetadata #79080

Merged

martijnvg mentioned this pull request Oct 15, 2021

Refactor IndexAbstraction to not use IndexMetadata #79246

Merged

martijnvg added 5 commits October 15, 2021 16:33

iter

dc92929

added test

accf90c

added equals() and hashcode() methods for assertions

a9084c5

fixed assertions

bbd4276

martijnvg force-pushed the sometimes_reuse_indices_lookup branch from cfb5627 to bbd4276 Compare October 15, 2021 14:44

martijnvg added 4 commits October 15, 2021 16:56

fixed compile error after updating branch

7e52414

removed unused imports

803429a

also check for system indices

9839e32

Merge remote-tracking branch 'es/master' into sometimes_reuse_indices_lookup

6249ce6

original-brownbear mentioned this pull request Oct 24, 2021

MasterService#patchVersions is rather inefficient #77888

Closed

Merge remote-tracking branch 'es/master' into sometimes_reuse_indices_lookup

6e5248d

original-brownbear approved these changes Oct 26, 2021

DaveCTurner approved these changes Oct 26, 2021

martijnvg added the auto-backport-and-merge label Oct 26, 2021

martijnvg changed the title ~~Reuse previous indices lookup in IndexMetadataUpdater#applyChanges(...)~~ Reuse previous indices lookup when possible Oct 26, 2021

martijnvg merged commit bada1ba into elastic:master Oct 26, 2021

martijnvg added v7.16.1 and removed v7.16.0 labels Oct 26, 2021

martijnvg mentioned this pull request Oct 26, 2021

Reuse previous indices lookup when possible #79804

Merged

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

danhermann added v7.16.0 and removed v7.16.1 labels Oct 27, 2021
