Shrink indices without needing shards to be co-located #63519

Open · lanerjo opened this issue Oct 9, 2020 · 7 comments
Labels: :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source) · >enhancement · Team:Distributed (meta label for distributed team)

Comments

lanerjo commented Oct 9, 2020

Currently, the shrink index operation requires one copy of every shard to reside on the same node.

This may not be possible for some indices.
Example:
- Index primary store size: 5 TB
- Index setting total_shards_per_node: 2 (to avoid creating hotspots and to reduce CPU usage during queries)
- Rollover is not feasible, as too many indices/shards would be created, causing cluster instability
- Number of primary shards: 100 (required to keep up with ingestion rates)
- Largest disk: 3.5 TB
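
For reference, a minimal sketch of the kind of settings that produce this situation (the index name is hypothetical; the shard counts come from the example above):

```
PUT /my-ingest-index
{
  "settings": {
    "index.number_of_shards": 100,
    "index.routing.allocation.total_shards_per_node": 2
  }
}
```

With 100 primaries and at most 2 shards of this index allowed per node, no node is ever permitted to hold a copy of every shard, so the shrink precondition can never be satisfied without first removing the setting.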

Suggestions for improvement:
- Distribute the shrink operation across multiple nodes
- Do not require all shards to be on the same node
- Do not move more shards than required

Essentially, the shrink could be performed on 2, 3, 4, ... shards at a time on a per-node basis, distributing the workload across multiple nodes and reducing the load imposed by a shrink operation. For example, shrinking 100 shards down to 10 could build each target shard on a different node from its own 10 source shards, so no single node would ever need all 100 copies. An added benefit of performing the shrink across multiple nodes is that fewer shards would need to be relocated.

Bonus improvement:
Ability to designate nodes as shrink-only, so that index routing could be ignored for the shrink operation. This node type would not store data at any time other than during a shrink operation, which would limit the impact of shrink load on ingestion and queries. A rough approximation with today's allocation filtering is sketched below.
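
Something along these lines can be approximated today with a custom node attribute and allocation filtering. A minimal sketch, assuming a spare node and a hypothetical attribute name shrink_only; note this still requires every shard copy to fit on the tagged node, so it isolates the shrink load without solving the co-location problem:

```yaml
# elasticsearch.yml on the dedicated node (hypothetical attribute name)
node.attr.shrink_only: "true"
```

```
PUT /my-index/_settings
{
  "index.routing.allocation.require.shrink_only": "true",
  "index.blocks.write": true
}
```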

lanerjo added the >enhancement and needs:triage labels on Oct 9, 2020
danielmitterdorfer added the :Distributed/CRUD label and removed the needs:triage label on Oct 13, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed (:Distributed/CRUD)

elasticmachine added the Team:Distributed label on Oct 13, 2020
DaveCTurner (Contributor) commented Oct 21, 2020

We discussed this as a team today and we can see some value in supporting shrinking an index from shards that are not colocated.

We considered a "remote shrink" operation, implemented as a new recovery source, which would copy segments directly from remote nodes into the new shards. This copy process could use hard-links if the source shard were on the same node (and filesystem) but would in general consume up to 2x the disk space; however today's full process is to relocate, then shrink, then relocate again, which itself consumes a good deal of additional disk space and network bandwidth. It'd also be a good deal simpler to orchestrate the shrinking process if it could be done in fewer steps. There is no need for the source shard copy to be a primary, so we could try and select source shard copies from the same AZ to further reduce cross-zone network costs.
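
For context, today's full relocate-then-shrink flow looks roughly like the following (index and node names are placeholders; the remote-shrink idea above would collapse it into a single step):

```
PUT /my-index/_settings
{
  "index.routing.allocation.require._name": "node-with-room",
  "index.blocks.write": true
}

POST /my-index/_shrink/my-shrunken-index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null
  }
}
```

Once the shrink completes and the allocation filter is cleared, the shards relocate again to rebalance, which is the extra disk and network cost described above.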

There are some subtleties. For instance, in today's shrink we check up front that we will not exceed the maximum doc count in any of the shrunken shards (a single shard cannot exceed Lucene's limit of roughly 2.1 billion documents), but this check would need to be deferred until later and would need to be robust against retrying the shrink from a different source shard copy.

That said, we are not intending to work on this project in the foreseeable future. We'll leave this issue open to invite other ideas or indications of support.

DaveCTurner changed the title from "Improve the Shrink Index Operations" to "Shrink indices without needing shards to be co-located" on Oct 21, 2020
DaveCTurner added the feedback_needed and :Distributed/Recovery labels and removed the team-discuss and :Distributed/CRUD labels on Oct 21, 2020
JohnLyman commented

> That said, we are not intending to work on this project in the foreseeable future. We'll leave this issue open to invite other ideas or indications of support.

I'd like to add my support. My issue is that, without this, there are no great options for avoiding node hot spots. The recommendation is to use `index.routing.allocation.total_shards_per_node`, but that doesn't play well with shrink (especially when automated with ILM), because shrink requires one copy of every shard on the same node.

The current situation puts several best practices in direct conflict: using multiple shards per index for ingest performance, using shrink/force merge for search performance, and using `total_shards_per_node` to avoid hot spots.

aliciascott commented Jan 14, 2021

Hey all, I'm not sure why this isn't considered a bug, or why it won't see any work in the foreseeable future. We created a Support Known Issue because this is coming up a lot; can we perhaps re-prioritize? Thanks!

cc @leehinman

leehinman commented

@dakrone

percygrunwald commented

We have just noticed the same issue in a number of our deployments. We set `total_shards_per_node` to 2 in order to overcome hotspots during ingest, but as mentioned, this doesn't play well with the part of the shrink operation that requires all the shards to be co-located on the same node. In an ideal world, I think we would remove the `total_shards_per_node` setting after the index is rolled over by ILM. Our main concern with hotspots is indices that are actively being written to; once an index is ready to be shrunk, we'd happily have more than 2 of its shards on a given node, since it no longer receives writes.

Is there a way to change arbitrary settings of an index when transitioning between ILM steps? That would solve the problem, since you could set `total_shards_per_node` to null after the transition to the warm phase. Alternatively, the warm phase could ignore the `total_shards_per_node` setting until it has finished the shrink operation.

We're likely going to need to write some automation that detects indices in the warm phase that still have `total_shards_per_node` set to 2 and removes it so the shrink operation can proceed.
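
A minimal sketch of that automation, assuming jq, a cluster at localhost:9200, and a placeholder index pattern (a stricter version would first check that the setting is actually set before clearing it):

```bash
#!/usr/bin/env bash
# Sketch only: clear total_shards_per_node on every warm-phase index
# matching the placeholder pattern "my-index-*".
for idx in $(curl -s 'localhost:9200/my-index-*/_ilm/explain' |
             jq -r '.indices | to_entries[] | select(.value.phase == "warm") | .key'); do
  # Null out the setting so the shrink step can co-locate the shards.
  curl -s -X PUT "localhost:9200/${idx}/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.routing.allocation.total_shards_per_node": null}'
done
```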

dakrone (Member) commented Dec 21, 2021

@percygrunwald for what it is worth, we do this detection and un-setting automatically in 8.0+: #76732
