Shrink indices without needing shards to be co-located #63519

Open · lanerjo opened this issue Oct 9, 2020 · 7 comments
Labels: :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source) · >enhancement · Team:Distributed (meta label for distributed team)

Comments

lanerjo commented Oct 9, 2020

Currently, the shrink index operation requires one copy of every shard to reside on the same node.

This may not be possible for some indices.
Example:
- Index primary store size: 5 TB
- Index setting total_shards_per_node: 2 (to avoid creating hotspots and to reduce CPU usage during queries)
- Rollover is not feasible, as too many indices/shards would be created, causing cluster instability
- Number of primary shards: 100 (required to keep up with ingestion rates)
- Largest disk: 3.5 TB
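
For reference, a minimal sketch of the kind of settings that produce this situation (the index name is hypothetical; the shard counts come from the example above):

```
PUT /my-ingest-index
{
  "settings": {
    "index.number_of_shards": 100,
    "index.routing.allocation.total_shards_per_node": 2
  }
}
```

With 100 primaries and at most 2 shards of this index allowed per node, no node is ever permitted to hold a copy of every shard, so the shrink precondition can never be satisfied without first removing the setting.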

Suggestions for improvement:
- Distribute the shrink operation across multiple nodes
- Do not require all shards to be on the same node
- Do not move more shards than required

Essentially, the shrink could be performed on 2, 3, 4, ... shards at a time on a per-node basis, distributing the workload across multiple nodes and reducing the load imposed by a shrink operation. For example, shrinking 100 shards down to 10 could build each target shard on a different node from its own 10 source shards, so no single node would ever need all 100 copies. An added benefit of performing the shrink across multiple nodes is that fewer shards would need to be relocated.

Bonus improvement:
Ability to designate nodes as shrink-only, so that index routing could be ignored for the shrink operation. This node type would not store data at any time other than during a shrink operation, which would limit the impact of shrink load on ingestion and queries. A rough approximation with today's allocation filtering is sketched below.
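
Something along these lines can be approximated today with a custom node attribute and allocation filtering. A minimal sketch, assuming a spare node and a hypothetical attribute name shrink_only; note this still requires every shard copy to fit on the tagged node, so it isolates the shrink load without solving the co-location problem:

```yaml
# elasticsearch.yml on the dedicated node (hypothetical attribute name)
node.attr.shrink_only: "true"
```

```
PUT /my-index/_settings
{
  "index.routing.allocation.require.shrink_only": "true",
  "index.blocks.write": true
}
```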

lanerjo added the >enhancement and needs:triage labels on Oct 9, 2020
danielmitterdorfer added the :Distributed/CRUD label and removed the needs:triage label on Oct 13, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed (:Distributed/CRUD)

elasticmachine added the Team:Distributed label on Oct 13, 2020
DaveCTurner (Contributor) commented Oct 21, 2020

We discussed this as a team today and we can see some value in supporting shrinking an index from shards that are not colocated.

We considered a "remote shrink" operation, implemented as a new recovery source, which would copy segments directly from remote nodes into the new shards. This copy process could use hard-links if the source shard were on the same node (and filesystem) but would in general consume up to 2x the disk space; however today's full process is to relocate, then shrink, then relocate again, which itself consumes a good deal of additional disk space and network bandwidth. It'd also be a good deal simpler to orchestrate the shrinking process if it could be done in fewer steps. There is no need for the source shard copy to be a primary, so we could try and select source shard copies from the same AZ to further reduce cross-zone network costs.
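
For context, today's full relocate-then-shrink flow looks roughly like the following (index and node names are placeholders; the remote-shrink idea above would collapse it into a single step):

```
PUT /my-index/_settings
{
  "index.routing.allocation.require._name": "node-with-room",
  "index.blocks.write": true
}

POST /my-index/_shrink/my-shrunken-index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null
  }
}
```

Once the shrink completes and the allocation filter is cleared, the shards relocate again to rebalance, which is the extra disk and network cost described above.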

There are some subtleties. For instance, in today's shrink we check up front that we will not exceed the maximum doc count in any of the shrunken shards (a single shard cannot exceed Lucene's limit of roughly 2.1 billion documents), but this check would need to be deferred until later and would need to be robust against retrying the shrink from a different source shard copy.

That said, we are not intending to work on this project in the foreseeable future. We'll leave this issue open to invite other ideas or indications of support.

DaveCTurner changed the title from "Improve the Shrink Index Operations" to "Shrink indices without needing shards to be co-located" on Oct 21, 2020
DaveCTurner added the feedback_needed and :Distributed/Recovery labels and removed the team-discuss and :Distributed/CRUD labels on Oct 21, 2020
JohnLyman commented

> That said, we are not intending to work on this project in the foreseeable future. We'll leave this issue open to invite other ideas or indications of support.

I'd like to add my support. My issue is that, without this, there are no great options for avoiding node hot spots. The recommendation is to use `index.routing.allocation.total_shards_per_node`, but that doesn't play well with shrink (especially when automated with ILM), because shrink requires one copy of every shard on the same node.

The current situation puts several best practices in direct conflict: using multiple shards per index for ingest performance, using shrink/force merge for search performance, and using `total_shards_per_node` to avoid hot spots.

aliciascott commented Jan 14, 2021

Hey all, I'm not sure why this isn't considered a bug, or why it won't see any work in the foreseeable future. We created a Support Known Issue because this is coming up a lot; can we perhaps re-prioritize? Thanks!

cc @leehinman

leehinman commented

@dakrone

percygrunwald commented

We have just noticed the same issue in a number of our deployments. We set `total_shards_per_node` to 2 in order to overcome hotspots during ingest, but as mentioned, this doesn't play well with the part of the shrink operation that requires all the shards to be co-located on the same node. In an ideal world, I think we would remove the `total_shards_per_node` setting after the index is rolled over by ILM. Our main concern with hotspots is indices that are actively being written to; once an index is ready to be shrunk, we'd happily have more than 2 of its shards on a given node, since it no longer receives writes.

Is there a way to change arbitrary settings of an index when transitioning between ILM steps? That would solve the problem, since you could set `total_shards_per_node` to null after the transition to the warm phase. Alternatively, the warm phase could ignore the `total_shards_per_node` setting until it has finished the shrink operation.

We're likely going to need to write some automation that detects indices in the warm phase that still have `total_shards_per_node` set to 2 and removes it so the shrink operation can proceed.
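
A minimal sketch of that automation, assuming jq, a cluster at localhost:9200, and a placeholder index pattern (a stricter version would first check that the setting is actually set before clearing it):

```bash
#!/usr/bin/env bash
# Sketch only: clear total_shards_per_node on every warm-phase index
# matching the placeholder pattern "my-index-*".
for idx in $(curl -s 'localhost:9200/my-index-*/_ilm/explain' |
             jq -r '.indices | to_entries[] | select(.value.phase == "warm") | .key'); do
  # Null out the setting so the shrink step can co-locate the shards.
  curl -s -X PUT "localhost:9200/${idx}/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.routing.allocation.total_shards_per_node": null}'
done
```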

dakrone (Member) commented Dec 21, 2021

@percygrunwald for what it is worth, we do this detection and un-setting automatically in 8.0+: #76732
