Consider only forcemerging primary shards #51455

Closed
n0othing opened this issue Jan 24, 2020 · 3 comments
Labels: :Distributed/Distributed, >enhancement

Comments

@n0othing
Member

Describe the feature:

Today, running _forcemerge?max_num_segments=1 against an index causes both primaries and replicas to asynchronously merge down to a single segment each. As a result, every shard copy ends up with a single segment, but those segments differ between copies (e.g., the primary might have segment _c, while the replica might have segment _e).
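
To make the mismatch concrete, here is a minimal Python sketch that compares segment names across shard copies. It assumes a cluster reachable at a base URL you supply and relies on two REST endpoints: POST /<index>/_forcemerge?max_num_segments=1 and GET /_cat/segments/<index>?format=json with its default columns (index, shard, prirep, ip, segment, ...). The helper names (force_merge, fetch_segments, mismatched_shards) are made up for this illustration. Matching segment names are only a rough signal and do not prove that the underlying files are identical.

```python
import json
import urllib.request
from collections import defaultdict


def force_merge(base_url, index):
    """Ask the cluster to merge every shard copy of `index` down to one segment."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments=1"
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_segments(base_url, index):
    """Return the _cat/segments rows for `index` as a list of dicts."""
    url = f"{base_url}/_cat/segments/{index}?format=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def mismatched_shards(rows):
    """Return (index, shard) pairs whose copies do not share the same segment names.

    Copies are told apart by (prirep, ip), which assumes that no two copies
    of the same shard live behind the same IP address.
    """
    copies = defaultdict(lambda: defaultdict(set))
    for row in rows:
        shard_key = (row["index"], row["shard"])
        copy_key = (row["prirep"], row["ip"])
        copies[shard_key][copy_key].add(row["segment"])

    mismatched = []
    for shard_key in sorted(copies):
        distinct_sets = {frozenset(names) for names in copies[shard_key].values()}
        if len(distinct_sets) > 1:
            mismatched.append(shard_key)
    return mismatched


if __name__ == "__main__":
    # Hypothetical rows shaped like _cat/segments output, not real cluster data.
    sample = [
        {"index": "logs", "shard": "0", "prirep": "p", "ip": "10.0.0.1", "segment": "_c"},
        {"index": "logs", "shard": "0", "prirep": "r", "ip": "10.0.0.2", "segment": "_e"},
    ]
    print(mismatched_shards(sample))
```

Against a live cluster, the same check would be mismatched_shards(fetch_segments(base_url, index)), run after force_merge has finished.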

It might be better to forcemerge only the primary shard and then recover the replicas from it, resulting in identical segments for all shard copies.

Several benefits come to mind:

  • Having "different"-looking segments among shard copies can result in excessive snapshot repository disk usage (e.g., a replica gets promoted to primary and, because its single segment looks different, it gets backed up in full).
  • It should avoid the "bouncing results" problem.
  • It puts less load on nodes hosting replicas (assuming a recovery is cheaper than a full-blown merge).
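
The snapshot point above can be illustrated with a toy model. The sketch below is a deliberately simplified assumption, not Elasticsearch's actual snapshot code: it treats the repository as a set of files identified by name, checksum, and size, and uploads only files that are not already present. SegmentFile and bytes_to_upload are hypothetical names, and the file names and sizes are invented.

```python
from typing import Iterable, NamedTuple


class SegmentFile(NamedTuple):
    """Toy stand-in for a segment file as seen by an incremental snapshot."""
    name: str
    checksum: str
    size_bytes: int


def bytes_to_upload(repo: Iterable[SegmentFile], shard_files: Iterable[SegmentFile]) -> int:
    """Sum the sizes of files in `shard_files` that the repository lacks."""
    present = set(repo)
    return sum(f.size_bytes for f in shard_files if f not in present)


# The repository already holds the old primary's single merged segment.
repo = [SegmentFile("_c.cfs", "aaa", 1_000)]

# A promoted replica that merged on its own has a different file: full upload.
promoted_replica = [SegmentFile("_e.cfs", "bbb", 1_000)]

# A replica recovered from the primary after the merge has the same file: nothing to upload.
recovered_replica = [SegmentFile("_c.cfs", "aaa", 1_000)]

print(bytes_to_upload(repo, promoted_replica))
print(bytes_to_upload(repo, recovered_replica))
```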
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Features)

@dakrone added the :Distributed/Distributed label and removed the :Core/Features/Features label on Jan 24, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

@dnhatn
Member

dnhatn commented Jan 30, 2020

@n0othing Thanks for raising this. I don't think we should implement this. Even without force merge, segment files on the primary and replicas would be very different because of (1) concurrent indexing, (2) different refresh schedules (depending on search requests and memory pressure), and (3) background merges.

@dnhatn closed this as completed on Jan 30, 2020