Lucene merges should run on the target shard during recovery #10463

Closed
wants to merge 3 commits into
base: 1.x
from

4 participants
@mikemccand
Contributor

mikemccand commented Apr 7, 2015

This is already fixed on 2.0, since we let Lucene launch its own merges again.

But in 1.x, Lucene merges might not run on the target shard during recovery, causing a segment explosion when there are many docs to replay and/or the index buffer is small. Recovery time then becomes O(N^2), which can cause issues like #9226.

I just moved launching of the mergeScheduleFuture out of startScheduledTasksIfNeeded (only called once recovery is done) and into createNewEngine. This way whenever the engine is created we also start checking for merges.

I also renamed startScheduledTasksIfNeeded -> startEngineRefresher, and cleaned up a couple unrelated things.
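
To make the O(N^2) claim concrete, here is a toy cost model, not Elasticsearch or Lucene code. It assumes each replayed operation does a version lookup that probes every live segment, that a new segment is flushed every `docsPerFlush` operations, and that merging can be approximated as a cap (`maxSegments`) on the number of live segments. The class name, method name, and all three parameters are hypothetical.

```java
/** Toy cost model (not Elasticsearch code): counts segment probes during translog replay. */
final class ReplayCostModel {

    /**
     * @param docs         number of operations replayed from the translog
     * @param docsPerFlush operations buffered before a new segment is flushed (assumed constant)
     * @param maxSegments  cap on live segments that merging maintains;
     *                     pass Integer.MAX_VALUE to model "no merges run"
     * @return total number of segment probes across all version lookups
     */
    static long totalProbes(int docs, int docsPerFlush, int maxSegments) {
        long probes = 0;
        int segments = 0;
        for (int i = 1; i <= docs; i++) {
            // Assumption: one version lookup per operation, touching every live segment.
            probes += segments;
            if (i % docsPerFlush == 0) {
                // A flush adds a segment; merging (modeled crudely) keeps the count capped.
                segments = Math.min(segments + 1, maxSegments);
            }
        }
        return probes;
    }
}
```

Under this model, doubling `docs` from 1000 to 2000 with `docsPerFlush = 10` and no merging roughly quadruples the probe count, while with a cap of 10 segments it roughly doubles. Real merge policies and lookup costs are more involved, so treat the numbers as an illustration of the growth rate only.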

@mikemccand mikemccand self-assigned this Apr 7, 2015

Contributor

mikemccand commented Apr 7, 2015

I moved the mergeScheduleFuture creation to ctor, so now we create it once when the IndexShard is created, not in newEngine.

And I fixed EngineMerge to use engineUnsafe and skip merging if engine is currently null...
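
The shape of the change described above can be sketched roughly as follows. This is a simplified illustration and not the actual IndexShard code: `SketchShard`, `SketchEngine`, `checkForMerges`, and `skippedChecks` are hypothetical names, and a plain `ScheduledExecutorService` stands in for whatever thread pool the real shard uses. The only points it tries to show are that the periodic merge check is scheduled once in the constructor, and that a tick is skipped while the engine is still null.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the engine; only the merge hook matters here. */
interface SketchEngine {
    void maybeMerge();
}

/** Simplified shard: the merge check is scheduled in the constructor, before any engine exists. */
final class SketchShard implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> mergeScheduleFuture;
    private volatile SketchEngine engine; // stays null until createNewEngine is called
    final AtomicInteger skippedChecks = new AtomicInteger();

    SketchShard(long intervalMillis) {
        // Scheduled once per shard, so checks also run while recovery is in progress.
        mergeScheduleFuture = scheduler.scheduleWithFixedDelay(
                this::checkForMerges, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** One tick of the periodic check; package-private so it can also be driven directly. */
    void checkForMerges() {
        SketchEngine current = engine; // read the field once, without asserting it is set
        if (current == null) {
            skippedChecks.incrementAndGet(); // engine not created yet: skip this round
            return;
        }
        current.maybeMerge();
    }

    void createNewEngine(SketchEngine newEngine) {
        this.engine = newEngine;
    }

    @Override
    public void close() {
        mergeScheduleFuture.cancel(false);
        scheduler.shutdownNow();
    }
}
```

Reading the engine reference into a local variable before the null check avoids a race where the field changes between the check and the call. Scheduling in the constructor rather than on engine creation means the task is created exactly once, even if the engine is replaced later.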

Member

bleskes commented Apr 7, 2015

LGTM

@mikemccand mikemccand added the >bug label Apr 7, 2015

mikemccand added a commit that referenced this pull request Apr 7, 2015

Core: Lucene merges must run on target shard during recovery
This does not affect 2.0, where we let Lucene launch merges normally
(#8643).

In 1.x, every 1 sec (default), we ask Lucene to kick off any new
merges, but we unfortunately don't turn that logic on in the target
shard until after recovery has finished.

This means if you have a large translog, and/or a smallish index
buffer, way too many segments can accumulate in the target shard
during recovery, making version lookups slower and slower (O(N^2))
and possibly causing slow recovery issues like #9226.

This fix changes IndexShard to launch merges as soon as the shard is
created, so merging runs during recovery.

Closes #10463

@mikemccand mikemccand closed this Apr 7, 2015

mikemccand added a commit to mikemccand/elasticsearch that referenced this pull request Apr 11, 2015

mikemccand added a commit that referenced this pull request Apr 11, 2015

@kimchy kimchy added the v1.4.5 label Apr 11, 2015

@clintongormley clintongormley changed the title from Core: Lucene merges should run on the target shard during recovery to Lucene merges should run on the target shard during recovery May 30, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
