Move to use serial merge schedule by default #5447

Closed
wants to merge 2 commits into
base: master
from

4 participants
@kimchy
Member

kimchy commented Mar 17, 2014

Today we use ConcurrentMergeScheduler, which can be painful because it is concurrent at the shard level, with a maximum of 3 threads doing concurrent merges per shard. If several shards are being indexed, there will be a minor explosion of threads trying to merge, all of them throttled by our merge throttling.
Moving to the serial merge scheduler still maintains concurrency of merges across shards, since we have the merge thread pool that schedules those merges. Merging will just be serial within a specific shard.
Also, the serial merge scheduler now has a limit on how many merges it will do in one go, so that other shards get a fair chance to merge. We use the pending merges on the IndexWriter (IW) to check whether more merges are needed.
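To make the per-pass limit concrete, here is a minimal sketch in plain Java. It is not the code from this pull request and does not use Lucene's API: `PendingMerges`, `LimitedSerialScheduler`, and the `maxMergeAtOnce` parameter are hypothetical names chosen for illustration, and a merge is modelled as a plain string. It assumes only what the description states: merges for one shard run one after another, and a single pass stops after a fixed number of merges even if more are pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a shard's writer: it only exposes a queue of pending merges. */
class PendingMerges {
    private final Queue<String> pending = new ArrayDeque<>();

    synchronized void add(String mergeName) {
        pending.add(mergeName);
    }

    /** Returns the next pending merge, or null when nothing is left. */
    synchronized String next() {
        return pending.poll();
    }

    synchronized boolean hasPending() {
        return !pending.isEmpty();
    }
}

/** Sketch of a serial scheduler that runs at most maxMergeAtOnce merges per call. */
class LimitedSerialScheduler {
    private final int maxMergeAtOnce; // hypothetical setting name

    LimitedSerialScheduler(int maxMergeAtOnce) {
        if (maxMergeAtOnce < 1) {
            throw new IllegalArgumentException("maxMergeAtOnce must be >= 1");
        }
        this.maxMergeAtOnce = maxMergeAtOnce;
    }

    /**
     * Runs merges one after another on the calling thread. The method is
     * synchronized, so merging stays serial for this shard even if several
     * threads call in. Returns the merges executed in this pass.
     */
    synchronized List<String> merge(PendingMerges source) {
        List<String> executed = new ArrayList<>();
        while (executed.size() < maxMergeAtOnce) {
            String merge = source.next();
            if (merge == null) {
                break; // nothing left to merge for this shard
            }
            executed.add(merge); // a real scheduler would perform the merge here
        }
        return executed;
    }
}
```

With `maxMergeAtOnce` set to 2 and five pending merges, the first call runs two merges and returns. A caller on a shared merge thread pool can then check `hasPending()` and resubmit the shard, which gives other shards a turn in between.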
Note that if a merge is in progress, indexing will not block on the synchronized maybeMerge call at indexing (flush) time, because we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only by explicit calls to IW#maybeMerge (see Merges).
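The wrapping idea in the note above can be sketched as follows. This is an illustration only, not the actual EnabledMergeScheduler: `SimpleMergeScheduler`, `GatedMergeScheduler`, and `runWithMergesEnabled` are hypothetical names, and the use of a thread-local flag is an assumption about one way such a gate could work. The point it shows is that a merge request arriving from an indexing thread becomes a no-op and returns immediately, while an explicit request from a merge thread reaches the underlying serial scheduler.

```java
/** Minimal scheduler abstraction for this sketch (hypothetical, not Lucene's MergeScheduler). */
interface SimpleMergeScheduler {
    void merge();
}

/** Wrapper that forwards merge() only for threads that explicitly enabled merging. */
class GatedMergeScheduler implements SimpleMergeScheduler {
    private final SimpleMergeScheduler delegate;

    // Per-thread flag: false by default, so indexing threads never reach the delegate.
    private final ThreadLocal<Boolean> enabled = ThreadLocal.withInitial(() -> Boolean.FALSE);

    GatedMergeScheduler(SimpleMergeScheduler delegate) {
        this.delegate = delegate;
    }

    /** Runs the given action with merging enabled for the current thread only. */
    void runWithMergesEnabled(Runnable explicitMaybeMerge) {
        enabled.set(Boolean.TRUE);
        try {
            explicitMaybeMerge.run();
        } finally {
            enabled.set(Boolean.FALSE);
        }
    }

    @Override
    public void merge() {
        if (enabled.get()) {
            delegate.merge(); // explicit merge request: hand over to the real scheduler
        }
        // Otherwise do nothing, so the caller (for example a flush) is not held up.
    }
}
```

In this sketch a merge thread would call `gated.runWithMergesEnabled(...)` around its explicit merge request, while flush-time calls to `gated.merge()` return without touching the delegate and therefore never wait on its lock.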

Move to use serial merge schedule by default

Contributor

s1monw commented Mar 17, 2014

I like the change; the only thing we need to figure out is the naming. I also think we need a unit test for this.

first round of review
- renamed the variable name
- added a unit test for the serial merge scheduler limiting

Member

kimchy commented Mar 17, 2014

renamed + added unit test.

Contributor

s1monw commented Mar 18, 2014

LGTM

@kimchy kimchy closed this in 0ef3b03 Mar 18, 2014

kimchy added a commit that referenced this pull request Mar 18, 2014

Move to use serial merge schedule by default
closes #5447

@kimchy kimchy deleted the kimchy:serial_merge_scheduler branch Mar 18, 2014
