New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThrottlingAllocationDecider should not counting relocating shards #12409

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@bleskes
Member

bleskes commented Jul 23, 2015

The ThrottlingAllocationDecider is responsible to limit the number of incoming/local recoveries on a node. It therefore shouldn't count shards marked as relocating which represent the source of the recovery.

@dakrone

This comment has been minimized.

Show comment
Hide comment
@dakrone

dakrone Jul 23, 2015

Member

@bleskes can you update the Javadoc for this decider then? It currently says it limits the number of recovery operations (incoming and outgoing) but this change would make it only check incoming.

Also, is this causing an issue? I'm curious if we want the behavior changed.

Member

dakrone commented Jul 23, 2015

@bleskes can you update the Javadoc for this decider then? It currently says it limits the number of recovery operations (incoming and outgoing) but this change would make it only check incoming.

Also, is this causing an issue? I'm curious if we want the behavior changed.

Allocation: ThrottlingAllocationDecider should not counting relocatin…
…g shards

The ThrottlingAllocationDecider is responsible to limit the number of incoming/local recoveries on a node. It therefore shouldn't count shards marked as relocating which represent the source of the recovery.
@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Jul 24, 2015

Member

@dakrone I'm not sure I follow your comment (I don't see the incoming and outgoing part). The problem is that this supposed to limit the amount of shards intializing on a node. This can happen in two ways - a primary assignment (Recover from gateway), a replica assignment and relocation to the node. The later ends up copying the data from the current primary.

At the moment I don't think the intention of this class is limit out going recoveries (we do see using INDICES_RECOVERY_CONCURRENT_STREAMS currently, in a totally different way). If we want to also count outgoing recoveries, the problem with the current relocation check is that it doesn't necessarily mark the source node. Replicas the relocate will still recovery from the primary. We should then count any initializing replica against the node holding the primary.

In both cases the current code is wrong. I opted to make it just count incoming recoveries which I think is the current model. We can make a bigger change later.

Member

bleskes commented Jul 24, 2015

@dakrone I'm not sure I follow your comment (I don't see the incoming and outgoing part). The problem is that this supposed to limit the amount of shards intializing on a node. This can happen in two ways - a primary assignment (Recover from gateway), a replica assignment and relocation to the node. The later ends up copying the data from the current primary.

At the moment I don't think the intention of this class is limit out going recoveries (we do see using INDICES_RECOVERY_CONCURRENT_STREAMS currently, in a totally different way). If we want to also count outgoing recoveries, the problem with the current relocation check is that it doesn't necessarily mark the source node. Replicas the relocate will still recovery from the primary. We should then count any initializing replica against the node holding the primary.

In both cases the current code is wrong. I opted to make it just count incoming recoveries which I think is the current model. We can make a bigger change later.

@dakrone

This comment has been minimized.

Show comment
Hide comment
@dakrone

dakrone Jul 24, 2015

Member

@bleskes I was thinking the current javadoc should be changed from:

  * <li><tt>cluster.routing.allocation.node_concurrent_recoveries</tt> -
  * restricts the number of concurrent recovery operations on a single node. The
  * default is <tt>2</tt></li>

To something like:

  * <li><tt>cluster.routing.allocation.node_concurrent_recoveries</tt> -
  * restricts the number of shards initializing from recovery on a single node. The
  * default is <tt>2</tt></li>

to make it clearer which shard state this is checking. Other than that, the code part LGTM :)

Member

dakrone commented Jul 24, 2015

@bleskes I was thinking the current javadoc should be changed from:

  * <li><tt>cluster.routing.allocation.node_concurrent_recoveries</tt> -
  * restricts the number of concurrent recovery operations on a single node. The
  * default is <tt>2</tt></li>

To something like:

  * <li><tt>cluster.routing.allocation.node_concurrent_recoveries</tt> -
  * restricts the number of shards initializing from recovery on a single node. The
  * default is <tt>2</tt></li>

to make it clearer which shard state this is checking. Other than that, the code part LGTM :)

@bleskes bleskes closed this in 57cbce2 Jul 24, 2015

@bleskes bleskes deleted the bleskes:throttling_relocating_count branch Jul 24, 2015

@clintongormley clintongormley removed the review label Aug 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment