Kibana + 2 data nodes = uneven search loads? #24642

Closed
markharwood opened this issue May 12, 2017 · 5 comments

Comments

@markharwood
Contributor

commented May 12, 2017

Perhaps a common scenario here: in a two-data-node cluster, one data node gets hammered while the other sits idle. The cause appears to lie in the default Kibana and search-routing configuration.

  • Kibana appears to route searches with a preference key based on a session ID; it also uses _msearch to batch up dashboard requests.
  • All primaries were on one data node and all replicas on the other (not uncommon after a restart).

When Elasticsearch receives a request whose preference setting is a session ID, it chooses between primary and replica by hashing the preference string only. If every shard of an index lists its primaries and replicas in the same order (and I haven't confirmed this is the case!), then this routing algorithm picks the same node for all searches with a given session key, which is what the user was seeing.

If we hashed the preference key AND the shard number, we would randomize the choice of primary vs replica, and hence the node choice, for each shard. This would spread the load more evenly.
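
For illustration, a minimal self-contained sketch of the idea; the class, method, and session ID below are hypothetical and not the actual Elasticsearch routing code:

// Hypothetical illustration only: combine the user-supplied preference hash with the
// shard id so each shard makes an independent, yet still deterministic, choice of copy.
public class PreferenceRoutingSketch {

  /** Pick a copy index (0 = primary, 1..n = replicas) for one shard. */
  static int pickCopy(String preference, int shardId, int numCopies) {
    int hash = preference.hashCode();      // old scheme: same hash for every shard
    hash = 31 * hash + shardId;            // proposed: mix the shard id into the hash
    return Math.floorMod(hash, numCopies); // stable for a session, varies per shard
  }

  public static void main(String[] args) {
    String sessionId = "kibana-session-42"; // hypothetical Kibana session id
    for (int shardId = 0; shardId < 5; shardId++) {
      // With 2 copies (primary + replica), hashing the preference alone would print the
      // same index for every shard; mixing in the shard id spreads choices across copies.
      System.out.println("shard " + shardId + " -> copy " + pickCopy(sessionId, shardId, 2));
    }
  }
}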

@javanna

Member

commented May 12, 2017

Isn't the whole point of the preference to query the same shard copy all the time when the value is the same?

@markharwood markharwood added the >bug label May 12, 2017

@markharwood

Contributor Author

commented May 12, 2017

Yes. The proposal is that, for each shard ID, we still deterministically pick the same copy for a given session key, but we stop applying the same policy across all shard IDs, which is what leads to the uneven loads.
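
To make that concrete with illustrative numbers (with the shard ID simply added to the hash for the sake of the example): suppose the session key hashes to 7 and each shard has two copies. Hashing the preference alone picks copy 7 % 2 = 1 for every shard, so all traffic lands on whichever node holds those copies. Mixing in the shard ID picks copy (7 + 0) % 2 = 1 for shard 0 but (7 + 1) % 2 = 0 for shard 1, so a given session still hits the same copies on every request while the load spreads across both nodes.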

@markharwood markharwood removed the discuss label May 12, 2017

@markharwood

Contributor Author

commented May 12, 2017

We discussed this on FixItFriday and decided to adopt the proposed change of including the shard_id in the hash of the preference key, with an appropriate check for backwards compatibility.

@markharwood markharwood self-assigned this May 12, 2017

@s1monw

Contributor

commented May 12, 2017

@markharwood I am +1 on this, but we need to make sure that we preserve BWC. It should be rather simple; in OperationRouting you can do something like this:

private ShardIterator preferenceActiveShardIterator(IndexShardRoutingTable indexShard, String localNodeId, DiscoveryNodes nodes, @Nullable String preference) {
  // ...
  if (nodes.getMinNodeVersion().onOrAfter(Version.V_6_0_0_alpha1_UNRELEASED)) {
    // use new method: hash the preference together with the shard id
  } else {
    // use old method: hash the preference string only (pre-6.0 behaviour)
  }
}
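
For illustration, a hedged sketch of how those two branches might differ; the helpers used here (Murmur3HashFunction.hash, activeInitializingShardsIt) are assumptions based on the routing code of that era and may not match the final change exactly:

int routingHash = Murmur3HashFunction.hash(preference);
if (nodes.getMinNodeVersion().onOrAfter(Version.V_6_0_0_alpha1_UNRELEASED)) {
  // New behaviour: fold the shard id into the hash so that different shards of the
  // same index can land on different copies for the same session key.
  routingHash = 31 * routingHash + indexShard.shardId().hashCode();
}
// Old behaviour (and the fallback for mixed-version clusters): the preference hash
// alone seeds the iterator, so every shard tends to pick the copy sitting in the
// same position of its list.
return indexShard.activeInitializingShardsIt(routingHash);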

markharwood added a commit to markharwood/elasticsearch that referenced this issue May 23, 2017

Search: Fairer balancing when routing searches by user-supplied preference values.

A user reported uneven balancing of load on the nodes handling search requests from Kibana, which supplies a session ID as a routing preference. Each shardId was selecting the same node for a given session ID because, after cluster startup, one data node held all the primaries and the other held all the replicas.
This change counteracts the tendency to opt for the same node given the same user-supplied preference by incorporating the shard ID into the hash of the preference key. This will help randomise node choices across shards.

Closes elastic#24642

markharwood added a commit that referenced this issue May 23, 2017

Search: Fairer balancing when routing searches by session ID (#24671)
A user reported uneven balancing of load on the nodes handling search requests from Kibana, which supplies a session ID as a routing preference. Each shardId was selecting the same node for a given session ID because, after cluster startup, one data node held all the primaries and the other held all the replicas.
This change counteracts the tendency to opt for the same node given the same user-supplied preference by incorporating the shard ID into the hash of the preference key. This will help randomise node choices across shards.

Closes #24642
@ruria


commented Jul 7, 2017

Sorry, but I don't get the point. Why is it so important to preserve BWC? Primary and replica can "eventually" diverge, so hitting the same shard copies in a "real time" scenario for the duration of a session makes sense. But if you update your cluster, restart nodes, and so on to install this "patch", what's the problem with hitting other shard copies? They should be in sync, and your session has almost certainly changed anyway.
