Search flow enhancements for admission controller #8913

Open
bharath-techie opened this issue Jul 27, 2023 · 0 comments
Labels
enhancement: Enhancement or improvement to existing feature or request
Search: Search query, autocomplete ...etc

Overview

The parent RFC #8910 discusses an admission control framework that limits new incoming requests early when a node begins to come under stress.
As part of it, we will enhance the search flow to intelligently route requests away from stressed nodes.
We also need to enhance the coordinator node (TransportSearchAction) logic to mark shard requests for rejection if the primary and all replicas of an index shard are on nodes under stress.

Routing enhancements

Enhance ARS (Adaptive Replica Selection)

We will adjust the ARS ranking algorithm based on the resource utilization of target nodes, which will help proactively reroute requests away from nodes under high stress.

Ranking algorithm changes

We will increase the rank by a factor for each resource utilization threshold breached, so nodes with high resource utilization get higher ranks. Since ARS prefers the copy with the lowest rank, a higher rank makes a node less likely to be selected.
The rank calculated by ARS is scaled by a multiplier:
Rank = current rank * performance-based multiplier

Examples:

  1. When I/O utilization is greater than 95%,
    Rank = rank * 1.7 (rank is increased by 70%)

  2. When CPU and I/O are both beyond their thresholds,
    Rank = rank * 1.7 * 1.7 (rank is increased by 70% once for each threshold breached, 2.89x in total)

  • A factor of 1.7 worked well in benchmarks run on the POC. It will be configurable and may be revised during development.
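
The multiplier rule above can be sketched as follows. This is a minimal illustration, not the ARS implementation: the class name `RankAdjuster`, the map of resource name to utilization percentage, and the single 95% threshold applied to every resource are all assumptions made for the example (the issue only states the 95% figure for I/O).

```java
import java.util.Map;

/** Illustrative sketch only; names and defaults are hypothetical. */
final class RankAdjuster {
    // Assumed defaults. The issue says the factor will be configurable.
    static final double PENALTY_FACTOR = 1.7;
    static final double THRESHOLD_PERCENT = 95.0;

    /**
     * Multiplies the rank computed by ARS by PENALTY_FACTOR once for each
     * resource whose utilization is above THRESHOLD_PERCENT.
     */
    static double adjust(double arsRank, Map<String, Double> utilizationPercent) {
        double rank = arsRank;
        for (double value : utilizationPercent.values()) {
            if (value > THRESHOLD_PERCENT) {
                rank *= PENALTY_FACTOR;
            }
        }
        return rank;
    }
}
```

For example, `RankAdjuster.adjust(10.0, Map.of("io", 97.0, "cpu", 40.0))` applies the factor once, while passing both values above 95 applies it twice, so the penalties compound rather than add.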

Stats adjustment post ranking

We also need to adjust the stats of the bad nodes after ranking; otherwise we will end up herding all the requests to the good nodes.
So, for each new request that is not routed to a bad node:

Resource utilization stat of bad node = resource utilization stat * reduction factor

The reduction factor can be configured based on how quickly we want the stats of bad nodes to normalize.
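
A rough sketch of this decay, under stated assumptions: the class `StressedNodeStats`, its method names, and the requirement that the reduction factor lie strictly between 0 and 1 are hypothetical and only illustrate the formula above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not part of OpenSearch. */
final class StressedNodeStats {
    private final double reductionFactor; // hypothetical setting, 0 < factor < 1
    private final Map<String, Double> utilizationByNode = new HashMap<>();

    StressedNodeStats(double reductionFactor) {
        if (reductionFactor <= 0.0 || reductionFactor >= 1.0) {
            throw new IllegalArgumentException("reductionFactor must be in (0, 1)");
        }
        this.reductionFactor = reductionFactor;
    }

    /** Stores the latest observed utilization (in percent) for a node. */
    void record(String nodeId, double utilizationPercent) {
        utilizationByNode.put(nodeId, utilizationPercent);
    }

    /** Called for each request that was routed away from the given bad node. */
    void onRequestRoutedElsewhere(String nodeId) {
        utilizationByNode.computeIfPresent(nodeId, (id, value) -> value * reductionFactor);
    }

    /** Returns the tracked utilization, or 0.0 if the node is unknown. */
    double get(String nodeId) {
        return utilizationByNode.getOrDefault(nodeId, 0.0);
    }
}
```

A factor closer to 1 lets a bad node's stat recover slowly, so it stays deprioritized for longer; a smaller factor brings it back into rotation after fewer skipped requests.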

[Figure: AdmissionControllerSearchFlow]

Weighted round robin routing

For weight-based routing, we can use weighted ARS instead of weighted round robin, since ARS already includes the enhancements described above and will provide fairness to each new request.

Rejection of search requests

Rejection in coordinator node

When the primary and all replicas targeted by a search shard request are on stressed nodes, we can fail the shard request fast in the coordinator.

Approach

We build the set of ‘SearchShardIterator’ objects as part of ‘executeSearch’ in ‘TransportSearchAction’, which is where the search request is executed.
Similar to the ‘skip’ option in ‘SearchShardIterator’, we can add a new ‘failFast’ option. When it is set, the coordinator fails the shard fast, marks the shard as failed in the search response, and skips sending the request to the data nodes.
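
The marking step could look roughly like the sketch below. It does not use the real ‘SearchShardIterator’ class: `ShardCopies` is a hypothetical stand-in that holds a shard id, the ids of the nodes holding its copies, and the proposed ‘failFast’ flag, and the set of stressed node ids is assumed to come from the admission control framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; ShardCopies is a stand-in, not SearchShardIterator. */
final class FailFastSketch {

    static final class ShardCopies {
        final String shardId;
        final List<String> nodeIds; // nodes holding the primary and replicas
        boolean failFast;           // proposed flag, similar in spirit to 'skip'

        ShardCopies(String shardId, List<String> nodeIds) {
            this.shardId = shardId;
            this.nodeIds = nodeIds;
        }
    }

    /**
     * Sets failFast on every shard whose copies all live on stressed nodes
     * and returns the ids of the shards that were marked.
     */
    static List<String> markFailFast(List<ShardCopies> shards, Set<String> stressedNodes) {
        List<String> marked = new ArrayList<>();
        for (ShardCopies shard : shards) {
            shard.failFast = !shard.nodeIds.isEmpty() && stressedNodes.containsAll(shard.nodeIds);
            if (shard.failFast) {
                marked.add(shard.shardId);
            }
        }
        return marked;
    }
}
```

A shard with at least one copy on a healthy node is left unmarked, so it can still be routed through ARS as usual.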

Rejection in target nodes

We can reject incoming requests if the data node is under stress. This will be an extension of the existing search backpressure.
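
As a minimal sketch of such a check, assuming the data node can read its own CPU and I/O utilization as percentages: the class `NodeAdmissionCheck`, the two limits, and the use of `RejectedExecutionException` are all assumptions for illustration and do not reflect the existing search backpressure code.

```java
import java.util.concurrent.RejectedExecutionException;

/** Illustrative sketch only; limits and names are hypothetical. */
final class NodeAdmissionCheck {
    static final double CPU_LIMIT_PERCENT = 95.0; // assumed limit
    static final double IO_LIMIT_PERCENT = 95.0;  // assumed limit

    /** Returns true when either resource is above its limit. */
    static boolean shouldReject(double cpuPercent, double ioPercent) {
        return cpuPercent > CPU_LIMIT_PERCENT || ioPercent > IO_LIMIT_PERCENT;
    }

    /** Throws if the node is under stress, before any search work starts. */
    static void admit(double cpuPercent, double ioPercent) {
        if (shouldReject(cpuPercent, ioPercent)) {
            throw new RejectedExecutionException(
                "node under stress: cpu=" + cpuPercent + "%, io=" + ioPercent + "%");
        }
    }
}
```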

Co-authored by: @ajaymovva

@bharath-techie bharath-techie added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 27, 2023
@Xtansia Xtansia added the Search Search query, autocomplete ...etc label Aug 14, 2023
@msfroh msfroh removed the untriaged label Aug 16, 2023