Closed
Description
Currently NEST cannot differentiate between a Service Unavailable (503) response and a timeout exception.
Problem description
- Consider a cluster with 3 query nodes (q1, q2, q3). One of them, q1, is booted out of the cluster for some reason, such as an OOM.
- All query requests to q1 now fail with status code 503 (Service Unavailable).
A typical response in this situation looks like:
ConnectionStatus = {StatusCode: 503,
Method: POST,
Url: http://localhost:9200/codeindex/sourceNoDedupeFileContract/_search?routing=0&pretty=true,
- Only in this case should the query be routed to another query node, such as q2.
Currently NEST retries every retryable exception, including timeouts, on all nodes in the connection pool.
Say a wildcard query is sent to query node q1. If it times out there, the query as a whole should time out.
Instead, NEST resends the query to q2, where it times out again, and then to q3. The query therefore fails after 3 minutes instead of 1 minute.
This also opens the door to a DoS attack, since a single expensive query ends up running on every node in the pool.
A proper fix would differentiate between a timeout and a Service Unavailable exception. The query should be sent to another node from the connection pool only if the service is unavailable on the first node.
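To make the requested behavior concrete, here is a minimal sketch of the failover rule in Python. It is not NEST code: `NodeTimeout`, `ServiceUnavailable`, `search_with_failover`, and the `send` callback are all hypothetical names made up for illustration. The `retry_on_timeout` flag only exists so the current behavior (retry everything) can be compared with the proposed one (fail over on 503 only).

```python
class NodeTimeout(Exception):
    """Hypothetical: the node accepted the request but did not answer in time."""


class ServiceUnavailable(Exception):
    """Hypothetical: the node answered with HTTP 503."""


def search_with_failover(nodes, send, retry_on_timeout=False):
    """Send one query, failing over to the next node only when it makes sense.

    nodes: ordered list of node identifiers (for example ["q1", "q2", "q3"]).
    send:  hypothetical callable that sends the query to one node and either
           returns a result or raises NodeTimeout / ServiceUnavailable.
    retry_on_timeout: True mimics the behavior described in this issue,
           False is the proposed fix.
    """
    if not nodes:
        raise ValueError("connection pool is empty")

    last_error = None
    for node in nodes:
        try:
            return send(node)
        except ServiceUnavailable as err:
            # The node is out of the cluster: another node may still answer.
            last_error = err
        except NodeTimeout as err:
            if not retry_on_timeout:
                # Proposed fix: an expensive query times out once, not once per node.
                raise
            last_error = err

    # Every node was tried and none returned a result.
    raise last_error


if __name__ == "__main__":
    def q1_is_down(node):
        if node == "q1":
            raise ServiceUnavailable(node)
        return "result from " + node

    print(search_with_failover(["q1", "q2", "q3"], q1_is_down))
```

With the flag left at its default, a query that times out on q1 is never sent to q2 or q3, so the caller waits for one timeout period rather than one per node.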