Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup Elasticsearch timeout query parameter to a non-default value #3818

Open
sebgl opened this issue Oct 8, 2020 · 3 comments
Open

Setup Elasticsearch timeout query parameter to a non-default value #3818

sebgl opened this issue Oct 8, 2020 · 3 comments
Labels
>enhancement Enhancement of existing functionality

Comments

@sebgl
Copy link
Contributor

sebgl commented Oct 8, 2020

Some Elasticsearch API endpoints (/_cluster/settings, /_cluster/voting_config_exclusions) have a timeout query parameter. If not set, it defaults to 30 seconds.

There might be cases were an overloaded cluster needs more than 30 seconds to respond.

For those calls, our Elasticsearch client timeout (default 3min) does not matter much, since Elasticsearch timeout will be reached earlier. Even though users can override our client timeout through an annotation on the Elasticsearch resource, it is ignored in this case.

We should probably make sure all API calls we make that support the timeout query parameter actually set it to a value close to our client timeout?

Conceptually, we could set the api timeout parameter to a lower value than the client timeout, so the client does not timeout before Elasticsearch. By how much though? A few seconds likely. In practice, I don't think that makes a big difference: if the request times out on our end we'll retry it at next reconciliation. Our reconciliation loop should be idempotent: there should be no case where a call that times out for us but succeeds on Elasticsearch side would cause troubles.

@sebgl sebgl added the >enhancement Enhancement of existing functionality label Oct 8, 2020
@charith-elastic
Copy link
Contributor

I agree with your comments about the reconciliation loop being idempotent, but, I think we should have some mechanical sympathy here as well. If the timeout parameter is exactly the same as the client timeout, there's a very good chance that the client will hit its request timeout before the Elasticsearch server times out. Of course, since the call is supposedly idempotent, there's no harm in retrying (even though the call may have succeeded unbeknown to us). The problem is that if the Elasticsearch cluster is so overloaded that it can't even respond to a request within a reasonable amount of time, blindly retrying calls is not going to make things better. Ideally, in cases like this we should at least be backing off exponentially to avoid overloading the cluster. The requeue mechanism probably does this for us; however, we then end up running the entire reconciliation loop again: doing a lot of unnecessary work on struggling cluster.

@sebgl
Copy link
Contributor Author

sebgl commented Oct 12, 2020

I'm fine with setting a slightly lower timeout for the api call vs. the client. It's not that much more extra work :)
What do you think would be a good delta value between both timeouts? ~5sec?

@charith-elastic
Copy link
Contributor

It shouldn't be too hard to derive the timeout parameter based on the client timeout.

Ideally the difference would be calculated by measuring the network latency but that's probably going overboard for this. Since we are communicating within the Kubernetes cluster, I think 2-3 seconds would be reasonable enough to cover most cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality
Projects
None yet
Development

No branches or pull requests

2 participants