Readiness probe is timing out #2259
Relates to #2248
So close! 😞 Thanks for bringing this up. I opened #2260 to make the timeout configurable. In the meantime, you should be able to override the readiness probe in the podTemplate. The easiest solution may be to simply use a TCP check as described here: this means some pods may join the service before they are fully ready, but that still seems like an improvement over the existing situation. If you have any issues or further questions, please let us know.
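For reference, a minimal sketch of what such a podTemplate override could look like. The probe fields follow standard Kubernetes readiness-probe configuration; the nodeSet name and count here are illustrative, not taken from this issue:

```yaml
# Hypothetical ECK Elasticsearch spec fragment: replace the default
# script-based readiness probe with a plain TCP check on port 9200.
spec:
  nodeSets:
    - name: default          # illustrative nodeSet name
      count: 3
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              readinessProbe:
                tcpSocket:
                  port: 9200
                timeoutSeconds: 3
                periodSeconds: 10
```

A TCP check only verifies that the port accepts connections, so it avoids the HTTP round-trip that is timing out here, at the cost of marking a pod ready slightly earlier than the default probe would.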
The above-referenced PR has been merged and should land in the next release. In the meantime the workaround mentioned above exists, so I'll go ahead and close this. Feel free to re-open if there are further questions or comments.
Bug Report
What did you do?
I'm running a three-node cluster with write-heavy loads. I sometimes run queries on a pretty large dataset (~600 GB).
What did you expect to see?
The nodes are all up and running, and Kibana should work.
What did you see instead? Under which circumstances?
Soon after the initial query, Kibana stops working and every request starts to fail. I see the following in the Kibana log:
When I check the pods, they are marked as not ready.
I have stuff like this in my pods events:
However, all my curl requests to the node seem to work fine.
Because no node is considered ready, the service reports no endpoints:
I tracked it down to the probe script (
/mnt/elastic-internal/scripts/readiness-probe-script.sh
). It performs a curl request with a 3-second timeout, and it seems pretty obvious I'm hitting that timeout. I quickly confirmed that this was indeed the case:
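The failure mode can be reproduced in isolation: a probe command that runs longer than its allotted time exits non-zero, and Kubernetes treats any non-zero exit as "not ready". A minimal illustration using GNU coreutils `timeout` (the 3-second limit here mirrors the probe script's curl timeout; the `sleep 5` stands in for a slow cluster response):

```shell
# Run a command under a 3-second limit, as the readiness probe does.
# GNU timeout kills the command and exits with status 124 when the
# limit is exceeded, so the "else" branch fires.
if timeout 3 sleep 5; then
  echo "ready"
else
  echo "not ready (exit $?)"
fi
```

This is why the pods look healthy to manual curl requests (which have no tight deadline) while the probe marks them not ready.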
Environment
I feel we should either:
Any ideas of workarounds would be greatly appreciated.