core: increase liveness probe timeout to 5s #10986
Stability issues have been observed with the 1s liveness probe timeout. Socket latency is expected whenever CPUs are under even minor pressure. Increasing the value to 5s should cover most small-to-medium-scale environments.
Resolves BZ: 2126566
Signed-off-by: Randy J. Martinez <randy@cephtips.com>
LGTM!
One small question:
The default is 1s, and 5s is a bit longer, but if it makes things more stable, then good. Have you seen 5s giving more stability? @randymtz
Never mind. It seems this was discussed in yesterday's call.
As discussed, 5s could give more stability and be less aggressive about restarting OSDs that are resource-starved, although the value is still arbitrary.
Update: Awaiting feedback in the BZ from the linked case to gauge the effectiveness of 5s. Once received, we'll be able to confirm whether failureThreshold requires an update as well. Note: Some OSDs were reporting CLBO (CrashLoopBackOff) after the livenessProbe update. After investigating, I found that the env did not have the value update from PR#10250. I provided a patch and am awaiting feedback on that as well.
The BZ has feedback that 5s is working better; let's go ahead with this.
core: increase liveness probe timeout to 5s (backport #10986)