core: increase liveness probe timeout to 5s #10986

Merged: 1 commit, Oct 17, 2022

eljabsheh

Stability issues have been observed with a 1s timeout.
Socket latency is expected whenever CPUs are
under minor pressure. Increasing the value to
5s should cover most small-to-medium scale environments.

Resolves BZ: 2126566

Signed-off-by: Randy J. Martinez randy@cephtips.com
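
For readers unfamiliar with probe timeouts, the effect described above can be sketched in Python. This is an illustration only, not Rook or kubelet code: it assumes the liveness check is an exec-style command whose reply must arrive within the timeout, and `probe_succeeds` and `slow_check` are hypothetical names, with `slow_check` standing in for a daemon that answers slowly over its socket while CPUs are busy.

```python
import subprocess
import sys


def probe_succeeds(command, timeout_seconds):
    """Run an exec-style check; a reply slower than the timeout counts as a failure."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The check itself was healthy but slow, which the probe cannot distinguish
        # from a hung process.
        return False
    return result.returncode == 0


# Hypothetical stand-in: a check that needs about 1.5s to answer.
slow_check = [sys.executable, "-c", "import time; time.sleep(1.5)"]

print(probe_succeeds(slow_check, timeout_seconds=1))  # False
print(probe_succeeds(slow_check, timeout_seconds=5))  # True
```

With a 1s timeout the slow but healthy check is reported as failed; with 5s it passes, which is the behavior this change is aiming for.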

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Contributor

@subhamkrai subhamkrai left a comment

LGTM!

One small question:
The default is 1s, and 5s is a bit longer, but if it makes things more stable then good. Have you seen 5s giving more stability? @randymtz

@subhamkrai
Contributor

LGTM!

One small question: the default is 1s, and 5s is a bit longer, but if it makes things more stable then good. Have you seen 5s giving more stability? @randymtz

Never mind. It seems this was discussed in yesterday's call.

@travisn
Member

travisn commented Sep 14, 2022

As discussed, 5s could give more stability and be less aggressive about restarting OSDs that are resource-starved, although it is still arbitrary.
@randymtz Do you have a cluster where you could observe that 2s was causing the OSDs to be restarted, but 5s was more stable? Perhaps we should also increase the failureThreshold, which defaults to 3.
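
The interaction between the timeout and failureThreshold can be sketched in Python. This is a simplified model rather than the kubelet's implementation: it assumes a probe attempt fails when its latency exceeds the timeout, and that a restart happens after failureThreshold consecutive failures. The function name `count_restarts` and the latency values are made up for illustration; they are not measurements from any cluster.

```python
def count_restarts(latencies, timeout_seconds, failure_threshold=3):
    """Count restarts a liveness probe would trigger for a series of probe latencies."""
    restarts = 0
    consecutive_failures = 0
    for latency in latencies:
        if latency > timeout_seconds:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                restarts += 1
                consecutive_failures = 0  # container restarted, counting starts over
        else:
            consecutive_failures = 0  # any success resets the failure streak
    return restarts


# Hypothetical probe latencies in seconds for an OSD under CPU pressure.
latencies = [0.2, 1.5, 2.1, 3.0, 0.3, 1.2, 1.8, 2.5, 0.4]

print(count_restarts(latencies, timeout_seconds=1))  # 2
print(count_restarts(latencies, timeout_seconds=5))  # 0
print(count_restarts(latencies, timeout_seconds=1, failure_threshold=5))  # 0
```

In this model, either a longer timeout or a higher failureThreshold avoids restarting a daemon that is slow but still responding, which is why both knobs come up in the discussion.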

@eljabsheh
Author

As discussed, 5s could give more stability and be less aggressive about restarting OSDs that are resource-starved, although it is still arbitrary. @randymtz Do you have a cluster where you could observe that 2s was causing the OSDs to be restarted, but 5s was more stable? Perhaps we should also increase the failureThreshold, which defaults to 3.

Update: Awaiting feedback in the BZ from the linked case to gauge the effectiveness of 5s. Once received, we'll be able to confirm whether failureThreshold needs to be updated as well.

Note: Some OSDs were reporting CrashLoopBackOff (CLBO) after the livenessProbe update. After investigating, I found that the environment did not have the value update from PR#10250. I provided a patch and am awaiting feedback on that as well.

@travisn
Member

travisn commented Oct 17, 2022

The BZ has feedback that 5s is working better, so let's go ahead with this.

@travisn travisn merged commit 7f0c83a into rook:master Oct 17, 2022
travisn added a commit that referenced this pull request Oct 17, 2022
core: increase liveness probe timeout to 5s (backport #10986)