Bug 1834473: ovnkube: set NB/SB database inactivity probes to 60 seconds #631
Conversation
Multiple northd instances run in active/passive HA mode, where the active northd holds a lock in the database. If the active northd loses connectivity to the database or is killed without releasing the lock, ovsdb-server clears the lock after twice the inactivity probe interval. But if that probe is set to 0 (disabled), the lock is never cleared, and no other northd can ever grab it and continue reconciling NB->SB. Set the DB inactivity probe to a value greater than 0 to ensure that some northd will always eventually become active.
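For illustration, the equivalent change could be made by hand with the standard OVN CLI tools. This is only a sketch: the PR itself applies the setting through the cluster-network-operator, and the `.` row shorthand assumes a single record in each database's Connection table.

```shell
# Set the NB and SB database inactivity probes to 60 seconds (60000 ms).
# ovsdb-server clears a dead client's lock after roughly twice this interval,
# so a standby northd can take over within about two minutes.
ovn-nbctl set connection . inactivity_probe=60000
ovn-sbctl set connection . inactivity_probe=60000
```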
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, dcbw. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Please review the full test history for this PR and help us cut down flakes.
@dcbw: This pull request references Bugzilla bug 1834473, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@dcbw: All pull requests linked via external trackers have merged: openshift/cluster-network-operator#631. Bugzilla bug 1834473 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Multiple northds run for HA in active/passive mode where the active
northd holds a lock. If that northd loses connectivity to the database
or is killed without releasing the lock, ovsdb-server will clear
the lock after twice the inactivity probe. But if that probe is set
to 0 (disabled), that will never happen, and a new northd will never
grab the lock and continue reconciling NB->SB.
Set the DB inactivity probe to something greater than 0 to ensure
that a northd will always eventually become active. The value of 60 was
chosen as a reasonable middle-ground between the lock being cleared
and another northd grabbing it (~120s) and the possibility that a loaded
ovsdb-server (many ovn-controller clients) would take more than 30-40
seconds to send/reply to all inactivity probes from clients.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1828989
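The failover window the commit message cites (~120s) follows directly from the lock-clearing behavior described above. A quick sketch of the arithmetic, assuming the 60000 ms probe value this PR sets:

```shell
probe_ms=60000                         # inactivity probe set by this PR (60s)
failover_s=$(( 2 * probe_ms / 1000 )) # ovsdb-server clears the lock after ~2x the probe
echo "worst-case northd failover: ~${failover_s}s"
```

This is the middle ground the commit message describes: long enough that a loaded ovsdb-server can still answer probes from many ovn-controller clients, short enough that a standby northd takes over in about two minutes.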