New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1990631: ovnkube: use ovn-nbctl daemon monitor mode to restart and log issues #1182
Conversation
If the ovn-nbctl daemon has problems and dies monitor mode will restart it for us and log about the problem. Would help stability and debug issues like: utils.go:236] Gateway router GR_worker-0.ostest.test.metalkube.org does not have load balancer (OVN command '/usr/bin/ovn-nbctl --timeout=15 --data=bare --no-heading --columns=_uuid find load_balancer external_ids:UDP_lb_gateway_router=GR_worker-0.ostest.test.metalkube.org' failed: signal: alarm clock) services_controller.go:210] "Error syncing service, retrying" service="e2e-webhook-856/e2e-test-webhook" err="unable to create load balancer: stderr: 2021-08-10T19:19:03Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)\n, err: OVN command '/usr/bin/ovn-nbctl --timeout=15 set load_balancer 3c704510-997b-42b0-962d-fdb8d9f12ab3 vips:\"172.30.37.42:8443\"=\"\"' failed: signal: alarm clock"
I'm currently looking at an issue where ovn-nbctl daemon seems stuck or broken, and all I see are the ovn-nbctl commands failing trying to talk to it. This PR will be super useful for more info on what is happening there. Thank you @dcbw ! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dcbw, flavio-fernandes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
16 similar comments
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@dcbw: This pull request references Bugzilla bug 1990631, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Third time is the charm, right? :) /bugzilla refresh |
@flavio-fernandes: This pull request references Bugzilla bug 1990631, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
10 similar comments
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@dcbw: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@dcbw: All pull requests linked via external trackers have merged: Bugzilla bug 1990631 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.9 |
@dcbw: new pull request could not be created: failed to create pull request against openshift/cluster-network-operator#release-4.9 from head openshift-cherrypick-robot:cherry-pick-1182-to-release-4.9: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"No commits between openshift:release-4.9 and openshift-cherrypick-robot:cherry-pick-1182-to-release-4.9"}],"documentation_url":"https://docs.github.com/rest/reference/pulls#create-a-pull-request"} In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.8 |
@dcbw: new pull request created: #1211 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
If the ovn-nbctl daemon has problems and dies monitor mode will restart
it for us and log about the problem. Would help stability and debug
issues like: