[release-4.10] Bug 2052017: restart pod on non-retriable failures when deleting stale objects #945

flavio-fernandes
Contributor

Where we currently do not retry the removal of stale objects, it is
better to restart the pod than to simply log an error and let the pod
come up. This change applies that behavior to the functions that run
early during pod startup.

Signed-off-by: Flavio Fernandes flaviof@redhat.com

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
(cherry picked from commit 44d06f5)
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
(cherry picked from commit 4e9e424)
findSwitch only sets the UUID in the provided parameter,
so rename it to findSwitchUUID.

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
(cherry picked from commit d92eab2)
On startup, failures while syncing the OVN DB with K8s should
be considered fatal. Even so, this change introduces
retry logic to minimize pod restarts.

Conflicts:
  go-controller/pkg/ovn/pods.go

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
(cherry picked from commit af27b80)
@openshift-ci openshift-ci bot added the bugzilla/severity-high label (Referenced Bugzilla bug's severity is high for the branch this PR is targeting) on Feb 8, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 8, 2022

@flavio-fernandes: This pull request references Bugzilla bug 2052017, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "4.10.z" instead
  • expected dependent Bugzilla bug 2027874 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is CLOSED (ERRATA) instead
  • expected dependent Bugzilla bug 2027874 to target a release in 4.11.0, but it targets "4.7.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.10] Bug 2052017: restart pod on non-retriable failures when deleting stale objects

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Feb 8, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 8, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flavio-fernandes
To complete the pull request process, please assign squeed after the PR has been reviewed.
You can assign the PR to them by writing /assign @squeed in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from dcbw and trozet February 8, 2022 14:25
@flavio-fernandes
Contributor Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Feb 8, 2022

@flavio-fernandes: This pull request references Bugzilla bug 2052017, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "4.10.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@flavio-fernandes
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug label (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) on Feb 8, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 8, 2022

@flavio-fernandes: This pull request references Bugzilla bug 2052017, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 2042999 is in the state MODIFIED, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Bugzilla bug 2042999 targets the "4.11.0" release, which is one of the valid target releases: 4.11.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

/bugzilla refresh


@openshift-ci openshift-ci bot removed the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Feb 8, 2022
@flavio-fernandes
Contributor Author

/hold waiting for additional changes to retry (cc @tssurya)

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command) on Feb 10, 2022
@flavio-fernandes
Contributor Author

Holding this PR to cherry-pick the commits from @tssurya: ovn-org/ovn-kubernetes#2787

@flavio-fernandes
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2022

@flavio-fernandes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-e2e-gcp-ovn | e0ee1e1 | link | false | /test okd-e2e-gcp-ovn |
| ci/prow/e2e-openstack-ovn | e0ee1e1 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-vsphere-ovn | e0ee1e1 | link | false | /test e2e-vsphere-ovn |
| ci/prow/e2e-metal-ipi-ovn-dualstack | e0ee1e1 | link | true | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-aws-ovn | e0ee1e1 | link | true | /test e2e-aws-ovn |
| ci/prow/4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade | e0ee1e1 | link | true | /test 4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@flavio-fernandes
Contributor Author

This is now folded into #994
Closing this PR.

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2022

@flavio-fernandes: This pull request references Bugzilla bug 2052017. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

[release-4.10] Bug 2052017: restart pod on non-retriable failures when deleting stale objects


@flavio-fernandes flavio-fernandes deleted the fatal_on_rm_stale_4.10 branch March 29, 2022 13:54
Labels
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.