OCPBUGS-174: [release-4.10] Fix race when adding and removing pod with same name #1250

flavio-fernandes
Contributor

Adding and removing a pod on changing nodes back to back can end up in a race where
the corresponding logical switch port remains in the wrong logical switch and never gets
properly removed. In order for this to happen, the logical switch port has to have
the same name, which is the <namespace>_<podName>.
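
To make the failure mode concrete, here is a minimal Go sketch. It is not the ovn-kubernetes code: `logicalSwitch`, `portName`, and `switchesWithPort` are hypothetical names, switches are modeled as in-memory maps, and the port name is assumed to be `<namespace>_<podName>` as stated in the cherry-picked commit message. It only illustrates how a port with the same name can be left behind on the old node's switch when cleanup looks at the current node's switch alone.

```go
package main

import "fmt"

// logicalSwitch is a hypothetical stand-in for an OVN logical switch:
// a per-node switch name plus the set of port names attached to it.
type logicalSwitch struct {
	name  string
	ports map[string]bool
}

// portName builds the logical switch port name in the form
// <namespace>_<podName>.
func portName(namespace, podName string) string {
	return namespace + "_" + podName
}

// switchesWithPort returns the names of all switches that hold the port.
func switchesWithPort(switches []*logicalSwitch, port string) []string {
	var found []string
	for _, sw := range switches {
		if sw.ports[port] {
			found = append(found, sw.name)
		}
	}
	return found
}

func main() {
	node1 := &logicalSwitch{name: "node1", ports: map[string]bool{}}
	node2 := &logicalSwitch{name: "node2", ports: map[string]bool{}}
	switches := []*logicalSwitch{node1, node2}

	port := portName("default", "web-0")

	// The pod first lands on node1.
	node1.ports[port] = true
	// It is deleted and recreated on node2 before node1 is cleaned up,
	// so the same port name now exists on two switches.
	node2.ports[port] = true

	// Cleanup that only touches the current node's switch misses node1.
	delete(node2.ports, port)

	// Any switch listed here still holds a stale port.
	fmt.Println(switchesWithPort(switches, port))
}
```

Searching every switch for the port name, rather than only the switch of the pod's current node, is one way to detect the stale entry.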

This PR includes the back-porting of 2 fixes needed to address this race.
They were merged to D/S master via PR #1237

https://github.com/openshift/ovn-kubernetes/commit/f1be8d298bfe7b087af009a4093a863ebb7804fb
https://github.com/openshift/ovn-kubernetes/commit/be8786a89546effe2de121fce9c05907fae4c1ce

Trivial change to pod retries logs when attempting deletion to
provide the error value itself.

When @ricky-rav moved logic from /pkg/ovn/pods_retry.go to /pkg/ovn/obj_retry.go in
4.11, he already took care of showing the error message in the log.

https://github.com/ricky-rav/ovn-kubernetes/blob/b4738c77138b1f332d41c88046d29e7b558f6683/go-controller/pkg/ovn/obj_retry.go#L787

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Aug 17, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2022

@flavio-fernandes: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

OCPBUGS-174: [release-4.10] Fix race when adding and removing pod with same name

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 17, 2022
@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 17, 2022

@flavio-fernandes: This pull request references [Jira Issue OCPBUGS-174](https://issues.redhat.com//browse/OCPBUGS-174), which is invalid:

  • expected dependent Jira Issue OCPBUGSM-47974 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is Code Review instead
  • expected dependent Jira Issue OCPBUGSM-47974 to target a version in 4.11.0, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Adding and removing a pod on changing nodes back to back can end up in a race where
corresponding logical switch port remains in the wrong logical switch and never gets
properly removed. In order for this to happen, the logical switch port has to have
the same name, which is the <namespace>_<podName>.

This PR includes the back-porting of 2 fixes needed to address this race.
They were merged to D/S master via PR #1237

f1be8d2
be8786a


@openshift-ci openshift-ci bot requested review from dcbw and tssurya August 17, 2022 18:53
@flavio-fernandes
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 17, 2022
flavio-fernandes and others added 2 commits August 17, 2022 20:39
…dicate

This small change pulls FindLogicalSwitchesWithPredicate from release 4.11 and newer,
to facilitate the backporting of other changes that depend on this function.

The function was initially introduced as part of a much bigger PR. This commit is just
a small portion of it:

openshift#1049
openshift@6d60741#r81458726

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
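
The predicate-based lookup mentioned in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the backported function: the real `FindLogicalSwitchesWithPredicate` queries the OVN northbound database, and its signature is not reproduced here. `LogicalSwitch`, `switchPredicate`, `findSwitchesWithPredicate`, and `hasPort` are hypothetical simplifications that work on a plain slice.

```go
package main

import "fmt"

// LogicalSwitch is a simplified, hypothetical stand-in for a northbound
// database row; only the fields needed for the example are modeled.
type LogicalSwitch struct {
	Name  string
	Ports []string
}

// switchPredicate reports whether a switch should be part of the result.
type switchPredicate func(*LogicalSwitch) bool

// findSwitchesWithPredicate returns every switch for which p returns true.
func findSwitchesWithPredicate(all []*LogicalSwitch, p switchPredicate) []*LogicalSwitch {
	var out []*LogicalSwitch
	for _, sw := range all {
		if p(sw) {
			out = append(out, sw)
		}
	}
	return out
}

// hasPort reports whether the switch lists the given port name.
func hasPort(sw *LogicalSwitch, port string) bool {
	for _, p := range sw.Ports {
		if p == port {
			return true
		}
	}
	return false
}

func main() {
	switches := []*LogicalSwitch{
		{Name: "node1", Ports: []string{"default_web-0"}},
		{Name: "node2", Ports: []string{"default_db-0"}},
	}

	// Select every switch that still references a given pod's port.
	matches := findSwitchesWithPredicate(switches, func(sw *LogicalSwitch) bool {
		return hasPort(sw, "default_web-0")
	})
	for _, sw := range matches {
		fmt.Println(sw.Name)
	}
}
```

Passing the selection logic as a function keeps the lookup reusable: callers that depend on it can match on a port name, a switch name, or any other field without a new finder for each case.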
…e informer cache

When processing an object in terminal state, there is a chance that it was already removed from
the API server. Since delete events for objects in terminal state are skipped, delete it here.

Conflicts:
  go-controller/pkg/ovn/obj_retry.go --> where code lives in 4.11 and newer
  go-controller/pkg/ovn/ovn.go --> where code lives in 4.10 and older

Signed-off-by: Patryk Diak <pdiak@redhat.com>
(cherry picked from commit f1be8d2)
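
The terminal-state handling described in the commit above can be pictured with a small Go sketch. Everything here is hypothetical (`pod`, `controller`, `processPod`, `onDelete`) and much simpler than the real retry logic; it assumes that "terminal" means the Succeeded or Failed pod phase and only shows why cleanup has to happen during processing when delete events for terminal objects are skipped.

```go
package main

import "fmt"

type podPhase string

const (
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// pod is a hypothetical, minimal pod record.
type pod struct {
	namespace string
	name      string
	phase     podPhase
}

func (p pod) key() string { return p.namespace + "_" + p.name }

// isTerminal assumes Succeeded and Failed are the terminal phases.
func isTerminal(p pod) bool {
	return p.phase == phaseSucceeded || p.phase == phaseFailed
}

// controller tracks which logical ports currently exist.
type controller struct {
	ports map[string]bool
}

func newController() *controller {
	return &controller{ports: map[string]bool{}}
}

// processPod handles an add, update, or retry of a pod. A terminal pod is
// cleaned up right away, because its delete event will be skipped.
func (c *controller) processPod(p pod) {
	if isTerminal(p) {
		delete(c.ports, p.key())
		return
	}
	c.ports[p.key()] = true
}

// onDelete models a delete event; events for terminal pods are skipped.
func (c *controller) onDelete(p pod) {
	if isTerminal(p) {
		return
	}
	delete(c.ports, p.key())
}

func main() {
	c := newController()
	p := pod{namespace: "default", name: "job-1", phase: phaseRunning}
	c.processPod(p)

	// The pod completes and is then removed from the API server.
	p.phase = phaseSucceeded
	c.processPod(p) // cleanup happens here
	c.onDelete(p)   // skipped for terminal pods

	fmt.Println(len(c.ports))
}
```

Without the cleanup branch in `processPod`, the port in this model would stay in place indefinitely, since `onDelete` never acts on a terminal pod.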
@flavio-fernandes flavio-fernandes force-pushed the pod_removal_fix_4.10 branch 2 times, most recently from cd2a1cb to 697267f Compare August 17, 2022 20:46
@flavio-fernandes
Contributor Author

/remove-hold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 17, 2022
@flavio-fernandes
Contributor Author

/assign @kyrtapz
/assign @trozet

@flavio-fernandes flavio-fernandes requested review from kyrtapz and removed request for tssurya and dcbw August 18, 2022 13:52
@flavio-fernandes
Contributor Author

/test e2e-aws-ovn

@abhat
Contributor

abhat commented Aug 18, 2022

/bugzilla valid-bug

JIRA OCPBUGS-174 tracks this and depends on the 4.11 bug which was open in Bugzilla.

Adding and removing a pod on changing nodes back to back can end up in a race where
corresponding logical switch port remains in the wrong logical switch and never gets
properly removed. In order for this to happen, the logical switch port has to have
the same name, which is the <namespace>_<podName>.

Conflicts:
    go-controller/pkg/ovn/pods.go
    go-controller/pkg/ovn/pods_test.go

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Co-authored-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit be8786a)
@flavio-fernandes
Contributor Author

@trozet Please add the missing labels, if it all looks right to you.

@flavio-fernandes
Contributor Author

/test e2e-aws-ovn-windows

@flavio-fernandes
Contributor Author

/test e2e-vsphere-ovn
/test e2e-vsphere-windows
/test e2e-openstack-ovn

@flavio-fernandes
Contributor Author

/jira refresh

@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 21, 2022

@flavio-fernandes: This pull request references [Jira Issue OCPBUGS-174](https://issues.redhat.com//browse/OCPBUGS-174), which is invalid:

  • expected dependent Jira Issue OCPBUGSM-47974 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is Code Review instead
  • expected dependent Jira Issue OCPBUGSM-47974 to target a version in 4.11.0, 4.11.z, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@flavio-fernandes
Contributor Author

/test e2e-vsphere-ovn
/test e2e-openstack-ovn

@kyrtapz
Contributor

kyrtapz commented Aug 22, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 22, 2022
@flavio-fernandes
Contributor Author

/test e2e-openstack-ovn

@jcaamano
Contributor

label /backport-risk-assessed

@jcaamano
Contributor

/approve

@jcaamano
Contributor

/label backport-risk-assessed

lol

@openshift-ci openshift-ci bot added backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 22, 2022
@flavio-fernandes
Contributor Author

/jira refresh

@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 22, 2022

@flavio-fernandes: This pull request references [Jira Issue OCPBUGS-174](https://issues.redhat.com//browse/OCPBUGS-174), which is invalid:

  • expected dependent Jira Issue OCPBUGSM-47974 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is Code Review instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@anuragthehatter

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 22, 2022
@knobunc knobunc added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Aug 22, 2022
@knobunc
Contributor

knobunc commented Aug 22, 2022

/approve

@openshift-ci
Contributor

openshift-ci bot commented Aug 22, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flavio-fernandes, jcaamano, knobunc, kyrtapz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD 56af075 and 8 for PR HEAD b62dba2 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 1 against base HEAD 56af075 and 7 for PR HEAD b62dba2 in total

@openshift-ci
Contributor

openshift-ci bot commented Aug 23, 2022

@flavio-fernandes: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack-ovn | b62dba2 | link | false | /test e2e-openstack-ovn |

Full PR test history. Your PR dashboard.


@openshift-merge-robot openshift-merge-robot merged commit 72538c5 into openshift:release-4.10 Aug 23, 2022
@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 23, 2022

@flavio-fernandes: All pull requests linked via external trackers have merged:

[Jira Issue OCPBUGS-174](https://issues.redhat.com//browse/OCPBUGS-174) has been moved to the MODIFIED state.

In response to this:

Adding and removing a pod on changing nodes back to back can end up in a race where
corresponding logical switch port remains in the wrong logical switch and never gets
properly removed. In order for this to happen, the logical switch port has to have
the same name, which is the <namespace>_<podName>.

This PR includes the back-porting of 2 fixes needed to address this race.
They were merged to D/S master via PR #1237

f1be8d2
be8786a


@flavio-fernandes flavio-fernandes deleted the pod_removal_fix_4.10 branch August 23, 2022 09:43
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

9 participants