Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1845664: release-4.5 Cleanup Conntrack when endpoints and pods are deleted #180

Conversation

JacobTanenbaum
Copy link
Contributor

- What this PR does and why is it needed

When pods and endpoints are deleted if there are ongoing networking connections there will be a stale conntrack entry. This PR ensures that those conntrack entries are cleaned up

- Special notes for reviewers

- How to verify it

See the BZ for reproduction information

- Description for the changelog

Add cleanup of pods and endpoints conntrack entries

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 9, 2020
@openshift-ci-robot
Copy link
Contributor

@JacobTanenbaum: This pull request references Bugzilla bug 1845664, which is invalid:

  • expected dependent Bugzilla bug 1787434 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1845664: Cleanup Conntrack when endpoints and pods are deleted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JacobTanenbaum JacobTanenbaum changed the title Bug 1845664: Cleanup Conntrack when endpoints and pods are deleted Bug 1845664: release-4.5 Cleanup Conntrack when endpoints and pods are deleted Jun 9, 2020
@JacobTanenbaum
Copy link
Contributor Author

/retest

@JacobTanenbaum
Copy link
Contributor Author

ci/prow/e2e-aws-ovn

error: some steps failed:
  * could not run steps: step ovn-kubernetes failed: could not wait for build: the build ovn-kubernetes failed after 4m2s with reason DockerBuildFailed: Docker build strategy has failed. 

/retest

@JacobTanenbaum
Copy link
Contributor Author

/retest

2 similar comments
@JacobTanenbaum
Copy link
Contributor Author

/retest

@JacobTanenbaum
Copy link
Contributor Author

/retest

@dcbw
Copy link
Contributor

dcbw commented Jul 14, 2020

@JacobTanenbaum does this backport need to be redone to match the new upstream implementation?

@dcbw
Copy link
Contributor

dcbw commented Jul 14, 2020

/bugzilla refresh
/hold
(until it matches newer upstream implementation)

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 14, 2020
@openshift-ci-robot
Copy link
Contributor

@dcbw: This pull request references Bugzilla bug 1845664, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.z) matches configured target release for branch (4.5.z)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1787434 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1787434 targets the "4.6.0" release, which is one of the valid target releases: 4.6.0, 4.6.z
  • bug has dependents

In response to this:

/bugzilla refresh
/hold
(until it matches newer upstream implementation)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

when a pod is deleted make sure to delete any conntrack entries assocaited with
those pods.

https://bugzilla.redhat.com/show_bug.cgi?id=1787434

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
when using node port services in a multinode cluster when a backend server
pod gets deleted there could be conntrack entries associated with that pods
ip on any node in the cluster. When removing an endpoint from the list
of endpoints we should be deleteing all conntrack entries that may include
the stale information

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
the cherry-pick of 0951d91 did not apply cleanly
had to manually add

    github.com/vishvananda/netlink v0.0.0-20200625175047-bca67dfc8220

to mod.go and run

   go mod vendor

then had to manually add the below files to the git tree

        vendor/github.com/vishvananda/netlink/.gitignore
        vendor/github.com/vishvananda/netlink/devlink_linux.go
        vendor/github.com/vishvananda/netlink/go.mod
        vendor/github.com/vishvananda/netlink/go.sum
        vendor/github.com/vishvananda/netlink/nl/devlink_linux.go
        vendor/github.com/vishvananda/netns/go.mod
        vendor/github.com/vishvananda/netns/go.sum
        vendor/golang.org/x/sys/unix/fdset.go
        vendor/golang.org/x/sys/unix/sockcmsg_dragonfly.go
        vendor/golang.org/x/sys/unix/sockcmsg_unix_other.go
        vendor/golang.org/x/sys/unix/zptrace_armnn_linux.go
        vendor/golang.org/x/sys/unix/zptrace_linux_arm64.go
        vendor/golang.org/x/sys/unix/zptrace_mipsnn_linux.go
        vendor/golang.org/x/sys/unix/zptrace_mipsnnle_linux.go
        vendor/golang.org/x/sys/unix/zptrace_x86_linux.go
        vendor/golang.org/x/sys/windows/empty.s
forgot to return after an error for conntrack deletion

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
…delete conntrack

the original implementation of the conntrack deletion will cause errors
it deletes all conntrack accross all ports for all protocols.

after revendoring the netlink code use it to allow filtering the UDP and SCTP
protocols and filter to delete selected ports

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>

the cherry pick did not apply cleanly commit 663ba29
was added between since we checked for format of error and log messages. Needed slight
manual edits becasue of the log messages
@JacobTanenbaum
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2020
@JacobTanenbaum
Copy link
Contributor Author

@dcbw please take a look, manual edits where necessary on two of the commits (4f8232c and e25f547) the first because the revendor wasn't clean and the second because a large PR that edited the formating of log messages went in that has not been backported (and doesn't backport cleanly). Let me know what you think?

@JacobTanenbaum
Copy link
Contributor Author

aws had some sig-storage issues

/retest

@dcbw
Copy link
Contributor

dcbw commented Aug 5, 2020

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw, JacobTanenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 5, 2020
@sdodson sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 5, 2020
@openshift-merge-robot openshift-merge-robot merged commit dd9e330 into openshift:release-4.5 Aug 5, 2020
@openshift-ci-robot
Copy link
Contributor

@JacobTanenbaum: All pull requests linked via external trackers have merged: openshift/ovn-kubernetes#180. Bugzilla bug 1845664 has been moved to the MODIFIED state.

In response to this:

Bug 1845664: release-4.5 Cleanup Conntrack when endpoints and pods are deleted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants