Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up kube-proxy stale-conntrack-entry handling, revert broken code #115299

Merged
merged 4 commits into from Mar 14, 2023

Conversation

danwinship
Copy link
Contributor

What type of PR is this?

/kind cleanup
/sig network

What this PR does / why we need it:

As discussed in #108523 (comment), the "delete stale conntrack entries for UDP and SCTP connections" code in the iptables and ipvs proxiers actually only deletes stale conntrack entries for UDP connections; #74073 was mostly a no-op at the time it was merged, because the fix it originally contained got sniped by another change elsewhere in the code that silently broke it.

While we could re-fix things to work the way they were supposed to work post-#74073, I don't think that is the right answer at this point; no one has complained about the current behavior being broken in a while, and we had a long argument about the correct behavior in #108523 without any of the participants ever noticing that we were claiming kube-proxy did something that it does not do. It seems that either (a) people are happy with kube-proxy's current behavior, or (b) nobody is actually using SCTP Services in kubernetes any more.

Either way, given that we don't seem totally sure what the right behavior is, this PR doesn't change the current behavior. It just reverts the non-functional parts of #74073 and renames some struct fields and rewrites comments so as to make it clearer what the behavior is (and always has been).

I considered just changing conntrack.IsClearConntrackNeeded to only return true for UDP, and leaving the rest of the original PR in place (so as to put us in a better place for eventually "re"-enabling SCTP conntrack clearing). However, I suspect that we don't want a single "protocol needs conntrack cleanup" predicate anyway; we may want to do different protocols in different cases. (UDPStaleClusterIP DeletedUDPClusterIPs was always UDP-specific, before and after #74073, and I think StaleServiceNames NewlyActiveUDPServices probably ought to be UDP-specific as well.)

The last commit removes some "protocol == UDP" checks that are now "obviously unnecessary", but it broke the deleteEndpointConnections unit tests because they were passing data to that function that would now be considered wrong. Hence "Make deleteEndpointConnection test use syncProxyRules", which makes it so that the test results are unchanged, by letting syncProxyRules decide whether to actually call deleteEndpointConnection or not. This is kind of a big ugly change and makes the "unit test" more of an integration test kind of, so maybe it would be better instead to just remove the TCP/SCTP tests from that function? (Though as noted in #108523, if the test had been written this way in the first place then it would have caught the fact that #74073 was broken.)

Which issue(s) this PR fixes:

none, but related to #108523

Does this PR introduce a user-facing change?

NONE

/cc @uablrek @tssurya @thockin @aojea

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Jan 24, 2023
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jan 24, 2023
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/ipvs labels Jan 24, 2023
}
detectLocal, _ := proxyutiliptables.NewDetectLocalByCIDR("10.0.0.0/8", ipt)
detectLocal, _ := proxyutiliptables.NewDetectLocalByCIDR(podCIDR, ipt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@aojea
Copy link
Member

aojea commented Jan 28, 2023

/lgtm

or how we could avoid hours/days of discussion if just someone have tested it 🙃

I don't know why I assumed this was running in production somewhere

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9190d8a7036fe56cc803748063e76ae49ebab7fc

@aojea
Copy link
Member

aojea commented Jan 28, 2023

/assign @thockin

you need approval for conntrack utils

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2023
@danwinship
Copy link
Contributor Author

(rebased for FakeExec vs *FakeExec changes in unit tests)

@aojea
Copy link
Member

aojea commented Mar 1, 2023

/lgtm
@thockin we need pkg/OWNERS approval

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 0d91e7fb7b84ff652f56394b087a2984138f3ae0

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2023
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 5, 2023
}
return nil
}

Copy link
Member

@aojea aojea Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This library is only used by the proxies ,

pkg/proxy/iptables/proxier.go:                          err = conntrack.ClearEntriesForPortNAT(proxier.exec, endpointIP, nodePort, svcProto)
pkg/proxy/iptables/proxier.go:                  err = conntrack.ClearEntriesForNAT(proxier.exec, svcInfo.ClusterIP().String(), endpointIP, svcProto)
pkg/proxy/iptables/proxier.go:                          err := conntrack.ClearEntriesForNAT(proxier.exec, extIP, endpointIP, svcProto)
pkg/proxy/iptables/proxier.go:                          err := conntrack.ClearEntriesForNAT(proxier.exec, lbIP, endpointIP, svcProto)
pkg/proxy/iptables/proxier.go:          if err := conntrack.ClearEntriesForIP(proxier.exec, svcIP, v1.ProtocolUDP); err != nil {
pkg/proxy/iptables/proxier.go:          err := conntrack.ClearEntriesForPort(proxier.exec, nodePort, isIPv6, v1.ProtocolUDP)
pkg/proxy/ipvs/proxier.go:              if err := conntrack.ClearEntriesForIP(proxier.exec, svcIP, v1.ProtocolUDP); err != nil {
pkg/proxy/ipvs/proxier.go:              err := conntrack.ClearEntriesForPort(proxier.exec, nodePort, proxier.ipFamily == v1.IPv6Protocol, v1.ProtocolUDP)
pkg/proxy/ipvs/proxier.go:                              err = conntrack.ClearEntriesForPortNAT(proxier.exec, endpointIP, nodePort, svcProto)
pkg/proxy/ipvs/proxier.go:                      err = conntrack.ClearEntriesForNAT(proxier.exec, svcInfo.ClusterIP().String(), endpointIP, svcProto)
pkg/proxy/ipvs/proxier.go:                              err := conntrack.ClearEntriesForNAT(proxier.exec, extIP, endpointIP, svcProto)
pkg/proxy/ipvs/proxier.go:  

I think that previously it was used by kubelet hostport, but now it makes sense to me that we move this to pkg/proxy/util ... and we also have it under the sig-network approvers umbrella

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes... the person who was working on the iptables/ipvs conntrack sync was going to do that I think. Do you want me to do that here? (We could also just fix the OWNERS file in the existing dir...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, fixing the owners thing is the point, he is waiting on this PR.
I think is better if you can do it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2023
@@ -265,46 +235,42 @@ func TestDeleteConnections(t *testing.T) {
}
}

func TestClearConntrackForPortNAT(t *testing.T) {
func TestClearUDPConntrackForPortNAT(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change, we remove one test case of the existing 2, but you append new outputs as it there are 3 test cases, am I looking it wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakeexec is just terrible... I tried to fix it once (kubernetes/utils#14)

anyway, as it was, the PR correctly reverted the test back to how it was before 70020, but yes, it had the wrong number of entries before; the test case code checks that all of the expected outputs are correct, but it doesn't check that there are no extra expected outputs. Presumably someone copy+pasted this from one of the other test cases where there were more expected commands and forgot to delete the extras.

anyway, repushed with a fix

@@ -177,29 +177,21 @@ func TestClearUDPConntrackForPort(t *testing.T) {
}
}

func TestDeleteConnections(t *testing.T) {
func TestDeleteUDPConnections(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, we remove 3 test cases and we remove the 3 fake calls, TestClearUDPConntrackForPortNAT seems something went missing on the revert, no?

return netIP.To4() == nil
// Check the executed conntrack command
if tc.protocol == UDP {
if fexec.CommandCalls != 2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 times?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, is explained below,

This commit did not actually work; in between when it was first
written and tested, and when it merged, the code in
pkg/proxy/endpoints.go was changed to only add UDP endpoints to the
"stale endpoints"/"stale services" lists, and so checking for "either
UDP or SCTP" rather than just UDP when processing those lists had no
effect.

This reverts most of commit aa8521d
(but leaves the changes related to
ipvs.IsRsGracefulTerminationNeeded() since that actually did have the
effect it meant to have).
The APIs talked about "stale services" and "stale endpoints", but the
thing that is actually "stale" is the conntrack entries, not the
services/endpoints. Fix the names to indicate what they actual keep
track of.

Also, all three fields (2 in the endpoints update object and 1 in the
service update object) are currently UDP-specific, but only the
service one made that clear. Fix that too.
Rather than calling fp.deleteEndpointConnection() directly, set up the
proxy to have syncProxyRules() call it, so that we are testing it in
the way that it actually gets called.

Squash the IPv4 and IPv6 unit tests together so we don't need to
duplicate all that code. Fix a tiny bug in NewFakeProxier() found
while doing this...
Now that the endpoint update fields have names that make it clear that
they only contain UDP objects, it's obvious that the "protocol == UDP"
checks in the iptables and ipvs proxiers were no-ops, so remove them.
@aojea
Copy link
Member

aojea commented Mar 14, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 3f99f3f8cd2ab08d00206bce260d57722adc61a0

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

// and then goes unready or changes its IP address.
// detectStaleConntrackEntries detects services that may be associated with stale conntrack entries.
// (See UpdateEndpointMapResult.DeletedUDPEndpoints and .NewlyActiveUDPServices.)
func detectStaleConntrackEntries(oldEndpointsMap, newEndpointsMap EndpointsMap, deletedUDPEndpoints *[]ServiceEndpoint, newlyActiveUDPServices *[]ServicePortName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as a continuation of what we've always done, but the REAL fix would be to load conntrack info and scan that for tuples that don't match and need to be discarded. It works as is, but if anything goes wrong, we lose the "event" and never come back to reconcile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as a continuation of what we've always done, but the REAL fix would be to load conntrack info and scan that for tuples that don't match and need to be discarded.

do you mean to do a diff between the conntrack table and the "virtual generated by kube-proxy expected conntrack table" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of. Like "run conntrack -L and look for UDP records which correspond to a service endpoint, but that endpoint doesn't exist in that service" as opposed to a removal "event" .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be interesting if someone can do the exploratory work to see how expensive is to list all the conntrack table ... some time ago I asked one kernel/netfilter developer to implement server side filtering so you can dump only some specific entries, he told me this will solve the problem http://patchwork.ozlabs.org/project/netfilter-devel/patch/20210730131422.16958-3-fw@strlen.de/ but I never had time to follow up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipvs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants