kube-proxy: Drop packets in INVALID state #74840

Open · anfernee wants to merge 1 commit into master from anfernee:connreset

Conversation

@anfernee (Member) commented Mar 2, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #74839

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Packets considered INVALID by conntrack are now dropped. In particular, this fixes
a problem where spurious retransmits in a long-running TCP connection to a service
IP could result in the connection being closed with the error "Connection reset by
peer"
@k8s-ci-robot (Contributor) commented Mar 2, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anfernee
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: thockin

If they are not already assigned, you can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bowei (Member) commented Mar 5, 2019

/assign @thockin

Is this testable in some way?

@anfernee (Member, Author) commented Mar 5, 2019

/test pull-kubernetes-integration

@anfernee (Member, Author) commented Mar 5, 2019

The repro (https://github.com/tcarmet/k8s-connection-reset) is more like a load test. It would be nice to have a simpler, deterministic reproducing script, but I don't currently have one. Other than that, a unit test can be added too.

I think it would be nice to make it into v1.14's March 7th window.

@mainred left a comment

Should we also drop INVALID-state packets that will be delivered locally, like NodePort service queries?

(Review thread on pkg/proxy/iptables/proxier.go; resolved, outdated.)
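
Purely to illustrate mainred's question above: packets delivered to the host itself (rather than forwarded to a pod) traverse the filter table's INPUT path, which KUBE-FORWARD never sees, so an equivalent guard there would need its own rule. A hypothetical sketch in the proxier's writeLine style (kubeInputChain is an invented name for illustration; no such chain is added by this PR):

writeLine(proxier.filterRules,
	"-A", string(kubeInputChain), // hypothetical input-path chain, illustration only
	"-m", "conntrack", "--ctstate", "INVALID",
	"-j", "DROP")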
@thockin (Member) commented Mar 5, 2019

This is subtle, and I don't really understand what would cause it to happen. Without a test I have no way to say whether this is doing anything at all. Do we have any way to force this condition?

We must look at this for all sorts of traffic, too - NodePorts, LB IPs, etc. Does this cover them all?

This also needs to be looked at in IPVS mode. @m1093782566

@anfernee (Member, Author) commented Mar 9, 2019

I created a small app to reproduce this issue: https://github.com/anfernee/k8s-issue-74839

@anfernee (Member, Author) commented Mar 12, 2019

@m1093782566 do you have comments?

@anfernee force-pushed the anfernee:connreset branch from a86204f to a07169b on Mar 18, 2019

@anfernee (Member, Author) commented Mar 18, 2019

@mainred This specific bug is all about cluster-local traffic between two pods; other incoming traffic is not affected. The rule fixes the returning packet, so both NodePort and ClusterIP should be covered by the same fix.
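
To make the shared return path concrete, a sketch of the flow (assuming the standard DNAT handling for ClusterIP and NodePort; illustrative, not taken from the PR):

client pod -> service IP        DNAT to server pod; traverses FORWARD / KUBE-FORWARD
server pod -> client (reply)    conntrack reverses the NAT; traverses FORWARD / KUBE-FORWARD
out-of-window reply             conntrack marks it INVALID; the NAT is NOT reversed
client sees the raw pod IP      no matching socket, so it answers RST and the connection dies

Since ClusterIP and NodePort replies converge on the same FORWARD path, one DROP rule in KUBE-FORWARD covers both.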

@thockin (Member) commented Mar 18, 2019

// Drop the packets in INVALID state, which would potentially cause
// unexpected connection reset.
// https://github.com/kubernetes/kubernetes/issues/74839
writeLine(proxier.filterRules,
	"-A", string(kubeForwardChain),
	"-m", "conntrack", "--ctstate", "INVALID",
	"-j", "DROP")

@danwinship (Contributor) commented Mar 28, 2019

Wouldn't this need to be near the start of syncProxyRules rather than near the end? Many packets have already been matched and redirected by this point...

@anfernee (Member, Author) commented Mar 28, 2019

Yes, it's the first rule in the KUBE-FORWARD chain:

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination 
*** NEW RULE INSERTED HERE ***
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.36.0.0/14         anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.36.0.0/14         /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

@danwinship (Contributor) commented Mar 29, 2019

Oh, all the (relevant) stuff before that is in natRules, not filterRules. OK.

@cmluciano (Member) commented Apr 4, 2019

FYI, Calico does something similar with its option for disabling ConntrackInvalid: projectcalico/felix#1424.

The Calico implementation also works for a cluster that runs IPVS as a load balancer with kube-proxy in iptables mode.

@anfernee (Member, Author) commented Apr 5, 2019

Ah, good to know. Thanks for bringing it up, @cmluciano. Not sure if it should be an option here... any reason not to enable that option?

@danwinship (Contributor) commented Mar 29, 2019

So this needs a release note (@anfernee, you would add this in the appropriate place in the initial comment on the PR). Something like:

Packets considered INVALID by conntrack are now dropped. In particular, this fixes
a problem where spurious retransmits in a long-running TCP connection to a service
IP could result in the connection being closed with the error "Connection reset by
peer"

And maybe it should not get backported right away, until we have more confidence that this doesn't break anything else.

/lgtm

@thockin (Member) commented Mar 29, 2019

@anfernee - I am still worried that we have no testing or way to test this. Can you do me a favor at least and write a big comment at this code site explaining it (I know you have a link, but I'd like some more detail here) and explaining why testing this is "less than ideal" :)

@m1093782566 - we need input on IPVS mode

@Lion-Wei as a backup?

If you are not actively maintaining IPVS any more, please let me know, or suggest others who have expertise in it and can help shoulder the load.

@anfernee (Member, Author) commented Mar 29, 2019

@thockin I will. In terms of testing, I have to rely on the existing tests to make sure this doesn't break anything. I also have https://github.com/anfernee/k8s-issue-74839 to check whether #74839 is fixed. It would be great to make it part of the testgrid.

@anfernee (Member, Author) commented Mar 30, 2019

Talked with @MrHohn: https://github.com/kubernetes/kubernetes/tree/master/test/images is a good place to host my test app. Action item for me: add an e2e test.

@m1093782566 (Member) commented Mar 30, 2019

@lbernail

Would you please take a look at this issue for IPVS mode if you have bandwidth?

@lbernail (Contributor) commented Mar 30, 2019

@m1093782566 I had a look at the original issue and read through the linked page. My thoughts for IPVS:

  • IPVS doesn't rely on NAT as much as iptables, so it should be less impacted
  • we still perform NAT for NodePort traffic and hairpin (and in a few other cases, depending on cluster-cidr and masquerade-all)

I don't know how the IPVS connection-tracking system handles this situation. I think invalid packets arriving on the host should be dropped in the FORWARD chain before hitting any IPVS code (because in IPVS mode, packets still hit conntrack even if they are not NATed), but we need to validate this.

If this is OK for iptables, I see no reason it shouldn't be for IPVS. I'll try to reproduce on an IPVS setup and test adding this rule.

@anfernee (Member, Author) commented Apr 12, 2019

#76218 is the regression test for this PR.

@anfernee (Member, Author) commented Apr 12, 2019

Looks like there are no major concerns from reviewers, and we are ready to go.

The sequence I want to take is to merge #76218 first to confirm it breaks the testgrid, then merge this one to fix it.

@danwinship (Contributor) commented Apr 12, 2019

I would assume that intentionally breaking the testgrid is frowned upon. You could just temporarily remove the [Slow] tag from the test description in the other PR and run e2e-gce a few times.

@lbernail (Contributor) commented Apr 15, 2019

I confirm we have the issue with IPVS in the case where we have NAT (verified with host-networking-to-service communication, which requires SNAT).
I think we can integrate this change as-is into the IPVS proxier (see the sketch below).
I'll work on a PR.
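
For the record, a minimal sketch of what such a change could look like, assuming the IPVS proxier also maintains a filterRules buffer and the same kubeForwardChain constant as the iptables proxier (assumptions for illustration; the IPVS proxier's actual structure may differ):

writeLine(proxier.filterRules,
	"-A", string(kubeForwardChain), // assumed to exist in the IPVS proxier as well
	"-m", "conntrack", "--ctstate", "INVALID",
	"-j", "DROP")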

@anfernee (Member, Author) commented Apr 16, 2019

@danwinship Done as you recommended; the e2e does fail. Both #76218 and this one should be good to merge now.
