
IPVS does not RR with hostPort on service's POD and running on the same node #60688

Closed · CallMeFoxie opened this issue Mar 2, 2018 · 11 comments
Labels: area/ipvs, kind/bug, sig/network

CallMeFoxie commented Mar 2, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
(I think? Or possibly an undocumented feature; I couldn't find any docs about it.)

**kube-proxy IPVS mode only**

What happened:
Say you have a Service targeting port 4041 on a set of pods that expose that port not just as a containerPort but also as a hostPort:

v1.Pod spec:

```yaml
...
        ports:
        - containerPort: 4041
          hostPort: 4041
          name: testport
          protocol: TCP
```

v1.Service spec:

```yaml
  ports:
  - name: miniapps
    nodePort: 30985
    port: 4041
    protocol: TCP
    targetPort: 4041
```

If one of these pods runs on the node from which you're trying to reach the service (from the host itself or from any pod on it), the connection will always hit the local pod instead of going through IPVS's normal round-robin. This does not happen with the iptables backend.

```
# ipvsadm -ln | grep -A 2 "172.31.243.178" && curl -v 172.31.243.178:4041 -sS -m 1
TCP  172.31.243.178:4041 rr
  -> 10.66.137.237:4041           Masq    1      0          0
  -> 10.66.140.59:4041            Masq    1      0          0
* Rebuilt URL to: 172.31.243.178:4041/
*   Trying 172.31.243.178...
* TCP_NODELAY set
* Connected to 172.31.243.178 (172.31.243.178) port 4041 (#0)
> GET / HTTP/1.1
> Host: 172.31.243.178:4041
> User-Agent: curl/7.52.1
> Accept: */*
... response from the svc
```

Note how the ActiveConn/InActConn counters stay at 0 no matter how long you keep curling the service. When you move the pod off the node, round-robin works just fine; likewise when you remove the hostPort.
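A quick way to confirm the counters never move (a sketch using standard ipvsadm/watch flags; the VIP is the one from above):

```sh
# Watch the IPVS connection/packet counters for the service VIP...
watch -n 1 'ipvsadm -Ln --stats | grep -A 2 172.31.243.178'

# ...while hammering the service from another shell:
for i in $(seq 1 20); do curl -sS -m 1 -o /dev/null 172.31.243.178:4041; done
```

If the traffic really bypasses IPVS, both real servers stay at zero while curl keeps getting answers.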

I wonder if it is related to this:

```
5089: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
...
    inet 172.31.243.178/32 brd 172.31.243.178 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
```

in combination with the hostPort rules marking the packet for local delivery, rather than letting it go through the IPVS machinery?
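One way to test that hypothesis (a sketch; nothing here is taken from this particular cluster):

```sh
# Addresses bound to kube-ipvs0 appear in the kernel's local routing table,
# so the host considers them local destinations:
ip route show table local | grep 172.31.243.178

# Hostport DNAT rules typically match "dst-type LOCAL" in nat PREROUTING,
# which lets them grab the packet before IPVS ever sees it:
iptables -t nat -L PREROUTING -n -v
```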

Quite possibly it is also the cause of #60305.

What you expected to happen:
Round-robin across the endpoints even when one of the pods is running locally.

How to reproduce it (as minimally and precisely as possible):
Set up a pod with a hostPort on a node, create a Service on top of it, and make a request to the Service via its svcIP from that node; it will always hit the local pod, with no round-robin.
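For reference, a minimal pair of manifests along these lines should reproduce it (a sketch; the names and image are illustrative, and the container must actually listen on 4041):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostport-test        # illustrative name
  labels:
    app: hostport-test
spec:
  containers:
  - name: web
    image: nginx             # illustrative; must serve on port 4041
    ports:
    - containerPort: 4041
      hostPort: 4041         # the hostPort that triggers the problem
      protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: hostport-test
spec:
  selector:
    app: hostport-test
  ports:
  - port: 4041               # same port as the hostPort -> collision
    targetPort: 4041
    protocol: TCP
```

Then, from the node running the pod, `curl <clusterIP>:4041` and watch `ipvsadm -Ln` stay idle.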

Environment:

  • Kubernetes version (use kubectl version): 1.9.3
  • Cloud provider or hardware configuration: on-prem
  • OS (e.g. from /etc/os-release): debian stretch
  • Kernel (e.g. uname -a): debian latest stable
  • Networking: Calico in BGP mode

Cheers
Ashley

/sig networking

@k8s-ci-robot added the kind/bug label Mar 2, 2018
CallMeFoxie (Author) commented:

@kubernetes/sig-network-bugs

@k8s-ci-robot added the sig/network label Mar 2, 2018
k8s-ci-robot commented:

@CallMeFoxie: Reiterating the mentions to trigger a notification:
@kubernetes/sig-network-bugs

In response to this:

@kubernetes/sig-network-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


islinwb commented Mar 2, 2018

/area ipvs


CallMeFoxie commented Mar 2, 2018

Actually, I can use any svcIP from the kube-ipvs0 iface with port 4041 and it will always end up on the local hostPort :)

It looks like kube-ipvs0 behaves like the lo interface and takes precedence over the IPVS machinery.
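Every VIP the IPVS proxier manages is bound to that dummy device, which is easy to verify (a sketch; the addresses will differ per cluster):

```sh
# Each ClusterIP is added as a /32 on kube-ipvs0, making the kernel
# treat all of them as local addresses.
ip addr show kube-ipvs0
```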


Lion-Wei commented Mar 2, 2018

Thanks for the thorough research.
@m1093782566 Seems like a complicated issue.

CallMeFoxie (Author) commented:

It is probably related to this iptables rule being created:

```
-A CNI-DN-7704d44bf55727be2ab87 -p tcp -m tcp --dport 4141 -j DNAT --to-destination 10.66.103.129:4141
```

(ignore the different IP and port here)
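To check which hostport DNAT chains your CNI setup installed (a sketch; the `CNI-DN-*`/`CNI-HOSTPORT-*` prefixes come from the CNI portmap plugin and `KUBE-HOSTPORTS` from kubelet's own hostport handling, so yours may vary):

```sh
# Dump the nat table and look for hostport-related DNAT chains.
iptables -t nat -S | grep -E 'CNI-DN|CNI-HOSTPORT|KUBE-HOSTPORTS'
```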

@rramkumar1 mentioned this issue Mar 30, 2018
Lion-Wei commented:

@CallMeFoxie Is this rule created by the CNI?

> It looks like kube-ipvs0 behaves like the lo interface and takes precedence over the IPVS machinery.

That's true; we'd suggest that a service address not use the same port as a hostPort (see the workaround sketched below).

We will try to figure out how to fix this problem, but it's going to take some time.
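As a stopgap along those lines, the Service can listen on a different port and keep the pod's port as the targetPort (a hedged sketch, not from the original thread):

```yaml
# Service port 8041 no longer collides with the pod's hostPort 4041,
# so the hostport DNAT rule never matches traffic to the VIP.
ports:
- name: miniapps
  port: 8041
  targetPort: 4041
  protocol: TCP
```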

CallMeFoxie (Author) commented:

Yeah you may be right, it may be created by Calico :)

It's there mostly for legacy reasons and we're removing the hostPorts, but I'm sure others will run into this as well.

Lion-Wei commented:

@CallMeFoxie #62718 is supposed to fix this issue.

CallMeFoxie (Author) commented:

Thanks! Will test when it gets pulled into a release :)

k8s-github-robot pushed a commit that referenced this issue Apr 28, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix problem that ipvs can't work with hostPort

**What this PR does / why we need it**:
Make the IPVS proxy mode work with pods that have a hostPort.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61938
#60688 and #60305 are related too.

**Special notes for your reviewer**:
The IPVS proxier creates a dummy device named `kube-ipvs0` that holds all IPVS virtual service addresses. That means every clusterIP/externalIP/ingress IP maintained by IPVS is treated as a local address.

Then, if we have a pod with a hostPort, the CNI will attach this rule to the `PREROUTING` chain:
```
KUBE-HOSTPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kube hostport portals */ ADDRTYPE match dst-type LOCAL
```
So if a service has the same port as a pod's hostPort, that service can't be accessed.

In this PR, we add an `ACCEPT` rule for traffic aimed at IPVS virtual services, to prevent that traffic from being blocked by other rules.
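The added rule is roughly of this shape (a hedged sketch based on the ipset-backed chains the IPVS proxier maintains; see the PR itself for the exact set names and placement):

```
# In the nat table, KUBE-SERVICES is traversed from PREROUTING before the
# CNI hostport chains; ACCEPTing service-bound traffic there keeps the
# hostport DNAT rule from ever matching it.
-A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
```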

**Release note**:
```release-note
NONE
```
m1093782566 (Contributor) commented:

/close

Please verify against HEAD.
