iptables rules not deleted when using hostPort #3412

Closed
sergey-safarov opened this issue Apr 3, 2020 · 11 comments

@sergey-safarov

I use a hostPort definition for an nginx deployment.
When the deployment is deleted, the related iptables rules are not cleared.
If I create the deployment again, iptables contains two related rules and traffic forwarding is broken.

Expected Behavior

iptables rules are cleared when a deployment with a hostPort definition is deleted.

Current Behavior

iptables rules are still present after the deployment is deleted.

Possible Solution

I don't know.

Steps to Reproduce (for bugs)

  1. Create a deployment using this YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http
spec:
  selector:
    matchLabels:
      app: http
  template:
    metadata:
      labels:
        app: http
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 80
  2. Check that the DNAT rule for the new container is present:
[root@safarov-server wordpress]# iptables-save | grep "dnat name"
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"efa1c24cca6a2f355ad807349d4e4e2fdf17ea5d7af126aedcdd1bfeb7756b21\"" -m multiport --dports 80 -j CNI-DN-93fd5f42fffe1daf6b01f
  3. Delete the deployment.
  4. Check that the DNAT rule still exists:
[root@safarov-server wordpress]# iptables-save | grep "dnat name"
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"efa1c24cca6a2f355ad807349d4e4e2fdf17ea5d7af126aedcdd1bfeb7756b21\"" -m multiport --dports 80 -j CNI-DN-93fd5f42fffe1daf6b01f
  5. Create the deployment again.
  6. Check that the new deployment is not reachable from outside the host.
  7. Check the DNAT rules. There should now be two rules:
[root@safarov-server wordpress]# iptables-save | grep "dnat name"
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"efa1c24cca6a2f355ad807349d4e4e2fdf17ea5d7af126aedcdd1bfeb7756b21\"" -m multiport --dports 80 -j CNI-DN-93fd5f42fffe1daf6b01f
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"1a282db50a9120f4e79b0a9c293d3ee70dcd2bf1d4de2759f7159f3b52ba2af9\"" -m multiport --dports 80 -j CNI-DN-89bcce78e6dc2f2eaab60
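The check in the last step can also be scripted. The following is a minimal sketch, not part of the original report, that parses `iptables-save` text and reports host ports claimed by more than one `CNI-HOSTPORT-DNAT` rule. The function names `hostport_rules` and `duplicated_ports` are hypothetical, and the sketch assumes the rule layout shown in the output above (a `--comment` carrying the container id, `--dports`, and a `-j` target).

```python
import re
import shlex
from collections import defaultdict

# The two rules from the reproduction output above, as printed by iptables-save.
SAMPLE = r'''-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"efa1c24cca6a2f355ad807349d4e4e2fdf17ea5d7af126aedcdd1bfeb7756b21\"" -m multiport --dports 80 -j CNI-DN-93fd5f42fffe1daf6b01f
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"1a282db50a9120f4e79b0a9c293d3ee70dcd2bf1d4de2759f7159f3b52ba2af9\"" -m multiport --dports 80 -j CNI-DN-89bcce78e6dc2f2eaab60
'''


def hostport_rules(iptables_save_text):
    """Return (container_id, ports, target_chain) for each CNI-HOSTPORT-DNAT rule."""
    rules = []
    for line in iptables_save_text.splitlines():
        if not line.startswith("-A CNI-HOSTPORT-DNAT "):
            continue
        # shlex undoes the quoting iptables-save applies to the comment.
        args = shlex.split(line)
        if not {"--comment", "--dports", "-j"} <= set(args):
            continue  # layout differs from the assumed one; skip the rule
        comment = args[args.index("--comment") + 1]
        ports = args[args.index("--dports") + 1].split(",")
        target = args[args.index("-j") + 1]
        match = re.search(r'id: "([0-9a-f]+)"', comment)
        rules.append((match.group(1) if match else None, ports, target))
    return rules


def duplicated_ports(iptables_save_text):
    """Map each host port claimed by more than one rule to its target chains."""
    by_port = defaultdict(list)
    for _container_id, ports, target in hostport_rules(iptables_save_text):
        for port in ports:
            by_port[port].append(target)
    return {port: targets for port, targets in by_port.items() if len(targets) > 1}
```

Calling `duplicated_ports(SAMPLE)` reports port 80 with both `CNI-DN-*` chains, which matches the broken state described in this report. On a real node the input would be the output of `iptables-save`.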

Context

I am not able to publish nginx on my dev server using ports 80 and 443.

Your Environment

  • Calico version
[root@safarov-server wordpress]# /opt/cni/bin/calico -v
v3.13.2
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
[root@safarov-server wordpress]# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version:
[root@safarov-server wordpress]# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
  • Link to your project (optional):
@tmjd
Member

tmjd commented Apr 15, 2020

Thank you for reporting this.
I suspect this is a problem with the hostPort plugin "portmap" that Calico configures as part of the CNI conflist it installs.
Here is the upstream repo https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap

@sergey-safarov
Author

How can I get additional debug logs?
I hope they will help identify which part needs to change.

@sergey-safarov
Author

I hit the same issue with Cilium using a chained HostPort CNI (description).
So I think the issue is related to the HostPort CNI or to Kubernetes 1.18.

Should this ticket be closed?

@davojan

davojan commented May 11, 2020

@sergey-safarov this is definitely not about 1.18, I have 1.17.4 with the issue. It seems that HostPort is totally broken in kubernetes :(

I don't think this ticket should be closed: as end users we don't install the hostPort plugin directly but choose Calico as a networking solution, and that solution seems broken in terms of hostPort support.

@tmjd I just wanted to state that this is a pretty severe issue. I experienced a totally unexpected half-hour of downtime because of it, just by re-deploying a minor configuration update to my Traefik ingress controller, which uses hostPort (on a bare-metal cluster). The only way to deal with this is to restart the affected nodes or to identify and clean up the iptables rules manually.

I've tried to navigate and search through the provided link to the plugin, and it seems that nobody is going to fix the issue. I'm quite new to k8s and don't deeply understand how the networking works. Maybe the Calico maintainers could consider pushing for this to be fixed, or consider switching to a better-maintained plugin (if there is one), or at least document a big red warning that the hostPort feature is totally broken with Calico? Thank you.
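For the manual cleanup mentioned above, here is a minimal sketch of how the removal commands could be derived from `iptables-save` output. It is an illustration rather than a confirmed fix: the function name `cleanup_commands` is hypothetical, the sample rule is the leftover one from the original report, and the sketch only builds argument lists without running anything. It relies on the standard `iptables` options `-D` (delete a rule by specification), `-F` (flush a chain) and `-X` (delete an empty, unreferenced chain).

```python
import shlex

# Leftover rule taken from the original report's iptables-save output.
SAVED_RULES = r'''-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"efa1c24cca6a2f355ad807349d4e4e2fdf17ea5d7af126aedcdd1bfeb7756b21\"" -m multiport --dports 80 -j CNI-DN-93fd5f42fffe1daf6b01f
'''


def cleanup_commands(iptables_save_text, stale_chain):
    """Build iptables argv lists that would remove one stale hostPort chain.

    Order matters: the jump rule goes first so the chain is no longer
    referenced, then the chain is flushed, then the empty chain is deleted.
    """
    commands = []
    for line in iptables_save_text.splitlines():
        if not line.startswith("-A CNI-HOSTPORT-DNAT "):
            continue
        args = shlex.split(line)  # undo iptables-save quoting of the comment
        if args[-2:] == ["-j", stale_chain]:
            # Same rule specification, but with -D instead of -A.
            commands.append(["iptables", "-t", "nat", "-D"] + args[1:])
    commands.append(["iptables", "-t", "nat", "-F", stale_chain])
    commands.append(["iptables", "-t", "nat", "-X", stale_chain])
    return commands
```

The returned lists could be reviewed by hand and then passed to something like `subprocess.run(cmd, check=True)` as root. Whether this interferes with the portmap plugin's own bookkeeping is not established in this thread, so it is worth checking first that no running pod still uses the chain.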

@caseydavenport
Member

Has anyone submitted an upstream issue against the hostPort plugin?

Calico doesn't play a role in the implementation of host ports, so I'm not sure there's much we could do here.

The other thing to look into is whether or not the CNI plugin is even getting called to tear down the pod. It could be an issue with the runtime not calling the CNI plugin, or it could be an issue with the host port plugin itself.

@iptizer

iptizer commented Jun 20, 2020

I am also experiencing this, also on CentOS 8.

Could this issue be NFT related? CentOS 8 uses NFT, which is configured as the backend. Otherwise I'd expect more people to have complained in the meantime.

I honestly don't know where to start debugging; any suggestions are welcome.
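One way to start on the NFT question is to record which backend the `iptables` binary on the node uses. A minimal sketch follows, assuming iptables 1.8 or newer, where `iptables --version` prints the backend in parentheses. The function name `iptables_backend` is hypothetical and the version strings in the usage lines are illustrative, not taken from this thread.

```python
import re
import subprocess


def iptables_backend(version_output):
    """Return 'nf_tables' or 'legacy' from `iptables --version` text.

    Returns None when no backend suffix is present, as with older
    iptables releases that predate the nftables backend.
    """
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    return match.group(1) if match else None


def local_iptables_backend():
    """Run `iptables --version` on this host and classify the result."""
    result = subprocess.run(
        ["iptables", "--version"], capture_output=True, text=True, check=True
    )
    return iptables_backend(result.stdout)


# Illustrative inputs (assumed format, not copied from the thread):
# iptables_backend("iptables v1.8.4 (nf_tables)")  gives "nf_tables"
# iptables_backend("iptables v1.8.4 (legacy)")     gives "legacy"
```

Comparing the value on the host with the value inside the containers that program iptables would show whether two different backends are in play. That mismatch is only a hypothesis here; nothing in this thread confirms it as the cause.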

@iptizer

iptizer commented Jun 28, 2020

Okay, I did a round of debugging that I want to document here.

Using the following manifest:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
  namespace: troubleshoot
spec:
  containers:
  - image: nginx
    name: nginx
    resources: {}
    ports:
    - containerPort: 80
      hostPort: 10081
      name: http
    - containerPort: 443
      hostPort: 8443
      name: https
    
  dnsPolicy: ClusterFirst
  restartPolicy: Never

Then executing the following:

$ k apply -f nginx.yml; sleep 15 ;k delete -f nginx.yml
$ # checking iptables
$ iptables -t nat --line-numbers -L CNI-HOSTPORT-DNAT
Chain CNI-HOSTPORT-DNAT (2 references)
num  target     prot opt source               destination         
1    CNI-DN-bcfdb00a9541b4df781a0  tcp  --  anywhere             anywhere             /* dnat name: "cni0" id: "e05c0b7dc624c20da56c8db60492744381eb27b7449e28e87f8efa6da8210a9e" */ multiport dports kamanda,pcsync-https
$ # next round
$ k apply -f nginx.yml; sleep 15 ;k delete -f nginx.yml
$ iptables -t nat --line-numbers -L CNI-HOSTPORT-DNAT
Chain CNI-HOSTPORT-DNAT (2 references)
num  target     prot opt source               destination         
1    CNI-DN-bcfdb00a9541b4df781a0  tcp  --  anywhere             anywhere             /* dnat name: "cni0" id: "e05c0b7dc624c20da56c8db60492744381eb27b7449e28e87f8efa6da8210a9e" */ multiport dports kamanda,pcsync-https
2    CNI-DN-2a1d84389734e54983abb  tcp  --  anywhere             anywhere             /* dnat name: "cni0" id: "24b63cb1cbda96c6c89e5fb89700141460323010decfb98160b6434700e3a9a4" */ multiport dports kamanda,pcsync-https

This means the target=CNI-DN-bcfdb00a9541b4df781a0 should have been deleted, but isn't. To verify this, see the following:

$ iptables -t nat --line-numbers -L CNI-DN-bcfdb00a9541b4df781a0
Chain CNI-DN-bcfdb00a9541b4df781a0 (1 references)
num  target     prot opt source               destination         
1    CNI-HOSTPORT-SETMARK  tcp  --  10.233.124.50        anywhere             tcp dpt:kamanda
2    CNI-HOSTPORT-SETMARK  tcp  --  localhost6           anywhere             tcp dpt:kamanda
3    DNAT       tcp  --  anywhere             anywhere             tcp dpt:kamanda to:10.233.124.50:80
4    CNI-HOSTPORT-SETMARK  tcp  --  10.233.124.50        anywhere             tcp dpt:pcsync-https
5    CNI-HOSTPORT-SETMARK  tcp  --  localhost6           anywhere             tcp dpt:pcsync-https
6    DNAT       tcp  --  anywhere             anywhere             tcp dpt:pcsync-https to:10.233.124.50:443
$ kubectl get po -A -o wide | grep 10.233.124.50
$

=> No pod with this IP address exists. I verified that an entry is shown while the pod exists.
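The same comparison can be automated for all `CNI-DN-*` chains at once. This is a minimal sketch under stated assumptions: the function name `stale_chains` is hypothetical, only IPv4 `--to-destination` targets are handled, and the set of live pod IPs has to be supplied by the caller (for example from `kubectl get po -A -o wide`). The first two sample rules use the chain and IP from the output above; the `CNI-DN-example-live` chain and the address 10.233.124.51 are made up for illustration.

```python
import re

# Matches the DNAT rules inside a per-pod CNI-DN-* chain in iptables-save text.
DNAT_RE = re.compile(
    r"^-A (CNI-DN-\S+) .*-j DNAT --to-destination (\d+\.\d+\.\d+\.\d+):\d+"
)

SAMPLE = """\
-A CNI-DN-bcfdb00a9541b4df781a0 -p tcp -m tcp --dport 10081 -j DNAT --to-destination 10.233.124.50:80
-A CNI-DN-bcfdb00a9541b4df781a0 -p tcp -m tcp --dport 8443 -j DNAT --to-destination 10.233.124.50:443
-A CNI-DN-example-live -p tcp -m tcp --dport 10081 -j DNAT --to-destination 10.233.124.51:80
"""


def stale_chains(iptables_save_text, live_pod_ips):
    """Return CNI-DN-* chains whose DNAT targets match no live pod IP."""
    chain_ips = {}
    for line in iptables_save_text.splitlines():
        match = DNAT_RE.match(line)
        if match:
            chain_ips.setdefault(match.group(1), set()).add(match.group(2))
    live = set(live_pod_ips)
    # A chain is stale when none of its DNAT destinations is a live pod.
    return sorted(chain for chain, ips in chain_ips.items() if not ips & live)
```

With only 10.233.124.51 passed as a live pod IP, `stale_chains(SAMPLE, {"10.233.124.51"})` returns the `CNI-DN-bcfdb00a9541b4df781a0` chain, the same conclusion reached manually above.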

Next step: setting the Felix log level to debug and using stern to show the logs:

$ stern calico-node- | grep -i CNI-DN-bcfdb00a9541b4df781a0
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.416 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line=":CNI-DN-bcfdb00a9541b4df781a0 - [0:0]" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.416 [DEBUG][61] table.go 736: Found forward-reference chainName="CNI-DN-bcfdb00a9541b4df781a0" ipVersion=0x4 line=":CNI-DN-bcfdb00a9541b4df781a0 - [0:0]" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.420 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment \"dnat name: \\\"cni0\\\" id: \\\"e05c0b7dc624c20da56c8db60492744381eb27b7449e28e87f8efa6da8210a9e\\\"\" -m multiport --dports 10081,8443 -j CNI-DN-bcfdb00a9541b4df781a0" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -s 10.233.124.50/32 -p tcp -m tcp --dport 10081 -j CNI-HOSTPORT-SETMARK" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -s 127.0.0.1/32 -p tcp -m tcp --dport 10081 -j CNI-HOSTPORT-SETMARK" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -p tcp -m tcp --dport 10081 -j DNAT --to-destination 10.233.124.50:80" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -s 10.233.124.50/32 -p tcp -m tcp --dport 8443 -j CNI-HOSTPORT-SETMARK" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -s 127.0.0.1/32 -p tcp -m tcp --dport 8443 -j CNI-HOSTPORT-SETMARK" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.423 [DEBUG][61] table.go 729: Parsing line ipVersion=0x4 line="-A CNI-DN-bcfdb00a9541b4df781a0 -p tcp -m tcp --dport 8443 -j DNAT --to-destination 10.233.124.50:443" table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.424 [DEBUG][61] table.go 799: Read hashes from dataplane: map[string][]string{"CNI-DN-2a1d84389734e54983abb":[]string{"", "", "", "", "", ""}, "CNI-DN-365d9a185cff3e5ea9379":[]string{"", ""}, "CNI-DN-442e7af5e08e6bf46621a":[]string{"", "", ""}, "CNI-DN-90b43263af9d462e04f9b":[]string{"", "", "", "", "", ""}, "CNI-DN-9a7663babfc285ea01c15":[]string{"", "", "", "", "", ""}, "CNI-DN-bcfdb00a9541b4df781a0":[]string{"", "", "", "", "", ""}, "CNI-DN-e4582bd2b3ff7ed196a28":[]string{"", ""}, "CNI-DN-ea59c359e9d718f7116dc":[]string{}, "CNI-DN-f0f13904d22032106b1e7":[]string{"", "", ""}, "CNI-DN-f54bc33afc414d6e64133":[]string{}, "CNI-DN-fc387103aa44e3b189d09":[]string{"", "", "", "", "", ""}, "CNI-DN-fc95996fce81df369bc47":[]string{}, "CNI-DN-ff88843368fae4d5cdfae":[]string{}, "CNI-HOSTPORT-DNAT":[]string{"", ""}, "CNI-HOSTPORT-MASQ":[]string{""}, "CNI-HOSTPORT-SETMARK":[]string{""}, "INPUT":[]string{}, "KUBE-KUBELET-CANARY":[]string{}, "KUBE-MARK-DROP":[]string{""}, "KUBE-MARK-MASQ":[]string{""}, "KUBE-POSTROUTING":[]string{""}, "OUTPUT":[]string{"tVnHkvAo15HuiPy0", ""}, "POSTROUTING":[]string{"O3lYWMrLQYEMJtB5", "", ""}, "PREROUTING":[]string{"6gwbT8clXdHdC1b1", ""}, "cali-OUTPUT":[]string{"GBTAv2p5CwevEyJm"}, "cali-POSTROUTING":[]string{"Z-c7XtVd2Bq7s_hA", "nYKhEzDlr11Jccal", "SXWvdsbh4Mw7wOln"}, "cali-PREROUTING":[]string{"r6XmIziWUJsdOK6Z"}, "cali-fip-dnat":[]string{}, "cali-fip-snat":[]string{}, "cali-nat-outgoing":[]string{"flqWnvo8yq4ULQLa"}} ipVersion=0x4 table="nat"
calico-node-vx7p8 calico-node 2020-06-28 15:12:07.425 [DEBUG][61] table.go 568: Skipping expected chain chainName="CNI-DN-bcfdb00a9541b4df781a0" ipVersion=0x4 table="nat"

To me, the message Skipping expected chain chainName="CNI-DN-bcfdb00a9541b4df781a0" means that Calico still thinks the chain is required.

The reference is to this line of code in Felix. As the cause of the problem seems to be in Felix, I will open an issue there.

@skydoctor

Does anyone know if this has been fixed in upstream portmap or here?

@caseydavenport caseydavenport changed the title iptables rules not delete when used hostPort iptables rules not deleteed when using hostPort Aug 7, 2020
@caseydavenport caseydavenport changed the title iptables rules not deleteed when using hostPort iptables rules not deleted when using hostPort Aug 7, 2020
@davojan

davojan commented Apr 12, 2022

@caseydavenport can you please explain - is this issue closed because it's fixed in some version (which?) or just because it's inactive?

@bozoli

bozoli commented Apr 27, 2023

I have seen this issue in several environments (3 different customers plus internal labs) in the last 12 months, including one customer running Calico Enterprise with Tigera support.
Tigera said they couldn't reproduce the issue, so they didn't push for a resolution. They blamed the portmap plugin, but this plugin is included in Calico, so it is still part of their solution.

@lwabish

lwabish commented Sep 12, 2023

I ran into this, too. iptables shows duplicate rules for hostPort. Even after deleting the container hostPorts, the redundant rule still exists.
Rebooting the node does correct this problem.
