hostPort is not working after upgrade from calico v3.15.5 to any latest(v3.16.0 above) #4617

Open
asudhakarreddy opened this issue May 20, 2021 · 15 comments

asudhakarreddy commented May 20, 2021

Expected Behavior

In a Kubernetes deployment, a container's hostPort should be exposed on the host.

Current Behavior

With Calico v3.16.0 and above, hostPort functionality is broken.

Possible Solution

Steps to Reproduce (for bugs)

  1. Create a Kubernetes cluster with version v1.19.8 and use Calico CNI v3.16.10
  2. Deploy an application that exposes SNMP trap port 162 as a hostPort.
  3. Check whether the port is exposed on the host.
  4. With the same Kubernetes v1.19.8 and Calico v3.15.5, exposing a container port as a hostPort works: the host listens on the same port.
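
For reference, a minimal sketch of the kind of pod spec step 2 describes, written as a small Python script that emits JSON. The pod name and image are hypothetical placeholders, not taken from the original deployment, and UDP is assumed because SNMP traps conventionally use UDP port 162.

```python
import json


def hostport_pod(name, image, port, protocol="UDP"):
    """Build a minimal Pod manifest exposing `port` as containerPort and hostPort."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "ports": [{
                    "containerPort": port,
                    "hostPort": port,
                    "protocol": protocol,
                }],
            }],
        },
    }


# Hypothetical name and image, for illustration only.
manifest = hostport_pod(
    "snmp-trap-receiver",
    "example.invalid/snmp-trap-receiver:latest",
    162,
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f -` accepts JSON as well as YAML, the output can be piped straight into it on a test cluster.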

Context

Because of this issue, applications cannot support SNMP traps, syslog, or telemetry.

Your Environment

  • Calico version v3.16.10
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes v1.19.8
  • Operating System and version: CentOS 7 with firewalld enabled
  • Link to your project (optional):

Member

song-jiang commented May 20, 2021

@asudhakarreddy How did you install Calico v3.16.10? AFAIK, hostPort is handled by the portmap CNI plugin, which is chained with the Calico CNI.

@asudhakarreddy
Author

Member

song-jiang commented May 20, 2021

Do you happen to have any kubelet logs for the portmap plugin? I suspect we may have an issue here, since we are not actively testing the portmap plugin.

@asudhakarreddy
Author

@song-jiang It seems to have been broken by this commit: projectcalico/felix#2424

@shahaf600

Hey,
We get the same issue while using firewalld on those kubernetes servers.
We run Kubernetes v1.16 via kubeadm on-prem, and we are trying to upgrade from Calico 3.3.x to 3.16.x.
The Calico upgrade changes the order of the iptables chains in a way that blocks traffic to pods from outside the node subnet. This happens because we use a REJECT rule as the first rule in firewalld.
We solved it by changing the firewalld default zone rule to 'default' instead of REJECT. After that, it seems to work again, and firewalld still works properly.

jurschel commented Feb 3, 2022

@caseydavenport Is this still an active problem? I am experiencing a problem with Datadog APM not working because the hostPort isn't being mapped correctly. We are running a cluster that uses Calico version 3.18.1...

This is in our configmap so it should be working...
{ "type": "portmap", "snat": true, "capabilities": {"portMappings": true} },

Member

caseydavenport commented Feb 11, 2022

> We get the same issue while using firewalld on those kubernetes servers.

This sounds different, and is clearly called out in the docs requirements for Calico:

> If your Linux distribution comes with installed Firewalld or another iptables manager it should be disabled. These may interfere with rules added by Calico and result in unexpected behavior.

https://projectcalico.docs.tigera.io/getting-started/kubernetes/requirements

@jurschel

@caseydavenport No firewalld enabled in my environment that I'm aware of. 3.18.1 is definitely still having this problem for us.

@caseydavenport
Member

We regularly run e2e tests that verify host ports function properly, so I can confirm that, in some capacity, hostPorts should be functioning. There may be something specific about your environment that's causing this.

Host ports are implemented in iptables by the portmap plugin. Can you confirm whether you see those rules being programmed in iptables for pods with hostPort set?

sfudeus commented Feb 11, 2022

@jurschel We have used Calico and portmap together for hostPorts for quite a while, across many Calico versions, and haven't had any problems with that. I cannot guarantee that for 3.18.1 specifically, but I could go through our history. We recently switched from 3.20 to 3.21 and have no issues with that.

jurschel commented Feb 11, 2022

I asked our managed provider to discuss upgrading Calico to the current version. We use Platform9 as our provider. I worked with Datadog engineering for quite a number of days and came to the conclusion that the version of Calico we are on, or some other setting, is likely causing hostPort not to be mapped correctly. When we look it does a mapping to a dynamic port and the host never actually listens on the hostPort that is required... Any guidance on troubleshooting the issue from calico's perspective to gain some insight into it? Like look for logs somewhere etc?

DataDog/helm-charts#527 (comment)

sfudeus commented Feb 11, 2022

Not sure if this is relevant, but we have first calico, then portmap in the cni config, and we set externalSetMarkChain explicitly:

      "plugins": [
        {
          "type": "calico",
.....
        },
        {
          "type": "portmap",
          "snat": true,
          "externalSetMarkChain": "KUBE-MARK-MASQ",
          "capabilities": {"portMappings": true}
        }
      ]
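
A quick way to sanity-check a CNI config against the points above (portmap present, `portMappings` capability set, listed after calico) is sketched below. This is only an illustration: the `check_portmap` helper is hypothetical, the sample reduces the calico entry to its type, and the `cniVersion` value is an assumption. On a node, the real config typically lives under /etc/cni/net.d/.

```python
import json


def check_portmap(conflist_text):
    """Return a list of problems found with the portmap entry in a CNI conflist."""
    conf = json.loads(conflist_text)
    plugins = conf.get("plugins", [])
    types = [p.get("type") for p in plugins]
    problems = []
    if "portmap" not in types:
        problems.append("no portmap plugin in the chain")
        return problems
    portmap = plugins[types.index("portmap")]
    if not portmap.get("capabilities", {}).get("portMappings"):
        problems.append("portmap lacks the portMappings capability")
    if "calico" in types and types.index("portmap") < types.index("calico"):
        problems.append("portmap is listed before calico")
    return problems


# Sample modelled on the fragment above; the calico entry is reduced to its
# type, and the cniVersion value is an assumption.
SAMPLE = """
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "calico"},
    {"type": "portmap", "snat": true,
     "externalSetMarkChain": "KUBE-MARK-MASQ",
     "capabilities": {"portMappings": true}}
  ]
}
"""

if __name__ == "__main__":
    print(check_portmap(SAMPLE) or "portmap entry looks complete")
```

An empty result only means the config is well-formed in these three respects; it says nothing about whether the rules are actually programmed on the node.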

@caseydavenport
Member

> When we look it does a mapping to a dynamic port and the host never actually listens on the hostPort that is required... Any guidance on troubleshooting the issue from calico's perspective to gain some insight into it? Like look for logs somewhere etc?

The simplest thing, as I mentioned above, is to confirm whether the iptables rules are being programmed by the portmap plugin and, if so, whether they are being hit.

sudo iptables-save -c

^ This should tell you whether the rules exist, and give packet/byte counts to see if the rules are being hit. Note that if something else on your host is bound to the hostPort, I believe it will take precedence and traffic will be consumed by the process bound to the port. You should never use a hostPort that matches a port already in use on the node. In general, I highly recommend against using host ports, as they are a tricky and contentious host resource to manage.
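
As a rough sketch of how to read that output, the snippet below parses the `[packets:bytes]` prefix that `iptables-save -c` puts on each rule and reports the counters for rules matching a given port. The sample lines, chain name, and addresses are made up for illustration; port 8126 is just an example value. Port ranges such as `8000:9000` are not handled.

```python
import re

# Each counted rule in `iptables-save -c` output starts with "[packets:bytes]".
COUNTER_RE = re.compile(r"^\[(\d+):(\d+)\]\s+(.*)$")
PORT_RE = re.compile(r"--dports?\s+(\S+)")


def rules_for_port(dump, port):
    """Yield (packets, bytes, rule) for rules whose --dport/--dports lists `port`."""
    for line in dump.splitlines():
        m = COUNTER_RE.match(line.strip())
        if not m:
            continue
        packets, nbytes, rule = int(m.group(1)), int(m.group(2)), m.group(3)
        pm = PORT_RE.search(rule)
        if pm and str(port) in pm.group(1).split(","):
            yield packets, nbytes, rule


# Hypothetical sample; real chain names and addresses will differ.
SAMPLE = """\
[0:0] -A CNI-DN-example -p tcp -m tcp --dport 8126 -j DNAT --to-destination 10.0.0.5:8126
[12:720] -A CNI-HOSTPORT-DNAT -p tcp -m multiport --dports 8126 -j CNI-DN-example
[5:300] -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
"""

if __name__ == "__main__":
    for packets, nbytes, rule in rules_for_port(SAMPLE, 8126):
        status = "hit" if packets else "never hit"
        print(f"{status:9} packets={packets} bytes={nbytes} {rule}")
```

In practice the dump would come from `sudo iptables-save -c` on the node rather than from the embedded sample.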

jurschel commented Feb 11, 2022

Thanks @caseydavenport. If I had another choice for a Kubernetes deployment of the Datadog agent, I would use it. Their APM uses hostPort to bind to 8126, so that's what I've got. I've worked around it for the time being with a Service, but that messes up the metric correlation, since traffic might come from any machine now that it goes through a Service instead of hostPort.

sudo iptables-save -c |grep 8126
[0:0] -A CNI-DN-ddb0866915bc323564499 -s 10.8.45.71/32 -p tcp -m tcp --dport 8126 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-ddb0866915bc323564499 -s 127.0.0.1/32 -p tcp -m tcp --dport 8126 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-ddb0866915bc323564499 -p tcp -m tcp --dport 8126 -j DNAT --to-destination 10.8.45.71:8126
[0:0] -A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: "k8s-pod-network" id: "a05ea9f3874b0d9471941ddaa3253c437a07a1e7031d1e427b8a21f21383b525"" -m multiport --dports 8126 -j CNI-DN-ddb0866915bc323564499

root@platform908:~# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:6010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:30299 0.0.0.0:* LISTEN
tcp 0 0 localhost:8158 0.0.0.0:* LISTEN
tcp 0 0 localhost:7391 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:53825 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:30152 0.0.0.0:* LISTEN
tcp 0 0 localhost:10248 0.0.0.0:* LISTEN
tcp 0 0 localhost:amqp 0.0.0.0:* LISTEN
tcp 0 0 localhost:10249 0.0.0.0:* LISTEN
tcp 0 0 localhost:5673 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:32362 0.0.0.0:* LISTEN
tcp 0 0 localhost:mysql 0.0.0.0:* LISTEN
tcp 0 0 localhost:9099 0.0.0.0:* LISTEN
tcp 0 0 localhost:2379 0.0.0.0:* LISTEN
tcp 0 0 platform908.gmcps.:2380 0.0.0.0:* LISTEN
tcp 0 0 localhost:5100 0.0.0.0:* LISTEN
tcp 0 0 localhost:33421 0.0.0.0:* LISTEN
tcp 0 0 localhost:8558 0.0.0.0:* LISTEN
tcp 0 0 localhost:8111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:sunrpc 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:31280 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:bgp 0.0.0.0:* LISTEN
tcp 0 0 localhost:5395 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.53:domain 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:ssh 0.0.0.0:* LISTEN
tcp 0 0 localhost:8023 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:34583 0.0.0.0:* LISTEN
tcp 0 0 localhost:6264 0.0.0.0:* LISTEN
tcp 0 0 localhost:9080 0.0.0.0:* LISTEN
tcp 0 0 localhost:9977 0.0.0.0:* LISTEN
tcp 0 0 localhost:12121 0.0.0.0:* LISTEN
tcp6 0 0 ip6-localhost:6010 [::]:* LISTEN
tcp6 0 0 [::]:https [::]:* LISTEN
tcp6 0 0 ip6-localhost:8158 [::]:* LISTEN
tcp6 0 0 ip6-localhost:7391 [::]:* LISTEN
tcp6 0 0 [::]:4001 [::]:* LISTEN
tcp6 0 0 ip6-localhost:amqp [::]:* LISTEN
tcp6 0 0 ip6-localhost:5673 [::]:* LISTEN
tcp6 0 0 [::]:10250 [::]:* LISTEN
tcp6 0 0 ip6-localhost:mysql [::]:* LISTEN
tcp6 0 0 [::]:10251 [::]:* LISTEN
tcp6 0 0 [::]:10252 [::]:* LISTEN
tcp6 0 0 [::]:9100 [::]:* LISTEN
tcp6 0 0 ip6-localhost:5100 [::]:* LISTEN
tcp6 0 0 ip6-localhost:8558 [::]:* LISTEN
tcp6 0 0 ip6-localhost:8111 [::]:* LISTEN
tcp6 0 0 [::]:41903 [::]:* LISTEN
tcp6 0 0 [::]:47407 [::]:* LISTEN
tcp6 0 0 [::]:sunrpc [::]:* LISTEN
tcp6 0 0 [::]:10256 [::]:* LISTEN
tcp6 0 0 [::]:10257 [::]:* LISTEN
tcp6 0 0 [::]:10259 [::]:* LISTEN
tcp6 0 0 ip6-localhost:5395 [::]:* LISTEN
tcp6 0 0 [::]:ssh [::]:* LISTEN
tcp6 0 0 ip6-localhost:8023 [::]:* LISTEN
tcp6 0 0 [::]:30072 [::]:* LISTEN
tcp6 0 0 ip6-localhost:6264 [::]:* LISTEN
tcp6 0 0 ip6-localhost:9080 [::]:* LISTEN
tcp6 0 0 ip6-localhost:12121 [::]:* LISTEN
udp 0 0 127.0.0.53:domain 0.0.0.0:*
udp 0 0 0.0.0.0:sunrpc 0.0.0.0:*
udp 0 0 localhost:952 0.0.0.0:*
udp 0 0 0.0.0.0:56428 0.0.0.0:*
udp 0 0 0.0.0.0:42008 0.0.0.0:*
udp6 0 0 [::]:32858 [::]:*
udp6 0 0 [::]:sunrpc [::]:*
udp6 0 0 [::]:56407 [::]:*
raw 0 0 0.0.0.0:ah 0.0.0.0:* 7
raw 0 0 0.0.0.0:ah 0.0.0.0:* 7
raw6 0 0 [::]:ipv6-icmp [::]:*

@caseydavenport
Member

> [0:0] -A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: "k8s-pod-network" id: "a05ea9f3874b0d9471941ddaa3253c437a07a1e7031d1e427b8a21f21383b525"" -m multiport --dports 8126 -j CNI-DN-ddb0866915bc323564499

It doesn't look like any packets are hitting that rule. So, either there was no traffic to the pod around the time that you ran this command, or there is another rule earlier in the chain that is accepting or denying the traffic before it reaches here.

I'd recommend trying to ensure there is traffic flowing to the service on the hostPort, and then monitoring to see whether it is being handled by another rule. From the netstat output, it doesn't look like anything else is listening on that port, so that's good.
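
One simple way to generate that traffic is a single TCP connection attempt against the node's address on the hostPort, followed by another look at the counters. A minimal sketch, assuming a TCP hostPort; the `probe_tcp` helper is hypothetical:

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Attempt one TCP connection; return True if it was accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, etc.
        return False
```

For example, running `probe_tcp("NODE_IP", 8126)` from a different machine (with `NODE_IP` replaced by the node's address) and then re-running `sudo iptables-save -c | grep 8126` on the node should show whether the DNAT counters moved. A `False` result alone does not say which rule dropped or rejected the packet.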
