NodeLocal DNS cache breaks connection tracking, conflicts with Calico #98758

Closed
fasaxc opened this issue Feb 4, 2021 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@fasaxc
Contributor

fasaxc commented Feb 4, 2021

What happened:

The NodeLocal DNS cache uses NOTRACK rules in iptables, and these disable connection tracking for the DNS traffic they match. This is incompatible with Calico, which relies on connection tracking to implement its flow-based firewall. Since the DNS traffic gets hit with NOTRACK, the response packets don't match any tracked flow, so they are treated as new traffic, checked against the pod's ingress policy, and dropped.
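For context, NOTRACK is applied in the iptables `raw` table, which is consulted before connection tracking. The sketch below only builds argument lists of roughly the shape such rules take. The listen address 169.254.20.10 (the commonly documented node-local default), the chain/match choices, and the helper name `notrack_rules` are all assumptions; node-cache's real rule set may differ.

```python
"""Sketch of raw-table NOTRACK rules for a node-local DNS address.

Assumptions (not taken from this issue): the listen address 169.254.20.10 and
the exact chains/matches. The helper is hypothetical; it only builds argv lists.
"""

LOCAL_DNS_IP = "169.254.20.10"  # assumed default link-local listen address


def notrack_rules(ip: str, port: int = 53) -> list[list[str]]:
    """Return iptables argv lists that exempt DNS to/from `ip` from conntrack."""
    rules = []
    for proto in ("udp", "tcp"):
        # Queries arriving for the local cache: skip conntrack in PREROUTING.
        rules.append(["iptables", "-t", "raw", "-A", "PREROUTING",
                      "-d", f"{ip}/32", "-p", proto, "--dport", str(port),
                      "-j", "NOTRACK"])
        # Replies sent by the local cache: skip conntrack in OUTPUT.
        rules.append(["iptables", "-t", "raw", "-A", "OUTPUT",
                      "-s", f"{ip}/32", "-p", proto, "--sport", str(port),
                      "-j", "NOTRACK"])
    return rules


if __name__ == "__main__":
    for argv in notrack_rules(LOCAL_DNS_IP):
        print(" ".join(argv))
```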

What you expected to happen:

DNS traffic is conntracked as normal.

How to reproduce it (as minimally and precisely as possible):

This came from a user report; I think the following is needed to reproduce it:

  • Install a system with Calico for CNI.
  • Add some network policy to a pod that allows egress DNS but disallows ingress.
  • DNS requests from the pod fail.
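The failure in these steps can be illustrated with a toy model (not Calico's implementation) of a conntrack-based policy: a reply is accepted if it matches a tracked flow; otherwise it is judged as new ingress traffic. The `untracked` flag stands in for NOTRACK, and the pod IP 10.0.0.5, source port 40000, and node-local address 169.254.20.10 are hypothetical values.

```python
"""Toy model (not Calico's implementation) of a conntrack-based pod policy.

Assumed scenario from the repro: egress DNS allowed, ingress disallowed.
Addresses and ports are hypothetical.
"""
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str = "udp"


class ToyFirewall:
    def __init__(self, allow_ingress: bool) -> None:
        self.allow_ingress = allow_ingress  # pod policy for new ingress traffic
        self.flows: set[tuple] = set()  # stand-in for the conntrack table

    def egress(self, pkt: Packet, untracked: bool) -> bool:
        # Egress DNS is allowed; a flow is only recorded if conntrack sees it.
        if not untracked:
            self.flows.add((pkt.proto, pkt.src, pkt.sport, pkt.dst, pkt.dport))
        return True

    def ingress(self, pkt: Packet) -> bool:
        # A reply belongs to a flow if its tuple is the reverse of a recorded query.
        reverse = (pkt.proto, pkt.dst, pkt.dport, pkt.src, pkt.sport)
        if reverse in self.flows:
            return True  # established flow: accepted regardless of ingress policy
        return self.allow_ingress  # otherwise judged as new ingress traffic


def dns_roundtrip(untracked: bool) -> bool:
    fw = ToyFirewall(allow_ingress=False)
    query = Packet("10.0.0.5", 40000, "169.254.20.10", 53)
    reply = Packet("169.254.20.10", 53, "10.0.0.5", 40000)
    fw.egress(query, untracked)
    return fw.ingress(reply)


print(dns_roundtrip(untracked=False))  # True: reply matches the tracked query
print(dns_roundtrip(untracked=True))   # False: no flow, ingress policy drops it
```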

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.17.4
  • Cloud provider or hardware configuration: CentOS/Ubuntu
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug): Calico v3.14.1
  • Others:
@fasaxc fasaxc added the kind/bug Categorizes issue or PR as related to a bug. label Feb 4, 2021
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 4, 2021
@k8s-ci-robot
Contributor

@fasaxc: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 4, 2021
@fasaxc
Contributor Author

fasaxc commented Feb 4, 2021

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 4, 2021
@thockin
Member

thockin commented Feb 4, 2021

Can we get a bit more detail about which leg of DNS is being impacted? The rationale for NOTRACK is that connection-tracking + UDP + non-reused sockets == LOTS of conntrack records that serve literally no value but are all stuck waiting to expire => unhappy users.
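The "LOTS of conntrack records" point can be made concrete with Little's law: if every UDP query from a fresh source port creates its own entry that lives until the UDP timeout expires, the steady-state entry count is query rate × timeout. The 30-second timeout is an assumption (check `net.netfilter.nf_conntrack_udp_timeout` on your nodes), and the helper name is hypothetical.

```python
"""Back-of-envelope estimate of conntrack entries created by UDP DNS queries.

Assumes one conntrack entry per query (no socket reuse), each kept alive for
the UDP timeout. The timeout value is an assumption, not taken from the issue.
"""


def steady_state_entries(queries_per_second: float, timeout_seconds: float) -> float:
    """Average number of live entries by Little's law (L = lambda * W)."""
    return queries_per_second * timeout_seconds


ASSUMED_UDP_TIMEOUT_S = 30.0  # assumed value of nf_conntrack_udp_timeout

for qps in (100, 1_000, 10_000):
    entries = int(steady_state_entries(qps, ASSUMED_UDP_TIMEOUT_S))
    print(f"{qps} qps -> {entries} entries")
```

At an assumed 1,000 queries per second this is about 30,000 entries per node that do nothing useful while waiting to expire.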

@prameshj
Contributor

prameshj commented Feb 4, 2021

> Can we get a bit more detail about which leg of DNS is being impacted? The rationale for NOTRACK is that connection-tracking + UDP + non-reused sockets == LOTS of conntrack records that serve literally no value but are all stuck waiting to expire => unhappy users.

Correct. NOTRACK was added as a feature to save on conntrack table entries and also to avoid DNAT, which had some race conditions leading to packet drops.
I think a similar issue with NetworkPolicy was reported with NodeLocalDNS, and we documented it in https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/README.md#network-policy-and-dns-connectivity

Can we get more details on which part of the traffic was being dropped? Is it the nodelocaldns-to-client-pod response?

@pacoxu
Member

pacoxu commented Feb 5, 2021

In #3795, users use calico GlobalNetworkPolicy and only projectcalico/calico#3795 (comment) without it.

@thockin
Member

thockin commented Feb 18, 2021

What do we do with this issue?

@prameshj
Copy link
Contributor

We had this come up elsewhere and it was confirmed that the rule in https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/README.md#network-policy-and-dns-connectivity was working.

Given that the node-local-dns behavior depends on allowing untracked connections, we would need this additional config.
Can we check whether that works in your setup too, @fasaxc?
/assign @fasaxc
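Because the DNS packets are untracked, a stateful "allow established replies" check can never admit them; any extra rule has to match the reply packets' own fields. The sketch below is a hypothetical, simplified rule matcher (not Calico's policy engine and not the README's exact rule), assuming the node-local address 169.254.20.10.

```python
"""Toy rule matcher for untracked ingress packets (hypothetical, simplified).

Untracked packets cannot match "established" state, so each reply must be
admitted by a rule matching its own fields. Addresses are assumptions.
"""
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    proto: str
    src_net: str
    src_port: Optional[int] = None  # None matches any source port


@dataclass(frozen=True)
class Packet:
    proto: str
    src: str
    sport: int


def allowed(pkt: Packet, rules: list[Rule]) -> bool:
    """Return True if any rule matches the packet; the default action is deny."""
    for rule in rules:
        if (rule.proto == pkt.proto
                and ip_address(pkt.src) in ip_network(rule.src_net)
                and rule.src_port in (None, pkt.sport)):
            return True
    return False


# Reply from the (assumed) node-local DNS address back to the client pod.
reply = Packet(proto="udp", src="169.254.20.10", sport=53)

base_rules: list[Rule] = []  # pod policy disallows all ingress
extra = [Rule(proto="udp", src_net="169.254.20.10/32", src_port=53)]

print(allowed(reply, base_rules))          # False: untracked reply is denied
print(allowed(reply, base_rules + extra))  # True: extra rule admits it
```

Matching on source address and port keeps the exception narrow: only replies from the cache's DNS port are admitted, not arbitrary ingress.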

@thockin
Member

thockin commented Mar 4, 2021

Ping @fasaxc

@fasaxc
Contributor Author

fasaxc commented Mar 9, 2021

Sorry, I need to replicate the user's environment to get the next level of diags; haven't got around to doing that yet.

@thockin
Member

thockin commented Mar 18, 2021

@prameshj Are we tracking this as a bug, as a feature request, or is it still needing triage?

@prameshj
Contributor

Actually, this is Working as Intended, so if configuring the rule documented in https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/README.md#network-policy-and-dns-connectivity works, then this can be closed, IMO.

@jayunit100
Copy link
Member

... ok, so shall we close?

@prameshj
Copy link
Contributor

ok, let's close it. Please reopen if configuring the additional rule does not work.

/close

@k8s-ci-robot
Contributor

@prameshj: Closing this issue.

In response to this:

ok, let's close it. Please reopen if configuring the additional rule does not work.

/close


6 participants