Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Add node-local-dns support #1524

Merged
merged 9 commits into from Jul 16, 2021
Merged

Add node-local-dns support #1524

merged 9 commits into from Jul 16, 2021

Conversation

surajssd
Copy link
Member

@surajssd surajssd commented Jul 6, 2021

To enable node local dns in your cluster add the following line in the controller config:

enable_node_local_dns = true

@surajssd surajssd force-pushed the surajssd/add-node-local-dns branch 4 times, most recently from 35c74be to b133925 Compare July 7, 2021 09:48
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to set clusterDNS in kubelet config when this feature is enabled, so it actually make use of local DNS server.

I think we should also:

  • Collect metrics from local DNS.
  • Make sure CoreDNS dashboard works properly with this change.
  • Ensure HA of this solution. As far as I understand, existing solution will cause DNS outage during update. See https://povilasv.me/kubernetes-node-local-dns-cache/ for more details.

pkg/platform/baremetal/baremetal.go Outdated Show resolved Hide resolved
pkg/platform/aws/aws.go Outdated Show resolved Hide resolved
@surajssd
Copy link
Member Author

surajssd commented Jul 9, 2021

We also need to set clusterDNS in kubelet config when this feature is enabled, so it actually make use of local DNS server.

Oh we don't need to do that. So coredns on each host, will bind on clusterDNS IP and nodeLocalDNS IP both. So this works like a plug and play.

More info: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/#configuration

  • Collect metrics from local DNS.

Then this should go into the prometheus-operator stack, than in the node-local-dns chart.

  • Make sure CoreDNS dashboard works properly with this change.

Sure once above is taken care of then I will look at it.

I will look into this.

@surajssd surajssd force-pushed the surajssd/add-node-local-dns branch 3 times, most recently from c2e5e62 to 9413540 Compare July 12, 2021 12:56
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
- Add servicemonitor.
- Replicate the coredns grafana dashboard.
- Add test verifying if the node-local-dns is being scraped.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
@surajssd surajssd force-pushed the surajssd/add-node-local-dns branch from 9413540 to 3797b39 Compare July 14, 2021 07:49
@surajssd surajssd requested a review from invidian July 14, 2021 12:20
cli/cmd/cluster/apply.go Outdated Show resolved Hide resolved
@surajssd surajssd requested a review from invidian July 15, 2021 07:06
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still concerned, that upgrading the cluster/controlplane, when it includes local DNS update, will disrupt the DNS resolution on all nodes.

I think we should still consider configurations used in https://github.com/contentful-labs/coredns-nodecache, with set of 2 DaemonSets, so we can avoid the downtime.

Otherwise LGTM (- some suggested code changes)

@surajssd
Copy link
Member Author

I am still concerned, that upgrading the cluster/controlplane, when it includes local DNS update, will disrupt the DNS resolution on all nodes.

AFAIK this won't be an issue while upgrading but when the pod is OOMKiled by the kernel. So when the pod starts it creates an interface and a bunch of iptables rules. If the pod is gracefully killed then it will do the cleanup. Hence all the traffiic will start routing to upstream coredns.

Move the logic to select Kubelet and/or NodeLocalDNS control plane chart
into "common" control plane charts in the function
CommonControlPlaneCharts. Now the function makes decision based on the
struct provided.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
node-local-dns runs on host network. And the node-local-dns metrics port
is exposed on 9253, on AWS it needs to be exposed explicitly.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
The default IP for the node-local-dns from the link local IP range is
stored in a variable that can be accessed from various locations.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
@surajssd surajssd force-pushed the surajssd/add-node-local-dns branch from 3797b39 to 33625d4 Compare July 15, 2021 10:40
charts := platform.CommonControlPlaneCharts(options.UpgradeKubelets)
charts := c.platform.Meta().ControlplaneCharts
if !options.UpgradeKubelets {
charts = removeKubeletChart(charts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very soft suggestion, more like a thought, we could also write:

Suggested change
charts = removeKubeletChart(charts)
charts = withoutKubeletChart(charts)

Or withoutCharts(charts, []string{platform.KubeletChartName})

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants