Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove trailing dots from the parsed searches from host resolv.conf #83069

Merged

Conversation

fcgravalos
Copy link
Contributor

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test

/kind feature

/kind flake

What this PR does / why we need it:
In short: This will make kubelet ignore trailing dots . in search lines from host resolv.conf files, so when omitDuplicates function gets called, is de-duplicated and we don't hit validation limits. Please

Long story: We provision our Debian stretch based clusters (1.15.3) with kubespray, which supersedes search lines to add svc.clusterFqdn and default.svc.clusterFqdn to the host resolvconf conifg file (/etc/resolv.conf).

https://github.com/kubernetes-sigs/kubespray/blob/da50ed0936742bb633c6b48a84ffca98d8ab03ad/roles/kubernetes/preinstall/tasks/0040-set_facts.yml#L121

When a server restart, dhclient runs, adds trailing dots are added to those search lines.

When a pod with DNS policy ClusterFirst gets scheduled, kubelet will add svc.clusterFqdn and namespace.clusterFqdn to the pod resolv.conf and it will be merged with the host resolv.conf deduplicating entries and doing limit validations.

func (c *Configurer) generateSearchesForDNSClusterFirst(hostSearch []string, pod *v1.Pod) []string {

func omitDuplicates(strs []string) []string {

func (c *Configurer) formDNSSearchFitsLimits(composedSearch []string, pod *v1.Pod) []string {

So, if the search line now contains trailing ., and you have things like svc.clusterFqdn. when scheduling a pod, svc.clusterFqdn and svc.clusterFqdn. are not deduplicated and you can hit one of the validation limits (MaxDNSSearchPaths) and the resulting line may be cut down to this limit, 6.

In our case it removed our search for services outside our kubernetes cluster, so dns request were not sent to upstream resolver and we had some timeouts affecting our production workload.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
NOTE I'm not saying this is a bug. Adding trailing dots to search lines don't provide any values, but it is harmless, and I think it'd would be nice that kubelet can detect them and remove them so we don't have garbage in pod resolv.conf or we hit a validation limit.

Some facts:

  • Kubernetes version: v1.15.3
  • Kernel: 4.19.0-0.bpo.4-amd64 #1 SMP Debian 4.19.28-2~bpo9+1 (2019-03-27) x86_64 GNU/Linux
  • Os release:
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Cloud: NONE. bare-metal servers,
    Does this PR introduce a user-facing change?:

NONE


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Copy link
Contributor

Welcome @fcgravalos!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

@fcgravalos: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 24, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @fcgravalos. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 24, 2019
@fcgravalos
Copy link
Contributor Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 24, 2019
@fcgravalos
Copy link
Contributor Author

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 24, 2019
@fcgravalos
Copy link
Contributor Author

/assign @freehan

@aojea
Copy link
Member

aojea commented Sep 24, 2019

after this #74686 got merged maybe removing "all" the trailing dots in the search line is not the best solution because seems to change the behavior implemented in the PR mentioned previously.

However, based on the assumption that a domain with or without trailing dot in the search field is the same #74685 (comment), maybe we can consider as a duplicate a suffix with and without a trailing dot.

@fcgravalos
Copy link
Contributor Author

after this #74686 got merged maybe removing "all" the trailing dots in the search line is not the best solution because seems to change the behavior implemented in the PR mentioned previously.

However, based on the assumption that a domain with or without trailing dot in the search field is the same #74685 (comment), maybe we can consider as a duplicate a suffix with and without a trailing dot.

Another option I considered was changin omitDuplicates to do so, but seems used for also server names, so I thought that it make sense when parsing the host resolv.conf

@thz
Copy link
Contributor

thz commented Sep 24, 2019

after this #74686 got merged maybe removing "all" the trailing dots in the search line is not the best solution because seems to change the behavior implemented in the PR mentioned previously.

PR #74686 allows trailing dots in searches. It added the comment // it is fine to have a trailing dot and then removes the dot. This is okay because a trailing dot in a search list entry is just redundant. The search list entries always make the name to which they append to fully qualified (there is no second round of appending). So that trailing dot removal could be seen as a "normalization".

Removing all trailing dots in the searches is also just a "normalization" and not in conflict with #74686.

Although having a trailing dot in the searches is wrong already, it is good behavior to still accept and normalize (ignore) it.

@fcgravalos
Copy link
Contributor Author

Hi @freehan !! 👋

Do you have an estimate about when this can be tested? I ran TestParseResolvConf locally and it did work, but I was curious about e2e tests as well

Thanks a lot!!

Copy link
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test
/lgtm

From my reading of the conversation, I think this looks good to me. Worth adding a comment explaining why we strip the "." (or alternatively extracting the code into a function with a descriptive name (i.e. removeRedundantTrailingPeriod)).

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 26, 2019
@fcgravalos
Copy link
Contributor Author

/retest

@fcgravalos fcgravalos force-pushed the remove_trailing_dots_from_searches branch from b35e708 to 52dbfdf Compare October 4, 2019 10:22
@fcgravalos fcgravalos force-pushed the remove_trailing_dots_from_searches branch from 52dbfdf to c959b5e Compare October 4, 2019 10:23
@fcgravalos
Copy link
Contributor Author

fcgravalos commented Oct 4, 2019

/retest

@fcgravalos
Copy link
Contributor Author

tests failing because:

W1004 10:43:12.129] ERROR: (gcloud.compute.routers.create) Quota 'ROUTERS' exceeded. Limit: 10.0 globally.

I'll try again later

@johnbelamaric
Copy link
Member

/retest
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 4, 2019
@fcgravalos
Copy link
Contributor Author

It seems we're still hitting gcloud routers limits

@fcgravalos
Copy link
Contributor Author

/retest

@fcgravalos
Copy link
Contributor Author

fcgravalos commented Oct 6, 2019

Finally al tests passed @johnbelamaric :)

@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fcgravalos, johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2019
@k8s-ci-robot k8s-ci-robot merged commit 39ba488 into kubernetes:master Oct 7, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 7, 2019
@fcgravalos fcgravalos deleted the remove_trailing_dots_from_searches branch October 7, 2019 20:29
@fcgravalos
Copy link
Contributor Author

@johnbelamaric do you think it makes sense to cherry-pick this to recent releases (>=1.15)?

@johnbelamaric
Copy link
Member

Oddly it's marked as kind/feature here, not kind/bug. But I would consider it a bug. It can certainly cause hard-to-identify DNS related production issues. So, I think it would makes sense to cherry pick.

@dannyk81
Copy link

dannyk81 commented Oct 9, 2019

@johnbelamaric is it possible to label it at this point as kind/bug? how would we go about with the cherry-pick now that it's merged?

@fcgravalos
Copy link
Contributor Author

Certainly It is not a feature :). But on the other hand I was not sure of considering this a bug... I think I should have brought Up that discussion when I opened the PR.

@johnbelamaric
Copy link
Member

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 9, 2019
@johnbelamaric
Copy link
Member

/remove-kind feature

@k8s-ci-robot k8s-ci-robot removed the kind/feature Categorizes issue or PR as related to a new feature. label Oct 9, 2019
@johnbelamaric
Copy link
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 9, 2019
@fcgravalos
Copy link
Contributor Author

Thanks @johnbelamaric

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet