Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DualStack headless services DNS behavior #1531

Closed
wants to merge 2 commits into from
Closed

Conversation

aojea
Copy link
Member

@aojea aojea commented Jan 30, 2020

This PR tries to document the DNS behavior of the Headless Kubernetes services, reflecting current behavior in the implementation. Quoting the context of the discussion for context:

Clayton Colemant:

Ok, I'm currently looking at implementing 1 (we clear IPFamily when type external name, as we mostly did before, and you are allowed to send it as a client and we just toss it out to preserve the API behavior of old services).

As part of that, we should decide between following behaviors for headless services:

Break old headless services with custom endpoints (selector empty) by having DNS make assumptions about IPFamily defaults no matter what and having the apiserver implicitly fill in an IPFamily by default.
Allow mutable IPFamily for headless services with normal selectors to allow users to decide between dual-stack and single stack at their preference. If IPFamily is nil, both address types are added and DNS reports both A and AAAA. If IPFamily is set, endpoints controllers filter by that address, and DNS would report only the records that show up.
Require IPFamily to be immutable on headless services with normal selectors, but allow it to be nil. Same as above, but user has to delete and recreate service to get the new behavior.
I am leery of 1 because we generally try not to break. This implies that DNS must tolerate endpoints and endpoint slices with mixed addresses (report A and AAAA)

If we're going to not break people through 1, then 2 vs 3 mostly comes down to complexity of implementation (what are we likely to get wrong). We already have to support switching between service types in endpoints controller, is mutability of IPFamily a big thing or not? If it's not, I'd probably prefer 2.

Dan Winship:

Note that this is only potentially-breaking for people who have enabled features.IPv6DualStack. In a cluster without the feature gate enabled, custom Endpoints have to match the cluster IP family, because they would fail validation otherwise. (This is different from the case of ExternalName services.) So, 1 is only a break for people who were using an alpha-level feature gate.

Rob Scott:

It's hard to keep track of this whole conversation, so let me know if I'm not quite understanding the proposals here, I just wanted to make sure EndpointSlices are also being considered here. Unlike Endpoints, EndpointSlices are tied to a specific IP family. EndpointSlices have an address type that attempts to match the family of the Cluster IP or IPFamily on the Service. EndpointSlices can currently only support IPv4 or IPv6 addresses. If I'm understanding the proposal here correctly, We could potentially rework the controller to maintain different sets of EndpointSlices per IP family when it is not something that can be derived from the Service, but that would be a relatively significant change and certainly worthy of another issue. At the very least, it seems like it does not currently handle things properly when an IPFamily can not be derived.

It's also worth noting that, at least as I understand it, kube-proxy has these same potential issues. From an implementation perspective, it seems like we intentionally made the choice that Services and their corresponding Endpoint(Slice)s would represent a single IPFamily (notably here), but missed a number of edge cases in that process. If I'm understanding the proposal correctly here, to include IPs from both families when a family can't be derived, we'll likely need to make some significant changes in controllers and kube-proxy.

PD.-
CoreDNS is a specific implementation of the Kubernetes DNS specification, we should be more generic in the spec to avoid misunderstandings.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 30, 2020
@aojea
Copy link
Member Author

aojea commented Jan 30, 2020

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jan 30, 2020
@robscott
Copy link
Member

/cc @freehan @bowei

- Headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on the service's `ipFamily`.
- Because service IPs will remain single-family, pods will continue to access the DNS server via a single service IP. In other words, the nameserver entries in a pod's /etc/resolv.conf will typically be a single IPv4 or single IPv6 address, depending upon the IP family of the cluster's service CIDR.
- Non-headless Kubernetes services: DNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending upon the IP family of the cluster's service CIDR.
- Headless Kubernetes services: DNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending on the service's `ipFamily`, defaulting to IPv4 if `ipFamily` is not specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we have to preserve backwards compatible behavior of the endpoints object (where you can have both types of addresses), just as general Kubernetes API guarantee.

That implies that anything we do with ipFamily has to allow users to preserve the existing behavior, and thus downstream components have to tolerate mixed addresses and offer meaningful extension.

I would prefer that DNS be changed to use ipFamily as a filter, and DNS treat nil ipFamily as "allow all addresses".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, does this discussion only apply to headless and external name services? Can we safely state that all Services handled by kube-proxy will be single family since they will all have a single ClusterIP and/or IPFamily? Is there any precedent for supporting multi-family Endpoints behind single family Services before the dual stack work? If we can limit this to Services ignored by kube-proxy, I think it would significantly limit the scope and complexity of the needed changes. I do agree that for the sake of backwards compatibility, the lack of an IPFamily should be treated by DNS as "allow all addresses."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we are on the same page, will reword accordingly

Copy link
Member Author

@aojea aojea Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton I just realized that there is a different behavior on headless services depending on using selectors or not. This is the bug that Dan found, currently headless service with selectors filters based on the ipFamily and if nil, it defaults to IPv4.
This is due to the function that returns the endpoints based on the pods, only returns one IP per pod. If we want to support this for headless return multiple endpoints per pod we have to modify this function too.
Do we want to modify the behavior of headless service with selectors to support ipFamily=nil?

PTAL to the updated PR

ref: kubernetes/kubernetes#87608

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to modify the behavior of headless service with selectors to support ipFamily=nil?

I think that might be appropriate, if DNS is expected to report A and AAAA when ipfamily is nil.

Copy link
Member

@robscott robscott Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'd agree with that since kube-dns is already responding with both A and AAAA records in some cases (https://github.com/kubernetes/dns/blob/master/vendor/github.com/skynetservices/skydns/server/server.go#L476-L479).

EDIT: Disregard me, kube-dns is mostly irrelevant here for newer clusters. I guess there's still a chance that somewhere in older clusters someone was relying on A and AAAA DNS records, but that does seem unlikely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me try to send a PR with the modifications needed so we can observe the implications

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this PR with the last @thockin suggestion of keep the current endpoints controller defaulting to IPv4 and modifying the endpointslice controller to return both A and AAAA

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign thockin
You can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thockin
Copy link
Member

thockin commented Feb 12, 2020

FTR, I am putting this on back-burner until discussion in kubernetes/kubernetes#86895 resolves

@aojea
Copy link
Member Author

aojea commented Feb 13, 2020

FTR, I am putting this on back-burner until discussion in kubernetes/kubernetes#86895 resolves

ack
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2020
@aojea
Copy link
Member Author

aojea commented Apr 10, 2020

/close
superseded by #1679

@k8s-ci-robot
Copy link
Contributor

@aojea: Closed this PR.

In response to this:

/close
superseded by #1679

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

RomanBednar pushed a commit to RomanBednar/enhancements that referenced this pull request Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants