
dualstack services #1679

Merged · 1 commit · Jun 19, 2020

Conversation

@khenidak khenidak commented Apr 9, 2020

Updates the dual-stack KEP as per the comments in kubernetes/kubernetes#86895, in prep for the next dual-stack phase.

@thockin + @aojea + @lachie83 - I think we need to revise the CoreDNS Operation section. The resolution is a function of how the client asks for an address. DNS servers typically respond with all the records they have for a name, irrespective of the protocol version used to make the query.

@khenidak khenidak requested review from aojea and lachie83 April 9, 2020 23:57
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Apr 9, 2020
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Apr 9, 2020
Headless services will operate using the following conditions:
1. api-server does not default `ipFamily` field. User may optionally to choose to set it.
2. If `ipFamily` field is set by the user, then the value of `preferredIPFamily` will follow accordingly. Example: setting `ipFamily` to `IPv6` then `preferredIPFamily` will have `IPv6` as a value.
3. If `ipFamily` field is not set, then the value of `preferredIPFamily` will carry the values set in `-- service-cluster-ip-range`, which can be a single value in even in a dual stack cluster.
Member

nit!

Suggested change
3. If `ipFamily` field is not set, then the value of `preferredIPFamily` will carry the values set in `-- service-cluster-ip-range`, which can be a single value in even in a dual stack cluster.
3. If `ipFamily` field is not set, then the value of `preferredIPFamily` will carry the values set in `--service-cluster-ip-range`, which can be a single value in even in a dual-stack cluster.

Contributor

Wait, you're allowed to have single-stack --service-cluster-ip-range in a dual-stack cluster? I didn't realize that. (That ought to be called out explicitly if so since network plugins have to deal with that.)

Contributor Author
@khenidak khenidak Apr 13, 2020

I will make sure it is clear in the KEP and our docs, do you recommend other places where we should make that call out?

Do you mind sharing how this impacts CNIs?

Contributor

well, like, I specifically added a unit test to ensure we reject this as an invalid configuration in ovn-kubernetes: https://github.com/ovn-org/ovn-kubernetes/pull/1189/files#diff-173cba3f6f9206af564ac9ed53992a91R982 🙂

Member

@smarterclayton was the one who clarified this in kubernetes/kubernetes#86895 (comment):

is it possible to enable dual-stack without configuring two service CIDRs? I mean configuring only a service-cidr for IPv6 and not for IPv4?

The feature gate flag allows dual stack config. Enabling the flag doesn’t require you to use dual stack.

Contributor

The text says "--service-cluster-ip-range ... can be a single value even in a dual-stack cluster". It should not say "a dual-stack cluster" if it means "any cluster in which the IPv6DualStack feature gate is active, regardless of configuration".

Saying "you can enable the IPv6DualStack feature gate and not be obligated to configure dual-stack networking" is unnecessary, because that's a hard requirement of the feature process. (Otherwise we'd never be able to get rid of the feature gate and declare the feature GA.)

So if the text means "--service-cluster-ip-range can be a single value even when IPv6DualStack is enabled" then that doesn't need to be said. But if it means "--service-cluster-ip-range can be single-stack even when the pod network is dual-stack", then that's surprising.
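
For concreteness, a hypothetical kubeadm-style sketch of the configuration in question (dual-stack pod CIDRs with a single-stack service CIDR); the CIDR values are made up, and it assumes a kubeadm ClusterConfiguration of this era accepts comma-separated dual-stack subnets under the IPv6DualStack gate:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
featureGates:
  IPv6DualStack: true
networking:
  podSubnet: 10.244.0.0/16,fd00:10:244::/56   # dual-stack pod network
  serviceSubnet: 10.96.0.0/12                 # single-stack (IPv4-only) service CIDR
```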

Member

thanks for the clarification, I was always assuming

--service-cluster-ip-range can be a single value even when IPv6DualStack is enabled

Contributor Author

@aojea Your assumption is correct.


Member
@thockin thockin left a comment

Do you want to talk somewhere about the requirement for constancy of the primary family?

Member
@thockin thockin left a comment

After we get these changes in, I'd like to maybe do a pass over the doc and make it clearer which phases things are in. Unless you read the whole thing at once, it can be confusing.

@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Apr 15, 2020
Member
@thockin thockin left a comment

Are we demanding that --cluster-cidr and --service-cluster-ip-range have the same ordering? How about node.spec.podCIDR?

Member
@thockin thockin left a comment

A topic that we've gone back and forth on, and is not covered here: ipFamilies controller. If ipFamilies is not stored (is live-derived from apiserver flags), then clients can get different answers depending on which apiserver they ask (while a single->dual upgrade is in progress). That's bad.

Consider these 2 cases:

Case 1: upgrading to dual-stack at the same time as enabling the gate (likely when it goes beta):
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrade begins, bumping the version and configuring dual-stack at the same time
t3: reading the service populates ipFamilies with either nil or "v4,v6" depending on which apiserver you hit
t4: controllers do not comprehend ipFamilies anyway
t5: pods have not been restarted, so v6 traffic is a black-hole
t6: cluster upgrade completes
t7: reading the service populates ipFamilies with "v4,v6"
t8: pods may or may not have been restarted, so v6 traffic is still a partial black-hole

Case 2: upgrading to dual-stack when it goes GA:
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrades version, does not configure dual-stack
t3: reading the service populates ipFamilies with either nil or "v4" depending on which apiserver you hit
t4: controllers do not comprehend ipFamilies anyway
t5: cluster upgrade completes
t6: reading the service populates ipFamilies with "v4" only
t7: Kubernetes GAs dual-stack support
t8: cluster upgrade to dual-stack begins
t9: reading the service populates ipFamilies with either "v4" or "v4,v6" depending on which apiserver you hit
t10: controllers thrash
t11: kube-proxies are inconsistent with each other
t12: pods have not been restarted, so v6 traffic is a black-hole anyway
t13: cluster upgrade to dual-stack completes
t14: reading the service populates ipFamilies with "v4,v6"
t15: pods may or may not have been restarted, so v6 traffic is still a partial black-hole

Both seem problematic. It seems better to instead have a controller which writes that value to the service status.

Let's look at the new timelines:

Case 1: upgrading to dual-stack at the same time as enabling the gate (likely when it goes beta):
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrade begins, bumping the version and configuring dual-stack at the same time
t3: reading the service populates ipFamilies with nil
t4: controllers which comprehend ipFamilies do whatever they did before
t5: cluster upgrade completes
t6: controller iterates all services and writes ipFamilies as "v4"

Case 2: upgrading to dual-stack when it goes GA:
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrades version, does not configure dual-stack
t3: reading the service populates ipFamilies with nil
t4: controllers which comprehend ipFamilies do whatever they did before
t5: cluster upgrade completes
t6: controller iterates all services and writes ipFamilies as "v4"
t7: Kubernetes GAs dual-stack support
t8: cluster upgrade to dual-stack begins
t9: reading the service returns ipFamilies as "v4"
t10: cluster upgrade to dual-stack completes
t11: reading the service returns ipFamilies as "v4"

That raises 2 new issues:

  1. How do you upgrade an existing service from "v4" to "v4,v6". You can only really do that after all your pods are restarted.

  2. Is that really status any more? Or did I just convert it into spec ? It's not really recoverable by observation...

What if (thinking out loud now):

  • change spec.ipFamily to spec.ipFamilies
  • drop status.ipFamilies
  • on CREATE:
    • if user provided a value, use it
    • else set it from cluster-cidrs
  • on GET:
    • if stored value, use it
    • else default from cluster-cidrs[0]

Then the answer to question 1 (how to upgrade from single to dual) is simply "add a value to ipFamilies". Allow mutations which add a value, but disallow mutations that change [0].

Single-stack controllers (endpoints) look at ipFamilies[0]. Dual-stack controllers (endpoint-slice) look at all entries.

No controller needed. The timelines become:

Case 1: upgrading to dual-stack at the same time as enabling the gate (likely when it goes beta):
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrade begins, bumping the version and configuring dual-stack at the same time
t3: reading the service populates ipFamilies with either nil or "v4" depending on which apiserver you hit
t4: controllers which comprehend ipFamilies either do whatever they did before or get "v4"
t5: cluster upgrade completes
t6: reading the service populates ipFamilies with "v4"
t7: user upgrades service to dual-stack by writing ipFamilies = [v4,v6]

Case 2: upgrading to dual-stack when it goes GA:
t0: user created a service in single-stack v4 before dual-stack was available (so no ipFamily)
t1: Kubernetes BETAs dual-stack support
t2: cluster upgrades version, does not configure dual-stack
t3: reading the service populates ipFamilies with either nil or "v4" depending on which apiserver you hit
t4: controllers which comprehend ipFamilies either do whatever they did before or get "v4"
t5: cluster upgrade completes
t6: reading the service populates ipFamilies with "v4"
t7: Kubernetes GAs dual-stack support
t8: cluster upgrade to dual-stack begins
t9: reading the service returns ipFamilies as "v4"
t10: cluster upgrade to dual-stack completes
t11: reading the service returns ipFamilies as "v4"
t12: user upgrades service to dual-stack by writing ipFamilies = [v4,v6]
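
A minimal sketch of the final step in both timelines (t7 in case 1, t12 in case 2), assuming the spec.ipFamilies shape proposed above; names are illustrative. The user upgrades an existing service by appending a second family, while mutations that change ipFamilies[0] remain disallowed.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # hypothetical service name
spec:
  ipFamilies:
  - IPv4                  # pre-existing primary family; must not change
  - IPv6                  # appended once the backing pods are dual-stack
  selector:
    app: my-app
  ports:
  - port: 80
```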

@danwinship
Contributor

  1. How do you upgrade an existing service from "v4" to "v4,v6". You can only really do that after all your pods are restarted.

It really seems like the decision about whether a Service is "v4", "v6", or "v4,v6" should just be defined by the Endpoints, not the Service. If the Endpoints addresses are all IPv4, then the Service should be "v4" (for NodePort, etc, purposes) without you having to do anything else. Likewise with "v6" if the Endpoints are all IPv6. If there are both IPv4 and IPv6 Endpoints addresses (whether because the Service points to dual-stack pods or because it points to a mix of single-stack IPv4 and single-stack IPv6 pods) then the Service should be "v4,v6".

That will fail in the context of a pod that is launched as dual-stack but actually only listens on one address family, but we had already talked about needing to indicate ipFamily on pods to make health checks work in that case anyway, so we could just do that. Then if a pod has dual-stack PodIPs but is tagged ipFamily: IPv4, the Endpoints controller(s) would ignore its IPv6 IP.

We'd still need Service.Spec.IPFamily at creation time to control ClusterIP allocation, but that would be mostly unchanged from how it is now.

@khenidak
Contributor Author

  1. How do you upgrade an existing service from "v4" to "v4,v6". You can only really do that after all your pods are restarted.

It really seems like the decision about whether a Service is "v4", "v6", or "v4,v6" should just be defined by the Endpoints, not the Service. If the Endpoints addresses are all IPv4, then the Service should be "v4" (for NodePort, etc, purposes) without you having to do anything else. Likewise with "v6" if the Endpoints are all IPv6. If there are both IPv4 and IPv6 Endpoints addresses (whether because the Service points to dual-stack pods or because it points to a mix of single-stack IPv4 and single-stack IPv6 pods) then the Service should be "v4,v6".

That will fail in the context of a pod that is launched as dual-stack but actually only listens on one address family, but we had already talked about needing to indicate ipFamily on pods to make health checks work in that case anyway, so we could just do that. Then if a pod has dual-stack PodIPs but is tagged ipFamily: IPv4, the Endpoints controller(s) would ignore its IPv6 IP.

We'd still need Service.Spec.IPFamily at creation time to control ClusterIP allocation, but that would be mostly unchanged from how it is now.

There is no dual-stack service. This entire configuration helps the user define the ipFamily of a service, which is always single (except for headless services without selectors).

@khenidak
Contributor Author

Both seem problematic. It seems better to instead have a controller which writes that value to the service status.

@thockin this scenario assumes that secondary-family services will be created during an unfinished upgrade? Do we really want to support that? It also assumes that whoever reads the values of ipFamilies does not expect nil, which I think they should, at least during an upgrade process.

@danwinship
Contributor

There is no dual stack service

what is the whole discussion about "ipFamilies" about then?

Member
@thockin thockin left a comment

I'm close to LGTM on this delta, after which I want to do an overall edit pass to make sure it is still coherent at a macro level.

There are a few things that need addressing.

https://github.com/kubernetes/enhancements/pull/1679/files#r434536119

https://github.com/kubernetes/enhancements/pull/1679/files#r436273707 (the second proposal) or
https://github.com/kubernetes/enhancements/pull/1679/files#r434023994

https://github.com/kubernetes/enhancements/pull/1679/files#r435267286

2. dualstack: assigned from both the first and the second cidr entries, in this scenario Services will have a dual assigned `ClusterIPs`

The above is achieved using the following changes to Service api.
1. Add `spec.ipFamilyPolicy` field with possible values of `SingleStack`, `DualStack`, or `nil`.
Member

My inclination is to leave this as an enum, but it's a soft pref. It's unlikely we'll ever have another value here. But I have been wrong before.

Does the naming here present real confusion to people?

@khenidak khenidak changed the title update dualstack kep with ipfamilies field dualstack services Jun 11, 2020
Member
@thockin thockin left a comment

This reads like I expected. @danwinship this adopts your proposed change for bool. It leaves the dependency on EPSlice and I am OK with that - it just prevents a path that is almost certainly not what the user wanted anyway.

What we should do now is boil it down to a smaller form (slides?) and get an outside API reviewer to read it. I can sign up to condense it if you want, Kal


#### Type Headless services

\<TBD\>
1. For Headless with selector, the value of `spec.preferDualStack` and `spec.ipFamilies` will drive the endpoint selection. Endpoint slice controller will select endpoints with ip families that match the value of these two flags.
Member

Actually, only ipFamilies matters to controllers. Nobody except the service allocator should have to look at the prefer flag, right?

Contributor Author

What we should do now is boil it down to a smaller form (slides?) and get an outside API reviewer to read it. I can sign up to condense it if you want, Kal

@thockin yes please condense. I will attend the api review as well.

Contributor Author

Actually, only ipFamilies matters to controllers. Nobody except the service allocator should have to look at the prefer flag, right?

Yes. Because there will be cases where the user wants dual-stack (preferDualStack: true) but won't get it, and ipFamilies is the field that reflects that fact (as does ClusterIPs, but that would require testing IPs for their families wherever a controller is concerned).

Contributor

But the only reason a service would preferDualStack but not get it is because the cluster isn't dual-stack. Or at least "doesn't have dual-stack service CIDRs". I suppose in theory we could support "I want a cluster with dual-stack pods and dual-stack headless services but only single-stack ClusterIP services", but that's weird and I think it's better to say that nobody except the service allocator looks at preferDualStack.

Member

This is a relatively minor point, but ipFamilies[] has all the information it needs. E.g.:

time +0: user creates a headless service with prefer=true, but cluster is single-stack. They get ipFamilies = [ v4 ].

time +1: endpoint controllers see ipFamilies = [ v4 ] and only consider v4 addresses

time +2: admin upgrades cluster to dual-stack

time +3: endpoint controllers see ipFamilies = [ v4 ] and only consider v4 addresses

The point being that the controllers should NOT change at T+3, because the service pre-existed dual-stack enablement.
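
An illustrative manifest for the headless service in this timeline, assuming the preferDualStack/ipFamilies fields proposed in this revision (name and selector are made up): created with prefer=true on a single-stack cluster, it gets pinned to a single family, which the endpoint controllers keep honoring even after the cluster goes dual-stack.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-headless       # hypothetical name
spec:
  clusterIP: None
  preferDualStack: true   # field name as proposed in this KEP revision
  ipFamilies:
  - IPv4                  # recorded at create time; controllers keep using it
  selector:
    app: my-app
  ports:
  - port: 80
```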

5. The first cidr entry of `--service-cluster-ip-range=<first cidr entry>, <secondary cidr entry>` is considered cluster default ip address family.
6. Attempting to allocate `ClusterIP` or `ClusterIPs` from an ip address family that is not configured on the cluster will result in an error.

> The below discussion assumes that `Endpoint Slice and Endpoint Slice Controller` will allow dual stack endpoints, the existing endpoint controller will remain single stack. That means enabling dual stack services by having more than one entry in `--service-cluster-ip-range` without enabling `Endpoint Slice` feature gate will result in validation failures during startup.
Contributor

OK, but even when using EndpointSliceController, we still also run the EndpointController and generate Endpoints for compatibility with old components, correct? So we still need to specify how EndpointController behaves in a dual-stack cluster... (Actually, we need to specify how EndpointSliceController behaves in a dual-stack cluster too... you didn't update that part of the KEP...)

Member

@robscott was weighing options, but I think we have a canonical answer unless we choose to change it.

Member

So it sounds like this is suggesting Endpoints resources would be single stack only. As far as the Endpoints controller, it would be relatively straightforward to drop any IPs not matching the family of the Endpoints resource.

If we want to say that the Endpoints resource is guaranteed to be single stack, that may be more difficult. Of course the Endpoints controller is not the only thing that can produce/manage Endpoints. There is a chance that someone is already writing dual-stack Endpoints in a cluster somewhere. Would we need to add some kind of field and/or validation to enforce this single stack behavior? Maybe it's fine to only update the behavior of the controller here and leave the API alone?

For the EndpointSlice controller I think the answer is somewhat simpler. Assuming we maintain the IPv4 and IPv6 address types, the EndpointSlice controller could just be modified to detect the address type and add IP addresses to EndpointSlices with the appropriate address type. So if all Pods backing a Service have both v4 and v6 IPs, we'd end up with 2x the number of EndpointSlices for a dual stack Service, half for the v4 addresses and the other half for the v6 addresses.
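
A sketch of the outcome Rob describes for a dual-stack service, assuming the discovery.k8s.io/v1beta1 EndpointSlice API of the time; names and addresses are made up. IPv4 addresses land in slices with addressType: IPv4, and IPv6 addresses in separate slices with addressType: IPv6.

```yaml
apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: my-service-abc12          # hypothetical generated name
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
ports:
- name: http
  port: 80
  protocol: TCP
endpoints:
- addresses: ["10.244.1.5"]
---
apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: my-service-def34          # hypothetical generated name
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv6
ports:
- name: http
  port: 80
  protocol: TCP
endpoints:
- addresses: ["fd00:10:244:1::5"]
```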

Member

@robscott yeah, let's be clear - when an Endpoints is managed by the Endpoints controller, it will be single-stack, corresponding to ipFamilies[0]. Endpoints managed by anything else is out of scope.

For EPSlice - I'm still looking to you for thoughts on whether you think merging back to a single type (IP) or keeping distinct IPv4, IPv6 makes sense. Or both, which gives creators freedom, but forces consumers to have to handle both. Once you GA, you have to keep whatever you choose.


User who want to create a new single stack Service can create it using one of the following methods:
1. Set `spec.preferDualStack: nil` and `spec.ipFamilies: nil` fields. api-server will default `spec.preferDualStack: false` and `spec.ipFamilies: ['default ip family of the cluster']`.
2. Set `spec.preferDualStack: false` api-server will default `spec.ipFamilies: ['cluster default ip address family']`.
Contributor

Do we really need to distinguish nil from false for preferDualStack?

In particular, the distinction is not needed for the mutating admission hook case like it was with ipFamilyPolicy, because if you want to say "I can't be mutated to dual-stack" you do that by setting ipFamilies, not by setting preferDualStack (because explicitly setting preferDualStack: false but not setting ipFamilies doesn't make sense)

If we do need both nil and false then the two examples should be consistent. (specifically "default ip family of the cluster" vs "cluster default ip address family")

Member

I don't think we need to handle nil. We can safely use defaulting to convert nil to false (or make it a non-pointer bool)

Contributor Author

Only at IP allocation, not in the defaulting logic, to allow webhooks to change the default from single-stack to dual-stack.

Contributor

I'm not sure exactly which part of which comment you're referring to, but what I was saying is that we do not need to have a nil vs false distinction, at any point, including to handle the webhook case.

preferDualStack: false means what ipFamilyPolicy: nil used to, which means that a webhook can change a preferDualStack: false Service to preferDualStack: true as long as ipFamilies or clusterIPs isn't set, regardless of whether it's preferDualStack: false because the user specified that explicitly or because the value was filled in by default.

Contributor Author
@khenidak khenidak Jun 15, 2020

The basic idea is to allow webhooks to identify

  1. preferDualStack: nil && (nil families || single family entry) => no preference; can do both. I don't have code that depends on single/multiple IPs.
  2. preferDualStack: false && (nil families || single family entry) => I cannot work with dual-stack; I will fail if it is dual-stack.
  3. preferDualStack: true && (nil families || single family entry) => I can work with dual-stack if available.
  4. preferDualStack: true && (two ip families) => I have to have dual-stack.

A webhook cannot and must not change false => true, because:

  1. This is not the caller's intention.
  2. It cannot tell whether the caller can actually deal with dual-stack or not.
  3. The same applies to true + two families, because the intention is to have dual-stack: the api-server can either fulfill it or fail it, but it must not respond with a single-family service.

Forcing false => true might be applicable to some use cases, but it won't be the default use case.

The ip-alloc comment above is about where the defaulting will have to take effect: it will be in the IP allocation phase, not the defaulting phase, because the defaulting logic does not know the flags set on the cluster (at least not yet).
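
As a compact illustration of cases 3 and 4 above (field names per this KEP revision; service names are hypothetical): case 3 only states a preference, while case 4 pins both families and must be either fulfilled or rejected.

```yaml
# Case 3: "I can work with dual-stack if available"
apiVersion: v1
kind: Service
metadata:
  name: prefers-dualstack
spec:
  preferDualStack: true
  selector:
    app: my-app
  ports:
  - port: 80
---
# Case 4: "I have to have dual-stack"
apiVersion: v1
kind: Service
metadata:
  name: requires-dualstack
spec:
  preferDualStack: true
  ipFamilies:
  - IPv4
  - IPv6
  selector:
    app: my-app
  ports:
  - port: 80
```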

Contributor

preferDualStack: false && (nil families || single family entry) => I cannot work with dual-stack; I will fail if it is dual-stack

No. This was the point of getting rid of ipFamilyPolicy: SingleStack and switching to a two-valued field (false/true) rather than a three-valued field (nil/SingleStack/DualStack). There is no "I can't do dual stack" any more. You have to say either "I can only do single-stack IPv4" (preferDualStack: false, ipFamilies: ["IPv4"]) or "I can only do single-stack IPv6" (preferDualStack: false, ipFamilies: ["IPv6"]). It doesn't make sense to say "I can't do dual-stack but I have no preference about IP families". Thus, there's no reason to treat preferDualStack: nil, ipFamilies: nil and preferDualStack: false, ipFamilies: nil as meaning different things, and thus there is no reason to even have preferDualStack: nil.

Contributor

Or whatever. We're still alpha. We can still fix this in 1.20 if it merges one way now and we agree later it was wrong. We kind of need to get this merged soon though.

Member
@thockin thockin Jun 19, 2020

Reviewing now, but the problem is defaulting.

We want to be able to assert that consumers only need to look at ipFamilies and that they can rely on ipFamilies to not be empty.

That means that an old, stored service MUST be defaulted properly on read.

For a headful service, I can rely on clusterIP, but for a headless Service I can't.

So if the defaulting code observes { clusterIP: None, ipFamilies: nil } it has to convert that to { clusterIP: None, ipFamilies: [ $default-family ] }. Defaulting can't tell the difference between reading an old saved instance and a new create op. Admission control runs AFTER defaulting, so this hypothetical admission controller cannot know whether the user meant to say "I require single-stack v4" or "I don't care, but the defaults have been set".

We either need a tri-state (blech) or we need to distinguish read from create when applying defaults.

Kal was looking at this last option last I spoke to him.
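
A before/after sketch of the read-path defaulting described here, assuming the proposed fields and an IPv4-default cluster; names are illustrative.

```yaml
# Stored object, written before dual-stack existed:
apiVersion: v1
kind: Service
metadata:
  name: old-headless
spec:
  clusterIP: None          # ipFamilies was never set
---
# What consumers should observe after defaulting on read:
apiVersion: v1
kind: Service
metadata:
  name: old-headless
spec:
  clusterIP: None
  ipFamilies:
  - IPv4                   # filled in with the cluster default family
```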


ipFamilies: # defaulted
- IPv4
clusterIP: 1.2.3.4 # allocated
clusterIPs: # allocated
Contributor

It might be clearer if you explicitly showed what the "input" yaml was
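
For illustration, one plausible "input" manifest for the snippet quoted above, assuming the fields proposed in this revision (name and selector are made up); the apiserver would then fill in ipFamilies (defaulted) and clusterIP/clusterIPs (allocated) as shown.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # hypothetical name
spec:
  # no preferDualStack, ipFamilies, clusterIP, or clusterIPs specified
  selector:
    app: my-app
  ports:
  - port: 80
```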

2. set `spec.ClusterIPs: true` and set `spec.ClusterIPs` to any two dual stack ips in any order.
3. set `spec.ipFamilies` to any two dual stack ip families in any order.
4. Set `spec.CluterIPs` to any two dual stack ip families in any order.
- If the user prefer dual stack if available (service creation will not fail if the cluster is not configured for dual stack) then they can do one of the following
Contributor

It is weird that this is under "Creating a New Dual Stack Service", since it may not result in a dual-stack service...

I think you're torn between half wanting to describe how the code behaves and half wanting to provide end-user use cases, but it gets muddled.

I'm not sure we really need to fix this now, but maybe a better organization would be to first give the encyclopedic description of how the apiserver responds to each set of fields, and then after give examples of things like "what happens with a legacy service that doesn't specify any of the new fields", "how to create a service that will get allocated a single IP in a single-stack cluster and dual IPs in a dual-stack cluster", "how to create an IPv4-only service", "how to create an IPv6-primary dual-stack service even in an IPv4-primary dual-stack cluster", "how to manually specify dual cluster IPs", etc. And then you can show both the input yaml and the result after defaulting/allocation.

But that doesn't have to block this merging.
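
In that spirit, a hedged example of the request/result pairing Dan suggests, using the fields proposed in this revision; names and addresses are made up. The request prefers dual-stack; on a dual-stack cluster the allocator might return both families, while on a single-stack cluster the same request would come back with a single family and IP.

```yaml
# Request: "give me dual-stack if the cluster supports it"
apiVersion: v1
kind: Service
metadata:
  name: prefer-dual
spec:
  preferDualStack: true
  selector:
    app: my-app
  ports:
  - port: 80
---
# Possible result after defaulting/allocation on a dual-stack cluster
apiVersion: v1
kind: Service
metadata:
  name: prefer-dual
spec:
  preferDualStack: true
  ipFamilies: [IPv4, IPv6]
  clusterIP: 10.96.45.12
  clusterIPs: [10.96.45.12, "fd00:10:96::a1"]
  selector:
    app: my-app
  ports:
  - port: 80
```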

Member

I want to do a full edit pass just because it has been patched enough times now :)

@sftim
Contributor

sftim commented Jun 19, 2020

💭 If you wanted to find EndpointSlices for a particular address family, one way to achieve that (if it's defined before GA) is to use a selector. To make that feasible, have the control plane become responsible for getting the labels right (maybe an admission controller?).

@thockin
Member

thockin commented Jun 19, 2020

@sftim Is there a requirement to be able to watch only slices of a certain family (rather than filtering the WATCH client-side) ?

The requirement I know of is that any given service's Endpoints might be single-stack, but different services might choose different families.

Member
@thockin thockin left a comment

I'm LGTM on this. There are some small points to be cleaned up in followup, and one notable point to debate - pointerness of the bool. I'll open a distinct comment on that.



@thockin
Member

thockin commented Jun 19, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit 689f7c6 into kubernetes:master Jun 19, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jun 19, 2020
@thockin
Member

thockin commented Jun 19, 2020

OK, let's talk about what I think is the only major sticking point left. I mentioned it in my last review, but let's just take it here.

The previous proposal had ipFamilyPolicy with possible values "SingleStack", "DualStack", or nil (meaning don't care). This allowed the expression of "I require single-stack, but I don't care which family". @danwinship found this particularly distasteful, and convinced me. The counter proposal was a bool preferDualStack, which is written up here.

It seems there's a not-quite-aligned interpretation of this field, though. Kal and I debated a bit and there's arguments in both directions. Summarized as "does the bool need to be a tristate or not". (See footnote 1 for a remark on conventions for optional)

My argument (and @danwinship's, I think) is that a plain bi-state is sufficient:

  • Assume we can set ipFamilies in code that understands get vs create (it seems we can)
  • On creating a new service, user says nothing about IPs, which deserializes as (false, nil, nil); defaulting doesn't touch it; optional admission controller sees (false, nil, nil); allocator sets (false, [$default], [$ip])
  • on reading an old service, we find either (false, nil, [$ip]) or (false, nil, ["None"]); defaulting doesn't touch it; REST.Get() converts these to (false, [$default], [$ip]) or (false, [$default], ["None"]) respectively.
  • In all cases, consumers of service will find a valid and correct value in ipFamilies
  • The optional admission controller knows without ambiguity that (false, nil, nil) is safe to change to (true, nil, nil).

Kal argues that (false, nil, nil) is ambiguous. It might mean "I said nothing at all" or "I require single-stack, but I don't care which family". In the latter case, changing it to (true, nil, nil) would be wrong.

Dan's point in proposing the bool was that "I require single-stack, but I don't care which family" is a nonsense statement. There's literally no way to code that in an app - either you handle both families or you handle one SPECIFIC family. The correct expression of "I require single-stack" must define WHICH family.

Kal's counter, which I find some merit in, is that having 2 service IPs represents "exposed surface" that some users will actively want to prevent, e.g. for simpler auditing purposes. I find it hard to refute "this doesn't make sense, but I want to do it anyway".

So this is what I want to discuss.

I'll characterize these as 3 models:

  1. ipFamilyPolicy which has 3 valid values
  2. bi-state bool
  3. tri-state bool (nil being meaningful)

In all three models, a user who sets ipFamilyPolicy: SingleStack or preferDualStack: false or just doesn't say anything about IP families will get a Service which is single-stack. The ONLY exception is if the cluster admin installs an admission controller to change the default to dual-stack.

In model 1 that means changing ipFamilyPolicy: nil to ipFamilyPolicy: DualStack. This is unambiguous, but allows the expression of "I require single-stack, but I don't care which family".

In model 2 that means changing (false, nil, nil) to (true, nil, nil). This is ambiguous and does not allow the expression of "I require single-stack, but I don't care which family" when the admission controller is enabled (which is optional).

In model 3 that means changing (nil, nil, nil) to (true, nil, nil). This is unambiguous, but allows the expression of "I require single-stack, but I don't care which family".

My personal feeling after writing this up is that model 3 is sort of the worst choice. It uses a bool (which can be very limiting) and forces API consumers to interpret it as a tri-state. It is isomorphic to model 1. I would argue we either go with model 2 (and tell users who care about the extra surface area to NOT run the optional admission controller) or revert the design to model 1 (which is the clearest expression of intention, I think).

Footnote 1:
We may still make it a pointer because that is the convention for optional fields, but the defaulting can simply turn nil into false. This convention is not always followed, especially for bool fields where the Go "zero-value" is the correct default.

@danwinship
Contributor

We may still make it a pointer because that is the convention for optional fields, but the defaulting can simply turn nil into false.

Yeah, this is why I said "let's just commit it like this and we can argue more later". I'm fine if we get locked into having a *bool in the API even if we eventually decide we didn't need it.

@thockin
Member

thockin commented Jun 19, 2020 via email

@danwinship
Contributor

danwinship commented Jun 19, 2020

Dan's point in proposing the bool was that "I require single-stack, but I don't care which family" is a nonsense statement. There's literally no way to code that in an app

I mean, if you're going to make claims like that then my brain is automatically going to go into "devil's advocate" mode and realize that a pod could learn its pod IP via the downward API and then explicitly bind to just that one address. 🤦‍♂️ In that case, it would work in either single-stack IPv4 or single-stack IPv6 clusters, but would not be capable of doing dual-stack. (This wouldn't make a lot of sense for golang-based servers where binding on either 0.0.0.0 or :: will give you dual-stack for free, but if the server was written in some other language that didn't have that feature, then this would be an easy way to work in any kind of single-stack cluster.) Given the relatively low adoption of single-stack IPv6, there are probably not many existing pods that do this though.
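
For the devil's-advocate case, a minimal pod fragment showing the mechanism Dan means: the container learns its (primary-family) pod IP through the downward API and could bind only to that address. The image and names are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bind-one-family
spec:
  containers:
  - name: server
    image: example.com/server:latest   # placeholder image
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP      # primary pod IP; the app binds only here
```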

Kal's counter which I find some merit in is that having 2 service IPs represents "exposed surface" that some users will actively want to prevent, e.g. for simpler auditing purposes.

Sure, but the nil == false model doesn't prevent you from becoming single-stack, it just changes how you request it. If you only want half the exposed surface, just specify which half.

@thockin
Member

thockin commented Jun 19, 2020

If you only want half the exposed surface, just specify which half.

If you want to write a helm chart or just generic YAML that can run in either cluster, and you only want half the surface area, this doesn't help.

That said, "don't run the optional admission controller" seems like a practical answer. In that case (false, nil, nil) does mean "I want single stack and I don't care which"

@danwinship
Contributor

Yeah, "I want to force Services to become dual-stack but also want Services to not be dual-stack for security reasons" isn't a very compelling user story. 🙂
