KEP-1664: Better Support for Dual-Stack Node Addresses #1665
Conversation
> The other information in `Node.Status.Addresses` will be moved to a new `Node.Status.IPs` array of type `NodeIP`:
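For concreteness, a minimal sketch of what an element of that array might look like; the field names here are assumptions drawn from this thread, not the KEP's authoritative definition:

```go
// NodeIPScope is a string-typed scope for a node IP; the values
// (Internal, External, Public, Unknown) are discussed later in this thread.
type NodeIPScope string

// NodeIP is a hypothetical sketch of the element type of Node.Status.IPs,
// pairing each address with a scope describing its expected reachability.
type NodeIP struct {
	IP    string      // the node IP address (IPv4 or IPv6) in string form
	Scope NodeIPScope // expected reachability of this IP
}
```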
Can we add that the `PodIPs` field from pods with `HostNetwork=true` will take the value from this field, or is this out of scope?
I like that property, but what value do we get from specifying it?
> The other information in `Node.Status.Addresses` will be moved to a new `Node.Status.IPs` array of type `NodeIP`

Can you clarify what you mean by "moved", as long as we're still maintaining the `status.addresses` field in v1?
> ## Motivation
>
> ### Problems
This will fix this long-standing issue: kubernetes/kubernetes#42125
I added this as a "Maybe Goal"; there are approaches we could take that would fix that, and approaches that wouldn't.
> …the new behavior. Perhaps this should warn and continue to use the old behavior during alpha, then warn and use the new behavior in beta.
>
> (Alternatively, since we need to extend `--node-ip` to support dual…
The approach taken for dual-stack was to modify the flags' behavior to accept comma-separated lists of parameters: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20180612-ipv4-ipv6-dual-stack.md#kubelet-startup-configuration-for-dual-stack-pod-cidrs. It can be an option here too.
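As an illustration of that option, a minimal sketch of parsing a comma-separated dual-stack `--node-ip` value, allowing at most one address per family (`parseNodeIPs` is a hypothetical helper, not kubelet's actual code):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseNodeIPs splits a comma-separated --node-ip value and validates
// that it contains at most one IP per address family.
func parseNodeIPs(value string) ([]net.IP, error) {
	var ips []net.IP
	seenV4, seenV6 := false, false
	for _, s := range strings.Split(value, ",") {
		ip := net.ParseIP(strings.TrimSpace(s))
		if ip == nil {
			return nil, fmt.Errorf("invalid IP address %q", s)
		}
		isV4 := ip.To4() != nil
		if (isV4 && seenV4) || (!isV4 && seenV6) {
			return nil, fmt.Errorf("at most one IP per family allowed, got extra %q", s)
		}
		if isV4 {
			seenV4 = true
		} else {
			seenV6 = true
		}
		ips = append(ips, ip)
	}
	return ips, nil
}

func main() {
	ips, err := parseNodeIPs("10.0.0.1,fd00::1")
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // [10.0.0.1 fd00::1]
}
```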
> …`Node.Status.Addresses` will eventually need to be updated to look at the new fields instead.
>
> ### Version Skew Strategy
Naive question: could we end up having issues with these new fields like we have with the IPFamily one?
Do you have a more specific concern?
It seems to me that the problems around IPFamily mostly involve defaulting and unspecified interactions with other fields, which would not be relevant here.
Nothing specific, just afraid of possible unspecified interactions with other fields, but you just clarified it, thanks.
```go
// NodeIPPublic indicates an IP address that is expected to be directly
// reachable from anywhere, inside or outside the cluster.
NodeIPPublic NodeIPScope = "Public"
```
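For context, here is how the scope names that come up in the discussion below (`Internal`, `External`, `Public`, `Unknown`) might be spelled out in Go. This is a sketch assembled from the thread, not the KEP's exact text:

```go
// NodeIPScope describes the expected reachability of a node IP.
// Sketch based on this thread; not the authoritative KEP definition.
type NodeIPScope string

const (
	// NodeIPInternal indicates an IP expected to be reachable from within
	// the cluster, but not from outside it.
	NodeIPInternal NodeIPScope = "Internal"

	// NodeIPExternal indicates an IP where traffic is expected to go to
	// some piece of cloud infrastructure first and be rewritten to
	// another IP before reaching the node.
	NodeIPExternal NodeIPScope = "External"

	// NodeIPPublic indicates an IP address that is expected to be directly
	// reachable from anywhere, inside or outside the cluster.
	NodeIPPublic NodeIPScope = "Public"

	// NodeIPUnknown indicates that reachability could not be determined,
	// which is commonly the case for autodetected bare-metal IPs.
	NodeIPUnknown NodeIPScope = "Unknown"
)
```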
I have doubts about this `NodeIPPublic` option; it has a lot of connotations :/

> an IP address that is expected to be directly reachable from anywhere
The existing options have a lot of connotations too. Eg, this is partly inspired by kubernetes/kubernetes#86918 (comment).

Note that it's "expected to be directly reachable from anywhere", not "required to be directly reachable from anywhere". And if the cloud provider doesn't know, it should just use `Unknown`.
> but IPv6 node IPs will most likely be `Public`, and bare-metal IPs will generally be `Unknown`

I think that, continuing with the IPv6 analogy, we could merge `Unknown` and `External` into `Global`, taking a more minimalistic approach and avoiding the divergence in the use of IP scopes between private/public cloud providers that you describe below. However, I don't know if the goal is exactly making that difference explicit…
The conversation in 86918 was specifically about the distinction between node IPs where traffic is expected to go directly to the node vs node IPs where traffic is expected to go to some piece of cloud infrastructure first, and a concern that we might accidentally end up preferring an IPv6 address of the latter type over an IPv4 address of the former type. The `Public` vs `External` distinction is exactly what was needed there.

I had also previously thought about just having some boolean flags on `NodeIP` instead of a type, like:

```go
Public   bool // IP is assumed to be globally reachable
Indirect bool // Packets sent to this IP will be rewritten to another IP
```

So `Internal` would be `!public && !indirect`, `External` would be `public && indirect`, and `Public` would be `public && !indirect`. But then that leaves `!public && indirect` unused, and it doesn't allow expressing unknown public-ness, which is usually the case for autodetected bare-metal IPs.
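That mapping can be written out directly; a small sketch (hypothetical helper, names taken from this thread) of the truth table described above:

```go
// scopeFromFlags encodes the mapping described above between the
// hypothetical boolean flags and the named scopes. Note that
// "!public && indirect" has no defined name in this scheme.
func scopeFromFlags(public, indirect bool) string {
	switch {
	case !public && !indirect:
		return "Internal"
	case public && indirect:
		return "External"
	case public && !indirect:
		return "Public"
	default:
		return "" // !public && indirect: the unused combination
	}
}
```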
| NodeIP Scope | Private | Public |
|---|---|---|
| Belongs to cluster | Internal | Public |
| Does not belong | Unknown | External |

👍 I got it 😄
> I got it

Not quite; "Private" + "Does not belong" should be empty, because there's no name that means that explicitly. `Unknown` means unknown. Eg, on bare metal, kubelet knows the local IPs aren't `External`, but if they're not from a known private IP range, then it has no way of knowing if they're `Internal` (ie, firewalled off from the internet) or `Public`. So it would call them `Unknown`.
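A sketch of that bare-metal logic, reusing the `NodeIPScope` constants sketched earlier; treating RFC 1918 and IPv6 ULA ranges as the "known private IP ranges" is my assumption:

```go
import "net"

// classifyBareMetalIP sketches the logic described above: on bare metal,
// kubelet can rule out External, and can call IPs from known private
// ranges Internal, but otherwise cannot tell Internal (firewalled off)
// from Public, so it reports Unknown.
func classifyBareMetalIP(ip net.IP) NodeIPScope {
	privateRanges := []string{
		"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // RFC 1918
		"fc00::/7", // IPv6 unique local addresses
	}
	for _, cidr := range privateRanges {
		_, block, _ := net.ParseCIDR(cidr)
		if block.Contains(ip) {
			return NodeIPInternal
		}
	}
	return NodeIPUnknown
}
```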
I think we're designing in a vacuum, which is why it's hard to pin down what semantics are useful. What are the use cases for anything other than "reachable by all other nodes within the cluster"?
(Assuming we can even find more than one use case) Perhaps we should have different fields for the different use cases rather than capturing all reachability/cost/policy semantics in a single intrinsic "type"? (ie: "use this address when doing foo-type operation", rather than "this address is intrinsically foo-type")
Wondering if it would be simpler to add new address types for IPv6: `InternalIPv6` and `ExternalIPv6`?
> When using an external cloud provider, cloud-controller-manager is responsible for setting `Node.Status.Addresses`, but it does not know whether the cluster it is running in is supposed to be single-stack IPv4, single-stack IPv6, or dual-stack. If a node has both IPv4 and IPv6…
I wonder if it would be more bang for our buck to just add new node address types for v6 and call it a day. If we add an `InternalIPv6` and an `ExternalIPv6`, users can configure the priority based on `--kubelet-preferred-address-types`. The strategic-merge-patch issue would still exist, but I don't think it's a major issue since we have workarounds in both the kubelet and the cloud-controller-manager.
Cloud-controller-manager sets `Node.Status.Addresses` directly, but it doesn't get any input from kubelet on how to do it. So to implement a `--preferred-address-types` arg in kubelet we need to improve kubelet/CCM communication as well. (Presumably via a Node annotation, which is how `--node-ip` gets passed.)

I should call this out more explicitly as a problem in the Motivation section.
`--kubelet-preferred-address-types` is an (existing) apiserver flag, btw.
> I wonder if it would be more bang for our buck to just add new node address types for v6 and call it a day.

I was mostly only fiddling with `NodeAddressType` because if we're going to replace `Node.Status.Addresses` then we might as well try to fix all the problems with it at once. (Did I miss any?) If we aren't going to replace it, I'm not sure it's super important to distinguish more address types. Calling globally-routable IPv6 addresses `InternalIP` is confusing, but it works fine.

(And changing existing IPv6 addresses to `InternalIPv6` will immediately break existing code that only looks at `InternalIP` and `ExternalIP`. This is another reason why adding the new fields is nice: we can still leave everything as it was in the old fields.)
> (And changing existing IPv6 addresses to InternalIPv6 will immediately break existing code that only looks at InternalIP and ExternalIP. This is another reason why adding the new fields is nice: we can still leave everything as it was in the old fields.)

Good point; my assumption is that this only breaks when going from IPv6 to dual-stack, both of which are alpha features.

> If we aren't going to replace it, I'm not sure it's super important to distinguish more address types. Calling globally-routable IPv6 addresses InternalIP is confusing, but it works fine.

The real benefit is allowing users to configure the preferred family. Setting the type for an IPv6 address to `InternalIP` works for IPv6-only. In dual-stack it falls apart, since order matters and the order is up to the cloud provider. It's not really feasible for the cloud provider to know the preferred order of addresses, I think; this should be up to the user.
I wasn't aware of apiserver `--kubelet-preferred-address-types`; need to think about how that fits into this. This is basically for the case where the masters are on a separate network from the nodes and can't reach the "internal" IPs, right?

At first glance, it seems like we ought to be able to decide whether apiserver→node communication should be IPv4 or IPv6 based on whether the apiserver is advertising an IPv4 or an IPv6 IP to `kubernetes.default`. If node→apiserver communication is going to be over IPv6, then it seems to make sense to assume that apiserver→node communication should also be IPv6?
Note that anything deprecated in v1 will not be removed until a hypothetical v2, so the existing field will need to continue to be populated/maintained in v1.
I have no major objections to this idea, but I don't understand what all this information is being used for any more, so all I see is a lot of mess :)
> ## Design Details
>
> The `NodeAddress` of type `NodeHostName` will be moved to a new `Node.Status.Hostname` field.
What is the meaning of this field? Is it the hostname as the node knows itself (e.g. …)?
What is this used for?
I don't know the answer to any of those questions... currently `Addresses` contains a single element of type `NodeHostName` and one or more IP/DNS elements. This proposal was just splitting the hostname out into a separate field from the IP/DNS stuff.

I had an "UNRESOLVED" section noting that I wasn't sure whether the hostname was currently required or optional. I guess I should check if it's even used...
Added a bunch of notes about what `NodeHostName` is used for...
> The other information in `Node.Status.Addresses` will be moved to a new `Node.Status.IPs` array of type `NodeIP`:
> I like that property, but what value do we get from specifying it?

+1
Sorry, I found this KEP today so my comment is rather long.

**About InternalIP**

The … In hindsight, … this definition is not restrictive enough. Say some special internal network is only reachable from 3 nodes out of 10; then the addresses are certainly "internal", but they cannot be used as endpoints. In dual-stack, at least one …

**About auto-detecting (guessing) the InternalIP (=EndpointIP) addresses**

While this is convenient for the vast majority of K8s installations, it is totally impossible for some. For instance, nodes can have, say, 15 interfaces with 1-3 IPv4 and 3-10 IPv6 addresses each (an exaggerated example, but it can happen). It should be easy to satisfy both needs; just leave an option to configure the InternalIP (=EndpointIP) addresses. The proposed … For instance, it can't be assumed that the default route points to a network that shall be used by traffic to services; we actually often separate external traffic onto a different (less secure) network which has the default route. And I have tried to use multiple targets for the default route just to see if it works. It does not; it breaks all guess-by-default-route algorithms I have seen.

**About separating IPv4 and IPv6 addresses in configuration**

This should not be necessary. It is easy to separate them when parsing. The only pitfall I can think of is the IPv6-mapped IPv4 addresses. Some user may try to specify an address in both plain IPv4 form and IPv6-mapped form (eg `10.0.0.1` and `::ffff:10.0.0.1`), but this is the same (IPv4) address. The Go function …
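The comment is cut off at "The Go function…", but the pitfall is easy to demonstrate: Go's `net.ParseIP` treats an IPv6-mapped IPv4 address as the same value as the plain IPv4 form (a quick illustration, not part of the original comment):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	a := net.ParseIP("10.0.0.1")
	b := net.ParseIP("::ffff:10.0.0.1") // IPv6-mapped IPv4 form

	fmt.Println(a.Equal(b))     // true: the two parse to the same address
	fmt.Println(b.To4() != nil) // true: the mapped form is treated as IPv4
}
```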
Right. I did mention that in the description of "Unknown" ("…").

There was no proposal to get rid of manually-overridden node IPs, only to make them work consistently across different deployment types. (Eg, currently you can't override the default node IP when using an external cloud provider, which is bad, for the reasons you give.)

Any user who puts IPv6-mapped IPv4 addresses in their config files deserves whatever sort of wacky undefined behavior they get.
I haven't pushed any updates to the KEP in a while, but kubernetes/kubernetes#95239 implements the subset that has been consistent through every iteration ("you should be able to specify dual-stack node IPs on the command line if you're using bare metal" and "hostNetwork pods on dual-stack nodes should have dual-stack pod IPs").

I'm currently kind of thinking about Tim's "use this address when doing foo-type operation" suggestion…
I think a reasonable step forward to make progress here would be to merge kubernetes/kubernetes#95239 (adds dual-stack support for `--node-ip`). I would even go as far as to say that maybe we shouldn't support dual-stack for in-tree cloud providers at all, since: …
So one issue in this KEP is the difference in behavior between in-tree cloud providers and external cloud providers with `--node-ip`.

I had been assuming we should make external providers behave more like in-tree providers here, but if we're going to ignore in-tree providers for this KEP, then maybe we shouldn't? Do you know if there are any outstanding issues/complaints about the fact that external cloud providers do not allow you to override the choice of primary node IP?

One issue that may have made the in-tree behavior more necessary before was that … Well, actually, one case where it's important is if you want to try to create a single-stack IPv6 cluster in a dual-stack cloud. In that case you need to force kubelet to choose an IPv6 IP rather than an IPv4 IP. (There may be some use case for forcing an "IPv6,IPv4" ordering of node IPs in a dual-stack cluster rather than "IPv4,IPv6" too, though in theory it mostly shouldn't matter.)

So: we definitely need a way to override the cloud IPv4-vs-IPv6 ordering. It's not clear if we need to be able to override the cloud IP address ordering in any other cases…
@fejta-bot: Closed this PR after the standard stale/rotten inactivity lifecycle.
@danwinship do we still need the KEP?
so, of the Goals in the KEP:

- This was implemented in kubernetes/kubernetes#95239.
- 95239 allows manually specifying dual-stack node IPs for bare-metal clusters on the command line. It does not do dual-stack autodetection.
- This is not done, but also, it seems like kubernetes/kubernetes#86918 (dual-stack IPs on AWS) may merge now, and people are just leaning toward "well, if your cluster has pods that will get confused by seeing dual-stack node addresses then you shouldn't be running on nodes with dual-stack IPs".
- This is not done, but as discussed in #1665 (comment) it's possibly not important, unless we care about letting people install single-stack IPv6 clusters on dual-stack cloud nodes. Arguably we should not care about that, since clouds that don't allow provisioning single-stack IPv6 nodes are probably not going to be able to bring up single-stack IPv6 clusters anyway (eg, due to IPv4-only DNS or other things like that).
- Not done.

There was also discussion in the PR about introducing alternative … Also, I think at some point there was discussion about the fact that we never pluralized …

It would be cool to have some kubelet option for better node IP autodetection. Eg, the default should be something like "the first non-secondary non-deprecated IP on the lowest-ifindexed NIC that contains a default route which is not out-prioritized by any other default route", and there could maybe also be additional modes ("use DNS", "use interface ethX", "use the interface that has the route that would be used to reach IP W.X.Y.Z"). (A sketch of one guess-by-default-route heuristic follows below.)

Combining also with some of the discussion in kubernetes/kubernetes#95768, and also kind of kubernetes/kubernetes#96981, I'd say maybe there's an argument for having a mode where Node.Status.Addresses always contains exactly one NodeInternalIP (and nothing else) on single-stack, and exactly two NodeInternalIPs (and nothing else) on dual-stack.
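On guess-by-default-route autodetection: a common heuristic (sketched here as a hypothetical helper, not kubelet's actual code) is to ask the kernel which source address it would use to reach an arbitrary external destination; a setup with multiple default routes, as described earlier in this thread, is exactly what breaks this kind of guess:

```go
package main

import (
	"fmt"
	"net"
)

// defaultRouteIP guesses the node's primary IP by asking the kernel which
// source address it would pick for an arbitrary external destination.
// net.Dial on UDP performs route selection only; no packets are sent.
func defaultRouteIP() (net.IP, error) {
	conn, err := net.Dial("udp", "192.0.2.1:9") // TEST-NET-1; never actually contacted
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).IP, nil
}

func main() {
	ip, err := defaultRouteIP()
	if err != nil {
		panic(err)
	}
	fmt.Println(ip)
}
```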
Besides having one or more, I think it would be good to clearly define the "primary" IP of the node: the one that is going to be used as the source for all communications. That would avoid a lot of problems with multiple interfaces, asymmetric routing, and the consequences of wrong networking configurations by operators. Having only one solves the problem immediately :)
The "primary" IP of the node is already clearly defined; it's the one that kubelet sets as the …

Node.Status.Addresses is more about what IP we promise can be used as a target than about what IP we think will be used as a source. Almost no clients ever explicitly choose their source IP, and if you have non-trivial routing, then it is possible that one node IP will be used for some purposes and another IP will be used for other purposes. (Eg, with many network plugins, hostNetwork-to-pod traffic will not use the official node IP as its source address, but will instead use an undocumented IP associated with the virtual network interface that connects the node to the pod network.)
There are problems with kubelet/cloud-provider communication regarding `Node.Status.Addresses` that currently break/complicate some IPv6 and dual-stack scenarios. Additionally, `Node.Status.Addresses` has unclear and inconsistent semantics, and has a note attached to it in the documentation warning you not to use it with strategic merge patch because of an incorrect but unfixable annotation.

To resolve these problems, this KEP proposes deprecating the existing `Node.Status.Addresses` field and replacing it with two new fields, `Node.Status.Hostname` and `Node.Status.IPs`, with clearer and more consistent semantics.

This proposes a fix.
Enhancement: #1664
/cc @thockin @aojea @khenidak @lachie83 for the ipv6/dual-stack bits

/cc @liggitt @andrewsykim @cheftako, who were involved in the node and cloud bits of #79391 ("Don't use strategic merge patch on Node.Status.Addresses") and so might have thoughts on this. Feel free to uncc and/or cc other people; I'm not sure who the right people to look at this are in sig-node or sig-cloud-provider.
FTR, I initially wrote this as "Clarify the semantics of `Node.Status.Addresses` and `--node-ip`", and in the "Alternatives" section noted that we could maybe deprecate `Node.Status.Addresses` instead. Then I started thinking that that was really a better idea, so I made it the plan and left not deprecating as an "Alternative". So anyway, that's also totally a possibility.