New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set secondary address on host-network pods #106715
Conversation
@aojea: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @danwinship @khenidak @thockin |
@aojea: GitHub didn't allow me to request PR reviews from the following users: hanamantagoudvk. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -3284,6 +3284,7 @@ func TestGenerateAPIPodStatusHostNetworkPodIPs(t *testing.T) { | |||
criPodIPs: []string{"192.168.0.1"}, | |||
podIPs: []v1.PodIP{ | |||
{IP: "192.168.0.1"}, | |||
{IP: "fd01::1234"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting corner case from DanWinships test cases
name: "CRI PodIPs override NodeAddresses",
nodeAddresses: []v1.NodeAddress{
{Type: v1.NodeInternalIP, Address: "10.0.0.1"},
{Type: v1.NodeInternalIP, Address: "fd01::1234"},
},
criPodIPs: []string{"192.168.0.1"},
podIPs: []v1.PodIP{
{IP: "192.168.0.1"},
{IP: "fd01::1234"},
CRIO and containerd doesn't send IPs for host network pods
https://github.com/cri-o/cri-o/blob/8d4d158935929800c4300b82eb4b5a83ded400f4/server/sandbox_network.go#L34-L36
but they can.
I think that the discussion here is if override means removing IPs or just updating the ones that it sets leaving the others as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think override means "ignore CRI".
Do we think it is EVER correct for CRI to return host-net IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think it is EVER correct for CRI to return host-net IPs?
I think we all agree that is a bad idea, but it seems that it is the way it was #106715 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing else we should specify what Kubernetes expects host-network pod IPs to be (order, number, type). I don't like leaving that an undefined wild-west.
And I think we should enforce that expectation. If there are corner cases that are currently possible that we break, I think a KEP should get written to implement that corner-case correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for reference, the CRI IPs are obtained in kuberuntime in GetPodStatus()
podIPs = m.determinePodSandboxIPs(namespace, name, podSandboxStatus) |
and it seems they are stored in the PLEG cache
kubernetes/pkg/kubelet/pleg/generic.go
Lines 407 to 419 in 571e067
status, err := g.runtime.GetPodStatus(pod.ID, pod.Name, pod.Namespace) | |
if klog.V(6).Enabled() { | |
klog.V(6).ErrorS(err, "PLEG: Write status", "pod", klog.KRef(pod.Namespace, pod.Name), "podStatus", status) | |
} else { | |
klog.V(4).ErrorS(err, "PLEG: Write status", "pod", klog.KRef(pod.Namespace, pod.Name)) | |
} | |
if err == nil { | |
// Preserve the pod IP across cache updates if the new IP is empty. | |
// When a pod is torn down, kubelet may race with PLEG and retrieve | |
// a pod status after network teardown, but the kubernetes API expects | |
// the completed pod's IP to be available after the pod is dead. | |
status.IPs = g.getPodIPs(pid, status) | |
} |
This cache is used to syncPod()
from the PodWorker
kubernetes/pkg/kubelet/pod_workers.go
Line 904 in 571e067
err = p.syncPodFn(ctx, update.Options.UpdateType, pod, update.Options.MirrorPod, status) |
and generateAPIPodStatus()
kubernetes/pkg/kubelet/kubelet.go
Line 1554 in 571e067
apiPodStatus := kl.generateAPIPodStatus(pod, podStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems there is no way to differentiate between:
- the CRI sets the IPs from CRI
- CRI does NOT return IPs, but Pod status IPs are updated incrementally,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship what do you think?
I'm happy to declare that it was a bug that we ever let the CRI set hostNetwork pod IPs, but we should probably update the CRI docs to clarify this if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the commit with the CRI API change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this sounds good from the CRI API perspective 👍
/test pull-kubernetes-e2e-kind |
why wouldn't it be? / why wouldn't we expect you to restart kubelet in that case?
So, originally, cri-o did send IPs for host network pods, and kubelet just accepted them. But this was a bad idea, because it means that if you wanted to override kubelet's I'm not sure what the historical situation with containerd was. The unit test is testing that if the CRI does set the pod IPs, that kubelet continues to behave like it did historically, so as to not break CRIs that still do that. |
Because that is the issue reported, and kubelet will race again 😄 #105320
agree, that is why I've commented here #106715 (comment)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, I think the simpler and more obvious this is, the better. I like this.
a9c536e
to
ae8c0e3
Compare
ae8c0e3
to
50af04d
Compare
@@ -445,6 +445,7 @@ message PodIP{ | |||
string ip = 1; | |||
} | |||
// PodSandboxNetworkStatus is the status of the network for a PodSandbox. | |||
// Ignored for pods sharing the host networking namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrunalp you can find the context in this thread https://github.com/kubernetes/kubernetes/pull/106715/files#r758454031
50af04d
to
ae64f5f
Compare
@thockin @danwinship I've updated the CRI API , can you please take another look? |
pkg/kubelet/kubelet_pods.go
Outdated
// If the Node changes its IPs the user has to restart the Kubelet | ||
// so the HostNetwork pods update their IPs. We may change this in a future | ||
// https://github.com/kubernetes/enhancements/pull/3000 | ||
// but since it changes the current behavior "here be dragons" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you need to explain that much. Also, you spend several sentences explaining that podIPs
is immutable, and then add code that changes it.
// HostNetwork Pods inherit the node IPs as PodIPs. They are immutable once set,
// other than that if the node becomes dual-stack, we add the secondary IP.
(though probably we should remove the second IP on dual-stack-to-single-stack downgrade too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I see my gap on the language, I try to mean that once a PodIP exist that never changes, however, you can add a new IP is not the array that is immutable (it can grow), is the element in the array that can't change once set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(though probably we should remove the second IP on dual-stack-to-single-stack downgrade too)
this can open a new set of corner cases and we don't have a good history here, per example, I think that all the dual stack services that are stored will fail in validation on a migration like this.
@@ -445,6 +445,7 @@ message PodIP{ | |||
string ip = 1; | |||
} | |||
// PodSandboxNetworkStatus is the status of the network for a PodSandbox. | |||
// Ignored for pods sharing the host networking namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance we might add new fields in the future that it wouldn't make sense to ignore for hostNetwork pods? Maybe hedge a little bit here and prepend "Currently" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point
host-network pods IPs are obtained from the reported kubelet nodeIPs. Historically, host-network podIPs are immutable once set, but when we've added dual-stack support, we didn't consider that the secondary IP address may not be present at the same time that the primary nodeIP. If a secondary IP address is added to a node after the host-network pods IPs are set, we can add the secondary host-network pod IP address maintaining the current behavior of not updating the current podIPs on host-network pods.
ae64f5f
to
c516320
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
s.PodIPs = []v1.PodIP{{IP: s.PodIP}} | ||
if len(hostIPs) == 2 { | ||
// HostNetwork Pods inherit the node IPs as PodIPs. They are immutable once set, | ||
// other than that if the node becomes dual-stack, we add the secondary IP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also needto downgrade? I feel like eventually we'll have to revisit this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign @mrunalp |
c516320
to
bc8e7ac
Compare
updated the CRI-API godoc on v1 and v1alpha2 too |
/approve |
Thanks! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, mrunalp, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…-of-#106715-upstream-release-1.23 Automated cherry pick of #106715: set secondary address on host-network pods
host-network pods IPs are obtained from the reported kubelet nodeIPs.
Historically, host-network podIPs are immutable once set, but when
we've added dual-stack support, we didn't consider that the secondary
IP address may not be present at the same time that the primary nodeIP.
If a secondary IP address is added to a node after the host-network pods
IPs are set, we can add the secondary host-network pod IP address
maintaining the current behavior of not updating the current podIPs on
host-network pods.
This is an alternative to #105410 ,
without having to change the kubelet and host-network podIPs behavior
xref: kubernetes/enhancements#3000
/kind bug
Fixes: #105320