Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Honor status.podIP over status.podIPs and node.spec.podCIDR over node.spec.PodCIDRs when mismatched #88505

Merged
merged 1 commit into from
Mar 3, 2020

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Feb 25, 2020

What type of PR is this?
/kind api-change
/kind bug

What this PR does / why we need it:

When status.podIP and status.podIPs[0] do not agree, honor status.podIP as the older API field, rather than failing to convert.

This is required to support Kubelets < 1.16 running against API servers >= 1.16.

Kubelets >= 1.16 set both podIP and podIPs:
https://github.com/kubernetes/kubernetes/blob/release-1.16/pkg/kubelet/kubelet_pods.go#L1386-L1396

Kubelets < 1.16 only set podIP:
https://github.com/kubernetes/kubernetes/blob/release-1.15/pkg/kubelet/kubelet_pods.go#L1384-L1385

However, since kubelets update pod status with a patch API request, existing unknown fields are preserved. That means a 1.15 kubelet, when trying to modify a podIP, will only update the podIP field and not podIPs.

When node.spec.podCIDR and node.spec.podCIDRs[0] do not agree, honor node.spec.podCIDR as the older API field, rather than failing to convert.

prior to 1.16, the rangeAllocator and cloudCIDRAllocator controllers updated node.spec.podCIDR using a patch:
https://github.com/kubernetes/kubernetes/blob/release-1.15/pkg/controller/nodeipam/ipam/adapter.go#L92-L106

in 1.16+, there are three methods to update node specs, all of which use patch:

These patch methods either just set the old node.spec.podCIDR field, or they set both the old and new podCIDR/podCIDRs field (for paths that are multi-CIDR-aware).

Which issue(s) this PR fixes:
Fixes #88494

Does this PR introduce a user-facing change?:

Fixes a regression with clients prior to 1.15 not being able to update podIP in pod status, or podCIDR in node spec, against >= 1.16 API servers

/cc @thockin @khenidak
/sig network

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 25, 2020
@liggitt liggitt changed the title Honor status.podIP over status.podIPs Honor status.podIP over status.podIPs when mismatched Feb 25, 2020
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Feb 25, 2020
@liggitt liggitt added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Feb 25, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 25, 2020
@liggitt
Copy link
Member Author

liggitt commented Feb 25, 2020

/retest

@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@gongguan
Copy link
Contributor

gongguan commented Feb 25, 2020

naive question:
If honor status.podIP while status.podIP and status.podIPs[0] do not agree, should https://github.com/kubernetes/kubernetes/blob/release-1.16/pkg/kubelet/kubelet_pods.go#L1395 be apiPodStatus.PodIP = pod.Status.PodIP?

@liggitt
Copy link
Member Author

liggitt commented Feb 25, 2020

If honor status.podIP while status.podIP and status.podIPs[0] do not agree, should https://github.com/kubernetes/kubernetes/blob/release-1.16/pkg/kubelet/kubelet_pods.go#L1395 be apiPodStatus.PodIP = pod.Status.PodIP?

No, the kubelet is the writer, and that function is generating the API status it should set. This PR is fixing the API server's treatment of requests made by older kubelets that weren't aware of the PodIPs field

@liggitt
Copy link
Member Author

liggitt commented Feb 25, 2020

/retest
storage node pod capacity flake

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on this.

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md (search for "plural")

Upon any mutating API operation:
  * If only the singular field is specified (e.g. an older client), API logic
    must populate plural[0] from the singular value, and de-dup the plural
    field.
  * If only the plural field is specified (e.g. a newer client), API logic must                                                                                  
    populate the singular value from plural[0].
  * If both the singular and plural fields are specified, API logic must
    validate that the singular value matches plural[0].
  * Any other case is an error and must be rejected.

Clearly that isn't correct in the face of patch.

This would allow someone to send a PUT with podIP: X and podIPs: [ Y, Z ] which silently nukes podIPs, which seems wrong (I think?).

Would it be saner to do this silent change ONLY when dealing with some sort of patch? Is that even possible to know?

Either way, the documented bits are incorrect...

@liggitt
Copy link
Member Author

liggitt commented Feb 26, 2020

patch is indistinguishable from put that left the unpatched fields unmodified.

if you want to do delta-specific behavior, you need the old object, which means you cannot do it in conversion, and would need to represent both the plural and singular field in the internal type and deal with the logic in the rest handler PrepareForUpdate function

@thockin
Copy link
Member

thockin commented Feb 26, 2020 via email

@liggitt
Copy link
Member Author

liggitt commented Mar 1, 2020

/retest

@liggitt
Copy link
Member Author

liggitt commented Mar 1, 2020

/test pull-kubernetes-conformance-image-test

@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

The referenced guidance assumed the ability to detect change in a field coming from a client (emphasis added) relative to the currently persisted value:

For this purpose "is specified" means the following:

  • On a create or patch operation: the field is present in the user-provided input
  • On an update operation: the field is present and has changed from the current value

The place that has visibility to the old and new pod on updates is working with internal objects, at which point the information about the plural/singular field content has been lost in conversion (the internal object only has the plural field). That means we cannot follow that guidance with the internal pod types as they stand today.

Since this impacts upgrades to 1.16, and the fix here will need to be picked back 2 releases, I'm inclined to resolve this bug in favor of old kubelet clients, and make the singular podIP rule in case of mismatch, rather than try to rework the pod internal types and conversion.

@thockin
Copy link
Member

thockin commented Mar 2, 2020

I guess I don't see a choice. @khenidak for last confirm

@thockin
Copy link
Member

thockin commented Mar 2, 2020

/lgtm
/hold

@khenidak PTAL or we will remove the hold . We'll have to make the same change on nodeIPs. Anywhere else?

EDIT: nodeIPs is not a thing :) I swear we had 2 cases we had to handle?

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 2, 2020
@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

looking at our handling of podCIDR/podCIDRs, which looks like it has a similar problem

@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

prior to 1.16, the rangeAllocator and cloudCIDRAllocator controllers updated node.spec.podCIDR using a patch:

func (a *adapter) UpdateNodePodCIDR(ctx context.Context, node *v1.Node, cidrRange *net.IPNet) error {
patch := map[string]interface{}{
"apiVersion": node.APIVersion,
"kind": node.Kind,
"metadata": map[string]interface{}{"name": node.Name},
"spec": map[string]interface{}{"podCIDR": cidrRange.String()},
}
bytes, err := json.Marshal(patch)
if err != nil {
return err
}
_, err = a.k8s.CoreV1().Nodes().Patch(node.Name, types.StrategicMergePatchType, bytes)
return err
}

in 1.16+, there are three methods to update node specs, all of which use patch:

// PatchNodeCIDRs patches the specified node.CIDR=cidrs[0] and node.CIDRs to the given value.
func PatchNodeCIDRs(c clientset.Interface, node types.NodeName, cidrs []string) error {
rawCidrs, err := json.Marshal(cidrs)
if err != nil {
return fmt.Errorf("failed to json.Marshal CIDRs: %v", err)
}
rawCidr, err := json.Marshal(cidrs[0])
if err != nil {
return fmt.Errorf("failed to json.Marshal CIDR: %v", err)
}
// set the pod cidrs list and set the old pod cidr field
patchBytes := []byte(fmt.Sprintf(`{"spec":{"podCIDR":%s , "podCIDRs":%s}}`, rawCidr, rawCidrs))

// PatchNodeCIDR patches the specified node's CIDR to the given value.
func PatchNodeCIDR(c clientset.Interface, node types.NodeName, cidr string) error {
raw, err := json.Marshal(cidr)
if err != nil {
return fmt.Errorf("failed to json.Marshal CIDR: %v", err)
}
patchBytes := []byte(fmt.Sprintf(`{"spec":{"podCIDR":%s}}`, raw))

func (a *adapter) UpdateNodePodCIDR(ctx context.Context, node *v1.Node, cidrRange *net.IPNet) error {
patch := map[string]interface{}{
"apiVersion": node.APIVersion,
"kind": node.Kind,
"metadata": map[string]interface{}{"name": node.Name},
"spec": map[string]interface{}{"podCIDR": cidrRange.String()},
}
bytes, err := json.Marshal(patch)
if err != nil {
return err
}
_, err = a.k8s.CoreV1().Nodes().Patch(node.Name, types.StrategicMergePatchType, bytes)

These patch methods either just set the old node.spec.podCIDR field, or they set both the old and new podCIDR/podCIDRs field (for paths that are multi-CIDR-aware).

That means our conversion handling of podCIDR/podCIDRs must follow the same approach as podIP/podIPs: if the singular and plural[0] disagree, assume we're dealing with an old client and let the singular rule.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 2, 2020
@liggitt liggitt changed the title Honor status.podIP over status.podIPs when mismatched Honor status.podIP over status.podIPs and node.spec.podCIDR over node.spec.PodCIDRs when mismatched Mar 2, 2020
@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

Updated with identical handling and tests for podCIDR/podCIDRs

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thockin
Copy link
Member

thockin commented Mar 2, 2020 via email

@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

I'll make sure we have clean CI signal on this and all the cherry-picks, then plan to unhold

@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

/retest

@liggitt
Copy link
Member Author

liggitt commented Mar 2, 2020

picks are green.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 2, 2020
@k8s-ci-robot k8s-ci-robot merged commit f221dbb into kubernetes:master Mar 3, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Mar 3, 2020
@liggitt liggitt deleted the pod-ip-patch branch March 9, 2020 12:52
k8s-ci-robot added a commit that referenced this pull request Mar 9, 2020
…5-upstream-release-1.16

Automated cherry pick of #88505: Honor status.podIP over status.podIPs and node.spec.podCIDR over node.spec.PodCIDRs when mismatched
k8s-ci-robot added a commit that referenced this pull request Mar 10, 2020
…5-upstream-release-1.17

Automated cherry pick of #88505: Honor status.podIP over status.podIPs and node.spec.podCIDR over node.spec.PodCIDRs when mismatched
@aojea aojea mentioned this pull request Oct 20, 2020
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Daemonset pod status PATCH errors after upgrading cluster from 1.14 to 1.16
6 participants