kubelet: memory manager: fix preferred topology hints calculation #104689


cynepco3hahue

What type of PR is this?

/kind bug

What this PR does / why we need it:

Prevent pods whose resources can be satisfied by a single NUMA node from being started on multiple NUMA nodes.
The code returned early, before it updated the minimal number of NUMA nodes that can satisfy the container's requests.

Which issue(s) this PR fixes:

Fixes #104682

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

@k8s-ci-robot added the release-note-none, kind/bug, size/L, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-priority labels Aug 31, 2021
@cynepco3hahue
Author

/sig node

@k8s-ci-robot added the area/kubelet and sig/node labels and removed the do-not-merge/needs-sig label Aug 31, 2021
@cynepco3hahue
Author

/assign @klueska

@cynepco3hahue
Author

/test pull-kubernetes-unit

@cynepco3hahue
Author

/test

@k8s-ci-robot
Contributor

@cynepco3hahue: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-files-remake
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-no-stage
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-ubuntu
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd-canary
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-bazel-build-canary
  • /test pull-kubernetes-bazel-test-canary
  • /test pull-kubernetes-bazel-test-integration-canary
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-node-e2e-alpha
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-large-performance
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-e2e-gce-large-performance-canary
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-verify-govet-levee
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-e2e-ipvs-azure-dualstack
  • /test pull-kubernetes-e2e-iptables-azure-dualstack
  • /test pull-kubernetes-e2e-gce-kubetest2
  • /test pull-kubernetes-e2e-gce-alpha-features
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-cross
  • /test check-dependency-stats
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate
  • /test pull-kubernetes-e2e-aks-engine-windows-dockershim
  • /test pull-kubernetes-e2e-aks-engine-windows-containerd
  • /test pull-kubernetes-e2e-aks-engine-azure-disk-windows-dockershim
  • /test pull-kubernetes-e2e-aks-engine-azure-file-windows-dockershim
  • /test pull-kubernetes-e2e-capz-windows-dockershim
  • /test pull-kubernetes-e2e-aks-engine-gpu-windows-dockershim
  • /test pull-kubernetes-e2e-aks-engine-azure-disk-windows-containerd
  • /test pull-kubernetes-e2e-aks-engine-azure-file-windows-containerd
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-ha-control-plane
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-node-e2e
  • /test pull-kubernetes-node-e2e-podutil
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-kubelet-serial
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-eviction
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-memoryqos-cgrpv2
  • /test pull-kubernetes-node-swap-ubuntu
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-iscsi
  • /test pull-kubernetes-e2e-gce-iscsi-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-windows-gce

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce-ubuntu-containerd
  • pull-kubernetes-integration
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-unit
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-e2e-gce-100-performance
  • pull-kubernetes-typecheck
  • pull-kubernetes-verify-govet-levee
  • pull-kubernetes-verify

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cynepco3hahue
Author

/test pull-kubernetes-node-kubelet-serial-memory-manager

@k8s-ci-robot
Contributor

k8s-ci-robot commented Aug 31, 2021

@cynepco3hahue: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kubernetes-node-kubelet-serial-memory-manager | 9ea9798 | link | | /test pull-kubernetes-node-kubelet-serial-memory-manager |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pacoxu
Member

pacoxu commented Sep 1, 2021

/triage accepted
/priority important-longterm

@k8s-ci-robot added the triage/accepted and priority/important-longterm labels and removed the needs-triage and needs-priority labels Sep 1, 2021
@pacoxu added this to Needs Reviewer in SIG Node PR Triage Sep 1, 2021
Contributor

@matthyx left a comment


/lgtm

@k8s-ci-robot added the lgtm label Sep 2, 2021
@cynepco3hahue
Author

The memory manager lane fails because of the lack of a Dynamic Kubelet feature gate.

@249043822
Member

This change calculates the resources first to ensure that minAffinitySize is set correctly. Looks nice.

@ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Sep 3, 2021
@cynepco3hahue changed the title from "kubelet: memory manager: fix topology preferred topology hints calculation" to "kubelet: memory manager: fix preferred topology hints calculation" Sep 14, 2021
@klueska
Contributor

klueska commented Oct 1, 2021

Can you explain this change a bit better? It's not immediately obvious from the code change what the old semantics were vs. what the desired new semantics should be, which makes it hard to verify that the new semantics are correct.

@cynepco3hahue
Copy link
Author

cynepco3hahue commented Oct 3, 2021

> Can you explain this change a bit better? It's not immediately obvious from the code change what the old semantics were vs. what the desired new semantics should be, which makes it hard to verify that the new semantics are correct.

Sure, I will try to explain it better. The problem is that during bitmask.IterateBitMasks, while we check all possible combinations of NUMA nodes, we can return from the callback for the current combination before updating minAffinitySize. Then, when we calculate whether the hints are preferred, we can get the wrong result because minAffinitySize is wrong. The PR only slightly reorganizes the code to make sure that minAffinitySize is always updated for every combination that can satisfy the requests.

@klueska Please let me know if you need more details.

Comment on lines -441 to -445
// the node already in group with another node, it can not be used for the single NUMA node allocation
if singleNUMAHint && len(machineState[maskBits[0]].Cells) > 1 {
	return
}

Contributor


I see. So basically you move this call below the loop (so we don't break out early), and then repurpose the loop to only update minAffinitySize (if possible).

Then, if you make it through this loop (with a new minAffinity or one equal to a previously calculated minAffinity), only then do you go through the process of calculating the set of hints using (basically) the same logic you did before.

I didn't verify in this review that the existing logic is necessarily correct (I assume we did that in a previous PR), but the minor refactoring here seems to make sense.

@klueska
Contributor

klueska commented Oct 5, 2021

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue, klueska, matthyx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Oct 5, 2021
@k8s-ci-robot merged commit c91f9bd into kubernetes:master Oct 5, 2021
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 5, 2021
@k8s-ci-robot added this to the v1.23 milestone Oct 5, 2021
Labels
approved, area/kubelet, cncf-cla: yes, kind/bug, lgtm, priority/important-longterm, release-note-none, sig/node, size/L, triage/accepted
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Memory Manager allows Guaranteed QoS Pod with hugepages requested is exactly equal to the left over Hugepages
6 participants