-
Notifications
You must be signed in to change notification settings - Fork 41.9k
memorymanager: fix checkpoint file comparison #127074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memorymanager: fix checkpoint file comparison #127074
Conversation
|
Hi @Tal-or. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
6f18f11 to
fddd074
Compare
|
/ok-to-test |
fddd074 to
dc2170e
Compare
|
/retest |
ffromani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provisional LGTM (need to do another pass).
The ordering of allocation should not matter indeed, only the totals should matter
ffromani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provisional LGTM
since we cannot guarantee we recreate pods in the very same ordering across restarts, there are multiple legal representations (even including pinning) of the same memory state, thus we should check the aggregate values
…start (cherry picked from commit b91951f)
(cherry picked from commit de03335)
(cherry picked from commit 91a9a19)
For a resource within a group, such as memory, we should validate the total `Free` and total `Reserved` size of the expected `machineState` and state restored from checkpoint file after kubelet start. If total `Free` and total `Reserved` are equal, the restored state is valid. The old comparison however was done by reflection. There're times when the memory accounting is equals but the allocations across the NUMA nodes are varies. In such cases we still need to consider the states as equals. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
4459d6b to
18696da
Compare
perform the memoryStates comparison in helper function Signed-off-by: Talor Itzhak <titzhak@redhat.com>
18696da to
d64f34e
Compare
ffromani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
LGTM label has been added. Git tree hash: b5f370a652a9e22d8f83920885c9f39569958492
|
|
/test pull-kubernetes-node-kubelet-serial-containerd |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrunalp, Tal-or The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
For a resource within a group, such as memory,
we should validate the total
Freeand totalReservedsize of the expectedmachineStateand state restored from checkpoint file after kubelet start.If total
Freeand totalReservedare equal, the restored state is valid.The old comparison however was done by reflection.
There're times when the memory accounting is equals
but the allocations across the NUMA nodes are varies.
In such cases we still need to consider the states as equals.
Which issue(s) this PR fixes:
Fixes ##113130
Special notes for your reviewer:
This PR use as a replacement for #114501 which is not active for a long time now and is needed for MemoryManager GA graduation.
see: kubernetes/enhancements#1769 (comment)
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: