Fix bug in TopologyManager with merging hints when NUM_NUMA > 2 #108052
/sig node
/assign @fromanirh @swatisehgal
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: klueska
Before this fix, hint permutations such as:
permutation: [{11 true} {0101 true}]
Could result in merged hints of:
mergedHint: {01 true}
This was possible because both hints in the permutation contain a "preferred" allocation (i.e. the full set of NUMA nodes set in the affinity bitmask is *required* to satisfy the allocation). With this in place, the simplified logic we had simply kept the merged hint as preferred as well.
However, what we really want is to ensure that the merged hint is only preferred if *true* alignment of all resources is possible (i.e. if all hints in the permutation are preferred AND their affinities are exactly equal).
The only exception to this is if *no* topology information is provided by a given hint provider. In this case, we assume alignment doesn't matter and only consider the resources that actually have hints provided for them.
This changes the semantics of permutations of the form:
permutation: [{111 true} {011 true}]
To now result in the merged hint of:
mergedHint: {011 false}
Instead of:
mergedHint: {011 true}
This is arguably how it should always have been though (because a hint should not be preferred if true alignment isn't possible), and two tests have had to change to accommodate these new semantics.
This commit changes the merge function to implement the updated logic, adds a test to verify it is functioning correctly, and updates the two tests mentioned above to adjust to the new semantics.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
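The affinities in these hints are NUMA bitmasks, and merging a permutation intersects them. As a small illustration, the Go sketch below decodes the masks and ANDs them, showing why [{11 true} {0101 true}] collapses to an affinity of 01: NUMA nodes {0,1} intersected with {0,2} leave only node 0. It assumes the rightmost digit is NUMA node 0 (the examples are consistent with that reading, but the notation is not spelled out here), and `parseMask` and `nodes` are hypothetical helpers written only for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMask converts a mask written as in the examples above (e.g. "0101")
// into an integer. Assumption: the rightmost digit is NUMA node 0.
func parseMask(s string) uint64 {
	v, err := strconv.ParseUint(s, 2, 64)
	if err != nil {
		panic(err)
	}
	return v
}

// nodes lists the NUMA node IDs whose bits are set in mask.
func nodes(mask uint64) []int {
	var ids []int
	for i := 0; i < 64; i++ {
		if mask&(1<<uint(i)) != 0 {
			ids = append(ids, i)
		}
	}
	return ids
}

func main() {
	// Merging a permutation intersects the affinities with a bitwise AND.
	a, b := parseMask("11"), parseMask("0101")
	fmt.Printf("%v AND %v -> %v\n", nodes(a), nodes(b), nodes(a&b)) // [0 1] AND [0 2] -> [0]

	c, d := parseMask("111"), parseMask("011")
	fmt.Printf("%v AND %v -> %v\n", nodes(c), nodes(d), nodes(c&d)) // [0 1 2] AND [0 1] -> [0 1]
}
```

In both cases the intersection is non-empty, so the affinity alone says nothing about whether every resource could actually be aligned; that is what the preferred flag has to capture.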
Branch updated from 3e35197 to 155562d.
We may need to follow up to ensure that "better" hints are now generated in the case of the
The only thing that is not obvious to me is:
I thought that … Besides that, LGTM. I'll have another pass shortly and most likely add the tag.
If there are any concerns at all with changing those semantics, then we shouldn't rush things. To answer your question though, imagine this set of possible permutations:
With the old logic, the first will result in a merged hint of:
And the second will result in:
The one we want is the first one (i.e. … The reason we prefer
Thanks, this helps. Your example makes a lot of sense. I was thinking about more general cases and I'm realizing more and more that I was a bit blindsided by the short-circuit of the approximation I mentioned above. Your change fixes a clear bug and the end behaviour, albeit perhaps a bit less intuitive, does indeed look more correct. LGTM, but putting a hold to let other reviewers chime in. Feel free to remove it when you see fit.
/lgtm
The other option (to keep the original semantics) would be to not check for equality of all hints in the permutation to decide if the merged hint should be … The implications of this, though, are that there is no way to know for sure that perfect alignment was satisfied (which is what the
I think the new semantics are the more "correct" way of interpreting the merged hints, but (as I mentioned before) it means that we will almost always end up with a … While technically correct (because it's best-effort), we could do better if more information were encoded into the set of … For example:
We would actually rather have the first, but the second will be chosen.
I was having a hard time breaking the mental model too, given that we always preferred a narrower selection, but based on the discussion above I think it makes sense: we are optimizing the selection of the merged hint for the best NUMA alignment of resources. Thanks for the detailed explanation @klueska. I think the explanation in the comment section (#108052 (comment) and #108052 (comment)) captures the rationale behind this change much better, so it would be good to capture that in the PR description as well as the commit message. Even though this is a Topology Manager internal semantic change, I am conscious that Topology Manager is a Beta feature and a changelog entry might not be sufficient here. I sense that this change could cause confusion about the expected behavior, as the way hints are merged ultimately influences the NUMA node selection. It will invalidate the content in the Topology Manager blog post, which I assume is a reference point for anyone looking to understand the internal workings of Topology Manager. Even though there is a note indicating that the article is outdated, we should do our due diligence and communicate this change to the rest of the SIG Node community. Sending an email to the SIG Node mailing list would be a good starting point.
/lgtm
I'm happy to send an email update, but this truly is a bug fix (so anyone relying on the old semantics for some reason really shouldn't have been). In terms of rejecting or admitting pods, this semantic change only affects what pods will now be allowed in by the … The only "incorrect" info in the blog now is the details of the
/hold cancel
Thanks @klueska. Totally understand and agree that this fixes a bug, just trying to be proactive here.
I've decided to hold off on cherry-picking these until we can see if there is a good way to do what I mention in #108052 (comment) about favoring one unpreferred allocation over another. |
I've updated the PR description as suggested and will send out a summary email of the changes once we merge the follow-up PR here: #108154
What type of PR is this?
/kind bug
What this PR does / why we need it:
Before this fix, hint permutations such as:
permutation: [{11 true} {0101 true}]
Could result in merged hints of:
mergedHint: {01 true}
This was possible because both hints in the permutation contain a "preferred"
allocation (i.e. the full set of NUMA nodes set in the affinity bitmask is
required to satisfy the allocation). With this in place, the simplified logic
we had simply kept the merged hint as preferred as well.
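A minimal Go sketch of the pre-fix behaviour as described in the paragraph above, assuming merging is a bitwise AND of the affinities starting from "all NUMA nodes", and that the merged hint stays preferred whenever every input hint is preferred. The `hint` type and `mergeOld` are hypothetical simplifications, not the kubelet's `TopologyHint` or its actual merge function.

```go
package main

import "fmt"

// hint is a hypothetical, simplified stand-in for a topology hint.
type hint struct {
	affinity  uint64 // bit i set => NUMA node i
	preferred bool
}

// mergeOld sketches the old rule: intersect all affinities and keep the
// result preferred as long as every input hint is preferred, without
// checking whether the affinities actually line up.
func mergeOld(perm []hint) hint {
	merged := hint{affinity: ^uint64(0), preferred: true} // start from all NUMA nodes
	for _, h := range perm {
		merged.affinity &= h.affinity
		merged.preferred = merged.preferred && h.preferred
	}
	return merged
}

func main() {
	m := mergeOld([]hint{{0b11, true}, {0b0101, true}})
	fmt.Printf("{%02b %v}\n", m.affinity, m.preferred) // {01 true}
}
```

The result claims a preferred allocation on node 0 even though the first resource wanted nodes {0,1} and the second {0,2}, so neither is truly aligned with the other.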
However, what we really want is to ensure that the merged hint is only
preferred if true alignment of all resources is possible (i.e. if all hints
in the permutation are preferred AND their affinities are exactly equal).
The only exception to this is if no topology information is provided by a
given hint provider. In this case, we assume alignment doesn't matter and only
consider the resources that actually have hints provided for them.
This changes the semantics of permutations of the form:
permutation: [{111 true} {011 true}]
To now result in the merged hint of:
mergedHint: {011 false}
Instead of:
mergedHint: {011 true}
This is arguably how it should always have been though (because a hint should
not be preferred if true alignment isn't possible), and two tests have had to
change to accommodate these new semantics.
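The updated rule can be sketched the same way. In this hypothetical simplification (not the upstream merge function), a nil affinity models a provider that supplied no topology information: such hints are skipped when intersecting and when comparing affinities. Its preferred flag is still counted, which is an assumption based on the "all hints in the permutation are preferred" wording. The `hint` type, `mask` helper and `mergeNew` are illustrative names only.

```go
package main

import "fmt"

// hint is a hypothetical, simplified stand-in for a topology hint.
// A nil affinity means the provider gave no topology information.
type hint struct {
	affinity  *uint64 // bit i set => NUMA node i
	preferred bool
}

// mask is a small helper for building non-nil affinities.
func mask(v uint64) *uint64 { return &v }

// mergeNew sketches the new rule: the merged hint is preferred only if every
// hint is preferred AND all non-nil affinities are exactly equal.
func mergeNew(perm []hint) (uint64, bool) {
	merged := ^uint64(0) // start from "all NUMA nodes"
	preferred := true
	var first *uint64 // first non-nil affinity, used for the equality check
	for _, h := range perm {
		if !h.preferred {
			preferred = false
		}
		if h.affinity == nil {
			continue // no topology info: alignment doesn't matter for this resource
		}
		merged &= *h.affinity
		if first == nil {
			first = h.affinity
		} else if *h.affinity != *first {
			preferred = false // affinities differ, so true alignment is impossible
		}
	}
	return merged, preferred
}

func main() {
	cases := [][]hint{
		{{mask(0b11), true}, {mask(0b0101), true}},
		{{mask(0b111), true}, {mask(0b011), true}},
		{{mask(0b01), true}, {nil, true}},
	}
	for _, perm := range cases {
		a, p := mergeNew(perm)
		fmt.Printf("{%03b %v}\n", a, p)
	}
}
```

Running it prints {001 false}, {011 false} and {001 true}: the first two permutations are no longer preferred because their affinities differ, while the third ignores the hint that carries no topology information.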
This new algorithm also ensures that the first merged hint from the set of
generated preferred hints below gets chosen (the old algorithm would
have naively chosen the second because it was narrower).
This PR changes the merge function to implement the updated logic, adds a
test to verify it is functioning correctly, and updates the two tests mentioned
above to adjust to the new semantics.
A follow-up PR updates the logic to ensure that the (now larger) set of
non-preferred hints are prioritized in a less naive fashion: #108154
For example, it will ensure that the first hint below is chosen instead of the second
(which the existing naive algorithm will choose):
Does this PR introduce a user-facing change?