-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nil pointer dereference fix #1031
Conversation
- Formats a key composition for checking a map correctly. Was trying to access map using node name as a key when sharing idle by node, but needs to access using <cluster>/<node> - Checks to make sure that the key is present in allocTotalsAfterFilters, as well as allocTotals. I cannot fathom what would cause a key to be missing from this map, but that seems to be the error reported, and I do know that a missing key likely means that the subsequent coefficient we are attempting to compute should simply be 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few clarifying comments about the changes
key = ia.Properties.Node | ||
key = fmt.Sprintf("%s/%s", ia.Properties.Cluster, ia.Properties.Node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was the cause of the reported error. cluster/node
is used in all of the other key creation in this function if idleByNode
is true. Here, it was just node
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, yes, good catch.
filteredAlloc, ok := allocTotalsAfterFilters[key] | ||
if ok { | ||
cpuFilterCoeff = filteredAlloc.CPUCost / allocTotals[key].CPUCost | ||
} else { | ||
cpuFilterCoeff = 0.0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unable to create a situation where allocTotals
had a key for an idle allocation that allocTotalsAfterFilters
did not. Planning to ask Niko about it when he gets back, because maybe it's possible.
In any case, the initial stack trace we saw lead us to believe that key
may be present in allocTotals
but not allocTotalsAfterFilters
. If that is the case, then I believe the allocation in question would have been filtered out, and the correct action is to set this ratio to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to talk about this one if you want to. In general, the fix looks good 😎
if allocTotals[key].RAMCost > 0.0 { | ||
gpuFilterCoeff = allocTotalsAfterFilters[key].RAMCost / allocTotals[key].RAMCost | ||
if allocTotals[key].GPUCost > 0.0 { | ||
filteredAlloc, ok := allocTotalsAfterFilters[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpuCoeff was being determined using RAMKeys, which I think is a typo or a copy/paste miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow good catch! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the overview in review comments, it helped a lot.
Thanks for the review, @michaelmdresser ! @AjayTripathy this PR is against |
What does this PR change?
Fixes a comment typo.
Causes GPU idle coefficient to be calculated with GPU costs instead of RAM costs.
Formats a key composition for checking a map correctly. Was trying to
access map using node name as a key when sharing idle by node, but
needs to access using <cluster>/<node>.
Checks to make sure that the key is present in
allocTotalsAfterFilters, as well as allocTotals. I cannot fathom what
would cause a key to be missing from this map, but that seems to be the
error reported, and I do know that a missing key likely means that the
subsequent coefficient we are attempting to compute should simply be 0.
Does this PR rely on any other PRs?
No
How does this PR impact users? (This is the kind of thing that goes in release notes!)
Fixes a bug where requesting an allocation summary with
shareIdle=false&idleByNode=true
would crash the pod.Also guards against another potential nil pointer dereference in the case where two maps are desynced.
Links to Issues or ZD tickets this PR addresses or fixes
N/A
How was this PR tested?
An integration test is being written in our integration test suite.
Additionally, manually by querying
/allocation/summary
with different combinations of filters, shareIdle, and idleByNode options. Notably:shareIdle
andidleByNode
true/false valuesHave you made an update to documentation?
No