dra: resourceclass missing #120213
Conversation
Because of a misplaced `append` (it should have been inside the if clause, not after it), a handler from a previous loop iteration was added again. This was harmless, because the resulting slice was only used for waiting for cache sync, but it is better to fix it anyway.
/assign @sanposhiho
@@ -272,6 +272,8 @@ func (pl *dynamicResources) EventsToRegister() []framework.ClusterEventWithHint
 	// A resource might depend on node labels for topology filtering.
 	// A new or updated node may make pods schedulable.
 	{Event: framework.ClusterEvent{Resource: framework.Node, ActionType: framework.Add | framework.UpdateNodeLabel}},
+	// A pod might be waiting for a class to get created or modified.
+	{Event: framework.ClusterEvent{Resource: framework.ResourceClass, ActionType: framework.Add | framework.Update}},
In which scenario should rejected Pods be moved from the unschedulable queue to backoffQ/activeQ via Update events? I'm wondering whether it's OK to have only the Add event here.
The node filter in the class might change. I have a hunch that this can lead to an unschedulable pod (no suitable nodes at all) that gets stuck without this Update event, even today (i.e. a bug).
Let me write a test for this scenario.
It's as I suspected: the new test hangs when the Update event is not registered.
/triage accepted
When filtering fails because a ResourceClass is missing, we can treat the pod as "unschedulable" as long as we then also register a cluster event that wakes up the pod. This is more efficient than periodically retrying.
@sanposhiho: can you take another look? The PR should be complete now, and I fixed the gofmt issue.
/retest
/lgtm
/approve
/retest
LGTM label has been added. Git tree hash: 2179a2b696669f856adf43eb79f16dab6182acba
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: pohly, sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest-required
/retest
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
Special notes for your reviewer:
Includes one other drive-by fix for event registration.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: