ui: explain why alloc is failing #17213
case this.task.code.errors.length > 0:
  return "there was an error in the task or service's code.";
Will this prevent reschedule, or simply throw an error but otherwise continue on? Asking because I want to make sure that we're giving the right cause when we say "stopped .... because of ________"
This is a derived state computation that gives the reason an allocation has stopped rescheduling, shown in the allocations.allocation.index view. It is only shown when the current allocation (the one associated with the view) has stopped rescheduling.
See the invocation in the template layer for usage details: this is only applied to the top-level allocation on the reschedule timeline, because the rest of the items in the timeline are RescheduleEventRow, which only provides links and no text.
  'task.{config.errors.length,code.errors.length}'
)
get failureReason() {
  switch (this.hasStoppedRescheduling) {
For my context, this field value we're reading is from this function, right?:
get hasStoppedRescheduling() {
  return (
    !this.get('nextAllocation.content') &&
    !this.get('followUpEvaluation.content') &&
    this.clientStatus === 'failed'
  );
}
If so, it doesn't seem like this field is relevant to determining why the allocation is failing, where you only care about client status.
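To make the distinction in this comment concrete, here is a minimal sketch that restates the quoted getter as a plain function next to a client-status-only check. The plain-object shape and the `hasFailed` helper are hypothetical stand-ins for illustration, not the Ember model's actual API.

```javascript
// Hypothetical plain-object stand-in for the Ember allocation model.
// Mirrors the three conditions of the quoted hasStoppedRescheduling getter.
function hasStoppedRescheduling(alloc) {
  return (
    !alloc.nextAllocation &&
    !alloc.followUpEvaluation &&
    alloc.clientStatus === 'failed'
  );
}

// Hypothetical helper: looks only at the client status.
function hasFailed(alloc) {
  return alloc.clientStatus === 'failed';
}

// A failed allocation that already has a replacement.
const rescheduled = {
  clientStatus: 'failed',
  nextAllocation: { id: 'next' },
  followUpEvaluation: null,
};

// A failed allocation with no replacement and no follow-up evaluation.
const terminal = {
  clientStatus: 'failed',
  nextAllocation: null,
  followUpEvaluation: null,
};

console.log(hasFailed(rescheduled), hasStoppedRescheduling(rescheduled)); // true false
console.log(hasFailed(terminal), hasStoppedRescheduling(terminal)); // true true
```

Both allocations have failed, but only the second has stopped rescheduling, so the two questions need separate checks.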
Mmmmm.... good call.
)
get failureReason() {
  switch (this.hasStoppedRescheduling) {
    case this.node.status === 'failed':
There's no "failed" node status. See structs.go#L1866-L1871
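As a rough illustration of why that comparison can never be true, the sketch below checks a status against a list of node status strings. The list is written from memory of Nomad's constants and is an assumption; verify it against the structs.go lines linked above. `NODE_STATUSES` and `isKnownNodeStatus` are hypothetical names.

```javascript
// ASSUMPTION: status strings recalled from Nomad's structs.go, not copied
// from the linked lines. Check the source before relying on this list.
const NODE_STATUSES = Object.freeze([
  'initializing',
  'ready',
  'down',
  'disconnected',
]);

// Hypothetical helper: true only for a status present in the list above.
function isKnownNodeStatus(status) {
  return NODE_STATUSES.includes(status);
}

console.log(isKnownNodeStatus('failed')); // false
console.log(isKnownNodeStatus('down')); // true
```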
case this.resources.length === 0:
  return 'the resources that the allocation was scheduled on were not available.';
If the allocation has no resources, it wouldn't have been placed in the first place, right? I don't think we can actually see this case.
case this.task.config.errors.length > 0:
  return "the task or service's configuration was incorrect.";
case this.task.code.errors.length > 0:
  return "there was an error in the task or service's code.";
I don't think we know this -- we only know that the task returned errors. It could be a bug in the user's code, an infrastructure outage, networking failure, etc, etc and all of that is owned by the application (not Nomad). It'd probably be handy to link to where we show the task events here though.
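A side note on the shape of the getter under review: it switches on a boolean and uses conditions as case labels. JavaScript compares the switch value to each case expression with strict equality, so which branch runs depends on the flag as well as the conditions. The sketch below shows that behavior in isolation; `firstMatch` and its arguments are hypothetical and not part of the PR.

```javascript
// Hypothetical helper that isolates the `switch (flag) { case cond: }` shape.
// Each case label is compared to `flag` using strict equality (===).
function firstMatch(flag, condA, condB) {
  switch (flag) {
    case condA:
      return 'A';
    case condB:
      return 'B';
    default:
      return 'none';
  }
}

// flag is true: the first condition that is exactly true wins.
console.log(firstMatch(true, false, true)); // B

// flag is false: the first condition that is exactly false wins.
console.log(firstMatch(false, false, true)); // A

// Truthy non-boolean values never match a boolean flag.
console.log(firstMatch(true, 1, 'yes')); // none
```

With a false flag, the first branch whose condition is false is the one that runs, which is usually not what is intended. An if/else chain, or `switch (true)`, avoids that.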
@ChaiWithJai this doesn't seem like it actually solves the problem described in #16942. The description you have here (and the code) tells us why the allocation failed, not why the scheduler stopped scheduling it.
Yeah... looks like I conflated the concepts. There doesn't appear to be a solution to this problem. There's no follow-up evaluation to link to in the evaluations view, and we can't predictably compute derived state to give the user a better explanation.
Resolves #16942
This PR creates a new derived state property on the Allocation model to show why an allocation has stopped rescheduling. It also updates the template to give a better reason for why the allocation has stopped rescheduling and to link to the follow-up evaluation so the user can debug what's happening.