Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: Use TaskStatus Err field for non-terminal errors #2287

Merged
merged 1 commit into from
Oct 9, 2017

Conversation

aaronlehmann
Copy link
Collaborator

There are some cases when a task can't advance from a particular state because preconditions are not met. For example, if no nodes meet its constraints, it will not advance to "assigned".

Currently, we put a note about this in the Message field of TaskStatus, but this is not surfaced to the user. It wouldn't make sense to expose Message prominently because it usually contains uninteresting notes about how the task reached that state. Its presence does not indicate that something is wrong.

Expand the scope of Err to also cover non-terminal errors that are blocking the task from progressing, and use it in those cases.

Ideally we would also use this to surface IP allocation failures that block a task in "new", but the current design of the allocator would loop if we updated a task on failure. This might be something for a followup.

cc @aluzzardi @stevvooe

There are some cases when a task can't advance from a particular state
because preconditions are not met. For example, if no nodes meet its
constraints, it will not advance to "assigned".

Currently, we put a note about this in the Message field of TaskStatus,
but this is not surfaced to the user. It wouldn't make sense to expose
Message prominently because it usually contains uninteresting notes
about how the task reached that state. Its presence does not indicate
that something is wrong.

Expand the scope of Err to also cover non-terminal errors that are
blocking the task from progressing, and use it in those cases.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@codecov
Copy link

codecov bot commented Jun 27, 2017

Codecov Report

Merging #2287 into master will increase coverage by 0.03%.
The diff coverage is 92.85%.

@@            Coverage Diff            @@
##           master   #2287      +/-   ##
=========================================
+ Coverage   61.06%   61.1%   +0.03%     
=========================================
  Files         128     128              
  Lines       20532   20538       +6     
=========================================
+ Hits        12538   12549      +11     
+ Misses       6611    6602       -9     
- Partials     1383    1387       +4

@stevvooe
Copy link
Contributor

@aaronlehmann Why don't consumers just print the message field correctly?

@aaronlehmann
Copy link
Collaborator Author

It is unclear when the Message field contains actionable information.

@stevvooe
Copy link
Contributor

Err is meant to only be actionable when the task is in the error state. I understand that this intends to change that, but the usage of Err is quite clear in that it is valid when in an error state, which is strictly defined as the error states.

The use case that is being described here is exactly what the message field is designed for:

// Message reports a message for the task status. This should provide a
// human readable message that can point to how the task actually arrived
// at a current state.

Do we need something to indicate that a task is stalled?

@aluzzardi
Copy link
Member

In this case, tasks are stalled indefinitely - for instance not all constraints were met or we reached network exhaustion.

The operator needs to have this information surfaced somehow since state convergence cannot go forward until action is taken.

Currently, Message contains a bunch of trivial information (e.g. "the task is starting") which doesn't make it suitable for carrying important messages.

Long story short, this needs to be blinking red in some ops' dashboard.

@stevvooe
Copy link
Contributor

@aluzzardi I get what you are trying to do, but I am asking if there is a better way to do it. Overloading the Err field will have the same problems. As of now, its contents are undefined when not in an error state. Expanding the scope of that field is ambiguous.

If a task is stalled, do we have an indicator of that state? For example, a boolean field, indicating that the task is stalled and to check the message field would make sense here.

@wsong
Copy link

wsong commented Sep 1, 2017

What is the status of this PR? Are we planning on moving this forward? It would be very helpful for users to be able to see when their tasks are in a state where the user must take action for them to be scheduled (e.g. network address exhaustion).

@stevvooe
Copy link
Contributor

stevvooe commented Sep 1, 2017

LGTM

Looking at this again, I think we can go forward with it. My concerns are little abstract, but I think that this will work.

@aluzzardi How should we move this forward? It looks like it will merge cleanly.

Copy link
Contributor

@nishanttotla nishanttotla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -159,7 +159,8 @@ loop:
// restarting the task on another node
// (if applicable).
t.Status.State = api.TaskStateRejected
t.Status.Message = "assigned node no longer meets constraints"
t.Status.Message = "task rejected by constraint enforcer"
t.Status.Err = "assigned node no longer meets constraints"
Copy link
Contributor

@anshulpundir anshulpundir Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the Message/Err should be swapped here. The error on the task should be related to the task, so "task rejected by constraint enforcer" seems more relevant.

Also, does it make sense to be more specific around what constraints were failed ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we don't display Err on the command line interface. Message is used to provided better reporting to the user.

@anshulpundir
Copy link
Contributor

anshulpundir commented Sep 29, 2017

Looks good, one question/comment, which can probably be addressed later if we need to merge it right away @nishanttotla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants