-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark abandoned tasks as "failed" #1649
Conversation
Attached issue: https://pulp.plan.io/issues/9247 |
When I trigger a failure, the task status is correct, but in the logs it looks like it's "failing" when it tries to save a null state.
|
Yeah, i can see, where that is coming from. |
319fc7d
to
a62f01a
Compare
It works now, but the logs are still a bit misleading. When the task aborts / disappears, it still prints "Cleaning up and canceling Task ..." when it should say that it's moving to failed. |
a62f01a
to
8535242
Compare
@dralley how about that? |
pulpcore/tasking/pulpcore_worker.py
Outdated
Task.objects.filter(pk=task.pk, state=TASK_STATES.RUNNING).update( | ||
state=TASK_STATES.CANCELING | ||
) | ||
task.refresh_from_db() | ||
if task.state == TASK_STATES.CANCELING: | ||
_logger.info(_("Cleaning up Task %s and marking as %s"), task.pk, final_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the capitalization of Task
consistent with the other messages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually tested with:
- SIGABRT from within task (
os.abort()
) - SIGKILL (
sudo kill -9 $pid
) - Python exception from within task
- Task cancellation
It works and logs appropriate messages in all situations, thank you
I think we're still missing a small piece of functionality that had been discussed earlier. It logs "Cleaning up X and marking as failed" but there is no message to call out that the worker process disappeared |
8535242
to
8297aa6
Compare
I now see the following:
|
pulpcore/tasking/pulpcore_worker.py
Outdated
Task.objects.filter(pk=task.pk, state=TASK_STATES.RUNNING).update( | ||
state=TASK_STATES.CANCELING | ||
) | ||
task.refresh_from_db() | ||
if task.state == TASK_STATES.CANCELING: | ||
_logger.info(_("Cleaning up task %s and marking as %s"), task.pk, final_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the event of failed task (or code-wise perhaps if reason
), could we have this message print {...} marking as failed. reason: Worker has gone missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
if reason:
msg = _("Cleaning up task %s and marking as %s. Reason: %s")
args = (task.pk, final_state, reason)
else:
msg = _("Cleaning up task %s and marking as %s.)
args = (task.pk, final_state)
_logger.info(msg, *args)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A clearer log message would be nice.
fixes #9247
8297aa6
to
6da67a5
Compare
@@ -129,26 +129,39 @@ def beat(self): | |||
def notify_workers(self): | |||
self.cursor.execute("NOTIFY pulp_worker_wakeup") | |||
|
|||
def cancel_abandoned_task(self, task): | |||
def cancel_abandoned_task(self, task, final_state, reason=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this method get args and kwargs docs?
"""Cancel and clean up an abandoned task. | ||
|
||
This function must only be called while holding the lock for that task. | ||
This function must only be called while holding the lock for that task. It is a no-op if | ||
the task is neither in "running" nor "canceling" state. | ||
|
||
Return ``True`` if the task was actually canceled, ``False`` otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also follow the napolean docstring format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hand tested this and it worked well. I saw Cleaning up Task b087bd6f-dc8c-45e3-9984-e70b598159f1 and marking as failed
in my logs and the task state was at failed. Thanks @mdellweg !
hey !, { this could be just a display issue but it is really confusing to see, running there. and after the worker goes missing, the pulp worker list endpoint still shows the state=missing workers in worker list, which is pretty weird. will these workers be deleted later on ? or will pulp still try to wait for those workers to be online indefinetly? @dralley, thank you ! |
They're not subtasks, just progress reports. Issue is filed here #3609 I think the workers do get cleaned up automatically after a week or so. |
fixes #9247