Jobs terminated by Worker report as "Running" forever #633
Comments
As RQ is deprecated, we won't need to solve for that. We addressed the timeout scenario with the Celery workers by implementing soft/hard time limits. But what if a Celery worker dies or disappears? Where are we expecting the call flow to result in the job getting updated by something else? (Such as the connective tissue between the Celery task state and the
See #765
See #1622 as solving this.
As a follow-up even with the work in #1622, when a hard The only real workaround here is making proper use of Longer term, there may be a way to solve this by use of subtasks when a hard time-limit is detected, but in the current default implementation this results in the
@jathanism can we close this because of #3085 and #3084?
Yep!
Environment
Steps to Reproduce
admin/background-tasks/
and see that the RQ task associated with the Job was killed, but this information is not captured in the JobResult or otherwise visible to non-admin users.
Similar behavior is likely to be seen if a job runs for longer than the configured maximum timeout and is killed by RQ because of that.
I recognize that this specific symptom may be changed as we move to replace RQ with Celery (#531) but these same sorts of scenarios likely need to be accounted for with the Celery worker as well.
Expected Behavior
Nautobot needs to be made aware when a worker task fails or aborts and update the JobResult accordingly. For RQ, the docs have some possible approaches; I'm sure there are similar options for Celery.
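To illustrate the kind of reconciliation being asked for, here is a minimal sketch of a periodic "reaper" that marks jobs as failed once their worker has gone silent. It is not Nautobot's, RQ's, or Celery's actual API: `JobRecord`, `reap_orphaned_jobs`, the status strings, and the heartbeat mapping are all hypothetical stand-ins, and a real implementation would read worker liveness from the task queue and write to the JobResult model instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Hypothetical status values; the real JobResult model may use different ones.
RUNNING = "running"
FAILED = "failed"


@dataclass
class JobRecord:
    """Stand-in for a JobResult row (all fields are assumptions)."""

    job_id: str
    status: str
    worker: str
    message: str = ""


def reap_orphaned_jobs(
    jobs: Iterable[JobRecord],
    last_heartbeat: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Mark running jobs as failed when their worker has stopped reporting.

    A worker counts as gone if it has never sent a heartbeat or if its
    last heartbeat is older than ``max_silence``. Returns the IDs of the
    jobs that were updated, in input order.
    """
    reaped: List[str] = []
    for job in jobs:
        if job.status != RUNNING:
            # Finished or already-failed jobs are left untouched.
            continue
        seen = last_heartbeat.get(job.worker)
        if seen is None or now - seen > max_silence:
            job.status = FAILED
            job.message = (
                f"Worker {job.worker!r} stopped responding; job marked as failed."
            )
            reaped.append(job.job_id)
    return reaped
```

Run on a schedule, something along these lines would keep a job whose worker was killed from being reported as "Running" forever, and the recorded message would make the cause visible to non-admin users. The `max_silence` threshold is a trade-off: too short and slow-but-alive workers get their jobs failed, too long and dead jobs linger.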
Observed Behavior