workflow_run_manager: workflow stop race condition #310
I haven't fully addressed the state machine, but I've recorded the knowledge summarised in reanahub/reana-client#192 (comment) (referenced in #149) in a function in a separate module, prepared for a possible future refactor of status transition management.
* Happens when a user stops a workflow before its first job is created. RWC stops only the existing jobs. However, because there is a grace period for stopping pods, the RJC sidecar keeps running, submits a new job, and reports its status, causing the workflow to “revive”. To mitigate this, we decrease the grace period to 0 and no longer allow stopped workflows to change status (closes reanahub/reana-client#395).
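The "stopped workflows cannot change status" rule can be illustrated with a small transition guard. This is only a sketch under assumptions, not the code in this PR: the names `ALLOWED_TRANSITIONS`, `can_transition` and `apply_status` are hypothetical, and the `created` and `failed` members and the exact set of permitted transitions are guesses; only `stopped`, `running` and `finished` appear in the logs of this conversation.

```python
import logging
from enum import Enum


class WorkflowStatus(Enum):
    # Only stopped, running and finished are confirmed by the PR logs;
    # created and failed are assumed here for illustration.
    created = 0
    running = 1
    finished = 2
    failed = 3
    stopped = 4


# Hypothetical transition table: terminal statuses have no outgoing edges,
# so a late "running" report cannot revive a stopped workflow.
ALLOWED_TRANSITIONS = {
    WorkflowStatus.created: {
        WorkflowStatus.running,
        WorkflowStatus.failed,
        WorkflowStatus.stopped,
    },
    WorkflowStatus.running: {
        WorkflowStatus.finished,
        WorkflowStatus.failed,
        WorkflowStatus.stopped,
    },
    WorkflowStatus.finished: set(),
    WorkflowStatus.failed: set(),
    WorkflowStatus.stopped: set(),
}


def can_transition(current, new):
    """Return True if moving from `current` to `new` is permitted."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def apply_status(workflow_id, current, new):
    """Return the status the workflow should have after a status report."""
    if not can_transition(current, new):
        logging.error(
            "Cannot transition workflow %s from status %s to %s.",
            workflow_id, current, new,
        )
        return current  # keep the existing status unchanged
    return new
```

With a table like this, a status report that arrives from the sidecar after the stop request is rejected and the workflow stays `stopped`.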
Tested successfully on my machine 👍
Something to take into account for future improvements: several attempts to switch to a forbidden status are still made. See the job-status-consumer logs:
2020-04-10 08:55:57,554 | kombu.mixins | MainThread | INFO | Connected to amqp://test:**@reana-message-broker:5672//
2020-04-10 08:57:44,143 | root | MainThread | ERROR | Cannot transition workflow d4e80dbd-80b4-4725-a415-60708a8aa896 from status WorkflowStatus.stopped to WorkflowStatus.running.
2020-04-10 08:57:44,291 | root | MainThread | ERROR | Cannot transition workflow d4e80dbd-80b4-4725-a415-60708a8aa896 from status WorkflowStatus.stopped to WorkflowStatus.running.
2020-04-10 08:57:50,325 | root | MainThread | ERROR | Cannot transition workflow d4e80dbd-80b4-4725-a415-60708a8aa896 from status WorkflowStatus.stopped to WorkflowStatus.running.
2020-04-10 08:57:50,431 | root | MainThread | ERROR | Cannot transition workflow d4e80dbd-80b4-4725-a415-60708a8aa896 from status WorkflowStatus.stopped to WorkflowStatus.running.
2020-04-10 08:57:59,471 | root | MainThread | ERROR | Cannot transition workflow d4e80dbd-80b4-4725-a415-60708a8aa896 from status WorkflowStatus.stopped to WorkflowStatus.finished.
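One possible direction for that future improvement is to drop late status reports for a stopped workflow and log the rejection only once per workflow. The sketch below is an assumption about how this could look, not something implemented in this PR; `LateUpdateFilter` and its methods are hypothetical names, and statuses are plain strings to keep the example self-contained.

```python
import logging

log = logging.getLogger("job-status-consumer-sketch")


class LateUpdateFilter:
    """Hypothetical helper: ignore status reports for stopped workflows."""

    def __init__(self):
        self._reported = set()  # workflow ids already logged once
        self.dropped = 0        # number of rejected status reports

    def should_apply(self, workflow_id, current_status, new_status):
        """Return False for any report that targets a stopped workflow."""
        if current_status != "stopped":
            return True
        self.dropped += 1
        if workflow_id not in self._reported:
            # Log the first rejection only; later ones are counted silently.
            self._reported.add(workflow_id)
            log.warning(
                "Ignoring status reports for stopped workflow %s "
                "(first rejected status: %s).",
                workflow_id, new_status,
            )
        return False
```

The counter keeps the number of late reports observable without repeating the same error line for every message.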