lsf job finished successfully... workflow still running #239
Searching kibana-logstash for
This is consistent with the lsf job succeeding, but the workflow is still running. Unfortunately, this bug didn't manifest in staging, where we have logging of when webhooks are scheduled on the http_worker queue. The only thing I can tell is that the status of the lsf job changed in the lsf service, but the workflow service apparently never received the corresponding webhook. Perhaps an unknown entity is connecting to production rabbitmq and stealing messages again?
@davidlmorton -- we talked a bit in person about increasing logging around sending these webhooks; I'd like to document our plan here. We currently log when ptero-workflow receives webhooks sent by ptero-lsf. In the case of this bug, we are not seeing any webhooks received by ptero-workflow from ptero-lsf, even though we expected them. We could increase the logging in job.py, where the messages are put on the queue, and also in http.py, where the messages are received from the queue. I think we should increase the logging level in both places. What do you think about increasing the log level in these two places?
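A minimal sketch of the logging plan described above, using Python's standard `logging` module and a stand-in `queue.Queue` (the real services publish through rabbitmq, and the logger name, function names, and messages here are all hypothetical): log on both sides of the queue so a missing receive line pinpoints where a webhook was lost.

```python
import logging
import queue

# Hypothetical logger name; each service would use its own.
logger = logging.getLogger("ptero.http_worker")
logger.setLevel(logging.INFO)


def publish_webhook(q, url, payload):
    """Put a webhook message on the queue, logging before and after the put."""
    logger.info("scheduling webhook POST to %s", url)
    q.put((url, payload))
    logger.info("webhook to %s placed on queue", url)


def consume_webhook(q):
    """Take a webhook message off the queue, logging its receipt."""
    url, payload = q.get()
    logger.info("received webhook for %s from queue", url)
    return url, payload
```

With both log lines in place, a webhook that is scheduled but never logged as received would point at the queue (or something consuming from it) rather than at job.py.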
I agree with the plan, but it gets tricky to increase the logging in http.py since that is common library code. Perhaps we introduce a separate logging level for http altogether and let each service set its main log level separately from its http log level?
I opened genome/ptero-common#48 to introduce a separate logging level for http.py altogether. |
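One way the separate http log level could look (a sketch only; the env var names and the `ptero_common.http` logger name are assumptions, not what genome/ptero-common#48 actually does): give the shared http module its own logger level, falling back to the service's main level when unset.

```python
import logging
import os


def configure_logging():
    """Set the service's main log level, plus an independent level for the
    shared http module. Env var and logger names are hypothetical."""
    main_level = os.environ.get("PTERO_LOG_LEVEL", "INFO")
    http_level = os.environ.get("PTERO_HTTP_LOG_LEVEL", main_level)

    # Main level applies to the root logger (and so to everything by default).
    logging.basicConfig(level=getattr(logging, main_level))

    # The common http module gets its own level, overriding the root's.
    logging.getLogger("ptero_common.http").setLevel(getattr(logging, http_level))
```

This way a service can run at WARNING overall while still emitting DEBUG lines from the http code path being investigated here.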
http://lsf.apipe-deis.gsc.wustl.edu/v1/jobs/b04f3f26-d509-4930-8bbb-37f288fa7f3a
http://workflow.apipe-deis.gsc.wustl.edu/v1/reports/workflow-details?workflow_id=182976