
Workflow invocations delayed #8088

Open
hexylena opened this issue Jun 4, 2019 · 2 comments

@hexylena (Member) commented Jun 4, 2019

After talking with @natefoo privately, we decided to open an issue for a conversation about workflow invocation scheduling, our investigations, etc.

$ journalctl -u galaxy-handler@1 --since '5 days ago' | grep 'invocation 60463 delayed' | wc -l
55270
$ journalctl -u galaxy-handler@1 --since '5 days ago' | grep 'invocation 60463 delayed' | head -n 1
May 28 14:03:32 ...
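A hedged variant of the command above: bucketing the delayed-invocation messages by hour shows whether the scheduler retries at a steady rate or in bursts. The log lines below are fabricated stand-ins for real journalctl output (timestamp format `May 28 14:03:32 ...` assumed from the sample above); against a live system you would pipe `journalctl -u galaxy-handler@1` instead of the temp file.

```shell
# Stand-in for journalctl output (hypothetical sample lines, not real logs):
cat > /tmp/handler.log <<'EOF'
May 28 14:03:32 host galaxy: Workflow invocation 60463 delayed
May 28 14:03:52 host galaxy: Workflow invocation 60463 delayed
May 28 15:01:10 host galaxy: Workflow invocation 60463 delayed
EOF

# Bucket the delayed messages by hour: keep month, day, and the hour part
# of the timestamp, then count occurrences per bucket.
grep 'invocation 60463 delayed' /tmp/handler.log \
  | awk '{print $1, $2, substr($3, 1, 2) ":00"}' \
  | sort | uniq -c
```

With the sample data this prints two buckets (2 messages in the 14:00 hour, 1 in the 15:00 hour), which is the kind of shape that distinguishes a tight retry loop from occasional rescheduling.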

Most steps were in `new`, only a couple were `scheduled`, and only one had a job ID. I had just kind of assumed that a scheduled step would have a job ID? I guess not?

galaxy=> select * from workflow_invocation_step where workflow_invocation_id = 60463;
   id    |        create_time         |        update_time         | workflow_invocation_id | workflow_step_id | job_id  | action | implicit_collection_jobs_id |   state
---------+----------------------------+----------------------------+------------------------+------------------+---------+--------+-----------------------------+-----------
 3627014 | 2019-05-28 12:03:32.567777 | 2019-05-28 12:03:32.567791 |                  60463 |           467011 |         |        |                             | new
 3627013 | 2019-05-28 12:03:32.566865 | 2019-05-28 12:03:32.566879 |                  60463 |           467009 |         |        |                             | new
 3627012 | 2019-05-28 12:03:32.565943 | 2019-05-28 12:03:32.565958 |                  60463 |           467014 |         |        |                             | new
 3627011 | 2019-05-28 12:03:32.565046 | 2019-05-28 12:03:32.565066 |                  60463 |           467013 |         |        |                             | new
 3627010 | 2019-05-28 12:03:32.563842 | 2019-05-28 12:03:32.563859 |                  60463 |           467012 |         |        |                             | new
 3627009 | 2019-05-28 12:03:32.262526 | 2019-05-28 12:03:32.563089 |                  60463 |           467007 | 5360258 |        |                             | scheduled
 3627008 | 2019-05-28 12:03:32.261826 | 2019-05-28 12:03:32.26184  |                  60463 |           467010 |         |        |                             | scheduled
 3627007 | 2019-05-28 12:03:32.260585 | 2019-05-28 12:03:32.260618 |                  60463 |           467008 |         |        |                             | scheduled
(8 rows)
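For larger invocations, a per-state summary may be more useful than the full row dump; something like `select state, count(*) from workflow_invocation_step where workflow_invocation_id = 60463 group by state;` (a hedged suggestion, not run here). The same summary can be recovered from the dump above by counting the state column:

```shell
# Reproduce the group-by-state summary from the 8 states in the table dump
# above (5 rows in "new", 3 in "scheduled"):
printf '%s\n' new new new new new scheduled scheduled scheduled \
  | sort | uniq -c
```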
$ journalctl -u galaxy-handler@* --since '5 days ago' | grep 'outputs of invocation' | awk '{print $16}' | sort | uniq -c
   1822 60403
  10926 60404
   1811 60405
  12698 60406
  55455 60463
  92047 60502
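A small tweak on the pipeline above (the `$16` field position is assumed from that command) sorts the per-invocation counts numerically descending, so the noisiest invocations surface first. The invocation IDs below are stand-ins for the extracted field:

```shell
# Count occurrences per invocation ID, then sort by count, largest first:
printf '%s\n' 60403 60463 60463 60502 60502 60502 \
  | sort | uniq -c | sort -rn
```

With the real journalctl output, appending `| sort -rn | head` to the original pipeline would give the same "worst offenders" view.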

The logs are full of these messages for several recent workflows, and it isn't clear that those invocations will ever complete.

@natefoo (Member) commented Jun 11, 2019

I've got two of these right now: in one case, the user deleted a job and the remaining jobs that had been created were paused; in the other, a job ended in error and the downstream jobs were set to paused.

Looping over these invocations seems to consume a huge amount of memory.

[Screenshot: memory usage graph, 2019-06-11 10:57 AM]

@hexylena (Member, Author) commented:

Nice graph. Adding procstat now.
