Any method can get all flow jobs? #142

Closed
lvscup opened this issue Apr 18, 2019 · 5 comments
Comments

@lvscup

lvscup commented Apr 18, 2019

Hello,
I designed a flow, and many jobs have joined the flow queue. Now I want to get all of the flow's jobs.
Question 1:
If I don't store all dispatcher ids after "run_flow", is there any method to list the dispatcher ids directly?
Question 2:
At any point, a job may be in one of four states (Queued/Running/Finished/Failed). Tracing can monitor jobs that have started (Running/Finished/Failed), but is there a simple method to list jobs that are still waiting in the queue?
Assume Redis as the broker.

@fridex
Member

fridex commented Apr 18, 2019

Question 1:
If I don't store all dispatcher ids after "run_flow", is there any method to list the dispatcher ids directly?

If you do not explicitly store these ids, there is no way to retrieve them, as both the worker and the flow/task producer are stateless. State is kept only on the queue, so you can obtain these ids either by picking messages from the queue or by explicitly storing them yourself.
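The "explicitly storing them" option can be sketched as a thin producer-side wrapper. This is illustrative only: `run_flow` is stubbed here (the real one lives in Selinon), and `run_flow_tracked`/`scheduled_flows` are hypothetical names, not Selinon API.

```python
import uuid

# Hypothetical stand-in for selinon.run_flow: schedules a flow and
# returns its dispatcher id (here just a fresh uuid).
def run_flow(flow_name, node_args=None):
    return uuid.uuid4().hex

# Producer-side registry of everything we have scheduled.
scheduled_flows = {}  # dispatcher_id -> (flow_name, node_args)

def run_flow_tracked(flow_name, node_args=None):
    """Schedule a flow and remember its dispatcher id on the producer side."""
    dispatcher_id = run_flow(flow_name, node_args)
    scheduled_flows[dispatcher_id] = (flow_name, node_args)
    return dispatcher_id

dispatcher_id = run_flow_tracked("flow1", {"foo": "bar"})
```

Since the worker and producer are stateless, `scheduled_flows` would need to live in persistent storage (a database, Redis hash, etc.) in practice rather than in memory.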

Question 2:
At any point, a job may be in one of four states (Queued/Running/Finished/Failed). Tracing can monitor jobs that have started (Running/Finished/Failed), but is there a simple method to list jobs that are still waiting in the queue?
Assume Redis as the broker.

Explicitly storing dispatcher ids on the producer side (and matching them against the tracing output) is one option. Another option is to query Redis directly and retrieve task states. That option is too low level for my taste, though: if anything changes in Celery, this solution can break.

I don't know your use case, but it sounds like it could be a good idea to introduce a "monitoring" entity in your system which would track jobs from the producer side (a flow has been scheduled), and then create a tracing function which notifies the monitoring entity about the flow state (running/finished/failed). If the solution is generic enough, I'm happy to accept it into the Selinon ecosystem :). It could be a simple REST API with a database for persistence. A UI could then be built to expose the status of flows/tasks.
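A minimal in-memory sketch of that monitoring entity: the producer records a flow as Queued when it schedules it, and a tracing function flips the state as events arrive. `FlowMonitor` is a hypothetical class, and the shape of the `report` dict is an assumption; Selinon's tracing does emit flow lifecycle events (see the selinon.trace docs), but the exact payload should be checked against the version in use.

```python
class FlowMonitor:
    """Hypothetical producer-side monitor; a real one would persist to a DB."""

    def __init__(self):
        self.states = {}  # dispatcher_id -> state

    def flow_scheduled(self, dispatcher_id):
        # Called by the producer right after run_flow() succeeds.
        self.states[dispatcher_id] = "Queued"

    def trace(self, event, report):
        # Intended to be registered as a Selinon tracing function.
        # The event names and the 'dispatcher_id' key are assumptions here.
        dispatcher_id = report.get("dispatcher_id")
        if event == "FLOW_START":
            self.states[dispatcher_id] = "Running"
        elif event == "FLOW_END":
            self.states[dispatcher_id] = "Finished"
        elif event == "FLOW_FAILURE":
            self.states[dispatcher_id] = "Failed"

    def queued(self):
        """Flows scheduled by the producer but not yet picked up by a worker."""
        return [d for d, s in self.states.items() if s == "Queued"]

monitor = FlowMonitor()
monitor.flow_scheduled("abc123")                           # producer side
monitor.trace("FLOW_START", {"dispatcher_id": "abc123"})   # tracing side
```

This also answers Question 2: anything the producer scheduled that tracing has not yet reported on is, by elimination, still in the queue.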

@lvscup
Author

lvscup commented Apr 22, 2019

Yes, I will track jobs from the producer side as you suggested. If Selinon supports this later, I will be happy to use it.

But another question arises: the dispatcher_id can't be assigned before invoking "run_flow". If I use the dispatcher_id returned by the run_flow method and then store it, the state of the monitoring entity may be incorrect.
For example: at t1 I invoke run_flow, and at t2 the FLOW_START event fires and updates the state, but the dispatcher_id hasn't been stored yet, because the call is asynchronous.

Celery's apply_async method accepts a 'task_id' parameter. Before invoking apply_async, the task_id could be initialized to the dispatcher_id, e.g. Dispatcher().apply_async(kwargs=..., queue=..., task_id=...). If Selinon could support this, the problem of incorrect state would not exist.
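The suggestion above amounts to "pick the id first, persist it, then dispatch", which closes the race window because the tracing side can never see an id the producer hasn't stored yet. Celery's `Task.apply_async` really does accept a `task_id` keyword; the `apply_async` below is a stub standing in for it, and `schedule_flow`/`stored_ids` are hypothetical names.

```python
import uuid

stored_ids = []  # stands in for persistent producer-side storage

def apply_async(kwargs=None, queue=None, task_id=None):
    # Stub for Celery's Task.apply_async(..., task_id=...), which uses
    # the caller-supplied id instead of generating one.
    return task_id

def schedule_flow(flow_name):
    dispatcher_id = uuid.uuid4().hex   # 1. choose the id up front
    stored_ids.append(dispatcher_id)   # 2. persist it first...
    apply_async(kwargs={"flow_name": flow_name},
                queue="flow1_queue",
                task_id=dispatcher_id)  # 3. ...then schedule under that id
    return dispatcher_id

dispatcher_id = schedule_flow("flow1")
```

Because step 2 happens strictly before step 3, a FLOW_START event can only ever refer to an id that is already in storage.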

@fridex
Member

fridex commented Apr 23, 2019

For example: at t1 I invoke run_flow, and at t2 the FLOW_START event fires and updates the state, but the dispatcher_id hasn't been stored yet, because the call is asynchronous.

Once you call run_flow, the flow is scheduled (it hasn't started yet), and once a worker picks up the dispatcher task (the task responsible for scheduling the actual workload based on the configuration), FLOW_START is emitted. Maybe it would be worth adding a FLOW_SCHEDULED event to track flow scheduling as well; see https://selinon.readthedocs.io/en/latest/selinon/doc/selinon.trace.html#module-selinon.trace. I'm open to PRs, or I can do it if you want.

Anyway, if you call run_flow and there is no exception, you can expect the flow has been scheduled for execution. I don't see how the task_id argument would help here.
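With the proposed FLOW_SCHEDULED event, the whole life cycle could be tracked from the tracing side alone, without the producer pre-storing ids. A sketch under that assumption (FLOW_SCHEDULED does not exist in Selinon at the time of this thread; the report shape is also assumed):

```python
# trace_func would be registered as a Selinon tracing function; the
# FLOW_SCHEDULED event is the proposed addition, not existing API.

flow_states = {}  # dispatcher_id -> state

def trace_func(event, report):
    dispatcher_id = report.get("dispatcher_id")
    if event == "FLOW_SCHEDULED":
        # Proposed: emitted when run_flow puts the dispatcher on the queue.
        flow_states[dispatcher_id] = "Queued"
    elif event == "FLOW_START":
        # Existing: emitted when a worker picks up the dispatcher task.
        flow_states[dispatcher_id] = "Running"

trace_func("FLOW_SCHEDULED", {"dispatcher_id": "d1"})
trace_func("FLOW_START", {"dispatcher_id": "d1"})
```

Since FLOW_SCHEDULED would fire synchronously with scheduling, the "id seen by tracing before it is stored" race from the earlier comment disappears.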

@lvscup
Author

lvscup commented Apr 24, 2019

It would be great if Selinon had a FLOW_SCHEDULED event. Using the tracing function, I could then track the whole life cycle of jobs, and the task_id issue could be ignored. Looking forward to the new event. Thank you.

@fridex
Member

fridex commented May 9, 2019

I'm closing this one, considering it resolved. Feel free to reopen or post a new issue in case of more questions.

@fridex fridex closed this as completed May 9, 2019