more optimal wait for finish of all jobs #95

Open
pkopta opened this issue Jan 12, 2021 · 1 comment
Comments

pkopta commented Jan 12, 2021

Currently, if a client submits 1k jobs, it has to ask for 1k job statuses. Instead, QCG-PJ should return information on whether all submitted jobs have finished.

@LourensVeen

I think this was partially addressed in 63888cb, but I'm running into some problems with it.

First, if a job fails to start because the description is invalid, then manager.Manager will move it into the FAILED state, rather than submitting it to the Executor. That makes sense, but in this case no NO_JOBS event is generated even if the invalid job is the last one, which can cause api.manager.Manager.wait4all() to wait for such a message forever.
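To illustrate the first problem, here is a minimal toy model, not QCG-PJ code. The class `ToyManager`, the flag `notify_on_failed_validation`, and the event labels `JOB_FAILED` and `JOB_FINISHED` are all hypothetical names made up for this sketch; only `NO_JOBS` comes from the discussion above. It assumes the idle check is only run on the path where a job finishes in the executor, which is how I understand the current behaviour.

```python
import queue


class ToyManager:
    """Hypothetical stand-in for the server-side manager; not the real class."""

    def __init__(self, notify_on_failed_validation):
        self.events = queue.Queue()
        self.active = 0
        # Assumption: toggles whether the idle check also runs when a job
        # is rejected before it ever reaches the executor.
        self.notify_on_failed_validation = notify_on_failed_validation

    def submit(self, name, valid):
        if not valid:
            # Invalid description: the job goes straight to FAILED.
            self.events.put(("JOB_FAILED", name))
            if self.notify_on_failed_validation:
                self._emit_no_jobs_if_idle()
            return
        self.active += 1

    def finish(self, name):
        # Path taken by jobs that actually ran in the executor.
        self.active -= 1
        self.events.put(("JOB_FINISHED", name))
        self._emit_no_jobs_if_idle()

    def _emit_no_jobs_if_idle(self):
        if self.active == 0:
            self.events.put(("NO_JOBS", None))


def drain(manager):
    """Return the labels of all queued events, in order."""
    labels = []
    while not manager.events.empty():
        labels.append(manager.events.get()[0])
    return labels


# The invalid job is the last (here: only) job.
current = ToyManager(notify_on_failed_validation=False)
current.submit("bad", valid=False)
current_events = drain(current)

patched = ToyManager(notify_on_failed_validation=True)
patched.submit("bad", valid=False)
patched_events = drain(patched)
```

In the first case the queue only ever holds the failure event, so a client that blocks until it sees `NO_JOBS` never wakes up. In the second case the idle check also runs on the validation-failure path and the event is emitted.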

Second, there's a race condition in api.manager.Manager.wait4all(). If the last job was valid and has finished, and its status and NO_JOBS messages have been queued, then the AllJobsFinished check in api.manager.Manager.wait4all() will return True and we return from the function immediately. However, it's possible that the JST/JFI/NO_JOBS messages have not yet been received by the client at that point. As a result, if a job is running on the next call to wait4all(), AllJobsFinished will return False, but then those stale messages will be received by the poller and wait4all() returns while there are still running jobs.
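The sequence is easier to see in a deterministic toy model. This is only a sketch of the interleaving described above, not the real client: `ToyServer`, `naive_wait4all`, and the `JOB_FINISHED` label are hypothetical, and a plain `deque` stands in for messages that have been sent but not yet read by the poller.

```python
from collections import deque


class ToyServer:
    """Hypothetical model of the server plus the in-flight event channel."""

    def __init__(self):
        self.running = set()
        self.outbox = deque()  # events sent but not yet read by the client

    def submit(self, name):
        self.running.add(name)

    def finish(self, name):
        self.running.discard(name)
        self.outbox.append("JOB_FINISHED")
        if not self.running:
            self.outbox.append("NO_JOBS")

    def all_jobs_finished(self):
        # Stand-in for the synchronous AllJobsFinished check.
        return not self.running


def naive_wait4all(server):
    """Return the set of jobs still running at the moment the wait returns."""
    if server.all_jobs_finished():
        # Early return: any pending events are left unread.
        return set(server.running)
    while server.outbox:
        if server.outbox.popleft() == "NO_JOBS":
            return set(server.running)
    raise RuntimeError("a real client would block here")


server = ToyServer()
server.submit("job1")
server.finish("job1")                # JOB_FINISHED and NO_JOBS are now queued
first = naive_wait4all(server)       # returns early, queue is not drained

server.submit("job2")                # job2 is running and never finishes
second = naive_wait4all(server)      # consumes the stale NO_JOBS from job1
```

After this runs, `first` is empty as expected, but `second` contains `job2`: the second wait returned because of the leftover `NO_JOBS` from the first job, while `job2` was still running.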

This turned out to be a bit tricky to fix but I think I have something that works. See the explanation in #152.

An even better solution would be to replace the synchronous AllJobsFinished status request with a request that, on the server side, checks whether there are any active jobs and, if there are none, posts a NO_JOBS message to the event queue. The client side can then simply send that request and process events until it encounters a NO_JOBS one. But that requires a change in the protocol, and it would make the Poller required rather than optional, so I stopped short of that.
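A rough sketch of what that request-based flow could look like, again as a toy model rather than a protocol proposal. Everything here is hypothetical: `ToyServer`, `request_no_jobs_notification`, the `waiter_pending` flag, and the `JOB_FINISHED` label. It also makes one assumption that goes beyond what is described above: the server posts exactly one `NO_JOBS` per request, either immediately if it is idle or deferred until the last active job ends, so that no stale `NO_JOBS` can be left behind in the queue.

```python
from collections import deque


class ToyServer:
    """Hypothetical server model with a 'notify me when idle' request."""

    def __init__(self):
        self.running = set()
        self.events = deque()        # ordered event queue read by the client
        self.waiter_pending = False  # assumption: one outstanding wait request

    def submit(self, name):
        self.running.add(name)

    def finish(self, name):
        self.running.discard(name)
        self.events.append("JOB_FINISHED")
        if self.waiter_pending and not self.running:
            self.waiter_pending = False
            self.events.append("NO_JOBS")

    def request_no_jobs_notification(self):
        if self.running:
            self.waiter_pending = True     # answer later, when the last job ends
        else:
            self.events.append("NO_JOBS")  # already idle, answer right away


def wait4all(server, finishing_jobs):
    """Send the request, then process events until NO_JOBS shows up.

    `finishing_jobs` is an iterator of job names that finish while we wait;
    it simulates time passing where a real client would block on the poller.
    """
    server.request_no_jobs_notification()
    while True:
        while server.events:
            if server.events.popleft() == "NO_JOBS":
                return set(server.running)
        server.finish(next(finishing_jobs))


server = ToyServer()
server.submit("job1")
server.finish("job1")
first = wait4all(server, iter([]))         # idle: NO_JOBS is posted on request

server.submit("job2")
second = wait4all(server, iter(["job2"]))  # returns only after job2 finishes
```

Because the `NO_JOBS` event travels through the same ordered queue as the job status events, the client has processed every earlier event by the time it sees it, and there is no separate synchronous answer that can get ahead of the queue.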
