Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-26136: Improve handling of crashes in pipetask #66

Merged
merged 5 commits into from Aug 20, 2020

Conversation

andy-slac
Copy link
Collaborator

Replaced use of multiprocessing Pool with manual process management based on Process class. Crashes in sub-processes should be handled correctly now. Exceptions are not propagated to main process, they will appear in the log when subprocess fail, main process will just list all tasks that failed.

Implemented "keep running" as default option, there is new command line option --fail-fast which turns on old behavior.

This is implemented using multiprocesing Pool class, but I need to
switch from Pool to lower-level Process because Pool does not support
stopping long running (timed out) processes (and it is just broken).
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks easy to follow.

# but multiprocessing does not provide an API for that, for now
# just sleep a little bit and go back to the loop.
if jobs.running():
time.sleep(0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't help pondering asyncio in cases like this where we are sleeping and looping.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very proud of this two lines of code, it's stupid and inefficient. We could certainly do fancier things here, in the context of pipetask it probably does not matter very much, but generally I agree we want to make things behaving better. Maybe something for later ticket together with finally getting rid of multiprocessing stuff.

@andy-slac andy-slac merged commit ff154c3 into master Aug 20, 2020
@andy-slac andy-slac deleted the tickets/DM-26136 branch November 11, 2020 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants