Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mp teardown hardening #208

Merged
merged 4 commits into from May 6, 2021
Merged

Mp teardown hardening #208

merged 4 commits into from May 6, 2021

Conversation

goodboy
Copy link
Owner

@goodboy goodboy commented Apr 29, 2021

Might solve some of the complaints in any of:
#84 #89 #134 #146

Still want to test enabling the nested spawning tests again to see if we get any improvements.

@goodboy goodboy requested a review from guilledk April 29, 2021 02:22
Comment on lines 425 to 442
except trio.Cancelled:
# if the reaping task was cancelled we may have hit
# a race where the subproc disconnected before we
# could send it a message to cancel (classic 2 generals)
# in that case, wait shortly then kill the process.
reaping_cancelled = True

if proc.is_alive():
with trio.move_on_after(0.1) as cs:
cs.shield = True
await proc_waiter(proc)

if cs.cancelled_caught:
proc.terminate()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is necessary to end this block with an unconditional raise. Although most cases would work, I suppose some edge cases would break down the line. For example, consider if you needed to directly cancel the top-level nursery of this function, and then check it's cancelled_caught property (similar to the move_on_after in this block). This exception handler would discard the cancel and it would appear to never catch any cancels.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for detailed explanation @richardsheridan, will definitely add!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The necessary bit is tagged on the last logic block.

proc.terminate()

if not reaping_cancelled:
if proc.is_alive():
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we can just rely on the proc being dead here?
I guess it would make more sense to poll and then check?

If that works we can probably drop the flag set and check.

Base automatically changed from stream_contexts to master May 4, 2021 14:41
It's clear now that special attention is needed to handle the case where
a spawned `multiprocessing` proc is started but then the parent is
cancelled before the child can connect back; in this case we need to be
sure to kill the near-zombie child asap. This may end up being the
solution to other resiliency issues seen around mp with nested process
trees too. More testing is needed to be sure.

Relates to #84 #89 #134 #146
@goodboy
Copy link
Owner Author

goodboy commented May 6, 2021

@richardsheridan this is more acceptable yah?

@goodboy goodboy merged commit ffd10e1 into master May 6, 2021
@goodboy goodboy deleted the mp_teardown_hardening branch May 6, 2021 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants