fix(run_command): fix subprocess timeout recovery #179

ketiltrout · 2024-05-15T19:03:54Z

Normally, when a transfer times out, it's because the process is stuck deep down in I/O wait, meaning a SIGKILL is not going to be handled in a timely manner, so trying to `.communicate()` with it again is just futile.

With the current code, after a timeout, the worker kills the subprocess and then gets stuck in the same wait it was trying to escape with the kill, as if no timeout had been given.

So, let's just send it a kill and then abandon it.
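A minimal sketch of the kill-and-abandon approach, assuming the command is run via `subprocess.Popen` with `communicate(timeout=...)` on a POSIX system. The helper name `run_with_timeout` and its `(returncode, stdout, stderr)` return convention are hypothetical, not the project's actual `run_command`. The difference from the pattern shown in the Python `subprocess` docs (which calls `proc.communicate()` again, with no timeout, after `proc.kill()`) is that nothing here waits on the child once it has been killed.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd`; on timeout, SIGKILL the child and abandon it.

    Hypothetical helper for illustration only. Returns
    (returncode, stdout, stderr), or (None, None, None) on timeout.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Popen.kill() sends SIGKILL on POSIX. Do NOT call communicate() or
        # wait() afterwards: a child stuck in uninterruptible I/O may not act
        # on the signal for a long time, and waiting would block the worker.
        proc.kill()
        # Release our ends of the pipes so a long-running worker doesn't
        # leak file descriptors for every abandoned child (POSIX assumption).
        proc.stdout.close()
        proc.stderr.close()
        return None, None, None
    return proc.returncode, stdout, stderr
```

The trade-off is that the abandoned child is never reaped by this function, which is the zombie concern raised in #175.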

Related to #175 (abandoning subprocesses may lead to zombies), but this doesn't really address the items brought up in that issue, which are higher-level ideas.

I have deployed this to cedar.


rikvl

Ok.

I think (the newest version of) `black` wants to add empty lines and such in a bunch of files.

ketiltrout · 2024-05-15T19:25:28Z

It was the semicolons! (I think I've been spending too much time in Perl recently.)

ketiltrout requested review from ljgray and rikvl May 15, 2024 19:03

rikvl approved these changes May 15, 2024


remove the perl that was leaking in

1a7d9af

ljgray approved these changes May 15, 2024


ketiltrout merged commit 799dfc9 into master May 15, 2024
3 checks passed

ketiltrout deleted the subprocess_timeout_recovery branch May 15, 2024 19:59
