This repository has been archived by the owner on Jun 11, 2022. It is now read-only.

Fix CPU usage on macOS wait_until. #200

Closed

Conversation

toonetown
Contributor

Related (I believe) to boostorg/process#55

With the fix from #197, multiple processes can still at times be spawned, and they take extremely high CPU usage. This PR basically uses the same approach used for boost 1.68.0 (which was the last time wait_for seemed to work reliably) and guards it for macOS compilations only.

@klemens-morgenstern
Owner

klemens-morgenstern commented Apr 26, 2019

Right, this is what I originally did. Doesn't that cause the loop to run at 100% CPU, because WNOHANG makes waitpid return immediately?

@toonetown
Contributor Author

Reading the man page makes it sound like it would... but it doesn't seem to. The alternative version (in the #else branch) actually takes much more CPU.

Maybe some kind of extremely short (a millisecond or even less) sleep in the if (ret == 0) block, perhaps after the if (timed_out) return false? I don't know what side effects that would have, though.

The change in this PR is exactly what was being done in 1.68.0 (and possibly earlier), just enabled on macOS only. There don't seem to have been any adverse effects from it working that way before the change.

@toonetown
Contributor Author

Looks like my code for the wait_group.hpp is incorrect. Will look at that.

@toonetown
Contributor Author

Here are some more details of what I'm seeing. This is the example code:

#include <boost/process.hpp>
#include <chrono>

int main(int /*argc*/, const char* /*argv*/[]) {
    auto c = boost::process::child("/bin/sleep", "60");
    c.wait_for(std::chrono::seconds(120));
    return 0;
}

Prior to #197, on macOS, this would spawn a new process (which would take 100% of one CPU), and when the main process exited, the spawned process didn't go away (and still took 100% CPU).

After #197, this would spawn a new process (which still takes 100% of one CPU), and when the main process exited, the spawned process would usually go away (though I have seen times when it appears not to... it seems to be a timing issue).

With this PR, a new process is not spawned, but the original process still takes 100% of one CPU (as you mentioned). At least you're never stuck with 100%-consuming zombie processes, though.

Given that, I think the best approach is to avoid spawning a new process altogether and instead try putting a well-timed sleep in this block:

if (ret == 0)
{
    timed_out = Clock::now() >= time_out;
    if (timed_out)
        return false;
    // PUT SLEEP HERE
}
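To make that concrete, here is a minimal self-contained sketch of the polling loop with the 1 ms sleep in place (the standalone shape and the name wait_for_ms are illustrative; this is not Boost's actual code):

```cpp
#include <chrono>
#include <thread>
#include <sys/wait.h>
#include <unistd.h>

// Illustrative sketch: poll waitpid(WNOHANG) and sleep 1 ms between polls,
// mirroring the `if (ret == 0)` block discussed above.
bool wait_for_ms(pid_t pid, std::chrono::milliseconds timeout)
{
    const auto time_out = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        int status = 0;
        const pid_t ret = ::waitpid(pid, &status, WNOHANG);
        if (ret == pid)                 // child exited and was reaped
            return true;
        if (ret == -1)                  // error (e.g. child already reaped)
            return false;
        // ret == 0: child is still running
        if (std::chrono::steady_clock::now() >= time_out)
            return false;               // timed out
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

On the numbers below, that 1 ms sleep is what brings CPU usage down from ~100% to ~3%.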

It does look like putting a sleep in there will drop CPU usage drastically. Initial tests show (on my machine):

  • No Sleep: 99.9% CPU
  • 1 nanosecond: ~33% CPU
  • 1 microsecond: ~66% CPU (weird)
  • 1 millisecond: ~3% CPU
  • 1 second: 0% CPU

I think that 1 second is much too large. I don't know why microseconds were higher than nanoseconds, but in multiple tests those numbers were fairly reproducible (perhaps it's a function of how I'm doing the sleep or something).

I would almost say that if you are spawning a new process (as boost::process is intended to do), 1 millisecond granularity is probably acceptable; there is quite a bit of overhead in the system just to spawn and launch the process anyway. I also think that 3% CPU usage while waiting is an acceptable amount. However, I would like to get some feedback from others.

Also, I'm considering removing the "Apple-specific" section. This seems like a less fragile approach any time that ::sigtimedwait is not available.
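For reference, the technique the non-Apple path relies on can be sketched like this: block SIGCHLD and sleep in ::sigtimedwait until the child signals or the timeout elapses, reaping with WNOHANG. The function name is made up and this is only a sketch of the technique, not Boost's implementation:

```cpp
#include <chrono>
#include <csignal>
#include <ctime>
#include <sys/wait.h>
#include <unistd.h>

// Sketch only: sigtimedwait lets the waiter sleep in the kernel instead of
// spinning, which is why this branch doesn't need a polling sleep.
bool sigtimedwait_wait_for(pid_t pid, std::chrono::milliseconds timeout)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    ::sigprocmask(SIG_BLOCK, &set, nullptr);  // SIGCHLD must be blocked

    const auto time_out = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        int status = 0;
        const pid_t ret = ::waitpid(pid, &status, WNOHANG);
        if (ret == pid) return true;          // reaped
        if (ret == -1)  return false;         // error / reaped elsewhere

        const auto left = time_out - std::chrono::steady_clock::now();
        if (left.count() <= 0)
            return false;                     // timed out

        timespec ts;
        ts.tv_sec  = std::chrono::duration_cast<std::chrono::seconds>(left).count();
        ts.tv_nsec = std::chrono::duration_cast<std::chrono::nanoseconds>(
                         left % std::chrono::seconds(1)).count();
        ::sigtimedwait(&set, nullptr, &ts);   // wakes on SIGCHLD or timeout
    }
}
```

macOS lacks ::sigtimedwait, which is what forces the polling fallback in the first place.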

Thoughts/opinions? I can update this PR with some of these changes, but I'd like feedback first.

@codecov

codecov bot commented Apr 26, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@6a4d2ff).
The diff coverage is n/a.


@@            Coverage Diff             @@
##             develop     #200   +/-   ##
==========================================
  Coverage           ?   87.02%           
==========================================
  Files              ?      106           
  Lines              ?     2951           
  Branches           ?        0           
==========================================
  Hits               ?     2568           
  Misses             ?      383           
  Partials           ?        0
Impacted Files Coverage Δ
...clude/boost/process/detail/posix/wait_for_exit.hpp 81.08% <ø> (ø)
include/boost/process/detail/posix/wait_group.hpp 82.22% <ø> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6a4d2ff...916c96e.

klemens-morgenstern added a commit that referenced this pull request May 7, 2019
Check if it solves #200
@klemens-morgenstern
Owner

Hey Nathan, can you try out the current commit on the develop branch? I restructured the way wait_for is implemented, but can only test on an OSX VM.

@toonetown
Contributor Author

> Hey Nathan, can you try out the current commit on the develop branch? I restructured the way wait_for is implemented, but can only test on an OSX VM.

Yes - I will try it out. It might be a bit later this week before I'm able to do so, however.

@toonetown
Contributor Author

Sorry for the long delay (got pulled off on a bunch of other stuff). I just tested this with 1.71.0, and it is still happening for the wait_for_exit case (the wait_group seems to be working). I will update the PR with how I'm able to get it fixed for macOS.

Related (I believe) to boostorg/process#55

With the fix from klemens-morgenstern#197, multiple processes can still at times be spawned, and they take extremely high CPU usage. This PR basically uses the same approach used for boost 1.68.0 (which was the last time wait_for seemed to work reliably) but adds a 1ms sleep to each iteration of the loop so as not to spike the CPU.
@toonetown
Contributor Author

Still happening with 1.72 (there were no changes to boost::process between 1.71 and 1.72); rebased the pull request onto the latest develop branch.

@klemens-morgenstern
Owner

Hi Nathan, thanks for bearing with me here. Can you check the current develop branch?

The problem with your solution is the race condition: if the waiting thread sleeps for a second, another thread might wait for and reap the child in the meantime. Otherwise I'd be happy to merge.

@toonetown
Contributor Author

Yeah - still having the same issues with the latest develop. However, I understand the reasoning for not wanting to merge in this PR.

@klemens-morgenstern
Owner

Heya Nathan, I am closing this; future issues should be filed at boostorg/process.

The whole wait_for / wait_until API was a design mistake, since POSIX APIs just don't support it like this.
In v2 you can use async_wait with a timeout via asio::parallel_group.
