cluster - worker shutdown - use WNOHANG with nil return tests #1741

MSP-Greg · 2019-03-12T03:06:01Z

Ruby 2.6 introduced a change that affects worker shutdown. I believe the change occurred between 'commits' r64316 and r64376 of ruby trunk.

Added code using Process::WNOHANG along with needed logic. Adds worker status (via $?) and total shutdown time to log.
Logged status matches status returned by current code (exit 0) for Ruby 2.5.x and lower.

Co-authored-by: MSP-Greg greg.mpls@gmail.com
Co-authored-by: guilleiguaran guilleiguaran@gmail.com

MSP-Greg · 2019-03-12T04:37:23Z

force-pushed change - cleaned up sleep logic

guilleiguaran · 2019-03-12T15:04:54Z

Thanks for this!

I prefer this version over #1738 since I appreciate the extra logs, but I'll keep #1738 open in case the Puma authors don't want the extra logging.

MSP-Greg · 2019-03-13T15:01:20Z

@evanphx

See #1674 (comment)

As to the code I added, it's replacing line 40 of:

puma/lib/puma/cluster.rb

Lines 35 to 44 in ca03c52

    
    def stop_workers
      log "- Gracefully shutting down workers..."
      @workers.each { |x| x.term }

      begin
        @workers.each { |w| Process.waitpid(w.pid) }
      rescue Interrupt
        log "! Cancelled waiting for workers"
      end
    end

Line 40 did not change the @workers array, so my code does not.
Since Process.waitpid(w_pid, Process::WNOHANG) is essentially non-blocking on a running process, I wanted the code to loop through the whole @workers array, whereas the original blocked on a running process. Also, I added code to skip the 'sleep' if all the processes have shut down.
I can remove the logging, but it might be helpful to see the 'shutdown' time?
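For readers unfamiliar with the flag, here is a minimal sketch of the nil-return behaviour described above. It is not the code from this PR: `poll_child` and its `interval` parameter are hypothetical names, and it assumes a platform where `fork` is available.

```ruby
# Sketch only: poll a single child with WNOHANG instead of blocking.
# `poll_child` is a hypothetical helper, not part of Puma.
def poll_child(pid, interval: 0.05)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  polls = 0
  # With WNOHANG, waitpid returns nil while the child is still running
  # and returns the pid (setting $?) once the child has exited.
  until Process.waitpid(pid, Process::WNOHANG)
    polls += 1
    sleep interval
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { status: $?, polls: polls, elapsed: elapsed }
end

# exit! skips at_exit handlers inherited from the parent process.
pid = fork { sleep 0.2; exit!(0) }
result = poll_child(pid)
puts format("worker %d: exit %d after %.2fs (%d empty polls)",
            pid, result[:status].exitstatus, result[:elapsed], result[:polls])
```

The status and elapsed time collected here are the same kind of information the PR adds to the log.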

AlexWayfer · 2019-03-15T19:34:09Z

lib/puma/cluster.rb

+            end
+          end
+          sleep 0.5 if wait
+        end


There are two cycles instead of one.

You can use redo.

See #1738 (comment)

@AlexWayfer

Thanks. As mentioned, I did it this way so it would loop through all the workers on the first pass, then go back for the nil returns. I'd rather use Process.waitpid(-1, Process::WNOHANG), as that's a bit cleaner, but it seems to repeatedly fail on trunk JIT. I never tried it with 2.6.x JIT...
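As an illustration of the `-1` form mentioned here, the sketch below reaps whichever children have exited without naming their pids. `reap_any` is a hypothetical helper, and the sketch says nothing about the trunk JIT failures reported above. Note that `-1` matches any child of the process, not only the ones tracked in an array such as @workers.

```ruby
# Sketch only: reap any exited child using pid -1 with WNOHANG.
# `reap_any` is a hypothetical helper, not part of Puma.
def reap_any(interval: 0.05)
  reaped = []
  loop do
    begin
      pid = Process.waitpid(-1, Process::WNOHANG)
    rescue Errno::ECHILD
      break # no child processes are left to wait for
    end
    if pid
      reaped << [pid, $?.exitstatus]
    else
      sleep interval # children exist, but none has exited yet
    end
  end
  reaped
end

# exit! skips at_exit handlers inherited from the parent process.
[0.2, 0.1].each { |s| fork { sleep s; exit!(0) } }
reap_any.each { |pid, code| puts "reaped #{pid} (exit #{code})" }
```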

It's a strange algorithm.

Option 1, with redo:

Go through all workers

Sleep and redo on the first failure (wait on each failure)

Option 2:

Go through all workers and attempt to stop them

Delete from the array on success; wait after the pass if any failed

Retry if any remain

Option 3, your choice, as I see it:

Go through all workers

Delete on success and move on to the next worker

Do nothing on failure, just sleep at the end

Retry until none remain

I think the first option is the simplest (but may take the longest), the second is optimal, and the third is strange.
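Option 2 could be sketched roughly as follows. This is one reading of the description above, not the code that was merged: `FakeWorker` is a hypothetical stand-in for Puma's worker objects (only `#pid` is assumed), and `wait_for_workers` and `interval` are made-up names.

```ruby
# Hypothetical stand-in for a worker object; only #pid is assumed.
FakeWorker = Struct.new(:pid)

# Sketch of "Option 2": one pass over all workers, drop the reaped ones,
# then sleep once per pass only if some are still running.
def wait_for_workers(workers, interval: 0.1)
  loop do
    # waitpid with WNOHANG returns the pid (truthy) for an exited child
    # and nil for one that is still running, so reject! keeps the latter.
    workers.reject! { |w| Process.waitpid(w.pid, Process::WNOHANG) }
    break if workers.empty?
    sleep interval
  end
end

# exit! skips at_exit handlers inherited from the parent process.
workers = [0.3, 0.1, 0.2].map { |s| FakeWorker.new(fork { sleep s; exit!(0) }) }
wait_for_workers(workers)
puts "remaining workers: #{workers.size}"
```

Because `reject!` mutates the array, a sketch like this only fits if the caller no longer needs the original list afterwards.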

I changed the logic to use reject!

I changed the logic to use reject!

You can use loop do instead of while true do. 😉

I often look here:
https://msp-greg.github.io/ruby_trunk/file.control_expressions.html

loop do isn't mentioned there. Time to review kernel methods again...

Also, about the two cycles (and map): do we need @workers after a successful run of this code?

If not, we can use @workers.reject!.

evanphx · 2019-03-18T22:36:23Z

I don't see why we need this. Is Process.wait without WNOHANG fundamentally broken on ruby now? This PR just implements a busy polling loop, which is unnecessary if a blocking Process.wait works the way it should, which is that it waits until a child is finished.

MSP-Greg · 2019-03-18T22:59:48Z

Is Process.wait without WNOHANG fundamentally broken on ruby now?

As I think I've mentioned elsewhere, is it a bug or a breaking change? I tried to create a small repro of it, but being a Windows type, I can't fork locally. I couldn't find a way to repro it on Windows.

There are a few Ruby tests with waitpid & waitpid2, and also a few specs. I believe these have continued to pass even when Puma started having issues last year with trunk.

busy polling loop

I tried to make it as 'light' as possible, with the minimum number of waitpid calls.

guilleiguaran · 2019-03-18T23:04:02Z

@evanphx indeed, it seems to be caused by ruby/ruby@48b6bd7 but doesn't look like it will be fixed anytime soon since the Ruby team isn't sure about how to fix it 😞

Edit: The related issue is https://bugs.ruby-lang.org/issues/15499, but it was closed as resolved even though someone pointed out that the patch didn't solve the problem with Puma.

evanphx · 2019-03-18T23:18:53Z

This is absolutely a bug in ruby, not a behavior change. Looks like they attempted to fix it here: ruby/ruby@9e66910

MSP-Greg · 2019-03-18T23:51:39Z

I could add a conditional on RUBY_VERSION and/or comments about the code being needed for a bug in Ruby 2.6+.

See #1674 (comment)

evanphx · 2019-03-18T23:54:14Z

I'm pretty annoyed that ruby-core has broken a fundamental and simple unix function and hasn't fixed it.

Conditionalizing it on ruby version is fine.

MSP-Greg · 2019-03-19T00:02:47Z

Is a string comparison okay (RUBY_VERSION < '2.6')? It will fail on Ruby 2.10.
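The concern about 2.10 comes from string comparison being lexicographic. The sketch below shows the difference and one possible alternative using `Gem::Version`. The predicate name `waitpid_workaround_needed?` is hypothetical and is not the condition used in the merged commit.

```ruby
require "rubygems" # normally loaded already; provides Gem::Version

# Strings compare character by character, so "2.10" sorts before "2.6".
puts "2.10" < "2.6"
# Gem::Version compares segment by segment, so 2.10 sorts after 2.6.
puts Gem::Version.new("2.10") < Gem::Version.new("2.6")

# Hypothetical predicate, for illustration only.
def waitpid_workaround_needed?(version = RUBY_VERSION)
  Gem::Version.new(version) >= Gem::Version.new("2.6")
end

puts waitpid_workaround_needed?
```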

Ruby 2.6 introduced a bug that affects worker shutdown (waitpid). Added code using Process::WNOHANG along with needed logic. Adds worker status (via $?) and total shutdown time to log. Co-authored-by: MSP-Greg <greg.mpls@gmail.com> Co-authored-by: guilleiguaran <guilleiguaran@gmail.com>

JRuby 9.2.6.0 may have intermittent test failures, leave for now Co-authored-by: MSP-Greg <greg.mpls@gmail.com> Co-authored-by: guilleiguaran <guilleiguaran@gmail.com>

MSP-Greg · 2019-03-19T00:28:48Z

Done

MSP-Greg force-pushed the wait-issue branch from 0fcfe89 to bee3e9b Compare March 12, 2019 04:36

MSP-Greg mentioned this pull request Mar 12, 2019

Use Process::WNOHANG flag when stoping child workers processes. #1738

Closed

guilleiguaran approved these changes Mar 12, 2019

MSP-Greg force-pushed the wait-issue branch from bee3e9b to 77ee459 Compare March 12, 2019 15:31

MSP-Greg mentioned this pull request Mar 12, 2019

Puma hangs on shutting down workers when received SIGINT #1674

Closed

dentarg added a commit to dentarg/gists that referenced this pull request Mar 13, 2019

test puma/puma#1741

7e6699e

MSP-Greg force-pushed the wait-issue branch 2 times, most recently from 664c877 to d294d63 Compare March 15, 2019 13:42

AlexWayfer reviewed Mar 15, 2019

MSP-Greg force-pushed the wait-issue branch 2 times, most recently from 726fe01 to 54e8030 Compare March 16, 2019 00:27

guilleiguaran approved these changes Mar 16, 2019

MSP-Greg and others added 2 commits March 18, 2019 19:09

travis.yml - update to 2.5.5 & 2.6.2, remove 2.6.2 from allow failure

d1ac2bf

JRuby 9.2.6.0 may have intermittent test failures, leave for now Co-authored-by: MSP-Greg <greg.mpls@gmail.com> Co-authored-by: guilleiguaran <guilleiguaran@gmail.com>

MSP-Greg force-pushed the wait-issue branch from 54e8030 to d1ac2bf Compare March 19, 2019 00:11

evanphx merged commit 821905c into puma:master Mar 19, 2019

AlexWayfer mentioned this pull request Mar 19, 2019

TestIntegration#test_term_signal_exit_code_in_clustered_mode fails #1720

Closed

MSP-Greg mentioned this pull request Mar 19, 2019

Travis - Ruby head/trunk test failures #1703

Closed

MSP-Greg deleted the wait-issue branch March 19, 2019 16:04

MSP-Greg mentioned this pull request Apr 20, 2019

Cluster#check_workers - waitpid logic #1748

Merged

This was referenced Jul 26, 2019

Hang on SIGTERM with ruby 2.6 in clustered mode #1755

Closed

Using at_exit and Signal.trap possibly creates a race-condition jruby/jruby#5437

Closed

dentarg added a commit to dentarg/puma that referenced this pull request Aug 24, 2019

Note the Ruby 2.6 SIGTERM issue in the changelog

c0b5f53

Linking them all with this commit. puma#1741 puma#1755 puma#1674 puma#1730 puma#1720

dentarg added a commit to dentarg/puma that referenced this pull request Aug 24, 2019

Note the Ruby 2.6 SIGTERM issue in the changelog

4e8751d

Linking them all with this commit. puma#1741 puma#1755 puma#1674 puma#1730 puma#1720

nateberkopec pushed a commit that referenced this pull request Aug 28, 2019

Note the Ruby 2.6 SIGTERM issue in the changelog (#1928)

079dff5

Linking them all with this commit. #1741 #1755 #1674 #1730 #1720

dentarg mentioned this pull request Jan 11, 2024

Puma cluster not reaping child processes when PID is 1 with Puma 6.4.1 #3313

Closed
