cluster - worker shutdown - use WNOHANG with nil return tests #1741
Force-pushed change: cleaned up the sleep logic.
See #1674 (comment). As to the code I added, it replaces line 40 of the snippet at lines 35 to 44 in ca03c52.
force-pushed from 664c877 to d294d63
end
end
sleep 0.5 if wait
end
Thanks. As mentioned, I did it this way so it would loop through all the workers on the first pass, then go back for the nil returns. I'd rather use `Process.waitpid(-1, Process::WNOHANG)`, as that's a bit cleaner, but it seems to fail repeatedly on trunk JIT. I never tried it with 2.6.x JIT...
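For comparison, here is a minimal sketch of the `Process.waitpid(-1, Process::WNOHANG)` form: reap whichever child has exited, and sleep briefly whenever the call returns nil. The helper name `reap_any` and the 0.05 s poll interval are assumptions, not Puma code. Per the comment, this form was seen failing on trunk JIT, so treat it as illustrative only. It needs `fork`, so it only runs on Unix-like platforms.

```ruby
# Sketch only: reap any exited child with waitpid(-1, WNOHANG) until every
# expected worker pid has been collected. Not Puma's implementation.
def reap_any(pids, poll_interval: 0.05)
  remaining = pids.dup
  statuses = {}
  until remaining.empty?
    begin
      pid = Process.waitpid(-1, Process::WNOHANG)
    rescue Errno::ECHILD
      break # no children left to wait for
    end
    if pid.nil?
      sleep poll_interval # children exist, but none has exited yet
    else
      statuses[pid] = $? # Process::Status of the child just reaped
      remaining.delete(pid)
    end
  end
  statuses
end

# Usage: three workers that exit immediately (exit! skips at_exit hooks).
pids = Array.new(3) { fork { exit!(0) } }
statuses = reap_any(pids)
puts statuses.keys.sort == pids.sort
```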
> As mentioned, I did it this way so it would loop through all the workers on the first pass, then go back for the nil returns.
It's a strange algorithm.
Option 1, with `redo`:
- Go through all workers
- Sleep and `redo` on the first failure (wait on every first failure)
Option 2:
- Go through all workers and attempt to stop them
- Delete from the array on success; after the pass, wait if any failed
- Retry if any remain
Option 3, your choice, as I see it:
- Go through all workers
- Delete on success, and continue
- Do nothing on failure; just sleep at the end
- Retry while any remain
I think the first option is the simplest (but may be the slowest), the second is the most efficient, and the third is strange.
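Option 1 can be sketched with `redo`, which restarts the current block iteration for the same pid. The method name and poll interval are hypothetical, and the sketch only covers reaping, assuming the stop signal has already been sent to each worker.

```ruby
# Option 1 sketch: walk the pids in order; when a worker hasn't exited yet,
# sleep and `redo` the same iteration until it has been reaped.
def stop_with_redo(pids, interval: 0.05)
  statuses = {}
  pids.each do |pid|
    if Process.wait(pid, Process::WNOHANG).nil?
      sleep interval
      redo # retry this same pid
    end
    statuses[pid] = $? # set by the successful Process.wait above
  end
  statuses
end

# Usage: workers that take a moment to exit, so early polls return nil.
pids = Array.new(3) { fork { sleep 0.1; exit!(0) } }
statuses = stop_with_redo(pids)
puts statuses.values.map(&:exitstatus).inspect
```

Every poll that returns nil costs its own sleep, so this form can spend more time sleeping than a version that sleeps once per pass over all workers.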
I changed the logic to use reject!
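A `reject!`-based pass along the lines of Option 2 might look like the sketch below. Each pass removes reaped pids from the array in place, and the loop sleeps once per pass only while some remain. The names are hypothetical; this is not the code in the PR.

```ruby
# Option 2 sketch: one non-blocking pass over all pids, dropping reaped ones
# with reject!, then a single sleep if any are still running.
def stop_with_passes(pids, interval: 0.05)
  remaining = pids.dup
  statuses = {}
  passes = 0
  until remaining.empty?
    passes += 1
    remaining.reject! do |pid|
      reaped = Process.wait(pid, Process::WNOHANG) # pid, or nil if still running
      statuses[pid] = $? if reaped
      reaped
    end
    sleep interval unless remaining.empty?
  end
  [statuses, passes]
end

pids = Array.new(3) { fork { sleep 0.1; exit!(0) } }
statuses, passes = stop_with_passes(pids)
puts "reaped #{statuses.size} workers in #{passes} passes"
```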
> I changed the logic to use reject!
You can use `loop do` instead of `while true do`. 😉
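With `Kernel#loop`, the same polling reads as below. Besides style, `loop do` differs from `while true` in two ways: it takes a block, so variables first assigned inside it stay local to that block, and it rescues `StopIteration`. The helper name is hypothetical.

```ruby
# Sketch: the pass-and-sleep loop written with Kernel#loop and an explicit break.
def reap_with_loop(pids, interval: 0.05)
  remaining = pids.dup
  loop do
    remaining.reject! { |pid| Process.wait(pid, Process::WNOHANG) }
    break if remaining.empty?
    sleep interval
  end
  remaining
end

# Kernel#loop also ends quietly when an enumerator is exhausted,
# because it rescues the StopIteration raised by Enumerator#next.
letters = %w[a b c].each
seen = []
loop { seen << letters.next }

pids = Array.new(2) { fork { sleep 0.05; exit!(0) } }
p reap_with_loop(pids) # => []
p seen                 # => ["a", "b", "c"]
```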
I often look here:
https://msp-greg.github.io/ruby_trunk/file.control_expressions.html
`loop do` isn't mentioned there. Time to review the Kernel methods again...
Also, about the two loops (and `map`): do we need `@workers` after this code runs successfully?
If not, we can use `@workers.reject!`.
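On the `@workers.reject!` question: `reject!` edits the array in place, so the reaped entries are gone afterwards, while `reject` returns a new array and leaves the original untouched. One gotcha is that `reject!` returns nil when nothing was removed, so its return value shouldn't double as a loop condition. The pids below are made-up values.

```ruby
workers = [101, 102, 103] # made-up pids

kept = workers.reject { |pid| pid == 102 } # new array; workers unchanged
p kept    # => [101, 103]
p workers # => [101, 102, 103]

workers.reject! { |pid| pid == 102 } # removes 102 from workers itself
p workers # => [101, 103]

# reject! returns self when something was removed, nil when nothing was.
p workers.reject! { |pid| pid == 999 } # => nil
```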
force-pushed from 726fe01 to 54e8030
I don't see why we need this. Is
As I think I've mentioned elsewhere, is it a bug or a breaking change? I tried to create a small repro of it, but being a Windows type, I can't. There are a few Ruby tests with waitpid and waitpid2, and also a few specs. I believe these have continued to pass even after Puma started having issues with trunk last year.
I tried to make it as 'light' as possible, with the minimum number of waitpid calls.
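A small reproduction of the nil-return behavior the PR relies on might look like the sketch below, assuming a Unix-like platform (`fork` isn't available on Windows). A pipe keeps the child blocked, so the first `Process.waitpid2(pid, Process::WNOHANG)` call is guaranteed to return nil. Once the pipe closes, polling eventually returns `[pid, status]`.

```ruby
reader, writer = IO.pipe

pid = fork do
  writer.close
  reader.read # blocks until the parent closes its write end (EOF)
  exit!(3)    # distinctive exit code; exit! skips at_exit hooks
end
reader.close

first = Process.waitpid2(pid, Process::WNOHANG) # child still blocked => nil

writer.close # child sees EOF and exits
result = nil
sleep 0.01 until (result = Process.waitpid2(pid, Process::WNOHANG))

reaped_pid, status = result
p first             # => nil
p reaped_pid == pid # => true
p status.exitstatus # => 3
```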
@evanphx indeed, it seems to be caused by ruby/ruby@48b6bd7, but it doesn't look like it will be fixed anytime soon, since the Ruby team isn't sure how to fix it 😞 Edit: The related issue is https://bugs.ruby-lang.org/issues/15499, but it was closed as resolved even though someone pointed out that the patch didn't solve the problem with Puma.
This is absolutely a bug in Ruby, not a behavior change. Looks like they attempted to fix it here: ruby/ruby@9e66910
I could add a conditional on See #1674 (comment)
I'm pretty annoyed that ruby-core has broken a fundamental and simple Unix function and hasn't fixed it. Conditionalizing it on the Ruby version is fine.
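Gating the workaround on the Ruby version could be as simple as the sketch below. Comparing with `Gem::Version` avoids the pitfall of comparing version strings lexicographically, where "2.10.0" sorts before "2.6.0". The 2.6.0 threshold follows the PR's description of when the problem appeared; the helper names are hypothetical.

```ruby
require "rubygems" # Gem::Version; already loaded in a default Ruby setup

# Hypothetical check: treat Ruby 2.6 and later as affected, per the PR text.
def waitpid_needs_polling?(version = RUBY_VERSION)
  Gem::Version.new(version) >= Gem::Version.new("2.6.0")
end

# Hypothetical wait: non-blocking polling on affected Rubies, plain wait otherwise.
def wait_for_worker(pid, interval: 0.05)
  if waitpid_needs_polling?
    sleep interval until Process.wait(pid, Process::WNOHANG)
  else
    Process.wait(pid)
  end
  $? # Process::Status set by the successful wait
end

p waitpid_needs_polling?("2.5.3")  # => false
p waitpid_needs_polling?("2.6.0")  # => true
p waitpid_needs_polling?("2.10.0") # => true
p "2.10.0" >= "2.6.0"              # => false (plain string comparison)
```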
Okay with a string (
Ruby 2.6 introduced a bug that affects worker shutdown (waitpid). Added code using Process::WNOHANG along with needed logic. Adds worker status (via $?) and total shutdown time to log.
Co-authored-by: MSP-Greg <greg.mpls@gmail.com>
Co-authored-by: guilleiguaran <guilleiguaran@gmail.com>
JRuby 9.2.6.0 may have intermittent test failures, leave for now
Co-authored-by: MSP-Greg <greg.mpls@gmail.com>
Co-authored-by: guilleiguaran <guilleiguaran@gmail.com>
Done
Ruby 2.6 introduced a change that affects worker shutdown. I believe the change occurred between 'commits' r64316 and r64376 of Ruby trunk.
Added code using `Process::WNOHANG` along with the needed logic. Adds worker status (via `$?`) and total shutdown time to the log. The logged status matches the status returned by the current code (exit 0) for Ruby 2.5.x and lower.
Co-authored-by: MSP-Greg greg.mpls@gmail.com
Co-authored-by: guilleiguaran guilleiguaran@gmail.com
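The status-and-timing logging described above could be sketched as follows. The log wording, the `log:` parameter, and the helper name are assumptions for illustration; the PR's actual log output isn't reproduced here. `CLOCK_MONOTONIC` is used so the elapsed time isn't skewed by wall-clock adjustments.

```ruby
require "stringio"

# Sketch: reap each worker without blocking, log its exit status from $?,
# then log the total shutdown time measured with a monotonic clock.
def shutdown_and_log(pids, log: $stdout, interval: 0.05)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pids.each do |pid|
    sleep interval until Process.wait(pid, Process::WNOHANG)
    log.puts "worker #{pid} exited with status #{$?.exitstatus}"
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  log.puts format("workers shut down in %.2f seconds", elapsed)
  elapsed
end

out = StringIO.new
pids = Array.new(2) { fork { exit!(0) } }
shutdown_and_log(pids, log: out)
puts out.string
```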