Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hang on shutdown in test_refork [changelog skip] #2442

Merged
merged 3 commits into from
Oct 22, 2020

Conversation

wjordan
Copy link
Contributor

@wjordan wjordan commented Oct 22, 2020

Description

This PR fixes a hang on shutdown that occurred intermittently in test_refork (example). I reproduced the hang more consistently running test_refork with an extra sleep after the 'b' (boot) message and before it starts the server (e.g., line 106 below):

begin
@worker_write << "b#{Process.pid}:#{index}\n"
rescue SystemCallError, IOError
Thread.current.purge_interrupt_queue if Thread.current.respond_to? :purge_interrupt_queue
STDERR.puts "Master seems to have exited, exiting."
return
end
while restart_server.pop
server_thread = server.run

The PR contains two small fixes.

  1. Prevent server from starting if worker receives TERM signal (0018edd). This handles some (but not 100% of all) edge cases where the TERM signal is received and calls server.stop before the server has started.
  2. Send TERM to fork_worker child processes in Cluster#wait_workers (aae4d4d) - When fork_worker is used, the call to Process.wait in wait_workers returns ECHILD for forked workers. This commit adds a call to w.term in the rescue clause, matching the behavior when fork_worker is not used and ensuring workers shut down properly. This should handle all other edge-cases where a worker receives TERM before the server starts.

Finally, note that this bug started occurring more frequently after #2435, because that PR changed behavior so that calling #stop before #run now has no effect. I've added test_command_ignored_before_run to clarify this new behavior in a unit test. I think this is more intuitive and predictable behavior overall, but we can change it back if anyone disagrees or if the change continues to cause any other issues.

If a worker receives a `TERM` signal before the server has started,
prevent it from starting at all by clearing `restart_server`.
When `fork_worker` is used, the call to `Process.wait` in `wait_workers`
returns `ECHILD` for forked workers.
Add a call to `w.term` in the rescue clause to match the behavior when
`fork_worker` is not used, ensuring workers shut down properly.
Unit test clarifying behavior of commands sent before Server#run is called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug maintenance waiting-for-review Waiting on review from anyone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants