Skip to content

Commit

Permalink
Fix child processes not being reaped with PID 1
Browse files Browse the repository at this point in the history
Starting with Puma v6.4.1, we observed that killed Puma cluster
workers were never being restarted when run as PID 1. Note that PID 44
remained in the `defunct` state after a `kill 44` was issued:

```
git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef
```

Further investigation showed that the introduction of
`Process.wait2(-1, Process::WNOHANG)` in puma#3255 never appears to return
anything inside Google Kubernetes Engine running as PID 1.

Previously `Process.wait(w.pid, process::WNOHANG)` was called on each
known worker PID. puma#3255 changed this behavior to do this only if the
`fork_worker` config parameter were enabled, but it seems that we
should always do this.

Closes puma#3313
  • Loading branch information
stanhu committed Jan 10, 2024
1 parent 704d251 commit 65c7246
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/puma/cluster.rb
Original file line number Diff line number Diff line change
Expand Up @@ -562,7 +562,7 @@ def wait_workers
begin
# When `fork_worker` is enabled, some worker may not be direct children, but grand children.
# Because of this they won't be reaped by `Process.wait2(-1)`, so we need to check them individually)
if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
if reaped_children.delete(w.pid) || Process.wait(w.pid, Process::WNOHANG)
true
else
w.term if w.term?
Expand Down

0 comments on commit 65c7246

Please sign in to comment.