fix(app): terminate child process on parent crash via reaper#41
fix(app): terminate child process on parent crash via reaper#41BANANASJIM merged 6 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughAdds a new single-threaded watchdog executable ( Changes
Sequence Diagram(s)sequenceDiagram
participant Launcher as Launcher (goggles main)
participant Kernel as Kernel
participant Reaper as goggles-reaper
participant Target as Target App
Launcher->>Kernel: posix_spawn(goggles-reaper, args)
Kernel-->>Launcher: reaper_pid
Kernel-->>Reaper: start (single-threaded)
Reaper->>Kernel: prctl(PR_SET_CHILD_SUBREAPER)
Reaper->>Kernel: prctl(PR_SET_PDEATHSIG, SIGTERM)
Reaper->>Kernel: fork() -> child_pid
Reaper->>Target: execvp(target, args)
Kernel-->>Target: run target app
Note over Launcher,Reaper: Launcher exits/crashes
Kernel->>Reaper: deliver SIGTERM (PDEATHSIG)
Reaper->>Kernel: kill_process_tree(primary and descendants)
Reaper->>Kernel: wait_all_children()
Reaper-->>Kernel: exit (propagate target exit code or failure)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||||||||||
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @openspec/changes/fix-child-process-cleanup/tasks.md:
- Line 10: Task 2.1 in tasks.md references the wrong test filename; update the
task description so it names tests/app/test_child_death_signal.cpp instead of
tests/app/test_process_utils.cpp to match the implemented test file and keep the
task list consistent.
🧹 Nitpick comments (4)
openspec/changes/fix-child-process-cleanup/proposal.md (1)
7-11: Consider adding spec delta references to the "What Changes" section.Per coding guidelines, the "What Changes" section should list spec deltas. Currently, it lists implementation details. Consider adding a reference to the spec delta being introduced:
- ADDED
app-window: Child Process Death Signal requirementThe implementation details could remain as supplementary context.
src/app/main.cpp (1)
85-94: Checkprctl()return value for robustness.
prctl(PR_SET_PDEATHSIG, SIGTERM)can fail (returns -1), for example if the signal number is invalid or on certain containerized/sandboxed environments. Consider checking the return value:🛠️ Suggested fix
if (pid == 0) { // Child: set death signal before exec so we die if parent crashes - prctl(PR_SET_PDEATHSIG, SIGTERM); + if (prctl(PR_SET_PDEATHSIG, SIGTERM) == -1) { + _exit(EXIT_FAILURE); + } // Handle race: if parent already died, getppid() returns 1 (init) if (getppid() != parent_pid) { _exit(EXIT_FAILURE); }tests/app/test_child_death_signal.cpp (2)
88-104: Test may be timing-sensitive; consider a more robust wait strategy.The 500ms sleep before attempting to reap the grandchild is a heuristic. If the system is under load, the grandchild may not have exited yet, causing a false test failure. Consider polling in a loop with a timeout:
💡 More robust waiting approach
- // Wait for grandchild (now orphaned, reparented to us or init) - // We need to wait a bit for the grandchild to receive signal and exit - std::this_thread::sleep_for(std::chrono::milliseconds(500)); - - // Try to reap any grandchild - int grandchild_status = 0; - const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG); + // Wait for grandchild with timeout + int grandchild_status = 0; + pid_t reaped = -1; + for (int i = 0; i < 100; ++i) { + reaped = waitpid(-1, &grandchild_status, WNOHANG); + if (reaped > 0) { + break; + } + std::this_thread::sleep_for(std::chrono::milliseconds(50)); + }
29-42: Consider checkingprctl()return value for consistency.Similar to the production code in
main.cpp, the test'sprctl(PR_SET_PDEATHSIG, SIGTERM)call (line 31) doesn't check the return value. For a test that validates PDEATHSIG behavior, verifying that the setup succeeded makes the test more reliable:signal(SIGTERM, signal_handler); - prctl(PR_SET_PDEATHSIG, SIGTERM); + if (prctl(PR_SET_PDEATHSIG, SIGTERM) != 0) { + _exit(EXIT_TEST_FAIL); + }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
openspec/changes/fix-child-process-cleanup/proposal.mdopenspec/changes/fix-child-process-cleanup/specs/app-window/spec.mdopenspec/changes/fix-child-process-cleanup/tasks.mdsrc/app/main.cpptests/CMakeLists.txttests/app/test_child_death_signal.cpp
🧰 Additional context used
📓 Path-based instructions (3)
openspec/changes/**/specs/**/*.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
openspec/changes/**/specs/**/*.md: Write spec deltas using## ADDED|MODIFIED|REMOVED|RENAMED Requirementsformat with at least one#### Scenario:per requirement in spec files
When modifying existing requirements in a MODIFIED delta, paste the full requirement block including the header and all scenarios, as the archiver will replace the entire requirement
Use ADDED for new capabilities that can stand alone; use MODIFIED for changes to existing requirement behavior, scope, or acceptance criteria; use RENAMED for name-only changes
Files:
openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
openspec/**/*.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
openspec/**/*.md: Use#### Scenario:format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format- **WHEN** [condition] - **THEN** [expected result]for scenario steps in requirements
Files:
openspec/changes/fix-child-process-cleanup/specs/app-window/spec.mdopenspec/changes/fix-child-process-cleanup/tasks.mdopenspec/changes/fix-child-process-cleanup/proposal.md
openspec/changes/**/proposal.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
List spec deltas in
proposal.mdunder 'What Changes' section, marking breaking changes with BREAKING
Files:
openspec/changes/fix-child-process-cleanup/proposal.md
🧠 Learnings (3)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/tests/**/*.{cpp,hpp} : Focus unit tests on non-GPU logic: utility functions, configuration parsing, and pipeline graph logic
Applied to files:
tests/CMakeLists.txt
📚 Learning: 2025-12-19T08:54:18.412Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/AGENTS.md:0-0
Timestamp: 2025-12-19T08:54:18.412Z
Learning: Applies to openspec/changes/**/specs/**/*.md : Use ADDED for new capabilities that can stand alone; use MODIFIED for changes to existing requirement behavior, scope, or acceptance criteria; use RENAMED for name-only changes
Applied to files:
openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism
Applied to files:
src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
make_error(50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Static Analysis (clang-tidy)
- GitHub Check: Build and Test (test preset)
🔇 Additional comments (4)
openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md (1)
1-19: Spec structure looks good.The specification correctly uses the
## ADDED Requirementsformat with properly structured#### Scenario:headers. The BDD-style GIVEN/WHEN/THEN format clearly defines the expected behavior for both the parent crash and race condition scenarios.src/app/main.cpp (1)
78-83: Implementation is correct for the use case.The fork/exec pattern is implemented correctly:
- Parent PID is saved before fork (handles race condition)
- Error handling uses
strerror(errno)appropriately- Return value check is correct (
pid < 0)Note:
execvpe()(line 92) is a GNU extension rather than POSIX, but this is acceptable since the code already depends on Linux-specificprctl(PR_SET_PDEATHSIG).tests/CMakeLists.txt (1)
176-188: Test integration looks good.The integration test is properly set up:
- Separate executable for isolation
- Sanitizers enabled for memory safety checks
- Registered with CTest for automated runs
- Working directory configured
The test has no external library dependencies, which is appropriate since it only tests Linux system calls.
tests/app/test_child_death_signal.cpp (1)
65-82: Good test design using subreaper pattern.The test correctly uses
PR_SET_CHILD_SUBREAPERto become the reaper for orphaned grandchildren, ensuring the test can observe the grandchild's exit status. The three-process hierarchy (main → child → grandchild) accurately simulates the real-world scenario where goggles spawns a target app.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @openspec/changes/fix-child-process-cleanup/tasks.md:
- Around line 3-7: The checklist and code disagree: the tasks reference execvp()
but the implementation uses execvpe(); update either the docs or the code so
they match. Pick one API (prefer simpler execvp() if you don't need envp) and
make spawn_target_app() and its task list consistent: if you keep execvpe(),
change the checklist items to mention execvpe(); if you switch the code to
execvp(), update spawn_target_app() to call execvp() and remove envp usage. Also
ensure the diff note about replacing posix_spawnp() still matches the chosen
exec variant and that prctl and parent-death checks remain unchanged.
In @src/app/main.cpp:
- Around line 85-95: Check the return value of prctl(PR_SET_PDEATHSIG, SIGKILL)
in the child path and handle failure: after calling prctl, if it returns -1, log
or report the error (using errno) and immediately _exit(EXIT_FAILURE) to avoid
continuing without the death-signal guarantee; retain the existing race check
using getppid() vs parent_pid, then proceed to call execvpe(argv[0],
argv.data(), envp.data()) only if prctl succeeded.
In @tests/app/test_child_death_signal.cpp:
- Around line 20-34: The test is flaky because a single non-blocking
waitpid(WNOHANG) can return 0 under load; update the wait logic to poll until a
deadline instead of one-shot. Replace the single waitpid(..., WNOHANG) + sleep
with a loop that repeatedly calls waitpid(-1, &status, WNOHANG) until it returns
a pid > 0 or a timeout is reached (sleeping briefly between polls), and apply
the same change for the other occurrence noted around lines 76-105; use clear
timeouts and short backoff sleeps so the test reliably observes child exit.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
openspec/changes/fix-child-process-cleanup/tasks.mdsrc/app/main.cpptests/app/test_child_death_signal.cpp
🧰 Additional context used
📓 Path-based instructions (1)
openspec/**/*.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
openspec/**/*.md: Use#### Scenario:format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format- **WHEN** [condition] - **THEN** [expected result]for scenario steps in requirements
Files:
openspec/changes/fix-child-process-cleanup/tasks.md
🧠 Learnings (1)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism
Applied to files:
src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
make_error(50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Static Analysis (clang-tidy)
- GitHub Check: Build and Test (test preset)
| - [x] 1.1 Replace `posix_spawnp()` with `fork()` + `execvp()` in `spawn_target_app()` | ||
| - [x] 1.2 Add `#include <sys/prctl.h>` header | ||
| - [x] 1.3 In child process (after fork), call `prctl(PR_SET_PDEATHSIG, SIGKILL)` before exec | ||
| - [x] 1.4 Add check for parent death race condition (verify parent PID hasn't changed to 1) | ||
|
|
There was a problem hiding this comment.
Doc drift: tasks say execvp() but code uses execvpe() (pick one and align).
Small, but this checklist is meant to be actionable/traceable.
🤖 Prompt for AI Agents
In @openspec/changes/fix-child-process-cleanup/tasks.md around lines 3 - 7, The
checklist and code disagree: the tasks reference execvp() but the implementation
uses execvpe(); update either the docs or the code so they match. Pick one API
(prefer simpler execvp() if you don't need envp) and make spawn_target_app() and
its task list consistent: if you keep execvpe(), change the checklist items to
mention execvpe(); if you switch the code to execvp(), update spawn_target_app()
to call execvp() and remove envp usage. Also ensure the diff note about
replacing posix_spawnp() still matches the chosen exec variant and that prctl
and parent-death checks remain unchanged.
src/app/main.cpp
Outdated
| if (pid == 0) { | ||
| // Child: set death signal before exec so we die if parent crashes | ||
| // Use SIGKILL to ensure child cannot ignore the signal | ||
| prctl(PR_SET_PDEATHSIG, SIGKILL); | ||
| // Handle race: if parent already died, getppid() returns 1 (init) | ||
| if (getppid() != parent_pid) { | ||
| _exit(EXIT_FAILURE); | ||
| } | ||
| execvpe(argv[0], argv.data(), envp.data()); | ||
| _exit(EXIT_FAILURE); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/app/main.cpp | sed -n '80,100p'Repository: K1ngst0m/Goggles
Length of output: 921
🏁 Script executed:
man prctl 2>&1 | head -200Repository: K1ngst0m/Goggles
Length of output: 101
🌐 Web query:
prctl PR_SET_PDEATHSIG return value error handling Linux man pages
💡 Result:
Short answer: prctl(PR_SET_PDEATHSIG, sig) returns 0 on success and -1 on error, with errno set. The documented error for this option is EINVAL when sig is not a valid signal number. See the prctl(2) / PR_SET_PDEATHSIG man page for details. [1][2][3]
Sources
- PR_SET_PDEATHSIG man page. [1]
- Arch/man-pages entry for PR_SET_PDEATHSIG. [2]
- prctl(2) general man page (return-value / errors). [3]
🏁 Script executed:
# Verify that SIGKILL is signal 9 and always valid
python3 << 'EOF'
import signal
# Check if SIGKILL exists and its number
try:
sigkill = signal.SIGKILL
print(f"SIGKILL = {sigkill}")
print("SIGKILL is a valid standard signal")
except AttributeError:
print("SIGKILL not found")
EOFRepository: K1ngst0m/Goggles
Length of output: 146
Check prctl(PR_SET_PDEATHSIG, SIGKILL) return value before proceeding.
If prctl fails (returns -1), the child continues executing without the death signal being set, defeating the goal of ensuring the child dies if the parent crashes. The subsequent getppid() check handles the race condition where the parent dies before prctl is called, but does not compensate for prctl failure. Add an error check:
if (pid == 0) {
// Child: set death signal before exec so we die if parent crashes
// Use SIGKILL to ensure child cannot ignore the signal
- prctl(PR_SET_PDEATHSIG, SIGKILL);
+ if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
+ _exit(EXIT_FAILURE);
+ }
// Handle race: if parent already died, getppid() returns 1 (init)
if (getppid() != parent_pid) {
_exit(EXIT_FAILURE);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (pid == 0) { | |
| // Child: set death signal before exec so we die if parent crashes | |
| // Use SIGKILL to ensure child cannot ignore the signal | |
| prctl(PR_SET_PDEATHSIG, SIGKILL); | |
| // Handle race: if parent already died, getppid() returns 1 (init) | |
| if (getppid() != parent_pid) { | |
| _exit(EXIT_FAILURE); | |
| } | |
| execvpe(argv[0], argv.data(), envp.data()); | |
| _exit(EXIT_FAILURE); | |
| } | |
| if (pid == 0) { | |
| // Child: set death signal before exec so we die if parent crashes | |
| // Use SIGKILL to ensure child cannot ignore the signal | |
| if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) { | |
| _exit(EXIT_FAILURE); | |
| } | |
| // Handle race: if parent already died, getppid() returns 1 (init) | |
| if (getppid() != parent_pid) { | |
| _exit(EXIT_FAILURE); | |
| } | |
| execvpe(argv[0], argv.data(), envp.data()); | |
| _exit(EXIT_FAILURE); | |
| } |
🤖 Prompt for AI Agents
In @src/app/main.cpp around lines 85 - 95, Check the return value of
prctl(PR_SET_PDEATHSIG, SIGKILL) in the child path and handle failure: after
calling prctl, if it returns -1, log or report the error (using errno) and
immediately _exit(EXIT_FAILURE) to avoid continuing without the death-signal
guarantee; retain the existing race check using getppid() vs parent_pid, then
proceed to call execvpe(argv[0], argv.data(), envp.data()) only if prctl
succeeded.
| [[noreturn]] void run_grandchild(pid_t expected_parent) { | ||
| prctl(PR_SET_PDEATHSIG, SIGKILL); | ||
|
|
||
| if (getppid() != expected_parent) { | ||
| _exit(EXIT_TEST_FAIL); | ||
| } | ||
|
|
||
| // Sleep until killed by SIGKILL | ||
| for (int i = 0; i < 100; ++i) { | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(50)); | ||
| } | ||
|
|
||
| // If we reach here, SIGKILL was not received | ||
| _exit(EXIT_TEST_FAIL); | ||
| } |
There was a problem hiding this comment.
Test is timing-flaky: replace single sleep(500ms) + one waitpid(WNOHANG) with a poll-until-timeout loop.
As written, waitpid(-1, ..., WNOHANG) can legitimately return 0 under load and the test fails immediately.
Suggested change
- // We need to wait a bit for the grandchild to receive signal and exit
- std::this_thread::sleep_for(std::chrono::milliseconds(500));
-
- // Try to reap grandchild - should have been killed by SIGKILL
- int grandchild_status = 0;
- const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG);
+ // Poll for grandchild exit (avoid flakes on slow/loaded CI)
+ int grandchild_status = 0;
+ pid_t reaped = 0;
+ const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
+ while (std::chrono::steady_clock::now() < deadline) {
+ reaped = waitpid(-1, &grandchild_status, WNOHANG);
+ if (reaped > 0) break;
+ if (reaped < 0 && errno == EINTR) continue;
+ std::this_thread::sleep_for(std::chrono::milliseconds(50));
+ }Also applies to: 76-105
🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 20 - 34, The test is
flaky because a single non-blocking waitpid(WNOHANG) can return 0 under load;
update the wait logic to poll until a deadline instead of one-shot. Replace the
single waitpid(..., WNOHANG) + sleep with a loop that repeatedly calls
waitpid(-1, &status, WNOHANG) until it returns a pid > 0 or a timeout is reached
(sleeping briefly between polls), and apply the same change for the other
occurrence noted around lines 76-105; use clear timeouts and short backoff
sleeps so the test reliably observes child exit.
| [[noreturn]] void run_grandchild(pid_t expected_parent) { | ||
| prctl(PR_SET_PDEATHSIG, SIGKILL); | ||
|
|
||
| if (getppid() != expected_parent) { | ||
| _exit(EXIT_TEST_FAIL); | ||
| } | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "test_child_death_signal.cpp" -type fRepository: K1ngst0m/Goggles
Length of output: 99
🏁 Script executed:
wc -l ./tests/app/test_child_death_signal.cppRepository: K1ngst0m/Goggles
Length of output: 103
🏁 Script executed:
cat -n ./tests/app/test_child_death_signal.cppRepository: K1ngst0m/Goggles
Length of output: 4078
Handle syscall failures consistently (prctl + waitpid).
prctl(PR_SET_PDEATHSIG, SIGKILL) at line 21 returns no error check, and waitpid(child, ...) at line 78 ignores EINTR and return value despite error handling being applied elsewhere in this file (lines 61, 86).
Suggested fix
[[noreturn]] void run_grandchild(pid_t expected_parent) {
- prctl(PR_SET_PDEATHSIG, SIGKILL);
+ if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
+ _exit(EXIT_TEST_FAIL);
+ }
@@
- waitpid(child, &child_status, 0);
+ while (waitpid(child, &child_status, 0) < 0 && errno == EINTR) {}Also applies to: line 78
|
the reaper pattern |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 73-78: The signal handler signal_handler currently calls
kill_all_children which invokes non-async-signal-safe functions; change
signal_handler to only set the atomic/volatile flag g_should_run = 0 and return
(remove the kill_all_children call), and move any child-killing logic into the
main loop that checks g_should_run (the existing loop around where main checks
the flag near the comment at line ~125) so that when g_should_run becomes 0 the
main thread performs kill_all_children and other cleanup synchronously (reusing
the existing cleanup code at lines ~135-138).
In @tests/app/test_child_death_signal.cpp:
- Around line 34-52: The test uses posix_spawnp("goggles-reaper") which assumes
the binary is in PATH; update tests/app/test_child_death_signal.cpp to use an
absolute path instead: either replicate the reaper path-resolution logic used by
your main (resolve the current executable via /proc/self/exe then derive the
goggles-reaper absolute path and pass that string to posix_spawnp instead of
"goggles-reaper"), or (alternative) add a CMake dependency so the test depends
on the goggles-reaper target and pass the built artifact's full path into the
test (replace the literal "goggles-reaper" in argv/posix_spawnp with the
resolved absolute path). Ensure the new path string lifetime outlives the
posix_spawnp call and adjust argv/posix_spawnp usage accordingly.
🧹 Nitpick comments (3)
src/app/reaper_main.cpp (1)
80-87: Consider checkingsigactionreturn values.While failures are unlikely, checking return values would improve robustness and aid debugging if signal setup fails unexpectedly.
openspec/changes/fix-child-process-cleanup/proposal.md (1)
27-33: Consider expanding the "Impact" section with spec deltas.Per coding guidelines for
proposal.md, the "What Changes" or "Impact" section should list specific spec deltas. Currently it mentions "Affected specs:app-window" but doesn't detail what specifically changes in that spec. Consider adding a brief description of the behavioral change (e.g., "Target app now terminates reliably when Goggles crashes").src/app/CMakeLists.txt (1)
32-37: Consider enabling sanitizers for goggles-reaper.The main
gogglestarget enables sanitizers viagoggles_enable_sanitizers(goggles), but the reaper does not. For consistency and to catch memory/threading issues during testing, consider enabling sanitizers for the reaper as well.Proposed fix
# Reaper watchdog process add_executable(goggles-reaper reaper_main.cpp) set_target_properties(goggles-reaper PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin ) goggles_enable_clang_tidy(goggles-reaper) +goggles_enable_sanitizers(goggles-reaper)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
openspec/changes/fix-child-process-cleanup/proposal.mdopenspec/changes/fix-child-process-cleanup/tasks.mdsrc/app/CMakeLists.txtsrc/app/main.cppsrc/app/reaper_main.cpptests/app/test_child_death_signal.cpp
✅ Files skipped from review due to trivial changes (1)
- openspec/changes/fix-child-process-cleanup/tasks.md
🧰 Additional context used
📓 Path-based instructions (2)
openspec/**/*.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
openspec/**/*.md: Use#### Scenario:format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format- **WHEN** [condition] - **THEN** [expected result]for scenario steps in requirements
Files:
openspec/changes/fix-child-process-cleanup/proposal.md
openspec/changes/**/proposal.md
📄 CodeRabbit inference engine (openspec/AGENTS.md)
List spec deltas in
proposal.mdunder 'What Changes' section, marking breaking changes with BREAKING
Files:
openspec/changes/fix-child-process-cleanup/proposal.md
🧠 Learnings (4)
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/src/render/**/*.cpp : Single-threaded render loop on main thread by default. Render backend (`goggles::render`) runs on main thread. Pipeline execution runs on main thread. Job system for non-render work on worker threads. Threading introduced only when profiling justifies it (main thread CPU consistently >8ms).
Applied to files:
src/app/CMakeLists.txt
📚 Learning: 2026-01-07T07:20:57.053Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.053Z
Learning: Applies to docs/**/*.{cpp,hpp} : Use top-level namespace `goggles`. Use module namespaces `goggles::<module>` (e.g., `goggles::capture`, `goggles::render`). Use sub-module namespaces `goggles::<module>::<submodule>`. Never use `using namespace` in headers. Prefer explicit namespace qualification or scoped `using` declarations.
Applied to files:
src/app/CMakeLists.txt
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism
Applied to files:
src/app/reaper_main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.
Applied to files:
src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (4)
src/ui/imgui_layer.hpp (2)
path(74-74)path(95-95)src/render/texture/texture_loader.hpp (1)
path(32-33)src/app/cli.hpp (1)
arg(76-76)src/util/error.hpp (1)
make_error(50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Static Analysis (clang-tidy)
- GitHub Check: Build and Test (test preset)
🔇 Additional comments (6)
src/app/reaper_main.cpp (4)
53-59: LGTM!The recursive kill approach correctly handles the process tree. Killing the parent before children prevents the parent from spawning new processes during cleanup.
61-71: LGTM!The cleanup functions correctly gather and terminate child process trees, and the wait loop properly handles
EINTRinterruptions.
97-108: LGTM - correct parent death race handling.The pattern of capturing
parent_pidbeforePR_SET_PDEATHSIGand checkinggetppid()afterward correctly handles the race condition where the parent might die between the two calls.Consider checking
prctlreturn values for better diagnostics if these calls fail.
123-144: LGTM!The main loop correctly handles
EINTRinterruptions, and the cleanup sequence properly terminates and reaps all remaining children before returning the appropriate exit status.src/app/main.cpp (2)
25-34: LGTM!The reaper path resolution correctly uses
readlinkto find the executable's directory and falls back to PATH lookup if that fails. The buffer size and null-termination are handled properly.
84-101: LGTM!The argv construction correctly prepends the reaper path, and the switch from
posix_spawnptoposix_spawnis appropriate since the full path is now computed explicitly.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 53-71: The current wait_all_children() can block forever because
children may ignore SIGTERM; update the termination strategy: in
kill_all_children() send SIGTERM to each child (use kill_process_tree(pid,
SIGTERM) so the function exists), then implement a bounded wait in
wait_all_children() that repeatedly loops with non-blocking waitpid(...,
WNOHANG) and usleep short intervals until either all children have exited or a
deadline (e.g., N seconds) is reached, handling EINTR correctly; if the deadline
is exceeded, escalate by calling kill_process_tree(child, SIGKILL) for any
still-running children and perform a final blocking waitpid loop to reap them;
ensure functions reference get_child_pids, kill_process_tree, and
kill_all_children so remaining PIDs can be re-queried or tracked during
escalation.
- Around line 1-141: The reaper is Linux-only; either prevent building it on
non-Linux or guard platform-specific APIs: update the CMake target that
currently adds src/app/reaper_main.cpp (around the existing target in
CMakeLists.txt) to build only when Linux (e.g., if(UNIX AND NOT APPLE) or
if(CMAKE_SYSTEM_NAME STREQUAL Linux)), OR wrap the Linux-specific source body in
#ifdef __linux__ / #endif so symbols using /proc, prctl (PR_SET_CHILD_SUBREAPER,
PR_SET_PDEATHSIG), fork(), execvp(), get_child_pids(), kill_process_tree(), etc.
are excluded on non-Linux platforms; choose one approach and apply it
consistently so the file is not compiled/used on non-Linux systems.
In @tests/app/test_child_death_signal.cpp:
- Around line 52-76: The child branch uses posix_spawn(..., environ) without
declaring extern char** environ (portability issue) and relies on a fixed
std::this_thread::sleep_for(200ms) race; fix by adding a portable usage of the
environment to the spawn call (declare extern char** environ at top of the test
file or pass nullptr if inheriting env is acceptable) and replace the fixed
sleep_for wait with a deterministic readiness check: after posix_spawn, poll for
the spawned reaper child (reaper_pid) existence or use a pipe/handshake where
the child (spawned via posix_spawn with argv from reaper_path) writes a ready
byte before the parent proceeds; update the child branch around
fork()/posix_spawn() and the sleep_for() usage (symbols: child, reaper_pid,
posix_spawn, reaper_path, environ, std::this_thread::sleep_for) to use the
chosen readiness mechanism.
- Around line 78-84: The waitpid call can fail with EINTR and the test doesn't
validate the child's exit, so replace the direct waitpid(child, &status, 0) with
an EINTR-safe loop that retries on errno==EINTR, checks the return value
(ensuring it equals child), and then validate the child's termination using
WIFEXITED/WEXITSTATUS or WIFSIGNALED to assert expected outcome before
continuing; reference the existing variables and symbols waitpid, child, and
status in tests/app/test_child_death_signal.cpp and ensure the test fails loudly
if waitpid returns an unexpected error or the child exited abnormally.
🧹 Nitpick comments (1)
src/app/reaper_main.cpp (1)
137-140: Consider propagating signaled exit status (helps callers/debuggability).Right now a SIGKILL’d child returns
EXIT_FAILURErather than a conventional128+signalmapping.Possible tweak
if (WIFEXITED(status)) { return WEXITSTATUS(status); } + if (WIFSIGNALED(status)) { + return 128 + WTERMSIG(status); + } return EXIT_FAILURE;
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/app/reaper_main.cpptests/app/test_child_death_signal.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
tests/app/test_child_death_signal.cpp (1)
src/app/main.cpp (2)
get_reaper_path(25-34)get_reaper_path(25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Static Analysis (clang-tidy)
- GitHub Check: Build and Test (test preset)
| // goggles-reaper: single-threaded watchdog process | ||
| // Ensures target app and descendants are terminated when goggles dies | ||
|
|
||
| #include <array> | ||
| #include <cerrno> | ||
| #include <csignal> | ||
| #include <cstdio> | ||
| #include <cstdlib> | ||
| #include <cstring> | ||
| #include <dirent.h> | ||
| #include <sys/prctl.h> | ||
| #include <sys/wait.h> | ||
| #include <unistd.h> | ||
| #include <vector> | ||
|
|
||
| namespace { | ||
|
|
||
| volatile sig_atomic_t g_should_run = 1; | ||
|
|
||
| auto get_child_pids(pid_t parent_pid) -> std::vector<pid_t> { | ||
| std::vector<pid_t> pids; | ||
| DIR* proc_dir = opendir("/proc"); | ||
| if (proc_dir == nullptr) { | ||
| return pids; | ||
| } | ||
|
|
||
| struct dirent* entry = nullptr; | ||
| while ((entry = readdir(proc_dir)) != nullptr) { | ||
| if (entry->d_type != DT_DIR || entry->d_name[0] < '0' || entry->d_name[0] > '9') { | ||
| continue; | ||
| } | ||
|
|
||
| std::array<char, 64> path{}; | ||
| std::snprintf(path.data(), path.size(), "/proc/%s/stat", entry->d_name); | ||
|
|
||
| FILE* stat_file = std::fopen(path.data(), "r"); | ||
| if (stat_file == nullptr) { | ||
| continue; | ||
| } | ||
|
|
||
| pid_t ppid = -1; | ||
| std::fscanf(stat_file, "%*d %*s %*c %d", &ppid); | ||
| std::fclose(stat_file); | ||
|
|
||
| if (ppid == parent_pid) { | ||
| pids.push_back(static_cast<pid_t>(std::atoi(entry->d_name))); | ||
| } | ||
| } | ||
| closedir(proc_dir); | ||
| return pids; | ||
| } | ||
|
|
||
| auto kill_process_tree(pid_t pid, int sig) -> void { | ||
| const auto children = get_child_pids(pid); | ||
| kill(pid, sig); | ||
| for (const pid_t child : children) { | ||
| kill_process_tree(child, sig); | ||
| } | ||
| } | ||
|
|
||
| auto kill_all_children(int sig) -> void { | ||
| const auto children = get_child_pids(getpid()); | ||
| for (const pid_t child : children) { | ||
| kill_process_tree(child, sig); | ||
| } | ||
| } | ||
|
|
||
| auto wait_all_children() -> void { | ||
| while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) { | ||
| } | ||
| } | ||
|
|
||
| auto signal_handler(int /*sig*/) -> void { | ||
| g_should_run = 0; | ||
| } | ||
|
|
||
| auto setup_signal_handlers() -> void { | ||
| struct sigaction sa{}; | ||
| sa.sa_handler = signal_handler; | ||
| sigemptyset(&sa.sa_mask); | ||
| sigaction(SIGTERM, &sa, nullptr); | ||
| sigaction(SIGINT, &sa, nullptr); | ||
| sigaction(SIGHUP, &sa, nullptr); | ||
| } | ||
|
|
||
| } // namespace | ||
|
|
||
| auto main(int argc, char** argv) -> int { | ||
| if (argc < 2) { | ||
| std::fprintf(stderr, "goggles-reaper: missing command\n"); | ||
| return EXIT_FAILURE; | ||
| } | ||
|
|
||
| const pid_t parent_pid = getppid(); | ||
|
|
||
| // Become subreaper - orphaned descendants reparent to us | ||
| prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); | ||
|
|
||
| // Die when parent (goggles) dies | ||
| prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0); | ||
| if (getppid() != parent_pid) { | ||
| return EXIT_FAILURE; | ||
| } | ||
|
|
||
| setup_signal_handlers(); | ||
|
|
||
| // Fork and exec target app | ||
| const pid_t child = fork(); | ||
| if (child < 0) { | ||
| std::fprintf(stderr, "goggles-reaper: fork failed: %s\n", std::strerror(errno)); | ||
| return EXIT_FAILURE; | ||
| } | ||
|
|
||
| if (child == 0) { | ||
| execvp(argv[1], &argv[1]); | ||
| std::fprintf(stderr, "goggles-reaper: exec failed: %s\n", std::strerror(errno)); | ||
| _exit(EXIT_FAILURE); | ||
| } | ||
|
|
||
| // Wait for primary child or signal | ||
| int status = 0; | ||
| while (g_should_run != 0) { | ||
| const pid_t pid = waitpid(child, &status, 0); | ||
| if (pid == child) { | ||
| break; | ||
| } | ||
| if (pid == -1 && errno != EINTR) { | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| // Cleanup all remaining children | ||
| g_should_run = 0; | ||
| kill_all_children(SIGTERM); | ||
| wait_all_children(); | ||
|
|
||
| if (WIFEXITED(status)) { | ||
| return WEXITSTATUS(status); | ||
| } | ||
| return EXIT_FAILURE; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for reaper_main.cpp and Linux-specific symbols
rg -n "reaper_main\.cpp|goggles-reaper|PR_SET_CHILD_SUBREAPER|PR_SET_PDEATHSIG" -C 3Repository: K1ngst0m/Goggles
Length of output: 10761
🏁 Script executed:
#!/bin/bash
# Search for Linux platform guards in CMake and code
rg -n "if.*LINUX|__linux__|CMAKE_SYSTEM_NAME.*Linux|CMAKE_SYSTEM" -C 2Repository: K1ngst0m/Goggles
Length of output: 402
🏁 Script executed:
#!/bin/bash
# Look for CMakeLists.txt files to understand build structure
find . -name "CMakeLists.txt" -type f | head -20Repository: K1ngst0m/Goggles
Length of output: 317
🏁 Script executed:
#!/bin/bash
# Check the structure of src/app directory
ls -la src/app/ 2>/dev/null || find . -path "*/src/app*" -type f -name "*.cpp" -o -name "*.h" -o -name "CMakeLists.txt" | head -20Repository: K1ngst0m/Goggles
Length of output: 823
🏁 Script executed:
#!/bin/bash
# Read the src/app/CMakeLists.txt to see the full build configuration
cat -n src/app/CMakeLists.txtRepository: K1ngst0m/Goggles
Length of output: 1107
🏁 Script executed:
#!/bin/bash
# Search for any platform guards around the reaper_main.cpp or goggles-reaper build
rg -n -B10 -A5 "add_executable.*reaper|goggles-reaper" src/app/CMakeLists.txtRepository: K1ngst0m/Goggles
Length of output: 525
🏁 Script executed:
#!/bin/bash
# Check if there are any #ifdef or platform checks in reaper_main.cpp itself
head -30 src/app/reaper_main.cpp | cat -nRepository: K1ngst0m/Goggles
Length of output: 1051
🏁 Script executed:
#!/bin/bash
# Check the main CMakeLists.txt for any platform-specific logic
cat -n CMakeLists.txt | head -50Repository: K1ngst0m/Goggles
Length of output: 1182
🏁 Script executed:
#!/bin/bash
# Check the main CMakeLists.txt to see if there's any CMAKE_SYSTEM_NAME check
rg -n "CMAKE_SYSTEM_NAME|if.*WIN|if.*APPLE|if.*UNIX|if.*Linux" CMakeLists.txtRepository: K1ngst0m/Goggles
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Verify the full reaper_main.cpp for any platform guards
wc -l src/app/reaper_main.cpp
cat -n src/app/reaper_main.cpp | grep -E "ifdef|if.*linux|if.*LINUX" || echo "No #ifdef guards found"Repository: K1ngst0m/Goggles
Length of output: 110
🏁 Script executed:
#!/bin/bash
# Check if there are any platform-specific configurations in Dependencies.cmake or CompilerConfig.cmake
rg -n "CMAKE_SYSTEM_NAME|WIN32|APPLE|LINUX" cmake/Repository: K1ngst0m/Goggles
Length of output: 185
Add Linux platform guards to goggles-reaper build or source.
This file is inherently Linux-specific (/proc filesystem access, prctl() syscalls, fork()/execvp()). Currently, src/app/CMakeLists.txt (line 33) builds it unconditionally, and the source has no #ifdef __linux__ guards. Either add if(UNIX AND NOT APPLE) or if(CMAKE_SYSTEM_NAME STREQUAL Linux) to the CMake target, or wrap Linux-specific code sections with #ifdef __linux__ in the source.
🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 1 - 141, The reaper is Linux-only;
either prevent building it on non-Linux or guard platform-specific APIs: update
the CMake target that currently adds src/app/reaper_main.cpp (around the
existing target in CMakeLists.txt) to build only when Linux (e.g., if(UNIX AND
NOT APPLE) or if(CMAKE_SYSTEM_NAME STREQUAL Linux)), OR wrap the Linux-specific
source body in #ifdef __linux__ / #endif so symbols using /proc, prctl
(PR_SET_CHILD_SUBREAPER, PR_SET_PDEATHSIG), fork(), execvp(), get_child_pids(),
kill_process_tree(), etc. are excluded on non-Linux platforms; choose one
approach and apply it consistently so the file is not compiled/used on non-Linux
systems.
| auto kill_process_tree(pid_t pid, int sig) -> void { | ||
| const auto children = get_child_pids(pid); | ||
| kill(pid, sig); | ||
| for (const pid_t child : children) { | ||
| kill_process_tree(child, sig); | ||
| } | ||
| } | ||
|
|
||
| auto kill_all_children(int sig) -> void { | ||
| const auto children = get_child_pids(getpid()); | ||
| for (const pid_t child : children) { | ||
| kill_process_tree(child, sig); | ||
| } | ||
| } | ||
|
|
||
| auto wait_all_children() -> void { | ||
| while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) { | ||
| } | ||
| } |
There was a problem hiding this comment.
Termination/wait strategy can hang indefinitely; needs timeout + SIGKILL escalation.
Even with perfect PID discovery, children can ignore SIGTERM. wait_all_children() currently blocks until all children exit, with no deadline.
Sketch of bounded wait + escalation (minimal deps using usleep)
-auto wait_all_children() -> void {
- while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) {
- }
-}
+auto reap_all_children_nonblocking() -> bool {
+ for (;;) {
+ const pid_t r = waitpid(-1, nullptr, WNOHANG);
+ if (r > 0) continue;
+ if (r == 0) return false; // still have children
+ return errno == ECHILD; // true => none left
+ }
+}
+
+auto wait_all_children_with_timeout_ms(int timeout_ms) -> bool {
+ const int step_us = 20 * 1000;
+ int waited_ms = 0;
+ while (waited_ms < timeout_ms) {
+ if (reap_all_children_nonblocking()) return true;
+ usleep(step_us);
+ waited_ms += 20;
+ }
+ return reap_all_children_nonblocking();
+}Then in cleanup:
- kill_all_children(SIGTERM);
- wait_all_children();
+ kill_all_children(SIGTERM);
+ if (!wait_all_children_with_timeout_ms(2000)) {
+ kill_all_children(SIGKILL);
+ (void)wait_all_children_with_timeout_ms(2000);
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 53 - 71, The current
wait_all_children() can block forever because children may ignore SIGTERM;
update the termination strategy: in kill_all_children() send SIGTERM to each
child (use kill_process_tree(pid, SIGTERM) so the function exists), then
implement a bounded wait in wait_all_children() that repeatedly loops with
non-blocking waitpid(..., WNOHANG) and usleep short intervals until either all
children have exited or a deadline (e.g., N seconds) is reached, handling EINTR
correctly; if the deadline is exceeded, escalate by calling
kill_process_tree(child, SIGKILL) for any still-running children and perform a
final blocking waitpid loop to reap them; ensure functions reference
get_child_pids, kill_process_tree, and kill_all_children so remaining PIDs can
be re-queried or tracked during escalation.
| const pid_t parent_pid = getppid(); | ||
|
|
||
| // Become subreaper - orphaned descendants reparent to us | ||
| prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); | ||
|
|
||
| // Die when parent (goggles) dies | ||
| prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0); | ||
| if (getppid() != parent_pid) { | ||
| return EXIT_FAILURE; | ||
| } | ||
|
|
||
| setup_signal_handlers(); |
There was a problem hiding this comment.
prctl(...) return values are unchecked (silent failure → incorrect guarantees).
If PR_SET_CHILD_SUBREAPER or PR_SET_PDEATHSIG fails, the core safety property may not hold.
Proposed error checks
- prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
+ if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) {
+ std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno));
+ return EXIT_FAILURE;
+ }
@@
- prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
+ if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) {
+ std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno));
+ return EXIT_FAILURE;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const pid_t parent_pid = getppid(); | |
| // Become subreaper - orphaned descendants reparent to us | |
| prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); | |
| // Die when parent (goggles) dies | |
| prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0); | |
| if (getppid() != parent_pid) { | |
| return EXIT_FAILURE; | |
| } | |
| setup_signal_handlers(); | |
| const pid_t parent_pid = getppid(); | |
| // Become subreaper - orphaned descendants reparent to us | |
| if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) { | |
| std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno)); | |
| return EXIT_FAILURE; | |
| } | |
| // Die when parent (goggles) dies | |
| if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) { | |
| std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno)); | |
| return EXIT_FAILURE; | |
| } | |
| if (getppid() != parent_pid) { | |
| return EXIT_FAILURE; | |
| } | |
| setup_signal_handlers(); |
| auto get_reaper_path() -> std::string { | ||
| std::array<char, 4096> exe_path{}; | ||
| const ssize_t len = readlink("/proc/self/exe", exe_path.data(), exe_path.size() - 1); | ||
| if (len <= 0) { | ||
| return "goggles-reaper"; | ||
| } | ||
| exe_path[static_cast<size_t>(len)] = '\0'; | ||
|
|
||
| std::string path(exe_path.data()); | ||
| // Test binary is in build/debug/tests/, reaper is in build/debug/bin/ | ||
| const auto pos = path.rfind('/'); | ||
| if (pos != std::string::npos) { | ||
| path = path.substr(0, pos); | ||
| const auto pos2 = path.rfind('/'); | ||
| if (pos2 != std::string::npos) { | ||
| path = path.substr(0, pos2); | ||
| } | ||
| } | ||
| return path + "/bin/goggles-reaper"; | ||
| } |
There was a problem hiding this comment.
Test reaper path resolution is brittle (assumes a specific build directory layout).
This will likely break in CI/packaging/out-of-tree builds. Prefer a CMake-provided absolute path (compile-time define) or an env var fallback; also validate the resolved path is executable before using it.
Proposed hardening (existence/executable check + fallback)
+#include <sys/stat.h>
+
auto get_reaper_path() -> std::string {
@@
- return path + "/bin/goggles-reaper";
+ const std::string candidate = path + "/bin/goggles-reaper";
+ if (access(candidate.c_str(), X_OK) == 0) {
+ return candidate;
+ }
+ // Fallback to PATH (lets CTest env control discovery)
+ return "goggles-reaper";
}| const pid_t child = fork(); | ||
| if (child < 0) { | ||
| std::fprintf(stderr, "fork failed: %s\n", std::strerror(errno)); | ||
| return EXIT_TEST_FAIL; | ||
| } | ||
|
|
||
| if (child == 0) { | ||
| // Child: spawn goggles-reaper with sleep 60 | ||
| std::array<char*, 4> argv = {const_cast<char*>(reaper_path.c_str()), | ||
| const_cast<char*>("sleep"), const_cast<char*>("60"), nullptr}; | ||
|
|
||
| pid_t reaper_pid = -1; | ||
| const int rc = | ||
| posix_spawn(&reaper_pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), environ); | ||
| if (rc != 0) { | ||
| std::fprintf(stderr, "posix_spawn failed: %s\n", std::strerror(rc)); | ||
| _exit(EXIT_TEST_FAIL); | ||
| } | ||
|
|
||
| // Wait a bit for reaper to start | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(200)); | ||
|
|
||
| // Now exit - reaper should receive SIGTERM and kill its children | ||
| _exit(EXIT_TEST_PASS); | ||
| } |
There was a problem hiding this comment.
Child-side spawn uses environ without a portable declaration and relies on fixed sleeps.
- In C++,
environmay needextern char** environ;(depends on libc headers). - The fixed
200msdelay is a race; prefer a readiness check (e.g., poll for reaper child existence) or a pipe handshake from the spawned process (if you add a test-only “ready” signal).
Minimal portability tweak
+#if defined(__linux__) || defined(__APPLE__)
+extern char** environ;
+#endifCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 52 - 76, The child branch
uses posix_spawn(..., environ) without declaring extern char** environ
(portability issue) and relies on a fixed std::this_thread::sleep_for(200ms)
race; fix by adding a portable usage of the environment to the spawn call
(declare extern char** environ at top of the test file or pass nullptr if
inheriting env is acceptable) and replace the fixed sleep_for wait with a
deterministic readiness check: after posix_spawn, poll for the spawned reaper
child (reaper_pid) existence or use a pipe/handshake where the child (spawned
via posix_spawn with argv from reaper_path) writes a ready byte before the
parent proceeds; update the child branch around fork()/posix_spawn() and the
sleep_for() usage (symbols: child, reaper_pid, posix_spawn, reaper_path,
environ, std::this_thread::sleep_for) to use the chosen readiness mechanism.
| // Parent: wait for child to exit | ||
| int status = 0; | ||
| waitpid(child, &status, 0); | ||
|
|
||
| // Give reaper time to clean up | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(500)); | ||
|
|
There was a problem hiding this comment.
waitpid(child, ...) should handle EINTR and validate child exit status.
Right now a signal can cause waitpid to fail and the test will proceed as if setup succeeded.
Proposed EINTR-safe wait + status check
- int status = 0;
- waitpid(child, &status, 0);
+ int status = 0;
+ while (waitpid(child, &status, 0) == -1 && errno == EINTR) {}
+ if (!WIFEXITED(status) || WEXITSTATUS(status) != EXIT_TEST_PASS) {
+ std::fprintf(stderr, "child setup failed\n");
+ return EXIT_TEST_FAIL;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Parent: wait for child to exit | |
| int status = 0; | |
| waitpid(child, &status, 0); | |
| // Give reaper time to clean up | |
| std::this_thread::sleep_for(std::chrono::milliseconds(500)); | |
| // Parent: wait for child to exit | |
| int status = 0; | |
| while (waitpid(child, &status, 0) == -1 && errno == EINTR) {} | |
| if (!WIFEXITED(status) || WEXITSTATUS(status) != EXIT_TEST_PASS) { | |
| std::fprintf(stderr, "child setup failed\n"); | |
| return EXIT_TEST_FAIL; | |
| } | |
| // Give reaper time to clean up | |
| std::this_thread::sleep_for(std::chrono::milliseconds(500)); |
🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 78 - 84, The waitpid call
can fail with EINTR and the test doesn't validate the child's exit, so replace
the direct waitpid(child, &status, 0) with an EINTR-safe loop that retries on
errno==EINTR, checks the return value (ensuring it equals child), and then
validate the child's termination using WIFEXITED/WEXITSTATUS or WIFSIGNALED to
assert expected outcome before continuing; reference the existing variables and
symbols waitpid, child, and status in tests/app/test_child_death_signal.cpp and
ensure the test fails loudly if waitpid returns an unexpected error or the child
exited abnormally.
| // Check if any sleep process is still running (there shouldn't be) | ||
| FILE* pgrep = popen("pgrep -x sleep 2>/dev/null | wc -l", "r"); | ||
| if (pgrep == nullptr) { | ||
| std::fprintf(stderr, "popen failed\n"); | ||
| return EXIT_TEST_FAIL; | ||
| } | ||
|
|
||
| int sleep_count = 0; | ||
| std::fscanf(pgrep, "%d", &sleep_count); | ||
| pclose(pgrep); | ||
|
|
||
| if (sleep_count == 0) { | ||
| std::printf("PASS: No orphaned sleep processes found\n"); | ||
| return EXIT_TEST_PASS; | ||
| } | ||
|
|
||
| std::fprintf(stderr, "FAIL: Found %d sleep processes still running\n", sleep_count); | ||
| return EXIT_TEST_FAIL; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find and examine the test file
fd test_child_death_signal.cppRepository: K1ngst0m/Goggles
Length of output: 97
🏁 Script executed:
# Look for Linux gating and test registration
rg -n "test_child_death_signal|child_death_signal" --hidden -C3Repository: K1ngst0m/Goggles
Length of output: 1415
🏁 Script executed:
# Check for Linux-specific guards in the test directory
rg -n "__linux__|CMAKE_SYSTEM_NAME" tests -C2Repository: K1ngst0m/Goggles
Length of output: 42
🏁 Script executed:
cat -n tests/app/test_child_death_signal.cppRepository: K1ngst0m/Goggles
Length of output: 3937
🏁 Script executed:
# Verify the test is registered without Linux gating and check for any recent platform-specific work
grep -A10 "goggles_test_child_death_signal" tests/CMakeLists.txt | head -20Repository: K1ngst0m/Goggles
Length of output: 412
🏁 Script executed:
# Check if there are any other tests using similar approach that might show best practices
rg -l "pgrep|popen.*|wc -l" tests/Repository: K1ngst0m/Goggles
Length of output: 97
Fix non-hermetic process verification: use unique helper binary instead of pgrep -x sleep.
The test's use of pgrep -x sleep will fail on shared CI runners or dev machines where unrelated processes are named sleep. This test must verify only the specific sleep spawned by goggles-reaper, not all system sleep processes. Replace the generic sleep binary with a test-specific helper (e.g., /tmp/goggles_test_sleep_$$ or similar with a unique identifier) to ensure the process count verification is hermetic.
Use fork+execvp with prctl(PR_SET_PDEATHSIG) to ensure child process receives SIGTERM when parent dies unexpectedly.
1923d18 to
0882d0a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/app/CMakeLists.txt (1)
32-37: Consider enabling sanitizers for goggles-reaper during development.The main
gogglesexecutable has sanitizers enabled (line 29), butgoggles-reaperdoes not. Given that the reaper handles critical process cleanup logic, enabling sanitizers would help catch memory errors, undefined behavior, and other issues during development.♻️ Proposed addition
goggles_enable_clang_tidy(goggles-reaper) +goggles_enable_sanitizers(goggles-reaper)src/app/main.cpp (2)
25-34: Consider extracting the duplicated path resolution logic.This function is duplicated in
src/app/test_child_death_signal.cpp(lines 18-36), but with different path resolution logic. The test version navigates up two directory levels, while this version goes up one level. Consider extracting to a shared utility if the logic should be identical, or clearly document why they differ.
27-29: Handle potential path truncation edge case.If the executable path is exactly 4095 bytes or longer,
readlink()will truncate without error (returning 4095). While extremely rare, you could detect this by checkinglen == exe_path.size() - 1.♻️ Optional defensive check
- if (len <= 0) { + if (len <= 0 || len >= static_cast<ssize_t>(exe_path.size() - 1)) { return "goggles-reaper"; }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
openspec/changes/fix-child-process-cleanup/proposal.mdopenspec/changes/fix-child-process-cleanup/specs/app-window/spec.mdopenspec/changes/fix-child-process-cleanup/tasks.mdsrc/app/CMakeLists.txtsrc/app/main.cppsrc/app/reaper_main.cpptests/CMakeLists.txttests/app/test_child_death_signal.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
- openspec/changes/fix-child-process-cleanup/tasks.md
- openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
- tests/app/test_child_death_signal.cpp
- openspec/changes/fix-child-process-cleanup/proposal.md
- tests/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism
Applied to files:
src/app/main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.
Applied to files:
src/app/main.cppsrc/app/reaper_main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/src/render/**/*.cpp : Single-threaded render loop on main thread by default. Render backend (`goggles::render`) runs on main thread. Pipeline execution runs on main thread. Job system for non-render work on worker threads. Threading introduced only when profiling justifies it (main thread CPU consistently >8ms).
Applied to files:
src/app/CMakeLists.txt
📚 Learning: 2026-01-07T07:20:57.053Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.053Z
Learning: Applies to docs/**/*.{cpp,hpp} : Use top-level namespace `goggles`. Use module namespaces `goggles::<module>` (e.g., `goggles::capture`, `goggles::render`). Use sub-module namespaces `goggles::<module>::<submodule>`. Never use `using namespace` in headers. Prefer explicit namespace qualification or scoped `using` declarations.
Applied to files:
src/app/CMakeLists.txt
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
make_error(50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test (test preset)
- GitHub Check: Static Analysis (clang-tidy)
🔇 Additional comments (11)
src/app/main.cpp (2)
5-5: LGTM!The new includes are properly used for path resolution in
get_reaper_path().Also applies to: 13-13
87-106: LGTM. The transition fromposix_spawnptoposix_spawnwith an explicit reaper path is correct. The lifetime ofreaper_pathand the command strings are properly managed—both remain valid through theposix_spawncall, which reads (not stores) the argv array.Note: While functionally safe, the const_cast pattern could be clarified by ensuring
commandis documented as a non-temporary reference parameter to prevent future misuse of this pattern.src/app/reaper_main.cpp (9)
16-19: LGTM!Proper use of
volatile sig_atomic_tfor signal-safe communication between signal handler and main loop.
20-65: LGTM!The
/procscanning implementation is robust:
- Properly handles errors at each step (opendir, fopen, fgets failures)
- Correctly closes all resources (FILE*, DIR*)
- Handles edge cases like processes disappearing during iteration
- Correctly parses
/proc/PID/statformat by finding the last)to handle command names with parentheses
67-73: LGTM!The recursive process tree termination is correct. Killing the parent before recursing prevents it from spawning additional children. Any orphaned descendants will reparent to this reaper process (due to
PR_SET_CHILD_SUBREAPER), ensuring complete cleanup.
75-80: LGTM!Clean implementation that terminates all immediate children and their descendants.
82-85: LGTM!Correctly reaps all children with proper
EINTRhandling.
87-98: LGTM!Signal handlers are correctly implemented and signal-safe. The handler only modifies
sig_atomic_t, which is appropriate for signal context.
113-117: LGTM! Correct PR_SET_PDEATHSIG race condition handling.The implementation correctly handles the race condition where the parent could die between
getppid()andprctl(). By checking thatgetppid()returns the same PID after settingPR_SET_PDEATHSIG, you ensure the signal will be delivered when the parent dies.
134-144: LGTM! Robust wait loop with proper signal handling.The wait loop correctly handles:
- Normal child exit (breaking immediately)
- Signal interruption (
EINTRhandled, loop continues to checkg_should_run)- Signal-triggered shutdown (checking
g_should_runafter each iteration)
146-154: LGTM! Complete cleanup ensures no orphaned processes.The cleanup path ensures:
- All remaining children receive
SIGTERM- All children are properly reaped via
wait_all_children()- Primary child's exit status is correctly propagated
Combined with
PR_SET_CHILD_SUBREAPER, this guarantees complete process tree termination.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 91-98: setup_signal_handlers currently ignores sigaction() return
values so handlers may not be installed; update the function to check each
sigaction() call for failure, retrieve errno, and handle errors (e.g., log a
descriptive message including strerror(errno) via your logging mechanism or
perror and abort/exit with non‑zero status or throw an exception) to ensure the
process doesn’t silently continue without handlers (especially since
PR_SET_PDEATHSIG sends SIGTERM). Specifically, wrap the three sigaction calls
for SIGTERM, SIGINT and SIGHUP in checks, and if any returns -1 perform the
chosen error handling path so failures are visible and the program stops or
propagates the error.
- Around line 108-118: The prctl() calls for PR_SET_CHILD_SUBREAPER and
PR_SET_PDEATHSIG are currently unchecked; modify the code around
prctl(PR_SET_CHILD_SUBREAPER, ...) and prctl(PR_SET_PDEATHSIG, ...) to check
each return value (non-zero indicates failure), emit an error (e.g., via perror
or process logger) that includes which prctl failed and errno, and return
EXIT_FAILURE if either call fails before proceeding to the getppid() parent
check; keep the existing parent_pid/getppid logic and only proceed when both
prctl calls succeed.
🧹 Nitpick comments (1)
src/app/reaper_main.cpp (1)
20-65: Consider RAII wrapper for DIR* resource management.The DIR* pointer is correctly managed with proper closedir() cleanup, but consider wrapping it in an RAII helper (e.g., unique_ptr with custom deleter) to make the lifetime management more robust and eliminate the need to track manual cleanup.
♻️ Example RAII pattern
struct DirCloser { void operator()(DIR* d) const { if (d) closedir(d); } }; using UniqueDIR = std::unique_ptr<DIR, DirCloser>; auto get_child_pids(pid_t parent_pid) -> std::vector<pid_t> { std::vector<pid_t> pids; UniqueDIR proc_dir(opendir("/proc")); if (!proc_dir) { return pids; } // ... rest of function, no manual closedir needed
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/app/reaper_main.cpp
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.
Applied to files:
src/app/reaper_main.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test (test preset)
- GitHub Check: Static Analysis (clang-tidy)
🔇 Additional comments (7)
src/app/reaper_main.cpp (7)
1-19: LGTM: Clean setup and appropriate signal-safe global.The includes are appropriate for Linux process management, and
volatile sig_atomic_tis correctly used for the signal handler flag.
67-73: LGTM: Recursive termination with subreaper safety.The function correctly performs recursive process tree termination. The order (kill parent, then recurse to children) is safe because this process is a subreaper: any orphaned descendants reparent to us and are killed later by
kill_all_children()in main. The architecture handles races where processes spawn children during traversal.
75-85: LGTM: Correct cleanup and reaping logic.Both functions correctly handle child process cleanup. The
wait_all_children()loop properly handles EINTR and reaps all children including any that may have accumulated as zombies during the main wait loop.
102-106: LGTM: Clear argument validation.
119-132: LGTM: Correct fork/exec pattern.The fork and exec logic is correctly implemented. Using
_exit()in the child after exec failure is correct (avoids flushing parent's stdio buffers), and theexecvp()arguments are properly structured.
134-144: LGTM: Robust wait loop with signal awareness.The wait loop correctly waits for the primary child while handling signals. Any descendants that reparent to the reaper during this phase will be reaped by
wait_all_children()at line 149.
146-154: LGTM: Complete cleanup and proper exit status propagation.The cleanup logic ensures all remaining descendants are terminated with SIGKILL (as intended per commit message), and the child's exit status is correctly propagated to the caller.
| auto setup_signal_handlers() -> void { | ||
| struct sigaction sa{}; | ||
| sa.sa_handler = signal_handler; | ||
| sigemptyset(&sa.sa_mask); | ||
| sigaction(SIGTERM, &sa, nullptr); | ||
| sigaction(SIGINT, &sa, nullptr); | ||
| sigaction(SIGHUP, &sa, nullptr); | ||
| } |
There was a problem hiding this comment.
Check sigaction return values.
If sigaction() fails, signal handlers won't be installed. Since PR_SET_PDEATHSIG sends SIGTERM (line 114), failure to install the SIGTERM handler means the reaper would terminate without cleaning up children when the parent dies, defeating the core purpose of this watchdog.
🛡️ Proposed fix
auto setup_signal_handlers() -> void {
struct sigaction sa{};
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
- sigaction(SIGTERM, &sa, nullptr);
- sigaction(SIGINT, &sa, nullptr);
- sigaction(SIGHUP, &sa, nullptr);
+ if (sigaction(SIGTERM, &sa, nullptr) != 0) {
+ std::fprintf(stderr, "goggles-reaper: sigaction SIGTERM failed: %s\n", std::strerror(errno));
+ std::abort();
+ }
+ if (sigaction(SIGINT, &sa, nullptr) != 0) {
+ std::fprintf(stderr, "goggles-reaper: sigaction SIGINT failed: %s\n", std::strerror(errno));
+ std::abort();
+ }
+ if (sigaction(SIGHUP, &sa, nullptr) != 0) {
+ std::fprintf(stderr, "goggles-reaper: sigaction SIGHUP failed: %s\n", std::strerror(errno));
+ std::abort();
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto setup_signal_handlers() -> void { | |
| struct sigaction sa{}; | |
| sa.sa_handler = signal_handler; | |
| sigemptyset(&sa.sa_mask); | |
| sigaction(SIGTERM, &sa, nullptr); | |
| sigaction(SIGINT, &sa, nullptr); | |
| sigaction(SIGHUP, &sa, nullptr); | |
| } | |
| auto setup_signal_handlers() -> void { | |
| struct sigaction sa{}; | |
| sa.sa_handler = signal_handler; | |
| sigemptyset(&sa.sa_mask); | |
| if (sigaction(SIGTERM, &sa, nullptr) != 0) { | |
| std::fprintf(stderr, "goggles-reaper: sigaction SIGTERM failed: %s\n", std::strerror(errno)); | |
| std::abort(); | |
| } | |
| if (sigaction(SIGINT, &sa, nullptr) != 0) { | |
| std::fprintf(stderr, "goggles-reaper: sigaction SIGINT failed: %s\n", std::strerror(errno)); | |
| std::abort(); | |
| } | |
| if (sigaction(SIGHUP, &sa, nullptr) != 0) { | |
| std::fprintf(stderr, "goggles-reaper: sigaction SIGHUP failed: %s\n", std::strerror(errno)); | |
| std::abort(); | |
| } | |
| } |
🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 91 - 98, setup_signal_handlers
currently ignores sigaction() return values so handlers may not be installed;
update the function to check each sigaction() call for failure, retrieve errno,
and handle errors (e.g., log a descriptive message including strerror(errno) via
your logging mechanism or perror and abort/exit with non‑zero status or throw an
exception) to ensure the process doesn’t silently continue without handlers
(especially since PR_SET_PDEATHSIG sends SIGTERM). Specifically, wrap the three
sigaction calls for SIGTERM, SIGINT and SIGHUP in checks, and if any returns -1
perform the chosen error handling path so failures are visible and the program
stops or propagates the error.
| const pid_t parent_pid = getppid(); | ||
|
|
||
| // Become subreaper - orphaned descendants reparent to us | ||
| prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); | ||
|
|
||
| // Die when parent (goggles) dies | ||
| prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0); | ||
| if (getppid() != parent_pid) { | ||
| return EXIT_FAILURE; | ||
| } | ||
|
|
There was a problem hiding this comment.
Check prctl return values.
Both prctl() calls are critical to the reaper's functionality and their return values must be checked:
- If
PR_SET_CHILD_SUBREAPERfails: orphaned descendants won't reparent to the reaper, causing incomplete cleanup. - If
PR_SET_PDEATHSIGfails: the reaper won't terminate when the parent dies, leaving processes running headlessly.
🐛 Proposed fix
const pid_t parent_pid = getppid();
// Become subreaper - orphaned descendants reparent to us
- prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
+ if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) {
+ std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno));
+ return EXIT_FAILURE;
+ }
// Die when parent (goggles) dies
- prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
+ if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) {
+ std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno));
+ return EXIT_FAILURE;
+ }
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 108 - 118, The prctl() calls for
PR_SET_CHILD_SUBREAPER and PR_SET_PDEATHSIG are currently unchecked; modify the
code around prctl(PR_SET_CHILD_SUBREAPER, ...) and prctl(PR_SET_PDEATHSIG, ...)
to check each return value (non-zero indicates failure), emit an error (e.g.,
via perror or process logger) that includes which prctl failed and errno, and
return EXIT_FAILURE if either call fails before proceeding to the getppid()
parent check; keep the existing parent_pid/getppid logic and only proceed when
both prctl calls succeed.
Summary
goggles-reaperwatchdog process for reliable child cleanupposix_spawn()from multi-threaded goggles to spawn single-threaded reaperPR_SET_CHILD_SUBREAPER+PR_SET_PDEATHSIGfor robust cleanup/procscanningFixes: When goggles crashes, the spawned child process continues running headlessly.
Architecture
Benefits:
PR_SET_PDEATHSIGworks reliablyPR_SET_CHILD_SUBREAPERcatches orphaned grandchildrenTest plan
goggles_test_child_death_signaltest passesSummary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.