Skip to content

fix(app): terminate child process on parent crash via reaper#41

Merged
BANANASJIM merged 6 commits intomainfrom
fix/child-process-cleanup
Jan 9, 2026
Merged

fix(app): terminate child process on parent crash via reaper#41
BANANASJIM merged 6 commits intomainfrom
fix/child-process-cleanup

Conversation

@BANANASJIM
Copy link
Copy Markdown
Collaborator

@BANANASJIM BANANASJIM commented Jan 8, 2026

Summary

  • Introduce goggles-reaper watchdog process for reliable child cleanup
  • Use posix_spawn() from multi-threaded goggles to spawn single-threaded reaper
  • Reaper uses PR_SET_CHILD_SUBREAPER + PR_SET_PDEATHSIG for robust cleanup
  • Recursively kill process tree via /proc scanning

Fixes: When goggles crashes, the spawned child process continues running headlessly.

Architecture

Goggles (main process, multi-threaded)
    ↓ posix_spawn("goggles-reaper") [SAFE: immediate exec]
goggles-reaper (watchdog, single-threaded)
    ↓ PR_SET_CHILD_SUBREAPER + PR_SET_PDEATHSIG
    ↓ fork() + exec(target_app) [SAFE: no threads]
Target Application

Benefits:

  • Single-threaded reaper ensures PR_SET_PDEATHSIG works reliably
  • PR_SET_CHILD_SUBREAPER catches orphaned grandchildren
  • Recursive process tree kill ensures all descendants are terminated

Test plan

  • Build passes
  • All existing tests pass
  • New goggles_test_child_death_signal test passes
  • Manual test: kill goggles with SIGKILL, verify target app terminates

Summary by CodeRabbit

  • New Features

    • Added a watchdog reaper so child processes are reliably terminated and cleaned up when the parent exits or crashes.
  • Documentation

    • Added specification and task documentation describing child-process termination scenarios and the reaper-based approach.
  • Tests

    • Added an integration test validating descendant process termination and cleanup when the parent process dies.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Adds a new single-threaded watchdog executable (goggles-reaper) and updates the launcher to spawn it. The reaper sets PR_SET_CHILD_SUBREAPER and PR_SET_PDEATHSIG, forks+execs the target, reaps descendants, and terminates process trees when the parent dies.

Changes

Cohort / File(s) Summary
Specification & Tasks
openspec/changes/fix-child-process-cleanup/proposal.md, openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md, openspec/changes/fix-child-process-cleanup/tasks.md
Add proposal, spec, and task docs describing reaper design, PDEATHSIG/subreaper requirements, kill_process_tree behavior, and test plans.
Launcher
src/app/main.cpp
Resolve path to goggles-reaper, build argv to invoke reaper with target command, and switch to posix_spawn using the resolved reaper path.
Reaper
src/app/reaper_main.cpp
New single-threaded watchdog: sets PR_SET_CHILD_SUBREAPER and PR_SET_PDEATHSIG, installs signal handlers, forks+execs target, scans /proc to discover children, kills process trees on shutdown, and returns target exit status.
Build
src/app/CMakeLists.txt
Add goggles-reaper executable target, set runtime output directory, and enable clang-tidy for the new target.
Tests
tests/app/test_child_death_signal.cpp, tests/CMakeLists.txt
Add Linux integration test that validates descendants are terminated when parent dies; CMake entries to build/register the test and enable sanitizers.

Sequence Diagram(s)

sequenceDiagram
    participant Launcher as Launcher (goggles main)
    participant Kernel as Kernel
    participant Reaper as goggles-reaper
    participant Target as Target App

    Launcher->>Kernel: posix_spawn(goggles-reaper, args)
    Kernel-->>Launcher: reaper_pid
    Kernel-->>Reaper: start (single-threaded)

    Reaper->>Kernel: prctl(PR_SET_CHILD_SUBREAPER)
    Reaper->>Kernel: prctl(PR_SET_PDEATHSIG, SIGTERM)
    Reaper->>Kernel: fork() -> child_pid
    Reaper->>Target: execvp(target, args)
    Kernel-->>Target: run target app

    Note over Launcher,Reaper: Launcher exits/crashes
    Kernel->>Reaper: deliver SIGTERM (PDEATHSIG)
    Reaper->>Kernel: kill_process_tree(primary and descendants)
    Reaper->>Kernel: wait_all_children()
    Reaper-->>Kernel: exit (propagate target exit code or failure)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • K1ngst0m

Poem

🐰 I tended forks where shadows creep,
I guard the hops while others sleep.
A tiny reaper, brisk and spry,
I hush the orphans with a sigh.
🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: introducing a reaper process to terminate child processes when the parent (Goggles) crashes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Jan 8, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Ignored syscall errors: The child process does not check/handle failures from prctl(PR_SET_PDEATHSIG, SIGTERM) or
execvpe() and exits without actionable diagnostic context on these failure paths.

Referred Code
if (pid == 0) {
    // Child: set death signal before exec so we die if parent crashes
    prctl(PR_SET_PDEATHSIG, SIGTERM);
    // Handle race: if parent already died, getppid() returns 1 (init)
    if (getppid() != parent_pid) {
        _exit(EXIT_FAILURE);
    }
    execvpe(argv[0], argv.data(), envp.data());
    _exit(EXIT_FAILURE);
}

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Jan 8, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
The change introduces a Linux-specific dependency
Suggestion Impact:The commit removed the Linux-specific prctl dependency entirely (dropped and the prctl(PR_SET_PDEATHSIG) call) and reworked process spawning to use posix_spawn plus an external "goggles-reaper" helper. The integration test was also rewritten to validate the reaper behavior instead of prctl-based parent-death signaling, improving portability without needing #ifdef __linux__.

code diff:

+#include <filesystem>
+#include <spawn.h>
 #include <string>
 #include <string_view>
-#include <sys/prctl.h>
 #include <sys/wait.h>
 #include <thread>
 #include <unistd.h>
@@ -20,6 +22,17 @@
 #include <util/logging.hpp>
 #include <vector>
 
+static auto get_reaper_path() -> std::string {
+    std::array<char, 4096> exe_path{};
+    const ssize_t len = readlink("/proc/self/exe", exe_path.data(), exe_path.size() - 1);
+    if (len <= 0) {
+        return "goggles-reaper";
+    }
+    exe_path[static_cast<size_t>(len)] = '\0';
+    const std::filesystem::path exe_dir = std::filesystem::path(exe_path.data()).parent_path();
+    return (exe_dir / "goggles-reaper").string();
+}
+
 static auto spawn_target_app(const std::vector<std::string>& command,
                              const std::string& x11_display, const std::string& wayland_display,
                              uint32_t app_width, uint32_t app_height, const std::string& gpu_uuid)
@@ -71,29 +84,23 @@
     }
     envp.push_back(nullptr);
 
+    // Build argv: goggles-reaper <target_command...>
+    const std::string reaper_path = get_reaper_path();
     std::vector<char*> argv;
-    argv.reserve(command.size() + 1);
+    argv.reserve(command.size() + 2);
+    argv.push_back(const_cast<char*>(reaper_path.c_str()));
     for (const auto& arg : command) {
         argv.push_back(const_cast<char*>(arg.c_str()));
     }
     argv.push_back(nullptr);
 
-    const pid_t parent_pid = getpid();
-    const pid_t pid = fork();
-    if (pid < 0) {
+    pid_t pid = -1;
+    const int rc =
+        posix_spawn(&pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), envp.data());
+    if (rc != 0) {
         return goggles::make_error<pid_t>(goggles::ErrorCode::unknown_error,
-                                          std::string("fork() failed: ") + std::strerror(errno));
-    }
-
-    if (pid == 0) {
-        // Child: set death signal before exec so we die if parent crashes
-        prctl(PR_SET_PDEATHSIG, SIGTERM);
-        // Handle race: if parent already died, getppid() returns 1 (init)
-        if (getppid() != parent_pid) {
-            _exit(EXIT_FAILURE);
-        }
-        execvpe(argv[0], argv.data(), envp.data());
-        _exit(EXIT_FAILURE);
+                                          std::string("posix_spawn() failed: ") +
+                                              std::strerror(rc));
     }
 
     return pid;

# File: tests/app/test_child_death_signal.cpp
@@ -1,12 +1,12 @@
-// Integration test: validates child process receives SIGTERM when parent dies
-// Uses prctl(PR_SET_PDEATHSIG) which is Linux-specific
+// Integration test: validates goggles-reaper kills children when parent dies
 
+#include <array>
 #include <cerrno>
 #include <chrono>
-#include <csignal>
 #include <cstdio>
 #include <cstring>
-#include <sys/prctl.h>
+#include <spawn.h>
+#include <string>
 #include <sys/wait.h>
 #include <thread>
 #include <unistd.h>
@@ -15,94 +15,88 @@
 
 constexpr int EXIT_TEST_PASS = 0;
 constexpr int EXIT_TEST_FAIL = 1;
-constexpr int CHILD_SIGNAL_RECEIVED = 42;
 
-volatile sig_atomic_t g_signal_received = 0;
+auto get_reaper_path() -> std::string {
+    std::array<char, 4096> exe_path{};
+    const ssize_t len = readlink("/proc/self/exe", exe_path.data(), exe_path.size() - 1);
+    if (len <= 0) {
+        return "goggles-reaper";
+    }
+    exe_path[static_cast<size_t>(len)] = '\0';
 
-void signal_handler(int sig) {
-    if (sig == SIGTERM) {
-        g_signal_received = 1;
+    std::string path(exe_path.data());
+    const auto pos = path.rfind('/');
+    if (pos != std::string::npos) {
+        path = path.substr(0, pos);
+        const auto pos2 = path.rfind('/');
+        if (pos2 != std::string::npos) {
+            path = path.substr(0, pos2);
+        }
     }
-}
-
-// Child process that waits for SIGTERM from parent death
-[[noreturn]] void run_grandchild(pid_t expected_parent) {
-    signal(SIGTERM, signal_handler);
-    prctl(PR_SET_PDEATHSIG, SIGTERM);
-
-    if (getppid() != expected_parent) {
-        _exit(EXIT_TEST_FAIL);
-    }
-
-    for (int i = 0; i < 100 && g_signal_received == 0; ++i) {
-        std::this_thread::sleep_for(std::chrono::milliseconds(50));
-    }
-
-    _exit(g_signal_received != 0 ? CHILD_SIGNAL_RECEIVED : EXIT_TEST_FAIL);
-}
-
-// Intermediate process that spawns grandchild then exits
-[[noreturn]] void run_child() {
-    const pid_t my_pid = getpid();
-    const pid_t grandchild = fork();
-
-    if (grandchild < 0) {
-        std::fprintf(stderr, "fork grandchild failed: %s\n", std::strerror(errno));
-        _exit(EXIT_TEST_FAIL);
-    }
-
-    if (grandchild == 0) {
-        run_grandchild(my_pid);
-    }
-
-    // Parent of grandchild: wait briefly then exit to trigger grandchild's death signal
-    std::this_thread::sleep_for(std::chrono::milliseconds(100));
-    _exit(EXIT_TEST_PASS);
+    return path + "/bin/goggles-reaper";
 }
 
 } // namespace
 
 int main() {
-    std::printf("Testing PR_SET_PDEATHSIG behavior...\n");
+    std::printf("Testing goggles-reaper behavior...\n");
 
-    // Become a subreaper so orphaned grandchild is reparented to us
-    if (prctl(PR_SET_CHILD_SUBREAPER, 1) != 0) {
-        std::fprintf(stderr, "prctl(PR_SET_CHILD_SUBREAPER) failed: %s\n", std::strerror(errno));
-        return EXIT_TEST_FAIL;
-    }
+    const std::string reaper_path = get_reaper_path();
 
+    // Spawn: goggles-reaper sleep 60
+    // Then kill the reaper's parent (this test process spawns an intermediate)
+    // and verify the sleep process is also killed.
+
+    // We need to fork a child that spawns the reaper, then we kill that child
     const pid_t child = fork();
     if (child < 0) {
-        std::fprintf(stderr, "fork child failed: %s\n", std::strerror(errno));
+        std::fprintf(stderr, "fork failed: %s\n", std::strerror(errno));
         return EXIT_TEST_FAIL;
     }
 
     if (child == 0) {
-        run_child();
+        // Child: spawn goggles-reaper with sleep 60
+        std::array<char*, 4> argv = {const_cast<char*>(reaper_path.c_str()),
+                                     const_cast<char*>("sleep"), const_cast<char*>("60"), nullptr};
+
+        pid_t reaper_pid = -1;
+        const int rc =
+            posix_spawn(&reaper_pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), environ);
+        if (rc != 0) {
+            std::fprintf(stderr, "posix_spawn failed: %s\n", std::strerror(rc));
+            _exit(EXIT_TEST_FAIL);
+        }
+
+        // Wait a bit for reaper to start
+        std::this_thread::sleep_for(std::chrono::milliseconds(200));
+
+        // Now exit - reaper should receive SIGTERM and kill its children
+        _exit(EXIT_TEST_PASS);
     }
 
-    // Wait for intermediate child
-    int child_status = 0;
-    waitpid(child, &child_status, 0);
+    // Parent: wait for child to exit
+    int status = 0;
+    waitpid(child, &status, 0);
 
-    // Wait for grandchild (now orphaned, reparented to us or init)
-    // We need to wait a bit for the grandchild to receive signal and exit
+    // Give reaper time to clean up
     std::this_thread::sleep_for(std::chrono::milliseconds(500));
 
-    // Try to reap any grandchild
-    int grandchild_status = 0;
-    const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG);
-
-    if (reaped > 0 && WIFEXITED(grandchild_status)) {
-        const int exit_code = WEXITSTATUS(grandchild_status);
-        if (exit_code == CHILD_SIGNAL_RECEIVED) {
-            std::printf("PASS: Grandchild received SIGTERM on parent death\n");
-            return EXIT_TEST_PASS;
-        }
-        std::fprintf(stderr, "FAIL: Grandchild exited with code %d\n", exit_code);
+    // Check if any sleep process is still running (there shouldn't be)
+    FILE* pgrep = popen("pgrep -x sleep 2>/dev/null | wc -l", "r");
+    if (pgrep == nullptr) {
+        std::fprintf(stderr, "popen failed\n");
         return EXIT_TEST_FAIL;
     }
 
-    std::fprintf(stderr, "FAIL: Could not reap grandchild (reaped=%d)\n", reaped);
+    int sleep_count = 0;
+    std::fscanf(pgrep, "%d", &sleep_count);
+    pclose(pgrep);
+
+    if (sleep_count == 0) {
+        std::printf("PASS: No orphaned sleep processes found\n");
+        return EXIT_TEST_PASS;
+    }
+
+    std::fprintf(stderr, "FAIL: Found %d sleep processes still running\n", sleep_count);
     return EXIT_TEST_FAIL;

The use of prctl(PR_SET_PDEATHSIG) is Linux-specific and breaks portability.
Consider using conditional compilation (e.g., #ifdef linux) to maintain
support for other POSIX systems like macOS or BSD.

Examples:

src/app/main.cpp [87]
        prctl(PR_SET_PDEATHSIG, SIGTERM);
tests/app/test_child_death_signal.cpp [2]
// Uses prctl(PR_SET_PDEATHSIG) which is Linux-specific

Solution Walkthrough:

Before:

// src/app/main.cpp
pid_t spawn_target_app(...) {
    // ...
    const pid_t pid = fork();
    if (pid == 0) {
        // Child process
        // prctl() is Linux-specific and will fail to compile on other systems.
        prctl(PR_SET_PDEATHSIG, SIGTERM);

        if (getppid() != parent_pid) {
            _exit(EXIT_FAILURE);
        }
        execvpe(argv[0], argv.data(), envp.data());
        _exit(EXIT_FAILURE);
    }
    return pid;
}

After:

// src/app/main.cpp
pid_t spawn_target_app(...) {
    // ...
    const pid_t pid = fork();
    if (pid == 0) {
        // Child process
#ifdef __linux__
        // Use conditional compilation to isolate Linux-specific code.
        prctl(PR_SET_PDEATHSIG, SIGTERM);

        if (getppid() != parent_pid) {
            _exit(EXIT_FAILURE);
        }
#endif
        execvpe(argv[0], argv.data(), envp.data());
        _exit(EXIT_FAILURE);
    }
    return pid;
}
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a significant portability regression by introducing the Linux-specific prctl call, which is a valid high-level design concern.

Medium
Possible issue
Validate prctl return value
Suggestion Impact:The commit removed the fork/child prctl(PR_SET_PDEATHSIG, SIGTERM) logic entirely by switching to posix_spawn() of a separate "goggles-reaper" executable, and also removed the include. This eliminates the need for the suggested prctl() return-value validation rather than implementing it directly.

code diff:

-#include <sys/prctl.h>
 #include <sys/wait.h>
 #include <thread>
 #include <unistd.h>
@@ -20,6 +22,17 @@
 #include <util/logging.hpp>
 #include <vector>
 
+static auto get_reaper_path() -> std::string {
+    std::array<char, 4096> exe_path{};
+    const ssize_t len = readlink("/proc/self/exe", exe_path.data(), exe_path.size() - 1);
+    if (len <= 0) {
+        return "goggles-reaper";
+    }
+    exe_path[static_cast<size_t>(len)] = '\0';
+    const std::filesystem::path exe_dir = std::filesystem::path(exe_path.data()).parent_path();
+    return (exe_dir / "goggles-reaper").string();
+}
+
 static auto spawn_target_app(const std::vector<std::string>& command,
                              const std::string& x11_display, const std::string& wayland_display,
                              uint32_t app_width, uint32_t app_height, const std::string& gpu_uuid)
@@ -71,29 +84,23 @@
     }
     envp.push_back(nullptr);
 
+    // Build argv: goggles-reaper <target_command...>
+    const std::string reaper_path = get_reaper_path();
     std::vector<char*> argv;
-    argv.reserve(command.size() + 1);
+    argv.reserve(command.size() + 2);
+    argv.push_back(const_cast<char*>(reaper_path.c_str()));
     for (const auto& arg : command) {
         argv.push_back(const_cast<char*>(arg.c_str()));
     }
     argv.push_back(nullptr);
 
-    const pid_t parent_pid = getpid();
-    const pid_t pid = fork();
-    if (pid < 0) {
+    pid_t pid = -1;
+    const int rc =
+        posix_spawn(&pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), envp.data());
+    if (rc != 0) {
         return goggles::make_error<pid_t>(goggles::ErrorCode::unknown_error,
-                                          std::string("fork() failed: ") + std::strerror(errno));
-    }
-
-    if (pid == 0) {
-        // Child: set death signal before exec so we die if parent crashes
-        prctl(PR_SET_PDEATHSIG, SIGTERM);
-        // Handle race: if parent already died, getppid() returns 1 (init)
-        if (getppid() != parent_pid) {
-            _exit(EXIT_FAILURE);
-        }
-        execvpe(argv[0], argv.data(), envp.data());
-        _exit(EXIT_FAILURE);
+                                          std::string("posix_spawn() failed: ") +
+                                              std::strerror(rc));
     }

Add error handling for the prctl(PR_SET_PDEATHSIG, SIGTERM) call. Check its
return value and exit the child process with an error message if it fails.

src/app/main.cpp [87]

-prctl(PR_SET_PDEATHSIG, SIGTERM);
+if (prctl(PR_SET_PDEATHSIG, SIGTERM) != 0) {
+    std::fprintf(stderr, "prctl(PR_SET_PDEATHSIG) failed: %s\n", std::strerror(errno));
+    _exit(EXIT_FAILURE);
+}

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: This suggestion correctly points out missing error handling for the prctl system call, which is critical for ensuring the core functionality of this PR works as intended.

Medium
General
Replace fixed sleep with polling loop

Replace the fixed sleep_for(500ms) in the test with a polling loop that
repeatedly calls waitpid() with WNOHANG. This will make the test more reliable
by actively waiting for the grandchild process to be reaped instead of using a
fixed delay.

tests/app/test_child_death_signal.cpp [90-94]

-// Wait for grandchild (now orphaned, reparented to us or init)
-// We need to wait a bit for the grandchild to receive signal and exit
-std::this_thread::sleep_for(std::chrono::milliseconds(500));
+// Wait for grandchild (now orphaned, reparented to us)
+int grandchild_status = 0;
+pid_t reaped = -1;
+for (int i = 0; i < 100; ++i) { // Timeout after ~5 seconds
+    reaped = waitpid(-1, &grandchild_status, WNOHANG);
+    if (reaped > 0) {
+        break;
+    }
+    std::this_thread::sleep_for(std::chrono::milliseconds(50));
+}
 
-// Try to reap any grandchild
-int grandchild_status = 0;
-const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG);
-

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that a fixed sleep can cause test flakiness and proposes a more robust polling mechanism, which significantly improves the reliability of the new test case.

Low
Log exec failure reason
Suggestion Impact:The PR removed the fork()+execvpe() path entirely and replaced it with posix_spawn(). While it did not add an execvpe-failure log, it did add improved error reporting by returning a message containing strerror(rc) when posix_spawn() fails, addressing the same debuggability goal.

code diff:

-    const pid_t parent_pid = getpid();
-    const pid_t pid = fork();
-    if (pid < 0) {
+    pid_t pid = -1;
+    const int rc =
+        posix_spawn(&pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), envp.data());
+    if (rc != 0) {
         return goggles::make_error<pid_t>(goggles::ErrorCode::unknown_error,
-                                          std::string("fork() failed: ") + std::strerror(errno));
-    }
-
-    if (pid == 0) {
-        // Child: set death signal before exec so we die if parent crashes
-        prctl(PR_SET_PDEATHSIG, SIGTERM);
-        // Handle race: if parent already died, getppid() returns 1 (init)
-        if (getppid() != parent_pid) {
-            _exit(EXIT_FAILURE);
-        }
-        execvpe(argv[0], argv.data(), envp.data());
-        _exit(EXIT_FAILURE);
+                                          std::string("posix_spawn() failed: ") +
+                                              std::strerror(rc));
     }

Log the error message from errno when execvpe() fails. This will provide
valuable debugging information if the child process fails to start.

src/app/main.cpp [92-93]

 execvpe(argv[0], argv.data(), envp.data());
+std::fprintf(stderr, "execvpe failed: %s\n", std::strerror(errno));
 _exit(EXIT_FAILURE);

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: This is a good suggestion for improving debuggability. Logging the reason for an execvpe failure is crucial for diagnosing issues with launching the target application.

Low
  • Update

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @openspec/changes/fix-child-process-cleanup/tasks.md:
- Line 10: Task 2.1 in tasks.md references the wrong test filename; update the
task description so it names tests/app/test_child_death_signal.cpp instead of
tests/app/test_process_utils.cpp to match the implemented test file and keep the
task list consistent.
🧹 Nitpick comments (4)
openspec/changes/fix-child-process-cleanup/proposal.md (1)

7-11: Consider adding spec delta references to the "What Changes" section.

Per coding guidelines, the "What Changes" section should list spec deltas. Currently, it lists implementation details. Consider adding a reference to the spec delta being introduced:

  • ADDED app-window: Child Process Death Signal requirement

The implementation details could remain as supplementary context.

src/app/main.cpp (1)

85-94: Check prctl() return value for robustness.

prctl(PR_SET_PDEATHSIG, SIGTERM) can fail (returns -1), for example if the signal number is invalid or on certain containerized/sandboxed environments. Consider checking the return value:

🛠️ Suggested fix
     if (pid == 0) {
         // Child: set death signal before exec so we die if parent crashes
-        prctl(PR_SET_PDEATHSIG, SIGTERM);
+        if (prctl(PR_SET_PDEATHSIG, SIGTERM) == -1) {
+            _exit(EXIT_FAILURE);
+        }
         // Handle race: if parent already died, getppid() returns 1 (init)
         if (getppid() != parent_pid) {
             _exit(EXIT_FAILURE);
         }
tests/app/test_child_death_signal.cpp (2)

88-104: Test may be timing-sensitive; consider a more robust wait strategy.

The 500ms sleep before attempting to reap the grandchild is a heuristic. If the system is under load, the grandchild may not have exited yet, causing a false test failure. Consider polling in a loop with a timeout:

💡 More robust waiting approach
-    // Wait for grandchild (now orphaned, reparented to us or init)
-    // We need to wait a bit for the grandchild to receive signal and exit
-    std::this_thread::sleep_for(std::chrono::milliseconds(500));
-
-    // Try to reap any grandchild
-    int grandchild_status = 0;
-    const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG);
+    // Wait for grandchild with timeout
+    int grandchild_status = 0;
+    pid_t reaped = -1;
+    for (int i = 0; i < 100; ++i) {
+        reaped = waitpid(-1, &grandchild_status, WNOHANG);
+        if (reaped > 0) {
+            break;
+        }
+        std::this_thread::sleep_for(std::chrono::milliseconds(50));
+    }

29-42: Consider checking prctl() return value for consistency.

Similar to the production code in main.cpp, the test's prctl(PR_SET_PDEATHSIG, SIGTERM) call (line 31) doesn't check the return value. For a test that validates PDEATHSIG behavior, verifying that the setup succeeded makes the test more reliable:

     signal(SIGTERM, signal_handler);
-    prctl(PR_SET_PDEATHSIG, SIGTERM);
+    if (prctl(PR_SET_PDEATHSIG, SIGTERM) != 0) {
+        _exit(EXIT_TEST_FAIL);
+    }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 310e5c0 and 7ff8918.

📒 Files selected for processing (6)
  • openspec/changes/fix-child-process-cleanup/proposal.md
  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • src/app/main.cpp
  • tests/CMakeLists.txt
  • tests/app/test_child_death_signal.cpp
🧰 Additional context used
📓 Path-based instructions (3)
openspec/changes/**/specs/**/*.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

openspec/changes/**/specs/**/*.md: Write spec deltas using ## ADDED|MODIFIED|REMOVED|RENAMED Requirements format with at least one #### Scenario: per requirement in spec files
When modifying existing requirements in a MODIFIED delta, paste the full requirement block including the header and all scenarios, as the archiver will replace the entire requirement
Use ADDED for new capabilities that can stand alone; use MODIFIED for changes to existing requirement behavior, scope, or acceptance criteria; use RENAMED for name-only changes

Files:

  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
openspec/**/*.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

openspec/**/*.md: Use #### Scenario: format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format - **WHEN** [condition] - **THEN** [expected result] for scenario steps in requirements

Files:

  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • openspec/changes/fix-child-process-cleanup/proposal.md
openspec/changes/**/proposal.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

List spec deltas in proposal.md under 'What Changes' section, marking breaking changes with BREAKING

Files:

  • openspec/changes/fix-child-process-cleanup/proposal.md
🧠 Learnings (3)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/tests/**/*.{cpp,hpp} : Focus unit tests on non-GPU logic: utility functions, configuration parsing, and pipeline graph logic

Applied to files:

  • tests/CMakeLists.txt
📚 Learning: 2025-12-19T08:54:18.412Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/AGENTS.md:0-0
Timestamp: 2025-12-19T08:54:18.412Z
Learning: Applies to openspec/changes/**/specs/**/*.md : Use ADDED for new capabilities that can stand alone; use MODIFIED for changes to existing requirement behavior, scope, or acceptance criteria; use RENAMED for name-only changes

Applied to files:

  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism

Applied to files:

  • src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
  • make_error (50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Static Analysis (clang-tidy)
  • GitHub Check: Build and Test (test preset)
🔇 Additional comments (4)
openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md (1)

1-19: Spec structure looks good.

The specification correctly uses the ## ADDED Requirements format with properly structured #### Scenario: headers. The BDD-style GIVEN/WHEN/THEN format clearly defines the expected behavior for both the parent crash and race condition scenarios.

src/app/main.cpp (1)

78-83: Implementation is correct for the use case.

The fork/exec pattern is implemented correctly:

  • Parent PID is saved before fork (handles race condition)
  • Error handling uses strerror(errno) appropriately
  • Return value check is correct (pid < 0)

Note: execvpe() (line 92) is a GNU extension rather than POSIX, but this is acceptable since the code already depends on Linux-specific prctl(PR_SET_PDEATHSIG).

tests/CMakeLists.txt (1)

176-188: Test integration looks good.

The integration test is properly set up:

  • Separate executable for isolation
  • Sanitizers enabled for memory safety checks
  • Registered with CTest for automated runs
  • Working directory configured

The test has no external library dependencies, which is appropriate since it only tests Linux system calls.

tests/app/test_child_death_signal.cpp (1)

65-82: Good test design using subreaper pattern.

The test correctly uses PR_SET_CHILD_SUBREAPER to become the reaper for orphaned grandchildren, ensuring the test can observe the grandchild's exit status. The three-process hierarchy (main → child → grandchild) accurately simulates the real-world scenario where goggles spawns a target app.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @openspec/changes/fix-child-process-cleanup/tasks.md:
- Around line 3-7: The checklist and code disagree: the tasks reference execvp()
but the implementation uses execvpe(); update either the docs or the code so
they match. Pick one API (prefer simpler execvp() if you don't need envp) and
make spawn_target_app() and its task list consistent: if you keep execvpe(),
change the checklist items to mention execvpe(); if you switch the code to
execvp(), update spawn_target_app() to call execvp() and remove envp usage. Also
ensure the diff note about replacing posix_spawnp() still matches the chosen
exec variant and that prctl and parent-death checks remain unchanged.

In @src/app/main.cpp:
- Around line 85-95: Check the return value of prctl(PR_SET_PDEATHSIG, SIGKILL)
in the child path and handle failure: after calling prctl, if it returns -1, log
or report the error (using errno) and immediately _exit(EXIT_FAILURE) to avoid
continuing without the death-signal guarantee; retain the existing race check
using getppid() vs parent_pid, then proceed to call execvpe(argv[0],
argv.data(), envp.data()) only if prctl succeeded.

In @tests/app/test_child_death_signal.cpp:
- Around line 20-34: The test is flaky because a single non-blocking
waitpid(WNOHANG) can return 0 under load; update the wait logic to poll until a
deadline instead of one-shot. Replace the single waitpid(..., WNOHANG) + sleep
with a loop that repeatedly calls waitpid(-1, &status, WNOHANG) until it returns
a pid > 0 or a timeout is reached (sleeping briefly between polls), and apply
the same change for the other occurrence noted around lines 76-105; use clear
timeouts and short backoff sleeps so the test reliably observes child exit.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff8918 and 209416b.

📒 Files selected for processing (3)
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • src/app/main.cpp
  • tests/app/test_child_death_signal.cpp
🧰 Additional context used
📓 Path-based instructions (1)
openspec/**/*.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

openspec/**/*.md: Use #### Scenario: format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format - **WHEN** [condition] - **THEN** [expected result] for scenario steps in requirements

Files:

  • openspec/changes/fix-child-process-cleanup/tasks.md
🧠 Learnings (1)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism

Applied to files:

  • src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
  • make_error (50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Static Analysis (clang-tidy)
  • GitHub Check: Build and Test (test preset)

Comment on lines +3 to +7
- [x] 1.1 Replace `posix_spawnp()` with `fork()` + `execvp()` in `spawn_target_app()`
- [x] 1.2 Add `#include <sys/prctl.h>` header
- [x] 1.3 In child process (after fork), call `prctl(PR_SET_PDEATHSIG, SIGKILL)` before exec
- [x] 1.4 Add check for parent death race condition (verify parent PID hasn't changed to 1)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Doc drift: tasks say execvp() but code uses execvpe() (pick one and align).
Small, but this checklist is meant to be actionable/traceable.

🤖 Prompt for AI Agents
In @openspec/changes/fix-child-process-cleanup/tasks.md around lines 3 - 7, The
checklist and code disagree: the tasks reference execvp() but the implementation
uses execvpe(); update either the docs or the code so they match. Pick one API
(prefer simpler execvp() if you don't need envp) and make spawn_target_app() and
its task list consistent: if you keep execvpe(), change the checklist items to
mention execvpe(); if you switch the code to execvp(), update spawn_target_app()
to call execvp() and remove envp usage. Also ensure the diff note about
replacing posix_spawnp() still matches the chosen exec variant and that prctl
and parent-death checks remain unchanged.

src/app/main.cpp Outdated
Comment on lines 85 to 95
if (pid == 0) {
// Child: set death signal before exec so we die if parent crashes
// Use SIGKILL to ensure child cannot ignore the signal
prctl(PR_SET_PDEATHSIG, SIGKILL);
// Handle race: if parent already died, getppid() returns 1 (init)
if (getppid() != parent_pid) {
_exit(EXIT_FAILURE);
}
execvpe(argv[0], argv.data(), envp.data());
_exit(EXIT_FAILURE);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/app/main.cpp | sed -n '80,100p'

Repository: K1ngst0m/Goggles

Length of output: 921


🏁 Script executed:

man prctl 2>&1 | head -200

Repository: K1ngst0m/Goggles

Length of output: 101


🌐 Web query:

prctl PR_SET_PDEATHSIG return value error handling Linux man pages

💡 Result:

Short answer: prctl(PR_SET_PDEATHSIG, sig) returns 0 on success and -1 on error, with errno set. The documented error for this option is EINVAL when sig is not a valid signal number. See the prctl(2) / PR_SET_PDEATHSIG man page for details. [1][2][3]

Sources

  • PR_SET_PDEATHSIG man page. [1]
  • Arch/man-pages entry for PR_SET_PDEATHSIG. [2]
  • prctl(2) general man page (return-value / errors). [3]

🏁 Script executed:

# Verify that SIGKILL is signal 9 and always valid
python3 << 'EOF'
import signal
# Check if SIGKILL exists and its number
try:
    sigkill = signal.SIGKILL
    print(f"SIGKILL = {sigkill}")
    print("SIGKILL is a valid standard signal")
except AttributeError:
    print("SIGKILL not found")
EOF

Repository: K1ngst0m/Goggles

Length of output: 146


Check prctl(PR_SET_PDEATHSIG, SIGKILL) return value before proceeding.
If prctl fails (returns -1), the child continues executing without the death signal being set, defeating the goal of ensuring the child dies if the parent crashes. The subsequent getppid() check handles the race condition where the parent dies before prctl is called, but does not compensate for prctl failure. Add an error check:

    if (pid == 0) {
        // Child: set death signal before exec so we die if parent crashes
        // Use SIGKILL to ensure child cannot ignore the signal
-       prctl(PR_SET_PDEATHSIG, SIGKILL);
+       if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
+           _exit(EXIT_FAILURE);
+       }
        // Handle race: if parent already died, getppid() returns 1 (init)
        if (getppid() != parent_pid) {
            _exit(EXIT_FAILURE);
        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (pid == 0) {
// Child: set death signal before exec so we die if parent crashes
// Use SIGKILL to ensure child cannot ignore the signal
prctl(PR_SET_PDEATHSIG, SIGKILL);
// Handle race: if parent already died, getppid() returns 1 (init)
if (getppid() != parent_pid) {
_exit(EXIT_FAILURE);
}
execvpe(argv[0], argv.data(), envp.data());
_exit(EXIT_FAILURE);
}
if (pid == 0) {
// Child: set death signal before exec so we die if parent crashes
// Use SIGKILL to ensure child cannot ignore the signal
if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
_exit(EXIT_FAILURE);
}
// Handle race: if parent already died, getppid() returns 1 (init)
if (getppid() != parent_pid) {
_exit(EXIT_FAILURE);
}
execvpe(argv[0], argv.data(), envp.data());
_exit(EXIT_FAILURE);
}
🤖 Prompt for AI Agents
In @src/app/main.cpp around lines 85 - 95, Check the return value of
prctl(PR_SET_PDEATHSIG, SIGKILL) in the child path and handle failure: after
calling prctl, if it returns -1, log or report the error (using errno) and
immediately _exit(EXIT_FAILURE) to avoid continuing without the death-signal
guarantee; retain the existing race check using getppid() vs parent_pid, then
proceed to call execvpe(argv[0], argv.data(), envp.data()) only if prctl
succeeded.

Comment on lines +20 to +34
[[noreturn]] void run_grandchild(pid_t expected_parent) {
prctl(PR_SET_PDEATHSIG, SIGKILL);

if (getppid() != expected_parent) {
_exit(EXIT_TEST_FAIL);
}

// Sleep until killed by SIGKILL
for (int i = 0; i < 100; ++i) {
std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// If we reach here, SIGKILL was not received
_exit(EXIT_TEST_FAIL);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test is timing-flaky: replace single sleep(500ms) + one waitpid(WNOHANG) with a poll-until-timeout loop.
As written, waitpid(-1, ..., WNOHANG) can legitimately return 0 under load and the test fails immediately.

Suggested change
-    // We need to wait a bit for the grandchild to receive signal and exit
-    std::this_thread::sleep_for(std::chrono::milliseconds(500));
-
-    // Try to reap grandchild - should have been killed by SIGKILL
-    int grandchild_status = 0;
-    const pid_t reaped = waitpid(-1, &grandchild_status, WNOHANG);
+    // Poll for grandchild exit (avoid flakes on slow/loaded CI)
+    int grandchild_status = 0;
+    pid_t reaped = 0;
+    const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
+    while (std::chrono::steady_clock::now() < deadline) {
+        reaped = waitpid(-1, &grandchild_status, WNOHANG);
+        if (reaped > 0) break;
+        if (reaped < 0 && errno == EINTR) continue;
+        std::this_thread::sleep_for(std::chrono::milliseconds(50));
+    }

Also applies to: 76-105

🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 20 - 34, The test is
flaky because a single non-blocking waitpid(WNOHANG) can return 0 under load;
update the wait logic to poll until a deadline instead of one-shot. Replace the
single waitpid(..., WNOHANG) + sleep with a loop that repeatedly calls
waitpid(-1, &status, WNOHANG) until it returns a pid > 0 or a timeout is reached
(sleeping briefly between polls), and apply the same change for the other
occurrence noted around lines 76-105; use clear timeouts and short backoff
sleeps so the test reliably observes child exit.

Comment on lines +20 to +26
[[noreturn]] void run_grandchild(pid_t expected_parent) {
prctl(PR_SET_PDEATHSIG, SIGKILL);

if (getppid() != expected_parent) {
_exit(EXIT_TEST_FAIL);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "test_child_death_signal.cpp" -type f

Repository: K1ngst0m/Goggles

Length of output: 99


🏁 Script executed:

wc -l ./tests/app/test_child_death_signal.cpp

Repository: K1ngst0m/Goggles

Length of output: 103


🏁 Script executed:

cat -n ./tests/app/test_child_death_signal.cpp

Repository: K1ngst0m/Goggles

Length of output: 4078


Handle syscall failures consistently (prctl + waitpid).

prctl(PR_SET_PDEATHSIG, SIGKILL) at line 21 returns no error check, and waitpid(child, ...) at line 78 ignores EINTR and return value despite error handling being applied elsewhere in this file (lines 61, 86).

Suggested fix
 [[noreturn]] void run_grandchild(pid_t expected_parent) {
-    prctl(PR_SET_PDEATHSIG, SIGKILL);
+    if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
+        _exit(EXIT_TEST_FAIL);
+    }
@@
-    waitpid(child, &child_status, 0);
+    while (waitpid(child, &child_status, 0) < 0 && errno == EINTR) {}

Also applies to: line 78

@K1ngst0m
Copy link
Copy Markdown
Collaborator

K1ngst0m commented Jan 9, 2026

the reaper pattern

Goggles (main process, multi-threaded)
    ↓ fork() + exec("goggles-reaper") [SAFE: immediate exec]
goggles-reaper (watchdog, single-threaded)
    ↓ PR_SET_CHILD_SUBREAPER + PR_SET_PDEATHSIG
    ↓ fork() + exec(target_app) [SAFE: no threads]
Target Application

@BANANASJIM BANANASJIM changed the title fix(app): terminate child process on parent crash fix(app): terminate child process on parent crash via reaper Jan 9, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 73-78: The signal handler signal_handler currently calls
kill_all_children which invokes non-async-signal-safe functions; change
signal_handler to only set the atomic/volatile flag g_should_run = 0 and return
(remove the kill_all_children call), and move any child-killing logic into the
main loop that checks g_should_run (the existing loop around where main checks
the flag near the comment at line ~125) so that when g_should_run becomes 0 the
main thread performs kill_all_children and other cleanup synchronously (reusing
the existing cleanup code at lines ~135-138).

In @tests/app/test_child_death_signal.cpp:
- Around line 34-52: The test uses posix_spawnp("goggles-reaper") which assumes
the binary is in PATH; update tests/app/test_child_death_signal.cpp to use an
absolute path instead: either replicate the reaper path-resolution logic used by
your main (resolve the current executable via /proc/self/exe then derive the
goggles-reaper absolute path and pass that string to posix_spawnp instead of
"goggles-reaper"), or (alternative) add a CMake dependency so the test depends
on the goggles-reaper target and pass the built artifact's full path into the
test (replace the literal "goggles-reaper" in argv/posix_spawnp with the
resolved absolute path). Ensure the new path string lifetime outlives the
posix_spawnp call and adjust argv/posix_spawnp usage accordingly.
🧹 Nitpick comments (3)
src/app/reaper_main.cpp (1)

80-87: Consider checking sigaction return values.

While failures are unlikely, checking return values would improve robustness and aid debugging if signal setup fails unexpectedly.

openspec/changes/fix-child-process-cleanup/proposal.md (1)

27-33: Consider expanding the "Impact" section with spec deltas.

Per coding guidelines for proposal.md, the "What Changes" or "Impact" section should list specific spec deltas. Currently it mentions "Affected specs: app-window" but doesn't detail what specifically changes in that spec. Consider adding a brief description of the behavioral change (e.g., "Target app now terminates reliably when Goggles crashes").

src/app/CMakeLists.txt (1)

32-37: Consider enabling sanitizers for goggles-reaper.

The main goggles target enables sanitizers via goggles_enable_sanitizers(goggles), but the reaper does not. For consistency and to catch memory/threading issues during testing, consider enabling sanitizers for the reaper as well.

Proposed fix
 # Reaper watchdog process
 add_executable(goggles-reaper reaper_main.cpp)
 set_target_properties(goggles-reaper PROPERTIES
     RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin
 )
 goggles_enable_clang_tidy(goggles-reaper)
+goggles_enable_sanitizers(goggles-reaper)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 209416b and 938371b.

📒 Files selected for processing (6)
  • openspec/changes/fix-child-process-cleanup/proposal.md
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • src/app/CMakeLists.txt
  • src/app/main.cpp
  • src/app/reaper_main.cpp
  • tests/app/test_child_death_signal.cpp
✅ Files skipped from review due to trivial changes (1)
  • openspec/changes/fix-child-process-cleanup/tasks.md
🧰 Additional context used
📓 Path-based instructions (2)
openspec/**/*.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

openspec/**/*.md: Use #### Scenario: format (4 hashtags) for scenario headers in requirements, not bullets or bold text, with at least one scenario per requirement
Use format - **WHEN** [condition] - **THEN** [expected result] for scenario steps in requirements

Files:

  • openspec/changes/fix-child-process-cleanup/proposal.md
openspec/changes/**/proposal.md

📄 CodeRabbit inference engine (openspec/AGENTS.md)

List spec deltas in proposal.md under 'What Changes' section, marking breaking changes with BREAKING

Files:

  • openspec/changes/fix-child-process-cleanup/proposal.md
🧠 Learnings (4)
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/src/render/**/*.cpp : Single-threaded render loop on main thread by default. Render backend (`goggles::render`) runs on main thread. Pipeline execution runs on main thread. Job system for non-render work on worker threads. Threading introduced only when profiling justifies it (main thread CPU consistently >8ms).

Applied to files:

  • src/app/CMakeLists.txt
📚 Learning: 2026-01-07T07:20:57.053Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.053Z
Learning: Applies to docs/**/*.{cpp,hpp} : Use top-level namespace `goggles`. Use module namespaces `goggles::<module>` (e.g., `goggles::capture`, `goggles::render`). Use sub-module namespaces `goggles::<module>::<submodule>`. Never use `using namespace` in headers. Prefer explicit namespace qualification or scoped `using` declarations.

Applied to files:

  • src/app/CMakeLists.txt
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism

Applied to files:

  • src/app/reaper_main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.

Applied to files:

  • src/app/main.cpp
🧬 Code graph analysis (1)
src/app/main.cpp (4)
src/ui/imgui_layer.hpp (2)
  • path (74-74)
  • path (95-95)
src/render/texture/texture_loader.hpp (1)
  • path (32-33)
src/app/cli.hpp (1)
  • arg (76-76)
src/util/error.hpp (1)
  • make_error (50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Static Analysis (clang-tidy)
  • GitHub Check: Build and Test (test preset)
🔇 Additional comments (6)
src/app/reaper_main.cpp (4)

53-59: LGTM!

The recursive kill approach correctly handles the process tree. Killing the parent before children prevents the parent from spawning new processes during cleanup.


61-71: LGTM!

The cleanup functions correctly gather and terminate child process trees, and the wait loop properly handles EINTR interruptions.


97-108: LGTM - correct parent death race handling.

The pattern of capturing parent_pid before PR_SET_PDEATHSIG and checking getppid() afterward correctly handles the race condition where the parent might die between the two calls.

Consider checking prctl return values for better diagnostics if these calls fail.


123-144: LGTM!

The main loop correctly handles EINTR interruptions, and the cleanup sequence properly terminates and reaps all remaining children before returning the appropriate exit status.

src/app/main.cpp (2)

25-34: LGTM!

The reaper path resolution correctly uses readlink to find the executable's directory and falls back to PATH lookup if that fails. The buffer size and null-termination are handled properly.


84-101: LGTM!

The argv construction correctly prepends the reaper path, and the switch from posix_spawnp to posix_spawn is appropriate since the full path is now computed explicitly.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 53-71: The current wait_all_children() can block forever because
children may ignore SIGTERM; update the termination strategy: in
kill_all_children() send SIGTERM to each child (use kill_process_tree(pid,
SIGTERM) so the function exists), then implement a bounded wait in
wait_all_children() that repeatedly loops with non-blocking waitpid(...,
WNOHANG) and usleep short intervals until either all children have exited or a
deadline (e.g., N seconds) is reached, handling EINTR correctly; if the deadline
is exceeded, escalate by calling kill_process_tree(child, SIGKILL) for any
still-running children and perform a final blocking waitpid loop to reap them;
ensure functions reference get_child_pids, kill_process_tree, and
kill_all_children so remaining PIDs can be re-queried or tracked during
escalation.
- Around line 1-141: The reaper is Linux-only; either prevent building it on
non-Linux or guard platform-specific APIs: update the CMake target that
currently adds src/app/reaper_main.cpp (around the existing target in
CMakeLists.txt) to build only when Linux (e.g., if(UNIX AND NOT APPLE) or
if(CMAKE_SYSTEM_NAME STREQUAL Linux)), OR wrap the Linux-specific source body in
#ifdef __linux__ / #endif so symbols using /proc, prctl (PR_SET_CHILD_SUBREAPER,
PR_SET_PDEATHSIG), fork(), execvp(), get_child_pids(), kill_process_tree(), etc.
are excluded on non-Linux platforms; choose one approach and apply it
consistently so the file is not compiled/used on non-Linux systems.

In @tests/app/test_child_death_signal.cpp:
- Around line 52-76: The child branch uses posix_spawn(..., environ) without
declaring extern char** environ (portability issue) and relies on a fixed
std::this_thread::sleep_for(200ms) race; fix by adding a portable usage of the
environment to the spawn call (declare extern char** environ at top of the test
file or pass nullptr if inheriting env is acceptable) and replace the fixed
sleep_for wait with a deterministic readiness check: after posix_spawn, poll for
the spawned reaper child (reaper_pid) existence or use a pipe/handshake where
the child (spawned via posix_spawn with argv from reaper_path) writes a ready
byte before the parent proceeds; update the child branch around
fork()/posix_spawn() and the sleep_for() usage (symbols: child, reaper_pid,
posix_spawn, reaper_path, environ, std::this_thread::sleep_for) to use the
chosen readiness mechanism.
- Around line 78-84: The waitpid call can fail with EINTR and the test doesn't
validate the child's exit, so replace the direct waitpid(child, &status, 0) with
an EINTR-safe loop that retries on errno==EINTR, checks the return value
(ensuring it equals child), and then validate the child's termination using
WIFEXITED/WEXITSTATUS or WIFSIGNALED to assert expected outcome before
continuing; reference the existing variables and symbols waitpid, child, and
status in tests/app/test_child_death_signal.cpp and ensure the test fails loudly
if waitpid returns an unexpected error or the child exited abnormally.
🧹 Nitpick comments (1)
src/app/reaper_main.cpp (1)

137-140: Consider propagating signaled exit status (helps callers/debuggability).

Right now a SIGKILL’d child returns EXIT_FAILURE rather than a conventional 128+signal mapping.

Possible tweak
     if (WIFEXITED(status)) {
         return WEXITSTATUS(status);
     }
+    if (WIFSIGNALED(status)) {
+        return 128 + WTERMSIG(status);
+    }
     return EXIT_FAILURE;
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 938371b and 1923d18.

📒 Files selected for processing (2)
  • src/app/reaper_main.cpp
  • tests/app/test_child_death_signal.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
tests/app/test_child_death_signal.cpp (1)
src/app/main.cpp (2)
  • get_reaper_path (25-34)
  • get_reaper_path (25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Static Analysis (clang-tidy)
  • GitHub Check: Build and Test (test preset)

Comment on lines +1 to +141
// goggles-reaper: single-threaded watchdog process
// Ensures target app and descendants are terminated when goggles dies

#include <array>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <dirent.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

namespace {

volatile sig_atomic_t g_should_run = 1;

auto get_child_pids(pid_t parent_pid) -> std::vector<pid_t> {
std::vector<pid_t> pids;
DIR* proc_dir = opendir("/proc");
if (proc_dir == nullptr) {
return pids;
}

struct dirent* entry = nullptr;
while ((entry = readdir(proc_dir)) != nullptr) {
if (entry->d_type != DT_DIR || entry->d_name[0] < '0' || entry->d_name[0] > '9') {
continue;
}

std::array<char, 64> path{};
std::snprintf(path.data(), path.size(), "/proc/%s/stat", entry->d_name);

FILE* stat_file = std::fopen(path.data(), "r");
if (stat_file == nullptr) {
continue;
}

pid_t ppid = -1;
std::fscanf(stat_file, "%*d %*s %*c %d", &ppid);
std::fclose(stat_file);

if (ppid == parent_pid) {
pids.push_back(static_cast<pid_t>(std::atoi(entry->d_name)));
}
}
closedir(proc_dir);
return pids;
}

auto kill_process_tree(pid_t pid, int sig) -> void {
const auto children = get_child_pids(pid);
kill(pid, sig);
for (const pid_t child : children) {
kill_process_tree(child, sig);
}
}

auto kill_all_children(int sig) -> void {
const auto children = get_child_pids(getpid());
for (const pid_t child : children) {
kill_process_tree(child, sig);
}
}

auto wait_all_children() -> void {
while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) {
}
}

auto signal_handler(int /*sig*/) -> void {
g_should_run = 0;
}

auto setup_signal_handlers() -> void {
struct sigaction sa{};
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
sigaction(SIGTERM, &sa, nullptr);
sigaction(SIGINT, &sa, nullptr);
sigaction(SIGHUP, &sa, nullptr);
}

} // namespace

auto main(int argc, char** argv) -> int {
if (argc < 2) {
std::fprintf(stderr, "goggles-reaper: missing command\n");
return EXIT_FAILURE;
}

const pid_t parent_pid = getppid();

// Become subreaper - orphaned descendants reparent to us
prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);

// Die when parent (goggles) dies
prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}

setup_signal_handlers();

// Fork and exec target app
const pid_t child = fork();
if (child < 0) {
std::fprintf(stderr, "goggles-reaper: fork failed: %s\n", std::strerror(errno));
return EXIT_FAILURE;
}

if (child == 0) {
execvp(argv[1], &argv[1]);
std::fprintf(stderr, "goggles-reaper: exec failed: %s\n", std::strerror(errno));
_exit(EXIT_FAILURE);
}

// Wait for primary child or signal
int status = 0;
while (g_should_run != 0) {
const pid_t pid = waitpid(child, &status, 0);
if (pid == child) {
break;
}
if (pid == -1 && errno != EINTR) {
break;
}
}

// Cleanup all remaining children
g_should_run = 0;
kill_all_children(SIGTERM);
wait_all_children();

if (WIFEXITED(status)) {
return WEXITSTATUS(status);
}
return EXIT_FAILURE;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for reaper_main.cpp and Linux-specific symbols
rg -n "reaper_main\.cpp|goggles-reaper|PR_SET_CHILD_SUBREAPER|PR_SET_PDEATHSIG" -C 3

Repository: K1ngst0m/Goggles

Length of output: 10761


🏁 Script executed:

#!/bin/bash
# Search for Linux platform guards in CMake and code
rg -n "if.*LINUX|__linux__|CMAKE_SYSTEM_NAME.*Linux|CMAKE_SYSTEM" -C 2

Repository: K1ngst0m/Goggles

Length of output: 402


🏁 Script executed:

#!/bin/bash
# Look for CMakeLists.txt files to understand build structure
find . -name "CMakeLists.txt" -type f | head -20

Repository: K1ngst0m/Goggles

Length of output: 317


🏁 Script executed:

#!/bin/bash
# Check the structure of src/app directory
ls -la src/app/ 2>/dev/null || find . -path "*/src/app*" -type f -name "*.cpp" -o -name "*.h" -o -name "CMakeLists.txt" | head -20

Repository: K1ngst0m/Goggles

Length of output: 823


🏁 Script executed:

#!/bin/bash
# Read the src/app/CMakeLists.txt to see the full build configuration
cat -n src/app/CMakeLists.txt

Repository: K1ngst0m/Goggles

Length of output: 1107


🏁 Script executed:

#!/bin/bash
# Search for any platform guards around the reaper_main.cpp or goggles-reaper build
rg -n -B10 -A5 "add_executable.*reaper|goggles-reaper" src/app/CMakeLists.txt

Repository: K1ngst0m/Goggles

Length of output: 525


🏁 Script executed:

#!/bin/bash
# Check if there are any #ifdef or platform checks in reaper_main.cpp itself
head -30 src/app/reaper_main.cpp | cat -n

Repository: K1ngst0m/Goggles

Length of output: 1051


🏁 Script executed:

#!/bin/bash
# Check the main CMakeLists.txt for any platform-specific logic
cat -n CMakeLists.txt | head -50

Repository: K1ngst0m/Goggles

Length of output: 1182


🏁 Script executed:

#!/bin/bash
# Check the main CMakeLists.txt to see if there's any CMAKE_SYSTEM_NAME check
rg -n "CMAKE_SYSTEM_NAME|if.*WIN|if.*APPLE|if.*UNIX|if.*Linux" CMakeLists.txt

Repository: K1ngst0m/Goggles

Length of output: 42


🏁 Script executed:

#!/bin/bash
# Verify the full reaper_main.cpp for any platform guards
wc -l src/app/reaper_main.cpp
cat -n src/app/reaper_main.cpp | grep -E "ifdef|if.*linux|if.*LINUX" || echo "No #ifdef guards found"

Repository: K1ngst0m/Goggles

Length of output: 110


🏁 Script executed:

#!/bin/bash
# Check if there are any platform-specific configurations in Dependencies.cmake or CompilerConfig.cmake
rg -n "CMAKE_SYSTEM_NAME|WIN32|APPLE|LINUX" cmake/

Repository: K1ngst0m/Goggles

Length of output: 185


Add Linux platform guards to goggles-reaper build or source.

This file is inherently Linux-specific (/proc filesystem access, prctl() syscalls, fork()/execvp()). Currently, src/app/CMakeLists.txt (line 33) builds it unconditionally, and the source has no #ifdef __linux__ guards. Either add if(UNIX AND NOT APPLE) or if(CMAKE_SYSTEM_NAME STREQUAL Linux) to the CMake target, or wrap Linux-specific code sections with #ifdef __linux__ in the source.

🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 1 - 141, The reaper is Linux-only;
either prevent building it on non-Linux or guard platform-specific APIs: update
the CMake target that currently adds src/app/reaper_main.cpp (around the
existing target in CMakeLists.txt) to build only when Linux (e.g., if(UNIX AND
NOT APPLE) or if(CMAKE_SYSTEM_NAME STREQUAL Linux)), OR wrap the Linux-specific
source body in #ifdef __linux__ / #endif so symbols using /proc, prctl
(PR_SET_CHILD_SUBREAPER, PR_SET_PDEATHSIG), fork(), execvp(), get_child_pids(),
kill_process_tree(), etc. are excluded on non-Linux platforms; choose one
approach and apply it consistently so the file is not compiled/used on non-Linux
systems.

Comment on lines +53 to +71
auto kill_process_tree(pid_t pid, int sig) -> void {
const auto children = get_child_pids(pid);
kill(pid, sig);
for (const pid_t child : children) {
kill_process_tree(child, sig);
}
}

auto kill_all_children(int sig) -> void {
const auto children = get_child_pids(getpid());
for (const pid_t child : children) {
kill_process_tree(child, sig);
}
}

auto wait_all_children() -> void {
while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) {
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Termination/wait strategy can hang indefinitely; needs timeout + SIGKILL escalation.

Even with perfect PID discovery, children can ignore SIGTERM. wait_all_children() currently blocks until all children exit, with no deadline.

Sketch of bounded wait + escalation (minimal deps using usleep)
-auto wait_all_children() -> void {
-    while (waitpid(-1, nullptr, 0) > 0 || errno == EINTR) {
-    }
-}
+auto reap_all_children_nonblocking() -> bool {
+    for (;;) {
+        const pid_t r = waitpid(-1, nullptr, WNOHANG);
+        if (r > 0) continue;
+        if (r == 0) return false; // still have children
+        return errno == ECHILD;   // true => none left
+    }
+}
+
+auto wait_all_children_with_timeout_ms(int timeout_ms) -> bool {
+    const int step_us = 20 * 1000;
+    int waited_ms = 0;
+    while (waited_ms < timeout_ms) {
+        if (reap_all_children_nonblocking()) return true;
+        usleep(step_us);
+        waited_ms += 20;
+    }
+    return reap_all_children_nonblocking();
+}

Then in cleanup:

-    kill_all_children(SIGTERM);
-    wait_all_children();
+    kill_all_children(SIGTERM);
+    if (!wait_all_children_with_timeout_ms(2000)) {
+        kill_all_children(SIGKILL);
+        (void)wait_all_children_with_timeout_ms(2000);
+    }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 53 - 71, The current
wait_all_children() can block forever because children may ignore SIGTERM;
update the termination strategy: in kill_all_children() send SIGTERM to each
child (use kill_process_tree(pid, SIGTERM) so the function exists), then
implement a bounded wait in wait_all_children() that repeatedly loops with
non-blocking waitpid(..., WNOHANG) and usleep short intervals until either all
children have exited or a deadline (e.g., N seconds) is reached, handling EINTR
correctly; if the deadline is exceeded, escalate by calling
kill_process_tree(child, SIGKILL) for any still-running children and perform a
final blocking waitpid loop to reap them; ensure functions reference
get_child_pids, kill_process_tree, and kill_all_children so remaining PIDs can
be re-queried or tracked during escalation.

Comment on lines +94 to +105
const pid_t parent_pid = getppid();

// Become subreaper - orphaned descendants reparent to us
prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);

// Die when parent (goggles) dies
prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}

setup_signal_handlers();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

prctl(...) return values are unchecked (silent failure → incorrect guarantees).

If PR_SET_CHILD_SUBREAPER or PR_SET_PDEATHSIG fails, the core safety property may not hold.

Proposed error checks
-    prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
+    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) {
+        std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno));
+        return EXIT_FAILURE;
+    }
@@
-    prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
+    if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) {
+        std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno));
+        return EXIT_FAILURE;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const pid_t parent_pid = getppid();
// Become subreaper - orphaned descendants reparent to us
prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
// Die when parent (goggles) dies
prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}
setup_signal_handlers();
const pid_t parent_pid = getppid();
// Become subreaper - orphaned descendants reparent to us
if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) {
std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno));
return EXIT_FAILURE;
}
// Die when parent (goggles) dies
if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) {
std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno));
return EXIT_FAILURE;
}
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}
setup_signal_handlers();

Comment on lines +19 to +38
auto get_reaper_path() -> std::string {
std::array<char, 4096> exe_path{};
const ssize_t len = readlink("/proc/self/exe", exe_path.data(), exe_path.size() - 1);
if (len <= 0) {
return "goggles-reaper";
}
exe_path[static_cast<size_t>(len)] = '\0';

std::string path(exe_path.data());
// Test binary is in build/debug/tests/, reaper is in build/debug/bin/
const auto pos = path.rfind('/');
if (pos != std::string::npos) {
path = path.substr(0, pos);
const auto pos2 = path.rfind('/');
if (pos2 != std::string::npos) {
path = path.substr(0, pos2);
}
}
return path + "/bin/goggles-reaper";
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test reaper path resolution is brittle (assumes a specific build directory layout).

This will likely break in CI/packaging/out-of-tree builds. Prefer a CMake-provided absolute path (compile-time define) or an env var fallback; also validate the resolved path is executable before using it.

Proposed hardening (existence/executable check + fallback)
+#include <sys/stat.h>
+
 auto get_reaper_path() -> std::string {
@@
-    return path + "/bin/goggles-reaper";
+    const std::string candidate = path + "/bin/goggles-reaper";
+    if (access(candidate.c_str(), X_OK) == 0) {
+        return candidate;
+    }
+    // Fallback to PATH (lets CTest env control discovery)
+    return "goggles-reaper";
 }

Comment on lines +52 to +76
const pid_t child = fork();
if (child < 0) {
std::fprintf(stderr, "fork failed: %s\n", std::strerror(errno));
return EXIT_TEST_FAIL;
}

if (child == 0) {
// Child: spawn goggles-reaper with sleep 60
std::array<char*, 4> argv = {const_cast<char*>(reaper_path.c_str()),
const_cast<char*>("sleep"), const_cast<char*>("60"), nullptr};

pid_t reaper_pid = -1;
const int rc =
posix_spawn(&reaper_pid, reaper_path.c_str(), nullptr, nullptr, argv.data(), environ);
if (rc != 0) {
std::fprintf(stderr, "posix_spawn failed: %s\n", std::strerror(rc));
_exit(EXIT_TEST_FAIL);
}

// Wait a bit for reaper to start
std::this_thread::sleep_for(std::chrono::milliseconds(200));

// Now exit - reaper should receive SIGTERM and kill its children
_exit(EXIT_TEST_PASS);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Child-side spawn uses environ without a portable declaration and relies on fixed sleeps.

  • In C++, environ may need extern char** environ; (depends on libc headers).
  • The fixed 200ms delay is a race; prefer a readiness check (e.g., poll for reaper child existence) or a pipe handshake from the spawned process (if you add a test-only “ready” signal).
Minimal portability tweak
+#if defined(__linux__) || defined(__APPLE__)
+extern char** environ;
+#endif

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 52 - 76, The child branch
uses posix_spawn(..., environ) without declaring extern char** environ
(portability issue) and relies on a fixed std::this_thread::sleep_for(200ms)
race; fix by adding a portable usage of the environment to the spawn call
(declare extern char** environ at top of the test file or pass nullptr if
inheriting env is acceptable) and replace the fixed sleep_for wait with a
deterministic readiness check: after posix_spawn, poll for the spawned reaper
child (reaper_pid) existence or use a pipe/handshake where the child (spawned
via posix_spawn with argv from reaper_path) writes a ready byte before the
parent proceeds; update the child branch around fork()/posix_spawn() and the
sleep_for() usage (symbols: child, reaper_pid, posix_spawn, reaper_path,
environ, std::this_thread::sleep_for) to use the chosen readiness mechanism.

Comment on lines +78 to +84
// Parent: wait for child to exit
int status = 0;
waitpid(child, &status, 0);

// Give reaper time to clean up
std::this_thread::sleep_for(std::chrono::milliseconds(500));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

waitpid(child, ...) should handle EINTR and validate child exit status.

Right now a signal can cause waitpid to fail and the test will proceed as if setup succeeded.

Proposed EINTR-safe wait + status check
-    int status = 0;
-    waitpid(child, &status, 0);
+    int status = 0;
+    while (waitpid(child, &status, 0) == -1 && errno == EINTR) {}
+    if (!WIFEXITED(status) || WEXITSTATUS(status) != EXIT_TEST_PASS) {
+        std::fprintf(stderr, "child setup failed\n");
+        return EXIT_TEST_FAIL;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Parent: wait for child to exit
int status = 0;
waitpid(child, &status, 0);
// Give reaper time to clean up
std::this_thread::sleep_for(std::chrono::milliseconds(500));
// Parent: wait for child to exit
int status = 0;
while (waitpid(child, &status, 0) == -1 && errno == EINTR) {}
if (!WIFEXITED(status) || WEXITSTATUS(status) != EXIT_TEST_PASS) {
std::fprintf(stderr, "child setup failed\n");
return EXIT_TEST_FAIL;
}
// Give reaper time to clean up
std::this_thread::sleep_for(std::chrono::milliseconds(500));
🤖 Prompt for AI Agents
In @tests/app/test_child_death_signal.cpp around lines 78 - 84, The waitpid call
can fail with EINTR and the test doesn't validate the child's exit, so replace
the direct waitpid(child, &status, 0) with an EINTR-safe loop that retries on
errno==EINTR, checks the return value (ensuring it equals child), and then
validate the child's termination using WIFEXITED/WEXITSTATUS or WIFSIGNALED to
assert expected outcome before continuing; reference the existing variables and
symbols waitpid, child, and status in tests/app/test_child_death_signal.cpp and
ensure the test fails loudly if waitpid returns an unexpected error or the child
exited abnormally.

Comment on lines +85 to +103
// Check if any sleep process is still running (there shouldn't be)
FILE* pgrep = popen("pgrep -x sleep 2>/dev/null | wc -l", "r");
if (pgrep == nullptr) {
std::fprintf(stderr, "popen failed\n");
return EXIT_TEST_FAIL;
}

int sleep_count = 0;
std::fscanf(pgrep, "%d", &sleep_count);
pclose(pgrep);

if (sleep_count == 0) {
std::printf("PASS: No orphaned sleep processes found\n");
return EXIT_TEST_PASS;
}

std::fprintf(stderr, "FAIL: Found %d sleep processes still running\n", sleep_count);
return EXIT_TEST_FAIL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine the test file
fd test_child_death_signal.cpp

Repository: K1ngst0m/Goggles

Length of output: 97


🏁 Script executed:

# Look for Linux gating and test registration
rg -n "test_child_death_signal|child_death_signal" --hidden -C3

Repository: K1ngst0m/Goggles

Length of output: 1415


🏁 Script executed:

# Check for Linux-specific guards in the test directory
rg -n "__linux__|CMAKE_SYSTEM_NAME" tests -C2

Repository: K1ngst0m/Goggles

Length of output: 42


🏁 Script executed:

cat -n tests/app/test_child_death_signal.cpp

Repository: K1ngst0m/Goggles

Length of output: 3937


🏁 Script executed:

# Verify the test is registered without Linux gating and check for any recent platform-specific work
grep -A10 "goggles_test_child_death_signal" tests/CMakeLists.txt | head -20

Repository: K1ngst0m/Goggles

Length of output: 412


🏁 Script executed:

# Check if there are any other tests using similar approach that might show best practices
rg -l "pgrep|popen.*|wc -l" tests/

Repository: K1ngst0m/Goggles

Length of output: 97


Fix non-hermetic process verification: use unique helper binary instead of pgrep -x sleep.

The test's use of pgrep -x sleep will fail on shared CI runners or dev machines where unrelated processes are named sleep. This test must verify only the specific sleep spawned by goggles-reaper, not all system sleep processes. Replace the generic sleep binary with a test-specific helper (e.g., /tmp/goggles_test_sleep_$$ or similar with a unique identifier) to ensure the process count verification is hermetic.

@BANANASJIM BANANASJIM force-pushed the fix/child-process-cleanup branch from 1923d18 to 0882d0a Compare January 9, 2026 10:45
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/app/CMakeLists.txt (1)

32-37: Consider enabling sanitizers for goggles-reaper during development.

The main goggles executable has sanitizers enabled (line 29), but goggles-reaper does not. Given that the reaper handles critical process cleanup logic, enabling sanitizers would help catch memory errors, undefined behavior, and other issues during development.

♻️ Proposed addition
 goggles_enable_clang_tidy(goggles-reaper)
+goggles_enable_sanitizers(goggles-reaper)
src/app/main.cpp (2)

25-34: Consider extracting the duplicated path resolution logic.

This function is duplicated in src/app/test_child_death_signal.cpp (lines 18-36), but with different path resolution logic. The test version navigates up two directory levels, while this version goes up one level. Consider extracting to a shared utility if the logic should be identical, or clearly document why they differ.


27-29: Handle potential path truncation edge case.

If the executable path is exactly 4095 bytes or longer, readlink() will truncate without error (returning 4095). While extremely rare, you could detect this by checking len == exe_path.size() - 1.

♻️ Optional defensive check
-    if (len <= 0) {
+    if (len <= 0 || len >= static_cast<ssize_t>(exe_path.size() - 1)) {
         return "goggles-reaper";
     }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1923d18 and 0882d0a.

📒 Files selected for processing (8)
  • openspec/changes/fix-child-process-cleanup/proposal.md
  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • src/app/CMakeLists.txt
  • src/app/main.cpp
  • src/app/reaper_main.cpp
  • tests/CMakeLists.txt
  • tests/app/test_child_death_signal.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
  • openspec/changes/fix-child-process-cleanup/tasks.md
  • openspec/changes/fix-child-process-cleanup/specs/app-window/spec.md
  • tests/app/test_child_death_signal.cpp
  • openspec/changes/fix-child-process-cleanup/proposal.md
  • tests/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-12-29T15:44:30.520Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: openspec/project.md:0-0
Timestamp: 2025-12-29T15:44:30.520Z
Learning: Applies to openspec/src/**/*.{cpp,hpp} : Use inter-thread communication via `goggles::util::SPSCQueue` (lock-free SPSC queue) for task parallelism

Applied to files:

  • src/app/main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.

Applied to files:

  • src/app/main.cpp
  • src/app/reaper_main.cpp
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/src/render/**/*.cpp : Single-threaded render loop on main thread by default. Render backend (`goggles::render`) runs on main thread. Pipeline execution runs on main thread. Job system for non-render work on worker threads. Threading introduced only when profiling justifies it (main thread CPU consistently >8ms).

Applied to files:

  • src/app/CMakeLists.txt
📚 Learning: 2026-01-07T07:20:57.053Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.053Z
Learning: Applies to docs/**/*.{cpp,hpp} : Use top-level namespace `goggles`. Use module namespaces `goggles::<module>` (e.g., `goggles::capture`, `goggles::render`). Use sub-module namespaces `goggles::<module>::<submodule>`. Never use `using namespace` in headers. Prefer explicit namespace qualification or scoped `using` declarations.

Applied to files:

  • src/app/CMakeLists.txt
🧬 Code graph analysis (1)
src/app/main.cpp (1)
src/util/error.hpp (1)
  • make_error (50-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test (test preset)
  • GitHub Check: Static Analysis (clang-tidy)
🔇 Additional comments (11)
src/app/main.cpp (2)

5-5: LGTM!

The new includes are properly used for path resolution in get_reaper_path().

Also applies to: 13-13


87-106: LGTM. The transition from posix_spawnp to posix_spawn with an explicit reaper path is correct. The lifetime of reaper_path and the command strings are properly managed—both remain valid through the posix_spawn call, which reads (not stores) the argv array.

Note: While functionally safe, the const_cast pattern could be clarified by ensuring command is documented as a non-temporary reference parameter to prevent future misuse of this pattern.

src/app/reaper_main.cpp (9)

16-19: LGTM!

Proper use of volatile sig_atomic_t for signal-safe communication between signal handler and main loop.


20-65: LGTM!

The /proc scanning implementation is robust:

  • Properly handles errors at each step (opendir, fopen, fgets failures)
  • Correctly closes all resources (FILE*, DIR*)
  • Handles edge cases like processes disappearing during iteration
  • Correctly parses /proc/PID/stat format by finding the last ) to handle command names with parentheses

67-73: LGTM!

The recursive process tree termination is correct. Killing the parent before recursing prevents it from spawning additional children. Any orphaned descendants will reparent to this reaper process (due to PR_SET_CHILD_SUBREAPER), ensuring complete cleanup.


75-80: LGTM!

Clean implementation that terminates all immediate children and their descendants.


82-85: LGTM!

Correctly reaps all children with proper EINTR handling.


87-98: LGTM!

Signal handlers are correctly implemented and signal-safe. The handler only modifies sig_atomic_t, which is appropriate for signal context.


113-117: LGTM! Correct PR_SET_PDEATHSIG race condition handling.

The implementation correctly handles the race condition where the parent could die between getppid() and prctl(). By checking that getppid() returns the same PID after setting PR_SET_PDEATHSIG, you ensure the signal will be delivered when the parent dies.


134-144: LGTM! Robust wait loop with proper signal handling.

The wait loop correctly handles:

  • Normal child exit (breaking immediately)
  • Signal interruption (EINTR handled, loop continues to check g_should_run)
  • Signal-triggered shutdown (checking g_should_run after each iteration)

146-154: LGTM! Complete cleanup ensures no orphaned processes.

The cleanup path ensures:

  1. All remaining children receive SIGTERM
  2. All children are properly reaped via wait_all_children()
  3. Primary child's exit status is correctly propagated

Combined with PR_SET_CHILD_SUBREAPER, this guarantees complete process tree termination.

@BANANASJIM BANANASJIM merged commit f003f6f into main Jan 9, 2026
3 of 4 checks passed
@BANANASJIM BANANASJIM deleted the fix/child-process-cleanup branch January 9, 2026 12:47
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @src/app/reaper_main.cpp:
- Around line 91-98: setup_signal_handlers currently ignores sigaction() return
values so handlers may not be installed; update the function to check each
sigaction() call for failure, retrieve errno, and handle errors (e.g., log a
descriptive message including strerror(errno) via your logging mechanism or
perror and abort/exit with non‑zero status or throw an exception) to ensure the
process doesn’t silently continue without handlers (especially since
PR_SET_PDEATHSIG sends SIGTERM). Specifically, wrap the three sigaction calls
for SIGTERM, SIGINT and SIGHUP in checks, and if any returns -1 perform the
chosen error handling path so failures are visible and the program stops or
propagates the error.
- Around line 108-118: The prctl() calls for PR_SET_CHILD_SUBREAPER and
PR_SET_PDEATHSIG are currently unchecked; modify the code around
prctl(PR_SET_CHILD_SUBREAPER, ...) and prctl(PR_SET_PDEATHSIG, ...) to check
each return value (non-zero indicates failure), emit an error (e.g., via perror
or process logger) that includes which prctl failed and errno, and return
EXIT_FAILURE if either call fails before proceeding to the getppid() parent
check; keep the existing parent_pid/getppid logic and only proceed when both
prctl calls succeed.
🧹 Nitpick comments (1)
src/app/reaper_main.cpp (1)

20-65: Consider RAII wrapper for DIR* resource management.

The DIR* pointer is correctly managed with proper closedir() cleanup, but consider wrapping it in an RAII helper (e.g., unique_ptr with custom deleter) to make the lifetime management more robust and eliminate the need to track manual cleanup.

♻️ Example RAII pattern
struct DirCloser {
    void operator()(DIR* d) const { if (d) closedir(d); }
};
using UniqueDIR = std::unique_ptr<DIR, DirCloser>;

auto get_child_pids(pid_t parent_pid) -> std::vector<pid_t> {
    std::vector<pid_t> pids;
    UniqueDIR proc_dir(opendir("/proc"));
    if (!proc_dir) {
        return pids;
    }
    // ... rest of function, no manual closedir needed
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0882d0a and f83d552.

📒 Files selected for processing (1)
  • src/app/reaper_main.cpp
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-07T07:20:57.054Z
Learnt from: CR
Repo: K1ngst0m/Goggles PR: 0
File: docs/project_policies.md:0-0
Timestamp: 2026-01-07T07:20:57.054Z
Learning: Applies to docs/**/*.{cpp,hpp} : Never use raw `int` file descriptors when possible. Use `goggles::util::UniqueFd` for all owned file descriptors. Raw `int` acceptable only at uncontrollable boundaries (syscall returns, IPC receive). Wrap immediately after receive. API boundaries may use `int` with clear documentation.

Applied to files:

  • src/app/reaper_main.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test (test preset)
  • GitHub Check: Static Analysis (clang-tidy)
🔇 Additional comments (7)
src/app/reaper_main.cpp (7)

1-19: LGTM: Clean setup and appropriate signal-safe global.

The includes are appropriate for Linux process management, and volatile sig_atomic_t is correctly used for the signal handler flag.


67-73: LGTM: Recursive termination with subreaper safety.

The function correctly performs recursive process tree termination. The order (kill parent, then recurse to children) is safe because this process is a subreaper: any orphaned descendants reparent to us and are killed later by kill_all_children() in main. The architecture handles races where processes spawn children during traversal.


75-85: LGTM: Correct cleanup and reaping logic.

Both functions correctly handle child process cleanup. The wait_all_children() loop properly handles EINTR and reaps all children including any that may have accumulated as zombies during the main wait loop.


102-106: LGTM: Clear argument validation.


119-132: LGTM: Correct fork/exec pattern.

The fork and exec logic is correctly implemented. Using _exit() in the child after exec failure is correct (avoids flushing parent's stdio buffers), and the execvp() arguments are properly structured.


134-144: LGTM: Robust wait loop with signal awareness.

The wait loop correctly waits for the primary child while handling signals. Any descendants that reparent to the reaper during this phase will be reaped by wait_all_children() at line 149.


146-154: LGTM: Complete cleanup and proper exit status propagation.

The cleanup logic ensures all remaining descendants are terminated with SIGKILL (as intended per commit message), and the child's exit status is correctly propagated to the caller.

Comment on lines +91 to +98
auto setup_signal_handlers() -> void {
struct sigaction sa{};
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
sigaction(SIGTERM, &sa, nullptr);
sigaction(SIGINT, &sa, nullptr);
sigaction(SIGHUP, &sa, nullptr);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Check sigaction return values.

If sigaction() fails, signal handlers won't be installed. Since PR_SET_PDEATHSIG sends SIGTERM (line 114), failure to install the SIGTERM handler means the reaper would terminate without cleaning up children when the parent dies, defeating the core purpose of this watchdog.

🛡️ Proposed fix
 auto setup_signal_handlers() -> void {
     struct sigaction sa{};
     sa.sa_handler = signal_handler;
     sigemptyset(&sa.sa_mask);
-    sigaction(SIGTERM, &sa, nullptr);
-    sigaction(SIGINT, &sa, nullptr);
-    sigaction(SIGHUP, &sa, nullptr);
+    if (sigaction(SIGTERM, &sa, nullptr) != 0) {
+        std::fprintf(stderr, "goggles-reaper: sigaction SIGTERM failed: %s\n", std::strerror(errno));
+        std::abort();
+    }
+    if (sigaction(SIGINT, &sa, nullptr) != 0) {
+        std::fprintf(stderr, "goggles-reaper: sigaction SIGINT failed: %s\n", std::strerror(errno));
+        std::abort();
+    }
+    if (sigaction(SIGHUP, &sa, nullptr) != 0) {
+        std::fprintf(stderr, "goggles-reaper: sigaction SIGHUP failed: %s\n", std::strerror(errno));
+        std::abort();
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto setup_signal_handlers() -> void {
struct sigaction sa{};
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
sigaction(SIGTERM, &sa, nullptr);
sigaction(SIGINT, &sa, nullptr);
sigaction(SIGHUP, &sa, nullptr);
}
auto setup_signal_handlers() -> void {
struct sigaction sa{};
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
if (sigaction(SIGTERM, &sa, nullptr) != 0) {
std::fprintf(stderr, "goggles-reaper: sigaction SIGTERM failed: %s\n", std::strerror(errno));
std::abort();
}
if (sigaction(SIGINT, &sa, nullptr) != 0) {
std::fprintf(stderr, "goggles-reaper: sigaction SIGINT failed: %s\n", std::strerror(errno));
std::abort();
}
if (sigaction(SIGHUP, &sa, nullptr) != 0) {
std::fprintf(stderr, "goggles-reaper: sigaction SIGHUP failed: %s\n", std::strerror(errno));
std::abort();
}
}
🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 91 - 98, setup_signal_handlers
currently ignores sigaction() return values so handlers may not be installed;
update the function to check each sigaction() call for failure, retrieve errno,
and handle errors (e.g., log a descriptive message including strerror(errno) via
your logging mechanism or perror and abort/exit with non‑zero status or throw an
exception) to ensure the process doesn’t silently continue without handlers
(especially since PR_SET_PDEATHSIG sends SIGTERM). Specifically, wrap the three
sigaction calls for SIGTERM, SIGINT and SIGHUP in checks, and if any returns -1
perform the chosen error handling path so failures are visible and the program
stops or propagates the error.

Comment on lines +108 to +118
const pid_t parent_pid = getppid();

// Become subreaper - orphaned descendants reparent to us
prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);

// Die when parent (goggles) dies
prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
if (getppid() != parent_pid) {
return EXIT_FAILURE;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Check prctl return values.

Both prctl() calls are critical to the reaper's functionality and their return values must be checked:

  • If PR_SET_CHILD_SUBREAPER fails: orphaned descendants won't reparent to the reaper, causing incomplete cleanup.
  • If PR_SET_PDEATHSIG fails: the reaper won't terminate when the parent dies, leaving processes running headlessly.
🐛 Proposed fix
     const pid_t parent_pid = getppid();
 
     // Become subreaper - orphaned descendants reparent to us
-    prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
+    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0) {
+        std::fprintf(stderr, "goggles-reaper: PR_SET_CHILD_SUBREAPER failed: %s\n", std::strerror(errno));
+        return EXIT_FAILURE;
+    }
 
     // Die when parent (goggles) dies
-    prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
+    if (prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0) != 0) {
+        std::fprintf(stderr, "goggles-reaper: PR_SET_PDEATHSIG failed: %s\n", std::strerror(errno));
+        return EXIT_FAILURE;
+    }
     if (getppid() != parent_pid) {
         return EXIT_FAILURE;
     }
🤖 Prompt for AI Agents
In @src/app/reaper_main.cpp around lines 108 - 118, The prctl() calls for
PR_SET_CHILD_SUBREAPER and PR_SET_PDEATHSIG are currently unchecked; modify the
code around prctl(PR_SET_CHILD_SUBREAPER, ...) and prctl(PR_SET_PDEATHSIG, ...)
to check each return value (non-zero indicates failure), emit an error (e.g.,
via perror or process logger) that includes which prctl failed and errno, and
return EXIT_FAILURE if either call fails before proceeding to the getppid()
parent check; keep the existing parent_pid/getppid logic and only proceed when
both prctl calls succeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants