reap zombie processes#608
Conversation
✅ Deploy Preview for jumpstarter-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
WalkthroughIntroduces centralized exporter-exception handling, a supervised fork/exec flow with parent/child roles, SIGCHLD-based zombie reaping when running as PID 1, and a helper to wait for child exit. Also adjusts shell driver logging by adding a single pre-launch debug log and removing repetitive per-iteration logs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Parent as Parent (run.py)
participant Child as Child (session leader)
participant Exporter as exporter.serve()
participant OS as OS Signals
User->>Parent: start serve
Parent->>Parent: fork()
alt Parent process
OS-->>Parent: SIGINT/SIGTERM/...
Parent->>Child: forward signal
OS-->>Parent: SIGCHLD
Note over Parent: If PID == 1, reap zombies via _reap_zombie_processes()
Parent->>Parent: _wait_for_child(pid, child_info)
Parent-->>User: exit with child status
else Child process
Child->>Child: os.setsid()
Child->>Exporter: run exporter.serve()
Exporter-->>Child: ExceptionGroup?
Child->>Child: _handle_exporter_exceptions()
Child-->>Parent: exit(code)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
15-24: Also emit tracebacks to logs for easier debugging.Stderr one-liners are fine for users, but we’ll want tracebacks in logs for post-mortems.
Apply this diff to add debug traceback logging without changing user-facing output:
def _handle_exporter_exceptions(excgroup): """Handle exceptions from exporter serving.""" from jumpstarter_cli_common.exceptions import leaf_exceptions + import traceback for exc in leaf_exceptions(excgroup): if not isinstance(exc, anyio.get_cancelled_exc_class()): click.echo( f"Exception while serving on the exporter: {type(exc).__name__}: {exc}", err=True, ) + # Log full traceback at debug level for diagnostics + logger.debug("Exporter exception traceback:\n%s", "".join(traceback.format_exception(exc)))
63-67: Avoid noisy info logs for frequent SIGCHLD.SIGCHLD can be very chatty; keep logs clean by suppressing or demoting it.
Apply this to log non-CHLD at info, CHLD at debug:
- logger.info("CHILD: Received %d (%s)", received_signal, signal.Signals(received_signal).name) - - # Handle SIGCHLD for zombie process cleanup - if received_signal == signal.SIGCHLD: + if received_signal == signal.SIGCHLD: + logger.debug("CHILD: Received %d (%s)", received_signal, signal.Signals(received_signal).name) await _reap_zombie_processes() continue + logger.info("CHILD: Received %d (%s)", received_signal, signal.Signals(received_signal).name)
85-85: Centralizing exporter exception handling is fine; consider inlining if it remains single-use.Given past preference for minimal helpers, inlining
_handle_exporter_exceptions()into thisexcept*is an option if this remains the only call site.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py(3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
status(164-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: e2e
🔇 Additional comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
54-56: Correct AnyIO signal receiver usage; SIGCHLD wired properly.Using
with open_signal_receiver(...): async for sig in signals:matches AnyIO guidance and adds SIGCHLD as intended.
Head branch was pushed to by a user without write access
13b29ae to
ccca9e9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
15-24: Centralized exporter exception handling looks goodClear, filters cancellations, and standardizes stderr reporting.
Optionally also log via
logger.error(..., exc_info=exc)for easier postmortems.
26-47: Zombie reaper with grace period — good approach; consider optional gating/metricsThe grace avoids stealing exit statuses; loop/handling are sound.
- Gate behind an env flag (e.g.,
JUMPSTARTER_ENABLE_ZOMBIE_REAPER=1) for staged rollout.- Expose a counter metric (reaped pids, errors) to aid observability.
58-60: Correct AnyIO signal receiver usage; SIGCHLD addedUsing
with open_signal_receiver(...)matches AnyIO guidance.If Windows support matters, guard SIGCHLD:
if hasattr(signal, "SIGCHLD"): ....
67-71: Reap on SIGCHLD — avoid log noise for high-rate signalsFlow is correct; to reduce log volume, consider demoting the generic “CHILD: Received …” log to debug specifically for SIGCHLD, keeping info level for termination signals.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py(3 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.13)
🔇 Additional comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
89-89: Good: reuse centralized exception handlerThis removes duplication and keeps reporting consistent.
| self.logger.debug( | ||
| f"running {method} with cmd: {cmd} and env: {combined_env} " | ||
| f"and args: {args} in cwd: {self.cwd}" | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Do not log full environment or raw args; risk of secrets leakage
Dumping combined_env and args can leak credentials/tokens into logs.
Apply this to log only env keys and arg count:
- self.logger.debug(
- f"running {method} with cmd: {cmd} and env: {combined_env} "
- f"and args: {args} in cwd: {self.cwd}"
- )
+ safe_env_keys = sorted(combined_env.keys())
+ self.logger.debug(
+ "running %s with cmd: %r env_keys: %s args_len: %d cwd: %r",
+ method, cmd, safe_env_keys, len(args), self.cwd,
+ )Also consider sanitizing other logs in this file that print env/args (e.g., Line 39) to avoid exposing secrets.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self.logger.debug( | |
| f"running {method} with cmd: {cmd} and env: {combined_env} " | |
| f"and args: {args} in cwd: {self.cwd}" | |
| ) | |
| safe_env_keys = sorted(combined_env.keys()) | |
| self.logger.debug( | |
| "running %s with cmd: %r env_keys: %s args_len: %d cwd: %r", | |
| method, cmd, safe_env_keys, len(args), self.cwd, | |
| ) |
🤖 Prompt for AI Agents
In packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py around
lines 141-144, the debug log prints full combined_env and raw args which can
leak secrets; change the log to only include the list of environment variable
keys (not values) and the args length (or a sanitized/trimmed preview, e.g.,
first N items with values redacted). Update the log message to something like
"running {method} with env keys: {sorted(combined_env.keys())} and args_count:
{len(args)} in cwd: {self.cwd}" and apply the same sanitization approach to
other places in this file that currently print env/args (notably line 39) to
avoid exposing secrets.
ccca9e9 to
2ea0f8a
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
107-109: Register SIGCHLD handler only when available (POSIX-only).Prevents issues on non-POSIX platforms and matches the conditional behavior of the reaper.
- for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT, signal.SIGCHLD): - signal.signal(sig, parent_signal_handler) + signals = [signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT] + if hasattr(signal, "SIGCHLD"): + signals.append(signal.SIGCHLD) + for sig in signals: + signal.signal(sig, parent_signal_handler)
15-24: Helper is fine; consider optional debug trace for diagnosis.Optional: add
logger.debug("Exporter error", exc_info=exc)after echoing to aid troubleshooting without polluting stderr.for exc in leaf_exceptions(excgroup): if not isinstance(exc, anyio.get_cancelled_exc_class()): click.echo( f"Exception while serving on the exporter: {type(exc).__name__}: {exc}", err=True, ) + logger.debug("Exporter exception detail", exc_info=exc)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py(4 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: e2e
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
🔇 Additional comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
54-60: Correct usage of anyio.open_signal_receiver.Using a regular
withand thenasync formatches AnyIO 4.6+ guidance from PR #448. No issues.
78-79: Centralized exporter exception reporting looks good.Flattening with
leaf_exceptions()and skipping cancellations is appropriate for concise CLI error output.
a415938 to
b4976aa
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
108-109: Be aware: installing a SIGCHLD handler can interfere with subprocess owners; gating to PID 1 mitigatesGiven Python’s subprocess/anyio watchers may rely on collecting their own child statuses, keeping SIGCHLD handling only when PID 1 avoids unintended interference in normal shells/hosts.
If you want extra safety, consider gating with an env var for phased rollout:
if os.getenv("JUMPSTARTER_ENABLE_ZOMBIE_REAPER", "0") != "1": return
26-46: skip_pid cannot work here and will steal the child's exit status; also gate reaping to PID 1waitpid(-1, WNOHANG) already reaps whatever child exits; comparing to skip_pid after the call can’t “un-reap” it. This will race with the blocking waitpid(pid, 0) below and can raise ECHILD. Additionally, the helper claims “when running as PID 1” but doesn’t enforce it.
Apply this diff to make the helper safe and aligned with its intent (we’ll handle the exporter reaping in the parent handler patch below):
-def _reap_zombie_processes(skip_pid=None): +def _reap_zombie_processes(): """ Reap zombie child processes when SIGCHLD is received. - This prevents zombie processes from accumulating in the system when running as PID 1. + This prevents zombie processes from accumulating in the system when running as PID 1. - :param skip_pid: PID to skip reaping (because it's being waited for explicitly) """ + # Only act as a generic reaper if we are init (PID 1) in this namespace. + if os.getpid() != 1: + return try: while True: try: pid, status = os.waitpid(-1, os.WNOHANG) if pid == 0: break # No more children to reap - if skip_pid and pid == skip_pid: - continue - logger.debug(f"PARENT: Reaped zombie process {pid} with status {status}") + logger.debug("PARENT: Reaped zombie process %d with status %d", pid, status) except ChildProcessError: break # No more children except Exception as e: - logger.warning(f"PARENT: Error during zombie reaping: {e}") + logger.debug("PARENT: Error during zombie reaping: %s", e)
93-111: Fix SIGCHLD handler vs waitpid(pid, 0) race by capturing the exporter’s status in the handlerThe current handler calls the generic reaper and attempts to “skip” the child, which isn’t possible; it can consume the exporter’s exit status, making the blocking wait fail with ECHILD. Capture the exporter’s status inside the handler and use it later instead of calling the generic reaper here.
Apply this diff:
def _handle_parent(pid): """Handle parent process waiting for child and signal forwarding.""" - def parent_signal_handler(signum, _): - if signum == signal.SIGCHLD: - # Skip reaping the direct child to avoid race with waitpid below - _reap_zombie_processes(skip_pid=pid) + exporter_status = None # set when the handler reaps the exporter + + def parent_signal_handler(signum, _): + nonlocal exporter_status + if signum == signal.SIGCHLD: + # Only act as a generic reaper if we are PID 1 (init) in this namespace. + if os.getpid() != 1: + return + # Reap exited children without blocking. If we reap our exporter, stash its status. + while True: + try: + rpid, status = os.waitpid(-1, os.WNOHANG) + except ChildProcessError: + break # No children + if rpid == 0: + break # No more + if rpid == pid: + exporter_status = status + logger.debug("PARENT: Reaped exporter PID %d with status %d (in handler)", rpid, status) + else: + logger.debug("PARENT: Reaped orphan PID %d with status %d", rpid, status) else: logger.info("PARENT: Got %d (%s), forwarding to child PID %d", signum, signal.Signals(signum).name, pid) if pid and pid > 0: try: os.kill(pid, signum) except ProcessLookupError: pass @@ - _, status = os.waitpid(pid, 0) + # If the handler already reaped the exporter, use that status; otherwise wait for it. + if exporter_status is None: + try: + _, status = os.waitpid(pid, 0) + except ChildProcessError: + if exporter_status is None: + raise + status = exporter_status + else: + status = exporter_status
🧹 Nitpick comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
58-64: Correct AnyIO signal usage; tiny spacing nitPattern “with open_signal_receiver(...)” + “async for” is correct. Drop the stray space after the opening parenthesis.
- with open_signal_receiver( signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals: + with open_signal_receiver(signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py(4 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py(0 hunks)
💤 Files with no reviewable changes (1)
- packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
🔇 Additional comments (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
15-24: Centralized exporter exception reporting looks goodUsing leaf_exceptions and filtering out cancellations is appropriate for clear stderr reporting.
82-82: Good: exception handling delegated cleanlyCentralizing exporter exception printing reduces duplication in the serve flow.
111-124: Approve exit code logic in jumpstarter_cli/run.py. No conflicting os.wait/os.waitpid calls found in this module; exit-code mapping remains correct.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
15-24: Exception reporting LGTM; optionally include debug tracebacks for triage.Consider adding a debug log with the full traceback for each non-cancelled leaf to aid postmortems without spamming stderr.
for exc in leaf_exceptions(excgroup): if not isinstance(exc, anyio.get_cancelled_exc_class()): click.echo( f"Exception while serving on the exporter: {type(exc).__name__}: {exc}", err=True, ) + # Preserve full traceback at debug level for diagnostics + logger.debug( + "Exporter raised %s: %s", + type(exc).__name__, exc, + exc_info=(exc.__class__, exc, exc.__traceback__), + )
131-136: Consider a gentle respawn throttle to avoid log/CPU thrash on repeated fast exits.If the exporter repeatedly exits quickly with code 0 (restart), a tiny backoff (e.g., 100–250 ms) after a few rapid cycles can help. Keep it small to not impact legitimate lease-based restarts.
def _serve_with_exc_handling(config): - while True: + restart_count = 0 + last_restart = 0.0 + while True: pid = os.fork() @@ - if pid > 0: - if (exit_code := _handle_parent(pid)) is not None: + if pid > 0: + if (exit_code := _handle_parent(pid)) is not None: return exit_code else: _handle_child(config) sys.exit(1) # should never happen + # Child exited with 0 => restart; add minimal jitter if flapping + now = anyio.current_time() if hasattr(anyio, "current_time") else 0.0 + if now and (now - last_restart) < 1.0: + restart_count += 1 + if restart_count >= 3: + # sleep a touch; tiny pause to reduce thrash + try: + import time; time.sleep(0.2) + except Exception: + pass + else: + restart_count = 0 + last_restart = now
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py(4 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#521
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:1-2
Timestamp: 2025-06-12T17:18:48.519Z
Learning: Exporters in this project are intended to run only on Unix systems, so Unix-specific APIs like `os.fork()` are acceptable and Windows compatibility is not required.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
status(164-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
🔇 Additional comments (4)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
58-64: Correct AnyIO signal receiver usage — LGTM.Using
with open_signal_receiver(...): async for ...matches AnyIO’s API. Signal escalation behavior for SIGHUP also looks right.
81-83: Exporter exception grouping handling — LGTM.
except* Exceptionwith leaf flattening is appropriate here.
96-105: Parent signal forwarding and SIGCHLD handling — LGTM, contingent on reaper fix.Forwarding non-SIGCHLD signals to the child and reaping others via
_reap_zombie_processes(skip_pid=pid)is sound once the infinite-loop fix is applied.Please re-test a scenario where only the exporter exits to ensure the handler no longer spins and the parent’s
waitpid(pid, 0)proceeds as expected.
108-109: Registering SIGCHLD in parent — LGTM.This is required for the reaper flow.
b4976aa to
1e04d00
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
50-66: Terminating signals received before exporter creation are dropped. Handle pre-start signals.If SIGINT/SIGTERM arrives after starting the signal task but before the exporter exists, we set received_signal but never act on it. Add pre-checks to exit early or stop immediately after creation.
# Start signal handler immediately signal_tg.start_soon(signal_handler) + + # If a termination signal already arrived, exit before creating the exporter + if received_signal: + return received_signal # Create exporter and run it async with config.create_exporter() as exporter: + # If a signal arrived during exporter creation, honor it before serving + if received_signal: + if received_signal != signal.SIGHUP: + signal_handled = True + exporter.stop(wait_for_lease_exit=received_signal == signal.SIGHUP) try: await exporter.serve()Also applies to: 69-75
🧹 Nitpick comments (4)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
26-41: Gate reaper to PID 1 and down-level the catch-all log to debug.Prevents accidental reaping if this helper is reused elsewhere and reduces log noise on benign errors.
def _reap_zombie_processes(capture_child=None): - """Reap zombie processes when running as PID 1.""" + """Reap zombie processes (only when running as PID 1). Optionally capture our direct child's status.""" + if os.getpid() != 1: + return @@ - except Exception as e: - logger.warning(f"PARENT: Error during zombie reaping: {e}") + except Exception as e: + logger.debug(f"PARENT: Error during zombie reaping: {e}")
53-53: Nit: remove stray space after '(' in open_signal_receiver call.Keeps style consistent.
-with open_signal_receiver( signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals: +with open_signal_receiver(signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals:
88-95: ECHILD fallback: log when status wasn’t captured to aid debugging.Helps diagnose rare races where neither waitpid nor the handler captured status.
except ChildProcessError: - status = child_info['status'] + status = child_info['status'] + if status is None: + logger.debug("PARENT: Child %d already reaped but no status captured; will trigger restart.", pid) return status
135-145: Optional: forward to a process group to catch grandchildren.Consider creating a process group for the child (setsid in child, then use killpg in parent) so forwarded signals reach grandchildren spawned by drivers.
- else: - _handle_child(config) + else: + # Start new session so the child becomes a PG leader; helps signal propagation to grandchildren. + try: + os.setsid() + except Exception: + pass + _handle_child(config) @@ - logger.info("PARENT: Got %d (%s), forwarding to child PID %d", signum, signal.Signals(signum).name, pid) - if pid > 0: - try: - os.kill(pid, signum) - except ProcessLookupError: - pass + logger.info("PARENT: Got %d (%s), forwarding to child PG %d", signum, signal.Signals(signum).name, pid) + try: + os.killpg(pid, signum) + except ProcessLookupError: + pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py(4 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py(0 hunks)
💤 Files with no reviewable changes (1)
- packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
status(164-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (macos-15, 3.11)
🔇 Additional comments (5)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
15-24: Exporter exception surfacing is clear and avoids noise from cancellations.Good use of leaf_exceptions and filtering CancelledError before echoing.
76-78: LGTM on ExceptionGroup handling.Centralizing exporter exception rendering here reads well.
101-110: PID 1–gated SIGCHLD reaping + selective forwarding looks right.Avoids status stealing outside PID 1 while still forwarding user signals to the child.
113-114: Registering signal handlers post-fork is correct.Handler set covers the expected signals including SIGCHLD.
116-133: Verify exit-code contract (child returns signal numbers only).Parent maps nonzero exit codes to 128+code assuming the child returns a signal number. Confirm exporter never returns arbitrary non-signal codes, or guard against that case.
1e04d00 to
aa27baf
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py (1)
141-141: Do not log env values or raw args; sanitize to avoid secrets leakage.Logging
combined_envandargscan expose credentials/tokens. Log only env keys and args count (plus cwd), not values or full args.Apply:
- self.logger.debug( f"running {method} with cmd: {cmd} and env: {combined_env} " f"and args: {args}") + safe_env_keys = sorted(combined_env.keys()) + self.logger.debug( + "running %s cmd=%r env_keys=%s args_len=%d cwd=%r", + method, cmd, safe_env_keys, len(args), self.cwd, + )Also sanitize similar logs (e.g., Line 39) to avoid printing raw
env/args.
🧹 Nitpick comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
26-41: Reduce log noise and guard against accidental use beyond PID 1.Lower the catch-all log to debug (common/benign errors) and early-return if not PID 1 to prevent misuse.
def _reap_zombie_processes(capture_child=None): - """Reap zombie processes when running as PID 1.""" + """Reap zombie processes when running as PID 1.""" + if os.getpid() != 1: + return @@ - except Exception as e: - logger.warning(f"PARENT: Error during zombie reaping: {e}") + except Exception as e: + logger.debug(f"PARENT: Error during zombie reaping: {e}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py(5 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
status(164-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: e2e
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
🔇 Additional comments (7)
packages/jumpstarter-cli/jumpstarter_cli/run.py (7)
15-24: Centralized exporter exception reporting looks good.
73-78: Replacing inline handling with _handle_exporter_exceptions is a clean improvement.
88-95: Graceful wait with fallback to captured status is fine.If
child_info['status']remainsNone, caller already restarts. OK.
101-110: SIGCHLD-only reaping and PG signal forwarding are correct.Skipping direct wait here and using the reaper + dedicated wait avoids ECHILD races.
113-114: Signal set registration LGTM.
116-119: Handling unknown status by restarting is acceptable here.
143-143: setsid in the child before serving is correct to enable PG-level signal control.
mangelajo
left a comment
There was a problem hiding this comment.
it looks good just one comment
some drivers (e.g.Shell) can spawn multiple processes and leave them behind. We also kill processes when timeout is exceeded, and that also may not necessarily kill all of them. Let's reap them when they exit so that we don't end up accumulating zombie processes, especially when running as PID 1.
and update uv.lock with recent dependency change
aa27baf to
aeb2a67
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
26-41: Tighten SIGCHLD reaper behavior: lower log level on handler errors and clarify intent in docstringTo avoid noisy logs from occasional EINTR/benign races in the signal path and to document the status-capture contract:
- Downgrade the catch-all warning to debug.
- Expand the docstring to note that this may reap the direct child and stash its status for later.
Apply:
-def _reap_zombie_processes(capture_child=None): - """Reap zombie processes when running as PID 1.""" +def _reap_zombie_processes(capture_child=None): + """Reap zombie processes when running as PID 1. + + If the direct child exits first, its wait status may be consumed here and + stored into capture_child['status'] for the main wait path to use. + """ @@ - except Exception as e: - logger.warning(f"PARENT: Error during zombie reaping: {e}") + except Exception as e: + logger.debug(f"PARENT: Error during zombie reaping: {e}")
88-95: Make ECHILD fallback explicit when no status was capturedVery edge-case: if ECHILD occurs and, for any reason, the handler didn’t capture a status, we currently return None upstream and silently restart. Make this branch explicit and log it for traceability (functionally equivalent restart).
def _wait_for_child(pid, child_info): @@ - except ChildProcessError: - status = child_info['status'] + except ChildProcessError: + status = child_info.get('status') + if status is None: + logger.debug("PARENT: Child %d already reaped but no status captured; assuming clean restart", pid) + return 0 return status
101-110: Forwarding signals: add fallback to kill(pid, sig) to cover the fork→setsid raceRight after fork and before the child calls setsid(), killpg(pid, sig) can fail with ESRCH because the child’s PGID isn’t established yet. Fallback to os.kill(pid, sig) ensures no signals are dropped in that narrow window.
elif signum != signal.SIGCHLD: logger.info("PARENT: Got %d (%s), forwarding to child PG %d", signum, signal.Signals(signum).name, pid) if pid > 0: try: os.killpg(pid, signum) - except (ProcessLookupError, OSError): - pass + except (ProcessLookupError, OSError): + try: + os.kill(pid, signum) + except (ProcessLookupError, OSError): + pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py(5 hunks)packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/jumpstarter-driver-shell/jumpstarter_driver_shell/driver.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-09-06T05:25:18.150Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.150Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
Applied to files:
packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
leaf_exceptions(78-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.11)
🔇 Additional comments (5)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
15-24: Centralized exporter exception handling looks goodFiltering out cancellations and flattening the exception group improves signal-to-noise for users. No issues spotted.
77-77: Good call to centralize exception reportingCatching ExceptionGroup with except* and delegating to the helper is clean and keeps serve() readable.
113-114: Signal set is appropriateCatching INT/TERM/HUP/QUIT plus CHLD in the parent covers the expected lifecycle. Looks good.
116-119: Graceful restart path on unknown statusReturning None to trigger a restart when status is unavailable aligns with the loop semantics. No change needed.
143-144: setsid early in the child is correctBecoming a session leader ensures group-directed signals from the parent reach descendants. This pairs well with the killpg() forwarding.
|
Successfully created backport PR for |
some drivers (e.g.Shell) can spawn multiple processes and leave them behind. We also kill processes when timeout is exceeded, and that also may not necessarily kill all of them. Let's reap them when they exit so that we don't end up accumulating zombie processes forever
Summary by CodeRabbit
Bug Fixes
Refactor
Chores