Skip to content
This repository was archived by the owner on Jan 23, 2026. It is now read-only.

close the session on lease expire#757

Merged
mangelajo merged 6 commits intojumpstarter-dev:mainfrom
evakhoni:close_ended_lease
Dec 3, 2025
Merged

close the session on lease expire#757
mangelajo merged 6 commits intojumpstarter-dev:mainfrom
evakhoni:close_ended_lease

Conversation

@evakhoni
Copy link
Copy Markdown
Contributor

@evakhoni evakhoni commented Dec 1, 2025

closes https://github.com/jumpstarter-dev/jumpstarter/issues/701

current implementation retrieves the process via callback, and updates the lease instance, where inside the monitor loop edit: re-implemented with setting a lease ending callback, where a SIGHUP is sent to the process. I have tried previously with shell_process.terminate() (which sends SIGTERM) but it's not closing the shell, while shell_process.kill() is causing an exit code 247

Summary by CodeRabbit

  • New Features

    • Shell launches and command execution can accept a lease object so runtime can be monitored and callbacks triggered as expiration approaches.
    • Leases now provide a hook that is called when expiration is imminent or immediate.
  • Bug Fixes

    • Processes started under a lease are terminated when the lease expires, reducing orphaned processes.
  • Documentation

    • Usage notes updated to describe lease handling for interactive shells and commands.

✏️ Tip: You can customize this high-level summary in your review settings.

@netlify
Copy link
Copy Markdown

netlify Bot commented Dec 1, 2025

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit 17de08c
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/693024bd2eb62a0008d96bf1
😎 Deploy Preview https://deploy-preview-757--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Dec 1, 2025

Walkthrough

Adds lease-aware process handling: the CLI forwards a lease into launch_shell, launched subprocesses are started via a new helper that wires a lease callback, and the Lease monitor invokes that callback when nearing or reaching expiry.

Changes

Cohort / File(s) Summary
CLI shell invocation
packages/jumpstarter-cli/jumpstarter_cli/shell.py
Passes the lease object into launch_shell so launched shells receive lease wiring.
Shell launch and process wiring
packages/jumpstarter/jumpstarter/common/utils.py
Added lease_ending_handler(process, lease, remaining_time) and _run_process(cmd, env, lease=None); extended launch_shell(..., lease=None) signature; replaced direct Popen usage with _run_process; when provided, sets lease.lease_ending_callback to a partial bound to the started process.
Lease monitoring callback
packages/jumpstarter/jumpstarter/client/lease.py
Added `lease_ending_callback: Callable[[Self, timedelta], None]

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI
    participant Launcher as launch_shell / _run_process
    participant Sub as Subprocess (shell)
    participant Lease as Lease monitor

    CLI->>Launcher: launch_shell(..., lease)
    Launcher->>Launcher: prepare env & command
    Launcher->>Sub: start subprocess via _run_process
    Launcher->>Lease: set lease.lease_ending_callback -> partial(lease_ending_handler, process)
    Lease->>Lease: monitor remaining time
    alt nearing expiry
        Lease->>Sub: lease.lease_ending_callback(remaining_timedelta)
        Note right of Sub: handler may send SIGHUP or initiate shutdown
    else expired
        Lease->>Sub: lease.lease_ending_callback(timedelta(0))
    end
    Sub-->>Launcher: exit status
    Launcher-->>CLI: return exit code
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify lease.lease_ending_callback is set before the lease monitor can invoke it.
  • Review lease_ending_handler signal usage and cross-platform behavior (SIGHUP semantics).
  • Check _run_process for correct env propagation, proper waiting/exit-code passthrough, and race conditions between process exit and callback invocation.

Possibly related PRs

Suggested reviewers

  • mangelajo
  • kirkbrauer
  • NickCao

Poem

"I hopped in soft with whiskers bright, 🐇
Bound the shell to lease's light,
When time draws near or falls to naught,
I give a nudge — the process caught.
Burrow calm, the session right."

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: implementing lease expiration handling to close shell sessions when leases expire, which is the primary objective.
Linked Issues check ✅ Passed The PR successfully addresses issue #701 by implementing lease expiration detection and shell session closure via SIGHUP callback, as required.
Out of Scope Changes check ✅ Passed All changes are scoped to implementing the lease expiration callback mechanism required by issue #701; no unrelated modifications detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/jumpstarter/jumpstarter/common/utils.py (1)

59-67: Enhance docstring to document the callback signature.

The docstring mentions process_callback but doesn't specify what argument(s) it receives.

Apply this diff to clarify the callback signature:

     """Launch a shell with a custom prompt indicating the exporter type.
 
     Args:
         host: The jumpstarter host path
         context: The context of the shell (e.g. "local" or exporter name)
         allow: List of allowed drivers
         unsafe: Whether to allow drivers outside of the allow list
-        process_callback: Optional callback to receive the process object before waiting
+        process_callback: Optional callback function that receives the Popen process object before waiting
     """
packages/jumpstarter/jumpstarter/client/lease.py (2)

56-56: Refine type annotation from Any to subprocess.Popen | None.

The Any type is too permissive. Since this field stores a subprocess.Popen instance, use a more specific type.

Apply this diff:

+from subprocess import Popen
+
 @dataclass(kw_only=True)
 class Lease(ContextManagerMixin, AsyncContextManagerMixin):
     channel: Channel
     ...
-    shell_process: Any = field(default=None, init=False)  # Shell process to terminate on expiry
+    shell_process: Popen | None = field(default=None, init=False)  # Shell process to terminate on expiry

267-267: Consider moving the signal import to the top of the file.

While importing inside the conditional works, the Python convention is to place all imports at module level for clarity and to avoid repeated imports if this code path executes multiple times (though here it only runs once before breaking).

Apply this diff:

 import logging
 import os
+import signal
 import sys
 from collections.abc import AsyncGenerator, Generator

Then remove the inline import:

                         if self.shell_process is not None:
-                            import signal
                             try:
                                 self.shell_process.send_signal(signal.SIGHUP)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afaf058 and fea8bea.

📒 Files selected for processing (3)
  • packages/jumpstarter-cli/jumpstarter_cli/shell.py (1 hunks)
  • packages/jumpstarter/jumpstarter/client/lease.py (2 hunks)
  • packages/jumpstarter/jumpstarter/common/utils.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/client/lease.py
  • packages/jumpstarter-cli/jumpstarter_cli/shell.py
  • packages/jumpstarter/jumpstarter/common/utils.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
📚 Learning: 2025-09-06T05:25:18.184Z
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/shell.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/utils.py (2)
packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
  • shell (79-116)
packages/jumpstarter/jumpstarter/utils/env.py (1)
  • env (42-51)
🪛 GitHub Actions: Lint
packages/jumpstarter/jumpstarter/common/utils.py

[error] 49-49: Ruff check failed: C901 'launch_shell' is too complex (13 > 10). Command: 'ruff check /home/runner/work/jumpstarter/jumpstarter'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: e2e
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
🔇 Additional comments (2)
packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

23-24: LGTM! Process callback correctly captures the shell process.

The lambda callback cleanly stores the shell process on the lease object, enabling the monitor to terminate it on expiry.

packages/jumpstarter/jumpstarter/common/utils.py (1)

77-81: LGTM! Callback invocation is consistent across all branches.

The process callback is correctly invoked after creating the Popen object in every code path, ensuring the caller can capture the process handle regardless of which shell type is used.

Also applies to: 88-94, 110-114, 124-133, 136-139

Comment thread packages/jumpstarter/jumpstarter/client/lease.py Outdated
Comment thread packages/jumpstarter/jumpstarter/client/lease.py Outdated
Comment thread packages/jumpstarter/jumpstarter/client/lease.py Outdated
Comment thread packages/jumpstarter/jumpstarter/common/utils.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/common/utils.py (1)

52-76: Lease callback implementation is sound; consider restoring previous callback after exit

The new lease_ending_handler + _run_process satisfy the PR goal: when the lease monitor drives lease.lease_ending_callback(remaining_time), the handler will send SIGHUP to the shell once the lease has actually ended, and wait() ensures the child process is reaped even if it’s killed from that callback (which aligns well with the prior zombie‑process concern). Based on learnings, this is a good direction.

Two follow‑ups to consider:

  1. Type/contract of remaining_time
    The comparison remaining_time <= timedelta(0) assumes the callback is always invoked with a datetime.timedelta. If there’s any possibility of passing raw seconds or some other type, this will raise at runtime. Please double‑check the Lease.lease_ending_callback contract to ensure it’s always a timedelta and that negative/zero values only occur once the lease is effectively ended.

  2. Do not permanently overwrite lease.lease_ending_callback
    _run_process currently overwrites lease.lease_ending_callback and never restores it. If in the future the same Lease is reused or other code installs its own callback, this could lead to surprising behavior and leaves a stale callback pointing at a finished process. You can make this more robust by saving and restoring any previous callback:

     def _run_process(
         cmd: list[str],
         env: dict[str, str],
         lease=None,
     ) -> int:
         """Helper to run a process with an option to set a lease ending callback."""
  •   process = Popen(cmd, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr, env=env)
    
  •   if lease is not None:
    
  •       lease.lease_ending_callback = partial(lease_ending_handler, process)
    
  •   return process.wait()
    
  •   process = Popen(cmd, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr, env=env)
    
  •   previous_callback = None
    
  •   if lease is not None:
    
  •       previous_callback = getattr(lease, "lease_ending_callback", None)
    
  •       lease.lease_ending_callback = partial(lease_ending_handler, process)
    
  •   try:
    
  •       return process.wait()
    
  •   finally:
    
  •       if lease is not None:
    
  •           lease.lease_ending_callback = previous_callback
    
    
    This keeps the lease callback focused on the lifetime of this process and avoids leaving a dangling reference to `process` on the lease.
    
    
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fea8bea and 5fec1e0.

📒 Files selected for processing (3)
  • packages/jumpstarter-cli/jumpstarter_cli/shell.py (1 hunks)
  • packages/jumpstarter/jumpstarter/client/lease.py (4 hunks)
  • packages/jumpstarter/jumpstarter/common/utils.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/jumpstarter/jumpstarter/client/lease.py
  • packages/jumpstarter-cli/jumpstarter_cli/shell.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/common/utils.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/utils.py (3)
packages/jumpstarter/jumpstarter/utils/env.py (1)
  • env (42-51)
packages/jumpstarter/jumpstarter/config/client.py (1)
  • lease (135-143)
packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
  • shell (78-115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: build
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
🔇 Additional comments (3)
packages/jumpstarter/jumpstarter/common/utils.py (3)

2-7: Imports for lease-aware process handling look appropriate

signal, timedelta, and partial are all used in the new lease-related helpers; import ordering and usage look consistent with the rest of the module.


87-102: New lease parameter and docstring align with behavior

Extending launch_shell with lease=None and documenting both the parameter and the integer exit‑code return value match what _run_process does. The function’s public API now clearly communicates that it can be bound to a lease without changing existing callers.


112-114: Consistent use of _run_process ensures lease handling across all shell paths

Routing all shell/command execution paths (custom command, bash, fish, zsh, and the default branch) through _run_process is a nice consolidation. It guarantees that:

  • The environment setup remains as before for each shell.
  • Lease‑driven termination via lease_ending_callback is applied uniformly, so any shell/command tied to a lease will be SIGHUPed on lease end.
  • Processes are always wait()ed on, which helps avoid zombies even when the lease monitor kills them. Based on learnings, this is beneficial.

No issues spotted with these call sites.

Also applies to: 122-122, 139-139, 152-152, 155-155

Comment thread packages/jumpstarter/jumpstarter/client/lease.py Outdated
Comment thread packages/jumpstarter/jumpstarter/client/lease.py Outdated
Copy link
Copy Markdown
Member

@mangelajo mangelajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/client/lease.py (1)

278-280: Consider documenting the threshold callback behavior.

The callback is invoked both when the lease expires AND when approaching the threshold. Currently, the handler in utils.py only acts on expiration (remaining_time <= 0). The threshold invocation could be useful for future warning messages but isn't utilized.

This is fine for the current implementation, but consider adding a brief inline comment explaining that the threshold callback is for optional pre-expiration warnings.

packages/jumpstarter/jumpstarter/common/utils.py (1)

68-77: Add type hints for better clarity and maintainability.

The helper function works correctly, but adding type hints for the lease parameter would improve code clarity. Consider using a forward reference or importing the type conditionally to avoid circular imports.

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from jumpstarter.client.lease import Lease
+
 def _run_process(
     cmd: list[str],
     env: dict[str, str],
-    lease=None,
+    lease: "Lease | None" = None,
 ) -> int:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fec1e0 and 63368c3.

📒 Files selected for processing (2)
  • packages/jumpstarter/jumpstarter/client/lease.py (4 hunks)
  • packages/jumpstarter/jumpstarter/common/utils.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/client/lease.py
  • packages/jumpstarter/jumpstarter/common/utils.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
📚 Learning: 2025-10-14T17:43:07.788Z
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 704
File: packages/jumpstarter/jumpstarter/client/grpc.py:100-107
Timestamp: 2025-10-14T17:43:07.788Z
Learning: In the Jumpstarter client lease model (packages/jumpstarter/jumpstarter/client/grpc.py), `effective_duration` represents the elapsed time for an active lease so far, not the total duration. To calculate expected release time, use `effective_begin_time + duration` (where `duration` is the configured/requested duration), not `effective_begin_time + effective_duration`.

Applied to files:

  • packages/jumpstarter/jumpstarter/client/lease.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/utils.py (2)
packages/jumpstarter/jumpstarter/config/client.py (1)
  • lease (135-143)
packages/jumpstarter/jumpstarter/utils/env.py (1)
  • env (42-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: build
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: e2e
🔇 Additional comments (4)
packages/jumpstarter/jumpstarter/client/lease.py (2)

56-58: LGTM! Clean callback design for lease expiration.

The callback field is well-typed and properly decoupled from shell-specific logic, following the architectural guidance from previous reviews. Using init=False correctly prevents accidental constructor usage.


268-270: Callback invocation looks correct for lease expiration.

Passing timedelta(0) when the lease has expired is the right approach. The handler in utils.py correctly checks for remaining_time <= timedelta(0) to trigger termination.

packages/jumpstarter/jumpstarter/common/utils.py (2)

52-65: Error handling correctly addresses past review feedback.

The ProcessLookupError and OSError catch is appropriate for handling the case where the process has already terminated. The pass with comment is acceptable since there's nothing to do if the process is already gone.


113-114: LGTM! Consistent integration of lease across all shell branches.

All execution paths (command, bash, fish, zsh, default) now correctly use _run_process with the lease parameter, ensuring consistent lease callback handling regardless of shell type.

Also applies to: 123-123, 140-140, 153-153, 156-156

Comment on lines +74 to +77
process = Popen(cmd, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr, env=env)
if lease is not None:
lease.lease_ending_callback = partial(lease_ending_handler, process)
return process.wait()
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for SIGHUP, process group, and related signal handling
rg -n "SIGHUP|process.?group|setpgrp|setsid|signal|Popen" --type py -C 2

Repository: jumpstarter-dev/jumpstarter

Length of output: 198


🏁 Script executed:

# Read the full context of the file to understand the function and its purpose
wc -l packages/jumpstarter/jumpstarter/common/utils.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 198


🏁 Script executed:

# Read the relevant section with more context
sed -n '60,90p' packages/jumpstarter/jumpstarter/common/utils.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 198


🌐 Web query:

Python subprocess.Popen SIGHUP propagation shell child processes cleanup

💡 Result:

Short answer

  • By default signals (SIGHUP) sent to your Python process are NOT automatically propagated to subprocess children (including grandchildren created by a shell). Use process groups or a parent-death signal to get predictable cleanup. [1][2][3]

Recommended options

  1. Put the child (and its descendants) in their own process group, then signal the group
  • Start with preexec_fn=os.setsid (POSIX) so the child is leader of a new session / process group, then send os.killpg(pgid, signal.SIGHUP/SIGTERM). Works for shell=True and direct execs (shell becomes group leader if it’s the immediate child). Example pattern:
    • p = Popen(cmd, preexec_fn=os.setsid)
    • os.killpg(os.getpgid(p.pid), signal.SIGTERM).
      References: Python subprocess docs; examples/StackOverflow. [1][2]
  1. Arrange children to die when the parent dies (pdeathsig / prctl)
  • In the child before exec call prctl(PR_SET_PDEATHSIG, SIGHUP) so the kernel delivers a signal to the child when its parent exits. This requires using a small C/ctypes helper or a library (prctl/pyprctl) via preexec_fn. This makes grandchildren created later still get killed when the parent exits only if they inherit that behavior or are direct children. [2][3]
  1. Use a watchdog in the child (application-level)
  • Have the child detect parent death (getppid()==1) or open a pipe/socket to parent and exit when it closes. Useful for portable/fine‑grained control. [2]

Caveats

  • shell=True: the immediate child is the shell; signaling the Popen.pid affects the shell, not the grandchildren directly. Using a process group (setsid) and killing the group is the usual fix. [1][2]
  • Race conditions: processes spawned after you enumerate children may escape; kill-by-process-group or design children to exit on pdeathsig is more robust. [2][3]
  • Windows differs: use job objects or TerminateProcess (not covered here). [1]

Key references
[1] Python subprocess docs (Popen, preexec_fn/process groups, signals).
[2] "Process Groups / Sessions" example and discussion (PyMOTW / subprocess guides).
[3] StackOverflow threads showing patterns: preexec_fn=os.setsid + os.killpg and prctl(PR_SET_PDEATHSIG) examples.

If you want, I can paste a minimal code example for (a) setsid + killpg or (b) prctl-based pdeathsig.


🌐 Web query:

zombie processes Popen waitpid signal handling Python shell spawned processes

💡 Result:

Short answer: Zombies happen when a child has exited but its parent never reaped its exit status. With subprocess.Popen this occurs if you never call wait()/communicate() (or otherwise call waitpid()) on the child, or if your parent is blocked/ignores reaping. Fixes: explicitly reap children or let the kernel auto-reap them. Sources: [1][2][3][4].

What causes zombies (quick):

  • Child exits but parent doesn't call wait()/waitpid() → process stays as "zombie". [3][7]
  • Keeping Popen objects alive without calling wait/communicate/poll can leave zombies. [3][7]
  • Use of shell=True can add an extra shell process (sh -c) to consider; you still must reap the real child. [2]

Practical fixes (pick what's appropriate):

  1. Call wait()/communicate() or poll() and then wait when you detect termination
  • For each Popen: p = Popen(...); later p.wait() or p.communicate(); or poll() in a loop and then wait. This is the simplest and most robust. [3][4]
  1. Reap in a SIGCHLD handler (non-blocking waitpid loop)
  • Install a handler that calls os.waitpid(-1, os.WNOHANG) in a loop to reap all exited children. Careful with reentrancy and threads. Example patterns: loop on waitpid until no more children. [1]
  1. If you never need child statuses, tell the OS to discard them
  • signal.signal(signal.SIGCHLD, signal.SIG_IGN) will cause immediate reaping on many Unixes (usable only if you never need to wait for children). Note: only allowed in main thread on some platforms. [2][3]
  1. Avoid accidentally keeping references / let subprocess auto-reap behavior be understood
  • Don’t retain Popen objects you won’t wait on; if you spawn many children, track and reap them. Historically subprocess had auto-reaping behavior; rely on explicit reaping for control. [8]
  1. For detached/daemon children
  • If you want fully detached background processes, use double-fork or start the child in its own session (start_new_session=True or preexec_fn=os.setsid) and ensure stdio/fds are closed (DEVNULL, close_fds=True) so the parent won’t be left with zombies or hung pipes. Then reap the immediate child or use SIG_IGN if you don’t care about exit status. [2][3]

Caveats and gotchas

  • Popen.wait() can be affected by spurious wakeups/interrupts in some Python versions—use robust patterns and test under your target Python. [5]
  • signal.signal(SIGCHLD, SIG_IGN) is process-global and not safe in libraries that need child exit info; it may only be set in the main thread. [2][1]

Selected references

  • SIGCHLD handler / reaping example. [1]
  • StackOverflow discussion: SIG_IGN and practical Popen advice. [2][3]
  • Official subprocess behavior (Popen methods). [4]
  • Python bug discussion on waitpid/spurious wakeups. [5]

If you want, tell me your Python version and a short snippet of how you spawn processes and I’ll give a specific fix you can apply.


Reconsider reliance on shell cleanup; explicit signal handling recommended.

SIGHUP is not automatically propagated to grandchildren spawned by the shell—this depends on shell session configuration and process group setup. While process.wait() reaps the immediate shell process, any grandchildren it creates may not receive SIGHUP and could become zombies if the shell doesn't propagate the signal or if they detach.

For reliable cleanup on lease expiry, consider using preexec_fn=os.setsid to place the shell in its own process group, then signal the group explicitly when cleanup is needed. Alternatively, rely on process.wait() combined with explicit child monitoring if the current behavior has not been validated in the Shell driver's actual deployment scenarios.

Copy link
Copy Markdown
Contributor Author

@evakhoni evakhoni Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried the suggested solution, however using it introduced a bug where sending a SIGINT (with ctrl+c for example) is causing a crash and leaves the terminal in a misbehaving state.
I also tested locally, in zsh under fedora, the use of SIGHUP with the current approach is correctly closing subprocesses, even background ones, unless intentionally launched with nohup. nevertheless, I'm open to suggestions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evakhoni, thank you for testing the process group approach and reporting the issue. The SIGINT problem you encountered is expected behavior—when you use os.setsid(), the shell becomes a session leader in a new session, which removes it from the terminal's foreground process group. As a result, Ctrl+C signals from the terminal are no longer delivered to the shell, breaking interactive control.

Given your testing shows SIGHUP correctly propagates to subprocesses in zsh (and likely bash), and since this implementation is specifically for handling lease expiry rather than being a general-purpose process manager, the current approach appears reasonable. The key constraints here are:

  1. Interactive shell behavior must be preserved – users need Ctrl+C to work normally
  2. Lease expiry is a specific event – you're handling a known scenario (lease ends, close session) rather than managing arbitrary process trees

Your local testing is the right validation approach. For production confidence, you might want to verify behavior across the shell types you support (bash, zsh, fish) and document any known edge cases where subprocesses might not receive SIGHUP (like when users explicitly use nohup or disown).

The current implementation balances the need for lease-based cleanup with maintaining a working interactive shell, which seems appropriate for this use case.


✏️ Learnings added
Learnt from: evakhoni
Repo: jumpstarter-dev/jumpstarter PR: 757
File: packages/jumpstarter/jumpstarter/common/utils.py:74-77
Timestamp: 2025-12-02T19:57:16.556Z
Learning: Using `preexec_fn=os.setsid` with subprocess.Popen breaks SIGINT (Ctrl+C) handling for interactive shells in jumpstarter because it moves the shell to a different session, removing it from the terminal's foreground process group.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.

Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: Zombie process reaping in jumpstarter is useful beyond just PID 1 cases, as drivers can leave behind processes in various execution contexts.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/client/lease.py (1)

286-287: LGTM! Consider adding exception handling for robustness.

The callback invocations correctly pass the lease instance and appropriate remaining time. The logic properly differentiates between approaching expiration (with actual remaining time) and full expiration (with timedelta(0)).

For improved robustness, consider wrapping callback invocations in try-except to prevent unexpected callback failures from crashing the monitor task:

                     if remain < timedelta(0):
                         # lease already expired, stopping monitor
                         logger.info("Lease {} ended at {}".format(self.name, end_time))
                         if self.lease_ending_callback is not None:
-                            self.lease_ending_callback(self, timedelta(0))
+                            try:
+                                self.lease_ending_callback(self, timedelta(0))
+                            except Exception as e:
+                                logger.error("Error in lease_ending_callback for lease %s: %s", self.name, e)
                         break

Apply similar handling at line 298 for the approaching-expiration callback.

Also applies to: 296-298

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63368c3 and a8d0e9a.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/client/lease.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/client/lease.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
📚 Learning: 2025-10-14T17:43:07.788Z
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 704
File: packages/jumpstarter/jumpstarter/client/grpc.py:100-107
Timestamp: 2025-10-14T17:43:07.788Z
Learning: In the Jumpstarter client lease model (packages/jumpstarter/jumpstarter/client/grpc.py), `effective_duration` represents the elapsed time for an active lease so far, not the total duration. To calculate expected release time, use `effective_begin_time + duration` (where `duration` is the configured/requested duration), not `effective_begin_time + effective_duration`.

Applied to files:

  • packages/jumpstarter/jumpstarter/client/lease.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: e2e
  • GitHub Check: build
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/client/lease.py (2)

4-4: LGTM!

The addition of Callable to the imports is necessary for the new lease_ending_callback type annotation and follows proper typing conventions.


57-59: LGTM!

The callback field is correctly typed and follows the agreed-upon architectural pattern from previous reviews to decouple lease lifecycle management from shell-specific logic.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/client/lease.py (1)

57-59: Consider adding an ended boolean parameter for clarity.

The callback signature takes (Self, timedelta), using timedelta(0) to indicate an ended lease. While functional, adding an explicit ended: bool parameter (as suggested in past reviews) would make the callback interface clearer and avoid requiring callers to check if remaining_time <= timedelta(0).

Based on past review comments, the signature could be: Callable[[Self, timedelta, bool], None] where the boolean explicitly indicates whether the lease has ended.

Apply this diff to add the ended parameter:

-    lease_ending_callback: Callable[[Self, timedelta], None] | None = field(
+    lease_ending_callback: Callable[[Self, timedelta, bool], None] | None = field(
         default=None, init=False
-    )  # Called when lease is ending
+    )  # Called when lease is ending or has ended (ended=True when lease expired)

Note: This change would require updating the callback invocations in monitor_async and the callback implementation in the related files.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8d0e9a and 17de08c.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/client/lease.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/client/lease.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 608
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:0-0
Timestamp: 2025-09-06T05:25:18.184Z
Learning: The Shell driver in jumpstarter can spawn multiple processes and leave them behind, creating zombie processes that need to be reaped even in non-PID 1 scenarios.
📚 Learning: 2025-10-14T17:43:07.788Z
Learnt from: michalskrivanek
Repo: jumpstarter-dev/jumpstarter PR: 704
File: packages/jumpstarter/jumpstarter/client/grpc.py:100-107
Timestamp: 2025-10-14T17:43:07.788Z
Learning: In the Jumpstarter client lease model (packages/jumpstarter/jumpstarter/client/grpc.py), `effective_duration` represents the elapsed time for an active lease so far, not the total duration. To calculate expected release time, use `effective_begin_time + duration` (where `duration` is the configured/requested duration), not `effective_begin_time + effective_duration`.

Applied to files:

  • packages/jumpstarter/jumpstarter/client/lease.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: build
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: e2e
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/client/lease.py (2)

4-4: LGTM!

The Callable import is correctly added to support the new callback type annotation.


218-221: LGTM!

The conditional logging improves UX by suppressing the "Releasing Lease" message when the lease has already expired, reducing noise during cleanup.

Comment thread packages/jumpstarter/jumpstarter/client/lease.py
Comment on lines +299 to +301
# Notify callback about approaching expiration
if self.lease_ending_callback is not None:
self.lease_ending_callback(self, remain)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling around callback invocation.

Similar to the expired case, if the callback raises an exception during the warning phase, it will crash the monitor task before the lease actually expires, preventing proper expiration handling.

Apply this diff to add error handling:

                         # Notify callback about approaching expiration
                         if self.lease_ending_callback is not None:
-                            self.lease_ending_callback(self, remain)
+                            try:
+                                self.lease_ending_callback(self, remain)
+                            except Exception as e:
+                                logger.error("Error in lease ending callback for lease %s: %s", self.name, e)
🤖 Prompt for AI Agents
In packages/jumpstarter/jumpstarter/client/lease.py around lines 299 to 301, the
lease_ending_callback is invoked without protection so any exception it raises
will crash the monitor task; wrap the callback invocation in a try/except block
that catches Exception, logs the exception (including context: that it occurred
during the warning/remaining-time callback and the lease id or relevant
identifiers), and continues without re-raising so the monitor keeps running;
follow the same error-handling pattern used for the expired-case callback to
ensure consistent behavior.

@mangelajo mangelajo merged commit 5300757 into jumpstarter-dev:main Dec 3, 2025
18 checks passed
@evakhoni evakhoni deleted the close_ended_lease branch December 3, 2025 13:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

If lease ends in jmp shell, j fails with huge tracebacks

2 participants