Use subreaper to kill unowned subprocesses in raylet. #42992

Merged
merged 47 commits into ray-project:master from raylet-subreaper on Mar 9, 2024

Conversation

@rynewang rynewang commented Feb 5, 2024

Currently, when user code spawns subprocesses (from a core worker), we don't have a good way to track them. We make a best effort to kill direct child processes on core worker exit (#33976), but if a worker crashes (e.g. it is SIGKILL'd) or there are grandchild processes, those processes leak. They may still hold valuable resources, e.g. GPU memory.

This patch adds a feature to kill all recursive children of a core worker on its death. It's gated behind the flag RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper, which is disabled by default.

Once enabled:

  • the raylet sets itself as a Linux subreaper.
  • the raylet tracks the "known subprocesses" it spawned.
  • the raylet auto-kills unknown children every 10s.
  • each core worker also sets itself as a Linux subreaper, but does not auto-kill.

so that

  • if a core worker is running and a (grand)child dies -> all other (grand)children keep running;
  • if a core worker dies -> all its (grand)children are killed by the raylet.

To avoid zombies, the core worker is set to auto-reap zombies by ignoring SIGCHLD. The raylet already reaps zombies, but now it also removes dead children from the "known subprocesses" set.

Added a Linux-only unit test and a doc page.
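
For reference, a minimal sketch of the two Linux primitives this feature relies on (the function name and error handling here are illustrative, not the exact Ray code):

#include <signal.h>
#include <sys/prctl.h>
#include <cstdio>

// Mark this process as a subreaper so orphaned grandchildren are reparented
// to it instead of to PID 1 (requires Linux kernel >= 3.4).
bool SetUpSubreaper(bool auto_reap) {
  if (prctl(PR_SET_CHILD_SUBREAPER, 1) == -1) {
    perror("prctl(PR_SET_CHILD_SUBREAPER)");
    return false;
  }
  if (auto_reap) {
    // The core worker ignores SIGCHLD so exited children are reaped
    // automatically; the raylet instead installs a SIGCHLD handler that
    // reaps them and removes them from the known set.
    signal(SIGCHLD, SIG_IGN);
  }
  return true;
}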

Fixes #42861, #26118

@rynewang (Contributor Author) commented Feb 5, 2024

removeOwnedChild needs a redo. Right now we call it in the ProcessFD dtor. However, there are times we deallocate a ProcessFD and no longer track the process, yet we don't want to kill it immediately either, e.g. when you spawn a one-time util script. So we really need to track the exit of those processes, i.e. in the SIGCHLD handler:

int status;
pid_t pid;
// Reap every exited child; drop reaped pids from the tracked set.
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
  if (children.count(pid) > 0) {
    children.erase(pid);
  }
}

// Register a signal handler for the given signal.
// The handler will be called with the signal_set and the error code.
// After the handler is called, the signal will be re-registered.
void RegisterSignalHandlerLoop(boost::asio::signal_set &signals,
Member:
is it possible for the reference to be invalid during shutdown

}

void SigchldHandlerPlain(const boost::system::error_code &error, int signal_number) {
if (!error) {
Member:
what does this case mean?

Contributor:
+1. Can you add a comment here? Also, it is a no-op if there is an error. Is this the legit behavior?

if (!error) {
int status;
pid_t pid;
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
Member:
seems we should check for EINVAL to fail loudly.

Comment on lines 111 to 118
if (WIFEXITED(status)) {
RAY_LOG(INFO) << "Child process " << pid << " exited with status "
<< WEXITSTATUS(status);
} else if (WIFSIGNALED(status)) {
RAY_LOG(INFO) << "Child process " << pid << " exited from signal "
<< WTERMSIG(status);
}
removeOwnedChild(pid);
Member:
Does this replace the existing raylet-side process cleanup on macOS / when the new code path is disabled on Linux? I am not familiar with the double-forking stuff in Ray, but if it's meant to replace it then I'll need to think a bit to make sure edge cases are handled.

// Use pipe to track process lifetime. (The pipe closes when process terminates.)
fd = pipefds[0];
if (pid == -1) {
ec = std::error_code(errno, std::system_category());
}
#endif
addOwnedChild(pid);
Member:
You need a signal block between process creation and adding it to the owned set. Otherwise there is a race where the process dies before it is added to the owned set. I'll leave it to you whether this is something that should be fixed now or not.

Member:
(maybe boost magic solves this for you)

// Use pipe to track process lifetime. (The pipe closes when process terminates.)
fd = pipefds[0];
if (pid == -1) {
ec = std::error_code(errno, std::system_category());
}
#endif
addOwnedChild(pid);
return ProcessFD(pid, fd);
Member:
seems the constructor is public, what happens if someone creates a process using another mechanism? tricky

Comment on lines 111 to 117
if (WIFEXITED(status)) {
RAY_LOG(INFO) << "Child process " << pid << " exited with status "
<< WEXITSTATUS(status);
} else if (WIFSIGNALED(status)) {
RAY_LOG(INFO) << "Child process " << pid << " exited from signal "
<< WTERMSIG(status);
}
Member:
why do we need the notion of ownership at all? seems this will only run for dead child procs which need to be reaped anyways

@rkooo567 (Contributor) commented Feb 6, 2024

I will review this by today!

@rkooo567 (Contributor) left a comment:
Current status: it is not working yet. Waiting for it to work

@rkooo567 added the @author-action-required label ("The PR author is responsible for the next step. Remove tag to send back to the reviewer.") on Feb 8, 2024
@rynewang (Contributor Author):
Updated; the subreaper test now works, i.e. if the worker is dead, its subprocesses are SIGKILL'd.

@rynewang removed the @author-action-required label ("The PR author is responsible for the next step. Remove tag to send back to the reviewer.") on Feb 12, 2024
@rynewang (Contributor Author):
Note:

  1. We may not want a hard SIGKILL at first. We could do something like "SIGTERM, then SIGKILL after 5s" (see the sketch after this list).
  2. I am wondering how watertight this approach is. If we spawn a one-time command (bash start_something.sh) which spawns a tool, will it be killed? Maybe we can bring back the "decouple" arg and only care about the "coupled" processes, i.e. workers.
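
A minimal sketch of the escalation idea in point 1 (hypothetical helper, not part of this PR; real code would poll instead of sleeping on the calling thread):

#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <thread>

// Try a graceful SIGTERM first; if the process is still alive after the grace
// period, escalate to SIGKILL.
void KillGracefully(pid_t pid, std::chrono::seconds grace = std::chrono::seconds(5)) {
  if (kill(pid, SIGTERM) == -1) {
    return;  // process already gone, or we lack permission
  }
  std::this_thread::sleep_for(grace);
  if (kill(pid, 0) == 0) {  // signal 0 only checks that the pid still exists
    kill(pid, SIGKILL);
  }
}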

@rkooo567 (Contributor) left a comment:
In general, it LGTM. But I feel like we don't need SigchldHandlerPlain for this PR? I think it is best if we only handle killing unowned children in this PR, and replace "stdin read" -> "wait on child procs" in another PR. Waiting on child procs will also improve observability.

we may not want a hard sigkill at first. We can do something like "sigterm, after 5s then sigkill".

SGTM

I am wondering how waterproof is this approach. If we spawn a one time command (bash start_something.sh) which spawns a tool, will it be killed? Maybe we can bring back the "decouple" arg and only care about the "coupled" processes, i.e. workers.

Do we ever do this from raylet though? Besides, agreed to keep decouple here.

@@ -71,6 +72,14 @@ DEFINE_string(session_dir, "", "The path of this ray session directory.");
DEFINE_string(log_dir, "", "The path of the dir where log files are created.");
DEFINE_string(resource_dir, "", "The path of this ray resource directory.");
DEFINE_int32(ray_debugger_external, 0, "Make Ray debugger externally accessible.");
// TODO(ryw): maybe instead of a new flag, we use kill_child_processes_on_worker_exit ?
Contributor:
I think this makes sense

// become zombies instead of dying gracefully.
signal(SIGCHLD, SIG_IGN);
#endif
// No need to treat SIGCHLD as it's handled in raylet main.cc.
Contributor:
What's going to happen to this behavior?

  // Ignore SIGCHLD signals. If we don't do this, then worker processes will
  // become zombies instead of dying gracefully.

Contributor:
also do you have any guess when this happens? This seems like an old behavior (when workers don't fate share with raylet)?

@@ -123,7 +244,6 @@ class ProcessFD {
}
#ifdef _WIN32

(void)decouple; // Windows doesn't require anything particular for decoupling.
Contributor:
Can you explain to me what this argument was for?


void addOwnedChild(pid_t pid) {
#ifdef __linux__
std::lock_guard<std::mutex> guard(m);
Contributor:
nit, but just use absl::MutexLock lock(&mutex_);? (we use this for locks)

// Set this process as a subreaper.
void SetThisProcessAsSubreaper() {
if (prctl(PR_SET_CHILD_SUBREAPER, 1) == -1) {
perror("prctl");
Contributor:
Consider RAY_LOG(FATAL) here?

});
}

void SigchldHandlerPlain(const boost::system::error_code &error, int signal_number) {
Contributor:
For owned children (the workers), is it possible to just handle them the same way as before? I think this is probably an unnecessary change for this PR.

#ifdef __linux__

void KillUnownedChildren() {
auto maybe_child_procs = GetAllProcsWithPpid(GetPID());
Contributor:
how slow is this?

Contributor Author:
One file read per process on current Linux, so not very fast. Fortunately we are not blocking anything here.
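
For context, an illustrative sketch of what enumerating children by parent pid looks like on Linux (one /proc read per process; this shows the general approach, not the actual GetAllProcsWithPpid implementation):

#include <dirent.h>
#include <sys/types.h>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

std::vector<pid_t> ListChildrenOf(pid_t parent) {
  std::vector<pid_t> children;
  DIR *proc = opendir("/proc");
  if (proc == nullptr) return children;
  while (dirent *entry = readdir(proc)) {
    char *end = nullptr;
    long pid = std::strtol(entry->d_name, &end, 10);
    if (end == nullptr || *end != '\0' || pid <= 0) continue;  // not a pid dir
    // One file read per candidate process: parse the PPid: line of its status.
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    std::string line;
    while (std::getline(status, line)) {
      if (line.rfind("PPid:", 0) == 0 &&
          std::atol(line.c_str() + 5) == static_cast<long>(parent)) {
        children.push_back(static_cast<pid_t>(pid));
        break;
      }
    }
  }
  closedir(proc);
  return children;
}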


// Enumerating child procs is not supported on this platform.
if (!maybe_child_procs) {
RAY_LOG(WARNING)
Contributor:
maybe this should just exit here and we shouldn't even start this function if it is not linux

std::vector<pid_t> to_kill;
to_kill.reserve(maybe_child_procs->size());
{
std::lock_guard<std::mutex> guard(m);
Contributor:
why do we need a lock btw? is it happening in a different thread?

Contributor Author:
If one thread is spawning a child process while another thread is gathering the owned-children list.

RAY_LOG(INFO) << "Killing leaked child process " << pid;
auto error = KillProc(pid);
if (error) {
RAY_LOG(ERROR) << "Failed to kill leaked child process " << pid << " with error "
Contributor:
Use WARNING here. ERROR will be printed to the user driver.

@rkooo567 (Contributor):
Btw, to be clearer about the feedback (not 100% sure if it is all possible):

  • Keep everything (decouple, and how we track child processes' health) as it is.
  • Ignore sigchld from owned children.
  • Only handle sigchld from unowned children, and sigkill them.
  • We can remove owned children's pids when a worker is killed (there must be some sort of hook here).
  • Make sure things are logged properly with the pid at INFO level.
  • The behavior should be clearly specified in the core doc.
  • Probably we can remove the parent core worker killing child workers and convert it to this mechanism?

Some other comments:

  • Do you think we need a way to exclude subprocesses from being killed? E.g., if an actor starts a new job (.sh file) and exits, is there a way to not kill it? My guess is it is probably not a requirement, given we already kill child procs and no one complained (meaning no regression).
  • I wonder if we want to do this for subprocesses started from agent.py. Maybe it is okay (because agent.py fate-shares with the raylet).

@rkooo567 (Contributor):
Btw, can we accelerate the progress of this PR since the branch cut is coming? We'd like to merge this within this week to meet 2/29.

@rkooo567 (Contributor) commented Mar 1, 2024

please remove the label when it is ready to review again! I will take a look asap

rynewang and others added 5 commits March 1, 2024 16:55
@rynewang (Contributor Author) commented Mar 4, 2024

The only failure is a linkcheck in rllib, which is not relevant. Should be OK now.

@rynewang removed the @author-action-required label ("The PR author is responsible for the next step. Remove tag to send back to the reviewer.") on Mar 4, 2024
@fishbone fishbone self-assigned this Mar 8, 2024

On non-Linux platforms, user-spawned processes are not controlled by Ray. The user is responsible for managing the lifetime of the child processes. If the parent Ray worker process dies, the child processes will continue to run.


Contributor:
What we provide is more like a last resort. We should mention that ideally child processes should catch the death of the parent and self-exit so this feature doesn't need to be turned on.

Contributor Author:
Since I reversed the default sigchld handler in core worker to reap-all-dead, users need to toggle it back to really do the signal handling. See the new ⚠️ Caution section.

Contributor:
Oh, I mean the signal handling not on the parent side but on the child side: https://stackoverflow.com/questions/284325/how-to-make-child-process-die-after-parent-exits.

Contributor Author:
Understood. Added a note in the last section of the intro.
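
For context on the child-side approach in that link: on Linux the child itself can ask the kernel to signal it when its parent dies (illustrative sketch, not part of this PR; note the death signal is tied to the thread that forked the child):

#include <signal.h>
#include <sys/prctl.h>

// Call in the child right after fork(): request SIGTERM when the parent
// (strictly, the forking thread) exits.
void DieWithParent() {
  prctl(PR_SET_PDEATHSIG, SIGTERM);
}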


- ``RAY_kill_child_processes_on_worker_exit`` (default ``true``): Only works on Linux. If true, core worker kills all subprocesses on exit. This won't work if the core worker crashed or was killed by a signal. If a process is created as a daemon, e.g. double forked, it will not be killed by this mechanism.

- ``RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper`` (default ``false``): Only works on Linux with kernel version 3.4 or greater. If true, the raylet kills any subprocesses that were spawned by the core worker after the core worker exits. This works even if the core worker crashed or was killed by a signal. Even if a process is created as a daemon, e.g. double forked, it will still be killed by this mechanism. The killing happens within 10 seconds after the core worker death.
Contributor:
What's the plan to turn this on by default and remove the other one?

Contributor:
I think we should never remove the flag as the behavior is kind of inconsistent with pure python

Contributor:
maybe better way is to allow this per task/actor eventually (if there's a request)

Contributor:
I mean removing the other RAY_kill_child_processes_on_worker_exit flag and the related code since we don't need two cleanup mechanisms.

ray.init(_system_config={"kill_child_processes_on_worker_exit_with_raylet_subreaper":True})


⚠️ Caution: Core worker needs to reap zombies
Contributor:
Is it more of an issue if the core worker is long-lived, since otherwise they will be killed by the raylet?

Contributor:
yes, long running driver for example

Contributor Author:
I can reverse the default on the core worker: it defaults to reaping all zombies, and if users want to call waitpid themselves they can reset the signal handler to the default (no automatic reaping).

@rkooo567 (Contributor) left a comment:
Generally LGTM!

Is the PR description still relevant? Can you update it?


We have 2 environment variables to handle subprocess killing on core worker exit:

- ``RAY_kill_child_processes_on_worker_exit`` (default ``true``): Only works on Linux. If true, core worker kills all subprocesses on exit. This won't work if the core worker crashed or was killed by a signal. If a process is created as a daemon, e.g. double forked, it will not be killed by this mechanism.
Contributor:
e.g. double forked, it will not be killed by this mechanism.

I don't think people can understand this. Either remove it or add more details?

Contributor Author:
changed to "grandchild processes" which should be more understandable.

Lifetimes of a User-Spawn Process
=================================

To avoid leaking user-spawned processes, Ray provides mechanisms to kill all user-spawned processes when a worker that starts it exits. This feature prevents GPU memory leaks from child processes (e.g., torch).
Contributor:
Let's improve this a bit.

When you spawn child processes from Ray workers, you are responsible for managing the lifetime of the child processes. However, this is not always possible, especially when a worker crashes or child processes are spawned by libraries (torch dataloader).

To avoid leak...

Contributor Author:
done


When the feature is enabled, the core worker process becomes a subreaper (see the next section), meaning there can be some grandchildren processes that are reparented to the core worker process. If these processes exit, core worker needs to reap them to avoid zombies, even though they are not spawn by core worker. If core worker does not reap them, the zombies will accumulate and eventually cause the system to run out of resources like memory.

You can add this code to the Ray Actors or Tasks to reap zombies, if you choose to enable the feature:
Contributor:
is this tested? can you add a test?

Contributor Author:
added.

Contributor:
so it is the default behavior now right? Can you add info here?

⚠️ Caution: Core worker needs to reap zombies
----------------------------------------------

When the feature is enabled, the core worker process becomes a subreaper (see the next section), meaning there can be some grandchildren processes that are reparented to the core worker process. If these processes exit, core worker needs to reap them to avoid zombies, even though they are not spawn by core worker. If core worker does not reap them, the zombies will accumulate and eventually cause the system to run out of resources like memory.
Contributor:
We should not mention "core worker". It should be just "worker"

Contributor Author:
renamed all core worker to the worker

sys.platform != "linux",
reason="Orphan process killing only works on Linux.",
)
def test_daemon_processes_not_killed_until_actor_dead(enable_subreaper, shutdown_only):
Contributor:
Can you add a test where adding signal can reap grandchildren?

Contributor Author:
No need; I reversed the core worker to reap all children by default.

Contributor:
got it! please make sure you added a test case for reaping (and document how to disable automatic reaping)

// The handler will be called with the signal_set and the error code.
// After the handler is called, the signal will be re-registered.
// The callback keeps a reference of the shared ptr to make sure it's not destroyed.
void RegisterSignalHandlerLoop(std::shared_ptr<boost::asio::signal_set> signals,
Contributor:
is this necessary? no better API from asio that does this automatically?

Contributor Author:
Not any that I know of. I did a quick search on GitHub and did not find a good one.
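
For reference, a sketch of the re-arming pattern described in the code comment above, using standard Boost.Asio (the handler parameter type is an assumption, not the exact Ray signature):

#include <boost/asio.hpp>
#include <functional>
#include <memory>

void RegisterSignalHandlerLoop(
    std::shared_ptr<boost::asio::signal_set> signals,
    std::function<void(const boost::system::error_code &, int)> handler) {
  // async_wait fires once per call, so the completion handler re-arms itself.
  // Capturing the shared_ptr keeps the signal_set alive across iterations.
  signals->async_wait(
      [signals, handler](const boost::system::error_code &error, int signal_number) {
        handler(error, signal_number);
        RegisterSignalHandlerLoop(signals, handler);
      });
}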

pid_t pid;
// Reaps any children that have exited. WNOHANG makes waitpid non-blocking and returns
// 0 if there's no zombie children.
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
Contributor:
this should be non-blocking right? (since the proc should already be dead)

Contributor Author:
Non-blocking, yes; it's controlled by WNOHANG. If no child has exited yet it returns 0 immediately (and -1 if there are no children at all).

void SigchldHandlerReapZombieAndRemoveKnownChildren(
const boost::system::error_code &error, int signal_number) {
if (error) {
RAY_LOG(WARNING) << "Error in SIGCHLD handler: " << error.message();
Contributor:
should we just return here? what kind of error can happen?

Contributor Author:
The only error is "signal set cancelled" as boost::asio::error::operation_aborted but we won't ever cancel it.

https://www.boost.org/doc/libs/1_84_0/doc/html/boost_asio/reference/basic_signal_set/async_wait.html

It does not hurt to waitpid anyway.

@rkooo567 (Contributor), Mar 8, 2024:
Can you add that as a comment in the code? Also comment on the waitpid that it is non-blocking (so the reader knows that even though it may be due to an abort, it is okay).

Contributor Author:
added comments for the error and WNOHANG.

// TODO: Checking PIDs is not 100% reliable because of PID recycling. If we find issues
// later due to this, we can use pidfd.
void KillUnknownChildren() {
auto to_kill =
Contributor:
Q: Is this automatically optimized by compiler?

Contributor Author:
I think it can do guaranteed copy elision but not 100% sure

create_child_fn();
return;
}
absl::MutexLock lock(&m_);
Contributor:
we don't need a lock when we create a child proc?

pid_t pid = create_child_fn();
absl::MutexLock lock(&m_);
children_.insert(pid);

Contributor Author:
We don't want the creation of a pid (in procfs) to happen while we are reading procfs for pids, I assume.

@rynewang (Contributor Author), Mar 8, 2024:
e.g. with 2 racing threads:

  1. create process
    a. then add pid to the known list
  2. kill unknown children

If the order is 1 -> 2 -> a, the new process is killed, which is not what we want. We have to make 1 and (a) atomic with respect to the list.

Contributor:
Why call the create child fn inside the AddKnownChild function? It would be cleaner if the creation and the adding were decoupled. Like, in the caller:

child = create()
tracker.addknownchild(child)

Contributor Author:
See the race condition I described: if a kill happens between create() and addknownchild(), it may kill the newborn child process.
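
A sketch of the intent described here (assuming a tracker with a mutex m_ and a pid set children_ as in the quoted snippets; not the exact Ray code). Creation and registration happen under the same lock the kill scan takes, so the scan can never observe an untracked newborn pid:

#include <functional>
#include <sys/types.h>
#include "absl/container/flat_hash_set.h"
#include "absl/synchronization/mutex.h"

class KnownChildrenTracker {
 public:
  // Create the child while holding the lock, so a concurrent
  // "kill unknown children" scan cannot run between creation and tracking.
  pid_t AddKnownChild(const std::function<pid_t()> &create_child_fn) {
    absl::MutexLock lock(&m_);
    pid_t pid = create_child_fn();
    children_.insert(pid);
    return pid;
  }

 private:
  absl::Mutex m_;
  absl::flat_hash_set<pid_t> children_;
};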

// until this worker exits and raylet reaps it.
if (SetThisProcessAsSubreaper()) {
RAY_LOG(INFO) << "Set this core_worker process as subreaper: " << getpid();
} else {
Contributor:
When kill_child_processes_on_worker_exit_with_raylet_subreaper is set, why do we still need to set the core worker as a subreaper? The raylet should just work as the reaper, I think?

Contributor Author:
See https://github.com/ray-project/ray/pull/42992/files#diff-f6c49babcc6278c29b15f311f28011254c7e7d20a0ff06b03a1ac17a6fe352bbR103

tl;dr: if a child of the core worker spawns a grandchild and dies, the grandchild is reparented to the nearest subreaper. But we don't want it to be killed (since the core worker is still alive), so we set the core worker as a subreaper as well, to avoid it being noticed and killed by the raylet.

@fishbone (Contributor), Mar 8, 2024:
Hmm, let me know if I understand it wrong:

  • if the core worker is dead => the grand child reparented to the raylet
  • if the raylet is dead => the core worker is still alive => later it'll kill itself => before it exits, it'll kill all the children?

Contributor Author:
it handles this case:

raylet -> core worker -> A -> B

then

A exits, and we don't want to kill B.

Now, B is reparented to the core worker, which does NOT kill B, so it's good. If we didn't do that, B would be reparented to the raylet and be killed, which is not what we want.

Contributor:
Thanks!

// later due to this, we can use pidfd.
void KillUnknownChildren() {
auto to_kill =
KnownChildrenTracker::instance().ListUnknownChildren([]() -> std::vector<pid_t> {
Contributor:
Why do we pass the lambda to the function instead of just moving the lambda's body into the function?

Contributor Author:
This is to ensure the KnownChildrenTracker only cares about the lock on the hash set, keeping it separate from the child-finding code.

Contributor:
Hmm, I doubt the separation makes things simpler, but I'm fine with it.

// not allow a macro invocation (#ifdef) in another macro invocation (RAY_CHECK_OK),
// so we have to put it here.
auto enable_subreaper = [&]() {
#ifdef __linux__
Contributor:
This probably is not an issue. Maybe we can move the Linux #ifdef into SetThisProcessAsSubreaper.

For platform-specific code, the lower the level we put it at, the better.
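
A sketch of that suggestion (the bool return matches the usage quoted further down; the non-Linux branch is an assumption):

#ifdef __linux__
#include <sys/prctl.h>
#endif

// Keep the platform check inside the helper so callers don't need an #ifdef
// at every call site.
bool SetThisProcessAsSubreaper() {
#ifdef __linux__
  return prctl(PR_SET_CHILD_SUBREAPER, 1) != -1;
#else
  return false;  // not supported on this platform
#endif
}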

@fishbone (Contributor) left a comment:

Overall it looks good.

The question is: should we just turn it on by default? At least let the raylet be the reaper?

The lifecycle of subprocesses now becomes complicated; the behavior differs case by case.

IMO it might be better if we just turned it on by default for most cases and let users know, with some special flag in remote options to turn it off for corner cases.

@rkooo567 (Contributor) commented Mar 8, 2024

@fishbone we will turn it on by default. It is off because it was merged at the last minute and is a bit of a breaking change.

@rkooo567 (Contributor) left a comment:
LGTM if tests pass. Let's make sure the core worker reaping grandchildren is the default behavior.

@rkooo567 (Contributor) commented Mar 8, 2024

Also, before merging, let's make sure to run the Mac test / Windows test / release tests.

@jjyao (Contributor) commented Mar 9, 2024

I'll merge this for now. @angelinalg could you still review the doc-related changes, and we will address comments in a follow-up PR.

@jjyao jjyao merged commit 944bfe7 into ray-project:master Mar 9, 2024
9 checks passed
@rynewang rynewang deleted the raylet-subreaper branch March 9, 2024 01:55

Successfully merging this pull request may close these issues.

[core] subprocess leaks if Ray worker crashes
6 participants