[flang] use setsid to assign the child to prevent zombie as it will be clean up by init process #77944

Merged
merged 6 commits into from
Jan 19, 2024

Conversation

yi-wu-arm
Contributor

When setsid() is called in a child process created by fork(), a new session is created and the child becomes the session leader. If the parent process terminates before the child, the child is orphaned and adopted by the init process, which cleans it up once it exits.

However, killing the parent does not automatically kill the child; the child continues running until it exits.
Proper cleanup would have the parent call wait() or waitpid() on the child to avoid zombie processes, but that approach does not work for EXECUTE_COMMAND_LINE in async mode, where the parent must not block.
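The fork-then-setsid pattern described above can be sketched as follows. This is a minimal illustration, not the runtime's code: `LaunchInNewSession` is a hypothetical helper name, and the way the exit code is forwarded is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of the patch: start `cmd` in a forked child
// that first moves itself into a new session, and return without waiting.
// Returns the child's PID in the parent, or -1 if fork() failed.
pid_t LaunchInNewSession(const char *cmd) {
  assert(cmd != nullptr);
  pid_t pid = fork();
  if (pid != 0) {
    return pid; // parent (pid > 0) or fork failure (pid == -1)
  }
  // Child: become the leader of a new session before running the command.
  if (setsid() == -1) {
    _exit(EXIT_FAILURE);
  }
  int status = std::system(cmd);
  // Forward the command's exit code; report failure for abnormal endings.
  _exit(status != -1 && WIFEXITED(status) ? WEXITSTATUS(status)
                                          : EXIT_FAILURE);
}
```

A caller would use it as `pid_t pid = LaunchInNewSession("echo hi");` and carry on immediately. In this sketch nothing reaps the child while the parent is alive, so a caller that can afford to block may still call waitpid(pid, ...) later; a caller that cannot block relies on the adoption by init described above once the parent itself exits.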

@llvmbot llvmbot added flang:runtime flang Flang issues not falling into any other category labels Jan 12, 2024
@llvmbot
Collaborator

llvmbot commented Jan 12, 2024

@llvm/pr-subscribers-flang-runtime

Author: Yi Wu (yi-wu-arm)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/77944.diff

1 Files Affected:

  • (modified) flang/runtime/execute.cpp (+12-2)
diff --git a/flang/runtime/execute.cpp b/flang/runtime/execute.cpp
index 48773ae8114b0b..1bd5bb81ec8461 100644
--- a/flang/runtime/execute.cpp
+++ b/flang/runtime/execute.cpp
@@ -180,8 +180,6 @@ void RTNAME(ExecuteCommandLine)(const Descriptor &command, bool wait,
     }
     FreeMemory((void *)wcmd);
 #else
-    // terminated children do not become zombies
-    signal(SIGCHLD, SIG_IGN);
     pid_t pid{fork()};
     if (pid < 0) {
       if (!cmdstat) {
@@ -191,6 +189,18 @@ void RTNAME(ExecuteCommandLine)(const Descriptor &command, bool wait,
         CheckAndCopyCharsToDescriptor(cmdmsg, "Fork failed");
       }
     } else if (pid == 0) {
+      if (setsid() == -1) {
+        if (!cmdstat) {
+          terminator.Crash(
+              "setsid() failed with errno: %d, asynchronous process initiation failed.",
+              errno);
+        } else {
+          StoreIntToDescriptor(cmdstat, ASYNC_NO_SUPPORT_ERR, terminator);
+          CheckAndCopyCharsToDescriptor(
+              cmdmsg, "setsid() failed, asynchronous process initiation failed.");
+        }
+        exit(EXIT_FAILURE);
+      }
       int status{std::system(newCmd)};
       TerminationCheck(status, cmdstat, cmdmsg, terminator);
       exit(status);

github-actions bot commented Jan 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@EugeneZelenko EugeneZelenko removed the flang Flang issues not falling into any other category label Jan 12, 2024
@llvmbot llvmbot added the flang Flang issues not falling into any other category label Jan 12, 2024
@klausler klausler removed their request for review January 12, 2024 17:03
@yi-wu-arm
Contributor Author

yi-wu-arm commented Jan 12, 2024

Debug printout from printf("Child process (PID: %d) is running in a new session (SID: %d).\n", getpid(), getsid(0)); in the child, with the equivalent call in the parent:

$ ./a.out 
Parent process (PID: 273500) is running in a new session (SID: 2347).
Child process (PID: 273501) is running in a new session (SID: 273501).

A different session id is what we want.
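The same check can be made programmatically instead of by reading the printout. The sketch below is an assumption-laden illustration rather than the code used for the debug output: `ChildGetsOwnSession` is a hypothetical name, and the pipe is just one way to hand the child's session ID back to the parent.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical check, not from the patch: fork a child, call setsid() in it,
// and send the child's session ID back to the parent through a pipe.
// Returns true when the child's SID equals the child's PID (it is a session
// leader) and differs from the parent's SID.
bool ChildGetsOwnSession() {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  const pid_t parentSid = getsid(0);
  assert(parentSid != -1); // querying the calling process should not fail
  const pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }
  if (pid == 0) {
    // Child: create the new session, then report the resulting SID.
    close(fds[0]);
    const pid_t childSid = (setsid() == -1) ? -1 : getsid(0);
    const ssize_t written = write(fds[1], &childSid, sizeof childSid);
    _exit(written == static_cast<ssize_t>(sizeof childSid) ? 0 : 1);
  }
  // Parent: read the child's SID, then reap the child. Blocking is fine
  // here because this check is synchronous by design.
  close(fds[1]);
  pid_t childSid = -1;
  const ssize_t got = read(fds[0], &childSid, sizeof childSid);
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return got == static_cast<ssize_t>(sizeof childSid) && childSid == pid &&
      childSid != parentSid;
}
```

This mirrors the printout above, where the child's SID (273501) equals its own PID and differs from the parent's SID (2347).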

Member

@DavidTruby DavidTruby left a comment

LGTM but give a bit of time for anyone else to have a look in case my understanding (which matches yours) is wrong.

if (!cmdstat) {
terminator.Crash("setsid() failed with errno: %d, asynchronous "
"process initiation failed.",
errno);
Member

I guess this formatting comes from clang-format; I wonder why it does such a bad job here. (This isn't a request for you to change anything, just a comment.)

Contributor

@psteinfeld psteinfeld left a comment

What's the purpose of this change? Is there a test that's failing?

Also, I don't see the changes listed here, but when I try to build from this repository, I get errors. The function ExecuteCommandLine declares a variable called newCmd. The version of that declaration that I see in this repository looks like this:

    const char *newCmd{EnsureNullTerminated(
     ...

Then, later in this function this variable is deallocated. For me, this causes the following error message:

/local/home/psteinfeld/main/yiwu/llvm-project/flang/runtime/execute.cpp:213:24: error: cast from type ‘const char*’ to type ‘void*’ casts away qualifiers [-Werror=cast-qual]
  213 |     FreeMemory((void *)newCmd);
      |                        ^~~~~~
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-ctad-maybe-unsupported’ [-Werror]
cc1plus: all warnings being treated as errors

@yi-wu-arm
Contributor Author

Sorry, let me rebase on main; patches have already been uploaded to solve this problem.

The problem is listed here: #77803. In short, once an async EXECUTE_COMMAND_LINE is called, all future EXECUTE_COMMAND_LINE calls will have a cmdstat of 2 (execution error) because std::system returns -1. This fails in the gfortran LLVM test suite.

A simple reproducer would be

program test
  call execute_command_line("echo hi", .false.)
  call execute_command_line("echo hi")
end program test

console output

hi

fatal Fortran runtime error(/home/yiwu02/gitrepo/llvm-project/test_fortran_code/test.f90:3): Execution error with system status code: -1
hi

fatal Fortran runtime error(/home/yiwu02/gitrepo/llvm-project/test_fortran_code/test.f90:2): Execution error with system status code: -1
Aborted (core dumped)
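The -1 status can be reproduced outside the Fortran runtime. The sketch below assumes the cause is the signal(SIGCHLD, SIG_IGN) call that this patch removes: with SIGCHLD ignored, terminated children are reaped automatically, so the waitpid() inside std::system finds no child to collect. `SystemWithSigchldIgnored` is a hypothetical helper name, and the -1 result is what Linux with glibc is expected to produce; other platforms may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <signal.h>

// Hypothetical helper, not from the patch: run `cmd` through std::system
// while SIGCHLD is ignored, then restore the previous disposition.
// On Linux/glibc the returned status is expected to be -1 even though
// the command itself ran.
int SystemWithSigchldIgnored(const char *cmd) {
  assert(cmd != nullptr);
  void (*previous)(int) = signal(SIGCHLD, SIG_IGN);
  const int status = std::system(cmd);
  if (previous != SIG_ERR) {
    signal(SIGCHLD, previous); // undo the change for later callers
  }
  return status;
}
```

Calling `SystemWithSigchldIgnored("echo hi")` still prints `hi`, matching the console output above where the command runs but the runtime then reports status code -1.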

Contributor

@psteinfeld psteinfeld left a comment

All builds and tests correctly and looks good.

@kiranchandramohan
Contributor

Could you add a test?

Contributor

@kiranchandramohan kiranchandramohan left a comment

LG.

@yi-wu-arm yi-wu-arm merged commit 5a7f9a5 into llvm:main Jan 19, 2024
4 checks passed