With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67
There's all sorts of stuff found via Google re RH knowing about a file descriptor leak in ksh93. I can't open the bugzilla for it though... (fix file descriptor leak (#1058563)). Scientific Linux has a source RPM that supposedly includes that fix. I downloaded the src.rpm but there's a ton of patches in it that aren't identified by bugzilla numbers. :( http://linuxsoft.cern.ch/cern/updates/slc54/x86_64/RPMS/repoview/ksh.html Maybe it helps.
https://bugzilla.redhat.com/1058563 is an old RHEL-5 bug that was fixed by reverting previously applied downstream patches.
This seems to be the patch that was used for the same bug in RHEL-6 and it appears to be already included in upstream: https://bugzilla.redhat.com/attachment.cgi?id=566049&action=diff
That one-line patch is mentioned twice in the src.rpm I was looking at: ksh-20100621-cloexec.patch ksh93u+m does not have the second one, but I couldn't tell you what it does.
@kdudka: I'm "not authorized" to see your link, https://bugzilla.redhat.com/1058563. True, the subsequent patch that you linked to is already applied. Undoing it makes no difference to the behaviour of the reproducer.

@posguy99: Unfortunately that patch does not fix this bug. The output of the reproducer is unchanged. Yes, it would be nice to know what bug it does fix.
Hm. OpenSuse claimed they fixed something similar, ksh93u+m doesn't have this one either. From https://build.opensuse.org/package/show/openSUSE:Leap:42.3:Update/ksh
I know that the RHEL-5 bug is available to Red Hat employees only but there is nothing really useful behind the link anyway. The reproducer mentioned there is very similar to the reproducer from the public RHEL-6 bug:
I'm getting the feeling that there's some confusion going on. This particular bug is with process substitution, not I/O redirection. The function call in the reproducer above, fdUser <(echo '========'), does not have a redirection in it, just a process substitution. Unfortunately, they made the syntax of process substitution confusingly similar to that of I/O redirection. Process substitution is more akin to command substitution (backticks or $(…)). So, patches that fix bugs with redirection are not that likely to fix this particular bug.
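To make the distinction concrete, here is a minimal sketch that should run under ksh93 or bash. The function name show_arg is made up for illustration. It shows that a process substitution is expanded into a file name argument (a /dev/fd/NN path or a FIFO name, depending on the shell build), whereas a redirection connects standard input and adds no argument at all.

```shell
# Hypothetical helper: print the first argument the command actually receives.
show_arg() {
    printf '%s\n' "$1"
}

# Process substitution alone: the command gets a path name as its argument.
procsub_arg=$(show_arg <(echo hello))

# The command opens that path itself to read the data.
procsub_data=$(cat <(echo hello))

# A redirection combined with a process substitution: cat gets no argument,
# it simply reads its standard input.
redirect_data=$(cat < <(echo hello))

echo "argument seen by the command: $procsub_arg"
echo "data read via the argument:   $procsub_data"
echo "data read via redirection:    $redirect_data"
```

The exact path printed on the first line depends on the platform and on whether the shell uses /dev/fd or the FIFO fallback.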
Don't get me wrong though, thanks for the links to those redirection-related patches. Unfortunately I can't get to the documentation on what they do. Googling for "ksh-20100621-fdstatus.patch" (pasted by @posguy99 above) shows it's linked to Red Hat bug 924440 but it is also closed to non-employees. @kdudka, can you take a look there to figure out what this patch does, so we can determine whether it needs to be included?

The same is true for the OpenSUSE patch. It says it's linked to OpenSUSE bug 954856 and once again I'm not authorized to access it.
From https://download.rhn.redhat.com/errata/RHBA-2013-1599.html:
Doesn't say what the reproducer would be. :(
Yes, the summary of RHBZ#924440 is "crash in bestreclaim() after traversing a memory block with a very large size". We did not have any in house reproducer for the bug. The mentioned patch was provided and verified by a customer.
This applies ksh-20100621-fdstatus.patch from Red Hat. Not very much information is available, so this one is more or less taken on faith. But it seems to make sense on the face of it: calling sh_fcntl() instead of fcntl(2) directly makes the shell update its internal file descriptor state more frequently.

It claims to fix Red Hat bug 924440. The report is currently closed to the public: https://bugzilla.redhat.com/show_bug.cgi?id=924440

However, Kamil Dudka at Red Hat writes: #67 (comment)
| Yes, the summary of RHBZ#924440 is "crash in bestreclaim() after
| traversing a memory block with a very large size". We did not have
| any in house reproducer for the bug. The mentioned patch was
| provided and verified by a customer.

...and Marc Wilson dug up a Red Hat erratum containing this info: https://download.rhn.redhat.com/errata/RHBA-2013-1599.html
| Previously, the ksh shell did not resize the file descriptor list
| every time it was necessary. This could lead to memory corruption
| when several file descriptors were used. As a consequence, ksh
| terminated unexpectedly. This updated version resizes the file
| descriptor list every time it is needed, and ksh no longer
| crashes in the described scenario. (BZ#924440)

No reproducer means no regression test can be added now.

src/cmd/ksh93/sh/io.c, src/cmd/ksh93/sh/subshell.c, src/cmd/ksh93/sh/xec.c:
- Change several fcntl(2) calls to sh_fcntl(). This function calls fcntl(2) and then updates the shell's file descriptor state.
I figured out a reproducer for the file descriptor leak @posguy99 flagged up here. The bug is that a file descriptor (at least 3, can't reproduce for 4 and up) opened with
Expected behaviour:
I'll apply the OpenSUSE patch and add a regression test. Thanks.
File descriptors are not properly closed, causing a leak, when using a process substitution as an argument to a shell function. See: #67

Process substitution uses /dev/fd/NN pseudofiles if the kernel provides them. This is tested in src/cmd/ksh93/features/options which causes SHOPT_DEVFD to be defined if /dev/fd/9 can be used. If not, ksh uses a fallback mechanism involving a temporary FIFO, which works on all Unix variants.

As it happens, the leak only occurs when using the /dev/fd mechanism. So, until a fix is found, we can work around the bug by disabling it. The FIFO mechanism might be slightly less robust, but it's an improvement over leaking file descriptors. Plus, there is room for improving it.

src/cmd/ksh93/include/defs.h:
- Unconditionally redefine SHOPT_DEVFD as 0 for now.

src/cmd/ksh93/sh/args.c: sh_argprocsub():
- pathtemp() does appropriate access checks using access(2), but there is an inherent race condition between calling it and mkfifo(). Make the FIFO mechanism more robust by handling errors, trying again if an error occurs that must have resulted from losing that race, e.g. file name conflict or temp dir permission/location change.
- Initially create the FIFO without any permissions, then chmod() the appropriate user read/write permissions. Since mkfifo() honours the umask and chmod() does not, this ensures that process substitution continues to work if a shell script sets a umask that disallows user read or write. (The /dev/fd/ mechanism does not care about the umask, so neither should the fallback.)
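The umask point in the last item can be observed from the shell itself. The following is only an illustrative sketch using the mkfifo and chmod utilities, not the C code in sh_argprocsub(). The scratch directory and FIFO name are made up, and a fully restrictive umask stands in for "a umask that disallows user read or write".

```shell
# Hypothetical scratch directory and FIFO name, for illustration only.
tmpdir=$(mktemp -d) || exit 1
fifo=$tmpdir/procsub.fifo

old_umask=$(umask)
umask 0777                  # hostile umask: no permission bits survive creation

mkfifo "$fifo"              # creation honours the umask, so the mode ends up 0000
mode_before=$(ls -ld "$fifo" | cut -c1-10)

chmod u=rw,go= "$fifo"      # an explicit chmod is not filtered by the umask
mode_after=$(ls -ld "$fifo" | cut -c1-10)

umask "$old_umask"
echo "after mkfifo: $mode_before"
echo "after chmod:  $mode_after"
rm -rf "$tmpdir"
```

Because the final permissions come from chmod rather than from the creation call, the FIFO stays usable by its owner whatever umask the script has set.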
In ab5dedd I've applied a workaround that disables the use of /dev/fd/NN pseudofiles and falls back to the older FIFO method instead. (That commit also improves the robustness of the FIFO method.) This effectively works around this bug, but is not a real fix. The /dev/fd method is the most robust as it doesn't involve the disk file system. So I'm leaving this issue open. To debug this, we now first need to comment out/remove the redefinition of SHOPT_DEVFD to 0 in src/cmd/ksh93/include/defs.h.

I think the file descriptor leak is possibly caused by a bug in the parser. So far, I've been able to trace the following. At parse time, a process substitution is parsed using this function: Lines 1356 to 1367 in ab5dedd
A process substitution of the form >(…) gets the flags TFORK|FPIN|FAMP|FPCL and a process substitution of the form <(…) gets the flags TFORK|FPOU. These parser tree flags are defined/commented here: ksh/src/cmd/ksh93/include/shnodes.h Lines 33 to 71 in ab5dedd
At execution time, a process substitution is processed while handling the arguments for a simple command ( Line 991 in ab5dedd
arg_expand() : Line 615 in ab5dedd
sh_argprocsub() from this block: Lines 739 to 746 in ab5dedd
Lines 668 to 732 in ab5dedd
sh_iosave() with the file descriptor turned negative. But sh_iosave() appears to be designed to handle this: Lines 1589 to 1669 in ab5dedd
The file descriptor saved by Lines 1582 to 1634 in ab5dedd
FPCL flag is handled with a call to sh_close() . This should cover "output"-type process substitution of the form >(…) (parser flags TFORK|FPIN|FAMP|FPCL ). But for some reason it doesn't work: the file descriptor leak is also reproducible for output process substitutions. And I can see nothing here that handles closing file descriptors for input process substitutions (parser flags TFORK|FPOU ). Yet, this bug is only reproducible if a process substitution is given as an argument (without redirections) to a shell function call. The bug does not occur if it is given as an argument to any other kind of simple command. So those file descriptors get closed somewhere/somehow, but I've no idea where or how.
And that's as far as I've got so far with tracing how all this works. Maybe someone else finds this information useful to continue the hunt for this bug.
After tracing where ksh goes after Lines 1422 to 1424 in d4adc8f
Normally for process substitutions
Alternatively, when using the
However, when a process substitution is passed to a function the
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index c116e341..991dad04 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -1420,6 +1420,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 	bp->ptr = (void*)save_ptr;
 	bp->data = (void*)save_data;
 	/* don't restore for 'exec' or 'redirect' in subshell */
+	error(ERROR_warn(0), "shp->topfd: %d; topfd: %d", shp->topfd, topfd);
 	if((shp->topfd>topfd) && !(shp->subshell && (np==SYSEXEC || np==SYSREDIR)))
 		sh_iorestore(shp,topfd,jmpval);
When running the reproducer from #67 (comment):
When
Another minor note to add on: the first Line 761 in d4adc8f
In ksh93v- that line was changed to use sh_close. It doesn't affect the file descriptor leak but it might prove useful in fixing future process substitution bugs: https://github.com/att/ast/blob/2f2b1b8be315df029ce83c2ccc12a16fdcf73f29/src/cmd/ksh93/sh/args.c#L1071
On systems where ksh needs to use the older and less secure FIFO method for process substitutions (which is currently all of them as the more modern and solid /dev/fd method is still broken, see #67), process substitutions could leave background processes hanging in these two scenarios:

1. If the parent process exits without opening a pipe to the child process forked by the process substitution. The fifo_check() function in xec.c, which is periodically called to check if the parent process still exists while waiting for it to open the FIFO, verified the parent process's existence by checking if the PPID had reverted to 1, the traditional PID of init. However, POSIX specifies that the PPID can revert to any implementation-defined system process in that case. So this breaks on certain systems, causing unused process substitutions to hang around forever as they never detect that the parent disappeared. The fix is to save the current PID before forking and have the child check if the PPID has changed from that saved PID.

2. If a command invoked from the main shell is passed a process substitution, but terminates without opening the pipe to the process substitution. In that case, the parent process never disappears in the first place, because the parent process is the main shell. So the same infinite wait occurs in unused process substitutions, even after correcting problem 1. The fix is to remember all FIFOs created for any number of process substitutions passed to a single command, and unlink any remaining FIFOs as they represent unused process substitutions. Unlinking the FIFOs causes sh_open() in the child to fail with ENOENT on the next periodic check, which can easily be handled.

Fixing these problems causes the FIFO method to act identically to the /dev/fd method, which is good for compatibility. Even when #67 is fixed this will still be important, as ksh also runs on systems that do not have /dev/fd (such as AIX, HP-UX, and QNX), so will fall back to using FIFOs.

--- Fix problem 1 ---

src/cmd/ksh93/sh/xec.c:
- Add new static fifo_save_ppid variable.
- sh_exec(): If a FIFO is defined, save the current PID in fifo_save_ppid for the forked child to use.
- fifo_check(): Compare PPID against the saved value instead of 1.

--- Fix problem 2 ---

To keep things simple I'm abusing the name-value pair routines used for variables for this purpose. The overhead is negligible. A more elegant solution is possible but would involve adding more code.

src/cmd/ksh93/include/defs.h: _SH_PRIVATE:
- Define new sh.fifo_tree pointer to a new FIFO cleanup tree.

src/cmd/ksh93/sh/args.c: sh_argprocsub():
- After launching a process substitution in the background, add the FIFO to the cleanup list before freeing it.

src/cmd/ksh93/sh/xec.c:
- Add fifo_cleanup() that unlinks all FIFOs in the cleanup list and clears/closes the list. They should only still exist if the command never used them, however, just run 'unlink' and don't check for existence first as that would only add overhead.
- sh_exec():
  * Call fifo_cleanup() on finishing all simple commands (when setting $?) or when a special builtin fails.
  * When forking, clear/close the cleanup list; we do not want children doing duplicate cleanup, particularly as this can interfere when using multiple process substitutions in one command.
  * Process substitution handling:
    > Change FIFO check frequency from 500ms to 50ms. Note that each check sends a signal that interrupts open(2), causing sh_open() to reinvoke it. This causes sh_open() to fail with ENOENT on the next check when the FIFO no longer exists, so we do not need to add an additional check for existence to fifo_check(). Unused process substitutions now linger for a maximum of 50ms.
    > Do not issue an error message if errno == ENOENT.
- sh_funct(): Process substitutions can be passed to functions as well, and we do not want commands within the function to clean up the FIFOs for the process substitutions passed to it from the outside. The problem is solved by simply saving fifo_tree in a local variable, setting it to null before running the function, and cleaning it up before restoring the parent one at the end. Since sh_funct() is called recursively for multiple-level function calls, this correctly gives each function a locally scoped fifo_tree.

--- Tests ---

src/cmd/ksh93/tests/io.sh:
- Add tests covering the failing scenarios.

Co-authored-by: Martijn Dekker <martijn@inlv.org>
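The effect that the fix for problem 2 relies on can be seen with ordinary shell tools. This is a rough sketch rather than what sh_open() does internally: the scratch directory, FIFO name and the child_action variable are invented for illustration, and a plain read-open in a subshell stands in for the child's retried open.

```shell
# Hypothetical scratch directory and FIFO name, for illustration only.
tmpdir=$(mktemp -d) || exit 1
fifo=$tmpdir/unused.fifo
mkfifo "$fifo"

# The command finished without ever opening the FIFO, so the parent unlinks it.
rm -f "$fifo"

# When the waiting child retries the open, the name no longer exists, so the
# open fails immediately (ENOENT) instead of blocking, and the child can give up.
if (exec 3<"$fifo") 2>/dev/null; then
    child_action='keep waiting'
else
    child_action='exit quietly'
fi
echo "child would: $child_action"
rmdir "$tmpdir"
```

The sketch only covers the "name is gone" half of the mechanism. The periodic signal that interrupts a blocked open(2) so that it gets retried is specific to the ksh implementation described above.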
This commit fixes a long-standing bug that caused a file descriptor leak when passing a process substitution to a function. The leak only occurred when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected.

src/cmd/ksh93/sh/xec.c:
- When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function.

src/cmd/ksh93/include/defs.h:
- Since the file descriptor leak is now fixed, remove the workaround that forced ksh to use the FIFO method.

Fixes: ksh93#67
This commit fixes a long-standing bug that caused a file descriptor leak when passing a process substitution to a function. The leak only occurred when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected.

src/cmd/ksh93/sh/xec.c:
- When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function.
- This fix alone isn't enough, as a file descriptor leak could still occur if 'command' was given a function as an argument, then passed a process substitution. Add another sh_iorestore in this edge case to fix the second file descriptor leak.

src/cmd/ksh93/include/defs.h:
- Since the file descriptor leak is now fixed, remove the workaround that forced ksh to use the FIFO method.

Fixes: ksh93#67
This commit fixes a long-standing bug (present since at least ksh93r) that caused a file descriptor leak when passing a process substitution to a function. The leak only occurred when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected.

src/cmd/ksh93/sh/xec.c:
- When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function.
- This fix alone isn't enough, as a file descriptor leak could still occur if 'command' was given a function as an argument, then passed a process substitution. Add another sh_iorestore in this edge case to fix the second file descriptor leak.

src/cmd/ksh93/include/defs.h:
- Since the file descriptor leaks are now fixed, remove the workaround that forced ksh to use the FIFO method.

Fixes: ksh93#67
Link for posterity: work continues in #218.
This commit fixes a long-standing bug (present since at least ksh93r) that caused a file descriptor leak when passing a process substitution to a function, or (if compiled with SHOPT_SPAWN) to a nonexistent command. The leaks only occurred when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected.

src/cmd/ksh93/sh/xec.c: sh_exec():
- When a process substitution is passed to a built-in, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function. This is done by delaying the sh_iorestore() call to 'setexit:' where both built-ins and functions terminate and set the exit status ($?). This means that call now will not be executed if a longjmp is done, e.g. due to an error in a special built-in. However, there is already another sh_iorestore() call in main.c, exfile(), line 418, that handles that scenario.
- sh_ntfork() can fail, so rather than assume it will succeed, handle a failure by closing extra file descriptors with sh_iorestore(). This fixes the leak on command not found with SHOPT_SPAWN.

src/cmd/ksh93/include/defs.h:
- Since the file descriptor leaks are now fixed, remove the workaround that forced ksh to use the FIFO method.

src/cmd/ksh93/SHOPT.sh:
- Add SHOPT_DEVFD as a configurable option (default: probe).

src/cmd/ksh93/tests/io.sh:
- Add a regression test for the 'not found' file descriptor leak.
- Add a test to ensure it keeps working with 'command'.

Fixes: #67
edit: In ab5dedd I've applied a workaround that disables the use of /dev/fd/NN pseudofiles and falls back to the older FIFO method instead. (That commit also improves the robustness of the FIFO method.) This effectively works around this bug, but is not a real fix. The /dev/fd method is the most robust as it doesn't involve the file system. So I'm leaving this issue open. To debug this, we now first need to comment out/remove the redefinition of SHOPT_DEVFD to 0 in src/cmd/ksh93/include/defs.h.

This was first reported on the old mailing list.
Reproducer based on the one in that message:
The output shows one more open file for each function call. This bug does not occur if you replace the function call with a built-in or external command.
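For readers who want to observe the symptom themselves, here is a rough sketch of one way to watch the descriptor count. It is not the reproducer from the mailing list message. The helper names count_fds and fd_user are invented, and it assumes a platform that lists a process's open descriptors under /dev/fd (Linux does). It should run under ksh93 or bash.

```shell
# Count the descriptors that a child process inherits from this shell.
# Assumption: the platform exposes them as entries under /dev/fd.
count_fds() {
    ls /dev/fd | wc -l
}

# Hypothetical function that consumes the file named by its first argument.
fd_user() {
    cat "$1" >/dev/null
}

before=$(count_fds)
i=0
while [ "$i" -lt 5 ]; do
    fd_user <(echo '========')
    i=$((i + 1))
done
after=$(count_fds)

echo "open descriptors before: $before, after: $after"
```

In a shell without the leak the two counts should match. With the behaviour described in this issue, the second count would be expected to grow with the number of function calls.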