With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67

McDutchie · 2020-07-09T06:14:38Z

edit: In ab5dedd I've applied a workaround that disables the use of /dev/fd/NN pseudofiles and falls back to the older FIFO method instead. (That commit also improves the robustness of the FIFO method.) This effectively works around this bug, but is not a real fix. The /dev/fd method is the most robust as it doesn't involve the file system. So I'm leaving this issue open. To debug this, we now first need to comment out/remove the redefinition of SHOPT_DEVFD to 0 in src/cmd/ksh93/include/defs.h.

This was first reported on the old mailing list.

Reproducer based on the one in that message:

fdUser() {
	cat "$1"
}

for ((i=1; i<10; i++))
do	fdUser <(echo '========')
	if	[ -d /proc ]
	then	ls /proc/$$/fd
	else	lsof -p $$ | wc -l
	fi
done

The output shows one more open file for each function call. This bug does not occur if you replace the function call with a built-in or external command.

The text was updated successfully, but these errors were encountered:

posguy99 · 2020-07-09T15:31:19Z

There's all sorts of stuff found via Google re RH knowing about a file descriptor leak in ksh93. I can't open the bugzilla for it though... (fix file descriptor leak (#1058563)).

Scientific Linux has a source RPM that supposedly includes that fix, I downloaded the src.rpm but there's a ton of patches in it that aren't identified by bugzilla numbers. :(

http://linuxsoft.cern.ch/cern/updates/slc54/x86_64/RPMS/repoview/ksh.html

Maybe it helps.

kdudka · 2020-07-09T16:00:54Z

https://bugzilla.redhat.com/1058563 is an old RHEL-5 bug that was fixed by reverting previously applied downstream patches.

kdudka · 2020-07-09T16:09:23Z

This seems to be the patch that was used for the same bug in RHEL-6 and it appears to be already included in upstream: https://bugzilla.redhat.com/attachment.cgi?id=566049&action=diff

posguy99 · 2020-07-09T19:12:21Z

That one-line patch is mentioned twice in the src.rpm I was looking at:

ksh-20100621-cloexec.patch
ksh-20100621-fdstatus.patch

ksh93u+m does not have the second one, but I couldn't tell you what it does.

diff -up ksh-20100621/src/cmd/ksh93/sh/subshell.c.orig ksh-20100621/src/cmd/ksh93/sh/subshell.c
--- ksh-20100621/src/cmd/ksh93/sh/subshell.c.orig	2013-04-02 10:24:47.706650906 -0500
+++ ksh-20100621/src/cmd/ksh93/sh/subshell.c	2013-04-02 10:27:07.602651596 -0500
@@ -109,7 +109,7 @@ void	sh_subtmpfile(Shell_t *shp)
 		register struct checkpt	*pp = (struct checkpt*)shp->jmplist;
 		register struct subshell *sp = subshell_data->pipe;
 		/* save file descriptor 1 if open */
-		if((sp->tmpfd = fd = fcntl(1,F_DUPFD,10)) >= 0)
+		if((sp->tmpfd = fd = sh_fcntl(1,F_DUPFD,10)) >= 0)
 		{
 			fcntl(fd,F_SETFD,FD_CLOEXEC);
 			shp->fdstatus[fd] = shp->fdstatus[1]|IOCLEX;
diff -up ksh-20100621/src/cmd/ksh93/sh/xec.c.orig ksh-20100621/src/cmd/ksh93/sh/xec.c
--- ksh-20100621/src/cmd/ksh93/sh/xec.c.orig	2013-04-02 10:24:21.268900283 -0500
+++ ksh-20100621/src/cmd/ksh93/sh/xec.c	2013-04-02 10:28:25.537656455 -0500
@@ -90,7 +90,7 @@ static void iousepipe(Shell_t *shp)
 {
 	int i;
 	fcntl(subpipe[0],F_SETFD,FD_CLOEXEC);
-	subpipe[2] = fcntl(1,F_DUPFD,10);
+	subpipe[2] = sh_fcntl(1,F_DUPFD,10);
 	fcntl(subpipe[2],F_SETFD,FD_CLOEXEC);
 	shp->fdstatus[subpipe[2]] = shp->fdstatus[1];
 	close(1);
@@ -2961,7 +2961,7 @@ static void coproc_init(Shell_t *shp, in
 		sh_pipe(shp->cpipe);
 		if((outfd=shp->cpipe[1]) < 10) 
 		{
-		        int fd=fcntl(shp->cpipe[1],F_DUPFD,10);
+		        int fd=sh_fcntl(shp->cpipe[1],F_DUPFD,10);
 			if(fd>=10)
 			{
 			        shp->fdstatus[fd] = (shp->fdstatus[outfd]&~IOCLEX);

McDutchie · 2020-07-09T20:11:53Z

@kdudka: I'm "not authorized" to see your link, https://bugzilla.redhat.com/1058563. True, the subsequent patch that you linked to is already applied. Undoing it makes no difference to the behaviour of the reproducer.

@posguy99: Unfortunately that patch does not fix this bug. The output of the reproducer is unchanged. Yes, it would be nice to know what bug it does fix.

posguy99 · 2020-07-09T20:32:42Z

Hm. OpenSuse claimed they fixed something similar, ksh93u+m doesn't have this one either.

From https://build.opensuse.org/package/show/openSUSE:Leap:42.3:Update/ksh

- fix fd leak when doing redirects in a subshell [bnc#954856]
  new patch: ksh93-redirectleak.dif

--- src/cmd/ksh93/sh/io.c.orig	2015-12-09 11:18:00.657295950 +0000
+++ src/cmd/ksh93/sh/io.c	2015-12-09 11:18:57.719080685 +0000
@@ -1541,7 +1541,17 @@ int	sh_redirect(Shell_t *shp,struct iono
 					sh_iosave(shp,fn,indx,tname?fname:(trunc?Empty:0));
 				}
 				else if(sh_subsavefd(fn))
+				{
+					if(fd==fn)
+					{
+						if((r=sh_fcntl(fd,F_DUPFD,10)) > 0)
+						{
+							fd = r;
+							sh_close(fn);
+						}
+					}
 					sh_iosave(shp,fn,indx|IOSUBSHELL,tname?fname:0);
+				}
 			}
 			if(fd<0)
 			{

kdudka · 2020-07-09T20:36:41Z

I know that the RHEL-5 bug is available to Red Hat employees only but there is nothing really useful behind the link anyway. The reproducer mentioned there is very similar to the reproducer from the public RHEL-6 bug:

#!/bin/sh
if ksh -c 'A=$(vgs);B=$(vgs | cat)' 2>&1 | grep -q leaked
then
  echo BUG
  exit 1
else
  echo pass
fi

vgs is just one of the lvm2 utilities that check for outstanding open file descriptors upon startup (no idea why).

McDutchie · 2020-07-09T21:33:23Z

I'm getting the feeling that there's some confusion going on. This particular bug is with process substitution, not I/O redirection. The function call in the reproducer above

fdUser <(echo '========')

does not have a redirection in it, just a process substitution.

Unfortunately, they made the syntax of process substitution confusingly similar to that of I/O redirection, with < and >, but they're really completely different things.

Process substitution is more akin to command substitution (backticks or $(...)). The difference is that, instead of the output of a command, process substitution substitutes the path to a file from which to read the output of the command(s) within the parentheses – or, if >(...) is used, it substitutes the path to a file into which to write the input for the command(s) within the parentheses.

So, patches that fix bugs with redirection are not that likely to fix this particular bug.

McDutchie · 2020-07-09T21:44:58Z

Don't get me wrong though, thanks for the links to those redirection-related patches. Unfortunately I can't get to the documentation on what they do.

Googling for "ksh-20100621-fdstatus.patch" (pasted by @posguy99 above) shows it's linked to Red Hat bug 924440 but it is also closed to non-employees. @kdudka, can you take a look there to figure out what this patch does, so we can determine whether it needs to be included?

The same is true for the OpenSUSE patch. It says it's linked to OpenSUSE bug 954856 and once again I'm not authorized to access it.

posguy99 · 2020-07-09T22:15:33Z

From https://download.rhn.redhat.com/errata/RHBA-2013-1599.html:

Previously, the ksh shell did not resize the file descriptor list every time
it was necessary. This could lead to memory corruption when several file
descriptors were used. As a consequence, ksh terminated unexpectedly. This
updated version resizes the file descriptor list every time it is needed, and
ksh no longer crashes in the described scenario. (BZ#924440)

Doesn't say what the reproducer would be. :(

kdudka · 2020-07-09T22:24:28Z

Yes, the summary of RHBZ#924440 is "crash in bestreclaim() after traversing a memory block with a very large size". We did not have any in house reproducer for the bug. The mentioned patch was provided and verified by a customer.

This applies ksh-20100621-fdstatus.patch from Red Hat. Not very much information is available, so this one is more or less taken on faith. But it seems to make sense on the face of it: calling sh_fcntl() instead of fcntl(2) directly makes the shell update its internal file descriptor state more frequently. It claims to fix Red Hat bug 924440. The report is currently closed to the public: https://bugzilla.redhat.com/show_bug.cgi?id=924440 However, Kamil Dudka at Red Hat writes: #67 (comment) | Yes, the summary of RHBZ#924440 is "crash in bestreclaim() after | traversing a memory block with a very large size". We did not have | any in house reproducer for the bug. The mentioned patch was | provided and verified by a customer. ...and Marc Wilson dug up a Red Hat erratum containing this info: https://download.rhn.redhat.com/errata/RHBA-2013-1599.html | Previously, the ksh shell did not resize the file descriptor list | every time it was necessary. This could lead to memory corruption | when several file descriptors were used. As a consequence, ksh | terminated unexpectedly. This updated version resizes the file | descriptor list every time it is needed, and ksh no longer | crashes in the described scenario. (BZ#924440) No reproducer means no regression test can be added now. src/cmd/ksh93/sh/io.c, src/cmd/ksh93/sh/subshell.c, src/cmd/ksh93/sh/xec.c: - Change several fcntl(2) calls to sh_fcntl(). This function calls fcntl(2) and then updates the shell's file descriptor state.

McDutchie · 2020-07-21T02:48:27Z

I figured out a reproducer for the file descriptor leak @posguy99 flagged up here. The bug is that a file descriptor (at least 3, can't reproduce for 4 and up) opened with exec or redirect in a virtual/non-forked subshell survive that subshell after exiting it:

$ ksh -c '(redirect 3>&1); echo bug >&3'
bug

Expected behaviour:

$ ksh -c '(redirect 3>&1); echo bad >&3'
ksh: 3: cannot open [Bad file descriptor]

I'll apply the OpenSUSE patch and add a regression test. Thanks.

File descriptors are not properly closed, causing a leak, when using a process substitution as an argument to a shell function. See: #67 Process substitution uses /dev/fd/NN pseudofiles if the kernel provides them. This is tested in src/cmd/ksh93/features/options which causes SHOPT_DEVFD to be defined if /dev/fd/9 can be used. If not, ksh uses a fallback mechanism involving a temporary FIFO, which works on all Unix variants. As it happens, the leak only occurs when using the /dev/fd mechanism. So, until a fix is found, we can work around the bug by disabling it. The FIFO mechanism might be slightly less robust, but it's an improvement over leaking file descriptors. Plus, there is room for improving it. src/cmd/ksh93/include/defs.h: - Unconditionally redefine SHOPT_DEVFD as 0 for now. src/cmd/ksh93/sh/args.c: sh_argprocsub(): - pathtemp() does appropriate access checks using access(2), but there is an inherent race condition between calling it and mkfifo(). Make the FIFO mechanism more robust by handling errors, trying again if an error occurs that must have resulted from losing that race, e.g. file name conflict or temp dir permission/location change. - Initially create the FIFO without any permissions, then chmod() the appropriate user read/write permissions. Since mkfifo() honours the umask and chmod() does not, this ensures that process substitution continues to work if a shell script sets a umask that disallows user read or write. (The /dev/fd/ mechanism does not care about the umask, so neither should the fallback.)

McDutchie · 2020-09-12T22:02:12Z

In ab5dedd I've applied a workaround that disables the use of /dev/fd/NN pseudofiles and falls back to the older FIFO method instead. (That commit also improves the robustness of the FIFO method.) This effectively works around this bug, but is not a real fix. The /dev/fd method is the most robust as it doesn't involve the disk file system. So I'm leaving this issue open. To debug this, we now first need to uncomment/remove the redefinition of SHOPT_DEVFD to 0 in src/cmd/ksh93/include/defs.h.

I think the file descriptor leak is possibly caused by a bug in the parser. So far, I've been able to trace the following.

At parse time, a process substitution is parsed using this function:

ksh/src/cmd/ksh93/sh/parse.c

Lines 1356 to 1367 in ab5dedd

    
           static struct argnod *process_sub(Lex_t *lexp,int tok) 
        
           { 
        
           	struct argnod *argp; 
        
           	Shnode_t *t; 
        
           	int mode = (tok==OPROCSYM); 
        
           	t = sh_cmd(lexp,RPAREN,SH_NL); 
        
           	argp = (struct argnod*)stkalloc(lexp->sh->stk,sizeof(struct argnod)); 
        
           	*argp->argval = 0; 
        
           	argp->argchn.ap = (struct argnod*)makeparent(lexp,mode?TFORK|FPIN|FAMP|FPCL:TFORK|FPOU,t); 
        
           	argp->argflag =  (ARG_EXP|mode); 
        
           	return(argp); 
        
           }

So a process substitution of the form >(…) gets the flags TFORK|FPIN|FAMP|FPCL and a process substitution of the form <(…) gets the flags TFORK|FPOU. These parser tree flags are defined/commented here:

ksh/src/cmd/ksh93/include/shnodes.h

Lines 33 to 71 in ab5dedd

    
           /* command tree for tretyp */ 
        
           #define FINT		(02<<COMBITS)		/* non-interruptable */ 
        
           #define FAMP		(04<<COMBITS)		/* background */ 
        
           #define FPIN		(010<<COMBITS)		/* input is a pipe */ 
        
           #define FPOU		(040<<COMBITS)		/* output is a pipe */ 
        
           #define FPCL		(0100<<COMBITS)		/* close the pipe */ 
        
           #define FCOOP		(0200<<COMBITS)		/* cooperating process */ 
        
           #define FSHOWME		(0400<<COMBITS)		/* set for showme commands  */ 
        
           #define FALTPIPE	(02000<<COMBITS)	/* alternate pipes &| */ 
        
           #define FPOSIX		(02<<COMBITS)		/* posix semantics function */ 
        
           #define FLINENO		(04<<COMBITS)		/* for/case has line number */ 
        
           #define FOPTGET		(0200<<COMBITS)		/* function calls getopts */ 
        
           #define TNEGATE		(01<<COMBITS)		/* ! inside [[...]] */ 
        
           #define TBINARY		(02<<COMBITS)		/* binary operator in [[...]] */ 
        
           #define TUNARY		(04<<COMBITS)		/* unary operator in [[...]] */ 
        
           #define TTEST		(010<<COMBITS) 
        
           #define TPAREN		(TBINARY|TUNARY) 
        
           #define TSHIFT		(COMBITS+4) 
        
           #define TNSPACE		(TFUN|COMSCAN) 
        
           #define TCOM	0 
        
           #define TPAR	1 
        
           #define TFIL	2 
        
           #define TLST	3 
        
           #define TIF	4 
        
           #define TWH	5 
        
           #define TUN	(TWH|COMSCAN) 
        
           #define TTST	6 
        
           #define TSW	7 
        
           #define TAND	8 
        
           #define TORF	9 
        
           #define TFORK	10 
        
           #define TFOR	11 
        
           #define TSELECT	(TFOR|COMSCAN) 
        
           #define TARITH	12 
        
           #define	TTIME	13 
        
           #define TSETIO	14 
        
           #define TFUN	15

At execution time, a process substitution is processed while handling the arguments for a simple command (TCOM), because it is an argument (not a redirection). This starts at sh_exec() calling sh_argbuild() ("build an argument list"):

ksh/src/cmd/ksh93/sh/xec.c

Line 991 in ab5dedd

com = sh_argbuild(shp,&argn,&(t->com),OPTIMIZE);

…which calls arg_expand():

ksh/src/cmd/ksh93/sh/args.c

Line 615 in ab5dedd

n = arg_expand(shp,argp,&arghead,flag);

…which calls sh_argprocsub() from this block:

ksh/src/cmd/ksh93/sh/args.c

Lines 739 to 746 in ab5dedd

    
           if(*argp->argval==0 && (argp->argflag&ARG_EXP)) 
        
           { 
        
           	struct argnod *ap; 
        
           	ap = sh_argprocsub(shp,argp); 
        
           	ap->argchn.ap = *argchain; 
        
           	*argchain = ap; 
        
           	count++; 
        
           }

…which is the function that handles a process substitution, so now we're getting closer:

ksh/src/cmd/ksh93/sh/args.c

Lines 668 to 732 in ab5dedd

    
           struct argnod *sh_argprocsub(Shell_t *shp,struct argnod *argp) 
        
           { 
        
           	/* argument of the form <(cmd) or >(cmd) */ 
        
           	register struct argnod *ap; 
        
           	int monitor, interactive, fd, pv[3]; 
        
           	int subshell = shp->subshell; 
        
           	ap = (struct argnod*)stkseek(shp->stk,ARGVAL); 
        
           	ap->argflag |= ARG_MAKE; 
        
           	ap->argflag &= ~ARG_RAW; 
        
           	fd = argp->argflag&ARG_RAW; 
        
           	if(fd==0 && shp->subshell) 
        
           		sh_subtmpfile(shp); 
        
           #if SHOPT_DEVFD 
        
           	sfwrite(shp->stk,e_devfdNN,8); 
        
           	pv[2] = 0; 
        
           	sh_pipe(pv); 
        
           #else 
        
           	pv[0] = -1; 
        
           	while((shp->fifo = pathtemp(0,0,0,"ksh.fifo",0), mkfifo(shp->fifo,0))<0) 
        
           	{ 
        
           		if(errno==EEXIST || errno==EACCES || errno==ENOENT || errno==ENOTDIR || errno==EROFS) 
        
           			continue;		/* lost race (name conflict or tmp dir change); try again */ 
        
           		errormsg(SH_DICT,ERROR_system(128),"process substitution: FIFO creation failed"); 
        
           	} 
        
           	chmod(shp->fifo,S_IRUSR|S_IWUSR);	/* mkfifo + chmod works regardless of umask */ 
        
           	sfputr(shp->stk,shp->fifo,0); 
        
           #endif /* SHOPT_DEVFD */ 
        
           	sfputr(shp->stk,fmtbase((long)pv[fd],10,0),0); 
        
           	ap = (struct argnod*)stkfreeze(shp->stk,0); 
        
           	shp->inpipe = shp->outpipe = 0; 
        
           	/* turn off job control */ 
        
           	if(interactive = (sh_isstate(SH_INTERACTIVE)!=0)) 
        
           		sh_offstate(SH_INTERACTIVE); 
        
           	if(monitor = (sh_isstate(SH_MONITOR)!=0)) 
        
           		sh_offstate(SH_MONITOR); 
        
           	/* do the process substitution */ 
        
           	shp->subshell = 0; 
        
           	if(fd) 
        
           	{ 
        
           		shp->inpipe = pv; 
        
           		sh_exec((Shnode_t*)argp->argchn.ap,(int)sh_isstate(SH_ERREXIT)); 
        
           	} 
        
           	else 
        
           	{ 
        
           		shp->outpipe = pv; 
        
           		sh_exec((Shnode_t*)argp->argchn.ap,(int)sh_isstate(SH_ERREXIT)); 
        
           	} 
        
           	/* restore the previous state */ 
        
           	shp->subshell = subshell; 
        
           	if(monitor) 
        
           		sh_onstate(SH_MONITOR); 
        
           	if(interactive) 
        
           		sh_onstate(SH_INTERACTIVE); 
        
           #if SHOPT_DEVFD 
        
           	close(pv[1-fd]); 
        
           	sh_iosave(shp,-pv[fd], shp->topfd, (char*)0); 
        
           #else 
        
           	free(shp->fifo); 
        
           	shp->fifo = 0; 
        
           #endif /* SHOPT_DEVFD */ 
        
           	return(ap); 
        
           }

Note line 726, where it calls sh_iosave() with the file descriptor turned negative. But sh_iosave() appears to be designed to handle this:

ksh/src/cmd/ksh93/sh/io.c

Lines 1589 to 1669 in ab5dedd

    
           /* 
        
            * copy file <origfd> into a save place 
        
            * The saved file is set close-on-exec 
        
            * if <origfd> < 0, then -origfd is saved, but not duped so that it 
        
            *   will be closed with sh_iorestore. 
        
            */ 
        
           void sh_iosave(Shell_t *shp, register int origfd, int oldtop, char *name) 
        
           { 
        
           	register int	savefd; 
        
           	int flag = (oldtop&(IOSUBSHELL|IOPICKFD)); 
        
           	oldtop &= ~(IOSUBSHELL|IOPICKFD); 
        
           	/* see if already saved, only save once */ 
        
           	for(savefd=shp->topfd; --savefd>=oldtop; ) 
        
           	{ 
        
           		if(filemap[savefd].orig_fd == origfd) 
        
           			return; 
        
           	} 
        
           	/* make sure table is large enough */ 
        
           	if(shp->topfd >= filemapsize) 
        
           	{ 
        
           		char 	*cp, *oldptr = (char*)filemap; 
        
           		char 	*oldend = (char*)&filemap[filemapsize]; 
        
           		long	moved; 
        
           		filemapsize += 8; 
        
           		if(!(filemap = (struct fdsave*)realloc(filemap,filemapsize*sizeof(struct fdsave)))) 
        
           			errormsg(SH_DICT,ERROR_exit(4),e_nospace); 
        
           		if(moved = (char*)filemap - oldptr) 
        
           		{ 
        
           			for(savefd=shp->gd->lim.open_max; --savefd>=0; ) 
        
           			{ 
        
           				cp = (char*)shp->fdptrs[savefd]; 
        
           				if(cp >= oldptr && cp < oldend) 
        
           					shp->fdptrs[savefd] = (int*)(cp+moved); 
        
           			} 
        
           		} 
        
           	} 
        
           #if SHOPT_DEVFD 
        
           	if(origfd <0) 
        
           	{ 
        
           		savefd = origfd; 
        
           		origfd = -origfd; 
        
           	} 
        
           	else 
        
           #endif /* SHOPT_DEVFD */ 
        
           	if(flag&IOPICKFD) 
        
           		savefd = -1; 
        
           	else 
        
           	{ 
        
           		if((savefd = sh_fcntl(origfd, F_DUPFD, 10)) < 0 && errno!=EBADF) 
        
           		{ 
        
           			shp->toomany=1; 
        
           			((struct checkpt*)shp->jmplist)->mode = SH_JMPERREXIT; 
        
           			errormsg(SH_DICT,ERROR_system(1),e_toomany); 
        
           		} 
        
           	} 
        
           	filemap[shp->topfd].tname = name; 
        
           	filemap[shp->topfd].subshell = (flag&IOSUBSHELL); 
        
           	filemap[shp->topfd].orig_fd = origfd; 
        
           	filemap[shp->topfd++].save_fd = savefd; 
        
           	if(savefd >=0) 
        
           	{ 
        
           		register Sfio_t* sp = shp->sftable[origfd]; 
        
           		/* make saved file close-on-exec */ 
        
           		sh_fcntl(savefd,F_SETFD,FD_CLOEXEC); 
        
           		if(origfd==job.fd) 
        
           			job.fd = savefd; 
        
           		shp->fdstatus[savefd] = shp->fdstatus[origfd]; 
        
           		shp->fdptrs[savefd] = &filemap[shp->topfd-1].save_fd; 
        
           		if(!(shp->sftable[savefd]=sp)) 
        
           			return; 
        
           		sfsync(sp); 
        
           		if(origfd <=2) 
        
           		{ 
        
           			/* copy standard stream to new stream */ 
        
           			sp = sfswap(sp,NIL(Sfio_t*)); 
        
           			shp->sftable[savefd] = sp; 
        
           		} 
        
           		else 
        
           			shp->sftable[origfd] = 0; 
        
           	} 
        
           }

The file descriptor saved by sh_iosave() is then supposed to be restored/closed during TFORK handling in sh_exec(), in the parent part of the fork, somewhere in this block:

ksh/src/cmd/ksh93/sh/xec.c

Lines 1582 to 1634 in ab5dedd

    
           			if(job.parent=parent) 
        
           			/* This is the parent branch of fork 
        
           			 * It may or may not wait for the child 
        
           			 */ 
        
           			{ 
        
           				if(pipejob==2) 
        
           				{ 
        
           					pipejob = 1; 
        
           					nlock--; 
        
           					job_unlock(); 
        
           				} 
        
           				if(type&FPCL) 
        
           					sh_close(shp->inpipe[0]); 
        
           				if(type&(FCOOP|FAMP)) 
        
           					shp->bckpid = parent; 
        
           				else if(!(type&(FAMP|FPOU))) 
        
           				{ 
        
           					if(!sh_isoption(SH_MONITOR)) 
        
           					{ 
        
           						if(!(shp->sigflag[SIGINT]&(SH_SIGFAULT|SH_SIGOFF))) 
        
           							sh_sigtrap(SIGINT); 
        
           						shp->trapnote |= SH_SIGIGNORE; 
        
           					} 
        
           					if(shp->pipepid) 
        
           						shp->pipepid = parent; 
        
           					else 
        
           						job_wait(parent); 
        
           					if(shp->topfd > topfd) 
        
           						sh_iorestore(shp,topfd,0); 
        
           					if(usepipe && tsetio &&  subdup && unpipe) 
        
           						iounpipe(shp); 
        
           					if(!sh_isoption(SH_MONITOR)) 
        
           					{ 
        
           						shp->trapnote &= ~SH_SIGIGNORE; 
        
           						if(shp->exitval == (SH_EXITSIG|SIGINT)) 
        
           							kill(getpid(),SIGINT); 
        
           					} 
        
           				} 
        
           				if(type&FAMP) 
        
           				{ 
        
           					if(sh_isstate(SH_PROFILE) || sh_isstate(SH_INTERACTIVE)) 
        
           					{ 
        
           						/* print job number */ 
        
           #ifdef JOBS 
        
           						sfprintf(sfstderr,"[%d]\t%d\n",jobid,parent); 
        
           #else 
        
           						sfprintf(sfstderr,"%d\n",parent); 
        
           #endif /* JOBS */ 
        
           					} 
        
           				} 
        
           				break; 
        
           			} 
        
           			else

On lines 1593-1594, we can see that the FPCL flag is handled with a call to sh_close(). This should cover "output"-type process substitution of the form >(…) (parser flags TFORK|FPIN|FAMP|FPCL). But for some reason it doesn't work: the file descriptor leak is also reproducible for output process substitutions. And I can see nothing here that handles closing file descriptors for input process substitutions (parser flags TFORK|FPOU). Yet, this bug is only reproducible if a process substitution is given as an argument (without redirections) to a shell function call. The bug does not occur if it is given as an argument to any other kind of simple command. So those file descriptors get closed somewhere/somehow, but I've no idea where or how.

And that's as far as I've got so far with tracing how all this works. Maybe someone else finds this information useful to continue the hunt for this bug.

JohnoKing · 2021-03-12T09:03:40Z

After tracing where ksh goes after sh_argprocsub by using strace and dummy getpgrp() calls, I've found the bug is probably occurring in sh_exec at this instance of sh_iorestore:

ksh/src/cmd/ksh93/sh/xec.c

Lines 1422 to 1424 in d4adc8f

    
           /* don't restore for 'exec' or 'redirect' in subshell */ 
        
           if((shp->topfd>topfd) && !(shp->subshell && (np==SYSEXEC || np==SYSREDIR))) 
        
           	sh_iorestore(shp,topfd,jmpval);

Normally for process substitutions sh_iorestore will close the remaining file descriptor, which in the strace output looks like this (when true is given a process substitution):

close(4)                                = 0
close(3)                                = 0
read(10, "", 65536)                     = 0
close(10)                               = 0

Alternatively, when using the cat builtin:

close(4)                                = 0
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=158, ...}, 0) = 0
openat(AT_FDCWD, "/dev/fd/3", O_RDONLY) = 4
newfstatat(4, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(4, "", 65536)                      = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16800, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 16800
wait4(-1, 0x7ffc4762e278, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
rt_sigaction(SIGCHLD, {sa_handler=0x56235c931668, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7f019a325f80}, {sa_handler=0x56235c931668, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f019a325f80}, 8) = 0
rt_sigreturn({mask=[]})                 = 0
close(4)                                = 0
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=158, ...}, 0) = 0
close(3)                                = 0

However, when a process substitution is passed to a function the close(3) disappears:

close(4)                                = 0
read(10, "", 65536)                     = 0
close(10)                               = 0

sh_iorestore is not being run in sh_exec because when a function is passed a process substitution argument shp->topfd and topfd will be the same value. This can be verified by inserting a debug call and running the reproducer:

diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index c116e341..991dad04 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -1420,6 +1420,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 					bp->ptr = (void*)save_ptr;
 					bp->data = (void*)save_data;
 					/* don't restore for 'exec' or 'redirect' in subshell */
+					error(ERROR_warn(0), "shp->topfd: %d;   topfd: %d", shp->topfd, topfd);
 					if((shp->topfd>topfd) && !(shp->subshell && (np==SYSEXEC || np==SYSREDIR)))
 						sh_iorestore(shp,topfd,jmpval);

When running the reproducer from #67 (comment):

$ arch/*/bin/ksh ./reproducer.ksh
./reproducer.ksh: line 6: warning: shp->topfd: 0;   topfd: 0
========
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 1
0  1  10  2  3
./reproducer.ksh: line 6: warning: shp->topfd: 1;   topfd: 1
========
./reproducer.ksh: line 7: warning: shp->topfd: 2;   topfd: 2
0  1  10  2  3	4
./reproducer.ksh: line 6: warning: shp->topfd: 2;   topfd: 2
========
./reproducer.ksh: line 7: warning: shp->topfd: 3;   topfd: 3
0  1  10  2  3	4  5
./reproducer.ksh: line 6: warning: shp->topfd: 3;   topfd: 3
========
./reproducer.ksh: line 7: warning: shp->topfd: 4;   topfd: 4
0  1  10  2  3	4  5  6
./reproducer.ksh: line 6: warning: shp->topfd: 4;   topfd: 4
========
./reproducer.ksh: line 7: warning: shp->topfd: 5;   topfd: 5
0  1  10  2  3	4  5  6  7
./reproducer.ksh: line 6: warning: shp->topfd: 5;   topfd: 5
========
./reproducer.ksh: line 7: warning: shp->topfd: 6;   topfd: 6
0  1  10  2  3	4  5  6  7  8
./reproducer.ksh: line 6: warning: shp->topfd: 6;   topfd: 6
========
./reproducer.ksh: line 7: warning: shp->topfd: 7;   topfd: 7
0  1  10  2  3	4  5  6  7  8  9
./reproducer.ksh: line 6: warning: shp->topfd: 7;   topfd: 7
========
./reproducer.ksh: line 7: warning: shp->topfd: 8;   topfd: 8
0  1  10  11  2  3  4  5  6  7	8  9
./reproducer.ksh: line 6: warning: shp->topfd: 8;   topfd: 8
========
./reproducer.ksh: line 7: warning: shp->topfd: 9;   topfd: 9
0  1  10  11  12  2  3	4  5  6  7  8  9

When fdUser is replaced by true (can also be the ksh cat for slightly different output) shp->topfd isn't always equal to topfd:

./reproducer.ksh: line 5: warning: shp->topfd: 0;   topfd: 0
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2
./reproducer.ksh: line 7: warning: shp->topfd: 1;   topfd: 0
./reproducer.ksh: line 8: warning: shp->topfd: 0;   topfd: 0
0  1  10  2

JohnoKing · 2021-03-12T09:33:53Z

Another minor note to add on: the first close(4) occurs in sh_argprocsub on this line:

ksh/src/cmd/ksh93/sh/args.c

Line 761 in d4adc8f

close(pv[1-fd]);

In ksh93v- that line was changed to use sh_close. It doesn't affect the file descriptor leak but it might serve useful in fixing future process substitution bugs:
https://github.com/att/ast/blob/2f2b1b8be315df029ce83c2ccc12a16fdcf73f29/src/cmd/ksh93/sh/args.c#L1071

On systems where ksh needs to use the older and less secure FIFO method for process substitutions (which is currently all of them as the more modern and solid /dev/fd method is still broken, see #67), process substitutions could leave background processes hanging in these two scenarios: 1. If the parent process exits without opening a pipe to the child process forked by the process substitution. The fifo_check() function in xec.c, which is periodically called to check if the parent process still exists while waiting for it to open the FIFO, verified the parent process's existence by checking if the PPID had reverted to 1, the traditional PID of init. However, POSIX specifies that the PPID can revert to any implementation- defined system process in that case. So this breaks on certain systems, causing unused process substitutions to hang around forever as they never detect that the parent disappeared. The fix is to save the current PID before forking and having the child check if the PPID has changed from that saved PID. 2. If command invoked from the main shell is passed a process substitution, but terminates without opening the pipe to the process substitution. In that case, the parent process never disappears in the first place, because the parent process is the main shell. So the same infinite wait occurs in unused process substitutions, even after correcting problem 1. The fix is to remember all FIFOs created for any number of process substitutions passed to a single command, and unlink any remaining FIFOs as they represent unused command substitutions. Unlinking them FIFOs causes sh_open() in the child to fail with ENOENT on the next periodic check, which can easily be handled. Fixing these problems causes the FIFO method to act identically to the /dev/fd method, which is good for compatibility. Even when #67 is fixed this will still be important, as ksh also runs on systems that do not have /dev/fd (such as AIX, HP-UX, and QNX), so will fall back to using FIFOs. --- Fix problem 1 --- src/cmd/ksh93/sh/xec.c: - Add new static fifo_save_ppid variable. - sh_exec(): If a FIFO is defined, save the current PID in fifo_save_ppid for the forked child to use. - fifo_check(): Compare PPID against the saved value instead of 1. --- Fix problem 2 --- To keep things simple I'm abusing the name-value pair routines used for variables for this purpose. The overhead is negligible. A more elegant solution is possible but would involve adding more code. src/cmd/ksh93/include/defs.h: _SH_PRIVATE: - Define new sh.fifo_tree pointer to a new FIFO cleanup tree. src/cmd/ksh93/sh/args.c: sh_argprocsubs(): - After launching a process substitution in the background, add the FIFO to the cleanup list before freeing it. src/cmd/ksh93/sh/xec.c: - Add fifo_cleanup() that unlinks all FIFOs in the cleanup list and clears/closes the list. They should only still exist if the command never used them, however, just run 'unlink' and don't check for existence first as that would only add overhead. - sh_exec(): * Call fifo_cleanup() on finishing all simple commands (when setting $?) or when a special builtin fails. * When forking, clear/close the cleanup list; we do not want children doing duplicate cleanup, particularly as this can interfere when using multiple process substitutions in one command. * Process substitution handling: > Change FIFO check frequency from 500ms to 50ms. Note that each check sends a signal that interrupts open(2), causing sh_open() to reinvoke it. This causes sh_open() to fail with ENOENT on the next check when the FIFO no longer exists, so we do not need to add an additional check for existence to fifo_check(). Unused process substitutions now linger for a maximum of 50ms. > Do not issue an error message if errno == ENOENT. - sh_funct(): Process substitutions can be passed to functions as well, and we do not want commands within the function to clean up the FIFOs for the process substitutions passed to it from the outside. The problem is solved by simply saving fifo_tree in a local variable, setting it to null before running the function, and cleaning it up before restoring the parent one at the end. Since sh_funct() is called recursively for multiple-level function calls, this correctly gives each function a locally scoped fifo_tree. --- Tests --- src/cmd/ksh93/tests/io.sh: - Add tests covering the failing scenarios. Co-authored-by: Martijn Dekker <martijn@inlv.org>

This commit fixes a long-standing bug that caused a file descriptor leak when passing a process substitution to a function. The leak only occured when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected. src/cmd/ksh93/sh/xec.c: - When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function. src/cmd/ksh93/include/defs.h: - Since the file descriptor leak is now fixed, remove the workaround that forced ksh to use the FIFO method. Fixes: ksh93#67

This commit fixes a long-standing bug that caused a file descriptor leak when passing a process substitution to a function. The leak only occured when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected. src/cmd/ksh93/sh/xec.c: - When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function. - This fix alone isn't enough, as a file descriptor leak could still occur if 'command' was given a function as an argument, then passed a process substitution. Add another sh_iorestore in this edge case to fix the second file descriptor leak. src/cmd/ksh93/include/defs.h: - Since the file descriptor leak is now fixed, remove the workaround that forced ksh to use the FIFO method. Fixes: ksh93#67

This commit fixes a long-standing bug (present since at least ksh93r) that caused a file descriptor leak when passing a process substitution to a function. The leak only occured when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected. src/cmd/ksh93/sh/xec.c: - When a process substitution is passed to a builtin, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function. - This fix alone isn't enough, as a file descriptor leak could still occur if 'command' was given a function as an argument, then passed a process substitution. Add another sh_iorestore in this edge case to fix the second file descriptor leak. src/cmd/ksh93/include/defs.h: - Since the file descriptor leaks are now fixed, remove the workaround that forced ksh to use the FIFO method. Fixes: ksh93#67

McDutchie · 2021-03-13T11:47:59Z

Link for posterity: work continues in #218.

This commit fixes a long-standing bug (present since at least ksh93r) that caused a file descriptor leak when passing a process substitution to a function, or (if compiled with SHOPT_SPAWN) to a nonexistent command. The leaks only occurred when ksh was compiled with SHOPT_DEVFD; the FIFO method was unaffected. src/cmd/ksh93/sh/xec.c: sh_exec(): - When a process substitution is passed to a built-in, the remaining file descriptor is closed with sh_iorestore. Do the same thing when passing a process substitution to a function. This is done by delaying the sh_iorestore() call to 'setexit:' where both built-ins and functions terminate and set the exit status ($?). This means that call now will not be executed if a longjmp is done, e.g. due to an error in a special built-in. However, there is already another sh_iorestore() call in main.c, exfile(), line 418, that handles that scenario. - sh_ntfork() can fail, so rather than assume it will succeed, handle a failure by closing extra file descriptors with sh_iorestore(). This fixes the leak on command not found with SHOPT_SPAWN. src/cmd/ksh93/include/defs.h: - Since the file descriptor leaks are now fixed, remove the workaround that forced ksh to use the FIFO method. src/cmd/ksh93/SHOPT.sh: - Add SHOPT_DEVFD as a configurable option (default: probe). src/cmd/ksh93/tests/io.sh: - Add a regression test for the 'not found' file descriptor leak. - Add a test to ensure it keeps working with 'command'. Fixes: #67

McDutchie added the bug Something is not working label Jul 9, 2020

McDutchie changed the title ~~Process substitution leaks file descriptor when passed to function as argument~~ With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument Sep 12, 2020

McDutchie mentioned this issue Mar 10, 2021

bin/shtests io leaves processes running #213

Closed

JohnoKing mentioned this issue Mar 13, 2021

Fix two SHOPT_DEVFD process substitution file descriptor leaks #218

Merged

McDutchie added a commit to JohnoKing/ksh that referenced this issue Mar 13, 2021

args.c: call sh_close() per ksh93#67 (comment)

c89f3f1

McDutchie closed this as completed in #218 Mar 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67

With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67

McDutchie commented Jul 9, 2020 •

edited

Loading

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

kdudka commented Jul 9, 2020

posguy99 commented Jul 9, 2020

McDutchie commented Jul 9, 2020

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

McDutchie commented Jul 9, 2020

McDutchie commented Jul 9, 2020

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

McDutchie commented Jul 21, 2020

McDutchie commented Sep 12, 2020 •

edited

Loading

JohnoKing commented Mar 12, 2021 •

edited

Loading

JohnoKing commented Mar 12, 2021 •

edited

Loading

McDutchie commented Mar 13, 2021

With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67

With SHOPT_DEVFD, process substitution leaks file descriptor when passed to function as argument #67

Comments

McDutchie commented Jul 9, 2020 • edited Loading

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

kdudka commented Jul 9, 2020

posguy99 commented Jul 9, 2020

McDutchie commented Jul 9, 2020

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

McDutchie commented Jul 9, 2020

McDutchie commented Jul 9, 2020

posguy99 commented Jul 9, 2020

kdudka commented Jul 9, 2020

McDutchie commented Jul 21, 2020

McDutchie commented Sep 12, 2020 • edited Loading

JohnoKing commented Mar 12, 2021 • edited Loading

JohnoKing commented Mar 12, 2021 • edited Loading

McDutchie commented Mar 13, 2021

McDutchie commented Jul 9, 2020 •

edited

Loading

McDutchie commented Sep 12, 2020 •

edited

Loading

JohnoKing commented Mar 12, 2021 •

edited

Loading

JohnoKing commented Mar 12, 2021 •

edited

Loading