Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.sh.tilde discipline is too brittle #346

Closed
McDutchie opened this issue Nov 24, 2021 · 7 comments · Fixed by #356
Closed

.sh.tilde discipline is too brittle #346

McDutchie opened this issue Nov 24, 2021 · 7 comments · Fixed by #356
Labels
bug Something is not working

Comments

@McDutchie
Copy link

Defining a .sh.tilde.get or .sh.tilde.set discipline function to extend tilde expansion works well as long as the discipline function doesn't get interrupted (e.g. with Crtl+C) or produce an error message. Either of those will cause the shell to become unstable and crash.

To reproduce: To trigger an interrupt, adding 'sleep 1' to the discipline function makes it easy to press Ctrl+C. To trigger an error, give an invalid option to a special built-in, e.g. trap -Q.

I have experimented with two different techniques to fix this, with no success at all so far. sh_pushcontext is getting me nowhere and running the command via sh_trap doesn't seem to help either.

A workaround for the interruption crash is to make it uninterruptible with sigblock(SIGINT) in tilde_expand2() before running the discipline and sigrelease(SIGINT) after. But that does nothing to fix the crash on error.

This feature should probably be removed from the 1.0 branch as it is not ready for prime time. It can return to a release branch if/when we manage to fix it on the master branch.

@McDutchie McDutchie added the bug Something is not working label Nov 24, 2021
McDutchie added a commit that referenced this issue Nov 24, 2021
Defining a .sh.tilde.get or .sh.tilde.set discipline function to
extend tilde expansion works well as long as the discipline
function doesn't get interrupted (e.g. with Crtl+C) or produce an
error message. Either of those will cause the shell to become
unstable and crash.

This feature is now removed from the 1.0 branch as it is not ready
for prime time. It can return to a release branch if/when we manage
to fix it on the master branch.

Related: #346
@JohnoKing
Copy link

JohnoKing commented Nov 25, 2021

Perhaps setjmp can be used to fix the error handling segfault. Solaris patch 330-30841535.patch claims to fix a coredump caused by improper error handling. Repurposing the idea from that patch for .sh.tilde seems to fix the memory faults from inserting trap -Q in the discipline function (although it does nothing for SIGINT). Patch for testing (causes no regression test fails on my end):

--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2638,6 +2638,7 @@ static void tilde_expand2(Shell_t *shp, register int offset)
 	{
 		stakfreeze(1);				/* terminate current stack object to avoid data corruption */
 		block++;
+		setjmp(*shp->jmplist);
 		nv_putval(SH_TILDENOD, &stakp[offset], 0);
 		cp = nv_getval(SH_TILDENOD);
 		block--;
# Results after applying the patch above
$ arch/*/bin/ksh
$ echo ~
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/home/johno
ksh $ echo ~ # Press tab
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/home/johno 
/home/johno
$ echo ~ksh
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
~ksh

Edit: See also #347 (comment), which demonstrates this bug likely applies to all discipline functions.

@JohnoKing
Copy link

JohnoKing commented Nov 25, 2021

As expected, the discipline function for PS1 is also affected:

$ PS1.get() { trap -Q; }
/usr/bin/ksh: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/usr/bin/ksh: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1

# Freezes

Also as expected, setjmp avoids that lockup (although setjmp before PS1.get is handled causes a segfault when the shell receives SIGINT).

@JohnoKing
Copy link

JohnoKing commented Nov 29, 2021

After yet more experimentation I've come up with this patch (edit: patch updated to avoid using shp):

--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -283,11 +283,16 @@ static void	assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
 		nq =  vp->disc[type=UNASSIGN];
 	if(nq && !isblocked(bp,type))
 	{
-		int bflag=0, savexit=sh.savexit;
+		int bflag=0, savexit=sh.savexit, jmpval=0;
+		struct checkpt buff;
 		block(bp,type);
 		if (type==APPEND && (bflag= !isblocked(bp,LOOKUPS)))
 			block(bp,LOOKUPS);
-		sh_fun(nq,np,(char**)0);
+		sh_pushcontext(&sh,&buff,1);
+		jmpval = sigsetjmp(buff.buff,0);
+		if(!jmpval)
+			sh_fun(nq,np,(char**)0);
+		sh_popcontext(&sh,&buff);
 		unblock(bp,type);
 		if(bflag)
 			unblock(bp,LOOKUPS);
@@ -376,7 +381,8 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 	union Value		*up = np->nvalue.up;
 	if(nq && !isblocked(bp,type))
 	{
-		int		savexit = sh.savexit;
+		int		savexit = sh.savexit, jmpval = 0;
+		struct checkpt	buff;
 		node = *SH_VALNOD;
 		if(!nv_isnull(SH_VALNOD))
 		{
@@ -389,7 +395,11 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 			nv_setsize(SH_VALNOD,10);
 		}
 		block(bp,type);
-		sh_fun(nq,np,(char**)0);
+		sh_pushcontext(&sh,&buff,1);
+		jmpval = sigsetjmp(buff.buff,0);
+		if(!jmpval)
+			sh_fun(nq,np,(char**)0);
+		sh_popcontext(&sh,&buff);
 		unblock(bp,type);
 		if(!vp->disc[type])
 			chktfree(np,vp);

It makes ksh do a longjmp back to the correct context after a discipline function exits with an error. The .sh.tilde reproducer now works correctly. Despite that the PS2.get discipline function can still crash, although it now crashes in the dcl_dehacktivate function added in commit 5d35321:

$ test \
/home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh[138]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
> true

Program received signal SIGABRT, Aborted.
0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7c95862 in abort () from /usr/lib/libc.so.6
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
#3  0x00005555555b2ebb in sh_cmd (lexp=0x5555556fdba0, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:591
#4  0x00005555555b2950 in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:440
#5  0x00005555555684f0 in exfile (shp=0x2, iop=0x7fffffffdd40, fno=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:571
#6  0x0000555555567aee in sh_main (ac=1, av=0x7fffffffe468, userinit=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:361
#7  0x0000555555566e1e in main (argc=1, argv=0x7fffffffe468) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:46
(gdb) frame 2
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
278			abort();
(gdb) frame 3
#3  0x00005555555b2ebb in sh_cmd (lexp=0x5555556fdba0, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:591
591		dcl_dehacktivate();
(gdb) frame 4
#4  0x00005555555b2950 in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:440
440		t = sh_cmd(lexp,(flag&SH_EOF)?EOFSYM:'\n',SH_SEMI|SH_EMPTY|(flag&SH_NL))

@McDutchie
Copy link
Author

Does this fix the crash in dcl_dehacktivate()?

--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -284,6 +284,7 @@ static void dcl_dehacktivate(void)
 }
 static noreturn void dcl_exit(int e)
 {
+	dcl_recursion = 1;
 	dcl_dehacktivate();
 	(*error_info.exit)(e);
 	UNREACHABLE();

@McDutchie
Copy link
Author

BTW, side note: let's not bother with introducing more pointless shp pointers as I'm going to remove them all at some point anyway. No matter what overly clever thing AT&T might have been planning, we're never going to have multiple shell states at the same time. Where something demands to be passed a pointer to the shell state, you can use &sh instead. (Until recently that threw a syntax error in the sh_pushcontext and sh_popcontext macros but I fixed that in 802e8db.)

@JohnoKing
Copy link

Does this fix the crash in dcl_dehacktivate()?

Nope, it still crashes.

Program received signal SIGABRT, Aborted.
0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
(gdb) p dcl_recursion
$1 = 0
(gdb) p dcl_tree
$2 = (Dt_t *) 0x55555571edc0
(gdb) bt
#0  0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7c95862 in abort () from /usr/lib/libc.so.6
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
#3  0x00005555555b2ec5 in sh_cmd (lexp=0x5555556fdb40, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:592
#4  0x00005555555b295a in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:441
#5  0x00005555555684f0 in exfile (shp=0x2, iop=0x7fffffffdda0, fno=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:571
#6  0x0000555555567aee in sh_main (ac=1, av=0x7fffffffe4c8, userinit=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:361
#7  0x0000555555566e1e in main (argc=1, argv=0x7fffffffe4c8) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:46

BTW, side note: let's not bother with introducing more pointless shp pointers as I'm going to remove them all at some point anyway.

I've updated the patch to avoid using shp pointers.

@McDutchie
Copy link
Author

McDutchie commented Nov 29, 2021

I think I've made sense of why that crash occurs. The PS2 discipline is run while trying to parse a command. So it's another context where arbitrary code might be executed at parse time, which messes up that hack. Aaaargh. Apparently at AT&T they'd never heard about the idea of having a clean separation between distinct subsystems! Anyway, the backtrace confirms that hypothesis, as it includes sh_parse(). It'll need a fix similar to this one.

edit: But that'll have to be for later, I simply don't have time for that right now. I've reverted that commit until it can be fixed.

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Dec 1, 2021
This patch fixes the crashes experienced when a discipline function
exited because of a signal or an error from a special builtin. The
crashes were caused by ksh entering an inconsistent state after
performing a longjmp away from the assign() and lookup() functions in
nvdisc.c. Fixing the crash requires entering a new context, then setting
a nonlocal goto with sigsetjmp(3). Any longjmps that happen while
running the discipline function will go back to assign/lookup, allowing
ksh to do a proper cleanup afterwards.

Resolves: ksh93#346
McDutchie pushed a commit that referenced this issue Dec 2, 2021
…356)

This patch fixes the crashes experienced when a discipline function
exited because of a signal or an error from a special builtin. The
crashes were caused by ksh entering an inconsistent state after
performing a longjmp away from the assign() and lookup() functions in
nvdisc.c. Fixing the crash requires entering a new context, then setting
a nonlocal goto with sigsetjmp(3). Any longjmps that happen while
running the discipline function will go back to assign/lookup, allowing
ksh to do a proper cleanup afterwards.

Resolves: #346
McDutchie pushed a commit that referenced this issue Dec 5, 2021
…356)

This patch fixes the crashes experienced when a discipline function
exited because of a signal or an error from a special builtin. The
crashes were caused by ksh entering an inconsistent state after
performing a longjmp away from the assign() and lookup() functions in
nvdisc.c. Fixing the crash requires entering a new context, then setting
a nonlocal goto with sigsetjmp(3). Any longjmps that happen while
running the discipline function will go back to assign/lookup, allowing
ksh to do a proper cleanup afterwards.

Resolves: #346
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants