segfault with read -rd $'\200' in multibyte locales #590
Stack trace:
The root cause of the crash is the mbchar() call here: ksh/src/cmd/ksh93/bltins/read.c Lines 101 to 108 in 9c65871
The rest of the delimiter handling code has no knowledge of multibyte characters at all. The mbchar() call parses a multibyte character, yielding -1 for the

The mbchar() change was introduced in 2006-12-22a ksh93s (as per the ksh93-history repo). It's the umpteenth example of a half-assed change they simply seem to have forgotten about. Until

diff --git a/src/cmd/ksh93/bltins/read.c b/src/cmd/ksh93/bltins/read.c
index 6cdb317d9..08bf52e0d 100644
--- a/src/cmd/ksh93/bltins/read.c
+++ b/src/cmd/ksh93/bltins/read.c
@@ -101,9 +101,8 @@ int b_read(int argc,char *argv[], Shbltin_t *context)
     case 'd':
         if(opt_info.arg && *opt_info.arg!='\n')
         {
-            char *cp = opt_info.arg;
             flags &= ((1<<D_FLAG+1)-1);
-            flags |= (mbchar(cp)<<D_FLAG+1) | (1<<D_FLAG);
+            flags |= ((*(unsigned char*)opt_info.arg)<<D_FLAG+1) | (1<<D_FLAG);
         }
         break;
     case 'p':
@stephane-chazelas wrote:

> % locale charmap
> UTF-8
> % ksh93u+m
> $ read -rd $'\200'
> zsh: segmentation fault ksh93u+m
>
> Same in locales using GB18030, OK in single-byte locales.
>
> 0x00005555555defbd in sh_readline (names=0x5555556f8cd0, fd=0,
> flags=-255, size=0, timeout=0) at src/cmd/ksh93/bltins/read.c:312
> 312 sh.ifstable[delim] = S_NL;
> (gdb) p delim
> $4 = 8388607

So the problem is that delim, the delimiter character, gets a pathological value. This occurs here in read.c:

101: case 'd':
102:     if(opt_info.arg && *opt_info.arg!='\n')
103:     {
104:         char *cp = opt_info.arg;
105:         flags &= ((1<<D_FLAG+1)-1);
106:         flags |= (mbchar(cp)<<D_FLAG+1) | (1<<D_FLAG);
107:     }
108:     break;

The culprit is the mbchar() macro expansion on line 106. When an invalid multibyte character is passed, mbchar() yields -1, which wrecks the bit shifts that pack the delimiter together with some bit flags into the 'flags' variable that is passed on to sh_readline().

But multibyte character delimiters have never worked on ksh93, so that mbchar() call can be removed for now. It was introduced in 2006-12-22a ksh93s (as per the ksh93-history repo), presumably with a view to properly supporting multibyte delimiters, but so far that has not happened.

<future>At some point, we'll want to change the design to store the delimiter in its own variable instead of awkwardly unifying it with flags to save a couple of bytes. At that point it should not be too hard to update the code to support multibyte delimiters.</future>

src/cmd/ksh93/bltins/read.c: b_read(): case 'd':
- Instead of using mbchar(), directly read the character, with a typecast to unsigned char* to avoid negative values being used for the bit shift operations.

Resolves: #590
Same in locales using GB18030, OK in single-byte locales.
In GB18030, if I use a delimiter < 0x80, I find that such a byte value, when it occurs inside a character, is not taken as the delimiter, which is fine, but the fact that the delimiter is not found is not reflected in the exit status (which may or may not be a separate bug):
(see also the extra garbage at the end).