Make command substitution split on NUL #3164
Comments
Your proposal is simple and the change in behavior is very unlikely to cause any problems in practice, partly because the current behavior, as you noted, is to discard anything after the first null byte. Also, how likely is it that anyone currently using an unset IFS for its side effect of not splitting on newlines also expects it to keep embedded nulls, given that doesn't work? I like it. |
IFS is legacy. I would rather see this implemented by piping to |
So you'd say this is a bug 😄:
|
Yes, but one we cannot fix as long as our builtins take their arguments as null terminated strings. |
I wonder if implementing the proposed IFS behavior first might be easier than changing the builtins to operate on wcstring objects rather than c-style strings. I doubt anyone working on the project disagrees that the latter should be done sooner rather than later. But the question is whether we can implement the IFS solution with very little work versus the presumably non-trivial work to switch from char * to wcstring. |
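To make the `char *` versus length-carrying string distinction concrete, here is a small sketch in Python using `ctypes`. It is only an illustration of the general C-string problem, not fish's code; the names `payload`, `c_style_view`, and `length_aware_view` are made up for this example.

```python
import ctypes

# Bytes as a command might emit them: two pieces separated by a NUL.
payload = b"abc\0def"

# A C buffer holding the payload (ctypes appends one terminating NUL).
buf = ctypes.create_string_buffer(payload)

# Reading it the way a NUL-terminated C string is read stops at the first NUL.
c_style_view = buf.value

# Reading it with an explicit length keeps everything, including the NUL.
length_aware_view = buf.raw[:len(payload)]

print(c_style_view)        # b'abc'
print(length_aware_view)   # b'abc\x00def'
```

Any interface that passes only the pointer sees the first view, which is why text after the NUL disappears.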
Yeah, the best way to fix this long-term is making On a side note, I don't like handling builtin arguments differently because it would break consistency. I would prefer |
What's inconsistent here? Yes, it's not like |
A non-builtin has no way to accept a \0 in its arguments, why should builtins have a different calling convention (or so to speak)? |
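The limitation on external commands is easy to observe from any language that spawns processes. The sketch below uses Python's `subprocess` module purely as an illustration; `can_pass_argument` is a hypothetical helper written for this example, and the behavior shown is CPython's, not fish's.

```python
import subprocess
import sys


def can_pass_argument(arg: str) -> bool:
    """Return True if `arg` can be handed to a child process as one argv entry."""
    try:
        # The child is a Python interpreter that ignores its extra argument.
        subprocess.run([sys.executable, "-c", "pass", arg], check=True)
    except ValueError:
        # CPython raises ValueError for an argument with an embedded NUL,
        # because argv entries are NUL-terminated strings.
        return False
    return True


print(can_pass_argument("a\nb"))  # a newline is fine inside an argument
print(can_pass_argument("a\0b"))  # an embedded NUL is rejected
```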
I did the change locally, and I think it's more confusing than it's worth. With this change, What would be a reasonable output for |
0 of course, like it already does without messing with IFS. |
I can see an argument for IFS splitting on NUL if you set IFS to '\0'. But if IFS is set to empty, it shouldn't split at all. I also think IFS is sort of legacy and we don't want to encourage its use. One approach to fix this today is to introduce a |
See my comment on PR #3174 but in short I've changed my mind and would prefer we not overload IFS with yet another behavior. |
Specifically, \0 is indistinguishable from the empty string:
The question is how ok it is to let |
Seems like a bad idea to me for much the same reasons as overloading IFS. We should either allow specifying |
I want a way to express I think if the following was default command substitution behavior, it might cover both this and #159:
For some reason I long thought that's the default behavior of fish but I see that's not the case. I'm not sure how exactly this would interact with modifying IFS (and frankly I don't care :). Why not
IIRC preventing normal splitting was part of the "superpowers" that have been discussed. Plus less to write. The best current solution, as mentioned in OP, is repeated
This could be simplified by adding some |
That's quite a clever idea. One possible issue here is that, if there is only one "thing" to print and the program does not print a trailing NUL (which means no NULs will show up in the output), this would still split on newlines. I mean I just checked that my find (from GNU findutils) prints one, but what about other implementations? Or other things? At that point we're back to |
Good point, but all the nul-outputting commands I've ever seen use them as \n replacement, as terminators not separators. I'm very not concerned about that in practice. Another use for appending \0 would then be #159, assuming input contains no \0:
See, with this you can implement any splitting modes using regular commands! It does return array instead of single string if input contained \0. |
Note: For
For variables, I think it'd be technically possible for fish to allow them inside fish, but once exported they'd be lost (since the length information would be gone). As a commandline argument, however, this is a lost cause. Those are sent as c-style strings without any additional length information (the
I've never really found the "terminator"/"separator" distinction clear enough. Also someone once tried to defend windows' notepad.exe behavior regarding \n with that, so I'm kinda lukewarm on it. Anyway.... what I'd like to see here is some data. Have you tested |
I don't have a mac, can't say anything about that. I can try to look into line-oriented languages like awk/sed/perl -p configured for NUL record separators. |
Currently, there is no easy way to use e.g. `find -print0` in command substitutions. What this does is check if a NUL is in the comsub output and then split on that instead of newline, which makes `something (find -print0)` just work. The one fly in the ointment is if there is a command that uses NULs to separate its output, but doesn't print a trailing NUL if there is just one "thing" to print. That would cause us to fall back to splitting on newlines, which might do the wrong thing. So far, everything I've tested (`find`, `git config -z`) seemed to print a trailing NUL. This implements a proposal by @cben. Fixes fish-shell#3164.
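The rule described in this commit message can be modeled in a few lines. This is a rough Python sketch of the idea only, not fish's implementation; `split_comsub_output` is a hypothetical name, and dropping the final empty piece is an assumption about how a trailing terminator is treated.

```python
from typing import List


def split_comsub_output(output: bytes) -> List[bytes]:
    """Split on NUL if the output contains one, otherwise on newline."""
    separator = b"\0" if b"\0" in output else b"\n"
    parts = output.split(separator)
    # Assumption: a trailing terminator does not produce an extra empty element.
    if parts[-1] == b"":
        parts.pop()
    return parts


# NUL-terminated output: a newline inside a name survives.
print(split_comsub_output(b"a\nb\0c\0"))    # [b'a\nb', b'c']

# No NUL anywhere: ordinary newline splitting.
print(split_comsub_output(b"one\ntwo\n"))   # [b'one', b'two']

# The caveat: one name containing a newline, printed without a trailing NUL,
# falls back to newline splitting and is broken in two.
print(split_comsub_output(b"odd\nname"))    # [b'odd', b'name']
```

The last call shows why it matters that tools emit the NUL as a terminator even when there is only one item.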
@cben: I've just implemented your proposal - which turned out to be quite easy. So let's test it a bit! |
Incomplete list of tools to check (from man -K NUL): I'm listing things that take NUL on input, but won't test most of them.
✅ So far, everything I checked emits a NUL on the last output line. P.S. posix discussing |
Yes. That's one of the larger failings of POSIX, I'd argue, and a major bummer. There are a bunch of tools in there that use NUL delimiters in their input, which is irrelevant here. For instance fish's I've taken the liberty of marking the tools I've checked. The one that is going to be weird is GNU's
So you would have to split on both, or you'd get the Description of the first file mixed in with the name of the second file.
I don't think there's anything we can do about that, and I wouldn't assume that many people actually parse the output of I think the picture is clear: We haven't found a utility that breaks the assumption that NUL is used as "the terminator", i.e. that it is printed even for one thing. What helps us here is the weirdness of bash's
That means that tools that want to work with bash here need to print a NUL, so we got a bunch of testing for free. |
There is another risk:
-> [ I don't know how to quantify this risk. |
At that point, something is already wrong. There is no way to read that correctly, because it is incorrect.
Well, both are wrong. But I'd argue that the current behavior is worse, because it's dropping text. The new behavior doesn't drop anything but the NUL. It's also worth noting that bash loudly drops NULs. When you do And that means two things: One, the proposed behavior for fish results in the same string if you And two: The only reason I know about bash's behavior is that I explicitly tested it. I've never seen that message before, because anecdotally at least, it just does not happen that "NUL sneaked in".
How exactly? I mean sure I can do zsh's behavior is also of interest here. The same So there's no indication that anything is wrong, and there isn't anything wrong, until you pass it to an external command. It's possible there are options to configure this, I haven't checked. mksh behaves like bash without the error. |
This proposal seems to suggest that we split on If we are concerned about backwards compatibility, then we can add an implicit The advantage of this is that we can always losslessly store the result of a command to a variable, e.g.
Unfortunately, this requires that a trailing '\0' be treated as a separate empty string. An additional benefit of my proposal is that you can now use |
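The round-trip property behind this comment can be sketched as follows. This is a Python model under the assumption that the split keeps every piece, including the empty one after a trailing NUL; `split_lossless` and `join_lossless` are hypothetical names, and nothing here is fish code.

```python
from typing import List


def split_lossless(output: bytes) -> List[bytes]:
    """Split on NUL, keeping the empty element that follows a trailing NUL."""
    return output.split(b"\0")


def join_lossless(parts: List[bytes]) -> bytes:
    """Inverse of split_lossless: rejoin the elements with NUL."""
    return b"\0".join(parts)


with_terminator = b"a\0b\0"
without_terminator = b"a\0b"

print(split_lossless(with_terminator))      # [b'a', b'b', b'']
print(split_lossless(without_terminator))   # [b'a', b'b']

# Because the trailing empty element is kept, both inputs round-trip exactly.
print(join_lossless(split_lossless(with_terminator)) == with_terminator)
print(join_lossless(split_lossless(without_terminator)) == without_terminator)
```

If the trailing empty element were dropped instead, both inputs would produce the same list and the original bytes could no longer be recovered.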
84b7c2b adds support for |
Currently, if the output from command substitution contains a null character, there is no way to actually get the characters after the null. Which kinda makes sense, since both argv and environment variables are null-terminated.
This makes it impossible to use `find -print0` inside command substitution, so you have to use it without `-print0` (as mentioned in the docs). This is slightly wrong in the sense that filenames with newlines (yuck, but it's sadly allowed) will not be handled correctly.
I suggest making fish split command substitution output on null characters when IFS is empty. The current behavior is silently discarding everything after the null. If someone really wants this, it can be done with `(...)[1]`. I'm not sure if this null-splitting should also be done in addition to \n when IFS is non-empty.
Reproduction Steps:
Expected behavior:
Additional information:
This is possibly related to #159.
A workaround would be using a loop and `read --null` (#1694). But it's cumbersome and I love fish's simplicity. If I wanted to write awkward loops to handle filenames with newlines, I would still be using bash.
Fish version: fish, version 2.3.0
Operating system: Cygwin, Fedora 23
Terminal or terminal emulator: Windows's cmd.exe window, Konsole