special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution like zsh "tied" variables #436
Comments
Is the concern that there's no way to represent \x1e? Or are you thinking about architectural improvements? I think the array separator is mainly used in persisting arrays in places that only take strings, like universal variables or environment variables. |
On 2012-12-12 at 2:09 AM, "ridiculousfish" notifications@github.com wrote:
I'd say I have both concerns in mind. For the former, just think about a
If persisting means "serialized", no. Arrays are always stored in
|
How about using a private use area character as separator? Fish already uses some of those in some cases. |
@JanKanis That's no better; it's perfectly possible in filenames (considering filesystems using non-utf8, native encodings) and other strings. I would say that among others, |
Fish already handles private use characters and invalid bytes when it encodes external strings to wchars. These special values are encoded byte by byte into a specific set of private use chars that fish also decodes again on output, so in principle using another private use char could work. However I agree using true arrays is much better. There is one complication in that communication between fish and fishd happens over a socket using utf8 strings, and there fish uses (I think) the escape sequence "\x1e" (rather than an 0x1e byte) to separate array items. But that could probably be solved by using e.g. a private unused escape sequence. |
I share xiaq's concerns, but (for the practical ones) I find the implicit splitting on \n way more offensive:
The (newline delimited) array interpretation of subprocess output should be explicit & optional! But that has probably nothing to do with the underlying storage of arrays… |
@xiaq: What's wrong with using |
You can't use |
I'm migrating from zsh where I use this sequence to define the $LESS env var in a sane fashion by leveraging its "tied" variables feature:
I've omitted the complete list of options for brevity. In zsh that results in the $LESS env var being a space separated list of the options in the $less array. The equivalent in fish results in the elements being separated by the record separator (\x1e) character, despite the documentation saying that the elements of the array will be separated by spaces (modulo the special arrays such as PATH). I have to explicitly do an assignment that interpolates the values into a single string to get the expected result:
At the moment I don't really care if \x1e is used internally for serializing arrays rather than \x00. I do care that exported arrays have their elements separated by \x1e. That's just broken, wrong, fubar. Pick your adjective. It's also inconsistent with the aforementioned workaround and documented behavior. This issue should be tagged as a bug IMHO. P.S., Nowhere in the documentation is the use of the record separator character (\x1e) mentioned. Which is another problem. |
@krader1961 Thanks for sharing this. There's no standard Unix convention for list-like environment variables - some are colon delimited, others are space delimited. fish uses \x1e so it can distinguish its own arrays. Can you please point us at the erroneous documentation? How do you think arrays should be exported - colons, spaces, newlines, something else? Should fish tokenize environment variables on this character as well? It looks like less expects space-delimited arguments. Probably the simplest workaround is |
There's no standard for list like environment variables because by definition they're an arbitrary sequence of bytes composed of a key and value separated by an equal sign and terminated by a null byte. They don't even have to be printable characters. The only widely accepted convention for a higher level of abstraction is the one established by the execlp() function for the PATH env var. The documentation is erroneous in as far as it makes no mention of using \x1E, \036, 30, or the "record separator" character to separate elements of an array when exporting a var with more than one element. The documentation does state that
That's from the section "Variable expansion" in http://fishshell.com/docs/current/index.html. It's reasonable to infer that statement also applies to exported vars that are not special-cased as documented in the "Arrays" and "Special variables" sections of that same document. It's my feeling that fish should not automatically tokenize env vars into a list outside of the colon-delimited special case vars such as PATH. There should, however, be a robust means by which a user can tokenize a var into an array on an arbitrary character. Absent a mechanism for configuring the character to be used on a var by var basis (ala the zsh "typeset -T" command) a space should be used when concatenating the elements of the array (again, excluding the colon separated special-case vars). Obviously this does not apply to private data stores such as the storage of universal variables. Lastly, I couldn't find any uses in the standard fish functions where an env var is used to pass an array containing more than one element to another function or script. Such use cases may exist but it should require the scripts to explicitly cooperate in the serialization/deserialization of the data rather than rely on fish to implicitly reconstruct arrays from vars whose strings contain the record separator character. |
Thanks for your thoughtful response. The section you quoted about concatenation using space is specifically for double-quoted strings. We should add some discussion of what happens with exported arrays. Users can tokenize strings with e.g. The downside of space-concatenating exported variables is that they get changed when fish is run recursively. Today this works:
But if we exported with spaces, this would show 1 for the recursive call. As you say we don't rely on this, but it's nice from a consistency standpoint. |
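The round-trip argument can be modelled outside fish. What follows is a minimal Python sketch, not fish's implementation: `export` and `import_` are hypothetical helpers, and the only behaviour assumed is what the thread describes, namely that a child fish splits an inherited variable on the record separator and on nothing else.

```python
RS = "\x1e"  # ASCII record separator, the delimiter discussed in this thread


def export(values, sep):
    """Join a list into the single string that ends up in the environment."""
    return sep.join(values)


def import_(raw):
    """Model of a child fish: split an inherited value only on RS."""
    return raw.split(RS)


args = ["a", "b c", "d"]
via_rs = import_(export(args, RS))      # exported with the record separator
via_space = import_(export(args, " "))  # exported with spaces instead

print(len(via_rs), len(via_space))
```

With the record separator the child sees the same three elements; with spaces it sees one element, because nothing in the exported string tells it where the original boundaries were.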
I'll have to second that! Always nice to have a fresh perspective on things. For those new to this discussion, I think I'll have to bring in some of the things that are related to this. What comes to mind immediately is the listify whitelist, which shows up in issues like #2090. This means that for $PATH, $CDPATH and $MANPATH, they'll appear as lists/arrays to fish, but when exported, will be joined with ":" again. Then a fish-inside-a-fish will split them again. This operates on colons, not \x1e. From my understanding of the code it appears to do it on every colon, with no chance for escaping, so it might break on $PATH entries with a colon inside them - which UNIX allows inside filepaths, though it seems broken for $PATH at least. This scheme is also used for e.g. PYTHONPATH and GOPATH. I'd love to have something slightly more explicit for splitting environment variables than the implicit always-split-on-\x1e-except-for-these-three-split-them-on-colon, because this is actually two different schemes in one and exporting a list currently will always confuse everything but fish. My preferred solution would be a function like
function splitenv --no-scope-shadowing
    set -e IFS # string split doesn't have superpowers, so unset IFS lest we split on both : and \n
    for arg in $argv
        set -q $arg; and set $arg (string split ":" $$arg)
    end
end
(If All lists would then be joined-with-colons when exported, so a user can explicitly unjoin them with All of this means that we no longer need \x1e, we have a scheme that at least has a fighting chance of being understood by other programs, but the (rather exotic IMHO) fish-inside-fish now becomes The problem is of course that, as mentioned, the usual colon-separated-list scheme has no way of escaping a colon, and if we wanted to add one, Am I making any sense? |
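To make the "no way of escaping a colon" point concrete, here is a small Python model of the proposed splitenv. It is only a sketch under assumptions: the environment is represented as a dict of lists (mirroring fish's "every variable is a list"), and the function name and signature are illustrative rather than anything fish provides.

```python
def splitenv(env, *names, sep=":"):
    """Split every element of each named variable on sep, in place.

    Names that are not set are skipped, like the `set -q` guard in the
    fish function above. There is no escape mechanism: every occurrence
    of sep is treated as a delimiter.
    """
    for name in names:
        if name in env:
            env[name] = [part for value in env[name] for part in value.split(sep)]


env = {
    "PYTHONPATH": ["/foo:/bar"],
    # A directory literally named "odd:dir" cannot survive the split.
    "PATH": ["/usr/bin:/opt/odd:dir/bin"],
}
splitenv(env, "PYTHONPATH", "PATH", "NOT_SET")
print(env)
```

The PATH entry comes back as three elements rather than two, which is the breakage described above for paths that contain a colon.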
@faho I think that idea has merit. The worst part of the old scheme was implicitly splitting on colons, which would mangle variables that should not be split. In your idea this is (almost) always explicit so I think it's quite safe. Regarding escaping, not escaping colon in PATH is intentional per the link you found. I doubt PYTHONPATH, CLASSPATH, etc. are any more consistent in this regard. Since you can't use a colon in these paths, we can choose whether or not we escape it; but if we escape a colon we need to escape backslashes, and I'll bet you can have a backslash in PATH. We may need a "don't escape" whitelist (ugh). Alternatively we don't worry about it, and just let any colon act as delimiters. I think I lean towards this for simplicity and familiarity with other shells. We still are faced with the problem that some list-like variables are space-delimited, and others are colon delimited. One possibility is that
These calls now play the dual roles of importing any existing variable, and marking how it gets exported. What do you think? Also, is there a way to do this without editing config.fish? Maybe as part of universal variables? |
Sounds good. Though at that point making splitenv a script probably wouldn't help, since we'd need cooperation from the C++ side anyway.
It's possible that now "splitenv" isn't the perfect name anymore (it was when I thought of it, of course 😆 ) - I've also considered "listify". Though it's bugging me that I can't remember where we've had a related discussion before - I think I'll need to scour the issues again tonight. |
The
That behavior is, however, surprising. I'm willing to bet that if you ask 100 people what happens when a var with more than one element is exported 90 of them will say the values are concatenated with space as a separator. A few might say comma or other char is used as the separator. And the two persons who ran
I'm sorry but I don't see any merit to that user's complaint. The problem is trivially worked around by explicitly testing whether MANPATH is already set. Which, it seems to me, is something you have to do in any event given the semantics of leading versus trailing colons.
It is at least thirty years too late to fix that. We should not implement escaping of colons (and by extension the escape character) as that would be non-standard behavior. Until recently I spent 20+ years as a UNIX support specialist. I have never heard someone complain that the presence of a colon in a directory embedded in $PATH or a similar variable was an issue.
That's fine although it's not clear why the (undocumented) However, having said that there certainly should be a way to register that a given var (e.g., PYTHONPATH) should have its elements concatenated with a specific separator char when being exported. The most natural way of doing this is via a new option for the
This would not affect how the var is stored in the universal var data store where the record-separator char would still be used and it would be auto-split when loaded from that data store. To be determined is whether the registered separator char for export should also affect string interpolation. My feeling is that it should. That is, if the above "set" command is executed then a subsequent
should use a colon rather than a space for concatenating the values of PYTHONPATH. The default separator is a space to preserve existing semantics and minimize surprise for the user. Note that the special-cased vars like PATH would also use a colon in that example. Which is incompatible with the current behavior but is consistent with the new semantics and less surprising. In other words, why are the elements of $PATH separated by colons in the exported environment but spaces in the output of
|
Easy, tiger. It's in the development versions - see https://github.com/fish-shell/fish-shell/blob/master/doc_src/string.txt |
My original idea was that it's a convenience function so this operation becomes completely trivial. With @ridiculousfish's proposal it becomes something more and adjusts some kind of store so the variable will also be joined on that character when exported.
That's another option, though I'm not completely sold on the semantics. E.g.
That's actually a good question. Of course you wouldn't expect the separator to show up in say
There's a general problem with doing design-by-survey and fish: the surveyed people would frequently have knowledge of bash (and to a lesser extent other POSIXy shells), while the very idea of fish is to do something better by abandoning at least some of POSIX. That's not to say it's completely worthless, it's just something to keep in mind - if we stuck to these kinds of ideas, we'd have bash's word-splitting behavior and if-fi. |
It would set it to an empty list. If the user wants to retain the existing value they have to explicitly include it (see below). We already have all the needed capabilities with the exception of a means to configure the character (or the empty string) to be used when concatenating array elements of a given var for export or interpolation. If someone wants to manipulate a var like PYTHONPATH they can either treat it as a simple string:
Or they can treat it as an array:
Note that my proposal to use the split/concatenation character rather than a space when interpolating into a string provides consistent behavior regardless of whether or not the user splits the var into an array. I am definitely not suggesting design by committee. That way lies madness and bogosities like zsh. I'm simply pointing out that when given two or more options with no other reason to choose one over another then picking the option that will least surprise a user of the shell is the best choice. It's also why I'm (for the moment) opposed to introducing new commands or behaviors such as auto-splitting vars (other than PATH and CDPATH, of course). This is the sort of thing that is done infrequently and usually only in config.fish and a few specialized functions like Anaconda's "activate" script. And the way to make the latter behave correctly regardless of whether or not the user has already split the var into an array in his config.fish is to always treat it as a string that needs to be split. For example, if PYTHONPATH needed to be amended it might do something like this:
Or, more simply,
Yes, that potentially turns what may have been a simple string into an array. But with my rule that the character specified by the
That won't execute the body of the for loop just once with the value in the form of colon-separated directories as the user might expect since they were unaware of the splitting done by the hypothetical "activate" script. I think we can live with that as it would be perverse for a user to do something like that. |
tl;dr I think lists should "remember" their delimiter and below is why. I agree with most of the above. One thing that still seems tedious, though, is that the commands above are overly verbose; i.e., it's sometimes simpler to describe some of these commands in plain English. As an example (and I'm not focusing on length as much as on the number of repeated things):
There are two repeated things here:
In contrast, I suggest another way to specify the delimiter on lists: associate it with the list indefinitely. So the example above might be done as follows:
# Changes the delimiter for this list. This might be done in some global config file for common lists as this one.
set -S ':' --no-modify PYTHONPATH
# or, workaround if you don't want to add extra options to set:
set -S ':' PYTHONPATH $PYTHONPATH
# The actual append operation
set --prepend PYTHONPATH /activated/python/tree
# or, workaround if you don't want to add extra options to set:
set PYTHONPATH /activated/python/tree $PYTHONPATH
Implications/follow-up questions/etc:
Here's the bottom line:
set --no-modify -S : PYTHONPATH
set --prepend PYTHONPATH /activated/python/tree |
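The idea of a list that remembers its delimiter can be sketched in a few lines of Python. This is a model of the proposal only: `TiedList`, `from_env`, `prepend` and `exported` are hypothetical names, and the `-S`, `--no-modify` and `--prepend` options above are proposals from this thread, not existing `set` flags.

```python
from dataclasses import dataclass, field


@dataclass
class TiedList:
    """A list of strings that carries the delimiter used when it is exported."""

    delimiter: str = "\x1e"
    items: list = field(default_factory=list)

    @classmethod
    def from_env(cls, raw, delimiter):
        # Importing: split the inherited string once, on the tied delimiter.
        return cls(delimiter, raw.split(delimiter) if raw else [])

    def prepend(self, *values):
        # List operations never mention the delimiter.
        self.items[:0] = values

    def exported(self):
        # Exporting: join with the remembered delimiter.
        return self.delimiter.join(self.items)


pythonpath = TiedList.from_env("/usr/lib/python:/opt/lib", ":")
pythonpath.prepend("/activated/python/tree")
print(pythonpath.exported())
```

The delimiter is stated once, when the variable is tied; the prepend itself does not repeat it, which is the verbosity argument made above.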
Thank you, @szhu, for the detailed comment regarding my proposal. However, there are many problems with your proposed solution. For example, the --no-modify option does in fact modify the variable, by converting it into a list. While I reject nearly all the elements of your proposal it did make me think about a more straightforward solution that would address most, if not all, of your points. Perhaps there should be a mechanism for telling fish that a given env var should always be automatically split and reconstituted on a given token (e.g., ":" or " "). This might be called an auto-array designation and when executed any existing value would be immediately split if it was not already an array. A new option could be added to the set command to activate this behavior. However, I'm concerned that doing so is ambiguous and could be interpreted as defining a variable with no value. Would adding a -A token option to the set command be unambiguous? For example:
Presumably that would immediately convert any existing PYTHONPATH env var to an array after being split on a colon. It would conversely result in the values being concatenated on a colon when exported or interpolated into a string. Similarly, even if PYTHONPATH did not exist at the time that command was executed the auto-array specification would be remembered and subsequent uses would be affected. For example, this would obviously create an array:
But what about that last argument? Should it be automatically split into two tokens? Note that this behavior should only apply to exported vars and an error would be raised otherwise. There are also some corner cases that need to be spelled out. For example, what if the original auto-split declaration includes values as in this example:
Should those values be split on the auto-split token? Or should it result in an error and require modifying the value be done in a separate statement? And whichever syntax is chosen you still have the issue of what to do about values that contain the auto-split token. The devil is in the details. Which is to say there may be other ramifications of this proposal I haven't thought about. My original proposal with a more verbose syntax avoids those issues as far as I can tell. |
@krader1961, thanks for your response. However, you seem to think that I'm converting variables from strings to lists. I think you're misunderstanding one important concept in fish: every variable is a list of strings. The variables that appear to be strings are actually lists of length 1. fish treats them no differently than lists of length 0 or 2 or any other length. Also, note that while the underlying string used to pass around environment variables might change when you change delimiters, one of the fortes of fish is that the user typically does not need to think about delimiters at all. This is why I recommend that all the By the way, here is one example that shows how clean my proposal is. Here is code for converting a variable set -S \x1e L $L One more thing: the
(I've stated the following before but I think I can do a better job explaining it now.) By emphasizing how "dumb" these three arguments are, some may question if they're needed at all. One may cite that fish has a design principle of orthogonality. When all things are orthogonal, this means that for any big task you want to do, it should be obvious which set of features to pick to do that task -- there should be only one right way to do it. Here, I indeed add another way to prepend to/append to/prevent modification to a list, but this is only because I think the equivalents being replaced are unnecessarily verbose; they should not be the right ways to append modify lists. One way to convince yourself of this is to think about how you think of appending to a list. You probably think "append Let me know what you think, and if this makes a more convincing case for my proposal. (Also it's pretty common for me to misunderstand things so feel free to correct me.) |
@szhu I am quite aware that all vars in fish are lists of zero, one, or more values. You also apparently did not read my earlier comments where I clearly state that the associated delimiter should not affect the internal representation or how the values are persisted to the universal data store (other than storing the delimiter). You also did not address my previous points. Consider your last example:
What should that do if L already contains two or more values? Presumably nothing other than changing the associated delimiter. Would the $L argument be optional in that case? Or should it first convert the existing array into a simple string (presumably concatenating using the existing delimiter) then split that string on the new delimiter? As I said before, the devil is in the details. Ultimately the established designers and maintainers will decide whether or not your |
Sorry, this thread is long, I must have missed your acknowledgement of this above; good to know we're on the same page! I think I've addressed most of your concerns above as well, but not all of it. I'll specifically address each of your concerns below. 1. Is the
|
It seems that there are two issues here. Let's just talk about the first one. +1 for true arrays. Is that separator trick really needed for fish? Ping #627 |
re: #436 (comment)
@zanchey To include a double-colon in
$ set -x MANPATH 1 2 '' 3
# Check if it's set
$ bash -c 'echo $MANPATH'
1:2::3
To start
$ set -x MANPATH '' 1 2 3
# Check if it's set
$ bash -c 'echo $MANPATH'
:1:2:3 |
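The same effect falls out of any plain join on ":". A minimal Python illustration (the helper name `export_path` is made up for this sketch): an empty element in the list becomes an empty field in the exported string, which is what produces the doubled or leading colon.

```python
def export_path(elements):
    """Join list elements with ':' the way a PATH-style variable is exported."""
    return ":".join(elements)


middle = export_path(["1", "2", "", "3"])   # empty element in the middle
leading = export_path(["", "1", "2", "3"])  # empty element at the front

print(middle)
print(leading)
```

This prints `1:2::3` and `:1:2:3`, matching the bash output shown above.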
I haven't followed everything here, but as a user I want to advocate for "no configuration". A fixed "if it ends with |
@ridiculousfish that may be true in current shells, but I imagine that as fish gains more traction, users might want to leverage fish's ability to send lists to child fish shells. I can imagine there may eventually be programs/plugins that manage the state of a fish session (I'll check back on this comment in a few years to see if this is true), and being able to universally auto-de/serialize lists will make that code more clean and less workaround-y. Sort of similar but slightly different thought: Treating |
@ridiculousfish I think one possible solution is to associate each environment variable/array with its own separator (and you can keep |
For those who are only reading recent comments, just a note that @thuzhf's |
@szhu Sorry for not noticing the previous
Besides, I think this problem is obvious and urgent, and a big pain point for users. I hope it can be solved as soon as possible because it greatly affects the user experience. In fact, this is the only problem, large or small, that I have run into so far while using fish. |
@thuzhf: I'd say you're overestimating that. One reason being that your problem in #5169 was with $LD_LIBRARY_PATH, but that's not actually a list in fish! You should set it like Fish turns exactly three inherited/exported variables into lists automatically: $PATH, $MANPATH and $CDPATH. And exactly this list will have a ":" separator when exported. Other "standardized" variables like $LD_LIBRARY_PATH should not be handled as lists in fishscript, so you don't have this issue. Variables that aren't standardized you can handle however you want, since other programs won't do anything with them anyway, so the separator is non-critical. |
@faho Thanks for your clear explanation. That really makes a lot of sense to me. OK, I can say this problem is solved for me. |
I took a look at the MANPATH issue described in #2090. The scenario is to append to manpath such that it continues using system paths. In bash one would write this as A "tied variables" approach would allow us to have e.g. My proposal here doesn't make the MANPATH situation better or worse; I think the thing to do is punt and just have an easy story for appending to MANPATH, which is this:
That is not too painful to paste into config.fish. |
@ridiculousfish: I've been thinking of going one step further, actually: Split these special variables on ":" also on assignment, and join them with ":" instead of space in quoted expansion. That means when you do Now, this means that ":" can't appear in a path in $MANPATH (and $PATH, and $CDPATH), but they can't do that anyway because it'd break non-fish utilities! That would also allow us to maybe one day remove the special handling, because it adds a cross-compatible way of handling it - you'd just have to assign with the |
splitenv takes a variable name, and sets that variable to its value but split on colons. Proposed in fish-shell#436
Prior to this fix, fish would export arrays as ASCII record separator delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is surprising and awkward for other programs to deal with, and there's no way to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH. This commit does the following: 1. Exports all arrays as colon delimited strings, instead of RS. 2. When importing environment variables, if the variable ends in PATH, split it on colons (i.e. splitenv); otherwise do not split it. Colons are not escaped; this is deliberate to support uses like `set -x PYTHONPATH "/foo:/bar"` which ought to work (and already do, we don't want to make a compat break here). Fixes (or at least mitigates) fish-shell#436
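The two rules in this commit message can be summarised with a short Python model. It is a sketch of the described behaviour only, not the actual C++ change; `export_var` and `import_var` are illustrative names.

```python
def export_var(values):
    """Rule 1: every list is exported as a colon-delimited string."""
    return ":".join(values)


def import_var(name, raw):
    """Rule 2: split on colons only if the variable name ends in PATH."""
    if name.endswith("PATH"):
        return raw.split(":")
    return [raw]


# A single-element value that already contains a colon exports unchanged,
# which is why `set -x PYTHONPATH "/foo:/bar"` keeps working.
exported = export_var(["/foo:/bar"])

gopath = import_var("GOPATH", "/home/me/go:/opt/go")
editor = import_var("EDITOR", "emacs:nox")
print(exported, gopath, editor)
```

Because colons are not escaped, a child fish importing that PYTHONPATH would see two elements rather than one; the commit message accepts that trade-off deliberately.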
@faho I'm warming to the idea - how would the user mark a variable as receiving this special treatment? Would |
My idea was actually to not allow marking at all - just leave it as special behavior for $PATH et al. Which would let us get away from listifying at some point in the future. However, I've since come to understand that allowing this for other variables helps us with other variables as well - e.g. I've said before that my $EDITOR is set as one element ( So I'd probably default to space as a delimiter, unless it's a PATH-like (and assuming that it is if the name ends in PATH is probably okay).
I don't really love introducing another builtin for this, so I'd probably go with the argument-to-set option. |
I agree that "special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution". I propose that we STOP special-casing PATH/MANPATH/CDPATH. They would be treated (by the fish end-user) the way they are in other shells. $PATH (and the others) would be a single string (or in fish jargon a list with a length of 1) with colons in it. Note that I am referring to the fish user experience, not how these things are handled internally; I don't know what the implementation inside fish would look like--I rely on others to point out any problems there. Granted, it would have the disadvantage of a backward incompatibility, but I think it would be worth it as a big gain in simplicity and elegance. I think it would address #2090 too. What does everyone think? |
#5245 has been merged, so this seems solved. |
fish special cases three PATH-style variables: PATH, CDPATH, and MANPATH. These and only these must be set with spaces between items rather than colons. See: fish-shell/fish-shell#2198 fish-shell/fish-shell#436
In expand.h:
This results in:
It's clear that the char \x1e is treated specially as the element delimiter.