Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution like zsh "tied" variables #436

Closed
xiaq opened this issue Dec 11, 2012 · 52 comments

Comments

@xiaq
Copy link
Contributor

xiaq commented Dec 11, 2012

In expand.h:

/** Character for separating two array elements. We use 30, i.e. the ascii record separator since that seems logical. */
#define ARRAY_SEP 0x1e

/** String containing the character for separating two array elements */
#define ARRAY_SEP_STR L"\x1e"

This results in:

xiaq@blackie ~> set a (printf 'a\x1eb')
xiaq@blackie ~> count $a
2
xiaq@blackie ~> set a (printf 'a\x1fb')
xiaq@blackie ~> count $a
1

It's clear that the char \x1e is treated specially as element delimiter.

@ridiculousfish
Copy link
Member

Is the concern that there's no way to represent \x1e? Or are you thinking about architectural improvements?

I think the array separator is mainly used in persisting arrays in places that only take strings, like universal variables or environment variables.

@xiaq
Copy link
Contributor Author

xiaq commented Dec 12, 2012

在 2012-12-12 上午2:09,"ridiculousfish" notifications@github.com写道:

Is the concern that there's no way to represent \x1e? Or are you thinking
about architectural improvements?

I'd say I have both conerns in mind. For the former, just think about a
filename with embeded \x1e. POSIX says anything but \0 is allowed in
filenames, so it's perfectly possible. Using \0 as array delimiters may be
a slightly better choice, but that leads to the second concern - it's
fragile and plainly wrong.

I think the array separator is mainly used in persisting arrays in places
that only take strings, like universal variables or environment variables.

If persisting means "serialized", no. Arrays are always stored in
\x1e-delimited form. Exported environment arrays are joined by ":". They
are used in universal variables though - which IMHO should be implemented
with proper escaping instead.


Reply to this email directly or view it on GitHub.

@JanKanis
Copy link
Contributor

JanKanis commented Jan 4, 2013

How about using a private use area character as separator? Fish already uses some of those in some cases.

@xiaq
Copy link
Contributor Author

xiaq commented Jan 5, 2013

@JanKanis That's no better; it's perfectly possible in filenames (considering filesystems using non-utf8, native encodings) and other strings.

I would say that among others, "\0" is a slightly better choice among others, since that's the only character UNIX forbids in filenames, but it still has the bad smell to me. Also, we are already doing a lot of splitting and assembling of arrays, I would expect implementing true arrays to result in both better architecture and less code.

@JanKanis
Copy link
Contributor

JanKanis commented Jan 5, 2013

Fish already handles private use characters and invalid bytes when it encodes external strings to wchars. These special values are encoded byte by byte into a specific set of private use chars that fish also decodes again on output, so in principle using another private use char could work. However I agree using true arrays is much better. There is one complication in that communication between fish and fishd happens over a socket using utf8 strings, and there fish uses (I think) the escape sequence "\x1e" (rather than an 0x1e byte) to separate array items. But that could probably be solved by using e.g. a private unused escape sequence.

@anordal
Copy link
Contributor

anordal commented Oct 16, 2013

I share xiaq's concerns, but (for the practical ones) I find the implicit splitting on \n way more offensive:

a@raspeball ~> count (printf 'a\x1eb')
1
a@raspeball ~> count (printf 'a\nb')
2

The (newline delimited) array interpretation of subprocess output should be explicit & optional!

But that has probably nothing to do with the underlying storage of arrays…

@szhu
Copy link

szhu commented Oct 6, 2015

@xiaq: What's wrong with using \0? xargs seems to use it as its "reliable" option.

@zanchey
Copy link
Member

zanchey commented Oct 9, 2015

You can't use \0 in environment variables, so it would make it hard to export arrays to child shells.

@krader1961
Copy link
Contributor

krader1961 commented Nov 26, 2015

I'm migrating from zsh where I use this sequence to define the $LESS env var in a sane fashion by leveraging its "tied" variables feature:

typeset -xT LESS less ' '
less=(
    --HILITE-UNREAD
    --LONG-PROMPT
)

I've omitted the complete list of options for brevity. In zsh that results in the $LESS env var being a space separated list of the options in the $less array. The equivalent in fish results in the elements being separated by the record separator (\x1e) character. Despite the documentation saying that the elements of the array will be separated by spaces (modulo the special arrays such as PATH). I have to explicitly do an assignment that interpolates the values into a single string to get the expected result:

set -x LESS --HILITE-UNREAD \
    --LONG-PROMPT
set LESS "$LESS"

At the moment I don't really care if \x1e is used internally for serializing arrays rather than \x00. I do care that exported arrays have their elements separated by \x1e. That's just broken, wrong, fubar. Pick your adjective. It's also inconsistent with the aforementioned workaround and documented behavior. This issue should be tagged as a bug IMHO.

P.S., Nowhere in the documentation is the use of the record separator character (\x1e) mentioned. Which is another problem.

@ridiculousfish
Copy link
Member

@krader1961 Thanks for sharing this. There's no standard Unix convention for list-like environment variables - some are colon delimited, others are space delimited. fish uses \x1e so it can distinguish its own arrays.

Can you please point us at the erroneous documentation?

How do you think arrays be exported - colons, spaces, newlines, something else? Should fish tokenize environment variables on this character as well?

It looks like less expects space-delimited arguments. Probably the simplest workaround is set -x LESS '--HILITE-UNREAD --LONG-PROMPT', etc.

@krader1961
Copy link
Contributor

There's no standard for list like environment variables because by definition they're an arbitrary sequence of bytes composed of a key and value separated by an equal sign and terminated by a null byte. They don't even have to be printable characters. The only widely accepted convention for a higher level of abstraction is the one established by the execlp() function for the PATH env var.

The documentation is erroneous in as far as it makes no mention of using \x1E, \036, 30, or the "record separator" character to separate elements of an array when exporting a var with more than one element. The documentation does state that

..., and array variables will be concatenated using the space character.

That's from the section "Variable expansion" in http://fishshell.com/docs/current/index.html. It's reasonable to infer that statement also applies to exported vars that are not special-cased as documented in the "Arrays" and "Special variables" sections of that same document.

It's my feeling that fish should not automatically tokenize env vars into a list outside of the colon-delimited special case vars such as PATH. There should, however, be a robust means by which a user can tokenize a var into an array on an arbitrary character.

Absent a mechanism for configuring the character to be used on a var by var basis (ala the zsh "typeset -T" command) a space should be used when concatenating the elements of the array (again, excluding the colon separated special-case vars). Obviously this does not apply to private data stores such as the storage of universal variables.

Lastly, I couldn't find any uses in the standard fish functions where an env var is used to pass an array containing more than one element to another function or script. Such use cases may exist but it should require the scripts to explicitly cooperate in the serialization/deserialization of the data rather than rely on fish to implicitly reconstruct arrays from vars whose strings contain the record separator character.

@ridiculousfish
Copy link
Member

Thanks for your thoughtful response. The section you quoted about concatenation using space is specifically for double-quoted strings. We should add some discussion of what happens with exported arrays.

Users can tokenize strings with e.g. set SOMEPATH (string split : $SOMEPATH).

The downside of space-concatenating exported variables is that they get changed when fish is run recursively. Today this works:

> set -x list 1 2 3
> count $list
3
> fish -c 'count $list'
3

But if we exported with spaces, this would show 1 for the recursive call. As you say we don't rely on this, but it's nice from a consistency standpoint.

@faho
Copy link
Member

faho commented Nov 27, 2015

Thanks for your thoughtful response.

I'll have to second that! Always nice to have a fresh perspective on things.

For those new to this discussion, I think I'll have to bring in some of the things that are related to this.

What comes to mind immediately is the listify whitelist, which shows up in issues like #2090.

This means that for $PATH, $CDPATH and $MANPATH, they'll appear as lists/arrays to fish, but when exported, will be joined with ":" again. Then a fish-inside-a-fish will split them again. This operates on colons, not \x1e. From my understanding of the code it appears to do it on every colon, with no chance for escaping, so it might break on $PATH entries with a colon inside them - which UNIX allows inside filepaths, though it seems broken for $PATH at least. This scheme is also used for e.g. PYTHONPATH and GOPATH.

I'd love to have something slightly more explicit for splitting environment variables than the implicit always-split-on-\x1e-except-for-these-three-split-them-on-colon, because this is actually two different schemes in one and exporting a list currently will always confuse everything but fish.

My preferred solution would be a function like splitenv var1 var2 var3:

function splitenv --no-scope-shadowing
    set -e IFS # string split doesn't have superpowers, so unset IFS lest we split on both : and \n
    for arg in $argv
        set -q $arg; and set $arg (string split ":" $$arg)
    end
end

(If string split had superpowers, this would be a bit simpler)

All lists would then be joined-with-colons when exported, so a user can explicitly unjoin them with splitenv (though I'm not dead-set on a helper function, I do believe making this trivial is a good thing to do). For backwards-compatibility, splitenv PATH CDPATH MANPATH would be executed on startup. If a user wishes to export it differently, string join is available.

All of this means that we no longer need \x1e, we have a scheme that at least has a fighting chance of being understood by other programs, but the (rather exotic IMHO) fish-inside-fish now becomes fish -c 'splitenv list; count $list'.

The problem is of course that, as mentioned, the usual colon-separated-list scheme has no way of escaping a colon, and if we wanted to add one, string split doesn't have a "--unescaped" option to only split on unescaped separators.

Am I making any sense?

@ridiculousfish
Copy link
Member

@faho I think that idea has merit. The worst part of the old scheme was implicitly splitting on colons, which would mangle variables that should not be split. In your idea this is (almost) always explicit so I think it's quite safe.

Regarding escaping, not escaping colon in PATH is intentional per the link you found. I doubt PYTHONPATH, CLASSPATH, etc. are any more consistent in this regard. Since you can't use a colon in these paths, we can choose whether or not we escape it; but if we escape a colon we need to escape backslashes, and I'll bet you can have a backslash in PATH. We may need a "don't escape" whitelist (ugh).

Alternatively we don't worry about it, and just let any colon act as delimiters. I think I lean towards this for simplicity and familiarity with other shells.

We still are faced with the problem that some list-like variables are space-delimited, and others are colon delimited. One possibility is that splitenv accepts a delimiter, remembers it, and uses it to rebuild the value on export:

splitenv --on ' ' LESS
splitenv --on ':' PYTHONPATH

These calls now play the dual roles of importing any existing variable, and marking how it gets exported. What do you think?

Also, is there a way to do this without editing config.fish? Maybe as part of universal variables?

@faho
Copy link
Member

faho commented Nov 27, 2015

We still are faced with the problem that some list-like variables are space-delimited, and others are colon delimited. One possibility is that splitenv accepts a delimiter, remembers it, and uses it to rebuild the value on export:

Sounds good. Though at that point making splitenv a script probably wouldn't help, since we'd need cooperation from the C++ side anyway.

These calls now play the dual roles of importing any existing variable, and marking how it gets exported.

It's possible that now "splitenv" isn't the perfect name anymore (it was when I thought of it, of course 😆 ) - I've also considered "listify".

Though it's bugging me that I can't remember where we've had a related discussion before - I think I'll need to scour the issues again tonight.

@krader1961
Copy link
Contributor

Users can tokenize strings with e.g. set SOMEPATH (string split : $SOMEPATH).

The string command is not documented anywhere I can find. Also, man string shows the string(3) man page that documents the string manipulation functions on BSD (and Mac OS X).

But if we exported with spaces, this would show 1 for the recursive call. As you say we don't rely on this, but it's nice from a consistency standpoint.

That behavior is, however, surprising. I'm willing to bet that if you ask 100 people what happens when a var with more than one element is exported 90 of them will say the values are concatenated with space as a separator. A few might say comma or other char is used as the separator. And the two persons who ran env will say the values are concatenated with no separator because unless you filter the output through something like cat -evt the record separator character is invisible.

which shows up in issues like #2090

I'm sorry but I don't see any merit to that user's complaint. The problem is trivially worked around by explicitly testing whether MANPATH is already set. Which, it seems to me, is something you have to do in any event given the semantics of leading versus trailing colons.

so it might break on $PATH entries with a colon inside them

It is at least thirty years too late to fix that. We should not implement escaping of colons (and by extension the escape character) as that would be non-standard behavior. Until recently I spent 20+ years as a UNIX support specialist. I have never heard someone complain that the presence of a colon in a directory embedded in $PATH or a similar variable was an issue.

My preferred solution would be a function like splitenv var1 var2 var3

That's fine although it's not clear why the (undocumented) string split isn't sufficient. Regardless of whether we need a new function we definitely should not add any new auto-split env vars. The only two that are common enough to warrant that behavior are PATH and CDPATH (and MANPATH since its already special-cased). Other vars like PYTHONPATH can be explicitly split by a user if they find that useful.

However, having said that there certainly should be a way to register that a given var (e.g., PYTHONPATH) should have its elements concatenated with a specific separator char when being exported. The most natural way of doing this is via a new option for the set command. For example,

set -x -S ':' PYTHONPATH dir1 dir2 dir3

This would not affect how the var is stored in the universal var data store where the record-separator char would still be used and it would be auto-split when loaded from that data store. To be determined is whether the registered separator char for export should also affect string interpolation. My feeling is that it should. That is, if the above "set" command is executed then a subsequent

echo "PYTHONPATH is $PYTHONPATH"

should use a colon rather than a space for concatenating the values of PYTHONPATH. The default separator is a space to preserve existing semantics and minimize surprise for the user. Note that the special-cased vars like PATH would also use a colon in that example. Which is incompatible with the current behavior but is consistent with the new semantics and less surprising. In other words, why are the elements of $PATH separated by colons in the exported environment but spaces in the output of

echo "PATH is $PATH"

@zanchey
Copy link
Member

zanchey commented Nov 28, 2015

The string command is not documented anywhere I can find. Also, man string shows the string(3) man page that documents the string manipulation functions on BSD (and Mac OS X).

Easy, tiger. It's in the development versions - see https://github.com/fish-shell/fish-shell/blob/master/doc_src/string.txt

@faho
Copy link
Member

faho commented Nov 28, 2015

That's fine although it's not clear why the (undocumented) string split isn't sufficient.

My original idea was that it's a convenience function so this operation becomes completely trivial. With @ridiculousfish's proposal it becomes something more and adjusts some kind of store so the variable will also be joined on that character when exported. string split is just a command that splits a string - basically our version of cut.

The most natural way of doing this is via a new option for the set command.

That's another option, though I'm not completely sold on the semantics. E.g. set -S ':' PYTHONPATH. Would that set PYTHONPATH to the empty list or would it just split the existing PYTHONPATH? So far all set options have done the former, so you'd have to do set -S ':' PYTHONPATH $PYTHONPATH. Or we'd make it not do that and have inconsistency within the same tool.

In other words, why are the elements of $PATH separated by colons in the exported environment but spaces in the output of echo "PATH is $PATH"

That's actually a good question. Of course you wouldn't expect the separator to show up in say for p in $PATH; echo $p; end, but joining it with the per-variable separator char might be the right thing to do. Of course there's string join to do it manually.

That behavior is, however, surprising. I'm willing to bet that if you ask 100 people what happens when a var with more than one element is exported 90 of them will say the values are concatenated with space as a separator.

There's a general problem with doing design-by-survey and fish. Because the surveyed people would frequently have knowledge of bash (and to a lesser extent other POSIXy shells) while the very idea of fish is to do something better by abandoning at least some of POSIX.

That's not to say it's completely worthless, it's just something to keep in mind - if we stuck to these kinds of ideas, we'd have bash's word-splitting behavior and if-fi.

@krader1961
Copy link
Contributor

Would set -S ':' PYTHONPATH set PYTHONPATH to the empty list or would it just split the existing PYTHONPATH?

It would set it to an empty list. If the user wants to retain the existing value they have to explicitly include it (see below).

We already have all the needed capabilities with the exception of a means to configure the character (or the empty string) to be used when concatenating array elements of a given var for export or interpolation. If someone wants to manipulate a var like PYTHONPATH they can either treat it as a simple string:

set PYTHONPATH "/a/new/dir:$PYTHONPATH"

Or they can treat it as an array:

set -S ":" PYTHONPATH /a/new/dir (string split ":" $PYTHONPATH)

Note that my proposal to use the split/concatenation character rather than a space when interpolating into a string provides consistent behavior regardless of whether or not the user splits the var into an array.

I am definitely not suggesting design by committee. That way lies madness and bogosities like zsh. I'm simply pointing out that when given two or more options with no other reason to choose one over another then picking the option that will least surprise a user of the shell is the best choice. It's also why I'm (for the moment) opposed to introducing new commands or behaviors such as auto-splitting vars (other than PATH and CDPATH, of course). This is the sort of thing that is done infrequently and usually only in config.fish and a few specialized functions like Anaconda's "activate" script. And the way to make the latter behave correctly regardless of whether or not the user has already split the var into an array in his config.fish is to always treat it as a string that needs to be split. For example, if PYTHONPATH needed to be amended it might do something like this:

# Hypothetical snippet from the Anaconda activate script.
if test (count PYTHONPATH) -gt 1
    set -S ':' PYTHONPATH /activated/python/tree $PYTHONPATH
else
    set PYTHONPATH "/activated/python/tree:$PYTHONPATH"
end

Or, more simply,

# Hypothetical snippet from the Anaconda activate script.
set -S ':' PYTHONPATH /activated/python/tree (string split ':' $PYTHONPATH)

Yes, that potentially turns what may have been a simple string into an array. But with my rule that the character specified by the -S switch is used when exporting and interpolating that conversion into an array should not matter in practice. There is, however, one corner case. What happens if the user has not explicitly converted the var to an array in his config.fish and he then runs something like the hypothetical script above. The var potentially becomes a multi-element array which means that if they subsequently do

for elem in $PYTHONPATH
   echo $elem
end

That won't execute the body of the for loop just once with the value in the form of colon-separated directories as the user might expect since they were unaware of the splitting done by the hypothetical "activate" script. I think we can live with that as it would be perverse for a user to do something like that.

@szhu
Copy link

szhu commented Nov 29, 2015

tl;dr I think lists should "remember" their delimiter and below is why.


I agree with a lot of the most of the above. One thing that still seems tediuous though is that the commands above still seem overly verbose; i.e., it's sometimes simpler to describe some of these commands in plain English.

As an example (and I'm not focusing on length as much as on the number of repeated things):

  • fish: set -S ':' PYTHONPATH /activated/python/tree (string split ':' $PYTHONPATH)
  • English: "append /activated/python/tree to PYTHONPATH (:-delimited`)"

There are two repeated things here: PYTHONPATH and the delimiter :. That PYTHONPATH should be repeated is arguably okay for two reasons, and neither of these two reasons apply to the delimiter.

  1. It's not hard to intuit what is going on when someone says set PYTHONPATH /activated/python/tree $PYTHONPATH, because this is very similar to things like i = 2 + i, which is a very familiar concept/idiom. (But we still have shortcuts like +=, which is why I propose the --append flag below.) On the other hand, when people think about appending to a list, they do not think about splitting and joining on a delimiter. They don't think about converting the entire list to some other format, doing the real operation to it, and putting it back. In their mind, they naturally read the delimiter as a delimiter instead of changing it to some "internal" or "preferred" delimiter.
  2. Using a plain set command for appending saves adding another command for joining two different lists. On the other hand, converting from one delimiter to another is something that we ideally never want the user to do manually, mostly for the reason above.

In contrast, I suggest another way to specify the delimiter on lists: associate it with the list indefinitely. So the example above might be done as follows:

# Changes the delimiter for this list. This might be done in some global config file for common lists as this one.
set -S ':' --no-modify PYTHONPATH
# or, workaround if you don't want to add extra options to set:
set -S ':' PYTHONPATH $PYTHONPATH

# The actual append operation
set --prepend PYTHONPATH /activated/python/tree
# or, workaround if you don't want to add extra options to set:
set PYTHONPATH /activated/python/tree $PYTHONPATH

Implications/follow-up questions/etc:

  1. This is fairly compatible with the current suggestion. Here are the changes required:
    • Make the delimiter stick (possibly by using another variable as __fish_sep_PYTHONPATH)
    • (optional) Add a flag I'm currently calling --no-modify, which tells fish to change the delimiter of a list without changing its contents. Possibly also add --append and --prepend flags because of reason (1) above. Anyway this one isn't required, as the workaround above shows, à la how appending and prepending are done in fish today.
  2. In fish, lists must at least be treated like first-class citizens. This means that changing the delimiter should change the string representation, not the list representation (unless the delimiter is present in one of the elements, in which case splitting there is inevitable). For example, if you're changing delimiters from , to :, ["0:1", "2"] should become ["0", "1", "2"] and not ["0", "1,2"] (which is what would happen if you simply change the delimiter without changing the string backing the list). This behavior should maximize compatibility with the current behavior and the fact that there is currently a default, immutable delimiter.

Here's the bottom line:

  • This involves a lot less tokens, and almost nothing is repeated.
  • This parallels the mental model many users have. Users think in these terms: "prepend", "set the delimiter", "don't modify".
  • This looks like the only right way to do this task (the old way now looks extra clumsy), and so these additions do not destroy orthogonality.
set --no-modify -S : PYTHONPATH
set --prepend PYTHONPATH /activated/python/tree

@krader1961
Copy link
Contributor

Thank you, @szhu, for the detailed comment regarding my proposal. However, there are many problems with your proposed solution. For example, the addition of the --no-modify option does in fact modify the variable by converting it into a list and thus does modify the variable. While I reject nearly all the elements of your proposal it did make me think about a more straightforward solution that would address most, if not all, of your points. Perhaps there should be a mechanism for telling fish that a given env var should always be automatically split and reconstituted on a given token (e.g., ":" or " "). This might be called an auto-array designation and when executed any existing value would be immediately split if it was not already an array.

A new option could be added to the set command to activate this behavior. However, I'm concerned that doing so is ambiguous and could be interpreted as defining a variable with no value. Would adding a -A token option to the set command be unambiguous? For example:

set -x -A ':' PYTHONPATH

Presumably that would immediately convert any existing PYTHONPATH env var to an array after being split on a colon. It would conversely result in the values being concatenated on a colon when exported or interpolated into a string. Similarly, even if PYTHONPATH did not exist at the time that command was executed the auto-array specification would be remembered and subsequent uses would be affected. For example, this would obviously create an array:

set PYTHONPATH /a/path /b/path /c:/d/path

But what about that last argument? Should it be automatically split into two tokens?

Note that this behavior should only apply to exported vars and an error would be raised otherwise. There are also some corner cases that need to be spelled out. For example, what if the original auto-split declaration includes values as in this example:

set -x -A ':' PYTHONPATH 'hello:goodbye' $PYTHONPATH

Should those values be split on the auto-split token? Or should it result in an error and require modifying the value be done in a separate statement? And whichever syntax is chosen you still have the issue of what to do about values that contain the auto-split token. The devil is in the details. Which is to say there may be other ramifications of this proposal I haven't thought about. My original proposal with a more verbose syntax avoids those issues as far as I can tell.

@szhu
Copy link

szhu commented Nov 30, 2015

@krader1961, thanks for your response. However, you seem to think that I'm converting variables from strings to lists. I think you're misunderstanding one important concept in fish: every variable is a list of strings. The variables that appear to be strings are actually lists of length 1. fish treats them no differently than lists of length 0 or 2 or any other length.

Also, note that while the underlying string used to pass around environment variables might change when you change delimiters, one of the fortes of fish is that the user typically does not need to think about delimiters at all. This is why I recommend that all the -S option does is specify how this list should be converted to a string when it is exported outside fish. -S should not change the in-fish list representation (except in cases where it's impossible to represent that list using the target delimiter).

By the way, here is one example that shows how clean my proposal is. Here is code for converting a variable $L back to the default delimiter of \x1e. It will have absolutely no effect on any variable (any scope, any number of items) that is able to be created in fish today.

set -S \x1e L $L

One more thing: the --no-modify family of arguments are just shortcuts. Here are their equivalents:

shortcut equivalent
set [other args] --no-modify L set [other args] L $L
set [other args] --prepend L $TOADD... set [other args] L $TOADD... $L
set [other args] --append L $TOADD... set [other args] L $L $TOADD...

(I've stated the following before but I think I can do a better job explaining it now.) By emphasizing how "dumb" these three arguments are, some may question if they're needed at all. One may cite that fish has a design principle of orthogonality. When all things are orthogonal, this means that for any big task you want to do, it should be obvious which set of features to pick to do that task -- there should be only one right way to do it. Here, I indeed add another way to prepend to/append to/prevent modification to a list, but this is only because I think the equivalents being replaced are unnecessarily verbose; they should not be the right ways to append modify lists. One way to convince yourself of this is to think about how you think of appending to a list. You probably think "append $TOADD to $L" rather than "set $L to $L $TOADD".

Let me know what you think, and if this makes a more convincing case for my proposal. (Also it's pretty common for me to misunderstand things so feel free to correct me.)

@krader1961
Copy link
Contributor

@szhu I am quite aware that all vars in fish are lists of zero, one, or more values. You also apparently did not read my earlier comments where I clearly state that the associated delimiter should not affect the internal representation or how the values are persisted to the universal data store (other than storing the delimiter). You also did not address my previous points. Consider your last example:

set -S \x1e L $L

What should that do if L already contains two or more values? Presumably nothing other than changing the associated delimiter. Would the $L argument be optional in that case? Or should it first convert the existing array into a simple string (presumably concatenating using the existing delimiter) then split that string on the new delimiter? As I said before, the devil is in the details.

Ultimately the established designers and maintainers will decide whether or not your --no-modify family of functions should be added but I vote no as they don't add enough value in my opinion relative to their cost.

@szhu
Copy link

szhu commented Dec 1, 2015

Sorry, this thread is long, I must have missed your acknowledgement of this above; good to know we're on the same page! I think I've addressed most of your concerns above as well, but not all of it. I'll specifically address each of your concerns below.


1. Is the $L optional in set -S \x1e L $L?

The existing behavior of set will not change. Under current behavior, set L $L does not change L and set L makes L an empty list. Same with set -S \x1e L $L and set -S \x1e L.

1.1 Doesn't set -S \x1e L $L seem overly verbose for just changing the delimiter?

Slightly. That's why I propose the --no-modify option as a shortcut for this.

But my plan won't fall apart if this shortcut does not exist. We already deal with this issue every day when we append lists: set PATH ~/.bin $PATH. That's why, for the same reason, I propose --prepend and --append as well.

2. How would the conversion process take place?

Let's say our old delimiter is \x1e and our new one is :, and that we have a fish list ["hello", "world"] (exported as hello\x1eworld). There two basic ways to do conversion ("conversion options"):

  1. Use ["hello", "world"] and convert it to​ ["hello", "world"] (exported as hello,world)
    Advantage: The list representation doesn't change.
  2. Use hello\x1eworld and convert it to ["hello\x1eworld"] (exported as hello\x1eworld)
    Advantage: The exported value representation doesn't change.

Note that this is from a UI perspective, not an implementation perspective; we're talking about what it appears like to the user. I'll cover implementation in the next question. Note: the rest of this answer is new stuff that I have not addressed above, prompted by your questions. Thanks!

Within fish. First, if we are working entirely in fish, lists are first-class and you should never have to worry about delimiters, and so neither are required. (Again, "should" is from a user's perspective, as developers, it is our responsibility to make this true.)

Changing export format of vars. Thus, the only reason a user will need to change delimiters is to change the exported string for programs that read environment variables. For lists that are created in fish, we will definitely use conversion option 1, because the the variable's meaning as a list is important and well-defined, so we need to preserve the list representation.

Changing import format of vars. However, for environment variables like PATH that are initially created outside of fish, we need to, for a list that already has a delimiter, tell fish what that delimiter is. For this we can use conversion option 2.

2.1 How would this be implemented?

Fish even though the user should not need to know this, fish actually stores lists as strings. The variable x is stored as hello\x1eworld. Under my proposal, there would be another variable, __fish_delimiter_x, that specifies the delimiter of variable x. It does not exist right now and so we use the default delimiter, \x1e.

For conversion option 1:

  1. Split the variable on old the delimiter, resulting in a true list in the implementation language (C++).
  2. Join the list using the new delimiter, resulting in a new string.
  3. Save the new delimiter into __fish_delimiter_x.

For conversion option 1, an equivalent implementation:

  1. In the variable, replace all occurrences of the old delimiter with the new one.
  2. Save the new delimiter into __fish_delimiter_x.

For conversion option 2:

  1. Save the new delimiter into __fish_delimiter_x.

2.2 If we need both conversion options, how would the user specify which to use?

Perhaps we can have two options: -D or --convert-delimiter for option 1 and -d or --set-delimiter for option 2.

2.3 Do we really need both options?

Under current fish we choose to assume that we won't see \x1e in the wild outside of fish. If we keep this as the default delimiter and keep this assumption, conversion option 1 is sufficient to both convert and set the delimiter and we will not need conversion option 2. (A easy way to be convinced of this is to realize that if the assumption is true, when converting lists created externally, conversion option 1's step "replace all occurrences of the old delimiter with the new one" will do nothing, reducing the entire conversion to conversion option 2.)


@ridiculousfish, I would appreciate your feedback on this too, specifically regarding the UI and implementation details. Thanks!

@ghost
Copy link

ghost commented Dec 10, 2015

It seems that there are two issues here. Let's just talk about the first one?

+1 for true array

Is that separator trick really needed for fish? Ping #627

@szhu
Copy link

szhu commented May 29, 2018

re: #436 (comment) @zanchey
I haven't read #2090 in detail, but I believe converting between the colon-delimited-string and array forms is completely seamless (except for when colon does not appears appears in array items).

To include a double-colon in MANPATH, just add an empty string where the double colon should appear:

$ set -x MANPATH 1 2 '' 3
# Check if it's set
$ bash -c 'echo $MANPATH'
1:2::3

To start MANPATH with a colon, just add an empty string item to the beginning:

$ set -x MANPATH '' 1 2 3
# Check if it's set
$ bash -c 'echo $MANPATH'
:1:2:3

@cben
Copy link
Contributor

cben commented Aug 13, 2018

I haven't followed everything here, but as a user I want to advocate for "no configuration".
I think set -S and splitenv are forms of configuration. Some users would do them in fish.config and handle PYTHONPATH as array. Others would not and handle PYTHONPATH as single colon-delimited word. Copy-pasting stackoverflow advice and running scripts manipulating PYTHONPATH from one user to another would then not always work...

A fixed "if it ends with PATH" rule is configuration-free and sounds as perfect as you can get 👍
(I don't have opinion on whether it's worth the backward incompatibility)
Yes, set -x TEST2_PATH color:red font:serif would be imported as color red font serif array but that's the deal with exporting variables. You can't really set an exported var to an array without understanding how it works.

@szhu
Copy link

szhu commented Aug 13, 2018

Yes. My hypothesis is that exporting arrays is not common, outside of path lists.

@ridiculousfish that may be true in current shells, but I imagine that as fish gains more traction, users might want to leverage fish's ability to send lists to child fish shells. I can imagine there may eventually be programs/plugins that manage the state of a fish session (I'll check back on this comment in a few years to see if this is true), and being able to universally auto-de/serialize lists will make that code more clean and less workaround-y.


Sort of similar but slightly different thought: Treating PATH as special case is an anachronistic edge case that users probably understand only if they have a history on typical use cases of shells. This limits fish's ability to be used as a general scripting language, and limits some potential future use cases.

@thuzhf
Copy link

thuzhf commented Sep 4, 2018

@ridiculousfish I think one possible solution is to associate each environment variable/array with its own separator (and you can keep '\x1e' or ' ' or ':' as the default one), and the user who creates the environment variable is responsible for choosing appropriate separators to avoid conflicts. The command may be like: set --separator ':' TMP 1 2 3. Thus for those well-known environment variables, users can just choose the corresponding well-known separators which can also be recognized by other programs and can make fish more compatible with more programs (such as Python).

@szhu
Copy link

szhu commented Sep 4, 2018

For those who are only reading recent comments, just a note that @thuzhf's set --separator recommendation is the same as the set -S recommendation mentioned repeatedly throughout this thread. To get more context on that discussion, you can grep this page for set -S.

@thuzhf
Copy link

thuzhf commented Sep 5, 2018

@szhu Sorry for not noticing the previous set -S. That's basically what I want too. I also noticed that there were several concerns others had on this new option. I can give my thoughts on these concerns below (since fish's set haven't used -s as an option, I will use -s to refer to --separator hereafter):

  1. --no-modify does modify something. Yes, and you should change the name to be explicit, e.g., --change-separator.
  2. There are some corner/tricky cases. This is basically due to the not-well-defined syntax, and can be avoided naturally if we give a strict definition of the syntax. For example:
    1. Basic idea: every var (list) is associated with its own separator when defined (default is ' '). This separator will be used when this var is created from string and when it is converted to string (this is a common idea in some languages such as Python's join() function). A var is converted to string when exported or when the user wants to do this.
    2. How to create env vars
      1. set ENV_VAR a b c. Without -s, we choose ' ' as the default separator.
      2. set -s ':' ENV_VAR. In this case, ENV_VAR is set to be an empty list.
      3. set -s ':' ENV_VAR a b:c d e:f. In this case, users who write this code should clearly understand ':' is the separator and understand ENV_VAR will be an array like ['a b', 'c d e', 'f'] and will be exported as 'a b:c d e:f'. What if you want the exported ENV_VAR starts with spaces and ends with spaces? You should use escapes like: set -s ':' ENV_VAR \ a b:c d e:f\ . Then ENV_VAR will be [' a b', 'c d e', 'f '] and will be exported as ' a b:c d e:f '.
      4. set -s ':' ENV_VAR a b:c d e:f $ENV_VAR. In this case, it depends on how $ works. If it is defined as extracting the string value of ENV_VAR instead of the list, then this command will be the same as just replacing $ENV_VAR with its string value converted from the underneath list, and in this case, set -s ':' ENV_VAR a b:c d e:f:$ENV_VAR is probably what you really want (notice the : after f); if it is defined as extracting the variable of ENV_VAR (which is a list instead of string), then this should become a list extending operation just as in python. E.g., in the latter case, if ENV_VAR is ['x', 'y'] before, then after this operation ENV_VAR will become ['a b', 'c d e', 'f', 'x', 'y']. What if ENV_VAR's previous separator is not ':'? In the former case, it's your responsibility to make sure you are doing the right thing, e.g., you probably should use consistent separator by changing original separator to ':' or by changing current separator to be the original one. In the latter case, this will set the separator of this array from the original one (no matter what it is) to ':'.
    3. How to change separator
      1. set --change-separator ':' ENV_VAR. If ENV_VAR doesn't exist, the program should exit with error code other than 0. Easy and explicit enough.
    4. How to view separator
      1. set --view-separator ENV_VAR.

Besides, I really think this problem is obvious and urgent and a big pain point for users and hope this problem can be solved as soon as possible because this really affects user experience greatly. Actually I haven't encountered other (even very small) problems for now using fish except this so big one.

@faho
Copy link
Member

faho commented Sep 5, 2018

I really think this problem is obvious and urgent

@thuzhf: I'd say you're overestimating that.

One reason being that your problem in #5169 was with $LD_LIBRARY_PATH, but that's not actually a list in fish! You should set it like set LD_LIBRARY_PATH "$LD_LIBRARY_PATH:/some/path", just like in other shells.

Fish turns exactly three inherited/exported variables into lists automatically:

$PATH, $MANPATH and $CDPATH. And exactly this list will have a ":" separator when exported.

Other "standardized" variables like $LD_LIBRARY_PATH should not be handled as lists in fishscript, so you don't have this issue. Variables that aren't standardized you can handle however you want, since other programs won't do anything with them anyway, so the separator is non-critical.

@thuzhf
Copy link

thuzhf commented Sep 5, 2018

@faho Thanks for your clear explanation. That really makes a lot sense to me. OK, I can say this problem is solved for me.

@ridiculousfish
Copy link
Member

I took a look at the MANPATH issue described in #2090. The scenario is to append to manpath such that it continues using system paths.

In bash one would write this as export MANPATH="$MANPATH:/new/path". If MANPATH is set, this will append to it. If unset, this will prepend a colon, which is a man-specific indication to use system directories. This syntax doesn't work in fish; the problem is that MANPATH is an array and so "$MANPATH" will have spaces instead of colons.

A "tied variables" approach would allow us to have e.g. fish_manpath as an array that mirrors MANPATH as a colon-separated string. This could be built entirely in fish script. However we would want to do this for all path-like variables, not just MANPATH, and that would be a significant compatibility break which it's not clear how to manage. Also it has the same problems, e.g. the manpath array-variable in zsh is awkward to append to, so it's not clear why it even exists.

My proposal here doesn't make the MANPATH situation better or worse; I think the thing to do is punt and just have an easy story for appending to MANPATH, which is this:

set -q MANPATH || set MANPATH ''
set -x MANPATH $MANPATH /new/path

That is not too painful to paste into config.fish.

@faho
Copy link
Member

faho commented Sep 16, 2018

My proposal here doesn't make the MANPATH situation better or worse; I think the thing to do is punt and just have an easy story for appending to MANPATH, which is this:

@ridiculousfish: I've been thinking of going one step further, actually: Split these special variables on ":" also on assignment, and join them with ":" instead of space in quoted expansion.

That means when you do set -gx MANPATH "$MANPATH:/new/path", fish goes and performs the splitting automatically, resulting in the equivalent of set -gx MANPATH "" /new/path.

Now, this means that ":" can't appear in a path in $MANPATH (and $PATH, and $CDPATH), but they can't do that anyway because it'd break non-fish utilities!

That would also allow us to maybe one day remove the special handling, because it adds a cross-compatible way of handling it - you'd just have to assign with the :, and use it with (string split : -- $MANPATH), and it would work even if that handling were removed.

ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Sep 16, 2018
splitenv takes a variable name, and sets that variable to its value
but split on colons.

Proposed in fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Sep 16, 2018
Prior to this fix, fish would export arrays as ASCII record separator
delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is
surprising and awkward for other programs to deal with, and there's no way
to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH.

This commit does the following:

1. Exports all arrays as colon delimited strings, instead of RS.

2. When importing environment variables, if the variable ends in PATH,
split it on colons (i.e. splitenv); otherwise do not split it.

Colons are not escaped; this is deliberate to support uses like

   `set -x PYTHONPATH "/foo:/bar"`

which ought to work (and already do, we don't want  to make a compat break
here).

Fixes (or at least mitigates) fish-shell#436
@ridiculousfish
Copy link
Member

@faho I'm warming to the idea - how would the user mark a variable as receiving this special treatment? Would splitenv do it?

@faho
Copy link
Member

faho commented Sep 21, 2018

how would the user mark a variable as receiving this special treatment?

My idea was actually to not allow marking at all - just leave it as special behavior for $PATH et al. Which would let us get away from listifying at some point in the future.

However, I've since come to understand that allowing this for other variables helps us with other variables as well - e.g. I've said before that my $EDITOR is set as one element (set EDITOR "emacs -nw") for compatibility with external tools, but fish would like it better if it were a list.

So I'd probably default to space as a delimiter, unless it's a PATH-like (and assuming that it is if the name ends in PATH is probably okay).

Would splitenv do it?

I don't really love introducing another builtin for this, so I'd probably go with the argument-to-set option.

@GReagle
Copy link
Contributor

GReagle commented Sep 28, 2018

I agree that "special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution".

I propose that we STOP special-casing PATH/MANPATH/CDPATH. They would be treated (by the fish end-user) the way they are in other shells. $PATH (and the others) would be a single string (or in fish jargon a list with a length of 1) with colons in it. Note that I am referring to the fish user experience, not how these things are handled internally; I don't know what the implementation inside fish would look like--I rely on others to point out any problems there.

Granted, it would have the disadvantage of a backward incompatibility, but I think it would be worth it as a big gain in simplicity and elegance. I think it would address #2090 too.

What does everyone think?

ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Sep 29, 2018
splitenv takes a variable name, and sets that variable to its value
but split on colons.

Proposed in fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Sep 29, 2018
Prior to this fix, fish would export arrays as ASCII record separator
delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is
surprising and awkward for other programs to deal with, and there's no way
to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH.

This commit does the following:

1. Exports all arrays as colon delimited strings, instead of RS.

2. When importing environment variables, if the variable ends in PATH,
split it on colons (i.e. splitenv); otherwise do not split it.

Colons are not escaped; this is deliberate to support uses like

   `set -x PYTHONPATH "/foo:/bar"`

which ought to work (and already do, we don't want  to make a compat break
here).

Fixes (or at least mitigates) fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Oct 1, 2018
splitenv takes a variable name, and sets that variable to its value
but split on colons.

Proposed in fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Oct 1, 2018
Prior to this fix, fish would export arrays as ASCII record separator
delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is
surprising and awkward for other programs to deal with, and there's no way
to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH.

This commit does the following:

1. Exports all arrays as colon delimited strings, instead of RS.

2. When importing environment variables, if the variable ends in PATH,
split it on colons (i.e. splitenv); otherwise do not split it.

Colons are not escaped; this is deliberate to support uses like

   `set -x PYTHONPATH "/foo:/bar"`

which ought to work (and already do, we don't want  to make a compat break
here).

Fixes (or at least mitigates) fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Oct 1, 2018
splitenv takes a variable name, and sets that variable to its value
but split on colons.

Proposed in fish-shell#436
ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Oct 1, 2018
Prior to this fix, fish would export arrays as ASCII record separator
delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is
surprising and awkward for other programs to deal with, and there's no way
to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH.

This commit does the following:

1. Exports all arrays as colon delimited strings, instead of RS.

2. When importing environment variables, if the variable ends in PATH,
split it on colons (i.e. splitenv); otherwise do not split it.

Colons are not escaped; this is deliberate to support uses like

   `set -x PYTHONPATH "/foo:/bar"`

which ought to work (and already do, we don't want  to make a compat break
here).

Fixes (or at least mitigates) fish-shell#436
@faho
Copy link
Member

faho commented Oct 21, 2018

#5245 has been merged, so this seems solved.

@faho faho closed this as completed Oct 21, 2018
paradigm added a commit to bedrocklinux/bedrocklinux-userland that referenced this issue Dec 12, 2018
fish special cases three PATH-style: PATH, CDPATH, and MANPATH.  These
and only these must be set with spaces between items rather than colons.

See:

fish-shell/fish-shell#2198
fish-shell/fish-shell#436
reitzig added a commit to reitzig/sdkman-for-fish that referenced this issue Jul 31, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.