New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement string escape --style=xxx
and string unescape
#4159
Conversation
We need a way to encode arbitrary strings into valid fish variable names. It would also be nice if we could convert strings to valid URLs without using the slow and hard to understand `__fish_urlencode` function. In particular, eliminating the need to manipulate the locale. Fixes #4150
We now have a builtin that can do URL escaping so use it. I can't find any uses of our private `__fish_urlencode` function in any Oh-My-Fish or Fisherman code so remove it.
// The above characters don't need to be encoded. | ||
out.push_back((wchar_t)c2); | ||
} else { | ||
// All other chars need to have their UTF-8 representation encoded in hex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with other encodings? E.g. what happens if you send it input in SHIFT-JIS or UTF-16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably reword that comment. At the moment it will be whatever narrow representation the current locale uses. If it's UTF-16 for example then we'll end up dropping half the bytes. Tough. Anyone not using UTF-8 externally by now is probably used to seeing weird behavior from programs.
|
||
The third is `--style=url` which ensures the string can be used as a URL by hex encoding any character which is not legal in a URL. The string is first converted to UTF-8 before being encoded. | ||
|
||
`string unescape` performs the inverse of the `string escape` command. If the string to be unescaped is not properly formatted it is ignored. For example, doing `string unescape --style=var (string escape --style=var $str)` will return the original string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the "not properly formatted" means "not valid in the requested escape-style"? I.e. a valid variable name will be returned, but a string that is not valid as a variable name won't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. string unescape
will correctly handle any output from string escape
. But if you hand it an arbitrary sequence of characters it may or may not return something useful. It won't crash but there is no guarantee you'll get an unescaped string. The alternative is to return the input string but I think returning nothing and setting status to one is better.
I've merged this since I'm reasonably confident the behavior in this change if the value cannot be unescaped is what we want. If people start using it and disagree we can amend the behavior. |
This addresses issues #3543 and #4150.
The one aspect I think needs some discussion is what happens when you
string unescape
an invalid string. At the moment I ignore the invalid string. If it was the only string then the exit status will be one. This mirrors the behavior of the other string commands. The obvious alternative is to return the original string.