
Add string tokenize builtin command #3823

Closed

amosbird opened this issue Feb 7, 2017 · 9 comments

Comments

@amosbird (Contributor) commented Feb 7, 2017

Add a builtin that outputs tokenized history commands. This would be useful for tools like fzf to do further filtering and completion.

Implemented in #3818.

@krader1961 krader1961 added this to the fish-future milestone Feb 7, 2017
@krader1961 (Contributor) commented Feb 7, 2017

The initial description could use more details. I searched the fzf issues and couldn't find anything relevant. Will fzf actually use such a feature?

In any event, we've received a couple of questions in the recent past about how to produce tokens from a command line the same way fish does internally. I recall one answer was to abuse commandline to do so, but that can fairly be characterized as a hack. It would be better to augment the string command with a new subcommand, tokenize, to do the job. That could then be combined with history output or any other source of command lines.

@amosbird (Contributor, Author) commented Feb 8, 2017

Currently we can only rely on history-token-search-backward for argument completion. For long arguments, interactive fuzzy filtering is more usable than blindly stepping back through history tokens one by one. Consider this example:

I use fzf-tmux with the following fish function:

  function fzf-history-token-widget -d "Show command history"
    set -q FZF_TMUX_HEIGHT; or set FZF_TMUX_HEIGHT 40%
    set str (commandline -jc)
    set tok (commandline -tc)
    begin
      set -lx FZF_DEFAULT_OPTS "--height $FZF_TMUX_HEIGHT $FZF_DEFAULT_OPTS +s --tiebreak=index --bind=ctrl-r:toggle-sort $FZF_CTRL_R_OPTS +m"
      if test "$str" = "$tok" # quote both sides so empty or multi-word tokens don't break the comparison
        history | eval (__fzfcmd) -q '$str' | read -l result
        and commandline -- $result
      else
        string tokenize -a -n 1000 | eval (__fzfcmd) -- -q '$tok' | read -l result
        and commandline -tr -- $result
      end
    end
    commandline -f repaint
  end

@krader1961 krader1961 added the RFC label Mar 22, 2017
@krader1961 (Contributor) commented Mar 22, 2017

Please do not implement this until a firm design proposal is written. @faho and I just had a conversation on Gitter about the somewhat surprising behavior of commandline --tokenize. We do not want to implement string tokenize as a 100% compatible version of the former. @faho and I will provide more details on the problems with commandline --tokenize in subsequent comments.

@faho (Member) commented Mar 22, 2017

@krader1961 and I have just discussed on Gitter some questionable behaviors of commandline --tokenize that are of interest here.

Specifically, commandline --tokenize currently

  • Ignores all "non-string" tokens - this includes pipes, semicolons and newlines

  • Strips quotes from strings (so echo "wurst" becomes echo and wurst)

  • Prints command substitutions as a single token

The questions are:

  • How should these be handled properly?

  • Should we fix it in both string tokenize and commandline --tokenize, fix it first in the former, or keep the latter as-is indefinitely for backwards compatibility?

I'd argue that ignoring "non-string" tokens is inherently broken, and causes anything that relies on it to be broken as well.
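For illustration, here is how a hypothetical command line would currently come out, based purely on the quirks described above (exact output may vary by fish version):

```fish
# Command line:  echo "wurst" | grep w; ls
# (commandline --tokenize) currently yields only:
#   echo
#   wurst   # quotes stripped
#   grep
#   w
#   ls
# The `|` and `;` tokens are dropped entirely, and a command
# substitution such as (date) would come through as a single token.
```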

@krader1961 (Contributor) commented Mar 22, 2017

Note that you can trivially explore the current tokenize behavior. Run

function t; echo; printf "|%s|\n" (commandline --tokenize); end
bind \cT t

Then type various commands that include pipes, semicolons, quoted strings, subcommands, etc. and press [ctrl-T] to see how what you have typed is tokenized.

@krader1961 (Contributor) commented Mar 22, 2017

My recommendation is to leave the current commandline --tokenize behavior unchanged even in the next major release but with a warning that it will be changed in the subsequent major release. Implement string tokenize ASAP with whatever improved behavior we decide upon. Then reimplement all of our core uses of commandline --tokenize in terms of a combination of the two commands. Third-party extensions can then use string tokenize for dealing with both the current command and those from the command history.

@mqudsi (Contributor) commented Mar 6, 2018

@faho @ridiculousfish are we ditching $IFS or did we cancel that idea?

@faho (Member) commented Mar 6, 2018

are we ditching $IFS or did we cancel that idea?

I still want to. It's a terrible interface. read already gained a -d option that supersedes IFS, so the only missing piece to gain full feature-parity is adding some way to not split command substitutions.
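For instance, splitting on an explicit delimiter with read -d rather than relying on $IFS (a small sketch, per fish's read documentation):

```fish
# Split a line on ':' into separate variables; $IFS plays no role here.
echo "a:b:c" | read -d : first second third
# $first is "a", $second is "b", $third is "c"
```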

@faho (Member) commented May 21, 2019

So, I now have a branch that implements string tokenize: https://github.com/faho/fish-shell/tree/string-tokenize

It keeps "special" tokens like redirections and pipes, so you should get a fairly complete representation of the string.

It doesn't go into command substitutions, because our tokenizer itself doesn't.

It uses a --no-quoted option to remove quotes; without it, they are kept (string tokenize -- "echo 'foo'" prints echo and 'foo'; add -n and the latter becomes foo).

But I'm not sure if that's the right way around, or whether it's useful at all, because I'm not quite sure what this will be used for. I know there was one case where I hacked a completion or binding to set the commandline so I could use commandline --tokenize, but I can't find it.

Essentially, if you keep the quotes, you can tell that the token was quoted (useful for the magic paste stuff, I'd imagine). If you strip them, you can more easily pass the result to a second tokenize pass, or to a hypothetical expand builtin (#5811).
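A sketch of the branch's behavior as described above (hypothetical; the flag names and exact output come from the branch, which may still change):

```fish
# Assuming the string-tokenize branch: special tokens like pipes are
# kept, quotes are kept by default, and -n / --no-quoted strips them.
string tokenize -- "echo 'foo' | cat"
# Per the description, prints one token per line: echo, 'foo', |, cat

string tokenize -n -- "echo 'foo'"
# Per the description: echo, foo  (quotes removed)
```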

faho added a commit to faho/fish-shell that referenced this issue Nov 30, 2019
This splits a string into variables according to the shell's
tokenization rules, considering quoting, escaping etc.

This runs an automatic `unescape` on the string so it's presented like
it would be passed to the command. E.g.

    printf '%s\n' a\ b

returns the tokens

    printf
    %s\n
    a b

It might be useful to add another mode "--tokenize-raw" that doesn't
do that, but this seems to be the more useful of the two.

Fixes fish-shell#3823.
@faho faho mentioned this issue Nov 30, 2019
3 tasks
@zanchey zanchey removed the RFC label Dec 2, 2019
@zanchey zanchey removed this from the fish-future milestone Dec 2, 2019
@zanchey zanchey added this to the fish 3.1.0 milestone Dec 2, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 16, 2020