Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string builtin #2296

Closed
wants to merge 79 commits into from
Closed

Add string builtin #2296

wants to merge 79 commits into from

Conversation

msteed
Copy link
Contributor

@msteed msteed commented Aug 12, 2015

Refs #156.

I think this is ready, with the following exceptions:

  • There are some documentation formatting problems. The lexicon_filter or Doxygen seems to be choking on some of the examples in doc_src/string.txt. I'm not sure how to fix it.
  • The command-substitution magic mentioned in Want variable string manipulation #156 (comment) is not implemented.
  • The Xcode build (and maybe others?) still need to be updated.

@msteed
Copy link
Contributor Author

msteed commented Aug 12, 2015

The pcre2 build is failing due to: /home/travis/build/fish-shell/fish-shell/pcre2-10.20/missing: line 81: aclocal-1.15: command not found

@faho
Copy link
Member

faho commented Aug 12, 2015

It's obviously choking on string split '' abc. An obvious workaround here would be to use double-quotes, i.e. string split "" abc. (Edit: Sorry, doesn't work)

If anyone wants to have a look at what this looks like in real usage, see faho/fish-shell@34a7832 on my personal "string" branch where I've replaced the sed usage in the git completions with what should be equivalent versions.

My questions from that:

  • Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?
  • Why is the backreference syntax $1 and not \1 like pcre and sed do it?
  • It doesn't seem to do "\t" for a tab character (this has been a frequent annoyance because OSX sed doesn't either but GNU sed does, and since we separate completion options and descriptions for them with a tab this occurs often)
  • The error messages are.... bad - e.g. pacman -Sg | string replace -r '(.*)' '$1 $t Package group' -a (yes, "$t" - I've been trying to get it to print a tab character) gives:

replace: Regular expression match error -49

Are you trying to improve those?

Otherwise, that's some fantastic work, and I look forward to using it in earnest since I've been uncomfortable with adding user input to sed's expressions for a while (e.g. echo $PWD | sed -e "s|^$realhome|~|" $args_pre -e 's-\([^/.]\)[^/]*/-\1/-g' $args_post from prompt_pwd - what happens when $realhome contains a pipe character?).

@msteed
Copy link
Contributor Author

msteed commented Aug 12, 2015

Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?

I have no objection to this but the command line syntax would have to change. What do you suggest?

Why is the backreference syntax $1 and not \1 like pcre and sed do it?

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

The error messages are.... bad

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

@faho
Copy link
Member

faho commented Aug 12, 2015

I have no objection to this but the command line syntax would have to change. What do you suggest?

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional. (This means of course that arguments on the commandline need to be protected via "--", but that's already an issue) (It also wouldn't easily work with our current completion syntax, but there's currently not much we could complete anyway)

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

Could you add it? I believe that the current situation is non-intuitive.

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

Ah, okay.

@msteed
Copy link
Contributor Author

msteed commented Aug 12, 2015

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional.

Okay, I like that. Will do.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

FYI here is the description of pcre2_substitute(). The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

Another alternative to playing with quotes is substitution with echo -e:

pacman -Sg | string replace -a -r '^(.*)' (echo -e '$1\tPackage group')

@terlar
Copy link
Contributor

terlar commented Aug 12, 2015

The downside with $1 would be if you would want to use double quotes for some reason (like using variables inside).

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

string replace -a -r '^(.*)' "$1 $other Something"

@faho
Copy link
Member

faho commented Aug 12, 2015

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands. Whether it's a built-in or not isn't too visible to users (of course they could check, but they have to explicitly do that).

The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

Yeah, that's.... not great. Would it be possible to just change every "$([0-9])" to "(\1)" or is that prohibitively expensive or error-prone? I'm also probably massively over-thinking this - it's not the end of the world if I have to get used to "$1" instead of "\1".

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

Yes, it would. Of course using single quotes as much as possible is probably better in general and especially for things like this were you use unusual characters and don't want to constantly escape (which is why it's a blessing that fish usually doesn't do that), but if you were to use double quotes (and I'm guilty of using them too much), then you'd need to escape any "$".

@msteed
Copy link
Contributor Author

msteed commented Aug 13, 2015

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands.

Okay, I can see that.

Here is the list of escapes handled by the (undocumented) printf builtin.

 \" = double quote
 \\ = backslash
 \a = alert (bell)
 \b = backspace
 \c = produce no further output
 \e = escape
 \f = form feed
 \n = new line
 \r = carriage return
 \t = horizontal tab
 \v = vertical tab
 \ooo = octal number (ooo is 1 to 3 digits)
 \xhh = hexadecimal number (hhh is 1 to 2 digits)
 \uhhhh = 16-bit Unicode character (hhhh is 4 digits)
 \Uhhhhhhhh = 32-bit Unicode character (hhhhhhhh is 8 digits)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: \{name} and not \name.

@faho
Copy link
Member

faho commented Aug 13, 2015

Here is the list of escapes handled by the (undocumented) printf builtin.

(I added that documentation in #2290 - though it was also previously added and removed again)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

Yeah, it's best to stay consistent - though I can't say I've ever seen the need for the bell char, and I don't think "\c" applies here. The unicode escapes are nice, though, as is \n and \t.

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: {name} and not \name.

You know what? The more I look at it, the more I see the issues that changing around pcre2 would cause for that one single piece of consistency with something else. It would have been nice if pcre2 didn't choose to use incompatible syntax, but as it stands now I'm beginning to convince myself that we should just keep that as is, as long as it's documented - which you've already done.


There's another thing I've found, though, and that's the behavior without the "-a" option:

builtin complete -Cgit- | string replace -r "^git-([^[:space:]]*).*" "\${1}"

i.e. without the "-a" option, produces this output:

cvsserver
git-shell Programm, 832kB
git-upload-archive Programm, 1,7MB
git-upload-pack Programm, 849kB
git-receive-pack Programm, 1,7MB

It only operates on the first line!

Now, I'd have expected this tool to be line-based, to operate on every line (when given input via stdin), and the "-a" option to be analogous to sed's "g" (as in s/PATTERN/REPLACEMENT/g), so that it then operates multiple times per line.

Is this by design?

@msteed
Copy link
Contributor Author

msteed commented Aug 13, 2015

Honestly I simply wasn't thinking of the behavior of line-oriented tools, but that behavior makes more sense. I'll make it so the absence of -a means to operate on the first match in each argument.

I appreciate your feedback. I came at this with a few simple uses cases in mind and it helps to hear from someone wanting to solve different problems.

faho and others added 14 commits August 14, 2015 18:13
Match the whole real home directory in prompt_pwd.
Avoids differences in widths of wchar_t, hopefully addressing
issue fish-shell#2284
This is already done by fish before calling the completion.

It breaks completion with combiners (fish-shell#2025) and also with wrappers.

(This does not include git because that's better solved in fish-shell#2145)
Closes fish-shell/fish-site#25.

Signed-off-by: David Adam <zanchey@ucc.gu.uwa.edu.au>

[skip ci]
It is duplicative of the fish_mode_prompt function

Fixes fish-shell#2228
Make it simpler, and use wcstring instead of wcsdup
- make match/replace without -a operate on the first match on each
  argument
- use different exit codes for "no operation performed" and errors, as
  grep does
- refactor regex compile code
- use human-friendly error messages from pcre2
- improve error handling & reporting elsewhere
- add a few tests
- make some doc fixes
- some simplification & cleanup
- fix ci build failure (I hope)
@msteed
Copy link
Contributor Author

msteed commented Aug 20, 2015

Okay, finally got the CI build of pcre2 squared away.

@faho: I took your suggestions except for accepting multiple pattern/replacement pairs. After working on it for a bit, I concluded that the getopt hackery required to have an option take two arguments, combined with its usual argument permutation behavior, made a reliable implementation more trouble than it was worth. That could be added in the future though.

@ridiculousfish: the problems noted in #2296 (comment) still exist. Otherwise I think this is ready to go.

ridiculousfish added a commit that referenced this pull request Sep 12, 2015
Squashed commit of the following:

commit 4c3eaeb6e57d76463e9683c327142b0aeafb92b8
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:51:30 2015 -0700

    Remove testdata and doc dirs from pcre2 source

commit b2a8b4b50f2398b204fb72cfe4b5ba77ece2e1ab
Merge: 11c8a47 7974aab
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:32:40 2015 -0700

    Merge branch 'string' of git://github.com/msteed/fish-shell into string-test

commit 7974aab
Author: Michael Steed <msteed@saltstack.com>
Date:   Fri Sep 11 13:00:02 2015 -0600

    build pcre2 lib only, no docs

commit eb20b43
Merge: 1a09e70 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 20:00:47 2015 -0600

    Merge branch 'string' of github.com:msteed/fish-shell into string

commit 1a09e70
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:58:24 2015 -0600

    rebase on master & address the fallout

commit a0ec977
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 64c25a0
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit ece7f35
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 9ff7477
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit baf4e09
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 896a2c2
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit efd47dc
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit ed0850e
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 9492e7a
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 1a60b93
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in

commit 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 2ecd24f
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit 45b777e
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 981cbb6
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit ddb6a2a
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 1e34e31
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit 34232e1
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit 00d7e78
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 4498aa5
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 290c58c
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in
@ridiculousfish
Copy link
Member

I've squash-merged to a new branch string-staging. I'll use that to add support in the Xcode build.

zanchey, you can take a closer look at how it's plugged into the build too on this branch if you like. I also hope to add support for using OS X's libpcre here.

@zanchey
Copy link
Member

zanchey commented Sep 15, 2015

I think OS X (and most Linux distributions) use the old PCRE API rather than PCRE2, so I'm not sure how straightforward that will be.

Almost no distributions are shipping the PCRE2 libraries yet.

@ridiculousfish
Copy link
Member

It should be straightforward to conditionally use PCRE1 or PCRE2 depending on what's available at build time.

But OS X doesn't ship any PCRE headers, and they hate it when developers reverse engineer headers. So the OS X build will build the pcre2 library, and link against it statically. This is already working in Xcode on the branch.

@ridiculousfish
Copy link
Member

This has been merged as 1883e05, with the squash-commit d83ef07 credited to @msteed.

A big thanks to msteed for pushing this through, both design and implementation! It's not easy to design a piece this big, and I'm totally thrilled with how it turned out!

@ghost
Copy link

ghost commented Sep 21, 2015

🎉 Wonderful news.

@msteed
Copy link
Contributor Author

msteed commented Sep 22, 2015

Excellent!

Thanks to @ridiculousfish and @faho for the careful reviews, helpful feedback, and many improvements. Thanks to @kballard for the original interface design. And thanks to everyone who contributed to the discussion on #156.

@zanchey
Copy link
Member

zanchey commented Sep 22, 2015

Awesome!

Some more work needs to be done on the integration with the autotools build - requiring autoreconf prevents building on RHEL 5 due to too-old autoconf, and as our tarballs aren't marked as depending on aclocal the build also fails on openSUSE. I'll try and take a look in the next few days.

@ridiculousfish
Copy link
Member

The change c1bd3b5 fixes issue 5 in my list, closing the loop on that.

@faho faho added the release notes Something that is or should be mentioned in the release notes label Oct 26, 2015
@danielb2
Copy link
Contributor

What version of fish is this in? It's really difficult to tell when the same milestones (next-2.x) keeps being re-used. Any reason we're doing it that way?

@faho
Copy link
Member

faho commented Dec 16, 2015

Since it's in the next-2.x milestone, that means it's not released yet, so it's in whatever comes after 2.2 (which AFAIK hasn't been decided yet, though I'd quite like it if it were 2.3 since I have a thing for bad movies).

@danielb2
Copy link
Contributor

It looks like the next-2.x milestone bucket is being reused, for 2.0, 2.1 and 2.2. That means I couldn't tell which release it was in and this ticket is all the way back in September and I have no idea when 2.2 was released. Iow, was this ticket filed before 2.2 was released or not.

Am I missing something? I think my concern here is legit.

@faho
Copy link
Member

faho commented Dec 17, 2015

@danielb2: All our releases are git tags. If you wish to know which tag contains a given commit (i.e. "came after" it), use git tag --contains COMMITHASH. Github will also show you on the commit page.

AFAIK, on the last release the "next-2.x" milestone was closed renamed "fish 2.2.0" and then another with the "next-2.x" name opened, so anything with that milestone should not be in a release.

@ridiculousfish
Copy link
Member

faho is right: milestones are never reused across releases.

@danielb2
Copy link
Contributor

help string doesn't show any of the docs for me. Can I be doing something wrong?

@faho
Copy link
Member

faho commented Dec 17, 2015

@danielb2: You need doxygen to build the docs. You can also try man string and string --help in case that isn't it.

@danielb2
Copy link
Contributor

thank you

@danielb2
Copy link
Contributor

tt is a function with definition
function tt
  set name (pwd)
  set name (string replace . - $name)
  tmux ls -F '#{session_name}' | grep -q "^$name\$"
  if test $status -eq 0
    tmux attach -t $name
  else
    tmux new -s $name
  end
end

there. named tmux handling based on dir name. Thanks @msteed! I was using ruby to do this before. All fish now :)

@msteed
Copy link
Contributor Author

msteed commented Dec 17, 2015

@danielb2: Cool! Note that you can also replace grep -q <pattern> with string match -q -r <pattern>.

@danielb2
Copy link
Contributor

oh. sweet! thanks :)

@danielb2
Copy link
Contributor

@faho btw man string and string --help didn't work. Does it mean I have to have doxygen installed before I compile?

@faho
Copy link
Member

faho commented Dec 17, 2015

Does it mean I have to have doxygen installed before I compile?

Effectively yes. You need doxygen to transform the documentation from our input format to the output formats (the website, the man pages and the "--help" output). This is done as part of the build process.

You do not need it just to view the documentation. Doxygen is a build-time dependency.

It also says so in README.md under the heading "Building":

Building the documentation requires Doxygen 1.8.7 or newer.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement release notes Something that is or should be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet