Add string builtin #2296

Closed
wants to merge 79 commits into
from

@msteed
Contributor
msteed commented Aug 12, 2015

Refs #156.

I think this is ready, with the following exceptions:

  • There are some documentation formatting problems. The lexicon_filter or Doxygen seems to be choking on some of the examples in doc_src/string.txt. I'm not sure how to fix it.
  • The command-substitution magic mentioned in #156 (comment) is not implemented.
  • The Xcode build (and maybe others?) still needs to be updated.
@msteed
Contributor
msteed commented Aug 12, 2015

The pcre2 build is failing due to: /home/travis/build/fish-shell/fish-shell/pcre2-10.20/missing: line 81: aclocal-1.15: command not found

@faho
Member
faho commented Aug 12, 2015

It's obviously choking on string split '' abc. An obvious workaround here would be to use double-quotes, i.e. string split "" abc. (Edit: Sorry, doesn't work)

If anyone wants to have a look at what this looks like in real usage, see faho@34a7832 on my personal "string" branch where I've replaced the sed usage in the git completions with what should be equivalent versions.

My questions from that:

  • Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?
  • Why is the backreference syntax $1 and not \1 like pcre and sed do it?
  • It doesn't seem to do "\t" for a tab character (this has been a frequent annoyance because OSX sed doesn't either but GNU sed does, and since we separate completion options and descriptions for them with a tab this occurs often)
  • The error messages are.... bad - e.g. pacman -Sg | string replace -r '(.*)' '$1 $t Package group' -a (yes, "$t" - I've been trying to get it to print a tab character) gives:

replace: Regular expression match error -49

Are you trying to improve those?

Otherwise, that's some fantastic work, and I look forward to using it in earnest since I've been uncomfortable with adding user input to sed's expressions for a while (e.g. echo $PWD | sed -e "s|^$realhome|~|" $args_pre -e 's-\([^/.]\)[^/]*/-\1/-g' $args_post from prompt_pwd - what happens when $realhome contains a pipe character?).
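
To illustrate the concern about splicing user input into an expression, here is a minimal Python sketch (not fish code, and not part of this PR). The function names and sample paths are made up for illustration. The failure shown is the regex analogue of the sed delimiter problem: an unescaped | in the home directory changes what the pattern means.

```python
import re

def shorten_home_naive(home, path):
    # User-controlled text is pasted straight into the pattern.
    return re.sub("^" + home, "~", path)

def shorten_home_escaped(home, path):
    # re.escape() makes every character of `home` match literally.
    return re.sub("^" + re.escape(home), "~", path)

home = "/home/a|b"   # hypothetical home directory containing a pipe
path = "/home/a|b/src"

print(shorten_home_naive(home, path))    # ~|~/src  (the | was read as alternation)
print(shorten_home_escaped(home, path))  # ~/src
```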

@msteed
Contributor
msteed commented Aug 12, 2015

Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?

I have no objection to this but the command line syntax would have to change. What do you suggest?

Why is the backreference syntax $1 and not \1 like pcre and sed do it?

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

The error messages are.... bad

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

@faho
Member
faho commented Aug 12, 2015

I have no objection to this but the command line syntax would have to change. What do you suggest?

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless of where it is specified, and a "-p" for the first pair is optional. (This means of course that arguments on the command line need to be protected via "--", but that's already an issue.) (It also wouldn't easily work with our current completion syntax, but there's currently not much we could complete anyway.)

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

Could you add it? I believe that the current situation is non-intuitive.

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

Ah, okay.

@msteed
Contributor
msteed commented Aug 12, 2015

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless of where it is specified, and a "-p" for the first pair is optional.

Okay, I like that. Will do.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

FYI here is the description of pcre2_substitute(). The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

Another alternative to playing with quotes is substitution with echo -e:

pacman -Sg | string replace -a -r '^(.*)' (echo -e '$1\tPackage group')

@terlar
Contributor
terlar commented Aug 12, 2015

The downside with $1 would be if you would want to use double quotes for some reason (like using variables inside).

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

string replace -a -r '^(.*)' "$1 $other Something"
@faho
Member
faho commented Aug 12, 2015

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands. Whether it's a built-in or not isn't too visible to users (of course they could check, but they have to explicitly do that).

The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

Yeah, that's.... not great. Would it be possible to just change every "$([0-9])" to "(\1)" or is that prohibitively expensive or error-prone? I'm also probably massively over-thinking this - it's not the end of the world if I have to get used to "$1" instead of "\1".

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

Yes, it would. Of course using single quotes as much as possible is probably better in general, and especially for things like this where you use unusual characters and don't want to constantly escape (which is why it's a blessing that fish usually doesn't do that), but if you were to use double quotes (and I'm guilty of using them too much), then you'd need to escape any "$".
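
On the question of mechanically translating \1 into what pcre2_substitute() expects, a rough Python sketch (a hypothetical helper, not fish or pcre2 code) shows the cases a translator would have to handle. It assumes that a doubled backslash should stand for a literal backslash, and it assumes, per the pcre2_substitute() documentation, that $$ yields a literal dollar sign. It emits the braced ${n} form so that a following digit is not absorbed into the group number.

```python
import re

# One backslash pair, a \N group reference, or a bare dollar sign.
_TOKEN = re.compile(r"\\\\|\\([1-9])|\$")

def to_dollar_syntax(replacement):
    """Translate sed-style \\N references into $-style ${N} references."""
    def convert(match):
        token = match.group(0)
        if token == "\\\\":
            return "\\"   # escaped backslash becomes a literal backslash
        if token == "$":
            return "$$"   # a literal dollar must stay literal
        return "${" + match.group(1) + "}"
    return _TOKEN.sub(convert, replacement)

print(to_dollar_syntax(r"\1/"))       # ${1}/
print(to_dollar_syntax(r"\10"))       # ${1}0
print(to_dollar_syntax("cost: $5"))   # cost: $$5
```

Even this small version has to make decisions about backslash pairs, literal dollars and multi-digit references, which is the "error-prone" part.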

@msteed
Contributor
msteed commented Aug 13, 2015

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands.

Okay, I can see that.

Here is the list of escapes handled by the (undocumented) printf builtin.

 \" = double quote
 \\ = backslash
 \a = alert (bell)
 \b = backspace
 \c = produce no further output
 \e = escape
 \f = form feed
 \n = new line
 \r = carriage return
 \t = horizontal tab
 \v = vertical tab
 \ooo = octal number (ooo is 1 to 3 digits)
 \xhh = hexadecimal number (hh is 1 to 2 digits)
 \uhhhh = 16-bit Unicode character (hhhh is 4 digits)
 \Uhhhhhhhh = 32-bit Unicode character (hhhhhhhh is 8 digits)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: \{name} and not \name.

@faho
Member
faho commented Aug 13, 2015

Here is the list of escapes handled by the (undocumented) printf builtin.

(I added that documentation in #2290 - though it was also previously added and removed again)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

Yeah, it's best to stay consistent - though I can't say I've ever seen the need for the bell char, and I don't think "\c" applies here. The unicode escapes are nice, though, as is \n and \t.

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: \{name} and not \name.

You know what? The more I look at it, the more I see the issues that changing around pcre2 would cause for that one single piece of consistency with something else. It would have been nice if pcre2 didn't choose to use incompatible syntax, but as it stands now I'm beginning to convince myself that we should just keep that as is, as long as it's documented - which you've already done.


There's another thing I've found, though, and that's the behavior without the "-a" option:

builtin complete -Cgit- | string replace -r "^git-([^[:space:]]*).*" "\${1}"

i.e. without the "-a" option, produces this output:

cvsserver
git-shell Programm, 832kB
git-upload-archive Programm, 1,7MB
git-upload-pack Programm, 849kB
git-receive-pack Programm, 1,7MB

It only operates on the first line!

Now, I'd have expected this tool to be line-based, to operate on every line (when given input via stdin), and the "-a" option to be analogous to sed's "g" (as in s/PATTERN/REPLACEMENT/g), so that it then operates multiple times per line.

Is this by design?

@msteed
Contributor
msteed commented Aug 13, 2015

Honestly I simply wasn't thinking of the behavior of line-oriented tools, but that behavior makes more sense. I'll make it so the absence of -a means to operate on the first match in each argument.

I appreciate your feedback. I came at this with a few simple use cases in mind and it helps to hear from someone wanting to solve different problems.
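
The line-oriented behavior agreed on here can be sketched as follows, in Python rather than fish: every input line is processed, and the flag only decides whether one match or all matches per line are replaced. replace_lines and its replace_all parameter are hypothetical names, and Python's re module stands in for pcre2.

```python
import re

def replace_lines(pattern, replacement, text, replace_all=False):
    rx = re.compile(pattern)
    # count=0 means "replace every match"; count=1 means "first match only".
    count = 0 if replace_all else 1
    return [rx.sub(replacement, line, count=count) for line in text.splitlines()]

print(replace_lines("a", "X", "banana\ncabana"))                    # ['bXnana', 'cXbana']
print(replace_lines("a", "X", "banana\ncabana", replace_all=True))  # ['bXnXnX', 'cXbXnX']
```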

@pickfire
Contributor

When you remove that, there will be a problem in the Linux framebuffer: the computer will report that it is not able to print that character, and there will always be an error. See #2126.

fish: Tried to print invalid wide character string
Member
faho replied Aug 17, 2015

That can be fixed by simply using a UTF-8 locale, which on linux is heavily recommended anyway.

Contributor

I am using a UTF-8 locale, locale | grep -E '(LANG|LC_CTYPE)=(.*\.)?UTF-8' shows:

LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8

It works in a normal terminal but it does not work in the linux framebuffer.

Member
faho replied Aug 17, 2015

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

Contributor

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

This is true: the Linux framebuffer can only display 256 or 512 characters, so it cannot display some characters.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

You finally asked that. It is already an issue, look at #2070, it is still an unsolvable, mystery puzzle.

faho and others added some commits Aug 17, 2015
@faho faho Add completions for systemd's localectl cb5d36d
@faho faho Completions: Don't check $cmd[1]
This is already done by fish before calling the completion.

It breaks completion with combiners (#2025) and also with wrappers.

(This does not include git because that's better solved in #2145)
5e555fc
@igalic @zanchey igalic docs/design.hdr: inclusive lanugage
Closes fish-shell/fish-site#25.

Signed-off-by: David Adam <zanchey@ucc.gu.uwa.edu.au>

[skip ci]
34faf76
@ridiculousfish ridiculousfish Remove vi mode indicator from classic_git prompt
It is duplicative of the fish_mode_prompt function

Fixes #2228
2b87705
@ridiculousfish ridiculousfish Remove unused original_pid variable c1b9b27
@ridiculousfish ridiculousfish Rewrite parse_util_unescape_wildcards
Make it simpler, and use wcstring instead of wcsdup
b599046
@msteed msteed Updates after review comments
- make match/replace without -a operate on the first match on each
  argument
- use different exit codes for "no operation performed" and errors, as
  grep does
- refactor regex compile code
- use human-friendly error messages from pcre2
- improve error handling & reporting elsewhere
- add a few tests
- make some doc fixes
- some simplification & cleanup
- fix ci build failure (I hope)
1e34e31
@msteed msteed another attempt to fix the ci build ddb6a2a
@msteed
Contributor
msteed commented Aug 20, 2015

Okay, finally got the CI build of pcre2 squared away.

@faho: I took your suggestions except for accepting multiple pattern/replacement pairs. After working on it for a bit, I concluded that the getopt hackery required to have an option take two arguments, combined with its usual argument permutation behavior, made a reliable implementation more trouble than it was worth. That could be added in the future though.

@ridiculousfish: the problems noted in #2296 (comment) still exist. Otherwise I think this is ready to go.

@faho

"C-style escape sequences like \t"? An example would be useful here because the term might not be that familiar.

@faho faho commented on an outdated diff Aug 20, 2015
doc_src/string.txt
+string join [(-q | --quiet)] SEP [STRING...]
+string trim [(-l | --left)] [(-r | --right)] [(-c | --chars CHARS)]
+ [(-q | --quiet)] [STRING...]
+string escape [(-n | --no-quoted)] [STRING...]
+string match [(-a | --all)] [(-i | --ignore-case)] [(-r | --regex)]
+ [(-n | --index)] [(-q | --quiet)] PATTERN [STRING...]
+string replace [(-a | --all)] [(-i | --ignore-case)] [(-r | --regex)]
+ [(-q | --quiet)] PATTERN REPLACEMENT [STRING...]
+\endfish
+
+
+\subsection string-description Description
+
+`string` performs operations on strings.
+
+STRING arguments are taken from the command line unless standard input is connected to a pipe or a file, in which case they are read from standard input. It is an error to supply STRING arguments on the command line and on standard input.
@faho
faho Aug 20, 2015 Member

"read from standard input one STRING per line" or similar?

@faho
Member
faho commented Aug 20, 2015

Some more stuff I'd like to see (though that probably shouldn't block a merge, I'm antsy to play with this for real):

  • Some way to make escape also escape leading dashes - this is a major source of issues in shellscripting, and a simple way to fix it would be nice
  • Some way to replace multiple PATTERN/REPLACEMENTs in one go - I already mentioned this, but I see the issue
  • Something I forgot.
@msteed
Contributor
msteed commented Aug 20, 2015

@faho: thanks for the suggestions on the documentation.

I am completely okay with further improvements to the string functionality (implemented by me or anyone else), but I would like to see what comes out of testing by a wider audience.

@msteed
Contributor
msteed commented Aug 21, 2015

Update on the executable size:

  • adding the new string code minus regex support: +20KB
  • adding the calls to pcre2_compile() and pcre2_match(): +160KB
  • adding the call to pcre2_substitute(): +50KB

So the total size increase is 230KB. Measurements are from a release build, g++-5.2, Linux, x86_64.

Building string-enabled fish with -Os produces a net 50KB decrease in size over the non-string-enabled fish, so if executable size is a concern that's one area to explore.

@ridiculousfish
Member

In the process of code-reviewing this...beautifully written by the way!

@msteed
Owner

This was inadvertent. Maybe the result of running make depend?

@msteed

Should be std::min(...)

@msteed
Contributor
msteed commented Aug 22, 2015

Thanks, @ridiculousfish, you're too kind!

msteed and others added some commits Aug 23, 2015
@msteed msteed fixes from review
- Makefile.in: restore iwyu target
- regex_replacer_t::replace_matches(): correct size passed to realloc()
45b777e
@faho faho gpg: Complete files for --import 67ed58b
@zanchey zanchey env_universal_common: always pick shmem strategy on Cygwin
Cygwin FIFOs do not support more than one reader, so avoid them on this
platform. An autoconf feature test would be helpful but is tricky to
write.

Closes #2152.
b0504f7
@faho faho Make alt-arrow in iTerm2 do the same thing as elsewhere
nextd-or-forward-word and such

Fixes #1836
7bfad18
@ridiculousfish
Member

Ok, I'm done reviewing. Overall it looks very solid and well tested. Kudos!

The most radical part of the design is the behavior for testing builtin_stdin, i.e. behave differently when on the end of a pipe versus stdin. After playing with it, I am convinced this is absolutely the right design! It feels very natural.

I have some nitpicks that can wait, but here are the things we should address before merging:

  1. Globbing (e.g. string match) supports character classes [a-zA-Z], but we don't support that elsewhere in fish, like in case or argument expansion. Should we add character class support to general wildcard expansion, remove it from string, or can we justify why these should be different?

  2. We unconditionally output a trailing newline. My gut feeling is that this is bad, since it's output that isn't being asked for. I'm having some trouble thinking of a scenario where it could cause problems, but certainly if we enhance string join to avoid splitting in command substitutions, we don't want to introduce extra newlines. So I think we should not, at least in the join case and maybe others.

  3. When reading from stdin, we lose the last line if there is no newline:

    > echo -n abc | string join '_'
    >
    

    BTW in string_get_arg_stdin, it's fine to just read into a std::string until you get a newline, and then use str2wcstring. This is simpler than messing with mbrtowc.

  4. This happens:

    > string match --regex --all 'a*' 'b'
    string match: Regular expression match error: bad offset value
    

    Unless I've misunderstood I think that's a bug.

  5. This happens:

    > cat ~/test.fish
    string join '+' a b
    > source ~/test.fish
    a+b
    > source ~/test.fish < /dev/null
    string join: Too many arguments
    

    The fix is to replace all the isatty(builtin_stdin) checks with builtin_stdin == STDIN_FILENO && isatty(builtin_stdin)
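
For item 3, the suggested approach (collect bytes until a newline, then convert the whole line at once) can be sketched in Python. This is an illustration under assumptions, not the string_get_arg_stdin() code: read_args is a hypothetical name, and decoding as UTF-8 with replacement characters stands in for str2wcstring. The point is the final check, which keeps a last line that has no trailing newline.

```python
import io

def read_args(stream):
    """Return one argument per input line, including an unterminated last line."""
    args = []
    line = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:                 # end of input
            break
        if byte == b"\n":
            args.append(line.decode("utf-8", errors="replace"))
            line.clear()
        else:
            line += byte
    if line:                         # last line had no newline: keep it anyway
        args.append(line.decode("utf-8", errors="replace"))
    return args

print(read_args(io.BytesIO(b"abc")))       # ['abc']
print(read_args(io.BytesIO(b"a\nb\n")))    # ['a', 'b']
```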

faho and others added some commits Aug 26, 2015
@faho faho {prev,next}d-or-*-word: Fix for multiple lines
Fixes #2333
47b9993
@ridiculousfish ridiculousfish Remove an errant ampersand from the docs
Fixes fish-site issue 26
e0e7325
@msteed msteed some fixes from review
- string_get_arg_stdin(): simplify and don't discard the argument when
  the trailing newline is absent
- fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
- correct test for args coming from stdin
2ecd24f
@msteed
Contributor
msteed commented Aug 27, 2015

@ridiculousfish: thanks for the review.

Globbing (e.g. string match) supports character classes [a-zA-Z]

I wanted to make use of existing fish code to do glob matching but found that it was tied too closely to matching filesystem paths (not surprisingly). Once I settled on a separate implementation for string match, I added character classes just because I thought it would be useful, but I can't really justify it beyond that. I have seen the suggestion that support for the ? wildcard be removed, so maybe less is more? In any case it would be better if glob matching were implemented in a single place.

We unconditionally output a trailing newline.

Are you referring to cases where an argument coming from stdin lacks a trailing newline? If so I agree that this should be fixed. Otherwise I'm not sure how the behavior of string should change. When is a trailing newline not appropriate?

When reading from stdin, we lose the last line if there is no newline

Fixed! And I simplified string_get_arg_stdin() as you suggested. Thanks.

This happens: [regex error]

Fixed! Thanks.
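
The thread does not say what the fix was. As a general illustration, a "find all matches" loop has to take care with patterns such as a* that can match the empty string: after an empty match the start offset must move forward by one, and the loop must stop once the offset passes the end of the subject. A minimal Python sketch, with all_matches as a hypothetical name and Python's re standing in for pcre2:

```python
import re

def all_matches(pattern, subject):
    """Return the (start, end) span of every match, including empty ones."""
    rx = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(subject):       # never search past the end of the subject
        match = rx.search(subject, pos)
        if match is None:
            break
        spans.append(match.span())
        if match.end() == match.start():
            pos = match.end() + 1    # empty match: step forward to avoid looping forever
        else:
            pos = match.end()
    return spans

print(all_matches("a*", "b"))     # [(0, 0), (1, 1)]
print(all_matches("a+", "caab"))  # [(1, 3)]
```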

This happens: [args confusion]

I changed the isatty() tests but I get the same failure. I am still missing something here.

@abackstrom abackstrom Fix spelling
d6c97a6
faho added some commits Aug 17, 2015
@faho faho Improve situation for linux in-kernel VTs (TERM = "linux")
This adds a special colorscheme and prompt function guaranteed to work
on a VT and activates them automatically if $TERM = "linux".

set_color is overridden to only allow the 8 colors VTs have (under the
assumption those are always the same) and the color variables are
shadowed with global ones so they don't pollute our nice capable terms.
f71e877
@faho faho Add a shell suspend function
Squashed commit of the following:

commit ede9e510751497d61ff0e78fd948e901171cf6f9
Merge: 938da30 239d2a2
Author: Fabian Homborg <FHomborg@gmail.com>
Date:   Thu Aug 6 18:47:43 2015 +0200

    Merge branch 'suspend' of https://github.com/mwm/fish-shell into suspend

commit 239d2a2
Author: Mike Meyer <mwm@mired.org>
Date:   Thu Aug 6 11:24:32 2015 -0500

    Handle interactive & login shells, SHLVL checks, and better message.

commit 6334047
Author: Mike Meyer <mwm@mired.org>
Date:   Tue Aug 4 08:53:10 2015 -0500

    Add a description to suspend

commit 080458b
Author: Mike Meyer <mwm@mired.org>
Date:   Tue Aug 4 07:05:17 2015 -0500

    Add a shell suspend function
17c7569
@faho faho Make overriding cnf-handler work
See #1925: This allows users to disable the cnf-logic which can be quite
slow on small hardware (like a raspberry pi).

Squashed commit of the following:

commit 742a59e30d8db24b6bb5067d4204d4b5cc01c1c3
Author: Fabian Homborg <FHomborg@gmail.com>
Date:   Sun Aug 30 18:23:41 2015 +0200

    Erase startup cnf-handler early

    Simplifies the code a bit - in particular it removes the special-casing
    from the startup handler.

commit 638a97e7f31f302b65e044c93c638c03a69e31f5
Author: Fabian Homborg <FHomborg@gmail.com>
Date:   Mon Aug 24 20:14:46 2015 +0200

    Make overriding cnf-handler work

    Do this by renaming the __fish_command_not_found_handler used during
    startup to __fish_startup_command_not_found_handler. That allows us to
    check if __fish_command_not_found_handler has been defined and skip the
    setup of the normal one.

    Now disabling cnf-handling can be done via defining an empty
    __fish_command_not_found_handler in config.fish
2f3123e
@faho faho Fix missing variable expansion $ in psub a17b9fd
@faho faho Revert "Fix missing variable expansion $ in psub"
That change was a bit too eager as the mkfifo route doesn't currently work.

See #1040 and #2052.

This reverts commit a17b9fd.
5043b9d
@faho faho Remove setup outside of fish_prompt from sample prompts
This doesn't work with fish_config.

For terlar and pythonista, remove unnecessary color setting.

For informative+git and pythonista, move variable setup into fish_prompt

Fixes #1141
60089f9
@MarkGriffiths
Contributor

I'll take a look at what's throwing out the lexicon_filter & Doxygen. On first glance it'll just be down to adding some new pattern recognition to the filter to handle some of the unique argument sequencing this introduces.

@zanchey zanchey added the enhancement label Aug 31, 2015
@zanchey zanchey added this to the next-2.x milestone Aug 31, 2015
faho added some commits Sep 1, 2015
@faho faho Add escape sequences for arrows in some linux VTs
Why this is only in some, I don't know, but these don't seem to
interfere with anything.

Fixes #2309
a21e44c
@faho faho Load fish_user_key_bindings for any binding (including vi)
fish_user_key_bindings is the user's, and they should know if they want
vi-ish bindings or emacs-ish (or nano-ish). If they want to define
multiple, they can also do that (e.g. via checking what
$fish_key_bindings is set to).

Fixes #2254

CC @kballard
5f080fc
@ridiculousfish
Member

Just a note that I haven't forgotten this and I'll address msteed's reply

mwm and others added some commits Aug 31, 2015
@mwm @faho mwm Add suspend help page. 0661553
@mwm @faho mwm Add info to suspend help page. 32a3e15
@faho faho Use $VISUAL before $EDITOR in funced
Closes #2268
f3695b9
@jazmit @faho jazmit Added completions for entr 5e1c71b
@coyotebush coyotebush Fix error message for variable used as command
54b6a1c
@jusga

Typo: $SHLVLV -> $SHLVL

Member
faho replied Sep 6, 2015

Thanks, fixed in bd3b4e0.

faho and others added some commits Sep 6, 2015
@faho faho Vcs prompt: Break if vcs isn't installed
Prevents an annoying error message.
Fixes #2363.
cb5511c
@faho faho Suspend docs: Fix typo
While not a huge thing, wrong variable names always carry great
confusion potential.
bd3b4e0
@janernsting janernsting Add completion for git-commit
Modified files are provided for completion
d92c08c
@janernsting janernsting Add file completion for git-reset
Staged files are now offered for completion
787c130
@janernsting janernsting Add missing description
git reset allows for files and branches as completion results
396e01a
@janernsting janernsting Ensure display of modified files for git commit f36d2ff
@janernsting janernsting Complete tags for git-tag only
7f28acc
Chris Pick and others added some commits Aug 22, 2015
@zanchey Chris Pick Use the $TERM value from fish's computed environment for ncurses setup
Previously, the process's inherited $TERM value would be used.
This prevented users from being able to set $TERM in their config.fish files.

To make matters worse, the error message would print the computed $TERM value,
giving the mistaken impression that it was being used.

Signed-off-by: David Adam <zanchey@ucc.gu.uwa.edu.au>
c5bc221
@zanchey zanchey doc_src/complete: update for new options in synopsis
Update complete documentation, hopefully to avoid another #2368.

[ci skip]
8bf1e69
@Fusty
Fusty commented on 8bf1e69 Sep 9, 2015

Excellent, sorry for the bother in the first place!

@ridiculousfish
Member

Ok, sorry for the delay! @msteed

I wanted to make use of existing fish code to do glob matching but found that it was tied too closely to matching filesystem paths (not surprisingly).

It is a maze of twisty little functions all alike, but the one you're looking for is wildcard_match(). That's what's used by e.g. switch and should do what you want.

Otherwise I'm not sure how the behavior of string should change. When is a trailing newline not appropriate?

I want to get to the point where this:

set -l contents (string join \n < file.txt)
echo -n $contents

outputs the contents of file.txt exactly, without any extra newlines. But we need to teach command substitutions how not to split first. Anyways we can defer that until the future, so what you've done now is fine.

I am still missing something here.

No, I was confused, the checks need to be different. builtin_read has to do something different when input is from the tty, but builtin_string has to do something different when it is not the first command in a pipeline. It's not obvious to me how to check for that from the builtin.

I have some changes in development that eliminate the nasty global variables, and instead pass stdin, stdout, etc. directly to the relevant builtins. It makes sense to include the pipeline information there. So let's defer fixing the pipe issue until those changes get merged.

I'll update the Xcode build. So overall I don't think there's anything blocking merge from me. @zanchey any concerns from you?

This is a huge contribution! Thank you!!

faho added some commits Sep 9, 2015
@faho faho Rename sgrep to __fish_sgrep
Makes it harder to cause issues with aliases, see fish-shell#2245
b85a8bb
@faho faho Add __fish_sgrep
Missed in b85a8bb because of `git commit -a`.

Fixes #2372
bffeb66
@msteed
Contributor
msteed commented Sep 9, 2015

the one you're looking for is wildcard_match()

Okay, I'd like to remove the wildcard matching code from string but it will be a day or two before I can get to it.

@zanchey
Member
zanchey commented Sep 10, 2015

The interface looks great.

I'd like to take a bit of a closer look at the way it's plugged into the main build system, as I think downstream distributors are going to want to be able to pass --use-system-pcre2 or similar. I'm happy to do that after this is merged to master.

faho and others added some commits Sep 10, 2015
@faho faho Also allow bold, underline and printing colors in linux kernel VTs
bold works, printing colors doesn't change anything and underline
doesn't _break_.
40df11b
@faho faho rbenv completion: Remove trailing spaces 2587bbc
@faho faho Allow set_color options in general for linux VTs b231ab7
@faho faho git completion: Ignore stderr for all commands
Might print unrelated crap if we try to complete while not in a git repository
0055673
@faho faho git: Add more options for format-patch and submodule
Not all of them and only those that don't accept arguments for now

Fixes #1996
a828f90
@faho faho git completion: Don't check $cmd[1]
Before we do anything else, remove this senseless piece of code
31d1e04
@ridiculousfish ridiculousfish Don't crash on complete -C in non-interactive mode
Fixes #2361
025b45b
@ridiculousfish ridiculousfish Fix a typo in documentation for 'complete' 9a2ac5f
@msteed msteed use fish's wildcard_match() for glob matching 5f519cb
@msteed msteed add string builtin files
- string builtin source, tests, & docs
- changes to configure.ac & Makefile.in
1a60b93
@msteed msteed add pcre2-10.20 and update license.hdr 9492e7a
@msteed msteed Add missing pcre2 files + .gitignore ed0850e
@msteed msteed fix dependencies for parallel make efd47dc
@msteed msteed Updates after review comments
- make match/replace without -a operate on the first match on each
  argument
- use different exit codes for "no operation performed" and errors, as
  grep does
- refactor regex compile code
- use human-friendly error messages from pcre2
- improve error handling & reporting elsewhere
- add a few tests
- make some doc fixes
- some simplification & cleanup
- fix ci build failure (I hope)
896a2c2
@msteed msteed another attempt to fix the ci build baf4e09
@msteed msteed Minor doc improvements 9ff7477
@msteed msteed fixes from review
- Makefile.in: restore iwyu target
- regex_replacer_t::replace_matches(): correct size passed to realloc()
ece7f35
@msteed msteed some fixes from review
- string_get_arg_stdin(): simplify and don't discard the argument when
  the trailing newline is absent
- fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
- correct test for args coming from stdin
64c25a0
@msteed msteed use fish's wildcard_match() for glob matching a0ec977
@msteed msteed rebase on master & address the fallout 1a09e70
@msteed msteed Merge branch 'string' of github.com:msteed/fish-shell into string eb20b43
@ridiculousfish
Member

Preparing to merge...any reason not to omit pcre2-10.20/doc, which is ~2.1MB? @msteed

@msteed
Contributor
msteed commented Sep 11, 2015

Without those files the pcre2 build fails. I tried unsuccessfully to build the library without the docs. I can look into this further if you like.

@msteed
Contributor
msteed commented Sep 11, 2015

Er, okay, it turns out it wasn't that hard. Yes, we can omit the docs. I'll commit a change shortly to make the build work without those files.

@msteed
Contributor
msteed commented Sep 11, 2015

Can also omit pcre2-10.20/testdata

@ridiculousfish
Member

cool

@msteed
Contributor
msteed commented Sep 12, 2015

@ridiculousfish: if it helps I can close this PR and make a new one with a clean history and no superfluous pcre2 files.

@ridiculousfish
Member

@msteed I'm happy to do a squash merge, to avoid introducing those files into the repo. However if you want to preserve some of your history I can wait for the new PR. Whichever you prefer.

@msteed
Contributor
msteed commented Sep 12, 2015

No worries about history. Squash away!

@ridiculousfish ridiculousfish added a commit that referenced this pull request Sep 12, 2015
@ridiculousfish ridiculousfish Merge new string builtin, from #2296
Squashed commit of the following:

commit 4c3eaeb6e57d76463e9683c327142b0aeafb92b8
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:51:30 2015 -0700

    Remove testdata and doc dirs from pcre2 source

commit b2a8b4b50f2398b204fb72cfe4b5ba77ece2e1ab
Merge: 11c8a47 7974aab
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:32:40 2015 -0700

    Merge branch 'string' of git://github.com/msteed/fish-shell into string-test

commit 7974aab
Author: Michael Steed <msteed@saltstack.com>
Date:   Fri Sep 11 13:00:02 2015 -0600

    build pcre2 lib only, no docs

commit eb20b43
Merge: 1a09e70 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 20:00:47 2015 -0600

    Merge branch 'string' of github.com:msteed/fish-shell into string

commit 1a09e70
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:58:24 2015 -0600

    rebase on master & address the fallout

commit a0ec977
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 64c25a0
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit ece7f35
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 9ff7477
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit baf4e09
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 896a2c2
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit efd47dc
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit ed0850e
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 9492e7a
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 1a60b93
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in

commit 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 2ecd24f
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit 45b777e
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 981cbb6
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit ddb6a2a
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 1e34e31
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit 34232e1
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit 00d7e78
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 4498aa5
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 290c58c
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in
85c8bab
@ridiculousfish
Member

I've squash-merged to a new branch string-staging. I'll use that to add support in the Xcode build.

zanchey, you can take a closer look at how it's plugged into the build too on this branch if you like. I also hope to add support for using OS X's libpcre here.

@zanchey
Member
zanchey commented Sep 15, 2015

I think OS X (and most Linux distributions) use the old PCRE API rather than PCRE2, so I'm not sure how straightforward that will be.

Almost no distributions are shipping the PCRE2 libraries yet.

@ridiculousfish
Member

It should be straightforward to conditionally use PCRE1 or PCRE2 depending on what's available at build time.

But OS X doesn't ship any PCRE headers, and they hate it when developers reverse engineer headers. So the OS X build will build the pcre2 library, and link against it statically. This is already working in Xcode on the branch.

@ridiculousfish ridiculousfish added a commit that closed this pull request Sep 21, 2015
@msteed @ridiculousfish msteed + ridiculousfish Merge new string builtin
This adds the new builtin 'string' which supports various string
manipulation and matching algorithms, including PCRE based regular
expressions.

Fixes #2296

d83ef07
@ridiculousfish
Member

This has been merged as 1883e05, with the squash-commit d83ef07 credited to @msteed.

A big thanks to msteed for pushing this through, both design and implementation! It's not easy to design a piece this big, and I'm totally thrilled with how it turned out!

@ghost
ghost commented Sep 21, 2015

🎉 Wonderful news.

@msteed
Contributor
msteed commented Sep 22, 2015

Excellent!

Thanks to @ridiculousfish and @faho for the careful reviews, helpful feedback, and many improvements. Thanks to @kballard for the original interface design. And thanks to everyone who contributed to the discussion on #156.

@zanchey
Member
zanchey commented Sep 22, 2015

Awesome!

Some more work needs to be done on the integration with the autotools build - requiring autoreconf prevents building on RHEL 5 due to too-old autoconf, and as our tarballs aren't marked as depending on aclocal the build also fails on openSUSE. I'll try and take a look in the next few days.

@ridiculousfish
Member

The change c1bd3b5 fixes issue 5 in my list, closing the loop on that.

@faho faho added the releasenotes label Oct 26, 2015
@danielb2

What version of fish is this in? It's really difficult to tell when the same milestone (next-2.x) keeps being reused. Any reason we're doing it that way?

@faho
Member
faho commented Dec 16, 2015

Since it's in the next-2.x milestone, that means it's not released yet, so it's in whatever comes after 2.2 (which AFAIK hasn't been decided yet, though I'd quite like it if it were 2.3 since I have a thing for bad movies).

@danielb2

It looks like the next-2.x milestone bucket is being reused for 2.0, 2.1, and 2.2. That means I couldn't tell which release this was in: this ticket dates all the way back to September, and I have no idea when 2.2 was released. IOW, was this ticket filed before 2.2 was released or not?

Am I missing something? I think my concern here is legit.

@faho
Member
faho commented Dec 17, 2015

@danielb2: All our releases are git tags. If you wish to know which tag contains a given commit (i.e. "came after" it), use git tag --contains COMMITHASH. Github will also show you on the commit page.

AFAIK, on the last release the "next-2.x" milestone was renamed to "fish 2.2.0" and closed, and then another with the "next-2.x" name was opened, so anything with that milestone should not be in a release.
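
To illustrate how `git tag --contains` answers the "is it released yet?" question, here is a minimal sketch that runs in a throwaway repository. The tag name `2.3.0`, the `demo_commit` helper, and the commit messages are all hypothetical; only `git tag --contains` itself comes from the comment above.

```fish
# Throwaway repository so the sketch does not touch a real checkout.
set repo (mktemp -d)
git -C $repo init -q

# Hypothetical helper: create an empty commit with an inline identity,
# so the sketch does not depend on global git configuration.
function demo_commit
    git -C $argv[1] -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m $argv[2]
end

demo_commit $repo "merged before the release"
set before (git -C $repo rev-parse HEAD)
git -C $repo tag 2.3.0   # hypothetical release tag

demo_commit $repo "merged after the release"
set after (git -C $repo rev-parse HEAD)

# Lists every tag whose history includes the given commit.
set released (git -C $repo tag --contains $before)
set unreleased (git -C $repo tag --contains $after)

echo "before: $released"
echo "after: $unreleased"
```

The first commit is reported under the tag, while the second yields an empty list because no tag contains it yet.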

@ridiculousfish
Member

faho is right: milestones are never reused across releases.

@danielb2

help string doesn't show any of the docs for me. Can I be doing something wrong?

@faho
Member
faho commented Dec 17, 2015

@danielb2: You need doxygen to build the docs. You can also try man string and string --help in case that isn't it.

@danielb2

thank you

@danielb2
tt is a function with definition
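function tt
  # Name the session after the current directory.
  set name (pwd)
  # "." is special in tmux target syntax, so replace every "." with "-".
  set name (string replace -a . - $name)
  # Attach if a session with exactly this name exists; otherwise create it.
  if tmux ls -F '#{session_name}' | grep -q "^$name\$"
    tmux attach -t $name
  else
    tmux new -s $name
  end
end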

there. named tmux handling based on dir name. Thanks @msteed! I was using ruby to do this before. All fish now :)
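
For reference, `string replace` without `-a` rewrites only the first match in each argument, which matters for directory names containing more than one dot. A minimal sketch, assuming a made-up path (`/home/me/my.proj.v2` is purely illustrative):

```fish
# Hypothetical directory name with two dots.
set dir /home/me/my.proj.v2

# Without -r the pattern is a literal string, so "." means a dot, not "any character".
set first_only (string replace . - $dir)
set every (string replace -a . - $dir)

echo $first_only   # /home/me/my-proj.v2
echo $every        # /home/me/my-proj-v2
```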

@msteed
Contributor
msteed commented Dec 17, 2015

@danielb2: Cool! Note that you can also replace grep -q <pattern> with string match -q -r <pattern>.
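
A minimal sketch of that substitution, assuming a hypothetical list of session names in place of real `tmux ls -F '#{session_name}'` output so that it runs without tmux. `string` reads its arguments from stdin, one per line, when none are given on the command line. Note that `$name` is interpolated into the regex as-is, so any regex metacharacters in it keep their special meaning.

```fish
# Hypothetical session names standing in for tmux output.
set sessions work /home/me/proj-a scratch
set name /home/me/proj-a

# -q suppresses output; the exit status alone says whether any line matched.
if printf '%s\n' $sessions | string match -q -r "^$name\$"
    set action attach
else
    set action new
end

echo "tmux $action: $name"
```

With the sample data above, the anchored pattern matches the second name, so the `attach` branch is taken.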

@danielb2

oh. sweet! thanks :)

@danielb2

@faho btw man string and string --help didn't work. Does it mean I have to have doxygen installed before I compile?

@faho
Member
faho commented Dec 17, 2015

> Does it mean I have to have doxygen installed before I compile?

Effectively yes. You need doxygen to transform the documentation from our input format to the output formats (the website, the man pages and the "--help" output). This is done as part of the build process.

You do not need it just to view the documentation. Doxygen is a build-time dependency.

It also says so in README.md under the heading "Building":

> Building the documentation requires Doxygen 1.8.7 or newer.
