New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string builtin #2296

Closed
wants to merge 79 commits into
base: master
from

Conversation

Projects
None yet
@msteed
Contributor

msteed commented Aug 12, 2015

Refs #156.

I think this is ready, with the following exceptions:

  • There are some documentation formatting problems. The lexicon_filter or Doxygen seems to be choking on some of the examples in doc_src/string.txt. I'm not sure how to fix it.
  • The command-substitution magic mentioned in #156 (comment) is not implemented.
  • The Xcode build (and maybe others?) still need to be updated.
@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 12, 2015

Contributor

The pcre2 build is failing due to: /home/travis/build/fish-shell/fish-shell/pcre2-10.20/missing: line 81: aclocal-1.15: command not found

Contributor

msteed commented Aug 12, 2015

The pcre2 build is failing due to: /home/travis/build/fish-shell/fish-shell/pcre2-10.20/missing: line 81: aclocal-1.15: command not found

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 12, 2015

Member

It's obviously choking on string split '' abc. An obvious workaround here would be to use double-quotes, i.e. string split "" abc. (Edit: Sorry, doesn't work)

If anyone wants to have a look at what this looks like in real usage, see faho/fish-shell@34a7832 on my personal "string" branch where I've replaced the sed usage in the git completions with what should be equivalent versions.

My questions from that:

  • Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?
  • Why is the backreference syntax $1 and not \1 like pcre and sed do it?
  • It doesn't seem to do "\t" for a tab character (this has been a frequent annoyance because OSX sed doesn't either but GNU sed does, and since we separate completion options and descriptions for them with a tab this occurs often)
  • The error messages are.... bad - e.g. pacman -Sg | string replace -r '(.*)' '$1 $t Package group' -a (yes, "$t" - I've been trying to get it to print a tab character) gives:

replace: Regular expression match error -49

Are you trying to improve those?

Otherwise, that's some fantastic work, and I look forward to using it in earnest since I've been uncomfortable with adding user input to sed's expressions for a while (e.g. echo $PWD | sed -e "s|^$realhome|~|" $args_pre -e 's-\([^/.]\)[^/]*/-\1/-g' $args_post from prompt_pwd - what happens when $realhome contains a pipe character?).

Member

faho commented Aug 12, 2015

It's obviously choking on string split '' abc. An obvious workaround here would be to use double-quotes, i.e. string split "" abc. (Edit: Sorry, doesn't work)

If anyone wants to have a look at what this looks like in real usage, see faho/fish-shell@34a7832 on my personal "string" branch where I've replaced the sed usage in the git completions with what should be equivalent versions.

My questions from that:

  • Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?
  • Why is the backreference syntax $1 and not \1 like pcre and sed do it?
  • It doesn't seem to do "\t" for a tab character (this has been a frequent annoyance because OSX sed doesn't either but GNU sed does, and since we separate completion options and descriptions for them with a tab this occurs often)
  • The error messages are.... bad - e.g. pacman -Sg | string replace -r '(.*)' '$1 $t Package group' -a (yes, "$t" - I've been trying to get it to print a tab character) gives:

replace: Regular expression match error -49

Are you trying to improve those?

Otherwise, that's some fantastic work, and I look forward to using it in earnest since I've been uncomfortable with adding user input to sed's expressions for a while (e.g. echo $PWD | sed -e "s|^$realhome|~|" $args_pre -e 's-\([^/.]\)[^/]*/-\1/-g' $args_post from prompt_pwd - what happens when $realhome contains a pipe character?).

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 12, 2015

Contributor

Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?

I have no objection to this but the command line syntax would have to change. What do you suggest?

Why is the backreference syntax $1 and not \1 like pcre and sed do it?

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

The error messages are.... bad

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

Contributor

msteed commented Aug 12, 2015

Do you have any plan for string replace to take multiple PATTERN/REPLACEMENT pairs?

I have no objection to this but the command line syntax would have to change. What do you suggest?

Why is the backreference syntax $1 and not \1 like pcre and sed do it?

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

The error messages are.... bad

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 12, 2015

Member

I have no objection to this but the command line syntax would have to change. What do you suggest?

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional. (This means of course that arguments on the commandline need to be protected via "--", but that's already an issue) (It also wouldn't easily work with our current completion syntax, but there's currently not much we could complete anyway)

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

Could you add it? I believe that the current situation is non-intuitive.

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

Ah, okay.

Member

faho commented Aug 12, 2015

I have no objection to this but the command line syntax would have to change. What do you suggest?

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional. (This means of course that arguments on the commandline need to be protected via "--", but that's already an issue) (It also wouldn't easily work with our current completion syntax, but there's currently not much we could complete anyway)

The regex string replace functionality is exactly what pcre2_substitute() makes available, including the $1 syntax.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

To get a tab or other escape sequence, you can do the usual fish quote-unquote thing: pacman -Sg | string replace -a -r '^(.*)' '$1'\t'Package group'. Naturally this is not unique to string replace.

Could you add it? I believe that the current situation is non-intuitive.

Thanks for the reminder; pcre2 does provide human-friendly error strings so I will replace the error codes with those.

Ah, okay.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 12, 2015

Contributor

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional.

Okay, I like that. Will do.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

FYI here is the description of pcre2_substitute(). The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

Another alternative to playing with quotes is substitution with echo -e:

pacman -Sg | string replace -a -r '^(.*)' (echo -e '$1\tPackage group')

Contributor

msteed commented Aug 12, 2015

I'd add a flag to specify that the next two arguments are pattern/replacement, something like

string replace -r pattern1 replacement1 -p pattern2 replacement2

where "-r" applies to all pairs, regardless where it is specified, and a "-p" for the first pair is optional.

Okay, I like that. Will do.

All pcre documentation I've found (and all applicable programs I know) use \1, so I'd of course like it if it were different.

FYI here is the description of pcre2_substitute(). The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

Another alternative to playing with quotes is substitution with echo -e:

pacman -Sg | string replace -a -r '^(.*)' (echo -e '$1\tPackage group')

@terlar

This comment has been minimized.

Show comment
Hide comment
@terlar

terlar Aug 12, 2015

Contributor

The downside with $1 would be if you would want to use double quotes for some reason (like using variables inside).

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

string replace -a -r '^(.*)' "$1 $other Something"
Contributor

terlar commented Aug 12, 2015

The downside with $1 would be if you would want to use double quotes for some reason (like using variables inside).

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

string replace -a -r '^(.*)' "$1 $other Something"
@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 12, 2015

Member

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands. Whether it's a built-in or not isn't too visible to users (of course they could check, but they have to explicitly do that).

The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

Yeah, that's.... not great. Would it be possible to just change every "$([0-9])" to "(\1)" or is that prohibitively expensive or error-prone? I'm also probably massively over-thinking this - it's not the end of the world if I have to get used to "$1" instead of "\1".

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

Yes, it would. Of course using single quotes as much as possible is probably better in general and especially for things like this were you use unusual characters and don't want to constantly escape (which is why it's a blessing that fish usually doesn't do that), but if you were to use double quotes (and I'm guilty of using them too much), then you'd need to escape any "$".

Member

faho commented Aug 12, 2015

If we automatically honor backslash escapes within replacement strings, I think this will be the only place within fish where strings are treated this way. Is that the right move?

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands. Whether it's a built-in or not isn't too visible to users (of course they could check, but they have to explicitly do that).

The disadvantage of changing the syntax is that it means maintaining a modified copy of pcre2_substitute() within fish.

Yeah, that's.... not great. Would it be possible to just change every "$([0-9])" to "(\1)" or is that prohibitively expensive or error-prone? I'm also probably massively over-thinking this - it's not the end of the world if I have to get used to "$1" instead of "\1".

Then this would conflict with variables. What would happen in this case? I am assuming it would use the variable $1.

Yes, it would. Of course using single quotes as much as possible is probably better in general and especially for things like this were you use unusual characters and don't want to constantly escape (which is why it's a blessing that fish usually doesn't do that), but if you were to use double quotes (and I'm guilty of using them too much), then you'd need to escape any "$".

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 13, 2015

Contributor

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands.

Okay, I can see that.

Here is the list of escapes handled by the (undocumented) printf builtin.

 \" = double quote
 \\ = backslash
 \a = alert (bell)
 \b = backspace
 \c = produce no further output
 \e = escape
 \f = form feed
 \n = new line
 \r = carriage return
 \t = horizontal tab
 \v = vertical tab
 \ooo = octal number (ooo is 1 to 3 digits)
 \xhh = hexadecimal number (hhh is 1 to 2 digits)
 \uhhhh = 16-bit Unicode character (hhhh is 4 digits)
 \Uhhhhhhhh = 32-bit Unicode character (hhhhhhhh is 8 digits)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: \{name} and not \name.

Contributor

msteed commented Aug 13, 2015

It's also already the only place "within fish" that we honor "\w". I believe, especially considering that "sed" already does it, that it's quite intuitive that the "string" "command" accepts different arguments from other commands.

Okay, I can see that.

Here is the list of escapes handled by the (undocumented) printf builtin.

 \" = double quote
 \\ = backslash
 \a = alert (bell)
 \b = backspace
 \c = produce no further output
 \e = escape
 \f = form feed
 \n = new line
 \r = carriage return
 \t = horizontal tab
 \v = vertical tab
 \ooo = octal number (ooo is 1 to 3 digits)
 \xhh = hexadecimal number (hhh is 1 to 2 digits)
 \uhhhh = 16-bit Unicode character (hhhh is 4 digits)
 \Uhhhhhhhh = 32-bit Unicode character (hhhhhhhh is 8 digits)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: \{name} and not \name.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 13, 2015

Member

Here is the list of escapes handled by the (undocumented) printf builtin.

(I added that documentation in #2290 - though it was also previously added and removed again)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

Yeah, it's best to stay consistent - though I can't say I've ever seen the need for the bell char, and I don't think "\c" applies here. The unicode escapes are nice, though, as is \n and \t.

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: {name} and not \name.

You know what? The more I look at it, the more I see the issues that changing around pcre2 would cause for that one single piece of consistency with something else. It would have been nice if pcre2 didn't choose to use incompatible syntax, but as it stands now I'm beginning to convince myself that we should just keep that as is, as long as it's documented - which you've already done.


There's another thing I've found, though, and that's the behavior without the "-a" option:

builtin complete -Cgit- | string replace -r "^git-([^[:space:]]*).*" "\${1}"

i.e. without the "-a" option, produces this output:

cvsserver
git-shell Programm, 832kB
git-upload-archive Programm, 1,7MB
git-upload-pack Programm, 849kB
git-receive-pack Programm, 1,7MB

It only operates on the first line!

Now, I'd have expected this tool to be line-based, to operate on every line (when given input via stdin), and the "-a" option to be analogous to sed's "g" (as in s/PATTERN/REPLACEMENT/g), so that it then operates multiple times per line.

Is this by design?

Member

faho commented Aug 13, 2015

Here is the list of escapes handled by the (undocumented) printf builtin.

(I added that documentation in #2290 - though it was also previously added and removed again)

If we go ahead with your suggestion I think it makes sense to handle the same set of escapes. Thoughts?

Yeah, it's best to stay consistent - though I can't say I've ever seen the need for the bell char, and I don't think "\c" applies here. The unicode escapes are nice, though, as is \n and \t.

On a related note, pcre2_substitute() supports references to named capturing groups in the replacement string using $name or ${name} syntax. If we move to the \ syntax for capturing groups, there is an obvious clash between \name and backslash escapes. We probably would want to require the curly braces: {name} and not \name.

You know what? The more I look at it, the more I see the issues that changing around pcre2 would cause for that one single piece of consistency with something else. It would have been nice if pcre2 didn't choose to use incompatible syntax, but as it stands now I'm beginning to convince myself that we should just keep that as is, as long as it's documented - which you've already done.


There's another thing I've found, though, and that's the behavior without the "-a" option:

builtin complete -Cgit- | string replace -r "^git-([^[:space:]]*).*" "\${1}"

i.e. without the "-a" option, produces this output:

cvsserver
git-shell Programm, 832kB
git-upload-archive Programm, 1,7MB
git-upload-pack Programm, 849kB
git-receive-pack Programm, 1,7MB

It only operates on the first line!

Now, I'd have expected this tool to be line-based, to operate on every line (when given input via stdin), and the "-a" option to be analogous to sed's "g" (as in s/PATTERN/REPLACEMENT/g), so that it then operates multiple times per line.

Is this by design?

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 13, 2015

Contributor

Honestly I simply wasn't thinking of the behavior of line-oriented tools, but that behavior makes more sense. I'll make it so the absence of -a means to operate on the first match in each argument.

I appreciate your feedback. I came at this with a few simple uses cases in mind and it helps to hear from someone wanting to solve different problems.

Contributor

msteed commented Aug 13, 2015

Honestly I simply wasn't thinking of the behavior of line-oriented tools, but that behavior makes more sense. I'll make it so the absence of -a means to operate on the first match in each argument.

I appreciate your feedback. I came at this with a few simple uses cases in mind and it helps to hear from someone wanting to solve different problems.

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Aug 17, 2015

Contributor

When you remove that, there will be a problem in linux framebuffer which the computer will say that it won't able to print that character and there will always be an error. Look at #2126.

fish: Tried to print invalid wide character string
Contributor

pickfire commented on b615534 Aug 17, 2015

When you remove that, there will be a problem in linux framebuffer which the computer will say that it won't able to print that character and there will always be an error. Look at #2126.

fish: Tried to print invalid wide character string

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 17, 2015

Member

That can be fixed by simply using a UTF-8 locale, which on linux is heavily recommended anyway.

Member

faho replied Aug 17, 2015

That can be fixed by simply using a UTF-8 locale, which on linux is heavily recommended anyway.

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Aug 17, 2015

Contributor

I am using a UTF-8 locale, locale | grep -E '(LANG|LC_CTYPE)=(.*\.)?UTF-8' shows:

LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8

It works in a normal terminal but it does not work in the linux framebuffer.

Contributor

pickfire replied Aug 17, 2015

I am using a UTF-8 locale, locale | grep -E '(LANG|LC_CTYPE)=(.*\.)?UTF-8' shows:

LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8

It works in a normal terminal but it does not work in the linux framebuffer.

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 17, 2015

Member

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

Member

faho replied Aug 17, 2015

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Aug 17, 2015

Contributor

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

This is True, as linux framebuffer can only display 256 or 512 characters, it cannot display some character.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

You finally asked that. It is already an issue, look at #2070, it is still an unsolvable, mystery puzzle.

Contributor

pickfire replied Aug 17, 2015

Weird - for me it simply prints a bunch of colored boxes with question marks in them.

This is True, as linux framebuffer can only display 256 or 512 characters, it cannot display some character.

Can you open an issue about this so we can take a look and solve it for more than one prompt?

You finally asked that. It is already an issue, look at #2070, it is still an unsolvable, mystery puzzle.

faho and others added some commits Aug 17, 2015

Completions: Don't check $cmd[1]
This is already done by fish before calling the completion.

It breaks completion with combiners (#2025) and also with wrappers.

(This does not include git because that's better solved in #2145)
docs/design.hdr: inclusive lanugage
Closes fish-shell/fish-site#25.

Signed-off-by: David Adam <zanchey@ucc.gu.uwa.edu.au>

[skip ci]
Remove vi mode indicator from classic_git prompt
It is duplicative of the fish_mode_prompt function

Fixes #2228
Rewrite parse_util_unescape_wildcards
Make it simpler, and use wcstring instead of wcsdup
Updates after review comments
- make match/replace without -a operate on the first match on each
  argument
- use different exit codes for "no operation performed" and errors, as
  grep does
- refactor regex compile code
- use human-friendly error messages from pcre2
- improve error handling & reporting elsewhere
- add a few tests
- make some doc fixes
- some simplification & cleanup
- fix ci build failure (I hope)
@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 20, 2015

Contributor

Okay, finally got the CI build of pcre2 squared away.

@faho: I took your suggestions except for accepting multiple pattern/replacement pairs. After working on it for a bit, I concluded that the getopt hackery required to have an option take two arguments, combined with its usual argument permutation behavior, made a reliable implementation more trouble than it was worth. That could be added in the future though.

@ridiculousfish: the problems noted in #2296 (comment) still exist. Otherwise I think this is ready to go.

Contributor

msteed commented Aug 20, 2015

Okay, finally got the CI build of pcre2 squared away.

@faho: I took your suggestions except for accepting multiple pattern/replacement pairs. After working on it for a bit, I concluded that the getopt hackery required to have an option take two arguments, combined with its usual argument permutation behavior, made a reliable implementation more trouble than it was worth. That could be added in the future though.

@ridiculousfish: the problems noted in #2296 (comment) still exist. Otherwise I think this is ready to go.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 20, 2015

"C-style escape sequences like \t"? An example would be useful here because the term might not be that familiar.

faho commented on doc_src/string.txt in 1e34e31 Aug 20, 2015

"C-style escape sequences like \t"? An example would be useful here because the term might not be that familiar.

Show outdated Hide outdated doc_src/string.txt
`string` performs operations on strings.
STRING arguments are taken from the command line unless standard input is connected to a pipe or a file, in which case they are read from standard input. It is an error to supply STRING arguments on the command line and on standard input.

This comment has been minimized.

@faho

faho Aug 20, 2015

Member

"read from standard input one STRING per line" or similar?

@faho

faho Aug 20, 2015

Member

"read from standard input one STRING per line" or similar?

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Aug 20, 2015

Member

Some more stuff I'd like to see (though that probably shouldn't block a merge, I'm antsy to play with this for real):

  • Some way to make escape also escape leading dashes - this is a major source of issues in shellscripting, and a simple way to fix it would be nice
  • Some way to replace multiple PATTERN/REPLACEMENTs in one go - I already mentioned this, but I see the issue
  • Something I forgot.
Member

faho commented Aug 20, 2015

Some more stuff I'd like to see (though that probably shouldn't block a merge, I'm antsy to play with this for real):

  • Some way to make escape also escape leading dashes - this is a major source of issues in shellscripting, and a simple way to fix it would be nice
  • Some way to replace multiple PATTERN/REPLACEMENTs in one go - I already mentioned this, but I see the issue
  • Something I forgot.
@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 20, 2015

Contributor

@faho: thanks for the suggestions on the documentation.

I am completely okay with further improvements to the string functionality (implemented by me or anyone else), but I would like to see what comes out of testing by a wider audience.

Contributor

msteed commented Aug 20, 2015

@faho: thanks for the suggestions on the documentation.

I am completely okay with further improvements to the string functionality (implemented by me or anyone else), but I would like to see what comes out of testing by a wider audience.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 21, 2015

Contributor

Update on the executable size:

  • adding the new string code minus regex support: +20KB
  • adding the calls to pcre2_compile() and pcre2_match(): +160KB
  • adding the call to pcre2_substitute(): +50KB

So the total size increase is 230KB. Measurements are from a release build, g++-5.2, Linux, x86_64.

Building string-enabled fish with -Os produces a net 50KB decrease in size over the non-string-enabled fish, so if executable size is a concern that's one area to explore.

Contributor

msteed commented Aug 21, 2015

Update on the executable size:

  • adding the new string code minus regex support: +20KB
  • adding the calls to pcre2_compile() and pcre2_match(): +160KB
  • adding the call to pcre2_substitute(): +50KB

So the total size increase is 230KB. Measurements are from a release build, g++-5.2, Linux, x86_64.

Building string-enabled fish with -Os produces a net 50KB decrease in size over the non-string-enabled fish, so if executable size is a concern that's one area to explore.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Aug 22, 2015

Member

In the process of code-reviewing this...beautifully written by the way!

Member

ridiculousfish commented Aug 22, 2015

In the process of code-reviewing this...beautifully written by the way!

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Aug 22, 2015

Owner

This was inadvertent. Maybe the result of running make depend?

Owner

msteed commented on Makefile.in in 290c58c Aug 22, 2015

This was inadvertent. Maybe the result of running make depend?

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 11, 2015

Member

Preparing to merge...any reason not to omit pcre-10.20/doc, which is ~2.1MB? @msteed

Member

ridiculousfish commented Sep 11, 2015

Preparing to merge...any reason not to omit pcre-10.20/doc, which is ~2.1MB? @msteed

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 11, 2015

Contributor

Without those files the pcre2 build fails. I tried unsuccessfully to build the library without the docs. I can look into this further if you like.

Contributor

msteed commented Sep 11, 2015

Without those files the pcre2 build fails. I tried unsuccessfully to build the library without the docs. I can look into this further if you like.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 11, 2015

Contributor

Er, okay, it turns out it wasn't that hard. Yes, we can omit the docs. I'll commit a change shortly to make the build work without those files.

Contributor

msteed commented Sep 11, 2015

Er, okay, it turns out it wasn't that hard. Yes, we can omit the docs. I'll commit a change shortly to make the build work without those files.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 11, 2015

Contributor

Can also omit pcre2-10.20/testdata

Contributor

msteed commented Sep 11, 2015

Can also omit pcre2-10.20/testdata

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish
Member

ridiculousfish commented Sep 11, 2015

cool

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 12, 2015

Contributor

@ridiculousfish: if it helps I can close this PR and make a new one with a clean history and no superfluous pcre2 files.

Contributor

msteed commented Sep 12, 2015

@ridiculousfish: if it helps I can close this PR and make a new one with a clean history and no superfluous pcre2 files.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 12, 2015

Member

@msteed I'm happy to do a squash merge, to avoid introducing those files into the repo. However if you want to preserve some of your history I can wait for the new PR. Whichever you prefer.

Member

ridiculousfish commented Sep 12, 2015

@msteed I'm happy to do a squash merge, to avoid introducing those files into the repo. However if you want to preserve some of your history I can wait for the new PR. Whichever you prefer.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 12, 2015

Contributor

No worries about history. Squash away!

Contributor

msteed commented Sep 12, 2015

No worries about history. Squash away!

ridiculousfish added a commit that referenced this pull request Sep 12, 2015

Merge new string builtin, from #2296
Squashed commit of the following:

commit 4c3eaeb6e57d76463e9683c327142b0aeafb92b8
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:51:30 2015 -0700

    Remove testdata and doc dirs from pcre2 source

commit b2a8b4b50f2398b204fb72cfe4b5ba77ece2e1ab
Merge: 11c8a47 7974aab
Author: ridiculousfish <corydoras@ridiculousfish.com>
Date:   Sat Sep 12 12:32:40 2015 -0700

    Merge branch 'string' of git://github.com/msteed/fish-shell into string-test

commit 7974aab
Author: Michael Steed <msteed@saltstack.com>
Date:   Fri Sep 11 13:00:02 2015 -0600

    build pcre2 lib only, no docs

commit eb20b43
Merge: 1a09e70 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 20:00:47 2015 -0600

    Merge branch 'string' of github.com:msteed/fish-shell into string

commit 1a09e70
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:58:24 2015 -0600

    rebase on master & address the fallout

commit a0ec977
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 64c25a0
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit ece7f35
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 9ff7477
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit baf4e09
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 896a2c2
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit efd47dc
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit ed0850e
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 9492e7a
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 1a60b93
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in

commit 5f519cb
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Sep 10 19:26:45 2015 -0600

    use fish's wildcard_match() for glob matching

commit 2ecd24f
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 27 08:19:23 2015 -0600

    some fixes from review

    - string_get_arg_stdin(): simplify and don't discard the argument when
      the trailing newline is absent
    - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b'
    - correct test for args coming from stdin

commit 45b777e
Author: Michael Steed <msteed68@gmail.com>
Date:   Sat Aug 22 19:35:56 2015 -0600

    fixes from review

    - Makefile.in: restore iwyu target
    - regex_replacer_t::replace_matches(): correct size passed to realloc()

commit 981cbb6
Author: Michael Steed <msteed68@gmail.com>
Date:   Thu Aug 20 13:08:33 2015 -0600

    Minor doc improvements

commit ddb6a2a
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:29:02 2015 -0600

    another attempt to fix the ci build

commit 1e34e31
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 19 18:03:49 2015 -0600

    Updates after review comments

    - make match/replace without -a operate on the first match on each
      argument
    - use different exit codes for "no operation performed" and errors, as
      grep does
    - refactor regex compile code
    - use human-friendly error messages from pcre2
    - improve error handling & reporting elsewhere
    - add a few tests
    - make some doc fixes
    - some simplification & cleanup
    - fix ci build failure (I hope)

commit 34232e1
Author: Michael Steed <msteed68@gmail.com>
Date:   Wed Aug 12 00:26:07 2015 -0600

    fix dependencies for parallel make

commit 00d7e78
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 23:37:22 2015 -0600

    Add missing pcre2 files + .gitignore

commit 4498aa5
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:44:05 2015 -0600

    add pcre2-10.20 and update license.hdr

commit 290c58c
Author: Michael Steed <msteed68@gmail.com>
Date:   Tue Aug 11 22:41:19 2015 -0600

    add string builtin files

    - string builtin source, tests, & docs
    - changes to configure.ac & Makefile.in
@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 12, 2015

Member

I've squash-merged to a new branch string-staging. I'll use that to add support in the Xcode build.

zanchey, you can take a closer look at how it's plugged into the build too on this branch if you like. I also hope to add support for using OS X's libpcre here.

Member

ridiculousfish commented Sep 12, 2015

I've squash-merged to a new branch string-staging. I'll use that to add support in the Xcode build.

zanchey, you can take a closer look at how it's plugged into the build too on this branch if you like. I also hope to add support for using OS X's libpcre here.

@zanchey

This comment has been minimized.

Show comment
Hide comment
@zanchey

zanchey Sep 15, 2015

Member

I think OS X (and most Linux distributions) use the old PCRE API rather than PCRE2, so I'm not sure how straightforward that will be.

Almost no distributions are shipping the PCRE2 libraries yet.

Member

zanchey commented Sep 15, 2015

I think OS X (and most Linux distributions) use the old PCRE API rather than PCRE2, so I'm not sure how straightforward that will be.

Almost no distributions are shipping the PCRE2 libraries yet.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 15, 2015

Member

It should be straightforward to conditionally use PCRE1 or PCRE2 depending on what's available at build time.

But OS X doesn't ship any PCRE headers, and they hate it when developers reverse engineer headers. So the OS X build will build the pcre2 library, and link against it statically. This is already working in Xcode on the branch.

Member

ridiculousfish commented Sep 15, 2015

It should be straightforward to conditionally use PCRE1 or PCRE2 depending on what's available at build time.

But OS X doesn't ship any PCRE headers, and they hate it when developers reverse engineer headers. So the OS X build will build the pcre2 library, and link against it statically. This is already working in Xcode on the branch.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 21, 2015

Member

This has been merged as 1883e05, with the squash-commit d83ef07 credited to @msteed.

A big thanks to msteed for pushing this through, both design and implementation! It's not easy to design a piece this big, and I'm totally thrilled with how it turned out!

Member

ridiculousfish commented Sep 21, 2015

This has been merged as 1883e05, with the squash-commit d83ef07 credited to @msteed.

A big thanks to msteed for pushing this through, both design and implementation! It's not easy to design a piece this big, and I'm totally thrilled with how it turned out!

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Sep 21, 2015

🎉 Wonderful news.

ghost commented Sep 21, 2015

🎉 Wonderful news.

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Sep 22, 2015

Contributor

Excellent!

Thanks to @ridiculousfish and @faho for the careful reviews, helpful feedback, and many improvements. Thanks to @kballard for the original interface design. And thanks to everyone who contributed to the discussion on #156.

Contributor

msteed commented Sep 22, 2015

Excellent!

Thanks to @ridiculousfish and @faho for the careful reviews, helpful feedback, and many improvements. Thanks to @kballard for the original interface design. And thanks to everyone who contributed to the discussion on #156.

@zanchey

This comment has been minimized.

Show comment
Hide comment
@zanchey

zanchey Sep 22, 2015

Member

Awesome!

Some more work needs to be done on the integration with the autotools build - requiring autoreconf prevents building on RHEL 5 due to too-old autoconf, and as our tarballs aren't marked as depending on aclocal the build also fails on openSUSE. I'll try and take a look in the next few days.

Member

zanchey commented Sep 22, 2015

Awesome!

Some more work needs to be done on the integration with the autotools build - requiring autoreconf prevents building on RHEL 5 due to too-old autoconf, and as our tarballs aren't marked as depending on aclocal the build also fails on openSUSE. I'll try and take a look in the next few days.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Sep 23, 2015

Member

The change c1bd3b5 fixes issue 5 in my list, closing the loop on that.

Member

ridiculousfish commented Sep 23, 2015

The change c1bd3b5 fixes issue 5 in my list, closing the loop on that.

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 16, 2015

What version of fish is this in? It's really difficult to tell when the same milestones (next-2.x) keeps being re-used. Any reason we're doing it that way?

danielb2 commented Dec 16, 2015

What version of fish is this in? It's really difficult to tell when the same milestones (next-2.x) keeps being re-used. Any reason we're doing it that way?

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Dec 16, 2015

Member

Since it's in the next-2.x milestone, that means it's not released yet, so it's in whatever comes after 2.2 (which AFAIK hasn't been decided yet, though I'd quite like it if it were 2.3 since I have a thing for bad movies).

Member

faho commented Dec 16, 2015

Since it's in the next-2.x milestone, that means it's not released yet, so it's in whatever comes after 2.2 (which AFAIK hasn't been decided yet, though I'd quite like it if it were 2.3 since I have a thing for bad movies).

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 16, 2015

It looks like the next-2.x milestone bucket is being reused, for 2.0, 2.1 and 2.2. That means I couldn't tell which release it was in and this ticket is all the way back in September and I have no idea when 2.2 was released. Iow, was this ticket filed before 2.2 was released or not.

Am I missing something? I think my concern here is legit.

danielb2 commented Dec 16, 2015

It looks like the next-2.x milestone bucket is being reused, for 2.0, 2.1 and 2.2. That means I couldn't tell which release it was in and this ticket is all the way back in September and I have no idea when 2.2 was released. Iow, was this ticket filed before 2.2 was released or not.

Am I missing something? I think my concern here is legit.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Dec 17, 2015

Member

@danielb2: All our releases are git tags. If you wish to know which tag contains a given commit (i.e. "came after" it), use git tag --contains COMMITHASH. Github will also show you on the commit page.

AFAIK, on the last release the "next-2.x" milestone was closed renamed "fish 2.2.0" and then another with the "next-2.x" name opened, so anything with that milestone should not be in a release.

Member

faho commented Dec 17, 2015

@danielb2: All our releases are git tags. If you wish to know which tag contains a given commit (i.e. "came after" it), use git tag --contains COMMITHASH. Github will also show you on the commit page.

AFAIK, on the last release the "next-2.x" milestone was closed renamed "fish 2.2.0" and then another with the "next-2.x" name opened, so anything with that milestone should not be in a release.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Dec 17, 2015

Member

faho is right: milestones are never reused across releases.

Member

ridiculousfish commented Dec 17, 2015

faho is right: milestones are never reused across releases.

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 17, 2015

help string doesn't show any of the docs for me. Can I be doing something wrong?

danielb2 commented Dec 17, 2015

help string doesn't show any of the docs for me. Can I be doing something wrong?

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Dec 17, 2015

Member

@danielb2: You need doxygen to build the docs. You can also try man string and string --help in case that isn't it.

Member

faho commented Dec 17, 2015

@danielb2: You need doxygen to build the docs. You can also try man string and string --help in case that isn't it.

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 commented Dec 17, 2015

thank you

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 17, 2015

tt is a function with definition
function tt
  set name (pwd)
  set name (string replace . - $name)
  tmux ls -F '#{session_name}' | grep -q "^$name\$"
  if test $status -eq 0
    tmux attach -t $name
  else
    tmux new -s $name
  end
end

there. named tmux handling based on dir name. Thanks @msteed! I was using ruby to do this before. All fish now :)

danielb2 commented Dec 17, 2015

tt is a function with definition
function tt
  set name (pwd)
  set name (string replace . - $name)
  tmux ls -F '#{session_name}' | grep -q "^$name\$"
  if test $status -eq 0
    tmux attach -t $name
  else
    tmux new -s $name
  end
end

there. named tmux handling based on dir name. Thanks @msteed! I was using ruby to do this before. All fish now :)

@msteed

This comment has been minimized.

Show comment
Hide comment
@msteed

msteed Dec 17, 2015

Contributor

@danielb2: Cool! Note that you can also replace grep -q <pattern> with string match -q -r <pattern>.

Contributor

msteed commented Dec 17, 2015

@danielb2: Cool! Note that you can also replace grep -q <pattern> with string match -q -r <pattern>.

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 17, 2015

oh. sweet! thanks :)

danielb2 commented Dec 17, 2015

oh. sweet! thanks :)

@danielb2

This comment has been minimized.

Show comment
Hide comment
@danielb2

danielb2 Dec 17, 2015

@faho btw man string and string --help didn't work. Does it mean I have to have doxygen installed before I compile?

danielb2 commented Dec 17, 2015

@faho btw man string and string --help didn't work. Does it mean I have to have doxygen installed before I compile?

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Dec 17, 2015

Member

Does it mean I have to have doxygen installed before I compile?

Effectively yes. You need doxygen to transform the documentation from our input format to the output formats (the website, the man pages and the "--help" output). This is done as part of the build process.

You do not need it just to view the documentation. Doxygen is a build-time dependency.

It also says so in README.md under the heading "Building":

Building the documentation requires Doxygen 1.8.7 or newer.

Member

faho commented Dec 17, 2015

Does it mean I have to have doxygen installed before I compile?

Effectively yes. You need doxygen to transform the documentation from our input format to the output formats (the website, the man pages and the "--help" output). This is done as part of the build process.

You do not need it just to view the documentation. Doxygen is a build-time dependency.

It also says so in README.md under the heading "Building":

Building the documentation requires Doxygen 1.8.7 or newer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment