Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string: Allow string match --max #7495

Closed
wants to merge 1 commit into from
Closed

Conversation

faho
Copy link
Member

@faho faho commented Nov 22, 2020

This is useful in some cases, it also allows faster processing.

As a bonus, also stop reading strings if --quiet is given - no use
in reading more than we can.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

This is useful in some cases, it also allows faster processing.

As a bonus, also stop reading strings if ``--quiet`` is given - no use
in reading more than we can.
@faho faho added this to the fish 3.2.0 milestone Nov 22, 2020
@faho
Copy link
Member Author

faho commented Nov 22, 2020

One notable decision: This stops after X matching arguments. If --all is used it can therefore write more than X matches.

We could change that, but it would require handing the maximum off to the matcher, so it doesn't go over the limit.

@faho
Copy link
Member Author

faho commented Nov 22, 2020

Also:

This changes the semantics when string isn't the only thing with access to that input. Now, theoretically you could do

thing | while string match; read; end

and have the read read something. However, because the string might read too far (remember how read has to read byte-by-byte?) it's not guaranteed to be anything sensible.

The only real possibility is to still read everything to drain the pipe, so it doesn't look like you could read anything.

This is mostly theoretical because the string won't read from stdin in that case (we're requiring that stdin is directly redirected), so it's hard to trigger, but if we added a --stdin flag this would make it slightly more awkward.

@krobelus
Copy link
Member

I wonder how hard it would be to add lazy evaluation, so echo (seq 1000000 | string match 1)[1] would implicitly stop after the first match.

@mqudsi
Copy link
Contributor

mqudsi commented Nov 28, 2020

Can you share some sample workflows that would benefit from this? The complexities to the overall state and the discrepancies between matching against arguments vs matching against number of matches are unfortunate.

In particular, does this enable something (realistically speaking) that using index ranges doesn't currently allow? I have a few fish scripts that do something like set -l matches (string match ...)[1..4] to effectively limit to just 4 matches. Like you say (except for krobeleus' creative suggestion of allowing lazy evaluation) there isn't going to be a huge performance win here for most cases.

@jorgebucaran
Copy link

I ran into this the other day and asked @faho if there was a way to stop after the first match, but I'm using string match --regex ... <file | read line now. I could have used index ranges as well. I agree that there isn't going to be a huge performance win like I thought, at least not for my use-case. @krobelus suggestion is really interesting too.

@krobelus
Copy link
Member

Yeah I think this should be solved in a more orthogonal way. A new option doesn't seem worth it. The --quiet optimization can still be done though.

string match --regex ... <file | read line

I believe that this also reads the entire file, because builtin string's output is buffered if the output is a pipe.
This could possibly be enhanced in future, see https://github.com/fish-shell/fish-shell/blob//aa0bfa0eb84689aceb4924ea9cd8abd5c983ea1b/src/exec.cpp#L454

So I'd say use something like sed if you want to only read parts of a file.
In future maybe read could be made faster.

@faho
Copy link
Member Author

faho commented Nov 29, 2020

In future maybe read could be made faster.

reads fundamental problem is that it can't seek in pipes, so it has to read character-by-character to ensure it doesn't read too much (because it can't put the data back), and that's quite slow.

I don't see a way around that.


Anyway, since it seems to be surprisingly unpopular I'm just going to commit early returns for --quiet for all applicable string subcommands.

@faho faho closed this Nov 29, 2020
@zanchey zanchey removed this from the fish 3.2.0 milestone Dec 1, 2020
@faho
Copy link
Member Author

faho commented Dec 1, 2020

For posterity: 720982a made all string commands with --quiet quit as soon as possible.

The tests exercise this like yes | string match -q y, so it shows that you can use it with infinite streams.

faho added a commit that referenced this pull request Dec 1, 2020
E.g. if we do `string match -q`, and we find a match, nothing about
the input can change anything, so we quit early.

This is mainly useful for performance, but it also allows `string`
with `-q` to be used with infinite input (e.g. `yes`).

Alternative to #7495.
@mqudsi
Copy link
Contributor

mqudsi commented Dec 3, 2020

Doesn't that break some regex imports?

@faho
Copy link
Member Author

faho commented Dec 3, 2020

Doesn't that break some regex imports?

Good question. I haven't been paying too much attention to that, I think it sets the variables to the matches of the last argument? If so, this changes it so you get the first match instead.

@faho
Copy link
Member Author

faho commented Dec 3, 2020

Yup, it does indeed:

string match -rq '^(?<foo>.*)$' banana rama

sets $foo to banana, without -q it's rama.

I'd probably just change the import semantics to pick the first match always - given that it's not released yet, and that I don't think there's a real difference between picking the first and the last. Also if we figured out how to make while string work it would be easier to work with.

@mqudsi
Copy link
Contributor

mqudsi commented Dec 3, 2020

I didn’t even realize that breaks, I meant in conjunction with —all.

@faho
Copy link
Member Author

faho commented Dec 4, 2020

I didn’t even realize that breaks, I meant in conjunction with —all.

It does not affect how many matches per-argument are done, so no.

Okay, I'm going to commit a change that makes it use the first matching argument's capture groups, but I'm going to hold off on the documentation changes until #7520 is done.

@faho faho deleted the string-max branch December 4, 2020 17:54
faho added a commit that referenced this pull request Dec 4, 2020
This makes it work the same whether it quits early (with "-q") or not,
and it's generally nice to nail this down.

See #7495.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants