Add option to speed up string match/replace with --max-matches #10587
I've often needed a way to get the last bit of performance out of unwieldy
completions that involve a lot of string processing (apt completions come to
mind, and I ran into it just now with parsing man pages for kldload
completions).
Since we are often looking for just one exact string in the haystack, an
easy optimization here is to introduce a way for `string match` or `string
replace` to exit early after a specific number of matches (typically one) have
been found.
Depending on the size of the input, this can be a huge boon. For example,
parsing the description from FreeBSD kernel module man pages with

    zcat /usr/share/man/man4/zfs.4.gz | string match -m1 '.Nd *'

runs 35% faster with -m1 than without, while processing all files under
/usr/share/man/man4/*.4.gz in a loop (so a mix of files ranging from very short
to moderately long) runs about 10% faster overall with -m1.
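The saving comes from not examining input after the first hit. Below is a minimal Python sketch of that idea, not fish's actual implementation: `match_lines`, `counting`, and the sample `page` are hypothetical names and made-up data, and `fnmatchcase` only approximates fish's glob matching.

```python
from fnmatch import fnmatchcase
from itertools import islice


def match_lines(lines, pattern, max_matches=None):
    """Return lines matching a glob, stopping after max_matches hits (None = no limit)."""
    hits = (line for line in lines if fnmatchcase(line, pattern))
    # islice stops pulling from `lines` once the limit is reached,
    # so the rest of the input is never examined.
    return list(islice(hits, max_matches))


def counting(lines, seen):
    """Pass lines through while counting how many were examined."""
    for line in lines:
        seen[0] += 1
        yield line


# Made-up stand-in for a decompressed man page: the .Nd line comes early,
# followed by a long body that a first-match search never needs to read.
page = [".Dd", ".Dt EXAMPLE 4", ".Sh NAME", ".Nm example",
        ".Nd example description"] + ["body text"] * 1000

for limit in (1, None):
    seen = [0]
    found = match_lines(counting(page, seen), ".Nd *", limit)
    print(f"limit={limit}: {len(found)} match(es), {seen[0]} lines examined")
```

With the limit set to 1, the sketch examines 5 lines of the sample; without a limit it examines all 1005 and still returns the same single match.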
string match/replace with --max-matches
Prior art in #7495

Thanks for digging that out, @faho. I had a vague feeling we'd discussed this before! I guess the answer to that is that we shouldn't have questioned your judgement on this matter! :D But more seriously, in that thread
I would like to point out something that isn't a regression/breaking change with this new feature, but is something worth being aware of in all cases: obviously

Unfortunately no (#7495 (comment)):
string reads in 1024 byte chunks, and it cannot stuff bytes back into the pipe. Reading byte-for-byte is much slower, so it's not a good idea.
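
To illustrate the chunked-read point quoted above: a reader that pulls fixed-size chunks from a pipe has already consumed whatever followed the first match, and has no way to hand it back to the pipe. This is a minimal Python sketch under that assumption, not fish's actual code; `first_line_chunked` and `demo` are hypothetical names, and only the 1024-byte chunk size comes from the thread.

```python
import os

CHUNK = 1024  # chunk size quoted in the thread


def first_line_chunked(fd):
    """Read one chunk from fd; return (first line, bytes consumed beyond it)."""
    chunk = os.read(fd, CHUNK)
    line, _, overread = chunk.partition(b"\n")
    return line, overread


def demo():
    """Feed three short lines through a pipe and stop after the first one."""
    r, w = os.pipe()
    os.write(w, b"match\nsecond\nthird\n")
    os.close(w)
    line, overread = first_line_chunked(r)
    # What a later reader of the same pipe would still find there.
    leftover = os.read(r, CHUNK)
    os.close(r)
    return line, overread, leftover


line, overread, leftover = demo()
print("first line:", line)
print("already consumed past the match:", overread)
print("left in the pipe for the next reader:", leftover)
```

In this sketch the lines after the match end up in the reader's buffer, and the pipe is empty for anything that reads from it afterwards.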

Ah, fair point.

Added tests and documentation.
I've often needed a way to get the last bit of performance out of unwieldy completions that involve a lot of string processing (apt completions come to mind, and I ran into it just now with parsing man pages for kldload completions).

Since we are often looking for just one exact string in the haystack, an easy optimization here is to introduce a way for `string match` or `string replace` to exit early after a specific number of matches (typically one) have been found.

Depending on the size of the input, this can be a huge boon. For example, parsing the description from FreeBSD kernel module man pages with

    zcat /usr/share/man/man4/zfs.4.gz | string match -m1 '.Nd *'

runs 35% faster with -m1 than without (bearing in mind that most of the time is taken just starting up zcat, parsing the compressed contents, and decompressing the file!), while processing all files under /usr/share/man/man4/*.4.gz in a loop (so a mix of files ranging from very short to moderately long) runs about 10% faster overall with -m1 (for context, in absolute terms that is 0.5s+ faster).

TODOs: