New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[string] Chunk reads #4610

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@faho
Copy link
Member

faho commented Dec 18, 2017

Description

Profiling with callgrind revealed that about 60% of the time in a something | string match call
was actually spent in string_get_arg_stdin(),
because it was calling read one byte at a time.

This makes it read in chunks similar to builtin read.

This increases performance for getent hosts | string match -v '0.0.0.0*' from about 300ms to about 30ms (i.e. 90%).
At that point it's actually quicker than grep.

To improve performance even more, we'd have to cut down on str2wcstring.

Fixes issue #4604.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.md
[string] Chunk reads
Profiling with callgrind revealed that about 60% of the time in a `something | string match` call
was actually spent in `string_get_arg_stdin()`,
because it was calling `read` one byte at a time.

This makes it read in chunks similar to builtin read.

This increases performance for `getent hosts | string match -v '0.0.0.0*'` from about 300ms to about 30ms (i.e. 90%).
At that point it's _actually_ quicker than `grep`.

To improve performance even more, we'd have to cut down on str2wcstring.

@faho faho added the enhancement label Dec 18, 2017

@faho faho added this to the fish-3.0 milestone Dec 18, 2017

@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 18, 2017

@ridiculousfish
Copy link
Member

ridiculousfish left a comment

Nice work; a few nits for for improvement and then it's G2G


// Read in chunks from fd until buffer has a line.
std::string::iterator pos;
while ((pos = std::find (buffer.begin(), buffer.end(), '\n')) == buffer.end ()) {

This comment has been minimized.

@ridiculousfish

ridiculousfish Dec 19, 2017

Member

This is a bit more idiomatic as:

    size_t pos;
    while ((pos = buffer.find('\n')) == std::string::npos) {...

This comment has been minimized.

@faho

faho Dec 19, 2017

Member

THAT WAS IT! I've been thinking that there was some nicer way to to find, but I couldn't for the life of me come up with it.

*storage = str2wcstring(arg);
return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));

This comment has been minimized.

@ridiculousfish

ridiculousfish Dec 19, 2017

Member

With the above this could be simply *storage = str2wcstring(buffer, pos);

return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));
buffer = std::string(pos + 1, buffer.end());

This comment has been minimized.

@ridiculousfish

ridiculousfish Dec 19, 2017

Member

and this could be buffer.erase(0, pos + 1);

@ridiculousfish

This comment has been minimized.

Copy link
Member

ridiculousfish commented Dec 19, 2017

I think with my suggestions we can skip the attributions

@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 20, 2017

Merged with a bit of cleanup as 2de38ef, with some additional work in f9d883d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment