improve find performance #75

tw4452852 · 2015-01-19T04:12:09Z

Use walkOnGoRoutine in every directory instead of only in the root to improve concurrency.
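
For readers who want to see the general idea, here is a minimal sketch of a walk that starts a goroutine for every directory rather than only for the children of the root. It is not the code from this PR: `walkConcurrent`, `findFiles`, and `demo` are hypothetical names, and the sketch places no limit on the number of goroutines.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// walkConcurrent reads dir and starts a new goroutine for every
// subdirectory it finds, at any depth. File paths are sent on out in
// whatever order the goroutines happen to produce them.
func walkConcurrent(dir string, wg *sync.WaitGroup, out chan<- string) {
	defer wg.Done()
	entries, err := os.ReadDir(dir)
	if err != nil {
		return // unreadable directories are skipped in this sketch
	}
	for _, e := range entries {
		path := filepath.Join(dir, e.Name())
		if e.IsDir() {
			wg.Add(1) // registered before the parent calls Done
			go walkConcurrent(path, wg, out)
			continue
		}
		out <- path
	}
}

// findFiles collects every file path under root. The order of the
// returned slice is not deterministic.
func findFiles(root string) []string {
	var wg sync.WaitGroup
	out := make(chan string)
	wg.Add(1)
	go walkConcurrent(root, &wg, out)
	go func() {
		wg.Wait()
		close(out)
	}()
	var files []string
	for p := range out {
		files = append(files, p)
	}
	return files
}

// demo builds a small temporary tree and reports how many files were found.
func demo() (int, error) {
	root, err := os.MkdirTemp("", "walk")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	for _, p := range []string{"a/Makefile", "a/b/Makefile", "c/main.go"} {
		full := filepath.Join(root, filepath.FromSlash(p))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return 0, err
		}
		if err := os.WriteFile(full, nil, 0o644); err != nil {
			return 0, err
		}
	}
	return len(findFiles(root)), nil
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("files found:", n)
}
```

Because each directory is read by its own goroutine, the paths arrive in scheduling order, so repeated runs over the same tree can list the same files in a different order.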

thomasf · 2015-01-27T22:24:13Z

awesome!

balta2ar · 2015-03-18T14:44:14Z

@tw4452852 Is it possible to add some performance benchmarks to see the difference it makes in actual numbers?

thomasf · 2015-03-19T01:41:54Z

Benchmarks depend very much on the latency of the file system used. For a RAM disk, IIRC the speed went up at least 20% in CPU usage as a result of less iowait.
A downside with this patch is that the order of search results becomes more random.

tw4452852 · 2015-05-27T08:59:37Z

@balta2ar Sorry for the late response. I ran a test on my local machine; here is the result:

I searched for all the makefiles in a kernel source tree,

with the original pt:
time pt_origin -g makefile "" . > /dev/null
pt_origin -g makefile "" . > /dev/null 7.57s user 0.27s system 80% cpu 9.772 total

with the improved pt:
time pt_new -g makefile "" . > /dev/null
pt_new -g makefile "" . > /dev/null 8.18s user 0.25s system 303% cpu 2.778 total

thomasf · 2015-05-27T09:03:35Z

Generally I found that the increased disorder of results was not worth the increased speed.

balta2ar · 2015-05-27T09:59:58Z

Thank you for benchmarking this, @tw4452852.

What about adding this as an option? 9.772 -> 2.778 looks significant to me. What do you think?

thomasf · 2015-05-27T10:40:59Z

IIRC my test above was run on a 1.5 GB tmpfs; on my mechanical drives the difference was not noticeable at all. Depending on disk seek timings this change could potentially be slower as well (I don't have any data on this).

Making it optional and turned off by default is probably a good idea. It became really annoying using pt inside an Emacs window because every time I refreshed the results the order became radically different. For small searches an --ordered flag would perhaps also make sense even if it delays printing until all matches are calculated.
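
As an illustration of the trade-off behind an --ordered flag, here is a small sketch that buffers all matches from concurrent workers and sorts them before anything is printed. It is an assumption about how such a flag could behave, not pt's implementation; the `match` type and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// match is a hypothetical result record: a file path and a line number.
type match struct {
	Path string
	Line int
}

// collectOrdered drains results and then sorts by path and line, so that
// repeated runs print in the same order. Nothing can be printed until the
// channel is closed, which is the delay mentioned above.
func collectOrdered(results <-chan match) []match {
	var all []match
	for m := range results {
		all = append(all, m)
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Path != all[j].Path {
			return all[i].Path < all[j].Path
		}
		return all[i].Line < all[j].Line
	})
	return all
}

// searchAll simulates concurrent workers that report matches in
// arbitrary order and returns them in a stable order.
func searchAll(inputs []match) []match {
	results := make(chan match)
	var wg sync.WaitGroup
	for _, m := range inputs {
		wg.Add(1)
		go func(m match) {
			defer wg.Done()
			results <- m
		}(m)
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	return collectOrdered(results)
}

func main() {
	in := []match{{"b/Makefile", 3}, {"a/Makefile", 7}, {"a/Makefile", 2}}
	for _, m := range searchAll(in) {
		fmt.Printf("%s:%d\n", m.Path, m.Line)
	}
}
```

The cost is memory for the buffered matches and no output until the search finishes, which is why it fits small searches better than large ones.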

balta2ar · 2015-05-27T12:39:01Z

I'd go with an option. @tw4452852, could you implement it, please?

> Making it optional and turned off by default is probably a good idea. It became really annoying using pt inside an Emacs window because every time I refreshed the results the order became radically different. For small searches an --ordered flag would perhaps also make sense even if it delays printing until all matches are calculated.

This sounds more like issue #86, which could in fact be extended, e.g. so that one can specify a sort field like this: --sort (date | name | extension | size).

thomasf · 2015-05-27T13:07:08Z

Yeah, these issues are related but not the same. I figured I'd mention the other issue as well.

tw4452852 · 2015-05-28T05:29:58Z

@balta2ar Done.

tw4452852 · 2015-05-28T05:40:26Z

Another finding: in the original code, notify := make(chan int, len(list)) is allocated in every function call, although it is only used in directRoot, so it adds a lot of GC overhead. I have now moved it out of the walk function and replaced it with a *sync.WaitGroup.
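
To make the difference concrete, here is a sketch of the two shapes described in the comment above, run over an in-memory tree instead of a file system. It is a reconstruction from that description, not the pt source: `node`, `walkPerCallChannel`, `walkSharedWaitGroup`, and `countBoth` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is a hypothetical in-memory directory with child directories.
type node struct {
	children []*node
}

// walkPerCallChannel allocates a buffered notify channel on every call,
// but only the root call uses it to wait for its goroutines.
func walkPerCallChannel(n *node, isRoot bool, visited *int64) {
	atomic.AddInt64(visited, 1)
	notify := make(chan int, len(n.children)) // allocated on every call
	for _, c := range n.children {
		if isRoot {
			go func(c *node) {
				walkPerCallChannel(c, false, visited)
				notify <- 1
			}(c)
		} else {
			walkPerCallChannel(c, false, visited)
		}
	}
	if isRoot {
		for range n.children {
			<-notify
		}
	}
}

// walkSharedWaitGroup starts a goroutine for every directory and tracks
// all of them with a single WaitGroup created once by the caller.
func walkSharedWaitGroup(n *node, wg *sync.WaitGroup, visited *int64) {
	defer wg.Done()
	atomic.AddInt64(visited, 1)
	for _, c := range n.children {
		wg.Add(1)
		go walkSharedWaitGroup(c, wg, visited)
	}
}

// countBoth walks the same tree with both variants and returns how many
// nodes each one visited.
func countBoth(root *node) (int64, int64) {
	var a, b int64
	walkPerCallChannel(root, true, &a)
	var wg sync.WaitGroup
	wg.Add(1)
	go walkSharedWaitGroup(root, &wg, &b)
	wg.Wait()
	return a, b
}

func main() {
	root := &node{children: []*node{{children: []*node{{}, {}}}, {}}}
	a, b := countBoth(root)
	fmt.Println(a, b)
}
```

Both variants visit the same nodes; the second one avoids creating a channel per directory and lets every level run concurrently.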

tw4452852 · 2015-05-28T05:46:53Z

It is faster now. Again finding all the Makefiles in the kernel source tree:

time pt --multi-finder -g Makefile . >/dev/null
pt --multi-finder -g Makefile . >/dev/null 5.79s user 0.27s system 682% cpu 0.887 total

balta2ar · 2015-05-28T10:00:44Z

Thank you very much, @tw4452852! Love your activity here! By the way, do you think it's worth adding unit tests for this option?

tw4452852 · 2015-05-29T02:20:11Z

@balta2ar Of course. I have added a test for it.

balta2ar · 2015-05-29T10:05:42Z

Great job!

Guys, @tw4452852 @thomasf @monochromegane before this is set in stone, what if we think of some other name for this option? I'm not saying "--multi-finder" is bad and that we should change it, I'm just offering to think one last time to maybe find something even better and more intuitive. Naming things right matters a lot in my opinion. Please suggest your ideas, if any. Thanks!

thomasf · 2015-05-29T10:17:49Z

Ideally it should not use two words.. I guess that what it does can be described as maximizing IO throughput by being more concurrent. Something like --concurrent could maybe work, the help string would then need to clarify that it actually means increasing the concurrency rather than turning it on. I'll revisit this at the end of my day to see if I have other ideas then.

padde · 2015-06-03T11:32:39Z

How about --parallel? Just my 5¢.

monochromegane · 2015-07-16T10:51:19Z

Thank you for the PR. I will check and merge this over the weekend, and I will release a new version. 🍻
Please wait :)

balta2ar · 2015-07-16T11:00:09Z

Thanks, @monochromegane!
Speaking of the option, I like --parallel better.

thomasf · 2015-07-16T12:24:34Z

Maybe an integer flag so that user can control it.. -maxpar N (?)
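
One common way to honor a user-chosen limit like this is a counting semaphore built from a buffered channel. The sketch below is only an illustration of that idea: `limiter` and `runBounded` are hypothetical, the sleep stands in for reading a directory, and nothing here is taken from pt.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter bounds how many jobs run at once. Its capacity plays the role
// of N in a hypothetical -maxpar N flag.
type limiter struct {
	slots  chan struct{}
	active int64 // jobs running right now
	peak   int64 // highest value active has reached
}

func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// do runs fn while holding one slot and records the peak concurrency.
func (l *limiter) do(fn func()) {
	l.slots <- struct{}{} // blocks while n jobs are already running
	cur := atomic.AddInt64(&l.active, 1)
	for {
		old := atomic.LoadInt64(&l.peak)
		if cur <= old || atomic.CompareAndSwapInt64(&l.peak, old, cur) {
			break
		}
	}
	fn()
	atomic.AddInt64(&l.active, -1)
	<-l.slots
}

// runBounded starts one goroutine per job but lets at most maxPar of
// them do work at the same time. It returns the number of finished jobs
// and the observed peak concurrency.
func runBounded(jobs, maxPar int) (int64, int64) {
	l := newLimiter(maxPar)
	var done int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() {
				time.Sleep(time.Millisecond) // stand-in for reading a directory
				atomic.AddInt64(&done, 1)
			})
		}()
	}
	wg.Wait()
	return done, atomic.LoadInt64(&l.peak)
}

func main() {
	done, peak := runBounded(20, 4)
	fmt.Println("jobs done:", done, "peak concurrency:", peak)
}
```

In a recursive walk the slot should be held only around the directory read and released before child goroutines are started, otherwise parents waiting on children could exhaust the slots.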

monochromegane · 2015-07-19T10:14:06Z

I checked, and thanks, everyone!
I will merge this PR and change the option name to --parallel.

> -maxpar N

I think this is a good idea, but the current go-flags implementation can't represent the following:

pt (no option)
pt --parallel (default value)
pt --parallel=1 (specify value)

If you have an idea, please send a pull request.
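
For comparison only, the three forms listed above can be expressed with Go's standard `flag` package rather than go-flags, by giving a custom `flag.Value` an `IsBoolFlag` method. This is a swapped-in technique, not a statement about go-flags; `parallelFlag`, `parseParallel`, and the default of 8 are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// defaultParallel is a hypothetical level used when --parallel is given
// without a value.
const defaultParallel = 8

// parallelFlag holds the parsed level; 0 means the option was not given.
type parallelFlag struct {
	level int
}

func (p *parallelFlag) String() string { return strconv.Itoa(p.level) }

// Set receives "true" for a bare --parallel and the literal text for
// --parallel=N.
func (p *parallelFlag) Set(s string) error {
	if s == "true" {
		p.level = defaultParallel
		return nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 {
		return fmt.Errorf("invalid parallel level %q", s)
	}
	p.level = n
	return nil
}

// IsBoolFlag tells the flag package that the value may be omitted.
func (p *parallelFlag) IsBoolFlag() bool { return true }

// parseParallel parses args and returns the resulting level.
func parseParallel(args []string) (int, error) {
	fs := flag.NewFlagSet("pt", flag.ContinueOnError)
	var p parallelFlag
	fs.Var(&p, "parallel", "search directories concurrently, optionally with a level")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return p.level, nil
}

func main() {
	for _, args := range [][]string{nil, {"--parallel"}, {"--parallel=1"}} {
		n, err := parseParallel(args)
		fmt.Println(args, n, err)
	}
}
```

With this approach the value must be attached with `=`; in `--parallel 4` the `4` is treated as a positional argument, which matters for a tool whose first positional argument is the search pattern.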

tw4452852 added 2 commits May 26, 2015 17:20

improve find performance

03c8374

use walkOnGoRoutine in any directory instead of root Signed-off-by: Tw <tw19881113@gmail.com>

fix file path bug in multiple finder

ae5ef12

Signed-off-by: Tw <tw19881113@gmail.com>

tw4452852 force-pushed the find_performance branch from e6c2351 to ae5ef12 Compare May 26, 2015 10:02

make multi-finder optional

2e53dd1

Signed-off-by: Tw <tw19881113@gmail.com>

add multiple finder test

eaa84a8

Signed-off-by: Tw <tw19881113@gmail.com>

monochromegane added a commit that referenced this pull request Jul 19, 2015

Merge pull request #75 from tw4452852/find_performance

719cb42

improve find performance

monochromegane merged commit 719cb42 into monochromegane:master Jul 19, 2015

tw4452852 deleted the find_performance branch December 21, 2015 04:30
