Avoid calling mget with massive number of keys in Readdir #110
Name: []byte(name),
Attr: &Attr{Typ: typ},
})
ent := newEntries[i]
ent should be a pointer
ent := newEntries[i]
ent.Inode = inode
ent.Name = []byte(name)
attr := newAttrs[i]
attr should be a pointer
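Both review comments come down to Go's value semantics: assigning a slice element of struct type to a variable copies it, so later writes to `ent` or `attr` never reach the slice. A minimal sketch of the difference, assuming `newEntries` is a slice of structs; the `Entry` and `Attr` types below are simplified stand-ins, not the project's real definitions:

```go
package main

import "fmt"

// Attr and Entry are simplified, hypothetical stand-ins for the PR's types.
type Attr struct {
	Typ uint8
}

type Entry struct {
	Inode uint64
	Name  []byte
	Attr  *Attr
}

// fillByValue copies each element before writing to it,
// so the slice itself is left unchanged.
func fillByValue(entries []Entry, names []string) {
	for i, name := range names {
		ent := entries[i] // copies the struct
		ent.Inode = uint64(i + 1)
		ent.Name = []byte(name)
		_ = ent // the copy is discarded here; entries[i] is untouched
	}
}

// fillByPointer takes the address of each element,
// so the writes land in the slice.
func fillByPointer(entries []Entry, names []string) {
	for i, name := range names {
		ent := &entries[i] // pointer into the slice
		ent.Inode = uint64(i + 1)
		ent.Name = []byte(name)
	}
}

func main() {
	names := []string{"a", "b"}

	byValue := make([]Entry, len(names))
	fillByValue(byValue, names)
	fmt.Println(byValue[0].Inode, byValue[1].Inode) // prints "0 0"

	byPointer := make([]Entry, len(names))
	fillByPointer(byPointer, names)
	fmt.Println(byPointer[0].Inode, byPointer[1].Inode) // prints "1 2"
}
```

The same reasoning applies to `attr := newAttrs[i]`: taking `&newAttrs[i]` keeps the preallocated slice as the single backing store.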
if a, ok := re.(string); ok {
r.parseAttr([]byte(a), (*entries)[i].Attr)
batchSize := 4096
if batchSize > len(*entries) {
not needed
0da0ac5 to c9826c1
I've changed the code to call
In [2]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
657 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
654 ms ± 9.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
648 ms ± 8.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
525 ms ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
613 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit os.listdir("/Users/satoru/jfs/many-files/")
617 ms ± 9.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For folders with fewer files (100 in my benchmark), it's slower than the original implementation:
In [18]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
685 µs ± 8.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [19]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
666 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [20]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
656 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
752 µs ± 6.62 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [16]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
748 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [17]: %timeit os.listdir("/Users/satoru/jfs/some-files/")
722 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
@suzaku Based on the benchmark results, we should have a fast path for small directories, thanks.
Could you also add a benchmark in Go (for both small and large directories)?
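A rough sketch of what such a Go benchmark could look like. It times `os.ReadDir` through `testing.Benchmark` on a freshly created temporary directory; to measure JuiceFS itself, the directory would have to live under a mounted volume, which this sketch does not set up. The helper names and the two directory sizes are placeholders, not taken from the PR:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"testing"
)

// makeDir creates a temporary directory holding n empty files and returns
// its path together with a cleanup function. (Hypothetical helper.)
func makeDir(n int) (string, func(), error) {
	dir, err := os.MkdirTemp("", "readdir-bench-")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	for i := 0; i < n; i++ {
		f, err := os.Create(filepath.Join(dir, fmt.Sprintf("file-%07d", i)))
		if err != nil {
			cleanup()
			return "", nil, err
		}
		f.Close()
	}
	return dir, cleanup, nil
}

// countEntries lists dir once and returns the number of entries found.
func countEntries(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

// benchReadDir measures repeated listings of dir.
func benchReadDir(dir string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := countEntries(dir); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	// One small and one larger directory; scale the sizes up as needed.
	for _, n := range []int{100, 20000} {
		dir, cleanup, err := makeDir(n)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%d files: %s\n", n, benchReadDir(dir))
		cleanup()
	}
}
```

In the repository itself, the same loop would more naturally live in a `_test.go` file as `BenchmarkReadDir...` functions run with `go test -bench`.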
@davies Is it safe here to use multiple
@suzaku It's OK to use HSCAN. Right now, what's the behavior for 5 million files?
If
Let's merge this one first, then optimize the hgetall() later.
This PR is related to #95.
In the original implementation, `mget` is called once with all the keys that correspond to files in a directory. When there are many files in a directory, this call might block the Redis server process.

In this PR, I changed it to call `mget` in smaller fixed-size batches. The consequence is that we reduce the chance of blocking the Redis server, but `ReadDir` is made slower when the number of files in a directory exceeds the batch size (which is set to 4096 now).

For small directories, the latency difference is trivial:
When I run the benchmark in a directory with more than 200,000 files, the new version is obviously slower:
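The batching idea in this description can be sketched roughly as follows. This is not the PR's actual code: `mgetFunc` is a hypothetical stand-in for a Redis client's MGET call, and `mgetInBatches` is an illustrative name. Only the batch size of 4096 comes from the description.

```go
package main

import "fmt"

// mgetFunc is a hypothetical stand-in for a Redis client's MGET call.
type mgetFunc func(keys []string) ([]interface{}, error)

// mgetInBatches fetches keys in chunks of at most batchSize, so that no
// single MGET has to carry every key of a very large directory.
func mgetInBatches(mget mgetFunc, keys []string, batchSize int) ([]interface{}, error) {
	if batchSize <= 0 {
		return nil, fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	results := make([]interface{}, 0, len(keys))
	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys) // the last batch may be shorter
		}
		vals, err := mget(keys[start:end])
		if err != nil {
			return nil, err
		}
		results = append(results, vals...)
	}
	return results, nil
}

func main() {
	calls := 0
	// fake simulates the server side and counts round trips.
	fake := func(keys []string) ([]interface{}, error) {
		calls++
		vals := make([]interface{}, len(keys))
		for i, k := range keys {
			vals[i] = "attr:" + k
		}
		return vals, nil
	}

	keys := make([]string, 10000)
	for i := range keys {
		keys[i] = fmt.Sprintf("i%d", i)
	}
	vals, err := mgetInBatches(fake, keys, 4096)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vals), calls) // prints "10000 3"
}
```

Because the end index is clamped per chunk, a directory smaller than the batch size results in exactly one call, and each extra batch costs one more round trip, which matches the slowdown reported for large directories.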