Improve performance of Rabin fingerprint calculation #24

MichaelEischer · 2019-07-06T14:29:16Z

These patches improve the chunking throughput by about 10-23% depending on the CPU used. The speedup is more noticeable on older CPUs.

The test were run on Linux using go 1.11.6 with go test -bench 'BenchmarkChunker$' -cpu 1 -benchtime 20s. I've also repeated the complete performance measurements three times. As the results were quite stable (except for 1 extreme slow outlier), I've just included the results from the second measurement run.

Would it be possible to create a new release for use in restic? The version there is already missing commit 7c2a180 which improves the performance to the baseline for the first commit in the merge request. For tag v0.2.0, which is the version used in restic, I've measured the following throughput:

Old CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
BenchmarkChunker                     200         169683667 ns/op         197.75 MB/s

Modern CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz
BenchmarkChunker                     500          64142688 ns/op         523.12 MB/s

This totals up to 15-36% more throughput than v0.2.0 .

wpos is once again limited using modulo. The performance issue with > wpos = wpos % windowsSize Did not stem from the usage of modulo itself but from wpos being a signed integer which forces the compiler to emit a modulo implementation that can handle negative numbers instead of the much faster > wpos = wpos & 0x3f This optimized version is possible as windowSize is equal to 64. Performance: Old CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 218.53 239.68 1.10x Modern CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 542.50 542.46 1.00x

Bounding wpos before using it or afterwards makes in general no difference. Here it allows the compiler to elide array bound checks when accessing win. Performance: Old CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 239.68 243.10 1.01x Modern CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 542.46 547.28 1.01x

The code only uses polynomials of degree up to 53. Therefore let the compiler know of this restriction which avoids costly checks to handle the case where polShift is larger than the value type width, which would result in unexpected behavior on x86. > byte(digest >> polShift) Performance: Old CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 243.10 268.83 1.11x Modern CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz benchmark old MB/s new MB/s speedup BenchmarkChunker 547.28 602.37 1.10x

MichaelEischer · 2020-02-08T17:50:48Z

The speedups shown in the commit messages are still valid (to a large degree) for recent go versions such as 1.13.5 .

fd0 · 2020-02-12T19:50:11Z

Oh, wow, this PR fell through the cracks, I'm sorry for not responding earlier. I've rerun the benchmarks 10 times and benchstat tells me that it's a solid 8-10% speedup for my CPU (Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz):

$ benchstat before after
name               old time/op   new time/op   delta
ChunkerWithSHA256    163ms ± 0%    155ms ± 0%  -4.79%   (p=0.000 n=8+10)
Chunker             70.6ms ± 0%   65.2ms ± 2%  -7.65%   (p=0.000 n=9+10)
NewChunker          33.9µs ± 5%   34.3µs ± 4%    ~     (p=0.190 n=10+10)

name               old speed     new speed     delta
ChunkerWithSHA256  206MB/s ± 0%  216MB/s ± 0%  +5.04%   (p=0.000 n=8+10)
Chunker            475MB/s ± 0%  515MB/s ± 2%  +8.30%   (p=0.000 n=9+10)

Thank you very much!

Improve performance of Rabin fingerprint calculation

fd0 · 2020-02-12T19:53:27Z

I've also released 0.3.0 with all the new commits.

MichaelEischer · 2020-02-12T20:04:23Z

Should I open a pull request to update the chunker version in restic?

fd0 · 2020-02-12T20:20:23Z

Already done :)

See restic/restic#2576

MichaelEischer added 3 commits July 6, 2019 15:59

MichaelEischer requested a review from fd0 February 8, 2020 17:50

fd0 approved these changes Feb 12, 2020

View reviewed changes

fd0 added a commit that referenced this pull request Feb 12, 2020

Merge pull request #24 from MichaelEischer/master

3404fc8

Improve performance of Rabin fingerprint calculation

fd0 merged commit 42a27a1 into restic:master Feb 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of Rabin fingerprint calculation #24

Improve performance of Rabin fingerprint calculation #24

MichaelEischer commented Jul 6, 2019

MichaelEischer commented Feb 8, 2020

fd0 commented Feb 12, 2020 •

edited

Loading

fd0 commented Feb 12, 2020

MichaelEischer commented Feb 12, 2020

fd0 commented Feb 12, 2020

Improve performance of Rabin fingerprint calculation #24

Improve performance of Rabin fingerprint calculation #24

Conversation

MichaelEischer commented Jul 6, 2019

MichaelEischer commented Feb 8, 2020

fd0 commented Feb 12, 2020 • edited Loading

fd0 commented Feb 12, 2020

MichaelEischer commented Feb 12, 2020

fd0 commented Feb 12, 2020

fd0 commented Feb 12, 2020 •

edited

Loading