Specialized algorithm for different ranges of values. #11

Merged
llogiq merged 1 commit into llogiq:master from Veedrac:master on Dec 3, 2016

Collaborator

Veedrac commented Dec 3, 2016

For a one-liner this code sure is getting long.

I worked out the performance issue I was having: LLVM was getting really confused by a by-value sum taking a [T; 4]. Changing that to a reference fixed things.
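The change described here can be pictured with a small sketch. The helper names and the `usize` element type are hypothetical and chosen for illustration; the crate's actual function is generic over `T` and is not shown in this thread.

```rust
/// Hypothetical helper: sums four partial counts.
/// The array is taken by reference, which is the form the comment above
/// reports as fixing the slowdown.
fn sum_lanes(lanes: &[usize; 4]) -> usize {
    lanes.iter().sum()
}

/// The by-value form, shown only for comparison. Both functions return the
/// same result; the difference reported above was in the code LLVM generated
/// at the time, not in behavior.
fn sum_lanes_by_value(lanes: [usize; 4]) -> usize {
    lanes.iter().sum()
}
```

A call site changes only from `sum_lanes_by_value(lanes)` to `sum_lanes(&lanes)`. Whether the by-value form still optimizes poorly depends on the compiler version, so this is worth re-measuring rather than assuming.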

I've tuned the specific constants for my machine. Verification that I haven't caused a regression would be helpful. When doing so you might notice there are a lot more benchmarks. This is getting unwieldy, but they've been very useful for me.

Fixes #10.
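As a rough illustration of what "specialized algorithm for different ranges of values" could look like, here is a minimal sketch that dispatches on haystack length. The cutoff of 75 is taken from the linked issue title, not from the PR's tuned constants, and `chunked_count` is only a stand-in for the faster bulk path; all function names are assumptions.

```rust
/// Illustrative cutoff only (from the linked issue title); the constants
/// actually tuned in this PR are not shown in the thread.
const NAIVE_CUTOFF: usize = 75;

/// Straightforward byte-at-a-time count.
fn naive_count(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Stand-in for the bulk path: walks 8-byte chunks, then the leftover tail.
/// A real implementation would process each chunk with word-sized or SIMD
/// operations instead of a per-byte filter.
fn chunked_count(haystack: &[u8], needle: u8) -> usize {
    let mut chunks = haystack.chunks_exact(8);
    let mut total = 0;
    for chunk in &mut chunks {
        total += chunk.iter().filter(|&&b| b == needle).count();
    }
    total + naive_count(chunks.remainder(), needle)
}

/// Picks the path by length: short inputs skip the bulk path's setup cost.
fn count(haystack: &[u8], needle: u8) -> usize {
    if haystack.len() < NAIVE_CUTOFF {
        naive_count(haystack, needle)
    } else {
        chunked_count(haystack, needle)
    }
}
```

Both paths must return identical counts for every input; only the speed differs, which is why the cutoff can be tuned per machine without affecting correctness.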

Owner

llogiq commented Dec 3, 2016

The errors seem to be in the benchmark code. Perhaps an incomplete commit?

Owner

llogiq commented Dec 3, 2016

We could include a build script to determine the optimum thresholds for the given machine (using binary search). Or at least a secondary program.
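A minimal sketch of the binary-search idea, assuming a predicate that reports whether the fast path beats the naive one at a given length. `find_crossover` and `fast_wins` are hypothetical names; in a real tuning program the predicate would wrap repeated timing runs, and the search is only valid if the answer flips from false to true exactly once across the range.

```rust
/// Returns the smallest length in `lo..=hi` for which `fast_wins` is true,
/// assuming it is false below some crossover and true from there on.
/// Returns `None` if the fast path does not win at `hi` (or the range is empty).
fn find_crossover<F>(mut lo: usize, mut hi: usize, mut fast_wins: F) -> Option<usize>
where
    F: FnMut(usize) -> bool,
{
    if lo > hi || !fast_wins(hi) {
        return None;
    }
    // Invariant: fast_wins(hi) is true, and every length below lo is false.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if fast_wins(mid) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    Some(lo)
}
```

For example, `find_crossover(0, 1000, |n| n >= 75)` yields `Some(75)`. Measured timings are noisy (note the +/- columns in the benchmark output), so a practical version would probably need to average several runs per probe before trusting the comparison.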

Collaborator Author

Veedrac commented Dec 3, 2016

It was just a last-minute renaming I forgot to test. Murphy's law and all.

llogiq merged commit 1bc86d6 into llogiq:master on Dec 3, 2016

Owner

llogiq commented Dec 3, 2016

Benchmarks:

before w/o simd

test bench_0_hyper       ... bench:           8 ns/iter (+/- 5)
test bench_0_naive       ... bench:           2 ns/iter (+/- 1)
test bench_1000000_hyper ... bench:     111,911 ns/iter (+/- 84,902)
test bench_1000000_naive ... bench:     582,777 ns/iter (+/- 437,508)
test bench_100000_hyper  ... bench:      10,612 ns/iter (+/- 1,287)
test bench_100000_naive  ... bench:      56,531 ns/iter (+/- 5,721)
test bench_10000_hyper   ... bench:       1,006 ns/iter (+/- 106)
test bench_10000_naive   ... bench:       5,491 ns/iter (+/- 596)
test bench_1000_hyper    ... bench:         124 ns/iter (+/- 12)
test bench_1000_naive    ... bench:         584 ns/iter (+/- 6)
test bench_100_hyper     ... bench:          39 ns/iter (+/- 5)
test bench_100_naive     ... bench:          64 ns/iter (+/- 28)
test bench_10_hyper      ... bench:          16 ns/iter (+/- 3)
test bench_10_naive      ... bench:          10 ns/iter (+/- 6)
test bench_1_hyper       ... bench:           9 ns/iter (+/- 5)
test bench_1_naive       ... bench:           3 ns/iter (+/- 0)

before w/ simd

test bench_0_hyper       ... bench:           5 ns/iter (+/- 1)
test bench_0_naive       ... bench:           2 ns/iter (+/- 0)
test bench_1000000_hyper ... bench:      38,291 ns/iter (+/- 1,604)
test bench_1000000_naive ... bench:     514,708 ns/iter (+/- 37,187)
test bench_100000_hyper  ... bench:       3,695 ns/iter (+/- 410)
test bench_100000_naive  ... bench:      56,868 ns/iter (+/- 8,573)
test bench_10000_hyper   ... bench:         372 ns/iter (+/- 28)
test bench_10000_naive   ... bench:       5,749 ns/iter (+/- 159)
test bench_1000_hyper    ... bench:          89 ns/iter (+/- 17)
test bench_1000_naive    ... bench:         530 ns/iter (+/- 12)
test bench_100_hyper     ... bench:          56 ns/iter (+/- 2)
test bench_100_naive     ... bench:          55 ns/iter (+/- 4)
test bench_10_hyper      ... bench:          13 ns/iter (+/- 0)
test bench_10_naive      ... bench:           8 ns/iter (+/- 2)
test bench_1_hyper       ... bench:           8 ns/iter (+/- 2)
test bench_1_naive       ... bench:           3 ns/iter (+/- 0)

after w/o simd

running 90 tests
test bench_00000_hyper       ... bench:           5 ns/iter (+/- 1)
test bench_00000_naive       ... bench:           2 ns/iter (+/- 0)
test bench_00010_hyper       ... bench:          11 ns/iter (+/- 0)
test bench_00010_naive       ... bench:           8 ns/iter (+/- 0)
test bench_00020_hyper       ... bench:          16 ns/iter (+/- 0)
test bench_00020_naive       ... bench:          13 ns/iter (+/- 0)
test bench_00030_hyper       ... bench:          22 ns/iter (+/- 1)
test bench_00030_naive       ... bench:          19 ns/iter (+/- 0)
test bench_00040_hyper       ... bench:          21 ns/iter (+/- 0)
test bench_00040_naive       ... bench:          23 ns/iter (+/- 0)
test bench_00050_hyper       ... bench:          25 ns/iter (+/- 0)
test bench_00050_naive       ... bench:          28 ns/iter (+/- 1)
test bench_00060_hyper       ... bench:          27 ns/iter (+/- 1)
test bench_00060_naive       ... bench:          34 ns/iter (+/- 0)
test bench_00070_hyper       ... bench:          26 ns/iter (+/- 0)
test bench_00070_naive       ... bench:          39 ns/iter (+/- 22)
test bench_00080_hyper       ... bench:          25 ns/iter (+/- 0)
test bench_00080_naive       ... bench:          44 ns/iter (+/- 1)
test bench_00090_hyper       ... bench:          28 ns/iter (+/- 1)
test bench_00090_naive       ... bench:          49 ns/iter (+/- 1)
test bench_00100_hyper       ... bench:          27 ns/iter (+/- 1)
test bench_00100_naive       ... bench:          54 ns/iter (+/- 1)
test bench_00120_hyper       ... bench:          28 ns/iter (+/- 1)
test bench_00120_naive       ... bench:          64 ns/iter (+/- 1)
test bench_00140_hyper       ... bench:          32 ns/iter (+/- 3)
test bench_00140_naive       ... bench:          75 ns/iter (+/- 1)
test bench_00170_hyper       ... bench:          33 ns/iter (+/- 1)
test bench_00170_naive       ... bench:         100 ns/iter (+/- 3)
test bench_00210_hyper       ... bench:          36 ns/iter (+/- 1)
test bench_00210_naive       ... bench:         121 ns/iter (+/- 3)
test bench_00250_hyper       ... bench:          41 ns/iter (+/- 24)
test bench_00250_naive       ... bench:         142 ns/iter (+/- 3)
test bench_00300_hyper       ... bench:          43 ns/iter (+/- 1)
test bench_00300_naive       ... bench:         170 ns/iter (+/- 4)
test bench_00400_hyper       ... bench:          48 ns/iter (+/- 2)
test bench_00400_naive       ... bench:         220 ns/iter (+/- 25)
test bench_00500_hyper       ... bench:          58 ns/iter (+/- 2)
test bench_00500_naive       ... bench:         274 ns/iter (+/- 8)
test bench_00600_hyper       ... bench:          63 ns/iter (+/- 1)
test bench_00600_naive       ... bench:         324 ns/iter (+/- 17)
test bench_00700_hyper       ... bench:          74 ns/iter (+/- 9)
test bench_00700_naive       ... bench:         376 ns/iter (+/- 59)
test bench_00800_hyper       ... bench:          83 ns/iter (+/- 3)
test bench_00800_naive       ... bench:         425 ns/iter (+/- 16)
test bench_00900_hyper       ... bench:          95 ns/iter (+/- 5)
test bench_00900_naive       ... bench:         479 ns/iter (+/- 53)
test bench_01000_hyper       ... bench:          99 ns/iter (+/- 2)
test bench_01000_naive       ... bench:         530 ns/iter (+/- 17)
test bench_01200_hyper       ... bench:         115 ns/iter (+/- 3)
test bench_01200_naive       ... bench:         649 ns/iter (+/- 300)
test bench_01400_hyper       ... bench:         130 ns/iter (+/- 1)
test bench_01400_naive       ... bench:         736 ns/iter (+/- 31)
test bench_01700_hyper       ... bench:         153 ns/iter (+/- 17)
test bench_01700_naive       ... bench:         889 ns/iter (+/- 35)
test bench_02100_hyper       ... bench:         194 ns/iter (+/- 3)
test bench_02100_naive       ... bench:       1,093 ns/iter (+/- 56)
test bench_02500_hyper       ... bench:         212 ns/iter (+/- 16)
test bench_02500_naive       ... bench:       1,301 ns/iter (+/- 58)
test bench_03000_hyper       ... bench:         263 ns/iter (+/- 9)
test bench_03000_naive       ... bench:       1,557 ns/iter (+/- 43)
test bench_04000_hyper       ... bench:         338 ns/iter (+/- 12)
test bench_04000_naive       ... bench:       2,073 ns/iter (+/- 52)
test bench_05000_hyper       ... bench:         432 ns/iter (+/- 99)
test bench_05000_naive       ... bench:       2,583 ns/iter (+/- 22)
test bench_06000_hyper       ... bench:         501 ns/iter (+/- 24)
test bench_06000_naive       ... bench:       3,096 ns/iter (+/- 79)
test bench_07000_hyper       ... bench:         595 ns/iter (+/- 47)
test bench_07000_naive       ... bench:       3,610 ns/iter (+/- 82)
test bench_08000_hyper       ... bench:         671 ns/iter (+/- 26)
test bench_08000_naive       ... bench:       4,124 ns/iter (+/- 429)
test bench_09000_hyper       ... bench:         760 ns/iter (+/- 82)
test bench_09000_naive       ... bench:       4,637 ns/iter (+/- 79)
test bench_10000_hyper       ... bench:         833 ns/iter (+/- 13)
test bench_10000_naive       ... bench:       5,150 ns/iter (+/- 101)
test bench_12000_hyper       ... bench:       1,000 ns/iter (+/- 28)
test bench_12000_naive       ... bench:       6,178 ns/iter (+/- 193)
test bench_14000_hyper       ... bench:       1,170 ns/iter (+/- 64)
test bench_14000_naive       ... bench:       7,205 ns/iter (+/- 146)
test bench_17000_hyper       ... bench:       1,431 ns/iter (+/- 80)
test bench_17000_naive       ... bench:       8,751 ns/iter (+/- 992)
test bench_21000_hyper       ... bench:       1,765 ns/iter (+/- 106)
test bench_21000_naive       ... bench:      10,802 ns/iter (+/- 519)
test bench_25000_hyper       ... bench:       2,091 ns/iter (+/- 60)
test bench_25000_naive       ... bench:      12,877 ns/iter (+/- 354)
test bench_30000_hyper       ... bench:       2,490 ns/iter (+/- 98)
test bench_30000_naive       ... bench:      15,466 ns/iter (+/- 327)
test bench_big_0100000_hyper ... bench:       8,449 ns/iter (+/- 230)
test bench_big_0100000_naive ... bench:      51,506 ns/iter (+/- 1,635)
test bench_big_1000000_hyper ... bench:      86,546 ns/iter (+/- 4,126)
test bench_big_1000000_naive ... bench:     514,333 ns/iter (+/- 12,723)

after w/ simd

test bench_00000_hyper       ... bench:           4 ns/iter (+/- 0)
test bench_00000_naive       ... bench:           2 ns/iter (+/- 0)
test bench_00010_hyper       ... bench:          10 ns/iter (+/- 1)
test bench_00010_naive       ... bench:           8 ns/iter (+/- 0)
test bench_00020_hyper       ... bench:          16 ns/iter (+/- 0)
test bench_00020_naive       ... bench:          13 ns/iter (+/- 0)
test bench_00030_hyper       ... bench:          21 ns/iter (+/- 1)
test bench_00030_naive       ... bench:          19 ns/iter (+/- 0)
test bench_00040_hyper       ... bench:          22 ns/iter (+/- 19)
test bench_00040_naive       ... bench:          23 ns/iter (+/- 1)
test bench_00050_hyper       ... bench:          19 ns/iter (+/- 1)
test bench_00050_naive       ... bench:          28 ns/iter (+/- 4)
test bench_00060_hyper       ... bench:          24 ns/iter (+/- 1)
test bench_00060_naive       ... bench:          34 ns/iter (+/- 2)
test bench_00070_hyper       ... bench:          23 ns/iter (+/- 5)
test bench_00070_naive       ... bench:          39 ns/iter (+/- 3)
test bench_00080_hyper       ... bench:          18 ns/iter (+/- 2)
test bench_00080_naive       ... bench:          48 ns/iter (+/- 16)
test bench_00090_hyper       ... bench:          24 ns/iter (+/- 2)
test bench_00090_naive       ... bench:          49 ns/iter (+/- 1)
test bench_00100_hyper       ... bench:          22 ns/iter (+/- 2)
test bench_00100_naive       ... bench:          54 ns/iter (+/- 2)
test bench_00120_hyper       ... bench:          24 ns/iter (+/- 0)
test bench_00120_naive       ... bench:          64 ns/iter (+/- 1)
test bench_00140_hyper       ... bench:          26 ns/iter (+/- 2)
test bench_00140_naive       ... bench:          75 ns/iter (+/- 4)
test bench_00170_hyper       ... bench:          26 ns/iter (+/- 0)
test bench_00170_naive       ... bench:          95 ns/iter (+/- 8)
test bench_00210_hyper       ... bench:          23 ns/iter (+/- 1)
test bench_00210_naive       ... bench:         121 ns/iter (+/- 2)
test bench_00250_hyper       ... bench:          29 ns/iter (+/- 0)
test bench_00250_naive       ... bench:         142 ns/iter (+/- 5)
test bench_00300_hyper       ... bench:          30 ns/iter (+/- 1)
test bench_00300_naive       ... bench:         170 ns/iter (+/- 5)
test bench_00400_hyper       ... bench:          25 ns/iter (+/- 0)
test bench_00400_naive       ... bench:         220 ns/iter (+/- 18)
test bench_00500_hyper       ... bench:          32 ns/iter (+/- 1)
test bench_00500_naive       ... bench:         274 ns/iter (+/- 33)
test bench_00600_hyper       ... bench:          35 ns/iter (+/- 0)
test bench_00600_naive       ... bench:         323 ns/iter (+/- 5)
test bench_00700_hyper       ... bench:          40 ns/iter (+/- 1)
test bench_00700_naive       ... bench:         398 ns/iter (+/- 65)
test bench_00800_hyper       ... bench:          34 ns/iter (+/- 3)
test bench_00800_naive       ... bench:         425 ns/iter (+/- 11)
test bench_00900_hyper       ... bench:          42 ns/iter (+/- 2)
test bench_00900_naive       ... bench:         479 ns/iter (+/- 10)
test bench_01000_hyper       ... bench:          46 ns/iter (+/- 4)
test bench_01000_naive       ... bench:         529 ns/iter (+/- 24)
test bench_01200_hyper       ... bench:          46 ns/iter (+/- 1)
test bench_01200_naive       ... bench:         634 ns/iter (+/- 21)
test bench_01400_hyper       ... bench:          62 ns/iter (+/- 6)
test bench_01400_naive       ... bench:         734 ns/iter (+/- 35)
test bench_01700_hyper       ... bench:          63 ns/iter (+/- 2)
test bench_01700_naive       ... bench:         890 ns/iter (+/- 26)
test bench_02100_hyper       ... bench:          74 ns/iter (+/- 3)
test bench_02100_naive       ... bench:       1,093 ns/iter (+/- 27)
test bench_02500_hyper       ... bench:          88 ns/iter (+/- 2)
test bench_02500_naive       ... bench:       1,302 ns/iter (+/- 49)
test bench_03000_hyper       ... bench:         104 ns/iter (+/- 13)
test bench_03000_naive       ... bench:       1,558 ns/iter (+/- 57)
test bench_04000_hyper       ... bench:         129 ns/iter (+/- 7)
test bench_04000_naive       ... bench:       2,069 ns/iter (+/- 103)
test bench_05000_hyper       ... bench:         187 ns/iter (+/- 13)
test bench_05000_naive       ... bench:       2,583 ns/iter (+/- 904)
test bench_06000_hyper       ... bench:         236 ns/iter (+/- 9)
test bench_06000_naive       ... bench:       3,099 ns/iter (+/- 88)
test bench_07000_hyper       ... bench:         253 ns/iter (+/- 6)
test bench_07000_naive       ... bench:       3,610 ns/iter (+/- 28)
test bench_08000_hyper       ... bench:         269 ns/iter (+/- 9)
test bench_08000_naive       ... bench:       4,124 ns/iter (+/- 147)
test bench_09000_hyper       ... bench:         319 ns/iter (+/- 33)
test bench_09000_naive       ... bench:       4,637 ns/iter (+/- 103)
test bench_10000_hyper       ... bench:         335 ns/iter (+/- 9)
test bench_10000_naive       ... bench:       5,151 ns/iter (+/- 40)
test bench_12000_hyper       ... bench:         398 ns/iter (+/- 16)
test bench_12000_naive       ... bench:       6,177 ns/iter (+/- 144)
test bench_14000_hyper       ... bench:         464 ns/iter (+/- 8)
test bench_14000_naive       ... bench:       7,205 ns/iter (+/- 186)
test bench_17000_hyper       ... bench:         572 ns/iter (+/- 44)
test bench_17000_naive       ... bench:       8,747 ns/iter (+/- 372)
test bench_21000_hyper       ... bench:         678 ns/iter (+/- 5)
test bench_21000_naive       ... bench:      10,831 ns/iter (+/- 562)
test bench_25000_hyper       ... bench:         806 ns/iter (+/- 81)
test bench_25000_naive       ... bench:      13,322 ns/iter (+/- 1,565)
test bench_30000_hyper       ... bench:         955 ns/iter (+/- 33)
test bench_30000_naive       ... bench:      15,425 ns/iter (+/- 1,549)
test bench_big_0100000_hyper ... bench:       3,322 ns/iter (+/- 154)
test bench_big_0100000_naive ... bench:      51,424 ns/iter (+/- 3,500)
test bench_big_1000000_hyper ... bench:      37,995 ns/iter (+/- 991)
test bench_big_1000000_naive ... bench:     514,194 ns/iter (+/- 17,335)

Collaborator Author

Veedrac commented Dec 4, 2016

Thanks for the timings. I see no regressions, and some nice wins.
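One way to check the "no regressions" reading is to compute before/after ratios for the sizes that appear in both runs. The sketch below uses four `hyper` rows from the "w/ simd" listings above; the function names and the choice of rows are illustrative assumptions.

```rust
/// Speedup of `after` relative to `before` (both in ns/iter); above 1.0 is a win.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// (haystack size, before ns/iter, after ns/iter), copied from the
/// `hyper` rows of the "before w/ simd" and "after w/ simd" listings.
const HYPER_SIMD: [(&str, f64, f64); 4] = [
    ("100", 56.0, 22.0),
    ("1000", 89.0, 46.0),
    ("10000", 372.0, 335.0),
    ("1000000", 38_291.0, 37_995.0),
];

/// Names of the rows where the "after" time is slower than the "before" time.
fn regressions(rows: &[(&str, f64, f64)]) -> Vec<String> {
    rows.iter()
        .filter(|row| speedup(row.1, row.2) < 1.0)
        .map(|row| row.0.to_string())
        .collect()
}
```

For these four rows `regressions(&HYPER_SIMD)` comes back empty, with the largest gains at the small and mid-sized inputs. Differences that fall inside the reported +/- range should be treated as noise rather than as wins.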

Owner

llogiq commented Dec 4, 2016

Yes, I'm also happy about the wins.

Development

Successfully merging this pull request may close these issues.

Use naive count below ~75 elements in haystack
