Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/bits: Reverse8, Reverse16 can use table lookup #19279

Closed
rsc opened this issue Feb 24, 2017 · 4 comments
Closed

math/bits: Reverse8, Reverse16 can use table lookup #19279

rsc opened this issue Feb 24, 2017 · 4 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Feb 24, 2017

bits.Reverse8 can use table lookup - the table is in compress/flate as reverseByte.

bits.Reverse16 can do reverseByte[(x>>16)&0xFF] | reverseByte[x&0xFF]<<16.

I experimented with these last week and they came out significantly faster than the current shifty implementations. Using table vs the current shifts for Reverse32 was almost exactly the same and probably not worth changing.

Probably worth doing, but I don't have time to do proper benchmarking at the moment.

/cc @griesemer

@rsc rsc added this to the Go1.9 milestone Feb 24, 2017
@griesemer
Copy link
Contributor

griesemer commented Feb 24, 2017

@rsc I have experimented with this as well. It's true that the table lookups are quite a bit faster if measured alone. (I even have the generator code to create the tables for reverse, population count, and leading/trailing zeroes for 8 bit values.) But if the table is not in cache, I bet it's quite a bit slower. (On the other hand, if the table is not in cache, the function is probably not hot and thus it may not matter).

@griesemer griesemer self-assigned this Feb 24, 2017
@gopherbot
Copy link

gopherbot commented Feb 25, 2017

CL https://golang.org/cl/37459 mentions this issue.

@gopherbot
Copy link

gopherbot commented Feb 25, 2017

CL https://golang.org/cl/37460 mentions this issue.

@rasky
Copy link
Member

rasky commented Feb 25, 2017

FWIW, the same applies to OnesCount8 and OnesCount16.

gopherbot pushed a commit that referenced this issue Mar 1, 2017
…tion

No measurable impact on performance (specifically, no degradation).
Reverse is used in Huffman en/de-coding. For completeness, here are
all the speed-related benchmark results:

name                             old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-8         181µs ± 0%     178µs ± 1%   ~             (p=0.100 n=3+3)
Decode/Digits/Huffman/1e5-8        1.60ms ± 3%    1.56ms ± 3%   ~             (p=0.400 n=3+3)
Decode/Digits/Huffman/1e6-8        15.7ms ± 1%    15.3ms ± 3%   ~             (p=0.700 n=3+3)
Decode/Digits/Speed/1e4-8           179µs ± 0%     180µs ± 0%   ~             (p=0.200 n=3+3)
Decode/Digits/Speed/1e5-8          1.68ms ± 0%    1.66ms ± 3%   ~             (p=0.700 n=3+3)
Decode/Digits/Speed/1e6-8          16.6ms ± 2%    16.6ms ± 5%   ~             (p=0.700 n=3+3)
Decode/Digits/Default/1e4-8         179µs ± 1%     178µs ± 1%   ~             (p=0.700 n=3+3)
Decode/Digits/Default/1e5-8        1.62ms ± 3%    1.62ms ± 4%   ~             (p=1.000 n=3+3)
Decode/Digits/Default/1e6-8        16.0ms ± 2%    16.0ms ± 3%   ~             (p=1.000 n=3+3)
Decode/Digits/Compression/1e4-8     179µs ± 1%     179µs ± 0%   ~             (p=0.200 n=3+3)
Decode/Digits/Compression/1e5-8    1.62ms ± 2%    1.62ms ± 3%   ~             (p=1.000 n=3+3)
Decode/Digits/Compression/1e6-8    16.1ms ± 3%    16.0ms ± 3%   ~             (p=1.000 n=3+3)
Decode/Twain/Huffman/1e4-8          205µs ± 2%     207µs ± 1%   ~             (p=1.000 n=3+3)
Decode/Twain/Huffman/1e5-8         1.77ms ± 2%    1.77ms ± 4%   ~             (p=0.700 n=3+3)
Decode/Twain/Huffman/1e6-8         17.4ms ± 2%    17.4ms ± 3%   ~             (p=1.000 n=3+3)
Decode/Twain/Speed/1e4-8            186µs ± 1%     186µs ± 1%   ~             (p=0.400 n=3+3)
Decode/Twain/Speed/1e5-8           1.53ms ± 2%    1.52ms ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Speed/1e6-8           14.9ms ± 1%    14.8ms ± 1%   ~             (p=1.000 n=3+3)
Decode/Twain/Default/1e4-8          176µs ± 1%     174µs ± 0%   ~             (p=0.200 n=3+3)
Decode/Twain/Default/1e5-8         1.30ms ± 2%    1.31ms ± 1%   ~             (p=0.700 n=3+3)
Decode/Twain/Default/1e6-8         12.6ms ± 3%    12.5ms ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Compression/1e4-8      177µs ± 0%     174µs ± 1%   ~             (p=0.100 n=3+3)
Decode/Twain/Compression/1e5-8     1.30ms ± 1%    1.31ms ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Compression/1e6-8     12.5ms ± 1%    12.5ms ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Huffman/1e4-8        47.4µs ± 1%    46.5µs ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Huffman/1e5-8         453µs ± 2%     446µs ± 1%   ~             (p=0.700 n=3+3)
Encode/Digits/Huffman/1e6-8        4.44ms ± 3%    4.39ms ± 0%   ~             (p=1.000 n=3+3)
Encode/Digits/Speed/1e4-8           190µs ± 4%     185µs ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Speed/1e5-8          1.78ms ± 5%    1.75ms ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Speed/1e6-8          17.9ms ± 7%    17.3ms ± 1%   ~             (p=0.400 n=3+3)
Encode/Digits/Default/1e4-8         366µs ± 1%     361µs ± 0%   ~             (p=0.200 n=3+3)
Encode/Digits/Default/1e5-8        5.58ms ± 5%    5.44ms ± 1%   ~             (p=0.400 n=3+3)
Encode/Digits/Default/1e6-8        59.0ms ± 3%    58.2ms ± 1%   ~             (p=0.700 n=3+3)
Encode/Digits/Compression/1e4-8     369µs ± 3%     362µs ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Compression/1e5-8    5.50ms ± 2%    5.47ms ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Compression/1e6-8    59.4ms ± 2%    58.5ms ± 1%   ~             (p=0.400 n=3+3)
Encode/Twain/Huffman/1e4-8         64.4µs ± 3%    64.7µs ± 1%   ~             (p=0.700 n=3+3)
Encode/Twain/Huffman/1e5-8          526µs ± 1%     526µs ± 2%   ~             (p=1.000 n=3+3)
Encode/Twain/Huffman/1e6-8         5.18ms ± 2%    5.17ms ± 1%   ~             (p=0.700 n=3+3)
Encode/Twain/Speed/1e4-8            206µs ± 1%     204µs ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Speed/1e5-8           1.73ms ± 2%    1.70ms ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Speed/1e6-8           16.7ms ± 0%    16.7ms ± 1%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e4-8          423µs ± 3%     418µs ± 1%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e5-8         6.34ms ± 4%    6.23ms ± 0%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e6-8         68.0ms ± 3%    67.5ms ± 0%   ~             (p=0.700 n=3+3)
Encode/Twain/Compression/1e4-8      435µs ± 3%     424µs ± 0%   ~             (p=0.700 n=3+3)
Encode/Twain/Compression/1e5-8     7.01ms ± 1%    6.92ms ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Compression/1e6-8     77.1ms ± 4%    75.5ms ± 1%   ~             (p=0.400 n=3+3)

name                             old speed      new speed      delta
Decode/Digits/Huffman/1e4-8      55.2MB/s ± 0%  56.2MB/s ± 1%   ~             (p=0.100 n=3+3)
Decode/Digits/Huffman/1e5-8      62.4MB/s ± 3%  64.1MB/s ± 3%   ~             (p=0.400 n=3+3)
Decode/Digits/Huffman/1e6-8      63.8MB/s ± 1%  65.3MB/s ± 3%   ~             (p=0.700 n=3+3)
Decode/Digits/Speed/1e4-8        55.8MB/s ± 0%  55.4MB/s ± 0%   ~             (p=0.200 n=3+3)
Decode/Digits/Speed/1e5-8        59.6MB/s ± 0%  60.3MB/s ± 3%   ~             (p=0.700 n=3+3)
Decode/Digits/Speed/1e6-8        60.1MB/s ± 2%  60.3MB/s ± 4%   ~             (p=0.700 n=3+3)
Decode/Digits/Default/1e4-8      55.8MB/s ± 1%  56.1MB/s ± 1%   ~             (p=0.700 n=3+3)
Decode/Digits/Default/1e5-8      61.8MB/s ± 3%  61.7MB/s ± 4%   ~             (p=1.000 n=3+3)
Decode/Digits/Default/1e6-8      62.4MB/s ± 2%  62.4MB/s ± 3%   ~             (p=1.000 n=3+3)
Decode/Digits/Compression/1e4-8  55.7MB/s ± 1%  56.0MB/s ± 0%   ~             (p=0.300 n=3+3)
Decode/Digits/Compression/1e5-8  61.7MB/s ± 2%  61.9MB/s ± 3%   ~             (p=1.000 n=3+3)
Decode/Digits/Compression/1e6-8  62.2MB/s ± 3%  62.6MB/s ± 3%   ~             (p=1.000 n=3+3)
Decode/Twain/Huffman/1e4-8       48.8MB/s ± 2%  48.4MB/s ± 1%   ~             (p=1.000 n=3+3)
Decode/Twain/Huffman/1e5-8       56.4MB/s ± 2%  56.6MB/s ± 4%   ~             (p=0.700 n=3+3)
Decode/Twain/Huffman/1e6-8       57.6MB/s ± 2%  57.5MB/s ± 3%   ~             (p=1.000 n=3+3)
Decode/Twain/Speed/1e4-8         53.7MB/s ± 1%  53.9MB/s ± 1%   ~             (p=0.400 n=3+3)
Decode/Twain/Speed/1e5-8         65.5MB/s ± 2%  65.6MB/s ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Speed/1e6-8         66.9MB/s ± 1%  67.4MB/s ± 1%   ~             (p=1.000 n=3+3)
Decode/Twain/Default/1e4-8       56.9MB/s ± 1%  57.3MB/s ± 0%   ~             (p=0.200 n=3+3)
Decode/Twain/Default/1e5-8       77.2MB/s ± 2%  76.6MB/s ± 1%   ~             (p=0.700 n=3+3)
Decode/Twain/Default/1e6-8       79.3MB/s ± 3%  80.0MB/s ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Compression/1e4-8   56.4MB/s ± 0%  57.5MB/s ± 1%   ~             (p=0.100 n=3+3)
Decode/Twain/Compression/1e5-8   76.8MB/s ± 1%  76.5MB/s ± 0%   ~             (p=0.700 n=3+3)
Decode/Twain/Compression/1e6-8   80.1MB/s ± 1%  79.8MB/s ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Huffman/1e4-8       211MB/s ± 1%   215MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Huffman/1e5-8       221MB/s ± 2%   224MB/s ± 1%   ~             (p=0.700 n=3+3)
Encode/Digits/Huffman/1e6-8       225MB/s ± 3%   228MB/s ± 0%   ~             (p=1.000 n=3+3)
Encode/Digits/Speed/1e4-8        52.8MB/s ± 4%  54.1MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Speed/1e5-8        56.2MB/s ± 5%  57.0MB/s ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Speed/1e6-8        56.0MB/s ± 6%  57.7MB/s ± 1%   ~             (p=0.400 n=3+3)
Encode/Digits/Default/1e4-8      27.3MB/s ± 1%  27.7MB/s ± 0%   ~             (p=0.200 n=3+3)
Encode/Digits/Default/1e5-8      17.9MB/s ± 4%  18.4MB/s ± 1%   ~             (p=0.400 n=3+3)
Encode/Digits/Default/1e6-8      17.0MB/s ± 3%  17.2MB/s ± 1%   ~             (p=0.500 n=3+3)
Encode/Digits/Compression/1e4-8  27.1MB/s ± 3%  27.6MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Digits/Compression/1e5-8  18.2MB/s ± 2%  18.3MB/s ± 1%   ~             (p=1.000 n=3+3)
Encode/Digits/Compression/1e6-8  16.9MB/s ± 2%  17.1MB/s ± 1%   ~             (p=0.400 n=3+3)
Encode/Twain/Huffman/1e4-8        155MB/s ± 3%   155MB/s ± 1%   ~             (p=0.700 n=3+3)
Encode/Twain/Huffman/1e5-8        190MB/s ± 1%   190MB/s ± 2%   ~             (p=1.000 n=3+3)
Encode/Twain/Huffman/1e6-8        193MB/s ± 2%   193MB/s ± 1%   ~             (p=0.700 n=3+3)
Encode/Twain/Speed/1e4-8         48.5MB/s ± 1%  49.1MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Speed/1e5-8         57.7MB/s ± 2%  59.0MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Speed/1e6-8         59.7MB/s ± 0%  59.7MB/s ± 1%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e4-8       23.6MB/s ± 3%  23.9MB/s ± 1%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e5-8       15.8MB/s ± 4%  16.1MB/s ± 0%   ~             (p=1.000 n=3+3)
Encode/Twain/Default/1e6-8       14.7MB/s ± 3%  14.8MB/s ± 0%   ~             (p=0.700 n=3+3)
Encode/Twain/Compression/1e4-8   23.0MB/s ± 3%  23.6MB/s ± 0%   ~             (p=0.700 n=3+3)
Encode/Twain/Compression/1e5-8   14.3MB/s ± 1%  14.5MB/s ± 0%   ~             (p=0.100 n=3+3)
Encode/Twain/Compression/1e6-8   13.0MB/s ± 4%  13.2MB/s ± 1%   ~             (p=0.400 n=3+3)

Measured on a "quiet" (no browser running) 2.3 GHz Intel Core i7, running macOS 10.12.3.

See also #19279.

Change-Id: Ice759eb34eb37442b543957447c264e0aadc1fa9
Reviewed-on: https://go-review.googlesource.com/37460
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Feb 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants