inflate: Limit variable shifts #274

Merged: klauspost merged 2 commits into master from faster-inflate-shifts on Aug 18, 2020.

klauspost (Owner)

Use AND operations to limit the shift count and speed up variable shifts.

Faster on AMD64:

```
benchmark                               old ns/op     new ns/op     delta
BenchmarkDecodeDigitsSpeed1e4-32        57027         56892         -0.24%
BenchmarkDecodeDigitsSpeed1e5-32        657866        650408        -1.13%
BenchmarkDecodeDigitsSpeed1e6-32        6679774       6425893       -3.80%
BenchmarkDecodeDigitsDefault1e4-32      62810         61858         -1.52%
BenchmarkDecodeDigitsDefault1e5-32      657865        628677        -4.44%
BenchmarkDecodeDigitsDefault1e6-32      6486343       6211232       -4.24%
BenchmarkDecodeDigitsCompress1e4-32     62169         61555         -0.99%
BenchmarkDecodeDigitsCompress1e5-32     677789        668714        -1.34%
BenchmarkDecodeDigitsCompress1e6-32     6851431       6685226       -2.43%
BenchmarkDecodeTwainSpeed1e4-32         60606         59003         -2.64%
BenchmarkDecodeTwainSpeed1e5-32         628151        609357        -2.99%
BenchmarkDecodeTwainSpeed1e6-32         6238098       6015035       -3.58%
BenchmarkDecodeTwainDefault1e4-32       59901         59167         -1.23%
BenchmarkDecodeTwainDefault1e5-32       576772        561311        -2.68%
BenchmarkDecodeTwainDefault1e6-32       5701418       5479259       -3.90%
BenchmarkDecodeTwainCompress1e4-32      58582         56825         -3.00%
BenchmarkDecodeTwainCompress1e5-32      535572        515826        -3.69%
BenchmarkDecodeTwainCompress1e6-32      5265486       5090632       -3.32%
BenchmarkDecodeRandomSpeed1e4-32        323           319           -1.24%
BenchmarkDecodeRandomSpeed1e5-32        1954          1945          -0.46%
BenchmarkDecodeRandomSpeed1e6-32        20016         20026         +0.05%

benchmark                               old MB/s     new MB/s     speedup
BenchmarkDecodeDigitsSpeed1e4-32        175.35       175.77       1.00x
BenchmarkDecodeDigitsSpeed1e5-32        152.01       153.75       1.01x
BenchmarkDecodeDigitsSpeed1e6-32        149.71       155.62       1.04x
BenchmarkDecodeDigitsDefault1e4-32      159.21       161.66       1.02x
BenchmarkDecodeDigitsDefault1e5-32      152.01       159.06       1.05x
BenchmarkDecodeDigitsDefault1e6-32      154.17       161.00       1.04x
BenchmarkDecodeDigitsCompress1e4-32     160.85       162.46       1.01x
BenchmarkDecodeDigitsCompress1e5-32     147.54       149.54       1.01x
BenchmarkDecodeDigitsCompress1e6-32     145.95       149.58       1.02x
BenchmarkDecodeTwainSpeed1e4-32         165.00       169.48       1.03x
BenchmarkDecodeTwainSpeed1e5-32         159.20       164.11       1.03x
BenchmarkDecodeTwainSpeed1e6-32         160.31       166.25       1.04x
BenchmarkDecodeTwainDefault1e4-32       166.94       169.01       1.01x
BenchmarkDecodeTwainDefault1e5-32       173.38       178.15       1.03x
BenchmarkDecodeTwainDefault1e6-32       175.39       182.51       1.04x
BenchmarkDecodeTwainCompress1e4-32      170.70       175.98       1.03x
BenchmarkDecodeTwainCompress1e5-32      186.72       193.86       1.04x
BenchmarkDecodeTwainCompress1e6-32      189.92       196.44       1.03x
BenchmarkDecodeRandomSpeed1e4-32        30915.66     31375.28     1.01x
BenchmarkDecodeRandomSpeed1e5-32        51177.19     51408.19     1.00x
BenchmarkDecodeRandomSpeed1e6-32        49958.99     49936.11     1.00x
```
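
As a rough illustration of the idea in the description (not the code from this PR's diff), the sketch below compares a plain variable shift with one whose count is limited by an AND. Go defines a shift by a count greater than or equal to the operand width as yielding 0, so on AMD64 the compiler has to emit extra instructions around a plain variable shift; masking the count with `& 31` for a `uint32` matches what the hardware shift does anyway and lets that handling be dropped. The names `shiftPlain` and `shiftMasked` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// shiftPlain uses Go's default shift semantics: a count >= 32 yields 0,
// which the compiler must guarantee with extra instructions on AMD64.
func shiftPlain(b uint32, n uint) uint32 {
	return b >> n
}

// shiftMasked limits the count to 0..31 with an AND.
// Assumption: the caller already guarantees n < 32, so the mask
// does not change the result; it only tells the compiler the range.
func shiftMasked(b uint32, n uint) uint32 {
	return b >> (n & 31)
}

func main() {
	b := uint32(0xF0F0)
	// For every count below 32 the two forms agree.
	for _, n := range []uint{0, 4, 15, 31} {
		fmt.Println(n, shiftPlain(b, n) == shiftMasked(b, n))
	}
	// For counts of 32 or more they differ: the plain shift gives 0,
	// the masked shift wraps the count around (32 & 31 == 0).
	fmt.Println(shiftPlain(b, 32), shiftMasked(b, 32))
}
```

The two forms are interchangeable only when the count is known to stay below the operand width, which is why the last line of `main` shows different results for a count of 32.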
klauspost (Owner, Author)

Could be refined, but better for now.

klauspost merged commit f5ee0f4 into master on Aug 18, 2020.
klauspost deleted the faster-inflate-shifts branch on Aug 18, 2020, 09:55.