huff0: Use bmi1 on GOAMD64=v3 #519

Merged
merged 1 commit into from
Mar 9, 2022
klauspost
Owner

Go v1.18 feature. Set `GOAMD64=v3` to enable. Nothing worth having separate codepaths for.

Allows breaking dependency chain a bit.

```
benchmark                                              old ns/op     new ns/op     delta
BenchmarkDecompress4XNoTable/gettysburg/10000-32       11464         11195         -2.35%
BenchmarkDecompress4XNoTable/gettysburg/262143-32      322679        319985        -0.83%
BenchmarkDecompress4XNoTable/twain/10000-32            11505         11238         -2.32%
BenchmarkDecompress4XNoTable/twain/262143-32           373751        370410        -0.89%
BenchmarkDecompress4XNoTable/pngdata.001/10000-32      11957         11461         -4.15%
BenchmarkDecompress4XNoTable/pngdata.001/262143-32     306403        300566        -1.91%
```
@klauspost
Owner Author

klauspost commented Mar 8, 2022

@WojciechMula For tiny tweaks like this, we can use the conditional compilation added in v1.18

I doubt it will be measurable in the final output, so this is an easy way for people to squeeze out the last bit of performance.

I don't know if you've seen it, but setting `GOAMD64=v3` will enable "v3" features. See the list of what that includes here: https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

@klauspost klauspost merged commit 0ff8ec1 into master Mar 9, 2022
@klauspost klauspost deleted the huff0-use-bmi branch March 9, 2022 08:58
@WojciechMula
Contributor

> @WojciechMula For tiny tweaks like this, we can use the conditional compilation added in v1.18
>
> I doubt it will be measurable in the final output, so this is an easy way for people to squeeze out the last bit of performance.
>
> I don't know if you've seen it, but setting `GOAMD64=v3` will enable "v3" features. See the list of what that includes here: https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

I didn't know about that feature, thanks for the info. However, I lean towards runtime dispatching, especially since in Go we don't have to deal with multiple calling conventions (in C/C++ that's a pain in the neck). Additionally, my tests showed that calling through a function pointer doesn't affect performance; the branch predictor handles it well.
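For readers unfamiliar with the runtime-dispatch pattern mentioned above, here is a minimal sketch of it in Go. It is not code from this PR: `andnGeneric`, `andnFast`, `detectBMI1` and the `andn` variable are hypothetical names, and the "fast" variant is plain Go standing in for an assembly routine so that the example stays runnable anywhere.

```go
package main

import "fmt"

// andnGeneric computes ^a & b, which is what the BMI1 ANDN instruction
// produces in a single operation.
func andnGeneric(a, b uint64) uint64 {
	return ^a & b
}

// andnFast is a hypothetical stand-in for an assembly routine that would
// use BMI1. It is written in Go here purely for illustration.
func andnFast(a, b uint64) uint64 {
	return b &^ a
}

// detectBMI1 is a hypothetical placeholder for CPU feature detection.
// Real code would query a feature-detection package instead
// (for example cpu.X86.HasBMI1 from golang.org/x/sys/cpu).
func detectBMI1() bool {
	return false
}

// andn is the function pointer that callers go through.
// It defaults to the portable implementation.
var andn = andnGeneric

func init() {
	// Pick the implementation once, at program start.
	if detectBMI1() {
		andn = andnFast
	}
}

func main() {
	fmt.Println(andn(0b1100, 0b1010)) // prints 2
}
```

The cost of this approach is that every variant has to be generated and shipped in the binary, whereas `GOAMD64=v3` selects a single variant at build time.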

@klauspost
Owner Author

@WojciechMula Yeah, see s2 compression if you want to see that.

However, it is also a tradeoff. bmi is a good example in the s2 code. There are tiny improvements to be had with bmi, but it would double the amount of output assembler from 16k LOC to 32k. Given that the difference is so small I just went for the conditional compilation.

I don't know if you looked into it, but I feel like using avo has significant productivity improvements compared to text templates. I generally feel that it makes your code easier to maintain and refactor, since it allows dynamic registers. But if you aren't comfortable with it, feel free to submit your suggestions as-is, and I will look into converting it.

@WojciechMula
Contributor

> I don't know if you looked into it, but I feel like using avo has significant productivity improvements compared to text templates. I generally feel that it makes your code easier to maintain and refactor, since it allows dynamic registers. But if you aren't comfortable with it, feel free to submit your suggestions as-is, and I will look into converting it.

I've heard about avo, but never used it. Since I already have text templates, I'd like to commit them as-is, and once everything is stable, I'll convert them to avo. It shouldn't be your job. :)

Speaking of code versions: I'll go with conditional compilation.
