runtime: improve performance of IndexByte on older processors #14059

randall77 opened this issue Jan 21, 2016 · 1 comment

commented Jan 21, 2016

Pre-AVX2 processors use a loop of SSE operations to implement IndexByte, and that loop uses unaligned loads (MOVOU). Aligning the loads might give a significant performance win. See the comments at the end of
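The issue does not say how the loads would be aligned. One common approach is to check bytes one at a time until the pointer reaches a 16-byte boundary, compare aligned 16-byte blocks, and then finish the tail. The sketch below models only that control flow in scalar Go and is not the runtime's SSE assembly. The names indexByteAligned, blockMask and vecSize are hypothetical, and blockMask stands in for the PCMPEQB/PMOVMSKB pair an SSE loop would apply to each block.

```go
package main

import (
	"bytes"
	"fmt"
	"math/bits"
	"unsafe"
)

const vecSize = 16 // bytes covered by one 128-bit SSE load (MOVO/MOVOU)

// blockMask models PCMPEQB followed by PMOVMSKB on one 16-byte block:
// bit i of the result is set when block[i] == c.
func blockMask(block []byte, c byte) uint32 {
	var m uint32
	for i, b := range block[:vecSize] {
		if b == c {
			m |= 1 << uint(i)
		}
	}
	return m
}

// indexByteAligned is a hypothetical scalar model of an IndexByte loop whose
// 16-byte block loads all start on a 16-byte boundary: it checks the
// unaligned head byte by byte, scans aligned blocks, then finishes the tail.
func indexByteAligned(s []byte, c byte) int {
	if len(s) == 0 {
		return -1
	}
	addr := uintptr(unsafe.Pointer(&s[0]))
	head := int((vecSize - addr%vecSize) % vecSize) // bytes before the first boundary
	if head > len(s) {
		head = len(s)
	}
	i := 0
	for ; i < head; i++ {
		if s[i] == c {
			return i
		}
	}
	for ; i+vecSize <= len(s); i += vecSize { // each block now starts 16-byte aligned
		if m := blockMask(s[i:], c); m != 0 {
			return i + bits.TrailingZeros32(m)
		}
	}
	for ; i < len(s); i++ {
		if s[i] == c {
			return i
		}
	}
	return -1
}

func main() {
	buf := make([]byte, 100)
	buf[77] = 'x'
	// Shifting the slice start by 0..15 bytes exercises every head length.
	for off := 0; off < vecSize; off++ {
		s := buf[off:]
		fmt.Println(off, indexByteAligned(s, 'x'), bytes.IndexByte(s, 'x'))
	}
}
```

main compares the model against bytes.IndexByte for every starting misalignment from 0 through 15, so each printed line should show the same index twice.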

randall77 added this to the Go1.7 milestone Jan 21, 2016
bradfitz changed the title from "Improve performance of IndexByte on older processors" to "runtime: improve performance of IndexByte on older processors" Jan 21, 2016

commented Jan 25, 2016

So how expensive is MOVOU compared to MOVO when everything is aligned?

The slowdown appears only on pre-Nehalem processors; newer ones are not affected.
Experimental data confirms this: a Xeon X5450 shows some slowdown, but a Xeon E5630 (Nehalem) and newer processors do not:

name            old speed      new speed      delta
IndexByte32-8   3.40GB/s ± 1%  3.11GB/s ± 3%  -8.34%  (p=0.000 n=17+18)
IndexByte4K-8   11.2GB/s ± 4%  11.1GB/s ± 4%  -0.75%  (p=0.033 n=20+20)
IndexByte4M-8   11.7GB/s ± 4%  11.5GB/s ± 6%  -1.82%  (p=0.016 n=18+20)
IndexByte64M-8  11.7GB/s ± 5%  11.5GB/s ± 4%  -1.47%  (p=0.008 n=16+17)
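To get a rough idea of how sensitive IndexByte is to input alignment on a particular machine, one approach is to time bytes.IndexByte on a slice that starts on a 16-byte boundary and on one that starts a byte past it. This sketch assumes that testing.Benchmark can be called from a plain main program without any testing flags. The helper benchIndexByte and the chosen sizes are illustrative only. Results depend on the CPU and on which code path (SSE or AVX2) the installed Go version's assembly selects, so no particular output is implied.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
	"unsafe"
)

var sink int // keeps the compiler from discarding IndexByte results

// benchIndexByte (hypothetical helper) times bytes.IndexByte over an n-byte
// slice that starts off bytes past a 16-byte boundary. The searched byte is
// absent from the zeroed buffer, so every call scans the whole slice.
func benchIndexByte(n, off int) testing.BenchmarkResult {
	buf := make([]byte, n+32)
	base := (16 - int(uintptr(unsafe.Pointer(&buf[0]))%16)) % 16
	s := buf[base+off : base+off+n]
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(n))
		for i := 0; i < b.N; i++ {
			sink += bytes.IndexByte(s, 'x')
		}
	})
}

func main() {
	for _, n := range []int{32, 4 << 10, 4 << 20} {
		for _, off := range []int{0, 1} {
			r := benchIndexByte(n, off)
			gbps := float64(r.Bytes) * float64(r.N) / r.T.Seconds() / 1e9
			fmt.Printf("n=%-8d off=%d  %6.2f GB/s\n", n, off, gbps)
		}
	}
}
```

For statistically meaningful comparisons like the table above, it is better to run repeated `go test -bench` runs and compare them with benchstat, rather than relying on a single measurement per configuration.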

randall77 modified the milestones: Go1.8, Go1.7 Apr 29, 2016
quentinmit added the NeedsFix label Oct 11, 2016
rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
randall77 modified the milestones: Go1.10, Go1.9 May 31, 2017
rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018
5 participants