-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes: Compare should use runtime.cmpstring or assembly #5354
Labels
Comments
I have no idea why, but I was in the mood to try implementing this in Assembly (amd64). I'm pretty sure it's far from perfect since I have only written very few Intel 80386 assembly before. It's more or less a direct port of the Go code. Using 64-bit MMX compare or SSE 4.2 instructions [1] should result in much better performance. If someone has a use for it, do what ever you like with it. The Benchmarks (ported from string compare) are also attached. 1: http://www.strchr.com/strcmp_and_strlen_using_sse_4.2 Performance: OLD BenchmarkCompareBytesEqual 50000000 40.8 ns/op BenchmarkCompareBytesIdentical 50000000 40.1 ns/op BenchmarkCompareBytesSameLength 100000000 17.4 ns/op BenchmarkCompareBytesDifferentLength 100000000 17.3 ns/op BenchmarkCompareBytesBigUnaligned 2000 1255571 ns/op 835.15 MB/s BenchmarkCompareBytesBig 2000 1244571 ns/op 842.53 MB/s NEW BenchmarkCompareBytesEqual 100000000 17.5 ns/op BenchmarkCompareBytesIdentical 500000000 4.13 ns/op BenchmarkCompareBytesSameLength 200000000 9.52 ns/op BenchmarkCompareBytesDifferentLength 200000000 8.85 ns/op BenchmarkCompareBytesBigUnaligned 2000 927053 ns/op 1131.10 MB/s BenchmarkCompareBytesBig 2000 927553 ns/op 1130.49 MB/s Attachments:
|
Another try. It has also a bug fixed where the result was 0 if the array pointer was identical but not the length. OLD BenchmarkCompareBytesEqual 50000000 40.3 ns/op BenchmarkCompareBytesToNil 500000000 7.65 ns/op BenchmarkCompareBytesEmpty 200000000 8.23 ns/op BenchmarkCompareBytesIdentical 50000000 39.9 ns/op BenchmarkCompareBytesSameLength 100000000 17.4 ns/op BenchmarkCompareBytesDifferentLength 100000000 17.0 ns/op BenchmarkCompareBytesBigUnaligned 2000 1238070 ns/op 846.95 MB/s BenchmarkCompareBytesBig 2000 1241571 ns/op 844.56 MB/s BenchmarkCompareBytesBigIdentical 2000 1232570 ns/op NEW BenchmarkCompareBytesEqual 100000000 16.4 ns/op BenchmarkCompareBytesToNil 500000000 4.99 ns/op BenchmarkCompareBytesEmpty 500000000 4.69 ns/op BenchmarkCompareBytesIdentical 500000000 4.69 ns/op BenchmarkCompareBytesSameLength 200000000 8.54 ns/op BenchmarkCompareBytesDifferentLength 200000000 9.11 ns/op BenchmarkCompareBytesBigUnaligned 5000 655437 ns/op 1599.83 MB/s BenchmarkCompareBytesBig 5000 642836 ns/op 1631.19 MB/s BenchmarkCompareBytesBigIdentical 500000000 4.69 ns/op And BenchmarkCompareBytesBig for different lengths: OLD len = 1 50000000 39.8 ns/op 351.39 MB/s len = 2 50000000 39.8 ns/op 352.09 MB/s len = 4 50000000 39.7 ns/op 352.45 MB/s len = 8 50000000 39.8 ns/op 351.92 MB/s len = 16 50000000 57.5 ns/op 487.27 MB/s len = 32 50000000 74.0 ns/op 567.69 MB/s len = 64 20000000 106 ns/op 658.48 MB/s len = 128 10000000 188 ns/op 743.45 MB/s len = 256 5000000 335 ns/op 792.09 MB/s len = 512 5000000 632 ns/op 819.31 MB/s len = 1024 1000000 1241 ns/op 834.76 MB/s len = 2048 1000000 2435 ns/op 845.13 MB/s len = 4096 500000 4840 ns/op 847.47 MB/s len = 8192 200000 9660 ns/op 849.23 MB/s len = 16384 100000 19221 ns/op 852.92 MB/s len = 32768 50000 38482 ns/op 851.67 MB/s len = 65536 20000 76704 ns/op 854.55 MB/s len = 131072 10000 153808 ns/op 852.24 MB/s len = 262144 5000 307017 ns/op 853.86 MB/s len = 524288 5000 615835 ns/op 851.36 MB/s len = 1048576 2000 1231070 ns/op 851.77 MB/s NEW len = 1 100000000 16.5 ns/op 846.38 MB/s len = 2 100000000 16.4 ns/op 852.57 MB/s len = 4 100000000 16.4 ns/op 852.57 MB/s len = 8 100000000 16.5 ns/op 848.44 MB/s len = 16 100000000 28.7 ns/op 975.89 MB/s len = 32 50000000 46.9 ns/op 895.85 MB/s len = 64 50000000 63.5 ns/op 1102.99 MB/s len = 128 20000000 105 ns/op 1330.09 MB/s len = 256 10000000 179 ns/op 1481.81 MB/s len = 512 5000000 377 ns/op 1373.93 MB/s len = 1024 5000000 731 ns/op 1415.61 MB/s len = 2048 1000000 1394 ns/op 1476.24 MB/s len = 4096 1000000 2594 ns/op 1581.25 MB/s len = 8192 500000 4998 ns/op 1641.36 MB/s len = 16384 200000 11115 ns/op 1474.86 MB/s len = 32768 100000 19941 ns/op 1643.54 MB/s len = 65536 50000 39982 ns/op 1639.43 MB/s len = 131072 20000 82104 ns/op 1596.52 MB/s len = 262144 10000 162009 ns/op 1618.12 MB/s len = 524288 5000 321418 ns/op 1631.21 MB/s len = 1048576 5000 640836 ns/op 1636.28 MB/s Okay, I stop abusing this issue report now ;) I hope this helps as a basis for a proper CL. I see no reason why the same function couldn't be also used to improve string comparison. Attachments:
|
Julien, I was in the mood also, so I hacked up a patch using SSE to do 16 bytes at a time https://golang.org/cl/8853048 Similar to the memeq implementation I did. It's pretty fast. Still needs more testing and 386/arm implementations. As an aside, it would be nice to have an easy way of writing a single function in assembly for one arch and c (or go) for another arch. Right now it is an all-arch-or-no-arch conversion from c to assembly. I think you could hack it up with +build tags but it would require a lot of files. Maybe per-function +build pragmas? |
Owner changed to @randall77. |
This issue was closed by revision b3946dc. Status changed to Fixed. |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: