@peterdemartini
Contributor

Sorting algorithms and implementations often perform differently depending on the cardinality or pre-existing order of the data. I added benchmarks that take partially sorted data as input. The results show that the standard library sort is much faster than the asm version on partially sorted input, while the asm version remains faster on random input.
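For readers who want to reproduce this kind of input, here is a minimal sketch of one way to build partially sorted data and benchmark the standard library against it. It is not the code from this PR: `partiallySorted`, `countRuns`, and `benchmarkStdlibSort` are hypothetical helpers, and treating the number in parentheses as a count of pre-sorted runs is an assumption, not something the benchmark names confirm. It also uses `sort.Slice` as a stand-in for whatever the stdlib benchmark actually calls.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"testing"
)

// partiallySorted returns n pseudo-random values arranged as at most
// `runs` contiguous ascending runs. Hypothetical helper: the PR's own
// generator may shape its input differently.
func partiallySorted(n, runs int, seed int64) []uint64 {
	prng := rand.New(rand.NewSource(seed))
	data := make([]uint64, n)
	for i := range data {
		data[i] = prng.Uint64()
	}
	if runs < 1 {
		runs = 1
	}
	size := (n + runs - 1) / runs // chunk length, rounded up
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		chunk := data[start:end]
		sort.Slice(chunk, func(i, j int) bool { return chunk[i] < chunk[j] })
	}
	return data
}

// countRuns reports how many maximal non-decreasing runs data contains.
func countRuns(data []uint64) int {
	if len(data) == 0 {
		return 0
	}
	runs := 1
	for i := 1; i < len(data); i++ {
		if data[i] < data[i-1] {
			runs++
		}
	}
	return runs
}

// benchmarkStdlibSort sorts a fresh copy of input on every iteration.
// The copy happens with the timer stopped, so only the sort is measured
// and later iterations do not see already-sorted data.
func benchmarkStdlibSort(b *testing.B, input []uint64) {
	scratch := make([]uint64, len(input))
	b.SetBytes(int64(8 * len(input))) // 8 bytes per uint64, enables MB/s
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		b.StopTimer()
		copy(scratch, input)
		b.StartTimer()
		sort.Slice(scratch, func(i, j int) bool { return scratch[i] < scratch[j] })
	}
}

func main() {
	input := partiallySorted(100000, 10, 1)
	fmt.Println("ascending runs in input:", countRuns(input))
	res := testing.Benchmark(func(b *testing.B) { benchmarkStdlibSort(b, input) })
	fmt.Println(res)
}
```

Restoring the input before each iteration matters here: sorting in place without the copy would measure already-sorted data from the second iteration onward, which would flatter any implementation that detects existing order.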

The benchmarks below were run on my 2021 Apple M1 Pro with 16 GB of RAM.

version: 1.18.1
go test -short -run=^$ -bench '^(BenchmarkSort8|BenchmarkStdlibSort8)$' github.com/segmentio/asm/qsort
goos: darwin
goarch: arm64
pkg: github.com/segmentio/asm/qsort
BenchmarkSort8/random-100000-10       	     255	   4670990 ns/op	 171.27 MB/s
BenchmarkSort8/partially-ordered(10)-100000-10         	     253	   4668592 ns/op	 171.36 MB/s
BenchmarkSort8/partially-ordered(100)-100000-10        	     255	   4708517 ns/op	 169.90 MB/s
BenchmarkSort8/partially-ordered(1000)-100000-10       	     253	   4706846 ns/op	 169.97 MB/s
BenchmarkSort8/random-1000000-10                       	      20	  56492098 ns/op	 141.61 MB/s
BenchmarkSort8/partially-ordered(10)-1000000-10        	      20	  56584881 ns/op	 141.38 MB/s
BenchmarkSort8/partially-ordered(100)-1000000-10       	      19	  57192875 ns/op	 139.88 MB/s
BenchmarkSort8/partially-ordered(1000)-1000000-10      	      20	  56754398 ns/op	 140.96 MB/s

BenchmarkStdlibSort8/random-100000-10                  	     142	   8397693 ns/op	  95.26 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-100000-10    	    2826	    428726 ns/op	1865.99 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-100000-10   	    2368	    422622 ns/op	1892.94 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-100000-10  	    2500	    422645 ns/op	1892.84 MB/s
BenchmarkStdlibSort8/random-1000000-10                 	      10	 103331408 ns/op	  77.42 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-1000000-10   	     254	   4605323 ns/op	1737.12 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-1000000-10  	     232	   4569836 ns/op	1750.61 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-1000000-10 	     249	   4466235 ns/op	1791.22 MB/s
PASS
ok  	github.com/segmentio/asm/qsort	33.526s

version: 1.19-4289bd365c
go test -short -run=^$ -bench '^(BenchmarkSort8|BenchmarkStdlibSort8)$' github.com/segmentio/asm/qsort
goos: darwin
goarch: arm64
pkg: github.com/segmentio/asm/qsort
BenchmarkSort8/random-100000-10       	     255	   4679545 ns/op	 170.96 MB/s
BenchmarkSort8/partially-ordered(10)-100000-10         	     242	   4740724 ns/op	 168.75 MB/s
BenchmarkSort8/partially-ordered(100)-100000-10        	     258	   4732451 ns/op	 169.05 MB/s
BenchmarkSort8/partially-ordered(1000)-100000-10       	     249	   4690960 ns/op	 170.54 MB/s
BenchmarkSort8/random-1000000-10                       	      20	  56512566 ns/op	 141.56 MB/s
BenchmarkSort8/partially-ordered(10)-1000000-10        	      20	  57079027 ns/op	 140.16 MB/s
BenchmarkSort8/partially-ordered(100)-1000000-10       	      20	  57507540 ns/op	 139.11 MB/s
BenchmarkSort8/partially-ordered(1000)-1000000-10      	      20	  56889038 ns/op	 140.62 MB/s

BenchmarkStdlibSort8/random-100000-10                  	     147	   8113216 ns/op	  98.60 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-100000-10    	    5245	    225077 ns/op	3554.34 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-100000-10   	    5302	    224633 ns/op	3561.36 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-100000-10  	    5253	    226183 ns/op	3536.96 MB/s
BenchmarkStdlibSort8/random-1000000-10                 	      10	 100767362 ns/op	  79.39 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-1000000-10   	     480	   2399693 ns/op	3333.76 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-1000000-10  	     500	   2352194 ns/op	3401.08 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-1000000-10 	     454	   2335348 ns/op	3425.61 MB/s
PASS
ok  	github.com/segmentio/asm/qsort	37.082s
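
As a quick aid for reading the MB/s column, the sketch below recomputes throughput from ns/op. It assumes each operation sorts 100000 elements of 8 bytes each (inferred from the `Sort8` name and the size in the benchmark name, not stated in the output), and that MB means 10^6 bytes, which is how `go test` reports it. Under those assumptions the first `BenchmarkSort8/random-100000` line works out to the reported 171.27 MB/s. `throughputMBps` and `speedup` are hypothetical helper names.

```go
package main

import "fmt"

// throughputMBps converts bytes processed per operation and ns/op into
// MB/s, with 1 MB = 1e6 bytes.
func throughputMBps(bytesPerOp int64, nsPerOp float64) float64 {
	seconds := nsPerOp / 1e9
	return float64(bytesPerOp) / 1e6 / seconds
}

// speedup reports how many times faster the candidate is than the
// baseline, given both timings in ns/op.
func speedup(baselineNs, candidateNs float64) float64 {
	return baselineNs / candidateNs
}

func main() {
	// Assumption: 100000 elements of 8 bytes each per operation.
	const bytesPerOp = 100000 * 8

	// ns/op values copied from the Go 1.18.1 run above.
	asmRandom := 4670990.0
	asmPartial := 4668592.0
	stdlibRandom := 8397693.0
	stdlibPartial := 428726.0

	fmt.Printf("asm random:    %.2f MB/s\n", throughputMBps(bytesPerOp, asmRandom))
	fmt.Printf("stdlib random: %.2f MB/s\n", throughputMBps(bytesPerOp, stdlibRandom))
	fmt.Printf("asm vs stdlib on random input:           %.2fx\n", speedup(stdlibRandom, asmRandom))
	fmt.Printf("stdlib vs asm on partially sorted input: %.2fx\n", speedup(asmPartial, stdlibPartial))
}
```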

@achille-roussel
Contributor

I ran this on amd64 and got similar results:

$ go test -short -run=^$ -bench '^(BenchmarkSort8|BenchmarkStdlibSort8)$' github.com/segmentio/asm/qsort
goos: linux
goarch: amd64
pkg: github.com/segmentio/asm/qsort
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkSort8/random-1000         	   96661	     17267 ns/op	 463.31 MB/s
BenchmarkSort8/random-10000        	    7140	    153514 ns/op	 521.13 MB/s
BenchmarkSort8/random-100000       	     624	   1918605 ns/op	 416.97 MB/s
BenchmarkSort8/partially-ordered(10)-100000         	     628	   1932738 ns/op	 413.92 MB/s
BenchmarkSort8/partially-ordered(100)-100000        	     619	   1951661 ns/op	 409.91 MB/s
BenchmarkSort8/partially-ordered(1000)-100000       	     624	   1974838 ns/op	 405.10 MB/s
BenchmarkSort8/random-1000000                       	      52	  23200255 ns/op	 344.82 MB/s
BenchmarkSort8/partially-ordered(10)-1000000        	      51	  22865926 ns/op	 349.87 MB/s
BenchmarkSort8/partially-ordered(100)-1000000       	      51	  22693524 ns/op	 352.52 MB/s
BenchmarkSort8/partially-ordered(1000)-1000000      	      52	  22818639 ns/op	 350.59 MB/s
BenchmarkStdlibSort8/random-100000                  	      86	  11786551 ns/op	  67.87 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-100000    	    2626	    454637 ns/op	1759.65 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-100000   	    2641	    453842 ns/op	1762.73 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-100000  	    2242	    453902 ns/op	1762.49 MB/s
BenchmarkStdlibSort8/random-1000000                 	       7	 145931998 ns/op	  54.82 MB/s
BenchmarkStdlibSort8/partially-sorted(10)-1000000   	     205	   5389031 ns/op	1484.50 MB/s
BenchmarkStdlibSort8/partially-sorted(100)-1000000  	     222	   5209207 ns/op	1535.74 MB/s
BenchmarkStdlibSort8/partially-sorted(1000)-1000000 	     194	   5174930 ns/op	1545.91 MB/s
PASS
ok  	github.com/segmentio/asm/qsort	30.110s

@peterdemartini peterdemartini merged commit d7e16ff into main May 10, 2022
@peterdemartini peterdemartini deleted the diversify-qsort-benchmarks branch May 10, 2022 00:08