sterrettm2
Contributor

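This PR adds AVX2 support for argsort, argselect, and key-value (kv) sort.

The benchmarks below compare the scalar argsort (Old) against the SIMD argsort (New) on random and small-range inputs. The Time and CPU columns show the relative change from Old to New, so negative values mean the SIMD version is faster. In both tables the SIMD version is slower for arrays of up to 512 elements and faster from 5k elements upward; results at 1k depend on the element type.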
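For readers unfamiliar with the operation, argsort returns the index permutation that would sort the keys rather than sorting the keys themselves. The following is a minimal scalar sketch of that idea using `std::sort` on an index array. The name `argsort_scalar` is hypothetical and is not taken from this PR; the scalar baseline used in the benchmarks may be implemented differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical scalar reference (not the PR's code): returns the indices that
// would arrange `keys` in ascending order. `keys` itself is not modified.
// Assumes the keys contain no NaN values, since NaN breaks the strict weak
// ordering that std::sort requires.
template <typename T>
std::vector<std::size_t> argsort_scalar(const std::vector<T>& keys) {
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // fill with 0, 1, 2, ...
    std::sort(idx.begin(), idx.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return idx;
}
```

For example, calling `argsort_scalar` on the keys `{30, 10, 20}` yields the indices `{1, 2, 0}`, because the smallest key sits at position 1 and the largest at position 0.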

Benchmark                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarargsort.*random_ vs. simdargsort.*random_]128/int64_t                  +1.2711         +1.2711           727          1652           727          1652
[scalarargsort.*random_ vs. simdargsort.*random_]256/int64_t                  +1.5339         +1.5339          1588          4024          1588          4024
[scalarargsort.*random_ vs. simdargsort.*random_]512/int64_t                  +1.9735         +1.9736          3544         10540          3544         10539
[scalarargsort.*random_ vs. simdargsort.*random_]1k/int64_t                   +0.4899         +0.4900         15180         22617         15179         22616
[scalarargsort.*random_ vs. simdargsort.*random_]5k/int64_t                   -0.5218         -0.5218        282968        135304        282948        135298
[scalarargsort.*random_ vs. simdargsort.*random_]100k/int64_t                 -0.5681         -0.5681       8460175       3654211       8459586       3653843
[scalarargsort.*random_ vs. simdargsort.*random_]1m/int64_t                   -0.4836         -0.4836     114586394      59170334     114575027      59163423
[scalarargsort.*random_ vs. simdargsort.*random_]10m/int64_t                  -0.3137         -0.3137    2054677762    1410045438    2054394419    1409915110
[scalarargsort.*random_ vs. simdargsort.*random_]128/uint64_t                 +1.4105         +1.4106           772          1860           772          1860
[scalarargsort.*random_ vs. simdargsort.*random_]256/uint64_t                 +1.6515         +1.6515          1656          4390          1656          4390
[scalarargsort.*random_ vs. simdargsort.*random_]512/uint64_t                 +2.1449         +2.1450          3651         11481          3650         11481
[scalarargsort.*random_ vs. simdargsort.*random_]1k/uint64_t                  +0.7929         +0.7929         13720         24598         13720         24597
[scalarargsort.*random_ vs. simdargsort.*random_]5k/uint64_t                  -0.4799         -0.4799        282869        147117        282846        147105
[scalarargsort.*random_ vs. simdargsort.*random_]100k/uint64_t                -0.5415         -0.5415       8484708       3890152       8484148       3889736
[scalarargsort.*random_ vs. simdargsort.*random_]1m/uint64_t                  -0.4600         -0.4601     114569475      61862446     114556094      61851167
[scalarargsort.*random_ vs. simdargsort.*random_]10m/uint64_t                 -0.3001         -0.3002    2053241244    1436979877    2053014588    1436779082
[scalarargsort.*random_ vs. simdargsort.*random_]128/double                   +0.6567         +0.6567           846          1402           846          1402
[scalarargsort.*random_ vs. simdargsort.*random_]256/double                   +0.7314         +0.7314          1893          3277          1892          3277
[scalarargsort.*random_ vs. simdargsort.*random_]512/double                   +1.0740         +1.0740          4092          8488          4092          8487
[scalarargsort.*random_ vs. simdargsort.*random_]1k/double                    -0.1487         -0.1487         23149         19707         23148         19707
[scalarargsort.*random_ vs. simdargsort.*random_]5k/double                    -0.6091         -0.6091        283673        110887        283655        110878
[scalarargsort.*random_ vs. simdargsort.*random_]100k/double                  -0.6174         -0.6174       8354939       3196334       8354229       3196128
[scalarargsort.*random_ vs. simdargsort.*random_]1m/double                    -0.5592         -0.5592     114068183      50284350     114053402      50276509
[scalarargsort.*random_ vs. simdargsort.*random_]10m/double                   -0.3127         -0.3126    2049582813    1408765808    2049324581    1408636738
[scalarargsort.*random_ vs. simdargsort.*random_]128/int32_t                  +0.7179         +0.7179           794          1364           794          1364
[scalarargsort.*random_ vs. simdargsort.*random_]256/int32_t                  +0.8265         +0.8266          1708          3120          1708          3120
[scalarargsort.*random_ vs. simdargsort.*random_]512/int32_t                  +1.2124         +1.2123          3726          8243          3726          8243
[scalarargsort.*random_ vs. simdargsort.*random_]1k/int32_t                   -0.2279         -0.2279         21168         16343         21167         16342
[scalarargsort.*random_ vs. simdargsort.*random_]5k/int32_t                   -0.6317         -0.6317        274010        100910        273990        100904
[scalarargsort.*random_ vs. simdargsort.*random_]100k/int32_t                 -0.6196         -0.6196       7931878       3017433       7931299       3017179
[scalarargsort.*random_ vs. simdargsort.*random_]1m/int32_t                   -0.5841         -0.5841     107409913      44675142     107401385      44668507
[scalarargsort.*random_ vs. simdargsort.*random_]10m/int32_t                  -0.3744         -0.3745    1836718161    1148961661    1836541830    1148846511
[scalarargsort.*random_ vs. simdargsort.*random_]128/uint32_t                 +0.8510         +0.8510           738          1367           738          1366
[scalarargsort.*random_ vs. simdargsort.*random_]256/uint32_t                 +0.9039         +0.9038          1638          3119          1638          3118
[scalarargsort.*random_ vs. simdargsort.*random_]512/uint32_t                 +1.2184         +1.2183          3694          8194          3694          8194
[scalarargsort.*random_ vs. simdargsort.*random_]1k/uint32_t                  -0.3355         -0.3355         24425         16231         24424         16231
[scalarargsort.*random_ vs. simdargsort.*random_]5k/uint32_t                  -0.6356         -0.6356        272408         99267        272394         99260
[scalarargsort.*random_ vs. simdargsort.*random_]100k/uint32_t                -0.6286         -0.6286       7951807       2953532       7951342       2953390
[scalarargsort.*random_ vs. simdargsort.*random_]1m/uint32_t                  -0.5901         -0.5901     107336303      43997337     107324792      43989818
[scalarargsort.*random_ vs. simdargsort.*random_]10m/uint32_t                 -0.3739         -0.3739    1831763385    1146822687    1831508128    1146654958
[scalarargsort.*random_ vs. simdargsort.*random_]128/float                    +0.9103         +0.9103           747          1427           747          1427
[scalarargsort.*random_ vs. simdargsort.*random_]256/float                    +1.0918         +1.0918          1599          3345          1599          3345
[scalarargsort.*random_ vs. simdargsort.*random_]512/float                    +1.7642         +1.7642          3372          9322          3372          9322
[scalarargsort.*random_ vs. simdargsort.*random_]1k/float                     -0.0662         -0.0662         21543         20118         21542         20117
[scalarargsort.*random_ vs. simdargsort.*random_]5k/float                     -0.6006         -0.6006        270926        108214        270909        108205
[scalarargsort.*random_ vs. simdargsort.*random_]100k/float                   -0.6086         -0.6086       7880947       3084313       7880386       3084130
[scalarargsort.*random_ vs. simdargsort.*random_]1m/float                     -0.5793         -0.5793     104376538      43907818     104367917      43903884
[scalarargsort.*random_ vs. simdargsort.*random_]10m/float                    -0.3640         -0.3640    1800961740    1145333523    1800764793    1145208302
OVERALL_GEOMEAN                                                               -0.0772         -0.0772             0             0             0             0

Benchmark                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/int64_t                  +1.3918         +1.3917           653          1561           653          1561
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/int64_t                  +1.7809         +1.7808          1420          3949          1420          3949
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/int64_t                  +2.5843         +2.5843          3104         11125          3104         11124
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/int64_t                   +2.2490         +2.2490          7089         23031          7088         23029
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/int64_t                   -0.5963         -0.5963        149855         60501        149847         60498
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/int64_t                 -0.7715         -0.7715       3753502        857590       3753191        857554
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/int64_t                   -0.7035         -0.7035      48425688      14357676      48422468      14356105
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/int64_t                  -0.6725         -0.6726     947529173     310275725     947403886     310225775
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/uint64_t                 +1.7617         +1.7617           671          1853           671          1853
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/uint64_t                 +1.9803         +1.9803          1473          4390          1473          4390
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/uint64_t                 +2.6901         +2.6900          3130         11549          3130         11548
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/uint64_t                  +2.3899         +2.3897          7181         24341          7180         24339
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/uint64_t                  -0.5657         -0.5657        149410         64883        149401         64882
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/uint64_t                -0.7567         -0.7567       3741835        910257       3741600        910240
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/uint64_t                  -0.7050         -0.7050      48141231      14204042      48137278      14202672
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/uint64_t                 -0.6656         -0.6656     946433261     316470749     946295547     316410250
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/double                   +0.6683         +0.6683           832          1388           832          1388
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/double                   +0.7471         +0.7470          1873          3273          1873          3273
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/double                   +1.1045         +1.1045          4028          8476          4027          8476
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/double                    -0.1526         -0.1525         23252         19704         23251         19704
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/double                    -0.6073         -0.6073        283470        111326        283466        111319
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/double                  -0.6163         -0.6163       8326114       3194983       8325900       3194762
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/double                    -0.5595         -0.5595     114009657      50225387     113990117      50217236
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/double                   -0.3102         -0.3102    2057833295    1419522163    2057528396    1419345363
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/int32_t                  +1.0741         +1.0742           656          1360           655          1360
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/int32_t                  +1.2354         +1.2355          1395          3119          1395          3119
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/int32_t                  +1.8226         +1.8227          2982          8418          2982          8418
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/int32_t                   +1.0254         +1.0256          8813         17850          8812         17850
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/int32_t                   -0.6286         -0.6286        131963         49006        131953         49005
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/int32_t                 -0.7699         -0.7699       3232776        743711       3232475        743701
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/int32_t                   -0.7470         -0.7470      39110830       9894646      39106243       9893241
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/int32_t                  -0.7022         -0.7023     752520695     224088101     752375982     223963162
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/uint32_t                 +1.0364         +1.0365           668          1361           668          1361
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/uint32_t                 +1.1153         +1.1153          1476          3121          1475          3121
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/uint32_t                 +1.5222         +1.5222          3315          8361          3315          8361
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/uint32_t                  +1.0883         +1.0883          8479         17707          8479         17706
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/uint32_t                  -0.6606         -0.6606        141834         48138        141829         48136
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/uint32_t                -0.7961         -0.7961       3580397        730082       3580179        730028
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/uint32_t                  -0.7755         -0.7755      43469793       9758736      43465089       9757182
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/uint32_t                 -0.7206         -0.7206     794028714     221868757     793926063     221859561
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_128/float                    +0.9151         +0.9150           743          1424           743          1424
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_256/float                    +1.0811         +1.0810          1605          3340          1605          3340
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_512/float                    +1.7802         +1.7801          3349          9311          3349          9311
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1k/float                     -0.0714         -0.0714         21620         20076         21619         20075
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_5k/float                     -0.6033         -0.6033        271797        107833        271783        107827
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_100k/float                   -0.6108         -0.6108       7880356       3067133       7879824       3066886
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_1m/float                     -0.5795         -0.5796     104396314      43894295     104385184      43887214
[scalarargsort.*smallrange vs. simdargsort.*smallrange]_10m/float                    -0.3627         -0.3627    1807677403    1151955798    1807465937    1151863691
OVERALL_GEOMEAN                                                                      -0.1297         -0.1298             0             0             0             0
