Add primitive fast path for ArrayConcatFunction #7393
This pull request was exported from Phabricator. Differential Revision: D50948537
Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings, along with a no-copy version for them.

Note: several other functions still use add_items() without such a fast path; we should optimize those as well.

Follow-ups will address the points above.

before:
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             723.07ms      1.38
array_concat_BOOLEAN_10##3arg                                1.14s   877.15m
array_concat_BOOLEAN_10##4arg                                1.57s   637.52m
array_concat_BOOLEAN_20##2arg                             775.91ms      1.29
array_concat_BOOLEAN_20##3arg                                1.17s   857.28m
array_concat_BOOLEAN_20##4arg                                1.55s   646.56m
array_concat_BOOLEAN_40##2arg                             778.07ms      1.29
array_concat_BOOLEAN_40##3arg                                1.16s   860.94m
array_concat_BOOLEAN_40##4arg                                1.57s   636.92m
array_concat_BOOLEAN_5##2arg                              726.71ms      1.38
array_concat_BOOLEAN_5##3arg                                 1.11s   904.61m
array_concat_BOOLEAN_5##4arg                                 1.55s   643.97m
array_concat_INTEGER_10##2arg                             689.14ms      1.45
array_concat_INTEGER_10##3arg                                1.01s   991.35m
array_concat_INTEGER_10##4arg                                1.40s   713.43m
array_concat_INTEGER_20##2arg                             681.24ms      1.47
array_concat_INTEGER_20##3arg                                1.03s   973.70m
array_concat_INTEGER_20##4arg                                1.35s   740.10m
array_concat_INTEGER_40##2arg                             666.57ms      1.50
array_concat_INTEGER_40##3arg                                1.04s   958.60m
array_concat_INTEGER_40##4arg                                1.37s   727.85m
array_concat_INTEGER_5##2arg                              652.99ms      1.53
array_concat_INTEGER_5##3arg                              985.63ms      1.01
array_concat_INTEGER_5##4arg                                 1.34s   745.48m
array_concat_VARCHAR_10##2arg                             679.71ms      1.47
array_concat_VARCHAR_10##3arg                                1.46s   683.21m
array_concat_VARCHAR_10##4arg                                2.09s   479.20m
array_concat_VARCHAR_20##2arg                                1.36s   733.91m
array_concat_VARCHAR_20##3arg                                1.85s   539.88m
array_concat_VARCHAR_20##4arg                                2.78s   359.50m
array_concat_VARCHAR_40##2arg                                1.23s   809.85m
array_concat_VARCHAR_40##3arg                                1.84s   542.69m
array_concat_VARCHAR_40##4arg                                2.45s   407.85m
array_concat_VARCHAR_5##2arg                                 1.53s   653.44m
array_concat_VARCHAR_5##3arg                                 2.06s   485.88m
array_concat_VARCHAR_5##4arg                                 2.81s   356.51m
```

after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             197.26ms      5.07
array_concat_BOOLEAN_10##3arg                             337.22ms      2.97
array_concat_BOOLEAN_10##4arg                             486.89ms      2.05
array_concat_BOOLEAN_20##2arg                             280.10ms      3.57
array_concat_BOOLEAN_20##3arg                             352.67ms      2.84
array_concat_BOOLEAN_20##4arg                             495.94ms      2.02
array_concat_BOOLEAN_40##2arg                             260.39ms      3.84
array_concat_BOOLEAN_40##3arg                             415.81ms      2.40
array_concat_BOOLEAN_40##4arg                             550.83ms      1.82
array_concat_BOOLEAN_5##2arg                              189.44ms      5.28
array_concat_BOOLEAN_5##3arg                              244.64ms      4.09
array_concat_BOOLEAN_5##4arg                              376.33ms      2.66
array_concat_INTEGER_10##2arg                              80.36ms     12.44
array_concat_INTEGER_10##3arg                             129.36ms      7.73
array_concat_INTEGER_10##4arg                             194.14ms      5.15
array_concat_INTEGER_20##2arg                             110.09ms      9.08
array_concat_INTEGER_20##3arg                             144.69ms      6.91
array_concat_INTEGER_20##4arg                             179.20ms      5.58
array_concat_INTEGER_40##2arg                              83.20ms     12.02
array_concat_INTEGER_40##3arg                             128.46ms      7.78
array_concat_INTEGER_40##4arg                             167.46ms      5.97
array_concat_INTEGER_5##2arg                               80.45ms     12.43
array_concat_INTEGER_5##3arg                              111.43ms      8.97
array_concat_INTEGER_5##4arg                              154.83ms      6.46
array_concat_VARCHAR_10##2arg                             401.57ms      2.49
array_concat_VARCHAR_10##3arg                             755.30ms      1.32
array_concat_VARCHAR_10##4arg                                1.03s   969.99m
array_concat_VARCHAR_20##2arg                             681.27ms      1.47
array_concat_VARCHAR_20##3arg                             959.15ms      1.04
array_concat_VARCHAR_20##4arg                                1.50s   665.93m
array_concat_VARCHAR_40##2arg                             660.68ms      1.51
array_concat_VARCHAR_40##3arg                             984.20ms      1.02
array_concat_VARCHAR_40##4arg                                1.35s   738.16m
array_concat_VARCHAR_5##2arg                              827.10ms      1.21
array_concat_VARCHAR_5##3arg                                 1.11s   903.32m
array_concat_VARCHAR_5##4arg                                 1.47s   682.15m
```

Differential Revision: D50948537
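For readers unfamiliar with the idea, here is a minimal sketch of what a "primitive fast path" for concatenation can look like. It is not the code from this PR and does not use any Velox APIs: `concatGeneric` and `concatPrimitive` are hypothetical names, and plain `std::vector` stands in for the engine's array writer. The assumption is that the generic path appends element by element (as an `add_items()`-style loop would), while the fast path sizes the output once and bulk-copies each input.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Generic path (illustrative): append one element at a time, paying a
// capacity check, and occasionally a reallocation, for every element.
template <typename T>
std::vector<T> concatGeneric(const std::vector<std::vector<T>>& inputs) {
  std::vector<T> out;
  for (const auto& in : inputs) {
    for (const auto& value : in) {
      out.push_back(value);
    }
  }
  return out;
}

// Primitive fast path (illustrative): compute the total size up front,
// allocate once, then copy each input with a single memcpy.
template <typename T>
std::vector<T> concatPrimitive(const std::vector<std::vector<T>>& inputs) {
  static_assert(
      std::is_trivially_copyable<T>::value,
      "bulk copy is only valid for trivially copyable element types");
  static_assert(
      !std::is_same<T, bool>::value,
      "std::vector<bool> is bit-packed and has no data() pointer");

  std::size_t total = 0;
  for (const auto& in : inputs) {
    total += in.size();
  }

  std::vector<T> out(total);
  std::size_t offset = 0;
  for (const auto& in : inputs) {
    if (!in.empty()) {
      std::memcpy(out.data() + offset, in.data(), in.size() * sizeof(T));
      offset += in.size();
    }
  }
  return out;
}
```

Both functions return the same result for the same input; the sketch only illustrates why skipping per-element work can pay off for fixed-width types. Strings are excluded here because they are not trivially copyable, which is in line with the note above that a string fast path is left for a follow-up.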
409247d to fbdebcf
fbdebcf to ce61682
ce61682 to 733b134
733b134 to 21835d2
Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings, along with a no-copy version for them.

Note: several other functions still use add_items() without such a fast path; we should optimize those as well.

Follow-ups will address the points above.

before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             737.98ms      1.36
array_concat_BOOLEAN_10##3arg                                1.14s   874.37m
array_concat_BOOLEAN_10##4arg                                1.50s   666.55m
array_concat_BOOLEAN_20##2arg                                1.58s   631.44m
array_concat_BOOLEAN_20##3arg                                2.33s   428.92m
array_concat_BOOLEAN_20##4arg                                3.22s   310.71m
array_concat_BOOLEAN_40##2arg                                3.07s   325.76m
array_concat_BOOLEAN_40##3arg                                4.75s   210.37m
array_concat_BOOLEAN_40##4arg                                6.32s   158.26m
array_concat_BOOLEAN_5##2arg                              451.47ms      2.21
array_concat_BOOLEAN_5##3arg                              674.46ms      1.48
array_concat_BOOLEAN_5##4arg                              859.56ms      1.16
array_concat_INTEGER_10##2arg                             706.34ms      1.42
array_concat_INTEGER_10##3arg                                1.09s   919.50m
array_concat_INTEGER_10##4arg                                1.47s   681.77m
array_concat_INTEGER_20##2arg                                1.40s   716.06m
array_concat_INTEGER_20##3arg                                2.02s   494.92m
array_concat_INTEGER_20##4arg                                2.73s   366.24m
array_concat_INTEGER_40##2arg                                2.68s   372.98m
array_concat_INTEGER_40##3arg                                3.98s   251.52m
array_concat_INTEGER_40##4arg                                5.40s   185.08m
array_concat_INTEGER_5##2arg                              382.78ms      2.61
array_concat_INTEGER_5##3arg                              565.82ms      1.77
array_concat_INTEGER_5##4arg                              758.75ms      1.32
array_concat_VARCHAR_10##2arg                                1.24s   803.73m
array_concat_VARCHAR_10##3arg                                1.81s   552.59m
array_concat_VARCHAR_10##4arg                                2.31s   432.19m
array_concat_VARCHAR_20##2arg                                3.38s   295.55m
array_concat_VARCHAR_20##3arg                                4.53s   220.65m
array_concat_VARCHAR_20##4arg                                5.61s   178.32m
array_concat_VARCHAR_40##2arg                                5.69s   175.66m
array_concat_VARCHAR_40##3arg                                9.95s   100.53m
array_concat_VARCHAR_40##4arg                               11.99s    83.39m
array_concat_VARCHAR_5##2arg                              523.94ms      1.91
array_concat_VARCHAR_5##3arg                              797.74ms      1.25
array_concat_VARCHAR_5##4arg                                 1.05s   954.15m
```

after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             190.24ms      5.26
array_concat_BOOLEAN_10##3arg                             253.91ms      3.94
array_concat_BOOLEAN_10##4arg                             387.03ms      2.58
array_concat_BOOLEAN_20##2arg                             484.33ms      2.06
array_concat_BOOLEAN_20##3arg                             766.60ms      1.30
array_concat_BOOLEAN_20##4arg                                1.01s   991.11m
array_concat_BOOLEAN_40##2arg                             982.59ms      1.02
array_concat_BOOLEAN_40##3arg                                1.36s   736.99m
array_concat_BOOLEAN_40##4arg                                1.74s   575.58m
array_concat_BOOLEAN_5##2arg                              139.40ms      7.17
array_concat_BOOLEAN_5##3arg                              214.43ms      4.66
array_concat_BOOLEAN_5##4arg                              273.88ms      3.65
array_concat_INTEGER_10##2arg                              80.90ms     12.36
array_concat_INTEGER_10##3arg                             110.80ms      9.03
array_concat_INTEGER_10##4arg                             149.86ms      6.67
array_concat_INTEGER_20##2arg                             167.08ms      5.99
array_concat_INTEGER_20##3arg                             261.83ms      3.82
array_concat_INTEGER_20##4arg                             319.26ms      3.13
array_concat_INTEGER_40##2arg                             301.37ms      3.32
array_concat_INTEGER_40##3arg                             422.25ms      2.37
array_concat_INTEGER_40##4arg                             714.74ms      1.40
array_concat_INTEGER_5##2arg                               60.61ms     16.50
array_concat_INTEGER_5##3arg                               89.28ms     11.20
array_concat_INTEGER_5##4arg                              117.99ms      8.48
array_concat_VARCHAR_10##2arg                             652.44ms      1.53
array_concat_VARCHAR_10##3arg                             958.59ms      1.04
array_concat_VARCHAR_10##4arg                                1.26s   790.86m
array_concat_VARCHAR_20##2arg                                1.67s   598.25m
array_concat_VARCHAR_20##3arg                                2.22s   449.48m
array_concat_VARCHAR_20##4arg                                2.82s   355.01m
array_concat_VARCHAR_40##2arg                                2.83s   353.24m
array_concat_VARCHAR_40##3arg                                4.98s   200.99m
array_concat_VARCHAR_40##4arg                                7.03s   142.22m
array_concat_VARCHAR_5##2arg                              290.04ms      3.45
array_concat_VARCHAR_5##3arg                              438.06ms      2.28
array_concat_VARCHAR_5##4arg                              584.20ms      1.71
```

Differential Revision: D50948537
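The before/after tables are easier to compare as speedup factors (before time divided by after time). The snippet below is a small sketch of that arithmetic for three `_10##2arg` rows copied from the tables in this revision; `Row`, `speedup`, `kRows` and `printSpeedups` are hypothetical names introduced only for illustration, and the selection of rows is arbitrary.

```cpp
#include <cstddef>
#include <cstdio>

// One benchmark row; times are in milliseconds, copied from the tables above.
struct Row {
  const char* name;
  double beforeMs;
  double afterMs;
};

// Speedup factor: how many times faster the "after" run is.
inline double speedup(const Row& r) {
  return r.beforeMs / r.afterMs;
}

// An arbitrary subset of rows, one per element type.
static const Row kRows[] = {
    {"array_concat_BOOLEAN_10##2arg", 737.98, 190.24},
    {"array_concat_INTEGER_10##2arg", 706.34, 80.90},
    {"array_concat_VARCHAR_10##2arg", 1240.0, 652.44}, // 1.24s == 1240ms
};

// Prints one "name  N.NNx" line per row.
inline void printSpeedups() {
  for (std::size_t i = 0; i < sizeof(kRows) / sizeof(kRows[0]); ++i) {
    std::printf("%-32s %.2fx\n", kRows[i].name, speedup(kRows[i]));
  }
}
```

Calling `printSpeedups()` from a `main` function prints the three ratios. On these rows the INTEGER case gains the most, BOOLEAN less, and VARCHAR the least.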
21835d2 to 0e2cf19
0e2cf19 to 8a6fe50
8a6fe50 to fc6985c
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Differential Revision: D50948537
…bookincubator#7393) Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions. Note: we can further optimize this by adding fast path for strings and add a no copy version for that. Note: there are also still several functions that uses add_items() and do not have such fast path we shall optimize those also. Follow up will address the points above. before: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 737.98ms 1.36 array_concat_BOOLEAN_10##3arg 1.14s 874.37m array_concat_BOOLEAN_10##4arg 1.50s 666.55m array_concat_BOOLEAN_20##2arg 1.58s 631.44m array_concat_BOOLEAN_20##3arg 2.33s 428.92m array_concat_BOOLEAN_20##4arg 3.22s 310.71m array_concat_BOOLEAN_40##2arg 3.07s 325.76m array_concat_BOOLEAN_40##3arg 4.75s 210.37m array_concat_BOOLEAN_40##4arg 6.32s 158.26m array_concat_BOOLEAN_5##2arg 451.47ms 2.21 array_concat_BOOLEAN_5##3arg 674.46ms 1.48 array_concat_BOOLEAN_5##4arg 859.56ms 1.16 array_concat_INTEGER_10##2arg 706.34ms 1.42 array_concat_INTEGER_10##3arg 1.09s 919.50m array_concat_INTEGER_10##4arg 1.47s 681.77m array_concat_INTEGER_20##2arg 1.40s 716.06m array_concat_INTEGER_20##3arg 2.02s 494.92m array_concat_INTEGER_20##4arg 2.73s 366.24m array_concat_INTEGER_40##2arg 2.68s 372.98m array_concat_INTEGER_40##3arg 3.98s 251.52m array_concat_INTEGER_40##4arg 5.40s 185.08m array_concat_INTEGER_5##2arg 382.78ms 2.61 array_concat_INTEGER_5##3arg 565.82ms 1.77 array_concat_INTEGER_5##4arg 758.75ms 1.32 array_concat_VARCHAR_10##2arg 1.24s 803.73m array_concat_VARCHAR_10##3arg 1.81s 552.59m array_concat_VARCHAR_10##4arg 2.31s 432.19m array_concat_VARCHAR_20##2arg 3.38s 295.55m array_concat_VARCHAR_20##3arg 4.53s 220.65m array_concat_VARCHAR_20##4arg 5.61s 178.32m 
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Differential Revision: D50948537
…bookincubator#7393) Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions. Note: we can further optimize this by adding fast path for strings and add a no copy version for that. Note: there are also still several functions that uses add_items() and do not have such fast path we shall optimize those also. Follow up will address the points above. before: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 737.98ms 1.36 array_concat_BOOLEAN_10##3arg 1.14s 874.37m array_concat_BOOLEAN_10##4arg 1.50s 666.55m array_concat_BOOLEAN_20##2arg 1.58s 631.44m array_concat_BOOLEAN_20##3arg 2.33s 428.92m array_concat_BOOLEAN_20##4arg 3.22s 310.71m array_concat_BOOLEAN_40##2arg 3.07s 325.76m array_concat_BOOLEAN_40##3arg 4.75s 210.37m array_concat_BOOLEAN_40##4arg 6.32s 158.26m array_concat_BOOLEAN_5##2arg 451.47ms 2.21 array_concat_BOOLEAN_5##3arg 674.46ms 1.48 array_concat_BOOLEAN_5##4arg 859.56ms 1.16 array_concat_INTEGER_10##2arg 706.34ms 1.42 array_concat_INTEGER_10##3arg 1.09s 919.50m array_concat_INTEGER_10##4arg 1.47s 681.77m array_concat_INTEGER_20##2arg 1.40s 716.06m array_concat_INTEGER_20##3arg 2.02s 494.92m array_concat_INTEGER_20##4arg 2.73s 366.24m array_concat_INTEGER_40##2arg 2.68s 372.98m array_concat_INTEGER_40##3arg 3.98s 251.52m array_concat_INTEGER_40##4arg 5.40s 185.08m array_concat_INTEGER_5##2arg 382.78ms 2.61 array_concat_INTEGER_5##3arg 565.82ms 1.77 array_concat_INTEGER_5##4arg 758.75ms 1.32 array_concat_VARCHAR_10##2arg 1.24s 803.73m array_concat_VARCHAR_10##3arg 1.81s 552.59m array_concat_VARCHAR_10##4arg 2.31s 432.19m array_concat_VARCHAR_20##2arg 3.38s 295.55m array_concat_VARCHAR_20##3arg 4.53s 220.65m array_concat_VARCHAR_20##4arg 5.61s 178.32m 
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Differential Revision: D50948537
…bookincubator#7393) Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions. Note: we can further optimize this by adding fast path for strings and add a no copy version for that. Note: there are also still several functions that uses add_items() and do not have such fast path we shall optimize those also. Follow up will address the points above. before: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 737.98ms 1.36 array_concat_BOOLEAN_10##3arg 1.14s 874.37m array_concat_BOOLEAN_10##4arg 1.50s 666.55m array_concat_BOOLEAN_20##2arg 1.58s 631.44m array_concat_BOOLEAN_20##3arg 2.33s 428.92m array_concat_BOOLEAN_20##4arg 3.22s 310.71m array_concat_BOOLEAN_40##2arg 3.07s 325.76m array_concat_BOOLEAN_40##3arg 4.75s 210.37m array_concat_BOOLEAN_40##4arg 6.32s 158.26m array_concat_BOOLEAN_5##2arg 451.47ms 2.21 array_concat_BOOLEAN_5##3arg 674.46ms 1.48 array_concat_BOOLEAN_5##4arg 859.56ms 1.16 array_concat_INTEGER_10##2arg 706.34ms 1.42 array_concat_INTEGER_10##3arg 1.09s 919.50m array_concat_INTEGER_10##4arg 1.47s 681.77m array_concat_INTEGER_20##2arg 1.40s 716.06m array_concat_INTEGER_20##3arg 2.02s 494.92m array_concat_INTEGER_20##4arg 2.73s 366.24m array_concat_INTEGER_40##2arg 2.68s 372.98m array_concat_INTEGER_40##3arg 3.98s 251.52m array_concat_INTEGER_40##4arg 5.40s 185.08m array_concat_INTEGER_5##2arg 382.78ms 2.61 array_concat_INTEGER_5##3arg 565.82ms 1.77 array_concat_INTEGER_5##4arg 758.75ms 1.32 array_concat_VARCHAR_10##2arg 1.24s 803.73m array_concat_VARCHAR_10##3arg 1.81s 552.59m array_concat_VARCHAR_10##4arg 2.31s 432.19m array_concat_VARCHAR_20##2arg 3.38s 295.55m array_concat_VARCHAR_20##3arg 4.53s 220.65m array_concat_VARCHAR_20##4arg 5.61s 178.32m 
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Differential Revision: D50948537
…bookincubator#7393) Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions. Note: we can further optimize this by adding fast path for strings and add a no copy version for that. Note: there are also still several functions that uses add_items() and do not have such fast path we shall optimize those also. Follow up will address the points above. before: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 737.98ms 1.36 array_concat_BOOLEAN_10##3arg 1.14s 874.37m array_concat_BOOLEAN_10##4arg 1.50s 666.55m array_concat_BOOLEAN_20##2arg 1.58s 631.44m array_concat_BOOLEAN_20##3arg 2.33s 428.92m array_concat_BOOLEAN_20##4arg 3.22s 310.71m array_concat_BOOLEAN_40##2arg 3.07s 325.76m array_concat_BOOLEAN_40##3arg 4.75s 210.37m array_concat_BOOLEAN_40##4arg 6.32s 158.26m array_concat_BOOLEAN_5##2arg 451.47ms 2.21 array_concat_BOOLEAN_5##3arg 674.46ms 1.48 array_concat_BOOLEAN_5##4arg 859.56ms 1.16 array_concat_INTEGER_10##2arg 706.34ms 1.42 array_concat_INTEGER_10##3arg 1.09s 919.50m array_concat_INTEGER_10##4arg 1.47s 681.77m array_concat_INTEGER_20##2arg 1.40s 716.06m array_concat_INTEGER_20##3arg 2.02s 494.92m array_concat_INTEGER_20##4arg 2.73s 366.24m array_concat_INTEGER_40##2arg 2.68s 372.98m array_concat_INTEGER_40##3arg 3.98s 251.52m array_concat_INTEGER_40##4arg 5.40s 185.08m array_concat_INTEGER_5##2arg 382.78ms 2.61 array_concat_INTEGER_5##3arg 565.82ms 1.77 array_concat_INTEGER_5##4arg 758.75ms 1.32 array_concat_VARCHAR_10##2arg 1.24s 803.73m array_concat_VARCHAR_10##3arg 1.81s 552.59m array_concat_VARCHAR_10##4arg 2.31s 432.19m array_concat_VARCHAR_20##2arg 3.38s 295.55m array_concat_VARCHAR_20##3arg 4.53s 220.65m array_concat_VARCHAR_20##4arg 5.61s 178.32m 
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Reviewed By: mbasmanova Differential Revision: D50948537
…bookincubator#7393) Summary: Optimize ArrayConcatFunction for primitives, similar to what we do for registerArrayRemoveFunctions and registerArrayTrimFunctions. Note: we can further optimize this by adding fast path for strings and add a no copy version for that. Note: there are also still several functions that uses add_items() and do not have such fast path we shall optimize those also. Follow up will address the points above. before: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 737.98ms 1.36 array_concat_BOOLEAN_10##3arg 1.14s 874.37m array_concat_BOOLEAN_10##4arg 1.50s 666.55m array_concat_BOOLEAN_20##2arg 1.58s 631.44m array_concat_BOOLEAN_20##3arg 2.33s 428.92m array_concat_BOOLEAN_20##4arg 3.22s 310.71m array_concat_BOOLEAN_40##2arg 3.07s 325.76m array_concat_BOOLEAN_40##3arg 4.75s 210.37m array_concat_BOOLEAN_40##4arg 6.32s 158.26m array_concat_BOOLEAN_5##2arg 451.47ms 2.21 array_concat_BOOLEAN_5##3arg 674.46ms 1.48 array_concat_BOOLEAN_5##4arg 859.56ms 1.16 array_concat_INTEGER_10##2arg 706.34ms 1.42 array_concat_INTEGER_10##3arg 1.09s 919.50m array_concat_INTEGER_10##4arg 1.47s 681.77m array_concat_INTEGER_20##2arg 1.40s 716.06m array_concat_INTEGER_20##3arg 2.02s 494.92m array_concat_INTEGER_20##4arg 2.73s 366.24m array_concat_INTEGER_40##2arg 2.68s 372.98m array_concat_INTEGER_40##3arg 3.98s 251.52m array_concat_INTEGER_40##4arg 5.40s 185.08m array_concat_INTEGER_5##2arg 382.78ms 2.61 array_concat_INTEGER_5##3arg 565.82ms 1.77 array_concat_INTEGER_5##4arg 758.75ms 1.32 array_concat_VARCHAR_10##2arg 1.24s 803.73m array_concat_VARCHAR_10##3arg 1.81s 552.59m array_concat_VARCHAR_10##4arg 2.31s 432.19m array_concat_VARCHAR_20##2arg 3.38s 295.55m array_concat_VARCHAR_20##3arg 4.53s 220.65m array_concat_VARCHAR_20##4arg 5.61s 178.32m 
array_concat_VARCHAR_40##2arg 5.69s 175.66m array_concat_VARCHAR_40##3arg 9.95s 100.53m array_concat_VARCHAR_40##4arg 11.99s 83.39m array_concat_VARCHAR_5##2arg 523.94ms 1.91 array_concat_VARCHAR_5##3arg 797.74ms 1.25 array_concat_VARCHAR_5##4arg 1.05s 954.15m ``` after: ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 190.24ms 5.26 array_concat_BOOLEAN_10##3arg 253.91ms 3.94 array_concat_BOOLEAN_10##4arg 387.03ms 2.58 array_concat_BOOLEAN_20##2arg 484.33ms 2.06 array_concat_BOOLEAN_20##3arg 766.60ms 1.30 array_concat_BOOLEAN_20##4arg 1.01s 991.11m array_concat_BOOLEAN_40##2arg 982.59ms 1.02 array_concat_BOOLEAN_40##3arg 1.36s 736.99m array_concat_BOOLEAN_40##4arg 1.74s 575.58m array_concat_BOOLEAN_5##2arg 139.40ms 7.17 array_concat_BOOLEAN_5##3arg 214.43ms 4.66 array_concat_BOOLEAN_5##4arg 273.88ms 3.65 array_concat_INTEGER_10##2arg 80.90ms 12.36 array_concat_INTEGER_10##3arg 110.80ms 9.03 array_concat_INTEGER_10##4arg 149.86ms 6.67 array_concat_INTEGER_20##2arg 167.08ms 5.99 array_concat_INTEGER_20##3arg 261.83ms 3.82 array_concat_INTEGER_20##4arg 319.26ms 3.13 array_concat_INTEGER_40##2arg 301.37ms 3.32 array_concat_INTEGER_40##3arg 422.25ms 2.37 array_concat_INTEGER_40##4arg 714.74ms 1.40 array_concat_INTEGER_5##2arg 60.61ms 16.50 array_concat_INTEGER_5##3arg 89.28ms 11.20 array_concat_INTEGER_5##4arg 117.99ms 8.48 array_concat_VARCHAR_10##2arg 652.44ms 1.53 array_concat_VARCHAR_10##3arg 958.59ms 1.04 array_concat_VARCHAR_10##4arg 1.26s 790.86m array_concat_VARCHAR_20##2arg 1.67s 598.25m array_concat_VARCHAR_20##3arg 2.22s 449.48m array_concat_VARCHAR_20##4arg 2.82s 355.01m array_concat_VARCHAR_40##2arg 2.83s 353.24m array_concat_VARCHAR_40##3arg 4.98s 200.99m array_concat_VARCHAR_40##4arg 7.03s 142.22m array_concat_VARCHAR_5##2arg 290.04ms 3.45 
array_concat_VARCHAR_5##3arg 438.06ms 2.28 array_concat_VARCHAR_5##4arg 584.20ms 1.71 ``` Reviewed By: mbasmanova Differential Revision: D50948537
This pull request has been merged in 808e7fd.
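For readers unfamiliar with the idea of a "primitive fast path", here is a minimal standalone C++ sketch. It is not the Velox implementation and uses no Velox APIs: `GenericValue`, `concatGeneric`, and `concatPrimitive` are hypothetical names chosen for illustration. It only contrasts a type-erased, element-by-element concatenation with a version where the element type is known at compile time and each input can be bulk-copied.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical type-erased element, standing in for a "generic" value.
using GenericValue = std::variant<bool, int32_t, std::string>;

// Generic path: every element is copied through the type-erased
// representation, so each copy has to dispatch on the stored type.
std::vector<GenericValue> concatGeneric(
    const std::vector<std::vector<GenericValue>>& arrays) {
  std::vector<GenericValue> out;
  for (const auto& array : arrays) {
    for (const auto& item : array) {
      out.push_back(item);
    }
  }
  return out;
}

// Primitive fast path: the element type T is fixed at compile time, so the
// output can be sized once and each input appended with a bulk range insert.
template <typename T>
std::vector<T> concatPrimitive(const std::vector<std::vector<T>>& arrays) {
  std::size_t total = 0;
  for (const auto& array : arrays) {
    total += array.size();
  }
  std::vector<T> out;
  out.reserve(total);
  for (const auto& array : arrays) {
    out.insert(out.end(), array.begin(), array.end());
  }
  return out;
}
```

Under these assumptions, a caller would pick `concatPrimitive<int32_t>` when the element type is known to be a primitive and fall back to `concatGeneric` otherwise.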
…mitive types.

Summary:
add_items() appends elements from an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that use add_items() register fast paths for primitives to avoid the cost (see facebookincubator#7393). We can optimize add_items() itself and avoid that authoring overhead.

Right now it is slow because it does a type check per element when copying it. We can reduce the cast cost: since all elements of the array have the same type, the check can be done once, before the copy starts, rather than for each element, with a fast path for when the elements are of a primitive type. When the elements are not primitive, the cost of checking the type is amortized by the cost of copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close to the one with primitive fast paths registered.

## Array concat benchmark.

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg     556.35ms      1.80
array_concat_BOOLEAN_10##3arg     788.51ms      1.27
array_concat_BOOLEAN_10##4arg        1.12s   891.97m
array_concat_BOOLEAN_20##2arg        1.18s   847.41m
array_concat_BOOLEAN_20##3arg        1.78s   561.03m
array_concat_BOOLEAN_20##4arg        2.39s   418.56m
array_concat_BOOLEAN_40##2arg        2.33s   429.55m
array_concat_BOOLEAN_40##3arg        3.37s   296.31m
array_concat_BOOLEAN_40##4arg        4.67s   214.33m
array_concat_BOOLEAN_5##2arg      320.80ms      3.12
array_concat_BOOLEAN_5##3arg      478.21ms      2.09
array_concat_BOOLEAN_5##4arg      628.29ms      1.59
array_concat_INTEGER_10##2arg     451.29ms      2.22
array_concat_INTEGER_10##3arg     674.39ms      1.48
array_concat_INTEGER_10##4arg     912.72ms      1.10
array_concat_INTEGER_20##2arg     902.84ms      1.11
array_concat_INTEGER_20##3arg        1.42s   704.16m
array_concat_INTEGER_20##4arg        1.87s   533.34m
array_concat_INTEGER_40##2arg        1.78s   562.38m
array_concat_INTEGER_40##3arg        2.65s   377.14m
array_concat_INTEGER_40##4arg        3.62s   276.04m
array_concat_INTEGER_5##2arg      243.91ms      4.10
array_concat_INTEGER_5##3arg      380.67ms      2.63
array_concat_INTEGER_5##4arg      505.00ms      1.98
array_concat_VARCHAR_10##2arg        1.25s   801.07m
array_concat_VARCHAR_10##3arg        1.75s   572.05m
array_concat_VARCHAR_10##4arg        2.25s   444.06m
array_concat_VARCHAR_20##2arg        3.07s   325.81m
array_concat_VARCHAR_20##3arg        3.93s   254.38m
array_concat_VARCHAR_20##4arg        4.98s   200.87m
array_concat_VARCHAR_40##2arg        5.04s   198.40m
array_concat_VARCHAR_40##3arg        8.38s   119.37m
array_concat_VARCHAR_40##4arg       10.56s    94.69m
array_concat_VARCHAR_5##2arg      511.14ms      1.96
array_concat_VARCHAR_5##3arg      757.66ms      1.32
array_concat_VARCHAR_5##4arg      994.37ms      1.01
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg     254.42ms      3.93
array_concat_BOOLEAN_10##3arg     339.38ms      2.95
array_concat_BOOLEAN_10##4arg     508.55ms      1.97
array_concat_BOOLEAN_20##2arg     589.61ms      1.70
array_concat_BOOLEAN_20##3arg     910.98ms      1.10
array_concat_BOOLEAN_20##4arg        1.22s   819.42m
array_concat_BOOLEAN_40##2arg        1.11s   903.37m
array_concat_BOOLEAN_40##3arg        1.61s   622.74m
array_concat_BOOLEAN_40##4arg        2.05s   487.57m
array_concat_BOOLEAN_5##2arg      193.32ms      5.17
array_concat_BOOLEAN_5##3arg      288.43ms      3.47
array_concat_BOOLEAN_5##4arg      385.29ms      2.60
array_concat_INTEGER_10##2arg     130.70ms      7.65
array_concat_INTEGER_10##3arg     179.97ms      5.56
array_concat_INTEGER_10##4arg     240.94ms      4.15
array_concat_INTEGER_20##2arg     186.63ms      5.36
array_concat_INTEGER_20##3arg     304.05ms      3.29
array_concat_INTEGER_20##4arg     372.18ms      2.69
array_concat_INTEGER_40##2arg     246.54ms      4.06
array_concat_INTEGER_40##3arg     309.65ms      3.23
array_concat_INTEGER_40##4arg     535.99ms      1.87
array_concat_INTEGER_5##2arg      133.79ms      7.47
array_concat_INTEGER_5##3arg      196.43ms      5.09
array_concat_INTEGER_5##4arg      248.34ms      4.03
array_concat_VARCHAR_10##2arg     394.72ms      2.53
array_concat_VARCHAR_10##3arg     508.08ms      1.97
array_concat_VARCHAR_10##4arg     648.42ms      1.54
array_concat_VARCHAR_20##2arg     674.18ms      1.48
array_concat_VARCHAR_20##3arg     857.50ms      1.17
array_concat_VARCHAR_20##4arg        1.22s   817.42m
array_concat_VARCHAR_40##2arg        1.07s   935.35m
array_concat_VARCHAR_40##3arg        1.64s   608.35m
array_concat_VARCHAR_40##4arg        2.07s   483.29m
array_concat_VARCHAR_5##2arg      183.28ms      5.46
array_concat_VARCHAR_5##3arg      262.13ms      3.81
array_concat_VARCHAR_5##4arg      332.48ms      3.01
```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg     230.01ms      4.35
array_concat_BOOLEAN_10##3arg     303.88ms      3.29
array_concat_BOOLEAN_10##4arg     463.32ms      2.16
array_concat_BOOLEAN_20##2arg     547.77ms      1.83
array_concat_BOOLEAN_20##3arg     832.71ms      1.20
array_concat_BOOLEAN_20##4arg        1.10s   912.79m
array_concat_BOOLEAN_40##2arg     991.02ms      1.01
array_concat_BOOLEAN_40##3arg        1.48s   675.74m
array_concat_BOOLEAN_40##4arg        1.96s   510.45m
array_concat_BOOLEAN_5##2arg      178.92ms      5.59
array_concat_BOOLEAN_5##3arg      265.29ms      3.77
array_concat_BOOLEAN_5##4arg      350.31ms      2.85
array_concat_INTEGER_10##2arg     111.54ms      8.97
array_concat_INTEGER_10##3arg     151.91ms      6.58
array_concat_INTEGER_10##4arg     209.28ms      4.78
array_concat_INTEGER_20##2arg     150.28ms      6.65
array_concat_INTEGER_20##3arg     269.52ms      3.71
array_concat_INTEGER_20##4arg     337.27ms      2.97
array_concat_INTEGER_40##2arg     213.27ms      4.69
array_concat_INTEGER_40##3arg     266.57ms      3.75
array_concat_INTEGER_40##4arg     483.33ms      2.07
array_concat_INTEGER_5##2arg      115.68ms      8.64
array_concat_INTEGER_5##3arg      168.24ms      5.94
array_concat_INTEGER_5##4arg      219.13ms      4.56
array_concat_VARCHAR_10##2arg     357.53ms      2.80
array_concat_VARCHAR_10##3arg     459.15ms      2.18
array_concat_VARCHAR_10##4arg     579.91ms      1.72
array_concat_VARCHAR_20##2arg     628.27ms      1.59
array_concat_VARCHAR_20##3arg     802.48ms      1.25
array_concat_VARCHAR_20##4arg        1.06s   947.41m
array_concat_VARCHAR_40##2arg     930.88ms      1.07
array_concat_VARCHAR_40##3arg        1.46s   683.85m
array_concat_VARCHAR_40##4arg        1.92s   520.41m
array_concat_VARCHAR_5##2arg      161.55ms      6.19
array_concat_VARCHAR_5##3arg      214.94ms      4.65
array_concat_VARCHAR_5##4arg      280.15ms      3.57
```

Differential Revision: D52380460
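The summary above describes moving the type check out of the per-element copy loop. A minimal standalone C++ sketch of that technique follows. It is an illustration under stated assumptions, not the actual add_items() code: `ElementKind`, `Column`, `TypedColumn`, `addItemsPerElementCheck`, and `addItemsHoistedCheck` are hypothetical names and do not correspond to Velox classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical runtime type tag for the elements of a column.
enum class ElementKind { kInteger, kBigint };

// Hypothetical type-erased column: callers only see the tag.
struct Column {
  explicit Column(ElementKind k) : kind(k) {}
  virtual ~Column() = default;
  ElementKind kind;
};

// Concrete storage for elements of primitive type T.
template <typename T>
struct TypedColumn : Column {
  explicit TypedColumn(ElementKind k, std::vector<T> v = {})
      : Column(k), values(std::move(v)) {}
  std::vector<T> values;
};

template <typename T>
void copyOne(const Column& from, std::size_t i, Column& to) {
  static_cast<TypedColumn<T>&>(to).values.push_back(
      static_cast<const TypedColumn<T>&>(from).values[i]);
}

// Slow variant: the type tag is inspected once per element.
void addItemsPerElementCheck(
    const Column& from, std::size_t size, Column& to) {
  for (std::size_t i = 0; i < size; ++i) {
    switch (from.kind) {
      case ElementKind::kInteger:
        copyOne<int32_t>(from, i, to);
        break;
      case ElementKind::kBigint:
        copyOne<int64_t>(from, i, to);
        break;
    }
  }
}

template <typename T>
void appendAll(const Column& from, Column& to) {
  const auto& src = static_cast<const TypedColumn<T>&>(from).values;
  auto& dst = static_cast<TypedColumn<T>&>(to).values;
  dst.insert(dst.end(), src.begin(), src.end());
}

// Fast variant: all elements share one type, so the tag is inspected once
// and the whole range is then copied with a typed bulk insert.
void addItemsHoistedCheck(const Column& from, Column& to) {
  switch (from.kind) {
    case ElementKind::kInteger:
      appendAll<int32_t>(from, to);
      break;
    case ElementKind::kBigint:
      appendAll<int64_t>(from, to);
      break;
  }
}
```

Both variants produce the same output; the second simply performs the dispatch once per array instead of once per element, which is the saving the summary attributes to the primitive case.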
…mitive types. (facebookincubator#8194)

Summary:
add_items() appends elements from an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that use add_items() register fast paths for primitives to avoid the cost (see facebookincubator#7393). We can optimize add_items() itself and avoid that authoring overhead.

Right now it is slow because it does a type check per element when copying it. We can reduce the cast cost: since all elements of the array have the same type, the check can be done once, before the copy starts, rather than for each element, with a fast path for when the elements are of a primitive type. When the elements are not primitive, the cost of checking the type is amortized by the cost of copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close to the one with primitive fast paths registered, and up to 5X faster than before.

## Array concat benchmark.

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg  567.29ms     1.76
array_concat_BOOLEAN_10##3arg  848.30ms     1.18
array_concat_BOOLEAN_10##4arg     1.20s  835.32m
array_concat_BOOLEAN_20##2arg     1.24s  804.59m
array_concat_BOOLEAN_20##3arg     1.83s  545.78m
array_concat_BOOLEAN_20##4arg     2.43s  411.28m
array_concat_BOOLEAN_40##2arg     2.42s  413.40m
array_concat_BOOLEAN_40##3arg     3.45s  290.10m
array_concat_BOOLEAN_40##4arg     4.72s  211.95m
array_concat_BOOLEAN_5##2arg   326.58ms     3.06
array_concat_BOOLEAN_5##3arg   500.23ms     2.00
array_concat_BOOLEAN_5##4arg   647.58ms     1.54
array_concat_INTEGER_10##2arg  451.38ms     2.22
array_concat_INTEGER_10##3arg  676.54ms     1.48
array_concat_INTEGER_10##4arg  907.98ms     1.10
array_concat_INTEGER_20##2arg  903.66ms     1.11
array_concat_INTEGER_20##3arg     1.46s  685.90m
array_concat_INTEGER_20##4arg     1.90s  525.07m
array_concat_INTEGER_40##2arg     1.83s  547.40m
array_concat_INTEGER_40##3arg     2.63s  379.91m
array_concat_INTEGER_40##4arg     3.65s  274.16m
array_concat_INTEGER_5##2arg   243.12ms     4.11
array_concat_INTEGER_5##3arg   381.92ms     2.62
array_concat_INTEGER_5##4arg   502.78ms     1.99
array_concat_VARCHAR_10##2arg     1.26s  792.79m
array_concat_VARCHAR_10##3arg     1.73s  579.50m
array_concat_VARCHAR_10##4arg     2.21s  452.26m
array_concat_VARCHAR_20##2arg     3.23s  309.67m
array_concat_VARCHAR_20##3arg     4.08s  244.99m
array_concat_VARCHAR_20##4arg     5.09s  196.40m
array_concat_VARCHAR_40##2arg     5.49s  182.17m
array_concat_VARCHAR_40##3arg     9.23s  108.36m
```
generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg  247.69ms     4.04
array_concat_BOOLEAN_10##3arg  328.93ms     3.04
array_concat_BOOLEAN_10##4arg  471.79ms     2.12
array_concat_BOOLEAN_20##2arg  591.39ms     1.69
array_concat_BOOLEAN_20##3arg  889.65ms     1.12
array_concat_BOOLEAN_20##4arg     1.13s  885.38m
array_concat_BOOLEAN_40##2arg     1.11s  902.02m
array_concat_BOOLEAN_40##3arg     1.63s  614.28m
array_concat_BOOLEAN_40##4arg     2.06s  486.07m
array_concat_BOOLEAN_5##2arg   178.77ms     5.59
array_concat_BOOLEAN_5##3arg   262.85ms     3.80
array_concat_BOOLEAN_5##4arg   358.08ms     2.79
array_concat_INTEGER_10##2arg   84.65ms    11.81
array_concat_INTEGER_10##3arg  116.97ms     8.55
array_concat_INTEGER_10##4arg  159.98ms     6.25
array_concat_INTEGER_20##2arg  145.19ms     6.89
array_concat_INTEGER_20##3arg  249.84ms     4.00
array_concat_INTEGER_20##4arg  298.28ms     3.35
array_concat_INTEGER_40##2arg  202.66ms     4.93
array_concat_INTEGER_40##3arg  249.71ms     4.00
array_concat_INTEGER_40##4arg  462.83ms     2.16
array_concat_INTEGER_5##2arg    86.36ms    11.58
array_concat_INTEGER_5##3arg   128.82ms     7.76
array_concat_INTEGER_5##4arg   165.59ms     6.04
array_concat_VARCHAR_10##2arg  388.89ms     2.57
array_concat_VARCHAR_10##3arg  495.35ms     2.02
array_concat_VARCHAR_10##4arg  626.90ms     1.60
array_concat_VARCHAR_20##2arg  671.03ms     1.49
array_concat_VARCHAR_20##3arg  870.87ms     1.15
array_concat_VARCHAR_20##4arg     1.13s  888.08m
array_concat_VARCHAR_40##2arg     1.03s  967.24m
array_concat_VARCHAR_40##3arg     1.63s  613.68m
array_concat_VARCHAR_40##4arg     2.13s  469.60m
array_concat_VARCHAR_5##2arg   158.09ms     6.33
array_concat_VARCHAR_5##3arg   212.99ms     4.70
array_concat_VARCHAR_5##4arg   287.64ms     3.48
```
primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg  230.01ms     4.35
array_concat_BOOLEAN_10##3arg  303.88ms     3.29
array_concat_BOOLEAN_10##4arg  463.32ms     2.16
array_concat_BOOLEAN_20##2arg  547.77ms     1.83
array_concat_BOOLEAN_20##3arg  832.71ms     1.20
array_concat_BOOLEAN_20##4arg     1.10s  912.79m
array_concat_BOOLEAN_40##2arg  991.02ms     1.01
array_concat_BOOLEAN_40##3arg     1.48s  675.74m
array_concat_BOOLEAN_40##4arg     1.96s  510.45m
array_concat_BOOLEAN_5##2arg   178.92ms     5.59
array_concat_BOOLEAN_5##3arg   265.29ms     3.77
array_concat_BOOLEAN_5##4arg   350.31ms     2.85
array_concat_INTEGER_10##2arg  111.54ms     8.97
array_concat_INTEGER_10##3arg  151.91ms     6.58
array_concat_INTEGER_10##4arg  209.28ms     4.78
array_concat_INTEGER_20##2arg  150.28ms     6.65
array_concat_INTEGER_20##3arg  269.52ms     3.71
array_concat_INTEGER_20##4arg  337.27ms     2.97
array_concat_INTEGER_40##2arg  213.27ms     4.69
array_concat_INTEGER_40##3arg  266.57ms     3.75
array_concat_INTEGER_40##4arg  483.33ms     2.07
array_concat_INTEGER_5##2arg   115.68ms     8.64
array_concat_INTEGER_5##3arg   168.24ms     5.94
array_concat_INTEGER_5##4arg   219.13ms     4.56
array_concat_VARCHAR_10##2arg  357.53ms     2.80
array_concat_VARCHAR_10##3arg  459.15ms     2.18
array_concat_VARCHAR_10##4arg  579.91ms     1.72
array_concat_VARCHAR_20##2arg  628.27ms     1.59
array_concat_VARCHAR_20##3arg  802.48ms     1.25
array_concat_VARCHAR_20##4arg     1.06s  947.41m
array_concat_VARCHAR_40##2arg  930.88ms     1.07
array_concat_VARCHAR_40##3arg     1.46s  683.85m
array_concat_VARCHAR_40##4arg     1.92s  520.41m
array_concat_VARCHAR_5##2arg   161.55ms     6.19
array_concat_VARCHAR_5##3arg   214.94ms     4.65
array_concat_VARCHAR_5##4arg   280.15ms     3.57
```
Differential Revision: D52380460
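The idea in the summary above (check the element type once before the copy instead of once per element) can be sketched as follows. This is a minimal illustration only, not the Velox implementation: `ErasedArray`, `Kind`, `Stats`, `addItemsPerElement` and `addItemsHoisted` are hypothetical stand-ins for ArrayView<Generic>, ArrayWriter<Generic> and add_items(), and the `typeChecks` counter exists purely to make the difference observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical element kinds; real engines have many more.
enum class Kind { kInteger, kVarchar };

// Hypothetical type-erased array: every element shares one kind.
struct ErasedArray {
  Kind kind;
  std::vector<int32_t> ints;         // used when kind == kInteger
  std::vector<std::string> strings;  // used when kind == kVarchar

  std::size_t size() const {
    return kind == Kind::kInteger ? ints.size() : strings.size();
  }
};

// Counts how often the copy path inspected the element kind.
struct Stats {
  std::size_t typeChecks = 0;
};

// "Before" shape: the kind is inspected again for every copied element.
inline void addItemsPerElement(
    const ErasedArray& in, ErasedArray& out, Stats& stats) {
  assert(in.kind == out.kind);
  const std::size_t n = in.size();
  for (std::size_t i = 0; i < n; ++i) {
    ++stats.typeChecks;
    if (in.kind == Kind::kInteger) {
      out.ints.push_back(in.ints[i]);
    } else {
      out.strings.push_back(in.strings[i]);
    }
  }
}

// "After" shape: inspect the kind once, then bulk-copy primitives.
inline void addItemsHoisted(
    const ErasedArray& in, ErasedArray& out, Stats& stats) {
  assert(in.kind == out.kind);
  ++stats.typeChecks;
  if (in.kind == Kind::kInteger) {
    // Primitive fast path: one range insert, no per-element dispatch.
    out.ints.insert(out.ints.end(), in.ints.begin(), in.ints.end());
    return;
  }
  // Non-primitive path: copying each element dominates the single check.
  for (const auto& s : in.strings) {
    out.strings.push_back(s);
  }
}
```

Both functions produce the same output; under these assumptions the hoisted version performs one kind check per call rather than one per element, which is the effect the "generic before" and "generic after" benchmarks are comparing.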
…mitive types. (facebookincubator#8194) Summary: add_items append elements from an array view to array writer. when the input array is ArrayView<Generic> and output is ArrayWriter<Generic> its is expensive and If a function is using add_items() then to avoid the cost authors register fast paths for primitives see (facebookincubator#7393) we can optimize add_items() and avoid that authoring overhead, right now its slow because it does a type check per element when copying it. We can optimize the cast cost and avoid to do it for each element in the array (since they are all of the same type) and instead do it before we start the copy and have a fast path for when the elements are of a pritmive type. when the elements are not primitive the cost of checking the type s amortized by the cost of the copying the complex elements. with this diff, the function array_concat performance with generic implementation is very close to the one with registration for primitive fast paths. up to 5X faster than before ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 247.69ms 4.04 array_concat_BOOLEAN_10##3arg 328.93ms 3.04 
array_concat_BOOLEAN_10##4arg 471.79ms 2.12 array_concat_BOOLEAN_20##2arg 591.39ms 1.69 array_concat_BOOLEAN_20##3arg 889.65ms 1.12 array_concat_BOOLEAN_20##4arg 1.13s 885.38m array_concat_BOOLEAN_40##2arg 1.11s 902.02m array_concat_BOOLEAN_40##3arg 1.63s 614.28m array_concat_BOOLEAN_40##4arg 2.06s 486.07m array_concat_BOOLEAN_5##2arg 178.77ms 5.59 array_concat_BOOLEAN_5##3arg 262.85ms 3.80 array_concat_BOOLEAN_5##4arg 358.08ms 2.79 array_concat_INTEGER_10##2arg 84.65ms 11.81 array_concat_INTEGER_10##3arg 116.97ms 8.55 array_concat_INTEGER_10##4arg 159.98ms 6.25 array_concat_INTEGER_20##2arg 145.19ms 6.89 array_concat_INTEGER_20##3arg 249.84ms 4.00 array_concat_INTEGER_20##4arg 298.28ms 3.35 array_concat_INTEGER_40##2arg 202.66ms 4.93 array_concat_INTEGER_40##3arg 249.71ms 4.00 array_concat_INTEGER_40##4arg 462.83ms 2.16 array_concat_INTEGER_5##2arg 86.36ms 11.58 array_concat_INTEGER_5##3arg 128.82ms 7.76 array_concat_INTEGER_5##4arg 165.59ms 6.04 array_concat_VARCHAR_10##2arg 388.89ms 2.57 array_concat_VARCHAR_10##3arg 495.35ms 2.02 array_concat_VARCHAR_10##4arg 626.90ms 1.60 array_concat_VARCHAR_20##2arg 671.03ms 1.49 array_concat_VARCHAR_20##3arg 870.87ms 1.15 array_concat_VARCHAR_20##4arg 1.13s 888.08m array_concat_VARCHAR_40##2arg 1.03s 967.24m array_concat_VARCHAR_40##3arg 1.63s 613.68m array_concat_VARCHAR_40##4arg 2.13s 469.60m array_concat_VARCHAR_5##2arg 158.09ms 6.33 array_concat_VARCHAR_5##3arg 212.99ms 4.70 array_concat_VARCHAR_5##4arg 287.64ms 3.48 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 230.01ms 4.35 array_concat_BOOLEAN_10##3arg 303.88ms 3.29 array_concat_BOOLEAN_10##4arg 463.32ms 2.16 array_concat_BOOLEAN_20##2arg 547.77ms 1.83 array_concat_BOOLEAN_20##3arg 832.71ms 1.20 array_concat_BOOLEAN_20##4arg 1.10s 
912.79m array_concat_BOOLEAN_40##2arg 991.02ms 1.01 array_concat_BOOLEAN_40##3arg 1.48s 675.74m array_concat_BOOLEAN_40##4arg 1.96s 510.45m array_concat_BOOLEAN_5##2arg 178.92ms 5.59 array_concat_BOOLEAN_5##3arg 265.29ms 3.77 array_concat_BOOLEAN_5##4arg 350.31ms 2.85 array_concat_INTEGER_10##2arg 111.54ms 8.97 array_concat_INTEGER_10##3arg 151.91ms 6.58 array_concat_INTEGER_10##4arg 209.28ms 4.78 array_concat_INTEGER_20##2arg 150.28ms 6.65 array_concat_INTEGER_20##3arg 269.52ms 3.71 array_concat_INTEGER_20##4arg 337.27ms 2.97 array_concat_INTEGER_40##2arg 213.27ms 4.69 array_concat_INTEGER_40##3arg 266.57ms 3.75 array_concat_INTEGER_40##4arg 483.33ms 2.07 array_concat_INTEGER_5##2arg 115.68ms 8.64 array_concat_INTEGER_5##3arg 168.24ms 5.94 array_concat_INTEGER_5##4arg 219.13ms 4.56 array_concat_VARCHAR_10##2arg 357.53ms 2.80 array_concat_VARCHAR_10##3arg 459.15ms 2.18 array_concat_VARCHAR_10##4arg 579.91ms 1.72 array_concat_VARCHAR_20##2arg 628.27ms 1.59 array_concat_VARCHAR_20##3arg 802.48ms 1.25 array_concat_VARCHAR_20##4arg 1.06s 947.41m array_concat_VARCHAR_40##2arg 930.88ms 1.07 array_concat_VARCHAR_40##3arg 1.46s 683.85m array_concat_VARCHAR_40##4arg 1.92s 520.41m array_concat_VARCHAR_5##2arg 161.55ms 6.19 array_concat_VARCHAR_5##3arg 214.94ms 4.65 array_concat_VARCHAR_5##4arg 280.15ms 3.57 ``` Differential Revision: D52380460
…mitive types. (facebookincubator#8194) Summary: add_items append elements from an array view to array writer. when the input array is ArrayView<Generic> and output is ArrayWriter<Generic> its is expensive and If a function is using add_items() then to avoid the cost authors register fast paths for primitives see (facebookincubator#7393) we can optimize add_items() and avoid that authoring overhead, right now its slow because it does a type check per element when copying it. We can optimize the cast cost and avoid to do it for each element in the array (since they are all of the same type) and instead do it before we start the copy and have a fast path for when the elements are of a pritmive type. when the elements are not primitive the cost of checking the type s amortized by the cost of the copying the complex elements. with this diff, the function array_concat performance with generic implementation is very close to the one with registration for primitive fast paths. up to 5X faster than before ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 247.69ms 4.04 array_concat_BOOLEAN_10##3arg 328.93ms 3.04 
array_concat_BOOLEAN_10##4arg 471.79ms 2.12 array_concat_BOOLEAN_20##2arg 591.39ms 1.69 array_concat_BOOLEAN_20##3arg 889.65ms 1.12 array_concat_BOOLEAN_20##4arg 1.13s 885.38m array_concat_BOOLEAN_40##2arg 1.11s 902.02m array_concat_BOOLEAN_40##3arg 1.63s 614.28m array_concat_BOOLEAN_40##4arg 2.06s 486.07m array_concat_BOOLEAN_5##2arg 178.77ms 5.59 array_concat_BOOLEAN_5##3arg 262.85ms 3.80 array_concat_BOOLEAN_5##4arg 358.08ms 2.79 array_concat_INTEGER_10##2arg 84.65ms 11.81 array_concat_INTEGER_10##3arg 116.97ms 8.55 array_concat_INTEGER_10##4arg 159.98ms 6.25 array_concat_INTEGER_20##2arg 145.19ms 6.89 array_concat_INTEGER_20##3arg 249.84ms 4.00 array_concat_INTEGER_20##4arg 298.28ms 3.35 array_concat_INTEGER_40##2arg 202.66ms 4.93 array_concat_INTEGER_40##3arg 249.71ms 4.00 array_concat_INTEGER_40##4arg 462.83ms 2.16 array_concat_INTEGER_5##2arg 86.36ms 11.58 array_concat_INTEGER_5##3arg 128.82ms 7.76 array_concat_INTEGER_5##4arg 165.59ms 6.04 array_concat_VARCHAR_10##2arg 388.89ms 2.57 array_concat_VARCHAR_10##3arg 495.35ms 2.02 array_concat_VARCHAR_10##4arg 626.90ms 1.60 array_concat_VARCHAR_20##2arg 671.03ms 1.49 array_concat_VARCHAR_20##3arg 870.87ms 1.15 array_concat_VARCHAR_20##4arg 1.13s 888.08m array_concat_VARCHAR_40##2arg 1.03s 967.24m array_concat_VARCHAR_40##3arg 1.63s 613.68m array_concat_VARCHAR_40##4arg 2.13s 469.60m array_concat_VARCHAR_5##2arg 158.09ms 6.33 array_concat_VARCHAR_5##3arg 212.99ms 4.70 array_concat_VARCHAR_5##4arg 287.64ms 3.48 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 230.01ms 4.35 array_concat_BOOLEAN_10##3arg 303.88ms 3.29 array_concat_BOOLEAN_10##4arg 463.32ms 2.16 array_concat_BOOLEAN_20##2arg 547.77ms 1.83 array_concat_BOOLEAN_20##3arg 832.71ms 1.20 array_concat_BOOLEAN_20##4arg 1.10s 
912.79m array_concat_BOOLEAN_40##2arg 991.02ms 1.01 array_concat_BOOLEAN_40##3arg 1.48s 675.74m array_concat_BOOLEAN_40##4arg 1.96s 510.45m array_concat_BOOLEAN_5##2arg 178.92ms 5.59 array_concat_BOOLEAN_5##3arg 265.29ms 3.77 array_concat_BOOLEAN_5##4arg 350.31ms 2.85 array_concat_INTEGER_10##2arg 111.54ms 8.97 array_concat_INTEGER_10##3arg 151.91ms 6.58 array_concat_INTEGER_10##4arg 209.28ms 4.78 array_concat_INTEGER_20##2arg 150.28ms 6.65 array_concat_INTEGER_20##3arg 269.52ms 3.71 array_concat_INTEGER_20##4arg 337.27ms 2.97 array_concat_INTEGER_40##2arg 213.27ms 4.69 array_concat_INTEGER_40##3arg 266.57ms 3.75 array_concat_INTEGER_40##4arg 483.33ms 2.07 array_concat_INTEGER_5##2arg 115.68ms 8.64 array_concat_INTEGER_5##3arg 168.24ms 5.94 array_concat_INTEGER_5##4arg 219.13ms 4.56 array_concat_VARCHAR_10##2arg 357.53ms 2.80 array_concat_VARCHAR_10##3arg 459.15ms 2.18 array_concat_VARCHAR_10##4arg 579.91ms 1.72 array_concat_VARCHAR_20##2arg 628.27ms 1.59 array_concat_VARCHAR_20##3arg 802.48ms 1.25 array_concat_VARCHAR_20##4arg 1.06s 947.41m array_concat_VARCHAR_40##2arg 930.88ms 1.07 array_concat_VARCHAR_40##3arg 1.46s 683.85m array_concat_VARCHAR_40##4arg 1.92s 520.41m array_concat_VARCHAR_5##2arg 161.55ms 6.19 array_concat_VARCHAR_5##3arg 214.94ms 4.65 array_concat_VARCHAR_5##4arg 280.15ms 3.57 ``` Differential Revision: D52380460
…mitive types. (facebookincubator#8194) Summary: add_items append elements from an array view to array writer. when the input array is ArrayView<Generic> and output is ArrayWriter<Generic> its is expensive and If a function is using add_items() then to avoid the cost authors register fast paths for primitives see (facebookincubator#7393) we can optimize add_items() and avoid that authoring overhead, right now its slow because it does a type check per element when copying it. We can optimize the cast cost and avoid to do it for each element in the array (since they are all of the same type) and instead do it before we start the copy and have a fast path for when the elements are of a pritmive type. when the elements are not primitive the cost of checking the type s amortized by the cost of the copying the complex elements. with this diff, the function array_concat performance with generic implementation is very close to the one with registration for primitive fast paths. up to 5X faster than before ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 247.69ms 4.04 array_concat_BOOLEAN_10##3arg 328.93ms 3.04 
array_concat_BOOLEAN_10##4arg 471.79ms 2.12 array_concat_BOOLEAN_20##2arg 591.39ms 1.69 array_concat_BOOLEAN_20##3arg 889.65ms 1.12 array_concat_BOOLEAN_20##4arg 1.13s 885.38m array_concat_BOOLEAN_40##2arg 1.11s 902.02m array_concat_BOOLEAN_40##3arg 1.63s 614.28m array_concat_BOOLEAN_40##4arg 2.06s 486.07m array_concat_BOOLEAN_5##2arg 178.77ms 5.59 array_concat_BOOLEAN_5##3arg 262.85ms 3.80 array_concat_BOOLEAN_5##4arg 358.08ms 2.79 array_concat_INTEGER_10##2arg 84.65ms 11.81 array_concat_INTEGER_10##3arg 116.97ms 8.55 array_concat_INTEGER_10##4arg 159.98ms 6.25 array_concat_INTEGER_20##2arg 145.19ms 6.89 array_concat_INTEGER_20##3arg 249.84ms 4.00 array_concat_INTEGER_20##4arg 298.28ms 3.35 array_concat_INTEGER_40##2arg 202.66ms 4.93 array_concat_INTEGER_40##3arg 249.71ms 4.00 array_concat_INTEGER_40##4arg 462.83ms 2.16 array_concat_INTEGER_5##2arg 86.36ms 11.58 array_concat_INTEGER_5##3arg 128.82ms 7.76 array_concat_INTEGER_5##4arg 165.59ms 6.04 array_concat_VARCHAR_10##2arg 388.89ms 2.57 array_concat_VARCHAR_10##3arg 495.35ms 2.02 array_concat_VARCHAR_10##4arg 626.90ms 1.60 array_concat_VARCHAR_20##2arg 671.03ms 1.49 array_concat_VARCHAR_20##3arg 870.87ms 1.15 array_concat_VARCHAR_20##4arg 1.13s 888.08m array_concat_VARCHAR_40##2arg 1.03s 967.24m array_concat_VARCHAR_40##3arg 1.63s 613.68m array_concat_VARCHAR_40##4arg 2.13s 469.60m array_concat_VARCHAR_5##2arg 158.09ms 6.33 array_concat_VARCHAR_5##3arg 212.99ms 4.70 array_concat_VARCHAR_5##4arg 287.64ms 3.48 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 230.01ms 4.35 array_concat_BOOLEAN_10##3arg 303.88ms 3.29 array_concat_BOOLEAN_10##4arg 463.32ms 2.16 array_concat_BOOLEAN_20##2arg 547.77ms 1.83 array_concat_BOOLEAN_20##3arg 832.71ms 1.20 array_concat_BOOLEAN_20##4arg 1.10s 
912.79m array_concat_BOOLEAN_40##2arg 991.02ms 1.01 array_concat_BOOLEAN_40##3arg 1.48s 675.74m array_concat_BOOLEAN_40##4arg 1.96s 510.45m array_concat_BOOLEAN_5##2arg 178.92ms 5.59 array_concat_BOOLEAN_5##3arg 265.29ms 3.77 array_concat_BOOLEAN_5##4arg 350.31ms 2.85 array_concat_INTEGER_10##2arg 111.54ms 8.97 array_concat_INTEGER_10##3arg 151.91ms 6.58 array_concat_INTEGER_10##4arg 209.28ms 4.78 array_concat_INTEGER_20##2arg 150.28ms 6.65 array_concat_INTEGER_20##3arg 269.52ms 3.71 array_concat_INTEGER_20##4arg 337.27ms 2.97 array_concat_INTEGER_40##2arg 213.27ms 4.69 array_concat_INTEGER_40##3arg 266.57ms 3.75 array_concat_INTEGER_40##4arg 483.33ms 2.07 array_concat_INTEGER_5##2arg 115.68ms 8.64 array_concat_INTEGER_5##3arg 168.24ms 5.94 array_concat_INTEGER_5##4arg 219.13ms 4.56 array_concat_VARCHAR_10##2arg 357.53ms 2.80 array_concat_VARCHAR_10##3arg 459.15ms 2.18 array_concat_VARCHAR_10##4arg 579.91ms 1.72 array_concat_VARCHAR_20##2arg 628.27ms 1.59 array_concat_VARCHAR_20##3arg 802.48ms 1.25 array_concat_VARCHAR_20##4arg 1.06s 947.41m array_concat_VARCHAR_40##2arg 930.88ms 1.07 array_concat_VARCHAR_40##3arg 1.46s 683.85m array_concat_VARCHAR_40##4arg 1.92s 520.41m array_concat_VARCHAR_5##2arg 161.55ms 6.19 array_concat_VARCHAR_5##3arg 214.94ms 4.65 array_concat_VARCHAR_5##4arg 280.15ms 3.57 ``` Differential Revision: D52380460
…mitive types. (facebookincubator#8194) Summary: add_items append elements from an array view to array writer. when the input array is ArrayView<Generic> and output is ArrayWriter<Generic> its is expensive and If a function is using add_items() then to avoid the cost authors register fast paths for primitives see (facebookincubator#7393) we can optimize add_items() and avoid that authoring overhead, right now its slow because it does a type check per element when copying it. We can optimize the cast cost and avoid to do it for each element in the array (since they are all of the same type) and instead do it before we start the copy and have a fast path for when the elements are of a pritmive type. when the elements are not primitive the cost of checking the type s amortized by the cost of the copying the complex elements. with this diff, the function array_concat performance with generic implementation is very close to the one with registration for primitive fast paths. up to 5X faster than before ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 247.69ms 4.04 array_concat_BOOLEAN_10##3arg 328.93ms 3.04 
array_concat_BOOLEAN_10##4arg 471.79ms 2.12 array_concat_BOOLEAN_20##2arg 591.39ms 1.69 array_concat_BOOLEAN_20##3arg 889.65ms 1.12 array_concat_BOOLEAN_20##4arg 1.13s 885.38m array_concat_BOOLEAN_40##2arg 1.11s 902.02m array_concat_BOOLEAN_40##3arg 1.63s 614.28m array_concat_BOOLEAN_40##4arg 2.06s 486.07m array_concat_BOOLEAN_5##2arg 178.77ms 5.59 array_concat_BOOLEAN_5##3arg 262.85ms 3.80 array_concat_BOOLEAN_5##4arg 358.08ms 2.79 array_concat_INTEGER_10##2arg 84.65ms 11.81 array_concat_INTEGER_10##3arg 116.97ms 8.55 array_concat_INTEGER_10##4arg 159.98ms 6.25 array_concat_INTEGER_20##2arg 145.19ms 6.89 array_concat_INTEGER_20##3arg 249.84ms 4.00 array_concat_INTEGER_20##4arg 298.28ms 3.35 array_concat_INTEGER_40##2arg 202.66ms 4.93 array_concat_INTEGER_40##3arg 249.71ms 4.00 array_concat_INTEGER_40##4arg 462.83ms 2.16 array_concat_INTEGER_5##2arg 86.36ms 11.58 array_concat_INTEGER_5##3arg 128.82ms 7.76 array_concat_INTEGER_5##4arg 165.59ms 6.04 array_concat_VARCHAR_10##2arg 388.89ms 2.57 array_concat_VARCHAR_10##3arg 495.35ms 2.02 array_concat_VARCHAR_10##4arg 626.90ms 1.60 array_concat_VARCHAR_20##2arg 671.03ms 1.49 array_concat_VARCHAR_20##3arg 870.87ms 1.15 array_concat_VARCHAR_20##4arg 1.13s 888.08m array_concat_VARCHAR_40##2arg 1.03s 967.24m array_concat_VARCHAR_40##3arg 1.63s 613.68m array_concat_VARCHAR_40##4arg 2.13s 469.60m array_concat_VARCHAR_5##2arg 158.09ms 6.33 array_concat_VARCHAR_5##3arg 212.99ms 4.70 array_concat_VARCHAR_5##4arg 287.64ms 3.48 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 230.01ms 4.35 array_concat_BOOLEAN_10##3arg 303.88ms 3.29 array_concat_BOOLEAN_10##4arg 463.32ms 2.16 array_concat_BOOLEAN_20##2arg 547.77ms 1.83 array_concat_BOOLEAN_20##3arg 832.71ms 1.20 array_concat_BOOLEAN_20##4arg 1.10s 
912.79m array_concat_BOOLEAN_40##2arg 991.02ms 1.01 array_concat_BOOLEAN_40##3arg 1.48s 675.74m array_concat_BOOLEAN_40##4arg 1.96s 510.45m array_concat_BOOLEAN_5##2arg 178.92ms 5.59 array_concat_BOOLEAN_5##3arg 265.29ms 3.77 array_concat_BOOLEAN_5##4arg 350.31ms 2.85 array_concat_INTEGER_10##2arg 111.54ms 8.97 array_concat_INTEGER_10##3arg 151.91ms 6.58 array_concat_INTEGER_10##4arg 209.28ms 4.78 array_concat_INTEGER_20##2arg 150.28ms 6.65 array_concat_INTEGER_20##3arg 269.52ms 3.71 array_concat_INTEGER_20##4arg 337.27ms 2.97 array_concat_INTEGER_40##2arg 213.27ms 4.69 array_concat_INTEGER_40##3arg 266.57ms 3.75 array_concat_INTEGER_40##4arg 483.33ms 2.07 array_concat_INTEGER_5##2arg 115.68ms 8.64 array_concat_INTEGER_5##3arg 168.24ms 5.94 array_concat_INTEGER_5##4arg 219.13ms 4.56 array_concat_VARCHAR_10##2arg 357.53ms 2.80 array_concat_VARCHAR_10##3arg 459.15ms 2.18 array_concat_VARCHAR_10##4arg 579.91ms 1.72 array_concat_VARCHAR_20##2arg 628.27ms 1.59 array_concat_VARCHAR_20##3arg 802.48ms 1.25 array_concat_VARCHAR_20##4arg 1.06s 947.41m array_concat_VARCHAR_40##2arg 930.88ms 1.07 array_concat_VARCHAR_40##3arg 1.46s 683.85m array_concat_VARCHAR_40##4arg 1.92s 520.41m array_concat_VARCHAR_5##2arg 161.55ms 6.19 array_concat_VARCHAR_5##3arg 214.94ms 4.65 array_concat_VARCHAR_5##4arg 280.15ms 3.57 ``` Differential Revision: D52380460
…mitive types. (facebookincubator#8194) Summary: add_items append elements from an array view to array writer. when the input array is ArrayView<Generic> and output is ArrayWriter<Generic> its is expensive and If a function is using add_items() then to avoid the cost authors register fast paths for primitives see (facebookincubator#7393) we can optimize add_items() and avoid that authoring overhead, right now its slow because it does a type check per element when copying it. We can optimize the cast cost and avoid to do it for each element in the array (since they are all of the same type) and instead do it before we start the copy and have a fast path for when the elements are of a pritmive type. when the elements are not primitive the cost of checking the type s amortized by the cost of the copying the complex elements. with this diff, the function array_concat performance with generic implementation is very close to the one with registration for primitive fast paths. up to 5X faster than before ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 247.69ms 4.04 array_concat_BOOLEAN_10##3arg 328.93ms 3.04 
array_concat_BOOLEAN_10##4arg 471.79ms 2.12 array_concat_BOOLEAN_20##2arg 591.39ms 1.69 array_concat_BOOLEAN_20##3arg 889.65ms 1.12 array_concat_BOOLEAN_20##4arg 1.13s 885.38m array_concat_BOOLEAN_40##2arg 1.11s 902.02m array_concat_BOOLEAN_40##3arg 1.63s 614.28m array_concat_BOOLEAN_40##4arg 2.06s 486.07m array_concat_BOOLEAN_5##2arg 178.77ms 5.59 array_concat_BOOLEAN_5##3arg 262.85ms 3.80 array_concat_BOOLEAN_5##4arg 358.08ms 2.79 array_concat_INTEGER_10##2arg 84.65ms 11.81 array_concat_INTEGER_10##3arg 116.97ms 8.55 array_concat_INTEGER_10##4arg 159.98ms 6.25 array_concat_INTEGER_20##2arg 145.19ms 6.89 array_concat_INTEGER_20##3arg 249.84ms 4.00 array_concat_INTEGER_20##4arg 298.28ms 3.35 array_concat_INTEGER_40##2arg 202.66ms 4.93 array_concat_INTEGER_40##3arg 249.71ms 4.00 array_concat_INTEGER_40##4arg 462.83ms 2.16 array_concat_INTEGER_5##2arg 86.36ms 11.58 array_concat_INTEGER_5##3arg 128.82ms 7.76 array_concat_INTEGER_5##4arg 165.59ms 6.04 array_concat_VARCHAR_10##2arg 388.89ms 2.57 array_concat_VARCHAR_10##3arg 495.35ms 2.02 array_concat_VARCHAR_10##4arg 626.90ms 1.60 array_concat_VARCHAR_20##2arg 671.03ms 1.49 array_concat_VARCHAR_20##3arg 870.87ms 1.15 array_concat_VARCHAR_20##4arg 1.13s 888.08m array_concat_VARCHAR_40##2arg 1.03s 967.24m array_concat_VARCHAR_40##3arg 1.63s 613.68m array_concat_VARCHAR_40##4arg 2.13s 469.60m array_concat_VARCHAR_5##2arg 158.09ms 6.33 array_concat_VARCHAR_5##3arg 212.99ms 4.70 array_concat_VARCHAR_5##4arg 287.64ms 3.48 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 230.01ms 4.35 array_concat_BOOLEAN_10##3arg 303.88ms 3.29 array_concat_BOOLEAN_10##4arg 463.32ms 2.16 array_concat_BOOLEAN_20##2arg 547.77ms 1.83 array_concat_BOOLEAN_20##3arg 832.71ms 1.20 array_concat_BOOLEAN_20##4arg 1.10s 
912.79m array_concat_BOOLEAN_40##2arg 991.02ms 1.01 array_concat_BOOLEAN_40##3arg 1.48s 675.74m array_concat_BOOLEAN_40##4arg 1.96s 510.45m array_concat_BOOLEAN_5##2arg 178.92ms 5.59 array_concat_BOOLEAN_5##3arg 265.29ms 3.77 array_concat_BOOLEAN_5##4arg 350.31ms 2.85 array_concat_INTEGER_10##2arg 111.54ms 8.97 array_concat_INTEGER_10##3arg 151.91ms 6.58 array_concat_INTEGER_10##4arg 209.28ms 4.78 array_concat_INTEGER_20##2arg 150.28ms 6.65 array_concat_INTEGER_20##3arg 269.52ms 3.71 array_concat_INTEGER_20##4arg 337.27ms 2.97 array_concat_INTEGER_40##2arg 213.27ms 4.69 array_concat_INTEGER_40##3arg 266.57ms 3.75 array_concat_INTEGER_40##4arg 483.33ms 2.07 array_concat_INTEGER_5##2arg 115.68ms 8.64 array_concat_INTEGER_5##3arg 168.24ms 5.94 array_concat_INTEGER_5##4arg 219.13ms 4.56 array_concat_VARCHAR_10##2arg 357.53ms 2.80 array_concat_VARCHAR_10##3arg 459.15ms 2.18 array_concat_VARCHAR_10##4arg 579.91ms 1.72 array_concat_VARCHAR_20##2arg 628.27ms 1.59 array_concat_VARCHAR_20##3arg 802.48ms 1.25 array_concat_VARCHAR_20##4arg 1.06s 947.41m array_concat_VARCHAR_40##2arg 930.88ms 1.07 array_concat_VARCHAR_40##3arg 1.46s 683.85m array_concat_VARCHAR_40##4arg 1.92s 520.41m array_concat_VARCHAR_5##2arg 161.55ms 6.19 array_concat_VARCHAR_5##3arg 214.94ms 4.65 array_concat_VARCHAR_5##4arg 280.15ms 3.57 ``` Differential Revision: D52380460
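The idea in the summary above, checking the element type once per array instead of once per element, can be sketched roughly as follows. This is only an illustration under stated assumptions: `Kind`, `Column`, `addItemsPerElementCheck` and `addItemsHoistedCheck` are hypothetical stand-ins and are not Velox types or the actual `add_items()` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for a type-erased ("generic") column of elements.
enum class Kind { kInteger, kVarchar };

struct Column {
  Kind kind;
  std::vector<int32_t> ints;         // used when kind == kInteger
  std::vector<std::string> strings;  // used when kind == kVarchar
};

// Slow shape: the element kind is re-examined for every copied element.
void addItemsPerElementCheck(
    Column& out, const Column& in, std::size_t offset, std::size_t size) {
  assert(out.kind == in.kind);
  for (std::size_t i = offset; i < offset + size; ++i) {
    switch (in.kind) {
      case Kind::kInteger:
        out.ints.push_back(in.ints[i]);
        break;
      case Kind::kVarchar:
        out.strings.push_back(in.strings[i]);
        break;
    }
  }
}

// Faster shape: all elements share one kind, so check it once up front.
void addItemsHoistedCheck(
    Column& out, const Column& in, std::size_t offset, std::size_t size) {
  assert(out.kind == in.kind);
  if (in.kind == Kind::kInteger) {
    // Primitive fast path: one bulk append, no per-element dispatch.
    auto first = in.ints.begin() + static_cast<std::ptrdiff_t>(offset);
    out.ints.insert(
        out.ints.end(), first, first + static_cast<std::ptrdiff_t>(size));
    return;
  }
  // Non-primitive path: the single check is cheap next to the string copies.
  for (std::size_t i = offset; i < offset + size; ++i) {
    out.strings.push_back(in.strings[i]);
  }
}
```

Both functions produce the same output; the second only moves the type decision out of the copy loop, which is the part the summary identifies as the per-element cost.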
…mitive types. (facebookincubator#8194) Summary: add_items() appends elements from an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that use add_items() register fast paths for primitives to avoid the cost (see facebookincubator#7393). We can optimize add_items() itself and remove that authoring overhead. Right now it is slow because it does a type check for every element it copies. Since all elements of an array have the same type, we can do the check (and the cast) once before the copy starts, and take a fast path when the elements are of a primitive type. When the elements are not primitive, the cost of checking the type is amortized by the cost of copying the complex elements. With this diff, the performance of array_concat with the generic implementation is very close to the one with registered primitive fast paths, and up to 5X faster than before. ## Array concat benchmark. 
generic before ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 567.29ms 1.76 array_concat_BOOLEAN_10##3arg 848.30ms 1.18 array_concat_BOOLEAN_10##4arg 1.20s 835.32m array_concat_BOOLEAN_20##2arg 1.24s 804.59m array_concat_BOOLEAN_20##3arg 1.83s 545.78m array_concat_BOOLEAN_20##4arg 2.43s 411.28m array_concat_BOOLEAN_40##2arg 2.42s 413.40m array_concat_BOOLEAN_40##3arg 3.45s 290.10m array_concat_BOOLEAN_40##4arg 4.72s 211.95m array_concat_BOOLEAN_5##2arg 326.58ms 3.06 array_concat_BOOLEAN_5##3arg 500.23ms 2.00 array_concat_BOOLEAN_5##4arg 647.58ms 1.54 array_concat_INTEGER_10##2arg 451.38ms 2.22 array_concat_INTEGER_10##3arg 676.54ms 1.48 array_concat_INTEGER_10##4arg 907.98ms 1.10 array_concat_INTEGER_20##2arg 903.66ms 1.11 array_concat_INTEGER_20##3arg 1.46s 685.90m array_concat_INTEGER_20##4arg 1.90s 525.07m array_concat_INTEGER_40##2arg 1.83s 547.40m array_concat_INTEGER_40##3arg 2.63s 379.91m array_concat_INTEGER_40##4arg 3.65s 274.16m array_concat_INTEGER_5##2arg 243.12ms 4.11 array_concat_INTEGER_5##3arg 381.92ms 2.62 array_concat_INTEGER_5##4arg 502.78ms 1.99 array_concat_VARCHAR_10##2arg 1.26s 792.79m array_concat_VARCHAR_10##3arg 1.73s 579.50m array_concat_VARCHAR_10##4arg 2.21s 452.26m array_concat_VARCHAR_20##2arg 3.23s 309.67m array_concat_VARCHAR_20##3arg 4.08s 244.99m array_concat_VARCHAR_20##4arg 5.09s 196.40m array_concat_VARCHAR_40##2arg 5.49s 182.17m array_concat_VARCHAR_40##3arg 9.23s 108.36m ``` generic after ``` BUILD SUCCEEDED ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 195.54ms 5.11 array_concat_BOOLEAN_10##3arg 265.57ms 3.77 
array_concat_BOOLEAN_10##4arg 397.59ms 2.52 array_concat_BOOLEAN_20##2arg 487.38ms 2.05 array_concat_BOOLEAN_20##3arg 758.45ms 1.32 array_concat_BOOLEAN_20##4arg 1.07s 930.67m array_concat_BOOLEAN_40##2arg 914.62ms 1.09 array_concat_BOOLEAN_40##3arg 1.36s 737.16m array_concat_BOOLEAN_40##4arg 1.72s 580.03m array_concat_BOOLEAN_5##2arg 149.76ms 6.68 array_concat_BOOLEAN_5##3arg 234.81ms 4.26 array_concat_BOOLEAN_5##4arg 300.58ms 3.33 array_concat_INTEGER_10##2arg 70.89ms 14.11 array_concat_INTEGER_10##3arg 95.07ms 10.52 array_concat_INTEGER_10##4arg 124.94ms 8.00 array_concat_INTEGER_20##2arg 102.19ms 9.79 array_concat_INTEGER_20##3arg 155.30ms 6.44 array_concat_INTEGER_20##4arg 187.59ms 5.33 array_concat_INTEGER_40##2arg 122.93ms 8.13 array_concat_INTEGER_40##3arg 153.85ms 6.50 array_concat_INTEGER_40##4arg 322.33ms 3.10 array_concat_INTEGER_5##2arg 70.71ms 14.14 array_concat_INTEGER_5##3arg 100.96ms 9.90 array_concat_INTEGER_5##4arg 124.78ms 8.01 array_concat_VARCHAR_10##2arg 239.86ms 4.17 array_concat_VARCHAR_10##3arg 313.51ms 3.19 array_concat_VARCHAR_10##4arg 418.63ms 2.39 array_concat_VARCHAR_20##2arg 492.72ms 2.03 array_concat_VARCHAR_20##3arg 645.26ms 1.55 array_concat_VARCHAR_20##4arg 872.10ms 1.15 array_concat_VARCHAR_40##2arg 737.43ms 1.36 array_concat_VARCHAR_40##3arg 1.19s 843.70m array_concat_VARCHAR_40##4arg 1.52s 658.16m array_concat_VARCHAR_5##2arg 111.10ms 9.00 array_concat_VARCHAR_5##3arg 148.33ms 6.74 array_concat_VARCHAR_5##4arg 193.35ms 5.17 ``` primitive fast path ``` ============================================================================ [...]hmarks/ExpressionBenchmarkBuilder.cpp relative time/iter iters/s ============================================================================ array_concat_BOOLEAN_10##2arg 178.21ms 5.61 array_concat_BOOLEAN_10##3arg 233.11ms 4.29 array_concat_BOOLEAN_10##4arg 363.77ms 2.75 array_concat_BOOLEAN_20##2arg 456.42ms 2.19 array_concat_BOOLEAN_20##3arg 712.48ms 1.40 array_concat_BOOLEAN_20##4arg 927.58ms 
1.08 array_concat_BOOLEAN_40##2arg 873.87ms 1.14 array_concat_BOOLEAN_40##3arg 1.35s 742.65m array_concat_BOOLEAN_40##4arg 1.66s 602.28m array_concat_BOOLEAN_5##2arg 141.29ms 7.08 array_concat_BOOLEAN_5##3arg 224.04ms 4.46 array_concat_BOOLEAN_5##4arg 290.93ms 3.44 array_concat_INTEGER_10##2arg 58.67ms 17.05 array_concat_INTEGER_10##3arg 80.23ms 12.46 array_concat_INTEGER_10##4arg 107.38ms 9.31 array_concat_INTEGER_20##2arg 90.53ms 11.05 array_concat_INTEGER_20##3arg 146.84ms 6.81 array_concat_INTEGER_20##4arg 174.97ms 5.72 array_concat_INTEGER_40##2arg 113.06ms 8.85 array_concat_INTEGER_40##3arg 144.51ms 6.92 array_concat_INTEGER_40##4arg 317.69ms 3.15 array_concat_INTEGER_5##2arg 60.72ms 16.47 array_concat_INTEGER_5##3arg 86.76ms 11.53 array_concat_INTEGER_5##4arg 104.10ms 9.61 array_concat_VARCHAR_10##2arg 226.63ms 4.41 array_concat_VARCHAR_10##3arg 304.74ms 3.28 array_concat_VARCHAR_10##4arg 393.14ms 2.54 array_concat_VARCHAR_20##2arg 467.90ms 2.14 array_concat_VARCHAR_20##3arg 624.86ms 1.60 array_concat_VARCHAR_20##4arg 833.13ms 1.20 array_concat_VARCHAR_40##2arg 703.85ms 1.42 array_concat_VARCHAR_40##3arg 1.20s 834.57m array_concat_VARCHAR_40##4arg 1.58s 634.88m array_concat_VARCHAR_5##2arg 104.95ms 9.53 array_concat_VARCHAR_5##3arg 138.85ms 7.20 array_concat_VARCHAR_5##4arg 178.57ms 5.60 ``` Differential Revision: D52380460
…mitive types. (facebookincubator#8194) Summary: add_items() appends elements from an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that use add_items() register fast paths for primitives to avoid the cost (see facebookincubator#7393). We can optimize add_items() itself and remove that authoring overhead. It is slow today because it performs a type check for every element it copies. Since all elements of an array have the same type, the check (and the cast) can be done once before the copy starts, with a fast path for when the elements are of a primitive type. When the elements are not primitive, the cost of checking the type is amortized by the cost of copying the complex elements. With this diff, array_concat with the generic implementation performs very close to the version that registers primitive fast paths, and is up to 5X faster than before.

## Array concat benchmark.
generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                       567.29ms      1.76
array_concat_BOOLEAN_10##3arg                       848.30ms      1.18
array_concat_BOOLEAN_10##4arg                          1.20s   835.32m
array_concat_BOOLEAN_20##2arg                          1.24s   804.59m
array_concat_BOOLEAN_20##3arg                          1.83s   545.78m
array_concat_BOOLEAN_20##4arg                          2.43s   411.28m
array_concat_BOOLEAN_40##2arg                          2.42s   413.40m
array_concat_BOOLEAN_40##3arg                          3.45s   290.10m
array_concat_BOOLEAN_40##4arg                          4.72s   211.95m
array_concat_BOOLEAN_5##2arg                        326.58ms      3.06
array_concat_BOOLEAN_5##3arg                        500.23ms      2.00
array_concat_BOOLEAN_5##4arg                        647.58ms      1.54
array_concat_INTEGER_10##2arg                       451.38ms      2.22
array_concat_INTEGER_10##3arg                       676.54ms      1.48
array_concat_INTEGER_10##4arg                       907.98ms      1.10
array_concat_INTEGER_20##2arg                       903.66ms      1.11
array_concat_INTEGER_20##3arg                          1.46s   685.90m
array_concat_INTEGER_20##4arg                          1.90s   525.07m
array_concat_INTEGER_40##2arg                          1.83s   547.40m
array_concat_INTEGER_40##3arg                          2.63s   379.91m
array_concat_INTEGER_40##4arg                          3.65s   274.16m
array_concat_INTEGER_5##2arg                        243.12ms      4.11
array_concat_INTEGER_5##3arg                        381.92ms      2.62
array_concat_INTEGER_5##4arg                        502.78ms      1.99
array_concat_VARCHAR_10##2arg                          1.26s   792.79m
array_concat_VARCHAR_10##3arg                          1.73s   579.50m
array_concat_VARCHAR_10##4arg                          2.21s   452.26m
array_concat_VARCHAR_20##2arg                          3.23s   309.67m
array_concat_VARCHAR_20##3arg                          4.08s   244.99m
array_concat_VARCHAR_20##4arg                          5.09s   196.40m
array_concat_VARCHAR_40##2arg                          5.49s   182.17m
array_concat_VARCHAR_40##3arg                          9.23s   108.36m
```
generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                       195.54ms      5.11
array_concat_BOOLEAN_10##3arg                       265.57ms      3.77
array_concat_BOOLEAN_10##4arg                       397.59ms      2.52
array_concat_BOOLEAN_20##2arg                       487.38ms      2.05
array_concat_BOOLEAN_20##3arg                       758.45ms      1.32
array_concat_BOOLEAN_20##4arg                          1.07s   930.67m
array_concat_BOOLEAN_40##2arg                       914.62ms      1.09
array_concat_BOOLEAN_40##3arg                          1.36s   737.16m
array_concat_BOOLEAN_40##4arg                          1.72s   580.03m
array_concat_BOOLEAN_5##2arg                        149.76ms      6.68
array_concat_BOOLEAN_5##3arg                        234.81ms      4.26
array_concat_BOOLEAN_5##4arg                        300.58ms      3.33
array_concat_INTEGER_10##2arg                        70.89ms     14.11
array_concat_INTEGER_10##3arg                        95.07ms     10.52
array_concat_INTEGER_10##4arg                       124.94ms      8.00
array_concat_INTEGER_20##2arg                       102.19ms      9.79
array_concat_INTEGER_20##3arg                       155.30ms      6.44
array_concat_INTEGER_20##4arg                       187.59ms      5.33
array_concat_INTEGER_40##2arg                       122.93ms      8.13
array_concat_INTEGER_40##3arg                       153.85ms      6.50
array_concat_INTEGER_40##4arg                       322.33ms      3.10
array_concat_INTEGER_5##2arg                         70.71ms     14.14
array_concat_INTEGER_5##3arg                        100.96ms      9.90
array_concat_INTEGER_5##4arg                        124.78ms      8.01
array_concat_VARCHAR_10##2arg                       239.86ms      4.17
array_concat_VARCHAR_10##3arg                       313.51ms      3.19
array_concat_VARCHAR_10##4arg                       418.63ms      2.39
array_concat_VARCHAR_20##2arg                       492.72ms      2.03
array_concat_VARCHAR_20##3arg                       645.26ms      1.55
array_concat_VARCHAR_20##4arg                       872.10ms      1.15
array_concat_VARCHAR_40##2arg                       737.43ms      1.36
array_concat_VARCHAR_40##3arg                          1.19s   843.70m
array_concat_VARCHAR_40##4arg                          1.52s   658.16m
array_concat_VARCHAR_5##2arg                        111.10ms      9.00
array_concat_VARCHAR_5##3arg                        148.33ms      6.74
array_concat_VARCHAR_5##4arg                        193.35ms      5.17
```
primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                       178.21ms      5.61
array_concat_BOOLEAN_10##3arg                       233.11ms      4.29
array_concat_BOOLEAN_10##4arg                       363.77ms      2.75
array_concat_BOOLEAN_20##2arg                       456.42ms      2.19
array_concat_BOOLEAN_20##3arg                       712.48ms      1.40
array_concat_BOOLEAN_20##4arg                       927.58ms      1.08
array_concat_BOOLEAN_40##2arg                       873.87ms      1.14
array_concat_BOOLEAN_40##3arg                          1.35s   742.65m
array_concat_BOOLEAN_40##4arg                          1.66s   602.28m
array_concat_BOOLEAN_5##2arg                        141.29ms      7.08
array_concat_BOOLEAN_5##3arg                        224.04ms      4.46
array_concat_BOOLEAN_5##4arg                        290.93ms      3.44
array_concat_INTEGER_10##2arg                        58.67ms     17.05
array_concat_INTEGER_10##3arg                        80.23ms     12.46
array_concat_INTEGER_10##4arg                       107.38ms      9.31
array_concat_INTEGER_20##2arg                        90.53ms     11.05
array_concat_INTEGER_20##3arg                       146.84ms      6.81
array_concat_INTEGER_20##4arg                       174.97ms      5.72
array_concat_INTEGER_40##2arg                       113.06ms      8.85
array_concat_INTEGER_40##3arg                       144.51ms      6.92
array_concat_INTEGER_40##4arg                       317.69ms      3.15
array_concat_INTEGER_5##2arg                         60.72ms     16.47
array_concat_INTEGER_5##3arg                         86.76ms     11.53
array_concat_INTEGER_5##4arg                        104.10ms      9.61
array_concat_VARCHAR_10##2arg                       226.63ms      4.41
array_concat_VARCHAR_10##3arg                       304.74ms      3.28
array_concat_VARCHAR_10##4arg                       393.14ms      2.54
array_concat_VARCHAR_20##2arg                       467.90ms      2.14
array_concat_VARCHAR_20##3arg                       624.86ms      1.60
array_concat_VARCHAR_20##4arg                       833.13ms      1.20
array_concat_VARCHAR_40##2arg                       703.85ms      1.42
array_concat_VARCHAR_40##3arg                          1.20s   834.57m
array_concat_VARCHAR_40##4arg                          1.58s   634.88m
array_concat_VARCHAR_5##2arg                        104.95ms      9.53
array_concat_VARCHAR_5##3arg                        138.85ms      7.20
array_concat_VARCHAR_5##4arg                        178.57ms      5.60
```
Differential Revision: D52380460
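The idea in the summary (check the element type once per array instead of once per element) can be illustrated with a small standalone sketch. This is not the Velox implementation: `Kind`, `ErasedArray`, `appendPerElementCheck`, `appendTyped` and `appendHoistedCheck` are hypothetical names invented for this example, and the output is simplified to a `std::vector<int64_t>` rather than an ArrayWriter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical runtime type tag (an assumption for this sketch, not a Velox type).
enum class Kind { kInt32, kInt64 };

// Hypothetical type-erased input array: a tag plus untyped storage.
struct ErasedArray {
  Kind kind;
  const void* data;
  std::size_t size;
};

// Slow shape: the type tag is inspected again for every element copied.
inline void appendPerElementCheck(
    const ErasedArray& in,
    std::vector<int64_t>& out) {
  for (std::size_t i = 0; i < in.size; ++i) {
    switch (in.kind) {
      case Kind::kInt32:
        out.push_back(static_cast<const int32_t*>(in.data)[i]);
        break;
      case Kind::kInt64:
        out.push_back(static_cast<const int64_t*>(in.data)[i]);
        break;
    }
  }
}

// Typed inner loop: no type dispatch inside the loop body.
template <typename T>
void appendTyped(
    const T* values,
    std::size_t size,
    std::vector<int64_t>& out) {
  out.reserve(out.size() + size);
  for (std::size_t i = 0; i < size; ++i) {
    out.push_back(values[i]);
  }
}

// Fast shape: all elements share one type, so dispatch once, then copy.
inline void appendHoistedCheck(
    const ErasedArray& in,
    std::vector<int64_t>& out) {
  switch (in.kind) {
    case Kind::kInt32:
      appendTyped(static_cast<const int32_t*>(in.data), in.size, out);
      return;
    case Kind::kInt64:
      appendTyped(static_cast<const int64_t*>(in.data), in.size, out);
      return;
  }
}
```

Both functions produce the same output; the second only moves the `switch` out of the loop, which is the kind of hoisting the summary describes for primitive element types.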
…mitive types. (facebookincubator#8194)

Summary: add_items() appends the elements of an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that use add_items() register fast paths for primitive types to avoid the cost (see facebookincubator#7393). We can optimize add_items() itself and remove that authoring overhead. It is slow today because it performs a type check for every element it copies. Since all elements of an array have the same type, the check (and the cast) can be done once before the copy starts, with a fast path for primitive element types. When the elements are not primitive, the cost of the type check is amortized by the cost of copying the complex elements.

With this diff, array_concat with the generic implementation performs very close to the version that registers primitive fast paths, and is up to 5x faster than before.

## Array concat benchmark.
generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17
```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Differential Revision: D52380460
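The difference between checking the element type per element and checking it once before the copy, as described in the summary above, can be sketched roughly as follows. This is only an illustration under stated assumptions: `TypeKind`, `ErasedArray`, `appendPerElementCheck`, and `appendHoistedCheck` are hypothetical names invented for this sketch, and none of them are Velox APIs or the actual add_items() implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical runtime type tag; not a Velox type.
enum class TypeKind { kBigint, kVarchar };

// Hypothetical type-erased array: the element type is known only at runtime,
// and every element of one array shares the same kind.
struct ErasedArray {
  TypeKind kind;
  std::vector<int64_t> bigints;       // populated when kind == kBigint
  std::vector<std::string> varchars;  // populated when kind == kVarchar

  std::size_t size() const {
    return kind == TypeKind::kBigint ? bigints.size() : varchars.size();
  }
};

// Slow shape: the type tag is inspected again for every element copied.
void appendPerElementCheck(const ErasedArray& in, ErasedArray& out) {
  assert(in.kind == out.kind);
  const std::size_t n = in.size();
  for (std::size_t i = 0; i < n; ++i) {
    switch (in.kind) {
      case TypeKind::kBigint:
        out.bigints.push_back(in.bigints[i]);
        break;
      case TypeKind::kVarchar:
        out.varchars.push_back(in.varchars[i]);
        break;
    }
  }
}

// Faster shape: inspect the tag once, then run a typed copy.
void appendHoistedCheck(const ErasedArray& in, ErasedArray& out) {
  assert(in.kind == out.kind);
  if (in.kind == TypeKind::kBigint) {
    // Primitive fast path: one bulk copy with no per-element dispatch.
    out.bigints.insert(out.bigints.end(), in.bigints.begin(), in.bigints.end());
    return;
  }
  // Non-primitive path: copying each string dominates the single check.
  for (const std::string& s : in.varchars) {
    out.varchars.push_back(s);
  }
}
```

Both functions produce the same output. In a toy like this a compiler may hoist the `switch` on its own, so the sketch shows the shape of the change rather than its measured effect.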
…mitive types. (#8194)

Summary: Pull Request resolved: #8194

Reviewed By: kevinwilfong

Differential Revision: D52380460

fbshipit-source-id: b92bae384de643ad5c6cd614c050cdd78637a5e6
…mitive types. (facebookincubator#8194)

Summary:
Pull Request resolved: facebookincubator#8194

add_items() appends the elements of an array view to an array writer. When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive, so authors of functions that call add_items() register fast paths for primitive types to avoid the cost (see facebookincubator#7393). Optimizing add_items() itself removes that authoring overhead.

add_items() is slow today because it performs a type check for every element it copies. All elements of an array have the same type, so the check can be done once, before the copy starts, with a fast path taken when the elements are of a primitive type. When the elements are not primitive, the cost of the type check is amortized by the cost of copying the complex elements.

With this diff, array_concat with the generic implementation performs very close to the version registered with primitive fast paths, and is up to 5X faster than before.

## Array concat benchmark.
generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg   567.29ms     1.76
array_concat_BOOLEAN_10##3arg   848.30ms     1.18
array_concat_BOOLEAN_10##4arg      1.20s  835.32m
array_concat_BOOLEAN_20##2arg      1.24s  804.59m
array_concat_BOOLEAN_20##3arg      1.83s  545.78m
array_concat_BOOLEAN_20##4arg      2.43s  411.28m
array_concat_BOOLEAN_40##2arg      2.42s  413.40m
array_concat_BOOLEAN_40##3arg      3.45s  290.10m
array_concat_BOOLEAN_40##4arg      4.72s  211.95m
array_concat_BOOLEAN_5##2arg    326.58ms     3.06
array_concat_BOOLEAN_5##3arg    500.23ms     2.00
array_concat_BOOLEAN_5##4arg    647.58ms     1.54
array_concat_INTEGER_10##2arg   451.38ms     2.22
array_concat_INTEGER_10##3arg   676.54ms     1.48
array_concat_INTEGER_10##4arg   907.98ms     1.10
array_concat_INTEGER_20##2arg   903.66ms     1.11
array_concat_INTEGER_20##3arg      1.46s  685.90m
array_concat_INTEGER_20##4arg      1.90s  525.07m
array_concat_INTEGER_40##2arg      1.83s  547.40m
array_concat_INTEGER_40##3arg      2.63s  379.91m
array_concat_INTEGER_40##4arg      3.65s  274.16m
array_concat_INTEGER_5##2arg    243.12ms     4.11
array_concat_INTEGER_5##3arg    381.92ms     2.62
array_concat_INTEGER_5##4arg    502.78ms     1.99
array_concat_VARCHAR_10##2arg      1.26s  792.79m
array_concat_VARCHAR_10##3arg      1.73s  579.50m
array_concat_VARCHAR_10##4arg      2.21s  452.26m
array_concat_VARCHAR_20##2arg      3.23s  309.67m
array_concat_VARCHAR_20##3arg      4.08s  244.99m
array_concat_VARCHAR_20##4arg      5.09s  196.40m
array_concat_VARCHAR_40##2arg      5.49s  182.17m
array_concat_VARCHAR_40##3arg      9.23s  108.36m
```
generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg   195.54ms     5.11
array_concat_BOOLEAN_10##3arg   265.57ms     3.77
array_concat_BOOLEAN_10##4arg   397.59ms     2.52
array_concat_BOOLEAN_20##2arg   487.38ms     2.05
array_concat_BOOLEAN_20##3arg   758.45ms     1.32
array_concat_BOOLEAN_20##4arg      1.07s  930.67m
array_concat_BOOLEAN_40##2arg   914.62ms     1.09
array_concat_BOOLEAN_40##3arg      1.36s  737.16m
array_concat_BOOLEAN_40##4arg      1.72s  580.03m
array_concat_BOOLEAN_5##2arg    149.76ms     6.68
array_concat_BOOLEAN_5##3arg    234.81ms     4.26
array_concat_BOOLEAN_5##4arg    300.58ms     3.33
array_concat_INTEGER_10##2arg    70.89ms    14.11
array_concat_INTEGER_10##3arg    95.07ms    10.52
array_concat_INTEGER_10##4arg   124.94ms     8.00
array_concat_INTEGER_20##2arg   102.19ms     9.79
array_concat_INTEGER_20##3arg   155.30ms     6.44
array_concat_INTEGER_20##4arg   187.59ms     5.33
array_concat_INTEGER_40##2arg   122.93ms     8.13
array_concat_INTEGER_40##3arg   153.85ms     6.50
array_concat_INTEGER_40##4arg   322.33ms     3.10
array_concat_INTEGER_5##2arg     70.71ms    14.14
array_concat_INTEGER_5##3arg    100.96ms     9.90
array_concat_INTEGER_5##4arg    124.78ms     8.01
array_concat_VARCHAR_10##2arg   239.86ms     4.17
array_concat_VARCHAR_10##3arg   313.51ms     3.19
array_concat_VARCHAR_10##4arg   418.63ms     2.39
array_concat_VARCHAR_20##2arg   492.72ms     2.03
array_concat_VARCHAR_20##3arg   645.26ms     1.55
array_concat_VARCHAR_20##4arg   872.10ms     1.15
array_concat_VARCHAR_40##2arg   737.43ms     1.36
array_concat_VARCHAR_40##3arg      1.19s  843.70m
array_concat_VARCHAR_40##4arg      1.52s  658.16m
array_concat_VARCHAR_5##2arg    111.10ms     9.00
array_concat_VARCHAR_5##3arg    148.33ms     6.74
array_concat_VARCHAR_5##4arg    193.35ms     5.17
```
primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg   178.21ms     5.61
array_concat_BOOLEAN_10##3arg   233.11ms     4.29
array_concat_BOOLEAN_10##4arg   363.77ms     2.75
array_concat_BOOLEAN_20##2arg   456.42ms     2.19
array_concat_BOOLEAN_20##3arg   712.48ms     1.40
array_concat_BOOLEAN_20##4arg   927.58ms     1.08
array_concat_BOOLEAN_40##2arg   873.87ms     1.14
array_concat_BOOLEAN_40##3arg      1.35s  742.65m
array_concat_BOOLEAN_40##4arg      1.66s  602.28m
array_concat_BOOLEAN_5##2arg    141.29ms     7.08
array_concat_BOOLEAN_5##3arg    224.04ms     4.46
array_concat_BOOLEAN_5##4arg    290.93ms     3.44
array_concat_INTEGER_10##2arg    58.67ms    17.05
array_concat_INTEGER_10##3arg    80.23ms    12.46
array_concat_INTEGER_10##4arg   107.38ms     9.31
array_concat_INTEGER_20##2arg    90.53ms    11.05
array_concat_INTEGER_20##3arg   146.84ms     6.81
array_concat_INTEGER_20##4arg   174.97ms     5.72
array_concat_INTEGER_40##2arg   113.06ms     8.85
array_concat_INTEGER_40##3arg   144.51ms     6.92
array_concat_INTEGER_40##4arg   317.69ms     3.15
array_concat_INTEGER_5##2arg     60.72ms    16.47
array_concat_INTEGER_5##3arg     86.76ms    11.53
array_concat_INTEGER_5##4arg    104.10ms     9.61
array_concat_VARCHAR_10##2arg   226.63ms     4.41
array_concat_VARCHAR_10##3arg   304.74ms     3.28
array_concat_VARCHAR_10##4arg   393.14ms     2.54
array_concat_VARCHAR_20##2arg   467.90ms     2.14
array_concat_VARCHAR_20##3arg   624.86ms     1.60
array_concat_VARCHAR_20##4arg   833.13ms     1.20
array_concat_VARCHAR_40##2arg   703.85ms     1.42
array_concat_VARCHAR_40##3arg      1.20s  834.57m
array_concat_VARCHAR_40##4arg      1.58s  634.88m
array_concat_VARCHAR_5##2arg    104.95ms     9.53
array_concat_VARCHAR_5##3arg    138.85ms     7.20
array_concat_VARCHAR_5##4arg    178.57ms     5.60
```
Reviewed By: kevinwilfong
Differential Revision: D52380460
fbshipit-source-id: b92bae384de643ad5c6cd614c050cdd78637a5e6
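The add_items() change summarized in facebookincubator#8194 (check the element type once per array instead of once per element) can be illustrated with a small standalone sketch. This is not Velox code: `Kind`, `GenericArray`, `addItemsPerElement` and `addItemsHoisted` are hypothetical names invented here, and the type-erased array is modeled with a plain tag plus two vectors purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a type-erased array; not a Velox type.
// One kind tag describes every element in the array.
enum class Kind { kInteger, kVarchar };

struct GenericArray {
  Kind kind;
  std::vector<int32_t> ints;         // used when kind == kInteger
  std::vector<std::string> strings;  // used when kind == kVarchar
};

// Per-element dispatch: the kind is re-checked for every element copied,
// which is the shape of the slow path described in the summary.
void addItemsPerElement(GenericArray& out, const GenericArray& in) {
  assert(out.kind == in.kind);
  const std::size_t n =
      in.kind == Kind::kInteger ? in.ints.size() : in.strings.size();
  for (std::size_t i = 0; i < n; ++i) {
    switch (in.kind) {
      case Kind::kInteger:
        out.ints.push_back(in.ints[i]);
        break;
      case Kind::kVarchar:
        out.strings.push_back(in.strings[i]);
        break;
    }
  }
}

// Hoisted dispatch: the kind is checked once, then the elements are
// copied in bulk. The primitive case becomes a tight copy.
void addItemsHoisted(GenericArray& out, const GenericArray& in) {
  assert(out.kind == in.kind);
  switch (in.kind) {
    case Kind::kInteger:
      out.ints.insert(out.ints.end(), in.ints.begin(), in.ints.end());
      break;
    case Kind::kVarchar:
      out.strings.insert(
          out.strings.end(), in.strings.begin(), in.strings.end());
      break;
  }
}
```

Both functions produce the same output; only where the type check happens differs. In a toy like this a compiler may hoist the check on its own, so the sketch shows the structure of the change rather than reproducing the measured speedup.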
Summary:
Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.
Note: we can further optimize this by adding a fast path for strings and
adding a no-copy version for that.
Note: several other functions still use add_items() without such a fast
path; we should optimize those as well.
Follow up will address the points above.
before:
after:
Differential Revision: D50948537