Add primitive fast path for ArrayConcatFunction #7393

Closed

Contributor

@laithsakka laithsakka commented Nov 2, 2023

Summary:
Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings,
along with a no-copy version of it.
Note: several other functions still use add_items() and do not have
such a fast path; we should optimize those as well.

A follow-up will address the points above.
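
To illustrate the general idea of a primitive fast path, here is a minimal, self-contained sketch. It is not the Velox implementation and does not use the Velox function API; `concatArrays` is a hypothetical helper over `std::vector`, and the choice of `std::is_arithmetic_v` as the "primitive" test is an assumption made for this example. Primitives are appended one whole input range at a time, while other types fall back to an element-by-element append.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a Velox API): concatenates several arrays into one.
template <typename T>
std::vector<T> concatArrays(const std::vector<std::vector<T>>& inputs) {
  // Size the output once so neither path reallocates while appending.
  std::size_t total = 0;
  for (const auto& in : inputs) {
    total += in.size();
  }
  std::vector<T> out;
  out.reserve(total);

  if constexpr (std::is_arithmetic_v<T>) {
    // Fast path (assumed definition of "primitive"): append each input
    // as a single range instead of handling one element at a time.
    for (const auto& in : inputs) {
      out.insert(out.end(), in.begin(), in.end());
    }
  } else {
    // Generic path: append element by element, as a per-item API would.
    for (const auto& in : inputs) {
      for (const auto& item : in) {
        out.push_back(item);
      }
    }
  }
  return out;
}
```

Both branches produce the same result; the sketch only shows where a type-based dispatch would sit, and says nothing about how much each path costs in Velox itself.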

before:

BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             723.07ms      1.38
array_concat_BOOLEAN_10##3arg                                1.14s   877.15m
array_concat_BOOLEAN_10##4arg                                1.57s   637.52m
array_concat_BOOLEAN_20##2arg                             775.91ms      1.29
array_concat_BOOLEAN_20##3arg                                1.17s   857.28m
array_concat_BOOLEAN_20##4arg                                1.55s   646.56m
array_concat_BOOLEAN_40##2arg                             778.07ms      1.29
array_concat_BOOLEAN_40##3arg                                1.16s   860.94m
array_concat_BOOLEAN_40##4arg                                1.57s   636.92m
array_concat_BOOLEAN_5##2arg                              726.71ms      1.38
array_concat_BOOLEAN_5##3arg                                 1.11s   904.61m
array_concat_BOOLEAN_5##4arg                                 1.55s   643.97m
array_concat_INTEGER_10##2arg                             689.14ms      1.45
array_concat_INTEGER_10##3arg                                1.01s   991.35m
array_concat_INTEGER_10##4arg                                1.40s   713.43m
array_concat_INTEGER_20##2arg                             681.24ms      1.47
array_concat_INTEGER_20##3arg                                1.03s   973.70m
array_concat_INTEGER_20##4arg                                1.35s   740.10m
array_concat_INTEGER_40##2arg                             666.57ms      1.50
array_concat_INTEGER_40##3arg                                1.04s   958.60m
array_concat_INTEGER_40##4arg                                1.37s   727.85m
array_concat_INTEGER_5##2arg                              652.99ms      1.53
array_concat_INTEGER_5##3arg                              985.63ms      1.01
array_concat_INTEGER_5##4arg                                 1.34s   745.48m
array_concat_VARCHAR_10##2arg                             679.71ms      1.47
array_concat_VARCHAR_10##3arg                                1.46s   683.21m
array_concat_VARCHAR_10##4arg                                2.09s   479.20m
array_concat_VARCHAR_20##2arg                                1.36s   733.91m
array_concat_VARCHAR_20##3arg                                1.85s   539.88m
array_concat_VARCHAR_20##4arg                                2.78s   359.50m
array_concat_VARCHAR_40##2arg                                1.23s   809.85m
array_concat_VARCHAR_40##3arg                                1.84s   542.69m
array_concat_VARCHAR_40##4arg                                2.45s   407.85m
array_concat_VARCHAR_5##2arg                                 1.53s   653.44m
array_concat_VARCHAR_5##3arg                                 2.06s   485.88m
array_concat_VARCHAR_5##4arg                                 2.81s   356.51m

after:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             197.26ms      5.07
array_concat_BOOLEAN_10##3arg                             337.22ms      2.97
array_concat_BOOLEAN_10##4arg                             486.89ms      2.05
array_concat_BOOLEAN_20##2arg                             280.10ms      3.57
array_concat_BOOLEAN_20##3arg                             352.67ms      2.84
array_concat_BOOLEAN_20##4arg                             495.94ms      2.02
array_concat_BOOLEAN_40##2arg                             260.39ms      3.84
array_concat_BOOLEAN_40##3arg                             415.81ms      2.40
array_concat_BOOLEAN_40##4arg                             550.83ms      1.82
array_concat_BOOLEAN_5##2arg                              189.44ms      5.28
array_concat_BOOLEAN_5##3arg                              244.64ms      4.09
array_concat_BOOLEAN_5##4arg                              376.33ms      2.66
array_concat_INTEGER_10##2arg                              80.36ms     12.44
array_concat_INTEGER_10##3arg                             129.36ms      7.73
array_concat_INTEGER_10##4arg                             194.14ms      5.15
array_concat_INTEGER_20##2arg                             110.09ms      9.08
array_concat_INTEGER_20##3arg                             144.69ms      6.91
array_concat_INTEGER_20##4arg                             179.20ms      5.58
array_concat_INTEGER_40##2arg                              83.20ms     12.02
array_concat_INTEGER_40##3arg                             128.46ms      7.78
array_concat_INTEGER_40##4arg                             167.46ms      5.97
array_concat_INTEGER_5##2arg                               80.45ms     12.43
array_concat_INTEGER_5##3arg                              111.43ms      8.97
array_concat_INTEGER_5##4arg                              154.83ms      6.46
array_concat_VARCHAR_10##2arg                             401.57ms      2.49
array_concat_VARCHAR_10##3arg                             755.30ms      1.32
array_concat_VARCHAR_10##4arg                                1.03s   969.99m
array_concat_VARCHAR_20##2arg                             681.27ms      1.47
array_concat_VARCHAR_20##3arg                             959.15ms      1.04
array_concat_VARCHAR_20##4arg                                1.50s   665.93m
array_concat_VARCHAR_40##2arg                             660.68ms      1.51
array_concat_VARCHAR_40##3arg                             984.20ms      1.02
array_concat_VARCHAR_40##4arg                                1.35s   738.16m
array_concat_VARCHAR_5##2arg                              827.10ms      1.21
array_concat_VARCHAR_5##3arg                                 1.11s   903.32m
array_concat_VARCHAR_5##4arg                                 1.47s   682.15m
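
For reading the two tables side by side, the speedup of a row is its "before" time divided by its "after" time. A small sketch, where `speedup` is a hypothetical helper introduced only for this example: for `array_concat_INTEGER_10##2arg` the ratio is 689.14ms / 80.36ms, about 8.6x; for `array_concat_BOOLEAN_10##2arg` it is 723.07ms / 197.26ms, about 3.7x; and for `array_concat_VARCHAR_10##2arg` it is 679.71ms / 401.57ms, about 1.7x.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many times faster the "after" run is,
// given the time/iter values (in the same unit) from both tables.
double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}
```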

Differential Revision: D50948537

netlify bot commented Nov 2, 2023

Deploy Preview for meta-velox canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | a1b7d28 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/654954bf5a7a330008cca8c2 |

@facebook-github-bot added the CLA Signed label Nov 2, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D50948537

laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 2, 2023
Summary:

Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings,
along with a no-copy version of it.
Note: several other functions still use add_items() and do not have
such a fast path; we should optimize those as well.

A follow-up will address the points above.

before:
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             723.07ms      1.38
array_concat_BOOLEAN_10##3arg                                1.14s   877.15m
array_concat_BOOLEAN_10##4arg                                1.57s   637.52m
array_concat_BOOLEAN_20##2arg                             775.91ms      1.29
array_concat_BOOLEAN_20##3arg                                1.17s   857.28m
array_concat_BOOLEAN_20##4arg                                1.55s   646.56m
array_concat_BOOLEAN_40##2arg                             778.07ms      1.29
array_concat_BOOLEAN_40##3arg                                1.16s   860.94m
array_concat_BOOLEAN_40##4arg                                1.57s   636.92m
array_concat_BOOLEAN_5##2arg                              726.71ms      1.38
array_concat_BOOLEAN_5##3arg                                 1.11s   904.61m
array_concat_BOOLEAN_5##4arg                                 1.55s   643.97m
array_concat_INTEGER_10##2arg                             689.14ms      1.45
array_concat_INTEGER_10##3arg                                1.01s   991.35m
array_concat_INTEGER_10##4arg                                1.40s   713.43m
array_concat_INTEGER_20##2arg                             681.24ms      1.47
array_concat_INTEGER_20##3arg                                1.03s   973.70m
array_concat_INTEGER_20##4arg                                1.35s   740.10m
array_concat_INTEGER_40##2arg                             666.57ms      1.50
array_concat_INTEGER_40##3arg                                1.04s   958.60m
array_concat_INTEGER_40##4arg                                1.37s   727.85m
array_concat_INTEGER_5##2arg                              652.99ms      1.53
array_concat_INTEGER_5##3arg                              985.63ms      1.01
array_concat_INTEGER_5##4arg                                 1.34s   745.48m
array_concat_VARCHAR_10##2arg                             679.71ms      1.47
array_concat_VARCHAR_10##3arg                                1.46s   683.21m
array_concat_VARCHAR_10##4arg                                2.09s   479.20m
array_concat_VARCHAR_20##2arg                                1.36s   733.91m
array_concat_VARCHAR_20##3arg                                1.85s   539.88m
array_concat_VARCHAR_20##4arg                                2.78s   359.50m
array_concat_VARCHAR_40##2arg                                1.23s   809.85m
array_concat_VARCHAR_40##3arg                                1.84s   542.69m
array_concat_VARCHAR_40##4arg                                2.45s   407.85m
array_concat_VARCHAR_5##2arg                                 1.53s   653.44m
array_concat_VARCHAR_5##3arg                                 2.06s   485.88m
array_concat_VARCHAR_5##4arg                                 2.81s   356.51m
```


after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             197.26ms      5.07
array_concat_BOOLEAN_10##3arg                             337.22ms      2.97
array_concat_BOOLEAN_10##4arg                             486.89ms      2.05
array_concat_BOOLEAN_20##2arg                             280.10ms      3.57
array_concat_BOOLEAN_20##3arg                             352.67ms      2.84
array_concat_BOOLEAN_20##4arg                             495.94ms      2.02
array_concat_BOOLEAN_40##2arg                             260.39ms      3.84
array_concat_BOOLEAN_40##3arg                             415.81ms      2.40
array_concat_BOOLEAN_40##4arg                             550.83ms      1.82
array_concat_BOOLEAN_5##2arg                              189.44ms      5.28
array_concat_BOOLEAN_5##3arg                              244.64ms      4.09
array_concat_BOOLEAN_5##4arg                              376.33ms      2.66
array_concat_INTEGER_10##2arg                              80.36ms     12.44
array_concat_INTEGER_10##3arg                             129.36ms      7.73
array_concat_INTEGER_10##4arg                             194.14ms      5.15
array_concat_INTEGER_20##2arg                             110.09ms      9.08
array_concat_INTEGER_20##3arg                             144.69ms      6.91
array_concat_INTEGER_20##4arg                             179.20ms      5.58
array_concat_INTEGER_40##2arg                              83.20ms     12.02
array_concat_INTEGER_40##3arg                             128.46ms      7.78
array_concat_INTEGER_40##4arg                             167.46ms      5.97
array_concat_INTEGER_5##2arg                               80.45ms     12.43
array_concat_INTEGER_5##3arg                              111.43ms      8.97
array_concat_INTEGER_5##4arg                              154.83ms      6.46
array_concat_VARCHAR_10##2arg                             401.57ms      2.49
array_concat_VARCHAR_10##3arg                             755.30ms      1.32
array_concat_VARCHAR_10##4arg                                1.03s   969.99m
array_concat_VARCHAR_20##2arg                             681.27ms      1.47
array_concat_VARCHAR_20##3arg                             959.15ms      1.04
array_concat_VARCHAR_20##4arg                                1.50s   665.93m
array_concat_VARCHAR_40##2arg                             660.68ms      1.51
array_concat_VARCHAR_40##3arg                             984.20ms      1.02
array_concat_VARCHAR_40##4arg                                1.35s   738.16m
array_concat_VARCHAR_5##2arg                              827.10ms      1.21
array_concat_VARCHAR_5##3arg                                 1.11s   903.32m
array_concat_VARCHAR_5##4arg                                 1.47s   682.15m
```
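To read the two tables side by side, the per-row speedup is the "before" time/iter divided by the "after" time/iter. A minimal sketch of that arithmetic follows; `speedup` and `printSpeedup` are hypothetical helper names, not part of the benchmark code.

```cpp
#include <cstdio>

// Hypothetical helper for reading the tables: ratio of time/iter before
// and after the change. Both arguments must use the same unit (ms here).
constexpr double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}

// Prints one summary line for a benchmark row.
inline void printSpeedup(const char* name, double beforeMs, double afterMs) {
  std::printf("%-32s %.2fx\n", name, speedup(beforeMs, afterMs));
}
```

For example, `printSpeedup("array_concat_INTEGER_5##2arg", 652.99, 80.45)` reports roughly 8.1x, while `array_concat_VARCHAR_5##4arg` (2.81s before, 1.47s after) comes out at roughly 1.9x.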

Differential Revision: D50948537
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D50948537

laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 3, 2023
Summary:

Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings,
including a no-copy version.
Note: several other functions still use add_items() without such a fast
path; we should optimize those as well.

Follow-ups will address the points above.
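
The general idea of a primitive fast path can be sketched outside Velox as follows. This is an illustration only, assuming plain `std::vector` inputs; `appendAll` and `concatArrays` are hypothetical names and this is not the code in this PR. For trivially copyable element types the output grows once and each input is copied as a block, while other types (strings, for instance) take the element-by-element path.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper, not Velox code: appends every element of `src` to `out`.
template <typename T>
void appendAll(std::vector<T>& out, const std::vector<T>& src) {
  // std::vector<bool> is bit-packed and has no data(), so it is excluded.
  if constexpr (std::is_trivially_copyable_v<T> && !std::is_same_v<T, bool>) {
    // Fast path: grow once, then copy the whole block of primitives.
    const std::size_t oldSize = out.size();
    out.resize(oldSize + src.size());
    if (!src.empty()) {
      std::memcpy(out.data() + oldSize, src.data(), src.size() * sizeof(T));
    }
  } else {
    // Generic path: copy element by element.
    for (const auto& value : src) {
      out.push_back(value);
    }
  }
}

// Concatenates all input arrays into a single output array.
template <typename T>
std::vector<T> concatArrays(const std::vector<std::vector<T>>& inputs) {
  std::size_t total = 0;
  for (const auto& input : inputs) {
    total += input.size();
  }
  std::vector<T> out;
  out.reserve(total);  // One allocation for the full result.
  for (const auto& input : inputs) {
    appendAll(out, input);
  }
  return out;
}
```

The branch is chosen at compile time with `if constexpr`, so the primitive case carries no per-element dispatch cost.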

before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             737.98ms      1.36
array_concat_BOOLEAN_10##3arg                                1.14s   874.37m
array_concat_BOOLEAN_10##4arg                                1.50s   666.55m
array_concat_BOOLEAN_20##2arg                                1.58s   631.44m
array_concat_BOOLEAN_20##3arg                                2.33s   428.92m
array_concat_BOOLEAN_20##4arg                                3.22s   310.71m
array_concat_BOOLEAN_40##2arg                                3.07s   325.76m
array_concat_BOOLEAN_40##3arg                                4.75s   210.37m
array_concat_BOOLEAN_40##4arg                                6.32s   158.26m
array_concat_BOOLEAN_5##2arg                              451.47ms      2.21
array_concat_BOOLEAN_5##3arg                              674.46ms      1.48
array_concat_BOOLEAN_5##4arg                              859.56ms      1.16
array_concat_INTEGER_10##2arg                             706.34ms      1.42
array_concat_INTEGER_10##3arg                                1.09s   919.50m
array_concat_INTEGER_10##4arg                                1.47s   681.77m
array_concat_INTEGER_20##2arg                                1.40s   716.06m
array_concat_INTEGER_20##3arg                                2.02s   494.92m
array_concat_INTEGER_20##4arg                                2.73s   366.24m
array_concat_INTEGER_40##2arg                                2.68s   372.98m
array_concat_INTEGER_40##3arg                                3.98s   251.52m
array_concat_INTEGER_40##4arg                                5.40s   185.08m
array_concat_INTEGER_5##2arg                              382.78ms      2.61
array_concat_INTEGER_5##3arg                              565.82ms      1.77
array_concat_INTEGER_5##4arg                              758.75ms      1.32
array_concat_VARCHAR_10##2arg                                1.24s   803.73m
array_concat_VARCHAR_10##3arg                                1.81s   552.59m
array_concat_VARCHAR_10##4arg                                2.31s   432.19m
array_concat_VARCHAR_20##2arg                                3.38s   295.55m
array_concat_VARCHAR_20##3arg                                4.53s   220.65m
array_concat_VARCHAR_20##4arg                                5.61s   178.32m
array_concat_VARCHAR_40##2arg                                5.69s   175.66m
array_concat_VARCHAR_40##3arg                                9.95s   100.53m
array_concat_VARCHAR_40##4arg                               11.99s    83.39m
array_concat_VARCHAR_5##2arg                              523.94ms      1.91
array_concat_VARCHAR_5##3arg                              797.74ms      1.25
array_concat_VARCHAR_5##4arg                                 1.05s   954.15m
```


after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             190.24ms      5.26
array_concat_BOOLEAN_10##3arg                             253.91ms      3.94
array_concat_BOOLEAN_10##4arg                             387.03ms      2.58
array_concat_BOOLEAN_20##2arg                             484.33ms      2.06
array_concat_BOOLEAN_20##3arg                             766.60ms      1.30
array_concat_BOOLEAN_20##4arg                                1.01s   991.11m
array_concat_BOOLEAN_40##2arg                             982.59ms      1.02
array_concat_BOOLEAN_40##3arg                                1.36s   736.99m
array_concat_BOOLEAN_40##4arg                                1.74s   575.58m
array_concat_BOOLEAN_5##2arg                              139.40ms      7.17
array_concat_BOOLEAN_5##3arg                              214.43ms      4.66
array_concat_BOOLEAN_5##4arg                              273.88ms      3.65
array_concat_INTEGER_10##2arg                              80.90ms     12.36
array_concat_INTEGER_10##3arg                             110.80ms      9.03
array_concat_INTEGER_10##4arg                             149.86ms      6.67
array_concat_INTEGER_20##2arg                             167.08ms      5.99
array_concat_INTEGER_20##3arg                             261.83ms      3.82
array_concat_INTEGER_20##4arg                             319.26ms      3.13
array_concat_INTEGER_40##2arg                             301.37ms      3.32
array_concat_INTEGER_40##3arg                             422.25ms      2.37
array_concat_INTEGER_40##4arg                             714.74ms      1.40
array_concat_INTEGER_5##2arg                               60.61ms     16.50
array_concat_INTEGER_5##3arg                               89.28ms     11.20
array_concat_INTEGER_5##4arg                              117.99ms      8.48
array_concat_VARCHAR_10##2arg                             652.44ms      1.53
array_concat_VARCHAR_10##3arg                             958.59ms      1.04
array_concat_VARCHAR_10##4arg                                1.26s   790.86m
array_concat_VARCHAR_20##2arg                                1.67s   598.25m
array_concat_VARCHAR_20##3arg                                2.22s   449.48m
array_concat_VARCHAR_20##4arg                                2.82s   355.01m
array_concat_VARCHAR_40##2arg                                2.83s   353.24m
array_concat_VARCHAR_40##3arg                                4.98s   200.99m
array_concat_VARCHAR_40##4arg                                7.03s   142.22m
array_concat_VARCHAR_5##2arg                              290.04ms      3.45
array_concat_VARCHAR_5##3arg                              438.06ms      2.28
array_concat_VARCHAR_5##4arg                              584.20ms      1.71
```

Differential Revision: D50948537

laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 3, 2023
…bookincubator#7393)

Summary:

Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.


Note: we can further optimize this by adding a fast path for strings,
including a no-copy version.

Note: several other functions still use add_items() without such a fast
path; we should optimize those as well.

Follow-ups will address the points above.

before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             737.98ms      1.36
array_concat_BOOLEAN_10##3arg                                1.14s   874.37m
array_concat_BOOLEAN_10##4arg                                1.50s   666.55m
array_concat_BOOLEAN_20##2arg                                1.58s   631.44m
array_concat_BOOLEAN_20##3arg                                2.33s   428.92m
array_concat_BOOLEAN_20##4arg                                3.22s   310.71m
array_concat_BOOLEAN_40##2arg                                3.07s   325.76m
array_concat_BOOLEAN_40##3arg                                4.75s   210.37m
array_concat_BOOLEAN_40##4arg                                6.32s   158.26m
array_concat_BOOLEAN_5##2arg                              451.47ms      2.21
array_concat_BOOLEAN_5##3arg                              674.46ms      1.48
array_concat_BOOLEAN_5##4arg                              859.56ms      1.16
array_concat_INTEGER_10##2arg                             706.34ms      1.42
array_concat_INTEGER_10##3arg                                1.09s   919.50m
array_concat_INTEGER_10##4arg                                1.47s   681.77m
array_concat_INTEGER_20##2arg                                1.40s   716.06m
array_concat_INTEGER_20##3arg                                2.02s   494.92m
array_concat_INTEGER_20##4arg                                2.73s   366.24m
array_concat_INTEGER_40##2arg                                2.68s   372.98m
array_concat_INTEGER_40##3arg                                3.98s   251.52m
array_concat_INTEGER_40##4arg                                5.40s   185.08m
array_concat_INTEGER_5##2arg                              382.78ms      2.61
array_concat_INTEGER_5##3arg                              565.82ms      1.77
array_concat_INTEGER_5##4arg                              758.75ms      1.32
array_concat_VARCHAR_10##2arg                                1.24s   803.73m
array_concat_VARCHAR_10##3arg                                1.81s   552.59m
array_concat_VARCHAR_10##4arg                                2.31s   432.19m
array_concat_VARCHAR_20##2arg                                3.38s   295.55m
array_concat_VARCHAR_20##3arg                                4.53s   220.65m
array_concat_VARCHAR_20##4arg                                5.61s   178.32m
array_concat_VARCHAR_40##2arg                                5.69s   175.66m
array_concat_VARCHAR_40##3arg                                9.95s   100.53m
array_concat_VARCHAR_40##4arg                               11.99s    83.39m
array_concat_VARCHAR_5##2arg                              523.94ms      1.91
array_concat_VARCHAR_5##3arg                              797.74ms      1.25
array_concat_VARCHAR_5##4arg                                 1.05s   954.15m
```


after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             190.24ms      5.26
array_concat_BOOLEAN_10##3arg                             253.91ms      3.94
array_concat_BOOLEAN_10##4arg                             387.03ms      2.58
array_concat_BOOLEAN_20##2arg                             484.33ms      2.06
array_concat_BOOLEAN_20##3arg                             766.60ms      1.30
array_concat_BOOLEAN_20##4arg                                1.01s   991.11m
array_concat_BOOLEAN_40##2arg                             982.59ms      1.02
array_concat_BOOLEAN_40##3arg                                1.36s   736.99m
array_concat_BOOLEAN_40##4arg                                1.74s   575.58m
array_concat_BOOLEAN_5##2arg                              139.40ms      7.17
array_concat_BOOLEAN_5##3arg                              214.43ms      4.66
array_concat_BOOLEAN_5##4arg                              273.88ms      3.65
array_concat_INTEGER_10##2arg                              80.90ms     12.36
array_concat_INTEGER_10##3arg                             110.80ms      9.03
array_concat_INTEGER_10##4arg                             149.86ms      6.67
array_concat_INTEGER_20##2arg                             167.08ms      5.99
array_concat_INTEGER_20##3arg                             261.83ms      3.82
array_concat_INTEGER_20##4arg                             319.26ms      3.13
array_concat_INTEGER_40##2arg                             301.37ms      3.32
array_concat_INTEGER_40##3arg                             422.25ms      2.37
array_concat_INTEGER_40##4arg                             714.74ms      1.40
array_concat_INTEGER_5##2arg                               60.61ms     16.50
array_concat_INTEGER_5##3arg                               89.28ms     11.20
array_concat_INTEGER_5##4arg                              117.99ms      8.48
array_concat_VARCHAR_10##2arg                             652.44ms      1.53
array_concat_VARCHAR_10##3arg                             958.59ms      1.04
array_concat_VARCHAR_10##4arg                                1.26s   790.86m
array_concat_VARCHAR_20##2arg                                1.67s   598.25m
array_concat_VARCHAR_20##3arg                                2.22s   449.48m
array_concat_VARCHAR_20##4arg                                2.82s   355.01m
array_concat_VARCHAR_40##2arg                                2.83s   353.24m
array_concat_VARCHAR_40##3arg                                4.98s   200.99m
array_concat_VARCHAR_40##4arg                                7.03s   142.22m
array_concat_VARCHAR_5##2arg                              290.04ms      3.45
array_concat_VARCHAR_5##3arg                              438.06ms      2.28
array_concat_VARCHAR_5##4arg                              584.20ms      1.71
```

Differential Revision: D50948537
laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 3, 2023
…bookincubator#7393)

Summary:

Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings,
including a no-copy version.

Note: several other functions still use add_items() and do not have such
a fast path; we should optimize those as well.

A follow-up will address the points above.
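
For readers unfamiliar with the idea, here is a minimal sketch of what a primitive fast path can look like. It is not the Velox implementation: it uses plain `std::vector` instead of Velox vectors, ignores nulls, and the names `concatGeneric` and `concatPrimitive` are hypothetical. The generic version appends one element at a time; the fast version sizes the output once and bulk-copies each input, which is only valid for trivially copyable element types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Generic path (hypothetical name): append elements one by one.
// Works for any copyable element type, including strings.
template <typename T>
std::vector<T> concatGeneric(const std::vector<std::vector<T>>& arrays) {
  std::vector<T> out;
  for (const auto& array : arrays) {
    for (const auto& item : array) {
      out.push_back(item);
    }
  }
  return out;
}

// Fast path (hypothetical name): compute the total size up front, then
// copy each input with a single memcpy. Nulls are not modeled here.
template <typename T>
std::vector<T> concatPrimitive(const std::vector<std::vector<T>>& arrays) {
  static_assert(
      std::is_trivially_copyable_v<T>,
      "bulk copy is only valid for trivially copyable types");
  // std::vector<bool> is bit-packed and has no data(), so exclude it.
  static_assert(!std::is_same_v<T, bool>, "vector<bool> is not supported");

  std::size_t total = 0;
  for (const auto& array : arrays) {
    total += array.size();
  }

  std::vector<T> out(total);
  std::size_t offset = 0;
  for (const auto& array : arrays) {
    if (!array.empty()) {
      std::memcpy(out.data() + offset, array.data(), array.size() * sizeof(T));
      offset += array.size();
    }
  }
  assert(offset == total);
  return out;
}
```

Both functions return the same result for primitive inputs; the fast version avoids per-element bookkeeping and repeated reallocation.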

before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             737.98ms      1.36
array_concat_BOOLEAN_10##3arg                                1.14s   874.37m
array_concat_BOOLEAN_10##4arg                                1.50s   666.55m
array_concat_BOOLEAN_20##2arg                                1.58s   631.44m
array_concat_BOOLEAN_20##3arg                                2.33s   428.92m
array_concat_BOOLEAN_20##4arg                                3.22s   310.71m
array_concat_BOOLEAN_40##2arg                                3.07s   325.76m
array_concat_BOOLEAN_40##3arg                                4.75s   210.37m
array_concat_BOOLEAN_40##4arg                                6.32s   158.26m
array_concat_BOOLEAN_5##2arg                              451.47ms      2.21
array_concat_BOOLEAN_5##3arg                              674.46ms      1.48
array_concat_BOOLEAN_5##4arg                              859.56ms      1.16
array_concat_INTEGER_10##2arg                             706.34ms      1.42
array_concat_INTEGER_10##3arg                                1.09s   919.50m
array_concat_INTEGER_10##4arg                                1.47s   681.77m
array_concat_INTEGER_20##2arg                                1.40s   716.06m
array_concat_INTEGER_20##3arg                                2.02s   494.92m
array_concat_INTEGER_20##4arg                                2.73s   366.24m
array_concat_INTEGER_40##2arg                                2.68s   372.98m
array_concat_INTEGER_40##3arg                                3.98s   251.52m
array_concat_INTEGER_40##4arg                                5.40s   185.08m
array_concat_INTEGER_5##2arg                              382.78ms      2.61
array_concat_INTEGER_5##3arg                              565.82ms      1.77
array_concat_INTEGER_5##4arg                              758.75ms      1.32
array_concat_VARCHAR_10##2arg                                1.24s   803.73m
array_concat_VARCHAR_10##3arg                                1.81s   552.59m
array_concat_VARCHAR_10##4arg                                2.31s   432.19m
array_concat_VARCHAR_20##2arg                                3.38s   295.55m
array_concat_VARCHAR_20##3arg                                4.53s   220.65m
array_concat_VARCHAR_20##4arg                                5.61s   178.32m
array_concat_VARCHAR_40##2arg                                5.69s   175.66m
array_concat_VARCHAR_40##3arg                                9.95s   100.53m
array_concat_VARCHAR_40##4arg                               11.99s    83.39m
array_concat_VARCHAR_5##2arg                              523.94ms      1.91
array_concat_VARCHAR_5##3arg                              797.74ms      1.25
array_concat_VARCHAR_5##4arg                                 1.05s   954.15m
```


after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             190.24ms      5.26
array_concat_BOOLEAN_10##3arg                             253.91ms      3.94
array_concat_BOOLEAN_10##4arg                             387.03ms      2.58
array_concat_BOOLEAN_20##2arg                             484.33ms      2.06
array_concat_BOOLEAN_20##3arg                             766.60ms      1.30
array_concat_BOOLEAN_20##4arg                                1.01s   991.11m
array_concat_BOOLEAN_40##2arg                             982.59ms      1.02
array_concat_BOOLEAN_40##3arg                                1.36s   736.99m
array_concat_BOOLEAN_40##4arg                                1.74s   575.58m
array_concat_BOOLEAN_5##2arg                              139.40ms      7.17
array_concat_BOOLEAN_5##3arg                              214.43ms      4.66
array_concat_BOOLEAN_5##4arg                              273.88ms      3.65
array_concat_INTEGER_10##2arg                              80.90ms     12.36
array_concat_INTEGER_10##3arg                             110.80ms      9.03
array_concat_INTEGER_10##4arg                             149.86ms      6.67
array_concat_INTEGER_20##2arg                             167.08ms      5.99
array_concat_INTEGER_20##3arg                             261.83ms      3.82
array_concat_INTEGER_20##4arg                             319.26ms      3.13
array_concat_INTEGER_40##2arg                             301.37ms      3.32
array_concat_INTEGER_40##3arg                             422.25ms      2.37
array_concat_INTEGER_40##4arg                             714.74ms      1.40
array_concat_INTEGER_5##2arg                               60.61ms     16.50
array_concat_INTEGER_5##3arg                               89.28ms     11.20
array_concat_INTEGER_5##4arg                              117.99ms      8.48
array_concat_VARCHAR_10##2arg                             652.44ms      1.53
array_concat_VARCHAR_10##3arg                             958.59ms      1.04
array_concat_VARCHAR_10##4arg                                1.26s   790.86m
array_concat_VARCHAR_20##2arg                                1.67s   598.25m
array_concat_VARCHAR_20##3arg                                2.22s   449.48m
array_concat_VARCHAR_20##4arg                                2.82s   355.01m
array_concat_VARCHAR_40##2arg                                2.83s   353.24m
array_concat_VARCHAR_40##3arg                                4.98s   200.99m
array_concat_VARCHAR_40##4arg                                7.03s   142.22m
array_concat_VARCHAR_5##2arg                              290.04ms      3.45
array_concat_VARCHAR_5##3arg                              438.06ms      2.28
array_concat_VARCHAR_5##4arg                              584.20ms      1.71
```
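
To read the two tables side by side, the speedup of a row is the "before" time divided by the "after" time. The snippet below is a small sketch of that arithmetic for three rows copied from the tables above; the `Row` struct and `speedup` helper are hypothetical names used only for illustration.

```cpp
#include <cstdio>

// One benchmark row: name plus time/iter in milliseconds before and after.
struct Row {
  const char* name;
  double beforeMs;
  double afterMs;
};

// Speedup factor: how many times faster the "after" run is.
double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}

// Rows copied from the before/after tables above.
const Row kRows[] = {
    {"array_concat_BOOLEAN_10##2arg", 737.98, 190.24},
    {"array_concat_INTEGER_5##2arg", 382.78, 60.61},
    {"array_concat_VARCHAR_5##2arg", 523.94, 290.04},
};

void printSpeedups() {
  for (const Row& row : kRows) {
    std::printf("%-32s %.2fx\n", row.name, speedup(row.beforeMs, row.afterMs));
  }
}
```

For these rows the ratios come out to roughly 3.9x for BOOLEAN, 6.3x for INTEGER, and 1.8x for VARCHAR.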

Differential Revision: D50948537
laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 3, 2023
…bookincubator#7393)

laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 3, 2023
…bookincubator#7393)

laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 6, 2023
…bookincubator#7393)

Reviewed By: mbasmanova

Differential Revision: D50948537
laithsakka added a commit to laithsakka/velox that referenced this pull request Nov 6, 2023
…bookincubator#7393)

Summary:

Optimize ArrayConcatFunction for primitives, similar to what we do for
registerArrayRemoveFunctions and registerArrayTrimFunctions.

Note: we can further optimize this by adding a fast path for strings,
including a no-copy version.

Note: several other functions still use add_items() and do not have such
a fast path; we should optimize those as well.

A follow-up will address the points above.

before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             737.98ms      1.36
array_concat_BOOLEAN_10##3arg                                1.14s   874.37m
array_concat_BOOLEAN_10##4arg                                1.50s   666.55m
array_concat_BOOLEAN_20##2arg                                1.58s   631.44m
array_concat_BOOLEAN_20##3arg                                2.33s   428.92m
array_concat_BOOLEAN_20##4arg                                3.22s   310.71m
array_concat_BOOLEAN_40##2arg                                3.07s   325.76m
array_concat_BOOLEAN_40##3arg                                4.75s   210.37m
array_concat_BOOLEAN_40##4arg                                6.32s   158.26m
array_concat_BOOLEAN_5##2arg                              451.47ms      2.21
array_concat_BOOLEAN_5##3arg                              674.46ms      1.48
array_concat_BOOLEAN_5##4arg                              859.56ms      1.16
array_concat_INTEGER_10##2arg                             706.34ms      1.42
array_concat_INTEGER_10##3arg                                1.09s   919.50m
array_concat_INTEGER_10##4arg                                1.47s   681.77m
array_concat_INTEGER_20##2arg                                1.40s   716.06m
array_concat_INTEGER_20##3arg                                2.02s   494.92m
array_concat_INTEGER_20##4arg                                2.73s   366.24m
array_concat_INTEGER_40##2arg                                2.68s   372.98m
array_concat_INTEGER_40##3arg                                3.98s   251.52m
array_concat_INTEGER_40##4arg                                5.40s   185.08m
array_concat_INTEGER_5##2arg                              382.78ms      2.61
array_concat_INTEGER_5##3arg                              565.82ms      1.77
array_concat_INTEGER_5##4arg                              758.75ms      1.32
array_concat_VARCHAR_10##2arg                                1.24s   803.73m
array_concat_VARCHAR_10##3arg                                1.81s   552.59m
array_concat_VARCHAR_10##4arg                                2.31s   432.19m
array_concat_VARCHAR_20##2arg                                3.38s   295.55m
array_concat_VARCHAR_20##3arg                                4.53s   220.65m
array_concat_VARCHAR_20##4arg                                5.61s   178.32m
array_concat_VARCHAR_40##2arg                                5.69s   175.66m
array_concat_VARCHAR_40##3arg                                9.95s   100.53m
array_concat_VARCHAR_40##4arg                               11.99s    83.39m
array_concat_VARCHAR_5##2arg                              523.94ms      1.91
array_concat_VARCHAR_5##3arg                              797.74ms      1.25
array_concat_VARCHAR_5##4arg                                 1.05s   954.15m
```


after:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             190.24ms      5.26
array_concat_BOOLEAN_10##3arg                             253.91ms      3.94
array_concat_BOOLEAN_10##4arg                             387.03ms      2.58
array_concat_BOOLEAN_20##2arg                             484.33ms      2.06
array_concat_BOOLEAN_20##3arg                             766.60ms      1.30
array_concat_BOOLEAN_20##4arg                                1.01s   991.11m
array_concat_BOOLEAN_40##2arg                             982.59ms      1.02
array_concat_BOOLEAN_40##3arg                                1.36s   736.99m
array_concat_BOOLEAN_40##4arg                                1.74s   575.58m
array_concat_BOOLEAN_5##2arg                              139.40ms      7.17
array_concat_BOOLEAN_5##3arg                              214.43ms      4.66
array_concat_BOOLEAN_5##4arg                              273.88ms      3.65
array_concat_INTEGER_10##2arg                              80.90ms     12.36
array_concat_INTEGER_10##3arg                             110.80ms      9.03
array_concat_INTEGER_10##4arg                             149.86ms      6.67
array_concat_INTEGER_20##2arg                             167.08ms      5.99
array_concat_INTEGER_20##3arg                             261.83ms      3.82
array_concat_INTEGER_20##4arg                             319.26ms      3.13
array_concat_INTEGER_40##2arg                             301.37ms      3.32
array_concat_INTEGER_40##3arg                             422.25ms      2.37
array_concat_INTEGER_40##4arg                             714.74ms      1.40
array_concat_INTEGER_5##2arg                               60.61ms     16.50
array_concat_INTEGER_5##3arg                               89.28ms     11.20
array_concat_INTEGER_5##4arg                              117.99ms      8.48
array_concat_VARCHAR_10##2arg                             652.44ms      1.53
array_concat_VARCHAR_10##3arg                             958.59ms      1.04
array_concat_VARCHAR_10##4arg                                1.26s   790.86m
array_concat_VARCHAR_20##2arg                                1.67s   598.25m
array_concat_VARCHAR_20##3arg                                2.22s   449.48m
array_concat_VARCHAR_20##4arg                                2.82s   355.01m
array_concat_VARCHAR_40##2arg                                2.83s   353.24m
array_concat_VARCHAR_40##3arg                                4.98s   200.99m
array_concat_VARCHAR_40##4arg                                7.03s   142.22m
array_concat_VARCHAR_5##2arg                              290.04ms      3.45
array_concat_VARCHAR_5##3arg                              438.06ms      2.28
array_concat_VARCHAR_5##4arg                              584.20ms      1.71
```

Reviewed By: mbasmanova

Differential Revision: D50948537
@facebook-github-bot
Contributor

This pull request has been merged in 808e7fd.

laithsakka added a commit to laithsakka/velox that referenced this pull request Dec 27, 2023
…mitive types.

Summary:
add_items() appends elements from an array view to an array writer.
When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, it is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. We can avoid paying the cast cost for
each element (all elements in the array have the same type) by doing the check once, before the
copy starts, and taking a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close
to that with the registered primitive fast paths.
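
The idea of checking the element type once instead of per element can be sketched as follows. This is a minimal model under stated assumptions, not the actual add_items() code: `GenericColumn`, `TypeKind`, `addItemsPerElement` and `addItemsHoisted` are hypothetical names, and a single runtime tag per column stands in for the generic view's type information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical runtime type tag shared by all elements of a column.
enum class TypeKind { kInteger, kVarchar };

// Hypothetical type-erased column: only the vector matching `kind` is used.
struct GenericColumn {
  TypeKind kind;
  std::vector<int32_t> ints;
  std::vector<std::string> strings;
};

inline std::size_t columnSize(const GenericColumn& c) {
  return c.kind == TypeKind::kInteger ? c.ints.size() : c.strings.size();
}

// Baseline: the type tag is inspected again for every element copied.
inline void addItemsPerElement(const GenericColumn& in, GenericColumn& out) {
  assert(in.kind == out.kind);
  for (std::size_t i = 0; i < columnSize(in); ++i) {
    switch (in.kind) {
      case TypeKind::kInteger:
        out.ints.push_back(in.ints[i]);
        break;
      case TypeKind::kVarchar:
        out.strings.push_back(in.strings[i]);
        break;
    }
  }
}

// Hoisted: the type is checked once, then a typed bulk copy runs.
inline void addItemsHoisted(const GenericColumn& in, GenericColumn& out) {
  assert(in.kind == out.kind);
  switch (in.kind) {
    case TypeKind::kInteger:
      // Primitive fast path: a single range append with no per-element check.
      out.ints.insert(out.ints.end(), in.ints.begin(), in.ints.end());
      break;
    case TypeKind::kVarchar:
      // Non-primitive elements: the copy itself dominates the one-time check.
      out.strings.insert(
          out.strings.end(), in.strings.begin(), in.strings.end());
      break;
  }
}
```

Both functions append the same elements in the same order; only the number of type checks differs. An optimizing compiler may hoist the check in this toy model on its own, which is less likely when the check sits behind a cast on each type-erased element.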


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             556.35ms      1.80
array_concat_BOOLEAN_10##3arg                             788.51ms      1.27
array_concat_BOOLEAN_10##4arg                                1.12s   891.97m
array_concat_BOOLEAN_20##2arg                                1.18s   847.41m
array_concat_BOOLEAN_20##3arg                                1.78s   561.03m
array_concat_BOOLEAN_20##4arg                                2.39s   418.56m
array_concat_BOOLEAN_40##2arg                                2.33s   429.55m
array_concat_BOOLEAN_40##3arg                                3.37s   296.31m
array_concat_BOOLEAN_40##4arg                                4.67s   214.33m
array_concat_BOOLEAN_5##2arg                              320.80ms      3.12
array_concat_BOOLEAN_5##3arg                              478.21ms      2.09
array_concat_BOOLEAN_5##4arg                              628.29ms      1.59
array_concat_INTEGER_10##2arg                             451.29ms      2.22
array_concat_INTEGER_10##3arg                             674.39ms      1.48
array_concat_INTEGER_10##4arg                             912.72ms      1.10
array_concat_INTEGER_20##2arg                             902.84ms      1.11
array_concat_INTEGER_20##3arg                                1.42s   704.16m
array_concat_INTEGER_20##4arg                                1.87s   533.34m
array_concat_INTEGER_40##2arg                                1.78s   562.38m
array_concat_INTEGER_40##3arg                                2.65s   377.14m
array_concat_INTEGER_40##4arg                                3.62s   276.04m
array_concat_INTEGER_5##2arg                              243.91ms      4.10
array_concat_INTEGER_5##3arg                              380.67ms      2.63
array_concat_INTEGER_5##4arg                              505.00ms      1.98
array_concat_VARCHAR_10##2arg                                1.25s   801.07m
array_concat_VARCHAR_10##3arg                                1.75s   572.05m
array_concat_VARCHAR_10##4arg                                2.25s   444.06m
array_concat_VARCHAR_20##2arg                                3.07s   325.81m
array_concat_VARCHAR_20##3arg                                3.93s   254.38m
array_concat_VARCHAR_20##4arg                                4.98s   200.87m
array_concat_VARCHAR_40##2arg                                5.04s   198.40m
array_concat_VARCHAR_40##3arg                                8.38s   119.37m
array_concat_VARCHAR_40##4arg                               10.56s    94.69m
array_concat_VARCHAR_5##2arg                              511.14ms      1.96
array_concat_VARCHAR_5##3arg                              757.66ms      1.32
array_concat_VARCHAR_5##4arg                              994.37ms      1.01
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             254.42ms      3.93
array_concat_BOOLEAN_10##3arg                             339.38ms      2.95
array_concat_BOOLEAN_10##4arg                             508.55ms      1.97
array_concat_BOOLEAN_20##2arg                             589.61ms      1.70
array_concat_BOOLEAN_20##3arg                             910.98ms      1.10
array_concat_BOOLEAN_20##4arg                                1.22s   819.42m
array_concat_BOOLEAN_40##2arg                                1.11s   903.37m
array_concat_BOOLEAN_40##3arg                                1.61s   622.74m
array_concat_BOOLEAN_40##4arg                                2.05s   487.57m
array_concat_BOOLEAN_5##2arg                              193.32ms      5.17
array_concat_BOOLEAN_5##3arg                              288.43ms      3.47
array_concat_BOOLEAN_5##4arg                              385.29ms      2.60
array_concat_INTEGER_10##2arg                             130.70ms      7.65
array_concat_INTEGER_10##3arg                             179.97ms      5.56
array_concat_INTEGER_10##4arg                             240.94ms      4.15
array_concat_INTEGER_20##2arg                             186.63ms      5.36
array_concat_INTEGER_20##3arg                             304.05ms      3.29
array_concat_INTEGER_20##4arg                             372.18ms      2.69
array_concat_INTEGER_40##2arg                             246.54ms      4.06
array_concat_INTEGER_40##3arg                             309.65ms      3.23
array_concat_INTEGER_40##4arg                             535.99ms      1.87
array_concat_INTEGER_5##2arg                              133.79ms      7.47
array_concat_INTEGER_5##3arg                              196.43ms      5.09
array_concat_INTEGER_5##4arg                              248.34ms      4.03
array_concat_VARCHAR_10##2arg                             394.72ms      2.53
array_concat_VARCHAR_10##3arg                             508.08ms      1.97
array_concat_VARCHAR_10##4arg                             648.42ms      1.54
array_concat_VARCHAR_20##2arg                             674.18ms      1.48
array_concat_VARCHAR_20##3arg                             857.50ms      1.17
array_concat_VARCHAR_20##4arg                                1.22s   817.42m
array_concat_VARCHAR_40##2arg                                1.07s   935.35m
array_concat_VARCHAR_40##3arg                                1.64s   608.35m
array_concat_VARCHAR_40##4arg                                2.07s   483.29m
array_concat_VARCHAR_5##2arg                              183.28ms      5.46
array_concat_VARCHAR_5##3arg                              262.13ms      3.81
array_concat_VARCHAR_5##4arg                              332.48ms      3.01

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             230.01ms      4.35
array_concat_BOOLEAN_10##3arg                             303.88ms      3.29
array_concat_BOOLEAN_10##4arg                             463.32ms      2.16
array_concat_BOOLEAN_20##2arg                             547.77ms      1.83
array_concat_BOOLEAN_20##3arg                             832.71ms      1.20
array_concat_BOOLEAN_20##4arg                                1.10s   912.79m
array_concat_BOOLEAN_40##2arg                             991.02ms      1.01
array_concat_BOOLEAN_40##3arg                                1.48s   675.74m
array_concat_BOOLEAN_40##4arg                                1.96s   510.45m
array_concat_BOOLEAN_5##2arg                              178.92ms      5.59
array_concat_BOOLEAN_5##3arg                              265.29ms      3.77
array_concat_BOOLEAN_5##4arg                              350.31ms      2.85
array_concat_INTEGER_10##2arg                             111.54ms      8.97
array_concat_INTEGER_10##3arg                             151.91ms      6.58
array_concat_INTEGER_10##4arg                             209.28ms      4.78
array_concat_INTEGER_20##2arg                             150.28ms      6.65
array_concat_INTEGER_20##3arg                             269.52ms      3.71
array_concat_INTEGER_20##4arg                             337.27ms      2.97
array_concat_INTEGER_40##2arg                             213.27ms      4.69
array_concat_INTEGER_40##3arg                             266.57ms      3.75
array_concat_INTEGER_40##4arg                             483.33ms      2.07
array_concat_INTEGER_5##2arg                              115.68ms      8.64
array_concat_INTEGER_5##3arg                              168.24ms      5.94
array_concat_INTEGER_5##4arg                              219.13ms      4.56
array_concat_VARCHAR_10##2arg                             357.53ms      2.80
array_concat_VARCHAR_10##3arg                             459.15ms      2.18
array_concat_VARCHAR_10##4arg                             579.91ms      1.72
array_concat_VARCHAR_20##2arg                             628.27ms      1.59
array_concat_VARCHAR_20##3arg                             802.48ms      1.25
array_concat_VARCHAR_20##4arg                                1.06s   947.41m
array_concat_VARCHAR_40##2arg                             930.88ms      1.07
array_concat_VARCHAR_40##3arg                                1.46s   683.85m
array_concat_VARCHAR_40##4arg                                1.92s   520.41m
array_concat_VARCHAR_5##2arg                              161.55ms      6.19
array_concat_VARCHAR_5##3arg                              214.94ms      4.65
array_concat_VARCHAR_5##4arg                              280.15ms      3.57
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Dec 29, 2023
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, it is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. We can avoid paying the cast cost for
each element (all elements in the array have the same type) by doing the check once, before the
copy starts, and taking a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close
to that with the registered primitive fast paths, and up to 5X faster than before.
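
For reading the tables that follow, a speedup is the "before" time divided by the "after" time for the same benchmark row. The small sketch below shows the arithmetic for two rows taken from the "generic before" and "generic after" tables in this message; the `speedup` helper is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Speedup of a benchmark row: time before divided by time after,
// with both times in the same unit (milliseconds here).
constexpr double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}

// array_concat_INTEGER_10##2arg: 451.38ms before, 84.65ms after.
constexpr double kInteger10TwoArgs = speedup(451.38, 84.65);

// array_concat_BOOLEAN_10##2arg: 567.29ms before, 247.69ms after.
constexpr double kBoolean10TwoArgs = speedup(567.29, 247.69);
```

With these inputs the INTEGER row comes out a little above 5x and the BOOLEAN row a little above 2x.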


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             247.69ms      4.04
array_concat_BOOLEAN_10##3arg                             328.93ms      3.04
array_concat_BOOLEAN_10##4arg                             471.79ms      2.12
array_concat_BOOLEAN_20##2arg                             591.39ms      1.69
array_concat_BOOLEAN_20##3arg                             889.65ms      1.12
array_concat_BOOLEAN_20##4arg                                1.13s   885.38m
array_concat_BOOLEAN_40##2arg                                1.11s   902.02m
array_concat_BOOLEAN_40##3arg                                1.63s   614.28m
array_concat_BOOLEAN_40##4arg                                2.06s   486.07m
array_concat_BOOLEAN_5##2arg                              178.77ms      5.59
array_concat_BOOLEAN_5##3arg                              262.85ms      3.80
array_concat_BOOLEAN_5##4arg                              358.08ms      2.79
array_concat_INTEGER_10##2arg                              84.65ms     11.81
array_concat_INTEGER_10##3arg                             116.97ms      8.55
array_concat_INTEGER_10##4arg                             159.98ms      6.25
array_concat_INTEGER_20##2arg                             145.19ms      6.89
array_concat_INTEGER_20##3arg                             249.84ms      4.00
array_concat_INTEGER_20##4arg                             298.28ms      3.35
array_concat_INTEGER_40##2arg                             202.66ms      4.93
array_concat_INTEGER_40##3arg                             249.71ms      4.00
array_concat_INTEGER_40##4arg                             462.83ms      2.16
array_concat_INTEGER_5##2arg                               86.36ms     11.58
array_concat_INTEGER_5##3arg                              128.82ms      7.76
array_concat_INTEGER_5##4arg                              165.59ms      6.04
array_concat_VARCHAR_10##2arg                             388.89ms      2.57
array_concat_VARCHAR_10##3arg                             495.35ms      2.02
array_concat_VARCHAR_10##4arg                             626.90ms      1.60
array_concat_VARCHAR_20##2arg                             671.03ms      1.49
array_concat_VARCHAR_20##3arg                             870.87ms      1.15
array_concat_VARCHAR_20##4arg                                1.13s   888.08m
array_concat_VARCHAR_40##2arg                                1.03s   967.24m
array_concat_VARCHAR_40##3arg                                1.63s   613.68m
array_concat_VARCHAR_40##4arg                                2.13s   469.60m
array_concat_VARCHAR_5##2arg                              158.09ms      6.33
array_concat_VARCHAR_5##3arg                              212.99ms      4.70
array_concat_VARCHAR_5##4arg                              287.64ms      3.48
```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             230.01ms      4.35
array_concat_BOOLEAN_10##3arg                             303.88ms      3.29
array_concat_BOOLEAN_10##4arg                             463.32ms      2.16
array_concat_BOOLEAN_20##2arg                             547.77ms      1.83
array_concat_BOOLEAN_20##3arg                             832.71ms      1.20
array_concat_BOOLEAN_20##4arg                                1.10s   912.79m
array_concat_BOOLEAN_40##2arg                             991.02ms      1.01
array_concat_BOOLEAN_40##3arg                                1.48s   675.74m
array_concat_BOOLEAN_40##4arg                                1.96s   510.45m
array_concat_BOOLEAN_5##2arg                              178.92ms      5.59
array_concat_BOOLEAN_5##3arg                              265.29ms      3.77
array_concat_BOOLEAN_5##4arg                              350.31ms      2.85
array_concat_INTEGER_10##2arg                             111.54ms      8.97
array_concat_INTEGER_10##3arg                             151.91ms      6.58
array_concat_INTEGER_10##4arg                             209.28ms      4.78
array_concat_INTEGER_20##2arg                             150.28ms      6.65
array_concat_INTEGER_20##3arg                             269.52ms      3.71
array_concat_INTEGER_20##4arg                             337.27ms      2.97
array_concat_INTEGER_40##2arg                             213.27ms      4.69
array_concat_INTEGER_40##3arg                             266.57ms      3.75
array_concat_INTEGER_40##4arg                             483.33ms      2.07
array_concat_INTEGER_5##2arg                              115.68ms      8.64
array_concat_INTEGER_5##3arg                              168.24ms      5.94
array_concat_INTEGER_5##4arg                              219.13ms      4.56
array_concat_VARCHAR_10##2arg                             357.53ms      2.80
array_concat_VARCHAR_10##3arg                             459.15ms      2.18
array_concat_VARCHAR_10##4arg                             579.91ms      1.72
array_concat_VARCHAR_20##2arg                             628.27ms      1.59
array_concat_VARCHAR_20##3arg                             802.48ms      1.25
array_concat_VARCHAR_20##4arg                                1.06s   947.41m
array_concat_VARCHAR_40##2arg                             930.88ms      1.07
array_concat_VARCHAR_40##3arg                                1.46s   683.85m
array_concat_VARCHAR_40##4arg                                1.92s   520.41m
array_concat_VARCHAR_5##2arg                              161.55ms      6.19
array_concat_VARCHAR_5##3arg                              214.94ms      4.65
array_concat_VARCHAR_5##4arg                              280.15ms      3.57
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Dec 29, 2023
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input is ArrayView<Generic> and the output is ArrayWriter<Generic>, it is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. We can avoid paying the cast cost for
each element (all elements in the array have the same type) by doing the check once, before the
copy starts, and taking a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close
to that with the registered primitive fast paths, and up to 5X faster than before.


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             247.69ms      4.04
array_concat_BOOLEAN_10##3arg                             328.93ms      3.04
array_concat_BOOLEAN_10##4arg                             471.79ms      2.12
array_concat_BOOLEAN_20##2arg                             591.39ms      1.69
array_concat_BOOLEAN_20##3arg                             889.65ms      1.12
array_concat_BOOLEAN_20##4arg                                1.13s   885.38m
array_concat_BOOLEAN_40##2arg                                1.11s   902.02m
array_concat_BOOLEAN_40##3arg                                1.63s   614.28m
array_concat_BOOLEAN_40##4arg                                2.06s   486.07m
array_concat_BOOLEAN_5##2arg                              178.77ms      5.59
array_concat_BOOLEAN_5##3arg                              262.85ms      3.80
array_concat_BOOLEAN_5##4arg                              358.08ms      2.79
array_concat_INTEGER_10##2arg                              84.65ms     11.81
array_concat_INTEGER_10##3arg                             116.97ms      8.55
array_concat_INTEGER_10##4arg                             159.98ms      6.25
array_concat_INTEGER_20##2arg                             145.19ms      6.89
array_concat_INTEGER_20##3arg                             249.84ms      4.00
array_concat_INTEGER_20##4arg                             298.28ms      3.35
array_concat_INTEGER_40##2arg                             202.66ms      4.93
array_concat_INTEGER_40##3arg                             249.71ms      4.00
array_concat_INTEGER_40##4arg                             462.83ms      2.16
array_concat_INTEGER_5##2arg                               86.36ms     11.58
array_concat_INTEGER_5##3arg                              128.82ms      7.76
array_concat_INTEGER_5##4arg                              165.59ms      6.04
array_concat_VARCHAR_10##2arg                             388.89ms      2.57
array_concat_VARCHAR_10##3arg                             495.35ms      2.02
array_concat_VARCHAR_10##4arg                             626.90ms      1.60
array_concat_VARCHAR_20##2arg                             671.03ms      1.49
array_concat_VARCHAR_20##3arg                             870.87ms      1.15
array_concat_VARCHAR_20##4arg                                1.13s   888.08m
array_concat_VARCHAR_40##2arg                                1.03s   967.24m
array_concat_VARCHAR_40##3arg                                1.63s   613.68m
array_concat_VARCHAR_40##4arg                                2.13s   469.60m
array_concat_VARCHAR_5##2arg                              158.09ms      6.33
array_concat_VARCHAR_5##3arg                              212.99ms      4.70
array_concat_VARCHAR_5##4arg                              287.64ms      3.48
```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             230.01ms      4.35
array_concat_BOOLEAN_10##3arg                             303.88ms      3.29
array_concat_BOOLEAN_10##4arg                             463.32ms      2.16
array_concat_BOOLEAN_20##2arg                             547.77ms      1.83
array_concat_BOOLEAN_20##3arg                             832.71ms      1.20
array_concat_BOOLEAN_20##4arg                                1.10s   912.79m
array_concat_BOOLEAN_40##2arg                             991.02ms      1.01
array_concat_BOOLEAN_40##3arg                                1.48s   675.74m
array_concat_BOOLEAN_40##4arg                                1.96s   510.45m
array_concat_BOOLEAN_5##2arg                              178.92ms      5.59
array_concat_BOOLEAN_5##3arg                              265.29ms      3.77
array_concat_BOOLEAN_5##4arg                              350.31ms      2.85
array_concat_INTEGER_10##2arg                             111.54ms      8.97
array_concat_INTEGER_10##3arg                             151.91ms      6.58
array_concat_INTEGER_10##4arg                             209.28ms      4.78
array_concat_INTEGER_20##2arg                             150.28ms      6.65
array_concat_INTEGER_20##3arg                             269.52ms      3.71
array_concat_INTEGER_20##4arg                             337.27ms      2.97
array_concat_INTEGER_40##2arg                             213.27ms      4.69
array_concat_INTEGER_40##3arg                             266.57ms      3.75
array_concat_INTEGER_40##4arg                             483.33ms      2.07
array_concat_INTEGER_5##2arg                              115.68ms      8.64
array_concat_INTEGER_5##3arg                              168.24ms      5.94
array_concat_INTEGER_5##4arg                              219.13ms      4.56
array_concat_VARCHAR_10##2arg                             357.53ms      2.80
array_concat_VARCHAR_10##3arg                             459.15ms      2.18
array_concat_VARCHAR_10##4arg                             579.91ms      1.72
array_concat_VARCHAR_20##2arg                             628.27ms      1.59
array_concat_VARCHAR_20##3arg                             802.48ms      1.25
array_concat_VARCHAR_20##4arg                                1.06s   947.41m
array_concat_VARCHAR_40##2arg                             930.88ms      1.07
array_concat_VARCHAR_40##3arg                                1.46s   683.85m
array_concat_VARCHAR_40##4arg                                1.92s   520.41m
array_concat_VARCHAR_5##2arg                              161.55ms      6.19
array_concat_VARCHAR_5##3arg                              214.94ms      4.65
array_concat_VARCHAR_5##4arg                              280.15ms      3.57
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Dec 29, 2023
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, it is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. Since all elements of an array are of the same type,
we can do the check and the cast once, before we start the copy, and take a fast path when the
elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, the performance of array_concat with the generic implementation is very close
to the one with registration for primitive fast paths, and up to 5X faster than before.
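
The idea of hoisting the type check out of the copy loop can be sketched as follows. This is a minimal illustration, not the Velox implementation: `ErasedArray`, `appendPerElement`, and `appendHoisted` are hypothetical names, and a `std::variant` stands in for the generic view and writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for a type-erased array: the element type is only
// known at run time, but it is the same for every element.
using IntVec = std::vector<int32_t>;
using StrVec = std::vector<std::string>;
using ErasedArray = std::variant<IntVec, StrVec>;

// Slow shape: the element type is resolved again for every element copied.
void appendPerElement(const ErasedArray& in, ErasedArray& out) {
  const std::size_t n =
      std::visit([](const auto& v) { return v.size(); }, in);
  for (std::size_t i = 0; i < n; ++i) {
    if (const auto* ints = std::get_if<IntVec>(&in)) {
      std::get<IntVec>(out).push_back((*ints)[i]);
    } else {
      std::get<StrVec>(out).push_back(std::get<StrVec>(in)[i]);
    }
  }
}

// Hoisted shape: resolve the type once, then copy the whole range in a
// loop that no longer contains any type dispatch.
void appendHoisted(const ErasedArray& in, ErasedArray& out) {
  std::visit(
      [&out](const auto& src) {
        using Vec = std::decay_t<decltype(src)>;
        auto& dst = std::get<Vec>(out);  // one check for the whole array
        dst.insert(dst.end(), src.begin(), src.end());
      },
      in);
}
```

Both functions produce the same output; only the number of type checks differs (one per element versus one per array). In both, `std::get` throws `std::bad_variant_access` if the output holds a different element type than the input.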


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             247.69ms      4.04
array_concat_BOOLEAN_10##3arg                             328.93ms      3.04
array_concat_BOOLEAN_10##4arg                             471.79ms      2.12
array_concat_BOOLEAN_20##2arg                             591.39ms      1.69
array_concat_BOOLEAN_20##3arg                             889.65ms      1.12
array_concat_BOOLEAN_20##4arg                                1.13s   885.38m
array_concat_BOOLEAN_40##2arg                                1.11s   902.02m
array_concat_BOOLEAN_40##3arg                                1.63s   614.28m
array_concat_BOOLEAN_40##4arg                                2.06s   486.07m
array_concat_BOOLEAN_5##2arg                              178.77ms      5.59
array_concat_BOOLEAN_5##3arg                              262.85ms      3.80
array_concat_BOOLEAN_5##4arg                              358.08ms      2.79
array_concat_INTEGER_10##2arg                              84.65ms     11.81
array_concat_INTEGER_10##3arg                             116.97ms      8.55
array_concat_INTEGER_10##4arg                             159.98ms      6.25
array_concat_INTEGER_20##2arg                             145.19ms      6.89
array_concat_INTEGER_20##3arg                             249.84ms      4.00
array_concat_INTEGER_20##4arg                             298.28ms      3.35
array_concat_INTEGER_40##2arg                             202.66ms      4.93
array_concat_INTEGER_40##3arg                             249.71ms      4.00
array_concat_INTEGER_40##4arg                             462.83ms      2.16
array_concat_INTEGER_5##2arg                               86.36ms     11.58
array_concat_INTEGER_5##3arg                              128.82ms      7.76
array_concat_INTEGER_5##4arg                              165.59ms      6.04
array_concat_VARCHAR_10##2arg                             388.89ms      2.57
array_concat_VARCHAR_10##3arg                             495.35ms      2.02
array_concat_VARCHAR_10##4arg                             626.90ms      1.60
array_concat_VARCHAR_20##2arg                             671.03ms      1.49
array_concat_VARCHAR_20##3arg                             870.87ms      1.15
array_concat_VARCHAR_20##4arg                                1.13s   888.08m
array_concat_VARCHAR_40##2arg                                1.03s   967.24m
array_concat_VARCHAR_40##3arg                                1.63s   613.68m
array_concat_VARCHAR_40##4arg                                2.13s   469.60m
array_concat_VARCHAR_5##2arg                              158.09ms      6.33
array_concat_VARCHAR_5##3arg                              212.99ms      4.70
array_concat_VARCHAR_5##4arg                              287.64ms      3.48
```
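
To read the two tables above side by side, the speedup of a row is its "generic before" time divided by its "generic after" time. The sketch below does that arithmetic for three rows copied from the tables (times converted to milliseconds); `Row`, `speedup`, `sampleRows`, and `printSpeedups` are illustrative names, not part of the benchmark harness.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// One benchmark row: name plus time per iteration before and after, in ms.
struct Row {
  std::string name;
  double beforeMs;
  double afterMs;
};

// Speedup factor: how many times faster the "after" run is.
double speedup(const Row& r) {
  return r.beforeMs / r.afterMs;
}

// Three rows copied from the "generic before" and "generic after" tables.
std::vector<Row> sampleRows() {
  return {
      {"array_concat_BOOLEAN_10##2arg", 567.29, 247.69},
      {"array_concat_INTEGER_10##2arg", 451.38, 84.65},
      {"array_concat_VARCHAR_10##2arg", 1260.0, 388.89},
  };
}

// Prints one line per row with its speedup factor.
void printSpeedups() {
  for (const auto& r : sampleRows()) {
    std::printf("%-32s %.2fx\n", r.name.c_str(), speedup(r));
  }
}
```

For these three rows the ratios come out to roughly 2.3x, 5.3x, and 3.2x respectively.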

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             230.01ms      4.35
array_concat_BOOLEAN_10##3arg                             303.88ms      3.29
array_concat_BOOLEAN_10##4arg                             463.32ms      2.16
array_concat_BOOLEAN_20##2arg                             547.77ms      1.83
array_concat_BOOLEAN_20##3arg                             832.71ms      1.20
array_concat_BOOLEAN_20##4arg                                1.10s   912.79m
array_concat_BOOLEAN_40##2arg                             991.02ms      1.01
array_concat_BOOLEAN_40##3arg                                1.48s   675.74m
array_concat_BOOLEAN_40##4arg                                1.96s   510.45m
array_concat_BOOLEAN_5##2arg                              178.92ms      5.59
array_concat_BOOLEAN_5##3arg                              265.29ms      3.77
array_concat_BOOLEAN_5##4arg                              350.31ms      2.85
array_concat_INTEGER_10##2arg                             111.54ms      8.97
array_concat_INTEGER_10##3arg                             151.91ms      6.58
array_concat_INTEGER_10##4arg                             209.28ms      4.78
array_concat_INTEGER_20##2arg                             150.28ms      6.65
array_concat_INTEGER_20##3arg                             269.52ms      3.71
array_concat_INTEGER_20##4arg                             337.27ms      2.97
array_concat_INTEGER_40##2arg                             213.27ms      4.69
array_concat_INTEGER_40##3arg                             266.57ms      3.75
array_concat_INTEGER_40##4arg                             483.33ms      2.07
array_concat_INTEGER_5##2arg                              115.68ms      8.64
array_concat_INTEGER_5##3arg                              168.24ms      5.94
array_concat_INTEGER_5##4arg                              219.13ms      4.56
array_concat_VARCHAR_10##2arg                             357.53ms      2.80
array_concat_VARCHAR_10##3arg                             459.15ms      2.18
array_concat_VARCHAR_10##4arg                             579.91ms      1.72
array_concat_VARCHAR_20##2arg                             628.27ms      1.59
array_concat_VARCHAR_20##3arg                             802.48ms      1.25
array_concat_VARCHAR_20##4arg                                1.06s   947.41m
array_concat_VARCHAR_40##2arg                             930.88ms      1.07
array_concat_VARCHAR_40##3arg                                1.46s   683.85m
array_concat_VARCHAR_40##4arg                                1.92s   520.41m
array_concat_VARCHAR_5##2arg                              161.55ms      6.19
array_concat_VARCHAR_5##3arg                              214.94ms      4.65
array_concat_VARCHAR_5##4arg                              280.15ms      3.57
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 2, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive.
To avoid that cost, authors of functions that use add_items() register fast paths for primitives
(see facebookincubator#7393).

We can optimize add_items() and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. Since all elements in the array have the same type,
we can do the check once before the copy starts instead of once per element,
and take a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.
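
The idea of hoisting the type check out of the copy loop can be sketched as follows. This is only an illustration under stated assumptions: `Kind`, `ErasedArray`, `appendPerElementCheck`, `appendTyped` and `appendHoistedCheck` are hypothetical names, not Velox types or the actual add_items() implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins (not Velox types): a type-erased array whose
// elements all share a single runtime type tag.
enum class Kind { kInteger, kBigint };

struct ErasedArray {
  Kind kind;         // one tag for the whole array
  const void* data;  // `size` contiguous values of the tagged type
  std::size_t size;
};

// Slow shape: the type tag is inspected again for every element.
inline void appendPerElementCheck(
    const ErasedArray& in, std::vector<int64_t>& out) {
  for (std::size_t i = 0; i < in.size; ++i) {
    switch (in.kind) {
      case Kind::kInteger:
        out.push_back(static_cast<const int32_t*>(in.data)[i]);
        break;
      case Kind::kBigint:
        out.push_back(static_cast<const int64_t*>(in.data)[i]);
        break;
    }
  }
}

// Typed inner loop with no per-element type check.
template <typename T>
void appendTyped(const ErasedArray& in, std::vector<int64_t>& out) {
  const T* values = static_cast<const T*>(in.data);
  out.reserve(out.size() + in.size);
  for (std::size_t i = 0; i < in.size; ++i) {
    out.push_back(values[i]);
  }
}

// Fast shape: check the tag once, then run the typed loop.
inline void appendHoistedCheck(
    const ErasedArray& in, std::vector<int64_t>& out) {
  switch (in.kind) {
    case Kind::kInteger:
      appendTyped<int32_t>(in, out);
      return;
    case Kind::kBigint:
      appendTyped<int64_t>(in, out);
      return;
  }
  throw std::logic_error("unknown element kind");
}
```

Both functions produce the same output; the second one pays for the type dispatch once per array rather than once per element, which is the effect the summary describes for primitive element types.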

With this diff, array_concat with the generic implementation performs very close
to the version registered with primitive fast paths, and is up to 5X faster than before.


## Array concat benchmark

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             247.69ms      4.04
array_concat_BOOLEAN_10##3arg                             328.93ms      3.04
array_concat_BOOLEAN_10##4arg                             471.79ms      2.12
array_concat_BOOLEAN_20##2arg                             591.39ms      1.69
array_concat_BOOLEAN_20##3arg                             889.65ms      1.12
array_concat_BOOLEAN_20##4arg                                1.13s   885.38m
array_concat_BOOLEAN_40##2arg                                1.11s   902.02m
array_concat_BOOLEAN_40##3arg                                1.63s   614.28m
array_concat_BOOLEAN_40##4arg                                2.06s   486.07m
array_concat_BOOLEAN_5##2arg                              178.77ms      5.59
array_concat_BOOLEAN_5##3arg                              262.85ms      3.80
array_concat_BOOLEAN_5##4arg                              358.08ms      2.79
array_concat_INTEGER_10##2arg                              84.65ms     11.81
array_concat_INTEGER_10##3arg                             116.97ms      8.55
array_concat_INTEGER_10##4arg                             159.98ms      6.25
array_concat_INTEGER_20##2arg                             145.19ms      6.89
array_concat_INTEGER_20##3arg                             249.84ms      4.00
array_concat_INTEGER_20##4arg                             298.28ms      3.35
array_concat_INTEGER_40##2arg                             202.66ms      4.93
array_concat_INTEGER_40##3arg                             249.71ms      4.00
array_concat_INTEGER_40##4arg                             462.83ms      2.16
array_concat_INTEGER_5##2arg                               86.36ms     11.58
array_concat_INTEGER_5##3arg                              128.82ms      7.76
array_concat_INTEGER_5##4arg                              165.59ms      6.04
array_concat_VARCHAR_10##2arg                             388.89ms      2.57
array_concat_VARCHAR_10##3arg                             495.35ms      2.02
array_concat_VARCHAR_10##4arg                             626.90ms      1.60
array_concat_VARCHAR_20##2arg                             671.03ms      1.49
array_concat_VARCHAR_20##3arg                             870.87ms      1.15
array_concat_VARCHAR_20##4arg                                1.13s   888.08m
array_concat_VARCHAR_40##2arg                                1.03s   967.24m
array_concat_VARCHAR_40##3arg                                1.63s   613.68m
array_concat_VARCHAR_40##4arg                                2.13s   469.60m
array_concat_VARCHAR_5##2arg                              158.09ms      6.33
array_concat_VARCHAR_5##3arg                              212.99ms      4.70
array_concat_VARCHAR_5##4arg                              287.64ms      3.48
```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             230.01ms      4.35
array_concat_BOOLEAN_10##3arg                             303.88ms      3.29
array_concat_BOOLEAN_10##4arg                             463.32ms      2.16
array_concat_BOOLEAN_20##2arg                             547.77ms      1.83
array_concat_BOOLEAN_20##3arg                             832.71ms      1.20
array_concat_BOOLEAN_20##4arg                                1.10s   912.79m
array_concat_BOOLEAN_40##2arg                             991.02ms      1.01
array_concat_BOOLEAN_40##3arg                                1.48s   675.74m
array_concat_BOOLEAN_40##4arg                                1.96s   510.45m
array_concat_BOOLEAN_5##2arg                              178.92ms      5.59
array_concat_BOOLEAN_5##3arg                              265.29ms      3.77
array_concat_BOOLEAN_5##4arg                              350.31ms      2.85
array_concat_INTEGER_10##2arg                             111.54ms      8.97
array_concat_INTEGER_10##3arg                             151.91ms      6.58
array_concat_INTEGER_10##4arg                             209.28ms      4.78
array_concat_INTEGER_20##2arg                             150.28ms      6.65
array_concat_INTEGER_20##3arg                             269.52ms      3.71
array_concat_INTEGER_20##4arg                             337.27ms      2.97
array_concat_INTEGER_40##2arg                             213.27ms      4.69
array_concat_INTEGER_40##3arg                             266.57ms      3.75
array_concat_INTEGER_40##4arg                             483.33ms      2.07
array_concat_INTEGER_5##2arg                              115.68ms      8.64
array_concat_INTEGER_5##3arg                              168.24ms      5.94
array_concat_INTEGER_5##4arg                              219.13ms      4.56
array_concat_VARCHAR_10##2arg                             357.53ms      2.80
array_concat_VARCHAR_10##3arg                             459.15ms      2.18
array_concat_VARCHAR_10##4arg                             579.91ms      1.72
array_concat_VARCHAR_20##2arg                             628.27ms      1.59
array_concat_VARCHAR_20##3arg                             802.48ms      1.25
array_concat_VARCHAR_20##4arg                                1.06s   947.41m
array_concat_VARCHAR_40##2arg                             930.88ms      1.07
array_concat_VARCHAR_40##3arg                                1.46s   683.85m
array_concat_VARCHAR_40##4arg                                1.92s   520.41m
array_concat_VARCHAR_5##2arg                              161.55ms      6.19
array_concat_VARCHAR_5##3arg                              214.94ms      4.65
array_concat_VARCHAR_5##4arg                              280.15ms      3.57
```
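
The "up to 5X" figure can be checked directly against the time/iter columns above. A minimal sketch, where `speedup` is a hypothetical helper and the inputs are copied from the array_concat_INTEGER_10##2arg rows:

```cpp
// Hypothetical helper: how many times faster `afterMs` is than `beforeMs`.
constexpr double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}

// array_concat_INTEGER_10##2arg, time/iter in milliseconds from the tables.
constexpr double kGenericBeforeMs = 451.38;
constexpr double kGenericAfterMs = 84.65;

// Roughly 5.3x for this row.
constexpr double kIntegerSpeedup = speedup(kGenericBeforeMs, kGenericAfterMs);
```

For this row the generic implementation goes from 451.38ms to 84.65ms, roughly a 5.3x improvement, while the primitive fast path registration measures 111.54ms on the same benchmark.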

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 2, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive.
To avoid that cost, authors of functions that use add_items() register fast paths for primitives
(see facebookincubator#7393).

We can optimize add_items() and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. Since all elements in the array have the same type,
we can do the check once before the copy starts instead of once per element,
and take a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat with the generic implementation performs very close
to the version registered with primitive fast paths, and is up to 5X faster than before.


## Array concat benchmark

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```
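
To read the three tables side by side, the speedup of a row is its "generic before" time divided by its "generic after" time. A small sketch of that arithmetic, using three rows copied from the tables above (1.26s is written as 1260 ms); `BenchRow`, `kSampleRows`, and `speedup` are hypothetical helper names for this illustration only:

```cpp
#include <string>
#include <vector>

// One benchmark row with its time/iter in milliseconds from each table.
struct BenchRow {
  std::string name;
  double beforeMs;    // "generic before"
  double afterMs;     // "generic after"
  double fastPathMs;  // "primitive fast path"
};

// Ratio of two timings; values above 1 mean the second timing is faster.
inline double speedup(double slowerMs, double fasterMs) {
  return slowerMs / fasterMs;
}

// Rows copied from the tables above; not a complete transcription.
const std::vector<BenchRow> kSampleRows = {
    {"array_concat_BOOLEAN_10##2arg", 567.29, 195.54, 178.21},
    {"array_concat_INTEGER_10##2arg", 451.38, 70.89, 58.67},
    {"array_concat_VARCHAR_10##2arg", 1260.0, 239.86, 226.63},
};
```

For these three rows, `speedup(beforeMs, afterMs)` comes out at roughly 2.9x, 6.4x, and 5.3x, while `speedup(afterMs, fastPathMs)` stays between about 1.06 and 1.21, which is the sense in which the generic path ends up close to the registered fast path.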

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 2, 2024
…mitive types. (facebookincubator#8194)

laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying it. Since all elements of the array have the same type,
we can do the check once, before the copy starts, instead of once per element,
and take a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.
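
The idea of hoisting the type check out of the copy loop can be illustrated with a minimal, self-contained sketch. This is not the Velox implementation: `ErasedArray`, `Kind`, `gTypeChecks`, and both append functions are hypothetical names made up for this example, and the counter exists only to make the number of checks visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical runtime type tag; not a Velox type.
enum class Kind { kInteger, kVarchar };

// Hypothetical type-erased array: every element has the same runtime kind.
struct ErasedArray {
  Kind kind;
  std::vector<int32_t> ints;        // used when kind == Kind::kInteger
  std::vector<std::string> strings; // used when kind == Kind::kVarchar

  std::size_t size() const {
    return kind == Kind::kInteger ? ints.size() : strings.size();
  }
};

// Counts runtime type checks made by the append functions (illustration only).
static std::size_t gTypeChecks = 0;

// Before: the kind is re-checked for every element that is copied.
void appendCheckingEachElement(const ErasedArray& in, ErasedArray& out) {
  const std::size_t n = in.size();
  for (std::size_t i = 0; i < n; ++i) {
    ++gTypeChecks;
    if (in.kind == Kind::kInteger) {
      out.ints.push_back(in.ints[i]);
    } else {
      out.strings.push_back(in.strings[i]);
    }
  }
}

// After: the kind is checked once, then a tight loop copies the elements.
void appendCheckingOnce(const ErasedArray& in, ErasedArray& out) {
  ++gTypeChecks;
  if (in.kind == Kind::kInteger) {
    // Primitive fast path: one bulk copy with no per-element branching.
    out.ints.insert(out.ints.end(), in.ints.begin(), in.ints.end());
  } else {
    // Non-primitive path: the string copies dominate the single check.
    for (const auto& s : in.strings) {
      out.strings.push_back(s);
    }
  }
}
```

For an input of three integers, the first function performs three type checks and the second performs one, while both produce the same output.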

With this diff, array_concat performance with the generic implementation is very close
to that with registered primitive fast paths, and up to 5X faster than before.


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```
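
To compare rows across the three tables, the time/iter values can be turned into ratios. The helper below is only a sketch for reading the tables; `speedup` is a hypothetical name, and times must first be converted to the same unit (for example 1.26s becomes 1260 ms).

```cpp
// Ratio of the "before" time per iteration to the "after" time per iteration.
// Both arguments must be in the same unit, e.g. milliseconds.
double speedup(double beforeMs, double afterMs) {
  return beforeMs / afterMs;
}
```

For example, `speedup(451.38, 70.89)` for array_concat_INTEGER_10##2arg (generic before vs. generic after) is about 6.4, and `speedup(70.89, 58.67)` for the same row (generic after vs. primitive fast path) is about 1.2.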

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying it. Since all elements of the array have the same type,
we can do the check once, before the copy starts, instead of once per element,
and take a fast path when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat performance with the generic implementation is very close
to that with registered primitive fast paths, and up to 5X faster than before.


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying it. Since all elements of an array have the same type,
we can do that check once, before the copy starts, instead of once per element, and take a fast path
when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat with the generic implementation performs very close to the version
with registered primitive fast paths, and is up to 5X faster than before.
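The idea described above, checking the element type once per array rather than once per element, can be sketched in isolation. This is a toy model and not the Velox implementation: `Kind`, `ErasedColumn`, `Stats`, `appendPerElementCheck`, and `appendHoistedCheck` are hypothetical names introduced only for illustration, and the real `add_items()` works on `ArrayView<Generic>` and `ArrayWriter<Generic>` rather than on plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Velox types.
enum class Kind { kInteger, kVarchar };

// A type-erased column: every element shares the single runtime tag `kind`.
struct ErasedColumn {
  Kind kind;
  std::vector<int32_t> ints;         // populated when kind == kInteger
  std::vector<std::string> strings;  // populated when kind == kVarchar
};

// Counts how often the runtime type tag is inspected.
struct Stats {
  std::size_t typeChecks = 0;
};

// Slow shape: the tag is inspected once per copied element.
inline void appendPerElementCheck(
    const ErasedColumn& in,
    std::size_t begin,
    std::size_t end,
    ErasedColumn& out,
    Stats& stats) {
  for (std::size_t i = begin; i < end; ++i) {
    ++stats.typeChecks;
    switch (in.kind) {
      case Kind::kInteger:
        out.ints.push_back(in.ints[i]);
        break;
      case Kind::kVarchar:
        out.strings.push_back(in.strings[i]);
        break;
    }
  }
}

// Fast shape: inspect the tag once, then copy the whole range with a typed
// bulk insert. `out` must be a different object from `in`.
inline void appendHoistedCheck(
    const ErasedColumn& in,
    std::size_t begin,
    std::size_t end,
    ErasedColumn& out,
    Stats& stats) {
  ++stats.typeChecks;
  const auto b = static_cast<std::ptrdiff_t>(begin);
  const auto e = static_cast<std::ptrdiff_t>(end);
  switch (in.kind) {
    case Kind::kInteger:
      out.ints.insert(out.ints.end(), in.ints.begin() + b, in.ints.begin() + e);
      break;
    case Kind::kVarchar:
      out.strings.insert(
          out.strings.end(), in.strings.begin() + b, in.strings.begin() + e);
      break;
  }
}
```

Both functions produce the same output column; the difference is that the first inspects the tag `end - begin` times while the second inspects it once, which is the saving the summary attributes to primitive element types.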


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```
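To read the three tables side by side, the ratios can be computed directly from the `time/iter` column. The sketch below uses three rows copied from the tables above (with `1.26s` converted to `1260.0` ms); `Row`, `sampleRows`, `speedup`, and `gapToFastPath` are hypothetical helper names, not part of the benchmark code.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One benchmark row, with time/iter in milliseconds for each of the
// three tables: generic before, generic after, primitive fast path.
struct Row {
  std::string name;
  double beforeMs;
  double afterMs;
  double fastPathMs;
};

// Values copied from the tables above; 1.26s is written as 1260.0 ms.
inline std::vector<Row> sampleRows() {
  return {
      {"array_concat_BOOLEAN_10##2arg", 567.29, 195.54, 178.21},
      {"array_concat_INTEGER_10##2arg", 451.38, 70.89, 58.67},
      {"array_concat_VARCHAR_10##2arg", 1260.0, 239.86, 226.63},
  };
}

// How many times faster the generic path became.
inline double speedup(const Row& r) {
  return r.beforeMs / r.afterMs;
}

// How much slower the generic path still is than the primitive fast path
// (1.0 would mean identical time per iteration).
inline double gapToFastPath(const Row& r) {
  return r.afterMs / r.fastPathMs;
}
```

For these three rows, `speedup` comes out at about 2.90, 6.37, and 5.25, and `gapToFastPath` at about 1.10, 1.21, and 1.06.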

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 3, 2024
…mitive types. (facebookincubator#8194)

laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 9, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying it. Since all elements of an array have the same type,
we can do that check once, before the copy starts, instead of once per element, and take a fast path
when the elements are of a primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat with the generic implementation performs very close to the version
with registered primitive fast paths, and is up to 5X faster than before.


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Reviewed By: kevinwilfong

Differential Revision: D52380460
laithsakka added a commit to laithsakka/velox that referenced this pull request Jan 9, 2024
…mitive types. (facebookincubator#8194)

Summary:

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. It is slow right now because
it does a type check for every element it copies. Since all elements of an array are of the same type,
we can do the check once, before the copy starts, and take a fast path when the elements are of a
primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat with the generic implementation performs very close
to the version registered with primitive fast paths, and is up to 5X faster than before.
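
The idea described above, resolving the element type once per array instead of once per element, can be sketched as follows. This is a minimal illustration and not the Velox implementation: `Kind`, `GenericView`, `GenericWriter`, `addItemsPerElement`, `addItemsHoisted`, and the `typeChecks` counter are all hypothetical names introduced here, and the real `ArrayView<Generic>` and `ArrayWriter<Generic>` types work differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Velox types.
enum class Kind { kInteger, kVarchar };

// Type-erased, read-only view over an array whose elements all share one type.
struct GenericView {
  Kind kind;
  const void* data;  // points to int32_t or std::string elements
  std::size_t size;
};

// Type-erased output buffer.
struct GenericWriter {
  std::vector<int32_t> ints;
  std::vector<std::string> strings;
};

// Instrumentation only: counts how often the runtime type is inspected.
std::size_t typeChecks = 0;

// Before: the runtime type is resolved again for every copied element.
void addItemsPerElement(const GenericView& in, GenericWriter& out) {
  for (std::size_t i = 0; i < in.size; ++i) {
    ++typeChecks;
    switch (in.kind) {
      case Kind::kInteger:
        out.ints.push_back(static_cast<const int32_t*>(in.data)[i]);
        break;
      case Kind::kVarchar:
        out.strings.push_back(static_cast<const std::string*>(in.data)[i]);
        break;
    }
  }
}

// After: the type is resolved once, then the whole range is copied.
void addItemsHoisted(const GenericView& in, GenericWriter& out) {
  ++typeChecks;
  if (in.kind == Kind::kInteger) {
    // Primitive fast path: bulk copy with no per-element dispatch.
    const auto* src = static_cast<const int32_t*>(in.data);
    out.ints.insert(out.ints.end(), src, src + in.size);
    return;
  }
  // Non-primitive elements: the single check is small next to the copy cost.
  const auto* src = static_cast<const std::string*>(in.data);
  out.strings.insert(out.strings.end(), src, src + in.size);
}
```

Calling both functions on the same three-element integer view produces the same output, but the first inspects the type three times and the second once.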


## Array concat benchmark. 

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```
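
To read the three tables side by side, the time/iter columns can be turned into ratios. The sketch below copies three `##2arg` rows from the tables above (with 1.26s written as 1260 ms); the `Row` struct and the helper names are hypothetical and only serve this illustration.

```cpp
#include <string>
#include <vector>

// One benchmark row, times in milliseconds (hypothetical helper type).
struct Row {
  std::string name;
  double genericBeforeMs;
  double genericAfterMs;
  double primitiveFastPathMs;
};

// How many times faster the generic path became with this diff.
double speedup(const Row& r) {
  return r.genericBeforeMs / r.genericAfterMs;
}

// How much slower the optimized generic path still is than the
// explicitly registered primitive fast path (1.0 means equal).
double remainingGap(const Row& r) {
  return r.genericAfterMs / r.primitiveFastPathMs;
}

// Values copied from the benchmark tables above.
std::vector<Row> sampleRows() {
  return {
      {"array_concat_BOOLEAN_10##2arg", 567.29, 195.54, 178.21},
      {"array_concat_INTEGER_10##2arg", 451.38, 70.89, 58.67},
      {"array_concat_VARCHAR_10##2arg", 1260.0, 239.86, 226.63},
  };
}
```

`speedup` compares "generic before" with "generic after", and `remainingGap` compares "generic after" with "primitive fast path".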

Reviewed By: kevinwilfong

Differential Revision: D52380460
facebook-github-bot pushed a commit that referenced this pull request Jan 9, 2024
…mitive types. (#8194)

Summary:
Pull Request resolved: #8194

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, this is expensive,
so authors of functions that use add_items() register fast paths for primitives to avoid the cost
(see #7393).

We can optimize add_items() itself and avoid that authoring overhead. It is slow right now because
it does a type check for every element it copies. Since all elements of an array are of the same type,
we can do the check once, before the copy starts, and take a fast path when the elements are of a
primitive type.

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat with the generic implementation performs very close
to the version registered with primitive fast paths, and is up to 5X faster than before.

## Array concat benchmark.

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Reviewed By: kevinwilfong

Differential Revision: D52380460

fbshipit-source-id: b92bae384de643ad5c6cd614c050cdd78637a5e6
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Jan 16, 2024
…mitive types. (facebookincubator#8194)

Summary:
Pull Request resolved: facebookincubator#8194

add_items() appends elements from an array view to an array writer.
When the input array is ArrayView<Generic> and the output is ArrayWriter<Generic>, it is expensive,
so authors of functions that use add_items() register fast paths for primitive types to avoid the cost
(see facebookincubator#7393).

We can optimize add_items() itself and avoid that authoring overhead. Right now it is slow because
it does a type check per element when copying. Since all elements of an array have the same type,
we can do the check (and the cast) once before the copy starts instead of once per element, and
take a fast path when the elements are of a primitive type.
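
A minimal sketch of the idea, assuming a simplified type-erased array. `Kind`, `kindOf`, `GenericArray`, `addItemsPerElementCheck` and `addItemsHoistedCheck` are hypothetical names for illustration only; they are not Velox types or the actual add_items() implementation. The first function validates the element type on every access, the second validates it once and then copies the primitive values in bulk.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical runtime type tag for a type-erased ("generic") array.
enum class Kind { kInt32, kDouble };

template <typename T>
constexpr Kind kindOf();
template <>
constexpr Kind kindOf<std::int32_t>() { return Kind::kInt32; }
template <>
constexpr Kind kindOf<double>() { return Kind::kDouble; }

// Hypothetical stand-in for a generic array view: a tag plus raw storage.
struct GenericArray {
  Kind kind;
  const void* data;
  std::size_t size;

  // Checked access: the type tag is compared on every call.
  template <typename T>
  const T& at(std::size_t i) const {
    if (kind != kindOf<T>()) {
      throw std::invalid_argument("type mismatch");
    }
    return static_cast<const T*>(data)[i];
  }
};

// Slow path: one type check per copied element.
template <typename T>
void addItemsPerElementCheck(std::vector<T>& out, const GenericArray& in) {
  for (std::size_t i = 0; i < in.size; ++i) {
    out.push_back(in.at<T>(i));
  }
}

// Fast path: one type check for the whole array, then a bulk copy.
template <typename T>
void addItemsHoistedCheck(std::vector<T>& out, const GenericArray& in) {
  if (in.kind != kindOf<T>()) {
    throw std::invalid_argument("type mismatch");
  }
  const T* typed = static_cast<const T*>(in.data);
  out.insert(out.end(), typed, typed + in.size);
}
```

Both functions produce the same output for a matching type. The second one only moves the check out of the loop, which is what lets the copy of primitive elements run without per-element branching.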

When the elements are not primitive, the cost of checking the type is amortized by the cost of
copying the complex elements.

With this diff, array_concat performance with the generic implementation is very close
to that with registered primitive fast paths, and up to 5X faster than before.

## Array concat benchmark.

generic before
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             567.29ms      1.76
array_concat_BOOLEAN_10##3arg                             848.30ms      1.18
array_concat_BOOLEAN_10##4arg                                1.20s   835.32m
array_concat_BOOLEAN_20##2arg                                1.24s   804.59m
array_concat_BOOLEAN_20##3arg                                1.83s   545.78m
array_concat_BOOLEAN_20##4arg                                2.43s   411.28m
array_concat_BOOLEAN_40##2arg                                2.42s   413.40m
array_concat_BOOLEAN_40##3arg                                3.45s   290.10m
array_concat_BOOLEAN_40##4arg                                4.72s   211.95m
array_concat_BOOLEAN_5##2arg                              326.58ms      3.06
array_concat_BOOLEAN_5##3arg                              500.23ms      2.00
array_concat_BOOLEAN_5##4arg                              647.58ms      1.54
array_concat_INTEGER_10##2arg                             451.38ms      2.22
array_concat_INTEGER_10##3arg                             676.54ms      1.48
array_concat_INTEGER_10##4arg                             907.98ms      1.10
array_concat_INTEGER_20##2arg                             903.66ms      1.11
array_concat_INTEGER_20##3arg                                1.46s   685.90m
array_concat_INTEGER_20##4arg                                1.90s   525.07m
array_concat_INTEGER_40##2arg                                1.83s   547.40m
array_concat_INTEGER_40##3arg                                2.63s   379.91m
array_concat_INTEGER_40##4arg                                3.65s   274.16m
array_concat_INTEGER_5##2arg                              243.12ms      4.11
array_concat_INTEGER_5##3arg                              381.92ms      2.62
array_concat_INTEGER_5##4arg                              502.78ms      1.99
array_concat_VARCHAR_10##2arg                                1.26s   792.79m
array_concat_VARCHAR_10##3arg                                1.73s   579.50m
array_concat_VARCHAR_10##4arg                                2.21s   452.26m
array_concat_VARCHAR_20##2arg                                3.23s   309.67m
array_concat_VARCHAR_20##3arg                                4.08s   244.99m
array_concat_VARCHAR_20##4arg                                5.09s   196.40m
array_concat_VARCHAR_40##2arg                                5.49s   182.17m
array_concat_VARCHAR_40##3arg                                9.23s   108.36m
```

generic after
```
BUILD SUCCEEDED
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             195.54ms      5.11
array_concat_BOOLEAN_10##3arg                             265.57ms      3.77
array_concat_BOOLEAN_10##4arg                             397.59ms      2.52
array_concat_BOOLEAN_20##2arg                             487.38ms      2.05
array_concat_BOOLEAN_20##3arg                             758.45ms      1.32
array_concat_BOOLEAN_20##4arg                                1.07s   930.67m
array_concat_BOOLEAN_40##2arg                             914.62ms      1.09
array_concat_BOOLEAN_40##3arg                                1.36s   737.16m
array_concat_BOOLEAN_40##4arg                                1.72s   580.03m
array_concat_BOOLEAN_5##2arg                              149.76ms      6.68
array_concat_BOOLEAN_5##3arg                              234.81ms      4.26
array_concat_BOOLEAN_5##4arg                              300.58ms      3.33
array_concat_INTEGER_10##2arg                              70.89ms     14.11
array_concat_INTEGER_10##3arg                              95.07ms     10.52
array_concat_INTEGER_10##4arg                             124.94ms      8.00
array_concat_INTEGER_20##2arg                             102.19ms      9.79
array_concat_INTEGER_20##3arg                             155.30ms      6.44
array_concat_INTEGER_20##4arg                             187.59ms      5.33
array_concat_INTEGER_40##2arg                             122.93ms      8.13
array_concat_INTEGER_40##3arg                             153.85ms      6.50
array_concat_INTEGER_40##4arg                             322.33ms      3.10
array_concat_INTEGER_5##2arg                               70.71ms     14.14
array_concat_INTEGER_5##3arg                              100.96ms      9.90
array_concat_INTEGER_5##4arg                              124.78ms      8.01
array_concat_VARCHAR_10##2arg                             239.86ms      4.17
array_concat_VARCHAR_10##3arg                             313.51ms      3.19
array_concat_VARCHAR_10##4arg                             418.63ms      2.39
array_concat_VARCHAR_20##2arg                             492.72ms      2.03
array_concat_VARCHAR_20##3arg                             645.26ms      1.55
array_concat_VARCHAR_20##4arg                             872.10ms      1.15
array_concat_VARCHAR_40##2arg                             737.43ms      1.36
array_concat_VARCHAR_40##3arg                                1.19s   843.70m
array_concat_VARCHAR_40##4arg                                1.52s   658.16m
array_concat_VARCHAR_5##2arg                              111.10ms      9.00
array_concat_VARCHAR_5##3arg                              148.33ms      6.74
array_concat_VARCHAR_5##4arg                              193.35ms      5.17

```

primitive fast path
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
array_concat_BOOLEAN_10##2arg                             178.21ms      5.61
array_concat_BOOLEAN_10##3arg                             233.11ms      4.29
array_concat_BOOLEAN_10##4arg                             363.77ms      2.75
array_concat_BOOLEAN_20##2arg                             456.42ms      2.19
array_concat_BOOLEAN_20##3arg                             712.48ms      1.40
array_concat_BOOLEAN_20##4arg                             927.58ms      1.08
array_concat_BOOLEAN_40##2arg                             873.87ms      1.14
array_concat_BOOLEAN_40##3arg                                1.35s   742.65m
array_concat_BOOLEAN_40##4arg                                1.66s   602.28m
array_concat_BOOLEAN_5##2arg                              141.29ms      7.08
array_concat_BOOLEAN_5##3arg                              224.04ms      4.46
array_concat_BOOLEAN_5##4arg                              290.93ms      3.44
array_concat_INTEGER_10##2arg                              58.67ms     17.05
array_concat_INTEGER_10##3arg                              80.23ms     12.46
array_concat_INTEGER_10##4arg                             107.38ms      9.31
array_concat_INTEGER_20##2arg                              90.53ms     11.05
array_concat_INTEGER_20##3arg                             146.84ms      6.81
array_concat_INTEGER_20##4arg                             174.97ms      5.72
array_concat_INTEGER_40##2arg                             113.06ms      8.85
array_concat_INTEGER_40##3arg                             144.51ms      6.92
array_concat_INTEGER_40##4arg                             317.69ms      3.15
array_concat_INTEGER_5##2arg                               60.72ms     16.47
array_concat_INTEGER_5##3arg                               86.76ms     11.53
array_concat_INTEGER_5##4arg                              104.10ms      9.61
array_concat_VARCHAR_10##2arg                             226.63ms      4.41
array_concat_VARCHAR_10##3arg                             304.74ms      3.28
array_concat_VARCHAR_10##4arg                             393.14ms      2.54
array_concat_VARCHAR_20##2arg                             467.90ms      2.14
array_concat_VARCHAR_20##3arg                             624.86ms      1.60
array_concat_VARCHAR_20##4arg                             833.13ms      1.20
array_concat_VARCHAR_40##2arg                             703.85ms      1.42
array_concat_VARCHAR_40##3arg                                1.20s   834.57m
array_concat_VARCHAR_40##4arg                                1.58s   634.88m
array_concat_VARCHAR_5##2arg                              104.95ms      9.53
array_concat_VARCHAR_5##3arg                              138.85ms      7.20
array_concat_VARCHAR_5##4arg                              178.57ms      5.60
```

Reviewed By: kevinwilfong

Differential Revision: D52380460

fbshipit-source-id: b92bae384de643ad5c6cd614c050cdd78637a5e6
Labels: CLA Signed, fb-exported, Merged