Add benchmark for Presto IN expression #516

Closed
wants to merge 1 commit into from


mbasmanova
Contributor

The benchmark shows that an IN expression with a large number of values is
extremely slow (4.72s per iteration for in10K versus 3.12ms for fastIn10K).
This is because IN is implemented as a function with a variable number of
arguments. IN with 10K values is represented as an expression tree with one
node for the IN, one for the column, and 10K nodes for the values. Each value is
wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making
new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take values as a single array.
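
To make the cost difference concrete, here is a minimal, self-contained sketch of the two shapes described above. It is not Velox code: `ConstantNode`, `variadicIn`, and `ArrayIn` are hypothetical names, plain `std::vector<int64_t>` stands in for a batch of rows, and the linear comparison loop plus the hash set are assumptions chosen for illustration. The only behavior taken from the description is that the variadic form allocates one shared pointer per IN-list value for every batch, while the array form receives all values at once.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a constant expression node. Each evaluation
// allocates a fresh shared_ptr, mimicking the per-batch overhead described
// in the summary.
struct ConstantNode {
  int64_t value;

  std::shared_ptr<const int64_t> evaluate() const {
    return std::make_shared<const int64_t>(value);
  }
};

// Variadic-style IN: one node per value, all re-evaluated for every batch.
std::vector<bool> variadicIn(
    const std::vector<int64_t>& batch,
    const std::vector<ConstantNode>& nodes) {
  // Per-batch cost: one allocation per IN-list value.
  std::vector<std::shared_ptr<const int64_t>> constants;
  constants.reserve(nodes.size());
  for (const auto& node : nodes) {
    constants.push_back(node.evaluate());
  }

  // Assumed comparison strategy: linear scan over the evaluated constants.
  std::vector<bool> result(batch.size(), false);
  for (std::size_t row = 0; row < batch.size(); ++row) {
    for (const auto& constant : constants) {
      if (batch[row] == *constant) {
        result[row] = true;
        break;
      }
    }
  }
  return result;
}

// Array-style IN: the values arrive as a single list and are turned into a
// hash set once, at construction time, rather than once per batch.
class ArrayIn {
 public:
  explicit ArrayIn(const std::vector<int64_t>& values)
      : values_(values.begin(), values.end()) {}

  std::vector<bool> apply(const std::vector<int64_t>& batch) const {
    std::vector<bool> result(batch.size(), false);
    for (std::size_t row = 0; row < batch.size(); ++row) {
      result[row] = values_.count(batch[row]) > 0;
    }
    return result;
  }

 private:
  std::unordered_set<int64_t> values_;
};
```

In this sketch the variadic form pays for N allocations on every batch, whereas the array form pays its setup cost once. How the actual follow-up PR stores and probes the values is not stated here.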

============================================================================
prestosql/benchmarks/InBenchmark.cpp            relative  time/iter  iters/s
============================================================================
fastIn                                                       3.49ms   286.17
in                                                55.08%     6.34ms   157.62
fastIn10K                                                    3.12ms   320.43
in10K                                              0.07%      4.72s  211.75m
============================================================================
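
As a reading aid for the table above: the relative column is consistent with baseline time divided by candidate time (3.49ms / 6.34ms is about 55%), and the `m` suffix in `211.75m` appears to mean milli, i.e. about 0.21 iterations per second, which matches 1 / 4.72s. The sketch below reproduces that arithmetic. The helper names `itersPerSecond` and `relativePercent` are hypothetical and not part of any benchmark library.

```cpp
#include <cassert>

// Throughput implied by a time per iteration, in iterations per second.
double itersPerSecond(double secondsPerIter) {
  assert(secondsPerIter > 0.0);
  return 1.0 / secondsPerIter;
}

// Speed of a candidate relative to a baseline, as a percentage.
// 100% means equally fast; lower means the candidate is slower.
double relativePercent(double baselineSeconds, double candidateSeconds) {
  assert(candidateSeconds > 0.0);
  return 100.0 * baselineSeconds / candidateSeconds;
}
```

For example, `relativePercent(3.12e-3, 4.72)` gives roughly 0.066, in line with the 0.07% printed for in10K. Small differences from the printed values (such as 55.05% here versus 55.08% in the table) are expected if the table was computed from unrounded timings.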

@facebook-github-bot added the CLA Signed label Oct 28, 2021
@mbasmanova
Contributor Author

CC: @laithsakka

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mbasmanova force-pushed the benchmark-in branch 2 times, most recently from 8274996 to 764ddd3 on October 28, 2021 22:08
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Oct 28, 2021
Summary:
The benchmark shows that an IN expression with a large number of values is very
slow (218.08ms per iteration for in1K versus 4.02ms for fastIn1K). This is
because IN is implemented as a function with a variable number of
arguments. IN with 1K values is represented as an expression tree with one
node for the IN, one for the column, and 1K nodes for the values. Each value is
wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making
new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take values as a single array.

```
============================================================================
velox/functions/prestosql/benchmarks/InBenchmark.cpp  relative  time/iter  iters/s
============================================================================
fastIn                                                       4.38ms   228.06
in                                                34.65%    12.65ms    79.02
fastIn1K                                                     4.02ms   248.83
in1K                                               1.84%   218.08ms     4.59
============================================================================
```

Pull Request resolved: facebookincubator#516

Differential Revision: D31997776

Pulled By: mbasmanova

fbshipit-source-id: fb733bba97b0416680357acc0e41b6fcab1b642d
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D31997776

@facebook-github-bot
Contributor

@mbasmanova merged this pull request in 1764e89.

kevinwilfong pushed a commit to kevinwilfong/velox that referenced this pull request Oct 29, 2021
Summary:
The benchmark shows that an IN expression with a large number of values is very
slow (218.08ms per iteration for in1K versus 4.02ms for fastIn1K). This is
because IN is implemented as a function with a variable number of
arguments. IN with 1K values is represented as an expression tree with one
node for the IN, one for the column, and 1K nodes for the values. Each value is
wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making
new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take values as a single array.

```
============================================================================
velox/functions/prestosql/benchmarks/InBenchmark.cpp  relative  time/iter  iters/s
============================================================================
fastIn                                                       4.38ms   228.06
in                                                34.65%    12.65ms    79.02
fastIn1K                                                     4.02ms   248.83
in1K                                               1.84%   218.08ms     4.59
============================================================================
```

Pull Request resolved: facebookincubator#516

Reviewed By: funrollloops

Differential Revision: D31997776

Pulled By: mbasmanova

fbshipit-source-id: 0ed9a8b559016b84b1fee070c4536f28cd4776d8
facebook-github-bot pushed a commit that referenced this pull request Aug 22, 2022
Summary:
X-link: facebook/fbthrift#516

X-link: facebook/watchman#1050

X-link: facebook/proxygen#426

X-link: facebook/folly#1842

X-link: facebook/fboss#117

Sadly, even though Ubuntu 18.04 is still in LTS, GitHub is deprecating
its runner image.

Migrate the generated GitHub Actions to 20.04.

actions/runner-images#6002

https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/

Reviewed By: fanzeyi

Differential Revision: D38877286

fbshipit-source-id: 85f3324d6666eacb190a43985585b438de69d545
Labels
CLA Signed, Merged