Introduce enable_expression_evaluation_cache query config #6898

Closed
wants to merge 1 commit

Conversation


@kagamiori kagamiori commented Oct 4, 2023

Summary:
Some Velox optimizations cache vectors for possible later reuse to reduce runtime overhead.
The expression evaluator also has a code path, evalWithMemo, that caches the base vector of
a dictionary input to avoid unnecessary recomputation later. These caches are cleared when
Tasks are destroyed. An internal streaming use case, however, observed large memory usage
by these caches when the streaming pipeline takes large input vectors of nested complex types,
has a large number of operators, and runs for a very long time without Task destruction.

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
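The behavior the new config controls can be sketched as follows. This is a hypothetical, simplified pool, not the actual Velox VectorPool; it only illustrates the trade the summary describes: with the cache disabled, released vectors are destroyed immediately instead of being retained for reuse, so a long-running Task cannot accumulate them.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vector {
  size_t capacity = 0;
};

// Hypothetical simplified VectorPool (not the real Velox class). When the
// expression-evaluation cache is disabled, release() drops vectors instead
// of retaining them, freeing their memory immediately.
class VectorPool {
 public:
  explicit VectorPool(bool cacheEnabled) : cacheEnabled_(cacheEnabled) {}

  std::unique_ptr<Vector> get() {
    if (cacheEnabled_ && !free_.empty()) {
      auto v = std::move(free_.back());  // reuse a cached vector
      free_.pop_back();
      return v;
    }
    return std::make_unique<Vector>();  // fresh allocation
  }

  void release(std::unique_ptr<Vector> v) {
    if (cacheEnabled_) {
      free_.push_back(std::move(v));  // keep for later reuse
    }
    // else: the vector is destroyed here, releasing its memory now
  }

  size_t cachedCount() const {
    return free_.size();
  }

 private:
  const bool cacheEnabled_;
  std::vector<std::unique_ptr<Vector>> free_;
};
```

With caching on, a released vector stays pooled; with caching off, the pool stays empty no matter how long the pipeline runs.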


netlify bot commented Oct 4, 2023

Deploy Preview for meta-velox canceled.

Latest commit: c9f5e1a
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/65235ce262733c0008ba5d48

@facebook-github-bot added the "CLA Signed" label on Oct 4, 2023. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D49922027

kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 4, 2023
This diff introduces an optimize_for_memory query config to trade performance for memory.
When this flag is set to true, optimizations including VectorPool, ExecCtx::decodedVectorPool_,
ExecCtx::selectivityVectorPool_, and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
@kagamiori changed the title from "Introduce optimize_for_memory query config" to "Introduce optimize_for_low_memory query config" on Oct 5, 2023
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 5, 2023
This diff introduces an optimize_for_low_memory query config to trade performance for memory.
When this flag is set to true, optimizations VectorPool and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
@xiaoxmeng xiaoxmeng left a comment

@kagamiori Thanks for the change, modulo some minor comments!

}

private:
// Pool for all Buffers for this thread.
memory::MemoryPool* pool_;
QueryCtx* queryCtx_;

bool optimizeForLowMemory_;
Contributor

Make this const, and the same for pool_ and queryCtx_? Thanks!

@@ -277,6 +277,11 @@ class QueryConfig {
static constexpr const char* kValidateOutputFromOperators =
"debug.validate_output_from_operators";

/// If true, trade performance for memory. Optimizations including VectorPool
/// and Expr::evalWithMemo are disabled.
static constexpr const char* kOptimizeForLowMemory =
Contributor

Shall we just call it enable_expression_eval_cache?

Contributor Author

I think VectorPool is not limited to expression evaluation. It's part of ExecCtx inside OperatorCtx, so many operators can have VectorPool.

Contributor

Then how about enable_operator_buffer_cache? kOptimizeForLowMemory naming is too broad.

@@ -49,6 +51,8 @@ EvalCtx::EvalCtx(core::ExecCtx* execCtx, ExprSet* exprSet, const RowVector* row)
EvalCtx::EvalCtx(core::ExecCtx* execCtx)
: execCtx_(execCtx), exprSet_(nullptr), row_(nullptr) {
VELOX_CHECK_NOT_NULL(execCtx);

optimizeForLowMemory_ = execCtx->optimizeForLowMemory();
Contributor

Can we put this in the ctor initializer list and make it const? Thanks!
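The suggestion above can be sketched like this (hypothetical, simplified ExecCtx and EvalCtx, not the actual Velox classes): initializing the flag in the constructor initializer list lets both members be declared const, so they are set once at construction and cannot drift afterwards.

```cpp
#include <cassert>

// Hypothetical simplified ExecCtx (not the real Velox class).
class ExecCtx {
 public:
  explicit ExecCtx(bool lowMemory) : lowMemory_(lowMemory) {}

  bool optimizeForLowMemory() const {
    return lowMemory_;
  }

 private:
  const bool lowMemory_;
};

// Sketch of the reviewer's suggestion: both members initialized in the
// constructor initializer list and declared const.
class EvalCtx {
 public:
  explicit EvalCtx(ExecCtx* execCtx)
      : execCtx_(execCtx),
        optimizeForLowMemory_(execCtx->optimizeForLowMemory()) {}

  bool optimizeForLowMemory() const {
    return optimizeForLowMemory_;
  }

 private:
  ExecCtx* const execCtx_;           // the pointer itself cannot be reseated
  const bool optimizeForLowMemory_;  // set once at construction
};
```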

@@ -73,7 +73,7 @@ class VectorPool {
/// the allocated vector back to vector pool on destruction.
class VectorRecycler {
public:
explicit VectorRecycler(VectorPtr& vector, VectorPool& pool)
explicit VectorRecycler(VectorPtr& vector, VectorPool* pool)
Contributor

Drop explicit since there is more than one input argument.

pool_.release(vector_);
if (pool_) {
pool_->release(vector_);
}
}

private:
VectorPtr& vector_;
Contributor

  VectorPool* const pool_;
  VectorPtr& vector_;
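The change under discussion can be sketched as follows (simplified, hypothetical types; not the actual Velox implementation): the pool becomes a nullable pointer, a null pool means caching is disabled, and the destructor only hands the vector back when a pool is present. The member ordering and const pointer follow the review suggestion above.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vector {};
using VectorPtr = std::shared_ptr<Vector>;

// Hypothetical simplified pool that retains released vectors.
class VectorPool {
 public:
  void release(VectorPtr& v) {
    free_.push_back(std::move(v));
  }

  size_t cachedCount() const {
    return free_.size();
  }

 private:
  std::vector<VectorPtr> free_;
};

// RAII helper: returns the vector to the pool (if any) on scope exit.
// Per the review, no 'explicit' (two arguments) and pool_ is a const pointer.
class VectorRecycler {
 public:
  VectorRecycler(VectorPtr& vector, VectorPool* pool)
      : pool_(pool), vector_(vector) {}

  ~VectorRecycler() {
    if (pool_) {
      pool_->release(vector_);  // null pool: caching disabled, do nothing
    }
  }

 private:
  VectorPool* const pool_;
  VectorPtr& vector_;
};
```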

velox/expression/Expr.cpp (comment resolved)
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 5, 2023
This diff introduces an optimize_for_low_memory query config to trade performance for memory.
When this flag is set to true, optimizations VectorPool and Expr::evalWithMemo are disabled.

Reviewed By: bikramSingh91

Differential Revision: D49922027
@kagamiori changed the title from "Introduce enable_operator_buffer_cache query config" to "Introduce enable_expression_evaluation_cache query config" on Oct 6, 2023
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 6, 2023
…cubator#6898)

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Reviewed By: xiaoxmeng, bikramSingh91

Differential Revision: D49922027
@kagamiori
Contributor Author

Updated. Please let me know if you have further suggestions. Thanks, @mbasmanova and @xiaoxmeng!

@@ -567,6 +575,10 @@ class QueryConfig {
return get<bool>(kValidateOutputFromOperators, false);
}

bool enableExpressionEvaluationCache() const {
Contributor

isExpressionEvaluationCacheEnabled, to avoid giving the impression that this method enables the cache.
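A minimal sketch of the accessor shape under discussion, using a hypothetical simplified QueryConfig (not the real Velox class) and the reviewer's suggested getter name. The default is true, so caching stays enabled unless a query explicitly opts out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical simplified QueryConfig: typed lookup with a fallback default.
class QueryConfig {
 public:
  static constexpr const char* kEnableExpressionEvaluationCache =
      "enable_expression_evaluation_cache";

  void set(const std::string& key, bool value) {
    values_[key] = value;
  }

  bool get(const std::string& key, bool defaultValue) const {
    auto it = values_.find(key);
    return it == values_.end() ? defaultValue : it->second;
  }

  // Named as a query ("is...Enabled"), per the review suggestion, rather
  // than as an action ("enable..."), since it only reads the setting.
  bool isExpressionEvaluationCacheEnabled() const {
    return get(kEnableExpressionEvaluationCache, true);  // default: enabled
  }

 private:
  std::unordered_map<std::string, bool> values_;
};
```

Defaulting to true keeps existing queries unchanged; only memory-sensitive pipelines need to set the flag to false.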

@facebook-github-bot

This pull request has been merged in 36f9621.

@conbench-facebook

Conbench analyzed the 1 benchmark run on commit 36f9621f.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
…cubator#6898)

Summary:
Pull Request resolved: facebookincubator#6898

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Reviewed By: xiaoxmeng, bikramSingh91

Differential Revision: D49922027

fbshipit-source-id: dce4bf6f1a896c7b05a504dd60ce9c2480759434
Labels: CLA Signed, fb-exported, Merged
4 participants