Introduce enable_expression_evaluation_cache query config #6898

Closed
wants to merge 1 commit

Conversation


@kagamiori kagamiori commented Oct 4, 2023

Summary:
Some Velox optimizations cache vectors for possible later reuse to reduce runtime overhead.
The expression evaluator also has a code path, evalWithMemo, that caches the base vector of
a dictionary input to avoid unnecessary recomputation later. These caches are cleared when
Tasks are destroyed. An internal streaming use case, however, observed large memory usage
by these caches when the streaming pipeline takes large input vectors of nested complex types,
has a large number of operators, and runs for a very long time without Task destruction.

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
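The behavior the new config controls can be sketched as follows. This is a hypothetical, simplified pool, not the actual Velox VectorPool; it only illustrates the trade the summary describes: with the cache disabled, released vectors are destroyed immediately instead of being retained for reuse, so a long-running Task cannot accumulate them.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vector {
  size_t capacity = 0;
};

// Hypothetical simplified VectorPool (not the real Velox class). When the
// expression-evaluation cache is disabled, release() drops vectors instead
// of retaining them, freeing their memory immediately.
class VectorPool {
 public:
  explicit VectorPool(bool cacheEnabled) : cacheEnabled_(cacheEnabled) {}

  std::unique_ptr<Vector> get() {
    if (cacheEnabled_ && !free_.empty()) {
      auto v = std::move(free_.back());  // reuse a cached vector
      free_.pop_back();
      return v;
    }
    return std::make_unique<Vector>();  // fresh allocation
  }

  void release(std::unique_ptr<Vector> v) {
    if (cacheEnabled_) {
      free_.push_back(std::move(v));  // keep for later reuse
    }
    // else: the vector is destroyed here, releasing its memory now
  }

  size_t cachedCount() const {
    return free_.size();
  }

 private:
  const bool cacheEnabled_;
  std::vector<std::unique_ptr<Vector>> free_;
};
```

With caching on, a released vector stays pooled; with caching off, the pool stays empty no matter how long the pipeline runs.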


netlify bot commented Oct 4, 2023

Deploy Preview for meta-velox canceled.

Latest commit: c9f5e1a
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/65235ce262733c0008ba5d48

@facebook-github-bot added the "CLA Signed" label on Oct 4, 2023. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D49922027

kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 4, 2023
This diff introduces an optimize_for_memory query config to trade performance for memory.
When this flag is set to true, optimizations including VectorPool, ExecCtx::decodedVectorPool_,
ExecCtx::selectivityVectorPool_, and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
@kagamiori changed the title from "Introduce optimize_for_memory query config" to "Introduce optimize_for_low_memory query config" on Oct 5, 2023
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 5, 2023
This diff introduces an optimize_for_low_memory query config to trade performance for memory.
When this flag is set to true, optimizations VectorPool and Expr::evalWithMemo are disabled.

Differential Revision: D49922027
@xiaoxmeng xiaoxmeng left a comment

@kagamiori Thanks for the change, modulo some minor comments!

}

private:
// Pool for all Buffers for this thread.
memory::MemoryPool* pool_;
QueryCtx* queryCtx_;

bool optimizeForLowMemory_;
Contributor

Make this const, and the same for pool_ and queryCtx_? Thanks!

@@ -277,6 +277,11 @@ class QueryConfig {
static constexpr const char* kValidateOutputFromOperators =
"debug.validate_output_from_operators";

/// If true, trade performance for memory. Optimizations including VectorPool
/// and Expr::evalWithMemo are disabled.
static constexpr const char* kOptimizeForLowMemory =
Contributor

Shall we just call it enable_expression_eval_cache?

Contributor Author

I think VectorPool is not limited to expression evaluation. It's part of ExecCtx inside OperatorCtx, so many operators can have VectorPool.

Contributor

Then how about enable_operator_buffer_cache? kOptimizeForLowMemory naming is too broad.

@@ -49,6 +51,8 @@ EvalCtx::EvalCtx(core::ExecCtx* execCtx, ExprSet* exprSet, const RowVector* row)
EvalCtx::EvalCtx(core::ExecCtx* execCtx)
: execCtx_(execCtx), exprSet_(nullptr), row_(nullptr) {
VELOX_CHECK_NOT_NULL(execCtx);

optimizeForLowMemory_ = execCtx->optimizeForLowMemory();
Contributor

Can we put this in the ctor initializer list and make it const? Thanks!
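The suggestion above can be sketched like this (hypothetical, simplified ExecCtx and EvalCtx, not the actual Velox classes): initializing the flag in the constructor initializer list lets both members be declared const, so they are set once at construction and cannot drift afterwards.

```cpp
#include <cassert>

// Hypothetical simplified ExecCtx (not the real Velox class).
class ExecCtx {
 public:
  explicit ExecCtx(bool lowMemory) : lowMemory_(lowMemory) {}

  bool optimizeForLowMemory() const {
    return lowMemory_;
  }

 private:
  const bool lowMemory_;
};

// Sketch of the reviewer's suggestion: both members initialized in the
// constructor initializer list and declared const.
class EvalCtx {
 public:
  explicit EvalCtx(ExecCtx* execCtx)
      : execCtx_(execCtx),
        optimizeForLowMemory_(execCtx->optimizeForLowMemory()) {}

  bool optimizeForLowMemory() const {
    return optimizeForLowMemory_;
  }

 private:
  ExecCtx* const execCtx_;           // the pointer itself cannot be reseated
  const bool optimizeForLowMemory_;  // set once at construction
};
```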

@@ -73,7 +73,7 @@ class VectorPool {
/// the allocated vector back to vector pool on destruction.
class VectorRecycler {
public:
explicit VectorRecycler(VectorPtr& vector, VectorPool& pool)
explicit VectorRecycler(VectorPtr& vector, VectorPool* pool)
Contributor

Drop explicit since there is more than one input argument.

pool_.release(vector_);
if (pool_) {
pool_->release(vector_);
}
}

private:
VectorPtr& vector_;
Contributor

  VectorPool* const pool_;
  VectorPtr& vector_;
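The change under discussion can be sketched as follows (simplified, hypothetical types; not the actual Velox implementation): the pool becomes a nullable pointer, a null pool means caching is disabled, and the destructor only hands the vector back when a pool is present. The member ordering and const pointer follow the review suggestion above.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vector {};
using VectorPtr = std::shared_ptr<Vector>;

// Hypothetical simplified pool that retains released vectors.
class VectorPool {
 public:
  void release(VectorPtr& v) {
    free_.push_back(std::move(v));
  }

  size_t cachedCount() const {
    return free_.size();
  }

 private:
  std::vector<VectorPtr> free_;
};

// RAII helper: returns the vector to the pool (if any) on scope exit.
// Per the review, no 'explicit' (two arguments) and pool_ is a const pointer.
class VectorRecycler {
 public:
  VectorRecycler(VectorPtr& vector, VectorPool* pool)
      : pool_(pool), vector_(vector) {}

  ~VectorRecycler() {
    if (pool_) {
      pool_->release(vector_);  // null pool: caching disabled, do nothing
    }
  }

 private:
  VectorPool* const pool_;
  VectorPtr& vector_;
};
```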

velox/expression/Expr.cpp (comment resolved)
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 5, 2023
This diff introduces an optimize_for_low_memory query config to trade performance for memory.
When this flag is set to true, optimizations VectorPool and Expr::evalWithMemo are disabled.

Reviewed By: bikramSingh91

Differential Revision: D49922027
@kagamiori changed the title from "Introduce enable_operator_buffer_cache query config" to "Introduce enable_expression_evaluation_cache query config" on Oct 6, 2023
kagamiori added a commit to kagamiori/velox that referenced this pull request Oct 6, 2023
…cubator#6898)

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Reviewed By: xiaoxmeng, bikramSingh91

Differential Revision: D49922027
@kagamiori
Contributor Author

Updated. Please let me know if you have further suggestions. Thanks, @mbasmanova and @xiaoxmeng!

@@ -567,6 +575,10 @@ class QueryConfig {
return get<bool>(kValidateOutputFromOperators, false);
}

bool enableExpressionEvaluationCache() const {
Contributor

isExpressionEvaluationCacheEnabled, to avoid giving the impression that this method enables the cache.
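A minimal sketch of the accessor shape under discussion, using a hypothetical simplified QueryConfig (not the real Velox class) and the reviewer's suggested getter name. The default is true, so caching stays enabled unless a query explicitly opts out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical simplified QueryConfig: typed lookup with a fallback default.
class QueryConfig {
 public:
  static constexpr const char* kEnableExpressionEvaluationCache =
      "enable_expression_evaluation_cache";

  void set(const std::string& key, bool value) {
    values_[key] = value;
  }

  bool get(const std::string& key, bool defaultValue) const {
    auto it = values_.find(key);
    return it == values_.end() ? defaultValue : it->second;
  }

  // Named as a query ("is...Enabled"), per the review suggestion, rather
  // than as an action ("enable..."), since it only reads the setting.
  bool isExpressionEvaluationCacheEnabled() const {
    return get(kEnableExpressionEvaluationCache, true);  // default: enabled
  }

 private:
  std::unordered_map<std::string, bool> values_;
};
```

Defaulting to true keeps existing queries unchanged; only memory-sensitive pipelines need to set the flag to false.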

@facebook-github-bot

This pull request has been merged in 36f9621.

@conbench-facebook

Conbench analyzed the 1 benchmark run on commit 36f9621f.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
…cubator#6898)

Summary:
Pull Request resolved: facebookincubator#6898

This diff introduces an enable_expression_evaluation_cache query config. When this flag is set to false,
optimizations including VectorPool, DecodedVector pool, SelectivityVector pool, and Expr::evalWithMemo are disabled.

Reviewed By: xiaoxmeng, bikramSingh91

Differential Revision: D49922027

fbshipit-source-id: dce4bf6f1a896c7b05a504dd60ce9c2480759434
Labels: CLA Signed, fb-exported, Merged
4 participants