New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception when using non-deterministic predicate together with IN #2036

Closed
martint opened this Issue Dec 1, 2014 · 9 comments

Comments

Projects
None yet
5 participants
@martint
Contributor

martint commented Dec 1, 2014

The following query:

SELECT * FROM (VALUES (1)) t(a) WHERE rand() > 0 and a IN (VALUES (1));

fails with this exception:

java.lang.IllegalArgumentException: Only deterministic expressions may be considered for rewrite
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at com.facebook.presto.sql.planner.EqualityInference.rewriteExpression(EqualityInference.java:101)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteSemiJoin(PredicatePushDown.java:677)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteSemiJoin(PredicatePushDown.java:134)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitSemiJoin(PlanRewriter.java:449)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitSemiJoin(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.SemiJoinNode.accept(SemiJoinNode.java:120)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.rewrite(PlanRewriter.java:42)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteFilter(PredicatePushDown.java:219)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteFilter(PredicatePushDown.java:134)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitFilter(PlanRewriter.java:200)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitFilter(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.FilterNode.accept(FilterNode.java:71)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.rewrite(PlanRewriter.java:42)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitProject(PlanRewriter.java:244)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitProject(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.ProjectNode.accept(ProjectNode.java:81)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.defaultRewrite(PlanRewriter.java:50)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteProject(PredicatePushDown.java:178)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteProject(PredicatePushDown.java:134)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitProject(PlanRewriter.java:238)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitProject(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.ProjectNode.accept(ProjectNode.java:81)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.rewrite(PlanRewriter.java:42)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitOutput(PlanRewriter.java:282)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitOutput(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.OutputNode.accept(OutputNode.java:79)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.defaultRewrite(PlanRewriter.java:50)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteNode(PredicatePushDown.java:166)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown$Rewriter.rewriteNode(PredicatePushDown.java:134)
    at com.facebook.presto.sql.planner.plan.PlanNodeRewriter.rewriteOutput(PlanNodeRewriter.java:120)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitOutput(PlanRewriter.java:276)
    at com.facebook.presto.sql.planner.plan.PlanRewriter$RewritingVisitor.visitOutput(PlanRewriter.java:65)
    at com.facebook.presto.sql.planner.plan.OutputNode.accept(OutputNode.java:79)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.rewrite(PlanRewriter.java:42)
    at com.facebook.presto.sql.planner.plan.PlanRewriter.rewriteWith(PlanRewriter.java:31)
    at com.facebook.presto.sql.planner.optimizations.PredicatePushDown.optimize(PredicatePushDown.java:131)
    at com.facebook.presto.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:88)
    at com.facebook.presto.sql.analyzer.QueryExplainer.getLogicalPlan(QueryExplainer.java:104)
    at com.facebook.presto.sql.analyzer.QueryExplainer.getPlan(QueryExplainer.java:66)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.getQueryPlan(StatementAnalyzer.java:510)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.visitExplain(StatementAnalyzer.java:479)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.visitExplain(StatementAnalyzer.java:106)
    at com.facebook.presto.sql.tree.Explain.accept(Explain.java:54)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:24)
    at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:52)
    at com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery(SqlQueryExecution.java:206)
    at com.facebook.presto.execution.SqlQueryExecution.analyzeQuery(SqlQueryExecution.java:191)
    at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:152)
    at com.facebook.presto.execution.SqlQueryManager$QueryStarter$QuerySubmitter$2.run(SqlQueryManager.java:490)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@martint martint added the bug label Jun 30, 2015

@yuananf

This comment has been minimized.

Show comment
Hide comment
@yuananf

yuananf Aug 17, 2015

Contributor

Taking this one.

Contributor

yuananf commented Aug 17, 2015

Taking this one.

@p-t-u

This comment has been minimized.

Show comment
Hide comment
@p-t-u

p-t-u Aug 19, 2016

Contributor

I am trying to wrap my head around this. As discussed in #3465 there does not seem to be a good reason why join predicates need to be deterministic. Agreed. We use EqualityInference to rewrite the predicate. Why does EqualityInference require the input expression to be deterministic? In other words, what could go wrong if we just removed the assertion?

Contributor

p-t-u commented Aug 19, 2016

I am trying to wrap my head around this. As discussed in #3465 there does not seem to be a good reason why join predicates need to be deterministic. Agreed. We use EqualityInference to rewrite the predicate. Why does EqualityInference require the input expression to be deterministic? In other words, what could go wrong if we just removed the assertion?

@p-t-u

This comment has been minimized.

Show comment
Hide comment
@p-t-u

p-t-u Aug 19, 2016

Contributor

fyi, removing the assertion makes the offending query execute and return correct results. So it boils down to, is there a good reason for that assertion in the first place?

Contributor

p-t-u commented Aug 19, 2016

fyi, removing the assertion makes the offending query execute and return correct results. So it boils down to, is there a good reason for that assertion in the first place?

@erichwang

This comment has been minimized.

Show comment
Hide comment
@erichwang

erichwang Aug 19, 2016

Contributor

removing the assertion is not a good idea since the EqualityInference only works on deterministic expressions right now because it may try to write the expressions one or multiple times as it reconstructs expressions and use them to substitute in subexpressions. it may just happen to work right now with this expression, but it should be easy to construct an example where this will produce the wrong results. In this case, the filter is basically always true, so you will not see any effect. But if you remove the assertion and make the filter rand() > 0.5, you should see some weird results.

join predicates do not need to be deterministic, but moving non-deterministic predicates around need to be carefully examined.

the comment in #3465 is a reference to the fact that we can push non-deterministic predicates down (that diff was banning all pushdown for non-deterministic predicates). however, they cannot be blindly applied.

Contributor

erichwang commented Aug 19, 2016

removing the assertion is not a good idea since the EqualityInference only works on deterministic expressions right now because it may try to write the expressions one or multiple times as it reconstructs expressions and use them to substitute in subexpressions. it may just happen to work right now with this expression, but it should be easy to construct an example where this will produce the wrong results. In this case, the filter is basically always true, so you will not see any effect. But if you remove the assertion and make the filter rand() > 0.5, you should see some weird results.

join predicates do not need to be deterministic, but moving non-deterministic predicates around need to be carefully examined.

the comment in #3465 is a reference to the fact that we can push non-deterministic predicates down (that diff was banning all pushdown for non-deterministic predicates). however, they cannot be blindly applied.

@erichwang

This comment has been minimized.

Show comment
Hide comment
@erichwang

erichwang Aug 19, 2016

Contributor

on that note, the test case could probably have been constructed slightly better

Contributor

erichwang commented Aug 19, 2016

on that note, the test case could probably have been constructed slightly better

@p-t-u

This comment has been minimized.

Show comment
Hide comment
@p-t-u

p-t-u Aug 19, 2016

Contributor

The query works just fine with rand() > 0.5. It produces 0 or 1 rows half the time each.

Contributor

p-t-u commented Aug 19, 2016

The query works just fine with rand() > 0.5. It produces 0 or 1 rows half the time each.

@p-t-u

This comment has been minimized.

Show comment
Hide comment
@p-t-u

p-t-u Aug 20, 2016

Contributor

I think I can see what you are saying. It would help to have a concrete query that features non-deterministic predicates that may not be pushed around freely. I can imagine something with multiple rand() predicates, where the result of the query depends on whether the expressions are folded or not.

In any case, I'm not sure a proper fix for this is a beginner task. I'd need to study the optimizer for a while and probably make non-trivial changes to how we push predicates through joins.

Contributor

p-t-u commented Aug 20, 2016

I think I can see what you are saying. It would help to have a concrete query that features non-deterministic predicates that may not be pushed around freely. I can imagine something with multiple rand() predicates, where the result of the query depends on whether the expressions are folded or not.

In any case, I'm not sure a proper fix for this is a beginner task. I'd need to study the optimizer for a while and probably make non-trivial changes to how we push predicates through joins.

@erichwang

This comment has been minimized.

Show comment
Hide comment
@erichwang

erichwang Aug 22, 2016

Contributor

@PhilippU, this task does involve predicate push down but that should be isolated just to the PredicatePushDown optimizer class. The fix may not be as complicated as you might think. I think the main problem here is that non-deterministic predicates cannot be pushed through joins (since joins can change data cardinality).

Contributor

erichwang commented Aug 22, 2016

@PhilippU, this task does involve predicate push down but that should be isolated just to the PredicatePushDown optimizer class. The fix may not be as complicated as you might think. I think the main problem here is that non-deterministic predicates cannot be pushed through joins (since joins can change data cardinality).

@rschlussel

This comment has been minimized.

Show comment
Hide comment
@rschlussel

rschlussel Apr 19, 2018

Contributor

this was fixed as part of #10070

Contributor

rschlussel commented Apr 19, 2018

this was fixed as part of #10070

@rschlussel rschlussel closed this Apr 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment