Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LEFT JOIN against empty relation produces no rows #5394

Closed
cberner opened this issue Jun 2, 2016 · 14 comments
Closed

LEFT JOIN against empty relation produces no rows #5394

cberner opened this issue Jun 2, 2016 · 14 comments
Assignees
Labels

Comments

@cberner
Copy link
Contributor

cberner commented Jun 2, 2016

This can be reproduced with the following test case:

    @Test
    public void testEmptyLeftJoin()
            throws Exception
    {
        // Use orderkey = rand() to create an empty relation
        assertQuery("SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON a.orderkey > b.orderkey");
        assertQuery("SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON 1 = 1");
    }
@cberner
Copy link
Contributor Author

cberner commented Jun 2, 2016

@martint or @losipiuk could you take a look? I assume this is related to the recent non-equality outer join changes

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

I will be able to look into that in 2-3 hiurs.

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

Hmm.. for some reason it gets rewritten into INNER join.

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

presto:tiny> explain SELECT * FROM (VALUES 1) t1(a) LEFT JOIN (SELECT b FROM (VALUES 1) t2(b) WHERE b=5) on a < b;
                                 Query Plan
----------------------------------------------------------------------------
 - Output[a, b] => [field:integer, field_1:integer]
         a := field
         b := field_1
     - InnerJoin[("field" < "field_1")] => [field:integer, field_1:integer]
         - Values => [field:integer]
                 (1)
         - Filter[("field_1" = 5)] => [field_1:integer]
             - Values => [field_1:integer]
                     (1)

@martint
Copy link
Contributor

martint commented Jun 2, 2016

I believe the bug is in PredicatePushdown.visitJoin

@martint
Copy link
Contributor

martint commented Jun 2, 2016

Hmm... the initial plan (before pushdown may be wrong, too)

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

Yeah - I tried to just remove original node in when filterFunction is present from PP.visitJoin
And it works for the queries with VALUES.
But Chris's query still fails. I have to run for 2 hours now. But I will be back on it when I am back (in case it is not fixed yet).
Sorry :/

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON 1 = 1 fails. As this one actually does not have filter function. As 1=1 is resolvable using just inner side and pushed down by the JoinPlanner. And then something wrong is happening in PredicatePushdown I believe.

As I said I will back on it ASAP.

@martint
Copy link
Contributor

martint commented Jun 2, 2016

Another observation... in PredicatePushdown.visitJoin, newJoinPredicate does not take into account the non-equi predicate, so that would appear to be why it ends up converting it into an inner join.

@losipiuk
Copy link
Contributor

losipiuk commented Jun 2, 2016

I understand the problem.

The proper solution is to move all the pushdown logic from RelationPlanner to happen in PredicatePushDown optimizer.
Otherwise PPD has no knowledge that FilterNode on inner source of JoinNode was actually coming from join condition. And therefore OUTER join cannot be converted to INNER.

This is not super trivial change. I think I can do it tomorrow Europe time (so it should be ready tomorrow in the morning in CA).

If thing is really urgent I can hack sth. Possible options.
(1) Always leave filter in JoinNode even for conditions which are pushable to inner side and disable PPD for JoinNodes that have filter.present()
(2) Leave pushdown in RelationPlanner but add filterPushedDownToInnerSide flag in JoinNode and make PPD skip optimizations for such node.

I do not like any of those, and would rather craft proper solution at normal pace. But if needed I can do one of those quickly. Preferably (1).

@martint, @cberner what do you say?

@martint
Copy link
Contributor

martint commented Jun 2, 2016

Let's fix it the right way.

@losipiuk
Copy link
Contributor

losipiuk commented Jun 3, 2016

I have just noticed that putting this in PPD has similar problem to what we have now.
Within PPD we can have context. And not convert OUTER join to INNER after all the predicates from JoinNode.filter got pushed down to to inner side.

But we run PPD two times. And second run is lacking the context. In similar way like currently PPD is lacking context after pushdown from RelationPlanner.

It seems that some extra information that push-down happend must be stored in JoinNode. This is not super nice.
Any suggestions?

@losipiuk
Copy link
Contributor

losipiuk commented Jun 3, 2016

@martint, @cberner
As to my understanding #5403 fixes this issue.
See my comments in of the commits.

@martint
Copy link
Contributor

martint commented Jun 13, 2016

Fixed

@martint martint closed this as completed Jun 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants