This is a bug introduced by #12013. The result can be wrong when all of the following hold:
The query uses COALESCE(joinKey) on top of a FULL OUTER JOIN with an equi-join condition.
The children of the FullJoin node compute their partition with a hash function that differs from one over the join keys alone. For example, the hash is computed on (a, constant) while the join key is just a.
Another JOIN consumes the result of the FULL OUTER JOIN, using an equi-join on only the coalesced keys of the FULL OUTER JOIN.
In such a situation, the newly introduced optimization assumes that the result of the FULL OUTER JOIN is already partitioned on COALESCE(a), so no further shuffle is needed before the next join. However, because the hash is computed on (a, constant), a row can land on a different node than it would under a hash computed on a alone, even though the data is nominally "partitioned on a". A shuffle is therefore still needed to produce the correct result.
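To make the mismatch concrete, here is a minimal sketch in Python. The `partition_of` helper, the SHA-256 based hash, and the partition count of 8 are all assumptions chosen for illustration; they are not the engine's actual partitioning function. The sketch only shows that hashing (a, constant) and hashing a alone are each self-consistent but can disagree on where a given key lives.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical number of worker partitions


def partition_of(*values):
    """Hypothetical partitioner: hash the tuple of values, modulo partition count."""
    digest = hashlib.sha256(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


keys = range(1000)

# Placement the downstream join expects: hash computed on a alone.
by_key = {a: partition_of(a) for a in keys}

# Placement the FULL OUTER JOIN inputs actually used: hash on (a, constant).
by_key_and_constant = {a: partition_of(a, "constant") for a in keys}

# Keys for which the two schemes pick different partitions.
mismatched = [a for a in keys if by_key[a] != by_key_and_constant[a]]
print(f"{len(mismatched)} of {len(keys)} keys are placed differently")
```

Any key in `mismatched` would be looked up on the wrong node if the second join skipped its shuffle, which is why the data being "partitioned on a" is not enough on its own.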
In theory, it would also produce incorrect results if the input of the FULL OUTER JOIN is partitioned on more columns than just the join keys, for example when the input is partitioned on (a, b) but joined only on a. In practice, though, I don't know what query shape would make that happen.
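A rough sketch of the (a, b) case, again in Python with a made-up `partition_of` helper and an assumed partition count of 8 (neither reflects the engine's real hash): rows that share the same a but differ in b are spread over several partitions, so the data cannot be treated as partitioned on a.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical number of worker partitions


def partition_of(*values):
    """Hypothetical partitioner: hash the tuple of values, modulo partition count."""
    digest = hashlib.sha256(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# 100 rows that all share a = 1 but have distinct b values.
rows = [(1, b) for b in range(100)]

# Partitions touched when the input is partitioned on (a, b).
partitions_for_a = {partition_of(a, b) for a, b in rows}
print(f"rows with a = 1 are spread over {len(partitions_for_a)} partitions")
```

A join on a alone needs every row with a = 1 in one place, so a shuffle is still required here.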
Here's an example that can produce the wrong plan. It's harder to come up with a query that produces meaningful and wrong results, though:
SELECT * FROM customer t3
JOIN (
    SELECT coalesce(c1, c2) c
    FROM (
        SELECT custkey c1, name FROM customer WHERE name = 'a') t1
    FULL OUTER JOIN (
        SELECT custkey c2, name FROM customer WHERE name = 'b' GROUP BY 1, 2) t2
    ON t1.c1 = t2.c2) t
ON t3.custkey = t.c;