Ensure Execution of Shared Scan Writer On Squelch [#149182449]
SharedInputScan (a.k.a. "Shared Scan" in EXPLAIN) is the operator through which Greenplum implements Common Table Expression execution. It executes in two modes: writer (a.k.a. producer) and reader (a.k.a. consumer). Writers will execute the common table expression definition and materialize the output, and readers can read the materialized output (potentially in parallel).

Because of the parallel nature of Greenplum execution, slices containing Shared Scans need to synchronize among themselves to ensure that readers don't start until writers are finished writing. Specifically, a slice with readers depending on writers on a different slice will block during `ExecutorRun`, before even pulling the first tuple from the executor tree.

Greenplum's Hash Join implementation will skip executing its outer ("probe side") subtree if it detects an empty inner ("hash side"), and declare all motions in the skipped subtree as "stopped" (we call this "squelching"). That means we can potentially squelch a subtree that contains a shared scan writer, leaving cross-slice readers waiting forever.
For example, with ORCA enabled, the following query:

```SQL
CREATE TABLE foo (a int, b int);
CREATE TABLE bar (c int, d int);
CREATE TABLE jazz(e int, f int);

INSERT INTO bar VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO jazz VALUES (2, 2), (3, 3);

ANALYZE foo;
ANALYZE bar;
ANALYZE jazz;

SET statement_timeout = '15s';

SELECT * FROM
    (
    WITH cte AS (SELECT * FROM foo)
    SELECT * FROM (SELECT * FROM cte UNION ALL SELECT * FROM cte)
    AS X
    JOIN bar ON b = c
    ) AS XY
    JOIN jazz on c = e AND b = f;
```

leads to a plan that will expose this problem:

```
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..2155.00 rows=1 width=24)
   ->  Hash Join  (cost=0.00..2155.00 rows=1 width=24)
         Hash Cond: bar.c = jazz.e AND share0_ref2.b = jazz.f AND share0_ref2.b = jazz.e AND bar.c = jazz.f
         ->  Sequence  (cost=0.00..1724.00 rows=1 width=16)
               ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=1 width=1)
                     ->  Materialize  (cost=0.00..431.00 rows=1 width=1)
                           ->  Table Scan on foo  (cost=0.00..431.00 rows=1 width=8)
               ->  Hash Join  (cost=0.00..1293.00 rows=1 width=16)
                     Hash Cond: share0_ref2.b = bar.c
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..862.00 rows=1 width=8)
                           Hash Key: share0_ref2.b
                           ->  Append  (cost=0.00..862.00 rows=1 width=8)
                                 ->  Shared Scan (share slice:id 1:0)  (cost=0.00..431.00 rows=1 width=8)
                                 ->  Shared Scan (share slice:id 1:0)  (cost=0.00..431.00 rows=1 width=8)
                     ->  Hash  (cost=431.00..431.00 rows=1 width=8)
                           ->  Table Scan on bar  (cost=0.00..431.00 rows=1 width=8)
         ->  Hash  (cost=431.00..431.00 rows=1 width=8)
               ->  Table Scan on jazz  (cost=0.00..431.00 rows=1 width=8)
                     Filter: e = f
 Optimizer status: PQO version 2.39.1
(20 rows)
```

where processes executing slice1 on the segments that have an empty `jazz` will hang.

We fix this by ensuring we execute the Shared Scan writer even if it's in the subtree that we're squelching.
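The behavior change can be illustrated with a toy model of the squelched subtree. This is only a sketch in Python, not the executor code touched by this commit: `Node`, `execute`, `squelch`, `build_outer_subtree`, and the `run_writers` flag are all hypothetical names invented for illustration, and the real cross-slice synchronization is reduced to a single `executed` flag on the writer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical plan node; not a Greenplum data structure."""
    name: str
    children: List["Node"] = field(default_factory=list)
    is_shared_scan_writer: bool = False
    executed: bool = False
    squelched: bool = False


def execute(node: Node) -> None:
    """Stand-in for running a node (and its inputs) to completion."""
    for child in node.children:
        execute(child)
    node.executed = True


def squelch(node: Node, run_writers: bool) -> None:
    """Mark a subtree as stopped.

    With run_writers=True, a shared scan writer found in the subtree is
    executed first, so its output is materialized before it is stopped.
    """
    if node.is_shared_scan_writer and run_writers:
        execute(node)
    else:
        for child in node.children:
            squelch(child, run_writers)
    node.squelched = True


def readers_would_hang(writer: Node) -> bool:
    """Cross-slice readers wait until the writer has finished writing."""
    return not writer.executed


def build_outer_subtree() -> Tuple[Node, Node]:
    """Outer side of the top Hash Join from the plan above (simplified)."""
    writer = Node(
        "Shared Scan (writer)",
        [Node("Materialize", [Node("Table Scan on foo")])],
        is_shared_scan_writer=True,
    )
    inner_join = Node(
        "Hash Join",
        [Node("Redistribute Motion"), Node("Hash", [Node("Table Scan on bar")])],
    )
    return Node("Sequence", [writer, inner_join]), writer


if __name__ == "__main__":
    # Old behavior: the writer is squelched without ever running.
    subtree, writer = build_outer_subtree()
    squelch(subtree, run_writers=False)
    print("without fix, readers hang:", readers_would_hang(writer))

    # New behavior: the writer runs before the subtree is stopped.
    subtree, writer = build_outer_subtree()
    squelch(subtree, run_writers=True)
    print("with fix, readers hang:", readers_would_hang(writer))
```

In this model the subtree is still marked as stopped in both cases; the only difference is whether the writer has materialized its output by the time the readers in the other slice check for it.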
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
Signed-off-by: Sambitesh Dash <sdash@pivotal.io>