Lateral pg19 #13

Merged: yjhjstz merged 12 commits into main from lateral-pg19 on Jun 23, 2026

@yjhjstz yjhjstz commented Jun 20, 2026

No description provided.

yjhjstz added 10 commits June 19, 2026 14:51
…ed commute

PG-style LATERAL in the FROM clause now goes through ORCA instead of falling
back to the standard planner.

Three changes are needed together to make this work:

1. gpopt/translate/CTranslatorQueryToDXL.cpp
   Remove the blanket "LATERAL unsupported" raise. Outer references from a
   LATERAL RTE's inner Query already resolve via the parent translator's
   CMappingVarColId (initialised in the constructor) — no separate plumbing
   needed for the Var-to-ColRef mapping.

2. libgpopt/src/translate/CTranslatorExprToDXL.cpp (PdxlnNLJoin)
   Generalise the existing DPE-PartitionSelector branch that turns a NL into
   an IndexNLJ when the inner side has outer refs to the outer child's
   output. Any inner subtree whose outer references intersect the outer
   child's output now goes the same route: outer_refs get registered in
   m_phmcrdxlnIndexLookup so the inner scalar translator emits a resolved
   Ident, and the NL emits PARAM_EXEC nest params (under
   EopttraceIndexedNLJOuterRefAsParams, which pg_orca enables by default).
   Also drops the GPOS_ASSERT_IMP that forbade outer refs in non-index NL
   inner children — that condition is now legal.

3. libgpopt/src/xforms/CXformInnerJoinCommutativity.{cpp,h}
   Tighten the xform's promise: refuse to commute an InnerJoin when one
   child's outer references reach into the other child's output columns. A
   plain CLogicalInnerJoin is symmetric in ORCA's algebra, so without this
   guard the join enumerator happily produced the swapped orientation for a
   LATERAL-shaped join. Putting the correlated side on the NL outer is
   unexecutable: the executor opens outer first and the outer-ref columns
   are still unbound.
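
The commutativity guard in item 3 can be pictured with a small stand-alone sketch. This is not ORCA code: `ColSet`, `JoinChild`, and `CanCommute` are hypothetical names, and plain integer sets stand in for ORCA's column-reference sets.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical stand-in for an ORCA column-reference set.
using ColSet = std::set<int>;

// One join child: the columns it produces and the outer references it
// still needs from somewhere outside its own subtree.
struct JoinChild {
    ColSet output;
    ColSet outer_refs;
};

// True when the two sets share at least one column id.
bool Intersects(const ColSet &a, const ColSet &b) {
    return std::any_of(a.begin(), a.end(),
                       [&b](int col) { return b.count(col) > 0; });
}

// Swapping the children is only safe when neither child depends on the
// other's output; a LATERAL-shaped join fails this test.
bool CanCommute(const JoinChild &left, const JoinChild &right) {
    return !Intersects(left.outer_refs, right.output) &&
           !Intersects(right.outer_refs, left.output);
}
```

With an outer child producing columns {1, 2} and a LATERAL child whose outer references are {2}, `CanCommute` returns false, so the correlated side can never end up as the NL outer.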

After this commit the six LATERAL shapes (uncorrelated, equi-correlated,
constant projection, TVF outer-ref arg, LIMIT, LEFT JOIN LATERAL) all
optimise to a correct ORCA plan, though equi-correlated cases still emit
NL+Materialize rather than HashJoin — that decorrelation comes in the
follow-up commit.

LATERAL with a top-level CLogicalSelect on the right side (the typical
"FROM x, LATERAL (SELECT ... WHERE inner = x.col)" pattern) was being
turned into a plain CLogicalInnerJoin, which forced the optimizer onto
the NL+Materialize path. PG's own planner pulls these LATERALs up into
a HashJoin; ORCA can do the same once the algebra is right.

libgpopt/src/translate/CTranslatorDXLToExpr.cpp (PexprLogicalJoin)
  Heuristic gate: if the right child's top operator is a CLogicalSelect
  whose outer references intersect the left child's output, rebuild the
  join as CLogicalInnerApply / CLogicalLeftOuterApply instead of a plain
  Join. CXformInnerApply2InnerJoin / CXformLeftOuterApply2LeftOuterJoin
  then pull the correlated predicate out of the Select and lower the
  result to a Join, which the cost model picks as HashJoin.

  Restricting to the "top is CLogicalSelect" shape keeps the other
  LATERAL forms (TVF outer-ref args, LIMIT/Sort, constant projection)
  on the plain-Join path, where the previous commit's commutativity
  guard handles them via NL + nest params.
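
A rough sketch of the heuristic gate described above, under simplifying assumptions: the enum values, `RightChild`, and `ChooseJoinShape` are illustrative names only, and the real check operates on ORCA expressions rather than integer sets.

```cpp
#include <cassert>
#include <set>

using ColSet = std::set<int>;  // hypothetical stand-in for a column set

enum class TopOp { Select, Limit, Project, TableFunc, Other };
enum class JoinShape { PlainJoin, Apply };

struct RightChild {
    TopOp top;          // operator at the root of the LATERAL body
    ColSet outer_refs;  // columns the body needs from outside itself
};

// Only a Select-rooted body that is correlated with the left child is
// rebuilt as an Apply; every other LATERAL form stays a plain Join.
JoinShape ChooseJoinShape(const ColSet &left_output, const RightChild &right) {
    if (right.top != TopOp::Select) {
        return JoinShape::PlainJoin;
    }
    for (int col : right.outer_refs) {
        if (left_output.count(col) > 0) {
            return JoinShape::Apply;
        }
    }
    return JoinShape::PlainJoin;
}
```

Keeping the gate this narrow means a LIMIT-rooted or TVF-rooted body never reaches the Apply-to-Join xforms, which matches the intent stated in the commit message.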

libgpopt/include/gpopt/xforms/CXformApply2Join.h (CreateCorrelatedApply)
  Guard against TApply::PdrgPcrInner() == nullptr. Apply objects built
  from LATERAL have no inner scalar colref (LATERAL returns a relation,
  not a scalar), so the scalar-subquery-shaped correlated-apply form
  here doesn't apply — no-op instead of dereferencing the null pointer.

Effect on the six LATERAL cases: equi-correlated inner-LATERAL drops
from NL+Materialize to a clean HashJoin matching PG; LEFT LATERAL drops
to HashRightJoin (still carries a triplicated Hash Cond pending the
LOJ inferred-pred dedup in the next commit). Other shapes unchanged.

PexprInferPredicates extends a join's predicate with extras derived from
constraint propagation (e.g. the commuted form of an equality, transitive
closures, etc). After predicate push-through, MakeJoinWithoutInferredPreds
strips the redundant ones back out via PexprRemoveImpliedConjuncts, keyed
on equivalence classes.
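
As a simplified picture of that stripping step (not ORCA's PexprRemoveImpliedConjuncts itself), the sketch below keeps an equality conjunct only if it tells us something the earlier conjuncts did not. A union-find structure stands in for equivalence classes; `Equality`, `EquivClasses`, and `RemoveImpliedEqualities` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Equality = std::pair<int, int>;  // (left column id, right column id)

// Minimal union-find over column ids, standing in for equivalence classes.
class EquivClasses {
public:
    int Find(int col) {
        auto it = parent_.find(col);
        if (it == parent_.end()) {
            parent_[col] = col;
            return col;
        }
        if (it->second == col) {
            return col;
        }
        int root = Find(it->second);
        parent_[col] = root;  // path compression
        return root;
    }

    // Returns false when both columns were already known to be equal.
    bool Union(int a, int b) {
        int ra = Find(a);
        int rb = Find(b);
        if (ra == rb) {
            return false;
        }
        parent_[ra] = rb;
        return true;
    }

private:
    std::map<int, int> parent_;
};

// Keep a conjunct only if it merges two previously distinct classes.
std::vector<Equality> RemoveImpliedEqualities(
    const std::vector<Equality> &conjuncts) {
    EquivClasses classes;
    std::vector<Equality> kept;
    for (const Equality &eq : conjuncts) {
        if (classes.Union(eq.first, eq.second)) {
            kept.push_back(eq);
        }
    }
    return kept;
}
```

Fed an original equality, its commuted form, and a re-pushed copy, the function keeps only the first of the three.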

CanRemoveInferredPredicates was hard-coded to InnerJoin only — left over
from when LOJ semantics were considered too tricky to dedupe. The original
note "currently, only inner join is included, but we can add more later"
acknowledged the limitation. LeftOuterJoin's null-preserving side cares
about which qualifying tuples pair up, not about how many copies of an
equivalent equality predicate the matcher evaluates, so the dedup is
semantically safe.

Symptom: LEFT JOIN LATERAL (... WHERE inner = outer.col) produced a
HashRightJoin whose Hash Cond was a 3-way AND of equivalent equalities:

  Hash Cond: ((lt2.a = lt1.x) AND (lt2.a = lt1.x) AND (lt2.a = lt1.x))

Three forms entered the predicate during preprocessing — original a=x,
commuted x=a from constraint inference, and a re-pushed copy — and the
LOJ branch of MakeJoinWithoutInferredPreds was a no-op. After this
commit the predicate is the single (lt2.a = lt1.x).

PexprPruneUnusedComputedColsRecursive walks the expression top-down with
a required-columns set, dropping CScalarProjectElements that nothing
upstream consumes. The required set was built from each operator's own
PcrsLocalUsed and its scalar children's used columns (via
CExpressionHandle::PcrsUsedColumns) — neither of which captures outer
references that one relational child holds against another sibling.

For a LATERAL whose inner references a computed column from a derived
table on the outer side, the chain is:

  LogicalApply / LogicalJoin
    ├── LogicalProject(dv = val * 2)
    │    └── LogicalGet(na)
    └── LogicalSelect(filter: nb.id = dv)
         └── LogicalGet(nb)

The inner Select's DeriveOuterReferences() = {dv}, but the Apply's
PcrsUsedColumns() returns only the columns from the scalar predicate
(true) and PcrsLocalUsed (empty). The pruner descends into the outer
Project with `dv` absent from pcrsReqd → defined - required = {dv} →
the Project gets stripped. The dangling CScalarIdent "dv" then crashes
DXL→PlStmt translation with "Attribute number N not found".

Fix: before recursing into children, fold each relational child's
DeriveOuterReferences() into pcrsReqd. Those refs are columns the child
needs from its siblings, so siblings' producers must be preserved.

Includes outer refs that escape this operator entirely (genuine refs to
the grandparent) — those just stay in pcrsReqd as we descend; they have
no producer at this level and pruning logic only acts on Project /
GbAgg defined columns, so the extra entries are harmless.
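
The fix can be sketched on a toy expression tree. Everything here is an assumption made for illustration: `Node`, its fields, and `PruneUnusedComputedCols` are not ORCA types, and "pruned" is reduced to a flag on nodes whose defined columns are all unneeded.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

using ColSet = std::set<int>;  // hypothetical stand-in for a column set

struct Node {
    ColSet defined;     // computed columns this node defines (Project-like)
    ColSet local_used;  // columns the operator's own scalars consume
    ColSet outer_refs;  // columns the subtree needs from outside itself
    std::vector<std::unique_ptr<Node>> children;
    bool pruned = false;
};

// Top-down walk with a required-columns set, passed by value so each
// level extends its own copy.
void PruneUnusedComputedCols(Node &node, ColSet required) {
    if (!node.defined.empty()) {
        bool any_needed = false;
        for (int col : node.defined) {
            if (required.count(col) > 0) {
                any_needed = true;
            }
        }
        node.pruned = !any_needed;
    }
    required.insert(node.local_used.begin(), node.local_used.end());
    // The fix: a child's outer references are columns it needs from its
    // siblings, so they must be required before any sibling is visited.
    for (const auto &child : node.children) {
        required.insert(child->outer_refs.begin(), child->outer_refs.end());
    }
    for (auto &child : node.children) {
        PruneUnusedComputedCols(*child, required);
    }
}
```

In the chain shown earlier, the Select's outer reference to `dv` lands in the required set before the walk reaches the Project, so the Project survives.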

Symptom: `SELECT count(*) FROM (SELECT id, val*2 AS dv FROM t) a,
LATERAL (SELECT * FROM s WHERE s.id = a.dv) x` fell back to PG with
"DXL-to-PlStmt Translation: Attribute number 8 not found in project list".
After this commit the query lowers to a clean HashJoin under ORCA.

Adds a dedicated lateral.sql / lateral.out regression test under
test/schedule covering the six base LATERAL shapes (uncorrelated,
equi-correlated, scalar projection, TVF outer-ref arg, LIMIT, LEFT
JOIN LATERAL) plus twelve nested / composite variants:

- 2-level and 3-level chains (each LATERAL references its immediate
  outer; or one LATERAL references both outer and middle)
- LATERAL nested inside another LATERAL
- LATERAL containing an inner JOIN (decorrelates to HashJoin chain)
- LEFT JOIN LATERAL with a nested LATERAL + LIMIT
- LATERAL containing an aggregate
- LATERAL with non-equi range correlation
- 3-level chain ending in a TVF
- LATERAL inside EXISTS
- LEFT LATERAL with an inner filter that excludes everything
- LATERAL referencing a derived-table computed column under an
  aggregate (regression for the sibling-correlated outer-ref pruning
  bug fixed in PexprPruneUnusedComputedColsRecursive)

The expected file pins the actual plan shape (Hash Join / Hash Right
Join / Function Scan / NL with nest params, etc.), so any future
change that regresses the commutativity guard, the selective Apply
conversion, the LOJ inferred-pred dedup, or the preprocessor outer-
ref preservation will diff visibly. Three back-to-back fresh-instance
runs show the plans are deterministic at ~150 ms.

CLogicalInnerApply / CLogicalLeftOuterApply built from LATERAL have no
inner scalar colref (LATERAL returns a relation, not a scalar), so
m_pdrgpcrInner is nullptr. PopCopyWithRemappedColumns blindly called
CUtils::PdrgpcrRemap on it, which dereferences the null array in
Release builds and SIGSEGVs (Debug builds catch it at the assert).

The crash is reached whenever ORCA needs to deep-copy a LATERAL-derived
Apply with column remapping, e.g. when a CLogicalCTEConsumer inlines a
producer whose body contains the Apply:

  WITH t AS (SELECT * FROM a, LATERAL (SELECT * FROM b WHERE ...) s)
  SELECT count(*) FROM t;

Guard against nullptr m_pdrgpcrInner and rebuild the copy with the
1-arg ctor (the 2-arg form asserts pdrgpcrInner is non-null+non-empty).
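
The shape of that guard, as a hedged stand-alone sketch: the `Apply` struct, `Remap`, and `CopyWithRemappedColumns` below are illustrative, with a one-argument constructor playing the role of the asserting constructor and the default constructor playing the role of the form that takes no inner columns.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using ColArray = std::vector<int>;
using ColMap = std::map<int, int>;  // old column id -> new column id

struct Apply {
    // Null for applies built from LATERAL, which has no inner scalar column.
    std::unique_ptr<ColArray> inner_cols;

    Apply() = default;
    explicit Apply(ColArray cols)
        : inner_cols(std::make_unique<ColArray>(std::move(cols))) {
        assert(!inner_cols->empty());  // this form insists on real columns
    }
};

// Map each column through the remapping, leaving unmapped ids unchanged.
ColArray Remap(const ColArray &cols, const ColMap &mapping) {
    ColArray out;
    for (int col : cols) {
        auto it = mapping.find(col);
        out.push_back(it == mapping.end() ? col : it->second);
    }
    return out;
}

Apply CopyWithRemappedColumns(const Apply &src, const ColMap &mapping) {
    if (!src.inner_cols) {
        return Apply();  // LATERAL case: nothing to remap, nothing to deref
    }
    return Apply(Remap(*src.inner_cols, mapping));
}
```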

Found by walking the LATERAL edge-case matrix; lateral.sql now covers
this shape under E2.

Adds an "Edge cases" section to lateral.sql covering shapes that came
out of a LATERAL edge-case sweep:

  E1   varlevelsup=2: LATERAL nested in a correlated scalar subquery
  E2   LATERAL inside a CTE body (locks in the InnerApply nullptr-
       PdrgPcrInner copy-with-remap crash that this commit pairs with)
  E3   VALUES + LATERAL
  E4   LATERAL + GROUP BY at outer
  E5   LATERAL + GROUPING SETS
  E6   LATERAL + window function
  E7   UNION ALL inside the LATERAL body
  E8   LATERAL with DISTINCT outside
  E9   LATERAL top-N per outer (ORDER BY ... LIMIT 1)
  E10  INSERT...SELECT with LATERAL
  E11  3-level LATERAL where the grandchild references the outermost
  E12  LATERAL unnest(array)

All twelve go through ORCA without fallback. Three back-to-back
fresh-instance runs settle at ~211 ms; the plan shapes are stable.

Edge cases that intentionally do NOT land here:
  - CTE inside a LATERAL body -> pre-existing ORCA limitation
    "Operator CTE with outer references not supported"
  - PREPARE/EXECUTE with $params -> pre-existing limitation
    (requires optimizer_enable_query_parameter, not wired in pg_orca)
  - FOR UPDATE on an aggregate query -> SQL-level rejection

Adds test/sql/pg_lateral.sql, a verbatim port of the "Test LATERAL"
block from PostgreSQL upstream's src/test/regress/sql/join.sql (the
chunk introduced by the "-- Test LATERAL" header), plus inline setup
for int2_tbl / int4_tbl / int8_tbl / tenk1 / onerow mirroring upstream's
test_setup.sql and the top of join.sql. tenk1's 10000-row payload is
loaded via COPY from $PG_REGRESS_SQL/data/tenk.data so the test stays
in sync with PG when that file changes.

Locks in roughly 95 queries covering:
  - basic equi-correlated LATERAL with tenk1 / int4_tbl / int8_tbl
  - lateral-versus-parent scope resolution (the int8_tbl q1/q2 case)
  - LATERAL with TVF args, UNION ALL, VALUES, GROUPING SETS-adjacent
    aggregates, JOIN inside LATERAL
  - lateral references requiring pullup at outer-join boundaries
  - PlaceHolderVar nesting and the bug #9041 postponed-quals case
  - dummy/empty inner rels (bug #15694)
  - LATERAL with VALUES tuple containing outer refs to both sides of
    an enclosing LEFT JOIN
  - intentional SQL-level rejections (missing LATERAL keyword,
    RIGHT/FULL JOIN with LATERAL, ambiguous column refs, UPDATE/DELETE
    LATERAL restrictions)

Current ORCA coverage on this set: 15 queries optimised by ORCA (visible
Optimizer: pg_orca marker), 22 queries fall back to PG, all correctness
preserved.  Fallback breakdown:
  - 14 "DXL-to-PlStmt Translation: Attribute number N not found in
    project list" — same class as the sibling-correlated outer-ref bug
    fixed in CExpressionPreprocessor, but in more complex shapes
    involving LeftOuter + LATERAL VALUES referencing both sides of
    the outer join; ORCA's enumerator places the LATERAL on a side that
    can't see one of its referenced columns at execution time
  - 2 "Whole-row variable" (pre-existing ORCA limitation, e.g.
    coalesce(i) on a record type)
  - 2 "no plan has been computed for required properties" (enum gap)
  - 4 intentional PG-side SQL rejections

Three back-to-back fresh-instance runs land at ~370 ms with stable
plans. Future fixes to the still-falling-back patterns will show up as
fewer fallback INFO lines and more Optimizer: pg_orca markers in the
expected output diff.

Statement timeout of 20s guards against ORCA picking a pathologically
bad plan that would otherwise hang the whole regression suite.

CJoinOrderDPv2 enumerates join orders by combining subsets of the
NAryJoin's atoms via dynamic programming. It tracked LOJ right-child
dependencies (an LOJ's right side must be paired with its left), but
did not track the more general LATERAL-style dependency where one
atom holds outer references to another sibling atom's output columns.

Without this check the enumerator happily formed subsets like
{x, lateral_ref_to_y} from a query

  SELECT * FROM
    int8_tbl x LEFT JOIN (SELECT q1, coalesce(q2,0) q2 FROM int8_tbl) y
                       ON x.q2 = y.q1,
    LATERAL (VALUES (x.q1, y.q1, y.q2)) v(xq1, yq1, yq2);

The chosen physical plan placed the LATERAL VALUES inside a Nested
Loop whose outer was just x — but the VALUES references colids
produced by y, which is on a different side of the enclosing
LeftOuter. At execution time those refs are unbound; DXL→PlStmt
catches it as "Attribute number N not found in project list" and
falls back to the PG planner.

Precompute per-atom sibling requirements in the DPv2 constructor:

  outer_refs_i   = atom_i.DeriveOuterReferences()
  sibling_refs_i = outer_refs_i − m_outer_refs     // refs to NAryJoin
                                                   // siblings, not refs
                                                   // escaping the join
  sibling_required[i] = { j | sibling_refs_i ∩ atom_j.DeriveOutputColumns() ≠ ∅ }

In GetJoinExpr, reject any candidate join whose combined atom set is
missing a required sibling of one of its members. The DP table then
never enumerates the unexecutable subset, and the LATERAL atom can
only enter the join once all its required siblings are already in.
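
The precomputation and the subset check can be sketched with bitmasks. This is a minimal illustration, not the DPv2 code: `Atom`, `AtomMask`, `ComputeSiblingRequired`, and `SubsetIsExecutable` are hypothetical names, and the sketch assumes at most 32 atoms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using ColSet = std::set<int>;
using AtomMask = std::uint32_t;  // bit i set => atom i is in the subset

struct Atom {
    ColSet output;      // columns the atom produces
    ColSet outer_refs;  // columns the atom needs from outside itself
};

// For each atom, the set of sibling atoms whose output it references.
// References listed in join_outer_refs escape the join and are ignored.
std::vector<AtomMask> ComputeSiblingRequired(const std::vector<Atom> &atoms,
                                             const ColSet &join_outer_refs) {
    assert(atoms.size() <= 32);
    std::vector<AtomMask> required(atoms.size(), 0);
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (int col : atoms[i].outer_refs) {
            if (join_outer_refs.count(col) > 0) {
                continue;
            }
            for (std::size_t j = 0; j < atoms.size(); ++j) {
                if (j != i && atoms[j].output.count(col) > 0) {
                    required[i] |= AtomMask{1} << j;
                }
            }
        }
    }
    return required;
}

// A combined atom set is acceptable only if every member's required
// siblings are also members.
bool SubsetIsExecutable(AtomMask subset,
                        const std::vector<AtomMask> &required) {
    for (std::size_t i = 0; i < required.size(); ++i) {
        bool is_member = ((subset >> i) & 1u) != 0;
        if (is_member && (required[i] & ~subset) != 0) {
            return false;
        }
    }
    return true;
}
```

For the query above, with atoms x, y, and the LATERAL VALUES v referencing columns of both x and y, the combined set {x, v} is rejected while {x, y} and {x, y, v} are accepted.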

Effect on test/sql/pg_lateral.sql (the PostgreSQL upstream LATERAL
section ported over): fallbacks drop from 22 to 12, ORCA-handled
EXPLAINs go from 15 to 17, and the "DXL-to-PlStmt Translation:
Attribute number N not found" pattern from this query shape
disappears.  Other ORCA tests, the PG --pg-tests suite, and
cost_align.sh are unchanged.

Three back-to-back fresh-instance pg_lateral runs land at ~358 ms
with deterministic plans.

Re-captures test/expected/pg_lateral.out after the DPv2 sibling-
visibility enforcement. Bottom-line change:

  fallback INFO lines: 22 -> 12   (-10)
  "Optimizer: pg_orca" markers: 15 -> 17   (+2)

The "DXL-to-PlStmt Translation: Attribute number N not found in
project list" class of failure that came from the LeftOuter +
LATERAL VALUES atom-subset bug is now gone. Remaining 12 fallbacks
are unrelated patterns (Whole-row variable, "no plan computed for
required properties", and the four intentional PG-side SQL errors
from upstream's join.sql).

cr-gpt Bot commented Jun 20, 2026

Seems you are using me but didn't get OPENAI_API_KEY set in Variables/Secrets for this repo. You could follow the readme for more information.

CI ran pg_lateral with $PG_REGRESS_SQL unset, so the
  \set tenkdata `echo "$PG_REGRESS_SQL/data/tenk.data"`
trick expanded to "/data/tenk.data" and the server-side COPY failed:

  ERROR:  could not open file "/data/tenk.data" for reading:
          No such file or directory

Copy PostgreSQL's tenk.data (670 KB, 10 000 rows) into the repo at
test/data/tenk.data and locate it via pg_regress's :abs_srcdir
variable (populated from PG_ABS_SRCDIR with \getenv).  That env var
is set unconditionally by pg_regress to the --inputdir argument, so
the COPY now resolves wherever the test runs.

The file is verbatim from PG REL_18_3 src/test/regress/data/tenk.data;
since the test header already declares the section as a port of PG's
join.sql LATERAL block, shipping the matching data file alongside
keeps the test reproducible without an external postgres source tree.

Verified locally with three back-to-back fresh-instance --orca-tests
runs: pg_lateral now passes in ~350 ms on each.

…tchlevels

CI's Debug build (apt postgresql-server-dev-18, likely 18.4+) renders
the EXPLAIN VERBOSE output of

  select * from int4_tbl i left join
    lateral (select coalesce(i) from int2_tbl j where i.f1 = j.f1) k
    on true;

as

  Output: i.f1, (i.*)
  ...
  Output: j.f1, i.*

while the older PG18.3 (local dev box) emits the COALESCE wrapper:

  Output: i.f1, (COALESCE(i.*))
  ...
  Output: j.f1, COALESCE(i.*)

This is a PG-side deparser/simplification change: newer PG18 elides
COALESCE on a whole-row reference inside a LATERAL output list when
the planner can prove the row is non-null in that position. The
underlying query goes through ORCA's "Whole-row variable" fallback,
so the output comes from PG's planner+executor; ORCA doesn't see this
difference.

Add a matchsubs rule that strips COALESCE(<ident>.*) wrappers in the
test output so the same expected file works regardless of which PG18
patchlevel CI installs.
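
The effect of such a rule can be illustrated with a regular expression. The sketch below is not the matchsubs rule from the commit (whose exact text is not shown here); it is a C++ illustration of the same normalisation, with `StripWholeRowCoalesce` as a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace COALESCE(<ident>.*) with <ident>.* so that output from either
// PG18 patchlevel normalises to the same text.
std::string StripWholeRowCoalesce(const std::string &line) {
    static const std::regex wrapper(R"(COALESCE\((\w+\.\*)\))");
    return std::regex_replace(line, wrapper, "$1");
}
```

Applied to the older output, `Output: i.f1, (COALESCE(i.*))` becomes `Output: i.f1, (i.*)`, while lines from the newer patchlevel pass through unchanged.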

@yjhjstz yjhjstz merged commit f705c04 into main Jun 23, 2026
9 checks passed