fix(core): make appendcol row ordering deterministic on parallel engines#5474
Open
ahkcs wants to merge 1 commit into
Open
fix(core): make appendcol row ordering deterministic on parallel engines#5474ahkcs wants to merge 1 commit into
ahkcs wants to merge 1 commit into
Conversation
Contributor
PR Reviewer Guide 🔍(Review updated until commit 41ef5f7)Here are some key observations to aid the review process:
|
appendcol lowers to a FULL JOIN of two ROW_NUMBER() OVER () windows (empty PARTITION BY / ORDER BY) on _row_number_main_ = _row_number_subsearch_, with no trailing sort. That positional zip is only correct on a serial, order-preserving executor: a bare ROW_NUMBER() OVER () assigns sequence numbers in input order and the join preserves it. On a parallel/distributed backend the row-number assignment is arbitrary and the hash join drops ordering, so columns get zipped onto the wrong rows and downstream `head` slices a non-deterministic subset. Fix visitAppendCol to not depend on implicit input-order preservation: - derive an explicit window ORDER BY from each child's collation (deriveCollationOrderKeys), so ROW_NUMBER assignment follows the upstream sort; falls back to the prior bare OVER () when the input has no collation (positional correspondence is undefined without a sort). - add a trailing sort by the row-number columns after the join (NULLS LAST, same pattern as streamstats) so output order is deterministic regardless of how the backend executes the join. No behavior change on the serial v2/Calcite engine; makes the lowering correct on parallel backends. Updates CalcitePPLAppendcolTest expected plans/SparkSQL. Signed-off-by: Kai Huang <ahkcs@amazon.com>
094e05e to
41ef5f7
Compare
Contributor
|
Persistent review updated to latest commit 41ef5f7 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
appendcolzips a subsearch's columns onto the main search's rows by position. Its lowering (CalciteRelNodeVisitor.visitAppendCol) implements this as aFULL JOINof twoROW_NUMBER() OVER ()windows (emptyPARTITION BY/ORDER BY) on_row_number_main_ = _row_number_subsearch_, with no trailing sort.That positional zip is only correct on a serial, order-preserving executor: a bare
ROW_NUMBER() OVER ()assigns sequence numbers in input order, and the join preserves it. On a parallel/distributed backend the row-number assignment is arbitrary and the hash join drops ordering, so columns get zipped onto the wrong rows and a downstreamheadslices a non-deterministic subset.This is currently masked on the serial v2/Calcite engine, but it is a latent correctness bug for any parallel backend (the analytics engine, and the Spark pushdown path — the
verifyPPLToSparkSQLgolden output bakes in the same non-deterministicROW_NUMBER() OVER ()).Root cause (observed)
Running the query below through a parallel backend returned rows out of
sortorder, withcntattached to the wrong rows andMrows leaking into the top 10:A baseline
... | sort gender, state | head 10(noappendcol) returned correctly ordered rows on the same backend, isolating the cause to the row-number join.Fix
Make
visitAppendColindependent of implicit input-order preservation:ORDER BYfrom each child's collation (deriveCollationOrderKeys), soROW_NUMBERfollows the upstream sort. Falls back to the prior bareOVER ()when the input carries no collation (positional correspondence is undefined without a sort).NULLS LAST; extra subsearch-only rows sort last), the same patternstreamstatsalready uses, so output order no longer depends on how the backend executes the join.No behavior change on the serial v2/Calcite engine; the lowering becomes correct on parallel backends.
Results
CalcitePPLAppendcolITrun against the analytics-engine route (force-routed, parquet-backed indices) before/after, and on the v2/Calcite path:testAppendColtestAppendColOverrideTesting
CalcitePPLAppendcolTest(5 unit tests) — updated expected logical plans + Spark SQL; all pass.CalcitePPLAppendcolIT— 2/2 on the analytics-engine route and 2/2 on v2/Calcite.NewAddedCommandsIT.testAppendcol— passes.spotlessCheckclean on:coreand:ppl.Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.