Batch-mode subtree elapsed sums when computing parent self-time (#215 D1) #268

Merged
erikdarlingdata merged 1 commit into dev from fix/d1-batch-mode-pipelined-elapsed
Apr 24, 2026

@erikdarlingdata
Owner

Summary

Parallelism (Gather Streams) over a batch-mode zone was reporting bogus self-time because `GetEffectiveChildElapsedMs` only subtracted the direct child's elapsed. Batch-mode operators pipeline — each elapsed is standalone, not cumulative — so the correct subtraction sums across the batch pipeline.

Verified on Joe's plan 5GQmlu6m7W

  • Parallelism elapsed = 21,177ms
  • Subtree: Compute Scalar (batch, 9ms) → Hash Match Aggregate (batch, 10,294ms) → Clustered Index Scan (batch, 13,761ms)
  • Old: self = 21,177 − 9 = 21,168ms (~56% of statement — bogus)
  • New: self = max(0, 21,177 − (9 + 10,294 + 13,761)) = 0ms (matches Joe's `21.177 - 0.009 - 10.294 - 13.761`)
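The old and new figures can be re-derived with a few lines of Python. This is only a worked check of the numbers quoted above, not the project's code; the variable names are illustrative.

```python
# Elapsed times (ms) quoted in this PR for plan 5GQmlu6m7W.
parallelism_elapsed_ms = 21_177
batch_pipeline_ms = {
    "Compute Scalar": 9,
    "Hash Match Aggregate": 10_294,
    "Clustered Index Scan": 13_761,
}

# Old behavior: subtract only the direct child (Compute Scalar).
old_self_ms = parallelism_elapsed_ms - batch_pipeline_ms["Compute Scalar"]

# New behavior: subtract the sum across the batch pipeline, clamped at zero.
pipeline_total_ms = sum(batch_pipeline_ms.values())
new_self_ms = max(0, parallelism_elapsed_ms - pipeline_total_ms)

print(old_self_ms, pipeline_total_ms, new_self_ms)
```

The pipeline total (24,064ms) exceeds the exchange's own elapsed, which is why the clamp to zero matters here.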

Regression check

  • `c1-c5.sqlplan` (serial row mode): all warnings unchanged
  • `20260415_1.sqlplan` (parallel batch): all wait benefits + operator warnings unchanged (the batch-subtree rule only triggers when computing self-time for a row-mode parent that sits above a batch-mode subtree with actual stats)

D2 note

Joe also flagged an unhelpful CXPACKET warning on the same plan. That was addressed by v1.7.7's WaitStatsKnowledge content strip; the CXPACKET item now shows raw ms/count only. The deeper concern about CXPACKET double-counting other waits is a future refinement.

Version 1.7.7 → 1.7.8.

🤖 Generated with Claude Code

…f-time (#215 D1)

Joe's plan 5GQmlu6m7W has a Parallelism (Gather Streams) above a batch-mode
zone of three operators: Compute Scalar (9ms) / Hash Match Aggregate
(10,294ms) / Clustered Index Scan (13,761ms). The exchange itself has
elapsed 21,177ms and was being reported with 21,168ms of self-time because
GetEffectiveChildElapsedMs only subtracted the direct child (the 9ms
Compute Scalar).

Batch mode pipelines operators — each operator's elapsed is standalone wall
time for that operator, not cumulative of descendants the way row mode is.
So for a row-mode parent above a batch-mode subtree, the correct "effective
child elapsed" is the sum across the whole batch pipeline, not just the
direct child.

New SumBatchSubtreeElapsedMs walks a contiguous batch-mode zone (stopping at
Parallelism boundaries) and sums ActualElapsedMs. GetEffectiveChildElapsedMs
now routes batch-mode children with actual stats through that helper.
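A minimal Python sketch of the walk described above, under stated assumptions. The real helpers live in the project's own codebase and are not shown in this PR, so everything here is hypothetical: the `PlanNode` shape, the name-based exchange check, and the choice to stop at any non-batch operator as well as at Parallelism.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    # Hypothetical node shape; the project's real plan model is not shown here.
    name: str
    batch_mode: bool
    actual_elapsed_ms: Optional[int]  # None when the plan has no actual stats
    children: List["PlanNode"] = field(default_factory=list)


def is_exchange(node: PlanNode) -> bool:
    # Assumption: exchanges are recognized by operator name.
    return node.name.startswith("Parallelism")


def sum_batch_subtree_elapsed_ms(node: PlanNode) -> int:
    # Stop at Parallelism boundaries and (assumed) at the first non-batch operator.
    if is_exchange(node) or not node.batch_mode:
        return 0
    total = node.actual_elapsed_ms or 0
    for child in node.children:
        total += sum_batch_subtree_elapsed_ms(child)
    return total


def effective_child_elapsed_ms(child: PlanNode) -> int:
    # Batch-mode children with actual stats contribute their whole pipeline.
    if child.batch_mode and child.actual_elapsed_ms is not None and not is_exchange(child):
        return sum_batch_subtree_elapsed_ms(child)
    # Otherwise fall back to the child's own (cumulative, row-mode) elapsed.
    return child.actual_elapsed_ms or 0


def self_time_ms(parent: PlanNode) -> int:
    children_ms = sum(effective_child_elapsed_ms(c) for c in parent.children)
    return max(0, (parent.actual_elapsed_ms or 0) - children_ms)


# The operator chain from plan 5GQmlu6m7W, using the figures quoted in this PR.
scan = PlanNode("Clustered Index Scan", True, 13_761)
agg = PlanNode("Hash Match Aggregate", True, 10_294, [scan])
compute = PlanNode("Compute Scalar", True, 9, [agg])
gather = PlanNode("Parallelism (Gather Streams)", False, 21_177, [compute])

print(effective_child_elapsed_ms(compute), self_time_ms(gather))
```

A row-mode child takes the fallback path and contributes only its own elapsed, which is consistent with the serial row-mode plans being unchanged.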

Result on 5GQmlu6m7W: Parallelism self-time goes from 21,168ms (~56% of
statement) to 0ms (clamped — 21.177 - 24.064 is negative, matching Joe's
math). No more bogus "Expensive Operator" warning on the gather-streams.

Regression check:
- c1-c5.sqlplan (serial row mode) — all warnings unchanged
- 20260415_1.sqlplan (parallel batch-mode) — all wait benefits and operator
  warnings unchanged (the batch subtree rule only fires when computing
  self-time of a row-mode parent above a batch child)

D2 (unhelpful CXPACKET warning) was already addressed by v1.7.7's
WaitStatsKnowledge content strip — the CXPACKET item now shows only the
raw wait ms/count, no speculative fix text. Joe's deeper concern about
CXPACKET double-counting other waits is a separate future refinement.

Version bump 1.7.7 -> 1.7.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 3fd40d1 into dev Apr 24, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the fix/d1-batch-mode-pipelined-elapsed branch April 24, 2026 14:03