Skip to content

Release v1.7.0 — wait stats as warnings#245

Merged
erikdarlingdata merged 1 commit intomainfrom
dev
Apr 21, 2026
Merged

Release v1.7.0 — wait stats as warnings#245
erikdarlingdata merged 1 commit intomainfrom
dev

Conversation

@erikdarlingdata
Copy link
Copy Markdown
Owner

Summary

See PR #244 for full details.

Test plan

🤖 Generated with Claude Code

Each wait stat recorded on a statement now emits a first-class PlanWarning
alongside the existing wait-stats card, so waits sort by benefit % in the
warnings list next to operator findings instead of living only in their own
panel.

- WaitStatsKnowledge: per-wait description + how-to-fix content covering
  PAGEIOLATCH_*, PAGELATCH_*, LATCH_*, LCK_M_*, MEMORY_ALLOCATION_EXT,
  RESOURCE_SEMAPHORE*, CX*/HT*/BMPBUILD parallel + batch-mode families,
  SOS_SCHEDULER_YIELD, THREADPOOL, ASYNC_NETWORK_IO, WRITELOG, IO_COMPLETION,
  LOGBUFFER, EXECSYNC, PREEMPTIVE_OS_WRITEFILEGATHER, with prefix fallbacks.
- PAGEIOLATCH_SH/EX/UP/DT warnings include effective latency per wait
  (wait_ms / wait_count) in the message.
- Severity tiers off benefit %: >=50 Critical, >=10 Warning, else Info.
- Flows through existing PlanWarnings pipeline so web viewer, HTML export,
  JSON/text, and Avalonia Plan Warnings expander all pick it up with no
  renderer changes. Wait-stats chart stays as a complementary view.
- Version bump 1.6.0 -> 1.7.0.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

Release PR for v1.7.0 surfacing wait stats as first-class PlanWarnings. Four files, ~370 LOC, mostly the new WaitStatsKnowledge lookup table plus an EmitWaitStatWarnings pass in BenefitScorer. No XAML / Avalonia surface area changes, no MVVM creep, no contrast/brush regressions to flag.

What this does

  • BenefitScorer.Score gets a new pass that, for every WaitStatInfo on a statement, emits a PlanWarning with description + how-to-fix from WaitStatsKnowledge, severity bucketed by the per-wait MaxBenefitPercent from ScoreWaitStats (>=50 Critical / >=10 Warning / else Info).
  • WaitStatsKnowledge is a static lookup with ~50 exact-match entries plus prefix fallbacks (LCK_M_, PAGEIOLATCH_, PAGELATCH_, LATCH_, HT, CX, PREEMPTIVE_) and a generic default. Never returns null.
  • PAGEIOLATCH_* entries set ShowEffectiveLatency = true, so their warning message includes wait_ms / wait_count formatted via FormatLatency.
  • Plumbed into the Web project via a <Compile Include=... Link=...> reference in PlanViewer.Web.csproj (correct pattern matching the rest of the file).

What's good

  • Lookup falls back gracefully and is null-safe.
  • FormatLatency covers s/ms/µs ranges sensibly; division by WaitCount is guarded by wait.WaitCount > 0.
  • Severity bucketing on Nullable<double> via pattern switch is clean — null benefit cleanly defaults to Info.
  • Change is additive — no renderer/UI changes needed because everything flows through the existing PlanWarnings pipeline, which matches the PR description.

What needs attention

  1. Base branch is main. Project convention is to target dev. Confirmed this is intentional because it's a release PR (v1.6.0 → v1.7.0 in PlanViewer.App.csproj), but flagging per the standing review checklist.

  2. PerformanceMonitor sync. This PR changes BenefitScorer (scoring) and adds the WaitStatsKnowledge rule data — both live in the analyzer surface that the PerformanceMonitor Dashboard and PerformanceMonitor Lite repos keep in sync with Studio. The same wait-warning emission + knowledge table needs to land there too, otherwise those products will diverge from Studio on what wait stats produce as warnings. (Also worth deciding whether the severity thresholds — >=50 / >=10 — should be a shared constant rather than duplicated in three repos.)

  3. No tests. New logic in src/PlanViewer.Core/Services/ (the entire WaitStatsKnowledge.Lookup fallback chain, severity bucketing, FormatLatency, message assembly) has no corresponding test in tests/PlanViewer.Core.Tests/. The prefix-fallback ordering and the Exact["MEMORY_ALLOCATION_EXT"] direct access (see inline) are exactly the kind of thing a 30-line table-driven test would have caught any regression on. At minimum: a few [InlineData] cases verifying exact match wins over prefix, LATCH_FOO vs PAGELATCH_FOO vs PAGEIOLATCH_FOO go to the right entry, unknown returns Default, severity buckets pick the right label.

  4. Inline issues (left on the lines):

    • WaitStatsKnowledge.cs:303Exact["MEMORY_ALLOCATION_EXT"] is a hard dictionary read in the fallback path; will throw if the key is ever removed.
    • WaitStatsKnowledge.cs:281 and :288StartsWith("HT") and StartsWith("CX") prefix fallbacks are too broad given the full known wait-type sets are already in Exact.
    • BenefitScorer.cs:43EmitWaitStatWarnings runs even when QueryTimeStats is null, in which case all severities silently collapse to Info. Confirm intended.
    • BenefitScorer.cs:62 — no minimum-benefit threshold; every nonzero wait emits a warning, which can flood the Plan Warnings expander on busy plans.
  5. Release-PR checklist — the unchecked items in the description (web viewer + desktop expander visual checks post-deploy) are exactly the surfaces this change lights up. Worth doing before merging to main rather than after.

Not approving / not requesting changes — comments only.


Generated by Claude Code

};

if (wt.Contains("MEMORY_ALLOCATION"))
return Exact["MEMORY_ALLOCATION_EXT"];
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact["MEMORY_ALLOCATION_EXT"] is a direct dictionary access — if anyone later removes or renames that key in the Exact initializer above, this throws KeyNotFoundException at runtime instead of falling back to Default. Either TryGetValue it, or just inline a fresh Entry { ... } here so the fallback isn't coupled to the exact-match table.


Generated by Claude Code

HowToFix = "Check sys.dm_os_latch_stats for the specific latch class — these are usually concurrency signals rather than query-tuning problems."
};

if (wt.StartsWith("HT") && wt.Length > 2)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StartsWith("HT") && Length > 2 is too broad — it will swallow any future/unrelated wait type whose name happens to begin with HT and mislabel it as a batch-mode hash wait with batch-mode-hash how-to-fix advice. The known batch-mode hash family is HTBUILD/HTREPARTITION/HTDELETE/HTMEMO/HTREINIT (all already in Exact), so the prefix fallback adds no value today and risks wrong attribution tomorrow. Consider either dropping this branch entirely (let unknown HT* fall through to Default) or tightening to a known prefix like HTREP / explicit list.


Generated by Claude Code

HowToFix = "Reduce input rows and skew for the batch-mode hash. Confirm the smaller side is on the build, and check that batch mode is the right choice for this plan shape."
};

if (wt.StartsWith("CX"))
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as the HT prefix above: StartsWith("CX") matches anything starting with CX. The full known parallel-exchange set (CXPACKET/CXCONSUMER/CXSYNC_PORT/CXSYNC_CONSUMER) is already in Exact. A bare CX prefix gives a generic "parallel exchange" answer for anything that happens to start with those two letters — same false-positive risk as HT. Either tighten to CXSYNC_ / CXPACKET / CXCONSUMER, or drop the fallback and let unknowns hit Default.


Generated by Claude Code

ScoreWaitStats(stmt);

if (stmt.WaitStats.Count > 0)
EmitWaitStatWarnings(stmt);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmitWaitStatWarnings runs even when QueryTimeStats == null (line 39's ScoreWaitStats is skipped in that case). With no WaitBenefits populated, benefitByType is empty and every wait gets severity = Info regardless of how dominant the wait is. That may be intentional (estimated/incomplete plans don't have elapsed to score against) but worth confirming — if it is, a one-line comment here saying "no QueryTimeStats → all severities pinned to Info" would save the next reader a trip through the lookup chain.

Also: the two adjacent if (stmt.WaitStats.Count > 0) blocks can collapse into a single guarded block, slightly cheaper and reads more clearly:

if (stmt.WaitStats.Count > 0)
{
    if (stmt.QueryTimeStats != null) ScoreWaitStats(stmt);
    EmitWaitStatWarnings(stmt);
}

Generated by Claude Code


foreach (var wait in stmt.WaitStats)
{
if (wait.WaitTimeMs <= 0) continue;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only filter on emitting a warning is WaitTimeMs <= 0. Every nonzero wait — including 1ms SOS_SCHEDULER_YIELD chatter on a 30s query — becomes a first-class PlanWarning in the expander. On busy real-world plans (10–30 distinct wait types) this can flood the Plan Warnings UI with low-signal Info rows and make the genuinely actionable ones harder to find.

Consider a minimum-relevance threshold, e.g. skip emission if wait.WaitTimeMs / elapsedMs < 0.005 and benefitPct < 1, or drop pure Info waits below some absolute floor.


Generated by Claude Code

@erikdarlingdata erikdarlingdata merged commit ef7690c into main Apr 21, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant