Add investigate-trace template and improve root cause analysis #251
Merged
Alan-Jowett merged 4 commits into microsoft:main on Apr 21, 2026
Add a purpose-built investigate-trace template for ETW/telemetry/profiling trace analysis, based on real-world feedback from an LLM executing an assembled investigate-bug prompt for Windows power trace analysis.

New template (investigate-trace):
- Call stack analysis as primary technique, not afterthought
- Energy-vs-metric divergence detection (CPU% vs energy%)
- Cross-process amplification cascade analysis
- Tool-agnostic analysis steps (not WPA-specific)
- Iterative deepening workflow: broad survey → module → stack → cross-process

Protocol improvements (root-cause-analysis):
- Phase 3a (Iterative Deepening): investigation proceeds in layers of increasing resolution; do not write the report until deep analysis is complete
- Phase 4a (Cross-Component Causal Chains): trace trigger-response pairs, map amplification cascades, quantify amplification factors, identify leverage points

Guardrail improvements:
- anti-hallucination: scoped labeling relaxation for direct observations from authoritative tool output; causal claims retain full labeling
- operational-constraints: data-driven scoping rules for trace/telemetry analysis (data categories and time ranges, not file counts)

Format and template improvements:
- investigation-report: recognize template-level full-format overrides
- investigate-bug: explicit full-format override for root cause tasks
- bootstrap: taxonomy relevance evaluation during assembly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new trace/telemetry-focused investigation template and updates shared investigation protocols/formats so root-cause workflows better support iterative deepening and cross-component causal chains.
Changes:
- Added `investigate-trace` template tailored for ETW/ETL/telemetry investigations (stack-first analysis, energy/metric divergence, amplification cascades).
- Enhanced `root-cause-analysis` with iterative deepening (Phase 3a) and cross-component causal chains (Phase 4a).
- Updated guardrails, bootstrap guidance, and investigation-report format to better fit data-driven/trace investigations and full-format overrides.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| templates/investigate-trace.md | New template for systematic trace/telemetry investigations using investigation-report output. |
| templates/investigate-bug.md | Forces full investigation-report format for bug investigations. |
| protocols/reasoning/root-cause-analysis.md | Adds iterative deepening and cross-component causal-chain phases. |
| protocols/guardrails/operational-constraints.md | Adds scoping/retrieval constraints for traces/logs and structured query results. |
| protocols/guardrails/anti-hallucination.md | Relaxes explicit [KNOWN] labeling for authoritative tool/telemetry measurements while keeping inference labeling. |
| formats/investigation-report.md | Clarifies that some investigation templates always require full format. |
| bootstrap.md | Adds guidance to evaluate relevance of template-declared taxonomies before including them. |
| manifest.yaml | Registers the new investigate-trace template. |
- Align ASSUMED marker with anti-hallucination protocol ([ASSUMPTION])
- Soften 'at least top 5' to 'up to 5' with data-limitation escape hatch in both investigate-trace template and root-cause-analysis protocol
- Add investigate-trace to root-cause-analysis applicable_to list
- Remove root-cause-ci-failure from full-format example list (not applicable)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add low-CPU edge case handling for energy-to-CPU ratio threshold
- Qualify call stack requirement as 'when available' in quality checklist
- Align [UNKNOWN] marker with protocol's [UNKNOWN: <what is missing>]
- Fix remaining INFERRED/ASSUMED instance to use [ASSUMPTION] marker

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove hardcoded '8 sections' from investigate-trace and investigate-bug templates; the investigation-report format defines 9 sections (§1–§9). Avoids drift by not embedding a count that the format owns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a purpose-built `investigate-trace` template for ETW/telemetry/profiling trace analysis and improves the root cause analysis protocol with iterative deepening and cross-component causal chain analysis. Based on real-world feedback from an LLM that executed an assembled `investigate-bug` prompt for a Windows power/ETL trace investigation.
Motivation
The `investigate-bug` template is code-centric: it asks for file:line evidence, code-level fixes, and tests that would have caught the bug. When used for trace/telemetry analysis (e.g., analyzing an ETW capture for power usage), the executing LLM identified several gaps:
Changes
New template
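The energy-vs-metric divergence check listed for the template (CPU% vs energy%), together with the low-CPU edge case added in a later commit, could be sketched roughly as follows. This is an illustration only, not the template's wording: the `ProcessSample` type, the `find_divergent` function, and all three threshold values are hypothetical and would need tuning against real trace data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProcessSample:
    """Hypothetical per-process summary exported from a trace tool."""
    name: str
    cpu_pct: float     # share of total CPU time, 0-100
    energy_pct: float  # share of total estimated energy, 0-100


def find_divergent(
    samples: List[ProcessSample],
    ratio_threshold: float = 2.0,  # assumed value, not from the template
    min_cpu_pct: float = 0.5,      # assumed floor below which the ratio is unstable
    min_energy_pct: float = 1.0,   # assumed absolute cutoff for low-CPU processes
) -> List[Tuple[str, Optional[float]]]:
    """Return (name, ratio) for processes whose energy share outpaces CPU share.

    For low-CPU processes the ratio is reported as None and the decision
    falls back to the absolute energy share, since dividing by a near-zero
    CPU share would produce a meaningless ratio.
    """
    flagged: List[Tuple[str, Optional[float]]] = []
    for s in samples:
        if s.cpu_pct < min_cpu_pct:
            if s.energy_pct >= min_energy_pct:
                flagged.append((s.name, None))
            continue
        ratio = s.energy_pct / s.cpu_pct
        if ratio >= ratio_threshold:
            flagged.append((s.name, ratio))
    return flagged
```

A process flagged this way is a candidate for the deeper stack-level analysis the template describes; the ratio alone does not establish a cause.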
Protocol improvements
`root-cause-analysis`: Added Phase 3a (Iterative Deepening: broad survey → attribution → deep analysis → cross-component tracing) and Phase 4a (Cross-Component Causal Chains: trigger-response pairs, amplification cascades, leverage point identification). These are domain-agnostic improvements that benefit all investigation tasks.
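As a rough illustration of what quantifying an amplification factor in Phase 4a might look like, the sketch below treats each trigger-response pair as a ratio of response events to trigger events and multiplies the ratios along a chain. The `TriggerResponse` type, the event-count definition of "factor", and the choice of the largest single factor as the leverage point are all assumptions made for this example, not definitions taken from the protocol.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class TriggerResponse:
    """Hypothetical trigger-response pair observed in a trace."""
    trigger: str
    response: str
    trigger_events: int
    response_events: int

    @property
    def factor(self) -> float:
        # Assumed definition: response events produced per trigger event.
        if self.trigger_events <= 0:
            raise ValueError("trigger_events must be positive")
        return self.response_events / self.trigger_events


def cascade_factor(chain: List[TriggerResponse]) -> float:
    """End-to-end amplification: product of per-link factors."""
    return prod(link.factor for link in chain)


def leverage_point(chain: List[TriggerResponse]) -> TriggerResponse:
    """Assumed heuristic: the link with the largest factor."""
    return max(chain, key=lambda link: link.factor)


# Invented example data: a timer wakes a service, which fans out to a second one.
chain = [
    TriggerResponse("timer", "service_a", trigger_events=10, response_events=40),
    TriggerResponse("service_a", "service_b", trigger_events=40, response_events=120),
]
print(cascade_factor(chain), leverage_point(chain).trigger)
```

In a real investigation the counts would come from the trace itself, and a high factor would still need stack-level evidence before being reported as causal.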
Guardrail improvements
Format and template improvements
Validation
```
$ python tests/validate-manifest.py
OK: manifest.yaml protocols match all template frontmatter.
```
Design Decisions