Add Sonde specification audit case study #45
Merged
Alan-Jowett merged 5 commits into microsoft:main on Mar 20, 2026
Real-world case study auditing 5 components (260 requirements) of the Sonde IoT runtime using PromptKit's trifecta audit.

Key results:
- 60 findings across 5 components using one reusable prompt
- Systemic BLE design gap found across modem, node, and gateway
- Cross-reference with prior manual audit: 49% of findings were net-new, almost all design traceability gaps (D1/D6) that the manual audit's test-focused lens missed
- Manual and automated audits are complementary, not competing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The prior audit used ad-hoc LLM prompts, not manual human review. The comparison is structured PromptKit prompt vs. ad-hoc prompt — same tool (LLM), different prompt engineering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new case study documenting a real-world PromptKit traceability audit of the Sonde IoT runtime, including quantified results and a manual-vs-PromptKit comparison.
Changes:
- Introduces a full narrative case study describing the project context, PromptKit audit method, and outcomes
- Adds cross-component metrics tables and drift/severity breakdowns
- Includes a comparison of PromptKit findings vs previously filed ad-hoc audit GitHub issues
- Modem findings: 13 consistently (was 12 in cross-ref table)
- Total: 60 consistently (was 59 in cross-ref section)
- Remove 'no informational' claim that contradicted severity table
- Percentages updated to match corrected totals

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds three categories of findings the trifecta audit cannot detect:
- Semantic test gaps (6 issues, ~45 gaps): tests exist but don't verify deeply enough
- Domain safety (4 issues, ~50+ gaps): BPF safety invariants from a spec outside the trifecta
- Cross-component integration (1 issue, 5 gaps): flows spanning multiple components

Adds complementarity summary table and updates takeaways.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Issue numbers like #357 were ambiguous — could be read as PromptKit issues. Now qualified as 'Sonde #357' with full GitHub URLs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a case study based on a real-world audit of the Sonde IoT runtime — 5 components, 260 requirements, 60 findings from one reusable prompt.
What makes this case study different
Unlike the existing `audit-traceability` case study (hypothetical auth service), this is based on real audit results from a production project, with a manual vs. PromptKit comparison:
Key insight
Manual audit and PromptKit audit found different types of issues:
Systemic finding
BLE pairing design was missing from 3 of 5 component design docs — a pattern invisible to per-component manual review but immediately obvious when the same prompt surfaced it independently in modem (45% gap), node (32% gap), and gateway (30% gap).
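The cross-component pattern described above can be illustrated with a small sketch. This is not PromptKit's implementation; it only shows the idea of grouping per-component findings by topic and flagging any topic that recurs in several components. The `findings` list, the `systemic_topics` helper, the `min_components` threshold, and the "sleep scheduling" entry are all hypothetical and chosen for illustration; only the BLE pairing gap in modem, node, and gateway comes from the case study.

```python
from collections import defaultdict

# Hypothetical (component, topic) pairs, one per audit finding.
# Only the three "BLE pairing" rows reflect the case study; the
# "sleep scheduling" row is invented to show a non-systemic topic.
findings = [
    ("modem", "BLE pairing"),
    ("node", "BLE pairing"),
    ("gateway", "BLE pairing"),
    ("node", "sleep scheduling"),
]


def systemic_topics(findings, min_components=3):
    """Return topics reported independently in at least `min_components` components."""
    components_by_topic = defaultdict(set)
    for component, topic in findings:
        components_by_topic[topic].add(component)
    return {
        topic: sorted(components)
        for topic, components in components_by_topic.items()
        if len(components) >= min_components
    }


if __name__ == "__main__":
    for topic, components in systemic_topics(findings).items():
        print(f"{topic}: {', '.join(components)}")
```

Because every component is audited with the same prompt, findings share a vocabulary, which is what makes this kind of grouping possible at all.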