Add Sonde specification audit case study by Alan-Jowett · Pull Request #45 · microsoft/PromptKit

Alan-Jowett · 2026-03-20T15:00:23Z

Summary

Adds a case study based on a real-world audit of the Sonde IoT runtime — 5 components, 260 requirements, 60 findings from one reusable prompt.

What makes this case study different

Unlike the existing `audit-traceability` case study (hypothetical auth service), this is based on real audit results from a production project, with a manual vs. PromptKit comparison:

Metric	Value
Components audited	5 (protocol, node, gateway, modem, BLE tool)
Requirements analyzed	260
Total findings	60
Findings already known (manual audit)	17 (29%)
Findings partially known	13 (22%)
Net-new findings	29 (49%)
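As a quick arithmetic check on the table above, the sketch below recomputes the shares from the three category counts. The counts and the total of 60 come from the table; the variable and function names are illustrative only. It suggests the three categories sum to 59 rather than 60, and that the listed percentages match a denominator of 59, which lines up with the later commit note that the cross-reference section had used 59.

```python
# Counts copied from the metrics table; names here are illustrative, not from the case study.
counts = {"already known": 17, "partially known": 13, "net-new": 29}
reported_total = 60  # "Total findings" row in the table


def shares(counts, denominator):
    """Return each category's share of the denominator, rounded to a whole percent."""
    return {name: round(100 * value / denominator) for name, value in counts.items()}


category_sum = sum(counts.values())

print(category_sum)                    # 59, one short of the reported total
print(shares(counts, category_sum))    # {'already known': 29, 'partially known': 22, 'net-new': 49}
print(shares(counts, reported_total))  # {'already known': 28, 'partially known': 22, 'net-new': 48}
```

With 60 as the denominator the shares would read 28%, 22%, and 48%, so the table's percentages appear to have been computed against 59.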

Key insight

Manual audit and PromptKit audit found different types of issues:

Manual excelled at validation/test gaps (D2, D7)
PromptKit excelled at design traceability gaps (D1, D6)
The two are complementary, not competing

Systemic finding

BLE pairing design was missing from 3 of 5 component design docs — a pattern invisible to per-component manual review but immediately obvious when the same prompt surfaced it independently in modem (45% gap), node (32% gap), and gateway (30% gap).
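One way to picture why the pattern becomes visible across components: if every per-component audit reports its gaps in the same shape, grouping by topic is trivial. This is a minimal sketch under assumptions, not how PromptKit works internally. The record layout, the `systemic_topics` helper, and the "frame versioning" entry are hypothetical; only the three BLE pairing rows reflect the finding described above.

```python
from collections import defaultdict

# Hypothetical flattened audit output: (component, design topic with a traceability gap).
findings = [
    ("modem", "BLE pairing"),
    ("node", "BLE pairing"),
    ("gateway", "BLE pairing"),
    ("protocol", "frame versioning"),  # made-up entry, for contrast only
]


def systemic_topics(findings, min_components=3):
    """Return topics whose gap shows up in at least `min_components` distinct components."""
    components_by_topic = defaultdict(set)
    for component, topic in findings:
        components_by_topic[topic].add(component)
    return {
        topic: sorted(components)
        for topic, components in components_by_topic.items()
        if len(components) >= min_components
    }


print(systemic_topics(findings))  # {'BLE pairing': ['gateway', 'modem', 'node']}
```

The threshold of three components is an arbitrary choice for the illustration; a per-component review only ever sees one row of this list at a time, which is why the pattern is easy to miss.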

Real-world case study auditing 5 components (260 requirements) of the Sonde IoT runtime using PromptKit's trifecta audit.

Key results:
- 60 findings across 5 components using one reusable prompt
- Systemic BLE design gap found across modem, node, and gateway
- Cross-reference with prior manual audit: 49% of findings were net-new, almost all design traceability gaps (D1/D6) that the manual audit's test-focused lens missed
- Manual and automated audits are complementary, not competing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The prior audit used ad-hoc LLM prompts, not manual human review. The comparison is structured PromptKit prompt vs. ad-hoc prompt: same tool (LLM), different prompt engineering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new case study documenting a real-world PromptKit traceability audit of the Sonde IoT runtime, including quantified results and a manual-vs-PromptKit comparison.

Changes:

Introduces a full narrative case study describing the project context, PromptKit audit method, and outcomes
Adds cross-component metrics tables and drift/severity breakdowns
Includes a comparison of PromptKit findings vs previously filed ad-hoc audit GitHub issues

docs/case-studies/sonde-specification-audit.md

- Modem findings: 13 consistently (was 12 in cross-ref table)
- Total: 60 consistently (was 59 in cross-ref section)
- Remove 'no informational' claim that contradicted severity table
- Percentages updated to match corrected totals

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds three categories of findings the trifecta audit cannot detect:
- Semantic test gaps (6 issues, ~45 gaps): tests exist but don't verify deeply enough
- Domain safety (4 issues, ~50+ gaps): BPF safety invariants from a spec outside the trifecta
- Cross-component integration (1 issue, 5 gaps): flows spanning multiple components

Adds complementarity summary table and updates takeaways.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

docs/case-studies/sonde-specification-audit.md

Issue numbers like #357 were ambiguous — could be read as PromptKit issues. Now qualified as 'Sonde #357' with full GitHub URLs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 20, 2026 15:00

Copilot AI reviewed Mar 20, 2026


Alan Jowett and others added 2 commits March 20, 2026 08:09

Copilot AI review requested due to automatic review settings March 20, 2026 15:12

Copilot started reviewing on behalf of Alan-Jowett March 20, 2026 15:15

Copilot started reviewing on behalf of Alan-Jowett March 20, 2026 15:16

Copilot AI reviewed Mar 20, 2026


docs/case-studies/sonde-specification-audit.md (resolved)

docs/case-studies/sonde-specification-audit.md (outdated, resolved)

Qualify Sonde issue references with repo links

a5e5491


Alan-Jowett merged commit f75fe0d into microsoft:main Mar 20, 2026
1 check passed

Alan-Jowett deleted the add-sonde-case-study branch March 20, 2026 15:23

Alan-Jowett merged 5 commits into microsoft:main from Alan-Jowett:add-sonde-case-study


2 participants
