Add Sonde specification audit case study #45

Merged
Alan-Jowett merged 5 commits into microsoft:main from
Alan-Jowett:add-sonde-case-study
Mar 20, 2026
Conversation

@Alan-Jowett
Member

Summary

Adds a case study based on a real-world audit of the Sonde IoT runtime — 5 components, 260 requirements, 60 findings from one reusable prompt.

What makes this case study different

Unlike the existing `audit-traceability` case study (a hypothetical auth service), this one is based on real audit results from a production project, with a manual vs. PromptKit comparison:

| Metric | Value |
| --- | --- |
| Components audited | 5 (protocol, node, gateway, modem, BLE tool) |
| Requirements analyzed | 260 |
| Total findings | 60 |
| Findings already known (manual audit) | 17 (29%) |
| Findings partially known | 13 (22%) |
| Net-new findings | 29 (49%) |
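The percentages in the table appear to be shares of the 59 cross-referenced findings (17 + 13 + 29), not of the 60-finding total. A minimal sanity check, assuming that denominator (counts taken from the table above):

```python
# Cross-reference counts from the case study table.
known, partial, net_new = 17, 13, 29

# Assumption: percentages are computed over the classified findings only.
classified = known + partial + net_new  # 59

shares = {
    label: round(100 * n / classified)
    for label, n in [("known", known), ("partial", partial), ("net-new", net_new)]
}
print(shares)  # matches the table: known 29%, partial 22%, net-new 49%
```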

Key insight

Manual audit and PromptKit audit found different types of issues:

  • Manual excelled at validation/test gaps (D2, D7)
  • PromptKit excelled at design traceability gaps (D1, D6)
  • The two are complementary, not competing

Systemic finding

BLE pairing design was missing from 3 of 5 component design docs — a pattern invisible to per-component manual review but immediately obvious when the same prompt surfaced it independently in modem (45% gap), node (32% gap), and gateway (30% gap).

Real-world case study auditing 5 components (260 requirements) of the
Sonde IoT runtime using PromptKit's trifecta audit. Key results:

- 60 findings across 5 components using one reusable prompt
- Systemic BLE design gap found across modem, node, and gateway
- Cross-reference with prior manual audit: 49% of findings were
  net-new, almost all design traceability gaps (D1/D6) that the
  manual audit's test-focused lens missed
- Manual and automated audits are complementary, not competing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 20, 2026 15:00
The prior audit used ad-hoc LLM prompts, not manual human review.
The comparison is structured PromptKit prompt vs. ad-hoc prompt —
same tool (LLM), different prompt engineering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a new case study documenting a real-world PromptKit traceability audit of the Sonde IoT runtime, including quantified results and a manual-vs-PromptKit comparison.

Changes:

  • Introduces a full narrative case study describing the project context, PromptKit audit method, and outcomes
  • Adds cross-component metrics tables and drift/severity breakdowns
  • Includes a comparison of PromptKit findings vs previously filed ad-hoc audit GitHub issues

Alan Jowett and others added 2 commits March 20, 2026 08:09
- Modem findings: 13 consistently (was 12 in cross-ref table)
- Total: 60 consistently (was 59 in cross-ref section)
- Remove 'no informational' claim that contradicted severity table
- Percentages updated to match corrected totals

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds three categories of findings the trifecta audit cannot detect:
- Semantic test gaps (6 issues, ~45 gaps): tests exist but don't
  verify deeply enough
- Domain safety (4 issues, ~50+ gaps): BPF safety invariants from
  a spec outside the trifecta
- Cross-component integration (1 issue, 5 gaps): flows spanning
  multiple components

Adds complementarity summary table and updates takeaways.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Issue numbers like #357 were ambiguous — they could be read as PromptKit
issues. They are now qualified as 'Sonde #357' with full GitHub URLs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Alan-Jowett Alan-Jowett merged commit f75fe0d into microsoft:main Mar 20, 2026
1 check passed
@Alan-Jowett Alan-Jowett deleted the add-sonde-case-study branch March 20, 2026 15:23