
fix: add verbatim inclusion rule to bootstrap assembly instructions#138

Merged
Alan-Jowett merged 3 commits into microsoft:main from Alan-Jowett:fix/bootstrap-verbatim-assembly
Mar 30, 2026


@Alan-Jowett
Member

Summary

Fixes #137 — bootstrap.md's assembly instructions caused LLMs to summarize and condense protocol content instead of including it verbatim, producing assembled prompts with as little as 6–7% of the source protocol content.

Problem

The Assembly Process section used ambiguous language:

  • "Load the persona file": LLMs interpreted "load" as "read and understand" rather than "copy the full text"
  • <protocol 1 content>: interpreted as "include the substance" rather than "transcribe verbatim"
  • "Contains condensed persona or protocol directives": although this applied only to agent-instructions output, the word "condensed" bled into the LLM's behavior for all output modes

Observed in eBPF for Windows workflow prompts where:

  • Thread Safety and Security protocols were reduced to phase headings only (6–7% retention)
  • Kernel Correctness was missing all 7 known-safe patterns (false-positive suppression rules)
  • Self-Verification lost its entire completeness gate checklist
  • Adversarial Falsification lost the 4-step disproof methodology
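A retention figure like the 6-7% above can be approximated with a simple line-level comparison. The sketch below is hypothetical (the PR does not say how its numbers were measured): `line_retention` counts the non-blank source lines that survive verbatim in the assembled prompt.

```python
def line_retention(source: str, assembled: str) -> float:
    """Hypothetical metric: fraction of non-blank source lines that appear
    verbatim (ignoring surrounding whitespace) somewhere in the assembled prompt."""
    source_lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not source_lines:
        return 1.0  # nothing to retain
    assembled_lines = {ln.strip() for ln in assembled.splitlines()}
    kept = sum(1 for ln in source_lines if ln in assembled_lines)
    return kept / len(source_lines)


# Hypothetical protocol excerpt: two phase headings plus operational detail.
protocol = """## Phase 1: Enumerate shared state
- List every global and static variable.
- Note which locks guard each one.
## Phase 2: Check lock ordering
- Record acquisition order per code path.
"""
# A summarized assembly that kept only the phase headings.
summarized = "## Phase 1: Enumerate shared state\n## Phase 2: Check lock ordering\n"

print(f"{line_retention(protocol, protocol):.0%}")    # 100%
print(f"{line_retention(protocol, summarized):.0%}")  # 40%
```

Line matching is deliberately strict: a paraphrased rule counts as lost, which is exactly the failure mode a verbatim-inclusion requirement is meant to prevent.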

Changes

  1. Assembly steps: "Load" → "Read and include full body text verbatim" for all component types
  2. New section: "Verbatim Inclusion Rule" with explicit anti-summarization guidance and concrete examples
  3. Output template placeholders: <persona content> → <complete body of the persona file — verbatim, not summarized>
  4. Scoped condensation: "condensed" language is now explicitly scoped to agent-instructions output mode only, with a note that it never applies to raw prompt output

Validation

  • python tests/validate-manifest.py passes ✅
  • No changes to CLI assembly code (assemble.js already handles this correctly via programmatic file concatenation)

The Assembly Process section in bootstrap.md used ambiguous language
("Load the persona file", "<protocol 1 content>") that LLMs
interpreted as permission to summarize and condense protocol content.

This produced assembled prompts with as little as 6-7% of the source
protocol content — some protocols were reduced to phase headings
with zero operational detail, and critical sections like Known-Safe
Patterns (false-positive suppression rules) were completely omitted.

Changes:
- Replace "Load" with "Read and include full body text verbatim"
  for all component types in the assembly steps
- Add explicit Verbatim Inclusion Rule section with anti-summarization
  guidance and concrete examples
- Change output template placeholders from "<persona content>" to
  "<complete body of the persona file — verbatim, not summarized>"
- Scope the "condensed" language to agent-instructions output mode
  only, with explicit note that condensation never applies to raw
  prompt output

Fixes microsoft#137

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 30, 2026 18:03
Contributor

Copilot AI left a comment


Pull request overview

This PR updates bootstrap.md to ensure PromptKit’s LLM-driven prompt assembly includes referenced component content verbatim (rather than summarizing/condensing), addressing observed protocol content loss when the bootstrap instructions are followed.

Changes:

  • Reworded assembly steps to explicitly require verbatim inclusion of persona/protocol/taxonomy/format/template bodies.
  • Added a “Verbatim Inclusion Rule” section with explicit anti-summarization guidance.
  • Updated raw prompt output placeholders and clarified that condensation applies only to agent-instructions output mode.

…y, step refs, and align earlier Load wording

- Verbatim Inclusion Rule: enumerate allowed transformations explicitly
  (frontmatter removal, param substitution, whitespace trimming) instead
  of saying 'unmodified' which conflicts with param substitution
- Taxonomy step: add 'If one or more taxonomies are referenced' condition
  and make output template plural with omit-if-empty note
- Fix step reference: 'step 6' -> 'step 5b' for output mode selection
- Align steps 5a and 8 with verbatim wording (was still using 'Load')

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
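The allowed transformations listed above (frontmatter removal, parameter substitution, whitespace trimming) can be sketched as a small extraction function. This is a hypothetical Python illustration, not PromptKit's `assemble.js`; the `---` frontmatter delimiters and the `{{name}}` parameter syntax are assumptions. Everything outside those three transforms passes through unchanged.

```python
import re

# Assumption: a component file may begin with a frontmatter block delimited by "---" lines.
_FRONTMATTER = re.compile(r"\A---\r?\n.*?\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)
# Assumption: parameters appear in component bodies as {{name}}.
_PARAM = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def component_body(raw: str, params: dict[str, str]) -> str:
    """Hypothetical helper: return the component body with only the allowed
    transformations applied; all other text is kept verbatim."""
    # 1. Strip the leading frontmatter block (only at the very start of the file).
    body = _FRONTMATTER.sub("", raw, count=1)
    # 2. Substitute known parameters; unknown placeholders are left untouched, not dropped.
    body = _PARAM.sub(lambda m: params.get(m.group(1), m.group(0)), body)
    # 3. Trim leading and trailing whitespace only.
    return body.strip()


raw = "---\nname: thread-safety\n---\n\n# Thread Safety\n\nAnalyze {{target}} for data races.\n"
print(component_body(raw, {"target": "example.c"}))
```

Note what the function does not do: it never shortens, reorders, or rewrites lines, so a summarization step has nowhere to hide.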
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

…cope

- Step 5a: make format component conditional — skip when template
  declares format: null or omits the format field, matching how
  cli/lib/assemble.js behaves
- Verbatim Inclusion Rule: clarify that allowed transforms apply to
  component body text extraction, not overall document structure
  (section headers and separators are assembly structure)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
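A hypothetical sketch of the conditional format step and the body/structure split described above. The function names, section titles, and `---` separator are assumptions for illustration and do not reproduce `cli/lib/assemble.js`. A missing `format` key and `format: null` (parsed as `None`) both skip the format section, and an empty taxonomy list contributes no sections.

```python
from typing import Any, Optional


def format_body(template_meta: dict[str, Any], formats: dict[str, str]) -> Optional[str]:
    """Hypothetical step 5a: return the format component body, or None to skip it.
    A missing `format` key and an explicit `format: null` both yield None."""
    name = template_meta.get("format")
    if name is None:
        return None
    return formats[name]


def assemble(persona: str, protocols: list[str], taxonomies: list[str],
             fmt: Optional[str], task: str) -> str:
    """Join component bodies (already extracted verbatim) under hypothetical
    section headers. Headers and separators are assembly structure, not
    component content, so they are added here rather than in the bodies."""
    sections = [("Identity", persona)]
    sections += [("Protocol", body) for body in protocols]
    sections += [("Classification Taxonomy", body) for body in taxonomies]  # none if empty
    if fmt is not None:
        sections.append(("Output Format", fmt))
    sections.append(("Task", task))
    return "\n\n---\n\n".join(f"# {title}\n\n{body}" for title, body in sections)


formats = {"example-report": "Report findings as a table."}  # hypothetical format name
print(format_body({"name": "review"}, formats))            # None (field omitted)
print(format_body({"format": None}, formats))              # None (format: null)
print(format_body({"format": "example-report"}, formats))  # Report findings as a table.
```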
@Alan-Jowett Alan-Jowett merged commit 1ec0475 into microsoft:main Mar 30, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

Bug: bootstrap.md assembly instructions cause LLM to summarize protocols instead of including verbatim

2 participants