
Add CWE ingestion prompt and skill for per-domain taxonomy generation#229

Merged
Alan-Jowett merged 4 commits into microsoft:main from Alan-Jowett:cwe-ingestion-prompt
Apr 8, 2026


@Alan-Jowett Alan-Jowett commented Apr 8, 2026

Summary

Adds a reusable /ingest-cwe-taxonomies slash command (prompt file) and matching CLI skill that implements the CWE taxonomy ingestion pipeline from #228.

Files Added

| File | Purpose |
| --- | --- |
| .github/prompts/ingest-cwe-taxonomies.prompt.md | Copilot Chat slash command (/ingest-cwe-taxonomies) |
| .github/skills/ingest-cwe-taxonomies/SKILL.md | Copilot CLI skill (same name) |

What the Prompt Does

When invoked, the prompt guides an agent through a 6-phase pipeline:

  1. Acquisition — Downloads/locates official MITRE CWE XML (version 4.x)
  2. Normalization — Writes scripts/ingest-cwe.py to parse XML into structured JSON
  3. Domain Mapping — Maps 900+ CWEs to 13 audit domains using a 4-priority algorithm:
    • Priority 1: CWE View membership (language-specific views like View 658/C, View 919/Mobile)
    • Priority 2: Applicable_Platforms field matching
    • Priority 3: Consequence and context analysis
    • Priority 4: PromptKit manual override rules
  4. Taxonomy Generation — Produces 13 taxonomies/cwe-<domain>.md files in PromptKit format
  5. Integration — Updates manifest.yaml, generates version diffs
  6. Verification — Domain exclusion sanity checks (no XSS in kernel, no IRQL in web, etc.)
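
The Phase 3 priority order can be illustrated with a minimal Python sketch. It assumes that the first priority level producing any domain wins, which the summary does not state outright. The `CweEntry` type, the four lookup tables, and the view-to-domain and platform-to-domain assignments are hypothetical placeholders, not the contents of scripts/ingest-cwe.py.

```python
from dataclasses import dataclass, field


@dataclass
class CweEntry:
    """Hypothetical, simplified view of one normalized CWE record."""
    cwe_id: int
    view_ids: set = field(default_factory=set)   # CWE views the entry belongs to
    platforms: set = field(default_factory=set)  # Applicable_Platforms values
    description: str = ""


# All four tables are illustrative assumptions, not the real mapping data.
VIEW_DOMAINS = {
    658: {"kernel-mode-c-cpp", "native-user-mode-c-cpp"},  # View 658 (C)
    919: {"mobile-app"},                                   # View 919 (Mobile)
}
PLATFORM_DOMAINS = {"JavaScript": {"web-js-ts"}, "C#": {"managed-dotnet"}}
CONTEXT_KEYWORDS = {"container": {"container-k8s"}, "firmware": {"firmware-embedded"}}
OVERRIDES = {}  # cwe_id -> set of domains, maintained by hand


def _union(keys, table):
    """Collect the domains for every key that appears in the table."""
    found = set()
    for key in keys:
        found |= table.get(key, set())
    return found


def by_view(entry):
    return _union(entry.view_ids, VIEW_DOMAINS)


def by_platform(entry):
    return _union(entry.platforms, PLATFORM_DOMAINS)


def by_context(entry):
    text = entry.description.lower()
    return _union([kw for kw in CONTEXT_KEYWORDS if kw in text], CONTEXT_KEYWORDS)


def by_override(entry):
    return set(OVERRIDES.get(entry.cwe_id, set()))


# Rules in priority order 1..4, as listed in the PR summary.
RULES = [by_view, by_platform, by_context, by_override]


def map_domains(entry):
    """Return (priority, domains) for the first rule that matches, else (None, set())."""
    for priority, rule in enumerate(RULES, start=1):
        domains = rule(entry)
        if domains:
            return priority, domains
    return None, set()
```

Returning the matching priority alongside the domains keeps the assignment auditable: a reviewer can see whether an entry landed in a domain because of view membership or because of a weaker signal.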

Domain Registry (13 domains)

kernel-mode-c-cpp, native-user-mode-c-cpp, managed-dotnet, web-js-ts, web-backend, cloud-service, iac, firmware-embedded, crypto-protocols, data-processing, cli-tools, mobile-app, container-k8s
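
As a rough sketch of how the registry could drive both the output file names and the Phase 6 exclusion check, assuming a hypothetical `EXCLUDED` table (only the "no XSS in kernel" rule is shown, using CWE-79 for cross-site scripting) and hypothetical helper names:

```python
# The 13 audit domains from the registry above.
DOMAINS = [
    "kernel-mode-c-cpp", "native-user-mode-c-cpp", "managed-dotnet",
    "web-js-ts", "web-backend", "cloud-service", "iac",
    "firmware-embedded", "crypto-protocols", "data-processing",
    "cli-tools", "mobile-app", "container-k8s",
]

# Hypothetical exclusion table: CWE IDs that must never appear in a domain.
# CWE-79 is cross-site scripting; other rules would be added the same way.
EXCLUDED = {"kernel-mode-c-cpp": {79}}


def taxonomy_path(domain):
    """Build the taxonomies/cwe-<domain>.md path for a registered domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    return f"taxonomies/cwe-{domain}.md"


def exclusion_violations(assignments):
    """Return sorted (domain, cwe_id) pairs that break an exclusion rule.

    `assignments` maps a domain name to the set of CWE IDs assigned to it.
    """
    violations = []
    for domain, cwe_ids in assignments.items():
        for cwe_id in cwe_ids & EXCLUDED.get(domain, set()):
            violations.append((domain, cwe_id))
    return sorted(violations)
```

An empty list from `exclusion_violations` would mean the generated taxonomies pass this particular check.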

Components Assembled

  • Persona: security-auditor (CWE/vulnerability expertise)
  • Protocols: anti-hallucination + self-verification + operational-constraints (all verbatim)
  • Format: multi-artifact (adapted for taxonomy + metadata output)
  • Template: Custom 6-phase CWE ingestion workflow

Re-run Workflow

Download new CWE XML → invoke /ingest-cwe-taxonomies → review diff report → commit.
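
The diff report in that workflow could be computed along these lines. This is a sketch under the assumption that each CWE version is reduced to a mapping from CWE ID to some comparable value (here a name string); the function name and the report keys are hypothetical.

```python
def diff_versions(old, new):
    """Compare two {cwe_id: value} mappings from consecutive CWE releases."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # Present in both releases but with a different value.
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


if __name__ == "__main__":
    previous = {79: "XSS", 89: "SQL Injection", 120: "Buffer Copy"}
    current = {79: "Cross-site Scripting", 89: "SQL Injection", 1336: "Template Injection"}
    print(diff_versions(previous, current))
```

Sorting each list keeps the report stable between runs, which makes it easier to review in a commit.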

Testing

  • python tests/validate-manifest.py passes ✅
  • Prompt file and skill follow existing conventions (see boot.prompt.md, promptkit/SKILL.md)

Partially addresses #228 — template integration (domain parameter in security audit templates) is tracked as a follow-up.

Copilot AI review requested due to automatic review settings April 8, 2026 15:17

Copilot AI left a comment

Pull request overview

Adds a new Copilot Chat slash command and matching Copilot CLI skill to run a reproducible, multi-phase workflow for ingesting the official MITRE CWE XML corpus and generating per-domain PromptKit CWE taxonomy artifacts (per Issue #228’s ingestion pipeline concept).

Changes:

  • Added /ingest-cwe-taxonomies prompt file that specifies a 6-phase ingestion → normalization → mapping → taxonomy generation → integration → verification workflow.
  • Added a corresponding Copilot CLI skill that delegates execution to the prompt file.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| .github/skills/ingest-cwe-taxonomies/SKILL.md | New CLI skill entrypoint that directs users/agents to run the CWE ingestion prompt. |
| .github/prompts/ingest-cwe-taxonomies.prompt.md | New slash-command prompt implementing the end-to-end CWE ingestion and taxonomy generation pipeline. |


Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Alan Jowett and others added 4 commits April 8, 2026 08:53
Adds a reusable /ingest-cwe-taxonomies slash command (prompt file) and
matching CLI skill that implements the CWE taxonomy ingestion pipeline
from Issue microsoft#228. The prompt guides an agent through:

- Downloading and parsing official MITRE CWE XML
- Mapping 900+ CWEs to 13 audit domains using a 4-priority algorithm
- Generating per-domain PromptKit taxonomy files
- Updating manifest.yaml and running sanity checks
- Producing version diffs for incremental CWE updates

Closes microsoft#228 (partially — template integration is a follow-up)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace hardcoded CWE XML namespace with runtime namespace detection
- Resolve CWE views by name first, use IDs as optional hints only
- Remove Java-to-.NET analogy from core view table (now requires
  explicit override rule if desired)
- Replace underspecified Priority 3 context analysis with deterministic
  keyword-based scoring heuristic with exact term lists and thresholds
- Scope .gitignore pattern from *.xml.zip to cwec_*.xml.zip

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
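
Runtime namespace detection, as described in the first bullet of this commit, might look like the following sketch. It reads the namespace from the root element's tag instead of hardcoding it. The helper names are hypothetical, and the sample document (including its namespace URI) is illustrative input only, not a claim about what scripts/ingest-cwe.py does.

```python
import xml.etree.ElementTree as ET


def detect_namespace(root):
    """Return the namespace URI of the root element, or "" if it has none.

    ElementTree stores qualified tags as "{uri}LocalName".
    """
    if root.tag.startswith("{"):
        return root.tag[1:root.tag.index("}")]
    return ""


def qualified(namespace, local_name):
    """Build a tag name that matches with or without a namespace."""
    return f"{{{namespace}}}{local_name}" if namespace else local_name


def weakness_ids(xml_text):
    """List the ID attribute of every Weakness element in the document."""
    root = ET.fromstring(xml_text)
    tag = qualified(detect_namespace(root), "Weakness")
    return [element.get("ID") for element in root.iter(tag)]


# Illustrative sample input; a real catalog is far larger.
SAMPLE = (
    '<Weakness_Catalog xmlns="http://cwe.mitre.org/cwe-7">'
    '<Weaknesses><Weakness ID="79"/><Weakness ID="89"/></Weaknesses>'
    "</Weakness_Catalog>"
)

if __name__ == "__main__":
    print(weakness_ids(SAMPLE))
```

Because the tag is built from whatever namespace the file declares, the same code keeps working if a future CWE release changes the namespace URI.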
- Replace 'header comment or frontmatter' with provenance metadata
  guidance: top-level _meta object for JSON, comment/frontmatter for
  Markdown/YAML
- Drop Rarely/Undetermined prevalence as secondary/optional entries;
  simply ignore them for domain assignment (deterministic, no schema
  ambiguity)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
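
A minimal sketch of the two rules in this commit: provenance stored in a top-level `_meta` object for JSON output, and platform entries with Rarely or Undetermined prevalence ignored during domain assignment. The record layout and every field name other than `_meta` are assumptions made for illustration.

```python
import json

# Prevalence values that are ignored for domain assignment.
IGNORED_PREVALENCE = {"Rarely", "Undetermined"}


def platforms_for_mapping(platforms):
    """Keep platform names whose prevalence is not ignored.

    `platforms` is a list of {"name": ..., "prevalence": ...} dicts
    (hypothetical layout). Entries without a prevalence are kept.
    """
    return [p["name"] for p in platforms if p.get("prevalence") not in IGNORED_PREVALENCE]


def with_meta(weaknesses, cwe_version, source):
    """Wrap normalized records in a document carrying provenance metadata."""
    return {
        "_meta": {
            "cwe_version": cwe_version,          # hypothetical field name
            "source": source,                    # hypothetical field name
            "generator": "scripts/ingest-cwe.py",
        },
        "weaknesses": weaknesses,
    }


if __name__ == "__main__":
    sample = [
        {"name": "C", "prevalence": "Often"},
        {"name": "Java", "prevalence": "Rarely"},
    ]
    document = with_meta([{"id": 120, "platforms": platforms_for_mapping(sample)}],
                         cwe_version="4.x", source="cwec_latest.xml.zip")
    print(json.dumps(document, indent=2))
```

Dropping the ignored entries outright, rather than tagging them as secondary, means the output schema needs no extra field to distinguish primary from optional domains.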
Phase 2 now instructs the agent to run the existing script first,
with the full parsing/mapping specification preserved as a reference
for validation and regeneration. Phase 3 reframed as algorithm
documentation rather than imperative instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Alan-Jowett Alan-Jowett force-pushed the cwe-ingestion-prompt branch from ad0d505 to 4714930 April 8, 2026 15:57
Copilot AI review requested due to automatic review settings April 8, 2026 15:57
@Alan-Jowett Alan-Jowett merged commit f300bb1 into microsoft:main Apr 8, 2026
3 checks passed
@Alan-Jowett Alan-Jowett deleted the cwe-ingestion-prompt branch April 8, 2026 15:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment on lines +2 to +5
description: 'Ingest CWE XML from MITRE and generate per-domain security audit taxonomies for PromptKit'
agent: 'agent'
tools: ['search/codebase', 'edit']
argument-hint: 'Path to CWE XML file, or "latest" to download cwec_latest.xml.zip from MITRE'

Copilot AI Apr 8, 2026

Frontmatter limits tools to ['search/codebase', 'edit'], but the workflow supports latest (download from MITRE) and instructs running python scripts/ingest-cwe.py and extracting a zip. With these tools, an agent typically can’t fetch remote content or execute shell commands, so the latest path (and “run the script” steps) may be non-actionable. Consider either (a) adding the appropriate toolset(s) (e.g., web/fetch) and explicitly describing how to handle zip extraction/execution, or (b) removing/softening the latest automation and requiring the user to provide a local XML path + run commands themselves.

Copilot uses AI. Check for mistakes.
Comment on lines +762 to +769
2. **Update `.gitignore`.** Add an entry to exclude the raw downloaded
CWE XML file (but NOT the `data/cwe/` processed artifacts):

```
# Raw CWE XML downloads (large, available from MITRE)
cwec_*.xml.zip
cwec_*.xml
```

Copilot AI Apr 8, 2026

The .gitignore instruction says to “add” cwec_*.xml.zip / cwec_*.xml, but this repo already has a broad *.xml.zip ignore (see .gitignore:8). If a user follows these steps, they’ll likely end up with both patterns and still ignore all *.xml.zip artifacts. Consider updating the instruction to replace any existing *.xml.zip rule with the CWE-scoped patterns to avoid unintentionally ignoring unrelated XML zip files.
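
The replacement this comment suggests could be automated roughly as follows. This is a sketch, not part of the PR: `scope_gitignore` is a hypothetical helper that drops the broad `*.xml.zip` rule and appends the CWE-scoped patterns only if they are missing.

```python
BROAD_PATTERN = "*.xml.zip"
SCOPED_PATTERNS = ["cwec_*.xml.zip", "cwec_*.xml"]


def scope_gitignore(text):
    """Replace the broad XML-zip rule with the CWE-scoped patterns.

    Other lines are preserved in order; running this twice changes nothing.
    """
    kept = [line for line in text.splitlines() if line.strip() != BROAD_PATTERN]
    present = {line.strip() for line in kept}
    for pattern in SCOPED_PATTERNS:
        if pattern not in present:
            kept.append(pattern)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(scope_gitignore("node_modules/\n*.xml.zip\n"), end="")
```

Making the function idempotent matters here because the ingestion prompt is meant to be re-run for each new CWE release.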

Comment on lines +18 to +28
- **CWE source**: A local path to a CWE XML file, or `latest` to download
the current release from `https://cwe.mitre.org/data/xml/cwec_latest.xml.zip`.
- Any overrides to the domain registry or mapping rules the user specifies.

## Output

- Per-domain taxonomy files at `taxonomies/cwe-<domain>.md` (13 domains)
- Normalized CWE data at `data/cwe/<version>/`
- Updated `manifest.yaml` with new taxonomy entries
- Reusable ingestion script at `scripts/ingest-cwe.py`
- Diff report if a previous CWE version exists

Copilot AI Apr 8, 2026

This skill advertises a latest mode that downloads from cwe.mitre.org and outputs multiple repo modifications (new/updated taxonomy files, data/cwe/..., manifest.yaml, and scripts/ingest-cwe.py). It would be helpful to explicitly call out the required permissions/capabilities (network access for download, filesystem write access, and the ability to run Python) so users aren’t surprised when invoking the skill in constrained environments.
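
One way to surface those requirements early is a preflight check run before the pipeline starts. The sketch below is an assumption about how that might be done, with hypothetical function names. It tests filesystem write access and, only for the `latest` mode, network reachability of cwe.mitre.org. The ability to run Python is implied by the script executing at all.

```python
import os
import socket
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created in the directory."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(repo_root, source):
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if not can_write(repo_root):
        problems.append(f"no write access to {repo_root}")
    if source == "latest":
        if not can_reach("cwe.mitre.org"):
            problems.append("cannot reach cwe.mitre.org to download the CWE XML")
    elif not os.path.isfile(source):
        problems.append(f"CWE XML file not found: {source}")
    return problems
```

Reporting every problem at once, rather than failing on the first, lets a user in a constrained environment see the full set of missing capabilities in a single run.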
