microsoft · benhillis · May 13, 2026 · May 13, 2026 · May 13, 2026 · May 13, 2026
@@ -0,0 +1,71 @@
+name: AI issue triage (dry-run)
+
+# Companion to the rule-based wti triage (see new_issue.yml). Runs in parallel
+# on newly-opened issues, asks an LLM to classify component / type, detect
+# missing template fields, and surface possible duplicates, then posts a single
+# collapsible maintainer-facing comment.
+#
+# v1 is dry-run: no labels are applied, no issue state is changed.
+# See triage/ai/README.md for full design and graduation plan.
+
+on:
+  workflow_dispatch:
+    inputs:
+      issue:
+        description: 'Issue number to (re-)triage'
+        required: true
+        type: number
+      force:
+        description: 'Bypass the input-sha skip check'
+        required: false
+        type: boolean
+        default: false
+  # Initial rollout is manual-only via workflow_dispatch so maintainers can
+  # vet output quality on real issues before opening the firehose. Once the
+  # comment style and signal-to-noise are validated, uncomment the block
+  # below to trigger automatically on every newly-opened issue.
+  # issues:
+  #   types: [opened]
+
+permissions:
+  issues: write
+  # `models: read` is the documented permission for GitHub Models inference
+  # from Actions. See https://github.com/actions/ai-inference#usage and
+  # https://docs.github.com/en/github-models.
+  models: read
+  contents: read
+
+concurrency:
+  # Final fallback to github.run_id guards against an empty group key (which
+  # would collapse all runs into one) if both event payload and inputs are missing.
+  group: ai-triage-${{ github.event.issue.number || inputs.issue || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  ai-triage:
+    name: Run ai_triage.py
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install gh-models extension
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: gh extension install github/gh-models
+
+      - name: Run AI triage
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PYTHONIOENCODING: utf-8
+          AI_TRIAGE_MODEL: openai/gpt-4o-mini
+          ISSUE_NUMBER: ${{ github.event.issue.number || inputs.issue }}
+          FORCE_FLAG: ${{ inputs.force == true && '--force' || '' }}
+        run: |
+          python triage/ai/ai_triage.py --issue "$ISSUE_NUMBER" $FORCE_FLAG
@@ -0,0 +1,36 @@
+name: AI triage tests
+
+# Unit tests for the AI triage script (triage/ai/ai_triage.py). Pure-function
+# only — no network, no model calls — so this is safe to run on PRs from forks.
+
+on:
+  workflow_dispatch:
+  pull_request:
+    paths:
+      - 'triage/ai/**'
+      - '.github/workflows/ai_triage*.yml'
+
+permissions:
+  contents: read
+
+jobs:
+  pytest:
+    name: pytest
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install pytest
+        run: pip install --quiet pytest
+
+      - name: Run unit tests
+        env:
+          PYTHONIOENCODING: utf-8
+        run: python -m pytest triage/ai -v
@@ -0,0 +1,3 @@
+__pycache__/
+*.pyc
+.pytest_cache/
@@ -0,0 +1,159 @@
+# AI issue triage (v1, dry-run)
+
+A complementary triage agent for the **microsoft/WSL** GitHub repository. Reads
+newly-opened issues, asks an LLM via [GitHub Models][gh-models] to classify
+them, and posts a single collapsible maintainer-facing comment with:
+
+* a 1–3 sentence plain-English summary,
+* a suggested issue type (`bug`, `feature`, `question`, …),
+* suggested component labels (e.g. `network`, `msix`, `GPU`),
+* missing bug-template fields (Windows version, repro steps, …),
+* up to ~5 possible duplicate issues.
+
+This is **dry-run only**. The agent never applies labels and never changes
+issue state. It is purely additive to the existing rule-based [`wti`][wti]
+pipeline driven by [`triage/config.yml`](../config.yml).
+
+## Files
+
+| Path | Purpose |
+|---|---|
+| `triage/ai/ai_triage.py` | The Python script. Reads the issue, fetches duplicate candidates, calls `gh models run`, validates the output, upserts the comment. |
+| `triage/ai/prompt.md` | The system+user prompt. The script substitutes `{{ISSUE_NUMBER}}`, `{{ISSUE_TITLE}}`, `{{ISSUE_BODY}}`, `{{CANDIDATES_JSON}}`. |
+| `.github/workflows/ai_triage.yml` | The Actions workflow. Initial rollout is **manual `workflow_dispatch` only**; the `issues.opened` trigger is committed but commented out and can be enabled once the comment quality has been validated on real issues. |
+
+## How to run locally
+
+Prerequisites:
+
+* Python 3.10+ (the script uses `list[str]` style annotations).
+* `gh` CLI authenticated with at least `repo` and `read:user` scopes.
+* The `gh-models` extension: `gh extension install github/gh-models`.
+
+```bash
+# Dry-run: print the rendered comment to stdout, do not post anything.
+python triage/ai/ai_triage.py --issue 40488 --dry-run
+
+# Force a re-run even if the input-sha marker says nothing changed.
+python triage/ai/ai_triage.py --issue 40488 --dry-run --force
+
+# Use a different GitHub Models model.
+python triage/ai/ai_triage.py --issue 40488 --dry-run --model openai/gpt-4.1-mini
+
+# Or via env var (matches the workflow):
+AI_TRIAGE_MODEL=openai/gpt-4.1-mini python triage/ai/ai_triage.py --issue 40488 --dry-run
+```
+
+When run **without** `--dry-run`, the script will upsert a comment on the issue.
+Don't do this against the live repo from a developer machine unless you're
+deliberately testing — the workflow is the intended posting path.
+
+## Skip rules
+
+The agent does not run for issues where any of these is true:
+
+* the issue is closed or locked,
+* the author is a bot (`type == "Bot"` or login matches `*[bot]`),
+* the author's `author_association` is `OWNER`, `MEMBER`, or `COLLABORATOR`
+  (maintainer-authored issues don't need this triage),
+* the body is shorter than 50 characters (likely empty or spam),
+* the issue's input hash already matches the marker on an existing comment
+  (use `--force` to override).
+
+## Idempotency
+
+Each posted comment includes a hidden marker:
+
+```html
+<!-- ai-triage:v1 input-sha=<hex> prompt-sha=<hex> -->
+```
+
+`input-sha` is computed over `(title, body, prompt-version)`. `prompt-sha` is
+computed over the prompt template content. Re-runs that produce the same
+hashes are skipped. After the model call, the script re-fetches the issue and
+recomputes the hash — if it changed during the call, the run is aborted so a
+slow run never overwrites a newer one.
+
+Bumping `PROMPT_VERSION` in `ai_triage.py` (or editing `prompt.md`) invalidates
+existing markers and forces the next run to re-post.
+
+## Untrusted-input hardening
+
+The model is treated as an untrusted text generator:
+
+* JSON output is validated against a strict schema; any deviation aborts
+  silently (no comment posted).
+* `component_labels` are intersected with a hardcoded allowlist **and** the
+  live `gh label list` for the repo.
+* `duplicate_candidate_numbers` are intersected with the candidate set we
+  pre-fetched via `gh search issues` — the model cannot invent issue numbers.
+* The maintainer summary is HTML-escaped and run through a sanitizer that
+  strips Markdown links, raw URLs, code fences, and defangs `@mentions` with
+  a zero-width space.
+* The prompt sent to the model contains only the issue title and body — never
+  any comments. This means the model can never see (and therefore can never
+  summarize) its own prior `<!-- ai-triage:v1 -->` comment, even on re-runs.
+
+The prompt itself includes a hard rule telling the model to ignore
+instructions inside the issue body.
+
+## Failure mode
+
+Two tiers:
+
+* **Silent (exit 0, workflow green):** model errors, JSON-parse failures,
+  schema violations, rate limits, transient `gh` API errors on read paths,
+  staleness aborts. The script logs to stderr; users see nothing.
+* **Loud (exit 1, workflow red):** comment-upsert failures (permission 403,
+  5xx), and any unexpected exception escaping the inline handlers. These
+  indicate a real maintainer-actionable problem (misconfigured permissions,
+  programming bug) and surface as a failed workflow run.
+
+The split is intentional: model flakes and bot-vs-issue races shouldn't page
+anyone, but a permission misconfig that prevents the agent from ever posting
+should fail visibly.
+
+## Cost / abuse posture
+
+* `concurrency: cancel-in-progress` per issue prevents pile-ups on rapid edits.
+* The body is truncated to 8000 characters before prompting.
+* Duplicate retrieval is capped to ~15 candidates.
+* The trigger is `issues.opened` only in v1 (no `edited`, no comment events).
+
+If GitHub Models quota becomes a concern, mitigations to consider:
+
+* tighten the body-length floor,
+* add an author reputation prefilter (e.g. require N prior comments),
+* widen the body truncation cap downward,
+* downgrade to a smaller model.
+
+## Graduation plan (v2 and beyond)
+
+v1 deliberately does **not** apply labels. Before turning that on:
+
+1. Run v1 in dry-run for a sustained period; spot-check a sample.
+2. Compare suggested labels to what maintainers actually applied.
+3. Pick a per-label confidence/calibration threshold.
+4. Auto-apply only the safest labels first (suggested order: component labels
+   that maintainers agree with most often). Type labels and any process labels
+   (`needs-author-feedback`, `duplicate`, …) stay maintainer-only.
+
+Other v2 candidates:
+
+* Trigger on `issues.edited` with throttling.
+* Trigger on first author comment to refresh the summary.
+* Embed-based duplicate retrieval instead of keyword search.
+* Cross-reference the diagnostic findings from `wti` to enrich the summary.
+
+## Relationship to wti
+
+`wti` (rule-based, runs from `new_issue.yml` / `new_issue_comment.yml` /
+`issue_edited.yml`) is the existing pipeline. It excels at parsing attached
+ETL log files against known signatures, applying tags like `init-crash` /
+`network`, and posting canned remediation messages.
+
+This AI agent is **complementary**, not a replacement. It works on the issue
+prose. The two run independently and do not share state.
+
+[gh-models]: https://github.com/github/gh-models
+[wti]: https://github.com/OneBlue/wti