71 changes: 71 additions & 0 deletions .github/workflows/ai_triage.yml
@@ -0,0 +1,71 @@
name: AI issue triage (dry-run)

# Companion to the rule-based wti triage (see new_issue.yml). Runs in parallel
# on newly-opened issues, asks an LLM to classify component / type, detect
# missing template fields, and surface possible duplicates, then posts a single
# collapsible maintainer-facing comment.
#
# v1 is dry-run: no labels are applied, no issue state is changed.
# See triage/ai/README.md for full design and graduation plan.

on:
  workflow_dispatch:
    inputs:
      issue:
        description: 'Issue number to (re-)triage'
        required: true
        type: number
      force:
        description: 'Bypass the input-sha skip check'
        required: false
        type: boolean
        default: false
  # Initial rollout is manual-only via workflow_dispatch so maintainers can
  # vet output quality on real issues before opening the firehose. Once the
  # comment style and signal-to-noise are validated, uncomment the block
  # below to trigger automatically on every newly-opened issue.
  # issues:
  #   types: [opened]

permissions:
  issues: write
  # `models: read` is the documented permission for GitHub Models inference
  # from Actions. See https://github.com/actions/ai-inference#usage and
  # https://docs.github.com/en/github-models.
  models: read
  contents: read

concurrency:
  # Final fallback to github.run_id guards against an empty group key (which
  # would collapse all runs into one) if both event payload and inputs are missing.
  group: ai-triage-${{ github.event.issue.number || inputs.issue || github.run_id }}
  cancel-in-progress: true

jobs:
  ai-triage:
    name: Run ai_triage.py
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install gh-models extension
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh extension install github/gh-models

      - name: Run AI triage
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PYTHONIOENCODING: utf-8
          AI_TRIAGE_MODEL: openai/gpt-4o-mini
          ISSUE_NUMBER: ${{ github.event.issue.number || inputs.issue }}
          FORCE_FLAG: ${{ inputs.force == true && '--force' || '' }}
        run: |
          python triage/ai/ai_triage.py --issue "$ISSUE_NUMBER" $FORCE_FLAG
36 changes: 36 additions & 0 deletions .github/workflows/ai_triage_tests.yml
@@ -0,0 +1,36 @@
name: AI triage tests

# Unit tests for the AI triage script (triage/ai/ai_triage.py). Pure-function
# only — no network, no model calls — so this is safe to run on PRs from forks.

on:
  workflow_dispatch:
  pull_request:
    paths:
      - 'triage/ai/**'
      - '.github/workflows/ai_triage*.yml'

permissions:
  contents: read

jobs:
  pytest:
    name: pytest
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install pytest
        run: pip install --quiet pytest

      - name: Run unit tests
        env:
          PYTHONIOENCODING: utf-8
        run: python -m pytest triage/ai -v
3 changes: 3 additions & 0 deletions triage/ai/.gitignore
@@ -0,0 +1,3 @@
__pycache__/
*.pyc
.pytest_cache/
159 changes: 159 additions & 0 deletions triage/ai/README.md
@@ -0,0 +1,159 @@
# AI issue triage (v1, dry-run)

A complementary triage agent for the **microsoft/WSL** GitHub repository. Reads
newly-opened issues, asks an LLM via [GitHub Models][gh-models] to classify
them, and posts a single collapsible maintainer-facing comment with:

* a 1–3 sentence plain-English summary,
* a suggested issue type (`bug`, `feature`, `question`, …),
* suggested component labels (e.g. `network`, `msix`, `GPU`),
* missing bug-template fields (Windows version, repro steps, …),
* up to ~5 possible duplicate issues.

This is **dry-run only**. The agent never applies labels and never changes
issue state. It is purely additive to the existing rule-based [`wti`][wti]
pipeline driven by [`triage/config.yml`](../config.yml).

## Files

| Path | Purpose |
|---|---|
| `triage/ai/ai_triage.py` | The Python script. Reads the issue, fetches duplicate candidates, calls `gh models run`, validates the output, upserts the comment. |
| `triage/ai/prompt.md` | The system+user prompt. The script substitutes `{{ISSUE_NUMBER}}`, `{{ISSUE_TITLE}}`, `{{ISSUE_BODY}}`, `{{CANDIDATES_JSON}}`. |
| `.github/workflows/ai_triage.yml` | The Actions workflow. Initial rollout is **manual `workflow_dispatch` only**; the `issues.opened` trigger is committed but commented out and can be enabled once the comment quality has been validated on real issues. |
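
The placeholder substitution described for `prompt.md` might look roughly like the sketch below. The helper name `render_prompt` and the single-pass regex approach are assumptions for illustration, not taken from `ai_triage.py`. A single pass is used so that placeholder-looking text inside an issue body is inserted literally and never expanded.

```python
import json
import re

# Hypothetical helper; the real script may substitute placeholders differently.
_PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template: str, number: int, title: str, body: str,
                  candidates: list) -> str:
    """Fill the four documented placeholders in the prompt template."""
    values = {
        "ISSUE_NUMBER": str(number),
        "ISSUE_TITLE": title,
        "ISSUE_BODY": body,
        "CANDIDATES_JSON": json.dumps(candidates, ensure_ascii=False),
    }
    # re.sub never rescans replacement text, so "{{ISSUE_TITLE}}" typed into
    # an issue body stays as literal text. Unknown placeholders are kept as-is.
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```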

## How to run locally

Prerequisites:

* Python 3.10+ (the script uses `list[str]` style annotations).
* `gh` CLI authenticated with at least `repo` and `read:user` scopes.
* The `gh-models` extension: `gh extension install github/gh-models`.

```bash
# Dry-run: print the rendered comment to stdout, do not post anything.
python triage/ai/ai_triage.py --issue 40488 --dry-run

# Force a re-run even if the input-sha marker says nothing changed.
python triage/ai/ai_triage.py --issue 40488 --dry-run --force

# Use a different GitHub Models model.
python triage/ai/ai_triage.py --issue 40488 --dry-run --model openai/gpt-4.1-mini

# Or via env var (matches the workflow):
AI_TRIAGE_MODEL=openai/gpt-4.1-mini python triage/ai/ai_triage.py --issue 40488 --dry-run
```

When run **without** `--dry-run`, the script will upsert a comment on the issue.
Don't do this against the live repo from a developer machine unless you're
deliberately testing — the workflow is the intended posting path.

## Skip rules

The agent does not run for issues where any of these is true:

* the issue is closed or locked,
* the author is a bot (`type == "Bot"` or login matches `*[bot]`),
* the author's `author_association` is `OWNER`, `MEMBER`, or `COLLABORATOR`
(maintainer-authored issues don't need this triage),
* the body is shorter than 50 characters (likely empty or spam),
* the issue's input hash already matches the marker on an existing comment
(use `--force` to override).
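
As a minimal sketch, the skip rules above could be expressed as one pure function. The function name, the reason strings, and the use of GitHub REST field names (`state`, `locked`, `user`, `author_association`, `body`) are assumptions; the actual script may read these values in a different shape.

```python
from typing import Optional

MAINTAINER_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
MIN_BODY_CHARS = 50  # floor stated in the skip rules


def skip_reason(issue: dict, existing_input_sha: Optional[str],
                current_input_sha: str, force: bool = False) -> Optional[str]:
    """Return a short reason when the issue should be skipped, else None."""
    if issue.get("state") == "closed" or issue.get("locked"):
        return "closed-or-locked"
    user = issue.get("user") or {}
    # str.endswith is used on purpose: in a glob, "[bot]" is a character class.
    if user.get("type") == "Bot" or user.get("login", "").endswith("[bot]"):
        return "bot-author"
    if issue.get("author_association") in MAINTAINER_ASSOCIATIONS:
        return "maintainer-author"
    if len(issue.get("body") or "") < MIN_BODY_CHARS:
        return "body-too-short"
    # Only the unchanged-input check can be overridden with --force.
    if not force and existing_input_sha == current_input_sha:
        return "unchanged-input"
    return None
```

Returning a reason rather than a bare boolean keeps the stderr log informative without changing what users see.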

## Idempotency

Each posted comment includes a hidden marker:

```html
<!-- ai-triage:v1 input-sha=<hex> prompt-sha=<hex> -->
```

`input-sha` is computed over `(title, body, prompt-version)`. `prompt-sha` is
computed over the prompt template content. Re-runs that produce the same
hashes are skipped. After the model call, the script re-fetches the issue and
recomputes the hash — if it changed during the call, the run is aborted so a
slow run never overwrites a newer one.

Bumping `PROMPT_VERSION` in `ai_triage.py` (or editing `prompt.md`) invalidates
existing markers and forces the next run to re-post.
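
The marker scheme can be illustrated with the sketch below. SHA-256, the length-prefixed encoding, the `PROMPT_VERSION` value, and all function names are assumptions; the README only states that hex hashes over these inputs are embedded in the comment.

```python
import hashlib
import re
from typing import Optional

PROMPT_VERSION = "v1"  # assumed value; only the constant's name is documented

MARKER_RE = re.compile(
    r"<!-- ai-triage:v1 input-sha=([0-9a-f]+) prompt-sha=([0-9a-f]+) -->"
)


def _sha(*parts: str) -> str:
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def input_sha(title: str, body: str) -> str:
    return _sha(title, body, PROMPT_VERSION)


def prompt_sha(template: str) -> str:
    return _sha(template)


def make_marker(in_sha: str, pr_sha: str) -> str:
    return f"<!-- ai-triage:v1 input-sha={in_sha} prompt-sha={pr_sha} -->"


def parse_marker(comment_body: str) -> Optional[tuple]:
    match = MARKER_RE.search(comment_body)
    return (match.group(1), match.group(2)) if match else None


def is_up_to_date(comment_body: str, title: str, body: str, template: str) -> bool:
    """True when an existing comment already reflects the current inputs."""
    return parse_marker(comment_body) == (input_sha(title, body), prompt_sha(template))
```

The same `input_sha` call can be repeated after the model returns to detect an issue that was edited mid-run.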

## Untrusted-input hardening

The model is treated as an untrusted text generator:

* JSON output is validated against a strict schema; any deviation aborts
silently (no comment posted).
* `component_labels` are intersected with a hardcoded allowlist **and** the
live `gh label list` for the repo.
* `duplicate_candidate_numbers` are intersected with the candidate set we
pre-fetched via `gh search issues` — the model cannot invent issue numbers.
* The maintainer summary is HTML-escaped and run through a sanitizer that
strips Markdown links, raw URLs, code fences, and defangs `@mentions` with
a zero-width space.
* The prompt sent to the model contains only the issue title and body — never
any comments. This means the model can never see (and therefore can never
summarize) its own prior `<!-- ai-triage:v1 -->` comment, even on re-runs.
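
The two intersection rules above could be sketched as follows. `ALLOWED_COMPONENTS` is an illustrative subset rather than the script's real allowlist, and the function name and the cap of five duplicates are assumptions.

```python
ALLOWED_COMPONENTS = {"network", "msix", "GPU"}  # illustrative subset only
MAX_DUPLICATES = 5  # assumed cap, based on "up to ~5 possible duplicate issues"


def _as_list(value) -> list:
    return value if isinstance(value, list) else []


def filter_model_output(raw: dict, live_labels: set, candidate_numbers: set) -> dict:
    """Keep only labels and issue numbers the script already knows about."""
    labels = [
        label for label in _as_list(raw.get("component_labels"))
        if isinstance(label, str)
        and label in ALLOWED_COMPONENTS
        and label in live_labels
    ]
    duplicates = [
        number for number in _as_list(raw.get("duplicate_candidate_numbers"))
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(number, int) and not isinstance(number, bool)
        and number in candidate_numbers
    ]
    return {
        "component_labels": labels,
        "duplicate_candidate_numbers": duplicates[:MAX_DUPLICATES],
    }
```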

The prompt itself includes a hard rule telling the model to ignore
instructions inside the issue body.
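
For the summary sanitizer mentioned in the list above, a rough sketch is shown below. The regular expressions and the order of the steps are assumptions, and a production sanitizer would likely need to handle more Markdown edge cases.

```python
import html
import re

ZERO_WIDTH_SPACE = "\u200b"


def sanitize_summary(text: str) -> str:
    """Strip links, URLs, and code fences; defang mentions; escape HTML."""
    # Remove fenced code blocks, then any unpaired fence that is left over.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    text = text.replace("```", " ")
    # Markdown links and images: keep the visible text, drop the target.
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Raw URLs.
    text = re.sub(r"(?:https?://|www\.)\S+", "", text, flags=re.IGNORECASE)
    # A zero-width space after "@" stops GitHub from notifying the user.
    text = text.replace("@", "@" + ZERO_WIDTH_SPACE)
    # Collapse the whitespace left behind by the removals above.
    text = " ".join(text.split())
    # Escape last so that nothing added by earlier steps is double-escaped.
    return html.escape(text, quote=True)
```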

## Failure mode

Two tiers:

* **Silent (exit 0, workflow green):** model errors, JSON-parse failures,
schema violations, rate limits, transient `gh` API errors on read paths,
staleness aborts. The script logs to stderr; users see nothing.
* **Loud (exit 1, workflow red):** comment-upsert failures (permission 403,
5xx), and any unexpected exception escaping the inline handlers. These
indicate a real maintainer-actionable problem (misconfigured permissions,
programming bug) and surface as a failed workflow run.

The split is intentional: model flakes and bot-vs-issue races shouldn't page
anyone, but a permission misconfig that prevents the agent from ever posting
should fail visibly.
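
One way to picture the two tiers is a small wrapper that maps exception classes to exit codes. The exception names and `run_with_exit_code` are hypothetical; the README describes the behavior, not this structure.

```python
import sys


class SoftFailure(Exception):
    """Model error, parse or schema failure, rate limit, or staleness abort."""


class UpsertFailure(Exception):
    """The comment could not be created or updated."""


def run_with_exit_code(task) -> int:
    """Run `task` and translate its outcome into the process exit code."""
    try:
        task()
    except SoftFailure as exc:
        # Silent tier: log for maintainers, keep the workflow green.
        print(f"ai-triage: skipped: {exc}", file=sys.stderr)
        return 0
    except UpsertFailure as exc:
        # Loud tier: posting is broken, so the run should go red.
        print(f"ai-triage: upsert failed: {exc}", file=sys.stderr)
        return 1
    except Exception as exc:
        # Anything unexpected is treated as a programming bug.
        print(f"ai-triage: unexpected error: {exc!r}", file=sys.stderr)
        return 1
    return 0
```

A script entry point could then end with `sys.exit(run_with_exit_code(main))`, where `main` stands for whatever function performs the triage.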

## Cost / abuse posture

* `concurrency: cancel-in-progress` per issue prevents pile-ups on rapid edits.
* The body is truncated to 8000 characters before prompting.
* Duplicate retrieval is capped to ~15 candidates.
* The initial rollout is triggered manually via `workflow_dispatch`; once enabled, the automatic trigger is `issues.opened` only (no `edited`, no comment events).
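
The two size caps above amount to simple slicing, as in this sketch. The constant and function names are hypothetical, and the candidate cap of exactly 15 is an assumption based on the "~15" figure.

```python
MAX_BODY_CHARS = 8000  # truncation cap stated above
MAX_CANDIDATES = 15    # "~15" above; the exact value is assumed


def bound_prompt_inputs(body: str, candidates: list) -> tuple:
    """Cap the issue body and duplicate-candidate list before prompting."""
    return body[:MAX_BODY_CHARS], candidates[:MAX_CANDIDATES]
```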

If GitHub Models quota becomes a concern, mitigations to consider:

* raise the minimum body length,
* add an author reputation prefilter (e.g. require N prior comments),
* lower the body truncation cap,
* downgrade to a smaller model.

## Graduation plan (v2 and beyond)

v1 deliberately does **not** apply labels. Before turning that on:

1. Run v1 in dry-run for a sustained period; spot-check a sample.
2. Compare suggested labels to what maintainers actually applied.
3. Pick a per-label confidence/calibration threshold.
4. Auto-apply only the safest labels first (suggested order: component labels
that maintainers agree with most often). Type labels and any process labels
(`needs-author-feedback`, `duplicate`, …) stay maintainer-only.
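
Steps 2 to 4 could start from a per-label agreement rate, sketched below. The data shape (pairs of suggested and maintainer-applied label sets), the function names, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def label_agreement(samples: list) -> dict:
    """Per label: the share of suggestions that maintainers also applied.

    Each sample is a (suggested_labels, maintainer_labels) pair of sets.
    """
    suggested = Counter()
    agreed = Counter()
    for model_labels, maintainer_labels in samples:
        for label in model_labels:
            suggested[label] += 1
            if label in maintainer_labels:
                agreed[label] += 1
    return {label: agreed[label] / count for label, count in suggested.items()}


def labels_above_threshold(agreement: dict, threshold: float = 0.9) -> list:
    """Labels whose agreement rate meets the (assumed) auto-apply threshold."""
    return sorted(label for label, rate in agreement.items() if rate >= threshold)
```

A real evaluation would also want a minimum sample count per label, since a label suggested only once can trivially reach 100% agreement.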

Other v2 candidates:

* Trigger on `issues.edited` with throttling.
* Trigger on first author comment to refresh the summary.
* Embedding-based duplicate retrieval instead of keyword search.
* Cross-reference the diagnostic findings from `wti` to enrich the summary.

## Relationship to wti

`wti` (rule-based, runs from `new_issue.yml` / `new_issue_comment.yml` /
`issue_edited.yml`) is the existing pipeline. It excels at parsing attached
ETL log files against known signatures, applying tags like `init-crash` /
`network`, and posting canned remediation messages.

This AI agent is **complementary**, not a replacement. It works on the issue
prose. The two run independently and do not share state.

[gh-models]: https://github.com/github/gh-models
[wti]: https://github.com/OneBlue/wti