Skip to content

Add OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path)#1868

Open
ppcvote wants to merge 1 commit into
microsoft:mainfrom
ppcvote:owasp-llm02-scorers
Open

Add OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path)#1868
ppcvote wants to merge 1 commit into
microsoft:mainfrom
ppcvote:owasp-llm02-scorers

Conversation

@ppcvote
Copy link
Copy Markdown

@ppcvote ppcvote commented Jun 1, 2026

Summary

Adds four RegexScorer subclasses covering OWASP LLM02 (Insecure Output Handling) payload families not yet instrumented in PyRIT, completing the LLM02 sub-categories that are tractable via static regex without an LLM call.

Closes #1737. Builds on the RegexScorer base shipped in #1704.

Per-scorer breakdown

Scorer Default patterns Payload family
XSSOutputScorer 6 <script>, onerror=, javascript: URI, data:text/html, iframe srcdoc, SVG-embedded script
SQLInjectionOutputScorer 3 ;DROP TABLE / ;DELETE FROM, UNION SELECT, ';-- comment-bypass
ShellCommandOutputScorer 4 `curl …
PathTraversalOutputScorer 1 (dual-condition) (../){2,} + sensitive target (etc/passwd, etc/shadow, windows\system32, proc/self)

Resolved design questions from #1737

  1. 4 separate subclasses, not 1 parameterized. Each scorer is ~30-60 lines, follows the CredentialLeakScorer pattern exactly, and can be enabled/disabled independently in a scenario. Matches your guidance on the issue.
  2. Subdirectory. A new pyrit/score/true_false/regex/ houses RegexScorer + all five subclasses (CredentialLeak + 4 new). from pyrit.score import … continues to be the supported import path, so no external callers break; one internal import in static_prompt_injection_scorer.py was updated. Matches "perhaps we should have a subdirectory" from your reply on Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset #1737.
  3. Dataset. Deferred. The proposal mentioned an optional OWASPLLM02SeedDataset, but the consensus from Proposal: Add Agent Threat Rules (ATR) dataset loader and taxonomy scorer #1702 (HF hosting) means the dataset is a separate, larger contribution worth its own PR. This PR is scorers-only.
  4. Category tag. Kept consistent with CredentialLeakScorer (categories=[\"security\"]). Happy to thread an owasp-llm02-<subtype> tag through later if you want finer scenario filtering — easy follow-up.

Files

  • pyrit/score/true_false/regex/__init__.py — new
  • pyrit/score/true_false/regex/regex_scorer.pymoved from pyrit/score/true_false/
  • pyrit/score/true_false/regex/credential_leak_scorer.pymoved (+1 line import update)
  • pyrit/score/true_false/regex/xss_output_scorer.py — new
  • pyrit/score/true_false/regex/sql_injection_output_scorer.py — new
  • pyrit/score/true_false/regex/shell_command_output_scorer.py — new
  • pyrit/score/true_false/regex/path_traversal_output_scorer.py — new
  • pyrit/score/true_false/static_prompt_injection_scorer.py — internal import path bump only
  • pyrit/score/__init__.py — exports for the 4 new scorers and updated paths for moved files
  • tests/unit/score/regex/ — existing test_regex_scorer.py + test_credential_leak_scorer.py moved in, four new test files added (parametrized positive/negative + rationale + custom-patterns + memory-write assertions, mirroring test_credential_leak_scorer.py)
  • doc/code/scoring/owasp_llm02_scorers.{py,ipynb} — single combined notebook walking through all four scorers, custom-pattern overrides, and a composition note
  • doc/myst.yml — wires the new notebook into the scoring TOC

Pattern provenance

The regex catalog is ported from the MIT-licensed prompt-defense-audit-py package (also authored by @ppcvote). Each pattern was manually verified against both its positive examples and the negative-case set in the new test files (all 4 scorers, 0 failures in the parity run). The README of prompt-defense-audit-py is linked from the original #1737 issue thread.

Out of scope (per #1737)

Testing notes

Each new scorer has its own test_<name>_output_scorer.py with:

  • Parametrized positive cases (one per payload family + a custom-pattern marker test).
  • Parametrized negative cases for benign prose that mentions the same primitives (e.g. "<p> is a paragraph block", "use parameterized queries", "ls -la lists files", "/etc/passwd lists local users").
  • A rationale assertion that the matched pattern name appears in score.score_rationale.
  • A custom_patterns test confirming patterns= overrides the default set entirely.
  • A memory.add_scores_to_memory.assert_called_once() assertion.

False-positive risk was a particular focus for PathTraversalOutputScorer — the default pattern requires both multi-segment ../ and a known-sensitive target, so prose like "the /etc/passwd file lists users" or a single ../README.md does not trigger.

Adds four `RegexScorer` subclasses covering OWASP LLM02 (Insecure Output
Handling) payload families not yet instrumented in PyRIT:

* `XSSOutputScorer` — script tags, inline event handlers, javascript:
  URIs, data:text/html, iframe srcdoc, SVG-embedded scripts (6 patterns).
* `SQLInjectionOutputScorer` — destructive `;DROP`/`;DELETE`,
  `UNION SELECT`, comment-bypass `';--` (3 patterns).
* `ShellCommandOutputScorer` — pipe-to-shell installers, destructive
  filesystem commands, reverse-shell primitives, env-var exfiltration
  (4 patterns).
* `PathTraversalOutputScorer` — multi-segment `../` walk to a
  known-sensitive target (passwd/shadow/system32/proc-self).

These complement the existing `MarkdownInjectionScorer` and the
`CredentialLeakScorer` shipped in microsoft#1704, completing PyRIT's coverage of
LLM02 sub-categories that are tractable via static regex without an LLM
call.

Per discussion on microsoft#1737:
* New `pyrit/score/true_false/regex/` subdirectory groups all six
  `RegexScorer`-family files (RegexScorer base + 5 subclasses); `from
  pyrit.score import ...` continues to be the supported import path so
  no external callers break.
* Existing `regex_scorer.py` and `credential_leak_scorer.py` move into
  the new subdirectory; one internal import in
  `static_prompt_injection_scorer.py` is updated to the new path.
* Existing unit tests move into `tests/unit/score/regex/` alongside
  four new test files (one per new scorer), each parametrized over
  positive and negative cases with rationale, custom-patterns, and
  memory-write assertions matching the `CredentialLeakScorer` test
  style.
* Adds `doc/code/scoring/owasp_llm02_scorers.{py,ipynb}` showing all
  four scorers, custom-pattern overrides, and a note on composition
  with `TrueFalseCompositeScorer` and `BatchScorer`; wired into
  `doc/myst.yml`.

Regex catalog provenance: patterns are ported from the MIT-licensed
`prompt-defense-audit-py` package (also authored by @ppcvote) — the
package README is referenced in the issue thread.

Closes microsoft#1737
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset

1 participant