Add OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path)#1868
Open
ppcvote wants to merge 1 commit into
Open
Add OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path)#1868ppcvote wants to merge 1 commit into
ppcvote wants to merge 1 commit into
Conversation
Adds four `RegexScorer` subclasses covering OWASP LLM02 (Insecure Output Handling) payload families not yet instrumented in PyRIT: * `XSSOutputScorer` — script tags, inline event handlers, javascript: URIs, data:text/html, iframe srcdoc, SVG-embedded scripts (6 patterns). * `SQLInjectionOutputScorer` — destructive `;DROP`/`;DELETE`, `UNION SELECT`, comment-bypass `';--` (3 patterns). * `ShellCommandOutputScorer` — pipe-to-shell installers, destructive filesystem commands, reverse-shell primitives, env-var exfiltration (4 patterns). * `PathTraversalOutputScorer` — multi-segment `../` walk to a known-sensitive target (passwd/shadow/system32/proc-self). These complement the existing `MarkdownInjectionScorer` and the `CredentialLeakScorer` shipped in microsoft#1704, completing PyRIT's coverage of LLM02 sub-categories that are tractable via static regex without an LLM call. Per discussion on microsoft#1737: * New `pyrit/score/true_false/regex/` subdirectory groups all six `RegexScorer`-family files (RegexScorer base + 5 subclasses); `from pyrit.score import ...` continues to be the supported import path so no external callers break. * Existing `regex_scorer.py` and `credential_leak_scorer.py` move into the new subdirectory; one internal import in `static_prompt_injection_scorer.py` is updated to the new path. * Existing unit tests move into `tests/unit/score/regex/` alongside four new test files (one per new scorer), each parametrized over positive and negative cases with rationale, custom-patterns, and memory-write assertions matching the `CredentialLeakScorer` test style. * Adds `doc/code/scoring/owasp_llm02_scorers.{py,ipynb}` showing all four scorers, custom-pattern overrides, and a note on composition with `TrueFalseCompositeScorer` and `BatchScorer`; wired into `doc/myst.yml`. Regex catalog provenance: patterns are ported from the MIT-licensed `prompt-defense-audit-py` package (also authored by @ppcvote) — the package README is referenced in the issue thread. Closes microsoft#1737
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds four
RegexScorersubclasses covering OWASP LLM02 (Insecure Output Handling) payload families not yet instrumented in PyRIT, completing the LLM02 sub-categories that are tractable via static regex without an LLM call.Closes #1737. Builds on the
RegexScorerbase shipped in #1704.Per-scorer breakdown
XSSOutputScorer<script>,onerror=,javascript:URI,data:text/html, iframesrcdoc, SVG-embedded scriptSQLInjectionOutputScorer;DROP TABLE/;DELETE FROM,UNION SELECT,';--comment-bypassShellCommandOutputScorerPathTraversalOutputScorer(../){2,}+ sensitive target (etc/passwd,etc/shadow,windows\system32,proc/self)Resolved design questions from #1737
CredentialLeakScorerpattern exactly, and can be enabled/disabled independently in a scenario. Matches your guidance on the issue.pyrit/score/true_false/regex/houses RegexScorer + all five subclasses (CredentialLeak + 4 new).from pyrit.score import …continues to be the supported import path, so no external callers break; one internal import instatic_prompt_injection_scorer.pywas updated. Matches "perhaps we should have a subdirectory" from your reply on Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset #1737.OWASPLLM02SeedDataset, but the consensus from Proposal: Add Agent Threat Rules (ATR) dataset loader and taxonomy scorer #1702 (HF hosting) means the dataset is a separate, larger contribution worth its own PR. This PR is scorers-only.CredentialLeakScorer(categories=[\"security\"]). Happy to thread anowasp-llm02-<subtype>tag through later if you want finer scenario filtering — easy follow-up.Files
pyrit/score/true_false/regex/__init__.py— newpyrit/score/true_false/regex/regex_scorer.py— moved frompyrit/score/true_false/pyrit/score/true_false/regex/credential_leak_scorer.py— moved (+1 line import update)pyrit/score/true_false/regex/xss_output_scorer.py— newpyrit/score/true_false/regex/sql_injection_output_scorer.py— newpyrit/score/true_false/regex/shell_command_output_scorer.py— newpyrit/score/true_false/regex/path_traversal_output_scorer.py— newpyrit/score/true_false/static_prompt_injection_scorer.py— internal import path bump onlypyrit/score/__init__.py— exports for the 4 new scorers and updated paths for moved filestests/unit/score/regex/— existingtest_regex_scorer.py+test_credential_leak_scorer.pymoved in, four new test files added (parametrized positive/negative + rationale + custom-patterns + memory-write assertions, mirroringtest_credential_leak_scorer.py)doc/code/scoring/owasp_llm02_scorers.{py,ipynb}— single combined notebook walking through all four scorers, custom-pattern overrides, and a composition notedoc/myst.yml— wires the new notebook into the scoring TOCPattern provenance
The regex catalog is ported from the MIT-licensed
prompt-defense-audit-pypackage (also authored by @ppcvote). Each pattern was manually verified against both its positive examples and the negative-case set in the new test files (all 4 scorers, 0 failures in the parity run). The README ofprompt-defense-audit-pyis linked from the original #1737 issue thread.Out of scope (per #1737)
MarkdownInjectionScorer.Testing notes
Each new scorer has its own
test_<name>_output_scorer.pywith:<p>is a paragraph block", "use parameterized queries", "ls -lalists files", "/etc/passwdlists local users").score.score_rationale.custom_patternstest confirmingpatterns=overrides the default set entirely.memory.add_scores_to_memory.assert_called_once()assertion.False-positive risk was a particular focus for
PathTraversalOutputScorer— the default pattern requires both multi-segment../and a known-sensitive target, so prose like "the/etc/passwdfile lists users" or a single../README.mddoes not trigger.