FEAT: Add CoCoNot refusal-calibration dataset loaders #1802

Merged
romanlutz merged 1 commit into microsoft:main from romanlutz:romanlutz/plan-coconot-over-refusal
May 26, 2026
@romanlutz romanlutz commented May 25, 2026

Adds the CoCoNot dataset (Brahman et al. 2024, allenai/coconot, ODC-BY 1.0) as two sibling loaders, closing a long-standing gap in PyRIT's refusal-calibration coverage.

CoCoNot evaluates the inverse of typical refusal scoring: "did the model refuse a benign prompt it shouldn't have refused?", which catches over-aligned models. The dataset spans five categories: incomplete, unsupported, indeterminate, humanizing, safety.

Loaders

| Class | dataset_name | HF config / splits | Rows | Semantic |
| --- | --- | --- | --- | --- |
| _CoCoNotRefusalDataset | coconot_refusal | original.train + original.test | 12,478 | Model should refuse. |
| _CoCoNotContrastDataset | coconot_contrast | contrast.test | 379 | Benign; model should comply. |
  • Refusal sibling accepts a splits: list[CoCoNotSplit] filter (multi-split iteration follows msts_dataset.py:194-201).
  • Both siblings accept categories: list[CoCoNotCategory].
  • AI2's reference noncompliant responses (populated only in original.train) are preserved via metadata["response"].
  • Contrast sibling intentionally not tagged "safety" since its prompts are benign.
  • pref config (DPO pairs) is deferred because it targets training, not red-teaming.
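
The filter behaviour described in the bullets above can be sketched roughly as follows. This is a minimal, standalone illustration and not the loaders' actual code: the enum names `CoCoNotCategory` and `CoCoNotSplit` come from this PR, but their member values, the `Row` type, and `filter_rows` are hypothetical stand-ins, and the sample rows are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class CoCoNotCategory(Enum):
    # Member names follow the five categories listed in the PR; values are illustrative.
    INCOMPLETE = "incomplete"
    UNSUPPORTED = "unsupported"
    INDETERMINATE = "indeterminate"
    HUMANIZING = "humanizing"
    SAFETY = "safety"


class CoCoNotSplit(Enum):
    TRAIN = "train"
    TEST = "test"


@dataclass
class Row:
    # Hypothetical row shape, not the seed type the real loaders return.
    prompt: str
    category: CoCoNotCategory
    split: CoCoNotSplit
    metadata: dict = field(default_factory=dict)


def filter_rows(
    rows: list[Row],
    categories: Optional[list[CoCoNotCategory]] = None,
    splits: Optional[list[CoCoNotSplit]] = None,
) -> list[Row]:
    """Keep rows matching both filters; an omitted filter matches everything."""
    wanted_categories = set(categories) if categories else set(CoCoNotCategory)
    wanted_splits = set(splits) if splits else set(CoCoNotSplit)
    return [
        row
        for row in rows
        if row.category in wanted_categories and row.split in wanted_splits
    ]


# Made-up sample rows; only the train rows carry a reference response.
rows = [
    Row("p1", CoCoNotCategory.HUMANIZING, CoCoNotSplit.TRAIN, {"response": "reference refusal"}),
    Row("p2", CoCoNotCategory.HUMANIZING, CoCoNotSplit.TEST),
    Row("p3", CoCoNotCategory.SAFETY, CoCoNotSplit.TRAIN, {"response": "reference refusal"}),
]

selected = filter_rows(
    rows,
    categories=[CoCoNotCategory.HUMANIZING],
    splits=[CoCoNotSplit.TRAIN],
)
```

Combining both filters intersects them, which is the shape of the HUMANIZING + TRAIN check reported under Validation.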

Out of scope

A dedicated scorer, a RefusalCalibration scenario, and a cross-cutting "calibration" tag are out of scope: the existing TrueFalseInverterScorer(SelfAskRefusalScorer(...)) already covers "compliance = success" semantics.
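
To illustrate the "compliance = success" idea without depending on PyRIT, here is a toy sketch of inverting a true/false refusal check. `make_inverter` and `toy_refusal_scorer` are hypothetical names invented for this example; the keyword match is only a stand-in and says nothing about how SelfAskRefusalScorer decides.

```python
from typing import Callable


def make_inverter(scorer: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a true/false scorer so that its verdict is flipped."""

    def inverted(response: str) -> bool:
        return not scorer(response)

    return inverted


def toy_refusal_scorer(response: str) -> bool:
    # Stand-in heuristic: True means "the model refused".
    markers = ("i can't", "i cannot", "i won't")
    lowered = response.lower()
    return any(marker in lowered for marker in markers)


# After inversion, True means "the model complied", which is the success
# condition for the benign contrast prompts.
compliance_scorer = make_inverter(toy_refusal_scorer)
```

With this wrapper, a refusal on a benign contrast prompt scores False, so over-refusal shows up as a failure.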

Validation

  • 17 unit tests + full 511-test datasets suite green.
  • Live HF check: counts match (12,478 / 379); category + split filters verified end-to-end (e.g. SAFETY → 3,531, TRAIN → 11,477, HUMANIZING + TRAIN → 1,795).
  • Pre-commit hooks (ruff-format, ruff-check, ty-check) all pass.
  • Bibliography entry brahman2024coconot added; doc/code/datasets/1_loading_datasets.{py,ipynb} regenerated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hannahwestra25 hannahwestra25 self-assigned this May 26, 2026
@romanlutz romanlutz added this pull request to the merge queue May 26, 2026
Merged via the queue into microsoft:main with commit 2df8f71 May 26, 2026
48 checks passed
@romanlutz romanlutz deleted the romanlutz/plan-coconot-over-refusal branch May 26, 2026 21:25
2 participants