feat: Add BijectionConverter and BijectionAttack (#1903) by sajisanchu1913-source · Pull Request #1942 · microsoft/PyRIT

sajisanchu1913-source · 2026-06-04T22:30:00Z

Summary

Implements the bijection attack from arXiv:2410.01294 (Haize Labs) in PyRIT.

The attack works by teaching a target LLM a secret character mapping through
demonstration shots, then sending harmful prompts encoded in that mapping to
bypass safety filters. Responses are decoded using the inverse mapping.
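A minimal sketch of the mapping idea, assuming a plain permutation of the 26 lowercase ASCII letters. The names `make_bijection`, `invert`, and `apply_mapping` are illustrative only and are not taken from the PR's BijectionConverter, whose actual interface may differ.

```python
import random
import string


def make_bijection(seed=None):
    """Return a random one-to-one mapping over the lowercase ASCII letters."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def invert(mapping):
    """Swap keys and values; valid because the mapping is one-to-one."""
    return {value: key for key, value in mapping.items()}


def apply_mapping(text, mapping):
    # Characters outside the mapping (spaces, digits, punctuation) pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text.lower())


mapping = make_bijection(seed=0)
encoded = apply_mapping("hello world", mapping)
decoded = apply_mapping(encoded, invert(mapping))
```

Because the sketch lowercases its input, the round trip recovers the text only up to case; preserving case would need a second mapping for uppercase letters.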

Changes

New Files

pyrit/prompt_converter/bijection_converter.py — generates random letter-to-letter mapping, encodes prompts, decodes responses
pyrit/executor/attack/single_turn/bijection_attack.py — runs full bijection attack with teaching phase
tests/unit/prompt_converter/test_bijection_converter.py — 11 unit tests for converter
tests/unit/executor/test_bijection_attack.py — 5 unit tests for attack
doc/code/executor/attack/bijection_attack.ipynb — usage notebook

Modified Files

pyrit/prompt_converter/__init__.py — registered BijectionConverter
pyrit/executor/attack/single_turn/__init__.py — registered BijectionAttack

How It Works

BijectionConverter generates a random secret mapping (e.g. a→q, b→x, ...).
BijectionAttack sends teaching messages that demonstrate the mapping to the target model.
The harmful prompt is encoded and sent in the form TASK is '⟪encoded prompt⟫'.
The response is decoded using the inverse mapping.
The decoded response is scored by the judge.
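The steps above can be sketched end to end with stand-in callables. This is an assumption-laden illustration, not the PR's BijectionAttack: `translate`, `run_flow`, the `send` and `score` callables, and the `=>` teaching format are all hypothetical, and the target and judge are replaced by trivial stubs.

```python
from typing import Callable, Dict, List


def translate(text: str, mapping: Dict[str, str]) -> str:
    """Substitute each mapped character; leave everything else unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def run_flow(
    prompt: str,
    mapping: Dict[str, str],
    shots: List[str],
    send: Callable[[List[str]], str],
    score: Callable[[str], float],
) -> Dict[str, object]:
    inverse = {value: key for key, value in mapping.items()}
    # Teaching phase: pair each plain shot with its encoded form.
    messages = [f"{shot} => {translate(shot, mapping)}" for shot in shots]
    # The encoded prompt goes last, in the "TASK is '...'" wrapper.
    messages.append(f"TASK is '{translate(prompt, mapping)}'")
    # The target is expected to answer in the encoded alphabet.
    raw = send(messages)
    decoded = translate(raw, inverse)
    # A judge scores the decoded text, not the raw reply.
    return {"messages": messages, "decoded": decoded, "score": score(decoded)}


# Toy mapping that swaps a and b and leaves other characters alone.
toy = {"a": "b", "b": "a"}
# Stub target that always replies "baab"; stub judge that returns the text length.
result = run_flow(
    "a cab",
    toy,
    ["abc"],
    send=lambda msgs: "baab",
    score=lambda text: float(len(text)),
)
```

Passing the target and judge in as callables keeps the sketch testable without a live model; a real attack class would presumably route these calls through the framework's own target and scorer objects.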

Pattern Followed

BijectionConverter follows FlipConverter pattern
BijectionAttack follows FlipAttack pattern

Reference

Haize Labs implementation: https://github.com/haizelabs/bijection-learning
Paper: arXiv:2410.01294
Closes FEAT Bijection #1903

- _RemoteDatasetLoader._fetch_zip_from_url:
  - keyword-only args (source, inner_files, cache)
  - streams download (requests stream=True + iter_content) to avoid double-buffering large archives
  - md5-keyed disk cache under DB_DATA_PATH / seed-prompt-entries when cache=True; named temp file otherwise (cleaned up after parse)
  - validates each inner_files extension against FILE_TYPE_HANDLERS; raises ValueError with a member preview if an inner file is missing
  - parses inner files via FILE_TYPE_HANDLERS and returns parsed dicts, so the open ZipFile never escapes the worker thread
  - adds the missing import zipfile that broke the previous commit
- _MICDataset:
  - drops unused io / json / requests imports (helper handles them)
  - delegates download + parse to the helper; only owns the seed construction loop
  - guards non-string Q values (in addition to NaN moral values)
  - forwards cache from fetch_dataset_async to the helper
  - factors authors into AUTHORS class constant
- Tests:
  - test_moral_integrity_corpus_dataset.py:
    - stops mocking requests.get directly; patches _fetch_zip_from_url to return parsed dicts so tests don't depend on the helper's internal shape
    - adds test_fetch_dataset_non_string_q and test_fetch_dataset_passes_cache_flag
    - hoists imports into the right groups so ruff I001 stops firing
    - removes trailing whitespace / extra newlines
  - test_remote_dataset_loader.py: adds TestFetchZipFromUrl covering happy path, on-disk caching (hits 1 network call across 2 fetches), cache=False does not persist, missing inner file raises ValueError, unsupported extension raises ValueError

Verified live against the real MIC.zip: 35,408 unique seeds across all 6 moral foundations in ~2.4s cold / ~1.3s warm. All 559 dataset unit tests pass; ruff clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
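A rough sketch of the helper's three ideas (md5-keyed cache path, chunked writes, parsing inside the function so the archive handle stays local). It is not the code from this commit: `HANDLERS` is a hypothetical stand-in for FILE_TYPE_HANDLERS, the function names are invented, and the chunks are taken from any iterable of bytes, which in the real helper would presumably come from the streamed response's iter_content.

```python
import hashlib
import json
import zipfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for FILE_TYPE_HANDLERS: extension -> parser taking a file object.
HANDLERS: Dict[str, Callable] = {"json": json.load}


def cache_path_for(source: str, cache_dir: Path) -> Path:
    """Key the cached archive by the md5 hex digest of its URL."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.zip"


def stream_to_file(chunks: Iterable[bytes], dest: Path) -> None:
    """Write chunks as they arrive so the whole archive is never held in memory."""
    with open(dest, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)


def parse_inner_files(zip_path: Path, inner_files: List[str]) -> Dict[str, object]:
    parsed: Dict[str, object] = {}
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()
        for name in inner_files:
            ext = name.rsplit(".", 1)[-1].lower()
            if ext not in HANDLERS:
                raise ValueError(f"Unsupported file type: {name}")
            if name not in members:
                # Include a short preview of what the archive does contain.
                raise ValueError(f"{name} not in archive; found: {members[:5]}")
            with archive.open(name) as raw:
                parsed[name] = HANDLERS[ext](raw)
    # Only plain parsed data is returned; the ZipFile is already closed here.
    return parsed
```

Returning parsed data rather than the ZipFile means callers never touch a handle that was opened on another thread, which matches the motivation given in the commit message.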

- Use tempfile.NamedTemporaryFile instead of fixed temp_audio.wav to prevent concurrent call collisions
- Wrap Azure upload in try/finally to ensure temp file is always deleted even when upload fails
- Add regression test to verify cleanup on upload failure

Fixes microsoft#1894
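The cleanup pattern described in this commit can be sketched as follows. The function name `upload_audio` and the `upload` callable are hypothetical placeholders for the real converter and the Azure upload call, which are not shown on this page.

```python
import os
import tempfile
from typing import Callable


def upload_audio(audio_bytes: bytes, upload: Callable[[str], str]) -> str:
    # A unique name per call avoids collisions between concurrent callers.
    # delete=False lets the uploader reopen the file by path; we remove it ourselves.
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()
        return upload(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
```

The `finally` block runs whether `upload` returns or raises, so the temporary file is removed in both cases.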

- Add BijectionConverter that generates random letter-to-letter mapping
- Add BijectionAttack that teaches the mapping to target AI and encodes harmful prompts
- Add unit tests for both converter and attack
- Add notebook demonstrating usage
- Update __init__.py files to register new classes

Based on arXiv:2410.01294 (Haize Labs bijection-learning)

sajisanchu1913-source and others added 12 commits May 28, 2026 17:14

FEAT: Add SALT-NLP Moral Integrity Corpus (MIC) dataset loader

ff0843e

FEAT: Add SALT-NLP MIC dataset loader with tests and documentation

83dd517

REFACTOR: Rename to moral_integrity_corpus_dataset, fix async, add dedup and harm categories

abc1e16

fix: address reviewer feedback - fix NaN crash, add liberty category, fix imports and ordering

88f89f0

fix: correct import ordering and trailing newline

fedba1c

fix: add reusable _fetch_zip_from_url helper to base class

cf197d9

Merge branch 'main' into main

039e713

Merge branch 'microsoft:main' into main

010a439

fix: add missing newline at end of file

056e938

sajisanchu1913-source mentioned this pull request Jun 4, 2026

FEAT Bijection #1903

Open

sajisanchu1913-source wants to merge 12 commits into microsoft:main from sajisanchu1913-source:feat/bijection-attack


2 participants
