
FEAT: Add Image functionality to TAP #1036

Open
awksrj wants to merge 14 commits into microsoft:main from awksrj:feature/tap-image-target

@awksrj
Contributor

@awksrj awksrj commented Jul 30, 2025

Description

This PR adds a code cell to tree_of_attacks_with_pruning.ipynb that demonstrates an image target, and modifies tree_of_attacks.py to adapt the Tree of Attacks orchestrator for image targets. In particular, it adds a dictionary that maps content_policy_violation errors to an objective_score of 0.0. This ensures that blocked nodes stay in the completed nodes list and are pruned only once the branch width limit is exceeded, which prevents premature pruning.
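
For readers skimming the description, the scoring and pruning behavior can be sketched roughly as follows. This is a minimal illustration and not the code in tree_of_attacks.py: `SketchNode`, `assign_score`, `prune_to_width`, and `ERROR_SCORE_MAP` are hypothetical names, and the error label is taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the key follows the PR description; the real error label may differ.
ERROR_SCORE_MAP = {"content_policy_violation": 0.0}


@dataclass
class SketchNode:
    """Hypothetical stand-in for a completed TAP node (not a PyRIT class)."""

    name: str
    error: Optional[str] = None
    objective_score: Optional[float] = None


def assign_score(node: SketchNode, scorer_value: float) -> SketchNode:
    """Use the mapped score for known errors, otherwise the scorer's value."""
    if node.error in ERROR_SCORE_MAP:
        node.objective_score = ERROR_SCORE_MAP[node.error]
    else:
        node.objective_score = scorer_value
    return node


def prune_to_width(nodes: List[SketchNode], width: int) -> List[SketchNode]:
    """Keep the `width` best-scoring nodes; below the limit nothing is dropped."""
    scored = [n for n in nodes if n.objective_score is not None]
    ranked = sorted(scored, key=lambda n: n.objective_score, reverse=True)
    return ranked[:width]
```

Because a blocked node gets a real score of 0.0 instead of no score at all, it ranks last but is only discarded once more than `width` nodes compete for a place.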

Related Issue

Closes: #585

Tests and Documentation

No tests included in this commit.

@romanlutz romanlutz self-assigned this Jul 30, 2025
Comment thread doc/code/orchestrators/tree_of_attacks_with_pruning.py Outdated
Comment thread pyrit/executor/attack/multi_turn/tree_of_attacks.py Outdated
Comment thread pyrit/executor/attack/multi_turn/tree_of_attacks.py Outdated
Comment thread doc/code/orchestrators/tree_of_attacks_with_pruning.ipynb Outdated
Comment thread doc/code/orchestrators/tree_of_attacks_with_pruning.ipynb Outdated
@awksrj
Contributor Author

awksrj commented Aug 1, 2025

Thanks for all the comments. I'll go through them and push changes soon!

@awksrj
Contributor Author

awksrj commented Aug 6, 2025

I added two unit tests to cover the pruning logic, ensuring blocked responses are scored as 0.0 and pruning only occurs when we exceed tree_width. I also updated the example in tree_of_attacks_with_pruning.py, which used to show how the old TreeOfAttacksWithPruningOrchestrator worked with text targets. I replaced it with the new TAPAttack class to reflect the current implementation, which hopefully makes the documentation more complete.

Contributor

@romanlutz romanlutz left a comment

One of the maintainers should also run the notebook once it exists, just to make sure we aren't missing anything.

Comment thread pyrit/attacks/multi_turn/tree_of_attacks.py Outdated
Comment thread pyrit/attacks/multi_turn/tree_of_attacks.py Outdated
Comment thread tests/unit/attacks/test_tree_of_attacks.py Outdated
awksrj and others added 8 commits September 11, 2025 15:10
Resolve conflicts: keep both error_score_map (PR) and initial_prompt/prepended_conversation_config (main).
Take main version for doc files (need separate follow-up).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add error_score_map parameter to TreeOfAttacksWithPruningAttack and
_TreeOfAttacksNode that maps response error types to fixed scores.
This prevents premature branch pruning when targets return blocked
or content-filtered responses (e.g., image generation targets).

Key changes:
- Default error_score_map maps 'blocked' -> 0.0 (pass {} to disable)
- Intercepts mapped errors in _score_response_async before calling scorer
- Creates synthetic float_scale Score for mapped errors
- Propagates map through duplicate() and _create_attack_node()
- Copies dict to avoid shared mutable state

Updates from original PR microsoft#1036:
- Adapted to current Message/MessagePiece API (was PromptRequestResponse)
- Fixed Score constructor args (message_piece_id, score_category as list)
- Made default None -> {'blocked': 0.0} per reviewer feedback
- Added comprehensive unit tests for error interception, scoring, and
  map propagation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
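
The error_score_map commit message above describes intercepting mapped errors before the scorer runs, defaulting `None` to `{'blocked': 0.0}`, and copying the dict when nodes are duplicated. A rough sketch of that shape, under stated assumptions: `MapAwareNode` and `fixed_scorer` are hypothetical names, and the sketch returns a bare float where the commit creates a synthetic float_scale Score.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Scorer = Callable[[str], Awaitable[float]]


class MapAwareNode:
    """Hypothetical stand-in for a TAP node; not the PyRIT implementation."""

    def __init__(self, scorer: Scorer, error_score_map: Optional[Dict[str, float]] = None):
        self._scorer = scorer
        # None selects the default map; an explicit {} disables interception.
        if error_score_map is None:
            error_score_map = {"blocked": 0.0}
        # Copy so that nodes never share one mutable dict.
        self.error_score_map = dict(error_score_map)

    async def score_response_async(self, text: str, error: str = "none") -> float:
        # Mapped errors short-circuit: the scorer is not called for them.
        if error in self.error_score_map:
            return self.error_score_map[error]
        return await self._scorer(text)

    def duplicate(self) -> "MapAwareNode":
        # The constructor copies the map, so the child gets its own dict.
        return MapAwareNode(self._scorer, self.error_score_map)


async def fixed_scorer(text: str) -> float:
    """Toy scorer that always returns 0.7."""
    return 0.7
```

For example, `asyncio.run(MapAwareNode(fixed_scorer).score_response_async("refused", error="blocked"))` takes the mapped branch and never awaits `fixed_scorer`.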
- Add input validation: keys must be valid PromptResponseError values,
  scores must be in [0, 1] range. Errors caught at construction time.
- Persist synthetic scores to memory via add_scores_to_memory()
- Fix multi-piece handling: iterate all message_pieces to find the
  error piece, not just the first piece
- Add validation unit tests for invalid key and out-of-range value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
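
The validation and multi-piece handling described in the commit message above might look something like the sketch below. The function names are hypothetical, and `VALID_ERRORS` is an assumed, illustrative set of labels; the authoritative values are whatever PromptResponseError defines.

```python
from typing import Dict, Optional, Sequence

# Assumption: illustrative labels only; check PromptResponseError for the real set.
VALID_ERRORS = frozenset({"none", "blocked", "processing", "empty", "unknown"})


def validate_error_score_map(error_score_map: Dict[str, float]) -> Dict[str, float]:
    """Reject unknown error keys and scores outside [0, 1]; return a copy."""
    for key, value in error_score_map.items():
        if key not in VALID_ERRORS:
            raise ValueError(f"unknown error type: {key!r}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {key!r} must be in [0, 1], got {value}")
    return dict(error_score_map)


def first_mapped_error(
    piece_errors: Sequence[str], error_score_map: Dict[str, float]
) -> Optional[str]:
    """Scan every piece's error, not just the first, for a mapped error type."""
    for error in piece_errors:
        if error in error_score_map:
            return error
    return None
```

Validating at construction time means a typo such as `{"bloked": 0.0}` fails immediately instead of silently never matching during an attack run.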
- Add TAPSystemPromptPaths enum with TEXT_GENERATION and IMAGE_GENERATION
  variants, matching the RTASystemPromptPaths pattern
- Export TAPSystemPromptPaths from pyrit.executor.attack
- Add image generation target example to TAP doc (tap_attack.py/.ipynb)
  demonstrating use of IMAGE_GENERATION system prompt
- Add TAP integration tests for both text and image targets
- Regenerate tap_attack.ipynb from updated .py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
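
As an illustration of the enum described in the commit message above, a path enum with one member per target type could be shaped like this. It is only a sketch: the class is deliberately named `TAPSystemPromptPathsSketch`, and the directory and file names are placeholders, not the paths used in PyRIT.

```python
import enum
import pathlib

# Assumption: placeholder directory; the real prompt files live elsewhere.
_PROMPT_DIR = pathlib.Path("prompts") / "tap"


class TAPSystemPromptPathsSketch(enum.Enum):
    """Hypothetical mirror of an enum with one system prompt path per target type."""

    TEXT_GENERATION = _PROMPT_DIR / "text_generation.yaml"
    IMAGE_GENERATION = _PROMPT_DIR / "image_generation.yaml"


def system_prompt_path(for_images: bool) -> pathlib.Path:
    """Pick the system prompt path that matches the kind of target under attack."""
    if for_images:
        return TAPSystemPromptPathsSketch.IMAGE_GENERATION.value
    return TAPSystemPromptPathsSketch.TEXT_GENERATION.value
```

Keeping the paths in an enum lets callers select a prompt by name rather than by a hand-typed file path.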
Development

Successfully merging this pull request may close these issues.

FEAT: Add Image functionality to TAP

4 participants