Skip to content

FEAT: Adding Master Key Jailbreak#248

Merged
romanlutz merged 7 commits into
microsoft:mainfrom
SafwanA02:master_key_orchestrator
Jun 18, 2024
Merged

FEAT: Adding Master Key Jailbreak#248
romanlutz merged 7 commits into
microsoft:mainfrom
SafwanA02:master_key_orchestrator

Conversation

@SafwanA02

Copy link
Copy Markdown
Contributor

Description

Opening this after accidently closing #235 due to git errors.

  • implemented batching to improve efficiency in "send_master_key_with_prompts_async" function in master_key_orchestrator
  • added "print_conversation" function to master_key_orchestrator
  • addressed comments from previous PR

Tests and Documentation

  • added test to make sure that conversation id's are the same in corresponding prompts

@SafwanA02

Copy link
Copy Markdown
Contributor Author

10. ie

@microsoft-github-policy-service agree company="Microsoft"

@SafwanA02 SafwanA02 changed the title FEAT: Updating Master Key Jailbreak FEAT: Adding Master Key Jailbreak Jun 18, 2024
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread pyrit/orchestrator/master_key_orchestrator.py Outdated
Comment thread doc/demo/7_master_key.ipynb Outdated
Comment thread tests/orchestrator/test_master_key_orchestrator.py Outdated
Comment on lines +99 to +105
await self._prompt_normalizer.send_prompt_async(
normalizer_request=NormalizerRequest([target_master_prompt_obj]),
target=self._prompt_target,
conversation_id=conversation_id,
labels=self._global_memory_labels,
orchestrator_identifier=self.get_identifier(),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably work most of the time, but I wonder if we might need some error handling in case things don't go as planned. Have we seen any issues during testing that we might want to handle here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I havn't run into any issues with this part of the code yet, but I'll keep an eye out for it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Targets handle retries if that's what you mean @dlmgary

)
)

batch_results = await asyncio.gather(*tasks)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PromptNormalizer handles all of this and does the asyncio.gather() for you. Is there a a reason why we're not using that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't support multi-turn conversations at this point. It may be possible to extend it by plumbing through the conversation ID but there will be more work on this orchestrator anyway and I don't want to hold it up further.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prompt normalizer just sends the prompts directly using the send_prompt_async function, but here I'm using the send_master_key_prompt_async function instead becuase I need it to the master key prompt first, and then follow it up with the attack prompt.

Comment on lines +154 to +156
def _chunked_prompts(self, prompts, size):
for i in range(0, len(prompts), size):
yield prompts[i : i + size]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add type hints.
  • Is there a specific reason why we're using generators here? I'm a fan but it seems a bit of an anti-pattern here.
  • This function might not be needed at all, consider removing it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from prompt normalizer batching code. I suppose it makes it a tad cleaner than having this indexing logic mixed in with the rest of the code. I don't really care either way tbh.

@romanlutz romanlutz merged commit 82df0be into microsoft:main Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants