FEAT: Adding Master Key Jailbreak #248
@microsoft-github-policy-service agree company="Microsoft"
```python
await self._prompt_normalizer.send_prompt_async(
    normalizer_request=NormalizerRequest([target_master_prompt_obj]),
    target=self._prompt_target,
    conversation_id=conversation_id,
    labels=self._global_memory_labels,
    orchestrator_identifier=self.get_identifier(),
)
```
This will probably work most of the time, but I wonder if we might need some error handling in case things don't go as planned. Have we seen any issues during testing that we might want to handle here?
I haven't run into any issues with this part of the code yet, but I'll keep an eye out for it.
Targets handle retries, if that's what you mean, @dlmgary.
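Since targets already retry, the orchestrator probably does not need its own layer here. If extra handling were ever wanted at the call site, a minimal sketch using only `asyncio` could look like the following. `send_with_retries`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as the "transient" errors are assumptions for illustration, not PyRIT APIs.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def send_with_retries(
    send: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await send() and retry assumed-transient errors with exponential backoff.

    Hypothetical helper for illustration only; not part of PyRIT.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await send()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Backoff doubles each time (base, 2*base, 4*base, ...) plus small jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# Usage: a fake send that fails twice with a transient error, then succeeds.
calls = 0


async def flaky_send() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(send_with_retries(flaky_send, base_delay=0.01))
print(result, calls)  # ok 3
```

Retrying only the exception types the caller treats as transient means a malformed request still fails immediately instead of being retried.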
```python
    )
)

batch_results = await asyncio.gather(*tasks)
```
The PromptNormalizer handles all of this and does the asyncio.gather() for you. Is there a reason why we're not using that?
It doesn't support multi-turn conversations at this point. It may be possible to extend it by plumbing through the conversation ID but there will be more work on this orchestrator anyway and I don't want to hold it up further.
The prompt normalizer just sends the prompts directly using the send_prompt_async function, but here I'm using the send_master_key_prompt_async function instead because I need it to send the master key prompt first and then follow it up with the attack prompt.
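The ordering constraint described above (master key prompt first, then the attack prompt, in the same conversation) is why a plain batch send does not fit. A minimal sketch of that shape, assuming a toy in-memory target: each conversation awaits its two turns in order, while separate conversations run concurrently under `asyncio.gather`. `FakeTarget`, `send_master_key_then_attack`, and `send_all` are hypothetical stand-ins, not the orchestrator's actual `send_master_key_prompt_async` or PyRIT's target classes.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeTarget:
    """Hypothetical stand-in for a chat target: records prompts per conversation."""
    history: dict[str, list[str]] = field(default_factory=dict)

    async def send(self, prompt: str, conversation_id: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.history.setdefault(conversation_id, []).append(prompt)
        return f"response to: {prompt}"


async def send_master_key_then_attack(
    target: FakeTarget, master_key_prompt: str, attack_prompt: str
) -> str:
    # One conversation ID per attack so both turns share the same context.
    conversation_id = str(uuid.uuid4())
    # Turn 1: the master key prompt; it must complete before turn 2 is sent.
    await target.send(master_key_prompt, conversation_id)
    # Turn 2: the attack prompt, sent into the same conversation.
    return await target.send(attack_prompt, conversation_id)


async def send_all(
    target: FakeTarget, master_key_prompt: str, attack_prompts: list[str]
) -> list[str]:
    # Conversations are independent, so they run concurrently; ordering is
    # enforced only within each conversation. gather() keeps input order.
    tasks = [
        send_master_key_then_attack(target, master_key_prompt, p)
        for p in attack_prompts
    ]
    return await asyncio.gather(*tasks)


target = FakeTarget()
results = asyncio.run(send_all(target, "MASTER KEY", ["attack A", "attack B"]))
print(results)  # ['response to: attack A', 'response to: attack B']
```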
```python
def _chunked_prompts(self, prompts, size):
    for i in range(0, len(prompts), size):
        yield prompts[i : i + size]
```
- Add type hints.
- Is there a specific reason why we're using generators here? I'm a fan, but it seems like a bit of an anti-pattern here.
- This function might not be needed at all, consider removing it.
This is copied from the prompt normalizer's batching code. I suppose it's a tad cleaner than mixing this indexing logic in with the rest of the code. I don't really care either way, tbh.
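On the review points about type hints and generators: a minimal typed version of the same batching helper, written as standalone functions for illustration (`chunked` and `chunked_list` are hypothetical names, not the PR's code), with an eager list-returning variant for comparison.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Lazily yield consecutive slices of `items`, each with at most `size` elements."""
    # Being a generator, this check runs on the first next() call, not at call time.
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def chunked_list(items: Sequence[T], size: int) -> list[Sequence[T]]:
    """Eager alternative: build every batch up front and return them as a list."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [items[start : start + size] for start in range(0, len(items), size)]


prompts = ["p1", "p2", "p3", "p4", "p5"]
print(list(chunked(prompts, 2)))  # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
print(chunked_list(prompts, 2))  # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
```

The generator avoids materializing every batch up front, which matters little for a handful of prompts; the eager version is easier to inspect and validates `size` immediately. On Python 3.12 and later, the standard library's `itertools.batched(prompts, size)` covers the same need, yielding tuples instead of slices.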
Description
Opening this after accidentally closing #235 due to git errors.
Tests and Documentation