FEAT: Use registry-based default objective scorer in scenarios #1528
I'm assuming we're only updating the content_harms, jailbreaks, and foundry scenarios to use the default objective scorer from core because they most closely resemble that type of scorer, while all other scenarios have custom handling for the objective scorer that may rely on the user having a "gpt-4o-unsafe" model. I see value in removing the duplicate code in this PR and in letting these scenarios use metrics-informed scorers from the registry when possible, but the other scenarios don't get that same benefit. Are there follow-up stories we need to create so all scenarios are covered?
nina-msft
left a comment
I think this is a net-positive change. We may want to consider follow-up stories for other scenarios (based on my comment below) so that we can offer this advantage across the board for scenarios where it makes sense.
Most of the comments look like nits but all are small improvements - please take if no objections :)
Centralizes the default objective scorer into `Scenario._get_default_objective_scorer()`, replacing duplicated hardcoded scorer construction across `ContentHarms`, `Jailbreak`, and `RedTeamAgent`. The base class method checks the `ScorerRegistry` for a scorer tagged `DEFAULT_OBJECTIVE_SCORER` (the best F1 scorer from initialization), falling back to `TrueFalseInverterScorer(SelfAskRefusalScorer(OpenAIChatTarget()))`.
Note that we also run scorer metrics from the registry. Because of this, there was previously a mismatch between the Scenario default scorers and the metrics. Now these match, so metrics are reported.
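For reviewers who want the lookup-then-fallback shape in one place, here is a minimal, self-contained sketch. It does not use the project's real classes: `ToyScorerRegistry`, `RegisteredScorer`, `find_by_tag`, the tag string value, and the scorer names are all hypothetical stand-ins, and the actual `ScorerRegistry` API in this PR may look different.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# Hypothetical tag value; the real constant's value is not shown in this PR text.
DEFAULT_OBJECTIVE_SCORER = "default_objective_scorer"


@dataclass
class RegisteredScorer:
    """Stand-in for a scorer instance held by a registry (hypothetical)."""
    name: str
    tags: Set[str] = field(default_factory=set)


class ToyScorerRegistry:
    """Toy registry used only for this sketch; not the project's ScorerRegistry."""

    def __init__(self) -> None:
        self._entries: List[RegisteredScorer] = []

    def register(self, scorer: RegisteredScorer) -> None:
        self._entries.append(scorer)

    def find_by_tag(self, tag: str) -> Optional[RegisteredScorer]:
        # Return the first scorer carrying the tag, or None if there is none.
        for entry in self._entries:
            if tag in entry.tags:
                return entry
        return None


def get_default_objective_scorer(
    registry: ToyScorerRegistry,
    fallback_factory: Callable[[], RegisteredScorer],
) -> RegisteredScorer:
    """Prefer the registry's tagged scorer; build the fallback only if needed."""
    tagged = registry.find_by_tag(DEFAULT_OBJECTIVE_SCORER)
    if tagged is not None:
        return tagged
    return fallback_factory()


def make_fallback() -> RegisteredScorer:
    # Stands in for the inverted refusal scorer fallback described above.
    return RegisteredScorer(name="inverted_refusal_fallback")


registry = ToyScorerRegistry()

# Nothing is tagged yet, so the fallback is constructed.
empty_result = get_default_objective_scorer(registry, make_fallback)

# Once a tagged scorer is registered, it takes precedence over the fallback.
registry.register(RegisteredScorer(name="best_f1_scorer", tags={DEFAULT_OBJECTIVE_SCORER}))
tagged_result = get_default_objective_scorer(registry, make_fallback)
```

Passing the fallback as a factory rather than an instance means the fallback scorer (and any model target it wraps) is only constructed when the registry has no tagged entry, which is one plausible reason to centralize this logic in the base class.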