Attack goal function details #75

Closed
ary4n99 opened this issue Jun 12, 2024 · 2 comments

ary4n99 commented Jun 12, 2024

Hey, what's the reason behind using the difference in scores for the attack goal function, rather than the PDR (performance drop rate)?

def _get_score(self, model_output, _):
    # Score is the absolute drop: the clean (ground-truth) score minus
    # the score the model achieves under the attacked prompt.
    score = self.ground_truth_output - model_output
    return score
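
For context, here's a rough sketch of the two metrics as I understand them, assuming PDR is the performance drop rate from the paper, i.e. one minus the ratio of the attacked score to the clean score (the names below are just illustrative, not from the repo):

def score_difference(clean_score, adv_score):
    # Absolute drop, as _get_score computes above.
    return clean_score - adv_score

def pdr(clean_score, adv_score):
    # Relative drop: performance drop rate.
    return 1.0 - adv_score / clean_score

print(score_difference(0.8, 0.6))  # 0.2  (absolute drop)
print(pdr(0.8, 0.6))               # 0.25 (relative drop)

For a fixed clean score, both are monotone in the attacked score, so they should rank candidate prompts the same way.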

Also, why has maximizable=True not been passed to the goal function?

class AdvPromptGoalFunction(GoalFunction):
    """A goal function defined on a model that outputs a probability for some
    number of classes."""

    def __init__(self, model, dataset, eval_func, query_budget, *args,
                 target_max_acc=0, verbose=True, **kwargs):
        super().__init__(*args, **kwargs)
        # ... (rest of the constructor omitted here)
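
For reference, my understanding of maximizable, loosely paraphrasing the TextAttack-style base GoalFunction (exact details may differ between versions), is that it changes the attack's stopping behaviour rather than its scoring:

from enum import Enum

class GoalStatus(Enum):
    SUCCEEDED = 0
    SEARCHING = 1
    MAXIMIZING = 2

def get_goal_status(maximizable, goal_complete):
    # With maximizable=True the attack never reports success early; it
    # keeps searching for a higher score until the query budget runs out.
    if maximizable:
        return GoalStatus.MAXIMIZING
    if goal_complete:
        return GoalStatus.SUCCEEDED
    return GoalStatus.SEARCHING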

Immortalise (Collaborator) commented

  1. Using PDR would also be reasonable; I think it has a similar impact to the score difference, since both measure how far the attacked score falls below the clean score.
  2. In the adversarial prompt goal function, the maximizable argument is redundant.


ary4n99 commented Jun 16, 2024

Thank you for clarifying!

ary4n99 closed this as completed Jun 16, 2024