Attack goal function details #75

Closed
ary4n99 opened this issue Jun 12, 2024 · 2 comments

ary4n99 commented Jun 12, 2024

Hey, what's the reason behind using the difference in scores for the attack goal function, rather than the PDR (performance drop rate)?

def _get_score(self, model_output, _):
    # Score is the absolute drop: the clean (ground-truth) score minus
    # the score the model achieves under the attacked prompt.
    score = self.ground_truth_output - model_output
    return score
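
For context, here's a rough sketch of the two metrics as I understand them, assuming PDR is the performance drop rate from the paper, i.e. one minus the ratio of the attacked score to the clean score (the names below are just illustrative, not from the repo):

def score_difference(clean_score, adv_score):
    # Absolute drop, as _get_score computes above.
    return clean_score - adv_score

def pdr(clean_score, adv_score):
    # Relative drop: performance drop rate.
    return 1.0 - adv_score / clean_score

print(score_difference(0.8, 0.6))  # 0.2  (absolute drop)
print(pdr(0.8, 0.6))               # 0.25 (relative drop)

For a fixed clean score, both are monotone in the attacked score, so they should rank candidate prompts the same way.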

Also, why has maximizable=True not been passed to the goal function?

class AdvPromptGoalFunction(GoalFunction):
    """A goal function defined on a model that outputs a probability for some
    number of classes."""

    def __init__(self, model, dataset, eval_func, query_budget, *args,
                 target_max_acc=0, verbose=True, **kwargs):
        super().__init__(*args, **kwargs)
        # ... (rest of the constructor omitted here)
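
For reference, my understanding of maximizable, loosely paraphrasing the TextAttack-style base GoalFunction (exact details may differ between versions), is that it changes the attack's stopping behaviour rather than its scoring:

from enum import Enum

class GoalStatus(Enum):
    SUCCEEDED = 0
    SEARCHING = 1
    MAXIMIZING = 2

def get_goal_status(maximizable, goal_complete):
    # With maximizable=True the attack never reports success early; it
    # keeps searching for a higher score until the query budget runs out.
    if maximizable:
        return GoalStatus.MAXIMIZING
    if goal_complete:
        return GoalStatus.SUCCEEDED
    return GoalStatus.SEARCHING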

Immortalise (Collaborator) commented

  1. Using PDR would also be reasonable; I think it has a similar impact to the score difference, since both measure how far the attacked score falls below the clean score.
  2. In the adversarial prompt goal function, the maximizable argument is redundant.


ary4n99 commented Jun 16, 2024

Thank you for clarifying!

ary4n99 closed this as completed Jun 16, 2024