How do you measure the after-attack accuracy? #17

Closed
mahossam opened this issue Mar 30, 2020 · 6 comments

@mahossam

I am currently replicating the numbers from your paper, and I am not sure how you measure the after-attack accuracy.
When I looked into the code, you do this:

if true_label != new_label:
    adv_failures += 1
...

(1 - adv_failures / 1000) * 100.0

This suggests that in "adv_failures" you also count non-flipped labels (the cases where true_label != original_label) as successful attacks. I don't see why. Aren't successful attacks supposed to be only the ones that flip the original model's label while the true and original labels are equal (and then divided by the number of correctly predicted samples, i.e. the true positives and negatives, not the whole data)?

@jind11
Owner

jind11 commented Mar 30, 2020

The whole test data can be divided into two parts: one part is the examples that cannot be correctly predicted by the target model, and for this part we do not perform any attack; the other part is the examples that the target model predicts correctly, and for these we create adversarial examples. adv_failures likewise has two parts: the first part is the examples that the original model cannot predict correctly; the other part is the examples whose model prediction we successfully flip with the adversarial examples. Therefore, the after-attack accuracy is the accuracy of the target model on the whole adversarial data. I hope this explanation resolves your question. If so, let me know. Thanks!
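A rough sketch of this bookkeeping in Python (the function and variable names here are illustrative, not the exact ones used in the repository):

def after_attack_accuracy(true_labels, orig_labels, new_labels):
    # true_labels: gold labels
    # orig_labels: target-model predictions on the clean inputs
    # new_labels: predictions after attacking (new == orig when no attack was performed)
    adv_failures = 0
    for true, orig, new in zip(true_labels, orig_labels, new_labels):
        if true != orig:
            adv_failures += 1   # part 1: the target model was already wrong, no attack performed
        elif true != new:
            adv_failures += 1   # part 2: the attack flipped a previously correct prediction
    return (1 - adv_failures / len(true_labels)) * 100.0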

@zhangshuoyang

In that case, how does it lead to the statement that "the larger gap between the original and after-attack accuracy signals the more successful our attack is"?

@jind11
Owner

jind11 commented Apr 2, 2020

The attack success rate can be calculated by: (original_accuracy - after_attack_accuracy) / original_accuracy * 100%.
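In code form, this is simply the following (a sketch only, not taken from the repository; both inputs are percentages):

def attack_success_rate(original_accuracy, after_attack_accuracy):
    # e.g. original_accuracy = 90.0, after_attack_accuracy = 10.0 -> ~88.9 (%)
    return (original_accuracy - after_attack_accuracy) / original_accuracy * 100.0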

@mahossam
Author

mahossam commented Apr 7, 2020

Thank you @jind11 for the response.
However, I am still a little bit confused. The description in your latest answer:
"The attack success rate can be calculated by: (original_accuracy - after_attack_accuracy) / original_accuracy * 100%."
does not seem consistent with the main results of the paper. adv_failures is computed over the whole test set, including the samples that cannot be correctly predicted by the target model. This is the part that confuses me: adv_failures does not distinguish between samples the target model predicts correctly and those it does not.

Also, another question:
What vocabulary size did you use for each classifier (BERT, WordCNN, and WordLSTM)?
I found a file called vocab.txt that contains 78K words, while the bert_config.json for BERT contains 30K. For the other classifiers, I didn't find any hint of the vocabulary size.

Thank you.
Cheers.

@jind11
Owner

jind11 commented Apr 7, 2020

For the first question, I can illustrate with an example: suppose we have 100 examples and the original accuracy is 90%; then we have 90 examples that are correctly predicted, and we generate adversarial examples for them. adv_failures (from which the after_attack_accuracy is computed) counts the prediction failures, which come from two parts: one part is created by the adversarial attack, and the other comes from the original errors of the target model. For example, if the after_attack_accuracy is 10%, then adv_failures is 90, among which 10 are from the original errors and 80 are created by the attack model. So (original_accuracy - after_attack_accuracy) measures how many examples are successfully attacked by our attack model; in this example it is 80%, that is, 80 examples are successfully attacked. Then we can calculate the attack success rate as 80 / 90 * 100%, which means that among the 90 examples we try to attack, we have 80 success cases. I hope this resolves your confusion.
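Checking the arithmetic of this example in Python (the numbers are from the illustration above, not from an actual run):

total = 100
original_accuracy = 90.0          # 90 of 100 examples predicted correctly
after_attack_accuracy = 10.0      # 10 of 100 still predicted correctly after the attack
adv_failures = total - 10         # = 90: 10 original errors + 80 flipped by the attack
successfully_attacked = 90 - 10   # = 80 examples
attack_success_rate = successfully_attacked / 90 * 100.0   # ~88.9%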
For the second question, the BERT model uses a ~30K vocab, the same as the original official release. For WordCNN and WordLSTM, the vocab is between 10K and 20K.

@mahossam
Author

mahossam commented Apr 8, 2020

Thank you @jind11, this makes sense to me now. Thank you for taking the time to explain in detail.

For the replication of the WordCNN and WordLSTM results, I posted a new issue regarding vocab sizes here.
Thank you very much, looking forward to your response.
Cheers.

@mahossam mahossam closed this as completed Apr 8, 2020