How do you measure the after-attack accuracy? #17

Closed
mahossam opened this issue Mar 30, 2020 · 6 comments

@mahossam

I am currently replicating the numbers from your paper, and I am not sure how you measure the after-attack accuracy.
When I looked into the code, you do this:

if true_label != new_label:
    adv_failures += 1
...

(1 - adv_failures / 1000) * 100.0

This suggests that in "adv_failures" you also count non-flipped labels (the cases where true_label != original_label) as successful attacks. I don't see why. Aren't successful attacks supposed to be only the ones that flip the original model's label while the true and original labels are equal (and then divided by the number of correctly predicted samples, i.e. the true positives and negatives, not the whole data)?

@jind11
Owner

jind11 commented Mar 30, 2020

The whole test data can be divided into two parts: one part is the examples that cannot be correctly predicted by the target model, and for this part we do not perform any attack; the other part is the examples that the target model predicts correctly, and for these we create adversarial examples. adv_failures likewise has two parts: the first part is the examples that the original model cannot predict correctly; the other part is the examples whose model prediction we successfully flip with the adversarial examples. Therefore, the after-attack accuracy is the accuracy of the target model on the whole adversarial data. I hope this explanation resolves your question. If so, let me know. Thanks!
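A rough sketch of this bookkeeping in Python (the function and variable names here are illustrative, not the exact ones used in the repository):

def after_attack_accuracy(true_labels, orig_labels, new_labels):
    # true_labels: gold labels
    # orig_labels: target-model predictions on the clean inputs
    # new_labels: predictions after attacking (new == orig when no attack was performed)
    adv_failures = 0
    for true, orig, new in zip(true_labels, orig_labels, new_labels):
        if true != orig:
            adv_failures += 1   # part 1: the target model was already wrong, no attack performed
        elif true != new:
            adv_failures += 1   # part 2: the attack flipped a previously correct prediction
    return (1 - adv_failures / len(true_labels)) * 100.0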

@zhangshuoyang

In that case, how does it lead to the statement that "the larger gap between the original and after-attack accuracy signals the more successful our attack is"?

@jind11
Owner

jind11 commented Apr 2, 2020

The attack success rate can be calculated by: (original_accuracy - after_attack_accuracy) / original_accuracy * 100%.
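In code form, this is simply the following (a sketch only, not taken from the repository; both inputs are percentages):

def attack_success_rate(original_accuracy, after_attack_accuracy):
    # e.g. original_accuracy = 90.0, after_attack_accuracy = 10.0 -> ~88.9 (%)
    return (original_accuracy - after_attack_accuracy) / original_accuracy * 100.0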

@mahossam
Author

mahossam commented Apr 7, 2020

Thank you @jind11 for the response.
However, I am still a little bit confused. The description in your latest answer:
"The attack success rate can be calculated by: (original_accuracy - after_attack_accuracy) / original_accuracy * 100%."
does not seem consistent with the main results of the paper. adv_failures is computed over the whole test set, including the samples that cannot be correctly predicted by the target model. This is the part that confuses me: adv_failures does not distinguish between samples the target model predicts correctly and those it does not.

Also, another question:
What vocabulary size did you use for each classifier (BERT, WordCNN, and WordLSTM)?
I found a file called vocab.txt that contains 78K words, while the bert_config.json for BERT contains 30K. For the other classifiers, I didn't find any hint of the vocabulary size.

Thank you.
Cheers.

@jind11
Owner

jind11 commented Apr 7, 2020

For the first question, I can illustrate with an example: suppose we have 100 examples and the original accuracy is 90%; then we have 90 examples that are correctly predicted, and we generate adversarial examples for them. adv_failures (from which the after_attack_accuracy is computed) counts the prediction failures, which come from two parts: one part is created by the adversarial attack, and the other comes from the original errors of the target model. For example, if the after_attack_accuracy is 10%, then adv_failures is 90, among which 10 are from the original errors and 80 are created by the attack model. So (original_accuracy - after_attack_accuracy) measures how many examples are successfully attacked by our attack model; in this example it is 80%, that is, 80 examples are successfully attacked. Then we can calculate the attack success rate as 80 / 90 * 100%, which means that among the 90 examples we try to attack, we have 80 success cases. I hope this resolves your confusion.
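Checking the arithmetic of this example in Python (the numbers are from the illustration above, not from an actual run):

total = 100
original_accuracy = 90.0          # 90 of 100 examples predicted correctly
after_attack_accuracy = 10.0      # 10 of 100 still predicted correctly after the attack
adv_failures = total - 10         # = 90: 10 original errors + 80 flipped by the attack
successfully_attacked = 90 - 10   # = 80 examples
attack_success_rate = successfully_attacked / 90 * 100.0   # ~88.9%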
For the second question, the BERT model uses a ~30K vocab, the same as the original official release. For WordCNN and WordLSTM, the vocab is between 10K and 20K.

@mahossam
Author

mahossam commented Apr 8, 2020

Thank you @jind11, this makes sense to me now. Thank you for taking the time to explain in detail.

For the replication of the WordCNN and WordLSTM results, I posted a new issue regarding vocab sizes here.
Thank you very much, looking forward to your response.
Cheers.

@mahossam mahossam closed this as completed Apr 8, 2020