diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md
index 80839dc..58568ed 100644
--- a/INSTRUCTIONS.md
+++ b/INSTRUCTIONS.md
@@ -65,10 +65,11 @@
 Note:
 
 1. As mentioned in the paper, the threat model is:
-   1. targeted PGD attack with one random uniform target label associated with each image
-   2. maximum perturbation per pixel is 16.
+   1. __Targeted attack__, with one target label associated with each image. The target label is
+      independently generated by uniformly sampling the incorrect labels.
+   2. Maximum perturbation per pixel is 16.
 
-   We do not consider untargeted attack, nor do we let the attacker control the target label,
+   We do not consider untargeted attacks, nor do we let the attacker control the target labels,
    because we think such tasks are not realistic on the ImageNet-1k categories.
 
 2. For each (attacker, model) pair, we provide both the __error rate__ of our model,
diff --git a/README.md b/README.md
index 0ec7702..ec505a3 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ By combining large-scale adversarial training and feature-denoising layers,
 we developed ImageNet classifiers with strong adversarial robustness.
 
 Trained on __128 GPUs__, our ImageNet classifier has 42.6% accuracy against an extremely strong
-__2000-steps white-box__ PGD attacker.
+__2000-steps white-box__ PGD targeted attack.
 This is a scenario where no previous models have achieved more than 1% accuracy.
 
 On black-box adversarial defense, our method won the __champion of defense track__ in the