# Evaluating "Combating Adversaries with Anti-Adversaries"

Note on runtime: To stay within the free Colab limits, this notebook evaluates only 100 images with 1k RayS queries each. This takes about 3 hours on a K80 GPU; larger runs will probably time out.

To reproduce the full evaluation (1000 images with 10k queries each), run the notebook on a faster GPU and/or a custom runtime.

Paper: https://arxiv.org/pdf/2103.14347.pdf

# Setup

In [1]:
# Clone repo and install some dependencies
!git clone https://github.com/MotasemAlfarra/Combating-Adversaries-with-Anti-Adversaries
!git clone https://github.com/uclaml/RayS
!sed -i "s/_, term_width = os.popen('stty size', 'r').read().split()/term_width=80/g" RayS/pgbar.py     # Fix terminal width in pgbar code
!pip install git+https://github.com/fra31/auto-attack
import sys
sys.path.insert(0,'Combating-Adversaries-with-Anti-Adversaries')
sys.path.insert(0,'RayS')

Cloning into 'Combating-Adversaries-with-Anti-Adversaries'...
remote: Enumerating objects: 74, done.
remote: Counting objects: 100% (74/74), done.
remote: Compressing objects: 100% (68/68), done.
remote: Total 74 (delta 36), reused 12 (delta 4), pack-reused 0
Unpacking objects: 100% (74/74), done.
Cloning into 'RayS'...
remote: Enumerating objects: 123, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (86/86), done.
remote: Total 123 (delta 68), reused 80 (delta 31), pack-reused 0
Receiving objects: 100% (123/123), 5.62 MiB | 4.61 MiB/s, done.
Resolving deltas: 100% (68/68), done.
Collecting git+https://github.com/fra31/auto-attack
  Cloning https://github.com/fra31/auto-attack to /tmp/pip-req-build-hno0st4g
  Running command git clone -q https://github.com/fra31/auto-attack /tmp/pip-req-build-hno0st4g
Building wheels for collected packages: autoattack
  Building wheel for autoattack (setup.py) ... done
  Create

# Params

In [2]:
eps_linf = 0.031         # Almost 8/255. This is the exact value as used in the defense reference implementation
n_imgs = 100             # Don't eval more than n images
batch_size = 100
batch_size = min(batch_size, n_imgs)

rays_n_queries = 1000
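
The "almost 8/255" remark on `eps_linf` can be checked with a line of arithmetic. The snippet below is only an illustration and is not part of the evaluation; the names `eps_exact` and `gap` are introduced here for that purpose.

```python
# Compare the budget used in this notebook with the nominal 8/255 budget.
eps_linf = 0.031       # value used above (taken from the defense's reference implementation)
eps_exact = 8 / 255    # nominal L-inf budget, roughly 0.03137

gap = eps_exact - eps_linf   # hypothetical helper name, illustration only
print(f"8/255 = {eps_exact:.5f}, eps_linf = {eps_linf}, gap = {gap:.5f}")
```

The attacks in this notebook therefore get a marginally smaller budget than 8/255.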

# Get CIFAR10


In [3]:
# Get CIFAR10 dataset
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     ])     # The model already contains the preprocessing

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=1)

# Move everything into memory
x_test_clean = torch.cat([x for (x, y) in testloader], 0).to("cuda")[:n_imgs]
y_test = torch.cat([y for (x, y) in testloader], 0).to("cuda")[:n_imgs]

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data


# Set up the pretrained AWP classifier

Using the same model as in the reference implementation at https://github.com/MotasemAlfarra/Combating-Adversaries-with-Anti-Adversaries. This model is adversarially trained with Adversarial Weight Perturbation, so it is already quite robust. The defense (Anti-Adversaries) is included as a preprocessing step and claims to further increase robustness.
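
As a rough illustration of what such a preprocessing step might look like, the sketch below perturbs the input for `k` signed-gradient steps of size `alpha` so that the loss with respect to the classifier's own prediction goes down, and then classifies the perturbed input. This is an assumption-laden sketch based on the general idea of the paper, not the reference implementation: the class name `AntiAdversarySketch`, the toy linear classifier, and details such as the absence of clipping are all hypothetical choices made here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AntiAdversarySketch(nn.Module):
    """Hypothetical wrapper: nudges the input toward the classifier's own prediction."""

    def __init__(self, classifier, k=2, alpha=0.15):
        super().__init__()
        self.classifier = classifier
        self.k = k          # number of anti-adversary steps (k=0 disables the step)
        self.alpha = alpha  # size of each signed-gradient step

    def forward(self, x):
        if self.k == 0:
            return self.classifier(x)

        # Use the classifier's own prediction as the label to reinforce.
        with torch.no_grad():
            pseudo_labels = self.classifier(x).argmax(dim=1)

        # Gradients are needed even if the caller runs under torch.no_grad().
        with torch.enable_grad():
            gamma = torch.zeros_like(x, requires_grad=True)
            for _ in range(self.k):
                loss = F.cross_entropy(self.classifier(x + gamma), pseudo_labels)
                (grad,) = torch.autograd.grad(loss, gamma)
                # Descend the loss (an adversarial step would ascend it).
                gamma = (gamma - self.alpha * grad.sign()).detach().requires_grad_(True)

        return self.classifier(x + gamma.detach())


# Toy stand-in for a real network, only to make the sketch executable.
torch.manual_seed(0)
toy_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
x = torch.rand(4, 3, 32, 32)

with torch.no_grad():
    logits_plain = AntiAdversarySketch(toy_classifier, k=0)(x)
    logits_defended = AntiAdversarySketch(toy_classifier, k=2, alpha=0.15)(x)
print(logits_plain.shape, logits_defended.shape)
```

With `k=0` the wrapper reduces to the plain classifier, which mirrors how the undefended model is configured in the next cell.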

In [4]:
# Get pretrained AWP model, same as used in the reference implementation
!gdown --id 1sSjh4i2imdoprw_JcPj2cZzrJm0RIRI6
!mkdir -p weights
!mv RST-AWP_cifar10_linf_wrn28-10.pt weights/newmodel1_RST-AWP_cifar10_linf_wrn28-10.pt
from experiments.adv_weight_pert import get_model as get_awp_model
model_undefended = get_awp_model(k=0, alpha=0).eval().to("cuda")
model_defended = get_awp_model(k=2, alpha=0.15).eval().to("cuda")

Downloading...
From: https://drive.google.com/uc?id=1sSjh4i2imdoprw_JcPj2cZzrJm0RIRI6
To: /content/RST-AWP_cifar10_linf_wrn28-10.pt
100% 153M/153M [00:01<00:00, 83.4MB/s]


In [5]:
import tqdm

def eval_acc(model, x, y_gt):
    """Return the fraction of samples in x that the model classifies as y_gt."""
    assert x.shape[0] == y_gt.shape[0]
    n = x.shape[0]

    # Round the batch count up so that a final partial batch is still evaluated
    n_batches = n // batch_size
    if n % batch_size != 0:
        n_batches += 1

    correct = 0
    with torch.no_grad():
        for i_batch in tqdm.tqdm(range(n_batches)):
            excerpt = slice(i_batch * batch_size, (i_batch + 1) * batch_size)
            outputs = model(x[excerpt])
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == y_gt[excerpt]).sum().item()
    return correct / n
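
The batch bookkeeping in `eval_acc` amounts to a ceiling division followed by slicing, where the last slice may run past the end of the data. A minimal standalone sketch of that pattern, using a plain list and a hypothetical helper named `batch_slices`:

```python
import math


def batch_slices(n, batch_size):
    """Yield slices that cover n items in chunks of at most batch_size."""
    n_batches = math.ceil(n / batch_size)  # n // batch_size, plus 1 if there is a remainder
    for i_batch in range(n_batches):
        # The final slice may extend past n; slicing simply truncates it.
        yield slice(i_batch * batch_size, (i_batch + 1) * batch_size)


items = list(range(250))
chunks = [items[s] for s in batch_slices(len(items), batch_size=100)]
print([len(c) for c in chunks])  # [100, 100, 50]
```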

# Clean accuracy

The undefended and defended models reach the same clean accuracy on these images (88%), which is in line with the paper.

In [6]:
print(f"Undefended model accuracy on clean imgs: {eval_acc(model_undefended, x=x_test_clean, y_gt=y_test)}")
print(f"Defended model accuracy on clean imgs: {eval_acc(model_defended, x=x_test_clean, y_gt=y_test)}")

100%|██████████| 1/1 [00:00<00:00,  1.82it/s]


Undefended model accuracy on clean imgs: 0.88


100%|██████████| 1/1 [00:01<00:00,  1.36s/it]

Defended model accuracy on clean imgs: 0.88





# Reproducing paper claims with AutoAttack

Only running APGD-CE for brevity.

In [7]:
from autoattack import AutoAttack
print("AutoAttack on undefended model:")
adversary = AutoAttack(model_undefended.forward, norm='Linf', eps=eps_linf, verbose=True)
adversary.attacks_to_run = ['apgd-ce']
adv_autoattack_undefended = adversary.run_standard_evaluation_individual(x_orig=x_test_clean, y_orig=y_test, bs=batch_size)

AutoAttack on undefended model:
setting parameters for standard version
using standard version including apgd-ce





robust accuracy by APGD-CE 	 62.00% 	 (time attack: 45.0 s)


In [8]:
print("AutoAttack on defended model:")
adversary = AutoAttack(model_defended.forward, norm='Linf', eps=eps_linf, verbose=True)
adversary.attacks_to_run = ['apgd-ce']
adv_autoattack_defended = adversary.run_standard_evaluation_individual(x_orig=x_test_clean, y_orig=y_test, bs=batch_size)

AutoAttack on defended model:
setting parameters for standard version
using standard version including apgd-ce
robust accuracy by APGD-CE 	 81.00% 	 (time attack: 174.7 s)


The results are similar to those in the paper: when attacked directly, the defense raises robust accuracy against APGD-CE from 62% to 81%.

# A trivial transfer attack

The defense is easily circumvented by transferring adversarial examples from the underlying model:

In [9]:
x_test_adv = adv_autoattack_undefended['apgd-ce']
print(f"Undefended model accuracy on adv. examples created for undefended model: {eval_acc(model_undefended, x=x_test_adv, y_gt=y_test)}")
print(f"Defended model accuracy on adv. examples created for undefended model: {eval_acc(model_defended, x=x_test_adv, y_gt=y_test)}")

100%|██████████| 1/1 [00:00<00:00,  5.08it/s]


Undefended model accuracy on adv. examples created for undefended model: 0.62


100%|██████████| 1/1 [00:01<00:00,  1.40s/it]

Defended model accuracy on adv. examples created for undefended model: 0.62
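
When reusing adversarial examples like this, it can be worth confirming that they still respect the perturbation budget. The helper below is a hypothetical sanity check, not part of the original evaluation; the name `check_linf_budget`, the tolerance `tol`, and the random toy tensors (standing in for `x_test_clean` and `x_test_adv`) are assumptions made for illustration.

```python
import torch


def check_linf_budget(x_adv, x_clean, eps, tol=1e-6):
    """Return per-image L-inf distances and whether each one is within eps (+ tol)."""
    dist = (x_adv - x_clean).abs().flatten(start_dim=1).max(dim=1).values
    return dist, dist <= eps + tol


# Toy data: clean images in [0, 1] plus noise bounded by the budget.
torch.manual_seed(0)
x_clean = torch.rand(5, 3, 32, 32)
noise = (torch.rand_like(x_clean) * 2 - 1) * 0.031
x_adv = (x_clean + noise).clamp(0, 1)

dist, within = check_linf_budget(x_adv, x_clean, eps=0.031)
print(dist, within)
```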





# Decision-based attack with RayS

However, the defense can also be attacked successfully without any knowledge of the underlying classifier, using only the hard-label decisions of the defended model.
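
In the decision-based (hard-label) setting, the attacker sees only the predicted class for each query, never logits or gradients, and is typically limited by a query budget. The wrapper below is a hypothetical illustration of that interface in plain Python; the names `HardLabelOracle` and `toy_scores` are made up here, and this is not how `GeneralTorchModel` from the RayS repository is implemented.

```python
class HardLabelOracle:
    """Hypothetical wrapper exposing only the predicted class and counting queries."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # any callable returning a list of class scores
        self.n_queries = 0

    def predict_label(self, x):
        self.n_queries += 1
        scores = self._score_fn(x)
        # Return only the argmax; the scores themselves stay hidden from the caller.
        return max(range(len(scores)), key=scores.__getitem__)


# Toy two-class "model": class 1 if the mean of x exceeds 0.5.
def toy_scores(x):
    mean = sum(x) / len(x)
    return [1.0 - mean, mean]


oracle = HardLabelOracle(toy_scores)
labels = [oracle.predict_label(x) for x in ([0.1, 0.2], [0.9, 0.8], [0.4, 0.7])]
print(labels, oracle.n_queries)  # [0, 1, 1] 3
```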

## Setup

In [None]:
from general_torch_model import GeneralTorchModel
from RayS import RayS

def run_rays_attack(model, n_queries):
  print(f"Running RayS attack with {n_queries} queries. This could take a while...")
  rays_torch_model = GeneralTorchModel(model, n_class=10, im_mean=None, im_std=None)
  attack = RayS(rays_torch_model, epsilon=eps_linf)

  n_batches = x_test_clean.shape[0] // batch_size
  if x_test_clean.shape[0] % batch_size != 0:
    n_batches += 1

  n_total = 0
  n_robust_correct = 0

  progress_bar = tqdm.tqdm(range(n_batches))
  for i_batch in progress_bar:    
      excerpt = slice(i_batch * batch_size, (i_batch+1) * batch_size)
      x_batch_clean = x_test_clean[excerpt]
      y_batch_gt = y_test[excerpt]

      x_batch_adv, queries, adbd, succ = attack(data=x_batch_clean, label=y_batch_gt, query_limit=n_queries)
      #print(f"This batch: attack reports success rate of {torch.sum(succ).item() / x_batch_clean.shape[0]}")

      # Filter by attack success
      below_eps_filter = torch.max(torch.abs(x_batch_adv - x_batch_clean).view(x_batch_clean.shape[0], -1), dim=1)[0] < eps_linf
      if torch.sum(below_eps_filter) != torch.sum(succ):      
        # Shouldn't happen, but if it does then it should be investigated
        print(f"WARN: Actual attack success ({torch.sum(below_eps_filter).item()}) != reported attack success ({torch.sum(succ).item()})!")

      # Combine clean images with successfully attacked images and measure overall accuracy
      x_batch_adv_below_eps = x_batch_clean.clone()
      x_batch_adv_below_eps[below_eps_filter] = x_batch_adv[below_eps_filter]      
      outputs = model(x_batch_adv_below_eps)
      _, y_batch_pred = torch.max(outputs.data, 1)

      n_total += x_batch_clean.shape[0]
      n_robust_correct += (y_batch_pred == y_batch_gt).sum().item()
      robust_acc = n_robust_correct / n_total

      # This might take a long time, so display the running accuracy after every batch
      progress_bar.set_description(f"acc={robust_acc}")

  return robust_acc


## Running the attack

In [None]:
robust_acc = run_rays_attack(model_undefended, n_queries=rays_n_queries)
print(f"Undefended model robust accuracy: {robust_acc}")

Running RayS attack with 1000 queries. This could take a while...


  0%|          | 0/1 [00:00<?, ?it/s]

out of queries


acc=0.73: 100%|██████████| 1/1 [22:32<00:00, 1352.15s/it]

Undefended model robust accuracy: 0.73





In [None]:
robust_acc = run_rays_attack(model_defended, n_queries=rays_n_queries)
print(f"Defended model robust accuracy: {robust_acc}")

Running RayS attack with 1000 queries. This could take a while...


  0%|          | 0/1 [00:00<?, ?it/s]

out of queries


acc=0.73: 100%|██████████| 1/1 [2:43:33<00:00, 9813.35s/it]

Defended model robust accuracy: 0.73





- Robust accuracy is exactly the same (73%) for both models, so the defense does not help against this attack. It does make evaluation much slower, though (about 2h 43min versus 23min).
- Robust accuracy is 6 percentage points below the strongest result reported by Alfarra et al. (73% vs. 79%).
- Robust accuracy decreases further when the attack is given more queries (we measured 67% with 10k queries).
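
Because this reduced run uses only 100 images, the accuracies above carry some sampling noise. One rough way to gauge it, assuming the images are independent draws and using a normal approximation to the binomial, is sketched below; the helper name `binomial_standard_error` is introduced here and is not part of the original evaluation.

```python
import math


def binomial_standard_error(acc, n):
    """Normal-approximation standard error of an accuracy measured on n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


acc, n = 0.73, 100  # robust accuracy and image count from this notebook
se = binomial_standard_error(acc, n)
low, high = acc - 1.96 * se, acc + 1.96 * se
print(f"standard error = {se:.3f}, approx. 95% interval = [{low:.2f}, {high:.2f}]")
```

The full evaluation on 1000 images narrows this interval by a factor of about three, since the standard error scales with one over the square root of `n`.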