## Attack Transfer Learning

### Prepare environment

In [1]:
import torch

print(torch.__version__)

2.4.1+cu121


In [4]:
!pip install tensorboardX

Defaulting to user installation because normal site-packages is not writeable
Collecting tensorboardX
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
Installing collected packages: tensorboardX
Successfully installed tensorboardX-2.6.2.2


### White-box Attacks

To rule out multiprocessing problems in data loading, first test the DataLoader with `num_workers=0` in `whitebox_attack.py`.
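
A minimal sketch of what that setting looks like, assuming the script builds a standard `torch.utils.data.DataLoader`. The tensors below are a hypothetical stand-in for the MNIST test set, not the data pipeline used by `whitebox_attack.py`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the test set: 8 grayscale 28x28 images.
images = torch.zeros(8, 1, 28, 28)
labels = torch.arange(8) % 10
dataset = TensorDataset(images, labels)

# num_workers=0 loads every batch in the main process,
# so no worker subprocesses are started.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)
```

With `num_workers > 0` on Windows, worker processes are started with the spawn method, so the script's entry point generally needs an `if __name__ == "__main__":` guard. Starting from `num_workers=0` avoids that class of problem while debugging.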

To attack an MNIST model trained from scratch, run:

In [23]:
!python whitebox_attack.py --dataset=mnist --arch=DTN --ckpt_file=./ckpt/white-box/mnist_scratch.pt --attack_method=FGSM --eps=4,8,16,32

transform ops: [ToTensor(), Lambda(), Lambda(), Normalize(mean=(0.5,), std=(0.5,))]
CUDA Available:  True
test accuracy w/ input transform. 

Test set: Average loss: 0.1971, Accuracy: 9423/10000 (94%)

Epsilon: 0.01568627450980392	Test Accuracy = 9110 / 10000 = 0.911
Epsilon: 0.03137254901960784	Test Accuracy = 8714 / 10000 = 0.8714
Epsilon: 0.06274509803921569	Test Accuracy = 7755 / 10000 = 0.7755
Epsilon: 0.12549019607843137	Test Accuracy = 5086 / 10000 = 0.5086


12/05/2024 00:18:51 - INFO - __main__ - output to ./ckpt/white-box\white-box\FGSM\non_targeted\05Dec24-001851
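
The `--eps=4,8,16,32` values appear to be pixel budgets on a 0-255 scale: the printed epsilons equal 4/255, 8/255, 16/255 and 32/255. The sketch below reproduces that conversion and shows a single FGSM step on a toy logistic-regression model in plain Python. The model, the helper names and the clipping range `[0, 1]` are assumptions for illustration, not the DTN code in `whitebox_attack.py`.

```python
import math

def eps_from_pixels(levels):
    """Convert pixel budgets on a 0-255 scale to the [0, 1] scale."""
    return [level / 255 for level in levels]

def sign(v):
    return (v > 0) - (v < 0)

def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm(x, y, w, b, eps):
    """One non-targeted FGSM step against a toy logistic-regression model."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    # Gradient of the cross-entropy loss w.r.t. each input: (p - y) * w_i
    grad = [(p - y) * wi for wi in w]
    # Step in the direction that increases the loss, then clip to [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

epsilons = eps_from_pixels([4, 8, 16, 32])
print(epsilons)

# Hypothetical input, label and weights.
x, y = [0.2, 0.8, 0.5], 1
w, b = [1.5, -2.0, 0.5], 0.1
x_adv = fgsm(x, y, w, b, eps=epsilons[3])
print(logit(x, w, b), logit(x_adv, w, b))
```

For the true label `y = 1`, a lower logit after the step means a higher loss, which is the effect a non-targeted attack aims for.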

Similarly, to attack an MNIST model fine-tuned from a model pre-trained on SVHN, run:

In [22]:
!python whitebox_attack.py --dataset=mnist --arch=DTN --ckpt_file=./ckpt/white-box/mnist_ft_svhn.pt --attack_method=FGSM --eps=4,8,16,32

transform ops: [ToTensor(), Lambda(), Lambda(), Normalize(mean=(0.5,), std=(0.5,))]
CUDA Available:  True
test accuracy w/ input transform. 

Test set: Average loss: 0.0734, Accuracy: 9777/10000 (98%)

Epsilon: 0.01568627450980392	Test Accuracy = 9670 / 10000 = 0.967
Epsilon: 0.03137254901960784	Test Accuracy = 9551 / 10000 = 0.9551
Epsilon: 0.06274509803921569	Test Accuracy = 9303 / 10000 = 0.9303
Epsilon: 0.12549019607843137	Test Accuracy = 8493 / 10000 = 0.8493


12/05/2024 00:13:49 - INFO - __main__ - output to ./ckpt/white-box\white-box\FGSM\non_targeted\05Dec24-001349
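
To compare the two white-box runs side by side, the counts printed above can be tabulated as accuracy drops. This sketch copies the numbers from the two outputs and assumes the "Test set" line is the clean accuracy of the attacked model. The dictionary keys are illustrative labels.

```python
TOTAL = 10000  # MNIST test images in both runs

# Correct predictions copied from the two outputs above.
clean = {"scratch": 9423, "ft_svhn": 9777}
adversarial = {
    "scratch": {4: 9110, 8: 8714, 16: 7755, 32: 5086},
    "ft_svhn": {4: 9670, 8: 9551, 16: 9303, 32: 8493},
}

# Accuracy lost relative to the clean test set, per epsilon and model.
drops = {
    eps: {name: (clean[name] - adversarial[name][eps]) / TOTAL for name in clean}
    for eps in (4, 8, 16, 32)
}

for eps, by_model in drops.items():
    print(f"eps={eps:>2}/255  scratch: -{by_model['scratch']:.4f}  "
          f"ft_svhn: -{by_model['ft_svhn']:.4f}")
```

In these two runs, the model fine-tuned from SVHN loses less accuracy than the model trained from scratch at every epsilon.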

### Black-box Attacks

When the two models (`--ckpt_a` and `--ckpt_b`) are trained independently:

In [10]:
!python blackbox_attack_by_transfer.py --dataset=usps --arch=DTN --ckpt_a=./ckpt/black-box/mnist_source.pt --ckpt_b=./ckpt/black-box/usps_scratch.pt --attack_method=FGSM --eps=4,8,16,32

CUDA Available:  True

Test set: Average loss: 0.5191, Accuracy: 1636/1860 (88%)

Epsilon: 0.01568627450980392	Test Accuracy = 1602 / 1860 = 0.8612903225806452
Epsilon: 0.03137254901960784	Test Accuracy = 1597 / 1860 = 0.8586021505376344
Epsilon: 0.06274509803921569	Test Accuracy = 1574 / 1860 = 0.8462365591397849
Epsilon: 0.12549019607843137	Test Accuracy = 1540 / 1860 = 0.8279569892473119
eps: 0.01568627450980392,0.03137254901960784,0.06274509803921569,0.12549019607843137
accuracy: 0.8612903225806452,0.8586021505376344,0.8462365591397849,0.8279569892473119


12/04/2024 23:49:26 - INFO - __main__ - output to ./ckpt/black-box\black-box\FGSM\non_targeted\04Dec24-234926
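
The idea behind a transfer attack can be sketched as follows: adversarial examples are crafted with gradients from one model and then evaluated on another. The sketch assumes that `--ckpt_a` plays the role of the crafting model and `--ckpt_b` the evaluated one. The two toy logistic-regression models and the data points are hypothetical, not the DTN checkpoints.

```python
import math

def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    return 1 if logit(x, w, b) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step using the gradients of the given (source) model."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

def transfer_accuracy(data, source, target, eps):
    """Craft on `source`, evaluate on `target`; target gradients are never used."""
    correct = 0
    for x, y in data:
        x_adv = fgsm(x, y, *source, eps)
        correct += predict(x_adv, *target) == y
    return correct / len(data)

# Hypothetical models (weights, bias) and labelled points.
model_a = ([2.0, -1.0], 0.0)    # source: used to craft the examples
model_b = ([1.5, -0.5], -0.1)   # target: only queried for predictions
data = [([0.7, 0.3], 1), ([0.2, 0.6], 0), ([0.6, 0.4], 1), ([0.3, 0.8], 0)]

acc_clean = transfer_accuracy(data, model_a, model_b, eps=0.0)
acc_attacked = transfer_accuracy(data, model_a, model_b, eps=0.25)
print(acc_clean, acc_attacked)
```

How far the accuracy falls depends on how similar the two models' gradients are, which is what the three settings in this section vary.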

When the USPS model is fine-tuned from an MNIST model:

In [11]:
!python blackbox_attack_by_transfer.py --dataset=usps --arch=DTN --ckpt_a=./ckpt/black-box/mnist_source.pt --ckpt_b=./ckpt/black-box/usps_ft_from_mnist.pt --attack_method=FGSM --eps=4,8,16,32

CUDA Available:  True

Test set: Average loss: 0.5191, Accuracy: 1636/1860 (88%)

Epsilon: 0.01568627450980392	Test Accuracy = 1742 / 1860 = 0.9365591397849462
Epsilon: 0.03137254901960784	Test Accuracy = 1635 / 1860 = 0.8790322580645161
Epsilon: 0.06274509803921569	Test Accuracy = 1269 / 1860 = 0.682258064516129
Epsilon: 0.12549019607843137	Test Accuracy = 287 / 1860 = 0.1543010752688172
eps: 0.01568627450980392,0.03137254901960784,0.06274509803921569,0.12549019607843137
accuracy: 0.9365591397849462,0.8790322580645161,0.682258064516129,0.1543010752688172


12/04/2024 23:51:12 - INFO - __main__ - output to ./ckpt/black-box\black-box\FGSM\non_targeted\04Dec24-235112

When the two models are both initialized from the same model pre-trained on SVHN:

In [12]:
!python blackbox_attack_by_transfer.py --dataset=usps --arch=DTN --ckpt_a=./ckpt/black-box/mnist_commoninit.pt --ckpt_b=./ckpt/black-box/usps_commoninit.pt --attack_method=FGSM --eps=4,8,16,32

CUDA Available:  True

Test set: Average loss: 0.1298, Accuracy: 1796/1860 (97%)

Epsilon: 0.01568627450980392	Test Accuracy = 1729 / 1860 = 0.9295698924731183
Epsilon: 0.03137254901960784	Test Accuracy = 1693 / 1860 = 0.9102150537634408
Epsilon: 0.06274509803921569	Test Accuracy = 1604 / 1860 = 0.8623655913978494
Epsilon: 0.12549019607843137	Test Accuracy = 1313 / 1860 = 0.7059139784946237
eps: 0.01568627450980392,0.03137254901960784,0.06274509803921569,0.12549019607843137
accuracy: 0.9295698924731183,0.9102150537634408,0.8623655913978494,0.7059139784946237


12/04/2024 23:52:26 - INFO - __main__ - output to ./ckpt/black-box\black-box\FGSM\non_targeted\04Dec24-235226
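
The three black-box runs can be summarized in one table. The counts below are copied from the outputs above; the setting names are illustrative labels for the three checkpoint pairs.

```python
TOTAL = 1860  # USPS test images in all three runs
EPS_LEVELS = [4, 8, 16, 32]

# Correct predictions under attack, copied from the three outputs above.
runs = {
    "independent": [1602, 1597, 1574, 1540],
    "usps_ft_from_mnist": [1742, 1635, 1269, 287],
    "common_init_svhn": [1729, 1693, 1604, 1313],
}

accuracy = {name: [c / TOTAL for c in counts] for name, counts in runs.items()}

print("setting".ljust(20) + "".join(f"eps={e}/255".rjust(12) for e in EPS_LEVELS))
for name, accs in accuracy.items():
    print(name.ljust(20) + "".join(f"{a:.3f}".rjust(12) for a in accs))

# Setting with the lowest accuracy at the largest epsilon.
most_affected = min(accuracy, key=lambda name: accuracy[name][-1])
print(most_affected)
```

In these runs, the USPS model fine-tuned from the MNIST model is the most affected at eps=32/255 (about 15% accuracy), the independently trained model the least (about 83%), and the commonly initialized pair falls in between (about 71%).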