In this notebook, we evaluate the robustness of several programming language models on the code summarization task. The goal of code summarization is to summarize the functionality of a given code snippet.

# Install environment

In [None]:
!pip install torch==1.7.0+cu110 numpy dill tqdm torchtext==0.8.0 tensorboard matplotlib scipy click==7.1.2 allennlp==2.4.0 -f https://download.pytorch.org/whl/torch_stable.html

# Download model and dataset
In this section, we load a pretrained transformer model that was trained from scratch, with supervised training, on the training split of [py150k](https://www.sri.inf.ethz.ch/py150). Details of the models can be found in [Jain, et al., 2022](https://arxiv.org/pdf/2007.04973.pdf). We then load the test dataset from [py150k](https://www.sri.inf.ethz.ch/py150).

In [4]:
# download the code for robustness evaluation.
!gdown --fuzzy https://drive.google.com/file/d/13Q-dL0G2UDFSqZQBV2XESPSsQhXaNKDl/view?usp=sharing
# download the transformer parameters of the supervised learning model.
!gdown --fuzzy https://drive.google.com/file/d/1Lq0qbGauACbjGvXnsOkE2qVD8t026wLi/view?usp=sharing
# download a dictionary that records where each code snippet can be perturbed and the corresponding original tokens.
!gdown --fuzzy https://drive.google.com/file/d/1CeEJ5DaGlpEF7rsY8VYrpuUjADX2w4DX/view?usp=sharing
# download test dataset. 
!gdown --fuzzy https://drive.google.com/file/d/1-LKZhx27toRBYOakZuhmmVJAM8vL4eRT/view?usp=sharing

Downloading...
From: https://drive.google.com/uc?id=13Q-dL0G2UDFSqZQBV2XESPSsQhXaNKDl
To: /mnt/ufs18/home-107/jiajingh/Transformer.zip
100%|████████████████████████████████████████| 220k/220k [00:00<00:00, 14.2MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Lq0qbGauACbjGvXnsOkE2qVD8t026wLi
To: /mnt/ufs18/home-107/jiajingh/checkpoint.zip
100%|█████████████████████████████████████████| 849M/849M [00:07<00:00, 111MB/s]
Downloading...
From: https://drive.google.com/uc?id=1CeEJ5DaGlpEF7rsY8VYrpuUjADX2w4DX
To: /mnt/ufs18/home-107/jiajingh/test_site_map.json
100%|███████████████████████████████████████| 35.0M/35.0M [00:00<00:00, 104MB/s]
Downloading...
From: https://drive.google.com/uc?id=1-LKZhx27toRBYOakZuhmmVJAM8vL4eRT
To: /mnt/ufs18/home-107/jiajingh/test.tsv
100%|██████████████████████████████████████| 22.7M/22.7M [00:00<00:00, 81.6MB/s]


In [5]:
!rm -r __MACOSX/
!rm -r checkpoint
!rm -r normal/
!rm -r Transformer/
!unzip -q Transformer.zip
!unzip -q checkpoint.zip
%mkdir models/
%mkdir normal/
%ls
%mv checkpoint normal/checkpoints
%mkdir -p outputs/gradient-targeting

rm: cannot remove ‘checkpoint’: No such file or directory
mkdir: cannot create directory ‘models/’: File exists
anaconda3/                         data.jsonl         test.jsonl
Anaconda3-2021.05-Linux-x86_64.sh  Documents/         test.jsonl.gz
Attack.ipynb                       environment.yml    test_site_map.json
checkpoint/                        mnt/               test.tsv
checkpoint.zip                     models/            test.txt
claw/                              new_data.jsonl     Tony/
claw2/                             new_data.jsonl.gz  train.jsonl
claw2.zip                          normal/            train.jsonl.gz
claw-sat/                          ondemand/          train.txt
claw-sat.zip                       optimal.png        Transformer/
claw.zip                           outputs/           Transformer.zip
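
The `rm` and `mkdir` messages in the output above come from deleting a directory that does not exist and creating one that already does, so they are harmless on a rerun. As a minimal sketch, the same reset step can be made idempotent in Python with the standard library. The helper name `reset_dir` is hypothetical and is not part of the downloaded code.

```python
import shutil
from pathlib import Path


def reset_dir(path):
    """Delete `path` if it exists, then recreate it as an empty directory."""
    target = Path(path)
    # ignore_errors=True means a missing directory is not an error.
    shutil.rmtree(target, ignore_errors=True)
    # parents=True creates intermediate directories; exist_ok avoids a race error.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For instance, `reset_dir("normal")` would stand in for the `rm -r normal/` and `mkdir normal/` pair in the cell above.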


# Data preprocessing
Before preprocessing, here are some details about the original test dataset. The file test.tsv comes from [py150k](https://www.sri.inf.ethz.ch/py150) and is used for the code summarization task. Each row contains three elements: 
1. index of the current code snippet.
2. input tokens of the current code snippet.
3. ground truth (natural language summary of the current code snippet).

In this step, we subsample 100 rows from the original test dataset and overwrite test.tsv with them. The indexes of the selected code snippets are printed. Note that these indexes are a shuffled copy of 0 to 99, so the rows kept are the first 100 rows of the file.

In [21]:
import os
import re
import csv
import sys
import json
import numpy as np
csv.field_size_limit(sys.maxsize)
ID_MAP = {}
with open("test.tsv", 'r') as identity_tsv:
  reader = csv.reader(
    (x.replace('\0', '') for x in identity_tsv),
    delimiter='\t', quoting=csv.QUOTE_NONE
  )
  next(reader, None)
  for line in reader:
    ID_MAP[line[0]] = (line[1], line[2])
print("  + Loaded {} samples".format(len(ID_MAP)))

M = 100
# idx_lst is a permutation of 0..M-1, so the loop below keeps the first M rows.
idx_lst = np.random.choice(np.arange(M), M, replace=False)
print(idx_lst)
cnt = 0
# Overwrite test.tsv in place with the selected rows.
with open("test.tsv", "w") as out_f:
  out_f.write('index\tsrc\ttgt\n')
  for key in ID_MAP.keys():
    if cnt in idx_lst:
        row = [ key,ID_MAP[key][0], ID_MAP[key][1] ]
        out_f.write('{}\n'.format('\t'.join(row)))
        cnt+=1
    else:
        break

  + Loaded 19999 samples
[25 85 32 66 57 77 16 19 15 37 62 60 93 99 64 91 27 76 39 52 79 82 38 96
 45 12 21 28 55 84 46 94 20 26 23 88 54  0 70  2 74 73 98 47 30 86  1 61
 68 33  8 78 48 69 36 51  6 71 13 17 56 89 50 29 72 31 95 53 18  7  5 90
  9 14 11 63 75 67 83 35 87  4 59 42 40 24  3 97 80 34 92 41 81 49 65 22
 58 10 43 44]
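
In the cell above the sampled indexes range over 0 to 99 only, so the subsample is always the first 100 rows. If a uniform random subsample over all loaded rows is wanted instead, a sketch along the following lines could be used. The function name `subsample_keys` and the `seed` parameter are assumptions for illustration, not part of the original notebook.

```python
import random


def subsample_keys(keys, m, seed=0):
    """Return m distinct keys drawn uniformly at random, kept in file order."""
    keys = list(keys)
    if m > len(keys):
        raise ValueError("m exceeds the number of available rows")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    chosen = set(rng.sample(range(len(keys)), m))
    return [key for i, key in enumerate(keys) if i in chosen]
```

With the dictionary from the cell above, `subsample_keys(ID_MAP.keys(), 100)` would return the keys of the rows to write out.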


In the next step, we transform the subsampled dataset into a new file in which each row contains: 
* index.
* clean input (original code snippets).
* label (natural language summary of the current code snippet).
* program sketches [[Ramakrishnan et al., 2020](https://arxiv.org/abs/2002.03043)] for adversarial examples (which will be perturbed by the attack generation program).

Here is an illustration of the program sketch for ``` if (* > 0) * = False ```, where each * marks a variable that can be replaced to perturb the program.
![alt text](sketch.png "Title")
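
To make the idea of a sketch concrete, the snippet below fills the holes of a sketch with chosen tokens. It assumes that holes are written as `@R_i@`, the placeholder format this notebook uses for replaceable sites. The function `fill_sketch` is a hypothetical illustration and is not the `replace_tokens.py` script from the downloaded code.

```python
import re

# Matches placeholders such as @R_1@ and captures the site number.
SITE_PATTERN = re.compile(r"@R_(\d+)@")


def fill_sketch(sketch, tokens):
    """Replace each @R_i@ with tokens[i]; sites without a token stay as they are."""
    def substitute(match):
        site = int(match.group(1))
        return tokens.get(site, match.group(0))
    return SITE_PATTERN.sub(substitute, sketch)


sketch = "if ( @R_1@ > 0 ) @R_2@ = False"
print(fill_sketch(sketch, {1: "count", 2: "enabled"}))
# prints: if ( count > 0 ) enabled = False
```

An attack then amounts to choosing which sites to fill and which tokens to put there.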

In [22]:

import os
import re
import csv
import sys
import tqdm
import json

def handle_replacement_tokens(line):
  """Rewrite each 'replacemeN' placeholder in a code snippet as '@R_N@'."""
  new_line = line
  # Collect the distinct placeholders that occur in the line.
  uniques = set()
  for match in re.compile(r'replaceme\d+').findall(line):
    uniques.add(match.strip())
  uniques = list(uniques)
  # Reverse-sorted order handles 'replaceme10' before 'replaceme1'.
  uniques.sort()
  uniques.reverse()
  for match in uniques:
    replaced = match.replace("replaceme", "@R_") + '@'
    new_line = new_line.replace(match, replaced)
  return new_line

csv.field_size_limit(sys.maxsize)
#### load clean code snippets
ID_MAP = {}
TRANSFORMS = ['transforms.Combined']
print("Loading identity transform...")
with open("test.tsv", 'r') as identity_tsv:
  reader = csv.reader(
    (x.replace('\0', '') for x in identity_tsv),
    delimiter='\t', quoting=csv.QUOTE_NONE
  )
  next(reader, None)
  for line in reader:
    ID_MAP[line[0]] = (line[1], line[2])
print("  + Loaded {} samples".format(len(ID_MAP)))
#### load transformed code snippets
print("Loading transformed samples...")
TRANSFORMED = {}
for transform_name in TRANSFORMS:
  TRANSFORMED[transform_name] = {}
  with open("test.tsv", 'r') as current_tsv:
    reader = csv.reader(
      (x.replace('\0', '') for x in current_tsv),
      delimiter='\t', quoting=csv.QUOTE_NONE
    )
    next(reader, None)
    for line in reader:
      TRANSFORMED[transform_name][line[0]] = handle_replacement_tokens(line[1])
  print("  + Loaded {} samples from '{}'".format(
    len(TRANSFORMED[transform_name]), transform_name
  ))

print("Writing adv. testing samples...")
### write results to a new file.
with open("outputs/test.tsv", "w") as out_f:
  out_f.write('index\tsrc\ttgt\t{}\n'.format(
    '\t'.join([ 
      '{}'.format(i) for i in TRANSFORMS
    ])
  ))

  idx_to_fname = {}
  index = 0

  for key in tqdm.tqdm(ID_MAP.keys(), desc="  + Progress"):
    row = [ ID_MAP[key][0], ID_MAP[key][1] ]
    for transform_name in TRANSFORMS:
      if key in TRANSFORMED[transform_name]:
        row.append(TRANSFORMED[transform_name][key])
      else:
        row.append(ID_MAP[key][0])
    out_f.write('{}\t{}\n'.format(index, '\t'.join(row)))
    idx_to_fname[index] = key
    index += 1
with open('outputs/test_idx_to_fname.json', 'w') as f:
  json.dump(idx_to_fname, f)
print("  + Adversarial testing file generation complete!")


  + Progress: 100%|██████████| 100/100 [00:00<00:00, 231985.84it/s]

Loading identity transform...
  + Loaded 100 samples
Loading transformed samples...
  + Loaded 100 samples from 'transforms.Combined'
Writing adv. -fing samples...
  + Adversarial testing file generation complete!





# Evaluation on supervised transformer
The pipeline for evaluating the robustness of programming language models is:
* Generate adversarial tokens for selected sites.
* Replace the original tokens in the original code snippets with the adversarial tokens.
* Evaluate the F1 scores on the adversarial code snippets.

Here we provide one evaluation example, conducted on the transformer trained with supervised training. 
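
For reference, a token-level F1 score between a predicted summary and the ground truth can be computed as below. This is a sketch of one common definition (bag-of-tokens overlap); the exact metric implemented in `evaluate.py` is not shown in this notebook and may differ. The function name `token_f1` is an assumption.

```python
from collections import Counter


def token_f1(predicted, reference):
    """F1 of the multiset overlap between predicted and reference tokens."""
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

For example, predicting `["get", "file", "name"]` against the reference `["get", "name"]` gives precision 2/3 and recall 1, hence an F1 of 0.8.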

## Clean examples
The following code cell runs the evaluation pipeline on the unperturbed code snippets and reports the F1 score of the model being evaluated. 
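
The `gradient_attack.py` invocation is repeated throughout this notebook with only a few values changed (the experiment directory, `--batch_size`, `--u_pgd_epochs`, `--z_epsilon`, and the `--u_optim` / `--z_optim` switches). As a convenience sketch, a hypothetical helper can assemble the argument list from those values. Only flags that appear in this notebook are used, and the helper itself is not part of the downloaded code.

```python
def gradient_attack_args(expt_dir, u_pgd_epochs, z_epsilon,
                         batch_size=16, u_optim=False, z_optim=False):
    """Build the argument list for ./Transformer/gradient_attack.py."""
    args = [
        "python3", "./Transformer/gradient_attack.py",
        "--data_path", "./outputs/test.tsv",
        "--expt_dir", expt_dir,
        "--load_checkpoint", "Best_F1",
        "--save_path", "./outputs/targets-test.json",
        "--n_alt_iters", "2",
        "--z_init", "1",
        "--batch_size", str(batch_size),
        "--u_pgd_epochs", str(u_pgd_epochs),
        "--z_epsilon", str(z_epsilon),
        "--attack_version", "2",
        "--u_learning_rate", "0.5",
        "--z_learning_rate", "0.5",
        "--smoothing_param", "0.01",
        "--vocab_to_use", "1",
        "--teacher_ratio",
    ]
    # The two switches below take no value; they are appended only when enabled.
    if u_optim:
        args.append("--u_optim")
    if z_optim:
        args.append("--z_optim")
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)`; for the clean evaluation this would be `gradient_attack_args("normal", 0, 0, batch_size=32)`.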

In [7]:

### loading original tokens
!python ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "normal" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 32 \
    --u_pgd_epochs 0 \
    --z_epsilon 0 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio 
### replace "@R_j" with original tokens for code snippets.
!python ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json
### evaluation 
!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "normal" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=32, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='normal', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=False, u_pgd_epochs=0, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=0, z_init=1, z_learning_rate=0.5, z_optim=False)
data_split test
[('index', <torchtext.data.field.Field object at 0x2ba6557dcb20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2ba584ec3ee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2ba6557dc1c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2ba6557dcb80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
        

# Adversarial examples generation
In this section, you can use two versions of the attack. 
1. Randomly select sites (the positions in the code snippet to attack) and perturb the selected sites.
2. Co-optimize sites and tokens to generate the attack. 
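
The first variant can be pictured as drawing a binary mask over the candidate sites of a snippet. The sketch below assumes a budget `k` on the number of attacked sites; how `gradient_attack.py` actually draws its sites is not shown in this notebook, and `random_site_mask` is a hypothetical name.

```python
import random


def random_site_mask(n_sites, k, seed=None):
    """Return a 0/1 list of length n_sites with min(k, n_sites) ones."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_sites), min(k, n_sites)))
    return [1 if i in chosen else 0 for i in range(n_sites)]
```

A site with mask value 1 is perturbed and a site with mask value 0 keeps its original token. The second variant replaces this random draw with an optimized choice.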

## Random site-selection + optimal site-perturbation [[Ramakrishnan et al., 2020](https://arxiv.org/abs/2002.03043)]
In the next step, you can observe that the F1 score on these adversarial examples is lower than that on clean examples.
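
To quantify the gap between a clean and an adversarial run, the absolute and relative F1 drop can be computed as follows. The helper `f1_drop` is a hypothetical convenience, and the scores passed to it have to be read off the evaluation output of the corresponding cells.

```python
def f1_drop(clean_f1, adversarial_f1):
    """Return (absolute drop, drop relative to the clean score)."""
    absolute = clean_f1 - adversarial_f1
    # Guard against division by zero when the clean score is 0.
    relative = absolute / clean_f1 if clean_f1 > 0 else 0.0
    return absolute, relative
```

A larger relative drop indicates a stronger attack, or equivalently a less robust model.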

In [8]:
### generate perturbed tokens for randomly selected sites 
!python3 ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "normal" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 16 \
    --u_pgd_epochs 3 \
    --z_epsilon 1 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio \
    --u_optim
### replace "@R_j" with adversarial tokens for code snippets.
!python3 ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json
### evaluate
!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "normal" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=16, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='normal', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=True, u_pgd_epochs=3, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=1, z_init=1, z_learning_rate=0.5, z_optim=False)
data_split test
[('index', <torchtext.data.field.Field object at 0x2abe49845b20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2abd88f2bee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2abe498451c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2abe49845b80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
         

## Optimal site-selection + optimal site-perturbation [[Srikant et al., 2021](https://openreview.net/forum?id=PH5PH9ZO_4)]
In the next step, you can observe that the F1 score is much lower than under the previous attack, which means that optimal site-selection + optimal site-perturbation is the stronger of the two attacks.
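
One common way to optimize site selection with gradients is to relax the binary site vector to values in [0, 1], bound its sum by a budget `k`, and project back onto that set after every gradient step. The sketch below shows only this projection, computed by bisection. It illustrates the general technique under these assumptions and is not a transcription of `gradient_attack.py`; the name `project_site_vector` is hypothetical.

```python
def project_site_vector(z, k, iterations=100):
    """Euclidean projection of z onto {0 <= z_i <= 1, sum(z) <= k}, for k >= 0."""
    def clipped(mu):
        # Shift every entry down by mu, then clip to the unit interval.
        return [min(1.0, max(0.0, value - mu)) for value in z]

    candidate = clipped(0.0)
    if sum(candidate) <= k:
        return candidate  # the budget is already satisfied after clipping
    # Otherwise find the smallest shift mu whose clipped sum meets the budget.
    lo, hi = 0.0, max(z)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(clipped(mid)) > k:
            lo = mid
        else:
            hi = mid
    return clipped(hi)
```

After the final projection, a relaxed vector still has to be turned back into a discrete choice, for example by keeping the `k` largest entries.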

In [9]:
### generate perturbed tokens for optimally selected sites 
!python3 ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "normal" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --z_epsilon 1 \
    --batch_size 16 \
    --u_pgd_epochs 3 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio \
    --u_optim \
    --z_optim
### replace "@R_j" with adversarial tokens for code snippets.
!python3 ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json
### evaluate
!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "normal" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=16, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='normal', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=True, u_pgd_epochs=3, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=1, z_init=1, z_learning_rate=0.5, z_optim=True)
data_split test
[('index', <torchtext.data.field.Field object at 0x2b935feeeb20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2b929f5d3ee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2b935feee1c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2b935feeeb80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
          

# Visualization
In this section we provide visualization results for adversarial attacks and clean examples.

## Clean example
![alt text](clean.png "Title")
## Random site-selection + optimal site-perturbation
![alt text](random.png "Title")
## Optimal site-selection + optimal site-perturbation
![alt text](optimal.png "Title")

In the following sections, we evaluate two further transformers, trained with:
* contrastive learning (contracode) [[Jain, et al., 2022](https://arxiv.org/abs/2007.04973)].
* robustness-aware contrastive learning (CLAW).
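
As background on the term, contrastive learning trains an encoder so that two views of the same program are embedded close together and views of different programs far apart. A widely used objective for this is the InfoNCE loss, sketched below for a single query with plain Python lists. The training code for the checkpoints used here is not part of this notebook, so the function names, the cosine similarity, and the `temperature` value are illustrative assumptions; all vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive pair."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the query is most similar to its positive and grows when a negative is more similar instead.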

In [10]:
### download robustness-aware contrastive learning models (CLAW)
!gdown --fuzzy https://drive.google.com/file/d/1SzrDr9-YcKqkit0KQmcysJQnN4WFMFKJ/view?usp=sharing
### download contrastive learning models (contracode)
!gdown --fuzzy https://drive.google.com/file/d/1PhwELuRhEtTJEz_sH8PV9Xiq49BU0z7d/view?usp=sharing
!rm -r final-models/
!mkdir -p claw-sat/checkpoints
!mkdir -p contracode/checkpoints
!unzip -q claw-sat.zip -d ./claw-sat/checkpoints/Best_F1
!unzip -q contracode.zip -d ./contracode/checkpoints/Best_F1

Downloading...
From: https://drive.google.com/uc?id=1SzrDr9-YcKqkit0KQmcysJQnN4WFMFKJ
To: /mnt/ufs18/home-107/jiajingh/claw-sat.zip
100%|███████████████████████████████████████| 1.48G/1.48G [00:13<00:00, 112MB/s]


# Contracode [[Jain, et al., 2022](https://arxiv.org/pdf/2007.04973.pdf)]
## Evaluation on clean examples
In the next step, you will observe that the F1 score is higher than that of the supervised transformer. 

In [12]:
!python ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "contracode" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 32 \
    --u_pgd_epochs 0 \
    --z_epsilon 0 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio 

!python ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json

!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "contracode" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=32, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='contracode', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=False, u_pgd_epochs=0, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=0, z_init=1, z_learning_rate=0.5, z_optim=False)
data_split test
[('index', <torchtext.data.field.Field object at 0x2b3f7e553b20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2b3ebdc36ee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2b3f7e5531c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2b3f7e553b80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
    

## Evaluation on adversarial examples
In the next step, you will observe that the robust F1 score is higher than that of the supervised transformer. Here we use optimal site-selection + optimal site-perturbation to attack the code snippets [[Srikant et al., 2021](https://openreview.net/forum?id=PH5PH9ZO_4)].

In [13]:
!python ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "contracode" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 16 \
    --u_pgd_epochs 3 \
    --z_epsilon 5 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio \
    --u_optim \
    --z_optim

!python ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json

!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "contracode" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=16, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='contracode', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=True, u_pgd_epochs=3, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=5, z_init=1, z_learning_rate=0.5, z_optim=True)
data_split test
[('index', <torchtext.data.field.Field object at 0x2b97da806b20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2b9719eedee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2b97da8061c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2b97da806b80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
      

# CLAW [[Jia, et al., 2022]()]
## Clean example
In the next step, you will observe that the F1 score is higher than that of the transformer trained with contrastive learning.

In [14]:
!python ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "claw-sat" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 32 \
    --u_pgd_epochs 0 \
    --z_epsilon 0 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio 

!python ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json

!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "claw-sat" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=32, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='claw-sat', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=False, u_pgd_epochs=0, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=0, z_init=1, z_learning_rate=0.5, z_optim=False)
data_split test
[('index', <torchtext.data.field.Field object at 0x2b918c6dcb20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2b90cbdc5ee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2b918c6dc1c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2b918c6dcb80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
      

## Adversarial examples
In the next step, you will observe that the robust F1 score is higher than that of the transformer trained with contrastive learning. Here we use optimal site-selection + optimal site-perturbation to attack the code snippets [[Srikant et al., 2021](https://openreview.net/forum?id=PH5PH9ZO_4)].

In [15]:
!python ./Transformer/gradient_attack.py \
    --data_path ./outputs/test.tsv \
    --expt_dir "claw-sat" \
    --load_checkpoint "Best_F1" \
    --save_path ./outputs/targets-test.json \
    --n_alt_iters 2 \
    --z_init 1 \
    --batch_size 16 \
    --u_pgd_epochs 3 \
    --z_epsilon 5 \
    --attack_version 2 \
    --u_learning_rate 0.5 \
    --z_learning_rate 0.5 \
    --smoothing_param 0.01 \
	--vocab_to_use 1 \
    --teacher_ratio \
    --u_optim \
    --z_optim

!python ./Transformer/replace_tokens.py \
      --source_data_path ./outputs/test.tsv \
      --dest_data_path ./outputs/gradient-targeting/test.tsv \
      --mapping_json ./outputs/targets-test-gradient.json

!python3 ./Transformer/evaluate.py \
  --data_path ./outputs/gradient-targeting/test.tsv \
  --expt_dir "claw-sat" \
  --output_dir ./outputs \
  --load_checkpoint "Best_F1" \
  --src_field_name "transforms.Combined" --save 

Namespace(attack_version=2, batch_size=16, data_path='./outputs/test.tsv', distinct=True, exact_matches=False, expt_dir='claw-sat', load_checkpoint='Best_F1', n_alt_iters=2, no_gradient=False, num_replacements=1500, random=False, save_path='./outputs/targets-test.json', smoothing_param=0.01, teacher_ratio=True, u_accumulate_best_replacements=False, u_learning_rate=0.5, u_optim=True, u_pgd_epochs=3, u_rand_update_pgd=False, use_loss_smoothing=False, vocab_to_use=1, z_epsilon=5, z_init=1, z_learning_rate=0.5, z_optim=True)
data_split test
[('index', <torchtext.data.field.Field object at 0x2b9a7af3ab20>), ('src', <seq2seq.dataset.fields.TransSourceField object at 0x2b99ba621ee0>), ('tgt', <seq2seq.dataset.fields.TargetField object at 0x2b9a7af3a1c0>), ('transforms.Combined', <seq2seq.dataset.fields.TransSourceField object at 0x2b9a7af3ab80>)]
Original data size: 100
Attacking using Gradient transforms.Combined
OrderedDict([   ('version', 'v2'),
                ('n_alt_iters', 4),
        