### Reverse adversarial masking test

I found a really weird result (end of the previous notebook): the model can still identify about 20% of people even when every non-stopword is masked out, which sometimes means 80% or more of the words are masked. Yet the adversarial examples at `k=1` mask out only about 10% of the words, and they seem to be the same words!
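
To make the "percent of words masked" numbers concrete, here is a rough sketch of how one could measure it. The helper name `mask_fraction` and the whitespace tokenization are my own assumptions, not what the previous notebook used, so the exact percentages may differ slightly.

```python
def mask_fraction(text: str, mask_token: str = "<mask>") -> float:
    """Fraction of whitespace-separated tokens that contain the mask token."""
    tokens = text.split()
    if not tokens:
        return 0.0
    # `in` rather than `==` so that tokens like "<mask>)" still count as masked
    masked = sum(1 for tok in tokens if mask_token in tok)
    return masked / len(tokens)


example = "<mask> <mask> is a male former <mask> tennis player from china ."
print(f"{mask_fraction(example):.0%} of tokens are masked")
```

Running this over the `perturbed_text` column of an adversarial CSV would give a per-example masking rate to compare against the all-non-stopwords setting.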

What I'm wondering is whether I can take the adversarial examples, mask *more* words, and flip the prediction from incorrect back to correct.

In [1]:
import sys
sys.path.append('/home/jxm3/research/deidentification/unsupervised-deidentification')

In [2]:
from dataloader import WikipediaDataModule
import os

num_cpus = os.cpu_count()

dm = WikipediaDataModule(
    document_model_name_or_path="roberta-base",
    profile_model_name_or_path="google/tapas-base",
    max_seq_length=128,
    dataset_name='wiki_bio',
    dataset_train_split='train[:1024]', # not used in this notebook
    dataset_val_split='val[:20%]',
    dataset_version='1.2.0',
    word_dropout_ratio=0.0,
    word_dropout_perc=0.0,
    num_workers=1,
    train_batch_size=64,
    eval_batch_size=64
)
dm.setup("fit")

Initializing WikipediaDataModule with num_workers = 1 and mask token `<mask>`
loading wiki_bio[1.2.0] split train[:1024]


Using custom data configuration default
Reusing dataset wiki_bio (/home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da)


loading wiki_bio[1.2.0] split val[:20%]


Using custom data configuration default
Reusing dataset wiki_bio (/home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da)
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-793b771e10f80bbe.arrow
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-7d07543b6205ca87.arrow
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-7440752484ad8676.arrow
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-2c6f94b0d2dcc153.arrow


In [3]:
from model import CoordinateAscentModel

from model_cfg import model_paths_dict

model_key = "model_5"

checkpoint_path = model_paths_dict[model_key]
print(f"loading {model_key} from path:", checkpoint_path)


model = CoordinateAscentModel.load_from_checkpoint(
    checkpoint_path,
    document_model_name_or_path="roberta-base",
    profile_model_name_or_path="google/tapas-base",
    learning_rate=1e-5,
    pretrained_profile_encoder=False,
    lr_scheduler_factor=0.5,
    lr_scheduler_patience=1,
    train_batch_size=1,
    num_workers=1,
    gradient_clip_val=10.0,
)

loading model_5 from path: /home/jxm3/research/deidentification/unsupervised-deidentification/saves/ca__roberta__tapas__adv/deid-wikibio-2_default/236desyb_444/checkpoints/epoch=22-step=104718.ckpt


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Initialized model with learning_rate = 1e-05 and patience 1


In [4]:
import textattack

class WordSwapSingleWord(textattack.transformations.word_swap.WordSwap):
    """Takes a sentence and transforms it by replacing with a single fixed word.
    """
    single_word: str
    def __init__(self, single_word: str = "?", **kwargs):
        super().__init__(**kwargs)
        self.single_word = single_word

    def _get_replacement_words(self, _word: str):
        return [self.single_word]

transformation = WordSwapSingleWord(single_word=dm.document_tokenizer.mask_token)
transformation(textattack.shared.AttackedText("Hello my name is Jack"))

[<AttackedText "<mask> my name is Jack">,
 <AttackedText "Hello <mask> name is Jack">,
 <AttackedText "Hello my <mask> is Jack">,
 <AttackedText "Hello my name <mask> Jack">,
 <AttackedText "Hello my name is <mask>">]
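
For intuition, the candidate generation above can be reproduced without textattack. This is only a sketch with a hypothetical helper name, and it splits on whitespace, which is simpler than how `AttackedText` splits words.

```python
from typing import List


def single_word_mask_candidates(text: str, mask_token: str = "<mask>") -> List[str]:
    """Return one candidate per word, each with that single word replaced by the mask."""
    words = text.split()
    candidates = []
    for i in range(len(words)):
        swapped = words[:i] + [mask_token] + words[i + 1:]
        candidates.append(" ".join(swapped))
    return candidates


for candidate in single_word_mask_candidates("Hello my name is Jack"):
    print(candidate)
```

Each step of the search picks among candidates like these, so an input with `n` unmasked words costs roughly `n` model queries per step.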

In [8]:
import numpy as np
import torch
import tqdm

def precompute_profile_embeddings():
    model.profile_model.cuda()
    model.profile_model.eval()

    model.val_profile_embeddings = np.zeros((len(dm.val_dataset), model.profile_embedding_dim))
    for val_batch in tqdm.tqdm(dm.val_dataloader()[0], desc="Precomputing val embeddings", colour="green", leave=False):
        with torch.no_grad():
            profile_embeddings = model.forward_profile(batch=val_batch)
        model.val_profile_embeddings[val_batch["text_key_id"]] = profile_embeddings.cpu()
    model.val_profile_embeddings = torch.tensor(model.val_profile_embeddings, dtype=torch.float32)
    model.profile_model.train()

precompute_profile_embeddings()


In [10]:
from typing import List

import transformers
from model.model import Model

class MyModelWrapper(textattack.models.wrappers.ModelWrapper):
    model: Model
    tokenizer: transformers.AutoTokenizer
    profile_embeddings: torch.Tensor
    max_seq_length: int
    
    def __init__(self, model: Model, tokenizer: transformers.AutoTokenizer, max_seq_length: int = 128):
        self.model = model
        self.model.eval()
        self.tokenizer = tokenizer
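        # val_profile_embeddings is already a tensor; copy it instead of re-wrapping it with torch.tensor()
        self.profile_embeddings = model.val_profile_embeddings.clone().detach()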
        self.max_seq_length = max_seq_length
                 
    def to(self, device):
        self.model.to(device)
        # Tensor.to() is not in-place, so the result has to be assigned back
        self.profile_embeddings = self.profile_embeddings.to(device)
        return self  # so that `model = MyModelWrapper(...).to('cuda')` works as expected

    def __call__(self, text_input_list: List[str], batch_size=32):
        model_device = next(self.model.parameters()).device
        
        doc_tokenized = self.tokenizer.batch_encode_plus(
            text_input_list,
            max_length=self.max_seq_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt',
        )
        doc_tokenized = {f'document__{k}': v for k,v in doc_tokenized.items()}
        with torch.no_grad():
            document_embeddings = self.model.forward_document(batch=doc_tokenized, document_type='document')
            document_to_profile_logits = document_embeddings @ self.profile_embeddings.T.to(model_device)
            document_to_profile_probs = torch.nn.functional.softmax(
                document_to_profile_logits, dim=-1
            )
        assert document_to_profile_probs.shape == (len(text_input_list), len(self.profile_embeddings))
        return document_to_profile_probs
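
The scoring inside `__call__` is just a dot product between the document embedding and every profile embedding, followed by a softmax. Here is a tiny pure-Python sketch of that step with made-up 2-d embeddings (the function name and the numbers are illustrative assumptions, not values from the model).

```python
import math
from typing import List


def retrieval_probs(doc_emb: List[float], profile_embs: List[List[float]]) -> List[float]:
    """Softmax over dot products between one document and all profiles."""
    logits = [sum(d * p for d, p in zip(doc_emb, prof)) for prof in profile_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy profile embeddings: three "people", embedding dimension 2
profiles = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
probs = retrieval_probs([2.0, 0.1], profiles)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(predicted, probs)
```

The index of the largest probability is the predicted person, which is what the goal function below compares against the ground-truth index.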
            

In [34]:
class GroundTruthTargetedClassification(textattack.goal_functions.ClassificationGoalFunction):
    """A targeted attack on classification models which attempts to maximize
    the score of the ground-truth label.
    Complete when the ground-truth label is the predicted label.
    """

    def __init__(self, *args, target_class=0, **kwargs):
        super().__init__(*args, **kwargs)

    def _is_goal_complete(self, model_output, _):
        return (
            self.ground_truth_output == model_output.argmax()
        )

    def _get_score(self, model_output, _):
        if self.ground_truth_output < 0 or self.ground_truth_output >= len(model_output):
            raise ValueError(
                f"target class set to {self.ground_truth_output} with {len(model_output)} classes."
            )
        else:
            return model_output[self.ground_truth_output]

    def extra_repr_keys(self):
        if self.maximizable:
            return ["maximizable"]
        else:
            return []
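
Stripped of the textattack plumbing, the goal function comes down to two small checks. A sketch on plain lists of probabilities, with hypothetical function names:

```python
from typing import Sequence


def ground_truth_score(probs: Sequence[float], ground_truth: int) -> float:
    """Score to maximize: the probability assigned to the ground-truth person."""
    if not 0 <= ground_truth < len(probs):
        raise ValueError(f"ground truth {ground_truth} out of range for {len(probs)} classes")
    return probs[ground_truth]


def goal_complete(probs: Sequence[float], ground_truth: int) -> bool:
    """The search stops once the top-scoring person is the ground-truth person."""
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted == ground_truth


probs = [0.1, 0.7, 0.2]
print(goal_complete(probs, 1), ground_truth_score(probs, 0))
```

This is the reverse of the usual untargeted goal: instead of pushing the prediction away from the true label, the search pushes it back toward it.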

In [30]:
from typing import List, Tuple

from collections import OrderedDict

import datasets

class WikiDataset(textattack.datasets.Dataset):
    examples: List[str]
    
    def __init__(self, dm: WikipediaDataModule, examples: List[str]):
        self.shuffled = True
        self.dm = dm
        self.examples = examples
        self.label_names = list(dm.val_dataset['name'])
    
    def __len__(self) -> int:
        return len(self.examples)
    
    def __getitem__(self, i: int) -> Tuple[OrderedDict, int]:
        input_dict = OrderedDict([
            ('document', self.examples[i])
        ])
        return input_dict, self.dm.val_dataset[i]['text_key_id']
        

In [13]:
from textattack.loggers import CSVLogger
from textattack.shared import AttackedText

import pandas as pd
class CustomCSVLogger(CSVLogger):
    """Logs attack results to a CSV."""

    def log_attack_result(self, result: textattack.goal_function_results.ClassificationGoalFunctionResult):
        # TODO print like 'mask1', 'mask2', ...
        original_text, perturbed_text = result.diff_color(self.color_method)
        original_text = original_text.replace("\n", AttackedText.SPLIT_TOKEN)
        perturbed_text = perturbed_text.replace("\n", AttackedText.SPLIT_TOKEN)
        result_type = result.__class__.__name__.replace("AttackResult", "")
        row = {
            "original_person": result.original_result._processed_output[0],
            "original_text": original_text,
            "perturbed_person": result.perturbed_result._processed_output[0],
            "perturbed_text": perturbed_text,
            "original_score": result.original_result.score,
            "perturbed_score": result.perturbed_result.score,
            "original_output": result.original_result.output,
            "perturbed_output": result.perturbed_result.output,
            "ground_truth_output": result.original_result.ground_truth_output,
            "num_queries": result.num_queries,
            "result_type": result_type,
        }
        self.df = pd.concat([self.df, pd.DataFrame([row])], ignore_index=True)
        self._flushed = False

In [14]:
import pandas as pd
df = pd.read_csv('../adv_csvs/model_5/results_1_100.csv')

In [19]:
pt = df['perturbed_text'].apply(lambda s: s.replace('<SPLIT>', '\n')).to_list()
print(len(pt))
pt[3]

100


'<mask> <mask> , (born march 14 , <mask>) is a professional squash player who represents france .\nshe reached a career-high world ranking of world no. 101 in july <mask> .\n'

In [22]:
model_wrapper = MyModelWrapper(model=model, tokenizer=dm.document_tokenizer)
model_wrapper.to('cuda')



<__main__.MyModelWrapper at 0x7f682a66bb50>

In [62]:
from textattack.shared.validators import transformation_consists_of_word_swaps

class MaskModification(textattack.constraints.PreTransformationConstraint):
    """A constraint disallowing the modification of 'mask' words."""
    
    def _get_modifiable_indices(self, current_text):
        """Returns the word indices in ``current_text`` which are able to be
        modified."""
        non_mask_indices = set()
        for i, word in enumerate(current_text.words):
            if word.lower() not in ['mask', '[mask]', '<mask>']:
                non_mask_indices.add(i)
        return non_mask_indices

    def check_compatibility(self, transformation):
        """The mask constraint only applies to word swaps, since it works by
        restricting which word indices a swap is allowed to touch.
        Args:
            transformation: The ``Transformation`` to check compatibility with.
        """
        return transformation_consists_of_word_swaps(transformation)
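
The index filtering in `_get_modifiable_indices` can be checked on a plain list of words. A minimal sketch with a hypothetical function name; the word list is written by hand here rather than taken from `AttackedText.words`.

```python
from typing import List, Set

MASK_WORDS = {"mask", "[mask]", "<mask>"}


def modifiable_indices(words: List[str]) -> Set[int]:
    """Indices of words that are not already some spelling of the mask token."""
    return {i for i, word in enumerate(words) if word.lower() not in MASK_WORDS}


words = ["mask", "is", "a", "MASK", "player"]
print(sorted(modifiable_indices(words)))
```

Without this constraint the search could waste queries re-masking positions that are already masked.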

In [63]:
# 
#  Initialize attack
# 

from textattack import Attack
from textattack.goal_functions import TargetedClassification
from textattack.constraints.pre_transformation import RepeatModification, MaxWordIndexModification

goal_function = GroundTruthTargetedClassification(model_wrapper)
constraints = [
    RepeatModification(),
    MaskModification(),
    MaxWordIndexModification(max_length=dm.max_seq_length),
]
transformation = WordSwapSingleWord(single_word=dm.document_tokenizer.mask_token)
search_method = textattack.search_methods.BeamSearch(beam_width=4)

attack = Attack(
    goal_function, constraints, transformation, search_method
)

from tqdm import tqdm # tqdm provides us a nice progress bar.
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker
from textattack import AttackArgs

attack_args = AttackArgs(num_examples=10, disable_stdout=True)
dataset = WikiDataset(dm, examples=pt)

attacker = Attacker(attack, dataset, attack_args)

results_iterable = attacker.attack_dataset()

logger = CustomCSVLogger(color_method='html')

# 
# Run attack
# 
from tqdm import tqdm
for result in results_iterable:
    tqdm._instances.clear() # Doesn't fix the progress bar :-(
    logger.log_attack_result(result)

from IPython.display import display, HTML

def escape_mask(ex):
    ex["original_text"] = ex["original_text"].replace('<mask>', '[mask]')
    ex["perturbed_text"] = ex["perturbed_text"].replace('<mask>', '[mask]')
    return ex

display(HTML(logger.df.apply(escape_mask, axis=1).to_html(escape=False)))

textattack: No entry found for goal function <class '__main__.GroundTruthTargetedClassification'>.
textattack: Unknown if model of class <class 'model.coordinate_ascent.CoordinateAscentModel'> compatible with goal function <class '__main__.GroundTruthTargetedClassification'>.


Attack(
  (search_method): BeamSearch(
    (beam_width):  4
  )
  (goal_function):  GroundTruthTargetedClassification
  (transformation):  WordSwapSingleWord
  (constraints): 
    (0): RepeatModification
    (1): MaskModification
    (2): MaxWordIndexModification(
        (max_length):  128
      )
  (is_black_box):  True
) 






+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 9      |
| Number of failed attacks:     | 1      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 10.0%  |
| Attack success rate:          | 90.0%  |
| Average perturbed word %:     | 11.46% |
| Average num. words per input: | 37.2   |
| Avg num queries:              | 254.7  |
+-------------------------------+--------+
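
The row labels in this summary are confusing for a reversed goal. As far as I understand textattack's summary (treat the formulas as my assumption, not a quote of the library), the percentages come out of the three counts like this:

```python
from typing import Dict


def summarize(successful: int, failed: int, skipped: int) -> Dict[str, float]:
    """Recompute the summary percentages from the raw attack counts."""
    total = successful + failed + skipped
    attempted = successful + failed
    return {
        # share of inputs where the goal was not already met before the search
        "original_accuracy": 100 * (total - skipped) / total if total else 0.0,
        # share of inputs where the search never reached the goal
        "accuracy_under_attack": 100 * failed / total if total else 0.0,
        # share of attempted inputs where the search reached the goal
        "attack_success_rate": 100 * successful / attempted if attempted else 0.0,
    }


summary = summarize(successful=9, failed=1, skipped=0)
print(summary)
```

With the goal reversed, "Original accuracy: 100%" here would mean that none of the 10 adversarial inputs was predicted correctly before extra masking, and the 90% "success rate" would mean that extra masking recovered the ground-truth person for 9 of them.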


textattack: Logging to CSV at path results.csv
textattack: CSVLogger exiting without calling flush().





Unnamed: 0,original_person,original_text,perturbed_person,perturbed_text,original_score,perturbed_score,original_output,perturbed_output,ground_truth_output,num_queries,result_type
0,Halil hayreddin,"pope [mask] iii [mask] alexandria (also known as khail [mask]) was the [mask] pope of alexandria [mask] patriarch of the see of st. mark (880 -- [mask]) .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza .",Michael iii of alexandria,"pope [mask] iii [mask] alexandria (also known as <mask> [mask]) was the [mask] pope of alexandria [mask] patriarch of the see of st. mark (880 -- [mask]) .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza .",0.40561,0.918142,6582,0,0,68,Successful
1,Liu xiaolong,[mask] [mask] is a male former [mask] tennis player from china .,Hui jun,[mask] [mask] is a <mask> former [mask] <mask> <mask> from china .,0.015369,0.179927,10388,1,1,61,Successful
2,Adem büyük,[mask] [mask] (born 30 november [mask]) is a turkish professional footballer .he currently plays as a striker for [mask] [mask] .,Okan öztürk,[mask] [mask] (born 30 november [mask]) is a turkish professional footballer .he currently plays as a striker <mask> [mask] [mask] .,0.454004,0.60085,2279,2,2,16,Successful
3,Laura pomportes,"[mask] [mask] , (born march 14 , [mask]) is a professional squash player who represents france .she reached a career-high world ranking of world no. 101 in july [mask] .",Marie stephan,"[mask] [mask] , (born march <mask> , [mask]) <mask> a professional <mask> player <mask> <mask> france .<mask> <mask> a career-high <mask> ranking of <mask> <mask>. <mask> <mask> <mask> [mask] .",0.023271,0.51918,4726,3,3,816,Successful
4,Lester k. fryer,[mask] [mask]. [mask] is a former democratic member of the pennsylvania house of representatives .he was born in butler [mask] michael and angela pitullio martino .,Leonard l. martino,[mask] [mask]. [mask] is a former <mask> member of the pennsylvania house of representatives .he was born in butler [mask] michael and angela pitullio martino .,0.296919,0.550956,3564,4,4,22,Successful
5,Linda kozlowski,"[mask] [mask] (born [mask] 8 , [mask]) is an american stage , film and television actress .she is perhaps best known for portraying the female changeling on '' '' .",Patrick sabongui,"[mask] [mask] (<mask> [mask] <mask> , [mask]) <mask> <mask> <mask> <mask> , <mask> <mask> <mask> <mask> .<mask> <mask> <mask> <mask> <mask> <mask> <mask> <mask> <mask> <mask> <mask> '' '' .",0.004125,0.001457,9248,10061,5,862,Failed
6,Josh hamilton,"[mask] [mask] [mask] ([mask] [mask] [mask] , [mask]) , [mask] [mask] [mask] [mask] [mask] '' , [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] of major league [mask] (mlb) .he bats and throws left-handed .[mask] was drafted by [mask] [mask] [mask] [mask] [mask] in the second round (52nd overall) of the 1999 major league baseball draft .he made his major league debut in 2002 .[mask] has more [mask] (121) than any other active baseball player .",Carl crawford,"[mask] [mask] [mask] ([mask] [mask] [mask] , [mask]) , [mask] [mask] [mask] [mask] [mask] '' , [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] [mask] <mask> major league [mask] (mlb) .he <mask> and <mask> left-handed .[mask] was <mask> by [mask] [mask] [mask] [mask] [mask] in the second round (52nd overall) of the 1999 major league baseball draft .he made his major league debut in 2002 .[mask] has more [mask] (121) than any other active baseball player .",0.124548,0.389187,9080,6,6,523,Successful
7,David morrell,"[mask] [mask] (born [mask] neil morrison on 22 [mask] [mask]) is a [mask] musician and author , best known as the singer of indie punk band carter usm .",Jim bob,"[mask] [mask] (born [mask] neil morrison on 22 [mask] [mask]) is <mask> [mask] musician and author , best known as the singer of <mask> punk band carter usm .",0.033356,0.320616,7586,7,7,102,Successful
8,Jeff faulkner,"[mask] [mask] (born [mask] [mask] , [mask] in [mask] , virginia) is a former professional american football defensive lineman for the seattle seahawks , san diego chargers , new england patriots , baltimore ravens , and san francisco [mask] of the national football league .",Riddick parker,"[mask] [mask] (born [mask] [mask] , [mask] in [mask] , <mask>) is a former professional american football defensive lineman for the seattle seahawks , san diego chargers , new england patriots , baltimore ravens , and san francisco [mask] of the national football league .",0.330582,0.838798,102,8,8,32,Successful
9,Mariya borovichenko,blessed [mask] [mask] [mask] t.o.s.d. -lrb-) was a catholic visionary and anchoress from [mask] (kotor) .she was a teenage convert from orthodoxy of [mask] descent from [mask] (zeta) .she became a dominican tertiary and was posthumously venerated as a saint in [mask] .she was later beatified in 1934 .,Blessed osanna of cattaro -lrb- ozana kotorska -rrb-,blessed [mask] [mask] [mask] t.o.s.d. -lrb-) was a catholic visionary and anchoress from [mask] (kotor) .she was a teenage convert from orthodoxy of [mask] descent from [mask] (zeta) .she became a dominican tertiary and was posthumously venerated as a saint in [mask] .she was later beatified in <mask> .,0.265053,0.752448,9342,9,9,45,Successful


In [55]:
pd.set_option('max_colwidth', None) # show the full contents of each column

def escape_mask(ex):
    ex["original_text"] = ex["original_text"].replace('<mask>', '[mask]')
    ex["perturbed_text"] = ex["perturbed_text"].replace('<mask>', '[mask]')
    return ex

display(HTML(logger.df[["original_text", "perturbed_text"]].apply(escape_mask, axis=1).to_html(escape=False)))

Unnamed: 0,original_text,perturbed_text
0,"pope [mask] iii [mask] alexandria (also known as khail [mask]) was the [mask] pope of alexandria [mask] patriarch of the see of st. mark (880 -- [mask]) .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza .","pope [mask] iii [mask] alexandria (also known as <mask> [mask]) was the [mask] pope of alexandria [mask] patriarch of the see of st. mark (880 -- [mask]) .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza ."
1,[mask] [mask] is a male former [mask] tennis player from china .,<[mask]> [mask] is a <mask> former [mask] <mask> player from china .
2,[mask] [mask] (born 30 november [mask]) is a turkish professional footballer .he currently plays as a striker for [mask] [mask] .,[mask] [mask] (born 30 november [mask]) is a turkish professional footballer .he currently plays as a striker for <[mask]> [mask] .


## Wow, so weird!! 😅