# Gradient-based word deletion

I trained a model to "reidentify" individuals from information about them. Specifically, the model reads the beginning of a Wikipedia page and, given the infoboxes of many people's Wikipedia pages, predicts which person the page is about. Now I'm going to try to fool this "reidentifier" model and measure how many words I have to delete in order to fool it a given percentage of the time.

## 1. Load the model and make a prediction

In [1]:
import sys
sys.path.append('/home/jxm3/research/deidentification/unsupervised-deidentification')

In [2]:
from model import DocumentProfileMatchingTransformer

model = DocumentProfileMatchingTransformer(
    dataset_name='wiki_bio',
    model_name_or_path='distilbert-base-uncased',
    num_workers=1,
    loss_fn='exact',
    num_neighbors=2048,
    base_folder="/home/jxm3/research/deidentification/unsupervised-deidentification",
)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Initialized DocumentProfileMatchingTransformer with learning_rate = 2e-05


In [3]:
from dataloader import WikipediaDataModule
import os

num_cpus = os.cpu_count()

dm = WikipediaDataModule(
    model_name_or_path='distilbert-base-uncased',
    dataset_name='wiki_bio',
    num_workers=min(8, num_cpus),
    train_batch_size=64,
    eval_batch_size=64,
    max_seq_length=64,
    redaction_strategy="",
    base_folder="/home/jxm3/research/deidentification/unsupervised-deidentification",
)
dm.setup("fit")

Initializing WikipediaDataModule with num_workers = 8


Using custom data configuration default
Reusing dataset wiki_bio (/home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da)
Using custom data configuration default
Reusing dataset wiki_bio (/home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da)
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-5535f82839d9fec4.arrow
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-5b1c3941089b7f1b.arrow
Loading cached processed dataset at /home/jxm3/.cache/huggingface/datasets/wiki_bio/default/1.2.0/c05ce066e9026831cd7535968a311fc80f074b58868cfdffccbc811dff2ab6da/cache-8a9b289bc8e70b72.arrow
Loading cached processed dataset at 


## 2. Define attack in TextAttack 

In [4]:
import textattack

### (a) Greedy word search + replace with `[MASK]`

In [5]:
search_method = textattack.search_methods.GreedyWordSwapWIR()

class WordSwapSingleWord(textattack.transformations.word_swap.WordSwap):
    """Transforms a sentence by replacing one word at a time with a single fixed word.
    """
    single_word: str
    def __init__(self, single_word: str = "?", **kwargs):
        super().__init__(**kwargs)
        self.single_word = single_word

    def _get_replacement_words(self, _word: str):
        return [self.single_word]

In [6]:
transformation = WordSwapSingleWord(single_word='[MASK]')
transformation(textattack.shared.AttackedText("Hello my name is Jack"))

[<AttackedText "[MASK] my name is Jack">,
 <AttackedText "Hello [MASK] name is Jack">,
 <AttackedText "Hello my [MASK] is Jack">,
 <AttackedText "Hello my name [MASK] Jack">,
 <AttackedText "Hello my name is [MASK]">]
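The output above can be reproduced without TextAttack. The following is a minimal sketch, assuming plain whitespace tokenization (TextAttack uses its own word splitting); `mask_each_word` is a hypothetical helper, not part of the library.

```python
from typing import List


def mask_each_word(sentence: str, mask_token: str = "[MASK]") -> List[str]:
    """Return one candidate per word, with that word replaced by mask_token."""
    words = sentence.split()  # assumption: whitespace tokenization
    candidates = []
    for i in range(len(words)):
        # Copy the word list and swap out only position i.
        swapped = words[:i] + [mask_token] + words[i + 1:]
        candidates.append(" ".join(swapped))
    return candidates


for candidate in mask_each_word("Hello my name is Jack"):
    print(candidate)
```

Each candidate differs from the input in exactly one position, which is the set of perturbations the search method chooses from at each step.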

### (b) "Attack success" as fulfillment of the metric

In [7]:
from typing import List
import torch

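class ChangeClassificationToBelowTopKClasses(textattack.goal_functions.ClassificationGoalFunction):
    """Succeeds once at least `k` classes score higher than the true class,
    i.e. the true class is no longer among the model's top-k predictions.
    """
    k: int
    def __init__(self, *args, k: int = 1, **kwargs):
        self.k = k
        super().__init__(*args, **kwargs)

    def _is_goal_complete(self, model_output, _):
        original_class_score = model_output[self.ground_truth_output]
        # Count the classes that now outrank the true class.
        num_better_classes = (model_output > original_class_score).sum()
        return num_better_classes >= self.k

    def _get_score(self, model_output, _):
        # Higher is better for the attacker: one minus the true-class probability.
        return 1 - model_output[self.ground_truth_output]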
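To check the success criterion by hand, here is a dependency-free sketch of the same rule on a toy probability vector. The function name `is_below_top_k` and the example numbers are illustrative assumptions, not taken from the notebook.

```python
from typing import Sequence


def is_below_top_k(probs: Sequence[float], true_class: int, k: int = 1) -> bool:
    """True if at least k classes have a strictly higher score than the true class."""
    true_score = probs[true_class]
    num_better = sum(1 for p in probs if p > true_score)
    return num_better >= k


toy_probs = [0.5, 0.3, 0.2]
print(is_below_top_k(toy_probs, true_class=0, k=1))  # true class is still ranked first
print(is_below_top_k(toy_probs, true_class=2, k=2))  # two classes outrank the true class
```

With `k=1` the goal is an ordinary misclassification; larger `k` demands that the true identity be pushed further down the ranking before the attack counts as a success.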


### (c) Model wrapper that computes similarities of input documents with validation profiles

In [49]:
import transformers

class MyModelWrapper(textattack.models.wrappers.ModelWrapper):
    model: DocumentProfileMatchingTransformer
    tokenizer: transformers.PreTrainedTokenizer
    profile_embeddings: torch.Tensor
    
    def __init__(self, model: DocumentProfileMatchingTransformer, tokenizer: transformers.PreTrainedTokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.profile_embeddings = torch.tensor(model.val_embeddings)
                 
    def to(self, device):
        self.model.to(device)
        # Tensor.to() is not in-place, so keep the tensor it returns.
        self.profile_embeddings = self.profile_embeddings.to(device)
        return self # so semantics `model = MyModelWrapper().to('cuda')` works properly

    def __call__(self, text_input_list, batch_size=32):
        model_device = next(self.model.parameters()).device
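        # Pad to a common length so documents of different lengths fit in one batch tensor.
        tokenized_ids = self.tokenizer(text_input_list, padding=True, truncation=True, return_tensors='pt')
        tokenized_ids = {k: v.to(model_device) for k,v in tokenized_ids.items()}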
        
        # TODO: implement batch size if we start running out of memory here.
        with torch.no_grad():
            document_embeddings = self.model.document_model(**tokenized_ids)
            document_embeddings = document_embeddings['last_hidden_state'][:, 0, :] # (batch, document_emb_dim)
            document_embeddings = self.model.lower_dim_embed(document_embeddings) # (batch, emb_dim)

        document_to_profile_probs = torch.nn.functional.softmax(
            document_embeddings @ self.profile_embeddings.T.to(model_device), dim=-1)
        assert document_to_profile_probs.shape == (len(text_input_list), len(self.profile_embeddings))
        return document_to_profile_probs
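The last step of the wrapper (dot-product similarity against every profile embedding, then a softmax over profiles) can be illustrated without the model. This is a pure-Python sketch on made-up 2-dimensional embeddings; the helper names are hypothetical and the real code does the same thing with a single matrix product in PyTorch.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def document_to_profile_probs(doc_emb: List[float],
                              profile_embs: List[List[float]]) -> List[float]:
    """Dot the document embedding with each profile embedding, then normalize."""
    scores = [sum(d * p for d, p in zip(doc_emb, prof)) for prof in profile_embs]
    return softmax(scores)


toy_profiles = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]  # made-up profile embeddings
probs = document_to_profile_probs([1.0, 0.0], toy_profiles)
print(probs)
```

The result has one probability per profile, so the "class" that TextAttack sees is the index of a person in the validation set.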
            

### (d) Dataset that loads Wikipedia documents with names as labels

In [9]:
next(iter(dm.val_dataloader()))

{'document_input_ids': tensor([[  101,  4831,  2745,  ..., 10722, 26896,   102],
         [  101, 17504, 12022,  ...,     0,     0,     0],
         [  101,  7929,  2319,  ...,     0,     0,     0],
         ...,
         [  101,  9033,  2860,  ..., 12681,  5283,   102],
         [  101,  7332, 27319,  ...,   102,     0,     0],
         [  101,  3958, 11463,  ...,     0,     0,     0]]),
 'document_attention_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         ...,
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0]]),
 'document_redact_ner_input_ids': tensor([[  101,  4831,   103,  ...,  1010,  3140,   102],
         [  101, 17504, 12022,  ...,     0,     0,     0],
         [  101,   103,   103,  ...,     0,     0,     0],
         ...,
         [  101,  9033,  2860,  ..., 15810,  8840,   102],
         [  101,   103,   103,  ...,   102,     0,     0],
         [  1

In [20]:
from typing import Tuple

from collections import OrderedDict

import datasets

class WikiDataset(textattack.datasets.Dataset):
    dataset: datasets.Dataset
    
    def __init__(self, dm: WikipediaDataModule):
        self.shuffled = True
        self.dataset = dm.val_dataset
        self.label_names = list(dm.val_dataset['name'])
    
    def __len__(self) -> int:
        return len(self.dataset)
    
    def __getitem__(self, i: int) -> Tuple[OrderedDict, int]:
        input_dict = OrderedDict([
            ('document', self.dataset['document'][i])
        ])
        return input_dict, self.dataset['text_key_id'][i].item()
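Each item the dataset yields is a pair of an `OrderedDict` of named text inputs and an integer label. A list-backed stand-in makes that shape easy to inspect; `ToyDocumentDataset` is a hypothetical class for illustration and does not inherit from TextAttack.

```python
from collections import OrderedDict
from typing import List, Tuple


class ToyDocumentDataset:
    """Hypothetical stand-in for WikiDataset, backed by plain Python lists."""

    def __init__(self, documents: List[str], labels: List[int]):
        assert len(documents) == len(labels)
        self.documents = documents
        self.labels = labels

    def __len__(self) -> int:
        return len(self.documents)

    def __getitem__(self, i: int) -> Tuple[OrderedDict, int]:
        # One named input column ("document") plus the integer class label.
        return OrderedDict([("document", self.documents[i])]), self.labels[i]


toy = ToyDocumentDataset(
    ["hui jun is a male former table tennis player from china ."], [1]
)
inputs, label = toy[0]
print(inputs["document"], label)
```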
        

## 3. Run attack once

In [51]:
from textattack import Attack
from textattack.constraints.pre_transformation import RepeatModification

model_wrapper = MyModelWrapper(model, dm.tokenizer)
model_wrapper.to('cuda')

goal_function = ChangeClassificationToBelowTopKClasses(model_wrapper, k=10)
constraints = [RepeatModification()]
transformation = WordSwapSingleWord(single_word='[MASK]')
search_method = textattack.search_methods.GreedyWordSwapWIR()

attack = Attack(
    goal_function, constraints, transformation, search_method
)

textattack: No entry found for goal function <class '__main__.ChangeClassificationToBelowTopKClasses'>.
textattack: Unknown if model of class <class 'model.DocumentProfileMatchingTransformer'> compatible with goal function <class '__main__.ChangeClassificationToBelowTopKClasses'>.
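The search strategy is easier to follow on a toy problem. The sketch below is a simplified illustration in the spirit of greedy word-importance ranking, not TextAttack's implementation: it ranks words by how much masking each one alone raises the attack score, then masks them in that order, keeping a change only if the score improves. The scoring function and the set of "identifying" words are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

MASK = "[MASK]"


def greedy_mask_attack(
    words: List[str],
    score_fn: Callable[[List[str]], float],
    is_success: Callable[[List[str]], bool],
) -> Tuple[List[str], Optional[int]]:
    """Return the perturbed words and how many were masked (None if the attack failed)."""
    base = score_fn(words)
    # Step 1: importance of each position = score gain from masking it alone.
    ranking = []
    for i in range(len(words)):
        candidate = words[:i] + [MASK] + words[i + 1:]
        ranking.append((score_fn(candidate) - base, i))
    ranking.sort(key=lambda pair: pair[0], reverse=True)

    # Step 2: mask positions in order of importance, keeping only improvements.
    current, num_masked = list(words), 0
    for _, i in ranking:
        candidate = current[:i] + [MASK] + current[i + 1:]
        if score_fn(candidate) > score_fn(current):
            current, num_masked = candidate, num_masked + 1
            if is_success(current):
                return current, num_masked
    return current, None


# Toy "reidentifier": confidence is the fraction of identifying words still visible.
identifying = {"jack", "london"}  # made-up example data


def toy_score(ws: List[str]) -> float:
    return 1.0 - sum(w in identifying for w in ws) / len(identifying)


result = greedy_mask_attack(
    "hello my name is jack from london".split(),
    toy_score,
    lambda ws: toy_score(ws) >= 1.0,
)
print(result)
```

On the toy input only the two identifying words are masked, which is the behavior the attack is meant to measure: the fewer words needed, the more the model relies on a handful of tokens.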


In [52]:
from tqdm import tqdm # tqdm provides us a nice progress bar.
from textattack.loggers import CSVLogger # tracks a dataframe for us.
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker
from textattack import AttackArgs


attack_args = AttackArgs(num_examples=4)
dataset = WikiDataset(dm)

attacker = Attacker(attack, dataset, attack_args)

results_iterable = attacker.attack_dataset()

logger = CSVLogger(color_method='html')

for result in results_iterable:
    print('result:', result)
    logger.log_attack_result(result)

from IPython.display import display, HTML

display(HTML(logger.df.to_html(escape=False)))

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  unk
  )
  (goal_function):  ChangeClassificationToBelowTopKClasses
  (transformation):  WordSwapSingleWord
  (constraints): 
    (0): RepeatModification
  (is_black_box):  True
)

--------------------------------------------- Result 1 ---------------------------------------------

pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .
in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .
this building was at one time believed to have later become the site of the cairo geniza .



--------------------------------------------- Result 2 ---------------------------------------------

hui jun is a male former table tennis player from china .



--------------------------------------------- Result 3 ---------------------------------------------

okan Öztürk -lrb- born 30 november 1977 -rrb- is a turkish professional footballer .
he currently plays as a striker for yeni malatyaspor .

--------------------------------------------- Result 4 ---------------------------------------------

marie stephan , -lrb- born march 14 , 1996 -rrb- is a professional squash player who represents france .
she reached a career-high world ranking of world no. 101 in july 2015 .




+-------------------------------+-------+
| Attack Results                |       |
+-------------------------------+-------+
| Number of successful attacks: | 0     |
| Number of failed attacks:     | 0     |
| Number of skipped attacks:    | 4     |
| Original accuracy:            | 0.0%  |
| Accuracy under attack:        | 0.0%  |
| Attack success rate:          | 0%    |
| Average perturbed word %:     | nan%  |
| Average num. words per input: | 34.75 |
| Avg num queries:              | nan   |
+-------------------------------+-------+


textattack: Logging to CSV at path results.csv



result: Cody chupp (0%) --> [SKIPPED]

pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .
in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .
this building was at one time believed to have later become the site of the cairo geniza .

result: Chris evans (0%) --> [SKIPPED]

hui jun is a male former table tennis player from china .

result: Anahita hemmati (0%) --> [SKIPPED]

okan Öztürk -lrb- born 30 november 1977 -rrb- is a turkish professional footballer .
he currently plays as a striker for yeni malatyaspor .

result: Samuel spokes (0%) --> [SKIPPED]

marie stephan , -lrb- born march 14 , 1996 -rrb- is a professional squash player who represents france .
she reached a career-high world ranking of world no. 101 in july 2015 .





| index | original_text | perturbed_text | original_score | perturbed_score | original_output | perturbed_output | ground_truth_output | num_queries | result_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza . | pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .in 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .this building was at one time believed to have later become the site of the cairo geniza . | 0.999987 | 0.999987 | 7243 | 7243 | 0 | 1 | Skipped |
| 1 | hui jun is a male former table tennis player from china . | hui jun is a male former table tennis player from china . | 0.999541 | 0.999541 | 2378 | 2378 | 1 | 1 | Skipped |
| 2 | okan Öztürk -lrb- born 30 november 1977 -rrb- is a turkish professional footballer .he currently plays as a striker for yeni malatyaspor . | okan Öztürk -lrb- born 30 november 1977 -rrb- is a turkish professional footballer .he currently plays as a striker for yeni malatyaspor . | 0.999952 | 0.999952 | 755 | 755 | 2 | 1 | Skipped |
| 3 | marie stephan , -lrb- born march 14 , 1996 -rrb- is a professional squash player who represents france .she reached a career-high world ranking of world no. 101 in july 2015 . | marie stephan , -lrb- born march 14 , 1996 -rrb- is a professional squash player who represents france .she reached a career-high world ranking of world no. 101 in july 2015 . | 0.999944 | 0.999944 | 10043 | 10043 | 3 | 1 | Skipped |


## 4. Run attack in loop and make plot for multiple values of $\epsilon$

In [None]:
## (4) code here
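The notebook does not define $\epsilon$ or include this code yet. As a rough sketch only, assume $\epsilon$ is the fraction of a document's words that the attack is allowed to mask; the curve to plot is then the share of documents fooled within that budget. The per-document results below are invented placeholders, and `success_rate_at_budget` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (number of words in the document, words masked before the attack succeeded or None)
AttackOutcome = Tuple[int, Optional[int]]


def success_rate_at_budget(outcomes: List[AttackOutcome], epsilon: float) -> float:
    """Fraction of documents fooled while masking at most epsilon of their words."""
    fooled = 0
    for num_words, num_masked in outcomes:
        if num_masked is not None and num_masked <= epsilon * num_words:
            fooled += 1
    return fooled / len(outcomes)


# Placeholder outcomes, not real results from this notebook.
toy_outcomes = [(10, 1), (20, 6), (8, None), (5, 2)]
curve = [(eps, success_rate_at_budget(toy_outcomes, eps)) for eps in (0.25, 0.5, 1.0)]
print(curve)
```

The resulting list of `(epsilon, success rate)` pairs can be handed directly to any plotting library once real attack results are available.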