# The TextAttack ecosystem: search, transformations, and constraints

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/1_Introduction_and_Transformations.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/1_Introduction_and_Transformations.ipynb)

Please remember to run **pip3 install textattack[tensorflow]** in your notebook enviroment before the following codes:

An attack in TextAttack consists of four parts.

### Goal function

The **goal function** determines if the attack is successful or not. One common goal function is **untargeted classification**, where the attack tries to perturb an input to change its classification. 

### Search method
The **search method** explores the space of potential transformations and tries to locate a successful perturbation. Greedy search, beam search, and brute-force search are all examples of search methods.

### Transformation
A **transformation** takes a text input and transforms it, for example replacing words or phrases with similar ones, while trying not to change the meaning. Paraphrase and synonym substitution are two broad classes of transformations.

### Constraints
Finally, **constraints** determine whether or not a given transformation is valid. Transformations don't perfectly preserve syntax or semantics, so additional constraints can increase the probability that these qualities are preserved from the source to adversarial example. There are many types of constraints: overlap constraints that measure edit distance, syntactical  constraints check part-of-speech and grammar errors, and semantic constraints like language models and sentence encoders.

### A custom transformation

This lesson explains how to create a custom transformation. In TextAttack, many transformations involve *word swaps*: they take a word and try and find suitable substitutes. Some attacks focus on replacing characters with neighboring characters to create "typos" (these don't intend to preserve the grammaticality of inputs). Other attacks rely on semantics: they take a word and try to replace it with semantic equivalents.


### Banana word swap 

As an introduction to writing transformations for TextAttack, we're going to try a very simple transformation: one that replaces any given word with the word 'banana'. In TextAttack, there's an abstract `WordSwap` class that handles the heavy lifting of breaking sentences into words and avoiding replacement of stopwords. We can extend `WordSwap` and implement a single method, `_get_replacement_words`, to indicate to replace each word with 'banana'. 🍌

In [4]:
from textattack.transformations import WordSwap

# Import the model
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-ag-news")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-ag-news")

model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Create the goal function using the model
from textattack.goal_functions import UntargetedClassification
goal_function = UntargetedClassification(model_wrapper)

# Import the dataset
from textattack.datasets import HuggingFaceDataset
dataset = HuggingFaceDataset("ag_news", None, "test")

from textattack.search_methods import GreedySearch
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack import Attack

from tqdm import tqdm # tqdm provides us a nice progress bar.
from textattack.loggers.csv_logger import CSVLogger # tracks a dataframe for us.
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker
from textattack import AttackArgs
from textattack.datasets import Dataset

textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
Using custom data configuration default
Reusing dataset ag_news (/home/harsh1621/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548)


  0%|          | 0/2 [00:00<?, ?it/s]

textattack: Loading [94mdatasets[0m dataset [94mag_news[0m, split [94mtest[0m.


In [2]:
class BananaWordSwap(WordSwap):
    """ Transforms an input by replacing any word with 'banana'.
    """
    
    # We don't need a constructor, since our class doesn't require any parameters.

    def _get_replacement_words(self, word):
        """ Returns 'banana', no matter what 'word' was originally.
        
            Returns a list with one item, since `_get_replacement_words` is intended to
                return a list of candidate replacement words.
        """
        return ['banana']

### Creating the attack
Let's keep it simple: let's use a greedy search method, and let's not use any constraints for now. 

In [6]:
# We're going to use our Banana word swap class as the attack transformation.
transformation = BananaWordSwap() 
# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]
# We'll use the Greedy search method
search_method = GreedySearch()
# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

Let's print our attack to see all the parameters:

In [7]:
print(attack)

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  BananaWordSwap
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)


In [8]:
print(dataset[0])

(OrderedDict([('text', "Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.")]), 2)


### Using the attack

Let's use our attack to successfully attack 10 samples.

In [9]:
attack_args = AttackArgs(num_examples=1)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

#The following legacy tutorial code shows how the Attack API works in detail.

#logger = CSVLogger(color_method='html')

#num_successes = 0
#i = 0
#while num_successes < 10:
    #result = next(results_iterable)
#    example, ground_truth_output = dataset[i]
#    i += 1
#    result = attack.attack(example, ground_truth_output)
#    if isinstance(result, SuccessfulAttackResult):
#        logger.log_attack_result(result)
#        num_successes += 1
#       print(f'{num_successes} of 10 successes complete.')

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  BananaWordSwap
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 




  0%|          | 0/1 [00:00<?, ?it/s][A
100%|██████████| 1/1 [00:03<00:00,  3.63s/it][A

--------------------------------------------- Result 1 ---------------------------------------------



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1: 100%|██████████| 1/1 [00:04<00:00,  4.06s/it][A


Fears for T N [[pension]] after [[talks]] [[Unions]] representing [[workers]] at Turner   Newall say they are '[[disappointed']] after talks with stricken parent firm Federal [[Mogul]].

Fears for T N [[banana]] after [[banana]] [[banana]] representing [[banana]] at Turner   Newall say they are '[[banana]] after talks with stricken parent firm Federal [[banana]].



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 1      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 24.0%  |
| Average num. words per input: | 25.0   |
| Avg num queries:              | 94.0   |
+-------------------------------+--------+





# Clare Augmenter

In [10]:
from textattack.augmentation.recipes import CLAREAugmenter

clare_augmenter = CLAREAugmenter()

transformation = clare_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapMaskedLM(
        (method):  bae
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0005
      )
    (1): WordInsertionMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0
      )
    (2): WordMergeMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.005
      )
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)


In [11]:
attack_args = AttackArgs(num_examples=1)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapMaskedLM(
        (method):  bae
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0005
      )
    (1): WordInsertionMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0
      )
    (2): WordMergeMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.005
      )
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 




  0%|          | 0/1 [00:00<?, ?it/s][A

2022-12-03 19:14:32,204 loading file /home/harsh1621/.flair/models/upos-english-fast/b631371788604e95f27b6567fe7220e4a7e8d03201f3d862e6204dbf90f9f164.0afb95b43b32509bf4fcc3687f7c64157d8880d08f813124c1bd371c3d8ee3f7
2022-12-03 19:14:32,253 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, INTJ, PUNCT, VERB, PRON, NOUN, ADV, DET, ADJ, ADP, NUM, PROPN, CCONJ, PART, AUX, X, SYM, <START>, <STOP>




KeyboardInterrupt: 

# BackTranslation

In [15]:
from textattack.augmentation.recipes import BackTranslationAugmenter

back_trans_augmenter = BackTranslationAugmenter()

transformation = back_trans_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  BackTranslation
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)


In [18]:
attack_args = AttackArgs(num_examples=1)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  BackTranslation
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 




`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like
this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

# CheckList Augmenter

In [7]:
from textattack.augmentation.recipes import CheckListAugmenter

check_list_augmenter = CheckListAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)

transformation = check_list_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapChangeNumber
    (1): WordSwapChangeLocation
    (2): WordSwapChangeName
    (3): WordSwapExtend
    (4): WordSwapContract
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)


In [9]:
attack_args = AttackArgs(num_examples=10)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapChangeNumber
    (1): WordSwapChangeLocation
    (2): WordSwapChangeName
    (3): WordSwapExtend
    (4): WordSwapContract
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 0 / 1 / 0 / 1:  10%|█         | 1/10 [00:00<00:01,  6.64it/s]

--------------------------------------------- Result 1 ---------------------------------------------

Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.




[Succeeded / Failed / Skipped / Total] 0 / 3 / 0 / 3:  30%|███       | 3/10 [00:01<00:04,  1.64it/s]

--------------------------------------------- Result 2 ---------------------------------------------

The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the  #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.


--------------------------------------------- Result 3 ---------------------------------------------

Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.




[Succeeded / Failed / Skipped / Total] 0 / 5 / 0 / 5:  50%|█████     | 5/10 [00:02<00:02,  1.70it/s]

--------------------------------------------- Result 4 ---------------------------------------------

Prediction Unit Helps Forecast Wildfires (AP) AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.


--------------------------------------------- Result 5 ---------------------------------------------

Calif. Aims to Limit Farm-Related Smog (AP) AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.




[Succeeded / Failed / Skipped / Total] 0 / 6 / 0 / 6:  60%|██████    | 6/10 [00:03<00:02,  1.73it/s]

--------------------------------------------- Result 6 ---------------------------------------------

Open Letter Against British Copyright Indoctrination in Schools The British Department for Education and Skills (DfES) recently launched a "Music Manifesto" campaign, with the ostensible intention of educating the next generation of British musicians. Unfortunately, they also teamed up with the music industry (EMI, and various artists) to make this popular. EMI has apparently negotiated their end well, so that children in our schools will now be indoctrinated about the illegality of downloading music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.




[Succeeded / Failed / Skipped / Total] 0 / 7 / 0 / 7:  70%|███████   | 7/10 [00:15<00:06,  2.18s/it]

--------------------------------------------- Result 7 ---------------------------------------------

Loosing the War on Terrorism \\"Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is\responsible for 70 percent of virus infections in 2004, according to a six-month\virus roundup published Wednesday by antivirus company Sophos."\\"The 18-year-old Jaschan was taken into custody in Germany in May by police who\said he had admitted programming both the Netsky and Sasser worms, something\experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the\teenager's arrest.) During the five months preceding Jaschan's capture, there\were at least 25 variants of Netsky and one of the port-scanning network worm\Sasser."\\"Graham Cluley, senior technology consultant at Sophos, said it was staggeri ...\\




[Succeeded / Failed / Skipped / Total] 0 / 9 / 0 / 9:  90%|█████████ | 9/10 [00:16<00:01,  1.81s/it]

--------------------------------------------- Result 8 ---------------------------------------------

FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters \\FOAF/LOAF  and bloom filters have a lot of interesting properties for social\network and whitelist distribution.\\I think we can go one level higher though and include GPG/OpenPGP key\fingerpring distribution in the FOAF file for simple web-of-trust based key\distribution.\\What if we used FOAF and included the PGP key fingerprint(s) for identities?\This could mean a lot.  You include the PGP key fingerprints within the FOAF\file of your direct friends and then include a bloom filter of the PGP key\fingerprints of your entire whitelist (the source FOAF file would of course need\to be encrypted ).\\Your whitelist would be populated from the social network as your client\discovered new identit ...\\


--------------------------------------------- Result 9 ---------------------------------------------

E-mail scam targets police ch

[Succeeded / Failed / Skipped / Total] 0 / 10 / 0 / 10: 100%|██████████| 10/10 [00:17<00:00,  1.75s/it]

--------------------------------------------- Result 10 ---------------------------------------------

Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 0      |
| Number of failed attacks:     | 10     |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 100.0% |
| Attack success rate:          | 0.0%   |
| Average perturbed word %:     | nan%   |
| Average num. words per input: | 63.0   |
| Avg num queries:              | 23.6   |
+-------------------------------+--------+





# WordNet Augmenter

In [14]:
from textattack.augmentation.recipes import WordNetAugmenter

word_net_augmenter = WordNetAugmenter(pct_words_to_swap=0.4, transformations_per_example=5, high_yield=True, enable_advanced_metrics=True)

transformation = check_list_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
# constraints = [RepeatModification(),
#                StopwordModification()]
constraints = []

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapChangeNumber
    (1): WordSwapChangeLocation
    (2): WordSwapChangeName
    (3): WordSwapExtend
    (4): WordSwapContract
    )
  (constraints): None
  (is_black_box):  True
)


[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /home/harsh1621/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [15]:
attack_args = AttackArgs(num_examples=3)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapChangeNumber
    (1): WordSwapChangeLocation
    (2): WordSwapChangeName
    (3): WordSwapExtend
    (4): WordSwapContract
    )
  (constraints): None
  (is_black_box):  True
) 



  0%|          | 0/3 [00:00<?, ?it/s]

KeyboardInterrupt: 

# Synonym Insertion

In [3]:
from textattack.augmentation.recipes import SynonymInsertionAugmenter

synonym_augmenter = SynonymInsertionAugmenter(pct_words_to_swap=0.4, transformations_per_example=3, high_yield=True, enable_advanced_metrics=True)

transformation = synonym_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
# constraints = [RepeatModification(),
#                StopwordModification()]
constraints = []

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  WordInsertionRandomSynonym
  (constraints): None
  (is_black_box):  True
)


In [4]:
attack_args = AttackArgs(num_examples=3)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  WordInsertionRandomSynonym
  (constraints): None
  (is_black_box):  True
) 



 33%|███▎      | 1/3 [01:57<03:54, 117.38s/it]

--------------------------------------------- Result 1 ---------------------------------------------


[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  33%|███▎      | 1/3 [01:57<03:55, 117.74s/it]


Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.

[[deal]] Fears for T N [[Fed]] pension after [[smitten]] [[turner]] [[magnate]] [[Turner]] [[tonne]] talks Unions representing [[turner]] [[care]] workers [[rear]] at Turner   Newall [[Fed]] [[marriage]] say they are '[[n]] [[arouse]] [[afterwards]] [[Fed]] [[be]] [[matrimony]] [[nurture]] [[at]] [[steadfastly]] [[run]] disappointed' after talks [[raise]] [[streamlet]] [[federal]] [[smitten]] [[dialogue]] [[allot]] [[raise]] [[Fed]] [[wedlock]] with [[lift]] [[struck]] [[afflict]] stricken parent [[assume]] [[endure]] firm [[ply]] [[At]] [[astatine]] [[mogul]] [[enamored]] [[At]] [[lecture]] [[astatine]] [[afflicted]] [[marriage]] [[infatuated]] [[federal]] [[subsequently]] Federal [[house]] [[house]] [[power]] [[infatuated]] [[smitten]] [[marriage]] Mogul.




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  67%|██████▋   | 2/3 [05:56<02:58, 178.39s/it]

--------------------------------------------- Result 2 ---------------------------------------------

The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the  #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.

The Race is On: Second Private Team Sets Launch [[squad]] Date [[clandestine]] [[equal]] [[vie]] [[fund]] for Human [[establish]] Spaceflight ([[man]] [[indorse]] SPACE.com) [[human]] SPACE.com - [[X]] [[mystery]] TORONTO, [[secret]] [[roquette]] Canada -- A second\team of rocketeers competing for the  #36;10 million Ansari [[subocular]] X Prize, [[undercover]] a contest for\privately funded suborbital space flight, has officially [[escape]] [[turnout]] announced the first\[[secret]] launch [[declare]] date for [[secret]] [[IT]] [[secret]] [[back]] its [[arugula]] [[te

[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3: 100%|██████████| 3/3 [06:13<00:00, 124.57s/it]

--------------------------------------------- Result 3 ---------------------------------------------

Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.

[[progress]] Ky. Company Wins Grant to Study Peptides (AP) [[embody]] AP - A [[germinate]] company founded by a chemistry researcher [[alchemy]] at the University of [[peptide]] Louisville won a grant [[shortstop]] to [[profits]] develop a method of [[succeed]] producing better peptides, which are short chains of amino acids, the building blocks of proteins.



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 3      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 0      |
| Original acc




# PerSenT Model and Data

In [5]:
import torch
from transformers import BertModel, BertTokenizer, BertForSequenceClassification, AutoModelForSequenceClassification
import os

model = BertForSequenceClassification.from_pretrained(pretrained_model_name_or_path="pytorch_model.bin", config="config_pyt.json")

num_labels = 3

model.classifier = torch.nn.Linear(model.config.hidden_size, num_labels)

tokenizer = BertTokenizer.from_pretrained(os.getcwd())

model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

goal_function = UntargetedClassification(model_wrapper)

textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


## Persent Data

In [6]:
import numpy as np
import pandas as pd
import tqdm

data_path = "data/fixed_test.csv"

data = pd.read_csv(data_path)
data.head(5)

Unnamed: 0,DOCUMENT_INDEX,TITLE,TARGET_ENTITY,DOCUMENT,MASKED_DOCUMENT,TRUE_SENTIMENT,Paragraph0,Paragraph1,Paragraph2,Paragraph3,...,Paragraph6,Paragraph7,Paragraph8,Paragraph9,Paragraph10,Paragraph11,Paragraph12,Paragraph13,Paragraph14,Paragraph15
0,4550,UPDATE 6,Donald Trump,term extension of government funding that woul...,term extension of government funding that woul...,Neutral,,,,,...,,,,,,,,,,
1,4551,Special Counsel Mueller reportedly seeks Q&A w...,Donald Trump,At a press conference in June a reporter ask...,At a press conference in June a reporter ask...,Neutral,,,,,...,,,,,,,,,,
2,4552,AP News in Brief at 6:04 a.m. EST,Donald Trump,Trump Xi present united front despite differ...,Trump Xi present united front despite differ...,Positive,,,,,...,,,,,,,,,,
3,4553,The Latest: Trump says he thinks Mueller 'will...,Donald Trump,The Latest on President Donald Trump in Florid...,The Latest on [TGT] ( all times local ): 9 : 3...,Neutral,,,,,...,,,,,,,,,,
4,4554,New York Times Opinion Page To Publish Letters...,Donald Trump,“ I ’ m thrilled with the progress that Presid...,“ I ’ m thrilled with the progress that Presid...,Positive,,,,,...,,,,,,,,,,


In [7]:
class InputExample(object):
    """
        Training / test example for masked word prediction and author sentiment classification.
    """

    def __init__(self, masked_sentence, original_sentence, sentiment):
        """
        Construct and InputExample

        Args:
            masked_sentence (str): 
                A string containing the input article with target_entity masked.

            original_sentence (str): 
                A string containing the input article with no masks.

            sentiment (str):
                Author's sentiment
        """
        self.masked_sentence = masked_sentence
        self.original_sentence = original_sentence
        self.sentiment = sentiment

def convert_text_to_examples(masked_texts, original_texts, labels):
    """
        Create InputExamples.
    """
    InputExamples = []

    for masked_text, original_text, label in zip(masked_texts, original_texts, labels):
        InputExamples.append(
            InputExample(masked_text, original_text, label)
        )
    return InputExamples

input_examples = convert_text_to_examples(data['MASKED_DOCUMENT'], data['DOCUMENT'], data['TRUE_SENTIMENT'])

def align_os_to_ms(tokenized_os, tokenized_ms):
    l_os = []
    l_ms = []
    
    ms_index = 0
    os_index = 0

    while (os_index < len(tokenized_os)) and (ms_index < len(tokenized_ms)):
        if tokenized_ms[ms_index] == '[MASK]':
            l_os.append(tokenized_os[os_index])
            l_ms.append(tokenized_ms[ms_index])
            os_index += 1
            ms_index += 1
        elif tokenized_ms[ms_index] != tokenized_os[os_index]:
            l_ms.append('[MASK]')
            l_os.append(tokenized_os[os_index])
            os_index += 1
        else:
            l_ms.append(tokenized_ms[ms_index])
            l_os.append(tokenized_os[os_index])
            ms_index += 1
            os_index += 1

    while os_index < len(tokenized_os):
        l_ms.append('[MASK]')
        l_os.append(tokenized_os[os_index])
        os_index += 1

    return l_os, l_ms

def single_example_to_features(tokenizer, example, max_seq_length=256):
    """
        Converts a single 'InputExample' into a single 'InputFeatures'
    """
    example.masked_sentence = example.masked_sentence.replace('[TGT]', '[MASK]')

    tokens_masked = tokenizer.tokenize(example.masked_sentence)
    tokens_original = tokenizer.tokenize(example.original_sentence)

    tokens_original, tokens_masked = align_os_to_ms(tokens_original, tokens_masked)
    
    if len(tokens_masked) > max_seq_length - 2:
        tokens_masked = tokens_masked[:(max_seq_length-2)]
        tokens_original = tokens_original[:(max_seq_length-2)]

    tokens_masked = ['[CLS]'] + tokens_masked + ['[SEP]']
    tokens_original = ['[CLS]'] + tokens_original + ['[SEP]']
    segment_ids = [0] * len(tokens_masked)

    input_ids = tokenizer.convert_tokens_to_ids(tokens_masked)
    label_ids = tokenizer.convert_tokens_to_ids(tokens_original)
    input_mask = [1] * len(input_ids)

    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        label_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(label_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length
    
    return input_ids, label_ids, input_mask, segment_ids, example.sentiment

def examples_to_features(tokenizer, examples, max_seq_length=256):
    input_ids, label_ids, input_masks, segment_ids, sentiments = [], [], [], [], []
    
    for example in tqdm.tqdm(examples, desc="Converting examples to features"):
        input_id, label_id, input_mask, segment_id, sentiment = single_example_to_features(tokenizer, example, max_seq_length)
        input_ids.append(input_id)
        label_ids.append(label_id)
        input_masks.append(input_mask)
        segment_ids.append(segment_id)
        sentiments.append(sentiment)

    return (
        np.array(input_ids),
        np.array(label_ids),
        np.array(input_masks),
        np.array(segment_ids),
        np.array(sentiments).reshape(-1, 1)
    )

tokenized_examples = examples_to_features(tokenizer=tokenizer, 
                                          examples=input_examples, 
                                          max_seq_length=512)

# print(tokenized_examples)
# print(input_examples)

Converting examples to features: 100%|██████████| 827/827 [00:10<00:00, 76.47it/s] 


In [8]:
from textattack.augmentation.recipes import CLAREAugmenter

clare_augmenter = CLAREAugmenter()

transformation = clare_augmenter.transformation

# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]

# We'll use the Greedy search method
search_method = GreedySearch()

# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

print(attack)

If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapMaskedLM(
        (method):  bae
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0005
      )
    (1): WordInsertionMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0
      )
    (2): WordMergeMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.005
      )
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)


In [9]:
from textattack.datasets import Dataset

persent_data = []

for ex in input_examples:
    sent_code = 0;
    if ex.sentiment == "Positive":
        sent_code = 2
    elif ex.sentiment == "Negative":
        sent_code = 0
    elif ex.sentiment == "Neutral":
        sent_code = 1
            
    persent_data.append((ex.original_sentence, sent_code))

dataset = Dataset(persent_data)
print(dataset[1:2])

[(OrderedDict([('text', 'At a press conference in June   a reporter asked  Donald Trump  whether  he  ’ d be willing to answer questions about the Russia scandal under oath . “ One hundred percent  ” the president responded . As we discussed last week   Trump ’ s position on this has evolved . Asked at a press conference at Camp David whether he ’ s still committed to speaking with Mueller   Trump hedged   refusing to answer the question directly . A few days later   at an event alongside the prime minister of Norway   Trump faced a similar question . The Republican ’ s response was long   meandering   and not altogether coherent   but he concluded that it “ seems unlikely ” that he ’ d answer the special counsel ’ s questions . Close video Trump lawyers change defense on collusion and obstruction Rachel Maddow points out that  Donald Trump ’ s lawyers ’ arguments about Trump ’ s legal liability in the Russia scandal  have changed from denying Trump ’ s actions to excusing  them  as no

In [10]:
attack_args = AttackArgs(num_examples=1)

attacker = Attacker(attack, dataset, attack_args)

attack_results = attacker.attack_dataset()

Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapMaskedLM(
        (method):  bae
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0005
      )
    (1): WordInsertionMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0
      )
    (2): WordMergeMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.005
      )
    )
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 



  0%|          | 0/1 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 5.80 GiB total capacity; 3.83 GiB already allocated; 821.06 MiB free; 3.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF