# Designing a Simple Attack System for the NLP Domain Using the TextAttack Library

In this work, **we aim to design a simple Adversarial Example (AE) generator for a standard text classification task in the black-box setting.** The system will be evaluated and its results visualized to get a better understanding of such attack systems in the Natural Language Processing (NLP) domain.

Generating adversarial examples for textual data is not as straightforward as it is for image data because text is discrete. AE generation approaches in NLP range from character-level to sentence-level techniques, with word-level methods generally reported to outperform the others. **This work concentrates on word-level attacks, where substituting important words to fool the target model is the most common technique.** Our method implements word substitution using the following components:
* The method employs a lexical database of English to find synonyms for each important word in order to perform word substitutions. This is the transformation step.
* We also need to restrict substitutions so that they preserve the linguistic characteristics (semantic, syntactic, and grammatical) of the original text. This step is referred to as constraints.

---

We employ the TextAttack library (https://github.com/QData/TextAttack) to design our attack system. TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in the NLP domain. TextAttack formulates an attack as consisting of four components:

### Goal Function
A goal function determines whether the attack is successful in terms of the model outputs.
* Examples: untargeted classification, targeted classification, non-overlapping output.

### Constraints
Constraints determine whether a perturbation is valid with respect to the original input.
* Examples: maximum word embedding distance, part-of-speech consistency, grammar checker, minimum sentence encoding cosine similarity.

### Transformation
A transformation generates potential modifications given an input.
* Examples: word embedding word swap, thesaurus word swap, homoglyph character substitution.

### Search Method
A search method successively queries the model and selects promising perturbations from a set of transformations.
* Examples: greedy with word importance ranking, beam search, genetic algorithm.





# Library Setup
We need to install the TextAttack library:

In [None]:
!pip3 install textattack[tensorflow]

# After installation, restart the runtime and then reinstall textattack.

# Creating Goal Function, Target Model, And Dataset
We are performing an untargeted attack on a classification model. The goal is to fool a victim model into predicting a label different from the true label of an input sample. As the victim model, we employ BERT (https://arxiv.org/abs/1810.04805), a state-of-the-art transformer-based neural model, trained for news classification on the AG News dataset. Several pretrained models are available in the [HuggingFace Model Hub](https://huggingface.co/textattack) that we can use as attack targets.
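The untargeted success criterion described above can be illustrated on raw class probabilities. This is only a sketch under stated assumptions: the function names are hypothetical, the probability vectors are made up, and the score `1 - p(true label)` is one common way to rank candidates rather than a claim about the library's internals.

```python
def is_untargeted_success(probs, true_label):
    """The attack succeeds when the predicted class is not the true label."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted != true_label


def untargeted_score(probs, true_label):
    """Higher is better for the attacker: probability mass moved off the true label."""
    return 1.0 - probs[true_label]


# Made-up probabilities over the four AG News classes.
original_probs = [0.10, 0.70, 0.15, 0.05]
perturbed_probs = [0.10, 0.30, 0.50, 0.10]
true_label = 1

print(is_untargeted_success(original_probs, true_label))   # False
print(is_untargeted_success(perturbed_probs, true_label))  # True
```

A search method can use the score to compare candidates that have not yet flipped the label.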

In [None]:
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-ag-news")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-ag-news")

model_wrapper = ...

# Create the goal function using the model
from textattack.goal_functions import UntargetedClassification
goal_function = ...

# Import the dataset
from textattack.datasets import HuggingFaceDataset
dataset = ...

textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


  0%|          | 0/2 [00:00<?, ?it/s]

textattack: Loading datasets dataset ag_news, split test.


# Using Our Transformation

A transformation module generates potential modifications of an input sample; each modification is a candidate adversarial example meant to fool the target model.

In this step, we design a simple transformation component based on synonym substitution. To do so, we use the WordNet database to obtain meaningful synonyms for each important word identified by the search method in a later step. WordNet is a lexical database of semantic relations between words in more than 200 languages.
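Raw thesaurus entries usually need filtering before they are useful as substitutions. The sketch below shows one plausible filter on a hand-written list of candidate names; `filter_synonyms` and the candidate list are hypothetical, and no WordNet lookup is performed here. The underscore check assumes that multi-word entries are written with `_` as a separator.

```python
def filter_synonyms(word, lemma_names):
    """Keep candidates that are single words and do not overlap with `word`."""
    word_lower = word.lower()
    kept = []
    for name in lemma_names:
        name_lower = name.lower()
        if "_" in name_lower:
            continue  # skip multi-word entries such as "buy_out"
        if word_lower in name_lower or name_lower in word_lower:
            continue  # skip the word itself and close variants such as "buyback"
        if name not in kept:
            kept.append(name)  # drop duplicates while preserving order
    return kept


# Hand-written candidates for illustration only.
candidates = ["buy", "purchase", "buy_out", "bargain", "buyback", "purchase"]
print(filter_synonyms("buy", candidates))
```

Returning all surviving candidates, rather than only the first, gives the search method more options to score.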



In [None]:
from textattack.transformations import WordSwap
from nltk.corpus import wordnet


class Synonym_Substitution(WordSwap):
  """Transforms an input by replacing any word with one of its synonyms."""
  def _get_replacement_words(self, word):
    ...
        #and '_' not in lemma.name(): # avoid synonym if it contains the input word or multi-words
        if word.lower() not in lemma.name().lower() and lemma.name().lower(
        ) not in word.lower():
          return [str(lemma.name())]  #[lemma.name()]
    return [word]

Let's test our transformation on an input sample:

In [None]:
syn_subs = Synonym_Substitution()  #Transformation object
input_sample = "I would like to buy some bitcoin"  #input sample
print('Input sentence: ' + input_sample)
words = input_sample.split()  #tokenization

#find a synonym for the 'buy' word
synonym = ...
print('\nSynonym: ', synonym[0])
input_sample = ...
print('\nAdversarial example: ' + input_sample)  #adversarial example


Input sentence: I would like to buy some bitcoin

Synonym:  bargain

Adversarial example: I would like to bargain some bitcoin


# Using Our Constraint

This step designs a simple constraint component that allows our attack system to substitute only nouns in the transformation step. To do so, we need to determine the Part of Speech (POS) tag of each word in the input text and only allow nouns to be swapped. POS tagging is a common NLP task that assigns each word in the input text a part of speech (such as noun, verb, or adjective), depending on the definition of the word and its context.
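The rule itself can be sketched independently of any tagger by working on tokens that are already tagged. In the sketch below, `only_nouns_swapped` is a hypothetical helper and the tags are assigned by hand using universal-style labels such as `NOUN` and `VERB`. Note that NLTK's `pos_tag` returns Penn Treebank tags (for example `NN`, `NNS`) by default and universal tags only when called with `tagset='universal'`.

```python
def only_nouns_swapped(current_tagged, transformed_tagged):
    """Return True if every changed position is a noun before and after the swap."""
    if len(current_tagged) != len(transformed_tagged):
        return False  # positions cannot be aligned
    for (cur_word, cur_tag), (new_word, new_tag) in zip(current_tagged,
                                                        transformed_tagged):
        changed = cur_word.lower() != new_word.lower()
        if changed and not (cur_tag == "NOUN" and new_tag == "NOUN"):
            return False
    return True


# Hand-tagged sentences (tags are illustrative assumptions, not tagger output).
original = [("I", "PRON"), ("would", "VERB"), ("like", "VERB"), ("to", "PRT"),
            ("buy", "VERB"), ("some", "DET"), ("apples", "NOUN")]
verb_swap = original[:4] + [("bargain", "VERB")] + original[5:]
noun_swap = original[:6] + [("bananas", "NOUN")]

print(only_nouns_swapped(original, verb_swap))  # False: a verb was replaced
print(only_nouns_swapped(original, noun_swap))  # True: a noun replaced a noun
```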

In [None]:
from textattack.constraints import Constraint
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag


class NounConstraint(Constraint):
  """ A constraint that ensures `transformed_text` only substitutes nouns from
  `current_text` with other noun-based synonyms.
    """

  def _check_constraint(self, transformed_text, current_text):
    transformed_words = word_tokenize(transformed_text.text.lower())
    current_words = word_tokenize(current_text.text.lower())

    if len(transformed_words) != len(current_words):
      # If the two sentences have a different number of words, then
      # they definitely don't have the same Part of Speech (POS) tags.
      # In this case, the constraint is violated, and we return False.
      return False
    else:
      # Here we compare all of the words, in order, to make sure that they match.
      # If we find two words that don't match, this means a word was swapped
      # between `current_text` and `transformed_text`. The word and its
      # substitution must have a Noun POS tag to fulfill our constraint.
      transformed_tags = ...
      current_tags = ...
      for i in range(len(transformed_words)):
        if transformed_words[i] != current_words[i]:
          if not (current_tags[i][1] == 'NOUN' and
                  transformed_tags[i][1] == 'NOUN'):
            return False

      return True


Let's test our constraint on two pairs of input samples:

In [None]:
from textattack.shared.attacked_text import AttackedText

nounConstraint = NounConstraint(True)  #Constraint object

input_sample1 = "I would like to buy some apples"  #input sample 1
input_sample2 = "I would like to bargain some apples"  #input sample 2

print('Input sentence1: ' + input_sample1)
print('Input sentence2: ' + input_sample2)
print(
    'Constraint: ',
    nounConstraint._check_constraint(...))

input_sample1 = "I would like to buy some apples"  #input sample 1
input_sample2 = "I would like to buy some bananas"  #input sample 2

print('\n\nInput sentence1: ' + input_sample1)
print('Input sentence2: ' + input_sample2)
print(
    'Constraint: ',
    nounConstraint._check_constraint(...))


# Creating The Attack
Let's use a greedy search method to find important words, together with the transformation and constraint components implemented in the previous steps. Greedy search first scores the candidate transformations at all positions in the input text, then applies the transformation(s) with the highest score(s) to fool the target model.


In [None]:
from textattack.search_methods import GreedySearch
from textattack.constraints.pre_transformation import RepeatModification
from textattack import Attack

# We're going to use our Synonym_Substitution word swap class as the attack transformation.
transformation = Synonym_Substitution()
# We'll disallow non-noun substitutions
constraints = [NounConstraint(False), RepeatModification()]
# We'll use the Greedy search method
search_method = GreedySearch()
# Now, let's make the attack from the 4 components:
attack = Attack(...)

# Utilizing The Attack
Two classes, "AttackArgs" and "Attacker", are used to configure our attack and run it on a specific dataset.

The "AttackArgs" class represents the arguments passed to Attacker, such as the number of examples to attack, the interval at which to save checkpoints, and logging details. The "Attacker" class uses the designed attack to actually run the attacks, while also providing useful features such as parallel processing, saving to and resuming from a checkpoint, and logging to files and stdout.

The following example uses both the "AttackArgs" and "Attacker" classes to run our designed attack on 20 samples from the AG News dataset:



In [None]:
from tqdm import tqdm  # tqdm provides us a nice progress bar.
from textattack.loggers import CSVLogger  # tracks a dataframe for us.
from textattack import Attacker
from textattack import AttackArgs

# Attack 20 samples with CSV logging and a checkpoint saved every 5 samples
attack_args = AttackArgs(...)
attacker = Attacker(...)
attack_results = attacker.attack_dataset()

# Visualizing Our Attack Results

The `AttackResult` objects are logged using a `CSVLogger`. This logger stores all attack results in a dataframe, which can be easily accessed and displayed.

In [None]:
from abc import ABC

class Logger(ABC):
    """An abstract class for different methods of logging attack results."""

    def __init__(self):
        pass
    def log_attack_result(self, result, examples_completed):
        pass
    def log_summary_rows(self, rows, title, window_id):
        pass
    def log_hist(self, arr, numbins, title, window_id):
        pass
    def log_sep(self):
        pass
    def flush(self):
        pass
    def close(self):
        pass

In [None]:
import csv
import pandas as pd
from textattack.shared import AttackedText, logger

class CSVLogger(Logger):
    """Logs attack results to a CSV."""

    def __init__(self, filename="results.csv", color_method="file"):
        logger.info(f"Logging to CSV at path {filename}")
        self.filename = filename
        self.color_method = color_method
        self.df = pd.DataFrame()
        self._flushed = True

    def log_attack_result(self, result):
        original_text, perturbed_text = result.diff_color(self.color_method)
        original_text = original_text.replace("\n", AttackedText.SPLIT_TOKEN)
        perturbed_text = perturbed_text.replace("\n", AttackedText.SPLIT_TOKEN)
        result_type = result.__class__.__name__.replace("AttackResult", "")
        row = {
            "original_text": original_text,
            "perturbed_text": perturbed_text,
            "original_score": result.original_result.score,
            "perturbed_score": result.perturbed_result.score,
            "original_output": result.original_result.output,
            "perturbed_output": result.perturbed_result.output,
            "ground_truth_output": result.original_result.ground_truth_output,
            "num_queries": result.num_queries,
            "result_type": result_type,
        }
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
        self.df = pd.concat([self.df, pd.DataFrame([row])], ignore_index=True)
        self._flushed = False


    def flush(self):
        self.df.to_csv(self.filename, quoting=csv.QUOTE_NONNUMERIC, index=False)
        self._flushed = True

In [None]:
import pandas as pd

# increase column width so we can actually read the examples
pd.options.display.max_colwidth = 480

logger = ...

for result in attack_results:
  logger.log_attack_result(result)

from IPython.display import display, HTML

display(
    HTML(logger.df[['original_text', 'perturbed_text']].to_html(escape=False)))
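Beyond displaying the texts, the `result_type` column can be tallied to summarise how the attack went. The sketch below assumes the column holds strings such as `Successful`, `Failed`, and `Skipped` (the result class names with the `AttackResult` suffix removed). The helper `summarize_results` and the sample list are hypothetical and are not results from this notebook.

```python
from collections import Counter


def summarize_results(result_types):
    """Count result types and compute the success rate over attempted attacks."""
    counts = Counter(result_types)
    attempted = counts["Successful"] + counts["Failed"]  # skipped samples excluded
    success_rate = counts["Successful"] / attempted if attempted else 0.0
    return counts, success_rate


# Made-up values standing in for logger.df["result_type"].tolist().
sample_types = ["Successful", "Failed", "Skipped", "Successful"]
counts, success_rate = summarize_results(sample_types)
print(dict(counts))
print(f"Attack success rate: {success_rate:.2%}")
```

Skipped samples are left out of the denominator here because they were never attacked; whether to count them is a reporting choice.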
