## Simple TextAttack Demo on a scikit-learn Model
This is a basic example of using TextAttack on a scikit-learn Naive Bayes model. Much of the code is adapted from the TextAttack tutorial notebooks found here:
https://github.com/QData/TextAttack/tree/master/docs/2notebook

In [1]:
import datasets
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

from textattack.attack_recipes import TextFoolerJin2019
from textattack.attack_recipes import Pruthi2019
from textattack import Attacker
from textattack import AttackArgs
from textattack.datasets import Dataset
from textattack.models.wrappers import SklearnModelWrapper
from textattack.loggers import CSVLogger

from IPython.display import display, HTML

In [2]:
dataset = datasets.load_dataset('rotten_tomatoes')

df_train = pd.DataFrame(dataset['train'])
df_test = pd.DataFrame(dataset['test'])

X_train = df_train['text']
y_train = df_train['label']

X_test = df_test['text']
y_test = df_test['label']

Using custom data configuration default
Reusing dataset rotten_tomatoes (/Users/teknetik/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46)



In [3]:
count_vector = CountVectorizer(max_features=100, stop_words='english')
count_fit = count_vector.fit(X_train)

In [4]:
X_train_counts = pd.DataFrame(count_fit.transform(X_train).toarray(), columns=count_vector.get_feature_names_out())
X_train_counts.shape

(8530, 100)

In [5]:
X_test_counts = pd.DataFrame(count_fit.transform(X_test).toarray(), columns=count_vector.get_feature_names_out())
X_test_counts.shape

(1066, 100)
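
With `max_features=100`, every review is reduced to counts over a 100-word vocabulary, and any token outside that vocabulary contributes nothing to the feature vector. The sketch below is a pure-Python stand-in for that idea, not scikit-learn's implementation: the helper names `fit_vocabulary` and `transform` and the three-sentence corpus are hypothetical, and it skips the stop-word filtering and tokenization rules that `CountVectorizer` applies.

```python
from collections import Counter


def fit_vocabulary(texts, max_features):
    """Keep the `max_features` most frequent lowercase tokens (toy stand-in)."""
    counts = Counter(token for text in texts for token in text.lower().split())
    # Sort by descending frequency, then alphabetically for a deterministic tie-break.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(token for token, _ in ranked[:max_features])


def transform(text, vocabulary):
    """Count in-vocabulary tokens only; out-of-vocabulary tokens are dropped."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


corpus = ["a good fun film", "a good story", "a dull film"]
vocab = fit_vocabulary(corpus, max_features=3)

original = transform("a good film", vocab)
perturbed = transform("a goood film", vocab)  # misspelled word falls out of the vocabulary

print(vocab, original, perturbed)
```

A single misspelling is enough to zero out a feature here, which is one reason a small bag-of-words vocabulary can be sensitive to word-level and character-level perturbations.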

In [6]:
# Train Naive Bayes model
nb = MultinomialNB().fit(X_train_counts, y_train)

In [7]:
predicted = nb.predict(X_test_counts)

print('Training accuracy: ', nb.score(X_train_counts, y_train))
print('Testing accuracy: ', nb.score(X_test_counts, y_test))
print('\n')
print(classification_report(y_test, predicted))

Training accuracy:  0.6192262602579133
Testing accuracy:  0.625703564727955


              precision    recall  f1-score   support

           0       0.61      0.69      0.65       533
           1       0.64      0.56      0.60       533

    accuracy                           0.63      1066
   macro avg       0.63      0.63      0.62      1066
weighted avg       0.63      0.63      0.62      1066
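
As a quick sanity check on the report above, the F1 column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. This is a small sketch that uses the rounded values printed in the report, so it only agrees to two decimal places; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall values copied from the classification report.
f1_class_0 = f1_from(0.61, 0.69)
f1_class_1 = f1_from(0.64, 0.56)

# The macro average is the unweighted mean of the per-class scores.
macro_f1 = (f1_class_0 + f1_class_1) / 2

print(round(f1_class_0, 2), round(f1_class_1, 2), round(macro_f1, 2))
```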



### Now that we have a trained model, we can use TextAttack to generate adversarial examples to "attack" the model

In [8]:
# This is a modified version of SklearnModelWrapper with the updated scikit-learn tokenizer method name.
# Code taken from: https://github.com/QData/TextAttack/blob/master/textattack/models/wrappers/sklearn_model_wrapper.py
class SklearnModelWrapperUpdate(SklearnModelWrapper):
    """Loads a scikit-learn model and tokenizer (tokenizer implements
    `transform` and model implements `predict_proba`).

    May need to be extended and modified for different types of
    tokenizers.
    """

    def __call__(self, text_input_list, batch_size=None):
        encoded_text_matrix = self.tokenizer.transform(text_input_list).toarray()
        tokenized_text_df = pd.DataFrame(
            encoded_text_matrix, columns=self.tokenizer.get_feature_names_out()
        )
        return self.model.predict_proba(tokenized_text_df)

In [9]:
pd.options.display.max_colwidth = 480

def display_log(attack_results):
    # Log every attack result, then render the key columns as an HTML table.
    logger = CSVLogger(color_method='html')

    for result in attack_results:
        logger.log_attack_result(result)

    logger.flush()

    display(HTML(logger.df[['original_text', 'perturbed_text', 'original_output',
                            'perturbed_output', 'ground_truth_output']].to_html(escape=False)))

In [10]:
model_wrapper = SklearnModelWrapperUpdate(nb, count_vector)

In [11]:
# Convert our test data into a TextAttack-compatible dataset of (text, label) pairs
dataset_attack = Dataset([(x[0], x[1]) for x in df_test.values])

In [12]:
# Attack with a recipe:
# https://textattack.readthedocs.io/en/latest/3recipes/attack_recipes.html#textfooler-is-bert-really-robust
attack = TextFoolerJin2019.build(model_wrapper)

attacker = Attacker(attack, dataset_attack, AttackArgs(num_examples=10))
attack_results = attacker.attack_dataset()

textattack: Unknown if model of class <class 'sklearn.naive_bayes.MultinomialNB'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  50
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (min_cos_sim):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  True
      )
    (1): PartOfSpeech(
        (tagger_type):  nltk
        (tagset):  universal
        (allow_verb_noun_swap):  True
        (compare_against_original):  True
      )
    (2): UniversalSentenceEncoder(
        (metric):  angular
        (threshold):  0.840845057
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  False
      )
    (3): RepeatModification
    (4): StopwordModification
    (5): InputColumnModification(
        (matching_column_labels):  ['premise', 'hypothesis']
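
The search method in the printout above, `GreedyWordSwapWIR` with `wir_method: delete`, ranks words by how much the model's score changes when each word is deleted, then tries substitutions in that order. The following is a minimal pure-Python sketch of the ranking step only, not TextAttack's implementation: `toy_positive_score` is a hypothetical stand-in for the model's positive-class probability, and its word weights are made up for illustration.

```python
def toy_positive_score(words):
    """Hypothetical stand-in for a classifier's positive-class probability."""
    weights = {"clever": 0.3, "fresh": 0.2, "dull": -0.4}
    raw = 0.5 + sum(weights.get(word, 0.0) for word in words)
    return min(1.0, max(0.0, raw))


def rank_words_by_deletion(words, score_fn):
    """Return word indices ordered by the score drop caused by deleting each word."""
    base = score_fn(words)
    drops = []
    for index in range(len(words)):
        reduced = words[:index] + words[index + 1:]
        drops.append((base - score_fn(reduced), index))
    # Largest drop first; ties keep the original word order.
    return [index for _, index in sorted(drops, key=lambda pair: (-pair[0], pair[1]))]


words = "clever twists make the formula feel fresh".split()
order = rank_words_by_deletion(words, toy_positive_score)
ranked_words = [words[i] for i in order]

print(ranked_words)
```

In a real attack the scoring function would be the wrapped model's output for the ground-truth class, and each ranking costs one model query per word.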
       


--------------------------------------------- Result 1 ---------------------------------------------


[Succeeded / Failed / Skipped / Total] 4 / 0 / 6 / 10


lovingly photographed in the manner of a golden book sprung to [[life]] , stuart little 2 manages sweetness largely without stickiness .

lovingly photographed in the manner of a golden book sprung to [[vie]] , stuart little 2 manages sweetness largely without stickiness .


--------------------------------------------- Result 2 ---------------------------------------------

consistently clever and suspenseful .


--------------------------------------------- Result 3 ---------------------------------------------

it's like a " big chill " reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .


--------------------------------------------- Result 4 ---------------------------------------------

the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .


--------------------------------------------- Result 5 ---------------------------------------------

re




In [13]:
display_log(attack_results)

textattack: Logging to CSV at path results.csv


Unnamed: 0,original_text,perturbed_text,original_output,perturbed_output,ground_truth_output
0,"lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .","lovingly photographed in the manner of a golden book sprung to vie , stuart little 2 manages sweetness largely without stickiness .",1,0,1
1,consistently clever and suspenseful .,consistently clever and suspenseful .,0,0,1
2,"it's like a "" big chill "" reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .","it's like a "" big chill "" reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .",0,0,1
3,"the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .","the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .",0,0,1
4,"red dragon "" never cuts corners .","red dragon "" never cuts corners .",0,0,1
5,fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .,fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of better sense .,1,0,1
6,throws in enough clever and unexpected twists to make the formula feel fresh .,throws in enough clever and unexpected twists to accomplish the formula feel fresh .,1,0,1
7,weighty and ponderous but every bit as filling as the treat of the title .,weighty and ponderous but every bit as filling as the treat of the title .,0,0,1
8,"a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .","a true audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .",1,0,1
9,generates an enormous feeling of empathy for its characters .,generates an enormous feeling of empathy for its characters .,0,0,1


In [14]:
# Attack with a recipe:
# https://textattack.readthedocs.io/en/latest/3recipes/attack_recipes.html#pruthi2019-combating-with-robust-word-recognition
attack = Pruthi2019.build(model_wrapper)

attacker = Attacker(attack, dataset_attack, AttackArgs(num_examples=10))
attack_results = attacker.attack_dataset()

textattack: Unknown if model of class <class 'sklearn.naive_bayes.MultinomialNB'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapNeighboringCharacterSwap(
        (random_one):  False
      )
    (1): WordSwapRandomCharacterDeletion(
        (random_one):  False
      )
    (2): WordSwapRandomCharacterInsertion(
        (random_one):  False
      )
    (3): WordSwapQWERTY
    )
  (constraints): 
    (0): MaxWordsPerturbed(
        (max_num_words):  1
        (compare_against_original):  True
      )
    (1): MinWordLength
    (2): StopwordModification
    (3): RepeatModification
  (is_black_box):  True
) 
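The four transformations listed in the printout above are all single-word, character-level edits: swapping neighboring characters, deleting a character, inserting a character, and substituting a keyboard-adjacent character. The sketch below illustrates each with a deterministic helper. It is not TextAttack's code: the function names are hypothetical, the `QWERTY_NEIGHBORS` map is a small partial example, and positions are passed explicitly where the real transformations choose them. The inputs are picked so that the outputs match perturbed words from this attack's results, although which transformation actually produced each one is an assumption.

```python
# Partial, illustrative map of keyboard-adjacent keys (assumption, not TextAttack's table).
QWERTY_NEIGHBORS = {"o": "ipkl", "a": "qwsz", "e": "wsdr"}


def swap_neighboring(word, index):
    """Swap the characters at `index` and `index + 1`."""
    chars = list(word)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def delete_char(word, index):
    """Remove the character at `index`."""
    return word[:index] + word[index + 1:]


def insert_char(word, index, char):
    """Insert `char` before position `index`."""
    return word[:index] + char + word[index:]


def qwerty_substitute(word, index, choice=0):
    """Replace the character at `index` with one of its keyboard neighbors."""
    neighbors = QWERTY_NEIGHBORS.get(word[index], "")
    if not neighbors:
        return word  # no known neighbor: leave the word unchanged
    return word[:index] + neighbors[choice] + word[index + 1:]


swapped = swap_neighboring("real", 1)
deleted = delete_char("life", 1)
inserted = insert_char("make", 2, "T")
substituted = qwerty_substitute("good", 1)

print(swapped, deleted, inserted, substituted)
```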




--------------------------------------------- Result 1 ---------------------------------------------

lovingly photographed in the manner of a golden book sprung to [[life]] , stuart little 2 manages sweetness largely without stickiness .

lovingly photographed in the manner of a golden book sprung to [[lfe]] , stuart little 2 manages sweetness largely without stickiness .


--------------------------------------------- Result 2 ---------------------------------------------

consistently clever and suspenseful .


--------------------------------------------- Result 3 ---------------------------------------------

it's like a " big chill " reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .


--------------------------------------------- Result 4 ---------------------------------------------

the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .


---


--------------------------------------------- Result 6 ---------------------------------------------

fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .


--------------------------------------------- Result 7 ---------------------------------------------

throws in enough clever and unexpected twists to [[make]] the formula feel fresh .

throws in enough clever and unexpected twists to [[maTke]] the formula feel fresh .


--------------------------------------------- Result 8 ---------------------------------------------

weighty and ponderous but every bit as filling as the treat of the title .




[Succeeded / Failed / Skipped / Total] 3 / 1 / 6 / 10

--------------------------------------------- Result 9 ---------------------------------------------

a [[real]] audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .

a [[rael]] audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .


--------------------------------------------- Result 10 ---------------------------------------------

generates an enormous feeling of empathy for its characters .



+-------------------------------+-------+
| Attack Results                |       |
+-------------------------------+-------+
| Number of successful attacks: | 3     |
| Number of failed attacks:     | 1     |
| Number of skipped attacks:    | 6     |
| Original accuracy:            | 40.0% |
| Accuracy under attack:        | 10.0% |
| Attack success rate:          | 75.0% |
| Average perturb
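
The percentages in the summary table follow from the three counts. Examples the model already misclassifies are skipped, so original accuracy is the share of examples that were actually attacked, accuracy under attack is the share that survived, and the success rate is taken over attacked examples only. The sketch below reproduces that arithmetic; the helper name `attack_summary` is hypothetical and this is not TextAttack's own reporting code.

```python
def attack_summary(succeeded, failed, skipped):
    """Derive the summary percentages from the raw attack counts."""
    total = succeeded + failed + skipped
    attempted = succeeded + failed  # examples the model originally classified correctly
    return {
        "original_accuracy": attempted / total,
        "accuracy_under_attack": failed / total,
        "attack_success_rate": succeeded / attempted if attempted else 0.0,
    }


# Counts taken from the Pruthi2019 run above: 3 succeeded, 1 failed, 6 skipped.
summary = attack_summary(succeeded=3, failed=1, skipped=6)

for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```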




In [15]:
display_log(attack_results)

textattack: Logging to CSV at path results.csv


Unnamed: 0,original_text,perturbed_text,original_output,perturbed_output,ground_truth_output
0,"lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .","lovingly photographed in the manner of a golden book sprung to lfe , stuart little 2 manages sweetness largely without stickiness .",1,0,1
1,consistently clever and suspenseful .,consistently clever and suspenseful .,0,0,1
2,"it's like a "" big chill "" reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .","it's like a "" big chill "" reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .",0,0,1
3,"the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .","the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .",0,0,1
4,"red dragon "" never cuts corners .","red dragon "" never cuts corners .",0,0,1
5,fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .,fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of giod sense .,1,1,1
6,throws in enough clever and unexpected twists to make the formula feel fresh .,throws in enough clever and unexpected twists to maTke the formula feel fresh .,1,0,1
7,weighty and ponderous but every bit as filling as the treat of the title .,weighty and ponderous but every bit as filling as the treat of the title .,0,0,1
8,"a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .","a rael audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance company office .",1,0,1
9,generates an enormous feeling of empathy for its characters .,generates an enormous feeling of empathy for its characters .,0,0,1
