# TensorFlow and the TextAttack CLARE Recipe

In [None]:
!pip install textattack[tensorflow,optional]

Collecting textattack[optional,tensorflow]
  Downloading textattack-0.3.9-py3-none-any.whl (436 kB)
Collecting bert-score>=0.3.5 (from textattack[optional,tensorflow])
  Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
Collecting flair (from textattack[optional,tensorflow])
  Downloading flair-0.13.0-py3-none-any.whl (387 kB)
Collecting language-tool-python (from textattack[optional,tensorflow])
  Downloading language_tool_python-2.7.1-py3-none-any.whl (34 kB)
Collecting lemminflect (from textattack[optional,tensorflow])
  Downloading lemminflect-0.2.3-py3-none-any.whl (769 kB)

## Run TextAttack on a trained TensorFlow model

### First: Training

The following code trains a text classification model using TensorFlow (through the Keras API). It comes from the TensorFlow documentation ([see here](https://www.tensorflow.org/tutorials/keras/text_classification_with_hub)).

This cell loads the IMDB dataset (using `tensorflow_datasets`, not `datasets`), initializes a simple classifier, and trains it using Keras.

In [None]:
import numpy as np

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import matplotlib.pyplot as plt

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print(
    "GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE"
)

train_data, test_data = tfds.load(
    name="imdb_reviews", split=["train", "test"], batch_size=-1, as_supervised=True
)

train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)

model = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(
    model, output_shape=[20], input_shape=[], dtype=tf.string, trainable=True
)
hub_layer(train_examples[:3])

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(1))

model.summary()

x_val = train_examples[:1000]
partial_x_train = train_examples[1000:]

y_val = train_labels[:1000]
partial_y_train = train_labels[1000:]

model.compile(
    optimizer="adam",
    loss=tf.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(
    partial_x_train,
    partial_y_train,
    epochs=4,
    batch_size=256,
    validation_data=(x_val, y_val),
    verbose=1,
)

Version:  2.15.0
Eager mode:  True
Hub version:  0.15.0
GPU is available
Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...



Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 keras_layer (KerasLayer)    (None, 20)                400020    
                                                                 
 dense (Dense)               (None, 16)                336       
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
                                                                 
Total params: 400373 (1.53 MB)
Trainable params: 400373 (1.53 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [None]:
print()

### Attacking

For each input, our classifier outputs a single number (a raw logit) that indicates how positive or negative the model finds the input. For binary classification, TextAttack expects two numbers for each input: one score per class, negative and positive. We have to post-process each output to fit this format, so we implement a custom model wrapper class instead of using the built-in `textattack.models.wrappers.TensorFlowModelWrapper`.

Each `ModelWrapper` must implement a single method, `__call__`, which takes a list of strings and returns a `List`, `np.ndarray`, or `torch.Tensor` of predictions.
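
The post-processing step can be sketched without any framework. This is a minimal illustration, assuming the model emits one raw logit per input and that the score columns are ordered negative first, then positive; `logits_to_class_scores` is a hypothetical helper name, not part of TextAttack.

```python
import math


def logits_to_class_scores(logits):
    """Map one raw logit per input to a [P(negative), P(positive)] row."""
    rows = []
    for z in logits:
        p_pos = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the logit
        rows.append([1.0 - p_pos, p_pos])
    return rows


# Hypothetical logits: clearly negative, undecided, clearly positive.
scores = logits_to_class_scores([-2.0, 0.0, 3.0])
for row in scores:
    print([round(p, 4) for p in row])
```

Each row sums to 1, so the result can be treated as a two-class probability distribution.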

In [None]:
import numpy as np
import torch

from textattack.models.wrappers import ModelWrapper


class CustomTensorFlowModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        text_array = np.array(text_input_list)
        # The Keras model returns one raw logit per input, shape (batch, 1).
        preds = self.model(text_array).numpy()
        # Sigmoid turns each logit into P(positive), a value between 0 and 1.
        probs = torch.sigmoid(torch.tensor(preds)).squeeze(dim=-1)
        # TextAttack expects one score per class, so we add the second
        # dimension: column 0 is P(negative), column 1 is P(positive).
        final_preds = torch.stack((1 - probs, probs), dim=1)
        return final_preds

textattack: Updating TextAttack package dependencies.
textattack: Downloading NLTK required packages.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package omw to /root/nltk_data...
[nltk_data] Downloading package universal_tagset to /root/nltk_data...
[nltk_data]   Unzipping taggers/universal_tagset.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


INFO:stanza:Downloading default packages for language: en (English) ...
INFO:stanza:Finished downloading models and saved to /root/stanza_resources.


Let's test our model wrapper out to make sure it can use our model to return predictions in the correct format.

In [None]:
CustomTensorFlowModelWrapper(model)(["I hate you so much", "I love you"])

tensor([[0.5239, 0.4761],
        [0.0644, 0.9356]])
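
The output above can be read row by row. The sketch below copies the two printed rows and checks the expected format, assuming the column order is negative first, then positive (which is how the wrapper stacks its scores); `class_names` and `predicted` are illustrative names.

```python
# Scores copied from the cell output above: one row per input,
# columns assumed to be ordered as [negative, positive].
texts = ["I hate you so much", "I love you"]
predictions = [[0.5239, 0.4761], [0.0644, 0.9356]]
class_names = ["negative", "positive"]

predicted = []
for text, row in zip(texts, predictions):
    # Each row should behave like a probability distribution over 2 classes.
    assert len(row) == 2 and abs(sum(row) - 1.0) < 1e-6
    label = max(range(len(row)), key=lambda i: row[i])
    predicted.append(class_names[label])
    print(f"{text!r}: {class_names[label]} ({row[label]:.4f})")
```

The first input is only narrowly classified as negative, while the second is classified as positive with high confidence.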

Looks good! Now we can initialize our model wrapper with the model we trained and pass it to an attack recipe, which builds a `textattack.Attack` for us.

We'll use the `CLARE2020` recipe as our attack, and attack 10 samples.

In [None]:
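import textattack
from textattack import Attacker
from textattack.attack_recipes import CLARE2020
from textattack.datasets import HuggingFaceDataset

# Wrap the trained Keras model so TextAttack can query it.
model_wrapper = CustomTensorFlowModelWrapper(model)

# Load the Rotten Tomatoes test split (shuffled) and build the CLARE attack.
dataset = HuggingFaceDataset("rotten_tomatoes", None, "test", shuffle=True)
attack = CLARE2020.build(model_wrapper)

# With the default attack arguments, the Attacker attacks 10 samples.
attacker = Attacker(attack, dataset)
results = attacker.attack_dataset()

# Quality metrics over the attack results: perplexity and USE similarity.
ppl = textattack.metrics.quality_metrics.Perplexity().calculate(results)
usem = textattack.metrics.quality_metrics.USEMetric().calculate(results)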

textattack: Loading datasets dataset rotten_tomatoes, split test.
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
textattack: Unknown if model of class <class 'keras.src.engine.sequential.Sequential'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


Attack(
  (search_method): GreedySearch
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapMaskedLM(
        (method):  bae
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0005
      )
    (1): WordInsertionMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.0
      )
    (2): WordMergeMaskedLM(
        (masked_lm_name):  RobertaForCausalLM
        (max_length):  512
        (max_candidates):  50
        (min_confidence):  0.005
      )
    )
  (constraints): 
    (0): UniversalSentenceEncoder(
        (metric):  cosine
        (threshold):  0.7
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
) 





--------------------------------------------- Result 1 ---------------------------------------------

lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .





--------------------------------------------- Result 2 ---------------------------------------------

consistently clever and [[suspenseful]] .

consistently clever and [[stupid]] .


--------------------------------------------- Result 3 ---------------------------------------------

it's like a " big chill " reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .






--------------------------------------------- Result 4 ---------------------------------------------

the story gives ample opportunity for large-scale [[action]] and suspense , which director shekhar kapur supplies with tremendous skill .

the story gives ample opportunity for large-scale [[plot]] and [[utter]] suspense , which director shekhar kapur supplies with tremendous skill .


--------------------------------------------- Result 5 ---------------------------------------------

red dragon " never cuts corners .






--------------------------------------------- Result 6 ---------------------------------------------

fresnadillo has something serious to say about the ways in which [[extravagant]] chance can distort our [[perspective]] and throw us off the path of good sense .

fresnadillo has something serious to say about the ways in which [[bad]] chance can distort our [[convictions]] and throw us off the path of good sense .






--------------------------------------------- Result 7 ---------------------------------------------

throws in enough clever and [[unexpected]] twists to make the formula feel fresh .

throws in enough clever and [[absurd]] twists to make the formula feel fresh .






--------------------------------------------- Result 8 ---------------------------------------------

weighty and ponderous but every bit as filling as the treat of the title .

weighty and ponderous but every bit as filling as [[possessing]] the treat of the title .






--------------------------------------------- Result 9 ---------------------------------------------

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , [[hospital]] bed or [[insurance]] company [[office]] .

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , [[flat]] bed or [[any]] company [[toilet]] .






--------------------------------------------- Result 10 ---------------------------------------------

generates an enormous feeling of empathy for its characters .

generates an [[apparently]] enormous feeling of empathy for its characters .



+-------------------------------+---------+
| Attack Results                |         |
+-------------------------------+---------+
| Number of successful attacks: | 7       |
| Number of failed attacks:     | 0       |
| Number of skipped attacks:    | 3       |
| Original accuracy:            | 70.0%   |
| Accuracy under attack:        | 0.0%    |
| Attack success rate:          | 100.0%  |
| Average perturbed word %:     | 31.63%  |
| Average num. words per input: | 15.4    |
| Avg num queries:              | 1909.29 |
+-------------------------------+---------+
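
The percentages in this table follow from the three counts. The sketch below reproduces them under the assumption that a skipped input is one the model already misclassified before any attack was attempted; `summarize_attack` is a hypothetical helper, not a TextAttack function.

```python
def summarize_attack(successful, failed, skipped):
    """Recompute the summary percentages from the raw attack counts."""
    total = successful + failed + skipped
    # Only inputs the model classified correctly are actually attacked.
    attempted = successful + failed
    return {
        "original_accuracy": 100.0 * attempted / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": (
            100.0 * successful / attempted if attempted else 0.0
        ),
    }


# Counts taken from the table above.
summary = summarize_attack(successful=7, failed=0, skipped=3)
print(summary)
```

With 7 successes, 0 failures, and 3 skips, this gives 70% original accuracy, 0% accuracy under attack, and a 100% success rate, matching the table.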








In [None]:
print(ppl)

{'avg_original_perplexity': 66.45, 'avg_attack_perplexity': 92.36}
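
Perplexity is commonly defined as the exponential of the average negative log-likelihood per token, so lower values mean a language model finds the text more natural. The sketch below illustrates that definition on made-up token log-probabilities (they are hypothetical, not taken from this run) and then compares the two averages reported above.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities for two short sentences.
fluent = [-1.0, -2.0, -1.5]
awkward = [-2.0, -3.5, -3.0]
print(perplexity(fluent), perplexity(awkward))

# Relative increase between the averages reported in the output above.
relative_increase = 92.36 / 66.45 - 1
print(f"{relative_increase:.1%}")
```

The adversarial examples are, on average, roughly 39% more perplexing than the originals, which suggests the perturbations make the text somewhat less fluent.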


In [None]:
train_dataset = textattack.datasets.HuggingFaceDataset("rotten_tomatoes", split="train")
eval_dataset = textattack.datasets.HuggingFaceDataset("rotten_tomatoes", split="test")

textattack: Loading datasets dataset rotten_tomatoes, split train.
textattack: Loading datasets dataset rotten_tomatoes, split test.


In [None]:
training_args = textattack.TrainingArgs(
    num_epochs=3,
    num_clean_epochs=1,
    num_train_adv_examples=100,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    query_budget_train=120,
    save_last=True,
    load_best_model_at_end=True
)

trainer = textattack.Trainer(
    model_wrapper,
    "classification",
    attack,
    train_dataset,
    eval_dataset,
    training_args
)
m = trainer.train()
e = trainer.evaluate()

AssertionError: ignored