<a href="https://colab.research.google.com/github/malienist/FIRST-JP/blob/main/notebooks/w_imdb_10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adversarial Attacks on IMDB Sentiment Analysis using BAE (BERT-based Adversarial Examples)

In this notebook, we explore how to attack a pre-trained sentiment analysis model with BAE (BERT-based Adversarial Examples), using TextAttack's `BAEGarg2019` recipe. Adversarial attacks try to fool a machine learning model by making small perturbations to the input that change the model's prediction. BAE uses BERT as a masked language model: it masks a word in the input text and lets BERT propose replacements that fit the context, keeping substitutions that push the classifier toward a wrong label. Note that BAE is a different method from BERT-Attack, which TextAttack provides as a separate recipe.
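The word-replacement idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the keyword-counting `predict` function stands in for the real classifier, the hand-written `CANDIDATES` table stands in for BERT's masked-word proposals, and the search is simplified to single-word swaps. It is not the TextAttack implementation.

```python
# Toy illustration only: a keyword-counting "classifier" and a hand-written
# substitute table stand in for the real model and for BERT's proposals.
POSITIVE = {"good", "great", "fun"}
NEGATIVE = {"bad", "awful", "boring"}

# Hypothetical candidates a masked language model might propose per word.
CANDIDATES = {
    "bad": ["entertaining", "fun"],
    "awful": ["odd"],
}


def predict(words):
    """Return 1 (positive) if positive keywords outnumber negative ones, else 0."""
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def attack(text):
    """Try single-word substitutions and return the first text that flips the label."""
    words = text.split()
    original_label = predict(words)
    for i, word in enumerate(words):
        for candidate in CANDIDATES.get(word, []):
            trial = words[:i] + [candidate] + words[i + 1:]
            if predict(trial) != original_label:
                return " ".join(trial)  # label flipped: attack succeeded
    return None  # no single substitution changed the label


print(attack("this movie is so bad"))  # → this movie is so fun
```

The real attack differs in the parts that matter most: candidates come from a masked language model, substitutions accumulate across words, and constraints filter out replacements that change the meaning too much.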

## Objectives

- Load and prepare a pre-trained sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`).
- Utilize the IMDB dataset for sentiment analysis.
- Apply the BAE attack (TextAttack's `BAEGarg2019` recipe) to create adversarial examples.
- Evaluate the model's predictions on the adversarial examples.

## Steps

1. **Load the Pre-trained Model and Tokenizer**: We'll use the HuggingFace Transformers library to load the model and tokenizer.
2. **Set up TextAttack**: We'll configure TextAttack to use the BAE recipe for generating adversarial examples.
3. **Load the IMDB Dataset**: We'll use the IMDB dataset, which contains movie reviews labeled as positive or negative.
4. **Generate Adversarial Examples**: We'll apply BAE to a set of samples from the dataset and observe the changes in the model's predictions.
5. **Evaluate Results**: We'll compare the original and adversarial predictions to understand the effectiveness of the attack.

## Requirements

Ensure you have the necessary libraries installed:

```python
!pip install transformers textattack torch torchvision
```



# Next Steps


*   Load and Prepare the Model
*   Wrap the Model for TextAttack
*   Load the IMDB dataset




In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
from textattack.attack_recipes import BAEGarg2019
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.datasets import HuggingFaceDataset, Dataset
from textattack import Attacker

# Load a pre-trained sentiment analysis model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a HuggingFace pipeline for sentiment analysis
pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Wrap the HuggingFace model for TextAttack
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Load the IMDB dataset
dataset = HuggingFaceDataset("imdb", split="test")




# Next Steps

*   Create a Custom Dataset for TextAttack
*   Define the Attack Method
*   Perform the Attack
*   Evaluate and Print the Results


In [None]:
# Select multiple samples from the dataset
num_samples = 10  # Specify the number of samples you want to attack
samples = [(dataset[i][0]['text'], dataset[i][1]) for i in range(num_samples)]

# Create a custom dataset for TextAttack
custom_dataset = Dataset(samples, input_columns=["text"])

# Define the attack method
attack = BAEGarg2019.build(model_wrapper)

# Perform the attack
attacker = Attacker(attack, custom_dataset)

# Attack each sample individually and collect results
results = attacker.attack_dataset()

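# Print each adversarial text and the model's prediction on it.
# Skipped samples (already misclassified) come back unchanged.
for result in results:
    adversarial_text = result.perturbed_text()
    # truncation=True keeps long reviews within the model's 512-token limit
    adversarial_result = pipe([adversarial_text], truncation=True)
    print(f"Adversarial text: {adversarial_text}")
    print(f"Adversarial prediction: {adversarial_result}")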



Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapMaskedLM(
    (method):  bae
    (masked_lm_name):  BertForMaskedLM
    (max_length):  512
    (max_candidates):  50
    (min_confidence):  0.0
  )
  (constraints): 
    (0): PartOfSpeech(
        (tagger_type):  nltk
        (tagset):  universal
        (allow_verb_noun_swap):  True
        (compare_against_original):  True
      )
    (1): UniversalSentenceEncoder(
        (metric):  cosine
        (threshold):  0.936338023
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  True
      )
    (2): RepeatModification
    (3): StopwordModification
  (is_black_box):  True
) 
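The `GreedyWordSwapWIR` search with `wir_method: delete` in the printout above ranks words by importance before swapping them: each word is deleted in turn, and the words whose removal changes the model's score the most are attacked first. The sketch below illustrates only that ranking step. The `negative_score` function and its weights are hypothetical stand-ins for the classifier's confidence; the real search queries the wrapped model.

```python
# Toy sketch: rank words by how much deleting each one lowers a score.
# NEGATIVE_WEIGHTS and negative_score are hypothetical stand-ins for the
# classifier's confidence in the original (negative) label.
NEGATIVE_WEIGHTS = {"awful": 5, "bad": 3, "dire": 2}


def negative_score(words):
    """Toy confidence that the text is negative: sum of keyword weights."""
    return sum(NEGATIVE_WEIGHTS.get(w, 0) for w in words)


def rank_by_deletion(text):
    """Return (word, score_drop) pairs, most important word first."""
    words = text.split()
    base = negative_score(words)
    drops = []
    for i, word in enumerate(words):
        without = words[:i] + words[i + 1:]  # text with word i deleted
        drops.append((word, base - negative_score(without)))
    # Stable sort: ties keep their original left-to-right order.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_deletion("awful film bad editing dire dialog")
print([word for word, _ in ranking])
# → ['awful', 'bad', 'dire', 'film', 'editing', 'dialog']
```

Ranking first keeps the number of model queries down, since the search spends its substitution attempts on the words most likely to flip the prediction.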




--------------------------------------------- Result 1 ---------------------------------------------


[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  10%|█         | 1/10 [10:25<1:33:53, 625.94s/it]


[[I]] love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I [[tried]] to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). [[Silly]] prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn't [[match]] the background, and painfully one-dimensional characters cannot be overcome with a 'sci-fi' [[setting]]. (I'm sure there are those of you out there who think Babylon 5 is [[good]] sci-fi TV. It's not. It's clichéd and [[uninspiring]].) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself [[seriously]] (cf. Star Trek). It [[may]] treat important issues, yet not as a serious philosophy. It's really [[difficult]] to care about the characters here as they are not simply foolish, just [[missing]] a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of 

[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  20%|██        | 2/10 [13:26<53:45, 403.13s/it]

--------------------------------------------- Result 2 ---------------------------------------------

Worth the entertainment value of a rental, especially if you like action movies. This one features the usual car chases, fights with the great Van Damme kick style, shooting battles with the 40 shell load shotgun, and even terrorist style bombs. All of this is entertaining and competently handled but there is [[nothing]] that really blows you away if you've seen your share before.<br /><br />The plot is made interesting by the inclusion of a rabbit, which is clever but hardly profound. Many of the characters are heavily stereotyped -- the angry veterans, the terrified illegal aliens, the crooked cops, the indifferent feds, the bitchy tough lady station head, the crooked politician, the fat federale who looks like he was typecast as the Mexican in a Hollywood movie from the 1940s. All passably acted but again nothing special.<br /><br />I thought the main villains were pretty well done 

[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3:  30%|███       | 3/10 [15:46<36:48, 315.51s/it]

--------------------------------------------- Result 3 ---------------------------------------------

its a totally average film with a few semi-alright action sequences that make the plot seem a little better and remind the viewer of the classic van dam films. parts of the plot don't make sense and seem to be added in to use up time. the end plot is that of a very basic type that doesn't leave the viewer [[guessing]] and any twists are obvious from the beginning. the end scene with the flask backs don't make [[sense]] as they are added in and seem to have [[little]] relevance to the history of van dam's character. not really worth watching again, bit disappointed in the end production, even though it is apparent it was shot on a low budget certain shots and sections in the film are of [[poor]] directed quality

its a totally average film with a few semi-alright action sequences that make the plot seem a little better and remind the viewer of the classic van dam films. parts of the plo

[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4:  40%|████      | 4/10 [23:51<35:46, 357.81s/it]

--------------------------------------------- Result 4 ---------------------------------------------

STAR RATING: ***** Saturday Night **** Friday Night *** Friday Morning ** Sunday Night * Monday Morning <br /><br />Former New Orleans homicide cop Jack Robideaux (Jean Claude Van Damme) is re-assigned to Columbus, a small but violent town in Mexico to help the police there with their efforts to stop a major heroin smuggling operation into their town. The culprits turn out to be ex-military, lead by former commander Benjamin Meyers (Stephen Lord, otherwise known as Jase from East Enders) who is using a special method he learned in Afghanistan to fight off his opponents. But Jack has a more personal reason for taking him down, that draws the two men into an explosive final showdown where only one will walk away alive.<br /><br />After Until Death, Van Damme [[appeared]] to be on a high, showing he [[could]] make the best straight to video films in the action market. While that was a far

[Succeeded / Failed / Skipped / Total] 4 / 0 / 1 / 5:  50%|█████     | 5/10 [23:52<23:52, 286.55s/it]

--------------------------------------------- Result 5 ---------------------------------------------

First off let me say, If you haven't enjoyed a Van Damme movie since bloodsport, you probably will not like this movie. Most of these movies may not have the best plots or best actors but I enjoy these kinds of movies for what they are. This movie is much better than any of the movies the other action guys (Segal and Dolph) have thought about putting out the past few years. Van Damme is good in the movie, the movie is only worth watching to Van Damme fans. It is not as good as Wake of Death (which i highly recommend to anyone of likes Van Damme) or In hell but, in my opinion it's worth watching. It has the same type of feel to it as Nowhere to Run. Good fun stuff!




[Succeeded / Failed / Skipped / Total] 5 / 0 / 1 / 6:  60%|██████    | 6/10 [28:21<18:54, 283.57s/it]

--------------------------------------------- Result 6 ---------------------------------------------

I had high hopes for this one until they changed the name to 'The Shepherd : Border Patrol, the lamest movie name ever, what was wrong with just 'The Shepherd'. This is a by the numbers action flick that tips its [[hat]] at many classic Van Damme films. There is a nice bit of action in a bar which reminded me of hard target and universal soldier but [[directed]] with no intensity or flair which is a shame. There is one great line about 'being p*ss drunk and carrying a rabbit' and some OK action scenes let down by the cheapness of it all. A lot of the times the dialogue doesn't match the characters mouth and the stunt men fall down dead a split second before even being shot. The end fight is one of the better Van Damme fights except the Director tries to go a bit too John Woo and [[fails]] also introducing flashbacks which no one really [[cares]] about just gets in the way of the action

[Succeeded / Failed / Skipped / Total] 6 / 0 / 1 / 7:  70%|███████   | 7/10 [32:37<13:58, 279.58s/it]

--------------------------------------------- Result 7 ---------------------------------------------

Isaac Florentine has made some of the best western Martial Arts action movies ever produced. In particular US Seals 2, Cold Harvest, Special Forces and Undisputed 2 are all action classics. You can tell Isaac has a real passion for the genre and his films are always eventful, creative and sharp affairs, with some of the best fight sequences an action fan could hope for. In particular he has found a muse with Scott Adkins, as talented an actor and action performer as you could hope for. This is borne out with Special Forces and Undisputed 2, but unfortunately The Shepherd just doesn't live up to their abilities.<br /><br />There is no doubt that JCVD looks better here fight-wise than he has done in years, especially in the fight he has (for pretty much no reason) in a prison cell, and in the final showdown with Scott, but look in his eyes. JCVD seems to be dead inside. There's nothing i

[Succeeded / Failed / Skipped / Total] 7 / 0 / 1 / 8:  80%|████████  | 8/10 [35:50<08:57, 268.80s/it]

--------------------------------------------- Result 8 ---------------------------------------------

It actually pains me to say it, but this movie was horrible on every level. The blame does not lie entirely with Van Damme as you can see he tried his best, but let's face it, he's almost fifty, how much more can you ask of him? I find it so hard to believe that the same people who put together Undisputed 2; arguably the best (western) martial arts movie in years, created this. Everything from the plot, to the dialog, to the editing, to the overall acting was just [[horribly]] put together and in many cases outright boring and nonsensical. Scott Adkins who's fight scenes seemed more like a demo reel, was also terribly underused and not even the main villain which is such a shame because 1) He is more than capable of playing that role and 2) The actual main villain was not only not intimidating at all but also quite annoying. Again, not blaming Van Damme. I will always be a fan, but [[a

[Succeeded / Failed / Skipped / Total] 8 / 0 / 1 / 9:  90%|█████████ | 9/10 [38:51<04:19, 259.06s/it]

--------------------------------------------- Result 9 ---------------------------------------------

Technically I'am a Van Damme Fan, or I was. this [[movie]] is so [[bad]] that I hated myself for wasting those 90 minutes. Do not let the name Isaac Florentine (Undisputed II) fool you, I had big hopes for this one, depending on what I saw in (Undisputed II), man.. was I wrong ??! all action fans wanted a big comeback for the classic action hero, but i guess we [[wont]] be able to see that soon, as our hero keep coming with those (going -to-a-border - far-away-town-and -kill -the-bad-guys- than-comeback- home) movies I mean for God's sake, we are in 2008, and they insist on doing those [[disappointing]] movies on every level. Why ??!!! Do your self a favor, skip it.. seriously.

Technically I'am a Van Damme Fan, or I was. this [[film]] is so [[entertaining]] that I hated myself for wasting those 90 minutes. Do not let the name Isaac Florentine (Undisputed II) fool you, I had big hopes 

[Succeeded / Failed / Skipped / Total] 9 / 0 / 1 / 10: 100%|██████████| 10/10 [46:04<00:00, 276.49s/it]

--------------------------------------------- Result 10 ---------------------------------------------

Honestly [[awful]] film, [[bad]] editing, awful lighting, dire dialog and scrappy screenplay.<br /><br />The lighting at is so [[bad]] there's moments you can't even see what's going on, I even tried to playing with the contrast and brightness so I could see something but that didn't help.<br /><br />They must have found the [[script]] in a bin, the character development is just as [[awful]] and while you hardly expect much from a Jean-Claude Van Damme film this one manages to hit an all time low. You can't [[even]] laugh at the cheesy'ness.<br /><br />The directing and editing are also [[terrible]], the whole film follows an extremely tired routine and [[fails]] at every turn as it bumbles through the plot that is so weak it's just unreal.<br /><br />There's not a lot else to say other than it's [[really]] bad and nothing like Jean-Claude Van Damme's earlier work which you [[could]] 
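The counters in the final progress line above can be turned into summary metrics. The sketch below hard-codes the values from that line (9 succeeded, 0 failed, 1 skipped, 10 total) and assumes the usual TextAttack convention that a skipped sample is one the model had already misclassified before the attack, so it is never attacked.

```python
# Counters copied from the final progress line of the attack log above.
succeeded, failed, skipped, total = 9, 0, 1, 10

# Only correctly classified samples are attacked; skipped ones are not.
attempted = succeeded + failed
original_accuracy = attempted / total
attack_success_rate = succeeded / attempted if attempted else 0.0
accuracy_under_attack = failed / total

print(f"Original accuracy:     {original_accuracy:.0%}")      # → 90%
print(f"Attack success rate:   {attack_success_rate:.0%}")    # → 100%
print(f"Accuracy under attack: {accuracy_under_attack:.0%}")  # → 0%
```

With only 10 samples, all taken from the start of the IMDB test split, these figures describe this run rather than the model's robustness in general.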





Adversarial text: we love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I liked to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). hollywood prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn't disrupt the background, and painfully one-dimensional characters cannot be overcome with a 'sci-fi' look. (I'm sure there are those of you out there who think Babylon 5 is worth sci-fi TV. It's not. It's clichéd and funny.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself lightly (cf. Star Trek). It will treat important issues, yet not as a serious philosophy. It's really good to care about the characters here as they are not simply foolish, just showing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it's rubbish as they h
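To see exactly which words the attack changed, the original text (with its `[[...]]` highlighting) can be compared word by word against the adversarial text. This is a minimal sketch with hypothetical helper names (`strip_markers`, `substitutions`); it assumes the attack only replaced words, so both texts split into the same number of tokens.

```python
import re


def strip_markers(text):
    """Remove the [[...]] highlighting, keeping the words inside."""
    return re.sub(r"\[\[(.*?)\]\]", r"\1", text)


def substitutions(original, adversarial):
    """List (original_word, new_word) pairs at positions where the texts differ.

    Assumes replacement-only edits, so both texts have the same number of
    whitespace-separated tokens.
    """
    orig_words = strip_markers(original).split()
    adv_words = adversarial.split()
    if len(orig_words) != len(adv_words):
        raise ValueError("texts differ in length; a word-by-word diff does not apply")
    return [(o, a) for o, a in zip(orig_words, adv_words) if o != a]


# Short excerpts from Result 1 and its adversarial text above.
original = "[[I]] love sci-fi and am willing to put up with a lot."
adversarial = "we love sci-fi and am willing to put up with a lot."
print(substitutions(original, adversarial))  # → [('I', 'we')]
```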