# Setup

Mount Google Drive and clone the repository containing the methods.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [5]:
import getpass

github_username = input("Enter your GitHub username: ")
github_token = getpass.getpass("Enter your GitHub personal access token: ")

Enter your GitHub username: smcaleese
Enter your GitHub personal access token: ··········


In [6]:
repo_name = "smcaleese/masters-thesis-code"
!git clone https://{github_username}:{github_token}@github.com/{repo_name}.git

Cloning into 'masters-thesis-code'...
remote: Enumerating objects: 110, done.
remote: Counting objects: 100% (110/110), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 110 (delta 2), reused 110 (delta 2), pack-reused 0
Receiving objects: 100% (110/110), 2.53 MiB | 7.05 MiB/s, done.
Resolving deltas: 100% (2/2), done.


In [1]:
%cd masters-thesis-code
%pwd

[Errno 2] No such file or directory: 'masters-thesis-code'
/Users/smcaleese/Documents/masters-thesis-code


'/Users/smcaleese/Documents/masters-thesis-code'

Install necessary dependencies.

In [None]:
%pip install transformers datasets textdistance

## Download datasets

Download the IMDB and SST-2 datasets, clean the sentences, and create a list of input sentences.

In [24]:
from datasets import load_dataset

imdb = load_dataset("imdb")
sst = load_dataset("stanfordnlp/sst2")

In [25]:
imdb

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [35]:
sst

DatasetDict({
    train: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 872
    })
    test: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 1821
    })
})

In [44]:
import random
random.seed(0)

num_samples = 100

imdb_sentences = imdb["test"]["text"]
random_imdb_sentences_subset = random.sample(imdb_sentences, num_samples)

sst_sentences = sst["test"]["sentence"]
random_sst_sentences_subset = random.sample(sst_sentences, num_samples)

Clean and truncate the IMDB sentences.

In [45]:
def clean_and_truncate_sentences(sentences, max_length=512):
    """Strip HTML line breaks, then keep at most max_length whitespace tokens."""
    sentences_list = []
    for text in sentences:
        # Remove the break tags first so truncation cannot cut one in half.
        text = text.replace("<br /><br />", " ")
        text_tokens = text.split()
        if len(text_tokens) > max_length:
            text = " ".join(text_tokens[:max_length])
        sentences_list.append(text)
    return sentences_list

random_imdb_sentences_subset = clean_and_truncate_sentences(random_imdb_sentences_subset)

Write the sentences to two files, `imdb-input.csv` and `sst-input.csv`, in the `input` directory.

In [46]:
%pwd

'/Users/smcaleese/Documents/masters-thesis-code'

In [47]:
import pandas as pd

df_sst = pd.DataFrame(random_sst_sentences_subset, columns=["original_text"])
df_sst.to_csv("./input/sst-input.csv", index=False)

df_imdb = pd.DataFrame(random_imdb_sentences_subset, columns=["original_text"])
df_imdb.to_csv("./input/imdb-input.csv", index=False)

## Choose dataset

In [3]:
dataset = "sst_2"
# dataset = "imdb"

if dataset == "sst_2":
    input_file = "sst-input"
    model_name = "textattack/bert-base-uncased-SST-2"
    fizle_task = "sentiment analysis on the SST-2 dataset"
elif dataset == "imdb":
    input_file = "imdb-input"
    model_name = "textattack/bert-base-uncased-imdb"
    fizle_task = "sentiment analysis on the IMDB dataset"

## Create input dataframe

Columns to add to create output dataframe:
- original_score
- original_perplexity
- counterfactual_text
- counterfactual_score
- counterfactual_perplexity
- found_flip
- frac_tokens_same
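
The notebook does not define `frac_tokens_same`. The sketch below is one plausible definition, assumed rather than taken from the CLOSS code: compare whitespace tokens position by position and divide the number of matches by the token count of the longer text. The helper name is hypothetical. This definition is at least consistent with the Polyjuice output table later in the notebook, where one changed token out of 97 gives 0.989691.

```python
def frac_tokens_same(original_text: str, counterfactual_text: str) -> float:
    """Fraction of aligned whitespace tokens that are identical (assumed definition)."""
    original_tokens = original_text.split()
    counterfactual_tokens = counterfactual_text.split()
    longest = max(len(original_tokens), len(counterfactual_tokens))
    if longest == 0:
        return 1.0  # two empty texts are trivially identical
    # zip stops at the shorter text, so any extra tokens count as mismatches.
    matches = sum(a == b for a, b in zip(original_tokens, counterfactual_tokens))
    return matches / longest


print(frac_tokens_same("the film 's center will not hold .",
                       "the film 's center will not lock ."))  # 0.875
```

Because the comparison is positional, a single inserted or deleted token shifts every later position and can drive the score down sharply even when most words are unchanged.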

In [2]:
%pwd

'/Users/smcaleese/Documents/masters-thesis-code'

In [4]:
import pandas as pd

df_input = pd.read_csv(f"input/{input_file}.csv")
df_input.head()

Unnamed: 0,original_text
0,the film 's center will not hold .
1,though avary has done his best to make somethi...
2,despite what anyone believes about the goal of...
3,"so stupid , so ill-conceived , so badly drawn ..."
4,"it 's not horrible , just horribly mediocre ."


In [5]:
df_input.shape

(100, 1)

## Load models

In [6]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Load the sentiment model and tokenizer.

In [7]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

sentiment_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
).to(device)

sentiment_model_tokenizer = AutoTokenizer.from_pretrained(model_name)


Load the GPT-2 model for calculating perplexity.

In [8]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
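
Perplexity is the exponential of the average negative log-likelihood per predicted token, so lower values mean the language model finds the text more fluent. A minimal pure-Python sketch of that arithmetic follows; the token probabilities are made up for illustration and are not GPT-2 outputs, and the function name is hypothetical.

```python
import math

def perplexity_from_probs(token_probs):
    """exp of the mean negative log-probability assigned to each token."""
    negative_log_likelihoods = [-math.log(p) for p in token_probs]
    mean_nll = sum(negative_log_likelihoods) / len(negative_log_likelihoods)
    return math.exp(mean_nll)

# Hypothetical probabilities a language model assigned to four tokens.
fluent = [0.5, 0.5, 0.5, 0.5]
awkward = [0.5, 0.5, 0.01, 0.5]  # one very surprising token

print(perplexity_from_probs(fluent))   # about 2
print(perplexity_from_probs(awkward))  # about 5.3
```

A single unlikely token is enough to raise the score noticeably, which is why a counterfactual with one unnatural substitution can have a much higher perplexity than the original.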

Load the language model for CLOSS.

In [9]:
import transformers

LM_model = transformers.BertForMaskedLM.from_pretrained("bert-base-uncased").to(device)
LM_model.lm_head = LM_model.cls

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Helper functions

In [10]:
import re

def calculate_score(text):
    """Return the sentiment model's probability that the text is positive."""
    inputs = sentiment_model_tokenizer(
        text,
        max_length=512,
        truncation=True,
        return_tensors="pt"
    ).to(device)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        logits = sentiment_model(**inputs).logits
    prob_positive = torch.nn.functional.softmax(logits, dim=1)[0][1].item()
    return prob_positive

def calculate_perplexity(text):
    """Return GPT-2 perplexity: exp of the mean next-token cross-entropy."""
    inputs = gpt2_tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        loss = gpt2_model(**inputs, labels=inputs["input_ids"]).loss
    perplexity = torch.exp(loss).item()
    return perplexity

def is_flip(original_score, counterfactual_score):
    positive_to_negative = original_score >= 0.5 and counterfactual_score < 0.5
    negative_to_positive = original_score < 0.5 and counterfactual_score >= 0.5
    return positive_to_negative or negative_to_positive

def truncate_text(text, max_length=100):
    tokens = text.split()
    if len(tokens) > max_length:
        text = " ".join(tokens[:max_length])
    return text

def get_all_embeddings(model, tokenizer):
    # Read the embedding matrix in one step instead of looking up each token id in a loop.
    embedding_matrix = model.bert.embeddings.word_embeddings.weight
    all_word_embeddings = embedding_matrix[:tokenizer.vocab_size].detach().clone().to(device)
    return all_word_embeddings

def get_output(df_input, counterfactual_method, args=None):
    df_input = df_input.copy()
    # Methods that need no settings (e.g. Polyjuice) can omit args.
    args = args or {}
    output_data = {
        "original_text": [],
        "original_score": [],
        "original_perplexity": [],
        "counterfactual_text": [],
        "counterfactual_score": [],
        "counterfactual_perplexity": [],
        "found_flip": [],
        "frac_tokens_same": [],
    }

    for i in range(len(df_input)):
        original_text = df_input.iloc[i]["original_text"]
        original_text = truncate_text(original_text)
        print(f"Processing sentence {i + 1}/{len(df_input)}: num tokens: {len(original_text.split())}")

        original_score = calculate_score(original_text)
        original_perplexity = calculate_perplexity(original_text)

        args = {**args, "original_score": original_score}
        counterfactual_text = counterfactual_method(original_text, args)

        label_width = 20
        print(f"\n{'original_text:'.ljust(label_width)} {original_text}")
        print(f"{'counterfactual_text:'.ljust(label_width)} {counterfactual_text}")

        counterfactual_score = calculate_score(counterfactual_text)
        counterfactual_perplexity = calculate_perplexity(counterfactual_text)
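        found_flip = is_flip(original_score, counterfactual_score)

        # Assumed definition (the original cell never filled this column, which
        # makes pd.DataFrame fail): position-wise whitespace-token matches
        # divided by the token count of the longer text.
        original_tokens = original_text.split()
        counterfactual_tokens = counterfactual_text.split()
        num_same = sum(a == b for a, b in zip(original_tokens, counterfactual_tokens))
        frac_tokens_same = num_same / max(len(original_tokens), len(counterfactual_tokens), 1)

        output_data["original_text"].append(original_text)
        output_data["original_score"].append(original_score)
        output_data["original_perplexity"].append(original_perplexity)
        output_data["counterfactual_text"].append(counterfactual_text)
        output_data["counterfactual_score"].append(counterfactual_score)
        output_data["counterfactual_perplexity"].append(counterfactual_perplexity)
        output_data["found_flip"].append(found_flip)
        output_data["frac_tokens_same"].append(frac_tokens_same)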

    df_output = pd.DataFrame(output_data)
    return df_output

def sst2_formatter(text):
    # 1. remove the space between an apostrophe and s (e.g. "film ' s" -> "film 's")
    text = re.sub(r"\s'\s+s", " 's", text)

    # 2. add a space before full stops or commas (e.g. "film." -> "film ."):
    text = re.sub(r"(\w)([.,])", r"\1 \2", text)

    # 3. remove the spaces around hyphens (e.g. "ill - conceived" -> "ill-conceived"):
    text = re.sub(r"\s-\s", "-", text)
    
    return text


In [11]:
all_word_embeddings = get_all_embeddings(sentiment_model, sentiment_model_tokenizer).to(device)

## Counterfactual generator functions

In [10]:
# %cd "CLOSS"
# %cd ..
%pwd

'/Users/smcaleese/Documents/masters-thesis-code'

In [12]:
from CLOSS.closs import generate_counterfactual

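def generate_polyjuice_counterfactual(original_text, args=None):
    # `args` is unused; it is accepted because get_output passes it to every method.
    perturbations = pj.perturb(
        orig_sent=original_text,
        ctrl_code="negation",
        num_perturbations=1,
        perplex_thred=None
    )
    # Keep the original text if Polyjuice returns no perturbation.
    counterfactual_text = perturbations[0] if perturbations else original_text
    return counterfactual_text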

def generate_closs_counterfactual(original_text, args):
    counterfactual_text = generate_counterfactual(
        original_text,
        sentiment_model,
        LM_model,
        sentiment_model_tokenizer,
        all_word_embeddings,
        device,
        args
    )
    counterfactual_text = sst2_formatter(counterfactual_text)
    return counterfactual_text

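def generate_fizle_counterfactual(original_text, args):
    # Imported here so the CLOSS and Polyjuice runs do not require the package.
    # Assumption: `client` is not defined in this notebook, so an OpenAI v1 client
    # is created here; it reads the OPENAI_API_KEY environment variable.
    from openai import OpenAI
    client = OpenAI()

    # get_output supplies "original_score", so derive the label when it is missing.
    original_label = args.get("original_label")
    if original_label is None:
        original_label = 1 if args["original_score"] >= 0.5 else 0
    model = args["model"]
    mode = args["mode"]
    cf_label = 0 if original_label == 1 else 1

    if mode == "naive":
        system_prompt = (
            f"In the task of {fizle_task}, a trained black-box classifier correctly predicted the label '{original_label}' for the following text. "
            f"Generate a counterfactual explanation by making minimal changes to the input text, so that the label changes from '{original_label}' to '{cf_label}'. "
            "Use the following definition of 'counterfactual explanation': \"A counterfactual explanation reveals what should have been different in an instance to observe a diverse outcome.\" "
            "Enclose the generated text within <new> tags.\n"
            "-\n"
            f"Text: {original_text}"
        )
    else:
        raise ValueError(f"Unsupported FIZLE mode: {mode!r}")

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt}
        ]
    )

    output = completion.choices[0].message.content
    # Fall back to the raw reply if the model omits the <new> tags.
    match = re.search(r"<new>(.*?)</new>", output, flags=re.DOTALL)
    counterfactual_text = match.group(1).strip() if match else output.strip()
    return counterfactual_text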


In [48]:
s = "the film 's center will not hold ."

args = {
    "beam_width": 15,
    "w": 5,
    "K": 30,
    "substitution_evaluation_method": "SVs",
    "substitution_gen_method": "no_opt_lmh"
}

counterfactual_text = generate_closs_counterfactual(s, args)
counterfactual_text

grad loc importances:
 [CLS] the film ' s center will not hold . [SEP]

total SVs   = 0.017017952378680843
Top scoring substitutions by Shapley value:
[8, 'lock', 0.01596730158653311]
[8, 'allow', 0.005894463648003894]
[8, 'pull', 0.005784386124145177]
[8, 'release', 0.004336237845662352]
[8, 'remain', 0.004325744892215684]
[8, 'feature', 0.0032091399389665223]
[8, 'be', 0.0013588191859342858]
[8, 'float', 0.0009617560652087784]
Final eval prob pos: 0.018206510692834854
10 11
Old tokens           :  [CLS] the film ' s center will not hold . [SEP]
New tokens           :  [CLS] the film ' s center will not lock . [SEP]
Best prob gain       : 0.017
Fraction toks same   : 0.909


"the film ' s center will not lock."

## Run CLOSS and HotFlip

First run the method without embedding optimization (`CLOSS-EO`) and without retraining the language modeling head (`CLOSS-RTL`).

- `CLOSS-EO`: skip optimizing the embedding. This increases the failure rate but lowers perplexity.
- `CLOSS-RTL`: skip retraining the language modeling head. This does not affect perplexity but increases the failure rate.

Move to the main parent directory.

In [20]:
# %cd "CLOSS"
# %cd ..
%pwd

'/Users/smcaleese/Documents/masters-thesis-code'

In [21]:
df_input.head()

Unnamed: 0,original_text
0,the film 's center will not hold .
1,though avary has done his best to make somethi...
2,despite what anyone believes about the goal of...
3,"so stupid , so ill-conceived , so badly drawn ..."
4,"it 's not horrible , just horribly mediocre ."


1. Run HotFlip:

In [25]:
args = {
    "beam_width": 15,
    "w": 5,
    "K": 30,
    "substitution_evaluation_method": "hotflip_only",
    "substitution_gen_method": "hotflip_only"
}

df_output = get_output(df_input, generate_closs_counterfactual, args)

In [None]:
df_output.head()

In [None]:
df_output.to_csv("./output/hotflip-output.csv", index=False)

2. Run CLOSS without optimization and without retraining the language modeling head:

In [23]:
args = {
    "beam_width": 15,
    "w": 5,
    "K": 30,
    "substitution_evaluation_method": "SVs",
    "substitution_gen_method": "no_opt_lmh"
}

df_output = get_output(df_input, generate_closs_counterfactual, args)

Processing sentence 1/100: num tokens: 8
original_text: the film 's center will not hold .
Extra Evals: 0
grad loc importances:
 [CLS] the film ' s center will not hold . [SEP]
[8, 5]

total SVs   = -0.0015941856613046947
Top scoring substitutions by Shapley value:
[8, 'lock', 0.015761647899871913]
[8, 'allow', 0.005583717202573965]
[8, 'pull', 0.00554730456700682]
[8, 'release', 0.004078506717732661]
[8, 'remain', 0.004012898808640956]
[8, 'feature', 0.0029359691136848333]
[8, 'be', 0.0010480727405043557]
[8, 'float', 0.0006543698665386034]
Final eval prob pos: 0.018206510692834854
10 11
Old tokens           :  [CLS] the film ' s center will not hold . [SEP]
New tokens           :  [CLS] the film ' s center will not lock . [SEP]
Best prob gain       : 0.017
Fraction toks same   : 0.909
counterfactual_text: The film's center will not lock.
Processing sentence 2/100: num tokens: 27
original_text: though avary has done his be

KeyboardInterrupt: 

In [None]:
df_output.head()

In [24]:
df_output.to_csv("./output/closs-output.csv", index=False)

## Run Polyjuice

### Setup

In [1]:
%cd polyjuice
%pwd

/Users/smcaleese/Documents/masters-thesis-code/polyjuice



'/Users/smcaleese/Documents/masters-thesis-code/polyjuice'

In [2]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     12.8/12.8 MB 2.4 MB/s eta 0:00:00

[notice] A new release of pip is available: 24.0 -> 24.1.2
[notice] To update, run: pip install --upgrade pip
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [3]:
%pip install -e .

Obtaining file:///Users/smcaleese/Documents/masters-thesis-code/polyjuice
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: polyjuice_nlp
  Building editable for polyjuice_nlp (pyproject.toml) ... done
  Created wheel for polyjuice_nlp: filename=polyjuice_nlp-0.1.5-0.editable-py3-none-any.whl size=5973 sha256=0566663292115e45b013186ebaca3675d867befc43e9eac761a88273bf607c7e
  Stored in directory: /private/var/folders/t6/_lt6g5116z9f5127kxxf3qgc0000gn/T/pip-ephem-wheel-cache-1mj3mge8/wheels/25/ab/5a/2c39cb2ced826c744df003583a7e2691ec72e79dc71b9ba517
Successfully built polyjuice_nlp
Installing collected packages: polyjuice_nlp
  Attempting uninstall: polyjuice_nlp
    Found existing installation: polyjuice_nlp 0.1.5
    Uninstalling polyjuic

Make sure the `polyjuice` package is imported from the local checkout.

In [4]:
import importlib
import polyjuice

importlib.reload(polyjuice)
print(polyjuice.__file__)


/Users/smcaleese/Documents/masters-thesis-code/polyjuice/polyjuice/__init__.py


In [5]:
import torch
from polyjuice import Polyjuice

# Only request CUDA when a GPU is actually available.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=torch.cuda.is_available())

In [8]:
text = "julia is played with exasperating blandness by laura regan ."
perturbations = pj.perturb(
    orig_sent=text,
    ctrl_code="negation",
    num_perturbations=5,
    # perplex_thred=None
)
perturbations

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



['and no matter how hard i cry and no matter how much i wish that happens to me, it will never happen again.',
 'the movie is played with exasperating blandness by laura regan .',
 'julia is played over exasperating blandness by laura regan .',
 'and there is played with exasperating blandness by laura regan .',
 'The acting is played with exasperating blandness by laura regan .']

Run the model and get the output.

In [None]:
df_input.head()

In [16]:
df_output = get_output(df_input, generate_polyjuice_counterfactual)

Processing sentence 1/10: num tokens: 97
original_text: This movie was sadly under-promoted but proved to be truly exceptional. Entering the theatre I knew nothing about the film except that a friend wanted to see it. I was caught off guard with the high quality of the film. I couldn't image Ashton Kutcher in a serious role, but his performance truly exemplified his character. This movie is exceptional and deserves our monetary support, unlike so many other movies. It does not come lightly for me to recommend any movie, but in this case I highly recommend that everyone see it. This films is Truly Exceptional!


counterfactual_text: This movie was sadly under-promoted but proved to be truly exceptional. Entering the theatre I knew nothing about the film except that a friend wanted to see it. I was caught off guard with the high not of the film. I couldn't image Ashton Kutcher in a serious role, but his performance truly exemplified his character. This movie is exceptional and deserves our monetary support, unlike so many other movies. It does not come lightly for me to recommend any movie, but in this case I highly recommend that everyone see it. This films is Truly Exceptional!
Processing sentence 2/10: num tokens: 100
original_text: On a dark, gloomy New Year's Eve night, an ill nurse, her life slowly ebbing away, demands that David Holm be presented to her at once. We don't yet know who David Holm is, or why this nurse wishes to see him, but her only dying wish is to speak with him just one more time. On the other side of the town, nestled comfortably amongst the gravestones of the local ce

counterfactual_text: On a dark, gloomy New Year's Eve, a nurse, her life turned to a little drab, demanding that David Holm should be accorded the statue erected by the local administrator, at once. Our here we know no one pay her at will be introduced to her at once. We don't know who or what will happen to the next. when to drink
Processing sentence 3/10: num tokens: 100
original_text: Haines is excellent as the brash cadet who thinks West Point will really amount to something now that he has arrived. Haines displays his easy, goofy comic persona as he takes on West Point and Joan Crawford, the local beauty. Great fun for the first half. And amazingly touching after Haines's character goes too far and nearly gets shunned by fellow cadets. The new, humility-filled Haines get s alast-minute reprieve to play in the bill football game against Navy and, despite a broken arm, wins the game. Great, rousing entertainment by MGM in this Haines formula film, shows Billy at


counterfactual_text: Haines is excellent as the brash cadet who thinks West Point will really amount to something now that he has arrived. Haines displays his easy, goofy comic persona as he takes West Point and Joan Crawford, the local beauty. Great fun for the first half. And amazingly touching after Haines's character goes too far and nearly gets shunned by fellow cadets. The new, humility-filled Haines get s alast-minute reprieve to play in the bill football game against Navy and, despite a broken arm, wins the game. Great, rousing entertainment by MGM in this Haines formula film, shows Billy at [BLANK]
Processing sentence 4/10: num tokens: 100
original_text: This movie states through its protagonist that the world is essentially sadness and pain and those that ignore this have blinders on. One can argue whether this is true or not. But even if you accept this as true, the movie's ending either A) disputes this by saying there can be some good in tragic situations or B) forgets thi

counterfactual_text: This movie states through its protagonist that the world is essentially sadness and pain and those that ignore this have blinders on. One can argue whether this is true or not. But even if you accept this as true, the movie's ending either A) disputes this by saying there can be some good in tragic situations or B) forgets this and uses a cliched montage in order to leave the audience feeling uplifted. That the movie metaphorically acquits its protagonist by presenting can him as a sympathetic character despite any evidence for that sympathy shows contempt for the supporting characters who
Processing sentence 5/10: num tokens: 100
original_text: Just saw this movie on opening night. I read some other user comments which convinced me to go see it... I must say, I was not impressed. I'm so unimpressed that I feel the need to write this comment to spare some of you people some money. First of all "The Messengers" is very predictable, and just not much of a thriller. I

counterfactual_text:  [ANSWER] 
Processing sentence 6/10: num tokens: 100
original_text: This episode so far is the best of the series. The story was told perfectly. I especially liked how the writers made it a Desmond episode; it was his best performance to date and he definitely deserved the Emmy for his performance. We had some of our questions answered in this episode, but since the show is called Lost we know there will be more questions brought up too. First the answered: Walt is reunited finally with his father Michael, second, Michael's betrayal is exposed to Jack, Sawyer, Kate, and Hurly and because of this betrayal Kate, Jack, and Sawyer


counterfactual_text: This episode so far is the best of the series. The story was told perfectly. I especially liked how the writers [BLANK] it a Desmond episode; it was his best performance to date and he definitely deserved the Emmy for his performance. We had some of our questions answered in this episode, but since the show is called Lost we know there will be more questions brought up too. First the answered: Walt is reunited finally with his father Michael, second, Michael's betrayal is exposed to Jack, Sawyer, Kate, and Hurly and because of this betrayal Kate, Jack, and Sawyer
Processing sentence 7/10: num tokens: 46
original_text: This is surely British humour at its best. It tends to grow on you. The first time I watched it I couldn't quite figure out what it was all about but now I can watch the episodes over and over again and enjoy them every time.


counterfactual_text: This is surely British humour at its worst, not with any enthusiasm. It tends to grow on you. The first time I watched it I couldn't quite figure out what it was all about but now I can watch the episodes over and over again and enjoy them every time.
Processing sentence 8/10: num tokens: 100
original_text: Laura Gemser plays a magazine photographer who is sent to Africa for a photo shoot. There she is met by a couple and other swinging couples. They all stay at this huge, very touristy hotel with a gigantic swimming pool. One night they have a pool party complete with "real live" native dancers. It's very un-politically correct and very kitschy. Later, Emanuelle finally has her photo shoot, which turns out to be in one of those drive-through, stay-in-your-car safaris (albeit the photography is gorgeous). Throughout the film, Emanuelle is going after every man she meets. The photography is very well


counterfactual_text: Laura Gemser plays a magazine photographer who is sent to Africa for a photo shoot. There she is met by a couple and other swinging couples. They all stay at this huge, very touristy hotel with a gigantic swimming pool. One night they have a pool party complete with "real live" native dancers. It's very un-politically correct and very kitschy. Later, Emanuelle finally has her photo shoot, which turns out to be in one of those drive-through, stay-in-your-car safaris (albeit the photography is gorgeous). Throughout EMPTY film, Emanuelle is going after every man she meets. The photography is very well. [BLANK]
Processing sentence 9/10: num tokens: 100
original_text: What a mess--and I'm not referring to the "destruction" in the title. I could go on about the hackneyed plot, the lousy effects, the (actually notable) cast grimacing as they deliver the worst lines of their careers, etc. I'll just say there weren't any palm trees in Chicago the last time I checked, and le

counterfactual_text: What a mess--and I'm not referring to the "destruction" in the title. I could go on about the hackneyed plot, the lousy effects, the (actually notable) cast grimacing as they deliver the worst lines of their careers, etc. I'll just say there weren't any palm trees in Chicago the last time I checked, and leave it at that....need ten lines to get this posted on IMDb.. OK, well, I think a DVD release with outtakes could be interesting. Maybe Dennehy will reveal what favor got called in for him to appear in this thing. Maybe Dianne Weist will show
Processing sentence 10/10: num tokens: 100
original_text: In 1858 Tolstoy wrote this in his diary: "The political is not compatible with the artistic, because the former, in order to prove, has to be one-sided." This thought from a great mind is applicable to USA The Movie. The film might be read by those with a narrow focus as a 90 minute slam of Bush, Cheney et. al. as well as a ripping of America as an out-of-control imper

counterfactual_text: In 1858 Tolstoy wrote this in his diary: "The political is not compatible with the artistic, because the former, in order to prove, has to be one-sided." This thought from a great mind is applicable to USA The Movie. The film might be read by those of. al. as well as a ripping of America as an out-of-control imperialistic force that will ultimately be destroyed by its own folly and thirst for power. The more open-minded viewer will take note of the recurring images and themes that make


In [19]:
df_output.head(10)

Unnamed: 0,original_text,original_score,original_perplexity,counterfactual_text,counterfactual_score,counterfactual_perplexity,found_flip,frac_tokens_same
0,This movie was sadly under-promoted but proved...,0.99981,30.56562,This movie was sadly under-promoted but proved...,0.999789,35.941925,False,0.989691
1,"On a dark, gloomy New Year's Eve night, an ill...",0.994289,28.144838,"On a dark, gloomy New Year's Eve, a nurse, her...",0.868931,105.412506,False,0.1
2,Haines is excellent as the brash cadet who thi...,0.99969,93.023544,Haines is excellent as the brash cadet who thi...,0.999699,106.893402,False,0.31
3,This movie states through its protagonist that...,0.000242,37.938114,This movie states through its protagonist that...,0.000233,46.72308,False,0.811881
4,Just saw this movie on opening night. I read s...,0.000153,32.282951,[ANSWER],0.381725,97.692123,False,0.0
5,This episode so far is the best of the series....,0.999627,37.107101,This episode so far is the best of the series....,0.999597,46.591652,False,0.99
6,This is surely British humour at its best. It ...,0.999772,19.271208,"This is surely British humour at its worst, no...",0.999806,24.695675,False,0.14
7,Laura Gemser plays a magazine photographer who...,0.000826,39.357071,Laura Gemser plays a magazine photographer who...,0.000332,49.890022,False,0.970297
8,"What a mess--and I'm not referring to the ""des...",0.000209,57.398071,"What a mess--and I'm not referring to the ""des...",0.000214,60.763622,False,0.57
9,"In 1858 Tolstoy wrote this in his diary: ""The ...",0.999327,43.29641,"In 1858 Tolstoy wrote this in his diary: ""The ...",0.998796,47.098091,False,0.48


In [20]:
%cd ..
%pwd

'/Users/smcaleese/Documents/masters-thesis-code/polyjuice'

In [21]:
df_output.to_csv("./output/polyjuice-output.csv", index=False)

## FIZLE

FIZLE by Bhattacharjee et al. (2024) stands for Framework for Instructed Zero-shot LanguagE models.

Two algorithms to test:

1. Naive: Directly prompt the LLM to generate a counterfactual.
2. Guided: Use a two-step process. First, use the LLM to identify the important input features (i.e., words) that lead to the predicted label. Then prompt the same LLM to edit a minimal set of those features to generate the counterfactual.
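
The guided variant is not implemented in this notebook. The sketch below shows only the control flow of the two steps. The prompt wording is a placeholder, not the prompts from the paper, and `ask_llm` is a hypothetical callable that takes a prompt string and returns the model's reply.

```python
import re

def guided_counterfactual(text, original_label, cf_label, ask_llm):
    """Two-step sketch: find influential words, then edit only those words."""
    # Step 1: ask which words drive the predicted label (placeholder prompt).
    step1_prompt = (
        f"List the words in the text that most support the label "
        f"'{original_label}', separated by commas.\nText: {text}"
    )
    important_words = [w.strip() for w in ask_llm(step1_prompt).split(",") if w.strip()]

    # Step 2: ask for a minimal edit restricted to those words (placeholder prompt).
    step2_prompt = (
        f"Edit as few of these words as possible: {', '.join(important_words)}, "
        f"so that the label changes from '{original_label}' to '{cf_label}'. "
        f"Enclose the generated text within <new> tags.\nText: {text}"
    )
    reply = ask_llm(step2_prompt)
    match = re.search(r"<new>(.*?)</new>", reply, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def fake_llm(prompt):
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("List the words"):
        return "not, hold"
    return "<new>the film 's center will hold .</new>"


print(guided_counterfactual("the film 's center will not hold .", 0, 1, fake_llm))
```

Passing the model as a callable keeps the two-step logic independent of any particular API client, so the same function could be exercised with a stub, as here, or with a real chat endpoint.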

### Hyperparameters

For all LLMs, the paper uses top-p sampling with p = 1, temperature t = 0.4, and a repetition penalty of 1.1.
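
These settings are not passed in the API call made by `generate_fizle_counterfactual`. A minimal sketch of how they might be supplied follows, assuming the OpenAI chat completions endpoint: `temperature` and `top_p` are request parameters of that endpoint, whereas a repetition penalty of this kind is a Hugging Face `generate()` option with no identically named chat completions parameter, so it is left out. The helper name `build_chat_request` is hypothetical.

```python
# Sampling settings quoted above; the repetition penalty is omitted because
# the chat completions endpoint has no parameter of that name.
SAMPLING_PARAMS = {"temperature": 0.4, "top_p": 1.0}

def build_chat_request(model, system_prompt, sampling_params=SAMPLING_PARAMS):
    """Assemble keyword arguments for client.chat.completions.create(**request)."""
    return {
        "model": model,
        "messages": [{"role": "system", "content": system_prompt}],
        **sampling_params,
    }

request = build_chat_request("gpt-4-turbo", "Text: the film 's center will not hold .")
print(request["temperature"], request["top_p"])
```

Keeping the settings in one dictionary makes it easy to record them alongside the results and to reuse them across the naive and guided modes.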

### Prompts

The prompts for the naive and guided approaches are given in Table 1 of the paper.

Naive prompt:

```
In the task of <task on task-dataset>, a trained black-box classifier correctly predicted the label ‘<yi>’ for the following text. Generate a counterfactual explanation by making minimal changes to the input text, so that the label changes from ‘<yi>’ to ‘<ycf>’. Use the following definition of ‘counterfactual explanation’: “A counterfactual explanation reveals what should have been different in an instance to observe a diverse outcome.” Enclose the generated text within <new> tags.\n—\nText: <xi>.
```

`<task on task-dataset>` denotes the task description, such as “natural language inference on the SNLI dataset”. `<xi>` denotes the input text, `<yi>` the label that the black-box classifier outputs for `<xi>`, and `<ycf>` the desired label for the counterfactual text (any label other than `<yi>`).
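
As a sketch of how the placeholders are filled, the hypothetical helper below substitutes the four fields into the naive template. The wording follows the template above, with straight quotes and a plain hyphen as the separator line, as in the notebook's own prompt. The function name and the example label strings are assumptions.

```python
NAIVE_TEMPLATE = (
    "In the task of {task}, a trained black-box classifier correctly predicted "
    "the label '{original_label}' for the following text. Generate a counterfactual "
    "explanation by making minimal changes to the input text, so that the label "
    "changes from '{original_label}' to '{cf_label}'. Use the following definition of "
    "'counterfactual explanation': \"A counterfactual explanation reveals what should "
    "have been different in an instance to observe a diverse outcome.\" "
    "Enclose the generated text within <new> tags.\n-\nText: {text}"
)

def build_naive_prompt(task, text, original_label, cf_label):
    """Fill the task, labels, and input text into the naive template."""
    return NAIVE_TEMPLATE.format(
        task=task, text=text, original_label=original_label, cf_label=cf_label
    )

prompt = build_naive_prompt(
    task="sentiment analysis on the SST-2 dataset",
    text="the film 's center will not hold .",
    original_label="negative",
    cf_label="positive",
)
print(prompt)
```

The example passes label names rather than the integer ids 0 and 1. Whether names or ids work better in the prompt is an open choice that this sketch does not settle.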


### Future improvements

You could create your own method that uses gradients to find the most important words and then uses an LLM to generate the counterfactual.


### 1. Naive implementation

In [None]:
df_input.head()

In [None]:
args = {
    "model": "gpt-4-turbo",
    "mode": "naive"
}
df_output = get_output(df_input, generate_fizle_counterfactual, args)

In [None]:
df_output.head()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


"In a world of code where logic reigns,  \nThere dances a concept that rarely wanes.  \nA loop within loops, a circle of thought,  \nLet me share the tale of recursion, sought.  \n\nImagine a mirror, reflecting a face,  \nEach glance reveals depths in a boundless space.  \nA function, it whispers, “Call me once more,  \nTo solve this great riddle, through layers explore.”  \n\n“When problems get tricky, and patterns unwind,  \nI’ll break them to pieces, both simple and kind.  \nFirst do the small part, then hand it back,  \nLike nested dolls waiting, don’t lose your track.”  \n\nA task might seem daunting, a mountain too high,  \nBut with recursive steps, we can learn how to fly.  \nDivide and conquer, each call leads the way,  \nJust trust in the process, don’t fear the delay.  \n\nBase case awaits, the anchor to hold,  \nA condition that whispers, “Here’s how the tale’s told.”  \nWhen the smallest of inputs returns a clear yield,  \nThe fortress of logic, it starts to be sealed.  \n\