# This notebook shows how to compute prompt perplexities

### Adapts part of the [lmppl package](https://github.com/asahi417/lmppl)
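Perplexity is conventionally defined as the exponentiated average negative log-likelihood of a token sequence. The sketch below assumes you already have natural-log probabilities for each token. This notebook does not show how `Perplexity` obtains them for Flan-T5. The function name `perplexity_from_logprobs` is hypothetical and is not part of lmppl or `stance_detector`.

```python
import math
from typing import Sequence


def perplexity_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood over all tokens
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# If every token is predicted with probability 0.5, the perplexity is (up to rounding) 2
print(perplexity_from_logprobs([math.log(0.5)] * 4))
```

Lower values mean the model finds the sequence less surprising. A perplexity of 1 would mean every token was predicted with probability 1.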

In [1]:
from stance_detector.prompt_perplexity import Perplexity
import yaml



In [2]:
config_path = "../config.yaml"
with open(config_path, "r") as f:
    config = yaml.safe_load(f)
max_memory_mapping = config["max_memory_mapping"]
print(max_memory_mapping)

{0: '0GB', 1: '0GB', 2: '0GB', 3: '0GB', 4: '0GB', 5: '0GB', 6: '0GB', 7: '80GB'}
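Every GPU except device 7 gets a `0GB` budget. Presumably this confines the model to GPU 7, matching `cuda = 7` in the next cell. Before loading an 11B-parameter model, a small sanity check like the one below can confirm which devices are usable. `parse_size` and `gpus_with_budget` are hypothetical helpers, not part of `stance_detector`. The unit conventions (decimal `GB`, binary `GiB`) are an assumption of this sketch.

```python
import re
from typing import Dict, List

# Assumed unit conventions: decimal KB/MB/GB, binary KiB/MiB/GiB
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(size: str) -> int:
    """Convert a string such as '80GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", size)
    if match is None or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognised size string: {size!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def gpus_with_budget(max_memory: Dict[int, str]) -> List[int]:
    """Return the device indices that are allowed a non-zero memory budget."""
    return sorted(dev for dev, size in max_memory.items() if parse_size(size) > 0)


max_memory_mapping = {0: "0GB", 1: "0GB", 2: "0GB", 3: "0GB",
                      4: "0GB", 5: "0GB", 6: "0GB", 7: "80GB"}
print(gpus_with_budget(max_memory_mapping))  # [7]
```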


In [3]:
perplexity = Perplexity(
    model_name="google/flan-t5-xxl",
    hf_cache_dir="/home/racball/flan",  # local Hugging Face cache directory
    cuda=7,  # GPU to use; the only non-zero entry in max_memory_mapping
    max_memory=max_memory_mapping,
)

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.


2025-05-14 18:13:05,077 - stance_detector.prompt_perplexity - INFO - Model google/flan-t5-xxl loaded successfully.


In [5]:
df = perplexity.get_perplexity(
    # prompts to score (Parquet)
    input_file="../data/semeval_taskA_targetAtheism_prompt1_instruction1_prompts.parquet",
    # where the scored DataFrame is pickled
    output_path="../data/semeval_taskA_targetAtheism_prompt1_instruction1_prompts--perplexity.pkl",
)

2025-05-14 18:14:39,754 - stance_detector.prompt_perplexity - INFO - Processing batch 1/10 with 22 prompts.


100%|██████████| 1/1 [00:02<00:00,  2.48s/it]

2025-05-14 18:14:42,240 - stance_detector.prompt_perplexity - INFO - Processing batch 2/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.25s/it]

2025-05-14 18:14:43,496 - stance_detector.prompt_perplexity - INFO - Processing batch 3/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.36s/it]

2025-05-14 18:14:44,860 - stance_detector.prompt_perplexity - INFO - Processing batch 4/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.26s/it]

2025-05-14 18:14:46,121 - stance_detector.prompt_perplexity - INFO - Processing batch 5/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.40s/it]

2025-05-14 18:14:47,520 - stance_detector.prompt_perplexity - INFO - Processing batch 6/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.28s/it]

2025-05-14 18:14:48,805 - stance_detector.prompt_perplexity - INFO - Processing batch 7/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.52s/it]

2025-05-14 18:14:50,333 - stance_detector.prompt_perplexity - INFO - Processing batch 8/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.40s/it]

2025-05-14 18:14:51,736 - stance_detector.prompt_perplexity - INFO - Processing batch 9/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.36s/it]

2025-05-14 18:14:53,095 - stance_detector.prompt_perplexity - INFO - Processing batch 10/10 with 22 prompts.



100%|██████████| 1/1 [00:01<00:00,  1.59s/it]

2025-05-14 18:14:54,691 - stance_detector.prompt_perplexity - INFO - Final results saved to ../data/semeval_taskA_targetAtheism_prompt1_instruction1_prompts--perplexity.pkl
2025-05-14 18:14:54,692 - stance_detector.prompt_perplexity - INFO - Processed 220 records
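The log shows 220 records processed in 10 batches of 22 prompts. This notebook does not show how `get_perplexity` actually splits its input. It could use a fixed batch size or a fixed number of batches. The sketch below shows fixed-size batching that reproduces those counts. `iter_batches` is a hypothetical name, and `batch_size=22` is inferred from the log.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


prompts = [f"prompt {i}" for i in range(220)]  # placeholder prompts
batches = list(iter_batches(prompts, batch_size=22))
print(len(batches), {len(b) for b in batches})  # 10 {22}
```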





In [8]:
df.columns

Index(['ID', 'Target', 'Tweet', 'Stance', 'Prompt', 'label', 'class_tokens',
       'perplexity'],
      dtype='object')

- `ID` is the data point ID.

- `Target` is the target towards which the stance is inferred.

- `Tweet` is the tweet from which the stance is inferred.

- `Stance` is the ground-truth stance label.

- `Prompt` is the prompt given to the LLM.

- `label` has the form `x--y`, where `x` identifies the prompt template and `y` identifies the instruction (e.g., `1--1`).

- `class_tokens` is the set of candidate output tokens (answer options). For each candidate, we compute the probability that the LLM outputs it given the prompt, i.e., P(y|x).

- `perplexity` is the perplexity of the prompt under the model.
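A common way to turn per-candidate scores into a distribution P(y|x) over `class_tokens` is a softmax restricted to the candidates. This takes the model's log-likelihood of each candidate given the prompt and renormalizes so the probabilities sum to 1. The sketch below works under that assumption and is not necessarily what `stance_detector` does. `candidate_distribution` is hypothetical, and the input log-likelihoods are made-up illustrative numbers, not model outputs.

```python
import math
from typing import Dict


def candidate_distribution(candidate_logliks: Dict[str, float]) -> Dict[str, float]:
    """Softmax over candidate log-likelihoods so the probabilities sum to 1."""
    if not candidate_logliks:
        raise ValueError("need at least one candidate")
    max_ll = max(candidate_logliks.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(ll - max_ll) for tok, ll in candidate_logliks.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Illustrative log-likelihoods only
probs = candidate_distribution(
    {"Positive": math.log(0.2), "Negative": math.log(0.6), "Neutral": math.log(0.2)}
)
print(max(probs, key=probs.get))  # Negative
```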

In [7]:
df.head()

| | ID | Target | Tweet | Stance | Prompt | label | class_tokens | perplexity |
|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | Atheism | He who exalts himself shall be humbled; a... | AGAINST | Your response to the question should be either... | 1--1 | [Positive, Negative, Neutral, Positive., Negat... | 16.660692 |
| 1 | 10002 | Atheism | RT @prayerbullets: I remove Nehushtan -previou... | AGAINST | Your response to the question should be either... | 1--1 | [Positive, Negative, Neutral, Positive., Negat... | 67.655969 |
| 2 | 10003 | Atheism | @Brainman365 @heidtjj @BenjaminLives I have so... | AGAINST | Your response to the question should be either... | 1--1 | [Positive, Negative, Neutral, Positive., Negat... | 42.528605 |
| 3 | 10004 | Atheism | #God is utterly powerless without Human interv... | AGAINST | Your response to the question should be either... | 1--1 | [Positive, Negative, Neutral, Positive., Negat... | 52.113708 |
| 4 | 10005 | Atheism | @David_Cameron Miracles of #Multiculturalism... | AGAINST | Your response to the question should be either... | 1--1 | [Positive, Negative, Neutral, Positive., Negat... | 30.270378 |
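Because `label` encodes the prompt template and the instruction as `x--y`, you can compare prompt variants by splitting the label and averaging perplexity per group. The sketch below uses plain Python on the five rows shown above, which all share the label `1--1`. On the full DataFrame, pandas `groupby` would do the same job. `split_label` and `mean_perplexity_by_label` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split 'x--y' into (prompt template id, instruction id)."""
    prompt_id, sep, instruction_id = label.partition("--")
    if not sep:
        raise ValueError(f"label {label!r} is not of the form x--y")
    return prompt_id, instruction_id


def mean_perplexity_by_label(
    rows: Iterable[Tuple[str, float]],
) -> Dict[Tuple[str, str], float]:
    """Average the perplexity of all rows sharing the same (template, instruction)."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for label, ppl in rows:
        groups[split_label(label)].append(ppl)
    return {key: mean(values) for key, values in groups.items()}


# (label, perplexity) pairs copied from the df.head() output above
rows = [("1--1", 16.660692), ("1--1", 67.655969), ("1--1", 42.528605),
        ("1--1", 52.113708), ("1--1", 30.270378)]
print(mean_perplexity_by_label(rows))
```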
