# Prompt Playground Notebook

This notebook serves as a **sandbox for experimenting with prompts** on models that have **not been fine-tuned**.

## Notes
- SARI is computed with Hugging Face's `evaluate` implementation, which may give different scores from EASSE's SARI.
- Some models give poor results without fine-tuning. That is expected, so don't be alarmed.
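
For orientation, the `evaluate` SARI metric used later in this notebook takes three parallel lists: the source sentences, the model predictions, and a list of reference simplifications per source. The sketch below only checks those shapes before scoring. `check_sari_inputs` is a hypothetical helper (not part of `evaluate`), and the sentences are toy data invented for illustration.

```python
def check_sari_inputs(sources, predictions, references):
    """Validate the three parallel lists before passing them to SARI.

    Hypothetical helper for illustration; it is not part of `evaluate`.
    """
    if not (len(sources) == len(predictions) == len(references)):
        raise ValueError("sources, predictions and references must have the same length")
    for i, refs in enumerate(references):
        # Each entry must be a list of reference strings, not a bare string.
        if isinstance(refs, str) or len(refs) == 0:
            raise ValueError(f"references[{i}] must be a non-empty list of strings")
    return len(sources)


# Toy data, invented for illustration only.
sources = ["The committee reached a unanimous decision."]
predictions = ["The group agreed."]
references = [["The group all agreed.", "Everyone in the group agreed."]]

n_checked = check_sari_inputs(sources, predictions, references)
print(f"{n_checked} example(s) ready for sari.compute(sources=..., predictions=..., references=...)")
```

A check like this catches the most common shape mistake, passing one reference string per source instead of a list of references.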

In [None]:
# Install dependencies
!pip install -q transformers sacremoses sacrebleu sentencepiece evaluate #huggingface_hub
!pip install -q --upgrade datasets fsspec

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pylibcudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 21.0.0 which is incompatible.
cudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 21.0.0 which is incompatible.
gcsfs 2025.3.0 requires fsspec==2025.3.0, but you have fsspec 

In [None]:
import torch

if torch.cuda.is_available():
    print("GPU is available :)")
else:
    raise EnvironmentError("GPU not available. Enable GPU runtime in Colab: Runtime > Change runtime type > GPU")

GPU is available :)


Models to try: [t5-base](https://huggingface.co/google-t5/t5-base), [bart-base](https://huggingface.co/facebook/bart-base), etc. The cell below loads `t5-large`.

In [None]:
# Load the model and tokenizer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = 't5-large'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to('cuda')


The dataset is [ASSET](https://huggingface.co/datasets/facebook/asset). The cell below loads its test split (359 sentences).

In [None]:
# Load the ASSET test split; the "simplification" config is named explicitly
from datasets import load_dataset
ds_name = 'asset'
dataset = load_dataset(ds_name, 'simplification', split='test')


Change the prompt here. `multi_shot_prefix` is one example; you can also edit `prompt_before` and `prompt_after` directly in the preprocessing cell.

In [None]:
multi_shot_prefix = """Simplify the sentence. Use common words; keep the meaning. Output only the simplified sentence.
Complex: The committee reached a unanimous decision after extensive deliberations. Simple: The group agreed after talking for a long time.
Complex: The ancient manuscript was preserved in a climate-controlled archive to prevent deterioration. Simple: The old book was kept in a special room to stop it from getting damaged.
Complex: The economic downturn had a profound effect on small businesses across the region. Simple: The bad economy hurt many small businesses in the area.
Complex: """
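
If you want to vary the few-shot examples without retyping the whole string, the prefix can be assembled from (complex, simple) pairs. This is a minimal sketch that assumes the same "Complex: ... Simple: ..." layout as `multi_shot_prefix`. `build_few_shot_prefix` and `build_prompt` are hypothetical helpers, not part of the notebook.

```python
def build_few_shot_prefix(instruction, examples):
    """Join an instruction and (complex, simple) pairs into a prompt prefix."""
    lines = [instruction]
    for complex_sentence, simple_sentence in examples:
        lines.append(f"Complex: {complex_sentence} Simple: {simple_sentence}")
    # Leave the final "Complex: " open so the test sentence can be appended.
    lines.append("Complex: ")
    return "\n".join(lines)


def build_prompt(prefix, sentence, suffix=" Simple: "):
    """Wrap one sentence as prefix + sentence + suffix."""
    return prefix + sentence + suffix


instruction = ("Simplify the sentence. Use common words; keep the meaning. "
               "Output only the simplified sentence.")
examples = [
    ("The committee reached a unanimous decision after extensive deliberations.",
     "The group agreed after talking for a long time."),
]

prefix = build_few_shot_prefix(instruction, examples)
prompt = build_prompt(
    prefix,
    "The Great Dark Spot is thought to represent a hole in the methane cloud deck of Neptune.",
)
print(prompt)
```

Note the leading space in the default `suffix`: without it, the sentence and "Simple:" would run together.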

In [None]:
# Preprocess data
def preprocess_function(examples):
    # The prompt is prompt_before + sentence + prompt_after; edit either part.
    prompt_before = "Simplify: "  # or multi_shot_prefix
    prompt_after = ""             # or " Simple: " when using multi_shot_prefix
    inputs = [prompt_before + ex + prompt_after for ex in examples["original"]]

    model_inputs = tokenizer(
        inputs,
        max_length=256,
        truncation=True,
        padding=True
    )
    return model_inputs

tokenized_datasets = dataset.map(preprocess_function, batched=True)


Load the SARI metric and build the DataLoader. Adjust `batch_size` in the DataLoader to fit your GPU memory.
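
As a rough guide when picking a batch size, the number of generation steps is the example count divided by the batch size, rounded up, because a DataLoader keeps the last partial batch by default. A small sketch, using the 359 test sentences from this notebook (`num_batches` is a hypothetical helper):

```python
import math


def num_batches(num_examples, batch_size):
    """Number of batches when the final, smaller batch is kept."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_examples / batch_size)


for batch_size in (16, 32, 64):
    print(f"batch_size={batch_size}: {num_batches(359, batch_size)} batches")
# batch_size=16: 23 batches
# batch_size=32: 12 batches
# batch_size=64: 6 batches
```

Smaller batches lower peak GPU memory at the cost of more generation calls.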

In [None]:
from evaluate import load

# Load metrics
sari = load("sari")


In [None]:
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# DataLoader with simple collate function
def collate_fn(batch):
    input_ids = [torch.tensor(x["input_ids"], dtype=torch.long) for x in batch]
    attention_mask = [torch.tensor(x["attention_mask"], dtype=torch.long) for x in batch]

    input_ids = pad_sequence(input_ids, batch_first=True, padding_value=tokenizer.pad_token_id)
    attention_mask = pad_sequence(attention_mask, batch_first=True, padding_value=0)

    originals = [x["original"] for x in batch]
    references = [x["simplifications"] for x in batch]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "originals": originals,
        "references": references
    }

# Prepare DataLoader
dataloader = DataLoader(tokenized_datasets, batch_size=64, collate_fn=collate_fn)
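
The padding step in `collate_fn` can be pictured with plain lists: every sequence in a batch is right-padded to the length of the longest one, and the attention mask gets zeros at the padded positions. The sketch below is a pure-Python illustration of that idea, not a replacement for `pad_sequence`. `pad_batch` is a hypothetical helper, and the token ids are toy values with 0 assumed as the pad id.

```python
def pad_batch(sequences, pad_value):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Toy token ids, invented for illustration; 0 is assumed to be the pad id.
input_ids = [[100, 200, 1], [300, 1]]
attention_mask = [[1, 1, 1], [1, 1]]

padded_ids = pad_batch(input_ids, pad_value=0)
padded_mask = pad_batch(attention_mask, pad_value=0)

print(padded_ids)   # [[100, 200, 1], [300, 1, 0]]
print(padded_mask)  # [[1, 1, 1], [1, 1, 0]]
```

Because the tokenizer call in the preprocessing cell already uses `padding=True`, the sequences reaching `collate_fn` are likely the same length already, so this step mainly acts as a safeguard if that setting is changed.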

Run generation over the whole test split and compute SARI.

In [None]:
from tqdm import tqdm #progress bar

# Evaluation
predictions, sources, references_list = [], [], []

print("Model predicting...")
for batch in tqdm(dataloader):
    inputs = {
        "input_ids": batch["input_ids"].to(model.device),
        "attention_mask": batch["attention_mask"].to(model.device)
    }

    outputs = model.generate(**inputs,
                             max_new_tokens=64,
                             num_beams=4,
                             length_penalty=1.0,
                             no_repeat_ngram_size=3,
                             early_stopping=True,
                             do_sample=False
                             )
    decoded_preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    predictions.extend(decoded_preds)
    sources.extend(batch["originals"])
    references_list.extend(batch["references"])

print()
print(f"'{model_name.upper()}' evaluation on test subset in '{ds_name.upper()}' dataset:")
# Compute SARI
sari_score = sari.compute(predictions=predictions, references=references_list, sources=sources)
print(f"SARI: {sari_score['sari']:.2f}")

Model predicting...


100%|██████████| 6/6 [01:17<00:00, 12.94s/it]



'T5-LARGE' evaluation on test subset in 'ASSET' dataset:
SARI: 46.52


Manually check the first few predictions against their sources.

In [None]:
for i in range(10):
    print(f"SRC : {sources[i]}")
    #print(f"REF : {references_list[i]}")
    print(f"PRED: {predictions[i]}")
    print("---")

SRC : One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed, a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes of the northern Rizeigat region in Sudan.
PRED: :: one side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed. Simplify::::: and the other side is composed mostly of the military and the Janja, Simpl
---
SRC : Jeddah is the principal gateway to Mecca, Islam's holiest city, which able-bodied Muslims are required to visit at least once in their lifetime.
PRED: is the principal gateway to Mecca, Islam's holiest city. Simplify: Jeddah is the primary gateway., the capital of Saudi Arabia. which Muslims are required to visit at least once in their lifetime.: the main gateway to.,
---
SRC : The Great Dark Spot is thought to represent a hole in the methane cloud deck of Neptune.
PRED: : The Great Dark Spot is thought to represent a hole in the methane cloud deck of Neptune.. Simplify:.: