
In [2]:
#@title Install Dependencies and Download Models

!uv pip install -q transformers datasets --prerelease disallow

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from transformers import LogitsProcessor, set_seed
import numpy as np
import datasets

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")


In [None]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available
model.to(device)
model = model.eval()

In [None]:
# @title Download lab files

import sys

!rm -rf llm_lab

![ ! -d 'llm_lab' ] && git clone https://github.com/ethz-privsec/llm_lab.git
%cd llm_lab
!git pull https://github.com/ethz-privsec/llm_lab.git
%cd ..
if "llm_lab" not in sys.path:
  sys.path.append("llm_lab")

In [None]:
# @title Example of how to generate 100 tokens of text without watermarking

from llm_lab.gpt_generate import generate_with_seed, gen_red_list

prompt = "Boston is one of the oldest municipalities in America,"
print(generate_with_seed(model, tokenizer, prompt, seed=42))

You will now implement three different watermarking schemes:
1. A simple scheme that never outputs the letter 'e' (lowercase or uppercase)
2. A red-list scheme that generates a random list of banned tokens for each token generation.
3. A soft red-list scheme that also generates a random red list but only biases the LLM against these tokens instead of banning them outright, by subtracting the value `logit_offset=2` from the logits of each red-listed token.

You should implement each of these schemes as a `LogitsProcessor` class.
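
As an illustration of the first scheme, here is a minimal, dependency-free sketch of the masking idea. It uses plain Python lists in place of tensors, and `toy_vocab`, `ids_containing_e` and `mask_logits` are hypothetical names invented for this example, not part of `llm_lab` or `transformers`. In an actual `LogitsProcessor`, the same masking would be applied to the `scores` tensor, and the set of banned token ids could be computed once from the tokenizer's vocabulary.

```python
import math

# Hypothetical toy vocabulary (token id -> token string), for illustration only.
toy_vocab = {0: " the", 1: " cat", 2: " sat", 3: " on", 4: " Every", 5: " mat", 6: "e"}


def ids_containing_e(vocab):
    """Return the sorted ids of all tokens whose text contains 'e' or 'E'."""
    return sorted(i for i, tok in vocab.items() if "e" in tok.lower())


def mask_logits(logits, banned_ids):
    """Return a copy of the logits with every banned id set to -inf,
    so those tokens get zero probability after the softmax."""
    out = list(logits)
    for i in banned_ids:
        out[i] = -math.inf
    return out


banned = ids_containing_e(toy_vocab)
masked = mask_logits([1.0] * len(toy_vocab), banned)
print(banned)
print(masked)
```

Setting a logit to negative infinity removes the token from sampling entirely, which is what distinguishes a hard ban from the soft bias used in the third scheme.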

For the red-list schemes, you should use `gpt_generate.gen_red_list` to generate a red list containing 50% of the LLM's vocabulary.
The seed for generating the pseudorandom red list is computed from the previous token processed by the model.

So, for example, if the model has so far processed the string "my name is " (which tokenizes as `[1820, 1438, 318, 220]`), then the red list for the next token to be generated is `gen_red_list(torch.LongTensor([220]), model.config.vocab_size) = [43383,  7006, 40846, ...]`.
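
The difference between the hard and the soft red-list schemes can be sketched on a toy vocabulary. The block below is dependency-free and uses plain lists instead of tensors. `toy_red_list` is a hypothetical stand-in that only mimics the idea of seeding a pseudorandom red list with the previous token; it does not reproduce the output of `gen_red_list`, which you should use in your actual solution.

```python
import math
import random


def toy_red_list(prev_token_id, vocab_size, red_frac=0.5):
    """Hypothetical stand-in for gen_red_list: a pseudorandom red list
    whose seed is the previous token id."""
    rng = random.Random(prev_token_id)
    return rng.sample(range(vocab_size), int(red_frac * vocab_size))


def apply_hard(logits, red_ids):
    """Hard red list: red tokens can never be sampled."""
    out = list(logits)
    for i in red_ids:
        out[i] = -math.inf
    return out


def apply_soft(logits, red_ids, logit_offset=2.0):
    """Soft red list: red tokens are only made less likely."""
    out = list(logits)
    for i in red_ids:
        out[i] -= logit_offset
    return out


vocab_size = 10
logits = [0.5 * i for i in range(vocab_size)]
red = toy_red_list(prev_token_id=220, vocab_size=vocab_size)
hard = apply_hard(logits, red)
soft = apply_soft(logits, red)
print(sorted(red))
print(hard)
print(soft)
```

Because the seed depends only on the previous token, anyone who knows the seeding rule can rebuild the same red list later, which is what makes detection possible.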

In [None]:
#@title Exercise 1: Implement a trivial watermarking scheme that samples text without any 'e' (lowercase or uppercase)

class NoEsLogitsProcessor(LogitsProcessor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __call__(self, input_ids, scores):
        """
        Processes the output scores of the LLM before generating the next token.
        Args:
            input_ids: torch.LongTensor of shape (batch_size, sequence_length) — Indices of input sequence tokens in the vocabulary.
            scores: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — Logits for the next token to be generated.
        Returns: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — The processed logits.
        """
        raise NotImplementedError()
        return scores


no_e_processor = NoEsLogitsProcessor()

prompt = "Anton Vowl is missing. Ransacking his Paris flat, a group of his faithful companions trawl through his diary for any hint as to his location and, insidiously, a ghost, from Vowl's past starts to cast its malignant shadow.\n "
output = generate_with_seed(model, tokenizer, prompt, logits_processor=no_e_processor, seed=42)
print(output)

In [None]:
#@title Exercise 2: Implement a red-list watermarking scheme

class RedListLogitsProcessor(LogitsProcessor):
    def __init__(self, red_frac=0.5, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.red_frac = red_frac

    def __call__(self, input_ids, scores):
        """
        Processes the output scores of the LLM before generating the next token.
        Args:
            input_ids: torch.LongTensor of shape (batch_size, sequence_length) — Indices of input sequence tokens in the vocabulary.
            scores: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — Logits for the next token to be generated.
        Returns: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — The processed logits.
        """
        raise NotImplementedError()
        return scores

red_list_processor = RedListLogitsProcessor()

prompt = "Boston is one of the oldest municipalities in America,"
output = generate_with_seed(model, tokenizer, prompt, logits_processor=red_list_processor, seed=42)
print(output)

In [None]:
#@title Exercise 3: Implement a soft red-list watermarking scheme

class SoftRedListLogitsProcessor(LogitsProcessor):
    def __init__(self, red_frac=0.5, logit_offset=2.0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.red_frac = red_frac
        self.logit_offset = logit_offset

    def __call__(self, input_ids, scores):
        """
        Processes the output scores of the LLM before generating the next token.
        Args:
            input_ids: torch.LongTensor of shape (batch_size, sequence_length) — Indices of input sequence tokens in the vocabulary.
            scores: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — Logits for the next token to be generated.
        Returns: torch.FloatTensor of shape (batch_size, model.config.vocab_size) — The processed logits.
        """
        raise NotImplementedError()
        return scores

soft_red_list_processor = SoftRedListLogitsProcessor()

prompt = "Boston is one of the oldest municipalities in America,"
output = generate_with_seed(model, tokenizer, prompt, logits_processor=soft_red_list_processor, seed=42)
print(output)

Okay! We're now ready to start generating watermarked text.
We give you 20 prompts in `data/watermark_prompts.txt`.
For each of these, generate 100 more tokens using each of the three watermarking schemes, and save all of this as a numpy array for submission.

MAKE SURE TO USE `seed=42` FOR ALL YOUR GENERATIONS.

In [None]:
from tqdm import trange

with open('llm_lab/data/watermark_prompts.txt') as f:
  prompts = f.read().splitlines()

processors = [no_e_processor, red_list_processor, soft_red_list_processor]
outputs = []

seed = 42  # DON'T CHANGE THIS!!!

for i in trange(20):
  min_new_tokens = 100
  max_new_tokens = min_new_tokens

  for j in range(3):
    output = generate_with_seed(model, tokenizer, prompts[i], logits_processor=processors[j],
                                min_new_tokens=min_new_tokens, max_new_tokens=max_new_tokens, seed=seed)
    outputs.append(output)

print(outputs)
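
Before saving, it can be worth sanity-checking the generations. The sketch below checks that a "no E's" generation really contains no 'e' after the prompt. It assumes that the string returned by `generate_with_seed` starts with the prompt verbatim, which is not confirmed here; `continuation` and `has_letter_e` are hypothetical helper names, and the toy strings are made up.

```python
def continuation(prompt, output):
    """Return the text after the prompt, assuming the output starts with the prompt."""
    if not output.startswith(prompt):
        raise ValueError("output does not start with the prompt")
    return output[len(prompt):]


def has_letter_e(text):
    """True if the text contains 'e' or 'E'."""
    return "e" in text.lower()


# Toy strings for illustration only; the prompt itself may contain 'e'.
toy_prompt = "The big dog"
toy_output = "The big dog ran to a park"

generated = continuation(toy_prompt, toy_output)
print(has_letter_e(toy_output))  # checks the whole string, prompt included
print(has_letter_e(generated))   # checks only the generated part
```

Only the generated part is constrained by the watermark, so the check has to ignore the prompt.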

For the final part of this question, you have to try to detect watermarked text.
We give you 80 pieces of text in `data/watermarked_gens.npy`.
For each piece of text you have to guess whether it was generated with:

1.   No watermark
2.   The dummy "no E's" watermark
3.   The red-list watermark
4.   The soft red-list watermark

We use the same `generate_with_seed` and `gen_red_list` implementations as you. Our red-list watermarking scheme also uses the same parameters (i.e., 50% of the tokens are red-listed, and for the soft version we subtract 2.0 from the logits).

Exactly 20 of the 80 texts are generated with each of the 4 options above. Each text consists of a short prompt, followed by 100-200 generated tokens.

Store your guesses (1,2,3,4) for each piece of text in a numpy array.
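
One possible starting point for detecting the red-list schemes is to count how many tokens fall in the red list seeded by their predecessor and compare that count with what chance would give. The sketch below is a dependency-free illustration of this idea, not a reference solution. `toy_red_list` is a hypothetical stand-in for `gen_red_list`, and the z-score is the usual normal approximation to a binomial count with success probability `red_frac`.

```python
import math
import random


def toy_red_list(prev_token_id, vocab_size, red_frac=0.5):
    """Hypothetical stand-in for gen_red_list, seeded by the previous token id."""
    rng = random.Random(prev_token_id)
    return set(rng.sample(range(vocab_size), int(red_frac * vocab_size)))


def count_red(token_ids, vocab_size):
    """Count tokens that lie in the red list seeded by their predecessor.
    Returns (number of red tokens, number of tokens scored)."""
    n_red = 0
    for prev, cur in zip(token_ids, token_ids[1:]):
        if cur in toy_red_list(prev, vocab_size):
            n_red += 1
    return n_red, len(token_ids) - 1


def z_score(n_red, n_scored, red_frac=0.5):
    """How many standard deviations the red count lies below its
    expected value for text generated without a watermark."""
    expected = red_frac * n_scored
    std = math.sqrt(n_scored * red_frac * (1 - red_frac))
    return (expected - n_red) / std


print(z_score(0, 100))   # no red tokens at all
print(z_score(30, 100))  # fewer red tokens than chance
print(z_score(50, 100))  # as many red tokens as chance predicts
```

A large positive score means far fewer red tokens than chance would produce. One would expect roughly zero red tokens in text generated with the hard red list, a reduced share with the soft red list, and about half without a watermark. Since the prompt is not watermarked, its tokens dilute the signal, and any decision thresholds are a choice you would have to make yourself.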

In [None]:
outputs_secret = np.load("llm_lab/data/watermarked_gens.npy", allow_pickle=True)
assert len(outputs_secret) == 80

my_guesses = [1] * 20 + [2] * 20 + [3] * 20 + [4] * 20

## Export your solution

To save your results, you can use the code below, which will save the file in Colab's temporary storage (or locally, if you're not using Colab), or on your Google Drive. If you save it on Colab's temporary storage, you can download it from there (see the file system icon on the left).

In [None]:
from llm_lab.utils import get_solution_path, is_valid_student_id

#@markdown Check this box if you want to save your results on Google Drive. Otherwise they'll be
#@markdown saved on the ephemeral Colab storage. The storage will be deleted with the runtime,
#@markdown so REMEMBER TO DOWNLOAD THE FILES before you close the tab!
SAVE_ON_DRIVE = True # @param {"type":"boolean"}

#@markdown The number on your Legi (Student ID card). It's in the format 'dd-ddd-ddd'
STUDENT_ID = "00-000-000"  # @param {"type":"string","placeholder":"00-000-000"}

assert is_valid_student_id(STUDENT_ID), "Student ID should have the format 'dd-ddd-ddd'"

SOLUTIONS_PATH = get_solution_path(STUDENT_ID, SAVE_ON_DRIVE)

In [None]:
# Save generations
assert len(outputs) == 60
np.save(SOLUTIONS_PATH / "Q1_gens.npy", np.asarray(outputs))

# Save guesses
assert len(my_guesses) == 80
np.save(SOLUTIONS_PATH / "Q1_guesses.npy", np.asarray(my_guesses))