# **This notebook is the implementation of the paper "A Watermark for Large Language Models"**
https://arxiv.org/pdf/2301.10226.pdf

# A few segments of the code are taken from the official GitHub repository of the watermarking algorithm.
Reference link - https://github.com/jwkirchenbauer/lm-watermarking/blob/main/README.md

The report of the watermarking analysis can be found here - https://docs.google.com/document/d/1ueuSwAa15H3L2YiHUuoO_3leHoW6Aj5wkvVIvX28pT4/edit?usp=sharing

A few libraries required by the GitHub repository are listed in the requirements.txt file, which is installed below

In [None]:
!pip install -r requirements.txt

Here I clone the watermarking repo to make use of its contents

In [None]:
!git clone https://github.com/jwkirchenbauer/lm-watermarking.git
%cd /content/lm-watermarking
!pip install transformers

Import the necessary libraries

In [None]:
import torch
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          LogitsProcessorList)

In [None]:
from extended_watermark_processor import WatermarkDetector, WatermarkLogitsProcessor

Load the GPT2 model

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


In [None]:
input_text = ("The diamondback terrapin or simply terrapin (Malaclemys terrapin) is a "
        "species of turtle native to the brackish coastal tidal marshes of the "
        "Northeastern and southern United States, and in Bermuda.[6] It belongs "
        "to the monotypic genus Malaclemys. It has one of the largest ranges of "
        "all turtles in North America, stretching as far south as the Florida Keys "
        "and as far north as Cape Cod.[7] The name 'terrapin' is derived from the "
        "Algonquian word torope.[8] It applies to Malaclemys terrapin in both "
        "British English and American English. The name originally was used by "
        "early European settlers in North America to describe these brackish-water "
        "turtles that inhabited neither freshwater habitats nor the sea. It retains "
        "this primary meaning in American English.[8] In British English, however, "
        "other semi-aquatic turtle species, such as the red-eared slider, might "
        "also be called terrapins. The common name refers to the diamond pattern "
        "on top of its shell (carapace), but the overall pattern and coloration "
        "vary greatly. The shell is usually wider at the back than in the front, "
        "and from above it appears wedge-shaped. The shell coloring can vary "
        "from brown to grey, and its body color can be grey, brown, yellow, "
        "or white. All have a unique pattern of wiggly, black markings or spots "
        "on their body and head. The diamondback terrapin has large webbed "
        "feet.[9] The species is")


The cell below decides whether to use multinomial sampling or beam search

In [None]:
torch.manual_seed(123)
use_sampling = True

gen_kwargs = dict(max_new_tokens=200)

if use_sampling:
    gen_kwargs.update(dict(
        do_sample = True,
        top_k = 0,
        temperature = 0.7
    ))
else:
    gen_kwargs.update(dict(
        num_beams = 4
    ))

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# The first analysis uses the default values gamma=0.25, delta=2.0 and seeding_scheme="selfhash"

In [None]:
tokd_input = tokenizer(input_text, return_tensors="pt", add_special_tokens=True, truncation=True).to(device) #Set the tokenized prompt tensors on the chosen hardware
truncation_warning = True if tokd_input["input_ids"].shape[-1] == 200 else False

In [None]:
gamma = 0.25
delta = 2.0
seeding_scheme = "selfhash"

In [None]:
model.to(device)
tokenized_input = tokenizer(input_text, return_tensors="pt")
tokenized_input = {key: value.to(device) for key, value in tokenized_input.items()}

watermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),
                                               gamma=gamma,
                                               delta=delta,
                                               seeding_scheme=seeding_scheme,
                                               select_green_tokens=True)

output_with_watermark = model.generate(**tokd_input,
                                       logits_processor=LogitsProcessorList([watermark_processor]),
                                       **gen_kwargs)
decoded_watermark = tokenizer.batch_decode(output_with_watermark, skip_special_tokens=True)[0]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


# **We can view the results (z-score, green_tokens and p_value) of the default hyperparameter values below**

In [None]:
from extended_watermark_processor import WatermarkDetector

watermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),
                                        gamma=gamma,
                                        seeding_scheme=seeding_scheme,
                                        device=device,
                                        tokenizer=tokenizer,
                                        z_threshold=4.0,
                                        normalizers="",
                                        ignore_repeated_ngrams=False,
                                        select_green_tokens=True)

results = watermark_detector.detect(decoded_watermark)

In [None]:
results

{'num_tokens_scored': 511,
 'num_green_tokens': 232,
 'green_fraction': 0.45401174168297453,
 'z_score': 10.650376912378917,
 'p_value': 8.683024256993107e-27,
 'z_score_at_T': tensor([ 1.7321,  0.8165,  1.6667,  2.3094,  2.8402,  2.3570,  1.9640,  2.4495,
          2.1170,  1.8257,  1.5667,  1.3333,  1.1209,  0.9258,  0.7454,  0.5774,
          0.4201,  0.2722,  0.6623,  0.5164,  0.3780,  0.2462,  0.1204,  0.4714,
          0.8083,  1.1323,  1.0000,  1.3093,  1.1793,  1.0541,  1.3480,  1.2247,
          1.1055,  0.9901,  1.2687,  1.5396,  1.4237,  1.3112,  1.2019,  1.0954,
          0.9918,  0.8909,  0.7924,  1.0445,  0.9467,  0.8513,  0.7579,  0.6667,
          0.9073,  0.8165,  1.0510,  0.9608,  1.1896,  1.0999,  1.0120,  0.9258,
          0.8412,  0.7581,  0.6765,  0.5963,  0.5175,  0.7332,  0.6547,  0.5774,
          0.5013,  0.4264,  0.3527,  0.2801,  0.2085,  0.1380,  0.0685,  0.0000,
          0.2027,  0.1342,  0.0667,  0.2649,  0.1974,  0.1307,  0.3248,  0.2582,
          0.19
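To make the reported numbers easier to interpret, the sketch below recomputes the z-score and p-value from the counts shown above. It assumes the detector uses the one-proportion z-test described in the paper, z = (green − γT) / sqrt(Tγ(1 − γ)), with a one-sided normal tail for the p-value. The function names are mine and are not part of the repository.

```python
import math


def watermark_z_score(num_green: int, num_scored: int, gamma: float) -> float:
    """One-proportion z-test: how far the green count is above its expectation."""
    expected_green = gamma * num_scored
    std_dev = math.sqrt(num_scored * gamma * (1.0 - gamma))
    return (num_green - expected_green) / std_dev


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal variable at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Counts taken from the results above (gamma=0.25, delta=2.0)
z = watermark_z_score(num_green=232, num_scored=511, gamma=0.25)
p = one_sided_p_value(z)
print(f"z = {z:.4f}, p = {p:.3e}")
```

If the assumption about the test is right, the printed values should agree with the `z_score` and `p_value` fields of the `results` dictionary.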

# Now we vary the hyperparameters by keeping gamma constant and changing delta. I increase delta because the paper indicates that a larger delta makes the watermark stronger
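As a rough intuition for why a larger delta strengthens the watermark, the toy calculation below adds delta to the logits of the green tokens and measures how much probability mass ends up on the green list. It assumes a perfectly uniform next-token distribution, which a real language model does not have, so the numbers are only illustrative and will not match the green fractions measured in this notebook. The function name is hypothetical.

```python
import math


def green_mass_after_bias(gamma: float, delta: float) -> float:
    """Green-list probability mass after adding delta to green logits.

    Assumes all logits are equal before the bias, so a fraction gamma of the
    mass starts on the green list. After the softmax, green tokens are
    weighted by exp(delta) relative to red tokens.
    """
    boosted_green = gamma * math.exp(delta)
    return boosted_green / (boosted_green + (1.0 - gamma))


for delta in (0.0, 2.0, 4.5):
    mass = green_mass_after_bias(gamma=0.25, delta=delta)
    print(f"delta={delta}: green mass = {mass:.3f}")
```

With delta=0 the green mass stays at gamma; it grows toward 1 as delta increases, which is the effect the experiment below probes on real generations.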

In [None]:
gamma = 0.25
delta = 4.5
seeding_scheme = "selfhash"

In [None]:
model.to(device)
tokenized_input = tokenizer(input_text, return_tensors="pt")
tokenized_input = {key: value.to(device) for key, value in tokenized_input.items()}

watermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),
                                               gamma=gamma,
                                               delta=delta,
                                               seeding_scheme=seeding_scheme,
                                               select_green_tokens=True)

output_with_watermark = model.generate(**tokd_input,
                                       logits_processor=LogitsProcessorList([watermark_processor]),
                                       **gen_kwargs)
decoded_watermark = tokenizer.batch_decode(output_with_watermark, skip_special_tokens=True)[0]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
from extended_watermark_processor import WatermarkDetector

watermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),
                                        gamma=gamma,
                                        seeding_scheme=seeding_scheme,
                                        device=device,
                                        tokenizer=tokenizer,
                                        z_threshold=4.0,
                                        normalizers="",
                                        ignore_repeated_ngrams=False,
                                        select_green_tokens=True)

results = watermark_detector.detect(decoded_watermark)

results

{'num_tokens_scored': 508,
 'num_green_tokens': 256,
 'green_fraction': 0.5039370078740157,
 'z_score': 13.217740405126847,
 'p_value': 3.465754813158284e-40,
 'z_score_at_T': tensor([ 1.7321,  0.8165,  1.6667,  2.3094,  2.8402,  2.3570,  1.9640,  2.4495,
          2.1170,  1.8257,  1.5667,  1.3333,  1.1209,  0.9258,  0.7454,  0.5774,
          0.4201,  0.2722,  0.6623,  0.5164,  0.3780,  0.2462,  0.1204,  0.4714,
          0.8083,  1.1323,  1.0000,  1.3093,  1.1793,  1.0541,  1.3480,  1.2247,
          1.1055,  0.9901,  1.2687,  1.5396,  1.4237,  1.3112,  1.2019,  1.0954,
          0.9918,  0.8909,  0.7924,  1.0445,  0.9467,  0.8513,  0.7579,  0.6667,
          0.9073,  0.8165,  1.0510,  0.9608,  1.1896,  1.0999,  1.0120,  0.9258,
          0.8412,  0.7581,  0.6765,  0.5963,  0.5175,  0.7332,  0.6547,  0.5774,
          0.5013,  0.4264,  0.3527,  0.2801,  0.2085,  0.1380,  0.0685,  0.0000,
          0.2027,  0.1342,  0.0667,  0.2649,  0.1974,  0.1307,  0.3248,  0.2582,
          0.192

# Next, keep delta constant at 2.0 and vary gamma to see whether the results change.
Note that the seeding_scheme can also be changed to simple_1 or minhash
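To make the role of gamma and the seeding scheme more concrete, here is a toy sketch of how a green list can be derived from the previous token. This is not the repository's implementation: it uses Python's `random` module instead of a torch generator, and `HASH_KEY` and `toy_green_list` are hypothetical names chosen for illustration. The idea it shows is that the previous token seeds a pseudo-random permutation of the vocabulary and the first gamma fraction of that permutation becomes the green list.

```python
import random

HASH_KEY = 15485863  # hypothetical secret key for this toy example


def toy_green_list(prev_token: int, vocab_size: int, gamma: float) -> set:
    """Return a pseudo-random green list that depends only on the previous token."""
    rng = random.Random(HASH_KEY * prev_token)  # seed from the context token
    permutation = list(range(vocab_size))
    rng.shuffle(permutation)
    green_size = int(gamma * vocab_size)  # gamma controls the green-list size
    return set(permutation[:green_size])


green_a = toy_green_list(prev_token=42, vocab_size=1000, gamma=0.25)
green_b = toy_green_list(prev_token=42, vocab_size=1000, gamma=0.25)
green_c = toy_green_list(prev_token=43, vocab_size=1000, gamma=0.25)
print(len(green_a), green_a == green_b, green_a == green_c)
```

Because the list is reproducible from the context, a detector that knows the key can rebuild it and count green tokens without access to the model.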

In [None]:
gamma = 0.97
delta = 2.0
seeding_scheme = "selfhash"

In [None]:
model.to(device)
tokenized_input = tokenizer(input_text, return_tensors="pt")
tokenized_input = {key: value.to(device) for key, value in tokenized_input.items()}

watermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),
                                               gamma=gamma,
                                               delta=delta,
                                               seeding_scheme=seeding_scheme,
                                               select_green_tokens=True)

output_with_watermark = model.generate(**tokd_input,
                                       logits_processor=LogitsProcessorList([watermark_processor]),
                                       **gen_kwargs)
decoded_watermark = tokenizer.batch_decode(output_with_watermark, skip_special_tokens=True)[0]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
from extended_watermark_processor import WatermarkDetector

watermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),
                                        gamma=gamma,
                                        seeding_scheme=seeding_scheme,
                                        device=device,
                                        tokenizer=tokenizer,
                                        z_threshold=4.0,
                                        normalizers="",
                                        ignore_repeated_ngrams=False,
                                        select_green_tokens=True)

results = watermark_detector.detect(decoded_watermark)

results

{'num_tokens_scored': 511,
 'num_green_tokens': 505,
 'green_fraction': 0.9882583170254403,
 'z_score': 2.4194948341516627,
 'p_value': 0.007771040742776608,
 'z_score_at_T': tensor([0.1759, 0.2487, 0.3046, 0.3517, 0.3932, 0.4308, 0.4653, 0.4974, 0.5276,
         0.5561, 0.5833, 0.6092, 0.6341, 0.6580, 0.6811, 0.7035, 0.7251, 0.7461,
         0.7666, 0.7865, 0.8059, 0.8249, 0.8434, 0.8615, 0.8793, 0.8967, 0.9138,
         0.9306, 0.9471, 0.9632, 0.9792, 0.9948, 1.0103, 1.0254, 1.0404, 1.0552,
         1.0697, 1.0841, 1.0983, 1.1123, 1.1261, 1.1397, 1.1532, 1.1665, 0.3059,
         0.3284, 0.3506, 0.3723, 0.3936, 0.4145, 0.4351, 0.4552, 0.4751, 0.4946,
         0.5138, 0.5327, 0.5513, 0.5696, 0.5876, 0.6054, 0.6230, 0.6403, 0.6573,
         0.6741, 0.6907, 0.7071, 0.7233, 0.7393, 0.7551, 0.7707, 0.7861, 0.8014,
         0.8165, 0.8314, 0.8461, 0.8607, 0.8751, 0.8894, 0.9036, 0.9176, 0.9314,
         0.9451, 0.9587, 0.9722, 0.9855, 0.9988, 1.0119, 1.0248, 1.0377, 1.0505,
         1.0631,

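The above results show a negative prediction: the z-score of about 2.42 is below the z_threshold of 4.0. This suggests that a very large gamma decreases the strength of the watermark, since almost every token is already on the green list and the green count can barely exceed its expected value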
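One way to see why gamma=0.97 gives such a weak signal is to compute the largest z-score that any text of a given length could reach, which happens when every scored token is green. The sketch below assumes the same one-proportion z-test as in the paper; setting green = T in that formula gives z_max = sqrt(T(1 − γ)/γ). The function name is mine.

```python
import math


def max_z_score(num_scored: int, gamma: float) -> float:
    """Upper bound on the z-score: the value reached when all tokens are green."""
    return math.sqrt(num_scored * (1.0 - gamma) / gamma)


# 511 is the number of tokens scored in the gamma=0.97 run above
for gamma in (0.25, 0.5, 0.97):
    print(f"gamma={gamma}: max z-score = {max_z_score(511, gamma):.2f}")
```

Under this assumption, with 511 scored tokens and gamma=0.97 the bound stays below the z_threshold of 4.0, so the detector could not flag the text even if every token were green.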

In [None]:
def generate_without_watermark(input_text):
    """Generate a continuation of input_text without the watermark logits processor."""
    # Tokenize the prompt and move the tensors to the chosen device
    tokd_input = tokenizer(input_text, return_tensors="pt", add_special_tokens=True, truncation=True).to(device)

    # No logits_processor is passed, so the output is not watermarked
    output_without_watermark = model.generate(**tokd_input)

    unwatermarked = tokenizer.batch_decode(output_without_watermark, skip_special_tokens=True)[0]

    return unwatermarked

unwatermarked = generate_without_watermark(input_text=input_text)
print(unwatermarked)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The diamondback terrapin or simply terrapin (Malaclemys terrapin) is a species of turtle native to the brackish coastal tidal marshes of the Northeastern and southern United States, and in Bermuda.[6] It belongs to the monotypic genus Malaclemys. It has one of the largest ranges of all turtles in North America, stretching as far south as the Florida Keys and as far north as Cape Cod.[7] The name 'terrapin' is derived from the Algonquian word torope.[8] It applies to Malaclemys terrapin in both British English and American English. The name originally was used by early European settlers in North America to describe these brackish-water turtles that inhabited neither freshwater habitats nor the sea. It retains this primary meaning in American English.[8] In British English, however, other semi-aquatic turtle species, such as the red-eared slider, might also be called terrapins. The common name refers to the diamond pattern on top of its shell (carapace), but the overall pattern and color



In [None]:
results = watermark_detector.detect(unwatermarked)
results

{'num_tokens_scored': 312,
 'num_green_tokens': 307,
 'green_fraction': 0.9839743589743589,
 'z_score': 1.4469805643292446,
 'p_value': 0.07395118355103088,
 'z_score_at_T': tensor([0.1759, 0.2487, 0.3046, 0.3517, 0.3932, 0.4308, 0.4653, 0.4974, 0.5276,
         0.5561, 0.5833, 0.6092, 0.6341, 0.6580, 0.6811, 0.7035, 0.7251, 0.7461,
         0.7666, 0.7865, 0.8059, 0.8249, 0.8434, 0.8615, 0.8793, 0.8967, 0.9138,
         0.9306, 0.9471, 0.9632, 0.9792, 0.9948, 1.0103, 1.0254, 1.0404, 1.0552,
         1.0697, 1.0841, 1.0983, 1.1123, 1.1261, 1.1397, 1.1532, 1.1665, 0.3059,
         0.3284, 0.3506, 0.3723, 0.3936, 0.4145, 0.4351, 0.4552, 0.4751, 0.4946,
         0.5138, 0.5327, 0.5513, 0.5696, 0.5876, 0.6054, 0.6230, 0.6403, 0.6573,
         0.6741, 0.6907, 0.7071, 0.7233, 0.7393, 0.7551, 0.7707, 0.7861, 0.8014,
         0.8165, 0.8314, 0.8461, 0.8607, 0.8751, 0.8894, 0.9036, 0.9176, 0.9314,
         0.9451, 0.9587, 0.9722, 0.9855, 0.9988, 1.0119, 1.0248, 1.0377, 1.0505,
         1.0631, 