# Detecting Pretraining Data from Large Language Models

This notebook implements the methods for detecting whether a piece of text was part of a language model's pretraining data. It includes functionality for:
- Loading and preparing models
- Calculating perplexity
- Evaluating detection metrics
- Visualizing results

In [1]:
!git clone https://github.com/prahaladd/detect-pretrain-code.git

Cloning into 'detect-pretrain-code'...
remote: Enumerating objects: 126, done.[K
remote: Counting objects: 100% (126/126), done.[K
remote: Compressing objects: 100% (109/109), done.[K
remote: Total 126 (delta 58), reused 54 (delta 10), pack-reused 0 (from 0)[K
Receiving objects: 100% (126/126), 349.06 KiB | 1.42 MiB/s, done.
Resolving deltas: 100% (58/58), done.


In [2]:
!pip install -r detect-pretrain-code/src/requirements.txt

Collecting datasets>=2.12.0 (from -r detect-pretrain-code/src/requirements.txt (line 5))
  Downloading datasets-3.5.1-py3-none-any.whl.metadata (19 kB)
Collecting zlib-wrapper>=0.1.3 (from -r detect-pretrain-code/src/requirements.txt (line 9))
  Downloading zlib_wrapper-0.1.3.tar.gz (3.2 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting ipdb>=0.13.0 (from -r detect-pretrain-code/src/requirements.txt (line 10))
  Downloading ipdb-0.13.13-py3-none-any.whl.metadata (14 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.0.0->-r detect-pretrain-code/src/requirements.txt (line 2))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.0.0->-r detect-pretrain-code/src/requirements.txt (line 2))
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.0.0->-r de

In [3]:
# Import required libraries
import logging
logging.basicConfig(level='ERROR')
import numpy as np
from pathlib import Path
from openai import OpenAI
import torch
import zlib
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import numpy as np
from datasets import load_dataset
import os
import json
from collections import defaultdict
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve
import matplotlib
import random
from google.colab import userdata

## Model Loading and Setup

Functions for loading and configuring the language models.

In [4]:
def load_model(name1, name2):
    """Load two models for comparison.

    Args:
        name1: Name/path of the first model
        name2: Name/path of the second model

    Returns:
        Tuple of (model1, model2, tokenizer1, tokenizer2)
    """
    if "davinci" in name1 or "gpt-3.5-turbo" in name1:
        model1 = None
        tokenizer1 = None
    else:
        model1 = AutoModelForCausalLM.from_pretrained(name1, return_dict=True, device_map='auto')
        model1.eval()
        tokenizer1 = AutoTokenizer.from_pretrained(name1)

    if "davinci" in name2 or  "gpt-3.5-turbo" in name2:
        model2 = None
        tokenizer2 = None
    else:
        model2 = AutoModelForCausalLM.from_pretrained(name2, return_dict=True, device_map='auto')
        model2.eval()
        tokenizer2 = AutoTokenizer.from_pretrained(name2)
    return model1, model2, tokenizer1, tokenizer2

## Perplexity Calculation

Functions for calculating perplexity using both OpenAI and HuggingFace models.

In [6]:
def calculatePerplexity_gpt3(prompt, modelname):
    """Calculate perplexity using OpenAI's API."""
    prompt = prompt.replace('\x00','')
    responses = None
    api_key = userdata.get("OPENAI_API_KEY")
    client = OpenAI(api_key=api_key)
    # Map old model names to new ones
    model_mapping = {
        "text-davinci-003": "gpt-3.5-turbo-instruct",
        "text-davinci-002": "gpt-3.5-turbo-instruct"
    }
    # modelname = model_mapping.get(modelname, modelname)
    while responses is None:
        try:
            responses = client.completions.create(
                        model=modelname,
                        prompt=prompt,
                        max_tokens=1,
                        temperature=1.0,
                        logprobs=5,
                        echo=False)
        except openai.BadRequestError as e:
            print(f"OpenAI API Error: {str(e)}")
            if "maximum context length" in str(e).lower():
                print("The input text is too long for the model's context window.")
            elif "logprobs" in str(e).lower():
                print("The logprobs parameter is not supported or exceeds the maximum value of 5.")
            else:
                print("Please check the OpenAI API documentation for more details.")
    data = responses.choices[0].logprobs
    all_prob = [d for d in data.token_logprobs if d is not None]
    p1 = np.exp(-np.mean(all_prob))
    return p1, all_prob, np.mean(all_prob)

def calculatePerplexity(sentence, model, tokenizer, gpu):
    """Calculate perplexity using HuggingFace models."""

    input_ids = torch.tensor(tokenizer.encode(sentence)).unsqueeze(0)
    input_ids = input_ids.to(gpu)
    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
    loss, logits = outputs[:2]

    # Apply softmax to the logits to get probabilities
    probabilities = torch.nn.functional.log_softmax(logits, dim=-1)
    all_prob = []
    input_ids_processed = input_ids[0][1:]
    for i, token_id in enumerate(input_ids_processed):
        probability = probabilities[0, i, token_id].item()
        all_prob.append(probability)
    return torch.exp(loss).item(), all_prob, loss.item()

## Inference and Evaluation

Functions for performing inference and evaluating results.

In [7]:
def inference(model1, model2, tokenizer1, tokenizer2, text, ex, modelname1, modelname2):
    """Perform inference using both models and calculate metrics."""
    pred = {}

    if "davinci" in modelname1 or "gpt-3.5-turbo-instruct" in modelname1:
        p1, all_prob, p1_likelihood = calculatePerplexity_gpt3(text, modelname1)
        p_lower, _, p_lower_likelihood = calculatePerplexity_gpt3(text.lower(), modelname1)
    else:
        p1, all_prob, p1_likelihood = calculatePerplexity(text, model1, tokenizer1, gpu=model1.device)
        p_lower, _, p_lower_likelihood = calculatePerplexity(text.lower(), model1, tokenizer1, gpu=model1.device)

    if "davinci" in modelname2 or "gpt-3.5-turbo-instruct" in modelname2:
        p_ref, all_prob_ref, p_ref_likelihood = calculatePerplexity_gpt3(text, modelname2)
    else:
        p_ref, all_prob_ref, p_ref_likelihood = calculatePerplexity(text, model2, tokenizer2, gpu=model2.device)

    # Calculate various metrics
    pred["ppl"] = p1
    pred["ppl/Ref_ppl (calibrate PPL to the reference model)"] = p1_likelihood-p_ref_likelihood
    pred["ppl/lowercase_ppl"] = -(np.log(p_lower) / np.log(p1)).item()
    zlib_entropy = len(zlib.compress(bytes(text, 'utf-8')))
    pred["ppl/zlib"] = np.log(p1)/zlib_entropy

    # Calculate min-k probabilities
    for ratio in [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]:
        k_length = int(len(all_prob)*ratio)
        topk_prob = np.sort(all_prob)[:k_length]
        pred[f"Min_{ratio*100}% Prob"] = -np.mean(topk_prob).item()

    ex["pred"] = pred
    return ex

def evaluate_data(test_data, model1, model2, tokenizer1, tokenizer2, col_name, modelname1, modelname2):
    """Evaluate data using both models."""
    print(f"all data size: {len(test_data)}")
    all_output = []
    for ex in tqdm(test_data):
        text = ex[col_name]
        new_ex = inference(model1, model2, tokenizer1, tokenizer2, text, ex, modelname1, modelname2)
        all_output.append(new_ex)
    return all_output

## Visualization and Metrics

Functions for plotting results and calculating metrics.

In [8]:
def sweep(score, x):
    """Compute ROC curve and return metrics."""
    fpr, tpr, _ = roc_curve(x, -score)
    acc = np.max(1-(fpr+(1-tpr))/2)
    return fpr, tpr, auc(fpr, tpr), acc

def do_plot(prediction, answers, sweep_fn=sweep, metric='auc', legend="", output_dir=None):
    """Generate ROC curves and calculate metrics."""
    fpr, tpr, auc_score, acc = sweep_fn(np.array(prediction), np.array(answers, dtype=bool))
    low = tpr[np.where(fpr<.05)[0][-1]]
    print('Attack %s   AUC %.4f, Accuracy %.4f, TPR@5%%FPR of %.4f\n'%(legend, auc_score, acc, low))

    metric_text = ''
    if metric == 'auc':
        metric_text = 'auc=%.3f'%auc_score
    elif metric == 'acc':
        metric_text = 'acc=%.3f'%acc

    plt.plot(fpr, tpr, label=legend+metric_text)
    return legend, auc_score, acc, low

def fig_fpr_tpr(all_output, output_dir):
    """Generate and save FPR-TPR plots."""
    print("output_dir", output_dir)
    answers = []
    metric2predictions = defaultdict(list)
    for ex in all_output:
        answers.append(ex["label"])
        for metric in ex["pred"].keys():
            if ("raw" in metric) and ("clf" not in metric):
                continue
            metric2predictions[metric].append(ex["pred"][metric])

    plt.figure(figsize=(4,3))
    with open(f"{output_dir}/auc.txt", "w") as f:
        for metric, predictions in metric2predictions.items():
            legend, auc_score, acc, low = do_plot(predictions, answers, legend=metric, metric='auc', output_dir=output_dir)
            f.write('%s   AUC %.4f, Accuracy %.4f, TPR@0.1%%FPR of %.4f\n'%(legend, auc_score, acc, low))

    plt.semilogx()
    plt.semilogy()
    plt.xlim(1e-5,1)
    plt.ylim(1e-5,1)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.plot([0, 1], [0, 1], ls='--', color='gray')
    plt.subplots_adjust(bottom=.18, left=.18, top=.96, right=.96)
    plt.legend(fontsize=8)
    plt.savefig(f"{output_dir}/auc.png")

## Utility Functions

Helper functions for data loading and manipulation.

In [9]:
def load_jsonl(input_path):
    """Load data from a JSONL file."""
    with open(input_path, 'r') as f:
        data = [json.loads(line) for line in tqdm(f)]
    random.seed(0)
    random.shuffle(data)
    return data

def dump_jsonl(data, path):
    """Save data to a JSONL file."""
    with open(path, 'w') as f:
        for line in tqdm(data):
            f.write(json.dumps(line) + "\n")

def read_jsonl(path):
    """Read data from a JSONL file."""
    with open(path, 'r') as f:
        return [json.loads(line) for line in tqdm(f)]

def convert_huggingface_data_to_list_dic(dataset):
    """Convert HuggingFace dataset to list of dictionaries."""
    all_data = []
    for i in range(len(dataset)):
        ex = dataset[i]
        all_data.append(ex)
    return all_data

## Example Usage

Here's how to use the functions above:

In [10]:
# Example usage
if __name__ == "__main__":
    # Set up output directory
    output_dir = "output"
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Load models
    target_model = "gpt-3.5-turbo-instruct"
    ref_model = "huggyllama/llama-7b"
    model1, model2, tokenizer1, tokenizer2 = load_model(target_model, ref_model)

    # Load data
    dataset = load_dataset("swj0419/WikiMIA", split="WikiMIA_length64")
    data = convert_huggingface_data_to_list_dic(dataset)

    # Evaluate
    all_output = evaluate_data(data, model1, model2, tokenizer1, tokenizer2, "input", target_model, ref_model)

    # Plot results
    fig_fpr_tpr(all_output, output_dir)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/594 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]



tokenizer_config.json:   0%|          | 0.00/2.28k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.


README.md:   0%|          | 0.00/1.83k [00:00<?, ?B/s]

(…)-00000-of-00001-fff31bd5ed612836.parquet:   0%|          | 0.00/132k [00:00<?, ?B/s]

(…)-00000-of-00001-e984cf40f6c5b556.parquet:   0%|          | 0.00/92.9k [00:00<?, ?B/s]

(…)-00000-of-00001-6d31a92f6d59bcdc.parquet:   0%|          | 0.00/100k [00:00<?, ?B/s]

(…)-00000-of-00001-c337a02056685c1a.parquet:   0%|          | 0.00/140k [00:00<?, ?B/s]

Generating WikiMIA_length128 split:   0%|          | 0/250 [00:00<?, ? examples/s]

Generating WikiMIA_length256 split:   0%|          | 0/82 [00:00<?, ? examples/s]

Generating WikiMIA_length32 split:   0%|          | 0/776 [00:00<?, ? examples/s]

Generating WikiMIA_length64 split:   0%|          | 0/542 [00:00<?, ? examples/s]

all data size: 542


  0%|          | 0/542 [00:01<?, ?it/s]


NameError: name 'openai' is not defined