# Hey, That's My Data! - A Jupyter Notebook for CatShift Analysis

This notebook provides a conceptual, step-by-step implementation of the **CatShift** framework, based on the paper: *Hey, That's My Data! Label-Only Dataset Inference in Large Language Models* ([arxiv.org/pdf/2506.06057](https://arxiv.org/pdf/2506.06057)).

The goal of CatShift is to determine whether a Large Language Model (LLM) was trained on a specific "suspect" dataset, using only the text it returns through its public-facing API (the label-only setting, with no access to logits or token probabilities) and its fine-tuning capabilities.

### How to Use This Notebook:
1.  **Replace Placeholders:** The functions in the *Mock/Placeholder Functions* section need to be implemented with real code that calls your chosen LLM's API (e.g., OpenAI, Anthropic) or a local library (e.g., Hugging Face Transformers).
2.  **Load Real Data:** Replace the mock dictionaries in the *Data Preparation* section with your actual suspect and validation datasets.
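3.  **Manage API Keys and Costs:** Fine-tuning commercial models via an API incurs costs, and the pipeline below fine-tunes twice (once on the suspect data and once on the validation data). Keep API keys out of the notebook itself.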
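
For the key-handling point in step 3, one common pattern is to read the key from an environment variable and fail early if it is missing. The sketch below is only an illustration: the function name `load_api_key` and the variable name `LLM_API_KEY` are hypothetical, so substitute whatever your provider's client expects.

```python
import os


def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing early if it is unset.

    `LLM_API_KEY` is a placeholder name, not one any provider requires.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the fine-tuning cells."
        )
    return key
```

Calling `load_api_key()` at the top of the notebook surfaces a missing key before any paid fine-tuning job is submitted.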

With those pieces replaced, the notebook can serve as a starting point for your own CatShift analysis of potential dataset usage in large language models.

## 1. Setup and Imports

First, we import the necessary libraries. `numpy` is used for numerical operations, `scipy.stats.ks_2samp` for the statistical test, and `matplotlib` for visualizing the results.

In [None]:
import numpy as np
from scipy.stats import ks_2samp
import matplotlib.pyplot as plt

# For a real implementation, you would uncomment and use libraries like these:
# from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
# from sentence_transformers import SentenceTransformer, util
# import openai

## 2. Mock/Placeholder Functions

In a real-world scenario, these functions would interact with a live LLM. For this demonstration, they are placeholders that simulate the behavior described in the paper.

In [None]:
def get_model_and_scorer():
    """
    Placeholder for loading a pretrained LLM and a sentence similarity model.
    """
    print("INFO: Loading base LLM and similarity scorer (conceptual)...")
    mock_model = "base_model"
    mock_similarity_scorer = "scorer"
    print("INFO: Models loaded.")
    return mock_model, mock_similarity_scorer

def get_completions(model_id, dataset):
    """
    Placeholder for generating top-1 text completions from a given model.
    """
    print(f"INFO: Generating completions from model '{model_id}'...")
    completions = []
    for item in dataset:
        prompt = item['prompt']
        if "base" in model_id:
            completion = f"{prompt} ...and so on."
        elif "suspect" in model_id and "old book" in prompt:
            completion = f"{prompt} The quick brown fox jumps over the lazy dog."
        elif "validation" in model_id and "new article" in prompt:
            completion = f"{prompt} The latest discovery is astounding."
        else:
            completion = f"{prompt} This is a new completion."
        completions.append(completion)
    return completions

def fine_tune_model(base_model, training_dataset, new_model_id):
    """
    Placeholder for the fine-tuning process.
    """
    print(f"INFO: Fine-tuning '{base_model}' on {len(training_dataset)} samples to create '{new_model_id}'...")
    print(f"INFO: Fine-tuning complete. New model ID is '{new_model_id}'.")
    return new_model_id

def compute_similarity_scores(completions_before, completions_after, scorer):
    """
    Placeholder for computing similarity between two lists of text.
    """
    print("INFO: Computing similarity scores...")
    scores = [np.random.uniform(low=0.7, high=0.95) for _ in completions_before]
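    # Heuristically simulate a larger output shift (lower similarity) for member data.
    # The emptiness check avoids an IndexError when no completions were generated.
    if completions_after and "The quick brown fox" in completions_after[0]:
        scores = [s - np.random.uniform(0.3, 0.5) for s in scores]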
    return scores
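
As a rough idea of what non-mock versions of two of these placeholders could look like, the sketch below takes the generation step as an injected callable and scores each before/after pair with `difflib.SequenceMatcher` from the standard library. Both function names and the `generate_fn` parameter are hypothetical, and the character-level ratio is only a dependency-free stand-in: it is not a semantic metric and is not claimed to be the scorer used in the paper.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def get_completions_with(generate_fn: Callable[[str], str],
                         dataset: List[Dict[str, str]]) -> List[str]:
    """Collect one top-1 completion per prompt.

    `generate_fn` is a hypothetical callable wrapping whatever API or local
    model you use; it takes a prompt and returns the generated text.
    """
    return [generate_fn(item["prompt"]) for item in dataset]


def lexical_similarity_scores(before: List[str], after: List[str]) -> List[float]:
    """Score each before/after pair in [0, 1] using difflib's ratio.

    This compares characters, not meaning, so treat it as a placeholder
    for a proper sentence-similarity model.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return [SequenceMatcher(None, b, a).ratio() for b, a in zip(before, after)]


# Usage with toy generators standing in for the base and fine-tuned models.
prompts = [{"prompt": "From the old book, chapter 1:"}]
before = get_completions_with(lambda p: p + " ...and so on.", prompts)
after = get_completions_with(lambda p: p + " The quick brown fox.", prompts)
scores = lexical_similarity_scores(before, after)
```

Injecting the generator keeps the scoring code independent of any particular provider, which makes it easier to swap between an API client and a local model.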

## 3. The CatShift Analysis Pipeline

This function executes the core logic of CatShift: generate, fine-tune, generate again, and compare.

In [None]:
def run_catshift_analysis(base_model, scorer, suspect_dataset, validation_dataset):
    """
    Executes the full CatShift pipeline on a suspect dataset.
    """
    print("\n--- Starting CatShift Analysis for SUSPECT Dataset ---")
    completions_before_suspect = get_completions(base_model, suspect_dataset['test'])
    finetuned_model_suspect = fine_tune_model(base_model, suspect_dataset['train'], 'finetuned_model_suspect')
    completions_after_suspect = get_completions(finetuned_model_suspect, suspect_dataset['test'])
    suspect_scores = compute_similarity_scores(completions_before_suspect, completions_after_suspect, scorer)
    print(f"SUSPECT SET: Average similarity score = {np.mean(suspect_scores):.4f}")

    print("\n--- Establishing Baseline using VALIDATION Dataset ---")
    completions_before_validation = get_completions(base_model, validation_dataset['test'])
    finetuned_model_validation = fine_tune_model(base_model, validation_dataset['train'], 'finetuned_model_validation')
    completions_after_validation = get_completions(finetuned_model_validation, validation_dataset['test'])
    validation_scores = compute_similarity_scores(completions_before_validation, completions_after_validation, scorer)
    print(f"VALIDATION SET: Average similarity score = {np.mean(validation_scores):.4f}")

    print("\n--- HYPOTHESIS TESTING ---")
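    # ks_2samp is two-sided by default: it flags any difference between the two
    # score distributions, not only a shift toward lower similarity.
    statistic, p_value = ks_2samp(suspect_scores, validation_scores)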
    print(f"KS Statistic: {statistic:.4f}")
    print(f"P-value: {p_value:.6f}")
    
    return p_value, suspect_scores, validation_scores
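
To make the hypothesis-testing step less of a black box, here is a minimal pure-Python sketch of the quantity the two-sample Kolmogorov-Smirnov test is built on: the largest gap between the two empirical CDFs. The function name `ks_statistic` is hypothetical; it computes only the statistic, not the p-value, and should agree with the statistic `ks_2samp` reports for the same inputs.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Fully separated score lists give the maximum possible statistic.
separated = ks_statistic([0.2, 0.3, 0.4], [0.8, 0.9])  # 1.0
identical = ks_statistic([1, 2, 3], [1, 2, 3])          # 0.0
```

A statistic near 1 means the suspect and validation scores barely overlap, while a value near 0 means the two distributions are practically indistinguishable.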

## 4. Data Preparation

Here, we set up our mock datasets. 
- The `suspect_member_dataset` simulates data that the LLM *was* trained on.
- The `known_non_member_dataset` simulates data that the LLM has *never* seen.

In [None]:
# This dataset simulates data that the LLM *was* trained on.
# The fine-tuning process will cause a "recovery" of forgotten knowledge,
# leading to a large output shift (and thus low similarity scores).
suspect_member_dataset = {
    'train': [{'prompt': f'From the old book, chapter {i}:'} for i in range(50)],
    'test': [{'prompt': f'From the old book, chapter {i+50}:'} for i in range(20)]
}

# This dataset simulates data that the LLM was *not* trained on.
# The fine-tuning process introduces entirely new knowledge,
# leading to a smaller, more moderate output shift.
known_non_member_dataset = {
    'train': [{'prompt': f'From the new article, section {i}:'} for i in range(50)],
    'test': [{'prompt': f'From the new article, section {i+50}:'} for i in range(20)]
}
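
When the mock dictionaries are swapped for real records, each dataset still needs the `{'train': [...], 'test': [...]}` layout that `run_catshift_analysis` indexes into. The sketch below shows one way to produce it with a seeded shuffle. The names `split_dataset`, `test_fraction`, and `seed` are hypothetical, and the split ratio is an arbitrary choice rather than a value taken from the paper.

```python
import random
from typing import Dict, List


def split_dataset(records: List[dict],
                  test_fraction: float = 0.3,
                  seed: int = 0) -> Dict[str, List[dict]]:
    """Shuffle records reproducibly and split them into train and test parts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to form both splits")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


# Usage: 100 toy records split 80/20.
records = [{"prompt": f"From the old book, chapter {i}:"} for i in range(100)]
dataset = split_dataset(records, test_fraction=0.2, seed=42)
```

Keeping the test prompts disjoint from the fine-tuning prompts mirrors the mock setup above, where the `test` chapters are never part of `train`.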

## 5. Execution and Interpretation

Now, we run the full analysis and interpret the results based on the p-value.

In [None]:
# Load the models (conceptually)
base_llm, similarity_scorer = get_model_and_scorer()

# Run the analysis
p_value_result, suspect_scores, validation_scores = run_catshift_analysis(
    base_llm, 
    similarity_scorer, 
    suspect_member_dataset, 
    known_non_member_dataset
)

# --- INTERPRETATION ---
# The paper uses a significance level (alpha) of 0.1
alpha = 0.1
print("\n--- FINAL CONCLUSION ---")
if p_value_result < alpha:
    print(f"Result: P-value ({p_value_result:.6f}) is less than alpha ({alpha}).")
    print("Conclusion: We REJECT the null hypothesis.")
    print("This suggests the suspect dataset's output shift is SIGNIFICANTLY DIFFERENT from the non-member baseline.")
    print("Therefore, the suspect dataset is LIKELY a MEMBER of the original training data.")
else:
    print(f"Result: P-value ({p_value_result:.6f}) is not less than alpha ({alpha}).")
    print("Conclusion: We FAIL to reject the null hypothesis.")
    print("There is no strong statistical evidence to distinguish the suspect dataset from a non-member.")
    print("This is an absence of evidence for membership, not proof that the dataset was never used in training.")
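
Because the KS test used here is two-sided, a small p-value only says the two score distributions differ. The sketch below adds a direction check before labelling a result as member-like: the suspect scores must also be lower on average, meaning a larger output shift. The function name `interpret_result` and its return labels are hypothetical, and comparing means is a simplification chosen for illustration, not a rule from the paper.

```python
from statistics import mean
from typing import Sequence


def interpret_result(p_value: float,
                     suspect_scores: Sequence[float],
                     validation_scores: Sequence[float],
                     alpha: float = 0.1) -> str:
    """Combine the significance test with the direction of the shift."""
    significant = p_value < alpha
    # Lower similarity after fine-tuning means a larger output shift.
    larger_shift = mean(suspect_scores) < mean(validation_scores)
    if significant and larger_shift:
        return "member-like"
    if significant:
        return "different-but-not-member-like"
    return "inconclusive"


# Usage with hand-picked toy values.
verdict = interpret_result(0.001, [0.3, 0.4], [0.8, 0.9])
```

Returning "inconclusive" rather than "non-member" keeps the output consistent with what failing to reject a null hypothesis can actually support.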

## 6. Visualization

Let's plot histograms of the two similarity score distributions. With the mock scorer above, the two should separate clearly; scores from a real model may overlap far more.

- **Validation Scores (Blue)**: Should be higher, indicating less change after fine-tuning.
- **Suspect Scores (Red)**: Should be lower, indicating a significant change (a "shift") as the model "recovers" forgotten knowledge.

In [None]:
plt.figure(figsize=(10, 6))
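# Share one set of bin edges so the two histograms are directly comparable.
bins = np.histogram_bin_edges(np.concatenate([validation_scores, suspect_scores]), bins=15)
plt.hist(validation_scores, bins=bins, alpha=0.7, label='Validation (Non-Member) Scores', color='blue', density=True)
plt.hist(suspect_scores, bins=bins, alpha=0.7, label='Suspect (Member) Scores', color='red', density=True)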

plt.title('Similarity Between Pre- and Post-Fine-Tuning Completions')
plt.xlabel('Similarity Score (Lower score = bigger output shift)')
plt.ylabel('Density')
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
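
A histogram of only 20 scores per group can be hard to read, so a short numeric summary may be a useful companion to the plot. The sketch below is one possible version using only the standard library; `summarize_scores` and `distributions_overlap` are hypothetical helper names and are not part of CatShift itself.

```python
from statistics import median, quantiles
from typing import Dict, Sequence


def summarize_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Five-number summary of similarity scores (needs at least two values)."""
    q1, _, q3 = quantiles(scores, n=4)
    return {
        "min": min(scores),
        "q1": q1,
        "median": median(scores),
        "q3": q3,
        "max": max(scores),
    }


def distributions_overlap(suspect: Sequence[float],
                          validation: Sequence[float]) -> bool:
    """True if the ranges of the two score lists share any interval."""
    return max(suspect) >= min(validation) and max(validation) >= min(suspect)


# Usage with toy values.
summary = summarize_scores([0.1, 0.2, 0.3, 0.4, 0.5])
overlap = distributions_overlap([0.2, 0.4], [0.7, 0.9])
```

Printing `summarize_scores(suspect_scores)` next to `summarize_scores(validation_scores)` gives a quick check on whether the separation seen in the plot is as wide as it looks.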