This is an extension to the main task-1. It covers how information-theoretic concepts (specifically perplexity) and geometric concepts (local intrinsic dimension) can be used to differentiate between AI-generated and human-written text.

# V. Information-Theoretic Signatures

Information theory quantifies predictability. AI is very predictable. Or it should be. That's what we're going to see here.

Before getting started on the specifics, [this blog](https://kuiper2000.github.io/chaos_and_predictability/week9/week9) explains the basics of information theory, and how it ties into predictability quite well. I enjoyed reading it.

I also want to preface this by saying that I am by no means an expert in information theory. I learned a bit for the purposes of this task, and my domain knowledge is limited to that. I also thank the Infosec class and TAs for hinting at the basics of this, which led me down this rabbit hole.

## Perplexity

[This blog](https://medium.com/nlplanet/two-minutes-nlp-perplexity-explained-with-simple-probabilities-6cdc46884584) explains perplexity really nicely through a simple example.

In general, it is a measure of how well a language model predicts a given piece of text. Or, in the sense of the word itself, how **perplexed** a model is when reading text. Lower perplexity => less perplexed, higher perplexity => more perplexed.

- Because AI generates text by selecting high-probability tokens, the resulting text has statistically low perplexity. It follows the "path of least resistance." However, since our temperature is relatively high (set to 1), the perplexity may be higher than that of typical AI-generated text.
- Human writing, especially that of famous authors, is replete with unexpected choices: creative metaphors, sudden topic shifts, and idiomatic expressions that statistically defy the model's predictive expectations.


### The maths behind perplexity

This section is inspired by [Fabio Chiusano's Medium post](https://medium.com/nlplanet/two-minutes-nlp-perplexity-explained-with-simple-probabilities-6cdc46884584).

A language model predicts text **one word at a time** (strictly speaking, one token at a time; whole words are used here to keep the example simple).

For a sentence like:

> **"a red fox."**

the model assigns probabilities like:

- P("a")
- P("red" | "a")
- P("fox" | "a red")
- P("." | "a red fox")

To get the probability of the **whole sentence**, we multiply:

$$
P(W) = P(w_1) \times P(w_2|w_1) \times \dots \times P(w_n|w_1,\dots,w_{n-1})
$$

Example:

$$
P(\text{"a red fox."}) = 0.4 \times 0.27 \times 0.55 \times 0.79 = 0.0469
$$
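
The multiplication above can be checked with a few lines of Python. This is only a minimal sketch: the four probabilities are the illustrative values from the example, not real GPT-2 outputs, and the variable names are my own.

```python
import math

# Conditional probabilities from the "a red fox." example above
# (illustrative values, not produced by a real model).
token_probs = [0.4, 0.27, 0.55, 0.79]

# Chain rule: multiply the conditional probabilities together.
sentence_prob = math.prod(token_probs)

# The same computation in log space. Summing logs avoids numerical
# underflow when a text has hundreds of tokens.
log_prob = sum(math.log(p) for p in token_probs)

print(f"P(W)     = {sentence_prob:.4f}")
print(f"log P(W) = {log_prob:.4f}")
```

Working with log-probabilities instead of raw products is the usual choice in practice, since the product of many small numbers quickly becomes too small to represent.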

**Problem: longer sentences always get smaller probabilities**

Multiplying many numbers **smaller than 1** makes the result very small.

So:
- Long sentences → tiny probabilities  
- Short sentences → bigger probabilities  

This makes comparisons **unfair**.

---

To remove the effect of sentence length, we **average** the probabilities using the **geometric mean**.

For a sentence with `n` words:

$$
P_{\text{norm}}(W) = P(W)^{1/n}
$$

Example:

$$
P_{\text{norm}}(\text{"a red fox."}) = 0.0469^{1/4} = 0.465
$$

This means that on average, the model assigns a probability of about **46.5%** to each word.
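
To see why this normalization makes comparisons fairer, here is a small sketch. It assumes two made-up sentences (4 and 40 words) in which the model gives every word a probability of 0.5; the function name `normalized_probability` is hypothetical and only used for illustration.

```python
import math

def normalized_probability(token_probs):
    """Geometric mean of the per-word probabilities, i.e. P(W) ** (1 / n)."""
    n = len(token_probs)
    return math.exp(sum(math.log(p) for p in token_probs) / n)

# Made-up sentences: the model is equally confident about every word.
short_sentence = [0.5] * 4
long_sentence = [0.5] * 40

# Raw probabilities differ by many orders of magnitude...
print(math.prod(short_sentence), math.prod(long_sentence))

# ...but the normalized probabilities are the same (about 0.5 each).
print(normalized_probability(short_sentence), normalized_probability(long_sentence))

# The "a red fox." example from above.
fox = [0.4, 0.27, 0.55, 0.79]
print(round(normalized_probability(fox), 3))
```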

---

Perplexity is just the **inverse** of this normalized probability:

$$
\text{Perplexity}(W) = \frac{1}{P_{\text{norm}}(W)}
$$

or  

$$
\text{Perplexity}(W) = \left(\frac{1}{P(W)}\right)^{1/n}
$$

Example:

$$
\text{PP} = \frac{1}{0.465} \approx 2.15
$$
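
The same quantity can also be written in log form, which is how it is usually computed in practice. A short derivation, using only the definitions above and assuming natural logarithms:

$$
\text{Perplexity}(W) = P(W)^{-1/n} = \exp\left(-\frac{1}{n}\log P(W)\right) = \exp\left(-\frac{1}{n}\sum_{i=1}^{n}\log P(w_i|w_1,\dots,w_{i-1})\right)
$$

The term inside the exponential is the average negative log-probability per word, which is the cross-entropy loss of the model on the sentence. For the example:

$$
\text{PP} = \exp\left(-\tfrac{1}{4}\ln 0.0469\right) \approx \exp(0.765) \approx 2.15
$$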

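### Interpretation
- Perplexity ≈ 2 => the model feels like it's choosing between **2 reasonable words** at each step

So a model that predicts the text poorly (spreading its probability over many candidate words) has high perplexity, while a model that predicts it well has a perplexity close to 1.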
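
The "choosing between k words" reading can be verified directly: if a model spreads its probability evenly over k candidates at every step, its perplexity is exactly k. The sketch below assumes that idealized uniform model; `perplexity` is a helper written for this illustration, not the function used later in the notebook.

```python
import math

def perplexity(token_probs):
    """Inverse geometric mean of the per-word probabilities."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# The "a red fox." example: roughly 2.15.
print(round(perplexity([0.4, 0.27, 0.55, 0.79]), 2))

# Idealized model that is uniformly unsure between k words at each of
# 6 steps: the perplexity comes out as k (up to floating-point error).
for k in (2, 10, 100):
    print(k, perplexity([1 / k] * 6))
```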

### How we are calculating the Perplexity in code

Perplexity calculation happens in 4 main steps:
1. Tokenization: The text is converted into token IDs that GPT-2 understands.
2. Teacher forcing: The key is `labels=input_ids`. This tells GPT-2:
   - here is your input sequence
   - now predict each subsequent token, given the true tokens before it
   - calculate the loss

3. Cross-entropy loss: GPT-2 calculates its loss (how wrong each prediction was).
   - At each position, it predicts a probability distribution over all possible next tokens.
   - The loss is the negative log of the probability it assigned to the actual next token, averaged over the sequence.

4. Finally, we convert the loss to perplexity: `perplexity = exp(loss)`.
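
Steps 3 and 4 can be sketched without GPT-2 at all. The example below is a toy version under stated assumptions: a hypothetical 4-token vocabulary, hand-written scores (logits) for 3 prediction steps, and helper names (`softmax`, `cross_entropy`) of my own. It mirrors what the model's loss computes, but it is not the Hugging Face implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_step, target_ids):
    """Mean negative log-probability assigned to the actual next tokens."""
    losses = []
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical toy data: 4-token vocabulary, 3 prediction steps.
toy_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 1.5, 0.3, 0.2],
    [0.1, 0.1, 3.0, 0.1],
]
toy_targets = [0, 1, 2]  # index of the token that actually came next

loss = cross_entropy(toy_logits, toy_targets)
print(f"loss = {loss:.4f}, perplexity = {math.exp(loss):.4f}")
```

One detail worth keeping in mind: when labels are passed to `GPT2LMHeadModel`, the model shifts them internally, so the first token is never predicted and the loss is averaged over the remaining positions.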


**IMPORTANT NOTE:** Perplexity is also affected by temperature. As temperature increases, the AI's sampling becomes more random, and so perplexity increases significantly as well. We use a temperature of 1.0 for the creation of class 2 and class 3. This will result in higher perplexity than expected, but it will still mostly be lower than that of humans. Generally, AI achieves human levels of perplexity only at temperatures >1.5. [Source - Peeperkorn et al., 2024](https://arxiv.org/html/2405.00492v1).
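
The mechanism behind this can be illustrated with a toy example. The sketch below assumes the standard way temperature is applied (dividing the logits by the temperature before the softmax) and uses made-up scores for five candidate tokens. It only shows how the sampling distribution flattens as temperature rises; it does not reproduce the measurements from the cited paper, and the perplexity GPT-2 assigns to the resulting text is a separate quantity.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: higher means a less predictable choice."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Made-up next-token scores for five candidate tokens.
toy_logits = [3.0, 1.5, 0.5, 0.0, -1.0]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    h = entropy(probs)
    print(f"T={t}: top prob = {max(probs):.3f}, entropy = {h:.3f} nats")
```

As the temperature goes up, the top token loses probability mass to the alternatives, so sampled text contains more low-probability tokens and tends to score a higher perplexity.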

In [None]:
from pathlib import Path

# Mount Google Drive
try:
    from google.colab import drive
    drive.mount('/content/drive')
    # Adjust this path to where you stored the dataset in Google Drive
    DATASET_DIR = Path('/content/drive/MyDrive/precog-my-dataset/dataset')
except ImportError:
    DATASET_DIR = Path('../dataset')
    print(f"Running locally")

print(f"Using dataset directory: {DATASET_DIR.resolve()}")

# Verify the path exists
if DATASET_DIR.exists():
    print(f"Dataset directory found!")
else:
    print(f"Dataset directory NOT found at {DATASET_DIR.resolve()}")
    print(f"Please upload your dataset folder to Google Drive and adjust the path above.")

Using dataset directory: /dataset


In [2]:
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import numpy as np
import statistics
from tqdm.auto import tqdm

print("Loading GPT-2 model and tokenizer...")
print("(This may take a bit if downloading for the first time)")

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

try:
    # Load model and tokenizer
    model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
    gpt2_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    
    # Set padding token (GPT-2 doesn't have one by default)
    gpt2_tokenizer.pad_token = gpt2_tokenizer.eos_token
    
    model.eval()
    print("Model loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")
    print("\nTip: If you're getting timeout errors, try running:")
    print("  export HF_HUB_DOWNLOAD_TIMEOUT=300")
    print("in your terminal before running this cell, or download manually.")

def calculate_perplexity_batch(texts, batch_size=8):
    """Calculate perplexity for a batch of texts"""
    perplexities = []
    
    for i in tqdm(range(0, len(texts), batch_size), desc="Processing batches"):
        batch = texts[i:i+batch_size]
        try:
            # Tokenize batch
            encodings = gpt2_tokenizer(
                batch, 
                return_tensors="pt", 
                truncation=True, 
                max_length=1024,
                padding=True
            )
            input_ids = encodings.input_ids.to(device)
            attention_mask = encodings.attention_mask.to(device)
            
            with torch.no_grad():
                # Calculate per-sample loss
                for j in range(len(batch)):
                    sample_input_ids = input_ids[j:j+1]
                    sample_attention_mask = attention_mask[j:j+1]
                    
                    # Only calculate loss on non-padded tokens
                    # Create labels with -100 for padding tokens (ignored in loss)
                    labels = sample_input_ids.clone()
                    labels[sample_attention_mask == 0] = -100
                    
                    sample_outputs = model(
                        sample_input_ids, 
                        attention_mask=sample_attention_mask, 
                        labels=labels
                    )
                    loss = sample_outputs.loss
                    perplexity = torch.exp(loss).item()
                    perplexities.append(perplexity)
        except Exception as e:
            print(f"Error in batch processing: {e}")
            # Fallback to individual processing for this batch
            for text in batch:
                try:
                    encodings = gpt2_tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
                    input_ids = encodings.input_ids.to(device)
                    
                    with torch.no_grad():
                        outputs = model(input_ids, labels=input_ids)
                        loss = outputs.loss
                        perplexity = torch.exp(loss).item()
                    perplexities.append(perplexity)
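                except Exception as inner_e:
                    # Record a failed sample as None; it is filtered out later
                    print(f"Error processing single text: {inner_e}")
                    perplexities.append(None)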
    
    return perplexities

def read_file_safe(file_path):
    """Safely read a file and return its content"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()
        return text.strip() if text else None
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
        return None

def get_text_files_from_paths(path_list):
    """Get all .txt files from a list of directory paths"""
    txt_files = []
    for path in path_list:
        if path.exists():
            for file in path.glob('*.txt'):
                txt_files.append(file)
        else:
            print(f"Warning: Path does not exist: {path}")
    return txt_files

# Define paths matching the structure from task-1.ipynb
# Class 1: Human-written
class1_paths = [
    DATASET_DIR / 'class1-human-written' / '01-arthur-conan-doyle' / 'extracted_paragraphs',
    DATASET_DIR / 'class1-human-written' / '02-pg-wodehouse' / 'extracted_paragraphs',
    DATASET_DIR / 'class1-human-written' / '03-mark-twain' / 'extracted_paragraphs',
    DATASET_DIR / 'class1-human-written' / '04-william-shakespeare' / 'extracted_paragraphs'
]

# Class 2: AI-written
class2_paths = [
    DATASET_DIR / 'class2-ai-written' / 'ai-generated-paragraphs'
]

# Class 3: AI-mimicry
class3_paths = [
    DATASET_DIR / 'class3-ai-mimicry' / '01-arthur-conan-doyle',
    DATASET_DIR / 'class3-ai-mimicry' / '02-pg-wodehouse',
    DATASET_DIR / 'class3-ai-mimicry' / '03-mark-twain',
    DATASET_DIR / 'class3-ai-mimicry' / '04-william-shakespeare'
]

perplexity_results = {}

for class_name, class_paths in [("Class 1 (Human)", class1_paths), 
                                 ("Class 2 (AI)", class2_paths), 
                                 ("Class 3 (AI Mimicry)", class3_paths)]:
    print(f"\n{'='*70}")
    print(f"Processing {class_name}...")
    print('='*70)
    
    # Get all text files
    text_files = get_text_files_from_paths(class_paths)
    print(f"Found {len(text_files)} text files")
    
    if len(text_files) == 0:
        print(f"No files found. Please check the paths:")
        for path in class_paths:
            print(f"  - {path} (exists: {path.exists()})")
        continue
    
    # Read all files (without multiprocessing - doesn't work well in notebooks)
    print(f"Reading files...")
    all_texts = []
    for file_path in tqdm(text_files, desc="Reading files"):
        text = read_file_safe(file_path)
        if text:
            all_texts.append(text)
    
    print(f"Read {len(all_texts)} valid text files")
    
    if all_texts:
        print(f"Calculating perplexities in batches of 8...")
        # Process in batches of 8
        perplexities = calculate_perplexity_batch(all_texts, batch_size=8)
        
        # Filter out None values
        perplexities = [p for p in perplexities if p is not None]
        
        if perplexities:
            mean_perplexity = statistics.mean(perplexities)
            median_perplexity = statistics.median(perplexities)
            
            perplexity_results[class_name] = {
                'mean': mean_perplexity,
                'median': median_perplexity,
                'count': len(perplexities)
            }
            
            print(f"Mean Perplexity: {mean_perplexity:.2f}")
            print(f"Median Perplexity: {median_perplexity:.2f}")

print("\n" + "="*70)
print("PERPLEXITY SUMMARY")
print("="*70)
print(f"{'Class':<25} {'Mean':<15} {'Median':<15} {'Files':<10}")
print("-"*70)
for class_name, stats in perplexity_results.items():
    print(f"{class_name:<25} {stats['mean']:<15.2f} {stats['median']:<15.2f} {stats['count']:<10}")
print("="*70)

Loading GPT-2 model and tokenizer...
(This may take a bit if downloading for the first time)
Using device: cuda



GPT2LMHeadModel LOAD REPORT from: gpt2
Key                  | Status     |  | 
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.



Model loaded successfully!

Using 1 worker processes for file reading

Processing Class 1 (Human)...
Found 0 text files
Reading files in parallel using 1 workers...
Read 0 valid text files

Processing Class 2 (AI)...
Found 0 text files
Reading files in parallel using 1 workers...
Read 0 valid text files

Processing Class 3 (AI Mimicry)...
Found 0 text files
Reading files in parallel using 1 workers...
Read 0 valid text files

PERPLEXITY SUMMARY
Class                     Mean            Median          Files     
----------------------------------------------------------------------
