# Perplexity ratio score: Kullback–Leibler divergence

In [1]:
# Change working directory to parent so we can import as we would from main.py
%cd ..

import numpy as np
from IPython.display import Image

import configuration as config
import functions.kullback_leibler_divergence as kld_funcs

/mnt/arkk/llm_detector/classifier


The plan is to take our sampling distributions of perplexity ratio (PR) scores for human and synthetic text and use them to build a function that converts a PR score into a Kullback-Leibler divergence (KLD) score. See the figure below from the [Wikipedia article on KLD](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence).
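For reference, the standard definition of KLD for two continuous distributions with densities $p$ and $q$ is given here, using a base-2 logarithm to match the `log2` that appears in the run output. Which of the human and synthetic distributions plays the role of $P$ and which of $Q$ is not stated in this notebook, so the assignment is left open:

$$
D_{\mathrm{KL}}(P \parallel Q) = \int p(x) \, \log_2 \frac{p(x)}{q(x)} \, dx
$$

The integrand $p(x) \log_2 \bigl(p(x)/q(x)\bigr)$ is the pointwise contribution at PR score $x$. It can be negative wherever $p(x) < q(x)$, even though the integral as a whole is never negative.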

The workflow is as follows:
1. Get a kernel density estimate of the PR score distribution for human and synthetic text fragments in the training data.
2. Calculate the KLD between the human and synthetic PR score distributions.
3. Get a kernel density estimate of the KLD.
4. Use the probability density function of the KLD kernel density estimate to calculate a KLD score for each text fragment in the training and testing data.
5. Add the KLD score as a new feature.

The above will be done individually for each fragment length bin and for the combined data. This way, the KLD score feature in each bin will capture the PR score distribution for text fragments in that specific length regime, rather than for the whole dataset.
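The workflow above can be sketched for a single bin as follows. This is a minimal illustration under stated assumptions, not the code in `functions.kullback_leibler_divergence`: the score arrays are randomly generated stand-ins, the helper names `pointwise_kld` and `kld_kde` are hypothetical, and `padding` and `sample_frequency` are assumed to mean the margin added to the score range and the grid spacing. Step 3 is one possible reading of "kernel density estimate of KLD": the pointwise KLD values are shifted to be non-negative and used as weights for a KDE over the score grid.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical stand-ins for PR scores; the real data lives in the HDF5 file
human_scores = rng.normal(loc=1.0, scale=0.10, size=2000)
synthetic_scores = rng.normal(loc=0.85, scale=0.10, size=2000)


def pointwise_kld(p_scores, q_scores, padding=0.1, sample_frequency=0.001):
    """Steps 1 and 2: KDE of each distribution, then pointwise KLD on a grid."""
    p_kde = gaussian_kde(p_scores)
    q_kde = gaussian_kde(q_scores)

    # Assumed meaning: pad the observed score range, sample at a fixed spacing
    low = min(p_scores.min(), q_scores.min()) - padding
    high = max(p_scores.max(), q_scores.max()) + padding
    x = np.arange(low, high, sample_frequency)

    # Clip densities away from zero so the logarithms stay finite
    tiny = np.finfo(float).tiny
    p = np.clip(p_kde(x), tiny, None)
    q = np.clip(q_kde(x), tiny, None)

    return x, p * (np.log2(p) - np.log2(q))


def kld_kde(x, kld):
    """Step 3 (assumed interpretation): KDE over x, weighted by shifted KLD."""
    weights = kld - kld.min()  # shift so that all weights are non-negative
    return gaussian_kde(x, weights=weights)


# Step 4: evaluate the KLD density at each fragment's PR score
x, kld = pointwise_kld(human_scores, synthetic_scores)
score_kde = kld_kde(x, kld)
kld_scores = score_kde(human_scores)
```

Step 5 would then store `kld_scores` as a new column alongside the existing features. Summing `kld` and multiplying by the grid spacing gives a numerical estimate of the total divergence between the two estimated densities.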

In [2]:
Image(url = 'https://raw.githubusercontent.com/gperdrizet/llm_detector/benchmarking/benchmarking/notebooks/images/KL-Gauss-Example.png')

In [4]:
# The dataset we want to bin - omit the file extension, it will be 
# added appropriately for the input and output files
dataset_name = 'falcon-7b_scores_v2_10-1000_words'

# Input file path
input_file = f'{config.DATA_PATH}/{dataset_name}.h5'

# Option to sample 10% of the data for rapid testing and development
sample = False

# The length bins
bins = {
    'combined': [0, np.inf],
    'bin_100': [1, 100],
    'bin_150': [51, 150],
    'bin_200': [101, 200],
    'bin_250': [151, 250],
    'bin_300': [201, 300],
    'bin_350': [251, 350],
    'bin_400': [301, 400],
    'bin_450': [351, 450],
    'bin_500': [401, 500],
    'bin_600': [451, 600]
}


# Run the Kullback-Leibler score calculation on the
# perplexity ratio score
kld_funcs.kullback_leibler_score(
    feature_name = 'Perplexity ratio score',
    hdf5_file = input_file,
    padding = 0.1,
    sample_frequency = 0.001,
    score_sample = sample
)


Worker 0 - 8423 fragments in bin_100
Worker 1 - 7968 fragments in bin_150
Worker 2 - 7000 fragments in bin_200
Worker 3 - 6095 fragments in bin_250

  kld_value = i * log2(i/j)



Worker 4 - 5283 fragments in bin_300
Worker 5 - 4193 fragments in bin_350
Worker 6 - 2582 fragments in bin_400
Worker 7 - 1162 fragments in bin_450
Worker 8 - 441 fragments in bin_500
Worker 9 - 327 fragments in bin_600
Worker 0 - adding Kullback-Leibler score to training features
Worker 10 - 23937 fragments in combined
Worker 1 - adding Kullback-Leibler score to training features
Worker 2 - adding Kullback-Leibler score to training features
Worker 3 - adding Kullback-Leibler score to training features
Worker 4 - adding Kullback-Leibler score to training features
Worker 5 - adding Kullback-Leibler score to training features
Worker 6 - adding Kullback-Leibler score to training features
Worker 7 - adding Kullback-Leibler score to training features
Worker 9 - adding Kullback-Leibler score to training features
Worker 8 - adding Kullback-Leibler score to training features
Worker 10 - adding Kullback-Leibler score to training features
Worker 9 - adding Kullback-Leibler score to testing feat