# Perplexity ratio score: Kullback–Leibler divergence

The plan is to take our sampling distributions of perplexity ratio (PR) scores for human and synthetic text and use them to build a function that converts a PR score into a Kullback-Leibler divergence (KLD) score. See the figure below from the [Wikipedia article on KLD](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence).
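For reference, the standard definition of the KLD between two continuous distributions with densities $p$ and $q$ is given below. Which of the human and synthetic PR score distributions plays the role of $p$ is not stated in this notebook, so the symbols are generic.

$$
D_{\mathrm{KL}}(P \parallel Q) = \int_{-\infty}^{\infty} p(x) \, \log \frac{p(x)}{q(x)} \, dx
$$

The integrand $p(x) \log \frac{p(x)}{q(x)}$ is a function of $x$, so it can be evaluated at any individual PR score. The total divergence is the area under that curve.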

The workflow is as follows:
1. Get a kernel density estimate (KDE) of the PR score distributions for human and synthetic text fragments in the training data.
2. Calculate the KLD between the human and synthetic PR score distributions.
3. Get a kernel density estimate of the KLD.
4. Use the probability density function from the KLD kernel density estimate to calculate a KLD score for each text fragment in the training and testing data.
5. Add the KLD score as a new feature.

The above will be done individually for each fragment length bin and for the combined data. This way, the KLD score feature in each bin will capture the PR score distribution for text fragments in that specific length regime, rather than for the whole dataset.
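The workflow above can be sketched roughly as follows. This is a minimal illustration on made-up Gaussian data, not the implementation in `functions.kullback_leibler_divergence`: the variable names, the choice of the human distribution as the first argument, and the use of linear interpolation in place of the second kernel density estimate (step 3) are all assumptions made for the example.

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import gaussian_kde

# Hypothetical stand-ins for PR scores in the training data
rng = np.random.default_rng(0)
human_scores = rng.normal(1.0, 0.1, 2000)
synthetic_scores = rng.normal(0.85, 0.1, 2000)

# Step 1: kernel density estimates of the two PR score distributions
human_kde = gaussian_kde(human_scores)
synthetic_kde = gaussian_kde(synthetic_scores)

# Evaluate both densities on a shared grid spanning the observed scores
low = min(human_scores.min(), synthetic_scores.min())
high = max(human_scores.max(), synthetic_scores.max())
grid = np.linspace(low, high, 1000)
p = human_kde(grid)
q = synthetic_kde(grid)

# Step 2: pointwise KLD contribution, p * log(p / q), at each grid point
pointwise_kld = rel_entr(p, q)

# Total divergence: area under the pointwise curve (simple Riemann sum)
dx = grid[1] - grid[0]
total_kld = float(np.sum(pointwise_kld) * dx)

# Steps 4 and 5 (assumption: interpolation replaces the second KDE):
# look up a KLD score for each fragment's PR score
fragment_pr_scores = np.array([1.0, 0.85])
fragment_kld_scores = np.interp(fragment_pr_scores, grid, pointwise_kld)

print(f'Total KLD: {total_kld:.3f}')
print(f'Fragment KLD scores: {fragment_kld_scores}')
```

To follow the per-bin approach described above, the same steps would be repeated on the subset of fragments in each length bin, so each bin gets its own pair of density estimates.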

In [1]:
from IPython.display import Image
Image(url = 'https://raw.githubusercontent.com/gperdrizet/llm_detector/benchmarking/benchmarking/notebooks/images/KL-Gauss-Example.png')

## 1. Run set-up

In [2]:
# Change working directory to parent so we can import as we would from main.py
print('Working directory: ', end = '')
%cd ..

# Do the imports
import configuration as config
import functions.kullback_leibler_divergence as kld_funcs

Working directory: /mnt/arkk/llm_detector/classifier


In [3]:
# The dataset we want to bin - omit the file extension, it will be
# added appropriately for the input and output files
dataset_name = 'falcon-7b_scores_v2_10-300_words'

# Input file path
input_file = f'{config.DATA_PATH}/{dataset_name}.h5'

## 2. Perplexity ratio score Kullback-Leibler divergence

In [4]:
# Run the Kullback-Leibler score calculation on the perplexity ratio score
kld_funcs.kullback_leibler_score(
        feature_name = 'Perplexity ratio score',
        hdf5_file = input_file,
        padding = 0.1,
        sample_frequency = 0.001,
        score_sample = False
)

Will log to: /mnt/arkk/llm_detector/classifier/logs/kld.log
