# Perplexity ratio score: Kullback–Leibler divergence

The plan is to take our sampling distributions of perplexity ratio (PR) scores for human and synthetic text and use them to build a function that converts a PR score into a Kullback-Leibler divergence (KLD) score. See the figure below from the [Wikipedia article on KLD](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence).

The workflow is as follows:
1. Get kernel density estimates of the PR score distributions for human and synthetic text fragments in the training data.
2. Calculate the KLD between the human and synthetic PR score distributions.
3. Get a kernel density estimate of the KLD.
4. Use the probability density function from the KLD kernel density estimate to calculate a KLD score for each text fragment in the training and testing data.
5. Add the KLD score as a new feature.

The above will be done individually for each fragment length bin and for the combined data. This way the KLD score feature in each bin will capture the PR score distribution for text fragments in that specific length regime, rather than for the whole dataset.
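
The first four steps can be sketched on made-up data as follows. This is a minimal illustration under stated assumptions, not the code in `functions.kullback_leibler_divergence`: the scores are random draws rather than real PR scores, the divergence is taken pointwise on a grid with `scipy.special.kl_div`, the direction (human relative to synthetic) is a guess, and the "KDE of the KLD" is built by passing the pointwise values as `weights` to `scipy.stats.gaussian_kde`.

```python
import numpy as np
from scipy.special import kl_div
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Made-up stand-ins for PR scores (assumption: not real data)
human_scores = rng.normal(loc=1.0, scale=0.15, size=2000)
synthetic_scores = rng.normal(loc=0.8, scale=0.10, size=2000)

# Step 1: kernel density estimates of the two score distributions
human_kde = gaussian_kde(human_scores)
synthetic_kde = gaussian_kde(synthetic_scores)

# Step 2: pointwise KLD on a grid spanning both samples
# (assumption: human relative to synthetic; the notebook does not say)
all_scores = np.concatenate([human_scores, synthetic_scores])
x = np.linspace(all_scores.min(), all_scores.max(), 200)
kld = kl_div(human_kde(x), synthetic_kde(x))

# Guard against tiny negative values from floating point rounding
kld = np.clip(kld, 0.0, None)

# Step 3: KDE of the KLD curve, using the pointwise values as weights
# on the grid points (assumption: one of several ways to do this)
kld_kde = gaussian_kde(x, weights=kld)

# Step 4: evaluate the KLD KDE at every fragment's PR score
kld_scores = kld_kde(all_scores)
```

Step 4 is a single call here, but it evaluates a kernel sum at every score, so its cost grows with the number of fragments times the number of grid points.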

One additional note: initial testing showed that the 4th step above is very slow due to the sheer number of points that need to be evaluated. The workflow is therefore split up as follows to use compute resources more efficiently.

**Version 2 workflow:**

**Part I**: Parallelized over the length bins.
1. Get kernel density estimate of score distributions for human and synthetic text fragments from the training data.
2. Calculate the KLD between the human and synthetic score distributions.
3. Get a kernel density estimate of the KLD.
4. Serialize the KLD's kernel density estimate to disk.
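
A rough sketch of Part I, assuming made-up scores for three bins. The helper names `fit_kld_kde` and `fit_and_save` are hypothetical, the fitting approach (pointwise `kl_div` used as KDE weights) is an assumption, and a thread pool stands in for whatever worker pool the real code uses. The file name follows the pattern used later in this notebook when the stored estimates are loaded.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import numpy as np
from scipy.special import kl_div
from scipy.stats import gaussian_kde


def fit_kld_kde(human_scores, synthetic_scores, n_points=200):
    """Hypothetical helper: KDE of the pointwise KLD between two samples."""
    scores = np.concatenate([human_scores, synthetic_scores])
    x = np.linspace(scores.min(), scores.max(), n_points)
    kld = kl_div(gaussian_kde(human_scores)(x), gaussian_kde(synthetic_scores)(x))
    return gaussian_kde(x, weights=np.clip(kld, 0.0, None))


def fit_and_save(bin_id, human_scores, synthetic_scores, out_dir):
    """Hypothetical helper: fit one bin's KLD KDE and pickle it to disk."""
    kld_kde = fit_kld_kde(human_scores, synthetic_scores)
    out_path = Path(out_dir) / f'perplexity_ratio_score_KLD_KDE_{bin_id}.pkl'
    with open(out_path, 'wb') as output_file:
        pickle.dump(kld_kde, output_file)
    return bin_id, out_path


# Made-up scores for three bins (assumption: not real data)
rng = np.random.default_rng(0)
example_bins = {
    bin_id: (rng.normal(1.0, 0.15, 500), rng.normal(0.8, 0.10, 500))
    for bin_id in ['bin_50', 'bin_75', 'combined']
}

out_dir = tempfile.mkdtemp()

# One task per bin; each task fits and serializes independently
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(fit_and_save, bin_id, human, synthetic, out_dir)
        for bin_id, (human, synthetic) in example_bins.items()
    ]
    saved_paths = dict(future.result() for future in futures)
```

Because each bin is fitted from its own data and written to its own file, the tasks share no state and can run in any order.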

**Part II**: Parallelized over evaluation points in each bin, bins processed serially.
1. Load KLD kernel density estimate.
2. Calculate KDE value for each score in bin.
3. Add scores as new feature in bin dataset.
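
Part II for a single bin might look roughly like the sketch below. The data and the pickled estimate are made-up stand-ins, the chunk and worker counts are arbitrary, and a thread pool is used only to keep the example portable; the actual implementation in `kld_funcs` may differ on all of these points.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
feature_name = 'Perplexity ratio score'

# Stand-in for a KDE serialized in Part I (assumption: made-up data)
kde_path = Path(tempfile.mkdtemp()) / 'example_KLD_KDE.pkl'
with open(kde_path, 'wb') as output_file:
    pickle.dump(gaussian_kde(rng.normal(0.9, 0.1, 500)), output_file)

# Stand-in for one bin's feature dataframe
bin_df = pd.DataFrame({feature_name: rng.normal(0.9, 0.15, 10_000)})

# 1. Load the stored kernel density estimate
with open(kde_path, 'rb') as kde_file:
    kld_kde = pickle.load(kde_file)

# 2. Split the bin's scores into chunks and evaluate them in a worker pool;
#    map() returns results in the same order as the input chunks
scores = bin_df[feature_name].to_numpy()
chunks = np.array_split(scores, 8)
with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_results = list(pool.map(kld_kde, chunks))

# 3. Reassemble the results and add them as a new feature column
bin_df[f'{feature_name} Kullback-Leibler divergence'] = np.concatenate(chunk_results)
```

Splitting by evaluation points rather than by bin keeps all workers busy even when bins differ greatly in size.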


In [1]:
from IPython.display import Image
Image(url = 'https://raw.githubusercontent.com/gperdrizet/llm_detector/benchmarking/benchmarking/notebooks/images/KL-Gauss-Example.png')

## 1. Run set-up

In [2]:
# Change working directory to parent so we can import as we would from main.py
print('Working directory: ', end = '')
%cd ..

# Do the imports
import h5py
import pickle
import pandas as pd
import matplotlib.pyplot as plt
import configuration as config
import functions.helper as helper_funcs
import functions.kullback_leibler_divergence as kld_funcs

Working directory: /mnt/arkk/llm_detector/classifier


In [3]:
# The dataset we want to bin - omit the file extension, it will be
# added appropriately for the input and output files
dataset_name = 'falcon-7b_scores_v2_10-300_words'

# Input file path
input_file = f'{config.DATA_PATH}/{dataset_name}.h5'

## 2. Get Kullback-Leibler divergence kernel density estimates

In [4]:
# Run part I of the workflow in parallel over the bins - gets kernel density estimate
# of Kullback-Leibler divergence between the feature's distribution for human and
# synthetic text fragments in the training data, saves to disk for later use.

kld_funcs.get_kullback_leibler_KDEs(
    feature_name = 'Perplexity ratio score',
    hdf5_file = input_file,
    logfile_name = 'perplexity_ratio_KLD_KDE.log'
)

Will log to: /mnt/arkk/llm_detector/classifier/logs/perplexity_ratio_KLD_KDE.log
bin_100 finished: True
bin_125 finished: True
bin_150 finished: True
bin_175 finished: True
bin_200 finished: True
bin_225 finished: True
bin_250 finished: True
bin_275 finished: True
bin_300 finished: True
bin_50 finished: True
bin_75 finished: True
combined finished: True


## 3. Evaluate perplexity ratio scores and add feature

In [5]:
# Run part II of the workflow described above - uses previously stored
# kernel density estimates of the Kullback-Leibler divergence between
# the perplexity ratio scores for human and synthetic text fragments
# in each bin. Loads the data from each bin sequentially and evaluates
# each text fragment's perplexity ratio score. Parallelizes evaluation
# over data in the bin. Adds result back to hdf5 as new feature column.

kld_funcs.make_kullback_leibler_feature(
    feature_name = 'Perplexity ratio score',
    hdf5_file = input_file,
    logfile_name = 'perplexity_ratio_KLD_feature_addition.log'
)

EOFError: Ran out of input
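
`EOFError: Ran out of input` is the error `pickle.load` raises when it is handed an empty file, so one thing worth checking is whether any of the serialized kernel density estimates from Part I is empty or truncated. The notebook does not confirm that this is the cause. The sketch below uses a hypothetical helper, `find_unreadable_pickles`, and a throwaway directory instead of the real models directory.

```python
import pickle
import tempfile
from pathlib import Path


def find_unreadable_pickles(directory, pattern='*.pkl'):
    """Hypothetical helper: map file name -> error for pickles that fail to load."""
    unreadable = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with open(path, 'rb') as pickle_file:
                pickle.load(pickle_file)
        except (EOFError, pickle.UnpicklingError) as error:
            unreadable[path.name] = repr(error)
    return unreadable


# Demonstration on a temporary directory: one valid pickle, one empty file
demo_dir = Path(tempfile.mkdtemp())
with open(demo_dir / 'good.pkl', 'wb') as output_file:
    pickle.dump({'ok': True}, output_file)
(demo_dir / 'empty.pkl').touch()

unreadable = find_unreadable_pickles(demo_dir)
```

Only run a check like this on files you wrote yourself, since unpickling untrusted data can execute arbitrary code.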

## 4. Check results

In [None]:
# Get bins from hdf5 metadata
data_lake = h5py.File(input_file, 'r')
bins = dict(data_lake.attrs.items())
bin_ids = list(bins.keys())
data_lake.close()

# Open a connection to the hdf5 dataset via PyTables with Pandas so we can
# load the data from each bin as a dataframe
data_lake = pd.HDFStore(input_file)

# Now we want to make 3 plots for each bin: the distributions of perplexity ratio score,
# the Kullback-Leibler divergence kernel density estimate, and the distribution of 
# Kullback-Leibler score values

feature_name = 'Perplexity ratio score'

# Set up a figure for n bins x 3 plots
fig, axs = plt.subplots(
    len(bin_ids),
    3,
    figsize = (9, (3 * len(bin_ids))),
    gridspec_kw = {'wspace':0.4, 'hspace':0.4},
    #sharex='col'
)

# Loop on the bins to draw each plot
for i, bin_id in enumerate(bin_ids):

    # Load bin data
    bin_training_features_df = data_lake[f'training/{bin_id}/features']

    # Get human and synthetic perplexity ratio score
    human_feature = bin_training_features_df[feature_name][bin_training_features_df['Source'] == 'human']
    synthetic_feature = bin_training_features_df[feature_name][bin_training_features_df['Source'] == 'synthetic']

    # Draw histograms for human and synthetic perplexity ratio scores in the first plot
    axs[i, 0].set_title(f'{bin_id} {feature_name.lower()}', fontsize = 'medium')
    axs[i, 0].set_xlabel('Score')
    axs[i, 0].set_ylabel('Count')
    axs[i, 0].hist(human_feature, bins = 50, alpha = 0.5, label = 'human')
    axs[i, 0].hist(synthetic_feature, bins = 50, alpha = 0.5, label = 'synthetic')
    axs[i, 0].legend(loc = 'upper left', fontsize = 'x-small')

    # Turn axis tick labels back on for shared x axis
    axs[i, 0].tick_params(labelbottom = True)

    # For the second plot load and evaluate the Kullback-Leibler divergence kernel density estimate
        
    # Make the filename for the stored Kullback-Leibler divergence kernel density estimate for this bin
    formatted_feature_name = feature_name.replace(' ', '_').lower()
    input_filename = f'{config.MODELS_PATH}/{formatted_feature_name}_KLD_KDE_{bin_id}.pkl'

    # Load the KLD KDE - use a distinct file handle name so we don't
    # overwrite input_file, which holds the hdf5 path
    with open(input_filename, 'rb') as kde_file:
        kld_kde = pickle.load(kde_file)

    # Make 100 evaluation points across the data's range
    x = helper_funcs.make_padded_range(bin_training_features_df[feature_name])

    # Evaluate
    y = kld_kde(x)

    # Plot
    axs[i, 1].set_title(f'{bin_id} Kullback-Leibler KDE', fontsize = 'medium')
    axs[i, 1].set_xlabel('Score')
    axs[i, 1].set_ylabel('KDE value')
    axs[i, 1].plot(x, y)

    # Turn axis tick labels back on for shared x axis
    axs[i, 1].tick_params(labelbottom = True)

    # For the third plot make a histogram of the KLD KDE values in the bin

    # Get human and synthetic KLD scores
    human_feature = bin_training_features_df[f'{feature_name} Kullback-Leibler divergence'][bin_training_features_df['Source'] == 'human']
    synthetic_feature = bin_training_features_df[f'{feature_name} Kullback-Leibler divergence'][bin_training_features_df['Source'] == 'synthetic']

    # Draw histograms for human and synthetic Kullback-Leibler divergence scores
    axs[i, 2].set_title(f'{bin_id} Kullback-Leibler scores', fontsize = 'medium')
    axs[i, 2].set_xlabel('Score')
    axs[i, 2].set_ylabel('Count')
    axs[i, 2].hist(human_feature, bins = 50, alpha = 0.5, label = 'human')
    axs[i, 2].hist(synthetic_feature, bins = 50, alpha = 0.5, label = 'synthetic')
    axs[i, 2].legend(loc = 'upper left', fontsize = 'x-small')

    # Turn axis tick labels back on for shared x axis
    axs[i, 2].tick_params(labelbottom = True)

plt.show()

data_lake.close()