# Finalized classifier v2.0

Now all that's left is to wire up some routing logic between the models so we can easily run inference on incoming text. The workflow will look something like this:

1. **Stage I features**: calculate perplexity ratio and TF-IDF-based features for the input text.
2. **Stage I classifier**: send the feature vector to the correct stage I classifier based on text length.
3. **Stage II features**: create a new feature vector from the stage I class probabilities, perplexity ratio and TF-IDF features.
4. **Stage II classifier**: send the new feature vector to the correct stage II classifier for the final prediction.
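
The four steps above can be sketched as a small routing function. This is only a minimal illustration under stated assumptions: the bin names and edges in `BINS`, the `select_bin` and `classify` helpers, and the idea of passing featurizers and classifiers as plain callables are all hypothetical, and the stage II feature vector is simplified to the stage I probabilities concatenated with the stage I features.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical word-count bins (inclusive); the real bin edges come from training
BINS: Dict[str, Tuple[int, int]] = {'short': (10, 100), 'long': (101, 300)}

Featurizer = Callable[[str], Sequence[float]]
Classifier = Callable[[Sequence[float]], Sequence[float]]


def select_bin(text: str, bins: Dict[str, Tuple[int, int]] = BINS) -> str:
    '''Returns the name of the length bin that the text falls into.'''
    n_words = len(text.split())
    for name, (low, high) in bins.items():
        if low <= n_words <= high:
            return name
    raise ValueError(f'No length bin for text with {n_words} words')


def classify(
    text: str,
    featurizers: Dict[str, Featurizer],
    stage_one: Dict[str, Classifier],
    stage_two: Dict[str, Classifier],
    bins: Dict[str, Tuple[int, int]] = BINS,
) -> List[float]:
    '''Routes a text through the stage I and stage II models for its bin.'''
    bin_id = select_bin(text, bins)

    # Steps 1 and 2: stage I features, then stage I class probabilities
    features = list(featurizers[bin_id](text))
    probabilities = list(stage_one[bin_id](features))

    # Step 3 (simplified assumption): stage II vector = probabilities + features
    stage_two_features = probabilities + features

    # Step 4: final prediction from the stage II classifier for this bin
    return list(stage_two[bin_id](stage_two_features))
```

Keeping the per-bin models in dictionaries keyed by bin name means the routing decision is made once, in `select_bin`, and every later lookup reuses it.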

Each step requires assets from the feature engineering and classifier training phases. Let's make a checklist to help make sure we have everything in place.

1. **Stage I features**:
    - Perplexity ratio score: tokenizer + reader and writer models.
    - Perplexity ratio Kullback-Leibler score: perplexity ratio Kullback-Leibler divergence kernel density estimate for each bin.
    - TF-IDF score: human and synthetic TF-IDF look-up tables for each bin.
    - TF-IDF Kullback-Leibler score: TF-IDF Kullback-Leibler divergence kernel density estimate for each bin.

2. **Stage I classifier**:
    - Trained XGBoost classifier model for each bin.

3. **Stage II features**:
    - Perplexity ratio Kullback-Leibler score: perplexity ratio Kullback-Leibler divergence kernel density estimate for each bin.
    - TF-IDF score: human and synthetic TF-IDF look-up tables for each bin.
    - TF-IDF Kullback-Leibler score: TF-IDF Kullback-Leibler divergence kernel density estimate for each bin.

4. **Stage II classifier**:
    - Trained XGBoost classifier model for each bin.
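
One way to turn the checklist above into something executable is to list the per-bin assets in a dictionary and report any that are missing on disk. This is a rough sketch only: the file names, the one-directory-per-bin layout, and the `find_missing_assets` helper are hypothetical, and the tokenizer and the reader and writer models are left out because they are not per-bin files.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical per-bin asset file names; the real names depend on how the
# feature engineering and training notebooks saved their outputs
REQUIRED_ASSETS: Dict[str, List[str]] = {
    'stage I features': [
        'stage_I_perplexity_ratio_KLD_KDE.pkl',
        'stage_I_TF-IDF_luts.pkl',
        'stage_I_TF-IDF_KLD_KDE.pkl',
    ],
    'stage I classifier': ['stage_I_XGBoost.pkl'],
    'stage II features': [
        'stage_II_perplexity_ratio_KLD_KDE.pkl',
        'stage_II_TF-IDF_luts.pkl',
        'stage_II_TF-IDF_KLD_KDE.pkl',
    ],
    'stage II classifier': ['stage_II_XGBoost.pkl'],
}


def find_missing_assets(asset_dir: Path, bins: List[str]) -> List[Path]:
    '''Returns the expected asset paths that do not exist under asset_dir.'''
    missing = []
    for bin_id in bins:
        for names in REQUIRED_ASSETS.values():
            for name in names:
                # Assumed layout: <asset_dir>/<bin name>/<asset file>
                path = asset_dir / bin_id / name
                if not path.is_file():
                    missing.append(path)
    return missing
```

An empty return value means every listed asset was found for every bin.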

## 1. Run setup

In [1]:
# Change working directory to parent so we can import as we would from main.py
print('Working directory: ', end='')
%cd ..

# Standard library imports
import pickle

# PyPI imports
import h5py
import pandas as pd

# Internal imports
import configuration as config

Working directory: /mnt/arkk/llm_detector/classifier


Load the stage I training data and take just the text and labels. We will treat each fragment as if it were submitted by a user, so all we will have is the text string. We will use the labels later to check the model's performance.

In [6]:
# Load the stage I training data and take just the text and labels

# Stage I dataset
dataset_name='falcon-7b_scores_v2_10-300_words_stage_I'

# Input file path
input_file=f'{config.DATA_PATH}/{dataset_name}.h5'

# Open the new hdf5 file with pandas so we can work with dataframes
data_lake=pd.HDFStore(input_file)

# Get the features and extract just the text
training_df=data_lake['training/combined/features']
texts=training_df['String'].to_list()
print(f'Have {len(texts)} training text fragments')

# Get the corresponding labels
labels=data_lake['training/combined/labels'].to_list()
print(f'Have {len(labels)} training text fragment labels')

# Close the connection to the hdf5 file
data_lake.close()

training_df.head()

Have 39042 training text fragments
Have 39042 training text fragment labels


| Unnamed: 0 | Fragment length (words) | Fragment length (tokens) | Source | String | Perplexity | Cross-perplexity | Perplexity ratio score | Perplexity ratio score Kullback-Leibler divergence | Human TF-IDF | Synthetic TF-IDF | TF-IDF score | TF-IDF score Kullback-Leibler divergence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32 | 46 | human | It’s a disease people just don’t know about. T... | 3.127 | 2.832031 | 1.104138 | 0.47319 | -3.385168 | -3.422211 | -0.252168 | 0.071449 |
| 1 | 27 | 45 | human | Owens(c) vs Roman Reigns – WWE Universal Champ... | 2.912 | 3.027344 | 0.961935 | 0.120746 | -2.634773 | -2.650256 | -0.081826 | 0.067688 |
| 2 | 268 | 371 | synthetic | chemical or pharmaceutical processes.\nhowever... | 2.193 | 2.675781 | 0.819708 | 3.325138 | -2.8623 | -2.877875 | -0.089403 | 0.067875 |
| 3 | 112 | 131 | human | unstable area near the Iraq and Syria border. ... | 2.73 | 2.695312 | 1.013043 | 0.203559 | -3.475644 | -3.331192 | 0.983256 | 0.037913 |
| 4 | 32 | 44 | synthetic | ins , is encoded by a gene family with at leas... | 3.025 | 3.693359 | 0.819143 | 3.306112 | -2.650665 | -2.901612 | -1.393327 | 0.069538 |
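
As a quick sanity check on the rows shown above, the stored perplexity ratio score looks consistent with perplexity divided by cross-perplexity. That relationship is an inference from the displayed values, not something stated in this notebook, and the small differences are assumed to come from the rounded perplexity values in the display. The `sample` dataframe below just re-types the five rows by hand.

```python
import pandas as pd

# Values copied by hand from the five rows displayed above
sample = pd.DataFrame({
    'Perplexity': [3.127, 2.912, 2.193, 2.73, 3.025],
    'Cross-perplexity': [2.832031, 3.027344, 2.675781, 2.695312, 3.693359],
    'Perplexity ratio score': [1.104138, 0.961935, 0.819708, 1.013043, 0.819143],
})

# Assumption: ratio score = perplexity / cross-perplexity
sample['Recomputed ratio'] = sample['Perplexity'] / sample['Cross-perplexity']

# How far the recomputed value is from the stored score
sample['Absolute difference'] = (
    sample['Recomputed ratio'] - sample['Perplexity ratio score']
).abs()

print(sample)
```

For these five rows the recomputed ratio agrees with the stored score to within about 0.001.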
