The official repository of "On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing". You can find another version of our artifact at Zenodo. An early version of this project can be found here.
- Disk: At least 10GB to store the models and datasets. An extra 52GB for each 50,000 samples of features (~2.2 TB in total for ./GPABench2).
- GPU: For CheckGPT: 6 GB of memory for training or 2 GB for inference (adjust the batch size accordingly). For the other benchmarked models in Sec 2.2: 11 GB of memory.
Run
pip install -r requirements.txt
We recommend using a virtual environment, Docker, or a VM to avoid version conflicts. For example, to set up a virtual environment using virtualenv, install it with pip:
pip install virtualenv
Navigate to a desired folder, create a virtual environment, activate it, and install our list of packages as provided:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
There are two versions of datasets:
- GPABenchmark.
- GPABench2.
We mainly use GPABench2 in our CCS 2024 submission.
The files are separated into several parts for upload convenience. Please download and extract all parts into the same host folder (e.g., ./artifact_checkgpt/).
- CheckGPT.zip: the main folder of the CheckGPT code (./artifact_checkgpt/CheckGPT). The embeddings folder stores extracted features; the exp folder stores results under different experiment IDs.
- CheckGPT_presaved_files.zip: pre-trained models and saved experiments (./artifact_checkgpt/CheckGPT_presaved_files).
- CS.zip, PHX.zip, HSS.zip: GPABench2 datasets. Please download and extract them into a newly created folder, "GPABench2" (./artifact_checkgpt/GPABench2).
- GPABenchmark.zip: GPABenchmark datasets (./artifact_checkgpt/GPABenchmark).
- scripts.zip: scripts for reproducing the results in the paper. Extract them into the main folder (./artifact_checkgpt/CheckGPT).
- README.md: this file.
GPABenchmark:
- GPT example: ./GPABenchmark/CS_Task1/gpt.json (Computer Science, Task 1 GPT-WRI)
- HUM example: ./GPABenchmark/CS_Task1/hum.json
- Data structure: {PaperID}: {Abstract}
GPABench2:
- GPT example: ./GPABench2/PHX/gpt_task3_prompt4.json (Physics, Task 3 GPT-POL, Prompt 4)
- HUM example: ./GPABench2/PHX/ground.json
- Data structure: {Index}: {"id": PaperID, "title": PaperTitle, "abstract": Abstract}
For GPABench2, download CS, PHX, and HSS and put them under a newly created folder "./GPABench2". For HUM Task 2 GPT-CPL, use the second half of each text; a short loading sketch follows.
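As a quick sanity check after downloading, the snippet below shows how the two dataset layouts can be read. It is a minimal sketch based only on the structures listed above; the character-level halving for HUM Task 2 GPT-CPL is an assumption (features.py may split by words instead).

```python
import json

# GPABench2: {Index: {"id": ..., "title": ..., "abstract": ...}}
with open("./GPABench2/PHX/ground.json", "r", encoding="utf-8") as f:
    ground = json.load(f)

for index, record in list(ground.items())[:3]:
    abstract = record["abstract"]
    # HUM Task 2 GPT-CPL uses only the second half of each text.
    # NOTE: halving by characters here is an assumption.
    second_half = abstract[len(abstract) // 2:]
    print(index, record["id"], record["title"])

# GPABenchmark (v1) is flat: {PaperID: Abstract}
with open("./GPABenchmark/CS_Task1/hum.json", "r", encoding="utf-8") as f:
    hum = json.load(f)
print(next(iter(hum.items())))
```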
Download these files and put them under CheckGPT_presaved_files:
- Other Academic Writing Purposes (Section 5.4) (Available under CheckGPT_presaved_files/Additional_data/Other_purpose)
- Classic NLP Datasets (Section 5.4) (Available under CheckGPT_presaved_files/Additional_data/Classic_NLP)
- Advanced Prompt Engineering (Section 5.7) (Available under CheckGPT_presaved_files/Additional_data/Prompt_engineering)
- Sanitized GPT Output (Section 5.10) (Available under CheckGPT_presaved_files/Additional_data/Sanitized)
- GPT4 (Section 5.6) (Available under CheckGPT_presaved_files/Additional_data/GPT4)
Download the following and place them under CheckGPT_presaved_files:
- Models trained on GPABenchmark (v1) can be accessed at Pretrained_models.
- Experiments in Section 5.2 and 5.3, including pre-trained models and training logs, can be found at saved_experiments/basic.
To train on or reuse the text, extract features from the text beforehand (for development only; not needed for testing).
To turn text into features, use features.py.
python features.py {DOMAIN} {TASK} {PROMPT}
Features will be saved in the folder named embeddings. ATTENTION: Each file of saved features for 50,000 samples will be approximately 52GB.
For example, to fetch the features of GPT data in CS on Task 1 Prompt 3:
python features.py CS 1 3 --gpt 1
The saved features are named in this format: ./embeddings/CS/gpt_CS_task1_prompt3.h5
Likewise, to fetch the features of HUM data in CS on Task 1 Prompt 3:
python features.py CS 1 3 --gpt 0
The saved features are named in this format: ./embeddings/CS/ground_CS.h5 (Same for Task 1 and 3)
For Task 2 GPT-CPL, the ground data is cut in half and only the second half of each text is processed. An example of a saved name is ground_CS_task2.h5.
Alternatively, you can specify the desired sample size. For example, to get the first 1000 samples:
python features.py CS 1 3 --gpt 0 --number 1000
The saved features are named in this format: ./embeddings/CS_1000/ground_CS.h5 (or, with --gpt 1, ./embeddings/CS_1000/gpt_CS_task1_prompt3.h5)
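To confirm that extraction worked, a saved file can be inspected with h5py. The sketch below makes no assumptions about the dataset names inside the file; it simply walks the file and prints whatever features.py stored.

```python
import h5py

# List every dataset in a saved feature file with its shape and dtype.
with h5py.File("./embeddings/CS/gpt_CS_task1_prompt3.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
```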
To evaluate any single piece of input text, run the following and follow the prompts:
python run_input.py
To directly evaluate any json data file, run:
python validate_text.py {FILE_PATH} {MODEL_PATH} {IS_GPT_OR_NOT}
For example, if you want to test pre-trained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on ../GPABench2/CS/gpt_task3_prompt2.json or ../GPABench2/CS/ground.json:
python validate_text.py ../GPABench2/CS/gpt_task3_prompt2.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1
or
python validate_text.py ../GPABench2/CS/ground.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 0
To run it on a special dataset like GPT4, run:
python validate_text.py ../CheckGPT_presaved_files/Additional_data/GPT4/chatgpt_cs_task3.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1
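To evaluate several files in one pass, a small driver script can call validate_text.py repeatedly. This is a hypothetical convenience wrapper, not part of the artifact; the paths and labels are taken from the examples above.

```python
import subprocess

MODEL = ("../CheckGPT_presaved_files/saved_experiments/basic/"
         "CS_Task3_Prompt2/Best_CS_Task3.pth")

# (file, IS_GPT_OR_NOT) pairs, as in the examples above.
jobs = [
    ("../GPABench2/CS/gpt_task3_prompt2.json", "1"),
    ("../GPABench2/CS/ground.json", "0"),
]
for path, is_gpt in jobs:
    subprocess.run(["python", "validate_text.py", path, MODEL, is_gpt],
                   check=True)
```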
To test a pre-trained model on pre-saved features, run:
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --pretrain 1 --test 1 --saved-model {MODEL_PATH}
For example, to test the pretrained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on the pre-saved features ./embeddings/CS/gpt_task3_prompt2.h5 and ./embeddings/CS/ground_CS.h5, run:
python dnn.py CS 3 2 12345 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth
For features of small test data with 1000 samples:
python dnn.py CS_1000 3 2 12346 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth
To train a model from scratch, run:
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID}
For example, to train on CS Task 3 Prompt 2:
python dnn.py CS 3 2 12347
Ablation study: use --modelid to select a different model (0 for CheckGPT, 1 for RCH, 2 for MLP-Pool, 3 for CNN):
python dnn.py CS 3 2 12347 --modelid 1
python dnn.py CS 3 2 12347 --modelid 2
python dnn.py CS 3 2 12347 --modelid 3
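The three ablation runs above can also be scripted in one loop. This is just a sketch wrapping those commands; it reuses experiment ID 12347 exactly as the examples do, so use distinct IDs if you want the results kept apart under exp.

```python
import subprocess

# Sweep the ablation models from the commands above.
# Model IDs: 0 = CheckGPT, 1 = RCH, 2 = MLP-Pool, 3 = CNN.
for modelid in ("1", "2", "3"):
    subprocess.run(
        ["python", "dnn.py", "CS", "3", "2", "12347", "--modelid", modelid],
        check=True,
    )
```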
To transfer a trained model across domains, tasks, or prompts, run:
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --trans 1 --mdomain {M_DOMAIN} --mtask {M_TASK} --mprompt {M_PROMPT} --mid {M_EXP_ID}
At the beginning, it will also report the cross-validation (testing) results.
For example, to transfer from CS_Task3_Prompt1 to HSS_Task1_Prompt2, run:
python dnn.py HSS 1 2 12347 --trans 1 --mdomain CS --mtask 3 --mprompt 1 --mid 12346
For the 500-sample subsets:
python dnn.py HSS_500 1 2 12347 --trans 1 --mdomain CS_500 --mtask 3 --mprompt 1 --mid 12346
--mid specifies the experiment ID of the pre-trained model from a previous run (e.g., 12346 as above).
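Transfer runs can be scripted the same way. The sketch below replays the two examples above; the target experiment IDs 13001 and 13002 are arbitrary placeholders.

```python
import subprocess

# Transfer the model from experiment 12346 (CS, Task 3, Prompt 1)
# to HSS Task 1 Prompt 2, on the full data and the 500-sample subset.
# The target experiment IDs (13001, 13002) are arbitrary placeholders.
runs = [
    ("HSS", "CS", "13001"),
    ("HSS_500", "CS_500", "13002"),
]
for domain, mdomain, exp_id in runs:
    subprocess.run(
        ["python", "dnn.py", domain, "1", "2", exp_id,
         "--trans", "1", "--mdomain", mdomain, "--mtask", "3",
         "--mprompt", "1", "--mid", "12346"],
        check=True,
    )
```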