Code example for the paper "Self-Prompting Large Language Models for Zero-Shot Open-Domain QA" (NAACL 2024).
- python 3.7
- openai==0.25.0
- sentence-transformers==2.2.2
- torch==1.13.1
- transformers==4.28.1
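Save your OpenAI API key into ./related_files/openai_api.txt. Note that the inference command below passes ./related_files/openai-api.txt to --api_file, so make sure the file name you save matches the one you pass.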
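As a minimal sketch of how a key file like this can be consumed, the helper below returns the first non-empty line of the file. The function name `read_api_key` is hypothetical and is not taken from this repository; how `new_main.py` actually reads the file is not shown here.

```python
from pathlib import Path


def read_api_key(path):
    """Return the first non-empty line of the key file, stripped of whitespace.

    Hypothetical helper for illustration; not part of this repository.
    """
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("no API key found in {}".format(path))
```

With openai==0.25.0, the returned string would typically be assigned to `openai.api_key` before any request is made.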
We provide a sample test dataset in ./datasets/samples_nq/test.jsonl.
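A .jsonl file stores one JSON object per line. The sketch below loads such a file into a list of dicts; the helper name `load_jsonl` is hypothetical, and because the field names of the sample dataset are not listed here, each record is treated as an opaque dict.

```python
import json


def load_jsonl(path):
    """Parse one JSON object per non-blank line and return them as a list."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `load_jsonl("./datasets/samples_nq/test.jsonl")` should return one dict per test example; printing the keys of the first record is a quick way to inspect the schema.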
We provide the data generated by InstructGPT in ./gpt3_gen_samples/filtered_flattened_topic_aware_gens.json
Do clustering
python compute_sent_emb.py \
--genfile ./gpt3_gen_samples/filtered_flattened_topic_aware_gens.json \
--num_clusters_list [1,2,4,6,8,10] \
--clsuterfile ./gpt3_gen_samples/filtered_clustering_results_sbert_qa.json \
--device cuda:0 \
--way sbert \
--qapair
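To illustrate what clustering the generated samples for several cluster counts can look like, here is a small pure-Python k-means sketch over toy vectors. It is an assumption that k-means is a fair stand-in: the actual algorithm and embedding model inside `compute_sent_emb.py` are not shown here, and the function names below are hypothetical. In practice the vectors would be Sentence-BERT embeddings rather than hand-written lists.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, num_iters=20, seed=0):
    """Return one cluster id per vector using plain Lloyd iterations."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(num_iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: recompute centroids, keeping the old one if a cluster is empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Toy stand-ins for sentence embeddings (assumed data, for illustration only).
toy_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
# One clustering result per requested cluster count, mirroring --num_clusters_list.
results = {k: kmeans(toy_vectors, k) for k in [1, 2]}
```

Storing one label list per cluster count lets a later step choose the clustering that matches the number of demonstrations it needs.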
Do selection
python sbert_retrieve.py \
--genfile ./gpt3_gen_samples/filtered_flattened_topic_aware_gens.json \
--clusterfile ./gpt3_gen_samples/filtered_clustering_results_sbert_qa.json \
--device cuda:0 \
--way sbert \
--qapair \
--model_suffix sbert
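One plausible selection strategy, sketched below, is to pick from each cluster the generated sample whose embedding is most similar to the test question. This is an assumption about the general idea rather than a description of `sbert_retrieve.py`; the function names are hypothetical and the vectors are toy values.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_demo_per_cluster(query_vec, sample_vecs, labels):
    """Map each cluster id to the index of its most query-similar sample."""
    best = {}  # cluster id -> (best score so far, sample index)
    for idx, (vec, label) in enumerate(zip(sample_vecs, labels)):
        score = cosine(query_vec, vec)
        if label not in best or score > best[label][0]:
            best[label] = (score, idx)
    return {label: idx for label, (_, idx) in best.items()}
```

Taking one sample per cluster keeps the demonstrations diverse, while the similarity ranking keeps each of them relevant to the question being asked.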
Do inference
python -u new_main.py \
--api_file ./related_files/openai-api.txt \
--model_name instructgpt \
--dataset_name samples_nq \
--dataset_dir ./datasets/samples_nq \
--start_pos 0 \
--end_pos -1 \
--output_files_folder ./outputs/samples_nq \
--num_sample 10 \
--source gpt3gen \
--pick_demo_seed -1 \
--sid -7 \
--instruction_way -2 \
--demo_way 4 \
--with_restrict ans \
--clusters_filename ./gpt3gen/filtered_clustering_results_sbert_qa.json \
--flattened_gen_data ./gpt3gen/filtered_flattened_topic_aware_gens.json \
--clusters_retrieve_filename ./gpt3gen/filtered_clustered_retrieve_res_samples_nq_sbert_qa.json
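The inference step feeds selected demonstrations plus the test question to the model. As a rough sketch of few-shot prompt assembly, the function below joins (question, answer) pairs and appends the unanswered test question. The `Question:`/`Answer:` template and the name `build_prompt` are assumptions for illustration; the real template is controlled by options such as `--instruction_way` and `--demo_way`, whose meanings are not documented here.

```python
def build_prompt(demos, question):
    """Join (question, answer) demos and the unanswered test question into one prompt.

    The template used here is hypothetical, not the one used by new_main.py.
    """
    blocks = ["Question: {}\nAnswer: {}".format(q, a) for q, a in demos]
    blocks.append("Question: {}\nAnswer:".format(question))
    return "\n\n".join(blocks)
```

Ending the prompt with an open `Answer:` line invites the model to complete it in the same short format as the demonstrations.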
Collect the outputs and evaluate
python collect_merge_delete_eval.py
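Open-domain QA is commonly scored with exact match after light answer normalization. The sketch below follows the widely used SQuAD-style normalization (lowercase, strip punctuation, drop articles, collapse whitespace). Whether `collect_merge_delete_eval.py` uses exactly this metric is an assumption, and the function names are hypothetical.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)
```

Averaging `exact_match` over all test questions gives the usual EM score as a fraction between 0 and 1.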
If you find this code helpful, please kindly cite this:
@article{li2022self,
title={Self-prompting large language models for zero-shot open-domain qa},
author={Li, Junlong and Wang, Jinyuan and Zhang, Zhuosheng and Zhao, Hai},
journal={arXiv preprint arXiv:2212.08635},
year={2022}
}