genglinliu/UnknownBench

Setup

  • Install dependencies: pip install -r requirements.txt

  • Add the repository root to PYTHONPATH: export PYTHONPATH="${PYTHONPATH}:/path/to/LLM-hallucination/"

For example: export PYTHONPATH="${PYTHONPATH}:/home/genglin2/LLM-hallucination"

  • Set up your own OpenAI, Google Bard, and Replicate API keys, then run: source api_key_config.sh

Datasets

Tasks: FalseQA, NEC, RefuNQ (each task has two modes: answerable and unanswerable)

data_path = f"/data/{task_name}/{task_name}_{mode}.json"  
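The path template above can be sketched as a small helper that enumerates the six task and mode combinations. This is a minimal illustration, not code from the repository: the function names `build_data_path` and `all_data_paths`, the `root` parameter, and the exact spelling of the task and mode strings in the file names are assumptions.

```python
# Hypothetical helper: builds dataset paths from the README's template.
# The task and mode spellings below are assumed to match the file names on disk.
TASKS = ("FalseQA", "NEC", "RefuNQ")
MODES = ("answerable", "unanswerable")


def build_data_path(task_name: str, mode: str, root: str = "/data") -> str:
    """Return the JSON path for one task and mode, following the README template."""
    if task_name not in TASKS:
        raise ValueError(f"unknown task {task_name!r}, expected one of {TASKS}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    return f"{root}/{task_name}/{task_name}_{mode}.json"


def all_data_paths(root: str = "/data") -> list[str]:
    """Return the paths for every task and mode combination."""
    return [build_data_path(task, mode, root) for task in TASKS for mode in MODES]


if __name__ == "__main__":
    for path in all_data_paths():
        print(path)
```

The `root` argument defaults to the `/data` prefix shown in the template. If the data directory lives inside your checkout instead, pass that location explicitly.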

Results are stored at experiment_outputs/Logits. Figures are saved at experiment_outputs/figures.

Setup for running the logit experiments on Llama-2

  • Install dependencies: pip install -r requirements.txt
  • Run: python src/run_llama_all.py

Quick Start

Generate LLM responses on FalseQA / NEC / RefuNQ

Our work contains three tasks:

  • FalseQA: answering questions that may contain false premises
  • Non-existent Concepts (NEC): answering questions that may involve non-existent concepts
  • RefuNQ: refusal-inducing NaturalQuestions

To run a model on these tasks, use the corresponding script in src/ (the * below stands for the model-specific part of the script name):

python src/run_*.py --prompt baseline

  • For Claude, you may need to run ulimit -n 2048 first to avoid a "too many open files" error.
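If changing the shell limit is inconvenient, the same limit can be raised from inside a Python process with the standard-library `resource` module (Unix only). This is a sketch of an alternative, not something the repository's scripts are known to do. The function name `raise_open_file_limit` is hypothetical, and the change applies only to the current process and its children.

```python
import resource


def raise_open_file_limit(target: int = 2048) -> int:
    """Raise the soft open-file limit toward `target` and return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == unlimited or soft >= target:
        return soft

    # An unprivileged process cannot raise its soft limit above the hard limit.
    new_soft = target if hard == unlimited else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("soft open-file limit:", raise_open_file_limit(2048))
```

Calling this once at the top of a run script, before any API clients are created, has roughly the effect of `ulimit -n 2048` for that process. If the hard limit is below 2048, the soft limit is raised only as far as the hard limit allows.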

Evaluation

To evaluate the outputs of the LLMs and visualize the analysis, see the notebooks in /scripts.

Prompts

The prompts used in this repo can be found in the prompts/ folder.

Citation

@misc{liu2024examining,
      title={Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge}, 
      author={Genglin Liu and Xingyao Wang and Lifan Yuan and Yangyi Chen and Hao Peng},
      year={2024},
      eprint={2311.09731},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
