Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge

This repo contains the experimental code and resources used in our ACL 2023 paper: Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge.

Install Requirements

export PJ_HOME=${YOUR_WORKING_DIR}/uncommongen/
export OPENAI_API_KEY=${YOUR_API_KEY}

pip3 install -r requirements.txt

Download Datasets

The CSK-PN dataset is available at [Google Drive] in JSON Lines format.

Since running OpenAI models is costly, we also release the results generated by these LLMs along with the dataset (so each JSON line is fairly large). You will find them nested under keys such as cg_pred and qa_pred.
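
As a minimal sketch (the file name below is a placeholder for the downloaded file, and the exact schema may vary), each line can be parsed with the standard json module:

import json

# Placeholder path: the CSK-PN file downloaded from Google Drive.
with open("csk_pn.jsonl") as f:
    for line in f:
        record = json.loads(line)          # one JSON object per line
        print(list(record))                # dataset fields plus prediction fields
        print(record.get("cg_pred"))       # released constrained-generation predictions
        print(record.get("qa_pred"))       # released boolean QA predictions
        break                              # inspect only the first record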

How to Run

Note: OpenAI has deprecated code-davinci-002.

Running constrained generation (CG)

For detailed parameters, please refer to constrained_generation/llm_constrained_generation.py.

An example:

python3 constrained_generation/llm_constrained_generation.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none
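
For instance, a concrete call might look like the following (the model name, the 3 positive / 3 negative in-context examples, and the file paths are illustrative; adjust them to your setup):

python3 constrained_generation/llm_constrained_generation.py \
    -i data/csk_pn.jsonl \
    -o output/cg_text-davinci-002.jsonl \
    -m text-davinci-002 --posk 3 --negk 3 -b 16 --cot none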

Running boolean question answering (QA)

For detailed parameters, please refer to boolqa/llm_answer_prediction.py.

An example:

python3 boolqa/llm_answer_prediction.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none

Evaluation

Evaluate constrained generation (CG)

python3 evaluation/eval_constrained_generation.py -i ${INPUT_FILE} -m ${MODEL_KEY}

Note that ${MODEL_KEY} is the ID of a generation in the input JSON file, typically of the form ${MODEL_NAME}_ex-${POSK}p${NEGK}n, such as text-davinci-002_ex-3p3n. Different parameters result in different model keys, so please check the code carefully.
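
As a sketch of this naming convention (the example values follow the command above; the exact key and where it sits in each record depend on how the generation script names its outputs, so verify against your own files):

import json

model_name, posk, negk = "text-davinci-002", 3, 3     # illustrative values
model_key = f"{model_name}_ex-{posk}p{negk}n"          # -> "text-davinci-002_ex-3p3n"
print(model_key)

# Placeholder path: the file you pass to the evaluation script via -i.
with open("output/cg_text-davinci-002.jsonl") as f:
    record = json.loads(f.readline())
# Inspect the record to see which model keys are actually present
# (they may sit at the top level or under fields such as cg_pred / qa_pred).
print(list(record))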

Evaluate boolean question answering (QA)

python3 evaluation/eval_boolqa.py -i ${INPUT_FILE} -m ${MODEL_KEY}

The same note about ${MODEL_KEY} applies as in the CG task.

Citation

If you find our paper or resources useful, please kindly cite it. If you have any questions, please contact us!

@inproceedings{chen-etal-2023-say,
    title = "Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge",
    author = "Chen, Jiangjie  and
      Shi, Wei  and
      Fu, Ziquan  and
      Cheng, Sijie  and
      Li, Lei  and
      Xiao, Yanghua",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.550",
    pages = "9890--9908",
    abstract = "Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as {``}lions don{'}t live in the ocean{''}, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge? This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.",
}
