Metacognitive Prompting Improves Understanding in Large Language Models (NAACL 2024)

This repository contains datasets, model descriptions, the full set of prompts used in experiments, and corresponding experimental results.

Datasets

We use multiple natural language understanding datasets for our experiments, selected from GLUE and SuperGLUE. For evaluation, we use the development set of each task. An overview of the datasets is shown below:

| Task | Dataset | Input | Output | Metric |
|------|---------|-------|--------|--------|
| Sentiment | SST-2 | Single sentence | Binary | Accuracy |
| Similarity | STS-B | Sentence pair | Continuous | Pearson/Spearman Correlation |
| Paraphrase | QQP | Question pair | Binary | F1/Accuracy |
| QA/NLI | QNLI | Question + passage | Binary | Accuracy |
| NLI | WNLI, RTE, CB | Sentence pair | Binary/Ternary | F1/Accuracy |
| WSD | WiC | Sentence pair + target word | Binary | Accuracy |
| Coref. | WSC | Passage + pronouns | Binary | Accuracy |
| QA | COPA | Question + choices | Binary | Accuracy |

Here, QA stands for question answering, NLI for natural language inference, WSD for word sense disambiguation, and coref. for coreference resolution. The datasets are available in "./datasets".
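If you prefer loading the tasks programmatically rather than from the CSV copies in "./datasets", a minimal sketch using the Hugging Face `datasets` library is shown below; this library is an assumption here and is not part of the repository.

```python
# Minimal sketch (not repository code): loading development splits of GLUE and
# SuperGLUE tasks with the Hugging Face `datasets` library.
from datasets import load_dataset

# SST-2 development set (GLUE)
sst2_dev = load_dataset("glue", "sst2", split="validation")
print(sst2_dev[0])  # e.g., {'sentence': ..., 'label': 0 or 1, 'idx': ...}

# SuperGLUE tasks (e.g., WiC, WSC, COPA, CB, RTE) are loaded the same way
wic_dev = load_dataset("super_glue", "wic", split="validation")
```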

Models

In our evaluation, we consider five popular large language models (LLMs): the open-source models Llama-2-13b-chat and Vicuna-13b-v1.1, and the closed-source models PaLM-bison-chat, GPT-3.5-turbo, and GPT-4. For all models, we apply greedy decoding (i.e., temperature = 0) for response generation.
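As a concrete illustration of the greedy-decoding setup, the sketch below queries one of the closed-source models through the OpenAI chat API with temperature set to 0; the client usage and model name are illustrative assumptions, not the repository's own code.

```python
# Minimal sketch: greedy decoding (temperature = 0) via the OpenAI chat API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # greedy decoding, as used for all models in the experiments
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```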

Prompts

Metacognitive Prompting (MP) is inspired by human introspective reasoning processes. The figure below shows the alignment between human metacognitive processes and the stages of MP for LLMs:

[Figure: alignment between human metacognitive processes and the stages of MP for LLMs]

MP consists of five main stages: 1) understanding the input text, 2) making a preliminary judgment, 3) critically evaluating this preliminary analysis, 4) reaching a final decision accompanied by an explanation of the reasoning, and 5) evaluating the confidence level in the entire process. A sample question chosen from the Quora Question Pair (QQP) dataset demonstrates the overall MP process:

[Figure: example of the MP process on a QQP sample, showing metacognitive stages, prompts, and model outputs]

The diagram features three columns, from left to right, representing the high-level metacognitive stages, specific metacognitive prompts fed into the LLM, and the LLM's corresponding outputs. Prompts in the middle column are collectively fed into the LLM as a single input during the experiments.
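The sketch below illustrates how the five MP stages can be concatenated into a single prompt for a paraphrase (QQP) instance. The wording here is illustrative only; the exact prompts used in the experiments are provided in "./prompts".

```python
# Illustrative sketch of assembling the five MP stages into one prompt for QQP.
# The stage wording below is an assumption; see ./prompts for the official prompts.
MP_STAGES = [
    "1. Clarify your understanding of the two questions.",
    "2. Make a preliminary judgment: are they paraphrases of each other?",
    "3. Critically evaluate your preliminary judgment.",
    "4. Give your final decision (yes/no) and explain your reasoning.",
    "5. State your confidence level (e.g., 0-100%) in this analysis.",
]

def build_mp_prompt(question1: str, question2: str) -> str:
    task = f"Question 1: {question1}\nQuestion 2: {question2}\n"
    # All five stages are concatenated and sent to the LLM as a single input.
    return task + "As you answer, work through the following stages:\n" + "\n".join(MP_STAGES)
```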

For our experiments, we compare our proposed MP with standard prompting (SP) and chain-of-thought (CoT) prompting, each under zero-shot and 5-shot learning settings. Exemplars for the 5-shot setting are randomly selected from the training set of each dataset; each dataset has its own set of exemplars, whose answers are obtained through human annotation.
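A minimal sketch of how 5-shot exemplars can be prepended to a test instance is shown below; the exemplar format is an assumption, and the actual few-shot prompts are provided in "./prompts".

```python
# Sketch: human-annotated exemplars from the training set are prepended before
# the test instance to form a 5-shot prompt. Format is illustrative only.
def build_few_shot_prompt(exemplars: list[tuple[str, str]], test_input: str) -> str:
    shots = "\n\n".join(f"{inp}\nAnswer: {ans}" for inp, ans in exemplars)
    return f"{shots}\n\n{test_input}\nAnswer:"
```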

The full set of prompts used when applying MP, SP, and CoT under the zero-shot and 5-shot learning settings can be found in "./prompts".

Experimental Results

The experimental results for each dataset can be found in "./results". For each dataset, we experiment with three prompting methods in zero-shot and 5-shot learning scenarios across five LLMs. We report the best result after multiple experimental iterations.
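For reference, the metrics listed in the dataset table can be computed as sketched below with scikit-learn and SciPy; this is an assumed helper, not the repository's scoring code.

```python
# Sketch of the evaluation metrics: accuracy and F1 for classification tasks,
# Pearson/Spearman correlation for STS-B.
from sklearn.metrics import accuracy_score, f1_score
from scipy.stats import pearsonr, spearmanr

def score_classification(y_true, y_pred):
    return {"accuracy": accuracy_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred)}

def score_sts(y_true, y_pred):
    return {"pearson": pearsonr(y_true, y_pred)[0],
            "spearman": spearmanr(y_true, y_pred)[0]}
```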

Please refer to our full paper for more details.

Citation

If you find this work helpful, please consider citing as follows:

@article{wang2023metacognitive,
  title={Metacognitive Prompting Improves Understanding in Large Language Models},
  author={Wang, Yuqing and Zhao, Yun},
  journal={arXiv preprint arXiv:2308.05342},
  year={2023}
}
