Prompt4LLM-Eval

This is the source code for the paper "Which is better? Exploring Prompting Strategy For LLM-based Metrics".

What is Prompt4LLM-Eval?

Prompt4LLM-Eval is a methodology that analyzes LLM-based evaluation methods by decomposing them into three components: Prompt Strategy, Score Aggregation, and Explainability.

With this methodology, we won the summarization track of the Eval4NLP workshop shared task at AACL 2023.

Key findings

For detailed experimental results, please refer to the paper.

  • Dataset: SummEval dev set
  • Metric: Kendall's tau correlation between metric scores and human scores (a short sketch of this computation follows this list)
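
As a quick illustration of the metric, Kendall's tau can be computed with SciPy. This is a minimal sketch with hypothetical score lists, not results from the paper:

# Kendall's tau between metric outputs and human annotations.
from scipy.stats import kendalltau

llm_scores = [3, 4, 2, 5, 1]    # hypothetical LLM-based metric outputs
human_scores = [3, 5, 2, 4, 1]  # hypothetical human annotations

tau, p_value = kendalltau(llm_scores, human_scores)
print(f"Kendall's tau: {tau:.4f} (p = {p_value:.4f})")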

1) Prompt Strategy
Performance is highest when the prompt mirrors the instructions given to human annotators (a hedged prompt sketch follows the table and list below).

          Orca-7B   Orca-13B
Human     0.3472    0.4468
Model     0.2864    0.3844
Baseline  0.2746    0.3891
  • Human: a prompt consisting of the human annotation instructions
  • Model: the evaluation prompt used for GPT-4
  • Baseline: a task description plus a score guide
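
To make the variants concrete, here is a hedged sketch of how such prompts might be assembled; the instruction wording is illustrative and is not copied from the repository or from the SummEval annotation guidelines:

# Illustrative prompt templates; the wording is hypothetical, not the paper's.
HUMAN = (
    "You will be shown a news article and a candidate summary. "
    "Following the instructions given to human annotators, rate the "
    "coherence of the summary on a scale from 1 (worst) to 5 (best)."
)
BASELINE = (
    "Task: evaluate the quality of the summary below. "
    "Score guide: 1 (worst) to 5 (best)."
)

def build_prompt(instruction: str, article: str, summary: str) -> str:
    # Shared scaffold: instruction, source article, summary, score slot.
    return f"{instruction}\n\nArticle:\n{article}\n\nSummary:\n{summary}\n\nScore:"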

2) Score Aggregation
Direct generation achieved the highest performance, while the approximation method performed worst because sampling introduces noise (a sketch of all three strategies follows the list below).

              Orca-7B   Orca-13B
Direct        0.3472    0.4468
Logprob       0.3296    0.4210
Approximation 0.3239    0.4002
  • Direct: use the score generated by the model as-is
  • Logprob: a weighted sum of the scores 1-5, weighted by each score token's probability
  • Approximation: sample the evaluation score N times and average the samples
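
A minimal sketch of the three aggregation strategies, assuming the evaluator returns a completion plus log-probabilities for the candidate score tokens "1" through "5"; the callable sample_score is a hypothetical stand-in for a temperature-sampled model call, not the repository's API:

import math
import statistics

def direct(completion: str) -> int:
    # Direct: parse the generated score and use it as-is.
    return int(completion.strip()[0])

def logprob(score_logprobs: dict[str, float]) -> float:
    # Logprob: expected score under the renormalized probabilities
    # of the tokens "1".."5".
    probs = {s: math.exp(lp) for s, lp in score_logprobs.items()}
    total = sum(probs.values())
    return sum(int(s) * p / total for s, p in probs.items())

def approximation(sample_score, n: int = 20) -> float:
    # Approximation: sample the score n times (temperature > 0) and average.
    return statistics.mean(sample_score() for _ in range(n))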

Usage

Setting up an experimental environment

conda create --name Prompt4LLM_Eval python=3.10
conda activate Prompt4LLM_Eval
conda install pip
pip install -r requirements.txt

Running experiments with vLLM

./scripts/inference_vllm.sh
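
The script holds the actual experiment configuration; as a standalone illustration, a direct-scoring call with vLLM looks roughly like this (the checkpoint path and prompt are placeholders, not the repository's settings):

from vllm import LLM, SamplingParams

llm = LLM(model="path/to/orca-13b")  # hypothetical checkpoint
params = SamplingParams(temperature=0.0, max_tokens=4)  # greedy, short score
outputs = llm.generate(["<evaluation prompt here>"], params)
print(outputs[0].outputs[0].text)  # e.g. "4"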

Running experiments with Guidance

./scripts/inference_guidance.sh

BibTeX

@misc{kim2023better,
      title={Which is better? Exploring Prompting Strategy For LLM-based Metrics}, 
      author={Joonghoon Kim and Saeran Park and Kiyoon Jeong and Sangmin Lee and Seung Hun Han and Jiyoon Lee and Pilsung Kang},
      year={2023},
      eprint={2311.03754},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
