A Comparison of Methods for Evaluating Generative IR

In this repository we include the prompts, codes and generated responses for the paper "A Comparison of Methods for Evaluating Generative IR".

Prompts includes all the prmopts we used for our experiments.

generated responses directory includes all the regular generated responses as well as liar responses we used for our experiments by four different LLMS including:

gpt-4
gpt-3.5-turbo
llama2 7b chat
llama2 13b chat LLMs.

data includes the queries and relevance judgements used for our experiments from Trec DL 2019 and Trec DL 2020.

genir experiments include the results in evaluating LLM generated responses using 5 different evaluation methods described in the paper.

Validations include the resuls of compaing qrels from different level of relevance in TREC dl 2019 and 2020.

The results for experiments conducted in Section 3 and 4 can be found under Validations and Genir experiments respectively.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
data		data
generated responses		generated responses
genir experiments		genir experiments
prompts		prompts
validations		validations
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Comparison of Methods for Evaluating Generative IR

About

Releases

Packages

Narabzad/genir-evaluation

Folders and files

Latest commit

History

Repository files navigation

A Comparison of Methods for Evaluating Generative IR

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages