Narabzad/genir-evaluation

A Comparison of Methods for Evaluating Generative IR

This repository contains the prompts, code, and generated responses for the paper "A Comparison of Methods for Evaluating Generative IR".

Prompts includes all the prompts we used for our experiments.

The generated responses directory includes all the regular generated responses, as well as the liar responses, used in our experiments. The responses were produced by four different LLMs:

  • gpt-4
  • gpt-3.5-turbo
  • llama2 7b chat
  • llama2 13b chat

data includes the queries and relevance judgements from TREC DL 2019 and TREC DL 2020 used for our experiments.

genir experiments includes the results of evaluating LLM-generated responses using the five evaluation methods described in the paper.

Validations includes the results of comparing qrels at different levels of relevance in TREC DL 2019 and 2020.
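
As a rough illustration of working with qrels at different relevance levels, the sketch below parses judgements and groups documents by grade. It assumes the standard four-column TREC qrels layout (query id, iteration, document id, relevance); whether the files in this repository follow exactly that layout is an assumption. The function names `parse_qrels` and `docs_at_level` and the sample lines are hypothetical and are not taken from the repository.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines: 'query_id iteration doc_id relevance'."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, relevance = parts
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


def docs_at_level(qrels, level):
    """Return {query_id: sorted doc_ids} judged exactly at the given level."""
    return {
        query_id: sorted(d for d, rel in judged.items() if rel == level)
        for query_id, judged in qrels.items()
    }


# Made-up sample lines, for illustration only (not from the repository's data).
sample = [
    "q1 0 docA 3",
    "q1 0 docB 0",
    "q2 0 docC 2",
    "q2 0 docD 3",
]

qrels = parse_qrels(sample)
top = docs_at_level(qrels, 3)
print(top)
```

To read a real file, pass an open file handle instead of the sample list, for example `parse_qrels(open(path))`, where `path` points at a qrels file under data.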

The results for the experiments conducted in Sections 3 and 4 of the paper can be found under Validations and genir experiments, respectively.
