# Dexter -- A Benchmark for Open-domain Complex Question Answering


## Introduction
Answering complex questions is a difficult task that requires knowledge retrieval. To address this, we propose an easy-to-use, extensible benchmark comprising diverse complex QA tasks, together with a toolkit for evaluating the zero-shot retrieval capabilities of state-of-the-art dense and sparse retrieval models in an open-domain setting. Additionally, since context-based reasoning is key to complex QA tasks, we extend the toolkit with various LLM engines. Together, these two components let users evaluate each stage of a Retrieval-Augmented Generation (RAG) pipeline.




## Evaluation Metrics

We will use the following metrics to evaluate the performance of the QA systems:

- **EM-Tol**: Exact match with tolerance. Handles high-precision and large numeric answers by comparing values with `math.isclose` at a tolerance of 0.02.
- **F1 Score**: The harmonic mean of precision and recall.
- **Cover-Exact Match (Cover-EM)**: Checks whether the generated answer contains the ground truth.
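
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's own implementation: the function names, the `normalize` helper, the use of a *relative* tolerance for `math.isclose`, and the token-level definition of F1 are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def em_tol(prediction: str, truth: str, rel_tol: float = 0.02) -> bool:
    """Numeric match within a tolerance; non-numeric answers fall back to
    normalized string equality. Treating 0.02 as a relative tolerance is an
    assumption of this sketch."""
    try:
        return math.isclose(float(prediction), float(truth), rel_tol=rel_tol)
    except ValueError:
        return normalize(prediction) == normalize(truth)


def cover_em(prediction: str, truth: str) -> bool:
    """True when the normalized ground truth occurs inside the prediction."""
    return normalize(truth) in normalize(prediction)


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1: harmonic mean of token precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

With these definitions, `em_tol("101", "100")` accepts the answer because the two values differ by less than 2%, while `cover_em` rewards a verbose generation as long as the gold answer appears somewhere in it. Note that `cover_em` uses plain substring matching, so a stricter token-boundary check may be preferable in practice.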

## Contriever Inference

In [1]:
from dexter.config.constants import Split
from dexter.data.datastructures.hyperparameters.dpr import DenseHyperParams
from dexter.data.loaders.RetrieverDataset import RetrieverDataset
from dexter.retriever.dense.Contriever import Contriever
from dexter.utils.metrics.retrieval.RetrievalMetrics import RetrievalMetrics
from dexter.utils.metrics.SimilarityMatch import CosineSimilarity as CosScore


if __name__ == "__main__":
    # Use the same Contriever checkpoint to encode queries and documents.
    config_instance = DenseHyperParams(
        query_encoder_path="facebook/contriever",
        document_encoder_path="facebook/contriever",
        batch_size=32,
        show_progress_bar=True,
    )

    # Load the dev-split queries, relevance judgments (qrels), and corpus.
    # The KeyError shown after this cell suggests that 'wiki_musique_corpus'
    # could not be resolved; check that the alias matches one the toolkit defines.
    loader = RetrieverDataset("data", "wiki_musique_corpus", "config.ini", Split.DEV, tokenizer=None)
    queries, qrels, corpus = loader.qrels()
    contriever_search = Contriever(config_instance)

    # Retrieve the top 100 documents per query, ranked by cosine similarity.
    similarity_measure = CosScore()
    response = contriever_search.retrieve(corpus, queries, 100, similarity_measure)
    print("indices", len(response))

    # Score the ranked results at cutoffs k = 1, 3, and 5.
    metrics = RetrievalMetrics(k_values=[1, 3, 5])
    print(metrics.evaluate_retrieval(qrels=qrels, results=response))

KeyError: 'wiki_musique_corpus'