Clarification Needed on the Specificity of test_dataset #22

ViceSilva · 2024-04-16T22:15:45Z

Hello,

I am currently working with the project and have a question regarding the test_dataset used within. Could you please clarify whether the test_dataset needs to be domain-specific, particularly tailored to the RAG domain, or if a generic labeled dataset is suitable for this purpose?

robbym-dev · 2024-04-28T08:34:41Z

When evaluating Retrieval-Augmented Generation (RAG) models like the ones you’re working with, the choice of validation dataset strongly affects how well the measured performance reflects the kinds of data and use cases the model will actually face.

If your RAG model is intended to be used in a specific domain (like medical, legal, or technical documents), it would be beneficial to use a domain-specific validation dataset. This approach helps ensure that the model performs well on the type of content it will encounter in its expected environment.

However, if the model is intended for more general use, a generic labeled dataset could suffice. This kind of dataset helps evaluate the model’s ability to handle a broad range of topics and types of queries.

In your case, it would be beneficial to use a domain-specific validation dataset, one tailored to the domain your RAG model targets, so that ARES can evaluate the model accurately.
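As a rough illustration of why the domain matters, here is a minimal sketch that breaks evaluation accuracy down by domain. It does not use the ARES API; the `accuracy_by_domain` helper, the `domain`/`label`/`prediction` field names, and the toy records are all hypothetical and only show the idea.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Return {domain: accuracy} for an iterable of labeled records.

    Each record is assumed (hypothetically) to be a dict with the keys
    "domain", "label", and "prediction".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["domain"]] += 1
        if rec["prediction"] == rec["label"]:
            correct[rec["domain"]] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Toy records, invented purely for illustration.
records = [
    {"domain": "medical", "label": 1, "prediction": 1},
    {"domain": "medical", "label": 0, "prediction": 1},
    {"domain": "general", "label": 1, "prediction": 1},
    {"domain": "general", "label": 0, "prediction": 0},
]

scores = accuracy_by_domain(records)
for domain, acc in scores.items():
    print(f"{domain}: {acc:.2f}")
```

If the per-domain scores diverge like this, a generic test set alone would not tell you how the model behaves on your target domain.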

robbym-dev closed this as completed Apr 28, 2024
