How to interpret the combination of metrics: context precision and the rest (real world example) #308
Comments
I'm also wondering if this is a "side effect" of the (relatively) long chunks in my docs (around 500 tokens)? I don't know whether that also impacts the calculation.
@shahules786: could you please take a look at this?
Hi @younes-io, this is an interesting but weird result. Would you be able to share a subset of your data so that I can better understand what's going on?
@shahules786 I'm afraid I can't share that, since it's private data.
@shahules786: I have tested using the example in the ragas docs. So, I used this dataset:
and here's the result:
N.B.: in the docs, the context precision is not displayed.
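For reference, this is roughly how I ran it; a minimal sketch assuming the `fiqa` example dataset from the quickstart (the dataset name, config, and split are my reading of the docs and may differ across ragas versions):

```python
from datasets import load_dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# The evaluation split used in the docs quickstart; "explodinggradients/fiqa"
# with the "ragas_eval" config is an assumption based on that example.
fiqa = load_dataset("explodinggradients/fiqa", "ragas_eval")["baseline"]

result = evaluate(
    fiqa,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores for the whole dataset
```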
@shahules786: sorry to bother you, but is someone from the team/community able to help with this, please? Thank you.
Hi @younes-io, apologies for the late reply. Can you share your ragas version and the LLM used?
@younes-io If you're open to a short call, I would love to help in person. Please book a slot here (early next week).
@shahules786 no worries, I'm also very sorry for the very late reply. Sure, I'll book a slot!
I ran ragas to evaluate my LangChain-powered chatbot (it's basically a QA chain with document retrieval) and I got the following results.
Of course, the `context_precision` values (another form of `context_relevancy`, which I think will disappear, according to the docs) are very low, aka *horrible*. So, I did some debugging to understand the intermediate calculations (I didn't grasp everything, but I've got an idea), and I'm wondering how this situation is possible (this is how I interpret it; correct me if I'm wrong):

- context_recall: 1.00 (can it retrieve all the relevant information required to answer the question: YES)
- context_precision: 0.00 (the signal-to-noise ratio of the retrieved context: almost everything retrieved is noise)
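To make that combination concrete, here is a toy sketch of how I understand the two scores come about; this is my own approximation of the per-item LLM verdicts, not the actual ragas internals:

```python
# Toy approximation of how the two context metrics can diverge.
# (My reading of the docs, not the real ragas implementation.)

# context_recall: for each ground-truth statement, an LLM judges whether it
# can be attributed to the retrieved context. 2/2 attributable -> 1.00.
recall_verdicts = [True, True]
context_recall = sum(recall_verdicts) / len(recall_verdicts)  # 1.0

# context_precision: for each retrieved chunk, an LLM judges whether that
# chunk was useful for answering the question. 0/2 useful -> 0.00
# (these would be the two 'No.' generations shown below).
precision_verdicts = [False, False]
context_precision = sum(precision_verdicts) / len(precision_verdicts)  # 0.0

# So recall=1 with precision=0 would mean: everything needed is *somewhere*
# in the context, but the judge deemed each chunk, taken on its own, not
# useful -- plausible with long 500-token chunks where the relevant sentence
# is buried in noise.
print(context_recall, context_precision)
```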
For example, I checked how the context-precision metric evaluated the 2 retrieved documents for one answer:

```
[[ChatGeneration(text='No.', generation_info={'finish_reason': 'stop'}, message=AIMessage(content='No.'))]
```
Yet, the faithfulness is 1 and the answer relevancy is 0.81. I'm really confused. Maybe I'm missing something, but I'd like to understand how to interpret not only each metric independently, but also the combinations of their values and what they entail.
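For completeness, this is roughly how I run the evaluation against my chain's outputs; a minimal sketch with placeholder rows, since my real data is private (the column names follow my reading of the ragas 0.0.x docs):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Placeholder rows standing in for the private QA-chain outputs.
rows = {
    "question": ["<user question>"],
    "answer": ["<answer produced by the chain>"],
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],
    "ground_truths": [["<reference answer>"]],  # column name per ragas 0.0.x
}

result = evaluate(
    Dataset.from_dict(rows),
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)
```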
Thank you,