How to interpret the combination of metrics: context precision and the rest (real world example) #308

Open
younes-io opened this issue Nov 20, 2023 · 9 comments
Labels
question Further information is requested

Comments

@younes-io
Contributor

I ran ragas to evaluate my LangChain-powered chatbot (it's basically a QA chain with document retrieval) and I got the following results.

| question | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | context_relevancy |
|----------|--------------|--------------|------------------|----------------|-------------------|-------------------|
| Q1 | GT1 | 1 | 0.813637991 | 1 | 0 | 0.002824859 |
| Q2 | GT2 | 1 | 0.835290922 | 0 | 0 | 0.002890173 |
| Q3 | GT3 | 1 | 0.882307479 | 1 | 0 | 0.002659574 |
| Q4 | GT4 | 1 | 0.844765424 | 0 | 0 | 0.01953125 |
| Q5 | GT5 | 1 | 0.889618083 | 1 | 0 | 0.017857143 |

Of course, the context_precision values (another form of context_relevancy, which I think will disappear according to the docs) are very low (i.e., horrible). So I did some debugging to understand the intermediate calculations (I didn't grasp everything, but I've got an idea), and I'm wondering how this situation is possible. This is how I interpret it; correct me if I'm wrong:

- context_recall: 1.00 (can it retrieve all the relevant information required to answer the question? Yes)
- context_precision: 0.00 (the signal-to-noise ratio of the retrieved context: almost everything retrieved is noise)
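For context: context_precision is, roughly, an order-aware average of precision@k over the retrieved chunks, where an LLM judge gives each chunk a yes/no relevance verdict. This is a minimal pure-Python sketch of that kind of formula, assuming per-chunk boolean verdicts as input; it is illustrative only, not ragas's exact implementation or API:

```python
def context_precision(relevant: list[bool]) -> float:
    """Order-aware precision: average precision@k, taken at each
    position k where the retrieved chunk was judged relevant.
    Illustrative sketch -- not ragas internals."""
    score, hits = 0.0, 0
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at this relevant position
    return score / hits if hits else 0.0

# All retrieved chunks judged relevant -> perfect score
print(context_precision([True, True]))    # 1.0
# Relevant chunk buried after noise -> penalized
print(context_precision([False, True]))   # 0.5
# Judge said "No." to every chunk -> zero, regardless of recall
print(context_precision([False, False]))  # 0.0
```

Under such a formula, a context_recall of 1.0 with a context_precision of 0.0 would mean the judge answered "No" to every retrieved chunk even though the ground truth was recoverable from them.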

For example, I checked that for one answer, this is how the context precision metric evaluated the 2 retrieved documents:

[[ChatGeneration(text='No.', generation_info={'finish_reason': 'stop'}, message=AIMessage(content='No.'))]]

Yet faithfulness is 1 and answer_relevancy is 0.81... I'm really confused. Maybe I'm missing something, but I'd like to understand how to interpret not only each metric independently, but also the combinations of their values and what they entail.

Thank you,

@younes-io
Contributor Author

I'm also wondering if this is a "side effect" of the (relatively) long chunks of my docs (around 500 tokens). I don't know whether that also affects the calculation.

@younes-io
Contributor Author

@shahules786: could you please take a look at this?

@shahules786
Member

Hi @younes-io, this is an interesting but weird result. Would you be able to share a subset of your data so that I can understand what's going on?

@younes-io
Contributor Author

@shahules786 I'm afraid I can't share that, since it's private data.
Basically, I have document chunks (say 2) returned by OpenSearch which contain the answer to the question. The first document contains the response; the second contains only a small portion of the answer. The second document is larger than the first.
I'm just wondering whether ragas takes into account the ratio of "relevance to the question / length of the context" when calculating context_precision.
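As far as I can tell, an order-based precision metric of this kind depends only on the per-chunk relevance verdicts and their rank, not on chunk length. A sketch to make that explicit (the chunk dicts and the "length" field are hypothetical, purely for illustration; this is not ragas internals):

```python
def rank_precision(chunks: list[dict]) -> float:
    """Average precision@k over relevant positions.
    Note: chunk["length"] is never read -- only the verdict and the rank."""
    score, hits = 0.0, 0
    for k, chunk in enumerate(chunks, start=1):
        if chunk["relevant"]:
            hits += 1
            score += hits / k
    return score / hits if hits else 0.0

# Two retrieved chunks: a short one holding the full answer, and a much
# longer one holding only a fragment of it (the scenario described above).
chunks = [
    {"relevant": True, "length": 120},
    {"relevant": True, "length": 480},
]
print(rank_precision(chunks))  # 1.0 -- chunk length plays no role
```

So under this reading, long chunks would hurt context_precision only indirectly, by making the LLM judge more likely to return a "No" verdict for a chunk whose relevant portion is diluted.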

@younes-io
Contributor Author

younes-io commented Nov 22, 2023

@shahules786 : I have tested using the example in ragas docs

So, I used this dataset:

from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval

and here's the result:

| # | question | contexts | answer | ground_truths | context_precision | faithfulness | answer_relevancy | context_recall | context_relevancy |
|---|----------|----------|--------|---------------|-------------------|--------------|------------------|----------------|-------------------|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | The best way to deposit a cheque issued to a... | [Have the check reissued to the proper payee.J... | 0.0 | 1.0 | 0.938239 | 0.875 | 0.058824 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Yes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 0.0 | 0.8 | 0.885277 | 1.000 | 0.285714 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | Yes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 0.0 | 0.8 | 0.924754 | 0.000 | 0.083333 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | Applying for and receiving business credit c... | ["I'm afraid the great myth of limited liabili... | 0.0 | 1.0 | 0.899104 | 0.500 | 0.333333 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | If your employer has closed and you need to ... | [You should probably consult an attorney. Howe... | 0.0 | 0.6 | 0.853572 | 0.000 | 0.043478 |

The context_precision is "almost" always equal to zero (or holds a near-zero value).

N.B.: in the docs, context_precision is not displayed.

@younes-io
Contributor Author

@shahules786: sorry to bother you; is someone from the team/community able to help with this, please? Thank you

@shahules786
Member

Hi @younes-io, apologies for the late reply. Can you share your ragas version and the LLM used?
Also, can you try the same using the latest ragas from main? You can install from source with pip install git+https://github.com/explodinggradients/ragas

@shahules786
Member

@younes-io If you're open to a short call, I would love to help in person. Please book a slot here (early next week)

@younes-io
Contributor Author

@shahules786 no worries, I'm also very sorry for the very late reply. Sure, I'll book a slot!

@jjmachan jjmachan added the question Further information is requested label Nov 30, 2023