Skip to content

Conversation

sahusiddharth
Copy link
Contributor

@sahusiddharth sahusiddharth commented Aug 12, 2024

Solves:

Input

from datasets import Dataset 
from ragas.metrics import noise_sensitivity_relevant, noise_sensitivity_irrelevant
from ragas import evaluate
data_sample = {
    "question": ["What is the Life Insurance Corporation of India (LIC) known for?"],
    "ground_truth": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, established in 1956 through the nationalization of the insurance industry. It is known for managing a large portfolio of investments."],
    "answer": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, known for its vast portfolio of investments. LIC contributs to the financial stability of the country."],
    "contexts": [["The Life Insurance Corporation of India (LIC) was established in 1956 following the nationalization of the insurance industry in India.",
        "LIC is the largest insurance company in India, with a vast network of policyholders and a huge investments.",
        "As the largest institutional investor in India, LIC manages a substantial funds, contributing to the financial stability of the country.",
        "The Indian economy is one of the fastest-growing major economies in the world, thanks to the secors like finance, technology, manufacturing etc"]]
}


dataset = Dataset.from_dict(data_sample)
metrics = [noise_sensitivity_relevant, noise_sensitivity_irrelevant]
score = evaluate(dataset,metrics=metrics)
score.to_pandas()

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 12, 2024
@shahules786 shahules786 self-requested a review August 13, 2024 04:04
Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, great work. @sahusiddharth
Two more suggestions

  1. Can you fix the conflict?
  2. Can you also add a documentation associated with noise sensitivity?
    TBH understanding the use and internals of this is little overwhelming, users will find it hard to adopt it if they can't understand it.

@sahusiddharth
Copy link
Contributor Author

Could you please provide some suggestions on the types of documentation that might be required? Your input would be greatly appreciated.

@sahusiddharth sahusiddharth force-pushed the feat/adding_noise_metric branch from a5b2c28 to dfe069f Compare August 13, 2024 15:12
@shahules786
Copy link
Member

Hey @sahusiddharth I did see this doc written by one of author of ragchecker, I have also emailed him asking for an intuitive explanation of noise sensitivity (let's wait for few hours and see if he replies). Otherwise, it would be nice to follow how the format in metrics as here https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html

@sahusiddharth
Copy link
Contributor Author

Hi @shahules786,

Got it. Please let me know when you hear back from the author. In the meantime, I’ll check the metrics format you mentioned.

Thanks!

Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sahusiddharth The changes you did looks good. I also did some corrections on top of that. I think it's best to move forward and do the doc now - it the author reacts later we can get his feedback on it later. Can you prepare the docs to go with this?

It needs 3 sections:
A brief intuitive description of metrics
an example
how it's calculated using the example

@sahusiddharth
Copy link
Contributor Author

Hi @shahules786,

I wanted to clarify something regarding the noise sensitivity implementation. There are two types: one for when relevant context is retrieved and another for when irrelevant context is retrieved. The current implementation only addresses the relevant context.

Would you prefer that I add the handling for irrelevant context first, or should I complete the documentation for the basic implementation before proceeding with the additional functionality?

@shahules786
Copy link
Member

@sahusiddharth I did notice that, thought that using relevant might be more useful but now I think if the user has the ability to switch b/w both using an argument that would be better. Can you modify it to include that behavior? Then we can add documentation for both in same page.

@sahusiddharth
Copy link
Contributor Author

I’m happy to make the necessary modifications. I would appreciate some additional guidance on how best to return the results when asked for both. Since the output could be returned in multiple types of data, such as dictionaries, named tuples, or tuples, I’m considering the most appropriate format.

{'noise_sensitivity_relevant': 0.0, 'noise_sensitivity_irrelevant': 0.0}

I was thinking to return only the number when asked for a specific one.

Could you please advise on the preferred format for returning these results?

@shahules786
Copy link
Member

shahules786 commented Aug 14, 2024

@sahusiddharth Thanks for asking that. I think both should not be an option - it would be one of them 'relevant' or 'irrelevant'. By default, it should stay as 'relevant'.
In upcoming versions, we will introduce caching so avoid llm recalculating the same intermediate results as in this case if someone wants both they have two make two calls.

Make sure that when you write the doc give credit to Ragchecker by citing the work.

@sahusiddharth
Copy link
Contributor Author

@shahules786, I'm almost done with the documentation, to properly show the power of noise sensitivity, the example is getting long, Do you have a problem with that?

@shahules786
Copy link
Member

@sahusiddharth We can refine it later , but I also show some basic examples here

@sahusiddharth
Copy link
Contributor Author

@shahules786, Have gone through them, I didn't liked it that much the answers generated by llm is rarely using the information provided in the context and I didn't find it intuitive enough.

@shahules786
Copy link
Member

@sahusiddharth I agree, their dataset generation is naive.

@sahusiddharth
Copy link
Contributor Author

@shahules786, tried returning tuple when given we want noise sensitivity for both relevant and irrelevant the make-ci was giving me error.

ragas/src/ragas/metrics/_noise_sensitivity.py:208:15 - error: Method "_ascore" overrides class "Metric" in an incompatible manner
    Return type mismatch: base method returns type "Coroutine[Any, Any, float]", override returns type "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]"
      "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]" is incompatible with "Coroutine[Any, Any, float]"
        Type parameter "_ReturnT_co_nd@Coroutine" is covariant, but "Coroutine[Any, Any, float | Tuple[float, float]]" is not a subtype of "float"
          "Coroutine[Any, Any, float | Tuple[float, float]]" is incompatible with "float" (reportIncompatibleMethodOverride)
  /Users/nexus/Desktop/ankit/ragas/src/ragas/metrics/_noise_sensitivity.py:244:16 - error: Expression of type "Unknown | tuple[Unknown, Unknown]" is incompatible with return type "Coroutine[Any, Any, float | Tuple[float, float]]"
    Type "Unknown | tuple[Unknown, Unknown]" is incompatible with type "Coroutine[Any, Any, float | Tuple[float, float]]"
      "tuple[Unknown, Unknown]" is incompatible with "Coroutine[Any, Any, float | Tuple[float, float]]" (reportReturnType)

@sahusiddharth sahusiddharth changed the title Basic implementation of Noise sensitivity metrics from RAGChecker Implementation of Noise sensitivity metrics from RAGChecker Aug 14, 2024
@shahules786 shahules786 requested a review from jjmachan August 15, 2024 04:56
Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have made few changes to polish it up.

@jjmachan jjmachan linked an issue Aug 15, 2024 that may be closed by this pull request
@shahules786 shahules786 merged commit 8da231d into explodinggradients:main Aug 23, 2024
@HuXiangkun
Copy link

HuXiangkun commented Sep 29, 2024

Hi @sahusiddharth @shahules786 , thanks for your interest to RAGChecker. I'm the coauthor of RAGChecker. This is a nice work for integrating Noise Sensitivity into Ragas.

Regarding your comments on ground truth answer generation, I want to make some clarifications. We took the short answers and the annotated ground truth passages (the context) as input to generate long-form answers. And then, to ensure the answers are faithful to the context, we use RefChecker to fiter out answers that contain hallucinations. So the generated answers are always stick to the provided context.

Please refer to Appendix A.2 in RAGChecker paper. Thanks!

@shahules786, Have gone through them, I didn't liked it that much the answers generated by llm is rarely using the information provided in the context and I didn't find it intuitive enough.

@jjmachan
Copy link
Member

hey @HuXiangkun thanks a lot for clearing that up ❤️
would love to have you in our community and stay in touch if you would be interested 🙂

@HuXiangkun
Copy link

Hi @jjmachan , I'm glad to stay in touch!

@jjmachan
Copy link
Member

jjmachan commented Oct 5, 2024

just send you a mail 🙂

@sahusiddharth sahusiddharth deleted the feat/adding_noise_metric branch November 6, 2024 13:07
@nibeditaSw
Copy link

Hi @sahusiddharth
which ragas version you are using for the evaluation of noise sensitivity metrics?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[R-293] Noise sensitivity metrics from RAGChecker

5 participants