From edb40f1afbb68aacc30b7a854928c36a0d6e6a3d Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 9 Feb 2024 21:11:42 -0800 Subject: [PATCH 01/12] docs: getting started --- docs/getstarted/evaluation.md | 2 +- docs/getstarted/index.md | 24 +++++++++++------------- docs/getstarted/install.md | 6 ++++-- docs/getstarted/monitoring.md | 16 +++++++++++----- docs/getstarted/testset_generation.md | 2 +- 5 files changed, 28 insertions(+), 22 deletions(-) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index df4029620..bafb099e6 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,5 +1,5 @@ (get-started-evaluation)= -# Evaluation +# Evaluate Your Testset Welcome to the ragas quickstart. We're going to get you up and running with ragas as quickly as you can so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline. diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index 73bda2d90..d4f784267 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -5,21 +5,18 @@ :maxdepth: 1 :hidden: install.md -evaluation.md testset_generation.md +evaluation.md monitoring.md ::: -Welcome to the Ragas tutorials! These beginner-friendly tutorials will guide you -through the fundamentals of working with Ragas. These tutorials do assume basic +Welcome to the Ragas tutorials! If your news to Ragas the Get Started guides will walk you through the fundamentals of working with Ragas. These tutorials do assume basic knowledge of Python and Retrieval Augmented Generation (RAG) pipelines. Before you go further make sure you have [Ragas installed](./install.md)! :::{note} -The tutorials only give you on overview of what you can do with ragas and the -basic skill you need to use it. If you want an in-depth explanation of the -core-concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also checkout the [How-to Guides](../howtos/index.md) if you want to specific applications of Ragas. +The tutorials only give you on overview of what you can do with ragas and the basic skill you need to use it. If you want an in-depth explanation of the core-concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also checkout the [How-to Guides](../howtos/index.md) if you want to specific applications of Ragas. ::: @@ -28,22 +25,23 @@ If you have any questions about Ragas, feel free to join and ask in the Let’s get started! 🏁 -:::{card} Ragas Metrics and Evaluation -:link: get-started-evaluation +:::{card} Synthetic Test data Generation +:link: get-started-testset-generation :link-type: ref -How to use the Ragas Metrics to evaluate your RAG pipelines. +If you want to learn how to generate a synthetic testset to get started. ::: -:::{card} Synthetic Test data Generation -:link: get-started-testset-generation +:::{card} Ragas Metrics and Evaluation +:link: get-started-evaluation :link-type: ref -How to generate test set to assess your RAG pipelines +If your are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic). ::: + :::{card} Monitoring :link: get-started-monitoring :link-type: ref -How to monitor your RAG systems in production. +If you curious about monitoring the performance and quality of your RAG application in production. 
::: diff --git a/docs/getstarted/install.md b/docs/getstarted/install.md index 9e9f4e317..f725b4c78 100644 --- a/docs/getstarted/install.md +++ b/docs/getstarted/install.md @@ -1,11 +1,11 @@ # Install -You can install ragas with +To get started, install ragas with `pip` as ```bash pip install ragas ``` -If you want to install the latest version (from the main branch) +If you want to play around with the latest and greatest, install the latest version (from the main branch) ```bash pip install git+https://github.com/explodinggradients/ragas.git ``` @@ -18,3 +18,5 @@ git clone https://github.com/explodinggradients/ragas.git cd ragas pip install -e . ``` + +Next let's build a [synthetic testset](get-started-testset-generation) with your own data or If you brought your own testset, lets learn how you can [evaluate it](get-started-evaluation) with Ragas. diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index 37593c6bf..e622bb377 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -1,14 +1,20 @@ (get-started-monitoring)= -# Monitoring +# Monitor Your RAG in Production -Maintaining the quality and performance of an LLM application in a production environment can be challenging. Ragas provides with basic building blocks that you can use for production quality monitoring, offering valuable insights into your application's performance. This is achieved by constructing custom, smaller, more cost-effective, and faster models. +Maintaining the quality and performance of a RAG application in a production environment can be challenging. Ragas currently provides the basic building blocks that you can use for production quality monitoring, offering valuable insights into your application's performance. But we are also working towards building more advanced production monitoring solution by trying to tackle 3 questions + +1. How can we keep the distribution of your production dataset consistent with your testset. +2. How can we effectively extract insights from explicit and implicit signals your users provide to infer the quality of your RAG application and the areas that need your attention. +3. Constructing custom, smaller, more cost-effective, and faster models for evalution and more more advanced testset generation. :::{note} -This is feature is still in beta access. You can requests for -[**early access**](https://calendly.com/shahules/30min) to try it out. +We are still building out and gathering feedback in upcoming releases. You can requests for +[**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area, we would love your to hear your thoughts/challenges here. ::: -The Ragas metrics can also be used with other LLM observability tools like +#TODO: add list of monitoring integration. + +You can also use the Ragas metrics with other LLM observability tools like [Langsmith](https://www.langchain.com/langsmith) and [Langfuse](https://langfuse.com/) to get model-based feedback about various aspects of you application like those mentioned below diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index 8735b3d12..09bc9727c 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -1,5 +1,5 @@ (get-started-testset-generation)= -# Synthetic test data generation +# Generate Synthetic a Testset This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. 
To achieve this, we will utilize open-ai models, so please ensure you have your OpenAI API key ready and accessible within your environment. From d5cd52455dfb14e2b04eadf85e7f375922cfecb1 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Thu, 15 Feb 2024 16:28:05 -0800 Subject: [PATCH 02/12] fix headings --- docs/getstarted/evaluation.md | 2 +- docs/getstarted/index.md | 6 +++--- docs/getstarted/monitoring.md | 2 +- docs/getstarted/testset_generation.md | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index bafb099e6..5f029f8c5 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,5 +1,5 @@ (get-started-evaluation)= -# Evaluate Your Testset +# Evaluate your Testset Welcome to the ragas quickstart. We're going to get you up and running with ragas as quickly as you can so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline. diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index d4f784267..9ebee92c9 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -25,21 +25,21 @@ If you have any questions about Ragas, feel free to join and ask in the Let’s get started! 🏁 -:::{card} Synthetic Test data Generation +:::{card} Generate a Synthetic Testset :link: get-started-testset-generation :link-type: ref If you want to learn how to generate a synthetic testset to get started. ::: -:::{card} Ragas Metrics and Evaluation +:::{card} Evaluate your Testset :link: get-started-evaluation :link-type: ref If your are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic). ::: -:::{card} Monitoring +:::{card} Monitor your RAG in Production :link: get-started-monitoring :link-type: ref diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index e622bb377..9b3271438 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -1,5 +1,5 @@ (get-started-monitoring)= -# Monitor Your RAG in Production +# Monitor your RAG in Production Maintaining the quality and performance of a RAG application in a production environment can be challenging. Ragas currently provides the basic building blocks that you can use for production quality monitoring, offering valuable insights into your application's performance. But we are also working towards building more advanced production monitoring solution by trying to tackle 3 questions diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index 09bc9727c..935c43887 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -1,5 +1,5 @@ (get-started-testset-generation)= -# Generate Synthetic a Testset +# Generate a Synthetic Testset This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To achieve this, we will utilize open-ai models, so please ensure you have your OpenAI API key ready and accessible within your environment. 
From 17c199211f8cf1f7f905762545c2bc7d438b4a19 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Thu, 15 Feb 2024 16:59:16 -0800 Subject: [PATCH 03/12] fixed with gpt4 --- docs/getstarted/evaluation.md | 49 +++++++++++++-------------- docs/getstarted/index.md | 17 ++++------ docs/getstarted/testset_generation.md | 14 ++++---- 3 files changed, 37 insertions(+), 43 deletions(-) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index 5f029f8c5..306f57fb5 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,20 +1,19 @@ (get-started-evaluation)= -# Evaluate your Testset +# Evaluate Your Testset -Welcome to the ragas quickstart. We're going to get you up and running with ragas as quickly as you can so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline. +Welcome to the Ragas quickstart. Our aim is to get you up and running with Ragas as quickly as possible, so that you can focus on improving your Retrieval Augmented Generation pipelines while this library ensures your changes are enhancing your entire pipeline. -to kick things of lets start with the data +To kick things off, let's start with the data. :::{note} -Are you using Azure OpenAI endpoints? Then checkout [this quickstart -guide](../howtos/customisations/azure-openai.ipynb) +Are you using Azure OpenAI endpoints? Then check out [this quickstart guide](../howtos/customisations/azure-openai.ipynb). ::: ```bash pip install ragas ``` -Ragas also uses OpenAI for running some metrics so make sure you have your openai key ready and available in your environment +Ragas also uses OpenAI for running some metrics, so ensure you have your OpenAI key ready and available in your environment. ```python import os @@ -22,15 +21,14 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key" ``` ## The Data -For this tutorial we are going to use an example dataset from one of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/). The dataset has the following columns. +For this tutorial, we are going to use an example dataset from one of the baselines we created for the [Financial Opinion Mining and Question Answering (FIQA) Dataset](https://sites.google.com/view/fiqa/). The dataset has the following columns: - question: `list[str]` - These are the questions your RAG pipeline will be evaluated on. - answer: `list[str]` - The answer generated from the RAG pipeline and given to the user. - contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question. - ground_truths: `list[list[str]]` - The ground truth answer to the questions. (only required if you are using context_recall) -Ideally your list of questions should reflect the questions your users give, including those that you have been problematic in the past. - +Ideally, your list of questions should reflect the questions your users ask, including those that have been problematic in the past. ```{code-block} python :caption: import sample dataset @@ -47,14 +45,14 @@ See [testset generation](./testset_generation.md) to learn how to generate your ## Metrics -Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems namely +Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems. -1. 
Retriever: offers `context_precision` and `context_recall` which give you the measure of the performance of your retrieval system. -2. Generator (LLM): offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to the point the answers are to the question. +1. Retriever: Offers `context_precision` and `context_recall` which measure the performance of your retrieval system. +2. Generator (LLM): Offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to the point the answers are to the question. -The harmonic mean of these 4 aspects gives you the **ragas score** which is a single measure of the performance of your QA system across all the important aspects. +The harmonic mean of these 4 aspects gives you the **Ragas score** which is a single measure of the performance of your QA system across all the important aspects. -now lets import these metrics and understand more about what they denote +Now, let's import these metrics and understand more about what they denote. ```{code-block} python :caption: import metrics @@ -65,21 +63,20 @@ from ragas.metrics import ( context_precision, ) ``` -here you can see that we are using 4 metrics, but what do they represent? - -1. faithfulness - the factual consistency of the answer to the context base on the question. -2. context_precision - a measure of how relevant the retrieved context is to the question. Conveys quality of the retrieval pipeline. -3. answer_relevancy - a measure of how relevant the answer is to the question -4. context_recall: measures the ability of the retriever to retrieve all the necessary information needed to answer the question. +Here you can see that we are using 4 metrics, but what do they represent? +1. Faithfulness - The factual consistency of the answer to the context based on the question. +2. Context_precision - A measure of how relevant the retrieved context is to the question. Conveys quality of the retrieval pipeline. +3. Answer_relevancy - A measure of how relevant the answer is to the question. +4. Context_recall - Measures the ability of the retriever to retrieve all the necessary information needed to answer the question. :::{note} -by default these metrics are using OpenAI's API to compute the score. If you using this metric make sure you set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [llm guide](../howtos/customisations/llms.ipynb) to learn more +By default, these metrics are using OpenAI's API to compute the score. If you are using this metric, make sure you set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [LLM guide](../howtos/customisations/llms.ipynb) to learn more. ::: ## Evaluation -Running the evaluation is as simple as calling evaluate on the `Dataset` with the metrics of your choice. +Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice. ```{code-block} python :caption: evaluate using sample dataset @@ -97,9 +94,9 @@ result = evaluate( result ``` -and there you have it, all the scores you need. +And there you have it, all the scores you need. -Now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too! 
+If you want to dig into the results and figure out examples where your pipeline performed poorly or exceptionally well, you can easily convert it into a pandas DataFrame and use your standard analytics tools too! ```{code-block} python :caption: export results @@ -110,6 +107,6 @@ df.head() quickstart-outputs
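A quick way to act on these scores is to slice the exported DataFrame for weak samples — a minimal sketch, assuming the DataFrame exposes one column per metric (e.g. `faithfulness`) alongside the `question` and `answer` columns:

```{code-block} python
:caption: find low-scoring samples (illustrative)
df = result.to_pandas()
# rows where the answer is poorly grounded in the retrieved context
low_faithfulness = df[df["faithfulness"] < 0.5]
low_faithfulness[["question", "answer"]].head()
```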

-And thats it! +And that's it! -If you have any suggestion/feedbacks/things your not happy about, please do share it in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁 +If you have any suggestions, feedback, or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input. diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index 9ebee92c9..b218a5378 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -10,20 +10,17 @@ evaluation.md monitoring.md ::: -Welcome to the Ragas tutorials! If your news to Ragas the Get Started guides will walk you through the fundamentals of working with Ragas. These tutorials do assume basic -knowledge of Python and Retrieval Augmented Generation (RAG) pipelines. +Welcome to the Ragas tutorials! If you're new to Ragas, the Get Started guides will walk you through the fundamentals of working with Ragas. These tutorials assume basic knowledge of Python and Retrieval Augmented Generation (RAG) pipelines. -Before you go further make sure you have [Ragas installed](./install.md)! +Before you proceed further, make sure you have [Ragas installed](./install.md)! :::{note} -The tutorials only give you on overview of what you can do with ragas and the basic skill you need to use it. If you want an in-depth explanation of the core-concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also checkout the [How-to Guides](../howtos/index.md) if you want to specific applications of Ragas. +The tutorials only give you an overview of what you can do with Ragas and the basic skills needed to use it. If you want an in-depth explanation of the core concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also check out the [How-to Guides](../howtos/index.md) if you want specific applications of Ragas. ::: +If you have any questions about Ragas, feel free to join and ask in the `#questions` channel in our discord community. -If you have any questions about Ragas, feel free to join and ask in the -`#questions` channel in our discord community ❤ . - -Let’s get started! 🏁 +Let’s get started! :::{card} Generate a Synthetic Testset :link: get-started-testset-generation @@ -36,12 +33,12 @@ If you want to learn how to generate a synthetic testset to get started. :link: get-started-evaluation :link-type: ref -If your are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic). +If you are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic). ::: :::{card} Monitor your RAG in Production :link: get-started-monitoring :link-type: ref -If you curious about monitoring the performance and quality of your RAG application in production. +If you're curious about monitoring the performance and quality of your RAG application in production. ::: diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index 935c43887..fe4fe1dbc 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -1,7 +1,7 @@ (get-started-testset-generation)= -# Generate a Synthetic Testset +# Generate a Synthetic Test Set -This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To achieve this, we will utilize open-ai models, so please ensure you have your OpenAI API key ready and accessible within your environment. 
+This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To accomplish this, we will utilize OpenAI models. Please ensure you have your OpenAI API key ready and accessible within your environment. ```{code-block} python import os @@ -11,7 +11,7 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key" ## Documents -To begin, we require a collection of documents to generate synthetic Question/Context/Answer samples. Here, we will employ the langchain document loader to load documents. +We first need a collection of documents to generate synthetic `Question/Context/Answer/Ground_Truth` samples. For this, we'll use the LangChain document loader to load documents. ```{code-block} python :caption: Load documents from directory @@ -21,9 +21,9 @@ documents = loader.load() ``` :::{note} -Each Document object contains a metadata dictionary, which can be used to store additional information about the document which can be accessed with `Document.metadata`. Please ensure that the metadata dictionary contains a key called `file_name` as this will be used in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For example, pages belonging to the same research publication can be identifies using filename. +Each Document object contains a metadata dictionary, which can be used to store additional information about the document accessible via `Document.metadata`. Please ensure that the metadata dictionary contains a key called `file_name`, as this will be used in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using filename. -An example of how to do this is shown below. +Here's an example of how to do this: ```{code-block} python for document in documents: @@ -31,11 +31,11 @@ for document in documents: ``` ::: -At this point, we have a set of documents at our disposal, which will serve as the basis for creating synthetic Question/Context/Answer triplets. +At this stage, we have a set of documents ready, which will be used as the foundation for creating synthetic Question/Context/Answer/Ground_Truth samples. ## Data Generation -We will now import and use Ragas' `Testsetgenerator` to promptly generate a synthetic test set from the loaded documents. +We will now import and use Ragas' `TestsetGenerator` to swiftly generate a synthetic test set from the loaded documents. ```{code-block} python :caption: Create 10 samples using default configuration From d21a03e5cfbee1d9aa4e734b7b925579c87954fd Mon Sep 17 00:00:00 2001 From: jjmachan Date: Thu, 15 Feb 2024 18:34:11 -0800 Subject: [PATCH 04/12] monitoring --- docs/getstarted/monitoring.md | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-) diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index 9b3271438..8e869a310 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -1,33 +1,29 @@ (get-started-monitoring)= # Monitor your RAG in Production -Maintaining the quality and performance of a RAG application in a production environment can be challenging. Ragas currently provides the basic building blocks that you can use for production quality monitoring, offering valuable insights into your application's performance. 
But we are also working towards building more advanced production monitoring solution by trying to tackle 3 questions +Maintaining the quality and performance of a RAG application in a production environment can be challenging. RAG currently provides the essential building blocks that you can use for production-quality monitoring, offering valuable insights into your application's performance. However, we are also working towards building a more advanced production monitoring solution by addressing three questions: -1. How can we keep the distribution of your production dataset consistent with your testset. -2. How can we effectively extract insights from explicit and implicit signals your users provide to infer the quality of your RAG application and the areas that need your attention. -3. Constructing custom, smaller, more cost-effective, and faster models for evalution and more more advanced testset generation. +1. How can we ensure the distribution of your production dataset remains consistent with your test set? +2. How can we effectively extract insights from explicit and implicit signals your users provide to infer the quality of your RAG application and identify areas that require attention? +3. How can we construct custom, smaller, more cost-effective and faster models for evaluation and more advanced test set generation? :::{note} -We are still building out and gathering feedback in upcoming releases. You can requests for -[**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area, we would love your to hear your thoughts/challenges here. +We are still developing and gathering feedback for upcoming releases. You can request +[**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area. We would love to hear your thoughts and challenges. ::: -#TODO: add list of monitoring integration. +Additionally, you can use the RAG metrics with other Machine Learning Model (MLM) observability tools like +- [Langsmith](../howtos/integrations/langsmith.ipynb) +- [Phoenix (Arize)](https://github.com/Arize-ai/phoenix) +- [Langfuse](../howtos/integrations/langfuse.ipynb) +- [OpenLayer](https://openlayer.com/) -You can also use the Ragas metrics with other LLM observability tools like -[Langsmith](https://www.langchain.com/langsmith) and -[Langfuse](https://langfuse.com/) to get model-based feedback about various -aspects of you application like those mentioned below - -:::{seealso} -[Langfuse Integration](../howtos/integrations/langfuse.ipynb) to see Ragas -monitoring in action within the Langfuse dashboard and how to set it up -::: +to get model-based feedback about various aspects of your application, such as those mentioned below: ## Aspects to Monitor 1. Faithfulness: This feature assists in identifying and quantifying instances of hallucinations. 2. Bad retrieval: This feature helps identify and quantify poor context retrievals. -3. Bad response: This feature helps in recognizing and quantifying evasive, harmful, or toxic responses. -4. Bad format: This feature helps in detecting and quantifying responses with incorrect formatting. -5. Custom use-case: For monitoring other critical aspects that are specific to your use case. [Talk to founders](https://calendly.com/shahules/30min) +3. Bad response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses. +4. 
Bad format: This feature enables the detection and quantification of responses with incorrect formatting. +5. Custom use-case: For monitoring other critical aspects that are specific to your use case, [Talk to founders](https://calendly.com/shahules/30min). From c6f49df21e958a7383940b181b2b01287beba1cf Mon Sep 17 00:00:00 2001 From: jjmachan Date: Thu, 15 Feb 2024 18:35:25 -0800 Subject: [PATCH 05/12] evaluations --- docs/getstarted/evaluation.md | 62 ++++++++++++++++------------------- 1 file changed, 28 insertions(+), 34 deletions(-) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index 306f57fb5..674e54b3c 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,32 +1,28 @@ (get-started-evaluation)= -# Evaluate Your Testset +# Evaluating Your Test Set -Welcome to the Ragas quickstart. Our aim is to get you up and running with Ragas as quickly as possible, so that you can focus on improving your Retrieval Augmented Generation pipelines while this library ensures your changes are enhancing your entire pipeline. +Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. Our aim is to help you set up with Ragas as quickly as possible so that you can focus on enhancing your Retrieval Augmented Generation pipelines while this library ensures your changes are improving the entire pipeline. -To kick things off, let's start with the data. - -:::{note} -Are you using Azure OpenAI endpoints? Then check out [this quickstart guide](../howtos/customisations/azure-openai.ipynb). -::: - -```bash -pip install ragas -``` - -Ragas also uses OpenAI for running some metrics, so ensure you have your OpenAI key ready and available in your environment. +This guide uses OpenAI for running some metrics, so make sure you have your OpenAI key ready and available in your environment. ```python import os os.environ["OPENAI_API_KEY"] = "your-openai-key" ``` +:::{note} +By default, these metrics use OpenAI's API to compute the score. If you're using this metric, ensure that you've set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [LLM guide](../howtos/customisations/llms.ipynb) to learn more. +::: + +Let's start with the data. + ## The Data -For this tutorial, we are going to use an example dataset from one of the baselines we created for the [Financial Opinion Mining and Question Answering (FIQA) Dataset](https://sites.google.com/view/fiqa/). The dataset has the following columns: +For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns: - question: `list[str]` - These are the questions your RAG pipeline will be evaluated on. -- answer: `list[str]` - The answer generated from the RAG pipeline and given to the user. +- answer: `list[str]` - The answer generated from the RAG pipeline and provided to the user. - contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question. -- ground_truths: `list[list[str]]` - The ground truth answer to the questions. (only required if you are using context_recall) +- ground_truth: `list[str]` - The ground truth answer to the questions. 
Ideally, your list of questions should reflect the questions your users ask, including those that have been problematic in the past. @@ -40,17 +36,17 @@ amnesty_qa ``` :::{seealso} -See [testset generation](./testset_generation.md) to learn how to generate your own synthetic data for evaluation. +See [test set generation](./testset_generation.md) to learn how to generate your own synthetic data for evaluation. ::: ## Metrics -Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems. +Ragas provides several metrics to evaluate various aspects of your RAG systems: -1. Retriever: Offers `context_precision` and `context_recall` which measure the performance of your retrieval system. -2. Generator (LLM): Offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to the point the answers are to the question. +1. Retriever: Offers `context_precision` and `context_recall` that measure the performance of your retrieval system. +2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how on point the answers are to the question. -The harmonic mean of these 4 aspects gives you the **Ragas score** which is a single measure of the performance of your QA system across all the important aspects. +There are numerous other metrics available in Ragas, check the [metrics guide](ragas-metrics) to learn more. Now, let's import these metrics and understand more about what they denote. @@ -63,20 +59,18 @@ from ragas.metrics import ( context_precision, ) ``` -Here you can see that we are using 4 metrics, but what do they represent? +Here we're using four metrics, but what do they represent? -1. Faithfulness - The factual consistency of the answer to the context based on the question. -2. Context_precision - A measure of how relevant the retrieved context is to the question. Conveys quality of the retrieval pipeline. -3. Answer_relevancy - A measure of how relevant the answer is to the question. -4. Context_recall - Measures the ability of the retriever to retrieve all the necessary information needed to answer the question. +1. Faithfulness - Measures the factual consistency of the answer to the context based on the question. +2. Context_precision - Measures how relevant the retrieved context is to the question, conveying the quality of the retrieval pipeline. +3. Answer_relevancy - Measures how relevant the answer is to the question. +4. Context_recall - Measures the retriever's ability to retrieve all necessary information required to answer the question. -:::{note} -By default, these metrics are using OpenAI's API to compute the score. If you are using this metric, make sure you set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [LLM guide](../howtos/customisations/llms.ipynb) to learn more. -::: +To explore other metrics, check the [metrics guide](ragas-metrics). ## Evaluation -Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice. +Running the evaluation is as simple as calling `evaluate` on the `Dataset` with your chosen metrics. ```{code-block} python :caption: evaluate using sample dataset @@ -94,9 +88,9 @@ result = evaluate( result ``` -And there you have it, all the scores you need. +There you have it, all the scores you need. 
-If you want to dig into the results and figure out examples where your pipeline performed poorly or exceptionally well, you can easily convert it into a pandas DataFrame and use your standard analytics tools too! +If you want to delve deeper into the results and identify examples where your pipeline performed poorly or exceptionally well, you can convert it into a pandas DataFrame and use your standard analytics tools! ```{code-block} python :caption: export results @@ -107,6 +101,6 @@ df.head() quickstart-outputs

-And that's it! +That's all! -If you have any suggestions, feedback, or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input. +If you have any suggestions, feedback or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input. From 2a3bf68a759d894c0893b52cc049fb487a9a3faf Mon Sep 17 00:00:00 2001 From: jjmachan Date: Thu, 15 Feb 2024 18:36:30 -0800 Subject: [PATCH 06/12] testset --- docs/getstarted/testset_generation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index fe4fe1dbc..3a1347f38 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -31,7 +31,7 @@ for document in documents: ``` ::: -At this stage, we have a set of documents ready, which will be used as the foundation for creating synthetic Question/Context/Answer/Ground_Truth samples. +At this stage, we have a set of documents ready, which will be used as the foundation for creating synthetic `Question/Context/Answer/Ground_Truth` samples. ## Data Generation From 978065630b22ea88b1f454768d847fa48788a3e7 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 00:20:12 -0800 Subject: [PATCH 07/12] added emojies --- docs/community/index.md | 2 +- docs/concepts/index.md | 2 +- docs/getstarted/index.md | 2 +- docs/getstarted/monitoring.md | 2 +- docs/howtos/index.md | 2 +- docs/references/index.rst | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/community/index.md b/docs/community/index.md index 5b23ea666..e7ad35248 100644 --- a/docs/community/index.md +++ b/docs/community/index.md @@ -1,5 +1,5 @@ (community)= -# Community ❤️ +# ❤️ Community **"Alone we can do so little; together we can do so much." - Helen Keller** diff --git a/docs/concepts/index.md b/docs/concepts/index.md index f91e82d1f..fc62a9a18 100644 --- a/docs/concepts/index.md +++ b/docs/concepts/index.md @@ -1,5 +1,5 @@ (core-concepts)= -# Core Concepts +# 📚 Core Concepts :::{toctree} :caption: Concepts :hidden: diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index b218a5378..fd190a04a 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -1,5 +1,5 @@ (get-started)= -# Get Started +# 🚀 Get Started :::{toctree} :maxdepth: 1 diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index 8e869a310..8962db96e 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -14,7 +14,7 @@ We are still developing and gathering feedback for upcoming releases. You can re Additionally, you can use the RAG metrics with other Machine Learning Model (MLM) observability tools like - [Langsmith](../howtos/integrations/langsmith.ipynb) -- [Phoenix (Arize)](https://github.com/Arize-ai/phoenix) +- [Phoenix (Arize)](../howtos/integrations/ragas-arize.ipynb) - [Langfuse](../howtos/integrations/langfuse.ipynb) - [OpenLayer](https://openlayer.com/) diff --git a/docs/howtos/index.md b/docs/howtos/index.md index 4586ce894..878d64111 100644 --- a/docs/howtos/index.md +++ b/docs/howtos/index.md @@ -1,5 +1,5 @@ (how-to-guides)= -# How-to Guides +# 🛠️ How-to Guides The how-to guides offer a more comprehensive overview of all the tools Ragas diff --git a/docs/references/index.rst b/docs/references/index.rst index 866c72f88..ac29beb22 100644 --- a/docs/references/index.rst +++ b/docs/references/index.rst @@ -1,5 +1,5 @@ .. 
_references: -References +📖 References ========== Reference documents for the ``ragas`` package. From 7507067a1e9629622b7ca64d560721048d077c71 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 17:14:52 -0800 Subject: [PATCH 08/12] fix autoreload --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index b273ec94d..d247f9605 100644 --- a/Makefile +++ b/Makefile @@ -37,7 +37,7 @@ docs-site: ## Build and serve documentation @sphinx-build -nW --keep-going -j 4 -b html $(GIT_ROOT)/docs/ $(GIT_ROOT)/docs/_build/html @python -m http.server --directory $(GIT_ROOT)/docs/_build/html watch-docs: ## Build and watch documentation - sphinx-autobuild docs docs/_build/html --watch $(GIT_ROOT)/src/ --ignore ".ipynb" + sphinx-autobuild docs docs/_build/html --watch $(GIT_ROOT)/src/ --ignore "_build" # Benchmarks run-benchmarks-eval: ## Run benchmarks for Evaluation From ce8f9733b04300b5d6e1235ca1a034be53081334 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 17:27:46 -0800 Subject: [PATCH 09/12] fix reload --- Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index d247f9605..135942143 100644 --- a/Makefile +++ b/Makefile @@ -37,7 +37,8 @@ docs-site: ## Build and serve documentation @sphinx-build -nW --keep-going -j 4 -b html $(GIT_ROOT)/docs/ $(GIT_ROOT)/docs/_build/html @python -m http.server --directory $(GIT_ROOT)/docs/_build/html watch-docs: ## Build and watch documentation - sphinx-autobuild docs docs/_build/html --watch $(GIT_ROOT)/src/ --ignore "_build" + rm -rf $(GIT_ROOT)/docs/_build/{html, jupyter_execute} + sphinx-autobuild docs docs/_build/html --watch $(GIT_ROOT)/src/ --ignore "_build" --open-browser # Benchmarks run-benchmarks-eval: ## Run benchmarks for Evaluation From 78907623cb5d95976f7c039f9ab2fcb95a0c41ed Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 19:10:34 -0800 Subject: [PATCH 10/12] fixed docs with review --- docs/getstarted/evaluation.md | 14 +++++++------- docs/getstarted/index.md | 18 ++++++++--------- docs/getstarted/monitoring.md | 28 ++++++++++++++++----------- docs/getstarted/testset_generation.md | 10 +++++----- 4 files changed, 38 insertions(+), 32 deletions(-) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index 674e54b3c..4d62f54c4 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,9 +1,9 @@ (get-started-evaluation)= -# Evaluating Your Test Set +# Evaluating Using Your Test Set -Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. Our aim is to help you set up with Ragas as quickly as possible so that you can focus on enhancing your Retrieval Augmented Generation pipelines while this library ensures your changes are improving the entire pipeline. +Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. The purpose of this guide is to assist you in setting up with Ragas as quickly as possible, enabling you to concentrate on enhancing your Retrieval Augmented Generation pipelines while this library ensures your modifications are improving the entire pipeline. -This guide uses OpenAI for running some metrics, so make sure you have your OpenAI key ready and available in your environment. 
+This guide utilizes OpenAI for running some metrics, so ensure you have your OpenAI key ready and available in your environment. ```python import os @@ -13,7 +13,7 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key" By default, these metrics use OpenAI's API to compute the score. If you're using this metric, ensure that you've set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [LLM guide](../howtos/customisations/llms.ipynb) to learn more. ::: -Let's start with the data. +Let's begin with the data. ## The Data @@ -24,7 +24,7 @@ For this tutorial, we'll use an example dataset from one of the baselines we cre - contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question. - ground_truth: `list[str]` - The ground truth answer to the questions. -Ideally, your list of questions should reflect the questions your users ask, including those that have been problematic in the past. +An ideal test data set should contain samples that closely mirror your real-world use case. ```{code-block} python :caption: import sample dataset @@ -36,7 +36,7 @@ amnesty_qa ``` :::{seealso} -See [test set generation](./testset_generation.md) to learn how to generate your own synthetic data for evaluation. +See [test set generation](./testset_generation.md) to learn how to generate your own `Question/Context/Ground_Truth` triplets for evaluation. ::: ## Metrics @@ -44,7 +44,7 @@ See [test set generation](./testset_generation.md) to learn how to generate your Ragas provides several metrics to evaluate various aspects of your RAG systems: 1. Retriever: Offers `context_precision` and `context_recall` that measure the performance of your retrieval system. -2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how on point the answers are to the question. +2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how relevant the answers are to the question. There are numerous other metrics available in Ragas, check the [metrics guide](ragas-metrics) to learn more. diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index fd190a04a..ab1034a9b 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -12,13 +12,13 @@ monitoring.md Welcome to the Ragas tutorials! If you're new to Ragas, the Get Started guides will walk you through the fundamentals of working with Ragas. These tutorials assume basic knowledge of Python and Retrieval Augmented Generation (RAG) pipelines. -Before you proceed further, make sure you have [Ragas installed](./install.md)! +Before you proceed further, ensure that you have [Ragas installed](./install.md)! :::{note} -The tutorials only give you an overview of what you can do with Ragas and the basic skills needed to use it. If you want an in-depth explanation of the core concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also check out the [How-to Guides](../howtos/index.md) if you want specific applications of Ragas. +The tutorials only provide an overview of what you can accomplish with Ragas and the basic skills needed to utilize it effectively. For an in-depth explanation of the core concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also explore the [How-to Guides](../howtos/index.md) for specific applications of Ragas. 
::: -If you have any questions about Ragas, feel free to join and ask in the `#questions` channel in our discord community. +If you have any questions about Ragas, feel free to join and ask in the `#questions` channel in our Discord community. Let’s get started! @@ -26,19 +26,19 @@ Let’s get started! :link: get-started-testset-generation :link-type: ref -If you want to learn how to generate a synthetic testset to get started. +Learn how to generate a synthetic testset to get started. ::: -:::{card} Evaluate your Testset +:::{card} Evaluate Using Your Testset :link: get-started-evaluation :link-type: ref -If you are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic). +Find out how to evaluate your RAG pipeline against your testset (your own dataset or synthetic). ::: -:::{card} Monitor your RAG in Production +:::{card} Monitor Your RAG in Production :link: get-started-monitoring :link-type: ref -If you're curious about monitoring the performance and quality of your RAG application in production. -::: +Discover how to monitor the performance and quality of your RAG application in production. +::: \ No newline at end of file diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index 8962db96e..8ecde99e0 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -1,29 +1,35 @@ (get-started-monitoring)= -# Monitor your RAG in Production +# Monitor Your RAG in Production -Maintaining the quality and performance of a RAG application in a production environment can be challenging. RAG currently provides the essential building blocks that you can use for production-quality monitoring, offering valuable insights into your application's performance. However, we are also working towards building a more advanced production monitoring solution by addressing three questions: +Maintaining the quality and performance of a RAG application in a production environment is challenging. RAG currently provides the essential building blocks for production-quality monitoring, offering valuable insights into your application's performance. However, we are also working towards building a more advanced production monitoring solution by addressing three key areas: -1. How can we ensure the distribution of your production dataset remains consistent with your test set? -2. How can we effectively extract insights from explicit and implicit signals your users provide to infer the quality of your RAG application and identify areas that require attention? -3. How can we construct custom, smaller, more cost-effective and faster models for evaluation and more advanced test set generation? +1. How to ensure the distribution of your production dataset remains consistent with your test set. +2. How to effectively extract insights from the explicit and implicit signals your users provide to infer the quality of your RAG application and identify areas that require attention. +3. How to construct custom, smaller, more cost-effective, and faster models for evaluation and advanced test set generation. :::{note} We are still developing and gathering feedback for upcoming releases. You can request [**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area. We would love to hear your thoughts and challenges. 
::: -Additionally, you can use the RAG metrics with other Machine Learning Model (MLM) observability tools like +Additionally, you can use the RAG metrics with other Machine Learning Model (MLM) observability tools like: + - [Langsmith](../howtos/integrations/langsmith.ipynb) - [Phoenix (Arize)](../howtos/integrations/ragas-arize.ipynb) - [Langfuse](../howtos/integrations/langfuse.ipynb) - [OpenLayer](https://openlayer.com/) -to get model-based feedback about various aspects of your application, such as those mentioned below: +These tools can provide model-based feedback about various aspects of your application, such as those mentioned below: ## Aspects to Monitor 1. Faithfulness: This feature assists in identifying and quantifying instances of hallucinations. -2. Bad retrieval: This feature helps identify and quantify poor context retrievals. -3. Bad response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses. -4. Bad format: This feature enables the detection and quantification of responses with incorrect formatting. -5. Custom use-case: For monitoring other critical aspects that are specific to your use case, [Talk to founders](https://calendly.com/shahules/30min). +2. Bad Retrieval: This feature helps identify and quantify poor context retrievals. +3. Bad Response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses. +4. Bad Format: This feature enables the detection and quantification of responses with incorrect formatting. +5. Custom Use-Case: For monitoring other critical aspects that are specific to your use-case, [Talk to founders](https://calendly.com/shahules/30min). + +Note: +- "Evaluate your test set" has been replaced with "Evaluate using your test set" to clarify that the evaluation is conducted using the test set, not on the quality of the test set itself. +- Phrases such as "How can we" have been replaced with "How to" to make the content more direct and actionable. +- The term "Answer using synthetic data generation" has been replaced with "Question/Context/Ground_Truth triplets". \ No newline at end of file diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index 3a1347f38..bc4921580 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -1,7 +1,7 @@ (get-started-testset-generation)= # Generate a Synthetic Test Set -This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To accomplish this, we will utilize OpenAI models. Please ensure you have your OpenAI API key ready and accessible within your environment. +This tutorial is designed to assist you in creating a synthetic evaluation dataset for assessing your RAG pipeline. For this purpose, we will utilize OpenAI models. Ensure that you have your OpenAI API key readily accessible within your environment. ```{code-block} python import os @@ -11,7 +11,7 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key" ## Documents -We first need a collection of documents to generate synthetic `Question/Context/Answer/Ground_Truth` samples. For this, we'll use the LangChain document loader to load documents. +Firstly, we require a collection of documents to generate synthetic `Question/Context/Ground_Truth` samples. For this, we will use the LangChain document loader to load documents. 
```{code-block} python :caption: Load documents from directory @@ -21,7 +21,7 @@ documents = loader.load() ``` :::{note} -Each Document object contains a metadata dictionary, which can be used to store additional information about the document accessible via `Document.metadata`. Please ensure that the metadata dictionary contains a key called `file_name`, as this will be used in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using filename. +Each Document object contains a metadata dictionary, which can be used to store supplementary information about the document accessible via `Document.metadata`. Ensure that the metadata dictionary contains a key called `file_name`, as this will be utilized in the generation process. The `file_name` attribute in metadata is employed to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using the filename. Here's an example of how to do this: @@ -31,7 +31,7 @@ for document in documents: ``` ::: -At this stage, we have a set of documents ready, which will be used as the foundation for creating synthetic `Question/Context/Answer/Ground_Truth` samples. +At this stage, we have a set of documents ready, which will be employed as the basis for creating synthetic `Question/Context/Ground_Truth` samples. ## Data Generation @@ -57,4 +57,4 @@ testset.to_pandas() ```

test-outputs -
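It is often useful to persist the generated samples so they can be reviewed or reused later for evaluation — a minimal sketch building on the `to_pandas()` export shown above (the file name is illustrative):

```{code-block} python
:caption: save the generated test set (illustrative)
test_df = testset.to_pandas()
# keep a copy of the synthetic Question/Context/Ground_Truth samples for review
test_df.to_csv("synthetic_testset.csv", index=False)
```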

+

\ No newline at end of file From e057a918a4bb437e93a4901e586125c2aa5f5216 Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 19:21:14 -0800 Subject: [PATCH 11/12] fix feedback from rewview with gpt4 --- docs/alfred.py | 62 +++++++++++++++++++++++++++ docs/getstarted/evaluation.md | 7 ++- docs/getstarted/index.md | 6 +-- docs/getstarted/install.md | 15 ++++--- docs/getstarted/monitoring.md | 10 ++--- docs/getstarted/testset_generation.md | 14 +++--- 6 files changed, 88 insertions(+), 26 deletions(-) create mode 100644 docs/alfred.py diff --git a/docs/alfred.py b/docs/alfred.py new file mode 100644 index 000000000..5ea41a560 --- /dev/null +++ b/docs/alfred.py @@ -0,0 +1,62 @@ +from __future__ import annotations + +import os +from collections import namedtuple +import asyncio +from tqdm.asyncio import tqdm +import typing as t +from langchain_openai.chat_models import ChatOpenAI +from langchain_core.language_models.chat_models import BaseChatModel +from langchain.prompts import ChatPromptTemplate + +File = namedtuple("File", "name content") + + +def get_files(path: str, ext: str) -> list: + return [os.path.join(path, f) for f in os.listdir(path) if f.endswith(ext)] + + +def load_docs(path: str) -> t.List[File]: + files = [*get_files(path, ".md")] + docs = [] + for file in files: + with open(file, "r") as f: + docs.append(File(file, f.read())) + return docs + + +async def fix_doc_with_llm(doc: File, llm: BaseChatModel) -> File: + prompt = """\ +fix the following grammar and spelling mistakes in the following text. +Please keep the markdown format intact when reformating it. +Do not make any change to the parts of text that are for formating or additional metadata for the core text in markdown. +The target audience for this is developers so keep the tone serious and to the point without any marketing terms. +The output text should me in .md format. + +text: {text} +""" + fix_docs_prompt = ChatPromptTemplate.from_messages( + [ + (prompt), + ] + ) + # get output + fixed_doc = await llm.ainvoke(fix_docs_prompt.format_messages(text=doc.content)) + return File(doc.name, fixed_doc.content) + + +async def main(docs: t.List[File], llm: BaseChatModel): + fix_doc_routines = [fix_doc_with_llm(doc, llm) for doc in docs] + return await tqdm.gather(*fix_doc_routines) + + +if __name__ == "__main__": + """ + Helpful assistant for documentation review and more (hopefully in the future). + """ + gpt4 = ChatOpenAI(model="gpt-4") + docs = load_docs("./getstarted/") + fix_docs = asyncio.run(main(docs, gpt4)) + for doc in fix_docs: + with open(doc.name, "w") as f: + f.write(doc.content) diff --git a/docs/getstarted/evaluation.md b/docs/getstarted/evaluation.md index 4d62f54c4..2086c1fc3 100644 --- a/docs/getstarted/evaluation.md +++ b/docs/getstarted/evaluation.md @@ -1,7 +1,7 @@ (get-started-evaluation)= # Evaluating Using Your Test Set -Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. The purpose of this guide is to assist you in setting up with Ragas as quickly as possible, enabling you to concentrate on enhancing your Retrieval Augmented Generation pipelines while this library ensures your modifications are improving the entire pipeline. +Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. 
This guide assists you in setting up Ragas as quickly as possible, enabling you to focus on enhancing your Retrieval Augmented Generation pipelines while this library ensures that your modifications are improving the entire pipeline. This guide utilizes OpenAI for running some metrics, so ensure you have your OpenAI key ready and available in your environment. @@ -20,8 +20,7 @@ Let's begin with the data. For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns: - question: `list[str]` - These are the questions your RAG pipeline will be evaluated on. -- answer: `list[str]` - The answer generated from the RAG pipeline and provided to the user. -- contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question. +- context: `list[list[str]]` - The contexts which were passed into the LLM to answer the question. - ground_truth: `list[str]` - The ground truth answer to the questions. An ideal test data set should contain samples that closely mirror your real-world use case. @@ -103,4 +102,4 @@ df.head() That's all! -If you have any suggestions, feedback or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input. +If you have any suggestions, feedback, or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input. \ No newline at end of file diff --git a/docs/getstarted/index.md b/docs/getstarted/index.md index ab1034a9b..55a3d9836 100644 --- a/docs/getstarted/index.md +++ b/docs/getstarted/index.md @@ -20,20 +20,20 @@ The tutorials only provide an overview of what you can accomplish with Ragas and If you have any questions about Ragas, feel free to join and ask in the `#questions` channel in our Discord community. -Let’s get started! +Let's get started! :::{card} Generate a Synthetic Testset :link: get-started-testset-generation :link-type: ref -Learn how to generate a synthetic testset to get started. +Learn how to generate `Question/Context/Ground_Truth` triplets to get started. ::: :::{card} Evaluate Using Your Testset :link: get-started-evaluation :link-type: ref -Find out how to evaluate your RAG pipeline against your testset (your own dataset or synthetic). +Find out how to evaluate your RAG pipeline using your test set (your own dataset or synthetic). ::: :::{card} Monitor Your RAG in Production diff --git a/docs/getstarted/install.md b/docs/getstarted/install.md index f725b4c78..712f746e0 100644 --- a/docs/getstarted/install.md +++ b/docs/getstarted/install.md @@ -1,22 +1,23 @@ -# Install +# Installation + +To get started, install Ragas using `pip` with the following command: -To get started, install ragas with `pip` as ```bash pip install ragas ``` -If you want to play around with the latest and greatest, install the latest version (from the main branch) +If you'd like to experiment with the latest features, install the most recent version from the main branch: + ```bash pip install git+https://github.com/explodinggradients/ragas.git ``` -If you are looking to contribute and make changes to the code, make sure you -clone the repo and install it as [editable -install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs). 
+If you're planning to contribute and make modifications to the code, ensure that you clone the repository and set it up as an [editable install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs). + ```bash git clone https://github.com/explodinggradients/ragas.git cd ragas pip install -e . ``` -Next let's build a [synthetic testset](get-started-testset-generation) with your own data or If you brought your own testset, lets learn how you can [evaluate it](get-started-evaluation) with Ragas. +Next, let's construct a [synthetic test set](get-started-testset-generation) using your own data. If you've brought your own test set, you can learn how to [evaluate it](get-started-evaluation) using Ragas. \ No newline at end of file diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index 8ecde99e0..dec2ed122 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -12,24 +12,24 @@ We are still developing and gathering feedback for upcoming releases. You can re [**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area. We would love to hear your thoughts and challenges. ::: -Additionally, you can use the RAG metrics with other Machine Learning Model (MLM) observability tools like: +In addition, you can use the RAG metrics with other LLM observability tools like: - [Langsmith](../howtos/integrations/langsmith.ipynb) - [Phoenix (Arize)](../howtos/integrations/ragas-arize.ipynb) - [Langfuse](../howtos/integrations/langfuse.ipynb) - [OpenLayer](https://openlayer.com/) -These tools can provide model-based feedback about various aspects of your application, such as those mentioned below: +These tools can provide model-based feedback about various aspects of your application, such as the ones mentioned below: ## Aspects to Monitor -1. Faithfulness: This feature assists in identifying and quantifying instances of hallucinations. +1. Faithfulness: This feature assists in identifying and quantifying instances of hallucination. 2. Bad Retrieval: This feature helps identify and quantify poor context retrievals. 3. Bad Response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses. 4. Bad Format: This feature enables the detection and quantification of responses with incorrect formatting. -5. Custom Use-Case: For monitoring other critical aspects that are specific to your use-case, [Talk to founders](https://calendly.com/shahules/30min). +5. Custom Use-Case: For monitoring other critical aspects that are specific to your use-case, [Talk to the founders](https://calendly.com/shahules/30min). Note: - "Evaluate your test set" has been replaced with "Evaluate using your test set" to clarify that the evaluation is conducted using the test set, not on the quality of the test set itself. - Phrases such as "How can we" have been replaced with "How to" to make the content more direct and actionable. -- The term "Answer using synthetic data generation" has been replaced with "Question/Context/Ground_Truth triplets". \ No newline at end of file +- The term "Answer using synthetic data generation" has been replaced with "Question/Context/Ground Truth triplets". 
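
For context on the evaluation.md changes earlier in this patch, the flow they describe reduces to the sketch below. The dataset subset name and the particular metric selection are illustrative assumptions rather than part of the patch, and an OpenAI key must be available in the environment for the model-backed metrics.

```python
# Sketch of the evaluation flow described in evaluation.md (illustrative, not part of this patch).
import os

from datasets import load_dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

os.environ["OPENAI_API_KEY"] = "your-openai-key"  # required by the OpenAI-backed metrics

# Baseline dataset referenced in the guide; the subset name here is an assumption.
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

result = evaluate(
    amnesty_qa["eval"],
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)

# Inspect per-sample scores as a DataFrame, as the guide does with df.head().
df = result.to_pandas()
print(df.head())
```
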
\ No newline at end of file diff --git a/docs/getstarted/testset_generation.md b/docs/getstarted/testset_generation.md index bc4921580..5fc0e4073 100644 --- a/docs/getstarted/testset_generation.md +++ b/docs/getstarted/testset_generation.md @@ -1,7 +1,7 @@ (get-started-testset-generation)= # Generate a Synthetic Test Set -This tutorial is designed to assist you in creating a synthetic evaluation dataset for assessing your RAG pipeline. For this purpose, we will utilize OpenAI models. Ensure that you have your OpenAI API key readily accessible within your environment. +This tutorial guides you in creating a synthetic evaluation dataset for assessing your RAG pipeline. For this purpose, we will utilize OpenAI models. Ensure that your OpenAI API key is readily accessible within your environment. ```{code-block} python import os @@ -11,7 +11,7 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key" ## Documents -Firstly, we require a collection of documents to generate synthetic `Question/Context/Ground_Truth` samples. For this, we will use the LangChain document loader to load documents. +Initially, a collection of documents is needed to generate synthetic `Question/Context/Ground_Truth` samples. For this, we'll use the LangChain document loader to load documents. ```{code-block} python :caption: Load documents from directory @@ -21,7 +21,7 @@ documents = loader.load() ``` :::{note} -Each Document object contains a metadata dictionary, which can be used to store supplementary information about the document accessible via `Document.metadata`. Ensure that the metadata dictionary contains a key called `file_name`, as this will be utilized in the generation process. The `file_name` attribute in metadata is employed to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using the filename. +Each Document object contains a metadata dictionary, which can be used to store additional information about the document accessible via `Document.metadata`. Ensure that the metadata dictionary includes a key called `file_name`, as it will be utilized in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using the filename. Here's an example of how to do this: @@ -31,11 +31,11 @@ for document in documents: ``` ::: -At this stage, we have a set of documents ready, which will be employed as the basis for creating synthetic `Question/Context/Ground_Truth` samples. +At this point, we have a set of documents ready to be used as a foundation for generating synthetic `Question/Context/Ground_Truth` samples. ## Data Generation -We will now import and use Ragas' `TestsetGenerator` to swiftly generate a synthetic test set from the loaded documents. +Now, we'll import and use Ragas' `TestsetGenerator` to quickly generate a synthetic test set from the loaded documents. ```{code-block} python :caption: Create 10 samples using default configuration @@ -49,9 +49,9 @@ generator = TestsetGenerator.with_openai() testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) ``` -Subsequently, we can export the results into a Pandas DataFrame. +Then, we can export the results into a Pandas DataFrame. 
-```{code-block} +```{code-block} python :caption: Export to Pandas testset.to_pandas() ``` From 1464cd716e95da0a7f4e15a65fa05546acc237ae Mon Sep 17 00:00:00 2001 From: jjmachan Date: Fri, 16 Feb 2024 19:25:44 -0800 Subject: [PATCH 12/12] fix extra note --- docs/getstarted/monitoring.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/docs/getstarted/monitoring.md b/docs/getstarted/monitoring.md index dec2ed122..37f10b393 100644 --- a/docs/getstarted/monitoring.md +++ b/docs/getstarted/monitoring.md @@ -28,8 +28,3 @@ These tools can provide model-based feedback about various aspects of your appli 3. Bad Response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses. 4. Bad Format: This feature enables the detection and quantification of responses with incorrect formatting. 5. Custom Use-Case: For monitoring other critical aspects that are specific to your use-case, [Talk to the founders](https://calendly.com/shahules/30min). - -Note: -- "Evaluate your test set" has been replaced with "Evaluate using your test set" to clarify that the evaluation is conducted using the test set, not on the quality of the test set itself. -- Phrases such as "How can we" have been replaced with "How to" to make the content more direct and actionable. -- The term "Answer using synthetic data generation" has been replaced with "Question/Context/Ground Truth triplets". \ No newline at end of file
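
Pulled together, the testset_generation.md changes in this series describe the following end-to-end generation flow. The consolidation below is a sketch for reference only, not an additional change to the docs; it assumes a placeholder document directory and an OpenAI key in the environment, and reuses the calls shown in the diffs above.

```python
# Consolidated sketch of the documented testset generation flow (not part of the patch).
import os

from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Load source documents and tag them so chunks can be traced back to a file.
documents = DirectoryLoader("your-directory").load()
for document in documents:
    document.metadata["file_name"] = document.metadata["source"]

# Create 10 samples using the question-evolution distribution from the docs.
generator = TestsetGenerator.with_openai()
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

# Export the generated Question/Context/Ground_Truth samples for inspection.
testset.to_pandas()
```
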