# Introduction

<a target="_blank" href="https://colab.research.google.com/github/shahules786/openai-cookbook/blob/ragas/examples/evaluation/ragas/openai-ragas-synthetic-test.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Ragas is the de facto open-source standard for RAG evaluation. It provides features and methods to help evaluate RAG applications. In this notebook, we will use Ragas to build a synthetic test dataset for evaluating your RAG application.

### Contents
- Prerequisites
- Data preparation
- Test set generation
- Saving results

### Prerequisites
- Ragas is a Python package, and we can install it using pip.
- To create QA pairs, you need some documents to generate them from. For this notebook, I am using a few papers on prompt engineering.
- Ragas uses model-guided techniques underneath to produce scores for each metric. In this tutorial, we will use OpenAI `gpt-3.5-turbo` and `text-embedding-ada-002`. These are the default models used in Ragas, but you can use any LLM or embedding model of your choice by referring to this [guide](https://docs.ragas.io/en/stable/howtos/customisations/bring-your-own-llm-or-embs.html). I highly recommend trying this notebook with OpenAI first so that you can get a feel for it with ease.


In [1]:
! pip install -q ragas

In [1]:
!git clone https://huggingface.co/datasets/explodinggradients/prompt-engineering-guide-papers

Cloning into 'moe-papers-collection'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 15 (delta 1), reused 0 (delta 0), pack-reused 3
Unpacking objects: 100% (15/15), 2.70 MiB | 11.71 MiB/s, done.
Filtering content: 100% (2/2), 8.11 MiB | 5.72 MiB/s, done.


In [1]:
import os
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

# Use the Colab path when running in Google Colab, otherwise the local clone.
try:
  import google.colab
  PATH = "/content/prompt-engineering-guide-papers"
except ImportError:
  PATH = "./prompt-engineering-guide-papers"
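Because the cell above ships with a placeholder key, it can help to fail fast before any generation call is made. The following is a minimal sketch, not part of Ragas or the OpenAI SDK: `require_api_key` is a hypothetical helper introduced here for illustration, and it assumes placeholders are wrapped in angle brackets.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key found in `env`, or raise if it is missing or a placeholder.

    Hypothetical helper for this notebook; not part of ragas or openai.
    """
    key = env.get(name, "").strip()
    # Assumption: placeholders look like "<...>", i.e. wrapped in angle brackets.
    if not key or (key.startswith("<") and key.endswith(">")):
        raise RuntimeError(f"Set {name} to a real API key before generating the test set.")
    return key
```

Calling `require_api_key()` right after setting the environment variable would surface a forgotten key immediately rather than partway through generation.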

### Data preparation

Here I load and parse each of our documents into a `Document` object using LangChain document loaders. You can also use LlamaIndex to do the same.

In [1]:
from langchain.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional

loader = DirectoryLoader(PATH, use_multithreading=True, silent_errors=True, sample_size=5)
documents = loader.load()

# Copy each document's source path into the 'filename' metadata key.
for document in documents:
    document.metadata['filename'] = document.metadata['source']

### Test set generation

Ragas aims to create a high-quality, diverse test dataset containing questions of different types and difficulty levels. To do this, it uses a paradigm inspired by the idea of question evolution. The mix of question types that Ragas synthesizes is controlled using the `distributions` parameter. Here I create a sample with a uniform distribution over the question types.

**Note:** *To learn more about the underlying paradigm, refer to our [docs](https://docs.ragas.io/en/stable/concepts/testset_generation.html).*
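To get a feel for what a distribution means in terms of question counts, here is a small illustrative sketch. It is not how Ragas allocates questions internally (that is an assumption-free zone this notebook does not cover): `allocate_counts` is a hypothetical helper that uses largest-remainder rounding, and plain strings stand in for the Ragas evolution objects.

```python
from math import floor


def allocate_counts(distribution, test_size):
    """Split `test_size` questions across types in proportion to `distribution`.

    Illustrative only: uses largest-remainder rounding, which may differ from
    what ragas does internally.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"distribution must sum to 1, got {total}")
    exact = {name: share * test_size for name, share in distribution.items()}
    counts = {name: floor(value) for name, value in exact.items()}
    leftover = test_size - sum(counts.values())
    # Hand the remaining questions to the types with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Plain strings stand in for the ragas evolution objects used as dictionary keys.
distribution = {"simple": 0.25, "reasoning": 0.25, "multi_context": 0.25, "conditional": 0.25}
counts = allocate_counts(distribution, test_size=25)
print(counts)
```

With 25 questions and four equally weighted types, the split cannot be perfectly even, so one type ends up with one more question than the others. The actual per-type counts in a generated test set may differ from this sketch.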

In [3]:
generator = TestsetGenerator.with_openai()



In [17]:
distributions = {simple: 0.25, reasoning: 0.25, multi_context: 0.25, conditional: 0.25}

In [6]:
testset = generator.generate_with_langchain_docs(documents, test_size=25, 
                                                 raise_exceptions=False, with_debugging_logs=False,
                                                 distributions=distributions)    

embedding nodes:   0%|          | 0/286 [00:00<?, ?it/s]

Generating:   0%|          | 0/25 [00:00<?, ?it/s]

In [8]:
df = testset.to_pandas()

And voilà! That's it. You now have a test dataset. Let's inspect and save it.

### Saving results
- Filter out samples that have no answer (`nan`) before saving.

In [18]:
df = df[df['ground_truth']!="nan"].reset_index(drop=True)
df.sample(5)

| | question | contexts | ground_truth | evolution_type | episode_done |
|---|---|---|---|---|---|
| 0 | How does instruction tuning affect the zero-sh... | [ tasks (see Table 2 in the Appendix), FLAN on... | For larger models on the order of 100B paramet... | simple | True |
| 1 | What is the Zero-shot-CoT method and how does ... | [ prompts have also focused on per-task engine... | Zero-shot-CoT is a zero-shot template-based pr... | simple | True |
| 2 | How does prompt tuning affect model performanc... | [080.863.867.439.249.4\n\nTask Cluster:# datas... | Prompt tuning improves model performance in im... | simple | True |
| 3 | What is the purpose of instruction tuning in l... | [ via natural language instructions, such as “... | The purpose of instruction tuning in language ... | reasoning | True |
| 4 | What distinguishes Zero-shot-CoT from Few-shot... | [ prompts have also focused on per-task engine... | Zero-shot-CoT differs from Few-shot-CoT in tha... | reasoning | True |
| 5 | Which language models were used in the experim... | [list\n\n1. For all authors...\n\n(a) Do the m... | The language models used in the experiment 'Ex... | reasoning | True |
| 6 | How does Zero-shot-CoT differ from previous fe... | [ prompts have also focused on per-task engine... | Zero-shot-CoT differs from previous few-shot a... | reasoning | True |
| 7 | What are the stages in the Zero-shot-CoT metho... | [ it differs from most of the prior template p... | The Zero-shot-CoT method for reasoning and ans... | reasoning | True |
| 8 | What are the main approaches for inducing LLMs... | [2 2 0 2\n\nt c O 7\n\n] L C . s c [\n\n1 v 3 ... | The main approaches for inducing LLMs to perfo... | reasoning | True |
| 9 | Which sorting method has the most impact on Au... | [ t a R\n\n30\n\n20\n\n%\n\n(\n\ne t a R\n\n40... | The sorting method that has the most impact on... | multi_context | True |
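The filter in this section compares `ground_truth` against the string `"nan"`. If missing answers could also arrive as real float NaN values or `None` (an assumption, not something this notebook demonstrates), a more defensive check might look like the sketch below. `is_missing` is a hypothetical helper, and the rows are made-up stand-ins for the generated test set.

```python
import math


def is_missing(value):
    """True for None, a float NaN, or a string spelling 'nan' (case-insensitive)."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return isinstance(value, str) and value.strip().lower() == "nan"


# Hypothetical rows standing in for the generated test set.
rows = [
    {"question": "q1", "ground_truth": "an answer"},
    {"question": "q2", "ground_truth": "nan"},
    {"question": "q3", "ground_truth": float("nan")},
]
kept = [row for row in rows if not is_missing(row["ground_truth"])]
print([row["question"] for row in kept])  # prints ['q1']
```

With pandas, the same predicate could be applied as `df[~df['ground_truth'].map(is_missing)]`.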


In [19]:
df.to_csv("synthetic_test_dataset.csv")

Up next, we will dive into how to use this dataset to [evaluate your RAG](https://github.com/openai/openai-cookbook/examples/evaluation/ragas/openai-ragas-eval-cookbook.ipynb).

**If you liked this tutorial, check out [ragas](https://github.com/explodinggradients/ragas) and consider leaving a star.**