## Berlin Buzzwords 2024

This notebook shows how the [RAGAS framework](https://docs.ragas.io/en/stable/index.html) can be used to generate a synthetic test dataset for a RAG pipeline and then evaluate the pipeline against it.

## 1. Installing packages and preparing environment

In [1]:
! pip install ragas



In [2]:
! pip install langchain langchain-community langchain-openai langchain-text-splitters openai "unstructured[pdf]" faiss-gpu python-dotenv



In [3]:
import os
import dotenv
import getpass
from google.colab import drive

drive.mount('/content/drive')

dotenv.load_dotenv('/content/drive/MyDrive/.env')

openai_api_key = os.environ.get('OPENAI_API_KEY')

if not openai_api_key:
    openai_api_key = getpass.getpass("Enter OpenAI API key:")

os.environ["OPENAI_API_KEY"] = openai_api_key

Mounted at /content/drive


## 2. Preparing dataset for RAG evaluation

As a dataset, we will use the English version of the Sixt General Terms and Conditions of Rental.

For demo purposes, we do not intend to optimise the RAG implementation here.

In [4]:
! mkdir -p rag_data
! wget -O rag_data/sixt_DE_en.pdf https://www.sixt.de/shared/t-c/sixt_DE_en.pdf

--2024-06-10 11:39:20--  https://www.sixt.de/shared/t-c/sixt_DE_en.pdf
Resolving www.sixt.de (www.sixt.de)... 18.239.225.115, 18.239.225.74, 18.239.225.6, ...
Connecting to www.sixt.de (www.sixt.de)|18.239.225.115|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 159162 (155K) [application/pdf]
Saving to: ‘rag_data/sixt_DE_en.pdf’


2024-06-10 11:39:21 (570 KB/s) - ‘rag_data/sixt_DE_en.pdf’ saved [159162/159162]



In [46]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter




loader = DirectoryLoader("rag_data")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=300,
    chunk_overlap=0
)

chunks = text_splitter.transform_documents(documents)

In [47]:
chunks

[Document(page_content='General Terms and Conditions of Rental\n\nGeneral Terms and Conditions of Rental (Terms and Conditions)\n\nof Sixt GmbH & Co. Autovermietung KG Zugspitzstrasse 1 DE 82049 Pullach (hereinafter referred to as “Sixt”)\n\n07.2023\n\n1\n\nGeneral Terms and Conditions of Rental', metadata={'source': 'rag_data/sixt_DE_en.pdf'}),
 Document(page_content='A: Condition of the vehicle, repairs, fuel\n\n1. Any known damage is recorded in the rental contract on handover of the vehicle. The renter shall carefully check the vehicle for further damage before starting their journey and report any further damage to Sixt immediately.\n\n2.', metadata={'source': 'rag_data/sixt_DE_en.pdf'}),
 Document(page_content='The renter undertakes to treat the vehicle with due care and in a professional manner, to observe all regulations and technical rules relevant to its use (for example, not to drive the vehicle if the level of engine oil or cooling water is too low) and to regularly check w

In [48]:
len(chunks)

200
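To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a naive fixed-window splitter. It is an illustration only and is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line breaks. The helper name `split_fixed` and the dummy text are assumptions made for this example.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Naive fixed-window splitter (illustration only, not LangChain's algorithm)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of dummy text

no_overlap = split_fixed(sample, chunk_size=10, chunk_overlap=0)
with_overlap = split_fixed(sample, chunk_size=10, chunk_overlap=5)

print(len(no_overlap), "chunks without overlap")
print(len(with_overlap), "chunks with an overlap of 5")
```

With no overlap, a sentence that straddles a window boundary is cut in two and neither chunk holds it in full, which is one reason small, non-overlapping chunks tend to hurt retrieval quality.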

In [50]:
for chunk in chunks:
    chunk.metadata['filename'] = chunk.metadata['source']

![](https://docs.ragas.io/en/stable/_static/imgs/eval-evolve.png)

In [51]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# generate testset
test_set = generator.generate_with_langchain_docs(
    chunks,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}
)



In [52]:
test_set



In [53]:
test_set.to_pandas()

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What is the process specified by Sixt for rental? | [rental in accordance with the process specifi... | | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 1 | What is the requirement for the renter regardi... | [If Sixt terminates a rental contract, the ren... | The renter is obliged to surrender all vehicle... | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 2 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 3 | How is the claim for a contractual penalty off... | [to payment of the contractual penalty. In suc... | In such cases, the claim to the contractual pe... | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 4 | What is the renter's responsibility regarding ... | [ 8.\n\n7. A public parking space must be made... | For rentals of more than 27 days, the renter s... | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 5 | What is the renter's responsibility for the Ad... | [ 8.\n\n7. A public parking space must be made... | The renter is fully responsible for ensuring t... | reasoning | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 6 | If the renter discovers unauthorized access to... | [ixt will re-send a copy of the invoice and ma... | If the renter takes note that unauthorized per... | reasoning | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 7 | What are the conditions for the renter to have... | [rental information (available at https://sixt... | | multi_context | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 8 | What is the maximum coverage for personal inju... | [The Insurance cover for the rented vehicle ex... | The maximum coverage for personal injuries and... | multi_context | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
| 9 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'source': 'rag_data/sixt_DE_en.pdf', 'filena... | True |
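The requested `distributions` are shares, and with `test_size=10` they do not all map to whole numbers of questions. The sketch below compares the requested shares with the `evolution_type` values in the table above. The string keys stand in for the `simple`, `reasoning` and `multi_context` objects, and how RAGAS rounds or fills the remainder is not shown in this notebook, so the comparison is purely descriptive.

```python
from collections import Counter

test_size = 10

# Requested shares, keyed by evolution name (mirrors the `distributions` argument).
requested = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}

# `evolution_type` column of the generated test set, copied from the table above
# in row order (rows 0-4, 5-6, 7-8, 9).
observed = Counter(
    ["simple"] * 5 + ["reasoning"] * 2 + ["multi_context"] * 2 + ["simple"]
)

for name, share in requested.items():
    expected = share * test_size
    print(f"{name:14s} expected {expected:4.1f}  observed {observed[name]}")
```

In this run the two fractional targets (2.5 each) both came out as 2, and the remaining question was generated as `simple`.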


In [54]:
for _, row in test_set.to_pandas().iterrows():
    print("Question:", row["question"])
    print("Answer:", row["ground_truth"])
    print("Contexts:", row["contexts"])
    print("# of contexts:", len(row["contexts"]))
    print("-" * 100)

Question: What is the process specified by Sixt for rental?
Answer: nan
Contexts: ['rental in accordance with the process specified by Sixt.']
# of contexts: 1
----------------------------------------------------------------------------------------------------
Question: What is the requirement for the renter regarding the surrender of vehicle keys if Sixt terminates a rental contract?
Answer: The renter is obliged to surrender all vehicle keys immediately to Sixt if the rental contract is terminated.
Contexts: ['If Sixt terminates a rental contract, the renter is obliged to surrender the vehicles, together with all vehicle documents, all accessories and all vehicle keys, immediately to Sixt.\n\nL: Renter’s direct debit authorization, prohibition to offset claims']
# of contexts: 1
----------------------------------------------------------------------------------------------------
Question: What is the maximum amount of cover for personal injuries and damage to property in the insurance
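The first sample printed above has `Answer: nan`, meaning the generator produced a question without a ground truth. Such rows cannot be scored by any metric that compares against a reference answer, so it can be useful to detect them early. Below is a minimal standard-library sketch; the helper `has_ground_truth` and the two placeholder rows are hypothetical and only illustrate the check.

```python
import math


def has_ground_truth(row: dict) -> bool:
    """Return True if the row carries a non-empty, non-NaN ground truth."""
    value = row.get("ground_truth")
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return bool(str(value).strip())


rows = [
    # Mirrors the first generated sample, whose ground truth printed as `nan`.
    {"question": "What is the process specified by Sixt for rental?",
     "ground_truth": float("nan")},
    # Hypothetical placeholder rows for illustration.
    {"question": "hypothetical question with an answer",
     "ground_truth": "hypothetical answer"},
    {"question": "hypothetical question with an empty answer",
     "ground_truth": ""},
]

usable = [row for row in rows if has_ground_truth(row)]
print(len(usable), "of", len(rows), "rows have a ground truth")
```

On the pandas frame itself, `dropna(subset=["ground_truth"])` performs a comparable filter for missing values.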

## 3. Running RAG on the prepared dataset

In [56]:
test_data_set = test_set.to_dataset()

from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


db = FAISS.from_documents(chunks, OpenAIEmbeddings())
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.0,
    max_tokens=250,
)

# Retrieve only the single most similar chunk for each question.
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 1})

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=False
)


answers = []
for row in test_data_set:
    # `invoke` replaces the deprecated direct call `qa({...})`.
    answers.append(qa.invoke({"query": row["question"]})["result"])

In [60]:
answers

["I'm sorry, but I don't have the specific details of the rental process specified by Sixt. You may want to visit their official website or contact their customer service for accurate and detailed information.",
 'If Sixt terminates a rental contract, the renter is required to immediately surrender the vehicle keys, along with the vehicles, all vehicle documents, and all accessories, to Sixt.',
 'The maximum amount of cover for personal injuries and damage to property in the insurance policy for the rented vehicle is EUR 100 million.',
 'The claim for a contractual penalty is offset against any claim for further compensation for damages stemming from the same breach of obligations by deducting the amount of the contractual penalty from the total amount of damages claimed. This means that if a party is entitled to both a contractual penalty and additional compensation for damages due to the same breach, the amount of the contractual penalty will be subtracted from the total damages owed
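The chain above is created with `return_source_documents=False`, so only the answer text is kept. The `contexts` column scored later is therefore the one produced by the test-set generator, not what the retriever returned. If the retrieved passages are wanted as well, one option is to set `return_source_documents=True` and read the `source_documents` key of the chain output. The sketch below shows the collection loop in isolation. `collect_outputs` and `fake_ask` are hypothetical helpers, and `fake_ask` stands in for something like `lambda q: qa.invoke({"query": q})`.

```python
from typing import Callable


def collect_outputs(
    questions: list[str],
    ask: Callable[[str], dict],
) -> tuple[list[str], list[list[str]]]:
    """Run `ask` on every question and collect answers plus retrieved passages."""
    answers: list[str] = []
    contexts: list[list[str]] = []
    for question in questions:
        output = ask(question)
        answers.append(output["result"])
        # LangChain documents expose `page_content`; plain strings pass through.
        contexts.append(
            [getattr(doc, "page_content", doc) for doc in output.get("source_documents", [])]
        )
    return answers, contexts


def fake_ask(question: str) -> dict:
    """Stand-in for the chain: a canned answer and one 'retrieved' passage."""
    return {
        "result": f"answer to: {question}",
        "source_documents": [f"passage for: {question}"],
    }


answers_demo, retrieved_demo = collect_outputs(["q1", "q2"], fake_ask)
print(answers_demo)
print(retrieved_demo)
```

Scoring the retrieved passages instead of the generator's contexts would make context-based metrics reflect the retriever's behaviour with `k=1`.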

In [61]:
test_data_set_with_answers = test_data_set.add_column("answer", answers)

test_data_set_with_answers

Dataset({
    features: ['question', 'contexts', 'ground_truth', 'evolution_type', 'metadata', 'episode_done', 'answer'],
    num_rows: 10
})

In [62]:
test_data_set_with_answers.to_pandas()

| | question | contexts | ground_truth | evolution_type | metadata | episode_done | answer |
|---|---|---|---|---|---|---|---|
| 0 | What is the process specified by Sixt for rental? | [rental in accordance with the process specifi... | | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | I'm sorry, but I don't have the specific detai... |
| 1 | What is the requirement for the renter regardi... | [If Sixt terminates a rental contract, the ren... | The renter is obliged to surrender all vehicle... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | If Sixt terminates a rental contract, the rent... |
| 2 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum amount of cover for personal injur... |
| 3 | How is the claim for a contractual penalty off... | [to payment of the contractual penalty. In suc... | In such cases, the claim to the contractual pe... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The claim for a contractual penalty is offset ... |
| 4 | What is the renter's responsibility regarding ... | [ 8.\n\n7. A public parking space must be made... | For rentals of more than 27 days, the renter s... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The renter is responsible for refilling fluids... |
| 5 | What is the renter's responsibility for the Ad... | [ 8.\n\n7. A public parking space must be made... | The renter is fully responsible for ensuring t... | reasoning | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The renter is responsible for ensuring that th... |
| 6 | If the renter discovers unauthorized access to... | [ixt will re-send a copy of the invoice and ma... | If the renter takes note that unauthorized per... | reasoning | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | If the renter discovers unauthorized access to... |
| 7 | What are the conditions for the renter to have... | [rental information (available at https://sixt... | | multi_context | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The provided context does not specify the exac... |
| 8 | What is the maximum coverage for personal inju... | [The Insurance cover for the rented vehicle ex... | The maximum coverage for personal injuries and... | multi_context | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum coverage for personal injuries and... |
| 9 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum amount of cover for personal injur... |


## 4. Perform the evaluation

In [63]:
from ragas.metrics import (
    context_relevancy,
    faithfulness,
    answer_relevancy
)

from ragas import evaluate

result = evaluate(
    test_data_set_with_answers,
    metrics=[
        context_relevancy,
        faithfulness,
        answer_relevancy
    ]
)

result

{'context_relevancy': 0.3928, 'faithfulness': 0.7083, 'answer_relevancy': 0.7822}

In [64]:
result.to_pandas()

| | question | contexts | ground_truth | evolution_type | metadata | episode_done | answer | context_relevancy | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the process specified by Sixt for rental? | [rental in accordance with the process specifi... | | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | I'm sorry, but I don't have the specific detai... | 1.0 | 0.0 | 0.0 |
| 1 | What is the requirement for the renter regardi... | [If Sixt terminates a rental contract, the ren... | The renter is obliged to surrender all vehicle... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | If Sixt terminates a rental contract, the rent... | 0.5 | 0.833333 | 0.961244 |
| 2 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum amount of cover for personal injur... | 0.666667 | 1.0 | 1.0 |
| 3 | How is the claim for a contractual penalty off... | [to payment of the contractual penalty. In suc... | In such cases, the claim to the contractual pe... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The claim for a contractual penalty is offset ... | 0.5 | 0.25 | 1.0 |
| 4 | What is the renter's responsibility regarding ... | [ 8.\n\n7. A public parking space must be made... | For rentals of more than 27 days, the renter s... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The renter is responsible for refilling fluids... | 0.03125 | 1.0 | 0.967377 |
| 5 | What is the renter's responsibility for the Ad... | [ 8.\n\n7. A public parking space must be made... | The renter is fully responsible for ensuring t... | reasoning | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The renter is responsible for ensuring that th... | 0.0625 | 1.0 | 0.953256 |
| 6 | If the renter discovers unauthorized access to... | [ixt will re-send a copy of the invoice and ma... | If the renter takes note that unauthorized per... | reasoning | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | If the renter discovers unauthorized access to... | 0.02439 | 1.0 | 0.97932 |
| 7 | What are the conditions for the renter to have... | [rental information (available at https://sixt... | | multi_context | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The provided context does not specify the exac... | 0.142857 | 0.0 | 0.0 |
| 8 | What is the maximum coverage for personal inju... | [The Insurance cover for the rented vehicle ex... | The maximum coverage for personal injuries and... | multi_context | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum coverage for personal injuries and... | 0.666667 | 1.0 | 0.960755 |
| 9 | What is the maximum amount of cover for person... | [The Insurance cover for the rented vehicle ex... | The maximum amount of cover for personal injur... | simple | [{'filename': 'rag_data/sixt_DE_en.pdf', 'sour... | True | The maximum amount of cover for personal injur... | 0.333333 | 1.0 | 1.0 |
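For this run, the aggregate scores printed by `evaluate` match the plain mean of the per-question scores. The sketch below recomputes them from the values in the table above; the numbers are copied by hand, so they carry the table's rounding.

```python
from statistics import mean

# Per-question scores copied from the table above (rows 0-9).
scores = {
    "context_relevancy": [1.0, 0.5, 0.666667, 0.5, 0.03125,
                          0.0625, 0.02439, 0.142857, 0.666667, 0.333333],
    "faithfulness": [0.0, 0.833333, 1.0, 0.25, 1.0,
                     1.0, 1.0, 0.0, 1.0, 1.0],
    "answer_relevancy": [0.0, 0.961244, 1.0, 1.0, 0.967377,
                         0.953256, 0.97932, 0.0, 0.960755, 1.0],
}

# Aggregate each metric as the mean over all questions.
aggregates = {metric: round(mean(values), 4) for metric, values in scores.items()}
print(aggregates)

# Rows whose answer scored zero on answer relevancy.
zero_rows = [i for i, value in enumerate(scores["answer_relevancy"]) if value == 0.0]
print(zero_rows)
```

The two zero-scoring rows (0 and 7) are the ones where the model declined to answer. With only ten questions, each such row moves the mean by up to 0.1, so the aggregates should be read alongside the per-row table.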
