# 📊 **RAGAS – Evaluating RAG Pipelines with LangChain**

## 🧠 What is RAGAS?

**RAGAS** (*Retrieval-Augmented Generation Assessment*) is an open-source framework for evaluating the **quality of RAG pipelines** using LLMs as judges together with standardized metrics.

---

## 🎯 Goal

* Check whether a RAG pipeline:

  * Retrieves the right documents
  * Generates accurate and relevant answers
  * Grounds its answers in the context it was actually given

---

## 📏 **Main Metrics**

| 📈 Metric                | Description                                                                                                   |
| ------------------------ | ------------------------------------------------------------------------------------------------------------- |
| **Faithfulness**         | How *factually consistent* the answer is with the retrieved context. High scores = few hallucinations.        |
| **Answer Relevance**     | How pertinent the answer is to the original question. Based on question ↔ answer semantic similarity.         |
| **Context Precision**    | Whether the relevant (gold) chunks are ranked near the top of the retrieved documents.                        |
| **Context Recall**       | How much of the ground-truth (reference) answer is covered by the retrieved context.                          |
| **Contextual Relevance** | How relevant the retrieved documents are to the question.                                                     |
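
To make the ratios behind these scores more concrete, here is a minimal sketch in plain Python. It assumes the per-claim and per-chunk verdicts (which RAGAS obtains from an LLM judge) are already available as booleans. The function names are illustrative and not part of the RAGAS API, and the formulas are a simplified reading of the metric definitions, not the library's implementation.

```python
from typing import Sequence


def ratio_supported(verdicts: Sequence[bool]) -> float:
    """Share of True verdicts; reused here for faithfulness and context recall."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def context_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    precisions = []
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Faithfulness: 3 of the 4 claims in the answer are supported by the context.
faithfulness = ratio_supported([True, True, True, False])

# Context recall: both claims of the reference answer are found in the context.
recall = ratio_supported([True, True])

# Context precision: the same two relevant chunks, ranked first vs. ranked last.
ranked_well = context_precision([True, True, False, False])
ranked_badly = context_precision([False, False, True, True])

print(faithfulness, recall, ranked_well, ranked_badly)
```

The last two values show why ranking matters: the retrieved set is identical, but the score drops when the relevant chunks appear at the bottom.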

---

## 🧱 Dataset Setup

### 📥 Loading the Documents

🧩 Result: 22 chunks created from 3 `.txt` documents.

```python
import os

from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader

# Load OPENAI_API_KEY (and any other settings) from a local .env file.
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# Load every .txt file under ./data, including subdirectories.
loader = DirectoryLoader("./data", glob="**/*.txt")
docs = loader.load()

# Split into chunks of at most 500 characters with a 50-character overlap.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False,
)

chunks = text_splitter.split_documents(docs)
```

libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


```python
print(len(chunks))
```

```
22
```
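
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sketch that cuts fixed-size character windows. This is not what `RecursiveCharacterTextSplitter` does internally (it prefers natural separators such as blank lines, so real chunks are usually shorter and less regular). The helper `sliding_chunks` is hypothetical and only shows the window arithmetic.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed windows; each window starts chunk_size - chunk_overlap later."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return pieces


# Synthetic 1200-character text, used only to make the lengths easy to follow.
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=50)
lengths = [len(p) for p in pieces]
print(lengths)
```

With these settings, consecutive windows share 50 characters, which is what lets a sentence cut at a boundary still appear whole in one of the two neighbouring chunks.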


```python
chunks[0]
```

```
Document(metadata={'source': 'data\\food.txt'}, page_content='margherita pizza; $12; classic with tomato, mozzarella, and basil; main dish\n\nspaghetti carbonara; $15; creamy pasta with pancetta and parmesan; main dish\n\nbruschetta; $8; toasted bread with tomato, garlic, and olive oil; appetizer\n\ncaprese salad; $10; fresh tomatoes, mozzarella, and basil; salad\n\nlasagna; $14; layered pasta with meat sauce and cheese; main dish\n\ntiramisu; $9; coffee-flavored italian dessert; dessert\n\ngelato; $7; traditional italian ice cream; dessert')
```

For RAGAS to work, each `Document` needs a special `file_name` key in its metadata. The loader only sets `source`, so we copy that value into `file_name`.

```python
for document in chunks:
    document.metadata['file_name'] = document.metadata['source']
```
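
A slightly more defensive variant of the same step might look like the sketch below. It uses a hypothetical `Doc` dataclass as a stand-in for a LangChain `Document` so that it runs without any dependency, and the `"unknown"` fallback for documents without a `source` is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def add_file_name(documents):
    """Copy 'source' into 'file_name' unless 'file_name' is already set."""
    for doc in documents:
        # The "unknown" fallback is an illustrative choice, not a RAGAS convention.
        doc.metadata.setdefault("file_name", doc.metadata.get("source", "unknown"))
    return documents


docs_demo = [
    Doc("margherita pizza; $12", {"source": "data\\food.txt"}),
    Doc("a chunk without a source"),
]
add_file_name(docs_demo)

# Sanity check before generation: every chunk should now carry the key.
missing = [d for d in docs_demo if "file_name" not in d.metadata]
print(len(missing))
```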

---

## 🧪 Generating the Test Set


📌 The test set consists of:

* ❓ A list of **questions**
* ✅ The expected **ground-truth answer** for each question
* 📄 The text chunks from which the answers are generated
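
As a rough picture of one test set entry, the sketch below models a row with the three parts listed above. The field names follow the columns that `testset.to_pandas()` shows (`user_input`, `reference`, `reference_contexts`). The `QARow` class, the `is_complete` check, and the sample question are illustrative assumptions, with the context text taken from the first chunk of `food.txt`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QARow:
    """Hypothetical container for one test set entry."""
    user_input: str                # the question
    reference: str                 # the expected (ground-truth) answer
    reference_contexts: List[str]  # the chunks the answer was derived from


def is_complete(row: QARow) -> bool:
    """A row is usable only if all three parts are non-empty."""
    return bool(row.user_input.strip()) and bool(row.reference.strip()) and bool(row.reference_contexts)


row = QARow(
    user_input="How much does the margherita pizza cost?",
    reference="The margherita pizza costs $12.",
    reference_contexts=[
        "margherita pizza; $12; classic with tomato, mozzarella, and basil; main dish"
    ],
)
print(is_complete(row))
```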

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Wrap the LangChain models so RAGAS can use them.
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# Generate question / ground-truth pairs from the chunks.
testset = generator.generate_with_langchain_docs(
    chunks,
    testset_size=8
)
```


```
Applying CustomNodeFilter:   0%|          | 0/22 [00:00<?, ?it/s]
Node 34243639-3002-417a-b5b1-af2ced114507 does not have a summary. Skipping filtering.
Node 45540005-2f3d-4d2c-a8c1-23b64ac0d7c7 does not have a summary. Skipping filtering.
Node 9d750f6f-1077-40a1-8e6c-82bbb7bd6503 does not have a summary. Skipping filtering.
Node 5904f5e9-53fa-4126-a1dc-2234b5e94623 does not have a summary. Skipping filtering.
Node 34fb689f-1e17-4aca-a463-0ea061dd590b does not have a summary. Skipping filtering.
Node 71e8b7c4-1a16-4c41-b192-5d519aa32f21 does not have a summary. Skipping filtering.
Node 0d5b7f1f-af99-4b33-a8c2-766ae78b0618 does not have a summary. Skipping filtering.
Node fc1420c2-cfca-4594-9adf-bf302e40022d does not have a summary. Skipping filtering.
Node 3500ae0e-772d-4845-9a3f-b21b30b2553f does not have a summary. Skipping filtering.
Node 19862a1b-fd38-4472-ae27-8ec03f7ef831 does not have a summary. Skipping filtering.
Node b8e0ee70-54c2-448c-ae2a-532f0c6ceca3 does not have a 
```

---

## 💸 Watch the Costs

| Model           | Estimated Cost              |
| --------------- | --------------------------- |
| `gpt-3.5-turbo` | ✅ Cheap                     |
| `gpt-4`         | ❌ Up to 10x more expensive  |

📌 *Each test set item requires at least 2 LLM calls (question + answer).*
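
A back-of-the-envelope estimate can be sketched as follows. Only the "2 calls per item" lower bound comes from the note above. The token count per call and the prices are hypothetical placeholders chosen to reproduce a 10x gap, not real provider prices, so substitute current figures before relying on the result.

```python
def estimate_generation_cost(
    testset_size: int,
    calls_per_sample: int = 2,         # lower bound: question + answer
    tokens_per_call: int = 1500,       # hypothetical average, prompt + completion
    usd_per_1k_tokens: float = 0.001,  # hypothetical price, not a real quote
):
    """Return (number of LLM calls, estimated cost in USD)."""
    calls = testset_size * calls_per_sample
    total_tokens = calls * tokens_per_call
    return calls, total_tokens / 1000 * usd_per_1k_tokens


calls, cheap_cost = estimate_generation_cost(8)
_, pricey_cost = estimate_generation_cost(8, usd_per_1k_tokens=0.01)

print(f"{calls} calls, about ${cheap_cost:.3f} vs ${pricey_cost:.3f}")
```

Because the real generator makes additional calls (for example while preparing the chunks), treat the result as a floor, not a forecast.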

---

## 📁 Export

Once the set has been generated, it can be exported:

```python
testset.to_pandas().to_csv("qa_testset.csv", index=False)
```
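
One thing to keep in mind when reloading the file: a CSV cell can only hold text, so a list-valued column such as `reference_contexts` comes back as a string. The sketch below reproduces this with the standard `csv` module (assuming, as with pandas, that the list is written as its string representation) and restores it with `ast.literal_eval`. The sample row is illustrative.

```python
import ast
import csv
import io

rows = [
    {
        "user_input": "How much does the tiramisu cost?",  # illustrative question
        "reference_contexts": [
            "tiramisu; $9; coffee-flavored italian dessert; dessert",
            "gelato; $7; traditional italian ice cream; dessert",
        ],
    }
]

# Write: the list is stored as its string representation.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_input", "reference_contexts"])
writer.writeheader()
for r in rows:
    writer.writerow({**r, "reference_contexts": str(r["reference_contexts"])})

# Read back: the cell is now a plain string, not a list.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_cell = loaded[0]["reference_contexts"]

# Safely turn the string back into a Python list.
contexts = ast.literal_eval(raw_cell)
print(type(raw_cell).__name__, len(contexts))
```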



```python
testset.to_pandas()
```

|   | user_input | reference_contexts | reference | synthesizer_name |
| - | ---------- | ------------------ | --------- | ---------------- |
| 0 | Can you describe the characteristics and compo... | `[margherita pizza; $12; classic with tomato, m...` | Bruschetta is an appetizer that consists of to... | single_hop_specifc_query_synthesizer |
| 1 | What is prosecco and how much does it cost? | `[risotto milanese; $16; creamy saffron-infused...` | Prosecco is an Italian sparkling white wine th... | single_hop_specifc_query_synthesizer |
| 2 | What is espresso and how much does it cost? | `[calamari; $12; fried squid rings with marinar...` | Espresso is a strong Italian coffee that costs... | single_hop_specifc_query_synthesizer |
| 3 | What are the characteristics of calamari and h... | `[<1-hop>\n\ncalamari; $12; fried squid rings w...` | Calamari, priced at $12, is described as fried... | multi_hop_abstract_query_synthesizer |
| 4 | what is the price of calamari and gnocchi? | `[<1-hop>\n\ncalamari; $12; fried squid rings w...` | The price of calamari is $12, and the price of... | multi_hop_abstract_query_synthesizer |
| 5 | Wut is the price of calamari and how does it c... | `[<1-hop>\n\ncalamari; $12; fried squid rings w...` | The price of calamari is $12, while frutti di ... | multi_hop_abstract_query_synthesizer |
| 6 | What unique dishes from Sicily did Chef Amico ... | `[<1-hop>\n\nAs he grew, so did his desire to e...` | While exploring Italian cuisine, Chef Amico le... | multi_hop_specific_query_synthesizer |
| 7 | What did Chef Amico learn about Italian cuisin... | `[<1-hop>\n\nIn the charming streets of Palermo...` | While traveling through Italy, Chef Amico lear... | multi_hop_specific_query_synthesizer |
| 8 | How did Amico's upbringing in Palermo influenc... | `[<1-hop>\n\nIn the heart of the old quarter of...` | Amico's upbringing in Palermo, where he was ra... | multi_hop_specific_query_synthesizer |
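
A quick way to see how the generated questions are spread across query types is to count the `synthesizer_name` values. The sketch below hard-codes the nine names from the output above so that it runs on its own; with the real DataFrame, `testset.to_pandas()["synthesizer_name"].value_counts()` should give the same counts.

```python
from collections import Counter

# The nine synthesizer names from the generated test set, in row order.
synthesizer_names = (
    ["single_hop_specifc_query_synthesizer"] * 3
    + ["multi_hop_abstract_query_synthesizer"] * 3
    + ["multi_hop_specific_query_synthesizer"] * 3
)

counts = Counter(synthesizer_names)
shares = {name: n / len(synthesizer_names) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} rows ({shares[name]:.0%})")
```

Here the three query types are evenly represented, which means the evaluation will exercise both single-chunk lookups and questions that need several chunks.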


Because generation is costly, we use an already expanded dataset that contains many questions, each paired with a ground-truth answer that serves as the reference for our evaluation.

---

## 🔜 Next Module

➡️ **Automatic answer evaluation** with `evaluate()`, using the metrics described above, the generated test set, and the RAG pipeline.
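
Before calling `evaluate()`, each question has to be run through the pipeline so that the generated answer and the retrieved chunks sit next to the ground truth. The sketch below shows that assembly step with toy stand-ins: `toy_retrieve` and `toy_generate` are hypothetical stubs for a real retriever and LLM chain, and the `retrieved_contexts` and `response` field names are assumptions to be checked against the schema of the installed RAGAS version.

```python
from typing import Callable, Dict, List

# Two lines from the food.txt chunk shown earlier, used as a toy knowledge base.
MENU = [
    "tiramisu; $9; coffee-flavored italian dessert; dessert",
    "gelato; $7; traditional italian ice cream; dessert",
]


def toy_retrieve(question: str) -> List[str]:
    """Hypothetical retriever: keep menu lines whose dish name appears in the question."""
    q = question.lower()
    return [line for line in MENU if line.split(";")[0] in q]


def toy_generate(question: str, contexts: List[str]) -> str:
    """Hypothetical generator: echo the first retrieved line instead of calling an LLM."""
    return contexts[0] if contexts else "I don't know."


def build_eval_rows(
    testset_rows: List[Dict[str, str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> List[Dict[str, object]]:
    """Run each question through the pipeline and pair the output with the reference."""
    rows = []
    for item in testset_rows:
        contexts = retrieve(item["user_input"])
        rows.append({
            "user_input": item["user_input"],
            "retrieved_contexts": contexts,
            "response": generate(item["user_input"], contexts),
            "reference": item["reference"],
        })
    return rows


testset_rows = [
    {"user_input": "How much does the tiramisu cost?", "reference": "The tiramisu costs $9."}
]
eval_rows = build_eval_rows(testset_rows, toy_retrieve, toy_generate)
print(eval_rows[0]["response"])
```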

---

## 🧩 Visual Summary

```
TXT documents
   │
   ▼
Chunking with metadata ["file_name"]
   │
   ▼
Testset Generator → (Question, Expected Answer)
   │
   ▼
Evaluation with LLM + RAGAS metrics