
# 🧠 Document Post-Processing in the RAG Pipeline

## 🎯 Goal

Reduce the number of documents **passed to the LLM** after retrieval, keeping **only the most relevant ones** for the user's question.
Two main strategies:

1. **Reranking with a Cross-Encoder**
2. **Compression with an LLM** *(in the next lesson)*

---

## 🔁 Strategy 1: Reranking with a Cross-Encoder

### ✳️ What is a Cross-Encoder?

A **Cross-Encoder** takes a **(query, document) pair** as input and computes a **single relevance score**, processing both texts together.
✅ **High accuracy**
❌ **Low efficiency** (it does not produce reusable embeddings)

> 🔄 Unlike a Bi-Encoder, which computes the embedding of each text separately.


---

### 📦 Initial setup (classic RAG pipeline)



In [1]:
from langchain.schema import Document
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import Chroma

from dotenv import load_dotenv

# load environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

loader = DirectoryLoader("./data", glob="**/*.txt")

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=120,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False
)

embedding_function = OpenAIEmbeddings()

model = ChatOpenAI()

chunks = text_splitter.split_documents(docs)

db = Chroma.from_documents(chunks, embedding_function)

retriever = db.as_retriever()

libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


In [2]:
def format_docs(docs: list[Document]):
    return "\n".join(doc.page_content for doc in docs)

In [3]:
# build the RAG pipeline
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])

In [4]:
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | model
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)



In [5]:
result = rag_chain_with_source.invoke(input="Who is the owner of the restaurant")

In [6]:
result

{'context': [Document(metadata={'source': 'data\\founder.txt'}, page_content='Creating Chef Amico’s Restaurant'),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content="into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico"),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content='One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico.'),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini')],
 'question': 'Who is the owner of the restaurant',
 'answer': 'Chef Amico is the owner of the restaurant.'}

---

## 🚫 Why is this approach problematic?

Problem: static Top-K retrieval

> ❌ It always retrieves the Top-K documents, **even when they are barely relevant**. We do not know how well suited they actually are to answering the question.
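
To make the problem concrete, here is a minimal, dependency-free sketch. The similarity values and the `top_k` helper are invented for illustration and are not part of the lesson's code. The only point is that a Top-K cut returns K items no matter how low their scores are.

```python
def top_k(scored_chunks, k):
    """Return the k highest-scoring (score, text) pairs, whatever their scores."""
    return sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical similarity scores in [0, 1]; only the first chunk is on topic.
scored_chunks = [
    (0.82, "chunk that answers the question"),
    (0.31, "loosely related chunk"),
    (0.12, "unrelated chunk"),
    (0.09, "another unrelated chunk"),
]

selected = top_k(scored_chunks, k=3)

# Three chunks are returned even though two of them score very low.
for score, text in selected:
    print(score, text)
```

With `k=3` the weakly related and unrelated chunks still reach the LLM, simply because K slots have to be filled.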


## Cross Encoder Model

- To address this problem, we can use a `Cross Encoder`.

- A `Cross Encoder` is a type of model that takes a pair of inputs, such as a query and a document, and processes them together to predict a single score indicating their relevance or similarity.

> Why not use an embedding model?

- An embedding model is a so-called Bi-Encoder.
- A Bi-Encoder generates an independent embedding for each sentence, which allows efficient comparisons across large datasets.

- This makes it very useful for tasks such as information retrieval, semantic search, or clustering.

- A **Cross-Encoder**, on the other hand, processes sentence pairs together and directly predicts a similarity score. This gives much higher accuracy, but it lacks the efficiency and versatility of reusable embeddings.

- The recommendation is to use Cross-Encoders whenever you have a predefined set of sentence pairs to score, such as 20 documents and one query.

- Cross-Encoders are slower, but they achieve better performance than Bi-Encoders.
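
The efficiency gap described in the list above can be estimated with a back-of-the-envelope count of model forward passes. This is a rough sketch under simplifying assumptions: the helper names are hypothetical, the numbers are illustrative, and the (cheap) cost of comparing vectors is ignored.

```python
def bi_encoder_passes(num_queries, num_docs):
    # Each document is embedded once and its vector is reused for every query;
    # each query is embedded once.
    return num_docs + num_queries


def cross_encoder_passes(num_queries, num_docs):
    # Every (query, document) pair needs its own forward pass.
    return num_queries * num_docs


# Scoring a whole corpus of 10,000 chunks for 100 queries:
print(bi_encoder_passes(100, 10_000))     # 10100
print(cross_encoder_passes(100, 10_000))  # 1000000

# Reranking only 20 already-retrieved chunks for one query:
print(cross_encoder_passes(1, 20))        # 20
```

This is why the Cross-Encoder is applied only to the short list returned by the retriever, not to the whole corpus.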

![alt](../images/cross_encoder.png)

---

## 🧠 Reranking with `sentence-transformers`

In [7]:
from huggingface_hub import notebook_login

notebook_login()


In [8]:
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")




In [9]:
# now we build the sentence pairs
docs = result['context'] # list of Document objects

contents = [doc.page_content for doc in docs]

In [10]:
contents

['Creating Chef Amico’s Restaurant',
 "into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico",
 'One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico.',
 'and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini']

In [11]:
# build the (query, text) pairs
pairs = []

for text in contents:
    pairs.append(["Who is the owner of the  restaurant", text])

pairs

[['Who is the owner of the  restaurant', 'Creating Chef Amico’s Restaurant'],
 ['Who is the owner of the  restaurant',
  "into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico"],
 ['Who is the owner of the  restaurant',
  'One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico.'],
 ['Who is the owner of the  restaurant',
  'and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini']]

In [12]:
scores = cross_encoder.predict(pairs)

scores

array([-3.4635894, -3.478981 , -4.9786096, -5.300245 ], dtype=float32)

The absolute values of the scores do not matter. What matters is their order.
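
As a quick check of this claim, the sketch below applies a sigmoid to the four scores printed above and compares the resulting ranking with the original one. The sigmoid is chosen here only as an example of an order-preserving (strictly increasing) transformation; the `ranking` and `sigmoid` helpers are illustrative and not part of the lesson's code.

```python
import math

# The scores printed by cross_encoder.predict(pairs) above.
scores = [-3.4635894, -3.478981, -4.9786096, -5.300245]


def ranking(values):
    """Indices of the values, ordered from highest to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Squash the raw scores into (0, 1); the numbers change, the order does not.
squashed = [sigmoid(s) for s in scores]

print(ranking(scores))    # [0, 1, 2, 3]
print(ranking(squashed))  # [0, 1, 2, 3]
```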

So we need to bring the scores back together with the documents.

For this we can use the `zip` function and then the built-in `sorted` function to sort the documents in descending order of score (`reverse=True`).

In [13]:
scored_docs = zip(scores, docs)

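# sort on the score only, so tied scores never fall back to comparing Document objects
sorted_docs = sorted(scored_docs, key=lambda pair: pair[0], reverse=True)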

sorted_docs

[(np.float32(-3.4635894),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='Creating Chef Amico’s Restaurant')),
 (np.float32(-3.478981),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content="into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico")),
 (np.float32(-4.9786096),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content='One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico.')),
 (np.float32(-5.300245),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini'))]

As you can see, the document "Creating Chef Amico’s Restaurant" has the highest score for answering the question "Who is the owner of the restaurant".

We now have a list of (score, document) pairs, so we can drop the scores and use a slice to reduce the number of documents passed to an LLM.

In [14]:
reranked_docs = [doc for _, doc in sorted_docs][0:2]

reranked_docs

[Document(metadata={'source': 'data\\founder.txt'}, page_content='Creating Chef Amico’s Restaurant'),
 Document(metadata={'source': 'data\\restaurant.txt'}, page_content="into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico")]

This is a very important step: we do not sort just to reorder the documents, but to reduce the number of documents sent to an LLM.

Suppose we retrieved 20 documents but want to send only the four most relevant.
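
The "retrieve 20, keep 4" scenario can be sketched without any model. The `keep_top_n` helper, the chunk names, and the scores below are made up for illustration. In the lesson the scores would come from `cross_encoder.predict(pairs)`.

```python
def keep_top_n(scores, docs, top_n=4):
    """Sort docs by score (highest first) and keep only the first top_n."""
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_n]]


# 20 hypothetical retrieved chunks with 20 distinct, made-up scores.
retrieved = [f"chunk-{i}" for i in range(20)]
fake_scores = [(i * 7) % 20 - 10.0 for i in range(20)]

best = keep_top_n(fake_scores, retrieved, top_n=4)
print(best)  # ['chunk-17', 'chunk-14', 'chunk-11', 'chunk-8']
```

Passing `key=` to `sorted` means only the scores are compared, so the documents themselves never need to be orderable.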

Now we want to integrate these manual steps into a pipeline.

In [15]:
# retriever that fetches more than 4 documents
retriever = db.as_retriever(search_kwargs={"k": 10})

We wrap the previous steps in a single function in which we:

- receive the input data,
- extract the documents,
- build our pairs,
- use the CrossEncoder model,
- reorder the documents.

In [16]:
from sentence_transformers import CrossEncoder
from langchain_core.runnables import RunnableLambda

def rerank_documents(input_data):
    query = input_data['question']
    docs = input_data['context']

    # note: the model is loaded on every call; reuse a single instance if latency matters
    cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    contents = [doc.page_content for doc in docs]

    pairs = [(query, text) for text in contents]
    scores = cross_encoder.predict(pairs)

    scored_docs = zip(scores, docs)

    sorted_docs = sorted(scored_docs, key=lambda x: x[0], reverse=True) # sort by score, highest first

    # returns every document, reordered; slice the list (e.g. [:4]) to keep only the top ones
    return [doc for _, doc in sorted_docs]

# plug the function into LCEL using RunnableLambda

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=RunnableLambda(rerank_documents))
    | prompt
    | model
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()} # these two run in parallel on the same input (the question)
).assign(answer=rag_chain_from_docs)

In [17]:
rag_chain_with_source

{
  context: VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001A2A5DBBDF0>, search_kwargs={'k': 10}),
  question: RunnablePassthrough()
}
| RunnableAssign(mapper={
    answer: RunnableAssign(mapper={
              context: RunnableLambda(lambda x: x[0])
            })
            | ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001A30F502E30>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001A30F502C80>, root_client=<openai.OpenAI object 

In [18]:
result = rag_chain_with_source.invoke(input="Who is the owner of the restaurant")

result

{'context': [Document(metadata={'source': 'data\\founder.txt'}, page_content='Creating Chef Amico’s Restaurant'),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content="into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico"),
  Document(metadata={'source': 'data\\restaurant.txt'}, page_content='One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico.'),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini'),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='Philosophy of Hospitality'),
  Document(metadata={'source': 'data\\founder.txt'}, page_content='young chefs, shares his knowledge at culinary workshops, and supports local farmers and producers.'),
  Document(metadata={'source': 'data\\founder.txt
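
Note that `result['context']` still holds all the retrieved documents in retrieval order: the reranked list exists only inside the inner chain that produces the answer. The plain-Python analogy below mimics that data flow. It is not LangChain code, and the stand-in functions (`retrieve`, `rerank`, `answer_chain`, `rag_with_source`) are hypothetical.

```python
def retrieve(question):
    # Stand-in for the retriever: returns chunks in retrieval order.
    return ["chunk-a", "chunk-b", "chunk-c"]


def rerank(state):
    # Stand-in for rerank_documents: here it simply reverses the order.
    return list(reversed(state["context"]))


def answer_chain(state):
    # Like RunnablePassthrough.assign(context=...): build a NEW dict with the
    # reranked context, leaving the caller's dict untouched.
    inner = {**state, "context": rerank(state)}
    return f"answer built from {inner['context']}"


def rag_with_source(question):
    # Like RunnableParallel({...}).assign(answer=...): keep the original keys
    # and add the answer next to them.
    state = {"context": retrieve(question), "question": question}
    return {**state, "answer": answer_chain(state)}


demo_result = rag_with_source("Who is the owner of the restaurant")
print(demo_result["context"])  # ['chunk-a', 'chunk-b', 'chunk-c']
print(demo_result["answer"])   # answer built from ['chunk-c', 'chunk-b', 'chunk-a']
```

In this analogy the answer is built from the reranked list, while the `context` returned to the caller is the untouched retriever output.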

---

## 📈 Advantages of Reranking

| ✅ Advantages                                    | ❌ Limitations                          |
| ----------------------------------------------- | -------------------------------------- |
| High precision                                  | Slower than Bi-Encoders                |
| Useful when documents are ambiguous             | Scores are only **relative**           |
| Reduces the number of documents sent to the LLM | Does not filter by absolute usefulness |


![alt](../images/drawbacks_cross_encoder.png)

---

## 🧩 Conclusion

The **Cross-Encoder** improves the quality of the retrieved documents and reduces the load on the LLM.
However, it does not fully **filter out** irrelevant documents.
➡️ This is why we need **LLM-based compression**, which we will see in the next lesson.

---

🎓 **Next lesson**: `LLM-based document compression`, to filter out uninformative documents "intelligently" using the semantic understanding of an LLM.