# üîç Migliorare le Query in RAG: Multi-Query Approach

## üéØ Obiettivo

Migliorare la qualit√† del recupero da un **vector store** riscrivendo una query iniziale in pi√π varianti semantiche, aumentando la probabilit√† che almeno una di esse ottenga il documento pi√π rilevante.

---

## üß† Perch√© usare un approccio multi-query?

> Anche se una query sembra ben formulata, pu√≤ mancare la **corretta corrispondenza semantica** con i chunk nel database vettoriale.

Esempio:

* Query originale: **"Chi √® il proprietario del ristorante?"**
* Ma il documento potrebbe contenere: *"Questo ristorante √® stato fondato da Gianni Rossi."*

Una leggera riformulazione della query pu√≤ migliorare il **recall**.

---

## ‚öôÔ∏è Pipeline Step-by-Step

### 1. ‚úÖ Preparazione

In [5]:
from dotenv import load_dotenv
import os
load_dotenv()
from langchain_openai import OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.chroma import Chroma
from langchain_community.document_loaders.directory import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = DirectoryLoader("./data", glob="**/*.txt")

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
    length_function=len, 
    is_separator_regex=False
)

chunks = text_splitter.split_documents(docs)

embedding_function = OpenAIEmbeddings()
model = ChatOpenAI()

db = Chroma.from_documents(docs, embedding_function)

retriever = db.as_retriever()



---

### 2. ‚ú® Prompt di riscrittura multi-query

Ora diciamo che vogliamo chiedere chi √® il proprietario del ristorante.

La query: "Who owns the restaurant?"

Va bene ed otteniamo i top-k documenti, ma forse si scartano alcuni documenti che sono pi√π adatti a rispondere a questa query. Forse documenti che sono pi√π rilevanti per rispondere a questa query sono leggermente diversi dalla domanda e quindi si recuperano altri documenti meno rilevanti.

Soluzione: 
Usiamo un LLM per generare delle varianti della query in modo da retrivare tutti i documenti rilevanti.

Prompt per generare 5 varianti della stessa query:

```text
Sei un assistente AI specializzato in riformulazione di domande per il recupero semantico. 

Il tuo compito √® generare 5 versioni diverse della seguente domanda utente, tutte semanticamente simili ma formulate in modo diverso.

Rispondi solo con l'elenco delle domande, senza numerazione o spiegazioni.

Domanda: {query}
```

In [11]:
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda
import re

query = "Who owns the restaurant?"

QUERY_PROMPT = PromptTemplate(
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative question like this:
    <<question1>>
    <<question2>>
    Only provide the query, no numbering.
    Original question: {question}
    """,
    input_variables=['question']
)


def split_and_clean_text(input_text):
    return [item for item in re.split(r"<<|>>", input_text) if item.strip()]

---

### 3. üõ†Ô∏è Chain di LangChain per Multi-Query

In [12]:
model = ChatOpenAI()
multiquery_chain = (
    QUERY_PROMPT | model | StrOutputParser() | RunnableLambda(split_and_clean_text)
)

In [13]:
list_of_questions = multiquery_chain.invoke(query)

In [14]:
list_of_questions

['What individual or entity has ownership of the restaurant?',
 'To whom does the restaurant belong?',
 'Which person or group holds ownership of the restaurant?',
 'Who is the proprietor of the restaurant?',
 'Who has legal ownership of the restaurant?']

In [15]:
docs = [retriever.invoke(q) for q in list_of_questions]

Dato che le queries sono comunque simili, in docs avremo comunque dei duplicati

In [16]:
docs

[[Document(metadata={'source': 'data\\restaurant.txt'}, page_content="In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery‚Äîit was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life‚Äôs journey through the flavors of Italy.\n\nChef Amico‚Äôs doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico‚Äôs travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.\n\nOne evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy

In [18]:
#per togliere i duplicati
def flatten_and_unique_documents(documents):
    flattened_docs = [doc for sublist in documents for doc in sublist]

    unique_docs = []
    unique_contents = set()
    for doc in flattened_docs:
        if doc.page_content not in unique_contents:
            unique_docs.append(doc)
            unique_contents.add(doc.page_content)

    return unique_docs

In [19]:
flatten_and_unique_documents(documents=docs)

[Document(metadata={'source': 'data\\restaurant.txt'}, page_content="In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery‚Äîit was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life‚Äôs journey through the flavors of Italy.\n\nChef Amico‚Äôs doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico‚Äôs travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.\n\nOne evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy 

---

### ‚úÖ Vantaggi

* ‚úÖ Migliore copertura semantica ‚Üí aumento del **recall**
* ‚úÖ Minimizza il rischio che una query troppo letterale fallisca
* ‚úÖ Approccio semplice da implementare con `LangChain Runnable`

---

## üîÅ A seguire: Approccio HyDE (Hypothetical Document Embeddings)

Nel prossimo video imparerai come generare **risposte ipotetiche** alla domanda e usarle come base per il retrieval ‚Äî un'alternativa efficace quando le domande sono troppo ambigue o generiche.