# Query Rewriting for Multi-Turn Conversations

This notebook demonstrates why **query rewriting** is essential for effective retrieval in multi-turn conversations.

## The Problem

In multi-turn conversations, users often:
- Use pronouns ("it", "they", "that")
- Make implicit references to previous topics
- Ask follow-up questions without full context
- Use abbreviations that need expansion

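These natural conversation patterns make it difficult for Retrieval-Augmented Generation (RAG) systems to retrieve relevant documents: the retriever sees only the latest query, which on its own lacks the context needed to match the right documents.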

## The Solution

Query rewriting transforms context-dependent queries into self-contained versions by:
1. Incorporating conversation history
2. Replacing pronouns with specific referents
3. Expanding abbreviations and acronyms
4. Extracting core content while removing conversational fluff

We'll demonstrate this by comparing retrieval performance with and without query rewriting.
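
To build intuition before touching the vector database, here is a minimal sketch of why a context-dependent query matches poorly. It uses a toy bag-of-words cosine similarity as a stand-in for the embedding similarity used later in this notebook; the document text, the queries, and the `bow_cosine` helper are all hypothetical and only for illustration.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (toy stand-in for embeddings)."""
    ta = Counter(re.findall(r"[a-z0-9\-]+", a.lower()))
    tb = Counter(re.findall(r"[a-z0-9\-]+", b.lower()))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


# Hypothetical document and queries, for illustration only
doc = "Sentinel-1 is a radar imaging mission with a documented spatial resolution"
original = "What is its spatial resolution?"   # pronoun "its" carries no retrievable signal
rewritten = "Sentinel-1 spatial resolution"    # self-contained version

print(f"original:  {bow_cosine(original, doc):.3f}")
print(f"rewritten: {bow_cosine(rewritten, doc):.3f}")
```

The rewritten query scores higher here because it names the referent explicitly. Real embedding models behave less literally than word overlap, so treat this only as an intuition aid.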

## Setup Environment

In [None]:
# Import necessary libraries
import sys
import os
import json
import pandas as pd
from typing import List, Dict
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

sys.path.append(os.path.join(os.getcwd(), '..'))

from dotenv import load_dotenv
load_dotenv()

# Configure Qdrant connection
QDRANT_URL = os.getenv("QDRANT_URL")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")

# Validate credentials
if not QDRANT_URL or not QDRANT_API_KEY:
    raise ValueError(
        "Qdrant credentials not found!\n"
        "Please set QDRANT_URL and QDRANT_API_KEY in your .env file."
    )

collection_name = "sample_collection.snapshot"

print(f"✓ Environment configured")
print(f"  Qdrant URL: {QDRANT_URL}")
print(f"  Collection: {collection_name}")

## Connect to Qdrant Vector Database

In [None]:
# Connect to Qdrant
try:
    qdrant_client = QdrantClient(
        url=QDRANT_URL,
        api_key=QDRANT_API_KEY,
    )
    # Verify connection
    info = qdrant_client.get_collection(collection_name)
    print(f"✓ Connected to Qdrant")
    print(f"  Collection: {collection_name}")
    print(f"  Points: {info.points_count}")
    print(f"  Status: {info.status}")
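except Exception as e:
    # Chain the original exception so the underlying cause stays in the traceback
    raise RuntimeError(
        f"Failed to connect to Qdrant or load collection '{collection_name}': {e}"
    ) from e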

## Initialize Embedding Model

In [None]:
# Initialize the embedding model
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

print(f"✓ Embedding model loaded: all-MiniLM-L6-v2")

## Initialize Language Model for Query Rewriting

You have two options for the language model:

**Option 1: Local Model (No API required)**
- Uses Hugging Face Transformers to run a model locally
- Recommended: `microsoft/Phi-3.5-mini-instruct` (~7GB download)
- Pros: No API costs, works offline, full privacy
- Cons: Requires GPU/CPU resources, slower than API

**Option 2: API-based Model (Requires API key)**
- Uses OpenAI API or compatible providers (Ollama, LocalAI, vLLM, etc.)
- Recommended: `gpt-4o-mini` for cost-effectiveness
- Pros: Fast, no local resources needed
- Cons: Requires API key, costs per token

Choose the option that best fits your setup by uncommenting the appropriate section below.
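
As an alternative to commenting and uncommenting code, the backend could be selected from an environment variable. The sketch below assumes a hypothetical `LLM_BACKEND` variable and a hypothetical `choose_backend` helper; neither is part of this notebook, and only `OPENAI_API_KEY` is read by the cell below.

```python
import os


def choose_backend(env=None) -> str:
    """Return "local" or "api" based on a hypothetical LLM_BACKEND variable."""
    env = os.environ if env is None else env
    backend = env.get("LLM_BACKEND", "api").strip().lower()
    if backend not in {"local", "api"}:
        raise ValueError(f"Unknown LLM_BACKEND: {backend!r} (expected 'local' or 'api')")
    # The API option needs a key; fail early with a clear message
    if backend == "api" and not env.get("OPENAI_API_KEY"):
        raise ValueError("LLM_BACKEND is 'api' but OPENAI_API_KEY is not set")
    return backend


# Usage with an explicit mapping, so the example does not depend on your shell
print(choose_backend({"LLM_BACKEND": "local"}))
```

Passing the environment as a mapping keeps the function easy to test; in the notebook you would call `choose_backend()` with no arguments and branch on the result.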

In [None]:
import torch
from transformers import pipeline

# ============================================================================
# OPTION 1: Use a local model (loads model weights locally)
# ============================================================================
# Uncomment the lines below to use a local model
# This is useful if you don't have an API key or want to run everything locally
# Note: First time will download ~7GB of model weights

# model_id = "microsoft/Phi-3.5-mini-instruct"
# model = pipeline(
#     "text-generation",
#     model=model_id,
#     torch_dtype=torch.bfloat16,
#     device_map="auto",
# )
# 
# def call_llm(messages: List[Dict], max_tokens: int = 500, temperature: float = 0.1) -> str:
#     """
#     Call the local LLM with given messages.
#     
#     Args:
#         messages: List of message dicts with 'role' and 'content'
#         max_tokens: Maximum tokens to generate
#         temperature: Sampling temperature
#     
#     Returns:
#         Generated text response
#     """
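#     # do_sample=True is needed for `temperature` to take effect (greedy decoding ignores it)
#     response = model(messages, max_new_tokens=max_tokens, do_sample=True, temperature=temperature)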
#     return response[0]['generated_text'][-1]['content']
# 
# print(f"✓ Local LLM initialized: {model_id}")

# ============================================================================
# OPTION 2: Use an API-based model (requires API key)
# ============================================================================
# Comment out this section if using OPTION 1 (local model)
# This works with OpenAI and any OpenAI-compatible provider (LocalAI, Ollama, vLLM, etc.)

from openai import OpenAI

# Initialize OpenAI client (or compatible provider)
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")  # Defaults to OpenAI if not set
)

def call_llm(messages: List[Dict], max_tokens: int = 500, temperature: float = 0.1) -> str:
    """
    Call the LLM API with given messages.
    
    Args:
        messages: List of message dicts with 'role' and 'content'
        max_tokens: Maximum tokens to generate
        temperature: Sampling temperature
    
    Returns:
        Generated text response
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # or "gpt-4", "gpt-3.5-turbo", or local model name for compatible providers
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature
    )
    return response.choices[0].message.content

print(f"✓ LLM initialized: gpt-4o-mini (API)")

## Query Rewriting Prompt

This prompt instructs the LLM to rewrite queries by:
- Incorporating conversation context
- Replacing pronouns with specific referents
- Expanding abbreviations and acronyms (context-aware)
- Removing conversational fluff
- Extracting core content
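
The prompt below contains two placeholders, `{conversation}` and `{last_utterance}`, which are filled with `str.format`. The sketch here uses a shortened, hypothetical stand-in for the full template to show the substitution and one pitfall: if you later add literal braces to the prompt (for example a JSON output hint), they must be doubled.

```python
# Shortened, hypothetical stand-in for the full prompt template
template = (
    "Conversation:\n{conversation}\n\n"
    "Last query:\n{last_utterance}\n\n"
    "Respond with ONLY the rewritten query, nothing else."
)

filled = template.format(
    conversation='User: "Tell me about Sentinel-1"',
    last_utterance="What is its spatial resolution?",
)
print(filled)

# Literal braces in a template must be doubled; otherwise str.format
# treats them as placeholders and raises KeyError or ValueError.
json_hint = 'Return JSON like {{"rewritten_query": "..."}} for: {last_utterance}'
print(json_hint.format(last_utterance="What is SAR?"))
```

Braces inside the substituted values are safe: `str.format` only interprets the template, not the arguments.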

In [None]:
query_rewriting_prompt = """You are an expert assistant specializing in Earth Observation (EO). Your task is to rewrite the user's final query into a self-contained version suitable for a RAG (Retrieval-Augmented Generation) system.

Your rewritten query must:
- Preserve the user's original intent and core question
- Include all necessary context from the conversation history
- Be interpretable without access to prior conversation turns
- Disambiguate towards Earth Observation concepts when ambiguous
- Expand abbreviations and acronyms for clarity when needed
- Be concise and focused on the core content

Rewriting Rules:

1. Context incorporation:
   - If the query references previous context (pronouns, implicit topics, follow-ups), incorporate necessary context
   - Replace pronouns (it, they, this, that) with their specific referents
   - Make implicit references explicit

2. Core Content Extraction:
   - Reduce the amount of information in the original question
   - Extract only the most core content needed to retrieve relevant information
   - Remove conversational fluff, politeness phrases, and redundant information
   - The rewritten query should be brief and focused
   - Keep it shorter than simply replacing keywords - aim for concise, essential information only

3. Clarification and expansion:
   - Expand abbreviations and acronyms (e.g., "EO" → "Earth Observation", "ESA" → "European Space Agency", "SAR" → "Synthetic Aperture Radar", "NDVI" → "Normalized Difference Vegetation Index")
   - EXCEPTION: Do NOT expand acronyms if the conversation context already clearly establishes what they refer to
   - When ambiguous terms could refer to EO or non-EO concepts, disambiguate towards Earth Observation
   - Maintain technical accuracy

4. Context-aware acronym handling:
   - If previous conversation turns have already mentioned and explained an acronym, keep it as-is in the rewritten query
   - If the acronym appears for the first time or context is unclear, expand it
   - Common well-established acronyms in Earth Observation (e.g., "SAR", "NDVI", "RGB") can remain unexpanded if they appear in a clearly technical EO context

5. What NOT to do:
   - Do NOT add information not present in the conversation
   - Do NOT change technical terms unnecessarily
   - Do NOT keep conversational elements like "I was wondering", "Could you please", "Thank you"
   - Do NOT preserve unnecessary details that don't affect the core query
   - Do NOT rewrite if the user explicitly asks for their original query to be kept

Examples:

Example 1 (Abbreviation expansion + core extraction):
Conversation: [empty]
Last query: "Hi! I was wondering if you could tell me what are the main applications of S2 data? Thanks!"
→ rewritten_query: "Sentinel-2 data application"

Example 2 (Acronym expansion + core extraction):
Conversation: [empty]
Last query: "Could you please explain to me how SAR technology works in remote sensing?"
→ rewritten_query: "Synthetic Aperture Radar remote sensing"

Example 3 (Context incorporation + core extraction):
Conversation:
User: "Tell me about Sentinel-1"
Assistant: "Sentinel-1 is a radar imaging mission..."
Last query: "That's interesting! What is its spatial resolution and how does it compare to other satellites?"
→ rewritten_query: "Sentinel-1 spatial resolution comparison"

Example 4 (Implicit reference + core extraction):
Conversation:
User: "What sensors does Landsat 8 have?"
Assistant: "Landsat 8 carries OLI and TIRS sensors..."
Last query: "Which of those bands would be best for vegetation monitoring in agricultural areas?"
→ rewritten_query: "Vegetation monitoring agricultural areas Landsat 8 bands"

Example 5 (Ambiguity resolution + core extraction):
Conversation: [empty]
Last query: "I'm trying to understand what are the different types of resolution that exist?"
→ rewritten_query: "types of resolution in Earth Observation"

Now process the following:

Conversation:
{conversation}

Last query:
{last_utterance}

Respond with ONLY the rewritten query, nothing else."""

## Load Conversation Examples

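We load real conversation examples to demonstrate query rewriting. Each record has a `conversation` string with the prior turns (empty for single-turn queries) and a `last_utterance` with the user's final query.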
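
For reference, the loader expects one JSON object per line with the keys `conversation` and `last_utterance`. The sketch below parses two hypothetical records from an in-memory string and validates the keys; the record contents and the `parse_conversations` helper are illustrative and not taken from the dataset.

```python
import json

# Two hypothetical records in the shape the notebook reads
sample_jsonl = "\n".join([
    json.dumps({"conversation": "", "last_utterance": "What is SAR?"}),
    json.dumps({
        "conversation": 'User: "Tell me about Sentinel-1"\n'
                        'Assistant: "Sentinel-1 is a radar imaging mission..."',
        "last_utterance": "What is its spatial resolution?",
    }),
])

REQUIRED_KEYS = {"conversation", "last_utterance"}


def parse_conversations(text: str) -> list:
    """Parse JSONL text, skipping blank lines and checking required keys."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        records.append(record)
    return records


records = parse_conversations(sample_jsonl)
multi_turn = [r for r in records if r["conversation"]]
print(f"{len(records)} records, {len(multi_turn)} with conversation context")
```

Validating keys up front turns a later `KeyError` deep in the comparison loop into an error that points at the offending line.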

In [None]:
# Load conversations from JSONL file
conversations = []
with open('../data/eo_conversations.jsonl', 'r') as f:
    for line in f:
        conversations.append(json.loads(line))

print(f"✓ Loaded {len(conversations)} conversation examples")
print(f"\nExample conversation:")
print(f"{'='*80}")
sample = conversations[2]  # Example with context
print(f"Conversation:\n{sample['conversation']}")
print(f"\nLast utterance: {sample['last_utterance']}")
print(f"{'='*80}")

## Query Rewriting Function

In [None]:
def rewrite_query(conversation: str, last_utterance: str) -> str:
    """
    Rewrite a query using conversation context.
    
    Args:
        conversation: Previous conversation history
        last_utterance: The user's last query
    
    Returns:
        Rewritten query
    """
    # Format the prompt
    prompt = query_rewriting_prompt.format(
        conversation=conversation if conversation else "[empty]",
        last_utterance=last_utterance
    )
    
    # Call LLM
    messages = [{"role": "user", "content": prompt}]
    rewritten = call_llm(messages, max_tokens=100, temperature=0.1)
    
    return rewritten.strip()

print("✓ Query rewriting function defined")

## Retrieval Function

In [None]:
def retrieve_documents(query: str, k: int = 5) -> List[Dict]:
    """
    Retrieve documents from Qdrant for a given query.
    
    Args:
        query: Search query
        k: Number of documents to retrieve
    
    Returns:
        List of retrieved documents with content and metadata
    """
    # Generate query embedding
    query_embedding = embedder.encode([query])[0].tolist()
    
    # Query Qdrant
    results = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        limit=k,
        score_threshold=0.3
    )
    
    # Extract results
    documents = []
    for point in results.points:
        documents.append({
            'content': point.payload.get('content', '') or point.payload.get('text', ''),
            'file_path': point.payload.get('file_path', ''),
            'score': point.score
        })
    
    return documents

print("✓ Retrieval function defined")

## Demonstration: Query Rewriting Examples

Let's see how query rewriting transforms context-dependent queries into self-contained ones.

In [None]:
# Select interesting examples to demonstrate (mix of different query types);
# indices beyond the end of the loaded dataset are skipped
demo_indices = [i for i in [0, 2, 3, 7, 10, 13] if i < len(conversations)]

print("QUERY REWRITING EXAMPLES")
print("="*80)

rewriting_examples = []

for idx in demo_indices:
    conv = conversations[idx]
    
    # Rewrite the query
    rewritten = rewrite_query(conv['conversation'], conv['last_utterance'])
    
    rewriting_examples.append({
        'conversation': conv['conversation'],
        'original_query': conv['last_utterance'],
        'rewritten_query': rewritten
    })
    
    print(f"\nExample {len(rewriting_examples)}:")
    print("-" * 80)
    if conv['conversation']:
        print(f"Context: {conv['conversation'][:150]}...")
    else:
        print("Context: [empty]")
    print(f"\nOriginal: {conv['last_utterance']}")
    print(f"Rewritten: {rewritten}")

print("\n" + "="*80)

## Comparison: Retrieval With vs Without Query Rewriting

Now we'll demonstrate the impact of query rewriting on retrieval quality by:
1. Retrieving documents using the **original query** (as-is)
2. Retrieving documents using the **rewritten query**
3. Comparing the results

For conversations with context (multi-turn), we expect the rewritten query to perform better.
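
The cell below prints per-example averages. To summarize them across examples, something like the following sketch could be applied to `comparison_results` once it is populated. The `summarize` helper is hypothetical, and the scores in `toy_results` are placeholders chosen to exercise the code, not results from this notebook.

```python
from statistics import mean


def summarize(results: list) -> dict:
    """Aggregate per-example average scores into a small summary."""
    deltas = [r["rewritten_avg_score"] - r["original_avg_score"] for r in results]
    return {
        "n": len(results),
        "mean_delta": mean(deltas) if deltas else 0.0,
        "rewritten_wins": sum(d > 0 for d in deltas),
    }


# Placeholder scores, NOT real results
toy_results = [
    {"original_avg_score": 0.40, "rewritten_avg_score": 0.55},
    {"original_avg_score": 0.50, "rewritten_avg_score": 0.45},
    {"original_avg_score": 0.30, "rewritten_avg_score": 0.60},
]

summary = summarize(toy_results)
print(f"examples: {summary['n']}")
print(f"mean score delta (rewritten - original): {summary['mean_delta']:+.4f}")
print(f"rewritten query scored higher in {summary['rewritten_wins']} of {summary['n']}")
```

Two caveats apply when reading such a summary: a similarity score is only a proxy for relevance, and because retrieval uses a score threshold, a query that returns no documents contributes an average of 0.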

In [None]:
# Select examples with conversation context (multi-turn scenarios)
multi_turn_examples = [conv for conv in conversations if conv['conversation']]

print(
    f"Found {len(multi_turn_examples)} multi-turn conversations; "
    f"comparing retrieval for the first {min(5, len(multi_turn_examples))}"
)
print("Retrieving top-5 documents for each query\n")

comparison_results = []

for idx, conv in enumerate(multi_turn_examples[:5]):  # Analyze first 5 multi-turn examples
    original_query = conv['last_utterance']
    
    # Rewrite the query
    rewritten_query = rewrite_query(conv['conversation'], original_query)
    
    # Retrieve with original query
    original_results = retrieve_documents(original_query, k=5)
    
    # Retrieve with rewritten query
    rewritten_results = retrieve_documents(rewritten_query, k=5)
    
    # Store results
    comparison_results.append({
        'conversation': conv['conversation'],
        'original_query': original_query,
        'rewritten_query': rewritten_query,
        'original_docs': original_results,
        'rewritten_docs': rewritten_results,
        'original_avg_score': sum(d['score'] for d in original_results) / len(original_results) if original_results else 0,
        'rewritten_avg_score': sum(d['score'] for d in rewritten_results) / len(rewritten_results) if rewritten_results else 0
    })
    
    print(f"\n{'='*80}")
    print(f"Example {idx + 1}")
    print(f"{'='*80}")
    print(f"\nConversation Context:")
    print(conv['conversation'])
    print(f"\nOriginal Query: {original_query}")
    print(f"Rewritten Query: {rewritten_query}")
    
    print(f"\n{'-'*80}")
    print("RETRIEVAL WITH ORIGINAL QUERY")
    print(f"{'-'*80}")
    print(f"Average Score: {comparison_results[-1]['original_avg_score']:.4f}")
    print(f"\nTop 3 Documents:")
    for i, doc in enumerate(original_results[:3]):
        print(f"\n  [{i+1}] Score: {doc['score']:.4f}")
        print(f"      File: {doc['file_path'].split('/')[-1] if doc['file_path'] else 'N/A'}")
        print(f"      Content: {doc['content'][:150]}...")
    
    print(f"\n{'-'*80}")
    print("RETRIEVAL WITH REWRITTEN QUERY")
    print(f"{'-'*80}")
    print(f"Average Score: {comparison_results[-1]['rewritten_avg_score']:.4f}")
    print(f"\nTop 3 Documents:")
    for i, doc in enumerate(rewritten_results[:3]):
        print(f"\n  [{i+1}] Score: {doc['score']:.4f}")
        print(f"      File: {doc['file_path'].split('/')[-1] if doc['file_path'] else 'N/A'}")
        print(f"      Content: {doc['content'][:150]}...")



print(f"\n\n{'='*80}")
print("COMPARISON COMPLETE")
print(f"{'='*80}")

## Conclusion

This notebook showed why **query rewriting is fundamental for multi-turn conversational RAG systems**:

1. **Context Integration**: Rewritten queries incorporate conversation history, making them self-contained
2. **Improved Retrieval**: Rewritten queries are expected to achieve higher similarity scores than the raw follow-up queries; the per-example averages printed in the comparison show whether this holds for your collection
3. **Better Relevance**: Documents retrieved with a self-contained query tend to match the user's actual intent more closely, although a similarity score is only a proxy for relevance
4. **Reduced Ambiguity**: Pronouns and implicit references are resolved
5. **Expanded Coverage**: Abbreviations and acronyms are expanded where needed for better matching


### Recommendations:

1. **Always use query rewriting** for multi-turn conversational systems
2. **Maintain conversation history** to provide context for rewriting
3. **Fine-tune the rewriting prompt** for your specific domain
4. **Monitor rewriting quality** and adjust the prompt as needed
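
As a starting point for monitoring, a few cheap heuristics can flag suspicious rewrites before they reach the retriever. This is a minimal sketch under stated assumptions: the `check_rewrite` helper, the pronoun list, and the length threshold are hypothetical choices, not part of this notebook.

```python
import re

# Words that usually signal an unresolved reference in a rewritten query
PRONOUNS = {"it", "its", "they", "them", "their", "this", "that", "those", "these"}


def check_rewrite(original: str, rewritten: str) -> list:
    """Return a list of warnings for a rewritten query (empty list means no flags)."""
    warnings = []
    tokens = re.findall(r"[a-z0-9\-]+", rewritten.lower())
    if not tokens:
        warnings.append("empty rewrite")
    leftover = sorted(PRONOUNS.intersection(tokens))
    if leftover:
        warnings.append(f"unresolved pronouns: {leftover}")
    if len(rewritten) > 2 * len(original):
        warnings.append("rewrite is more than twice as long as the original")
    return warnings


print(check_rewrite("What is its spatial resolution?", "Sentinel-1 spatial resolution"))
print(check_rewrite("What about it?", "What about it"))
```

These checks are deliberately crude: "that" can be a legitimate conjunction, and a longer rewrite is sometimes correct when a lot of context must be carried over. Treat the warnings as prompts for manual review rather than hard failures.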
