# 🔄 Query Rephrasing Techniques Demo

This notebook demonstrates different query rephrasing techniques used in retrieval systems to improve search quality and relevance.  

## 📝 Overview

We'll explore four main query rephrasing techniques:

**🔍 Query Expansion** – Add synonyms, related terms, or contextual entities  
**🪓 Query Decomposition** – Break compound queries into atomic sub-queries  
**✏️ Query Rewriting** – Make context-dependent queries standalone  
**🤖 Self-Querying** – Transform complex input into optimal search queries  

In [5]:
# Import required modules
from retrieval_playground.src.pre_retrieval.query_rephrasing import (
    expand_query,
    decompose_query,
    rewrite_query,
    self_query,
    QUERY_EXAMPLES
)

## 1. Query Expansion 🔍

🎯 **Purpose**: Expand queries by replacing abbreviations, adding context, or enriching with domain-specific terms.

🗒️ **When to use**: 
- Queries contain abbreviations or acronyms
- Queries are too broad or vague
- Queries lack domain-specific terms
- Queries are incomplete questions or direct phrases
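
The first case above (abbreviations and acronyms) can be illustrated without a model. The sketch below is a rule-based stand-in, not how `expand_query` is implemented; the `ABBREVIATIONS` table and the `expand_abbreviations` helper are hypothetical names introduced only for this example.

```python
import re

# Hypothetical abbreviation table; a real system would load a domain glossary.
ABBREVIATIONS = {
    "LLM": "large language model",
    "CV": "computer vision",
    "PINN": "physics-informed neural network",
}


def expand_abbreviations(query: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Spell out each known abbreviation and keep the original token in parentheses."""

    def _expand(match: re.Match) -> str:
        token = match.group(0)
        # Treat a trailing "s" as a plural only if the singular is in the table.
        is_plural = token.endswith("s") and token[:-1] in table
        base = token[:-1] if is_plural else token
        long_form = table.get(base)
        if long_form is None:
            return token  # Unknown abbreviation: leave it untouched.
        if is_plural:
            long_form += "s"
        return f"{long_form} ({token})"

    # Two or more capital letters, optionally followed by a plural "s".
    return re.sub(r"\b[A-Z]{2,}s?\b", _expand, query)


print(expand_abbreviations("Retrieved capabilities of LLMs and FLOPs scaling"))
```

Unknown tokens such as `FLOPs` pass through unchanged, so a lookup like this can only cover the abbreviation case. The other cases in the list (vague, incomplete, or under-specified queries) need a generative step.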


In [6]:
print("=" * 60)
print("QUERY EXPANSION EXAMPLES")
print("=" * 60)

expansion_examples = QUERY_EXAMPLES["expansion"]

for i, example in enumerate(expansion_examples, 1):
    query = example["query"]
    print(f"\n{i}. Original Query:")
    print(f"   '{query}'")
    
    try:
        expanded = expand_query(query)
        print("\n   Expanded Query:")
        print(f"   '{expanded}'")
    except Exception as e:
        print(f"   Error: {e}")
    
    print("-" * 40)


QUERY EXPANSION EXAMPLES

1. Original Query:
   'Generative AI transformer evaluation benchmarks'

   Expanded Query:
   'Benchmarks for evaluating large language models and generative AI transformers'
----------------------------------------

2. Original Query:
   'Retrieved capabilities of LLMs and FLOPs scaling'

   Expanded Query:
   'What are the retrieved capabilities of large language models (LLMs) and how does their performance scale with FLOPS?'
----------------------------------------

3. Original Query:
   'AI models'

   Expanded Query:
   'What are the different types of AI models and their applications?'
----------------------------------------

4. Original Query:
   'tell me about computer vision'

   Expanded Query:
   'What is computer vision and how does it work?'
----------------------------------------

5. Original Query:
   'Physics-informed neural network for fatigue life prediction'

   Expanded Query:
   'Physics-informed neural networks for predicting fatigue l

## 2. Query Decomposition 🪓

🎯 **Purpose**: Break down complex queries with multiple intents into smaller, atomic sub-queries.

🗒️ **When to use**:
- Queries contain multiple questions or intents
- Compound queries whose parts are better answered separately
- Complex queries whose parts need to be retrieved independently


In [7]:
print("=" * 60)
print("QUERY DECOMPOSITION EXAMPLES")
print("=" * 60)

decomposition_examples = QUERY_EXAMPLES["decomposition"]

for i, example in enumerate(decomposition_examples, 1):
    query = example["query"]
    print(f"\n{i}. Original Complex Query:")
    print(f"   '{query}'")
    
    try:
        sub_queries = decompose_query(query)
        print("\n   Decomposed Sub-queries:")
        for j, sub_query in enumerate(sub_queries, 1):
            print(f"   {j}. {sub_query}")
    except Exception as e:
        print(f"   Error: {e}")
    
    print("-" * 40)


QUERY DECOMPOSITION EXAMPLES

1. Original Complex Query:
   'What are neural networks and how do they work in computer vision applications?'

   Decomposed Sub-queries:
   1. ```python
   2. ['What are neural networks?', 'How do neural networks work in computer vision applications?']
   3. ```
----------------------------------------

2. Original Complex Query:
   'Explain machine learning algorithms, their types, and performance evaluation metrics'

   Decomposed Sub-queries:
   1. ```python
   2. ['Explain machine learning algorithms', 'What are the types of machine learning algorithms?', 'What are the performance evaluation metrics for machine learning algorithms?']
   3. ```
----------------------------------------

3. Original Complex Query:
   'What is generative AI, what are its applications, and what are the ethical considerations?'

   Decomposed Sub-queries:
   1. ```python
   2. ['What is generative AI?', 'What are the applications of generative AI?', 'What are the ethical c

## 3. Query Rewriting ✏️

🎯 **Purpose**: Transform context-dependent queries into standalone queries suitable for retrieval.

🗒️ **When to use**:
- Queries contain pronouns or references to previous context
- Follow-up questions in conversations
- Incomplete queries that depend on prior information
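
A cheap check can decide whether a query needs rewriting before any model call is made. The sketch below is a heuristic covering only the first case above (pronouns and referring words). `CONTEXT_WORDS` and `needs_rewriting` are hypothetical names, and the word list is an assumption rather than part of `retrieval_playground`.

```python
import re

# Assumed, non-exhaustive set of referring words that usually point at earlier turns.
CONTEXT_WORDS = {"it", "its", "they", "them", "their", "this", "these", "those"}


def needs_rewriting(query: str) -> bool:
    """Return True if the query contains a word that likely refers to prior context."""
    words = re.findall(r"[a-z']+", query.lower())
    return any(word in CONTEXT_WORDS for word in words)


print(needs_rewriting("How does it work?"))
print(needs_rewriting("How does counterfactual generation in machine learning work?"))
```

A follow-up such as 'What are the main advantages?' contains no referring word and is not flagged, so this check works as a fast first filter rather than a replacement for the rewriting step.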


In [8]:
print("=" * 60)
print("QUERY REWRITING EXAMPLES")
print("=" * 60)

rewriting_examples = QUERY_EXAMPLES["rewriting"]

for i, example in enumerate(rewriting_examples, 1):
    query = example["query"]
    context = example["previous_conversation_history"]
    
    print(f"\n{i}. Context-Dependent Query:")
    print(f"   '{query}'")
    
    print("\n   Previous Context:")
    print(f"   {context}")
    
    try:
        rewritten = rewrite_query(query, context)
        print("\n   Standalone Query:")
        print(f"   '{rewritten}'")
    except Exception as e:
        print(f"   Error: {e}")
    
    print("-" * 40)


QUERY REWRITING EXAMPLES

1. Context-Dependent Query:
   'How does it work?'

   Previous Context:
   User asked about counterfactual generation in machine learning. Assistant explained that it's a method for creating hypothetical scenarios to understand causal relationships in data.

   Standalone Query:
   'How does counterfactual generation in machine learning work?'
----------------------------------------

2. Context-Dependent Query:
   'What are the main advantages?'

   Previous Context:
   User inquired about Riemannian manifolds for change point detection. Assistant described how these mathematical structures can capture complex data geometries for robust statistical analysis.

   Standalone Query:
   'What are the main advantages of using Riemannian manifolds for change point detection?'
----------------------------------------

3. Context-Dependent Query:
   'Can you explain the methodology?'

   Previous Context:
   User asked about publication bias in meta-analysis researc

## 4. Self-Querying 🤖

🎯 **Purpose**: Transform complex user input into optimal search queries for retrieval.

🗒️ **When to use**:
- Complex, multi-faceted user requests
- Requests that call for multiple focused search queries
- Cases where targeted queries retrieve better than the raw user input
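
Once several focused queries exist, their results have to be combined into one ranking. One common option is reciprocal rank fusion (RRF), sketched below. The notebook does not show how `retrieval_playground` merges results, so this is a stand-in: the `fuse_results` helper and the `search` callable are hypothetical, and `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Callable


def fuse_results(
    queries: list[str],
    search: Callable[[str], list[str]],
    k: int = 60,
) -> list[str]:
    """Run every query and merge the ranked document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for query in queries:
        # Each hit contributes 1 / (k + rank); documents found by several queries add up.
        for rank, doc_id in enumerate(search(query), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy index standing in for a real retriever (illustrative data only).
toy_index = {
    "deep learning segmentation remote sensing": ["doc_a", "doc_b"],
    "remote sensing image segmentation review": ["doc_b", "doc_c"],
}
print(fuse_results(list(toy_index), lambda q: toy_index[q]))
```

Documents returned by more than one query rise to the top, which rewards the overlap between focused queries.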


In [9]:
print("=" * 60)
print("SELF-QUERYING EXAMPLES")
print("=" * 60)

self_querying_examples = QUERY_EXAMPLES["self_querying"]

for i, example in enumerate(self_querying_examples, 1):
    query = example["query"]
    print(f"\n{i}. Complex User Input:")
    print(f"   '{query}'")
    
    try:
        search_queries = self_query(query)
        print("\n   Optimal Search Queries:")
        for j, search_query in enumerate(search_queries, 1):
            print(f"   {j}. {search_query}")
    except Exception as e:
        print(f"   Error: {e}")
    
    print("-" * 40)


SELF-QUERYING EXAMPLES

1. Complex User Input:
   'I need to understand the current state of research in computer vision, particularly focusing on segmentation techniques for remote sensing applications. What are the key papers and methodologies?'

   Optimal Search Queries:
   1. ```python
   2. [
   3. "computer vision remote sensing image segmentation",
   4. "deep learning segmentation remote sensing",
   5. "remote sensing image segmentation review",
   6. "semantic segmentation remote sensing applications",
   7. "instance segmentation remote sensing datasets",
   8. "key papers remote sensing image segmentation",
   9. "methodologies remote sensing image segmentation",
   10. "comparison of segmentation algorithms remote sensing",
   11. "recent advances remote sensing image segmentation",
   12. "computer vision segmentation techniques remote sensing overview"
   13. ]
   14. ```
----------------------------------------

2. Complex User Input:
   'Please explain the mathematica

## Testing Section

Try your own queries with different rephrasing techniques:


In [10]:
# Test your own query expansion
test_query = "AI models"  

print(f"Original: {test_query}")
print(f"Expanded: {expand_query(test_query)}")


Original: AI models
Expanded: What are the different types of AI models and their applications?


In [11]:
# Test your own query decomposition
test_query = "What are neural networks and how do they work?"  

print(f"Original: {test_query}")
sub_queries = decompose_query(test_query)
print("Sub-queries:")
for i, sq in enumerate(sub_queries, 1):
    print(f"{i}. {sq}")


Original: What are neural networks and how do they work?
Sub-queries:
1. ```python
2. ['What are neural networks?', 'How do neural networks work?']
3. ```
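
In the output above, the Markdown fence lines are numbered as if they were sub-queries. This suggests the returned value still contains the model's raw fenced text, which is an assumption based on the printed output alone. The sketch below shows one way to recover a clean list before printing or retrieval; `parse_query_list` is a hypothetical helper, not part of `retrieval_playground`.

```python
import ast


def parse_query_list(raw) -> list[str]:
    """Recover a list of query strings from fenced model output.

    Accepts either a single string or a list of lines, as in the printed output.
    """
    text = raw if isinstance(raw, str) else "\n".join(raw)
    # Drop Markdown fence lines such as ```python and ```.
    body = "\n".join(
        line for line in text.splitlines() if not line.strip().startswith("```")
    ).strip()
    try:
        # literal_eval only parses Python literals, so it never executes code.
        parsed = ast.literal_eval(body)
    except (ValueError, SyntaxError):
        # Not a Python literal: fall back to one query per non-empty line.
        return [line.strip() for line in body.splitlines() if line.strip()]
    if isinstance(parsed, (list, tuple)):
        return [str(item) for item in parsed]
    return [str(parsed)]


lines = [
    "```python",
    "['What are neural networks?', 'How do neural networks work?']",
    "```",
]
for i, sq in enumerate(parse_query_list(lines), 1):
    print(f"{i}. {sq}")
```

The same helper handles the multi-line list shape seen in the self-querying output, because the lines are joined before parsing.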


In [12]:
# Test your own query rewriting
test_query = "How does it work?"  
test_context = "User asked about neural networks"  

print(f"Context-dependent: {test_query}")
print(f"Context: {test_context}")
print(f"Standalone: {rewrite_query(test_query, test_context)}")


Context-dependent: How does it work?
Context: User asked about neural networks
Standalone: How do neural networks work?


In [13]:
# Test your own self-querying
test_query = "I want to learn about machine learning"  

print(f"Complex input: {test_query}")
search_queries = self_query(test_query)
print("Optimal search queries:")
for i, sq in enumerate(search_queries, 1):
    print(f"{i}. {sq}")


Complex input: I want to learn about machine learning
Optimal search queries:
1. ```python
2. ['machine learning tutorial', 'machine learning for beginners', 'introduction to machine learning', 'machine learning concepts', 'machine learning algorithms']
3. ```


## 🏁 Summary

This notebook demonstrated four key query rephrasing techniques:  

**🔍 Query Expansion**: Enriches queries with additional context and domain-specific terms  
**🪓 Query Decomposition**: Breaks complex multi-intent queries into focused sub-queries  
**✏️ Query Rewriting**: Makes context-dependent queries standalone for better retrieval  
**🤖 Self-Querying**: Transforms complex user inputs into optimized search queries  

✨ These techniques help improve retrieval quality by making queries:  
- Clear and specific  
- Context-independent  
- Properly scoped  
- Optimized for search  

