This code implements three query transformation techniques to enhance the retrieval process in Retrieval-Augmented Generation (RAG) systems:

1. Query Rewriting
2. Step-back Prompting
3. Sub-query Decomposition

Each technique aims to improve the relevance and comprehensiveness of retrieved information by modifying or expanding the original query.
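All three techniques change only the query side of retrieval, so they can share one retrieval loop. The sketch below is minimal and framework-free. It assumes each transformation is a function returning one or more queries and the retriever returns ranked document IDs. It runs the original query plus every transformed query, then merges the hits in first-seen order and drops duplicates. The names `transformed_retrieval`, `CORPUS`, `keyword_retriever`, and `fake_rewrite` are hypothetical, and the toy keyword retriever stands in for a real vector store such as Chroma.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: a transform maps a query to one or more new queries;
# a retriever maps a query to a ranked list of document IDs.
Transform = Callable[[str], List[str]]
Retriever = Callable[[str], List[str]]


def transformed_retrieval(query: str, transforms: List[Transform],
                          retriever: Retriever, k: int = 5) -> List[str]:
    """Retrieve for the original and all transformed queries, then merge the
    hits in first-seen order, skipping duplicates, and keep the top k."""
    queries = [query]
    for transform in transforms:
        queries.extend(transform(query))
    merged: List[str] = []
    seen = set()
    for q in queries:
        for doc_id in retriever(q):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:k]


# Toy corpus and keyword retriever standing in for a real vector store.
CORPUS: Dict[str, str] = {
    "d1": "climate change raises sea levels",
    "d2": "biodiversity loss in warming ecosystems",
    "d3": "extreme weather patterns increase",
}


def keyword_retriever(q: str) -> List[str]:
    """Return IDs of documents sharing at least one word with the query."""
    terms = set(q.lower().split())
    return [doc_id for doc_id, text in CORPUS.items() if terms & set(text.split())]


def fake_rewrite(q: str) -> List[str]:
    """Stand-in for an LLM-backed transform such as rewrite_query."""
    return ["biodiversity and sea levels"]


print(transformed_retrieval("climate change", [fake_rewrite], keyword_retriever))
```

Here the original query alone reaches only `d1`, while the transformed query also surfaces `d2`. In a real pipeline each transform would wrap an LLM chain (for example `lambda q: [rewrite_query(q)]`). First-seen merging is only one choice; rank-fusion schemes are a common alternative.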

```python
import os
import sys

from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load API keys (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()
```

### 1. Query rewriting: reformulate queries to improve retrieval

```python
# temperature=0 keeps the rewrites as repeatable as possible
re_write_llm = ChatOpenAI(temperature=0, model="gpt-4o", max_tokens=4000)

# Prompt template for query rewriting
query_rewrite_template = PromptTemplate(
    template="""You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system.
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

Original query: {original_query}

Rewritten query:""",
    input_variables=["original_query"],
)

# Chain for query rewriting: fill the prompt, then send it to the LLM
query_rewriter = query_rewrite_template | re_write_llm


def rewrite_query(original_query):
    """
    Rewrite the original query to improve retrieval.

    Args:
        original_query (str): The original user query

    Returns:
        str: The rewritten query
    """
    response = query_rewriter.invoke({"original_query": original_query})
    # The chat model returns a message object; .content holds the generated text
    return response.content
```

#### Demonstration on a use case

```python
# Example query on climate change
original_query = "What are the impacts of climate change on the environment?"
rewritten_query = rewrite_query(original_query)
print("Original query:", original_query)
print("Rewritten query:", rewritten_query)
```

```text
Original query: What are the impacts of climate change on the environment?
Rewritten query: How does climate change affect various aspects of the environment, such as biodiversity, sea levels, weather patterns, and ecosystems?
```
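Why can the rewritten query retrieve more? It names concrete sub-topics that a relevant passage is likely to mention. The toy word-overlap score below illustrates that intuition on the two queries printed above. The sample document, the stop-word list, and the helpers `content_terms` and `term_overlap` are hypothetical. A real retriever built on embeddings (such as the `HuggingFaceEmbeddings` imported above) scores semantic similarity rather than shared words, so treat this only as an illustration.

```python
import re
from typing import Set

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "are", "as", "does", "how", "in", "of", "on", "such", "the", "to", "what"}


def content_terms(text: str) -> Set[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def term_overlap(query: str, doc: str) -> int:
    """Count the distinct content terms shared by a query and a document."""
    return len(content_terms(query) & content_terms(doc))


doc = "Rising sea levels threaten coastal ecosystems and biodiversity."
original = "What are the impacts of climate change on the environment?"
rewritten = ("How does climate change affect various aspects of the environment, "
             "such as biodiversity, sea levels, weather patterns, and ecosystems?")

print(term_overlap(original, doc), term_overlap(rewritten, doc))  # → 0 4
```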


### 2. Step-back prompting: generate broader queries for better context retrieval

```python
step_back_llm = ChatOpenAI(temperature=0, model="gpt-4o", max_tokens=4000)

# Prompt template for step-back prompting
step_back_template = PromptTemplate(
    template="""You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

Original query: {original_query}

Step-back query:""",
    input_variables=["original_query"],
)

# Chain for step-back prompting
step_back_chain = step_back_template | step_back_llm


def step_back_query(original_query):
    """
    Generate a step-back query to retrieve broader context.

    Args:
        original_query (str): The original user query

    Returns:
        str: The step-back query
    """
    response = step_back_chain.invoke({"original_query": original_query})
    return response.content
```

```python
# Example query on climate change
original_query = "What are the impacts of climate change on the environment?"
stepback_query = step_back_query(original_query)
print("Original query:", original_query)
print("Step-back query:", stepback_query)
```

```text
Original query: What are the impacts of climate change on the environment?
Step-back query: What are the general effects of environmental changes on ecosystems and biodiversity?
```
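A step-back query is typically used alongside the original query, not instead of it. The broader query fetches background material, the original query fetches specifics, and both sets of chunks go into the answer prompt. The sketch below is a minimal version of that step. It assumes a retriever that returns text chunks. `build_step_back_context`, the section labels, and the toy lookup table are hypothetical.

```python
from typing import Callable, List


def build_step_back_context(original_query: str, step_back_query: str,
                            retriever: Callable[[str], List[str]]) -> str:
    """Retrieve for both queries and format the results as two labeled context
    sections followed by the question (a hypothetical prompt layout)."""
    specific = retriever(original_query)
    # Drop broad chunks that the specific query already retrieved
    broad = [chunk for chunk in retriever(step_back_query) if chunk not in specific]
    lines = ["Background context (step-back query):"]
    lines += [f"- {chunk}" for chunk in broad]
    lines += ["Specific context (original query):"]
    lines += [f"- {chunk}" for chunk in specific]
    lines += ["", f"Question: {original_query}"]
    return "\n".join(lines)


original = "What are the impacts of climate change on the environment?"
step_back = "What are the general effects of environmental changes on ecosystems and biodiversity?"

# Toy lookup table standing in for a real retriever
toy_results = {
    original: ["Warming oceans bleach coral reefs."],
    step_back: ["Ecosystems shift when temperatures change.",
                "Warming oceans bleach coral reefs."],
}

print(build_step_back_context(original, step_back, lambda q: toy_results[q]))
```

Deduplicating against the specific chunks keeps the prompt from repeating a passage that both queries happen to retrieve.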


### 3. Sub-query decomposition: break complex queries into smaller sub-queries

```python
sub_query_llm = ChatOpenAI(temperature=0, model="gpt-4o", max_tokens=4000)

# Prompt template for query decomposition: instructions, one worked example,
# then the actual query, ending with the cue the model should complete
sub_query_template = PromptTemplate(
    template="""You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Example:
Original query: What are the impacts of climate change on the environment?
Sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?

Original query: {original_query}

Sub-queries:""",
    input_variables=["original_query"],
)

# Chain for sub-query decomposition
sub_query_chain = sub_query_template | sub_query_llm


def generate_sub_query(original_query):
    """
    Decompose the original query into simpler sub-queries.

    Args:
        original_query (str): The original complex query

    Returns:
        str: The sub-queries as a numbered list in a single string
    """
    response = sub_query_chain.invoke({"original_query": original_query})
    return response.content
```

```python
# Example query on climate change
original_query = "What are the impacts of climate change on the environment?"
sub_queries = generate_sub_query(original_query)
print("Original query:", original_query)
# Print the model's numbered list of sub-queries
print(sub_queries)
```

```text
Original query: What are the impacts of climate change on the environment?
Sub-queries:
1. How does climate change affect weather patterns and extreme weather events?
2. What are the impacts of climate change on ecosystems and wildlife habitats?
3. How does climate change influence sea levels and coastal areas?
4. What are the effects of climate change on freshwater resources and availability?
```
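`generate_sub_query` returns the model's reply as one string. To retrieve for each sub-query separately, the numbered lines must first be split out. The minimal parser below assumes the model keeps the `1.` / `2.` numbering seen in the output above, and the name `parse_sub_queries` is hypothetical. If the model drifts from that format, asking for structured output (for example via `ChatOpenAI.with_structured_output`) is a sturdier option.

```python
import re
from typing import List


def parse_sub_queries(text: str) -> List[str]:
    """Extract the questions from a numbered list such as "1. ..." or "2) ...",
    ignoring headers like "Sub-queries:" and blank lines."""
    sub_queries = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            sub_queries.append(match.group(1))
    return sub_queries


# The reply recorded in the notebook output
raw = """Sub-queries:
1. How does climate change affect weather patterns and extreme weather events?
2. What are the impacts of climate change on ecosystems and wildlife habitats?
3. How does climate change influence sea levels and coastal areas?
4. What are the effects of climate change on freshwater resources and availability?"""

for sub_query in parse_sub_queries(raw):
    print(sub_query)
```

Each parsed sub-query can then be sent to the retriever on its own, and the retrieved chunks combined before answering the original question.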
