### Query Transformations for Enhanced RAG Systems

This notebook implements three query transformation techniques to enhance retrieval performance in RAG systems without relying on specialized libraries like LangChain. By modifying user queries before retrieval, we can improve the relevance and comprehensiveness of the retrieved information.

#### Key Transformation Techniques

- Query Rewriting: Makes queries more specific and detailed for better search precision.
- Step-back Prompting: Generates broader queries to retrieve useful contextual information.
- Sub-query Decomposition: Breaks complex queries into simpler components for comprehensive retrieval.


In [None]:
!pip install -q PyMuPDF
!pip install -q python-dotenv
!pip install -q bitsandbytes


In [None]:
import os
import fitz
from dotenv import load_dotenv
import numpy as np
import json
import re

from openai import OpenAI
from tqdm import tqdm

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel, pipeline, AutoModelForCausalLM
from sklearn.metrics.pairwise import cosine_similarity

This notebook uses the Llama 3.2 3B Instruct generative model and the BGE-base-en embedding model, so it needs a GPU with at least 8 GB of memory.
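As a rough sanity check on the 8 GB figure, the weight footprint can be estimated from parameter counts. The counts and dtypes below are assumptions (about 3.2 billion parameters in 16-bit precision for the generator, about 110 million in 32-bit for the embedder); activations, the KV cache, and CUDA overhead come on top of this.

```python
# Back-of-the-envelope estimate of weight memory (parameter counts are assumed, not measured).
GIB = 1024 ** 3

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB

gen_gib = weight_memory_gib(3.2e9, 2)    # assumed: ~3.2B parameters in bf16/fp16
embed_gib = weight_memory_gib(1.1e8, 4)  # assumed: ~110M parameters in fp32
total_gib = gen_gib + embed_gib

print(f"generator: {gen_gib:.2f} GiB, embedder: {embed_gib:.2f} GiB, total: {total_gib:.2f} GiB")
```

Under these assumptions the weights alone take a little over 6 GiB, which leaves limited headroom on an 8 GB card for long prompts.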

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is visible
gen_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="unsloth/Llama-3.2-3B-Instruct")
gen_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="unsloth/Llama-3.2-3B-Instruct",
    torch_dtype="auto",
    device_map="auto"
    )

embed_model = AutoModel.from_pretrained("BAAI/bge-base-en")
embed_tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en")


#### Create gen function to process generated text

In [None]:
def gen(system_prompt, user_prompt): # work with unsloth/Llama-3.2-3B-Instruct
    text = gen_tokenizer.apply_chat_template(
        conversation = [
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": user_prompt
            }
        ],
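        tokenize = False,
        add_generation_prompt = True  # append the assistant header so the model answers instead of continuing the user turn
    )

    # The chat template already inserts the BOS token, so do not add special tokens a second time
    model_inputs = gen_tokenizer([text], return_tensors = "pt", add_special_tokens = False).to(device)

    generated_ids = gen_model.generate(
        **model_inputs,
        do_sample = True
    )

    # Keep only the newly generated tokens (drop the prompt tokens)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

    # Decode and trim surrounding whitespace; str.strip("assistant\n\n") would remove any of those
    # characters from both ends and could eat the start or end of the answer
    response = gen_tokenizer.batch_decode(generated_ids, skip_special_tokens = True)[0].strip()

    # print("===========================================")
    # print(f"response: \n{response}")
    # print("===========================================")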
    return response
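
If a decoded response ever starts with a role header such as `assistant` followed by blank lines, note that `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a prefix. The sketch below contrasts that with a prefix-aware cleanup; `strip_role_header` is a hypothetical helper, not part of the notebook.

```python
def strip_role_header(text: str, header: str = "assistant") -> str:
    """Drop a leading line that consists only of the role header (hypothetical helper)."""
    text = text.lstrip()
    first_line, sep, rest = text.partition("\n")
    if sep and first_line.strip() == header:
        text = rest
    return text.strip()

raw = "assistant\n\nThis is a test"
naive = raw.strip("assistant\n\n")  # removes any of the characters a, s, i, t, n, newline from both ends
safe = strip_role_header(raw)

print(repr(naive))  # 'This is a te'
print(repr(safe))   # 'This is a test'
```

The naive version silently truncates the trailing "st" because both letters occur in the word "assistant".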

#### 1. Query Rewriting

This technique makes queries more specific and detailed to improve retrieval precision.

In [None]:
def rewrite_query(original_query, model="unsloth/Llama-3.2-3B-Instruct"):
    """
    Rewrites a query to make it more specific and detailed for better retrieval.

    Args:
        original_query (str): The original user query
        model (str): Model name (currently unused; gen() always uses the loaded gen_model)

    Returns:
        str: The rewritten query
    """
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in improving search queries. Your task is to rewrite user queries to be more specific, detailed, and likely to retrieve relevant information."

    # Define the user prompt with the original query to be rewritten
    user_prompt = f"""
    Rewrite the following query to make it more specific and detailed. Include relevant terms and concepts that might help in retrieving accurate information. The response MUST BE ONE rewritten query, with no additional text.

    Original query: {original_query}

    Rewritten query:
    """
    response = gen(system_prompt, user_prompt)

    return response

In [81]:
rewrite_query("What are the impacts of AI on job automation and employment?")

'What are the specific job displacement rates and economic implications of AI-driven automation on various industries and sectors, particularly in the manufacturing, transportation, and service sectors, and how do these changes impact employment opportunities, skill sets, and workforce development strategies in the United States and Europe, with a focus on the effects on low-skilled, low-wage, and gig economy workers, and what are the potential benefits and drawbacks of universal basic income, retraining programs, and social safety nets in mitigating the negative consequences of AI-driven job automation?'
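
Small instruction-tuned models do not always follow the "one query, no extra text" instruction: they may echo the label or wrap the query in quotes. A minimal cleanup sketch follows, assuming the query sits on the first non-empty line; `clean_single_query` and its label list are hypothetical and would apply equally to the step-back output.

```python
def clean_single_query(response, labels=("Rewritten query:", "Step-back query:")):
    """Return the first non-empty line, minus an echoed label or wrapping quotes."""
    for line in response.splitlines():
        line = line.strip()
        for label in labels:
            if line.lower().startswith(label.lower()):
                line = line[len(label):].strip()
        line = line.strip("\"'").strip()  # here a character set is intended: drop quote marks
        if line:
            return line
    return ""

messy = 'Rewritten query:\n"How does AI affect jobs?"\n\nNote: I expanded the terms.'
print(clean_single_query(messy))  # How does AI affect jobs?
```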

#### 2. Step-back Prompting

This technique generates broader queries to retrieve contextual background information.

In [None]:
def generate_step_back_query(original_query, model="unsloth/Llama-3.2-3B-Instruct"):
    """
    Generates a more general 'step-back' query to retrieve broader context.

    Args:
        original_query (str): The original user query
        model (str): Model name (currently unused; gen() always uses the loaded gen_model)

    Returns:
        str: The step-back query
    """
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in search strategies. Your task is to generate broader, more general versions of specific queries to retrieve relevant background information."

    # Define the user prompt with the original query to be generalized
    user_prompt = f"""
    Generate a broader, more general version of the following query that could help retrieve useful background information. DO NOT include additional unrelated text.

    Original query: {original_query}

    Step-back query:
    """

    response = gen(system_prompt, user_prompt)

    return response


In [65]:
generate_step_back_query("What are the impacts of AI on job automation and employment?")

'What are the effects of technological advancements on workforce displacement and labor market dynamics?'

#### 3. Sub-query Decomposition

This technique breaks down complex queries into simpler components for comprehensive retrieval.

In [66]:
def decompose_query(original_query, num_subqueries=4, model="unsloth/Llama-3.2-3B-Instruct"):
    """
    Decomposes a complex query into simpler sub-queries.

    Args:
        original_query (str): The original complex query
        num_subqueries (int): Number of sub-queries to generate
        model (str): Model name (currently unused; gen() always uses the loaded gen_model)

    Returns:
        List[str]: A list of simpler sub-queries
    """
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in breaking down complex questions. Your task is to decompose complex queries into simpler sub-questions that, when answered together, address the original query."

    # Define the user prompt with the original query to be decomposed
    user_prompt = f"""
    Break down the following complex query into {num_subqueries} simpler sub-queries. Each sub-query should focus on a different aspect of the original question.

    Original query: {original_query}

    Generate {num_subqueries} sub-queries, one per line, in this format:
    1. [First sub-query]
    2. [Second sub-query]
    And so on...
    """

    content = gen(system_prompt, user_prompt)

    # Extract numbered queries using simple parsing
    lines = content.split("\n")
    sub_queries = []

    for line in lines:
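        # Accept every expected item number, including two-digit ones when num_subqueries >= 10
        if line.strip() and any(line.strip().startswith(f"{i}.") for i in range(1, num_subqueries + 1)):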
            # Remove the number and leading space
            query = line.strip()
            query = query[query.find(".")+1:].strip()
            sub_queries.append(query)

    return sub_queries


In [67]:
decompose_query('What are the impacts of AI on job automation and employment?', num_subqueries=4)

['What are the primary job roles that are at risk of being automated by AI?',
 'How do AI-driven automation changes affect the job market and the demand for new skills?',
 'What are the potential economic and societal impacts of widespread AI-driven job automation on employment?',
 'How can workers and industries adapt to the changing job market and mitigate the negative effects of AI-driven automation on employment?']
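
The numbered-list parsing can be tested without the model by feeding it canned text. The sketch below is an alternative parser based on a regular expression; `parse_numbered_list` is a hypothetical name, and accepting `1)` as well as `1.` is an assumption about formats the model might produce.

```python
import re
from typing import List

def parse_numbered_list(text: str) -> List[str]:
    """Extract items from lines that look like '1. item' or '1) item'."""
    items = []
    for line in text.splitlines():
        # One or more digits, a dot or closing parenthesis, whitespace, then the item text
        match = re.match(r"^\s*(\d+)[.)]\s+(.*\S)", line)
        if match:
            items.append(match.group(2))
    return items

sample = "Here are the sub-queries:\n1. First?\n2) Second?\n\n10. Tenth?\n1.5 million jobs are mentioned\nNot numbered"
print(parse_numbered_list(sample))  # ['First?', 'Second?', 'Tenth?']
```

Requiring whitespace after the separator keeps a line that starts with a decimal such as "1.5" from being mistaken for a list item.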

In [82]:
# Example query
original_query = "What are the impacts of AI on job automation and employment?"

# Apply query transformations
print("Original Query:", original_query)

# Query Rewriting
rewritten_query = rewrite_query(original_query)
print("\n1. Rewritten Query:")
print(rewritten_query)

# Step-back Prompting
step_back_query = generate_step_back_query(original_query)
print("\n2. Step-back Query:")
print(step_back_query)

# Sub-query Decomposition
sub_queries = decompose_query(original_query, num_subqueries=4)
print("\n3. Sub-queries:")
for i, query in enumerate(sub_queries, 1):
    print(f"   {i}. {query}")

Original Query: What are the impacts of AI on job automation and employment?

1. Rewritten Query:
What are the specific job displacement and creation impacts of artificial intelligence (AI) on employment, including the effects on low-skilled and high-skilled workers, and the resulting need for upskilling and reskilling programs, as well as the potential benefits of AI-driven entrepreneurship and the emergence of new industries and job roles in fields such as data science, machine learning, and cybersecurity?

2. Step-back Query:
What are the effects of automation and artificial intelligence on the modern workforce and labor market?

3. Sub-queries:
   1. What are the primary job roles that are most susceptible to automation by AI?
   2. How does AI-driven automation affect the types of jobs that are created in new industries and sectors?
   3. What are the potential benefits and drawbacks of AI-driven job displacement for workers who lose their jobs to automation?
   4. How do governme

#### Building a Simple Vector Store

In [None]:
class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.
    """
    def __init__(self):
        """
        Initialize the vector store.
        """
        self.vectors = []  # List to store embedding vectors
        self.texts = []  # List to store original texts
        self.metadata = []  # List to store metadata for each text

    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the vector store.

        Args:
        text (str): The original text.
        embedding (List[float]): The embedding vector.
        metadata (dict, optional): Additional metadata.
        """
        self.vectors.append(np.array(embedding))  # Convert embedding to numpy array and add to vectors list
        self.texts.append(text)  # Add the original text to texts list
        self.metadata.append(metadata or {})  # Add metadata to metadata list, use empty dict if None

    def similarity_search(self, query_embedding, k=5):
        """
        Find the most similar items to a query embedding.

        Args:
        query_embedding (List[float]): Query embedding vector.
        k (int): Number of results to return.

        Returns:
        List[Dict]: Top k most similar items with their texts and metadata.
        """
        if not self.vectors:
            return []  # Return empty list if no vectors are stored

        # Convert query embedding to a flat 1-D numpy array (embed() returns shape (1, dim) for a single string)
        query_vector = np.asarray(query_embedding).squeeze()

        # Calculate similarities using cosine similarity
        similarities = []
        for i, vector in enumerate(self.vectors):
            # Compute cosine similarity between query vector and stored vector
            similarity = np.dot(query_vector, vector.squeeze()) / (np.linalg.norm(query_vector) * np.linalg.norm(vector.squeeze()))
            similarities.append((i, float(similarity)))  # Append index and similarity score as a plain float

        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)

        # Return top k results
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],  # Add the corresponding text
                "metadata": self.metadata[idx],  # Add the corresponding metadata
                "similarity": score  # Add the similarity score
            })

        return results  # Return the list of top k similar items
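
The per-vector loop in `similarity_search` is easy to follow. For larger stores, the same cosine scores can be computed in one matrix-vector product. The sketch below uses a hypothetical standalone function, `top_k_cosine`, on toy 2-D vectors; it is an illustration, not a replacement for the class above.

```python
import numpy as np

def top_k_cosine(query, matrix, k=5):
    """Return (index, score) pairs for the k rows of `matrix` most similar to `query`."""
    query = np.asarray(query, dtype=float).reshape(-1)
    matrix = np.asarray(matrix, dtype=float)
    # Cosine similarity of the query against every row at once
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hits = top_k_cosine([1.0, 0.0], vectors, k=2)
print(hits)  # row 0 first (identical direction), then row 2 (45 degrees away)
```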


#### Create embeddings

In [None]:
embed_model.to(device)
def embed(text):
    if isinstance(text, str): text = [text] # wrap a single string in a one-element list
    inputs = embed_tokenizer(text, padding=True, truncation=True, return_tensors="pt").to(device) # tokenize input
    with torch.no_grad():
        output = embed_model(**inputs) # run the model
        embedding = F.normalize(output.last_hidden_state[:, 0, :], p=2, dim=1) # take the [CLS] token vector and L2-normalize it
    return embedding.cpu().numpy() # pass to cpu with numpy array (n, dim)
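
Because `embed` returns L2-normalized vectors, the cosine similarity between two embeddings equals their plain dot product. The small numerical check below uses random vectors as stand-ins for real embeddings, which is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8)  # stand-in for one embedding
b = rng.normal(size=8)  # stand-in for another embedding

def l2_normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_units = l2_normalize(a) @ l2_normalize(b)

print(np.isclose(cosine, dot_of_units))  # True
```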

In [None]:
def extract_text_from_pdf(pdf_path):
    """
    Extracts text from a PDF file.

    Args:
    pdf_path (str): Path to the PDF file.

    Returns:
    str: Extracted text from the PDF.
    """
    # Open the PDF file
    mypdf = fitz.open(pdf_path)
    all_text = ""  # Initialize an empty string to store the extracted text

    # Iterate through each page in the PDF
    for page in mypdf:
        all_text += page.get_text("text")
    return all_text  # Return the extracted text


In [None]:
def chunk_text(text, n=1000, overlap=200):
    """
    Chunks the given text into segments of n characters with overlap.

    Args:
    text (str): The text to be chunked.
    n (int): The number of characters in each chunk.
    overlap (int): The number of overlapping characters between chunks.

    Returns:
    List[str]: A list of text chunks.
    """
    chunks = []  # Initialize an empty list to store the chunks

    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks
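
A tiny worked example makes the stride arithmetic concrete: chunk starts advance by `n - overlap`, so consecutive chunks share `overlap` characters. The variant below, `chunk_text_checked`, is a hypothetical copy that adds a guard for `overlap >= n`, a case where the step would otherwise be zero or negative.

```python
def chunk_text_checked(text, n=1000, overlap=200):
    """Same sliding-window chunking, with a guard against a non-positive step."""
    if overlap >= n:
        raise ValueError("overlap must be smaller than the chunk size n")
    step = n - overlap
    return [text[i:i + n] for i in range(0, len(text), step)]

text = "abcdefghij"  # 10 characters
chunks = chunk_text_checked(text, n=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Chunk starts are 0, 2, 4, 6, 8. The final chunk `'ij'` is fully contained in the previous one, a small redundancy inherent to this scheme.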


In [74]:
def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    Process a document for RAG.

    Args:
    pdf_path (str): Path to the PDF file.
    chunk_size (int): Size of each chunk in characters.
    chunk_overlap (int): Overlap between chunks in characters.

    Returns:
    SimpleVectorStore: A vector store containing document chunks and their embeddings.
    """
    print("Extracting text from PDF...")
    extracted_text = extract_text_from_pdf(pdf_path)

    print("Chunking text...")
    chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
    print(f"Created {len(chunks)} text chunks")

    print("Creating embeddings for chunks...")
    # Create embeddings for all chunks at once for efficiency
    chunk_embeddings = embed(chunks)

    # Create vector store
    store = SimpleVectorStore()

    # Add chunks to vector store
    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        store.add_item(
            text=chunk,
            embedding=embedding,
            metadata={"index": i, "source": pdf_path}
        )

    print(f"Added {len(chunks)} chunks to the vector store")
    return store
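
Embedding every chunk in a single call works for a small document, but a longer PDF may exhaust GPU memory. One option is to embed in fixed-size batches and stack the results. In the sketch below, `embed_in_batches` and the batch size of 32 are hypothetical, and a stub stands in for the real `embed` function so the example runs without a model.

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=32):
    """Call embed_fn on slices of `texts` and stack the (n, dim) results row-wise."""
    if not texts:
        return np.empty((0, 0))
    parts = [embed_fn(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    return np.vstack(parts)

def fake_embed(batch):
    # Stub: one 4-dimensional vector per input text
    return np.ones((len(batch), 4))

vectors = embed_in_batches([f"chunk {i}" for i in range(70)], fake_embed, batch_size=32)
print(vectors.shape)  # (70, 4)
```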


In [None]:
def transformed_search(query, vector_store, transformation_type, top_k=3):
    """
    Search using a transformed query.

    Args:
        query (str): Original query
        vector_store (SimpleVectorStore): Vector store to search
        transformation_type (str): Type of transformation ('rewrite', 'step_back', or 'decompose')
        top_k (int): Number of results to return

    Returns:
        List[Dict]: Search results
    """
    print(f"Transformation type: {transformation_type}")
    print(f"Original query: {query}")

    results = []

    if transformation_type == "rewrite":
        # Query rewriting
        transformed_query = rewrite_query(query)
        print(f"Rewritten query: {transformed_query}")

        # Create embedding for transformed query
        query_embedding = embed(transformed_query)

        # Search with rewritten query
        results = vector_store.similarity_search(query_embedding, k=top_k)

    elif transformation_type == "step_back":
        # Step-back prompting
        transformed_query = generate_step_back_query(query)
        print(f"Step-back query: {transformed_query}")

        # Create embedding for transformed query
        query_embedding = embed(transformed_query)

        # Search with step-back query
        results = vector_store.similarity_search(query_embedding, k=top_k)

    elif transformation_type == "decompose":
        # Sub-query decomposition
        sub_queries = decompose_query(query)
        print("Decomposed into sub-queries:")
        for i, sub_q in enumerate(sub_queries, 1):
            print(f"{i}. {sub_q}")

        # Create embeddings for all sub-queries
        sub_query_embeddings = embed(sub_queries)

        # Search with each sub-query and combine results
        all_results = []
        for i, embedding in enumerate(sub_query_embeddings):
            sub_results = vector_store.similarity_search(embedding, k=2)  # Get fewer results per sub-query
            all_results.extend(sub_results)

        # Remove duplicates (keep highest similarity score)
        seen_texts = {}
        for result in all_results:
            text = result["text"]
            if text not in seen_texts or result["similarity"] > seen_texts[text]["similarity"]:
                seen_texts[text] = result

        # Sort by similarity and take top_k
        results = sorted(seen_texts.values(), key=lambda x: x["similarity"], reverse=True)[:top_k]

    else:
        # Regular search without transformation
        query_embedding = embed(query)
        results = vector_store.similarity_search(query_embedding, k=top_k)

    return results
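
The merge step in the `decompose` branch can be exercised on its own with toy results. The sketch below isolates it as a hypothetical helper, `merge_results`: for each distinct text it keeps the entry with the highest similarity, then returns the top `top_k` overall.

```python
def merge_results(result_lists, top_k=3):
    """Deduplicate results by text, keeping the best score, then rank by similarity."""
    best = {}
    for results in result_lists:
        for result in results:
            text = result["text"]
            if text not in best or result["similarity"] > best[text]["similarity"]:
                best[text] = result
    return sorted(best.values(), key=lambda r: r["similarity"], reverse=True)[:top_k]

from_sub_query_1 = [{"text": "A", "similarity": 0.9}, {"text": "B", "similarity": 0.5}]
from_sub_query_2 = [{"text": "B", "similarity": 0.7}, {"text": "C", "similarity": 0.6}]
merged = merge_results([from_sub_query_1, from_sub_query_2], top_k=2)
print([(r["text"], r["similarity"]) for r in merged])  # [('A', 0.9), ('B', 0.7)]
```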


In [69]:
def generate_response(query, context, model="unsloth/Llama-3.2-3B-Instruct"):
    """
    Generates a response based on the query and retrieved context.

    Args:
        query (str): User query
        context (str): Retrieved context
        model (str): Model name (currently unused; gen() always uses the loaded gen_model)

    Returns:
        str: Generated response
    """
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are a helpful AI assistant. Answer the user's question based only on the provided context. If you cannot find the answer in the context, state that you don't have enough information."

    # Define the user prompt with the context and query
    user_prompt = f"""
        Context:
        {context}

        Question: {query}

        Please provide a comprehensive answer based only on the context above.
    """

    response = gen(system_prompt, user_prompt)
    # Return the generated response
    return response


#### Complete pipeline for query transformation

In [70]:
def rag_with_query_transformation(pdf_path, query, transformation_type=None):
    """
    Run complete RAG pipeline with optional query transformation.

    Args:
        pdf_path (str): Path to PDF document
        query (str): User query
        transformation_type (str): Type of transformation (None, 'rewrite', 'step_back', or 'decompose')

    Returns:
        Dict: Results including the original query, transformation type, context, and response
    """
    # Process the document to create a vector store
    vector_store = process_document(pdf_path)

    # Apply query transformation and search
    if transformation_type:
        # Perform search with transformed query
        results = transformed_search(query, vector_store, transformation_type)
    else:
        # Perform regular search without transformation
        query_embedding = embed(query)
        results = vector_store.similarity_search(query_embedding, k=3)

    # Combine context from search results
    context = "\n\n".join([f"PASSAGE {i+1}:\n{result['text']}" for i, result in enumerate(results)])

    # Generate response based on the query and combined context
    response = generate_response(query, context)

    # Return the results including original query, transformation type, context, and response
    return {
        "original_query": query,
        "transformation_type": transformation_type,
        "context": context,
        "response": response
    }
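
Since `rag_with_query_transformation` calls `process_document` on every invocation, comparing several transformations on the same PDF re-extracts and re-embeds the document each time. A simple cache keyed by path avoids that. The sketch below is hypothetical: `get_vector_store` and the stub builder are illustrative names, and the stub replaces `process_document` so the example runs without a model.

```python
from typing import Any, Callable, Dict

_store_cache: Dict[str, Any] = {}

def get_vector_store(pdf_path: str, build_fn: Callable[[str], Any]) -> Any:
    """Build the store for a path once, then reuse it on later calls."""
    if pdf_path not in _store_cache:
        _store_cache[pdf_path] = build_fn(pdf_path)
    return _store_cache[pdf_path]

build_calls = []

def fake_build(path):
    # Stub standing in for process_document(path)
    build_calls.append(path)
    return {"source": path}

for _ in range(4):
    store = get_vector_store("AI_Information.pdf", fake_build)

print(len(build_calls))  # 1
```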


#### Evaluation

In [71]:
def compare_responses(results, reference_answer, model="unsloth/Llama-3.2-3B-Instruct"):
    """
    Compare responses from different query transformation techniques.

    Args:
        results (Dict): Results from different transformation techniques
        reference_answer (str): Reference answer for comparison
        model (str): Model name (currently unused; gen() always uses the loaded gen_model)
    """
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = """You are an expert evaluator of RAG systems.
    Your task is to compare different responses generated using various query transformation techniques
    and determine which technique produced the best response compared to the reference answer."""

    # Prepare the comparison text with the reference answer and responses from each technique
    comparison_text = f"""Reference Answer: {reference_answer}\n\n"""

    for technique, result in results.items():
        comparison_text += f"{technique.capitalize()} Query Response:\n{result['response']}\n\n"

    # Define the user prompt with the comparison text
    user_prompt = f"""
    {comparison_text}

    Compare the responses generated by different query transformation techniques to the reference answer.

    For each technique (original, rewrite, step_back, decompose):
    1. Score the response from 1-10 based on accuracy, completeness, and relevance
    2. Identify strengths and weaknesses

    Then rank the techniques from best to worst and explain which technique performed best overall and why.
    """

    # Generate the evaluation response using the specified model
    response = gen(system_prompt, user_prompt)

    # Print the evaluation results
    print("\n===== EVALUATION RESULTS =====")
    print(response)
    print("=============================")


In [72]:
def evaluate_transformations(pdf_path, query, reference_answer=None):
    """
    Evaluate different transformation techniques for the same query.

    Args:
        pdf_path (str): Path to PDF document
        query (str): Query to evaluate
        reference_answer (str): Optional reference answer for comparison

    Returns:
        Dict: Evaluation results
    """
    # Define the transformation techniques to evaluate
    transformation_types = [None, "rewrite", "step_back", "decompose"]
    results = {}

    # Run RAG with each transformation technique
    for transformation_type in transformation_types:
        type_name = transformation_type if transformation_type else "original"
        print(f"\n===== Running RAG with {type_name} query =====")

        # Get the result for the current transformation type
        result = rag_with_query_transformation(pdf_path, query, transformation_type)
        results[type_name] = result

        # Print the response for the current transformation type
        print(f"Response with {type_name} query:")
        print(result["response"])
        print("=" * 50)

    # Compare results if a reference answer is provided
    if reference_answer:
        compare_responses(results, reference_answer)

    return results


In [79]:
# Load the validation data from a JSON file
with open('val.json') as f:
    data = json.load(f)

# Extract the query at index 7 from the validation data
query = data[7]['question']

# Extract the reference answer from the validation data
reference_answer = data[7]['ideal_answer']

# pdf_path
pdf_path = "AI_Information.pdf"

# Run evaluation
evaluation_results = evaluate_transformations(pdf_path, query, reference_answer)


===== Running RAG with original query =====
Extracting text from PDF...
Chunking text...
Created 42 text chunks
Creating embeddings for chunks...
Added 42 chunks to the vector store
Response with original query:
Based on the provided context, a 'cobot' refers to a robot that works alongside humans in collaborative settings. This term is mentioned in Passage 1, where it is stated that AI-powered robots can work alongside humans in "collaborative settings (cobots)".

===== Running RAG with rewrite query =====
Extracting text from PDF...
Chunking text...
Created 42 text chunks
Creating embeddings for chunks...
Added 42 chunks to the vector store
Transformation type: rewrite
Original query: What is a 'cobot'?
resposne: 
What are the differences between a cobot and a robot, and how do collaborative robots, also known as cobots, differ from industrial robots in terms of their design, functionality, and applications in manufacturing and logistics, and what are some examples of cobot models a