# What is Query Expansion/Transformation RAG?

Query Expansion is a technique used to improve the effectiveness of information retrieval systems by reformulating or augmenting the original query. The goal is to improve the recall of relevant documents by including related terms or concepts that might not have been explicitly mentioned in the original query.
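
To make the recall argument concrete, here is a toy sketch that uses plain keyword matching instead of embeddings. The corpus and the `related_terms` table are hypothetical and hand-written for illustration; the notebook itself asks an LLM to produce the reformulations.

```python
def tokenize(text):
    """Lowercase a string, split on whitespace, and strip surrounding punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def keyword_search(query_terms, documents):
    """Return the indices of documents that share at least one term with the query."""
    return [i for i, doc in enumerate(documents) if query_terms & tokenize(doc)]


# Hypothetical three-document corpus.
documents = [
    "Cars need regular maintenance.",
    "An automobile engine burns fuel.",
    "Bicycles have two wheels.",
]

# Hypothetical, hand-written expansion table (an LLM plays this role in the notebook).
related_terms = {"cars": {"automobile", "vehicle"}}

original_terms = tokenize("cars")
expanded_terms = set(original_terms)
for term in original_terms:
    expanded_terms |= related_terms.get(term, set())

hits_original = keyword_search(original_terms, documents)
hits_expanded = keyword_search(expanded_terms, documents)
print(hits_original)  # [0]
print(hits_expanded)  # [0, 1]
```

The expanded query also finds the second document, which is relevant but never uses the word "cars".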

# Query Transformation RAG Implementation:

1. **Query Transformation:** We use the LLM to generate 3 alternative versions of the original query.
2. **Multi-Query Retrieval:** We retrieve documents using both the original query and the transformed queries.
3. **Deduplication:** We remove duplicate documents from the retrieved set.
4. **Response Generation:** Using the combined context from all retrieved documents, we generate a final response to the original query.
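
Steps 1 to 3 can be sketched without any external service. In this sketch `fake_variants` and `fake_search` are hypothetical stand-ins for the LLM and the vector store, and response generation (step 4) is left out; it only shows how the queries fan out and how duplicates are dropped.

```python
def multi_query_retrieve(original_query, generate_variants, search, k=2):
    """Search with the original query plus its variants, then drop duplicate texts."""
    queries = [original_query] + list(generate_variants(original_query))
    seen = set()
    unique_texts = []
    for query in queries:
        for text in search(query, k):
            if text not in seen:  # keep only the first occurrence of each chunk
                seen.add(text)
                unique_texts.append(text)
    return queries, unique_texts


# Hypothetical stand-in for the LLM: returns three fixed reformulations.
def fake_variants(query):
    return [f"{query} (rephrased)", f"{query} (broader)", f"{query} (narrower)"]


# Hypothetical stand-in for the vector store: a tiny lookup table.
def fake_search(query, k):
    index = {
        "ai ethics": ["chunk A", "chunk B"],
        "ai ethics (rephrased)": ["chunk B", "chunk C"],
    }
    return index.get(query, ["chunk A"])[:k]


queries, context_chunks = multi_query_retrieve("ai ethics", fake_variants, fake_search)
print(len(queries))    # 4
print(context_chunks)  # ['chunk A', 'chunk B', 'chunk C']
```

Four queries are issued, but only three distinct chunks reach the context because repeated results are skipped.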

# Setup

1. **[LLM](https://deepmind.google/technologies/gemini/pro/):** Google's free gemini-pro API endpoint (requires a [Google API key](https://console.cloud.google.com/apis/credentials))
2. **[Vector Store](https://www.pinecone.io/learn/vector-database/):** [ChromaDB](https://www.trychroma.com/)
3. **[Embedding Model](https://qdrant.tech/articles/what-are-embeddings/):** [nomic-embed-text-v1.5](https://www.nomic.ai/blog/posts/nomic-embed-text-v1)
4. **[LLM Framework](https://python.langchain.com/v0.2/docs/introduction/):** LangChain
5. **[Huggingface API Key](https://huggingface.co/settings/tokens)**


# Install required libraries

In [1]:
!pip install -q -U \
     sentence-transformers==3.0.1 \
     langchain==0.2.11 \
     langchain-google-genai==1.0.7 \
     langchain-chroma==0.1.2 \
     langchain-community==0.2.10 \
     langchain-huggingface==0.0.3 \
     einops==0.8.0


# Import the LangChain and Hugging Face embedding libraries

In [2]:
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import WebBaseLoader



In [3]:
import os
import getpass

# Provide your Google API key. You can create one at the following link

[Google Gemini-Pro API Creation Link](https://console.cloud.google.com/apis/credentials)

[YouTube Video](https://www.youtube.com/watch?v=ZHX7zxvDfoc)



In [4]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass()



# Provide your Hugging Face API key. You can create one at the following link

[Hugging Face API Creation Link](https://huggingface.co/settings/tokens)




In [5]:
os.environ["HF_TOKEN"] = getpass.getpass()



# Step 1: Load and preprocess the data

In [6]:
def load_and_process_data(url):
    # Load data from web
    loader = WebBaseLoader(url)
    data = loader.load()

    # Split text into chunks (Experiment with Chunk Size and Chunk Overlap to get optimal chunking)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(data)

    return chunks
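
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that cuts text into fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to break on separators such as paragraphs, lines, and spaces, and treats `chunk_size` as an upper bound. The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.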

# Step 2: Create the vector store

In [7]:
def create_vector_store(chunks):
    # trust_remote_code is needed because this model ships custom modeling code
    embeddings = HuggingFaceEmbeddings(
        model_name="nomic-ai/nomic-embed-text-v1.5",
        model_kwargs={"trust_remote_code": True},
    )
    vectorstore = Chroma.from_documents(chunks, embeddings)
    return vectorstore
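
Conceptually, a vector store keeps one embedding per chunk and returns the chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea with cosine similarity over hypothetical 3-dimensional vectors. It is an assumption-laden toy, not Chroma's implementation: the real distance metric depends on how the collection is configured, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored_vectors, k=2):
    """Return the k stored keys whose vectors are most similar to the query."""
    ranked = sorted(
        stored_vectors,
        key=lambda name: cosine_similarity(query_vector, stored_vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings" keyed by a short description of each chunk.
toy_store = {
    "chunk about ethics": [0.9, 0.1, 0.0],
    "chunk about hardware": [0.0, 0.2, 0.9],
    "chunk about regulation": [0.7, 0.6, 0.1],
}
nearest = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(nearest)  # ['chunk about ethics', 'chunk about regulation']
```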

# Step 3: Query Transformation RAG function


In [8]:
def query_transformation_rag(original_query, vectorstore, llm):
    # Query Transformation
    transform_prompt = ChatPromptTemplate.from_template(
        "Given the original query, generate 3 alternative versions that might improve "
        "retrieval effectiveness. Each version should capture a different aspect or "
        "use different terminology related to the original query.\n"
        "Original query: {query}\n"
        "Transformed queries (provide 3):"
    )
    transform_chain = transform_prompt | llm
    try:
        transformed_queries_response = transform_chain.invoke({"query": original_query})
        transformed_queries = transformed_queries_response.content.split('\n')
        transformed_queries = [q.split(': ')[-1].strip() for q in transformed_queries if q.strip()]
        transformed_queries = transformed_queries[:3]  # Ensure we have at most 3 queries
    except Exception as e:
        print(f"Error transforming query: {e}")
        transformed_queries = [original_query]  # Fallback to original query

    # Retrieve documents for each transformed query
    all_docs = []
    for query in [original_query] + transformed_queries:
        docs = vectorstore.similarity_search(query, k=2)
        all_docs.extend(docs)

    # Remove duplicates and combine retrieved documents
    unique_docs = list({doc.page_content: doc for doc in all_docs}.values())
    context = "\n\n".join([doc.page_content for doc in unique_docs])

    # Generate response using combined context
    response_prompt = ChatPromptTemplate.from_template(
        "You are an AI assistant tasked with answering questions based on the provided context. "
        "The context contains information retrieved using the original query and its transformed versions. "
        "Please analyze the context carefully and provide a comprehensive answer to the original question. "
        "If the context doesn't contain enough information, use your general knowledge to supplement the answer, "
        "but prioritize information from the context when available.\n\n"
        "Original Question: {original_query}\n"
        "Context:\n{context}\n\n"
        "Answer:"
    )
    response_chain = response_prompt | llm
    try:
        response = response_chain.invoke({"context": context, "original_query": original_query})
        final_answer = response.content
    except Exception as e:
        print(f"Error generating response: {e}")
        final_answer = "I apologize, but I encountered an error while generating the response."

    return {
        "original_query": original_query,
        "transformed_queries": transformed_queries,
        "final_answer": final_answer,
        "retrieved_context": context
    }

# Step 4: Chunk the web data and load it into the Chroma vector store

In [9]:
# Initialize the gemini-pro model (adjust temperature and other parameters as needed)
llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    temperature=0.3,
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE
    },
)

# Load and process data
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
chunks = load_and_process_data(url)

# Create vector store
vectorstore = create_vector_store(chunks)


# Step 5: Run Query Transformation RAG

This implementation shows the key parts of Query Transformation RAG:

1. Generation of alternative query formulations to capture different aspects or terminology
2. Retrieval using both original and transformed queries to improve coverage
3. Deduplication of retrieved documents to avoid redundancy
4. Generation of a comprehensive response using the expanded context

In [10]:
# Example query
original_query = "What are the ethical considerations in AI development?"

# Run Query Transformation RAG
result = query_transformation_rag(original_query, vectorstore, llm)

print("Original Query:", result["original_query"])
print("\nTransformed Queries:")
for i, query in enumerate(result["transformed_queries"], 1):
    print(f"{i}. {query}")
print("\nFinal Answer:")
print(result["final_answer"])
print("\nRetrieved Context (first 500 characters):")
print(result["retrieved_context"][:500] + "...")

Original Query: What are the ethical considerations in AI development?

Transformed Queries:
1. 1. **Ethical implications of artificial intelligence (AI) development**
2. 2. **Moral dilemmas in the creation of AI systems**
3. 3. **Social responsibility in the design and deployment of AI**

Final Answer:
Ethical considerations in AI development include:

* **Promotion of well-being:** AI systems should be designed to benefit people and communities, considering social and ethical implications throughout the design, development, and implementation process.
* **Sentience and exploitation:** If AI sentience emerges, it's crucial to avoid denying its moral status and prevent large-scale suffering through careless exploitation.
* **Ethical decision-making:** AI systems can be equipped with ethical principles and procedures to resolve ethical dilemmas, known as machine ethics or computational morality.

Retrieved Context (first 500 characters):
Frameworks
Artificial Intelligence projects can h
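
In the run above, the transformed queries print as `1. 1. **...**`: the model's own list numbering and Markdown bold markers survive the `split(': ')` parsing and are then sent to the vector store as part of the query text. A possible cleanup step is sketched below. The helper `clean_query_line` is hypothetical, and `raw_reply` is reconstructed from the printed output rather than captured from the model.

```python
import re


def clean_query_line(line):
    """Strip list numbering, bullets, and Markdown emphasis from one line of LLM output."""
    line = line.strip()
    # Remove leading markers such as "1." / "2)" / "- " (possibly repeated).
    line = re.sub(r"^(?:\d+[.)]\s*|[-*•]\s+)+", "", line)
    # Remove surrounding emphasis or code markers such as **bold**.
    return line.strip("*_` ")


# Reconstructed from the printed output above (an assumption, not a captured reply).
raw_reply = (
    "1. **Ethical implications of artificial intelligence (AI) development**\n"
    "2. **Moral dilemmas in the creation of AI systems**\n"
    "\n"
    "3. **Social responsibility in the design and deployment of AI**"
)
cleaned = [clean_query_line(line) for line in raw_reply.split("\n") if line.strip()]
for query in cleaned:
    print(query)
```

Applying such a helper in place of the `split(': ')` step would pass plain-text queries to `similarity_search`; whether that changes which chunks are retrieved has not been measured here.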