### Query Enhancement – Query Expansion Techniques

In a RAG pipeline, the quality of the query sent to the retriever determines how good the retrieved context is — and therefore, how accurate the LLM’s final answer will be.

That’s where Query Expansion / Enhancement comes in.

#### 🎯 What is Query Enhancement?
Query enhancement refers to techniques used to improve or reformulate the user query to retrieve better, more relevant documents from the knowledge base.
It is especially useful when:

- The original query is short, ambiguous, or under-specified
- You want to broaden the scope to catch synonyms, related phrases, or spelling variants

In [14]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
# from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap
from pydantic import SecretStr
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

In [15]:
"""
STEP 3: Environment Setup
Load API credentials securely from environment variables.
"""

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get Google API key with error handling
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY not found in environment variables")

print("✓ API keys loaded successfully")

✓ API keys loaded successfully


In [17]:
# Initialize Google embeddings model for vector representation
embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001"
)


# Initialize Google's Gemini model for response generation
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",    # Latest flash model
    temperature=0,               # Deterministic output for consistency in ranking
    max_tokens=None,            # Use model default token limit
    timeout=None,               # No timeout limit for ranking requests
    max_retries=2,              # Retry failed requests twice
)

print("✓ Gemini model initialized for reranking")

✓ Gemini model initialized for reranking


In [18]:
## step1 : Load and split the dataset
loader = TextLoader("langchain_crewai_dataset.txt")
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)

In [19]:
chunks

[Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='LangChain is an open-source framework designed for developing applications powered by large language models (LLMs). It simplifies the process of building, managing, and scaling complex chains of thought by abstracting prompt management, retrieval, memory, and agent orchestration. Developers can use'),
 Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='and agent orchestration. Developers can use LangChain to create end-to-end pipelines that connect LLMs with tools, APIs, vector databases, and other knowledge sources. (v1)'),
 Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='At the heart of LangChain lies the concept of chains, which are sequences of calls to LLMs and other tools. Chains can be simple, such as a single prompt fed to an LLM, or complex, involving multiple conditionally executed steps. LangChain makes it easy to compose and reuse chains using st

In [20]:
### step 2: Vector Store
embedding_model=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore=FAISS.from_documents(chunks,embedding_model)

## step 3:MMR Retriever
retriever=vectorstore.as_retriever(search_type="mmr",search_kwargs={"k":5})
retriever


VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7d0ce039a0c0>, search_type='mmr', search_kwargs={'k': 5})

In [None]:
# import time
# # Process chunks in smaller batches to avoid rate limits
# batch_size = 5  # Adjust this based on your rate limits
# all_embeddings = []

# try:
#     # Create vector store with first batch
#     first_batch = chunks[:batch_size]
#     print(f"Processing first batch of {len(first_batch)} chunks...")
#     vectorstore = FAISS.from_documents(first_batch, embedding_model)
    
#     # Process remaining chunks in batches
#     for i in range(batch_size, len(chunks), batch_size):
#         batch = chunks[i:i+batch_size]
#         print(f"Processing batch {i//batch_size + 1}: chunks {i} to {min(i+batch_size, len(chunks))}")
        
#         # Add delay between batches
#         time.sleep(3)  # 3 second delay between batches
        
#         # Create temporary vector store for this batch
#         temp_vectorstore = FAISS.from_documents(batch, embedding_model)
        
#         # Merge with main vector store
#         vectorstore.merge_from(temp_vectorstore)
        
#         print(f"✓ Processed batch {i//batch_size + 1}")

#     print("✅ Vector store created successfully with all chunks")
    
#     ## step 3: MMR Retriever
#     retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5})
#     print("✓ MMR Retriever initialized")
    
# except Exception as e:
#     print(f"❌ Error creating vector store: {e}")
#     print("💡 Try reducing batch_size or increasing sleep time")
#     raise

# retriever

Processing first batch of 5 chunks...
Processing batch 2: chunks 5 to 10
✓ Processed batch 2
Processing batch 3: chunks 10 to 15
✓ Processed batch 3
Processing batch 4: chunks 15 to 20
✓ Processed batch 4
Processing batch 5: chunks 20 to 25
✓ Processed batch 5
Processing batch 6: chunks 25 to 30
✓ Processed batch 6
Processing batch 7: chunks 30 to 35
✓ Processed batch 7
Processing batch 8: chunks 35 to 40
✓ Processed batch 8
Processing batch 9: chunks 40 to 45
✓ Processed batch 9
Processing batch 10: chunks 45 to 50
✓ Processed batch 10
Processing batch 11: chunks 50 to 55
✓ Processed batch 11
Processing batch 12: chunks 55 to 60
✓ Processed batch 12
Processing batch 13: chunks 60 to 65
✓ Processed batch 13
Processing batch 14: chunks 65 to 70
✓ Processed batch 14
Processing batch 15: chunks 70 to 75
✓ Processed batch 15
Processing batch 16: chunks 75 to 80
✓ Processed batch 16
Processing batch 17: chunks 80 to 85
✓ Processed batch 17
Processing batch 18: chunks 85 to 90
✓ Processed ba

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7d0e11da0a10>, search_type='mmr', search_kwargs={'k': 5})

In [21]:
# Query expansion
query_expansion_prompt = PromptTemplate.from_template("""
You are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.

Original query: "{query}"

Expanded query:
""")

query_expansion_chain=query_expansion_prompt| llm | StrOutputParser()
query_expansion_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
| ChatGoogleGenerativeAI(model='models/gemini-2.0-flash-exp', google_api_key=SecretStr('**********'), temperature=0.0, max_retries=2, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7d0e11dcfda0>, default_metadata=())
| StrOutputParser()

In [22]:
query_expansion_chain.invoke({"query":"Langchain memory"})

'Okay, here\'s an expanded query for "Langchain memory," designed to improve document retrieval by incorporating synonyms, technical terms, and useful context:\n\n**Expanded Query:**\n\n**(Langchain OR "Language Model Integration Framework") AND (memory OR "conversational memory" OR "chat history" OR "state management" OR "context management" OR "memory buffer" OR "knowledge graph" OR "vector database" OR "retrieval augmented generation" OR "RAG") AND (persistence OR storage OR "long-term memory" OR "short-term memory" OR "episodic memory" OR "semantic memory" OR "memory retrieval" OR "memory update" OR "memory deletion" OR "memory compression" OR "memory indexing") AND (chains OR agents OR conversational OR chatbot OR "dialogue system" OR "question answering" OR "task automation") AND (implementation OR architecture OR design OR "best practices" OR tutorial OR example OR "code snippet" OR "performance optimization" OR limitations OR challenges OR evaluation OR comparison)**\n\n**Expla

In [23]:
# RAG answering prompt
answer_prompt = PromptTemplate.from_template("""
Answer the question based on the context below.

Context:
{context}

Question: {input}
""")

document_chain=create_stuff_documents_chain(llm=llm,prompt=answer_prompt)

In [24]:
# Step 5: Full RAG pipeline with query expansion
rag_pipeline = (
    RunnableMap({
        "input": lambda x: x["input"],
        "context": lambda x: retriever.invoke(query_expansion_chain.invoke({"query": x["input"]}))
    })
    | document_chain
)

In [25]:
# Step 6: Run query
query = {"input": "What types of memory does LangChain support?"}
print(query_expansion_chain.invoke({"query":query}))
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

```json
{
  "input": "What types of memory does LangChain support?",
  "expanded_query": {
    "query": "What types of memory does LangChain support?  Specifically, what memory implementations, memory modules, memory classes, or memory components are available within the LangChain framework?  This includes, but is not limited to, conversation buffer memory, conversation buffer window memory, conversation summary memory, conversation summary buffer memory, vector store-backed memory, entity memory, knowledge graph memory, and context-aware memory.  Also, what are the different ways LangChain can store and retrieve conversational history, chat history, or previous interactions?  Are there specific data structures or databases used for memory storage, such as dictionaries, lists, Redis, Chroma, Pinecone, or other vector databases?  How does LangChain manage short-term memory and long-term memory?  What are the trade-offs between different memory types in terms of performance, cost, and co