### Query Enhancement – Query Expansion Techniques

In a RAG pipeline, the quality of the query sent to the retriever determines how relevant the retrieved context is and, therefore, how accurate the LLM's final answer will be.

That's where Query Expansion / Enhancement comes in.

#### 🎯 What is Query Enhancement?
Query enhancement refers to techniques used to improve or reformulate the user query to retrieve better, more relevant documents from the knowledge base.
It is especially useful when:

- The original query is short, ambiguous, or under-specified
- You want to broaden the scope to catch synonyms, related phrases, or spelling variants
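
To make the idea concrete before the LLM-based version later in this notebook, here is a minimal rule-based sketch. The `SYNONYMS` table and the `expand_query` helper are hypothetical and exist only for illustration; a real system would draw alternatives from a thesaurus, query logs, or an LLM.

```python
from typing import Dict, List

# Hypothetical synonym table (an assumption for this sketch, not part of the dataset).
SYNONYMS: Dict[str, List[str]] = {
    "memory": ["conversation history", "state"],
    "agents": ["tool-using LLMs"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Append known synonyms for each query term, keeping the original query first."""
    extras: List[str] = []
    for token in query.lower().split():
        term = token.strip("?.,!")  # drop trailing punctuation before the lookup
        for alt in synonyms.get(term, []):
            if alt not in extras:
                extras.append(alt)
    # Leave the query untouched when nothing is known about its terms.
    return query if not extras else f"{query} ({' OR '.join(extras)})"


print(expand_query("LangChain memory"))
# -> LangChain memory (conversation history OR state)
```

Keeping the original wording at the front means the expansion can only add recall; it never replaces what the user actually asked.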

In [1]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap



In [2]:
TOP_K = 5
TOP_J = 50

In [3]:
## step1 : Load and split the dataset
loader = TextLoader("langchain_crewai_dataset.txt")
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)


In [4]:
print(chunks)

[Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='LangChain is an open-source framework designed for developing applications powered by large language models (LLMs). It simplifies the process of building, managing, and scaling complex chains of thought by abstracting prompt management, retrieval, memory, and agent orchestration. Developers can use'), Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='and agent orchestration. Developers can use LangChain to create end-to-end pipelines that connect LLMs with tools, APIs, vector databases, and other knowledge sources. (v1)'), Document(metadata={'source': 'langchain_crewai_dataset.txt'}, page_content='At the heart of LangChain lies the concept of chains, which are sequences of calls to LLMs and other tools. Chains can be simple, such as a single prompt fed to an LLM, or complex, involving multiple conditionally executed steps. LangChain makes it easy to compose and reuse chains using stan

In [32]:
# semantic splitter
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={"device": "cuda"})
splitter = SemanticChunker(embedding_model)
chunks = splitter.split_documents(raw_docs)

In [79]:
len(chunks)

241

In [5]:
from tqdm import tqdm
from langchain_community.vectorstores import FAISS

def build_faiss_with_progress(docs, embedding_model, batch_size=32, normalize=True):
    """
    Build a FAISS index from documents while showing a progress bar.

    Args:
        docs (List[Document]): List of LangChain documents.
        embedding_model: A HuggingFaceEmbeddings instance or another compatible model.
        batch_size (int): Batch size used when computing embeddings.
        normalize (bool): Whether to normalize embeddings (cosine). Equivalent to normalize_L2=True.

    Returns:
        FAISS: Vector store ready to be used as a retriever.
    """
    texts = [d.page_content for d in docs]
    embeddings = []

    for i in tqdm(range(0, len(texts), batch_size), desc="Generando embeddings"):
        batch = texts[i:i+batch_size]
        batch_emb = embedding_model.embed_documents(batch)
        embeddings.extend(batch_emb)

    # Create the FAISS index from precomputed embeddings
    vectorstore = FAISS.from_embeddings(
        list(zip(texts, embeddings)),
        embedding=embedding_model,
        normalize_L2=normalize
    )

    return vectorstore
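
The docstring above ties `normalize=True` to cosine similarity. A small sketch of why that holds, assuming the index compares vectors by L2 distance: for unit-length vectors, the squared L2 distance equals `2 - 2 * cos(a, b)`, so ranking by distance and ranking by cosine similarity agree. The helper names below are illustrative and not part of the notebook.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
a_n, b_n = l2_normalize(a), l2_normalize(b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(squared_l2(a_n, b_n), 2 - 2 * cosine(a, b))
```

The two printed values should match up to floating-point error.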

In [6]:
### step 2: Vector Store
embedding_model=HuggingFaceEmbeddings(
    model_name="google/embeddinggemma-300m",
    model_kwargs={"device": "cuda"}  # <--- GPU
)
vectorstore=build_faiss_with_progress(chunks, embedding_model, batch_size=16, normalize=True)

Generando embeddings: 100%|██████████| 16/16 [00:00<00:00, 16.69it/s]


In [12]:
# get list of documents from vectorstore
all_docs = vectorstore.docstore._dict.values()
all_docs = list(all_docs)

print(all_docs)

[Document(id='0f50d9d6-94c7-4477-a6de-290c72f58064', metadata={}, page_content='LangChain is an open-source framework designed for developing applications powered by large language models (LLMs). It simplifies the process of building, managing, and scaling complex chains of thought by abstracting prompt management, retrieval, memory, and agent orchestration. Developers can use'), Document(id='fd5a8ad4-e16f-4a62-a403-e2f3e1e5b072', metadata={}, page_content='and agent orchestration. Developers can use LangChain to create end-to-end pipelines that connect LLMs with tools, APIs, vector databases, and other knowledge sources. (v1)'), Document(id='a4d7fc71-7c7f-4f4c-8a2d-ccdb41bd639e', metadata={}, page_content='At the heart of LangChain lies the concept of chains, which are sequences of calls to LLMs and other tools. Chains can be simple, such as a single prompt fed to an LLM, or complex, involving multiple conditionally executed steps. LangChain makes it easy to compose and reuse chains u

In [90]:
## step 3: Dense retriever (plain similarity search, not MMR)
retriever=vectorstore.as_retriever(search_type="similarity", search_kwargs={"k":25})
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x70d67829fdd0>, search_kwargs={'k': 25})

In [91]:
from langchain_community.retrievers import BM25Retriever, TFIDFRetriever 

sparse_retriever=TFIDFRetriever.from_documents(chunks)
sparse_retriever.k=25  # number of candidates the sparse retriever returns

In [92]:
from langchain.retrievers import EnsembleRetriever

hybrid_retriever=EnsembleRetriever(
    retrievers=[retriever,sparse_retriever],
    weights=[0.7,0.3]
)
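
As far as I know, `EnsembleRetriever` merges the ranked lists with weighted Reciprocal Rank Fusion. Below is a minimal pure-Python sketch of that fusion rule. The constant `c=60` is the conventional choice and is an assumption here, as are the toy document IDs. Because the fused list is the de-duplicated union of both rankings, two retrievers returning 25 documents each can yield anywhere between 25 and 50 candidates.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Sequence[List[str]],
    weights: Sequence[float],
    c: int = 60,
) -> List[str]:
    """Fuse ranked lists: each list adds weight / (rank + c) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; documents found by both retrievers accumulate score.
    return sorted(scores, key=scores.get, reverse=True)


dense = ["d1", "d2", "d3"]   # toy ranking from the dense retriever
sparse = ["d3", "d1", "d4"]  # toy ranking from the sparse retriever

print(weighted_rrf([dense, sparse], weights=[0.7, 0.3]))
# -> ['d1', 'd3', 'd2', 'd4']
```

Only ranks enter the formula, so the raw scores of the dense and sparse retrievers never need to be on a comparable scale.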

In [99]:
candidates = hybrid_retriever.invoke("What is Langchain?")
print(f"Number of candidates: {len(candidates)}")
print(candidates[0])

Number of candidates: 30
page_content='One of the standout features of LangChain is its support for agents. Agents use LLMs to reason about which tool to call, what input to provide, and how to process the output. LangChain agents can execute multi-step tasks, integrating with tools like web search, calculators, code execution' metadata={'source': 'langchain_crewai_dataset.txt'}


In [94]:
## re-rank with cross-encoder
from sentence_transformers.cross_encoder import CrossEncoder
import numpy as np

reranker  = CrossEncoder("BAAI/bge-reranker-large", device="cuda")

def rerank_with_crossencoder(query, docs, top_k=5):
    """
    Rerank documents with a CrossEncoder.

    Args:
        query (str): The query.
        docs (List[Document]): List of LangChain Document objects.
        top_k (int): Number of documents to return.

    Returns:
        List[Tuple[Document, float]]: Documents sorted by score, with their scores.
    """
    # 1) Build (query, document) pairs
    pairs = [(query, d.page_content) for d in docs]

    # 2) Score each pair
    scores = reranker.predict(pairs)

    # 3) Sort by score, descending
    ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)

    return ranked[:top_k]


In [95]:
for c in candidates:
    print(c.page_content)
    print("-----")

One of the standout features of LangChain is its support for agents. Agents use LLMs to reason about which tool to call, what input to provide, and how to process the output. LangChain agents can execute multi-step tasks, integrating with tools like web search, calculators, code execution
-----
LangChain integrates seamlessly with vector databases like FAISS, Chroma, Pinecone, and Weaviate, enabling semantic search within large document corpora. This capability is especially important in Retrieval-Augmented Generation (RAG), where external knowledge is fetched and injected into the LLM
-----
CrewAI is compatible with LangChain agents and tools, allowing hybrid systems where LangChain handles retrieval and tool wrapping, while CrewAI manages role-based collaboration. (v9)
-----
CrewAI is compatible with LangChain agents and tools, allowing hybrid systems where LangChain handles retrieval and tool wrapping, while CrewAI manages role-based collaboration. (v5)
-----
CrewAI is compatible wi

In [96]:
# Rerank with the cross-encoder
reranked = rerank_with_crossencoder("What is Langchain?", candidates, top_k=5)

# Show results
for doc, score in reranked:
    print(f"[Score: {score:.4f}] {doc.page_content}")

[Score: 0.0337] and agent orchestration. Developers can use LangChain to create end-to-end pipelines that connect LLMs with tools, APIs, vector databases, and other knowledge sources. (v1)
[Score: 0.0325] and agent orchestration. Developers can use LangChain to create end-to-end pipelines that connect LLMs with tools, APIs, vector databases, and other knowledge sources. (v9)
[Score: 0.0169] One of the standout features of LangChain is its support for agents. Agents use LLMs to reason about which tool to call, what input to provide, and how to process the output. LangChain agents can execute multi-step tasks, integrating with tools like web search, calculators, code execution
[Score: 0.0090] LangChain integrates seamlessly with vector databases like FAISS, Chroma, Pinecone, and Weaviate, enabling semantic search within large document corpora. This capability is especially important in Retrieval-Augmented Generation (RAG), where external knowledge is fetched and injected into the LLM
[Sc

In [88]:
# Generate an answer with the LLM using the reranked documents
llm = init_chat_model("gpt-3.5-turbo", temperature=0, max_tokens=512)
prompt_template = """You are a helpful and precise assistant. Use the following information to answer the question at the end.
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
    output_parser=StrOutputParser()
)
chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
response = chain.invoke({
    "context": [doc for doc, score in reranked],
    "question": "What is Langchain?"
})
print(response)


LangChain is a platform that allows developers to create end-to-end pipelines connecting Large Language Models (LLMs) with tools, APIs, vector databases, and other knowledge sources. It also supports agents that use LLMs to reason about tasks and execute multi-step processes. Additionally, LangChain integrates seamlessly with vector databases for semantic search within large document corpora.


In [5]:
## step 4 : LLM and Prompt

import os
from dotenv import load_dotenv
load_dotenv()

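# load_dotenv() already exports OPENAI_API_KEY; fail early with a clear message if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")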

llm=init_chat_model("openai:o4-mini")
llm


ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7c421dd94550>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7c421dbe43d0>, root_client=<openai.OpenAI object at 0x7c421df73d50>, root_async_client=<openai.AsyncOpenAI object at 0x7c421dbe40d0>, model_name='o4-mini', model_kwargs={}, openai_api_key=SecretStr('**********'))

In [6]:
# Query expansion
query_expansion_prompt = PromptTemplate.from_template("""
You are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.

Original query: "{query}"

Expanded query:
""")

query_expansion_chain=query_expansion_prompt| llm | StrOutputParser()
query_expansion_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7c421dd94550>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7c421dbe43d0>, root_client=<openai.OpenAI object at 0x7c421df73d50>, root_async_client=<openai.AsyncOpenAI object at 0x7c421dbe40d0>, model_name='o4-mini', model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser()

In [7]:
query_expansion_chain.invoke({"query":"Langchain memory"})

'“LangChain memory” OR “Lang Chain memory management” OR “Langchain conversational memory” OR “persistent context storage” OR “stateful agent memory” OR “session storage for LLMs” OR “memory modules” OR “memory backends” OR “ConversationBufferMemory” OR “ConversationSummaryMemory” OR “EntityMemory” OR “CombinedMemory” OR “memory store” OR “vector memory store” OR “Redis memory” OR “SQLite memory” OR “in-memory cache” OR “retrieval-augmented generation” OR “chat history retrieval” OR “context window management” OR “LLM memory engineering” OR “Python LangChain memory examples”'

In [8]:
# RAG answering prompt
answer_prompt = PromptTemplate.from_template("""
Answer the question based on the context below.

Context:
{context}

Question: {input}
""")

document_chain=create_stuff_documents_chain(llm=llm,prompt=answer_prompt)

In [9]:
# Step 5: Full RAG pipeline with query expansion
rag_pipeline = (
    RunnableMap({
        "input": lambda x: x["input"],
        "context": lambda x: retriever.invoke(query_expansion_chain.invoke({"query": x["input"]}))
    })
    | document_chain
)
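
The pipeline above retrieves with the expanded query only. LLM-generated expansions can drift away from the user's intent, so one possible guard, not used in this notebook, is to retrieve with both the original and the expanded query and merge the results. The sketch below uses a toy `toy_search` function and a three-sentence `CORPUS` as hypothetical stand-ins for `retriever.invoke` and the real index. With LangChain `Document` objects, the de-duplication would key on `page_content` instead of the raw string.

```python
from typing import Callable, List

# Toy corpus standing in for the vector store (hypothetical data).
CORPUS = [
    "CrewAI agents are role-based workers.",
    "Airline crew scheduling uses optimization.",
    "LangChain supports conversation memory.",
]


def toy_search(q: str) -> List[str]:
    """Hypothetical retriever: return documents sharing any lowercase word with the query."""
    words = set(q.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().rstrip(".").split())]


def retrieve_with_original_first(
    query: str,
    expanded: str,
    search: Callable[[str], List[str]],
    k: int = 5,
) -> List[str]:
    """Merge results for the original and expanded query, original hits first, no duplicates."""
    merged: List[str] = []
    seen = set()
    for text in search(query) + search(expanded):
        if text not in seen:
            seen.add(text)
            merged.append(text)
    return merged[:k]


print(retrieve_with_original_first("CrewAI agents", "airline crew scheduling agents", toy_search))
```

Hits for the original query keep their place at the top, so documents pulled in by an off-topic expansion can only fill the remaining slots.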

In [10]:
# Step 6: Run query
query = {"input": "What types of memory does LangChain support?"}
print(query_expansion_chain.invoke({"query": query["input"]}))  # pass the question string, not the whole dict
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

Expanded query:
“(LangChain OR “Lang Chain”) AND (“memory support” OR memory OR “memory module” OR “conversational memory” OR “session memory” OR context OR “state management” OR “context persistence”) AND (types OR categories OR implementations OR modules OR backends OR patterns OR architectures) AND (examples OR e.g. OR such as OR including) AND (ConversationBufferMemory OR ConversationSummaryMemory OR CombinedMemory OR VectorStoreRetrieverMemory OR “external vector store” OR “knowledge base” OR Redis OR PostgreSQL OR Pinecone OR Chroma OR Milvus OR Weaviate) AND (ephemeral OR persistent OR summary OR buffer OR retrieval-augmented)
✅ Answer:
 LangChain currently ships two “plug-and-play” memory back-ends for chat-based agents:

1. ConversationBufferMemory  
   – Keeps a running buffer of the full back-and-forth.  
2. ConversationSummaryMemory  
   – Actively summarizes older turns into a concise recap to stay within token limits.


In [11]:
# Step 7: Run a second, under-specified query
query = {"input": "CrewAI agents?"}
print(query_expansion_chain.invoke({"query": query["input"]}))  # pass the question string, not the whole dict
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

Expanded query:

("CrewAI agents" OR "Crew AI agents" OR "autonomous crew management agents" OR "AI-based crew coordination agents" OR "virtual crew assistants" OR "autonomous crew assistants" OR "crew management software agents" OR "AI crew scheduler" OR "multi-agent system for crew planning" OR "agent-based modeling for crew assignment" OR "reinforcement learning crew scheduler" OR "distributed crew coordination agents")  
AND  
("crew scheduling" OR "workforce management" OR "staff rostering" OR "resource allocation" OR "operations planning")  
AND  
("airline crew" OR "maritime crew" OR "railway staff" OR "hospital nursing staff" OR "logistics teams")  
AND  
("machine learning" OR "multi-agent systems (MAS)" OR "reinforcement learning" OR "natural language processing" OR "decentralized AI" OR "software agents architecture")
✅ Answer:
 CrewAI agents are semi-autonomous, role-based workers in a multi-agent system.  Key characteristics include:  
• Defined Role – e.g. research