## Why Use Query Decomposition?

- Complex queries often combine multiple concepts

- A retriever or LLM given the full question may miss parts of it

- Sub-questions enable multi-hop reasoning (answering in steps)

- Independent sub-questions can be processed in parallel (especially in multi-agent frameworks)

In [1]:
from langchain.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
#from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableSequence



In [2]:
# Load and embed the document
loader = TextLoader("langchain-crewai-dataset.txt")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=350, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding)
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.7})
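
The retriever above uses `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy among the chunks already picked. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not the code FAISS or LangChain actually runs, and the helper names `cosine` and `mmr_select` are hypothetical. As in LangChain, a `lambda_mult` near 1 favors relevance and a value near 0 favors diversity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.7):
    """Return indices of up to k documents chosen greedily by MMR (illustrative helper)."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already selected
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy 2-D "embeddings": documents 0 and 1 are near-duplicates, document 2 is unrelated
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
relevance_heavy = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.7)
diversity_heavy = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.3)
```

With these toy vectors, the relevance-heavy setting keeps both near-duplicates, while the diversity-heavy setting swaps the second one for the unrelated document.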

In [3]:
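import os
from dotenv import load_dotenv

# load_dotenv() copies GROQ_API_KEY from a local .env file into os.environ
load_dotenv()
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your environment or .env file")

llm = init_chat_model(model="groq:llama-3.3-70b-versatile")
llm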

ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 32768, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x130a26b70>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x130d1dbe0>, model_name='llama-3.3-70b-versatile', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [4]:
# Query decomposition
decomposition_prompt = PromptTemplate.from_template("""
You are an AI assistant. Decompose the following complex question into 2 to 4 smaller sub-questions for better document retrieval.
Return only the sub-questions, one per line, with no additional commentary.
Question: "{question}"

Sub-questions:
""")
decomposition_chain = decomposition_prompt | llm | StrOutputParser()

In [5]:
query = "How does LangChain use memory and agents compared to CrewAI?"
decomposition_question=decomposition_chain.invoke({"question": query})

In [8]:
decomposition_question

"1. What is LangChain's approach to memory management?\n2. How does LangChain utilize agents in its architecture?\n3. What are the key differences between LangChain and CrewAI in terms of memory and agent usage?"

In [6]:
print(decomposition_question)

1. What is LangChain's approach to memory management?
2. How does LangChain utilize agents in its architecture?
3. What are the key differences between LangChain and CrewAI in terms of memory and agent usage?
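
The chain returns a single numbered string, so it has to be split into individual questions before each one can be sent to the retriever. A minimal sketch of that parsing step follows, assuming the model emits one sub-question per line with an optional list marker. The helper name `parse_sub_questions` is hypothetical, and the regex removes only a leading marker so that digits inside a question are preserved.

```python
import re

# Matches a leading list marker such as "1.", "2)", "-", "*", or "•"
_LIST_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")

def parse_sub_questions(text):
    """Split the LLM output into clean sub-questions, one per non-empty line."""
    questions = []
    for line in text.splitlines():
        cleaned = _LIST_PREFIX.sub("", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions

# The output printed above, reproduced as a literal string
sample = (
    "1. What is LangChain's approach to memory management?\n"
    "2. How does LangChain utilize agents in its architecture?\n"
    "3. What are the key differences between LangChain and CrewAI "
    "in terms of memory and agent usage?"
)
sub_questions = parse_sub_questions(sample)
for q in sub_questions:
    print(q)
```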


In [9]:
# QA chain per sub-question
qa_prompt = PromptTemplate.from_template("""
Use the context below to answer the question.
Be concise and clear about what you are presenting.
Context:
{context}

Question: {input}
""")
qa_chain = qa_prompt | llm | StrOutputParser()

In [13]:
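# Full RAG pipeline logic
import re

def full_query_decomposition_rag_pipeline(user_query):
    # Step 1: ask the LLM to split the query into sub-questions
    subq_text = decomposition_chain.invoke({"question": user_query})
    # Strip only a leading list marker ("1.", "2)", "-", "*") so digits inside a question survive
    sub_questions = [
        re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        for line in subq_text.splitlines()
        if line.strip()
    ]
    full_answers = []
    for sub_q in sub_questions:
        # Step 2: retrieve chunks and pass their text (not Document reprs) as context
        retrieved = retriever.invoke(sub_q)
        context = "\n\n".join(doc.page_content for doc in retrieved)
        # Step 3: answer the sub-question from its own context
        answer = qa_chain.invoke({"input": sub_q, "context": context})
        full_answers.append(f"Q: {sub_q}\nA: {answer}")

    return "\n\n".join(full_answers)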
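
The loop in the pipeline handles sub-questions one at a time, although each one is retrieved and answered independently of the others. Below is a hedged sketch of running them concurrently with a thread pool. `answer_sub_questions_in_parallel` and `fake_answer` are hypothetical names, and `fake_answer` stands in for the "retrieve, then call `qa_chain`" step so the example runs without an API key.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_sub_questions_in_parallel(sub_questions, answer_fn, max_workers=4):
    """Apply answer_fn to every sub-question concurrently, preserving input order."""
    if not sub_questions:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs
        answers = list(pool.map(answer_fn, sub_questions))
    return [f"Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers)]

# Stand-in for retrieval plus the QA chain; swap in the real calls in the notebook
def fake_answer(sub_q):
    return f"(answer to: {sub_q})"

results = answer_sub_questions_in_parallel(
    ["What is memory?", "What are agents?"], fake_answer
)
joined = "\n\n".join(results)
print(joined)
```

Threads suit this workload because the time is spent waiting on network calls rather than on computation. LangChain runnables also offer a `batch` method that may serve the same purpose for the QA chain alone.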


In [15]:
# Run the pipeline, then synthesize a final answer
query = "How does LangChain use memory and agents compared to CrewAI?"
final_answer = full_query_decomposition_rag_pipeline(query)

print(final_answer)
# Reuse the QA prompt, with the sub-question answers as context for the original query
answers_last = qa_chain.invoke({"input": query, "context": final_answer})
print("-----------------")
print(answers_last)

Q: What is LangChain's approach to memory management?
A: LangChain's approach to memory management involves using memory modules, such as ConversationBufferMemory and ConversationSummaryMemory. These modules allow the LLM to maintain awareness of previous conversation turns and summarize long interactions to fit within token limits.

Q: How does LangChain utilize agents in its architecture?
A: LangChain utilizes agents in its architecture through a planner-executor model. Agents plan a sequence of tool invocations to achieve a goal, incorporating:

1. Dynamic decision-making
2. Branching logic
3. Context-aware memory use across steps

These agents use Large Language Models (LLMs) to reason about tool invocation, input provision, and output processing, enabling multi-step task execution and integration with various tools, such as web search, calculators, and custom APIs.

Q: What is CrewAI's approach to memory and agent management for comparison?
A: CrewAI's approach to agent management