# Lab 1.2: Conversational RAG with Groq

In this lab, we extend our RAG system with **Conversational Memory**, so that users can ask follow-up questions and the system resolves them against earlier turns of the conversation.

## Key Concepts
1. **History Aware Retriever**: Rephrases the user's latest query using the chat history so it makes sense as a standalone query to the vector store.
2. **Chat History**: The running list of user and AI messages, passed to the chain on every call so that follow-up questions can be resolved.
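
To see why the rewrite step matters, here is a toy sketch in plain Python. It does not use a vector store: `CORPUS`, `toy_retrieve`, and the word-overlap scoring are illustrative stand-ins and not part of the lab. A follow-up that relies on a pronoun shares no words with the documents, while a standalone version of the same question does.

```python
import re

# Toy corpus standing in for the vector store (hypothetical data).
CORPUS = [
    "Planning lets an agent break a task into subgoals.",
    "Memory lets an agent store and recall information.",
    "Bananas are rich in potassium.",
]

def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_retrieve(query, k=2):
    """Rank documents by word overlap with the query (a crude stand-in for similarity search)."""
    query_words = tokens(query)
    scored = [(len(query_words & tokens(doc)), doc) for doc in CORPUS]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

follow_up = "How do they differ?"
standalone = "How do planning and memory differ in an agent?"

print(toy_retrieve(follow_up))   # nothing to match: "they" carries no content
print(toy_retrieve(standalone))  # the planning and memory documents
```

Embedding-based search is more forgiving than word overlap, but the same problem applies: "they" on its own says nothing about what to look for.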

In [None]:
# 1. Install Dependencies
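# beautifulsoup4 is needed by WebBaseLoader; langchain-text-splitters provides the splitter imported below
%pip install -qU langchain langchain-groq langchain-community langchain-huggingface langchain-text-splitters chromadb sentence-transformers beautifulsoup4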

In [1]:
# 2. Setup API Keys
import getpass
import os

if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")

In [2]:
# 3. Setup Vector Store (Same as Lab 1.1)
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Load
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed & Store
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()

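
As a reminder of what `chunk_size` and `chunk_overlap` control, here is a simplified sketch. It is a fixed-size sliding window over characters, not how `RecursiveCharacterTextSplitter` actually works (the real splitter prefers to break on separators such as paragraphs and line breaks). `sliding_chunks` is a hypothetical helper written for this illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in demo:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval.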


## 4. History Aware Retriever
We need a chain that takes the `chat_history` and the `input` and generates a search query.

In [4]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize LLM
llm = ChatGroq(
    model="qwen/qwen3-32b",
    temperature=0,
    reasoning_format="parsed"
)

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Chain to rephrase question
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()
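
To make the prompt layout concrete, the following plain-Python sketch mimics how the three parts of the template line up once the history is filled in. `build_messages` is a hypothetical helper, and the `(role, content)` tuples are a simplification of the message objects LangChain uses.

```python
def build_messages(system_prompt, chat_history, user_input):
    """Return the message list: system instruction, then prior turns, then the new question."""
    return [("system", system_prompt), *chat_history, ("human", user_input)]

history = [
    ("human", "What are the main modules in an agent's architecture?"),
    ("ai", "Planning, Memory, and Tool Use."),
]

messages = build_messages(
    "Reformulate the latest question as a standalone question.",
    history,
    "What is the difference between them?",
)

for role, content in messages:
    print(f"{role}: {content}")
```

Because the history sits between the instruction and the new question, the model can see what "them" refers to when it writes the standalone query.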

## 5. QA Chain with History
Now we create the final chain that uses the retrieved documents to answer.

In [5]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

qa_system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

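def contextualized_question(inputs: dict):
    # With prior turns, return the rewrite chain itself: when a function used
    # in a chain returns a Runnable, LangChain invokes it with the same input.
    if inputs.get("chat_history"):
        return contextualize_q_chain
    # First turn: nothing to resolve, so search with the question as typed.
    return inputs.get("input")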

rag_chain = (
    RunnablePassthrough.assign(
        context=contextualized_question | retriever | format_docs
    )
    | qa_prompt
    | llm
    | StrOutputParser()
)
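
The data flow of this chain can be hard to read from the pipe syntax alone. The sketch below restates it with ordinary functions, under the assumption that `rewrite` and `retrieve` are stand-ins for the rewrite chain and the retriever. All names here are hypothetical. It shows the two ideas at work: the rewrite step is skipped on the first turn, and the `context` key is added alongside the original keys rather than replacing them.

```python
def choose_search_query(inputs, rewrite):
    """Rewrite the question only when there is history to resolve it against."""
    if inputs.get("chat_history"):
        return rewrite(inputs)
    return inputs["input"]

def assign_context(inputs, rewrite, retrieve):
    """Return a copy of inputs with a 'context' key added; the other keys are kept."""
    query = choose_search_query(inputs, rewrite)
    return {**inputs, "context": "\n\n".join(retrieve(query))}

rewrite_calls = []

def fake_rewrite(inputs):
    # Stand-in for the LLM rewrite chain; records that it was called.
    rewrite_calls.append(inputs["input"])
    return "standalone: " + inputs["input"]

def fake_retrieve(query):
    # Stand-in for the retriever; echoes the query it received.
    return [f"doc about <{query}>"]

first = assign_context({"input": "What is planning?", "chat_history": []},
                       fake_rewrite, fake_retrieve)
second = assign_context({"input": "And memory?", "chat_history": [("human", "What is planning?")]},
                        fake_rewrite, fake_retrieve)

print(first["context"])
print(second["context"])
```

Skipping the rewrite on the first turn saves one LLM call per conversation.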

## 6. Testing the Chat
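The chain itself is stateless: we keep the conversation in a Python list, pass it in on every call, and append each question and answer after the model responds.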
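
If several conversations need to run side by side, one option is to keep one such list per session. The sketch below is a minimal in-memory version using plain tuples. The `sessions` store, `record_turn`, and the `MAX_TURNS` cap are assumptions made for illustration, not part of the lab; in the notebook the entries would be `HumanMessage` and `AIMessage` objects.

```python
from collections import defaultdict

MAX_TURNS = 3  # hypothetical cap: keep only the 3 most recent question/answer pairs

# session_id -> list of (role, content) tuples, oldest first
sessions = defaultdict(list)

def record_turn(session_id, question, answer):
    """Append one question/answer pair to a session and trim old messages."""
    history = sessions[session_id]
    history.append(("human", question))
    history.append(("ai", answer))
    # Drop the oldest messages once the cap is exceeded.
    del history[:-2 * MAX_TURNS]
    return history

for i in range(5):
    record_turn("alice", f"question {i}", f"answer {i}")
record_turn("bob", "hello", "hi there")

print(len(sessions["alice"]), len(sessions["bob"]))
```

Capping the history keeps the prompt from growing without bound, at the cost of forgetting older turns.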

In [6]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

# First Question
user_input = "What are the main modules in an agent's architecture?"
# rag_chain ends with StrOutputParser, so invoke() returns a plain string
answer = rag_chain.invoke({"input": user_input, "chat_history": chat_history})
print(f"User: {user_input}")
print(f"AI: {answer}")

# Update History
chat_history.extend([HumanMessage(content=user_input), AIMessage(content=answer)])

# Second Question (Follow-up)
user_input = "What is the difference between them?"
answer = rag_chain.invoke({"input": user_input, "chat_history": chat_history})
print(f"\nUser: {user_input}")
print(f"AI: {answer}")

chat_history.extend([HumanMessage(content=user_input), AIMessage(content=answer)])

User: What are the main modules in an agent's architecture?
AI: The main modules in an agent's architecture are **Planning**, **Memory**, and **Tool Use**. Planning involves task decomposition, self-reflection, and optimizing actions based on beliefs and environment observations. Memory includes short-term (contextual learning) and long-term (external storage/retrieval) components, while Tool Use enables API calls for external information or execution.

User: What is the difference between them?
AI: The **Planning** module focuses on task decomposition, subgoal creation, and self-reflection to optimize decision-making. The **Memory** module stores and retrieves information (short-term for context, long-term for persistent data) to inform actions and learning. The **Tool Use** module enables interaction with external APIs to access real-time data or execute tasks beyond the agent's internal knowledge. Each serves distinct roles: Planning strategizes, Memory retains context, and Tool U