# 🦜🔗 Conversational RAG with LangChain, Groq, and ChromaDB

### **Project Overview**
This notebook demonstrates how to build a **Retrieval-Augmented Generation (RAG)** pipeline. The system allows users to chat with external documents (in this case, a technical blog post) that the LLM was not explicitly trained on.

**Key Technologies:**
* **LangChain:** For orchestration and chain management.
* **Groq (Llama 3.1):** For ultra-fast LLM inference.
* **HuggingFace:** For generating open-source text embeddings.
* **ChromaDB:** As the local vector store for semantic search.

### **Architecture**
1.  **Ingest:** Load data from a URL.
2.  **Split:** Chunk the text into manageable pieces.
3.  **Embed:** Convert chunks into vectors using `all-MiniLM-L6-v2`.
4.  **Store:** Save vectors in ChromaDB.
5.  **Retrieve:** Fetch relevant context based on user queries.
6.  **Generate:** Use Llama 3.1 via Groq to answer questions using the retrieved context.

## 1. Environment Setup
Create a `.env` file in your project directory containing your API keys:
```text
GROQ_API_KEY=your_groq_api_key_here
HF_TOKEN=your_huggingface_token_here
```

In [None]:
# Install necessary libraries (Uncomment to run)
# !pip install langchain langchain-community langchain-groq langchain-huggingface langchain-chroma chromadb sentence-transformers python-dotenv bs4

In [1]:
import os
import bs4
from dotenv import load_dotenv

# LangChain Imports
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_retrieval_chain, create_history_aware_retriever
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import AIMessage, HumanMessage

# Load Environment Variables
load_dotenv()  # This looks for a .env file in the current directory

# Verify API Keys are loaded
groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN")

if not groq_api_key or not hf_token:
    raise ValueError("Please ensure GROQ_API_KEY and HF_TOKEN are set in your .env file.")

# Initialize LLM
llm = ChatGroq(
    groq_api_key=groq_api_key, 
    model_name="llama-3.1-8b-instant"
)

print(f"LLM Initialized: {llm.model_name}")

USER_AGENT environment variable not set, consider setting it to identify your requests.


LLM Initialized: llama-3.1-8b-instant
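
The `USER_AGENT` warning above comes from `langchain_community`'s web loader, which reads that environment variable when building its default HTTP headers. Because the warning is printed while the imports run, the variable has to be set before `WebBaseLoader` is imported. A minimal sketch, assuming it runs in the very first cell; the value `conversational-rag-notebook/0.1` is a hypothetical identifier you should replace with your own:

```python
import os

# Must run before `from langchain_community.document_loaders import WebBaseLoader`.
# "conversational-rag-notebook/0.1" is a hypothetical value; use your own identifier.
# setdefault keeps any USER_AGENT already exported in your shell.
os.environ.setdefault("USER_AGENT", "conversational-rag-notebook/0.1")

print("USER_AGENT =", os.environ["USER_AGENT"])
```

An alternative is to add a `USER_AGENT=...` line to `.env` and call `load_dotenv()` above the LangChain imports, since `load_dotenv()` copies the file's entries into `os.environ`.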


## 2. Data Ingestion (Document Loading)
We will load a comprehensive blog post, "LLM Powered Autonomous Agents" by Lilian Weng. `SoupStrainer` restricts parsing to elements with the classes `post-title`, `post-header`, and `post-content`, so navigation bars and footers are skipped.

In [3]:
# Load the document from the web
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

# Display a snippet of the loaded content
print(f"Loaded Document Source: {docs[0].metadata['source']}")
print(f"Content Snippet: {docs[0].page_content[:500]}...")

Loaded Document Source: https://lilianweng.github.io/posts/2023-06-23-agent/
Content Snippet: 

      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In...


## 3. Splitting and Embedding
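LLMs have a limited context window, and retrieval is more precise when it matches small, focused passages rather than an entire post. To handle large documents, we: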
1.  **Split** the text into smaller chunks (1000 characters) with overlap (200 characters) to preserve context between chunks.
2.  **Embed** these chunks into vectors using a HuggingFace model.
3.  **Store** them in ChromaDB for efficient retrieval.
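
The effect of `chunk_size=1000` and `chunk_overlap=200` can be sketched as a fixed sliding window: each chunk starts 1000 - 200 = 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. This is a simplification and `sliding_window_chunks` is a hypothetical helper: `RecursiveCharacterTextSplitter` prefers to break on separators (paragraphs, lines, spaces) rather than at fixed offsets, so its real chunk boundaries and chunk count differ.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in: unlike RecursiveCharacterTextSplitter, this ignores
    paragraph, line, and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 with the notebook's settings
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 2,600 characters of synthetic text -> windows start at 0, 800, and 1600.
text = "".join(chr(ord("a") + i % 26) for i in range(2600))
chunks = sliding_window_chunks(text)
print(len(chunks))                               # 3
print(chunks[0][-200:] == chunks[1][:200])       # True: the 200-char overlap
```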

In [4]:
# 1. Text Splitting
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(f"Document split into {len(splits)} chunks.")

# 2. Embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# 3. Vector Store
vector_store = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vector_store.as_retriever()

Document split into 63 chunks.
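
At query time, `retriever` embeds the question and returns the stored chunks whose vectors are closest to it. Unless `search_kwargs` says otherwise (for example `vector_store.as_retriever(search_kwargs={"k": 6})`), LangChain's Chroma integration returns 4 chunks. The sketch below shows that ranking step with cosine similarity over hypothetical 3-dimensional vectors; `all-MiniLM-L6-v2` actually produces 384-dimensional ones. Chroma's default collection metric is L2 distance rather than cosine, but for unit-length embeddings, which this model should produce, the two give the same ranking.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 4):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" standing in for the 63 real chunk vectors.
store = [
    ("planning", [0.9, 0.1, 0.0]),
    ("memory", [0.1, 0.9, 0.0]),
    ("tool use", [0.0, 0.2, 0.9]),
    ("reflection", [0.7, 0.3, 0.1]),
    ("misc", [0.3, 0.3, 0.3]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
print([text for text, _ in results])  # ['planning', 'reflection']
```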


## 4. RAG Pipeline V1: Single Question Answering
We create a standard retrieval chain that:
1. Takes a user question.
2. Retrieves relevant document chunks.
3. Inserts chunks into a system prompt.
4. Generates an answer.
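
`create_stuff_documents_chain` "stuffs" every retrieved chunk into the `{context}` slot of the system prompt and sends everything to the LLM in a single call. The plain-Python sketch below approximates the messages the model ends up receiving; `Doc`, `stuff_documents`, and `build_messages` are hypothetical names, and the blank-line separator mirrors LangChain's default `document_separator="\n\n"`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)


def stuff_documents(docs: list[Doc], separator: str = "\n\n") -> str:
    """Concatenate every retrieved chunk into one context string."""
    return separator.join(d.page_content for d in docs)


def build_messages(question: str, docs: list[Doc]) -> list[tuple[str, str]]:
    """Return (role, content) pairs resembling what the chat model receives."""
    system = SYSTEM_TEMPLATE.format(context=stuff_documents(docs))
    return [("system", system), ("human", question)]


messages = build_messages(
    "What is the definition of an agent?",
    [Doc("Chunk A about planning."), Doc("Chunk B about subgoals.")],
)
for role, content in messages:
    print(f"[{role}]\n{content}\n")
```

Because all chunks share one prompt, the answer can only be as good as what was retrieved. When the chain replies "I don't know", printing `response["context"]` (the list of retrieved `Document` objects that `create_retrieval_chain` returns alongside `"answer"`) shows whether the relevant passage reached the model at all.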

In [5]:
# Define the System Prompt
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Create the Chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# Test the Chain
response = rag_chain.invoke({"input": "What is the definition of an agent?"})
print("Answer:", response['answer'])

Answer: I don't know.


## 5. RAG Pipeline V2: Conversational RAG (With Memory)
To support a chat interface, we need the system to understand follow-up questions (e.g., "Tell me more about *that*").

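We use a **History Aware Retriever** (`create_history_aware_retriever`). When `chat_history` is non-empty, it first asks the LLM to rewrite the latest question as a standalone query that names its subject explicitly, then uses that query to search the vector store. When the history is empty, the question is passed to the retriever unchanged.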
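
The branching can be sketched in plain Python as follows. `history_aware_retrieve`, `fake_rephrase`, and `fake_retrieve` are hypothetical stand-ins: in the real chain the rewrite step runs `contextualize_q_prompt` through the LLM, and the search step is the Chroma retriever. Messages are modeled as `(role, content)` tuples instead of `HumanMessage`/`AIMessage`.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content) stand-in for HumanMessage/AIMessage


def history_aware_retrieve(
    inputs: dict,
    rephrase: Callable[[list[Message], str], str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Approximate control flow of a history-aware retriever."""
    history = inputs.get("chat_history") or []
    question = inputs["input"]
    if not history:
        # First turn: search with the question exactly as typed.
        return retrieve(question)
    # Later turns: produce a standalone question first, then search with it.
    return retrieve(rephrase(history, question))


searched: list[str] = []


def fake_retrieve(query: str) -> list[str]:
    """Hypothetical vector-store search that records each query it receives."""
    searched.append(query)
    return [f"chunk matching: {query}"]


def fake_rephrase(history: list[Message], question: str) -> str:
    """Hypothetical LLM rewrite; hard-coded here instead of calling a model."""
    return "How is Self-Reflection in agents created?"


history_aware_retrieve(
    {"input": "What is Self-Reflection in agents?", "chat_history": []},
    fake_rephrase, fake_retrieve,
)
history_aware_retrieve(
    {"input": "How is it created?",
     "chat_history": [("human", "What is Self-Reflection in agents?"), ("ai", "...")]},
    fake_rephrase, fake_retrieve,
)
print(searched)
```

The vector store never sees the ambiguous "How is it created?": without the rewrite, the pronoun "it" would give the embedding model nothing to match against.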

In [8]:
# 1. Contextualize Question Prompt
# This prompt helps the LLM understand references to previous messages
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Create History Aware Retriever
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

# 2. QA Prompt (Answer Generation)
qa_system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# 3. Final Conversational Chain
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
conversational_rag_chain = create_retrieval_chain(
    history_aware_retriever, question_answer_chain
)

### Testing the Conversational Interface
We will simulate a conversation where the second question relies on the context of the first.

In [9]:
# Initialize Chat History
chat_history = []

# Question 1
question1 = "What is Self-Reflection in agents?"
response1 = conversational_rag_chain.invoke(
    {"input": question1, "chat_history": chat_history}
)

print(f"Human: {question1}")
print(f"AI: {response1['answer']}\n")

# Update History
chat_history.extend(
    [
        HumanMessage(content=question1),
        AIMessage(content=response1["answer"]),
    ]
)

# Question 2 (Follow-up)
question2 = "How is it created?" 
# The model must understand 'it' refers to 'Self-Reflection'
response2 = conversational_rag_chain.invoke(
    {"input": question2, "chat_history": chat_history}
)

print(f"Human: {question2}")
print(f"AI: {response2['answer']}")

Human: What is Self-Reflection in agents?
AI: In agents, self-reflection is a process that enables them to improve iteratively by refining past action decisions and correcting previous mistakes. It helps agents learn from their experiences and adapt to new situations.

Human: How is it created?
AI: Self-reflection in agents is created by showing two-shot examples to the Large Language Model (LLM), where each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan).
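
Updating `chat_history` by hand works for two turns, but the whole history is sent on every call to both the contextualize prompt and the QA prompt, so prompts grow with the conversation. A minimal sketch of a wrapper that records each turn and keeps only the most recent exchanges; `ChatSession`, `max_turns`, and `echo_chain` are hypothetical, and tuples again stand in for message objects.

```python
from collections import deque
from typing import Callable


class ChatSession:
    """Hypothetical helper: calls a chain and keeps the last `max_turns` exchanges."""

    def __init__(self, chain: Callable[[dict], dict], max_turns: int = 5):
        self.chain = chain
        # Each turn adds two messages (human + AI); older ones fall off automatically.
        self.history: deque = deque(maxlen=2 * max_turns)

    def ask(self, question: str) -> str:
        response = self.chain({"input": question, "chat_history": list(self.history)})
        answer = response["answer"]
        self.history.append(("human", question))
        self.history.append(("ai", answer))
        return answer


def echo_chain(inputs: dict) -> dict:
    """Hypothetical chain that reports how much history it was given."""
    return {"answer": f"seen {len(inputs['chat_history'])} prior messages"}


session = ChatSession(echo_chain, max_turns=1)
print(session.ask("What is Self-Reflection in agents?"))  # seen 0 prior messages
print(session.ask("How is it created?"))                  # seen 2 prior messages
print(session.ask("Give an example."))                    # seen 2 prior messages
```

To use it with this notebook, pass `conversational_rag_chain.invoke` as `chain` and append `HumanMessage`/`AIMessage` objects instead of tuples, as the cell above does. Trimming by turn count is the simplest policy; it drops old context abruptly, so a follow-up that refers back several turns may no longer resolve.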
