# Lab 1.1: RAG Fundamentals with Groq & LangChain

In this lab, we will build a basic Retrieval-Augmented Generation (RAG) system using:
- **LLM**: Groq (Qwen3 32B)
- **Embeddings**: HuggingFace (via `sentence-transformers`)
- **Vector Store**: ChromaDB
- **Framework**: LangChain

## Objectives
1. Set up the environment and API keys.
2. Load and chunk a text document.
3. Create vector embeddings and store them.
4. Query the data using an LLM.

In [None]:
# 1. Install Dependencies
%pip install -qU langchain langchain-groq langchain-community langchain-huggingface chromadb sentence-transformers

In [1]:
# 2. Set up API keys (prompts only if GROQ_API_KEY is not already in the environment)
import getpass
import os

if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")
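
The web loader used in the next section prints a warning when the `USER_AGENT` environment variable is unset. The sketch below sets one in the same way as the API key. It assumes the warning is raised when the loader module is imported, so it should run before that import. The value `rag-lab-1.1` is a hypothetical placeholder, not something the lab requires.

```python
import os

# Hypothetical identifier; any descriptive string should work.
# setdefault leaves an existing USER_AGENT value untouched.
os.environ.setdefault("USER_AGENT", "rag-lab-1.1")

print("USER_AGENT is set:", "USER_AGENT" in os.environ)
```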

## 2. Load and Process Documents
We will load a web page with `WebBaseLoader` and split its text into overlapping chunks for indexing.

In [2]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a sample article (e.g., Lilian Weng's post on Agents)
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

# Check content
print(f"Loaded {len(docs)} document(s).")
print(docs[0].page_content[:500])  # Print first 500 chars

USER_AGENT environment variable not set, consider setting it to identify your requests.


Loaded 1 document(s).

LLM Powered Autonomous Agents | Lil'Log

Lil'Log
|
Posts
Archive
Search
Tags
FAQ

LLM Powered Autonomous Agents
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

Table of Contents
Agent System Overview
Component One: Planning
Task Decomposition
Self-Reflection
Component Two: Memory
Types of Memory
Maximum Inner Product Search (MIPS)
Component Three:


In [3]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(docs)
print(f"Split into {len(splits)} chunks.")

Split into 66 chunks.
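
To see what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch of a fixed-width sliding-window splitter in plain Python. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than `chunk_size` and the overlap is approximate. The function name `split_with_overlap` and the digit-string sample are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Simplified splitter: each chunk starts (chunk_size - chunk_overlap) characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic 2500-character sample text (illustrative only).
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)

print([len(c) for c in chunks])            # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: the last 200 chars repeat at the start of the next chunk
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.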


## 3. Embeddings & Vector Store
We will use `HuggingFaceEmbeddings` (runs locally, no API key needed) to embed each chunk and `Chroma` as the vector store.

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize Embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create Vector Store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings
)

# Create a retriever interface
retriever = vectorstore.as_retriever()
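
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over hand-written 3-dimensional vectors. These are assumptions for illustration: the real embeddings have many more dimensions, the toy vectors are invented, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Invented toy "embeddings" for four chunks.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.05, 0.0]

nearest = top_k(query_vector, toy_vectors, k=2)
print(nearest)  # [0, 1]
```

A real vector store does the same ranking with an approximate index so it does not have to compare the query against every stored vector.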

## 4. Query with Groq
Now we create a RAG chain to answer questions based on the document.

In [5]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize LLM
llm = ChatGroq(
    model="qwen/qwen3-32b",
    temperature=0,
    reasoning_format="parsed"
)

# System Prompt
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

def format_docs(docs):
    """Combine the content of all retrieved documents into one string."""
    return "\n\n".join(doc.page_content for doc in docs)

# Create Chain
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Run Query
response = rag_chain.invoke("What is LSH?")
print("Answer:", response)

Answer: LSH (Locality-Sensitive Hashing) is a technique that uses hashing functions to map similar input items to the same buckets with high probability. It reduces the number of comparisons needed for similarity searches by grouping similar data together in a smaller number of buckets than the total input size. This method is particularly useful for efficient approximate nearest neighbor searches in high-dimensional spaces.
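
It can help to see what the chain hands to the LLM. The sketch below reproduces the first two stages (retrieve and format, then fill the prompt) in plain Python with no LangChain imports. `Doc`, `build_messages`, the shortened template, and the two sample passages are hypothetical stand-ins, not the chain's actual internals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


# Shortened version of the lab's system prompt (illustrative only).
SYSTEM_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question."
    "\n\n"
    "{context}"
)


def format_docs(docs: list[Doc]) -> str:
    """Join retrieved passages with blank lines, as the lab's format_docs does."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_messages(question: str, retrieved: list[Doc]) -> list[tuple[str, str]]:
    """Fill {context} with the retrieved text and pair it with the user's question."""
    context = format_docs(retrieved)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


# Placeholder passages standing in for what the retriever might return.
retrieved = [
    Doc("LSH maps similar items to the same buckets with high probability."),
    Doc("ANNOY uses random projection trees."),
]
messages = build_messages("What is LSH?", retrieved)

for role, content in messages:
    print(f"[{role}]\n{content}\n")
```

The model never sees the whole article, only the few chunks placed in `{context}`, which is why retrieval quality bounds answer quality.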


## Exercise
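Try changing the URL to a different article and see how the answers change! To index a PDF instead, swap `WebBaseLoader` for a PDF loader such as `PyPDFLoader` from `langchain_community.document_loaders` (this needs the `pypdf` package, which the install cell above does not include).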