# 🧑‍💻 Session 6: Retrieval Augmented Generation (RAG)

RAG combines:
1. **Retriever** → Fetches relevant documents from a vector store.  
2. **LLM** → Generates context-aware responses using query + retrieved docs.  
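To make the two roles concrete, here is a minimal, library-free sketch of one RAG step. The function `rag_answer` and the two stand-in components are hypothetical. In this notebook, the real retriever is a Chroma retriever and the real generator is a Groq chat model.

```python
from typing import Callable, List


def rag_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """One RAG step: fetch documents for the query, then ask the LLM with them as context."""
    context = "\n".join(retrieve(query))  # 1. Retriever: relevant documents
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)  # 2. LLM: answer grounded in the context


# Hypothetical stand-ins so the sketch runs without any API keys.
def fake_retrieve(query: str) -> List[str]:
    return ["Jasprit Bumrah is famous for his yorkers."]


def fake_generate(prompt: str) -> str:
    # Echo the first context line to show that the "LLM" sees the retrieved text.
    return "Answer based on: " + prompt.splitlines()[1]


print(rag_answer("Who bowls yorkers?", fake_retrieve, fake_generate))
# → Answer based on: Jasprit Bumrah is famous for his yorkers.
```

The retriever and the LLM are separate callables, so you can swap either one (a different vector store, a different model) without touching the other.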

In this session, we will:
- Store documents with **Gemini Embeddings**  
- Retrieve docs from **Chroma**  
- Use **Groq LLM** for answering questions  


In [1]:
# 📌 Install dependencies
!pip install -q langchain==0.3.27 langchain-community==0.3.31 langchain-groq==0.3.8 langchain-google-genai==2.1.12 chromadb==1.2.1


[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/67.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.3/67.3 kB[0m [31m2.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m34.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.7/50.7 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m20.7/20.7 MB[0m [31m70.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m278.2/278.2 kB[0m [31m18.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m56.8 MB/s[0m eta [36m0:00:00[

In [2]:
!pip show langchain langchain-community langchain-groq langchain-google-genai chromadb

Name: langchain
Version: 0.3.27
Summary: Building applications with LLMs through composability
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: langchain-core, langchain-text-splitters, langsmith, pydantic, PyYAML, requests, SQLAlchemy
Required-by: langchain-community
---
Name: langchain-community
Version: 0.3.31
Summary: Community contributed LangChain integrations.
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: aiohttp, dataclasses-json, httpx-sse, langchain, langchain-core, langsmith, numpy, pydantic-settings, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 
---
Name: langchain-groq
Version: 0.3.8
Summary: An integration package connecting Groq and LangChain
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: groq, langchain-core
Required-by: 
---
Name: langchain-google-genai
Version: 2.1.12
Summary: An

## 🔑 Setup API Keys
- Google Gemini API key → for embeddings  
- Groq API key → for the LLM (Groq-hosted `openai/gpt-oss-20b`)
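The notebook reads both keys through Colab's `userdata` secrets API, which exists only inside Google Colab. If you run the notebook somewhere else, you can fall back to environment variables. `get_secret` is a hypothetical helper, not part of any library. It assumes the keys are exported under the same names (`GEMINI_API_KEY`, `GROQ_API_KEY`).

```python
import os
from typing import Optional


def get_secret(name: str) -> Optional[str]:
    """Hypothetical helper: read a secret from Colab if available, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Google Colab
    except ImportError:
        return os.environ.get(name)  # outside Colab: None if the variable is unset
    # Inside Colab, userdata.get raises an error if the secret has not been added.
    return userdata.get(name)


# Outside Colab this returns None unless GROQ_API_KEY is set in the environment.
GROQ_API_KEY = get_secret("GROQ_API_KEY")
```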


In [3]:
from google.colab import userdata
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
GROQ_API_KEY = userdata.get('GROQ_API_KEY')

## 📄 Step 1: Create Documents
We’ll use a small knowledge base of IPL players as our documents.


In [4]:
from langchain_core.documents import Document

docs = [
    Document(
        page_content="Virat Kohli is one of the most successful batsmen in IPL history and has captained RCB.",
        metadata={"team": "Royal Challengers Bangalore"}
    ),
    Document(
        page_content="Rohit Sharma is the most successful captain in IPL history, winning five titles with Mumbai Indians.",
        metadata={"team": "Mumbai Indians"}
    ),
    Document(
        page_content="MS Dhoni has led Chennai Super Kings to multiple IPL titles and is known as Captain Cool.",
        metadata={"team": "Chennai Super Kings"}
    ),
    Document(
        page_content="Jasprit Bumrah is a leading fast bowler for Mumbai Indians, famous for his yorkers.",
        metadata={"team": "Mumbai Indians"}
    ),
    Document(
        page_content="Ravindra Jadeja is an all-rounder for Chennai Super Kings, contributing with bat, ball, and fielding.",
        metadata={"team": "Chennai Super Kings"}
    )
]
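Each `Document` pairs its text (`page_content`) with a `metadata` dict. The retriever in this notebook ignores the `team` field, but metadata is what makes filtered search possible. Once the vector store below is built, LangChain's Chroma wrapper accepts a `filter` argument on `similarity_search`, e.g. `vector_store.similarity_search(query, k=2, filter={"team": "Mumbai Indians"})`. The plain-Python sketch below only illustrates exact-match metadata filtering. It uses dicts as stand-ins for `Document` objects, and `filter_by_metadata` is a hypothetical name.

```python
from typing import Any, Dict, List

# Dicts standing in for LangChain Document objects (illustration only).
records: List[Dict[str, Any]] = [
    {"page_content": "Virat Kohli has captained RCB.",
     "metadata": {"team": "Royal Challengers Bangalore"}},
    {"page_content": "Rohit Sharma won five IPL titles as captain.",
     "metadata": {"team": "Mumbai Indians"}},
    {"page_content": "Jasprit Bumrah is famous for his yorkers.",
     "metadata": {"team": "Mumbai Indians"}},
]


def filter_by_metadata(items: List[Dict[str, Any]], **conditions: Any) -> List[Dict[str, Any]]:
    """Keep items whose metadata matches every key=value condition exactly."""
    return [
        item for item in items
        if all(item["metadata"].get(key) == value for key, value in conditions.items())
    ]


for item in filter_by_metadata(records, team="Mumbai Indians"):
    print(item["page_content"])
```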


## 🗂️ Step 2: Store Documents with Gemini Embeddings in Chroma
We’ll use **GoogleGenerativeAIEmbeddings** (model `gemini-embedding-001`) to turn each document into a vector, then store the vectors in a persistent Chroma collection.
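An embedding model maps each text to a list of floats, and texts with related meanings end up with vectors that point in similar directions. Cosine similarity is one common way to compare such vectors. The sketch below uses tiny made-up vectors; real `gemini-embedding-001` vectors have far more dimensions. Chroma's default distance function is L2 rather than cosine, so this sketch is meant for intuition only and does not describe Chroma's internals.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (unrelated / orthogonal)
print(round(cosine_similarity([1.0, 1.0], [1.0, 0.0]), 3))  # → 0.707
```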


In [5]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma  # the old `langchain.vectorstores` path is deprecated

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001", google_api_key=GEMINI_API_KEY)

# Create Chroma vector store
vector_store = Chroma(
    embedding_function=embeddings,
    persist_directory="rag_chroma_db",
    collection_name="ipl_docs"
)

# Add documents
vector_store.add_documents(docs)




['f8e28e1f-7b35-4435-aabe-828e5e2f58ea',
 'b509e1ea-8713-442d-a78c-75339ed3346b',
 'aea8106e-4f87-402e-a32c-0db2e83de00a',
 'a37ca89e-527c-49d2-9f4e-c9817e6ddcae',
 'bcd29172-fa1b-455d-b76c-d004dec348c9']

## 🔎 Step 3: Create Retriever
The retriever embeds each query and fetches the most similar documents from Chroma.
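`as_retriever(search_kwargs={"k": 2})` asks Chroma for the 2 documents nearest to the query's embedding. Conceptually this is a top-k search over similarity scores, as in the brute-force sketch below. `top_k` is a hypothetical helper that works on made-up 2-D vectors; Chroma itself uses an approximate HNSW index instead of scanning every vector. Because exactly k results come back, the second result may be only loosely related. That explains why the first query in Step 6 lists a Royal Challengers Bangalore source next to the Mumbai Indians one.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 2) -> List[int]:
    """Indices of the k document vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], doc_vecs, k=2))  # → [0, 1]
```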


In [6]:
retriever = vector_store.as_retriever(search_kwargs={"k": 2})


## 🧠 Step 4: Initialize Groq LLM
We use a Groq-hosted model, `openai/gpt-oss-20b`, for generation.


In [7]:
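from google.colab import userdata
from langchain_groq import ChatGroq

# Load the Groq API key (already read in the setup cell; re-read so this cell runs on its own)
GROQ_API_KEY = userdata.get('GROQ_API_KEY')
llm = ChatGroq(
    model="openai/gpt-oss-20b",  # open-weight model served by Groq
    api_key=GROQ_API_KEY,
    temperature=0.3,             # low temperature → more focused, less random answers
    max_tokens=200               # cap on the length of each generated reply
)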
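`temperature=0.3` keeps answers fairly focused. In the standard formulation of sampling temperature, the model's raw scores (logits) are divided by the temperature before a softmax. Values below 1 therefore concentrate probability on the highest-scoring token. The sketch below uses made-up logits and is a textbook illustration; it does not describe how Groq implements sampling. (`max_tokens=200` separately caps the length of each reply.)

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```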



## 🔗 Step 5: Create RAG Chain
Combine the retriever and the LLM into a `RetrievalQA` pipeline.
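`RetrievalQA.from_chain_type` uses the `stuff` chain type by default, which places every retrieved document into one prompt alongside the question. Below is a rough sketch of that assembly. `stuff_prompt` is a hypothetical name, and its instruction wording is an assumption; LangChain's built-in prompt differs in wording and structure.

```python
from typing import List


def stuff_prompt(question: str, contexts: List[str]) -> str:
    """Hypothetical 'stuff' assembly: all retrieved texts go into one prompt."""
    context_block = "\n\n".join(contexts)  # blank line between documents
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


retrieved = [
    "Rohit Sharma is the most successful captain in IPL history, winning five titles with Mumbai Indians.",
    "Virat Kohli is one of the most successful batsmen in IPL history and has captained RCB.",
]
print(stuff_prompt("Who is the most successful IPL captain?", retrieved))
```

The trade-off is that the prompt grows with `k` and with document length. That is fine for a five-document knowledge base like this one, but large retrievals can exceed the model's context window.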


In [8]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True  # also return the retrieved docs alongside the answer
)


## 💬 Step 6: Ask Questions
Test the RAG pipeline with cricket-related queries.


In [9]:
# Query 1
query = "Who is the most successful IPL captain?"
response = qa_chain.invoke({"query": query})

print("Query:", query)
print("Answer:", response["result"])
print("\nSources:", [doc.metadata for doc in response["source_documents"]])

# Query 2
query2 = "Which bowler is famous for yorkers?"
response2 = qa_chain.invoke({"query": query2})

print("\nQuery:", query2)
print("Answer:", response2["result"])
print("\nSources:", [doc.metadata for doc in response2["source_documents"]])


Query: Who is the most successful IPL captain?
Answer: The most successful IPL captain is **Rohit Sharma**, who has led Mumbai Indians to five championship titles.

Sources: [{'team': 'Mumbai Indians'}, {'team': 'Royal Challengers Bangalore'}]

Query: Which bowler is famous for yorkers?
Answer: The bowler famous for his yorkers is **Jasprit Bumrah**.

Sources: [{'team': 'Mumbai Indians'}, {'team': 'Mumbai Indians'}]
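The two query cells repeat the same invoke-and-print steps, and a small formatting helper keeps further experiments short. `format_response` is a hypothetical name. The demo uses `SimpleNamespace` objects as stand-ins for LangChain `Document`s so that it runs without API keys. It assumes the response shape returned by the chain above: a dict with `result` and `source_documents`.

```python
from types import SimpleNamespace
from typing import Any, Dict


def format_response(query: str, response: Dict[str, Any]) -> str:
    """Render a RetrievalQA-style response (result + source_documents) as readable text."""
    sources = [doc.metadata for doc in response["source_documents"]]
    return f"Query: {query}\nAnswer: {response['result']}\nSources: {sources}"


# Stand-in response mirroring the second query's output above.
demo = {
    "result": "The bowler famous for his yorkers is **Jasprit Bumrah**.",
    "source_documents": [SimpleNamespace(metadata={"team": "Mumbai Indians"})],
}
print(format_response("Which bowler is famous for yorkers?", demo))
```

With the real chain, the same helper can drive a loop: `for q in ["Who is the most successful IPL captain?", "Which bowler is famous for yorkers?"]: print(format_response(q, qa_chain.invoke({"query": q})))`.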


# ✅ Summary
- Used **Google Gemini Embeddings** to vectorize documents  
- Stored and retrieved docs with **Chroma**  
- Connected the retriever to a **Groq**-hosted LLM  
- Answered queries with the **RAG pipeline**  

👉 Next session: **Advanced RAG – Custom Prompts, Comparisons, and Deep Dive**
