# Build a RAG with LangChain and ChromaDB

To build this Retrieval-Augmented Generation (RAG) system, we use a "hybrid" approach: a HuggingFace embedding model runs locally on your machine to turn text into vectors, while the Gemini API is called over the internet to generate the final answer.

## Install Dependencies

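We need the LangChain integration packages for Google (Gemini), HuggingFace, and ChromaDB, plus `python-dotenv` to load the API key from a `.env` file. `sentence-transformers` is listed explicitly because `HuggingFaceEmbeddings` relies on it to run the model locally.

Activate your project virtual environment and install:
```
pip install langchain langchain-google-genai langchain-huggingface langchain-chroma sentence-transformers python-dotenv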
```
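After installing, a quick way to confirm the packages are present in the active environment is to ask the standard library's `importlib.metadata` for their versions. This is a small sketch that is not part of the original walkthrough; the package list is an assumption based on the imports used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the imports used in this notebook
REQUIRED = [
    "langchain",
    "langchain-google-genai",
    "langchain-huggingface",
    "langchain-chroma",
    "python-dotenv",
]


def check_installed(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    status = {}
    for name in packages:
        try:
            status[name] = version(name)
        except PackageNotFoundError:
            status[name] = None
    return status


if __name__ == "__main__":
    for name, ver in check_installed(REQUIRED).items():
        print(f"{name:25} {ver or 'NOT INSTALLED'}")
```

Any line reading `NOT INSTALLED` usually means the virtual environment was not active when `pip install` ran.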

## Initialize the Embedding Model

We use `all-MiniLM-L6-v2`, a popular sentence-transformers model on HuggingFace that maps text to 384-dimensional vectors. It is lightweight and runs fast on a standard CPU.

```python
import os
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize the local HuggingFace embedding model.
# The model downloads automatically on first run (~80MB) and is cached afterwards.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

print("Embedding model initialized locally.")
```

```
Embedding model initialized locally.
```
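An embedding model turns each text into a fixed-length vector, and texts with similar meaning get vectors that point in similar directions. Retrieval then comes down to comparing vectors, for example with cosine similarity. The sketch below uses made-up 3-dimensional vectors, not real model output, purely to show the calculation; in the notebook you could get a real 384-number vector with `embeddings.embed_query("What is RAG?")`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" invented for illustration only
query = [0.9, 0.1, 0.0]
doc_about_rag = [0.8, 0.2, 0.1]
doc_about_gemini = [0.1, 0.2, 0.9]

print(round(cosine_similarity(query, doc_about_rag), 3))
print(round(cosine_similarity(query, doc_about_gemini), 3))
```

The RAG document scores much closer to 1.0 than the Gemini document, so a retriever comparing these vectors would return it first.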


## Set Up Gemini and ChromaDB

Now we set up the "Generator" (Gemini) and our "Knowledge Base" (ChromaDB). You will need a Google AI Studio API key, stored in a `.env` file in your project as `GOOGLE_API_KEY=<your key>`.

```python
import os

from dotenv import load_dotenv, find_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_chroma import Chroma
from langchain_core.documents import Document

# 1. Load the API key
# Read variables from the nearest .env file into the process environment
load_dotenv(find_dotenv())
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise ValueError("GOOGLE_API_KEY not found; add it to your .env file.")

# 2. Initialize Gemini (the LLM); temperature=0 makes answers more deterministic
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=0, google_api_key=api_key)

# 3. Create a tiny knowledge base with some example data
docs = [
    Document(page_content="Gemini 2.0 was released in late 2024 with improved multimodal capabilities."),
    Document(page_content="RAG stands for Retrieval-Augmented Generation, a technique to ground LLMs in external data."),
]

# Embed the documents with our HF model and store them in an in-memory Chroma collection
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)

print("System ready.")
```

```
System ready.
```
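Conceptually, `Chroma.from_documents` embeds every document once and stores the resulting vectors; at query time the question is embedded the same way and the closest stored vectors are returned. The brute-force store below is a hypothetical stand-in for that idea: `toy_embed` is a bag-of-words counter used in place of the HuggingFace model so the code runs with the standard library alone, and Chroma itself uses an approximate nearest-neighbour index rather than sorting every item.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs, ranks them by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []

    def add_texts(self, texts):
        # Embed once at insert time, like building the collection
        for text in texts:
            self.items.append((text, self.embed(text)))

    def similarity_search(self, query, k=2):
        # Embed the query the same way, then return the k most similar texts
        qv = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(qv, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "Gemini 2.0 was released in late 2024 with improved multimodal capabilities.",
    "RAG stands for Retrieval-Augmented Generation, a technique to ground LLMs in external data.",
])
print(store.similarity_search("What is RAG?", k=1))
```

The method names mirror LangChain's vector-store interface; on the real store the equivalent call is `vectorstore.similarity_search("What is RAG?", k=1)`.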


## Create the Retrieval Chain

We use the LangChain Expression Language (LCEL) to chain the components with the `|` operator: the output of each step becomes the input of the next.
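To make the `|` idea concrete, here is a toy version of pipe composition in plain Python. The `Step` class is hypothetical and far simpler than LangChain's `Runnable`; it only shows the core mechanism, where `a | b` builds a new step that feeds `a`'s output into `b`.

```python
class Step:
    """Hypothetical minimal 'runnable': wraps a function and supports | composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b returns a new Step that passes a's output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


strip = Step(str.strip)
upper = Step(str.upper)
exclaim = Step(lambda s: s + "!")

pipeline = strip | upper | exclaim
print(pipeline.invoke("  hello rag  "))  # HELLO RAG!
```

Because every composed pipeline is itself a `Step`, pipelines can be nested or extended with further `|` calls, which is the same property that lets an LCEL chain end in `| StrOutputParser()`.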

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# 1. Define the prompt template (the "instructions")
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 2. Set up the retriever (the "search engine"): return the 2 most similar documents
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Join the retrieved Documents into plain text so the prompt receives their content
# rather than the Python repr of a list of Document objects
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# 3. Build the chain
# The dict sends the same query to both branches: the retriever (then format_docs)
# fills {context}, and RunnablePassthrough fills {question} unchanged.
# Then: fill the prompt -> send to Gemini -> extract the reply text
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 4. Run the example
query = "What is RAG?"
response = rag_chain.invoke(query)

print(f"Question: {query}")
print(f"AI Response: {response}")
```

```
Question: What is RAG?
AI Response: RAG stands for Retrieval-Augmented Generation, a technique to ground LLMs in external data.
```
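The first element of the chain is a plain dict. In LCEL a dict is turned into a `RunnableParallel`: every value receives the same input (here the query string), and the results are collected under the same keys, which then fill `{context}` and `{question}` in the template. The sketch below reproduces that data flow without LangChain or an API key, so you can inspect the exact prompt text the model would receive. The keyword-overlap retriever and the `run_parallel` helper are hypothetical stand-ins, not LangChain APIs.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""

DOCS = [
    "Gemini 2.0 was released in late 2024 with improved multimodal capabilities.",
    "RAG stands for Retrieval-Augmented Generation, a technique to ground LLMs in external data.",
]


def keyword_retriever(query, k=2):
    """Hypothetical retriever: rank docs by the number of words shared with the query."""
    words = set(query.lower().strip("?").split())
    ranked = sorted(DOCS, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:k]


def run_parallel(branches, value):
    """Hypothetical mirror of an LCEL dict step: every branch receives the same input."""
    return {key: branch(value) for key, branch in branches.items()}


branches = {
    "context": lambda q: "\n\n".join(keyword_retriever(q)),  # like retriever + formatting
    "question": lambda q: q,                                   # like RunnablePassthrough()
}

inputs = run_parallel(branches, "What is RAG?")
prompt_text = TEMPLATE.format(**inputs)
print(prompt_text)
```

Printing the filled prompt like this is a cheap way to check what the model actually sees before spending API calls on it.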
