# RAG with ChromaDB – Simple Google Colab Demo

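This notebook demonstrates the retrieval step of Retrieval Augmented Generation (RAG) using ChromaDB: it embeds a few short documents with sentence-transformers, stores them in a ChromaDB collection, and fetches the documents most similar to a query.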


In [2]:
# Install necessary libraries: chromadb for the vector database and sentence-transformers for generating embeddings.
!pip install chromadb sentence-transformers



In [3]:
documents = [
    "RAG stands for Retrieval Augmented Generation.",
    "ChromaDB is a vector database used to store embeddings.",
    "RAG reduces hallucinations by grounding answers using retrieved documents.",
    "Fine-tuning changes behavior, RAG injects knowledge at runtime.",
    "RAG retrieves relevant chunks before generation."
]
# Display the list of documents to be used.
documents

['RAG stands for Retrieval Augmented Generation.',
 'ChromaDB is a vector database used to store embeddings.',
 'RAG reduces hallucinations by grounding answers using retrieved documents.',
 'Fine-tuning changes behavior, RAG injects knowledge at runtime.',
 'RAG retrieves relevant chunks before generation.']

In [4]:
# Import the SentenceTransformer model for generating embeddings.
from sentence_transformers import SentenceTransformer
# Import the chromadb library for interacting with ChromaDB.
import chromadb

# Initialize the SentenceTransformer model with 'all-MiniLM-L6-v2' for creating embeddings.
model = SentenceTransformer('all-MiniLM-L6-v2')
# Generate embeddings for each document in the 'documents' list.
embeddings = model.encode(documents)

# Initialize an in-memory ChromaDB client (data is lost when the runtime restarts).
client = chromadb.Client()
# Get the 'rag_demo' collection, creating it on first run so the cell can be re-executed.
collection = client.get_or_create_collection('rag_demo')

# Store all documents in one batched call; IDs are the list indices as strings.
# upsert overwrites existing IDs, so re-running the cell does not duplicate entries.
collection.upsert(
    ids=[str(i) for i in range(len(documents))],
    documents=documents,
    embeddings=embeddings.tolist(),
)

# Print a confirmation message once documents are stored.
print('Documents stored in ChromaDB')

Documents stored in ChromaDB
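
At this point each document is stored alongside its embedding vector. As a rough mental model of what a similarity search over those vectors does, the sketch below ranks a few made-up 3-dimensional vectors by cosine similarity to a query vector in plain Python. The toy vectors and the `cosine_similarity` and `top_k` helpers are illustrative assumptions; ChromaDB itself relies on an approximate nearest-neighbour index and its configured distance metric rather than this brute-force loop.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec (brute force)."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy embeddings (real all-MiniLM-L6-v2 vectors have 384 dimensions).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query_vec = [0.8, 0.2, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

The returned indices play the same role as the IDs ChromaDB hands back from a query: they identify which stored documents to pass on to the generation step.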


In [5]:
# Define the query string.
query = 'How does RAG reduce hallucinations?'
# Generate the embedding for the query.
q_embed = model.encode([query])

# Perform a similarity search in the ChromaDB collection using the query embedding.
# Retrieve the top 2 most relevant results.
results = collection.query(query_embeddings=q_embed.tolist(), n_results=2)
# Extract and display the documents from the search results.
results['documents'][0]

['RAG reduces hallucinations by grounding answers using retrieved documents.',
 'RAG stands for Retrieval Augmented Generation.']
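
Retrieval is only the first half of RAG; the retrieved passages are then typically placed in the prompt sent to a language model. The sketch below shows one possible way to assemble such a prompt from the two documents returned above. The `build_prompt` helper and the prompt wording are assumptions made for illustration, not part of ChromaDB, and the call to an actual LLM is deliberately left out.

```python
def build_prompt(question, retrieved_docs):
    """Combine retrieved passages and the user question into one prompt string."""
    # Number each passage so the model (and the reader) can refer back to it.
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# The two passages returned by the query in this notebook.
retrieved_docs = [
    "RAG reduces hallucinations by grounding answers using retrieved documents.",
    "RAG stands for Retrieval Augmented Generation.",
]
prompt = build_prompt("How does RAG reduce hallucinations?", retrieved_docs)
print(prompt)
```

In the notebook itself, `results['documents'][0]` could be passed as `retrieved_docs` in place of the hard-coded list.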