# Retrieval-Augmented Generation (RAG) with Voyage AI

This notebook is a companion to the [RAG with Voyage AI](https://www.mongodb.com/docs/voyageai/tutorials/rag/) tutorial. Refer to that page for setup instructions and detailed explanations.

Retrieval-augmented generation (RAG) is an architecture that uses semantic search to augment large language models (LLMs) with additional data, enabling them to generate more accurate responses.

While semantic search retrieves relevant documents based on meaning, RAG takes this a step further by providing those retrieved documents as context to an LLM. This additional context helps the LLM generate a more accurate response to a user's query, reducing hallucinations. Voyage AI provides best-in-class embedding and reranking models to power retrieval for your RAG applications.

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/voyageai/notebooks/rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Prerequisites

To complete this tutorial, you must have the following:

- Python 3.9+
- A model API key to access Voyage AI models
- An LLM API key (Anthropic or OpenAI)
- For MongoDB storage: a MongoDB Atlas cluster and its connection string

## Install Required Packages

In [None]:
!pip install --upgrade voyageai numpy anthropic openai langchain-community langchain-text-splitters pypdf python-dotenv pymongo

## Set Environment Variables

In [None]:
import os

# Set your API keys
os.environ["VOYAGE_API_KEY"] = "<your-model-api-key>"

# Choose your LLM provider
LLM_PROVIDER = "anthropic"  # Options: "anthropic" or "openai"
os.environ["ANTHROPIC_API_KEY"] = "<your-anthropic-api-key>"
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

# If using MongoDB for vector search (optional)
os.environ["MONGODB_URI"] = "<your-mongodb-connection-string>"
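
Because the cell above assigns placeholder strings, a forgotten key only surfaces later as an authentication error. A minimal sketch of an early check follows. `find_unset_keys` is a hypothetical helper, not part of any SDK, and it assumes placeholders keep the `<...>` form used above.

```python
def find_unset_keys(env, llm_provider):
    """Return required variable names that are missing or still placeholders."""
    required = ["VOYAGE_API_KEY"]
    # Only the key for the selected LLM provider is required
    provider_keys = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}
    if llm_provider in provider_keys:
        required.append(provider_keys[llm_provider])

    unset = []
    for name in required:
        value = env.get(name, "")
        # Treat empty values and "<...>" placeholders as unset (assumed convention)
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(name)
    return unset

# A plain dict stands in for os.environ in this example
example_env = {"VOYAGE_API_KEY": "<your-model-api-key>", "ANTHROPIC_API_KEY": "sk-example"}
print(find_unset_keys(example_env, "anthropic"))
```

In the notebook, the same function could be called as `find_unset_keys(os.environ, LLM_PROVIDER)` before any client is created.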

## Initialize Clients

In [None]:
import numpy as np
from voyageai import Client as VoyageClient
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize Voyage client
voyage_client = VoyageClient(api_key=os.environ.get("VOYAGE_API_KEY"))

# Initialize LLM client based on provider
if LLM_PROVIDER == "anthropic":
    from anthropic import Anthropic
    llm_client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
    LLM_MODEL = "claude-sonnet-4-5-20250929"
elif LLM_PROVIDER == "openai":
    from openai import OpenAI
    llm_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    LLM_MODEL = "gpt-4o"
else:
    raise ValueError("Unsupported LLM provider. Please choose 'anthropic' or 'openai'.")

# Model configuration
VOYAGE_MODEL = "voyage-4-large"

print(f"Using LLM provider: {LLM_PROVIDER}")

## In-Memory Storage Implementation

> **Note:** Storing vectors in memory is suitable for prototyping and experimentation. For production applications, use a vector database like MongoDB Atlas for efficient retrieval from larger datasets.

In [None]:
# In-memory storage
documents = []
embeddings = []

## Ingest Data

Load a PDF, split it into chunks, and generate an embedding for each chunk.
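
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph and line breaks). `split_with_overlap` is a hypothetical function that only illustrates how consecutive chunks share characters.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-width chunks whose edges overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each printed chunk begins with the last three characters of the previous one. The overlap keeps a sentence that straddles a boundary partly visible in both chunks.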

In [None]:
# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/13576/pdf")
data = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=20
)
chunks = text_splitter.split_documents(data)

# Generate embeddings and store in memory
print(f"Generating embeddings for {len(chunks)} chunks...")
for chunk in chunks:
    result = voyage_client.embed(
        [chunk.page_content],
        model=VOYAGE_MODEL,
        input_type="document"
    )
    embedding = np.array(result.embeddings[0], dtype=np.float32)
    documents.append(chunk.page_content)
    embeddings.append(embedding)

print(f"Ingested {len(documents)} documents")
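
The loop above sends one request per chunk. The embed call accepts a list of texts, so chunks can be grouped to reduce the number of requests. The sketch below shows only the grouping logic. `embed_in_batches` and its `embed_fn` parameter are hypothetical, and the batch size of 128 is an assumed value that should be checked against the current Voyage AI limits on texts and tokens per request.

```python
def embed_in_batches(texts, embed_fn, batch_size=128):
    """Embed texts in groups; embed_fn takes a list of strings and returns one vector per string."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        batch_vectors = embed_fn(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors than texts")
        vectors.extend(batch_vectors)
    return vectors

# Stand-in embedding function so the sketch runs without an API key
def fake_embed(batch):
    return [[float(len(text))] for text in batch]

print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
```

In the notebook, `embed_fn` could be `lambda batch: voyage_client.embed(batch, model=VOYAGE_MODEL, input_type="document").embeddings`, with `texts` set to `[chunk.page_content for chunk in chunks]`.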

## Retrieve Documents and Generate Response

In [None]:
# Example query
query = "What are MongoDB's latest AI announcements?"
print(f"Query: {query}\n")

# Generate query embedding
query_result = voyage_client.embed(
    [query],
    model=VOYAGE_MODEL,
    input_type="query"
)
query_embedding = np.array(query_result.embeddings[0], dtype=np.float32)

# Calculate similarity scores using dot product
embeddings_array = np.array(embeddings)
similarities = np.dot(embeddings_array, query_embedding)

# Get top-5 most similar documents
top_k = 5
top_indices = np.argsort(similarities)[::-1][:top_k]

retrieved_docs = []
for idx in top_indices:
    retrieved_docs.append({
        "text": documents[idx],
        "score": float(similarities[idx])
    })

# Combine retrieved documents into context
context = "\n\n".join([doc["text"] for doc in retrieved_docs])

# Create prompt with context
prompt = f"""Based on the following information, answer the question.

Context:
{context}

Question: {query}

Answer:"""

# Generate response based on LLM provider
if LLM_PROVIDER == "anthropic":
    response = llm_client.messages.create(
        model=LLM_MODEL,
        max_tokens=1024,
        system="You are a helpful assistant that answers questions based on the provided context.",
        messages=[{"role": "user", "content": prompt}]
    )
    answer = response.content[0].text
elif LLM_PROVIDER == "openai":
    response = llm_client.chat.completions.create(
        model=LLM_MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context."},
            {"role": "user", "content": prompt}
        ]
    )
    answer = response.choices[0].message.content
else:
    answer = "Unsupported LLM provider."

print(f"Response:\n{answer}")
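
The ranking step can be isolated into a small function. This sketch uses toy 2-dimensional vectors in place of real embeddings, and `top_k_by_dot` is a hypothetical name. It also checks that dot product and cosine similarity agree when every vector has unit length. Voyage AI documents its embeddings as normalized, but treat that as an assumption to verify for your model.

```python
import numpy as np

def top_k_by_dot(doc_vectors, query_vector, k):
    """Return (indices, scores) of the k highest dot-product scores, best first."""
    scores = doc_vectors @ query_vector
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

def normalize(vectors):
    """Scale each vector to unit length."""
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)

docs = normalize(np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]], dtype=np.float32))
query = normalize(np.array([0.8, 0.6], dtype=np.float32))

indices, scores = top_k_by_dot(docs, query, k=2)
print(indices.tolist(), scores.tolist())

# Cosine similarity divides by the vector norms, which are all 1 here
cosine = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
print(np.allclose(cosine, docs @ query))
```

If embeddings were not normalized, the two measures could rank documents differently, and cosine similarity would be the safer choice.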

---

# RAG with MongoDB Vector Search

The following section demonstrates how to implement RAG with MongoDB Atlas as the vector store for production applications.

## Initialize MongoDB Connection

In [None]:
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
import time

# Initialize MongoDB client
mongo_client = MongoClient(os.environ.get("MONGODB_URI"))
rag_db = mongo_client["rag_db"]
collection = rag_db["test"]

print("Connected to MongoDB")

## Ingest Data into MongoDB

In [None]:
# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/13576/pdf")
data = loader.load()

# Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
chunks = text_splitter.split_documents(data)

# Generate embeddings and prepare documents
docs_to_insert = []
print(f"Generating embeddings for {len(chunks)} chunks...")
for doc in chunks:
    result = voyage_client.embed(
        [doc.page_content],
        model=VOYAGE_MODEL,
        input_type="document"
    )
    embedding = np.array(result.embeddings[0], dtype=np.float32)
    # Store the vector as a plain list so that it can be serialized to BSON
    docs_to_insert.append({
        "text": doc.page_content,
        "embedding": embedding.tolist()
    })

# Insert documents into the collection
if docs_to_insert:
    collection.insert_many(docs_to_insert)
    print(f"Inserted {len(docs_to_insert)} documents into MongoDB")
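
Re-running the ingestion cell inserts the same chunks again, because `insert_many` assigns a new `_id` to each document. One way to make ingestion repeatable is to derive `_id` from the chunk text. The sketch below only builds the documents. `build_docs` is a hypothetical helper, and using a SHA-256 digest as `_id` is a design choice of this example, not something the tutorial prescribes.

```python
import hashlib

def build_docs(texts, vectors):
    """Pair each text with its vector and a content-derived _id."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    docs = []
    for text, vector in zip(texts, vectors):
        docs.append({
            # Identical text always yields the same _id
            "_id": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "text": text,
            "embedding": [float(x) for x in vector],
        })
    return docs

example_docs = build_docs(["first chunk", "second chunk"], [[0.1, 0.2], [0.3, 0.4]])
print(len(example_docs), len(example_docs[0]["_id"]))
```

With stable `_id` values, the documents can be written as `ReplaceOne(..., upsert=True)` operations passed to `collection.bulk_write`, so a second run overwrites rather than duplicates. Calling `insert_many` twice with these documents would instead raise a duplicate-key error.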

## Create Vector Search Index

In [None]:
index_name = "vector_index"

# Create the index only if it does not already exist
existing_indexes = list(collection.list_search_indexes(index_name))
if existing_indexes:
    print("Vector index already exists")
else:
    print("Creating vector search index...")
    search_index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "numDimensions": 1024,  # Must match the embedding length
                    "path": "embedding",
                    "similarity": "dotProduct"
                }
            ]
        },
        name=index_name,
        type="vectorSearch"
    )
    collection.create_search_index(model=search_index_model)

# Wait for the index to become queryable.
# This also covers an existing index that is still building.
print("Waiting for index to become queryable...")
while True:
    indices = list(collection.list_search_indexes(index_name))
    if indices and indices[0].get("queryable") is True:
        print("Index is ready!")
        break
    time.sleep(5)
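
A `while True` polling loop waits indefinitely if the index never becomes queryable. A bounded variant is sketched below. `wait_until` is a hypothetical helper, and the timeout and interval values are arbitrary. The check is passed in as a function so that the sketch runs without a database.

```python
import time

def wait_until(check, timeout_seconds=300.0, interval_seconds=5.0):
    """Call check() until it returns True or the timeout elapses. Return True on success."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)

# Stand-in check that succeeds on the third call
attempts = {"count": 0}

def fake_check():
    attempts["count"] += 1
    return attempts["count"] >= 3

print(wait_until(fake_check, timeout_seconds=1.0, interval_seconds=0.01))
```

In the notebook, `check` could be `lambda: any(ix.get("queryable") for ix in collection.list_search_indexes(index_name))`, and a `False` return value could be turned into a `TimeoutError`.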

## Retrieve Documents and Generate Response

In [None]:
# Query with MongoDB
query = "What are MongoDB's latest AI announcements?"
print(f"Query: {query}\n")

# Generate query embedding
query_result = voyage_client.embed(
    [query],
    model=VOYAGE_MODEL,
    input_type="query"
)
query_embedding = np.array(query_result.embeddings[0], dtype=np.float32)

# Define the aggregation pipeline
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding.tolist(),
            "path": "embedding",
            "exact": True,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "text": 1
        }
    }
]

# Execute the query
results = collection.aggregate(pipeline)
context_docs = list(results)

# Join the retrieved chunks, separated by blank lines so chunk boundaries stay visible
context_string = "\n\n".join([doc["text"] for doc in context_docs])

# Construct prompt for the LLM
prompt = f"""Use the following pieces of context to answer the question at the end.
    {context_string}
    Question: {query}
"""

# Generate response based on LLM provider
if LLM_PROVIDER == "anthropic":
    message = llm_client.messages.create(
        model=LLM_MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    answer = message.content[0].text
elif LLM_PROVIDER == "openai":
    completion = llm_client.chat.completions.create(
        model=LLM_MODEL,
        messages=[{"role": "user", "content": prompt}]
    )
    answer = completion.choices[0].message.content
else:
    answer = "Unsupported LLM provider."

print(f"Answer: {answer}")
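
The pipeline above runs an exact (ENN) search and returns only the text. The sketch below builds a pipeline that also projects the relevance score and can switch to approximate (ANN) search, which requires `numCandidates`. `build_vector_search_pipeline` is a hypothetical helper, and the default of 20 candidates per returned result is an assumption to tune for your data.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=5, exact=True,
                                 candidates_per_result=20):
    """Build a $vectorSearch pipeline that returns each chunk's text and score."""
    stage = {
        "index": index_name,
        "queryVector": list(query_vector),
        "path": path,
        "limit": limit,
    }
    if exact:
        # Exact nearest-neighbor search scans all indexed vectors
        stage["exact"] = True
    else:
        # Approximate search needs a candidate pool larger than the limit
        stage["numCandidates"] = limit * candidates_per_result
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

pipeline_example = build_vector_search_pipeline([0.1, 0.2, 0.3], exact=False)
print(pipeline_example[0]["$vectorSearch"]["numCandidates"])
```

The result can be passed to `collection.aggregate(...)` with `query_embedding.tolist()` as the query vector. Each returned document then carries a `score` field alongside `text`.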