# Enhanced RAG Tutorial with PDF Support using LlamaIndex and KlusterAI

This notebook demonstrates a Retrieval Augmented Generation (RAG) pipeline with PDF document support using LlamaIndex.

We will use KlusterAI's OpenAI-compatible endpoints for:
1. **Embeddings**: Leveraging the `BAAI/bge-m3` model.
2. **Language Model (LLM) for Querying**: Utilizing the `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` model.

**Steps:**
1. Install necessary libraries
2. Set up API keys and model endpoints
3. Demonstrate embedding generation with a sample text
4. Load a PDF document as our knowledge base
5. Configure the LlamaIndex components (LLM and Embeddings)
6. Create an index from the document
7. Query the index and compare with non-RAG responses

## 1. Install Necessary Libraries

In [None]:
# Install the necessary packages, including the PDF reader for LlamaIndex
!pip install llama-index llama-index-llms-openai-like llama-index-embeddings-openai-like llama-index-readers-file requests

## 2. Set up API Keys and Model Endpoints

We'll use `getpass` to read your KlusterAI API key without echoing it to the screen.

In [8]:
import os
import logging
import sys
import requests
import json
from getpass import getpass
from pprint import pprint

# Optional: Enable detailed logging for LlamaIndex
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Get API key securely using getpass
KLUSTER_API_KEY = getpass("Enter your KlusterAI API Key: ")
KLUSTER_BASE_URL = "https://api.kluster.ai/v1" # KlusterAI base URL

os.environ["OPENAI_API_KEY"] = KLUSTER_API_KEY # LlamaIndex uses this env var for OpenAI-compatible APIs
os.environ["OPENAI_API_BASE"] = KLUSTER_BASE_URL

## 3. Embedding Demonstration

Let's first demonstrate how to generate embeddings directly using the KlusterAI API. This helps illustrate what embeddings look like and how they're used in RAG systems.

In [9]:
def generate_embedding(text):
    """Generate embeddings for the given text using KlusterAI API"""
    headers = {
        "Authorization": f"Bearer {KLUSTER_API_KEY}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(
        f"{KLUSTER_BASE_URL}/embeddings",
        headers=headers,
        json={
            "model": "BAAI/bge-m3",
            "input": text,
            "encoding_format": "float"
        }
    )
    
    if response.status_code != 200:
        raise Exception(f"Error generating embedding: {response.text}")
        
    return response.json()

# Generate embedding for our example text about Paris
sample_text = "The capital of France is Paris. It is known for the Eiffel Tower."
embedding_result = generate_embedding(sample_text)

# Pretty print the first 10 dimensions of the embedding vector
print(f"Sample text: '{sample_text}'")
print(f"Model used: {embedding_result['model']}")
print(f"Embedding dimensions: {len(embedding_result['data'][0]['embedding'])}")
print("\nFirst 10 dimensions of the embedding vector:")
print(embedding_result['data'][0]['embedding'][:10])

# Show token usage information
print(f"\nToken usage: {embedding_result['usage']['prompt_tokens']} tokens")

Sample text: 'The capital of France is Paris. It is known for the Eiffel Tower.'
Model used: BAAI/bge-m3
Embedding dimensions: 1024

First 10 dimensions of the embedding vector:
[0.01739501953125, 0.048370361328125, -0.006679534912109375, 0.0302734375, -0.01477813720703125, -0.00627899169921875, -0.0053253173828125, 0.004657745361328125, -0.019866943359375, 0.03375244140625]

Token usage: 17 tokens

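In a RAG system, vectors like the one above are compared to each other to decide which chunks are relevant to a query. A common comparison is cosine similarity. The snippet below is a minimal sketch of that idea; the three-dimensional vectors and their names are made up for illustration and stand in for real 1024-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not real model output).
paris = [0.9, 0.1, 0.0]
eiffel = [0.8, 0.2, 0.1]
polar_bear = [0.0, 0.2, 0.9]

# Vectors pointing in similar directions score close to 1.
print(cosine_similarity(paris, eiffel))
print(cosine_similarity(paris, polar_bear))
```

To try this with real embeddings, you could pass two vectors returned by `generate_embedding` (the `embedding` field of each result) to `cosine_similarity`.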

## 4. Loading PDF Document

We'll download a PDF about polar bears from the IUCN library and load it as the knowledge base for the RAG system.

In [None]:
import urllib.request
import os

# Create a directory for our PDFs if it doesn't exist
pdf_dir = "sample_pdfs"
os.makedirs(pdf_dir, exist_ok=True)

# Download a sample PDF about Polar Bears (you can replace with your own PDFs)
sample_pdf_url = "https://portals.iucn.org/library/sites/library/files/documents/SSC-OP-007.pdf"  
pdf_path = os.path.join(pdf_dir, "polar_bears.pdf")

if not os.path.exists(pdf_path):
    print(f"Downloading sample PDF to {pdf_path}...")
    urllib.request.urlretrieve(sample_pdf_url, pdf_path)
    print("Download complete!")
else:
    print(f"Sample PDF already exists at {pdf_path}")

Downloading sample PDF to sample_pdfs/polar_bears.pdf...
Download complete!

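A download can succeed and still leave you with something that is not a PDF, for example an HTML error page saved under a `.pdf` name. As an optional sanity check, the sketch below tests whether a file starts with the `%PDF-` header. The helper name `looks_like_pdf` is our own, and the demonstration uses two throwaway files rather than the downloaded report.

```python
import os
import tempfile

def looks_like_pdf(path):
    """Return True if the file begins with the PDF header bytes '%PDF-'."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

# Demonstration on two temporary files (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    with open(good, "wb") as f:
        f.write(b"%PDF-1.4\n")
    with open(bad, "wb") as f:
        f.write(b"<html>Not Found</html>")
    print(looks_like_pdf(good), looks_like_pdf(bad))
```

In this notebook you could call `looks_like_pdf(pdf_path)` right after the download and stop early if it returns `False`.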

In [23]:
# Import the directory reader, which picks a parser based on each file's extension
from llama_index.core import SimpleDirectoryReader

# Load the PDF; the reader typically returns one Document per page, not one per file
print(f"Loading PDF from {pdf_dir}...")
pdf_reader = SimpleDirectoryReader(input_dir=pdf_dir)
documents = pdf_reader.load_data()

print(f"Loaded {len(documents)} document(s) from PDF file")

# Print a preview of the PDF document (first 500 characters)
if documents:
    print("\nPreview of PDF content:")
    print(f"{documents[0].text[:500]}...")

Loading PDF from sample_pdfs...
Loaded 115 document(s) from PDF file

Preview of PDF content:
...


## 5. Configure LlamaIndex Components (LLM and Embeddings)

We will use `OpenAILike` for the LLM and `OpenAILikeEmbedding` for the embedding model, pointing them to the KlusterAI endpoints and specifying the desired models.

In [25]:
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index.core import Settings

# Configure the LLM (Query Model)
llm = OpenAILike(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    api_base=KLUSTER_BASE_URL,
    api_key=KLUSTER_API_KEY,
    is_chat_model=True
)

# Configure the embedding model using OpenAILikeEmbedding
embed_model = OpenAILikeEmbedding(
    model_name="BAAI/bge-m3",
    api_base=KLUSTER_BASE_URL, 
    api_key=KLUSTER_API_KEY
    # No dimensions parameter - it's not supported by KlusterAI
)

# Set the global settings for LlamaIndex
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
Settings.chunk_overlap = 20

print("LlamaIndex LLM and Kluster AI Embedding Model configured.")

LlamaIndex LLM and Kluster AI Embedding Model configured.

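The `chunk_size` and `chunk_overlap` settings control how each page is cut into pieces before embedding. To build intuition, here is a simplified sketch of a sliding window with overlap. It works on a list of words, whereas LlamaIndex's default splitter counts tokens and tries to respect sentence boundaries, so treat this as an illustration of the idea rather than the library's actual algorithm. The function name `chunk_words` is hypothetical.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list into windows of chunk_size items that share chunk_overlap items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the input
    return chunks

# Ten placeholder "words", windows of 4 with 1 shared word between neighbours.
words = [f"w{i}" for i in range(10)]
for chunk in chunk_words(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears in full, or nearly so, in at least one chunk, at the cost of embedding some text twice.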

## 6. Create an Index

Now we'll create a `VectorStoreIndex` from our PDF document.

In [26]:
from llama_index.core import VectorStoreIndex

# Create the index from the PDF document
print("Creating index from PDF document...")
index = VectorStoreIndex.from_documents(
    documents
)
print("Index created successfully!")

Creating index from PDF document...
Index created successfully!

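Conceptually, a vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below shows that lookup with a plain dictionary and cosine similarity. The chunk IDs, vectors, and the `top_k` helper are invented for illustration; the real `VectorStoreIndex` handles storage and search internally.

```python
import heapq
import math

def top_k(query_vec, chunk_vecs, k=2):
    """Return (score, chunk_id) pairs for the k chunks most similar to the query.

    Assumes every vector is non-zero and has the same length as query_vec.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    scored = ((cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items())
    return heapq.nlargest(k, scored)

# Toy "index": chunk IDs mapped to made-up 3-dimensional embeddings.
chunk_vecs = {
    "page-12": [0.9, 0.1, 0.0],
    "page-40": [0.1, 0.9, 0.1],
    "page-77": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

for score, chunk_id in top_k(query_vec, chunk_vecs, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

A linear scan like this is fine for a few hundred chunks; larger collections usually rely on a dedicated vector store.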

## 7. Query the Index and Compare with Non-RAG Responses

Now we'll compare responses using RAG (with our knowledge base) versus direct LLM responses without context.

In [27]:
# Create a query engine for RAG
query_engine = index.as_query_engine()

# Function to get a direct response from the LLM without using RAG
def get_direct_llm_response(query):
    """Get a response directly from the LLM without using RAG"""
    return llm.complete(query).text

print("Query engines prepared. Ready to compare RAG vs non-RAG responses!")

Query engines prepared. Ready to compare RAG vs non-RAG responses!

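The difference between the two paths is what the LLM actually sees. The direct call sends only the question, while the query engine first retrieves chunks and places them in the prompt ahead of the question. The sketch below shows one way such a prompt could be assembled. The template wording and the `build_rag_prompt` helper are assumptions made for illustration, not LlamaIndex's internal prompt.

```python
def build_rag_prompt(question, chunks):
    """Combine retrieved chunks and a question into a single grounded prompt."""
    # Number each chunk so the answer can refer back to its source.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Answer the question using only the context above. "
        "If the context is not sufficient, say 'I don't know'.\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder chunks standing in for text retrieved from the index.
prompt = build_rag_prompt(
    "What does the report say about CHC levels?",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(prompt)
```

Because the retrieved text travels inside the prompt, the model can answer from the document even when the topic is absent from its training data.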

In [None]:
# Query about content from the polar bear PDF
pdf_query = "Fact check this: (The NWT suggested caution regarding a proposal that polar bear hides be transportable to the U.S. on CITES permits. It was suggested that whalebone carvings and seal-skin products be considered first and then if there are no political problems, possibly consider polar bears. T)) If you dont know, say 'I don't know'."

print(f"Query: {pdf_query}\n")

print("--- RAG Response (using our knowledge base) ---")
rag_response = query_engine.query(pdf_query)
print(f"{rag_response}")

print("--- Direct LLM Response (without RAG) ---")
direct_response = get_direct_llm_response(pdf_query)
print(direct_response)

Query: Fact check this: (The NWT suggested caution regarding a proposal that polar bear hides be transportable to the U.S. on CITES permits. It was suggested that whalebone carvings and seal-skin products be considered first and then if there are no political problems, possibly consider polar bears. T)) If you dont know, say 'I don't know'.

--- RAG Response (using our knowledge base) ---
T. The given statement is true according to the provided context information. The relevant text states: "The NWT suggested caution regarding a proposal that polar bear hides be transportable to the U.S. on CITES permits. It was suggested that whalebone carvings and seal-skin products be considered first and then if there are no political problems, possibly consider polar bears. The PBAC concurred because of the potential for damaging progress made in the management of polar bears in Canada."
--- Direct LLM Response (without RAG) ---
To fact-check the given statement, we need to verify its accuracy agai

In [34]:
# Query about a specific technical detail in the paper
technical_query = "What does the Toxicology and Monitoring of Pollutant Levels in Polar Bear Tissue says about the CHC levels? IMPORTANT: If you don't know, say 'I don't know'."

print(f"Query: {technical_query}\n")

print("--- RAG Response (using our knowledge base) ---")
rag_response = query_engine.query(technical_query)
print(f"{rag_response}\n")

print("--- Direct LLM Response (without RAG) ---")
direct_response = get_direct_llm_response(technical_query)
print(direct_response)

Query: What does the Toxicology and Monitoring of Pollutant Levels in Polar Bear Tissue says about the CHC levels? IMPORTANT: If you don't know, say 'I don't know'.

--- RAG Response (using our knowledge base) ---
The levels of CHCs were generally inversely correlated to latitude, and reanalysis of polar bear fat samples showed that the level of most CHCs, especially chlordane compounds, had increased from 1969 to 1984 in Hudson Bay and Baffin Bay bears.

--- Direct LLM Response (without RAG) ---
I don't know the specific details about what the Toxicology and Monitoring of Pollutant Levels in Polar Bear Tissue says about the CHC levels. If you're looking for accurate information on this topic, I recommend consulting the original research or a reliable scientific summary.


In [35]:
# Query about authors and publication details
authors_query = "Who are the authors of the Polar Bear Paper?. IMPORTANT: If you don't know, say 'I don't know'."

print(f"Query: {authors_query}\n")

print("--- RAG Response (using our knowledge base) ---")
rag_response = query_engine.query(authors_query)
print(f"{rag_response}\n")

print("--- Direct LLM Response (without RAG) ---")
direct_response = get_direct_llm_response(authors_query)
print(direct_response)

Query: Who are the authors of the Polar Bear Paper?. IMPORTANT: If you don't know, say 'I don't know'.

--- RAG Response (using our knowledge base) ---
Steven C. Amstrup and Oystein Wiig are the compilers and editors of the Polar Bear publication, as indicated on page 3. However, the authors of specific papers or research mentioned within the publication include various individuals such as Stirling, Schweinsburg, Kolenosky, Juniper, Robertson, Luttich, Calvelt, Sjare, Taylor, Bunnell, DeMaster, and Smith. Without more specific information about the "Polar Bear Paper," it's challenging to provide a definitive list of authors. Therefore, based on the information available, the compilers and editors are Steven C. Amstrup and Oystein Wiig.

--- Direct LLM Response (without RAG) ---
I don't know.


## Conclusion

This notebook demonstrated a RAG system using LlamaIndex and KlusterAI that incorporates a PDF document as a knowledge source. We've seen:

1. **How embeddings work**: We generated an embedding with the BAAI/bge-m3 model and inspected its dimensions and first values.
2. **PDF integration**: We loaded and processed an IUCN publication on polar bears as our knowledge base.
3. **RAG vs. Direct LLM**: We compared responses from our RAG system to direct LLM outputs.

**Key observations:**
- RAG responses include specific information from the PDF that may not be in the LLM's training data
- For the queries about details in the document, the RAG responses quoted or summarized specific passages, while the direct LLM answered "I don't know" in two of the three cases
- RAG grounds the model's responses in the actual content of the document rather than in the model's pre-trained knowledge

**Next steps:**
- Try with your own PDFs or other document types
- Experiment with different chunking strategies to optimize retrieval
- Customize the query engine with metadata filters
- Implement more advanced RAG techniques like HyDE or reranking