# Dynamic RAG with ChromaDB and OpenRouter

**Copyright 2024, Denis Rothman**

This notebook demonstrates Dynamic RAG using:
- **ChromaDB**: Lightweight, ephemeral vector storage (in-memory)
- **OpenRouter**: Cloud-based LLM API (no local model downloads)
- **Llama 3.1 8B**: High-quality model for production RAG (~$0.06 per 1M tokens)
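The prices listed above are approximate and change over time, and OpenRouter may price input and output tokens differently. Assuming a single flat rate, the cost of a run is simply tokens × price ÷ 1,000,000. The sketch below shows that arithmetic; `estimate_cost_usd` and the workload figures are hypothetical.

```python
def estimate_cost_usd(total_tokens: int, usd_per_million: float = 0.06) -> float:
    """Rough cost in USD, assuming one flat per-token price (hypothetical helper)."""
    return total_tokens * usd_per_million / 1_000_000


# Hypothetical workload: 100 RAG calls of roughly 500 tokens each (prompt + answer)
workload_tokens = 100 * 500
print(f"{workload_tokens} tokens at $0.06/1M ~ ${estimate_cost_usd(workload_tokens):.4f}")
```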

## What is Dynamic RAG?

Dynamic RAG creates vector embeddings on-the-fly during each session without persistent storage. This approach is ideal for:
- Development and testing
- Temporary datasets
- Cost-effective prototyping
- Learning RAG concepts
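To make the idea concrete, here is a toy, dependency-free sketch of one dynamic RAG session: build an in-memory index, retrieve the closest document for a question, and assemble an augmented prompt. It rests on loud simplifications: word-count vectors stand in for the neural embeddings Chroma computes, and `DynamicIndex` is a hypothetical class, not a ChromaDB API. Nothing is written to disk, so the index disappears with the session.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DynamicIndex:
    """Hypothetical session-scoped index: built in memory, never persisted."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.vectors = [embed(d) for d in self.documents]  # embeddings created on the fly

    def query(self, question: str) -> str:
        """Return the single most similar document (like n_results=1)."""
        q = embed(question)
        scores = [cosine(q, v) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.documents[best]


toy_docs = [
    "glucose because plants convert energy from sunlight into glucose",
    "alpha decay because alpha particles penetrate matter the least",
]
toy_index = DynamicIndex(toy_docs)  # built for this session only
toy_context = toy_index.query("What do plants make from sunlight?")
toy_prompt = "Read the following input and write a summary for beginners. " + toy_context
print(toy_prompt)
```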

**Local Jupyter Setup:** This notebook uses `.env` files for API keys and runs entirely on your local machine.

[Reference: ChromaDB documentation](https://docs.trychroma.com/getting-started)

# Environment Setup

Required API keys in `.env` file:
```
OPENROUTER_API_KEY=sk-or-...
```
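`python-dotenv` reads `KEY=VALUE` lines from `.env` into the process environment, where `os.getenv` can find them. The sketch below mimics only the simplest case to illustrate the format; `parse_env_text` is a hypothetical helper, and the real library also handles `export` prefixes, variable expansion, multiline values, and more.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments (simplified sketch)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


env = parse_env_text("# OpenRouter credentials\nOPENROUTER_API_KEY=sk-or-example\n")
print(env)
```

Note that `load_dotenv()` does not overwrite variables already set in the environment unless you pass `override=True`.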

**Important Notes:**
- This notebook was migrated from Google Colab to local Jupyter
- Uses ChromaDB for temporary in-memory vector storage
- Uses the OpenRouter API to call Llama 3.1 8B Instruct (no local model download required)
- All file operations use UTF-8 encoding for Windows compatibility

## OpenRouter

Sign up on OpenRouter to obtain your API key:

https://openrouter.ai/

OpenRouter provides access to various open-source models including Llama without needing to download large models locally.

**Local Setup:** Store your key in a `.env` file as `OPENROUTER_API_KEY=sk-or-...`

In [1]:
# Environment Setup - Load API keys from .env file
import os
from dotenv import load_dotenv

# Load API keys from .env file
load_dotenv()

# Get OpenRouter API key
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

if not OPENROUTER_API_KEY:
    raise ValueError("OPENROUTER_API_KEY not found in .env file")

# Set environment variable
os.environ['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY

print("✓ Environment configured")
print(f"  OpenRouter API Key: {OPENROUTER_API_KEY[:10]}...")
print("  Using ChromaDB (in-memory - no persistence)")
print("  Using OpenRouter for LLM (no local model required)")

✓ Environment configured
  OpenRouter API Key: sk-or-v1-0...
  Using ChromaDB (in-memory - no persistence)
  Using OpenRouter for LLM (no local model required)


In [2]:
# Install datasets package
# Note: Run this cell only once or if package is not installed
import subprocess
import sys

try:
    import datasets
    print(f"✓ datasets already installed (version {datasets.__version__})")
except ImportError:
    print("Installing datasets...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "datasets==2.20.0"])
    print("✓ datasets installed successfully")

✓ datasets already installed (version 4.3.0)


In [3]:
# Install OpenAI package (OpenRouter is compatible with OpenAI API)
import subprocess
import sys

try:
    import openai
    print(f"✓ openai already installed (version {openai.__version__})")
except ImportError:
    print("Installing openai...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "openai"])
    print("✓ openai installed successfully")

✓ openai already installed (version 1.109.1)


## Installing Dependencies

**Note:** OpenRouter provides cloud-based access to open-source models such as Llama 3.1, eliminating the need for local model downloads and GPU requirements.

In [4]:
# No accelerate needed for OpenRouter API
print("✓ No accelerate package needed - using cloud API")

✓ No accelerate package needed - using cloud API


In [38]:
# Model Configuration for OpenRouter
#
# This notebook uses Llama 3.1 8B Instruct, a high-quality model via OpenRouter.
#
# Model: meta-llama/llama-3.1-8b-instruct
# - Type: Paid (very affordable ~$0.06 per 1M tokens)
# - Good for: Production RAG, excellent response quality
# - Context window: 128K tokens
#
# Alternative paid models:
# - "mistralai/mistral-7b-instruct" (~$0.07 per 1M tokens)
# - "google/gemini-flash-1.5" (~$0.075 per 1M tokens)
# - "anthropic/claude-3-haiku" (~$0.25 per 1M tokens, highest quality)

MODEL = "meta-llama/llama-3.1-8b-instruct"

print(f"✓ Model configured: {MODEL}")
print("  Model type: Paid (~$0.06 per 1M tokens)")
print("  IMPORTANT: Re-run cell 11 after changing this MODEL variable!")

✓ Model configured: meta-llama/llama-3.1-8b-instruct
  Model type: Paid (~$0.06 per 1M tokens)
  IMPORTANT: Re-run cell 11 after changing this MODEL variable!


## Configuring the OpenRouter Client

### Method 1: OpenAI Client (Recommended)

The OpenAI Python library provides a clean, simple way to call OpenRouter's API. This is the recommended approach for production code.

We'll initialize the client below and use it for our RAG pipeline.
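Because OpenRouter exposes an OpenAI-compatible chat-completions endpoint, the OpenAI client and raw HTTP calls exchange the same JSON shapes. This offline sketch builds a request body and pulls the answer out of a response dictionary. The helper names are hypothetical, and `sample_response` is a hand-written stand-in, not real API output.

```python
import json

OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model, prompt, api_key, max_tokens=50, temperature=0.5):
    """Return (url, headers, body) for an OpenAI-style chat-completions call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return OPENROUTER_API_URL, headers, body


def extract_answer(response: dict) -> str:
    """Read choices[0].message.content, failing loudly if the shape is unexpected."""
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {json.dumps(response)[:200]}") from exc


req_url, req_headers, req_body = build_chat_request(
    "meta-llama/llama-3.1-8b-instruct", "What is 2+2?", "sk-or-placeholder"
)
sample_response = {"choices": [{"message": {"role": "assistant", "content": "2+2 is 4."}}]}
print(json.dumps(req_body, indent=2))
print(extract_answer(sample_response))
```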

### Method 2: Manual HTTP Requests (For Learning)

The code below shows how to call OpenRouter using manual HTTP requests with the `requests` library. This is kept for educational purposes but is **not used** in this notebook.

```python
# Manual API call method (for reference/learning only - not used in this notebook)
import os
import requests

OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
# MODEL is defined in the model configuration cell above

def call_openrouter_manual(prompt, max_tokens=50):
    """Manual way to call the OpenRouter API using the requests library."""
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json",
        # Optional attribution headers that identify the calling app to OpenRouter
        "HTTP-Referer": "https://github.com/PacktPublishing/RAG-Driven-Generative-AI",
        "X-Title": "Dynamic RAG Notebook"
    }
    
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.5
    }
    
    try:
        # The timeout keeps a stalled request from blocking the kernel indefinitely
        response = requests.post(OPENROUTER_API_URL, headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        result = response.json()
        return result['choices'][0]['message']['content']
    except (requests.RequestException, ValueError, KeyError, IndexError) as e:
        # Network/HTTP errors, invalid JSON, or an unexpected response shape
        return f"Error: {e}"
```

**Note:** The notebook uses the cleaner OpenAI client method instead (see the client initialization cell below).

In [39]:
from openai import OpenAI
import os

# Configure OpenAI client for OpenRouter (named openai_client to avoid conflict with ChromaDB client)
openai_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY")
)

# Dynamically print current MODEL value
print(f"✓ OpenRouter client initialized")
print(f"✓ Using model: {MODEL}")  # Shows the current MODEL value from the model configuration cell
print("✓ No local model download required")

✓ OpenRouter client initialized
✓ Using model: meta-llama/llama-3.1-8b-instruct
✓ No local model download required


In [40]:
# Test the model with a simple request
test_prompt = "What is 2+2? Answer in one sentence."

print(f"Testing model: {MODEL}")
print(f"Question: {test_prompt}\n")

try:
    response = openai_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": test_prompt}],
        max_tokens=50
    )
    
    answer = response.choices[0].message.content
    print(f"✓ Model Response: {answer}")
    print(f"✓ Model working successfully")
except Exception as e:
    error_str = str(e)
    print(f"❌ Error: {e}")
    
    # Provide helpful guidance based on error type
    if "429" in error_str or "rate-limited" in error_str.lower():
        print("\n⚠ Model is rate-limited. Try one of these alternatives:")
        print("   1. MODEL = 'mistralai/mistral-7b-instruct:free'")
        print("   2. MODEL = 'google/gemma-2-9b-it:free'")
        print("   3. MODEL = 'mistralai/mistral-7b-instruct' (paid but very cheap)")
        print("\n   Then re-run this cell and cell 11")
    elif "404" in error_str or "not found" in error_str.lower():
        print("\n⚠ Model not found. Check available models at:")
        print("   https://openrouter.ai/models")
    else:
        print("\n⚠ Check your OPENROUTER_API_KEY in .env file")

Testing model: meta-llama/llama-3.1-8b-instruct
Question: What is 2+2? Answer in one sentence.

✓ Model Response: The result of 2+2 is 4.
✓ Model working successfully


### Installing spaCy Model

**Option 1: Install using uv (Recommended for this project)**

This project uses `uv` for package management. Open your terminal and run:

```bash
uv pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl
```

**Option 2: Install using pip (Standard method)**

If you're not using `uv`, use pip instead:

```bash
# For Windows
python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl

# For Mac/Linux
python3 -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl
```

**Option 3: Install in Notebook (Automatic)**

Run the code cell below. It will attempt to install automatically.

**Note:** If automatic installation fails, use Option 1 or 2 in your terminal.

**Note:** You may need to restart the Jupyter kernel after installing spaCy models. Try continuing without restarting first, and only restart if you encounter import errors.

In [8]:
# Install spaCy language model
# Note: Run this cell only once or if model is not installed
import subprocess
import sys

model_installed = False

# First, check if spacy is available
try:
    import spacy
    spacy_available = True
except ImportError:
    print("❌ spaCy not installed. Please install it first:")
    print("   uv add spacy")
    print("   OR")
    print("   pip install spacy")
    spacy_available = False

# If spacy is available, try to load the model
if spacy_available:
    try:
        nlp = spacy.load('en_core_web_md')
        print("✓ en_core_web_md already installed and working")
        model_installed = True
    except (OSError, ModuleNotFoundError) as e:
        print(f"en_core_web_md not found or incompatible: {e}")
        print("\n⚠ Please install manually using one of these methods:")
        print("\nMethod 1 (uv - Recommended for this project):")
        print("  uv pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl")
        print("\nMethod 2 (pip):")
        print("  python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl")
        print("\n⚠ After installation, RESTART THE JUPYTER KERNEL and run this cell again.")
        model_installed = False

if not model_installed and spacy_available:
    print("\n" + "="*70)
    print("IMPORTANT: Restart the Jupyter kernel after installing the model!")
    print("="*70)

✓ en_core_web_md already installed and working


---

# Part 1: Data Preparation

## Session Time Tracking


The session timer measures one complete run of the dynamic RAG process, which in this notebook's scenario prepares a daily meeting.

It is recommended to use a GPU if one is available.

The timer does not include environment installation time, since this program can run on a local machine where everything is already installed.
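The notebook times each stage with pairs of `time.time()` calls. A small context manager captures the same start/stop pattern in one place; this is an optional alternative, not how the notebook is written, and `timed` is a hypothetical helper. It uses `time.perf_counter()`, a monotonic clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the elapsed seconds of the enclosed block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("toy workload", timings):
    toy_total = sum(i * i for i in range(100_000))
print(f"toy workload: {timings['toy workload']:.4f} seconds")
```

The notebook's `session_start_time` approach, by contrast, measures wall-clock time across cells, so it also counts any pause between running them.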

In [9]:
import time
# Start the session timer
session_start_time = time.time()

## Downloading and Preparing the SciQ Dataset

We'll use the SciQ dataset from Hugging Face. After filtering out entries with an empty support passage or answer, its training split contains 10,481 science questions with support text.
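The cells that follow keep only rows with a non-empty `support` and `correct_answer`, then join the two into a `completion` string of the form `<answer> because <support>`. Here is the same transformation on plain dictionaries, using hypothetical sample records rather than real SciQ rows:

```python
records = [
    {"question": "Sample question 1", "correct_answer": "answer A", "support": "Supporting passage A."},
    {"question": "Sample question 2", "correct_answer": "answer B", "support": ""},
]

# Keep only records that have both a support passage and a correct answer
kept = [r for r in records if r["support"] != "" and r["correct_answer"] != ""]

# Build the text that will be embedded: "<answer> because <support>"
completions = [f'{r["correct_answer"]} because {r["support"]}' for r in kept]
print(len(kept), completions[0])
```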

In [10]:
# Import required libraries
from datasets import load_dataset
import pandas as pd

# Load the SciQ dataset from HuggingFace
dataset = load_dataset("sciq", split="train")

# Filter the dataset to include only questions with support and correct answer
filtered_dataset = dataset.filter(lambda x: x["support"] != "" and x["correct_answer"] != "")


# Print the number of questions with support
print("Number of questions with support: ", len(filtered_dataset))

Number of questions with support:  10481


In [11]:
# Convert the filtered dataset to a pandas DataFrame
df = pd.DataFrame(filtered_dataset)

# Columns to drop
columns_to_drop = ['distractor3', 'distractor1', 'distractor2']

# Dropping the columns from the DataFrame
df.drop(columns=columns_to_drop, inplace=True)

# Create a new column 'completion' by merging 'correct_answer' and 'support'
df['completion'] = df['correct_answer'] + " because " + df['support']

# Ensure no NaN values are in the 'completion' column
df.dropna(subset=['completion'], inplace=True)
df

| | question | correct_answer | support | completion |
|---|---|---|---|---|
| 0 | What type of organism is commonly used in prep... | mesophilic organisms | Mesophiles grow best in moderate temperature, ... | mesophilic organisms because Mesophiles grow b... |
| 1 | What phenomenon makes global winds blow northe... | coriolis effect | Without Coriolis Effect the global winds would... | coriolis effect because Without Coriolis Effec... |
| 2 | Changes from a less-ordered state to a more-or... | exothermic | Summary Changes of state are examples of phase... | exothermic because Summary Changes of state ar... |
| 3 | What is the least dangerous radioactive decay? | alpha decay | All radioactive decay is dangerous to living t... | alpha decay because All radioactive decay is d... |
| 4 | Kilauea in hawaii is the world’s most continuo... | smoke and ash | Example 3.5 Calculating Projectile Motion: Hot... | smoke and ash because Example 3.5 Calculating ... |
| ... | ... | ... | ... | ... |
| 10476 | The enzyme pepsin plays an important role in t... | peptides | Protein A large part of protein digestion take... | peptides because Protein A large part of prote... |
| 10477 | What remains a constant of radioactive substan... | rate of decay | The rate of decay of a radioactive substance i... | rate of decay because The rate of decay of a r... |
| 10478 | Terrestrial ecosystems, also known for their d... | biomes | Terrestrial ecosystems, also known for their d... | biomes because Terrestrial ecosystems, also kn... |
| 10479 | High explosives create shock waves that exceed... | supersonic | The modern day formulation of gun powder is ca... | supersonic because The modern day formulation ... |


In [12]:
df.shape

(10481, 4)

In [13]:
# Assuming 'df' is your DataFrame
print(df.columns)

Index(['question', 'correct_answer', 'support', 'completion'], dtype='object')


---

# Part 2: Vector Storage with ChromaDB

## Embedding and Storing Data


## Creating the Chroma collection

In [14]:
# Import Chroma and instantiate a client
import chromadb

# Create client
client = chromadb.Client()
print("✓ ChromaDB client initialized (in-memory, ephemeral)")

✓ ChromaDB client initialized (in-memory, ephemeral)


In [15]:
collection_name = "sciq_supports6"
print(f"Collection name: {collection_name}")

Collection name: sciq_supports6


In [16]:
# Ensure we start fresh - delete collection if it exists from previous run
try:
    client.delete_collection(collection_name)
    print(f"✓ Deleted existing collection '{collection_name}' from previous run")
except Exception:
    print("✓ No existing collection to delete")

# List all collections to verify
collections = client.list_collections()
collection_exists = any(collection.name == collection_name for collection in collections)
print(f"Collection exists: {collection_exists}")
print("✓ Ready to create new collection")

✓ No existing collection to delete
Collection exists: False
✓ Ready to create new collection


In [17]:
# Create a new Chroma collection to store the supporting evidence
# No embedding function is passed, so Chroma's default (all-MiniLM-L6-v2) is used
collection = client.create_collection(collection_name)
print(f"✓ Created new collection: {collection_name}")

✓ Created new collection: sciq_supports6


In [18]:
# collection.get() returns a dictionary; iterating over it yields its keys
results = collection.get()
for key in results:
    print(key)  # Prints one key of the result dictionary per line

ids
embeddings
documents
uris
included
data
metadatas


## Selecting a model

In [None]:
model_name = "all-MiniLM-L6-v2"  # Chroma's default embedding model; recorded for reference, not passed to Chroma

## Embedding and storing the completions


In [20]:
ldf = len(df)

In [21]:
nb = ldf  # number of questions to embed and store
import time
start_time = time.time()  # Start timing before the request

# Convert Series to list of strings
completion_list = df["completion"][:nb].astype(str).tolist()

# ChromaDB has a max batch size, so we need to process in batches
batch_size = 5000  # Safe batch size (max is 5461)
total_batches = (nb + batch_size - 1) // batch_size  # Calculate number of batches

print(f"Adding {nb} documents in {total_batches} batches...")

for batch_idx in range(total_batches):
    start_idx = batch_idx * batch_size
    end_idx = min((batch_idx + 1) * batch_size, nb)
    
    # Embed and store this batch
    collection.add(
        ids=[str(i) for i in range(start_idx, end_idx)],
        documents=completion_list[start_idx:end_idx],
        metadatas=[{"type": "completion"} for _ in range(start_idx, end_idx)],
    )
    
    print(f"  Batch {batch_idx + 1}/{total_batches}: Added documents {start_idx}-{end_idx}")

print("✓ All documents added successfully")

response_time = time.time() - start_time  # Measure response time
print(f"Response Time: {response_time:.2f} seconds")  # Print response time

Adding 10481 documents in 3 batches...
  Batch 1/3: Added documents 0-5000
  Batch 2/3: Added documents 5000-10000
  Batch 3/3: Added documents 10000-10481
✓ All documents added successfully
Response Time: 944.21 seconds
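The loop above splits the 10,481 documents into `ceil(n / batch_size)` slices so that no single `collection.add` call exceeds Chroma's batch limit. The index arithmetic can be checked in isolation; `batch_ranges` is a hypothetical helper that reproduces the ranges printed above.

```python
def batch_ranges(n: int, batch_size: int):
    """Yield (start, end) pairs covering range(n) in chunks of at most batch_size; end is exclusive."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n, batch_size):
        yield start, min(start + batch_size, n)


add_ranges = list(batch_ranges(10481, 5000))
print(add_ranges)  # [(0, 5000), (5000, 10000), (10000, 10481)]
```

The same arithmetic covers the query loop later on, which uses `batch_size = 1000` and therefore makes 11 calls.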


## Displaying the embeddings and the completions

In [None]:
# Fetch the collection with embeddings included
result = collection.get(include=['embeddings'])

# Extract the first embedding from the result
first_embedding = result['embeddings'][0]

# If you need to work with the length or manipulate the first embedding:
embedding_length = len(first_embedding) # for model: "all-MiniLM-L6-v2"

print("First embedding:", first_embedding)
print("Embedding length:", embedding_length)

First embedding: [ 3.68907079e-02 -5.88156618e-02 -4.81813326e-02  6.92331642e-02
  1.66964978e-02 -4.07537222e-02  1.88399665e-02  1.81023628e-02
  1.78051423e-02  7.78705478e-02  2.52816640e-02 -1.57923087e-01
 -2.36181635e-02  9.52994600e-02 -5.83179388e-03 -9.35172942e-03
  8.79396722e-02 -2.97825877e-02 -3.17596346e-02  3.58476944e-04
  4.81602177e-02  3.59455980e-02 -6.36885539e-02 -3.58013026e-02
  8.47947598e-03 -4.70491946e-02 -1.44115845e-02  1.53261637e-02
 -1.74492616e-02  3.77150923e-02 -5.39003126e-02  1.29380950e-03
  1.40758231e-01 -1.21125570e-02  1.60011258e-02  2.58895960e-02
  9.29332245e-03 -1.31458566e-01  4.73491177e-02  5.54820485e-02
 -2.50272304e-02  4.49109487e-02  6.07553348e-02 -1.31188298e-03
 -2.81656906e-02  1.87065490e-02 -5.63845932e-02  7.59200156e-02
 -7.12970924e-03 -6.82346597e-02 -9.04978346e-03  5.66561222e-02
 -1.45302843e-02  5.78948557e-02 -6.67471290e-02  2.99725756e-02
 -5.11366464e-02 -2.36395728e-02 -6.88513648e-03 -9.38077550e-03
  5.5031

In [23]:
# Fetch the collection with documents included
result = collection.get(include=['documents'])

# Extract the first document from the result
first_doc = result['documents'][0]

print("First document:", first_doc)

First document: mesophilic organisms because Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.


---

# Part 3: Retrieval and Similarity Evaluation

## Querying the Collection

In [24]:
import time
start_time = time.time()  # Start timing before the request

# Convert questions to list of strings
question_list = df["question"][:nb].astype(str).tolist()

# Query in batches to avoid memory issues with large queries
batch_size = 1000  # Query batch size
results = {"documents": [], "distances": [], "ids": []}

print(f"Querying {nb} questions in batches...")
for batch_idx in range(0, nb, batch_size):
    end_idx = min(batch_idx + batch_size, nb)
    batch_results = collection.query(
        query_texts=question_list[batch_idx:end_idx],
        n_results=1
    )
    
    # Append results
    results["documents"].extend(batch_results["documents"])
    results["distances"].extend(batch_results["distances"])
    results["ids"].extend(batch_results["ids"])
    
    if (batch_idx // batch_size + 1) % 5 == 0:
        print(f"  Processed {end_idx}/{nb} queries...")

print(f"✓ All queries completed")

response_time = time.time() - start_time  # Measure response time
print(f"Response Time: {response_time:.2f} seconds")  # Print response time

Querying 10481 questions in batches...
  Processed 5000/10481 queries...
  Processed 10000/10481 queries...
✓ All queries completed
Response Time: 747.93 seconds


## Creating a Similarity Measurement Function

This function uses spaCy's medium English model (`en_core_web_md`) to compute the cosine similarity between the document vectors of two texts.
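For a model with word vectors such as `en_core_web_md`, `doc.vector` defaults to the average of the token vectors, so the score compares two averaged vectors. The arithmetic is sketched below with tiny made-up 2-D vectors standing in for spaCy's 300-dimensional ones; the helper names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average equal-length vectors component-wise (what Doc.vector does by default)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine similarity with the same zero-vector guard as simple_text_similarity."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


doc_a = mean_vector([[1.0, 0.0], [0.0, 1.0]])  # -> [0.5, 0.5]
doc_b = mean_vector([[2.0, 2.0]])              # -> [2.0, 2.0]
print(cosine_similarity(doc_a, doc_b))          # same direction, so about 1.0
print(cosine_similarity(doc_a, [0.0, 0.0]))     # 0.0, e.g. a text with no known words
```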

In [25]:
import spacy
import numpy as np

# Load the pre-trained spaCy language model
nlp = spacy.load('en_core_web_md')  # Ensure that you've installed this model with 'python -m spacy download en_core_web_md'

def simple_text_similarity(text1, text2):
    # Convert the texts into spaCy document objects
    doc1 = nlp(text1)
    doc2 = nlp(text2)

    # Get the vectors for each document
    vector1 = doc1.vector
    vector2 = doc2.vector

    # Compute the cosine similarity between the two vectors
    # Check for zero vectors to avoid division by zero
    if np.linalg.norm(vector1) == 0 or np.linalg.norm(vector2) == 0:
        return 0.0  # Return zero if one of the texts does not have a vector representation
    else:
        similarity = np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))
        return similarity

## Evaluating Retrieval Quality

The cell below displays each query question with its retrieved document, the original completion, and a simple text similarity score between the two.
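The accuracy reported at the end is not an average similarity: it is the fraction of questions whose score is strictly above 0.7. The cell also prints only the first and last `nbqd` entries. Both rules are isolated below with hypothetical scores; the helper names are assumptions, not part of the notebook.

```python
def threshold_accuracy(scores, threshold=0.7):
    """Fraction of scores strictly above the threshold (mirrors acc_counter / nb)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


def should_display(counter, total, window):
    """True for the first and last `window` items; counter is 1-based like display_counter."""
    return counter <= window or counter > total - window


toy_scores = [0.73, 0.55, 0.91, 0.70]  # hypothetical similarity scores
print(threshold_accuracy(toy_scores))  # 0.5, because 0.70 is not strictly above 0.7
print([c for c in range(1, 11) if should_display(c, total=10, window=2)])  # [1, 2, 9, 10]
```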

In [26]:
nbqd = 100  # Entries shown at the start and at the end (assumes more than 2 * nbqd records)

# Print the question, the original completion, the retrieved document, and compare them
acc_counter = 0
display_counter = 0

print(f"Evaluating {nb} documents...")
print("=" * 70)

for i, q in enumerate(df['question'][:nb]):
    original_completion = df['completion'].iloc[i]  # Positional access, aligned with enumerate()
    retrieved_document = results['documents'][i][0]  # Retrieve the corresponding document
    similarity_score = simple_text_similarity(original_completion, retrieved_document)
    
    if similarity_score > 0.7:
        acc_counter += 1
    
    display_counter += 1
    
    # Display progress every 1000 documents
    if display_counter % 1000 == 0:
        print(f"Processed {display_counter}/{nb} documents...")
    
    # Only display first 100 and last 100
    if display_counter <= nbqd or display_counter > nb - nbqd:
        print(f"{i}  Question: {q}")
        print(f"Retrieved document: {retrieved_document}")
        print(f"Original completion: {original_completion}")
        print(f"Similarity Score: {similarity_score:.2f}")
        print()  # Blank line for better readability between entries

print("=" * 70)
if nb > 0:
    acc = acc_counter / nb
    print(f"Number of documents: {nb}")
    print(f"Accuracy (share of scores above 0.7): {acc:.2f}")
    print(f"Documents with similarity > 0.7: {acc_counter}/{nb} ({acc*100:.1f}%)")
print("✓ Evaluation complete")

Evaluating 10481 documents...
0  Question: What type of organism is commonly used in preparation of foods such as cheese and yogurt?
Retrieved document: lactic acid because Bacteria can be used to make cheese from milk. The bacteria turn the milk sugars into lactic acid. The acid is what causes the milk to curdle to form cheese. Bacteria are also involved in producing other foods. Yogurt is made by using bacteria to ferment milk ( Figure below ). Fermenting cabbage with bacteria produces sauerkraut.
Original completion: mesophilic organisms because Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.
Similarity Score: 0.73

1  Question: What phenomenon

---

# Part 4: Retrieval Augmented Generation

## Testing Prompt and Retrieval

### Example Query

**Question:** Millions of years ago, plants used energy from the sun to form what?

**Retrieved support:** glucose because Cellular respiration and photosynthesis are direct opposite reactions. Energy from the sun enters a plant and is converted into glucose during photosynthesis. Some of the energy is used to make ATP in the mitochondria during cellular respiration, and some is lost to the environment as heat.
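`collection.query` returns one inner list per query text, which is why the retrieval code reads `results['documents'][0][0]` (first query, best match). The sketch below mimics that result shape with a brute-force nearest-neighbour search over made-up 2-D vectors. It assumes squared L2 distance, Chroma's default `hnsw:space` setting (`l2`); Chroma itself searches an approximate HNSW index, and `toy_query` is a hypothetical function.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def toy_query(query_vectors, ids, documents, vectors, n_results=1):
    """Return a dict shaped like Chroma's query result: one inner list per query."""
    out = {"ids": [], "documents": [], "distances": []}
    for q in query_vectors:
        ranked = sorted(range(len(vectors)), key=lambda i: squared_l2(q, vectors[i]))[:n_results]
        out["ids"].append([ids[i] for i in ranked])
        out["documents"].append([documents[i] for i in ranked])
        out["distances"].append([squared_l2(q, vectors[i]) for i in ranked])
    return out


toy_ids = ["0", "1", "2"]
toy_documents = ["doc about glucose", "doc about fossil fuels", "doc about decay"]
toy_vectors = [[0.9, 0.1], [0.7, 0.7], [0.0, 1.0]]

toy_results = toy_query([[0.8, 0.6]], toy_ids, toy_documents, toy_vectors, n_results=1)
print(toy_results["documents"][0][0], toy_results["distances"][0][0])
```

With several query texts, as in the batched evaluation loop, each question gets its own inner list, so `results['documents'][i][0]` is the top match for question `i`.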

In [41]:
# initial question
prompt = "Millions of years ago, plants used energy from the sun to form what?"
# variant 1 similar
#prompt = "Eons ago, plants used energy from the sun to form what?"
# variant 2 divergent
#prompt = "Eons ago, plants used sun energy to form what?"

In [42]:
import time
import textwrap

# Start timing before the request
start_time = time.time()

# Query the collection using the prompt
results = collection.query(
    query_texts=[prompt],  # Use the prompt in a list as expected by the query method
    n_results=1  # Number of results to retrieve
)

# Measure response time
response_time = time.time() - start_time

# Print response time
print(f"Response Time: {response_time:.2f} seconds\n")

# Check if documents are retrieved
if results['documents'] and len(results['documents'][0]) > 0:
    # Use textwrap to format the output for better readability
    wrapped_question = textwrap.fill(prompt, width=70)  # Wrap text at 70 characters
    wrapped_document = textwrap.fill(results['documents'][0][0], width=70)

    # Print formatted results
    print(f"Question: {wrapped_question}")
    print("\n")
    print(f"Retrieved document: {wrapped_document}")
    print()
else:
    print("No documents retrieved.")


Response Time: 2.34 seconds

Question: Millions of years ago, plants used energy from the sun to form what?


Retrieved document: chloroplasts because When ancient plants underwent photosynthesis,
they changed energy in sunlight to stored chemical energy in food. The
plants used the food and so did the organisms that ate the plants.
After the plants and other organisms died, their remains gradually
changed to fossil fuels as they were covered and compressed by layers
of sediments. Petroleum and natural gas formed from ocean organisms
and are found together. Coal formed from giant tree ferns and other
swamp plants.



---

# Part 4 (continued): RAG Generation with OpenRouter

This section demonstrates how to use the retrieved context with OpenRouter's LLM API to generate responses.
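The augmentation step itself is plain string assembly: an instruction followed by the retrieved text. The notebook concatenates `iprompt` with the top document. The slightly more general sketch below joins several retrieved documents and caps the context length; `build_augmented_prompt`, its separators, and the character budget are hypothetical choices, not part of the notebook.

```python
def build_augmented_prompt(instruction, documents, max_context_chars=4000):
    """Join the instruction and non-empty retrieved documents, truncating the context to a character budget."""
    context = "\n\n".join(doc.strip() for doc in documents if doc.strip())
    if len(context) > max_context_chars:
        context = context[:max_context_chars]
    return f"{instruction}\n\nContext:\n{context}"


augmented = build_augmented_prompt(
    "Read the following input and write a summary for beginners.",
    ["chloroplasts because When ancient plants underwent photosynthesis, ...", "   "],
)
print(augmented)
```

Characters are only a rough proxy for tokens; a budget measured in tokens would track the model's context window more closely.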

## LLaMA2 Function - OpenAI Client Method (Recommended)

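This uses the OpenAI Python library for clean, production-ready code. The function keeps the name `LLaMA2` from the original pipeline, but it calls whichever model `MODEL` names (Llama 3.1 8B Instruct in this run).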

In [47]:
def LLaMA2(prompt):
    """
    Call LLM via OpenRouter using OpenAI client (RECOMMENDED METHOD).
    
    This uses the OpenAI Python library which provides a cleaner API
    and is recommended for production code.
    
    Args:
        prompt: The text prompt to send to the model
        
    Returns:
        List containing dict with 'generated_text' key (format matches original pipeline)
    """
    try:
        response = openai_client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1000,
            temperature=0.5
        )
        
        generated_text = response.choices[0].message.content
        
        # Return in format similar to original pipeline
        return [{'generated_text': generated_text}]
    except Exception as e:
        print(f"Error calling OpenRouter API: {e}")
        return [{'generated_text': f"Error: {str(e)}"}]

print("✓ LLaMA2 function defined (using OpenAI client)")

✓ LLaMA2 function defined (using OpenAI client)


## Alternative: Manual Requests Method (For Learning)

The code below shows how to call the API using manual HTTP requests. It sits in a Markdown cell, so it never runs; it is kept for educational reference.

```python
# Manual method - for learning purposes only (copy into a code cell to run it)
import os
import requests

OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"

def LLaMA2_manual(prompt):
    """
    Call an LLM via the OpenRouter API using manual HTTP requests.
    
    Builds the headers and payload by hand to show how the API works
    under the hood. MODEL comes from the model configuration cell.
    """
    headers = {
        "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
        "Content-Type": "application/json",
        "HTTP-Referer": "https://github.com/PacktPublishing/RAG-Driven-Generative-AI",
        "X-Title": "Dynamic RAG Notebook"
    }
    
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "temperature": 0.5,
        "top_p": 0.9
    }
    
    try:
        # The timeout keeps a stalled request from blocking the kernel indefinitely
        response = requests.post(OPENROUTER_API_URL, headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        result = response.json()
        generated_text = result['choices'][0]['message']['content']
        # Same list-of-dict format as the LLaMA2() function above
        return [{'generated_text': generated_text}]
    except (requests.RequestException, ValueError, KeyError, IndexError) as e:
        print(f"Error: {e}")
        return [{'generated_text': f"Error: {e}"}]
```

**Note:** The notebook uses the OpenAI client method above. Copy this code into a code cell only if you want to experiment with manual API calls.

In [48]:
# Create the augmented prompt with retrieved context
iprompt = 'Read the following input and write a summary for beginners.'
lprompt = iprompt + " " + results['documents'][0][0]

print("Context retrieved from ChromaDB:")
print("=" * 70)
print(results['documents'][0][0][:200] + "...")
print("=" * 70)

Context retrieved from ChromaDB:
chloroplasts because When ancient plants underwent photosynthesis, they changed energy in sunlight to stored chemical energy in food. The plants used the food and so did the organisms that ate the pla...


In [49]:
import time
start_time = time.time()  # Start timing before the request

# Call LLM with augmented prompt
response = LLaMA2(lprompt)

# Extract generated text
for seq in response:
    generated_part = seq['generated_text']

response_time = time.time() - start_time  # Measure response time
print(f"\nResponse Time: {response_time:.2f} seconds")  # Print response time


Response Time: 1.96 seconds


In [50]:
# Display the generated summary
import textwrap

print("\n" + "=" * 70)
print("GENERATED SUMMARY:")
print("=" * 70)
wrapped_response = textwrap.fill(generated_part.strip(), width=70)
print(wrapped_response)
print("=" * 70)


GENERATED SUMMARY:
Here's a summary of the text for beginners:  **How Fossil Fuels
Formed**  A long time ago, plants used sunlight to make food through a
process called photosynthesis. When these plants and other living
things died, they were covered by layers of dirt and water. Over time,
these remains were squished together and changed into fossil fuels
like petroleum, natural gas, and coal.


---

# Summary

## Total Session Time

This measures the complete dynamic RAG process (does not include environment installation time).

In [52]:
end_time = time.time() - session_start_time  # Measure total session time
print("=" * 70)
print(f"Total Session Time: {end_time:.2f} seconds ({end_time/60:.2f} minutes)")
print("=" * 70)
print("\n✓ Dynamic RAG session complete")
print("\nSession Summary:")
print(f"  - Documents processed: {nb}")
print(f"  - Model used: {MODEL}")
print(f"  - Vector store: ChromaDB (ephemeral)")
print(f"  - Total time: {end_time/60:.2f} minutes")

Total Session Time: 2617.13 seconds (43.62 minutes)

✓ Dynamic RAG session complete

Session Summary:
  - Documents processed: 10481
  - Model used: meta-llama/llama-3.1-8b-instruct
  - Vector store: ChromaDB (ephemeral)
  - Total time: 43.62 minutes


---

# Part 5: Cleanup

## Deleting the Collection

Set `delete_collection=True` when the session is over to clean up memory.

In [None]:
# Verify collection status
collections = client.list_collections()
collection_exists = any(collection.name == collection_name for collection in collections)

print(f"Collection '{collection_name}' exists: {collection_exists}")
if collection_exists:
    print(f"Collection contains {collection.count()} documents")

Collection 'sciq_supports6' exists: True
Collection contains 10481 documents


In [56]:
delete_collection = False  # Set to True to delete collection

if delete_collection:
    client.delete_collection(collection_name)
    print(f"✓ Collection '{collection_name}' deleted")
else:
    print("Collection retained in memory (set delete_collection=True to clean up)")

Collection retained in memory (set delete_collection=True to clean up)
