# SimpleVecDB RAG with Ollama - Multi-Collection Demo

This notebook demonstrates retrieval-augmented generation (RAG) using SimpleVecDB with Ollama.
It also shows how to use multiple collections in a single database to organize different knowledge domains.

## 1. Install Dependencies

Install the Ollama Python client if not already installed.

In [None]:
# Install ollama if needed
# !pip install ollama -q

## 2. Import Libraries

In [None]:
import os

import ollama
from simplevecdb import VectorDB, Quantization

## 3. Initialize Vector Database

Create a SimpleVecDB instance with BIT quantization for minimal storage overhead.
The database supports multiple collections - each collection is a separate namespace for documents.
Collections automatically use the embedding server running at `http://localhost:8000`.
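
To make the "1 bit per dimension" idea concrete, here is a minimal sketch of 1-bit quantization. It assumes the common sign-based scheme (a dimension becomes 1 if its value is positive, otherwise 0) and compares packed vectors by Hamming distance. SimpleVecDB's internal encoding may differ; `quantize_bits` and `hamming_distance` are hypothetical helper names, and the vectors are toy values rather than real model output.

```python
def quantize_bits(vector: list[float]) -> int:
    """Pack the sign of each dimension into one integer (bit i is set if vector[i] > 0)."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")


# Toy 8-dimensional embeddings (illustrative values, not real model output)
v1 = [0.12, -0.40, 0.33, 0.05, -0.21, 0.90, -0.07, 0.18]
v2 = [0.10, -0.35, 0.29, -0.02, -0.30, 0.75, -0.01, 0.22]

q1, q2 = quantize_bits(v1), quantize_bits(v2)
print(f"{q1:08b} vs {q2:08b} -> Hamming distance {hamming_distance(q1, q2)}")
```

Each 8-dimensional vector fits in a single byte here, compared with 32 bytes as float32, at the cost of discarding magnitude information.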

In [None]:
# Clean up any existing database
db_path = "rag_demo.db"
if os.path.exists(db_path):
    os.remove(db_path)
    print(f"Removed existing database: {db_path}")

# Create a new database with BIT quantization (1 bit per dimension = smallest storage)
db = VectorDB(
    path=db_path,
    quantization=Quantization.BIT,
)

# Get or create a collection for this demo
collection = db.collection("ollama_demo")

print(f"Created SimpleVecDB at {db_path}")
print(f"Collection: {collection.name}")
print(f"Quantization: {collection.quantization}")

## 4. Prepare Knowledge Base

Add documents to the vector database. These will be embedded and stored for later retrieval.

In [None]:
# Sample documents about various topics
documents = [
    "SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. It is the most used database engine in the world.",
    "Vector databases enable semantic search by storing embeddings - numerical representations of text that capture meaning. This allows finding similar content even without exact keyword matches.",
    "Ollama is a tool that allows you to run large language models locally on your own computer. It supports models like Llama 3, Mistral, and many others without requiring cloud services.",
    "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. First, relevant documents are retrieved from a knowledge base, then an LLM generates answers based on that context.",
    "SimpleVecDB uses usearch HNSW indexing, enabling fast similarity searches on embedded documents stored in a single SQLite file.",
    "Quantization reduces the memory footprint of vectors by using fewer bits per dimension. BIT quantization uses only 1 bit per dimension, offering 32x compression compared to float32.",
    "Local-first AI means running models and storing data entirely on your own hardware, without cloud dependencies. This ensures privacy, reduces costs, and works offline.",
    "Python is a high-level programming language known for its simplicity and readability. It's widely used in data science, machine learning, and web development."
]

# Add documents with metadata to the collection
metadatas = [{"source": f"doc_{i}", "topic": "tech"} for i in range(len(documents))]

print(f"Adding {len(documents)} documents to collection '{collection.name}'...")
collection.add_texts(texts=documents, metadatas=metadatas)

print(f"✓ Successfully added {len(documents)} documents")
print(f"Database size: {os.path.getsize(db_path) / 1024:.2f} KB")

## 5. Test Retrieval

Search the database for documents similar to a query.
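
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector. The sketch below does this by brute force with cosine similarity in plain Python. It is only an illustration of the idea: the sample documents describe SimpleVecDB as using an HNSW index rather than a linear scan, and the library's score convention may differ. `cosine_similarity` and `top_k` are hypothetical helpers, and the 3-dimensional vectors are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings" for illustration only
corpus = {
    "vector databases": [0.9, 0.1, 0.0],
    "python syntax": [0.0, 0.2, 0.9],
    "semantic search": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

for name, score in top_k(query, corpus, k=2):
    print(f"{score:.4f}  {name}")
```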

In [None]:
# Test query on the collection
test_query = "How does semantic search work?"

# Retrieve top 3 most relevant documents
results = collection.similarity_search(query=test_query, k=3)

print(f"Query: {test_query}\n")
print(f"Retrieved {len(results)} documents:\n")

for i, (doc, score) in enumerate(results, 1):
    print(f"{i}. [Score: {score:.4f}]")
    print(f"   {doc.page_content[:100]}...")
    print()

## 6. Build RAG Function

Create a function that:
1. Retrieves relevant documents from the vector database
2. Builds a context-aware prompt
3. Generates an answer using Ollama
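
The three steps can also be sketched independently of any particular backend. The version below takes the retriever and the generator as plain callables, so the flow can be exercised without a running Ollama server or embedding service. `build_prompt`, `answer`, `fake_retrieve`, and `fake_generate` are hypothetical names; the two stubs merely stand in for `collection.similarity_search` and `ollama.generate`.

```python
from typing import Callable


def build_prompt(question: str, passages: list[str]) -> str:
    """Join retrieved passages into a context block and append the question."""
    context = "\n\n".join(passages)
    return (
        "Answer the question based ONLY on the provided context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Run the pipeline: retrieve passages, build the prompt, generate the answer."""
    passages = retrieve(question, k)
    prompt = build_prompt(question, passages)
    return generate(prompt)


# Stubs standing in for the vector store and the LLM (assumptions for illustration)
def fake_retrieve(question: str, k: int) -> list[str]:
    docs = ["RAG retrieves documents first.", "Then an LLM answers from them."]
    return docs[:k]


def fake_generate(prompt: str) -> str:
    return f"(stub answer; prompt had {len(prompt)} characters)"


print(answer("How does RAG work?", fake_retrieve, fake_generate, k=2))
```

Passing the two backends as arguments keeps the prompt-building logic testable on its own, which can help when tuning the prompt wording.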

In [None]:
def rag_query(question: str, k: int = 3, model: str = "llama3.2:3b", verbose: bool = True) -> str:
    """
    Answer a question using RAG.
    
    Args:
        question: The question to answer
        k: Number of documents to retrieve
        model: Ollama model to use
        verbose: Print retrieval details
    
    Returns:
        Generated answer
    """
    # Step 1: Retrieve relevant documents from the collection
    results = collection.similarity_search(query=question, k=k)
    
    if verbose:
        print(f"Retrieved {len(results)} documents:")
        for i, (doc, score) in enumerate(results, 1):
            print(f"  {i}. Score: {score:.4f} | {doc.page_content[:80]}...")
        print()
    
    # Step 2: Build context from retrieved documents
    context = "\n\n".join([doc.page_content for doc, _ in results])
    
    # Step 3: Create prompt with context
    prompt = f"""You are a helpful assistant. Answer the question based ONLY on the provided context. If the context doesn't contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
    
    # Step 4: Generate answer with Ollama
    if verbose:
        print(f"Generating answer with {model}...\n")
    
    response = ollama.generate(
        model=model,
        prompt=prompt,
        options={
            "temperature": 0.1,  # Low temperature for more factual answers
            "num_predict": 256   # Limit response length
        }
    )
    
    return response['response']

print("✓ RAG function ready")

## 7. Ask Questions

Now let's test the RAG system with various questions.

In [None]:
# Question 1: About SQLite
question1 = "Why is SQLite good for AI applications?"

print(f"Question: {question1}")
print("=" * 80)
answer1 = rag_query(question1)
print(f"\nAnswer:\n{answer1}")

In [None]:
# Question 2: About local AI
question2 = "What are the benefits of running AI models locally?"

print(f"Question: {question2}")
print("=" * 80)
answer2 = rag_query(question2)
print(f"\nAnswer:\n{answer2}")

In [None]:
# Question 3: About quantization
question3 = "How does quantization help with vector databases?"

print(f"Question: {question3}")
print("=" * 80)
answer3 = rag_query(question3)
print(f"\nAnswer:\n{answer3}")

In [None]:
# Question 4: Testing with non-verbose mode
question4 = "Explain how RAG works in simple terms."

print(f"Question: {question4}")
print("=" * 80)
answer4 = rag_query(question4, verbose=False)
print(f"\nAnswer:\n{answer4}")

## 8. Interactive Query

Ask your own questions!

In [None]:
# Try your own question
my_question = "What is SimpleVecDB?"

print(f"Question: {my_question}")
print("=" * 80)
my_answer = rag_query(my_question)
print(f"\nAnswer:\n{my_answer}")

## 8.5 Multi-Collection Demo

SimpleVecDB supports multiple collections in a single database. This is useful for:
- Organizing different knowledge domains
- Isolating experiments
- Managing different embedding dimensions or quantization strategies
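
One way to picture collections as namespaces inside a single file is one SQLite table per collection. The sketch below uses only the standard `sqlite3` module with an in-memory database. The table layout and the `table_for`, `insert_texts`, and `count_docs` helpers are assumptions for illustration and not SimpleVecDB's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_for(collection: str) -> str:
    """Map a collection name to a table name, allowing only identifier-like names."""
    if not collection.isidentifier():
        raise ValueError(f"unsupported collection name: {collection!r}")
    return f"docs_{collection}"


def insert_texts(collection: str, texts: list[str]) -> None:
    """Insert texts into the collection's own table, creating it on first use."""
    table = table_for(collection)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, text TEXT)")
    conn.executemany(f"INSERT INTO {table} (text) VALUES (?)", [(t,) for t in texts])


def count_docs(collection: str) -> int:
    """Number of rows stored for one collection."""
    return conn.execute(f"SELECT COUNT(*) FROM {table_for(collection)}").fetchone()[0]


insert_texts("ollama_demo", ["SQLite is a library.", "Ollama runs models locally."])
insert_texts("python_docs", ["Python uses indentation."])

print(count_docs("ollama_demo"), count_docs("python_docs"))
```

Because each collection writes to its own table, a query against one never sees rows from the other, even though both live in the same database.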

In [None]:
# Create a second collection in the same database for Python-related content
python_collection = db.collection("python_docs")

# Add Python-specific documents
python_docs = [
    "Python uses indentation to define code blocks instead of braces like C or Java.",
    "List comprehensions in Python provide a concise way to create lists: [x**2 for x in range(10)]",
    "Python's GIL (Global Interpreter Lock) means only one thread executes Python bytecode at a time.",
]

python_collection.add_texts(python_docs, metadatas=[{"topic": "python"} for _ in python_docs])

print(f"✓ Created second collection: {python_collection.name}")
print(f"  Documents in '{collection.name}': {len(documents)}")
print(f"  Documents in '{python_collection.name}': {len(python_docs)}")

# Query the Python collection specifically
py_results = python_collection.similarity_search("What is the GIL?", k=1)
print(f"\nQuery on '{python_collection.name}' collection:")
print(f"  {py_results[0][0].page_content}")

## 9. Database Statistics
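
Raw vector storage can be estimated from the embedding dimension alone: float32 takes 4 bytes per dimension, int8 takes 1 byte, and 1-bit quantization takes one eighth of a byte. The sketch below works through that arithmetic for a hypothetical 384-dimensional embedding. The dimension and the `bytes_per_vector` helper are assumptions, and the figures cover vectors only, not document text, metadata, or index overhead.

```python
# Bits used per dimension at each quantization level
BITS_PER_DIMENSION = {"float32": 32, "int8": 8, "bit": 1}


def bytes_per_vector(dim: int, level: str) -> float:
    """Raw storage for one vector of the given dimension, in bytes."""
    return dim * BITS_PER_DIMENSION[level] / 8


dim = 384  # Hypothetical embedding dimension
baseline = bytes_per_vector(dim, "float32")

for level in BITS_PER_DIMENSION:
    size = bytes_per_vector(dim, level)
    print(f"{level:>8}: {size:7.1f} bytes/vector, {baseline / size:.0f}x smaller than float32")
```

On a small demo database the file on disk is usually dominated by text and SQLite page overhead, so the file size will not shrink by the same factor.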

In [None]:
# Show database statistics
import sqlite3

conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Count documents in the collection's table
cursor.execute(f"SELECT COUNT(*) FROM {collection._table_name}")
doc_count = cursor.fetchone()[0]

# Get database size
db_size_kb = os.path.getsize(db_path) / 1024

print("Database Statistics:")
print(f"  Collection: {collection.name}")
print(f"  Documents: {doc_count}")
print(f"  Embedding dimension: {collection._dim}")
print(f"  Quantization: {collection.quantization}")
print(f"  Database size: {db_size_kb:.2f} KB")
print(f"  Average size per doc: {db_size_kb/doc_count:.2f} KB")

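# For comparison, estimate raw vector storage only.
# The database file also holds document text, metadata, and other collections,
# so its size on disk is not a fair stand-in for the quantized vectors.
if collection._dim:
    float32_size = (doc_count * collection._dim * 4) / 1024  # 4 bytes per float32
    bit_size = (doc_count * collection._dim / 8) / 1024      # 1 bit per dimension
    print("\nComparison (raw vector storage):")
    print(f"  BIT quantization: ~{bit_size:.2f} KB")
    print(f"  FLOAT32 (uncompressed): ~{float32_size:.2f} KB")
    print(f"  Compression ratio: {float32_size/bit_size:.1f}x")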

conn.close()

## 10. Cleanup (Optional)

Remove the demo database when done.

In [None]:
# Uncomment to remove the database
# db.close()
# if os.path.exists(db_path):
#     os.remove(db_path)
#     print(f"Removed {db_path}")

print("Done! The database will persist for future use.")

## Summary

This notebook demonstrated:

1. **Setting up SimpleVecDB** with BIT quantization for minimal storage
2. **Using collections** to organize documents in namespaces
3. **Adding documents** to create a knowledge base
4. **Semantic search** using vector similarity
5. **RAG pipeline** combining retrieval + generation
6. **Local LLM** inference with Ollama
7. **Storage efficiency** through quantization (32x compression)
8. **Multi-collection support** for organizing different knowledge domains

### Key Benefits

- **Fully local**: No cloud dependencies, works offline
- **Privacy-preserving**: All data stays on your machine
- **Lightweight**: Single SQLite file, minimal dependencies
- **Fast**: Efficient vector search with quantization
- **Flexible**: Multiple collections per database, easy to customize
- **Organized**: Separate collections for different domains or experiments

### Multi-Collection Use Cases

- **Domain separation**: Tech docs in one collection, legal docs in another
- **Experimentation**: Test different embedding models per collection
- **Multi-tenant**: Isolate data for different users or projects
- **Version control**: Keep different versions of a knowledge base

### Next Steps

- Try different Ollama models (e.g., `gemma3`, `qwen3`, `deepseek-r1`)
- Experiment with different quantization levels (INT8, FLOAT)
- Create multiple collections for different knowledge domains
- Add more documents from your own data sources
- Integrate with LangChain or LlamaIndex (see other notebooks)
- Build a chat interface with conversation history