# Getting Started with HybridRAG

Welcome! This notebook will guide you through the basics of HybridRAG.

**What you'll do:**
- Set up your environment
- Connect to MongoDB Atlas
- Ingest sample documents
- Run your first queries
- Visualize results

**Prerequisites:** A MongoDB Atlas cluster, plus your MongoDB URI, Voyage API key, and Anthropic API key in a `.env` file
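
To confirm the prerequisites before running anything else, a small check like the one below may help. It is a minimal sketch: the variable names are assumed to match the settings fields printed in section 2 (`MONGODB_URI`, `VOYAGE_API_KEY`, `ANTHROPIC_API_KEY`), and `missing_env_vars` is a hypothetical helper, not part of HybridRAG. Run it once the variables are available in the process, for example after `load_dotenv()`.

```python
import os

# Assumed names, taken from the settings fields shown later in this notebook.
# Adjust them if your .env uses different keys.
REQUIRED_VARS = ("MONGODB_URI", "VOYAGE_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

The helper only reports names, so secret values never reach the notebook output.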

## 1. Environment Setup

In [None]:
# Load environment variables from .env
from dotenv import load_dotenv

# HybridRAG imports
from hybridrag import create_hybridrag
from hybridrag.config import get_settings

load_dotenv()

print("✓ Imports successful!")

## 2. Check Configuration

In [None]:
# Verify settings
settings = get_settings()

print(f"MongoDB Database: {settings.MONGODB_DATABASE}")
print(f"Embeddings Model: {settings.EMBEDDINGS_MODEL}")
print(f"LLM Model: {settings.LLM_MODEL}")
print(f"Collection: {settings.COLLECTION_NAME}")

# Check that credentials are set without printing their values
print(f"\nMongoDB URI: {'✓ Set' if settings.MONGODB_URI else '✗ Missing'}")
print(f"Voyage API Key: {'✓ Set' if settings.VOYAGE_API_KEY else '✗ Missing'}")
print(f"Anthropic API Key: {'✓ Set' if settings.ANTHROPIC_API_KEY else '✗ Missing'}")

## 3. Initialize HybridRAG

In [None]:
# Create RAG instance
rag = await create_hybridrag()

print("✓ HybridRAG initialized!")
print(f"Database: {rag.config.MONGODB_DATABASE}")
print(f"Collection: {rag.config.COLLECTION_NAME}")

## 4. Ingest Sample Documents

In [None]:
# Sample documents about MongoDB
sample_docs = [
    {
        "content": "MongoDB Atlas is a fully managed cloud database service. It provides automated backups, monitoring, and scaling.",
        "metadata": {
            "source": "mongodb_docs",
            "topic": "atlas",
            "category": "infrastructure",
        },
    },
    {
        "content": "Vector search in MongoDB Atlas allows semantic similarity searches using embeddings. It supports approximate nearest neighbor search based on HNSW as well as exact nearest neighbor search.",
        "metadata": {
            "source": "mongodb_docs",
            "topic": "vector_search",
            "category": "features",
        },
    },
    {
        "content": "MongoDB 8.0 introduces $rankFusion for combining vector and text search results with configurable weights.",
        "metadata": {
            "source": "release_notes",
            "topic": "hybrid_search",
            "category": "features",
        },
    },
    {
        "content": "Atlas Search provides full-text search capabilities with fuzzy matching, synonyms, and autocomplete.",
        "metadata": {
            "source": "mongodb_docs",
            "topic": "atlas_search",
            "category": "features",
        },
    },
    {
        "content": "Knowledge graphs in MongoDB represent relationships between entities using graph structures and traversal queries.",
        "metadata": {
            "source": "mongodb_docs",
            "topic": "knowledge_graph",
            "category": "features",
        },
    },
]

print(f"Ingesting {len(sample_docs)} documents...")

# Ingest documents
for idx, doc in enumerate(sample_docs, 1):
    await rag.insert(doc["content"], metadata=doc["metadata"])
    print(f"  {idx}/{len(sample_docs)} - {doc['metadata']['topic']}")

print("\n✓ Documents ingested successfully!")

## 5. Run Your First Query

In [None]:
# Simple query
query = "What is vector search in MongoDB?"

print(f"Query: {query}\n")

# Search with hybrid mode (default)
results = await rag.query(
    query=query,
    mode="hybrid",
    top_k=3,
)

print(f"Found {len(results)} results:\n")

for idx, result in enumerate(results, 1):
    print(f"Result {idx}:")
    print(f"  Content: {result.content[:100]}...")
    print(f"  Score: {result.score:.4f}")
    print(f"  Topic: {result.metadata.get('topic', 'N/A')}")
    print()

## 6. Compare Search Modes
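
For intuition about how a hybrid ranking can combine a vector ranking with a keyword ranking, here is a minimal sketch of weighted reciprocal rank fusion, the general technique behind `$rankFusion`. Whether HybridRAG's `hybrid` mode computes exactly this is an assumption; the function name, the rank constant `k=60` (a common choice), and the example rankings are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids into one list of (doc_id, score).

    rankings: dict mapping a source name to a list of ids, best first.
    weights:  optional dict mapping a source name to its weight (default 1.0).
    k:        rank constant that dampens the influence of the top ranks.
    """
    weights = weights or {}
    fused = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each source contributes weight / (k + rank) for a document.
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings, for illustration only.
rankings = {
    "vector": ["vector_search", "hybrid_search", "atlas"],
    "keyword": ["hybrid_search", "atlas_search", "vector_search"],
}
for doc_id, score in reciprocal_rank_fusion(rankings):
    print(f"{doc_id}: {score:.4f}")
```

Because the fusion uses ranks rather than raw scores, it does not matter that the vector and keyword scores may be on different scales. A document that appears near the top of both lists ends up ahead of one that tops only a single list.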

In [None]:
# Run the same query in each search mode and compare the top results
query = "semantic similarity embeddings"

modes = ["vector", "keyword", "hybrid"]
all_results = {}

for mode in modes:
    results = await rag.query(query=query, mode=mode, top_k=2)
    all_results[mode] = results

    print(f"\n{mode.upper()} Mode:")
    for idx, result in enumerate(results, 1):
        print(f"  {idx}. Score: {result.score:.4f} - {result.content[:80]}...")

## 7. Query with Answer Generation

In [None]:
# Query with LLM answer generation
query = "How does MongoDB Atlas handle vector search?"

print(f"Query: {query}\n")

answer = await rag.query_with_answer(
    query=query,
    mode="hybrid",
    top_k=3,
)

print("Answer:")
print(answer)
print("\n" + "=" * 60)

## 8. Visualize Score Distribution
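
Scores from different modes may not share a scale (a similarity score, a text relevance score, and a fused rank score can have very different ranges), so bar heights are not necessarily comparable across panels. That this applies to HybridRAG's scores is an assumption. If you want to compare the shape of each distribution rather than the raw values, a minimal sketch of min-max normalization follows; `min_max_normalize` is a hypothetical helper, not part of the library.

```python
def min_max_normalize(scores):
    """Rescale scores to the range [0, 1].

    An empty list stays empty; if all scores are equal, each maps to 1.0.
    """
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        return [1.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


# Example values chosen for illustration only.
print(min_max_normalize([1.0, 3.0, 2.0]))
```

Passing `min_max_normalize(scores)` to the bar chart instead of `scores` would put every panel on a 0 to 1 axis, at the cost of hiding the absolute values.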

In [None]:
# Compare scores across modes
import matplotlib.pyplot as plt

query = "MongoDB vector search capabilities"
modes = ["vector", "keyword", "hybrid"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, mode in enumerate(modes):
    results = await rag.query(query=query, mode=mode, top_k=5)
    scores = [r.score for r in results]
    labels = [f"Doc {i + 1}" for i in range(len(scores))]

    axes[idx].bar(
        labels,
        scores,
        color=["#4CAF50", "#2196F3", "#FF9800", "#9C27B0", "#F44336"][: len(scores)],
    )
    axes[idx].set_title(f"{mode.capitalize()} Mode")
    axes[idx].set_ylabel("Score")
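    # Leave headroom above the tallest bar; fall back to 1 if there are no positive scores
    top = max(scores, default=0) * 1.2
    axes[idx].set_ylim(0, top if top > 0 else 1)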

plt.tight_layout()
plt.show()

print("\n✓ Visualization complete!")

## 9. Cleanup (Optional)

In [None]:
# Uncomment to delete the test documents (this clears the collection)
# await rag.clear_collection()
# print("✓ Test collection cleared")

## Next Steps

Congratulations! You've completed the basics.

**Continue learning:**
- `02_hybrid_search_deep_dive.ipynb` - Advanced search techniques
- `03_knowledge_graph_exploration.ipynb` - Graph-based retrieval
- `04_prompt_engineering.ipynb` - Optimize prompts
- `05_performance_tuning.ipynb` - Production optimization

**Resources:**
- [HybridRAG Documentation](../README.md)
- [Examples](../examples/)
- [MongoDB Atlas Docs](https://www.mongodb.com/docs/atlas/)