# RAG Demo - Quick Start Guide

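This notebook is a quick introduction to the project's RAG (Retrieval-Augmented Generation) system. It ingests a few sample documents into a Qdrant vector database and then answers questions about them using the llama3 model served by Ollama.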

## Prerequisites

1. Qdrant running on `localhost:6333`
2. Ollama running on `localhost:11434` with the `llama3` model pulled (`ollama pull llama3`)
3. Dependencies installed: `pip install -r requirements.txt`
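Before running the cells, it can help to confirm that both services are reachable. The sketch below uses only the standard library and assumes Qdrant's REST `GET /collections` endpoint and Ollama's `GET /api/tags` endpoint on the default ports listed above. The helper names (`fetch_json`, `model_available`, `check_prerequisites`) are our own and are not part of this project.

```python
import http.client
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=2.0):
    """GET `url` and return the decoded JSON body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


def model_available(tags, model):
    """True if an Ollama /api/tags payload lists `model` (e.g. as "llama3:latest")."""
    if not isinstance(tags, dict):
        return False
    names = [m.get("name", "") for m in tags.get("models", []) if isinstance(m, dict)]
    return any(n == model or n.startswith(model + ":") for n in names)


def check_prerequisites(qdrant_url="http://localhost:6333",
                        ollama_url="http://localhost:11434",
                        model="llama3"):
    """Report which of the services this notebook needs are responding."""
    tags = fetch_json(f"{ollama_url}/api/tags")
    return {
        "qdrant": fetch_json(f"{qdrant_url}/collections") is not None,
        "ollama": tags is not None,
        "model": model_available(tags, model),
    }


print(check_prerequisites())
```

Any `False` in the printed dict points at the prerequisite to fix before continuing.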

## 1. Setup and Imports

In [None]:
import sys
sys.path.append('..')  # add the project root (parent of this notebook's folder) so "src" is importable

from src.pipeline.ingestion import IngestionPipeline
from src.pipeline.rag import RAGPipeline
from pprint import pprint
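`sys.path.append('..')` only works when the notebook's working directory is exactly one level below the project root. A slightly more robust sketch walks upward until it finds the `src` package. `find_project_root` is a hypothetical helper, and using `src` as the marker directory is an assumption based on the `from src.pipeline...` imports.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="src"):
    """Walk up from `start` and return the first directory that contains `marker`/."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found above {start}")


try:
    project_root = find_project_root(Path.cwd())
except FileNotFoundError as err:
    print(err)
else:
    # Prepend so the project's `src` package wins over any same-named package.
    if str(project_root) not in sys.path:
        sys.path.insert(0, str(project_root))
    print(f"Project root: {project_root}")
```

If you adopt this, `project_root / "configs" / "config.yaml"` also gives a config path that no longer depends on the working directory.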

## 2. Sample Documents

Let's create some sample documents about infrastructure tools.

In [None]:
documents = [
    {
        "content": """
        Ansible is an open-source automation tool for configuration management,
        application deployment, and task automation. It uses YAML syntax for
        playbooks and is agentless, connecting via SSH. To create a playbook,
        define tasks in a YAML file with the .yml extension.
        """,
        "metadata": {"source": "ansible", "topic": "configuration_management"}
    },
    {
        "content": """
        Terraform is an infrastructure as code tool that lets you build, change,
        and version infrastructure safely and efficiently. It uses HCL
        (HashiCorp Configuration Language). To create a module, organize your
        .tf files in a directory with variables.tf, main.tf, and outputs.tf.
        """,
        "metadata": {"source": "terraform", "topic": "infrastructure_as_code"}
    },
    {
        "content": """
        Packer is a tool for creating identical machine images for multiple
        platforms from a single source configuration. It supports AWS, Azure,
        GCP, and VMware. Use JSON or HCL2 templates to define your image build.
        """,
        "metadata": {"source": "packer", "topic": "image_building"}
    }
]

print(f"Created {len(documents)} sample documents")
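Each document is a dict with a `content` string and a `metadata` dict. Because the contents are indented triple-quoted strings, they carry leading newlines and indentation into the text. Whether `IngestionPipeline` cleans this up is not shown here, so the hypothetical `normalize_documents` helper below dedents and strips each text and rejects malformed entries before ingestion.

```python
import textwrap


def normalize_documents(docs):
    """Return copies of `docs` with dedented, stripped content; reject malformed entries."""
    cleaned = []
    for i, doc in enumerate(docs):
        content = doc.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"document {i}: 'content' must be a non-empty string")
        metadata = doc.get("metadata", {})
        if not isinstance(metadata, dict):
            raise ValueError(f"document {i}: 'metadata' must be a dict")
        cleaned.append({
            # Triple-quoted strings keep the source indentation and blank edges.
            "content": textwrap.dedent(content).strip(),
            "metadata": dict(metadata),
        })
    return cleaned


example = [{
    "content": """
    Packer builds identical machine images.
    It supports AWS, Azure, GCP, and VMware.
    """,
    "metadata": {"source": "packer"},
}]
print(normalize_documents(example)[0]["content"])
```

Since the output keeps the same list-of-dicts shape, it could be passed straight to `ingestion_pipeline.ingest(normalize_documents(documents))`.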

## 3. Ingest Documents

Ingest the documents into the vector database.

In [None]:
# Initialize ingestion pipeline
ingestion_pipeline = IngestionPipeline(config_path="../configs/config.yaml")

# Ingest documents
print("Ingesting documents...")
ingestion_pipeline.ingest(documents)
print("✓ Documents ingested successfully!")
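The notebook doesn't show what `ingest` does internally. A typical vector-store ingestion step splits each document into chunks, embeds each chunk, and stores the vectors in Qdrant with the metadata attached. The sketch below covers only the splitting step, using fixed-size character windows with overlap. `chunk_text`, `chunk_documents`, and the default sizes are illustrative assumptions, not this project's actual chunking strategy.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split `text` into windows of `chunk_size` characters overlapping by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


def chunk_documents(docs, chunk_size=200, overlap=50):
    """Chunk each document, copying its metadata onto every chunk."""
    return [
        {"content": piece, "metadata": {**doc["metadata"], "chunk_index": i}}
        for doc in docs
        for i, piece in enumerate(chunk_text(doc["content"], chunk_size, overlap))
    ]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Copying the parent document's metadata onto every chunk is what keeps the metadata filters in section 6 working once documents are split.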

## 4. Query the RAG System

Now let's ask questions about our documents.

In [None]:
# Initialize RAG pipeline
rag_pipeline = RAGPipeline(config_path="../configs/config.yaml")

# Ask a question
query = "How do I create an Ansible playbook?"
print(f"\nQuery: {query}\n")

response = rag_pipeline.query(query)

print("Answer:")
print(response["answer"])
print("\nSources:")
for i, source in enumerate(response["sources"], 1):
    print(f"{i}. {source['metadata']['source']} (score: {source['score']:.3f})")
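Each entry in `response["sources"]` carries a document's `metadata` and a similarity `score`. Assuming the collection uses cosine similarity (higher means closer; the actual metric is chosen when the Qdrant collection is created, which this notebook doesn't show), retrieval amounts to ranking stored vectors by their cosine against the query vector. The toy sketch below uses hand-picked 3-dimensional vectors in place of real embeddings and mirrors the source/score printout above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, points, k=2):
    """Score every (payload, vector) pair against the query and keep the best k."""
    scored = [{"metadata": payload, "score": cosine_similarity(query_vector, vector)}
              for payload, vector in points]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors are far larger.
toy_points = [
    ({"source": "ansible"}, [1.0, 0.0, 0.0]),
    ({"source": "terraform"}, [0.0, 1.0, 0.0]),
    ({"source": "packer"}, [0.6, 0.8, 0.0]),
]
for rank, hit in enumerate(top_k([1.0, 0.0, 0.0], toy_points), 1):
    print(f"{rank}. {hit['metadata']['source']} (score: {hit['score']:.3f})")
# 1. ansible (score: 1.000)
# 2. packer (score: 0.600)
```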

## 5. Try More Queries

In [None]:
queries = [
    "What is Terraform used for?",
    "How do I build a machine image with Packer?",
    "What's the difference between Ansible and Terraform?"
]

for query in queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print('='*60)
    
    response = rag_pipeline.query(query)
    print(f"\nAnswer: {response['answer']}")
    print(f"\nRetrieved {len(response['sources'])} relevant documents")
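When comparing several questions, it can be handy to collect answers and latencies in one place rather than only printing them. The hypothetical `run_queries` helper below accepts any callable that returns a dict with `answer` and `sources` keys; here it runs against a stub so the sketch is self-contained.

```python
import time


def run_queries(query_fn, questions):
    """Run each question through `query_fn`, recording answer, source count, and latency."""
    rows = []
    for question in questions:
        start = time.perf_counter()
        response = query_fn(question)
        rows.append({
            "query": question,
            "answer": response.get("answer", ""),
            "num_sources": len(response.get("sources", [])),
            "seconds": time.perf_counter() - start,
        })
    return rows


def stub_query(question):
    """Stand-in for a real pipeline call; returns the shape the notebook reads."""
    return {"answer": f"(stub) {question}", "sources": [{}, {}]}


for row in run_queries(stub_query, ["What is Terraform used for?"]):
    print(f"{row['seconds']:.3f}s  {row['num_sources']} sources  {row['query']}")
```

In the notebook you would pass the bound method instead, e.g. `run_queries(rag_pipeline.query, queries)`.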

## 6. Query with Metadata Filters

Restrict retrieval to documents whose metadata matches a filter, for example by `source` or `topic`.

In [None]:
# Query only Terraform documents
query = "How do I create a module?"
filters = {"source": "terraform"}

print(f"Query: {query}")
print(f"Filters: {filters}\n")

response = rag_pipeline.query(query, filters=filters)
print(f"Answer: {response['answer']}")
print("\nSources (filtered to Terraform only):")
for source in response['sources']:
    print(f"- {source['metadata']['source']}")
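The notebook passes filters as a plain `{field: value}` dict; how `RAGPipeline` turns that into a Qdrant query isn't shown. Qdrant's filtering API expresses "every condition must hold" as a `must` list of field conditions, so a translation could look like the sketch below. `to_qdrant_filter` and `sources_match` are hypothetical helpers, and whether metadata sits at the top level of the stored payload or under a nested key (hence `prefix`) is an assumption to check against the ingestion code.

```python
def to_qdrant_filter(filters, prefix=""):
    """Build a Qdrant filter (REST/JSON form) requiring every key to match.

    Scalars become `match: {value: ...}`; lists, tuples, and sets become
    `match: {any: [...]}`. `prefix` covers nested payloads, e.g. "metadata.".
    """
    must = []
    for key, value in filters.items():
        if isinstance(value, (list, tuple, set)):
            match = {"any": list(value)}
        else:
            match = {"value": value}
        must.append({"key": f"{prefix}{key}", "match": match})
    return {"must": must}


def sources_match(sources, filters):
    """Client-side check that every returned source satisfies scalar filters."""
    return all(src["metadata"].get(key) == value
               for src in sources for key, value in filters.items())


print(to_qdrant_filter({"source": "terraform"}))
# {'must': [{'key': 'source', 'match': {'value': 'terraform'}}]}
```

After the filtered query above, `sources_match(response['sources'], filters)` should return `True` if the filter was applied.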

## Next Steps

1. Check out `02_advanced_features.ipynb` for advanced RAG features
2. Explore `03_custom_components.ipynb` to learn about swapping components
3. Run the Streamlit app: `streamlit run ../app.py`