# RAG Demo - Advanced Features

This notebook demonstrates advanced features of the RAG system:
- Hybrid retrieval (semantic + keyword)
- Different chunking strategies
- Streaming responses
- Adjusting retrieval parameters
- Examining retrieved context
- Comparing retrieval strategies

In [None]:
import sys
sys.path.append('..')

from src.pipeline.rag import RAGPipeline
from src.embeddings.factory import create_embedding_provider
from src.retrieval.factory import create_vector_store, create_retrieval_strategy
from src.ingest.chunking import RecursiveChunker, ParagraphChunker
from src.utils.config_loader import load_config

## 1. Hybrid Retrieval

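Combine semantic search with keyword-based search so that a query can match documents both by meaning and by exact terms. The weights set below control how much each signal contributes.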
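
To make the role of the two weights concrete, here is a minimal sketch of one common way to fuse scores: normalize each score set to [0, 1], then take a weighted sum. This is an assumption for illustration only; the notebook does not show how `RAGPipeline` combines scores internally, and `min_max_normalize`, `fuse_scores`, and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(semantic, keyword, semantic_weight=0.7, keyword_weight=0.3):
    """Weighted sum of normalized scores; a document missing from one set scores 0 there."""
    sem = min_max_normalize(semantic)
    kw = min_max_normalize(keyword)
    fused = {
        doc_id: semantic_weight * sem.get(doc_id, 0.0)
        + keyword_weight * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: similarity-style values and keyword-match values
# live on different scales, which is why they are normalized first.
semantic_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}
keyword_scores = {"doc_b": 12.0, "doc_c": 3.0, "doc_d": 9.0}

ranked = fuse_scores(semantic_scores, keyword_scores)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

In this sketch `doc_b` ranks first because it scores well on both signals, while `doc_a` relies on its semantic score alone.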

In [None]:
# Load config and modify for hybrid retrieval
config = load_config("../configs/config.yaml")
config["retrieval"]["strategy"] = "hybrid"
config["retrieval"]["config"]["semantic_weight"] = 0.7
config["retrieval"]["config"]["keyword_weight"] = 0.3

# Create pipeline with hybrid retrieval
pipeline = RAGPipeline(config=config)

query = "How do I automate infrastructure provisioning?"
response = pipeline.query(query)

print(f"Query: {query}\n")
print(f"Answer: {response['answer']}\n")
print(f"Retrieved {len(response['sources'])} documents using hybrid search")

## 2. Different Chunking Strategies

Compare how the recursive and paragraph chunkers split the same sample text.
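
As background for the `chunk_size` and `chunk_overlap` parameters used below, here is a minimal sketch of fixed-size chunking with overlap. It is not the implementation of `RecursiveChunker` (which this notebook does not show); `chunk_with_overlap` is a hypothetical helper that only illustrates how the two parameters interact.

```python
def chunk_with_overlap(text, chunk_size=100, chunk_overlap=20):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 250 characters of repeating digits so the overlap is easy to inspect.
demo_text = "".join(str(i % 10) for i in range(250))
demo_chunks = chunk_with_overlap(demo_text, chunk_size=100, chunk_overlap=20)

for i, chunk in enumerate(demo_chunks, 1):
    print(f"Chunk {i}: {len(chunk)} chars")
```

Each chunk starts 80 characters after the previous one, so the last 20 characters of one chunk reappear at the start of the next. Overlap of this kind keeps a sentence that straddles a boundary visible in both chunks.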

In [None]:
sample_text = """
Terraform is a powerful infrastructure as code tool. It allows you to define
infrastructure in configuration files. These files use HCL syntax.

Terraform modules help organize your code. A module is a container for multiple
resources that are used together. Modules are the main way to package and reuse
resource configurations in Terraform.

To create a module, you need three main files: variables.tf for input variables,
main.tf for resource definitions, and outputs.tf for output values.
"""

# Test different chunking strategies
strategies = {
    "Recursive": RecursiveChunker(chunk_size=100, chunk_overlap=20),
    "Paragraph": ParagraphChunker(max_chunk_size=200)
}

for name, chunker in strategies.items():
    chunks = chunker.chunk_text(sample_text)
    print(f"\n{name} Chunking: {len(chunks)} chunks")
    for i, chunk in enumerate(chunks, 1):
        print(f"  Chunk {i} ({len(chunk)} chars): {chunk[:60]}...")

## 3. Streaming Responses

Stream the LLM response as it is generated, so output appears immediately instead of after the full answer is ready.
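
The consumption pattern can be tried without a running model. The sketch below uses a hypothetical `fake_token_stream` generator as a stand-in for a streaming call; it assumes, as the cell below does for `pipeline.query_stream`, that the stream yields plain string chunks.

```python
def fake_token_stream(text):
    """Stand-in for a streaming LLM call: yields the text one word at a time."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # Re-attach the space that split() removed, except after the last word.
        yield word if i == len(words) - 1 else word + " "


def consume_stream(stream):
    """Print each chunk as it arrives and return the assembled text."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # flush so the chunk shows up right away
        parts.append(chunk)
    print()
    return "".join(parts)


answer = consume_stream(
    fake_token_stream("Playbooks describe automation tasks in YAML.")
)
```

Collecting chunks in a list and joining once at the end is a common idiom for building the full response while still printing incrementally.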

In [None]:
# Build the pipeline from the default config file
pipeline = RAGPipeline(config_path="../configs/config.yaml")

query = "Explain what Ansible playbooks are used for"
print(f"Query: {query}\n")
print("Streaming response:\n")

# Stream the response
full_response = ""
for chunk in pipeline.query_stream(query):
    full_response += chunk
    print(chunk, end="", flush=True)

print("\n\n✓ Streaming complete!")

## 4. Adjusting Retrieval Parameters

Vary `top_k`, the number of documents retrieved per query, and observe how it affects the answer.

In [None]:
# Test different top_k values
query = "How do I use infrastructure as code?"

for top_k in [1, 3, 5]:
    config = load_config("../configs/config.yaml")
    config["retrieval"]["config"]["top_k"] = top_k
    
    pipeline = RAGPipeline(config=config)
    response = pipeline.query(query)
    
    print(f"\ntop_k={top_k}: Retrieved {len(response['sources'])} documents")
    print(f"Answer length: {len(response['answer'])} characters")

## 5. Examining Retrieved Context

Inspect what documents are being retrieved.

In [None]:
pipeline = RAGPipeline(config_path="../configs/config.yaml")

query = "What tools are available for configuration management?"
response = pipeline.query(query)

print(f"Query: {query}\n")
print("Retrieved Context:\n")

for i, source in enumerate(response['sources'], 1):
    print(f"\nDocument {i}:")
    print(f"  Source: {source['metadata'].get('source', 'unknown')}")
    print(f"  Score: {source['score']:.4f}")
    print(f"  Content: {source['content'][:150]}...")

print(f"\n\nGenerated Answer:\n{response['answer']}")

## 6. Performance Comparison

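Compare the semantic and hybrid retrieval strategies on query time, number of sources retrieved, and answer length. The measured time covers the whole `pipeline.query` call, not retrieval alone.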
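
A single timed run can be noisy. One option, sketched below, is to repeat the call and report the median duration. `time_call` and `slow_square` are hypothetical names introduced for this example and are not part of the RAG codebase.

```python
import statistics
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Call func several times and return (last result, median duration in seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock intended for measuring intervals
        result = func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


def slow_square(x):
    """Toy workload standing in for a pipeline query."""
    time.sleep(0.01)
    return x * x


result, median_seconds = time_call(slow_square, 4, repeats=3)
print(f"result={result}, median={median_seconds:.4f}s")
```

The same helper could wrap the query in the cell below, for example `time_call(pipeline.query, query)`, bearing in mind that every repeat issues a full request.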

In [None]:
import time

query = "How do I deploy applications with automation?"
strategies = ["semantic", "hybrid"]

results = {}

for strategy in strategies:
    config = load_config("../configs/config.yaml")
    config["retrieval"]["strategy"] = strategy
    
    pipeline = RAGPipeline(config=config)
    
    # Time only the query; pipeline construction above is excluded
    start_time = time.perf_counter()
    response = pipeline.query(query)
    elapsed = time.perf_counter() - start_time
    
    results[strategy] = {
        "time": elapsed,
        "sources": len(response['sources']),
        "answer_length": len(response['answer'])
    }

print("Performance Comparison:\n")
for strategy, metrics in results.items():
    print(f"{strategy.capitalize()}:")
    print(f"  Time: {metrics['time']:.2f}s")
    print(f"  Sources retrieved: {metrics['sources']}")
    print(f"  Answer length: {metrics['answer_length']} chars\n")

## Next Steps

- Explore `03_custom_components.ipynb` for component customization
- Experiment with different embedding models
- Try different LLM providers (OpenAI, Anthropic)
- Tune parameters for your specific use case