# 06 - Vector Database RAG

## Objectives
- Set up a hybrid vector store (FAISS and Chroma)
- Embed job descriptions and retrieve them by similarity
- Build a RAG pipeline for context-aware questions
- Score, rerank, and filter search results by relevance

## Expected Output
A working RAG pipeline that retrieves job-description context for job-specific preparation

In [1]:
import os

import nltk
from dotenv import load_dotenv
from openai import OpenAI

from src.core.rag_pipeline import (
    # Phase 1: vector foundation
    VectorStoreConfig, setup_hybrid_vector_store,
    implement_adaptive_embeddings,
    create_semantic_chunking,
    # Phase 2: hierarchical retrieval
    implement_coarse_retrieval,
    create_fine_grained_filtering,
    build_metadata_fusion,
    # Phase 3: cross-encoder reranking
    setup_reranking_model,
    implement_listwise_reranking,
    optimize_reranking_batching,
    # Phase 4: context optimization
    create_context_compression,
    implement_relevance_filtering,
    build_context_fusion,
    # Phase 5: production optimization
    implement_aggressive_caching,
    optimize_retrieval_latency,
    create_fallback_strategies
)

## Phase 1: Advanced Vector Foundation

In [2]:
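config = VectorStoreConfig(
    faiss_index_path="./data/faiss_index.idx",
    chroma_persist_dir="./data/chroma_db",
    # NOTE: this should match the embedder's output dimension. The adaptive
    # embedder in the next cell reports 3072-D vectors, so confirm 1536 is intended.
    embedding_dim=1536,
    n_clusters=100,
    metadata_fields=["job_role", "seniority", "domain", "company_size"]
)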

faiss_index, chroma_collection = setup_hybrid_vector_store(config)

[RAG] Initializing hybrid vector store configuration
[RAG] FAISS IVF index created: 100 clusters, 1536D
[RAG] Chroma collection ready: 0 documents
[SUCCESS] Hybrid vector store configured successfully
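
The log above mentions an IVF (inverted file) index with 100 clusters. As a rough illustration of the idea, and not the code behind `setup_hybrid_vector_store`, the pure-Python sketch below buckets vectors by their nearest centroid and searches only the `nprobe` closest buckets. The `TinyIVF` class, the fixed centroids, and the 2-D vectors are all hypothetical; a real IVF index learns its centroids from training data.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    """Toy inverted-file index: vectors are bucketed by nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def add(self, doc_id, vec):
        cluster = min(range(len(self.centroids)),
                      key=lambda i: l2(vec, self.centroids[i]))
        self.lists[cluster].append((doc_id, vec))

    def search(self, query, k=2, nprobe=1):
        # Visit only the nprobe clusters whose centroids are closest to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(query, self.centroids[i]))
        candidates = [item for c in order[:nprobe] for item in self.lists[c]]
        ranked = sorted(candidates, key=lambda item: l2(query, item[1]))
        return [(doc_id, l2(query, vec)) for doc_id, vec in ranked[:k]]


index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])  # hypothetical centroids
index.add("a", [0.5, 0.2])
index.add("b", [9.0, 9.5])
index.add("c", [1.0, 1.0])
hits = index.search([0.0, 0.0], k=2, nprobe=1)  # never looks at document "b"
```

Raising `nprobe` trades speed for recall, since more buckets are scanned per query.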


In [3]:
# Load OPENAI_API_KEY from a local .env file before creating the client.
load_dotenv()
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
adaptive_embedder = implement_adaptive_embeddings(openai_client)

[RAG] Initializing adaptive embeddings system
[RAG] Fallback model loaded: BAAI/bge-large-en-v1.5
[RAG] Embedding 2 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.96s, (2, 3072)
[RAG] Test embeddings shape: (2, 3072)
[SUCCESS] Adaptive embeddings system ready
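
The output shows a primary embedding path and a loaded fallback model. One plausible reading of "adaptive" is a try-primary-then-fallback wrapper, sketched below. This is an assumption about the behaviour, not the source of `implement_adaptive_embeddings`; `embed_with_fallback`, `failing_primary`, and `toy_fallback` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[Sequence[str]], List[List[float]]]


def embed_with_fallback(texts: Sequence[str], primary: Embedder,
                        fallback: Embedder) -> Tuple[List[List[float]], str]:
    """Try the primary embedder; on any error, use the fallback instead."""
    try:
        return primary(texts), "primary"
    except Exception:
        return fallback(texts), "fallback"


def failing_primary(texts):
    raise RuntimeError("simulated API outage")


def toy_fallback(texts):
    # Hypothetical stand-in for a local model: [character count, word count].
    return [[float(len(t)), float(len(t.split()))] for t in texts]


vectors, source = embed_with_fallback(
    ["senior python developer"], failing_primary, toy_fallback
)
```

The two models may not share an output dimension, so it is probably safest to record which source produced each vector and avoid mixing both kinds in one index.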


In [4]:
# Download the NLTK sentence-tokenizer data only if it is missing.
for resource in ("punkt", "punkt_tab"):
    try:
        nltk.data.find(f"tokenizers/{resource}")
    except LookupError:
        nltk.download(resource)

semantic_chunker = create_semantic_chunking()

[RAG] Initializing semantic chunking system
[RAG] Semantic chunker initialized with all-MiniLM-L6-v2
[RAG] Test chunking: 5 chunks created
[RAG] Chunk 1: 195 chars
[RAG] Chunk 2: 256 chars
[RAG] Chunk 3: 224 chars
[RAG] Chunk 4: 193 chars
[RAG] Chunk 5: 184 chars
[SUCCESS] Semantic chunking system ready
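
To make the chunking step concrete, here is a minimal sketch of similarity-based chunking. It assumes a regex sentence splitter in place of NLTK and a bag-of-words cosine in place of the `all-MiniLM-L6-v2` embeddings named in the log; the `threshold` and `max_chars` values are illustrative, not taken from `create_semantic_chunking`.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.2, max_chars=256):
    """Append a sentence to the current chunk while it stays similar and short enough."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for sentence in sentences:
        if chunks:
            last = chunks[-1]
            similar = cosine(bow(last), bow(sentence)) >= threshold
            if similar and len(last) + 1 + len(sentence) <= max_chars:
                chunks[-1] = last + " " + sentence
                continue
        chunks.append(sentence)  # topic shift or size limit: start a new chunk
    return chunks


text = ("We need a Python developer. The developer will write Python services. "
        "Benefits include health insurance.")
chunks = semantic_chunks(text)
```

Here the first two sentences share vocabulary and merge, while the benefits sentence starts a second chunk.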


## Phase 2: Hierarchical Retrieval Pipeline

In [5]:
coarse_retriever = implement_coarse_retrieval(adaptive_embedder)

[RAG] Initializing coarse retrieval system
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.35s, (1, 3072)
[RAG] Detected embedding dimension: 3072
[RAG] Index created: 3072D Flat index (will upgrade to IVF)
[RAG] Coarse retriever initialized: 0 vectors, 3072D
[RAG] Embedding 4 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.45s, (4, 3072)
[RAG] Added 4 documents, total: 4
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.33s, (1, 3072)
[RAG] Coarse retrieval: 3 candidates in 0.019s
[RAG] Test query results: 3 matches
[RAG] Match 1: score=0.656, doc='Senior Python Developer with machine learning expe...'
[RAG] Match 2: score=0.719, doc='Data Scientist position open for predictive analyt...'
[RAG] Match 3: score=1.250, doc='DevOps Engineer with Kubernetes and AWS experience...'
[SUCCESS] Coarse retrieval system ready
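
The match scores above increase from Match 1 to Match 3, which is consistent with distances (lower is better) rather than similarities, although the log does not say so. Under that assumption, a flat (exhaustive) search can be sketched as follows. `flat_search` and `l2_to_cosine` are hypothetical helpers, and the conversion only holds for unit-length vectors and squared L2 distances.

```python
import heapq


def flat_search(query, vectors, k=3):
    """Exhaustive search: the k (squared_l2, index) pairs with the smallest distance."""
    scored = (
        (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)


def l2_to_cosine(squared_l2):
    """Valid only when both vectors have unit length: d^2 = 2 - 2 * cos."""
    return 1.0 - squared_l2 / 2.0


# Toy unit-length document vectors (made up for illustration).
docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
results = flat_search([1.0, 0.0], docs, k=3)
similarities = [l2_to_cosine(distance) for distance, _ in results]
```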


In [6]:
fine_grained_filter = create_fine_grained_filtering(adaptive_embedder)

[RAG] Initializing fine-grained filtering system
[RAG] Fine-grained filter initialized: cosine similarity
[RAG] Fine-grained filtering: 3 candidates
[RAG] Embedding 4 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.58s, (4, 3072)
[RAG] Fine-grained filtering completed: 2 refined in 0.585s
[RAG] Test refinement: 2 documents refined
[RAG] Refined 1: idx=0, score=1.000, doc='Senior Python Developer with machine lea...'
[RAG] Refined 2: idx=2, score=0.915, doc='Data Scientist position open for predict...'
[SUCCESS] Fine-grained filtering system ready
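
The filter above reports cosine similarity and keeps 2 of 3 candidates. A minimal sketch of that kind of refinement step, assuming a fixed score cut-off and a `top_k` limit (both hypothetical parameters, since the log does not show how the third candidate was dropped):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def refine(query_vec, candidates, min_score=0.8, top_k=2):
    """candidates: list of (idx, vector). Keep those at or above min_score, best first."""
    scored = [(idx, cosine(query_vec, vec)) for idx, vec in candidates]
    kept = [pair for pair in scored if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy 2-D vectors standing in for the 3072-D embeddings.
candidates = [(0, [1.0, 0.0]), (1, [0.0, 1.0]), (2, [2.0, 1.0])]
refined = refine([1.0, 0.0], candidates)
```

Because this stage re-embeds every candidate, it costs far more than the coarse search (0.585s versus 0.019s in the logs above), which is why it runs on a short candidate list only.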


In [7]:
metadata_fusion = build_metadata_fusion(coarse_retriever)

[RAG] Initializing metadata fusion system
[RAG] Metadata fusion initialized: sim=0.6, role=0.2
[RAG] Metadata fusion: processing 3 candidates
[RAG] Query metadata: {'role': 'senior', 'domain': 'ai', 'skills': ['python', 'ml']}
[RAG] Rank 1: idx=0, fused=1.310 (sim=0.850, role=1.000, domain=1.000, skills=1.000)
[RAG] Rank 2: idx=2, fused=0.900 (sim=0.680, role=1.000, domain=0.500, skills=0.333)
[RAG] Rank 3: idx=1, fused=0.587 (sim=0.720, role=0.700, domain=0.100, skills=0.000)
[RAG] Test fusion: 3 documents reranked
[RAG] Original order: [0, 1, 2]
[RAG] Fused order: [0, 2, 1]
[RAG] Score improvement: 0.460
[SUCCESS] Metadata fusion system ready
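
Metadata fusion combines the vector similarity with metadata match scores. The sketch below uses a plain weighted sum with the two logged weights (`sim=0.6`, `role=0.2`); the `domain` and `skills` weights are assumptions, since the log does not print them. With the component scores from the output above it reproduces the fused order `[0, 2, 1]` but not the logged fused values, so the actual formula in `build_metadata_fusion` presumably differs (for example by adding bonus terms).

```python
# sim and role weights come from the log; domain and skills weights are assumed.
WEIGHTS = {"sim": 0.6, "role": 0.2, "domain": 0.1, "skills": 0.1}


def fuse(scores, weights=WEIGHTS):
    """Weighted sum of similarity and metadata match scores."""
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


# Component scores copied from the log output above.
candidates = {
    0: {"sim": 0.850, "role": 1.0, "domain": 1.0, "skills": 1.0},
    1: {"sim": 0.720, "role": 0.7, "domain": 0.1, "skills": 0.0},
    2: {"sim": 0.680, "role": 1.0, "domain": 0.5, "skills": 0.333},
}
fused_order = sorted(candidates, key=lambda idx: fuse(candidates[idx]), reverse=True)
```

Document 2 overtakes document 1 despite a lower raw similarity because its role and domain matches are stronger.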


## Phase 3: Cross-Encoder Reranking

In [8]:
cross_encoder_reranker = setup_reranking_model()

[RAG] Initializing cross-encoder reranking system
[RAG] Model loaded successfully: cross-encoder/ms-marco-MiniLM-L-6-v2
[RAG] Cross-encoder reranker initialized: cross-encoder/ms-marco-MiniLM-L-6-v2 on mps
[RAG] Cross-encoder reranking: 3 passages
[RAG] Cross-encoder reranking completed: 1 results in 0.038s
[RAG] Rerank 1: idx=0, score=5.595
[RAG] Test reranking: 1 passages reranked
[RAG] Original order: [0, 1, 2]
[RAG] Reranked order: [0]
[RAG] Direct scoring test: 2 pairs scored
[SUCCESS] Cross-encoder reranking system ready
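
The scores logged by the reranker (5.595 here, and a range of -11.351 to 5.595 in the batching cell) are unbounded, which suggests raw logits rather than probabilities. If a 0-to-1 relevance score is needed, a sigmoid is one common mapping. The sketch below also shows how a probability cut-off could reduce 3 passages to 1 result, as in the output above; whether `setup_reranking_model` filters this way is an assumption, and `keep_confident` is a hypothetical helper.

```python
import math


def sigmoid(logit):
    """Map an unbounded logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def keep_confident(logits, min_prob=0.5):
    """Return [(idx, probability)] for passages at or above min_prob, best first."""
    probs = [(idx, sigmoid(value)) for idx, value in enumerate(logits)]
    kept = [pair for pair in probs if pair[1] >= min_prob]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# 5.595 and -11.351 are the extremes logged in this notebook; -2.0 is made up.
kept = keep_confident([5.595, -2.0, -11.351])
```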


In [9]:
listwise_reranker = implement_listwise_reranking(cross_encoder_reranker)

[RAG] Initializing listwise reranking system
[RAG] Listwise reranker initialized: max_candidates=20, final_k=5
[RAG] Listwise reranking: 6 candidates -> top 5
[RAG] Listwise reranking completed: 5 results in 0.093s
[RAG] Listwise 1: idx=0, score=0.919
[RAG] Listwise 2: idx=5, score=1.185
[RAG] Listwise 3: idx=4, score=2.232
[RAG] Listwise 4: idx=2, score=3.359
[RAG] Listwise 5: idx=1, score=4.044
[RAG] Test listwise reranking: 5 final selections
[RAG] Original candidates: 6
[RAG] Final selection: [0, 5, 4, 2, 1]
[SUCCESS] Listwise reranking system ready
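
The log shows the two knobs of the listwise stage, `max_candidates=20` and `final_k=5`. One simple interpretation of "listwise" is to normalise scores across the whole candidate list, so that each score is relative to its competitors, and then keep the best `final_k`. The sketch below does this with a softmax; it is an assumed interpretation, not the logic of `implement_listwise_reranking`, and the raw scores are made up.

```python
import math


def listwise_select(scores, max_candidates=20, final_k=5):
    """scores: raw relevance per candidate index. Returns [(idx, share)], best first."""
    pool = list(enumerate(scores))[:max_candidates]
    if not pool:
        return []
    peak = max(score for _, score in pool)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [(idx, math.exp(score - peak)) for idx, score in pool]
    total = sum(value for _, value in exps)
    shares = [(idx, value / total) for idx, value in exps]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)[:final_k]


raw = [0.2, 2.1, -1.0, 1.4, 0.9, -0.5]  # hypothetical cross-encoder scores
selection = listwise_select(raw)
```

Since softmax preserves ordering, the selection equals a plain top-k; what it adds is a score that can be compared across queries as a share of the list.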


In [10]:
optimized_batch_processor = optimize_reranking_batching(cross_encoder_reranker)

[RAG] Initializing optimized batch processing system
[RAG] MPS optimization: moderate batching with threading
[RAG] Batch processor initialized: device=mps, batch_size=8
[RAG] Sequential batch processing: 8 pairs
[RAG] Batch 0: 8 pairs processed in 0.033s
[RAG] Sequential processing completed: 8 scores in 0.033s
[RAG] Test batch processing: 8 scores computed
[RAG] Score range: -11.351 to 5.595
[RAG] Final batch size: 8
[SUCCESS] Optimized batch processing system ready
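
Batching groups query-passage pairs so the model scores several at once; the log shows a batch size of 8. A minimal sequential version might look like this, with `toy_score_batch` as a hypothetical stand-in for the cross-encoder:

```python
def batched(items, batch_size=8):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(pairs, score_batch, batch_size=8):
    """Score all pairs batch by batch, preserving input order."""
    scores = []
    for batch in batched(pairs, batch_size):
        scores.extend(score_batch(batch))
    return scores


def toy_score_batch(batch):
    # Hypothetical scorer: counts the words shared by query and passage.
    return [float(len(set(q.split()) & set(p.split()))) for q, p in batch]


pairs = [("python developer", f"python role {i}") for i in range(19)]
batch_sizes = [len(batch) for batch in batched(pairs)]
scores = score_in_batches(pairs, toy_score_batch)
```

With 19 pairs the last batch is partial, so downstream code should not assume every batch is full.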


## Phase 4: Context Optimization

In [11]:
context_compressor = create_context_compression(openai_client)

[RAG] Initializing context compression system
[RAG] Context compressor initialized: model=gpt-4.1-nano, ratio=0.4
[RAG] Compressing 3 contexts for query relevance
[RAG] Compression completed: 148 -> 362 tokens (2.45 ratio) in 4.378s
[RAG] Test compression: 791 -> 2180 chars
[RAG] Compressing 1 contexts for query relevance
[RAG] Compression completed: 55 -> 16 tokens (0.29 ratio) in 0.883s
[RAG] Compressing 1 contexts for query relevance
[RAG] Compression completed: 51 -> 16 tokens (0.31 ratio) in 0.478s
[RAG] Compressing 1 contexts for query relevance
[RAG] Compression completed: 42 -> 16 tokens (0.38 ratio) in 0.522s
[RAG] Overlap compression: ratio=0.324, groups=3
[SUCCESS] Context compression system ready
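
In the first test above, "compression" grew the context from 148 to 362 tokens (ratio 2.45) against a target ratio of 0.4. A guard that falls back to the original text whenever the output is not shorter would avoid that case. The sketch below is one way to write such a guard; `guarded_compress` and both toy compressors are hypothetical, and the whitespace token count is only a rough proxy for a real tokenizer.

```python
def count_tokens(text):
    # Crude whitespace token count; a real tokenizer would give different numbers.
    return len(text.split())


def guarded_compress(text, compress):
    """Return (output, ratio); keep the input when the output is not shorter."""
    original = count_tokens(text)
    candidate = compress(text)
    ratio = count_tokens(candidate) / original if original else 1.0
    if ratio >= 1.0:
        return text, 1.0
    return candidate, ratio


def verbose_summarizer(text):
    return text + " In summary, " + text  # hypothetical model that expands its input


def first_sentence(text):
    return text.split(". ")[0]  # hypothetical extractive compressor


doc = "Senior Python developer needed. Must know FAISS and Chroma. Remote friendly."
kept, kept_ratio = guarded_compress(doc, verbose_summarizer)
short, short_ratio = guarded_compress(doc, first_sentence)
```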


In [12]:
relevance_filter = implement_relevance_filtering()

[RAG] Initializing relevance filtering system
[RAG] Relevance filter initialized: method=adaptive, min_results=1
[RAG] Filtering 7 documents by relevance
[RAG] Using threshold: 0.323 (method: adaptive)
[RAG] Applied diversity bonus: avg_diversity=0.826
[RAG] Filtering completed: 7 -> 4 documents
[RAG] Score range: 0.736 to 1.031
[RAG] Test filtering: 7 -> 4 documents
[RAG] Filtered indices: [0, 3, 1, 6]
[RAG] Filtering 5 documents by relevance
[RAG] Using threshold: 0.275 (method: adaptive)
[RAG] Filtering completed: 5 -> 1 documents
[RAG] Score range: 0.300 to 0.300
[RAG] Low score test: 5 -> 1 documents
[RAG] Threshold stats: adaptive=0.275
[SUCCESS] Relevance filtering system ready
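
The filter above derives its threshold from the score distribution and always returns at least `min_results` documents. The exact rule is not shown, so the sketch below assumes one: the mean minus half a standard deviation, bounded below by an absolute floor. `spread` and `floor` are hypothetical parameters, and the diversity bonus from the log is left out.

```python
import statistics


def adaptive_filter(scores, min_results=1, spread=0.5, floor=0.25):
    """Keep indices scoring at or above an adaptive threshold, best first."""
    threshold = max(
        floor, statistics.mean(scores) - spread * statistics.pstdev(scores)
    )
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = [i for i in ranked if scores[i] >= threshold]
    if len(kept) < min_results:
        kept = ranked[:min_results]  # never return fewer than min_results
    return kept, threshold


# Made-up score lists: one mixed, one uniformly weak.
kept, threshold = adaptive_filter([0.9, 0.8, 0.75, 0.4, 0.3, 0.2, 0.1])
low_kept, low_threshold = adaptive_filter([0.10, 0.05, 0.08, 0.02, 0.04])
```

In the weak list nothing clears the floor, so the `min_results` guarantee returns the single best document, similar in spirit to the low-score test in the output above.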


In [13]:
context_fusion = build_context_fusion()

[RAG] Initializing context fusion system
[RAG] Context fusion initialized: strategy=hierarchical, overlap_threshold=0.3
[RAG] Fusing 4 contexts using hierarchical strategy
[RAG] Context fusion completed: 4 contexts -> 807 chars (compression: 1.004)
[RAG] Average overlap detected: 0.075
[RAG] Test fusion results:
[RAG] Original contexts: 4
[RAG] Compression ratio: 1.004
[RAG] Average overlap: 0.075
[RAG] Final text length: 807 characters
[RAG] Fusing 3 contexts using hierarchical strategy
[RAG] Context fusion completed: 3 contexts -> 172 chars (compression: 1.012)
[RAG] Average overlap detected: 0.023
[RAG] Diverse contexts test: overlap=0.023, compression=1.012
[SUCCESS] Context fusion system ready
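
The fusion step reports an `overlap_threshold` of 0.3. As a simplified stand-in for the hierarchical strategy, which the log does not describe, the sketch below measures word-level Jaccard overlap and drops a context that overlaps too much with one already kept. `fuse_contexts` and the example contexts are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def fuse_contexts(contexts, overlap_threshold=0.3):
    """Drop near-duplicate contexts, then join the rest in their original order."""
    kept = []
    for ctx in contexts:
        if all(jaccard(ctx, prev) < overlap_threshold for prev in kept):
            kept.append(ctx)
    return "\n\n".join(kept), kept


contexts = [
    "python developer with ml experience",
    "python developer with ml experience required",
    "devops engineer with kubernetes",
]
fused_text, kept_contexts = fuse_contexts(contexts)
```

When overlap is low, as in the test runs above (0.075 and 0.023), nothing is dropped and the fused text is about as long as the inputs, which matches the logged compression ratios near 1.0.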


## Phase 5: Production Optimization

In [14]:
semantic_cache = implement_aggressive_caching()

[RAG] Initializing semantic caching system
[RAG] Semantic cache initialized: threshold=0.95, ttl=3600s
[RAG] Result cached for query
[RAG] Semantic cache test: no match found (as expected for dissimilar queries)
[SUCCESS] Semantic caching system ready
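
A semantic cache returns a stored result when a new query's embedding is close enough to a cached one. The sketch below uses the logged settings (`threshold=0.95`, `ttl=3600s`) with a linear scan over entries; the `SemanticCache` class and its injectable `clock` are illustrative assumptions, not the object returned by `implement_aggressive_caching`.

```python
import math
import time


class SemanticCache:
    """Toy cache keyed by embedding similarity, with time-based expiry."""

    def __init__(self, threshold=0.95, ttl=3600.0, clock=time.monotonic):
        self.threshold, self.ttl, self.clock = threshold, ttl, clock
        self.entries = []  # (embedding, result, stored_at)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def put(self, embedding, result):
        self.entries.append((embedding, result, self.clock()))

    def get(self, embedding):
        now = self.clock()
        # Evict expired entries before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        best = max(self.entries,
                   key=lambda e: self._cosine(embedding, e[0]), default=None)
        if best is not None and self._cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05])  # almost the same direction: hit
miss = cache.get([0.0, 1.0])        # orthogonal query: no match
```

A high threshold such as 0.95 keeps false hits rare at the cost of a lower hit rate, which fits the "no match found" result for dissimilar queries above.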


In [15]:
latency_optimizer = optimize_retrieval_latency(
    adaptive_embedder, coarse_retriever, fine_grained_filter, semantic_cache
)

[RAG] Initializing retrieval latency optimization
[RAG] Latency optimizer initialized: target=100.0ms
[RAG] Warming up cache with 4 queries
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.35s, (1, 3072)
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.76s, (1, 3072)
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.53s, (1, 3072)
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.34s, (1, 3072)
[RAG] Cache warmup completed
[RAG] Testing latency optimization...
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.57s, (1, 3072)
[RAG] Embedding 1 texts with adaptive strategy
[RAG] Primary embeddings completed: 0.52s, (1, 3072)
[RAG] Coarse retrieval: 4 candidates in 0.000s
[RAG] Skipped fine-grained filtering: budget=-421.8ms
[RAG] Pipeline latency: 1090.4ms (target: 100.0ms)
[RAG] Query latency: 1090.4ms, results: 4
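
In this run each embedding call takes roughly 0.3 to 0.8 s, so the 100 ms target is exceeded before retrieval starts, and the optimizer skips fine-grained filtering once the budget is negative. The sketch below shows that budget logic in isolation: required stages always run, optional ones are skipped when no budget remains. `run_with_budget`, the stage tuples, and the fake clock are hypothetical and only simulate time.

```python
import time


def run_with_budget(stages, target_ms=100.0, clock=time.perf_counter):
    """stages: list of (name, fn, required). Skip optional stages once the budget is spent."""
    start = clock()
    executed, skipped = [], []
    for name, fn, required in stages:
        remaining_ms = target_ms - (clock() - start) * 1000.0
        if not required and remaining_ms <= 0:
            skipped.append(name)
            continue
        fn()
        executed.append(name)
    return executed, skipped, (clock() - start) * 1000.0


fake_now = [0.0]  # simulated clock, in seconds


def tick(seconds):
    """Build a stage that only advances the simulated clock."""
    def stage():
        fake_now[0] += seconds
    return stage


stages = [
    ("embed", tick(0.5), True),     # slow remote embedding call
    ("coarse", tick(0.001), True),  # fast vector search
    ("fine", tick(0.5), False),     # optional refinement
]
executed, skipped, total_ms = run_with_budget(stages, clock=lambda: fake_now[0])
```

Skipping optional stages limits the damage but cannot recover a budget that a required stage has already spent, so serving the query embedding from a cache is likely the larger win here.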

In [16]:
fallback_strategies = create_fallback_strategies()

[RAG] Initializing fallback strategies system
[RAG] Fallback strategies initialized: max_wait=500.0ms
[RAG] Attempting primary retrieval
[RAG] Primary retrieval successful: 1.3ms
[RAG] Test 1 - Normal operation: 1 results via cached, 1.3ms
[RAG] Attempting primary retrieval
[RAG] Primary retrieval failed: Vector store unavailable
[RAG] Using cached fallback results
[RAG] Using cached fallback: 0.3ms
[RAG] Test 2 - With failure: 1 results via cached, 0.3ms
[RAG] Attempting primary retrieval
[RAG] Primary retrieval failed: Vector store unavailable
[RAG] Using cached fallback results
[RAG] Using cached fallback: 0.7ms
[RAG] Test 3 - Cached fallback: 1 results via cached, 0.7ms
[RAG] Attempting primary retrieval
[RAG] Primary retrieval failed: Vector store unavailable
[RAG] Using emergency response fallback
[RAG] Test 4 - Emergency response: 3 results via emergency, 0.2ms
[RAG] System status: 1 cached entries, healthy: True
[SUCCESS] Fallback strategies system ready
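
The tests above exercise three tiers: primary retrieval, cached results, and an emergency response. A minimal sketch of such a chain follows. It returns the tier that actually served the request, whereas Test 1 above reports `via cached` even though primary retrieval succeeded, which may be a labelling quirk. All function names here are hypothetical, and the canned emergency results are placeholders.

```python
def retrieve_with_fallbacks(query, primary, cache, emergency):
    """Return (results, source): primary first, then cache, then a canned response."""
    try:
        results = primary(query)
        cache[query] = results  # refresh the cache on every success
        return results, "primary"
    except Exception:
        if query in cache:
            return cache[query], "cached"
        return emergency(query), "emergency"


def healthy_primary(query):
    return [f"doc for {query}"]


def broken_primary(query):
    raise ConnectionError("Vector store unavailable")


def canned(query):
    return ["general tip 1", "general tip 2", "general tip 3"]


cache = {}
first = retrieve_with_fallbacks("python jobs", healthy_primary, cache, canned)
second = retrieve_with_fallbacks("python jobs", broken_primary, cache, canned)
third = retrieve_with_fallbacks("rust jobs", broken_primary, cache, canned)
```

An exact-match dictionary is used for brevity; a semantic cache could fill the same role so that similar queries also benefit from the cached tier.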
