# RAG Data Ingestion and Vector Database Pipeline

This notebook walks through building a complete retrieval-augmented generation (RAG) system:
1. Load PDF and text documents
2. Create embeddings with a HuggingFace model
3. Store the embeddings in a FAISS vector database
4. Perform semantic search over the stored vectors
5. Build a Q&A system with Claude

## Setup
```bash
pip install langchain langchain-anthropic langchain-community
pip install faiss-cpu sentence-transformers pypdf
```

In [None]:
# Import libraries
import os
import sys

# Make the parent directory importable so the Python_RAG_Agent package resolves
sys.path.append('..')
from Python_RAG_Agent.data_loader import load_all_documents
from Python_RAG_Agent.Embeddings import EmbeddingManager
from Python_RAG_Agent.vector_store import VectorStoreManager

In [None]:
# Load documents
documents = load_all_documents('../sample_data/pdf_files')
print(f'Loaded {len(documents)} documents')
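
The internals of `load_all_documents` are not shown in this notebook. As a rough illustration of what a directory loader typically does, here is a minimal standard-library sketch that handles plain-text files only. The function name `load_text_documents` and the record layout (`page_content` plus a `metadata` dict with a `source` key) are assumptions made for this example, not the project's actual API. PDF parsing is left out on purpose.

```python
from pathlib import Path
import tempfile


def load_text_documents(root):
    """Hypothetical stand-in for a directory loader: one record per .txt file."""
    records = []
    # sorted() keeps the order deterministic across runs and platforms
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            continue  # skip blank files rather than embedding empty content
        records.append({"page_content": text, "metadata": {"source": str(path)}})
    return records


# Demo on a throwaway directory so the sketch runs without the sample data
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Revenue grew in Q3.", encoding="utf-8")
    Path(tmp, "empty.txt").write_text("   ", encoding="utf-8")
    Path(tmp, "nested").mkdir()
    Path(tmp, "nested", "b.txt").write_text("Costs were flat.", encoding="utf-8")
    demo_records = load_text_documents(tmp)

print(f"Loaded {len(demo_records)} documents")
```

Printing the count right after loading, as the cell above does, is a cheap way to catch a wrong path or an empty folder before any embeddings are computed.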

In [None]:
# Create embeddings and vector store
embedding_manager = EmbeddingManager()
embeddings = embedding_manager.get_embeddings()
vector_manager = VectorStoreManager(embeddings)
vector_manager.create_vector_store(documents)
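
To make the embed-then-index step less opaque, the sketch below reproduces the idea with toy parts: a word-count "embedding" over a fixed vocabulary and a brute-force in-memory store ranked by L2 distance. Everything here (`embed`, `l2_distance`, `TinyVectorStore`, the vocabulary) is hypothetical and exists only for illustration. A real pipeline would use a trained sentence-embedding model and a FAISS index instead.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding: word counts over a fixed vocabulary (not a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyVectorStore:
    """Hypothetical in-memory store using brute-force nearest-neighbour search."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.entries = []  # list of (vector, original text)

    def add(self, texts):
        for text in texts:
            self.entries.append((embed(text, self.vocabulary), text))

    def search(self, query, k=3):
        query_vec = embed(query, self.vocabulary)
        scored = [(text, l2_distance(query_vec, vec)) for vec, text in self.entries]
        # Smaller distance means closer in embedding space
        return sorted(scored, key=lambda pair: pair[1])[:k]


vocabulary = ["revenue", "profit", "employees", "office"]
store = TinyVectorStore(vocabulary)
store.add([
    "revenue and profit rose",
    "new office opened",
    "employees joined the office",
])
top_hits = store.search("revenue", k=2)
for text, distance in top_hits:
    print(f"{distance:.4f}  {text}")
```

The brute-force scan compares the query against every stored vector, which is fine for a handful of documents. Index libraries such as FAISS exist to make the same lookup fast at scale.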

In [None]:
# Save vector store
vector_manager.save('../data_storage/vector_store')
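
After saving, it can be worth confirming that the index actually reached disk. The sketch below assumes that `VectorStoreManager.save` delegates to LangChain's `FAISS.save_local` with the default index name, which writes `index.faiss` and `index.pkl` into the target folder. That delegation is an assumption, since the class is not shown here; adjust the expected file names if the manager persists the store differently. The helper `missing_store_files` is hypothetical.

```python
from pathlib import Path
import tempfile

# Assumption: files produced by LangChain's FAISS.save_local with index_name="index"
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_store_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from the folder."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


# Demo against a temporary folder that only contains one of the two files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.faiss").write_bytes(b"")
    missing_before = missing_store_files(tmp)
    Path(tmp, "index.pkl").write_bytes(b"")
    missing_after = missing_store_files(tmp)

print("Missing before:", missing_before)
print("Missing after:", missing_after)
```

In the notebook itself, the call would be `missing_store_files('../data_storage/vector_store')`, and an empty list would indicate that both files are present.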

In [None]:
# Test similarity search
results = vector_manager.similarity_search_with_score('What is the revenue?', k=3)
for doc, score in results:
    print(f'Score: {score:.4f}, Metadata: {doc.metadata}')
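
How to read the score depends on the underlying store. With LangChain's FAISS wrapper in its default configuration, `similarity_search_with_score` returns an L2 distance, so a lower score means a closer match. Whether `VectorStoreManager` keeps that convention is an assumption. The sketch below uses made-up `(text, distance)` pairs to show two follow-on steps: dropping weak matches with a distance cutoff and assembling the survivors into a grounded prompt for the Q&A stage. The helper names, the cutoff value, and the prompt wording are all illustrative.

```python
def keep_close_matches(results, max_distance):
    """Keep (text, distance) pairs at or below max_distance, closest first."""
    kept = [pair for pair in results if pair[1] <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


def build_prompt(question, passages):
    """Assemble a grounded Q&A prompt from retrieved passages."""
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up (text, distance) pairs standing in for real search results
sample_results = [
    ("The office moved to a new building.", 1.42),
    ("Revenue figures appear in the annual report.", 0.35),
    ("Revenue is reported once per fiscal year.", 0.80),
]
close = keep_close_matches(sample_results, max_distance=1.0)
prompt = build_prompt("What is the revenue?", [text for text, _ in close])
print(prompt)
```

The resulting string can then be sent to the language model as the user message. A sensible cutoff has to be tuned per embedding model and corpus, since distance scales differ between models.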