# RAG Pipeline Walkthrough

This notebook walks through the core components of the RAG (Retrieval-Augmented Generation) pipeline using the simplified `core.py` module.

## 1. Setup & Configuration

First, we load the configuration and print the active LLM provider, embedding model, and vector store path, so we can confirm the settings before running anything else.
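Printing the values only shows what is set; it does not catch a blank or missing setting. A minimal sketch of such a check follows. The helper `missing_settings`, the `REQUIRED` list, and the example values are hypothetical and not part of `core.py`; the setting names mirror the attributes printed in the cell below.

```python
def missing_settings(settings, required):
    """Return the names in `required` that are absent or blank in `settings`."""
    return [name for name in required if not str(settings.get(name) or "").strip()]


# Hypothetical: names taken from the Config attributes this notebook prints.
REQUIRED = ["LLM_PROVIDER", "EMBEDDING_MODEL_NAME", "VECTORSTORE_PATH"]

# Placeholder values for illustration only; the embedding model is left blank on purpose.
example = {
    "LLM_PROVIDER": "example-provider",
    "EMBEDDING_MODEL_NAME": "",
    "VECTORSTORE_PATH": "./vectorstore",
}

print("Missing settings:", missing_settings(example, REQUIRED))
```

With the real module, one could build the dictionary from `Config` attributes via `getattr`, assuming they are plain class attributes as the prints below suggest.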

In [None]:
import sys
import os

# Add parent directory to path to import core
sys.path.append("..")

from core import Config, IngestionPipeline, get_rag_chain, ingest_files

print(f"LLM Provider: {Config.LLM_PROVIDER}")
print(f"Embedding Model: {Config.EMBEDDING_MODEL_NAME}")
print(f"Vector Store: {Config.VECTORSTORE_PATH}")

## 2. Ingestion Pipeline

The ingestion pipeline handles:
1.  **Loading**: Reading PDF/DOCX files.
2.  **Splitting**: Breaking text into manageable chunks.
3.  **Embedding**: Converting text to vector representations.
4.  **Indexing**: Storing vectors in FAISS/Chroma.
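To make the four stages concrete, here is a toy, standard-library-only sketch. It is not how `core.py` implements ingestion: loading is skipped (the input is already plain text), the embedding is a hashed bag-of-words stand-in rather than a real embedding model, and the "index" is a Python list instead of FAISS or Chroma. All function names are hypothetical.

```python
import hashlib
import math


def split_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks of `chunk_size` that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text, dim=64):
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(texts, chunk_size=200, overlap=50):
    """Split each text, embed each chunk, and store the results in a list."""
    index = []
    for doc_id, text in enumerate(texts):
        for chunk in split_text(text, chunk_size, overlap):
            index.append({"doc_id": doc_id, "text": chunk, "vector": toy_embed(chunk)})
    return index


index = build_index(["word " * 100])  # one 500-character document
print(f"Indexed {len(index)} chunks")
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.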

In [None]:
# Example: Ingest a sample contract if it exists
sample_file = "../sample_contract.docx"

if os.path.exists(sample_file):
    print(f"Ingesting {sample_file}...")
    vectorstore = ingest_files([sample_file])
    print("Ingestion complete.")
else:
    print("Sample file not found. Please create one or upload one via the UI.")

## 3. RAG Retrieval & Generation

We use the `get_rag_chain()` function to build a chain that:
1.  Takes a user question.
2.  Retrieves relevant documents from the vector store.
3.  Passes context + question to the LLM.
4.  Returns an answer with citations.
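The retrieve-then-prompt part of these steps can be sketched without any external services. This is an illustration only, not the chain that `get_rag_chain()` builds: similarity is cosine over plain word counts instead of a real vector store, the LLM call is omitted, and the documents, field names, and helper functions are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase word counts with punctuation stripped."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, docs, k=2):
    """Return the k documents most similar to the question."""
    q = bow(question)
    return sorted(docs, key=lambda d: cosine(q, bow(d["text"])), reverse=True)[:k]


def build_prompt(question, docs):
    """Combine numbered context passages and the question into one prompt string."""
    context = "\n".join(
        f"[{i}] {d['text']} (source: {d['source']}, pg {d['page']})"
        for i, d in enumerate(docs, 1)
    )
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical documents for illustration.
docs = [
    {"text": "The agreement term is two years from the effective date.", "source": "contract.docx", "page": 1},
    {"text": "Either party may terminate with thirty days notice.", "source": "contract.docx", "page": 4},
    {"text": "Office kitchen cleaning happens on Fridays.", "source": "handbook.pdf", "page": 9},
]

question = "What is the term of the agreement?"
top = retrieve(question, docs, k=2)
print(build_prompt(question, top))
```

Numbering the passages in the prompt is one simple way to let the model refer back to its sources, which is what makes citations in the answer possible.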

In [None]:
chain = get_rag_chain()

query = "What are the key terms of the agreement?"

# Run the chain
response = chain.invoke({"question": query})

# Display result
if isinstance(response, dict):
    print("Answer:", response.get("answer"))
    print("\nCitations:")
    for c in response.get("citations", []):
        print(f"- {c.source} (pg {c.page})")
else:
    print(response)