### ScholarAgent: Full Ingestion and Retrieval Test
**Objective:** This notebook provides a self-contained workflow to debug our RAG pipeline. It will:
1. Ingest the raw PDF documents.
2. Create the ChromaDB vector store.
3. Immediately test the retriever with a simple query to confirm it's working.

#### 2. Imports and Path Configuration

In [2]:
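import os
import sys
from dotenv import load_dotenv

# Assumption: load_dotenv was imported to be called; it reads a .env file if one exists.
load_dotenv()

# Put the project root (parent of the working directory) on sys.path so `src` is importable.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)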
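
The cell above assumes the notebook is launched from a directory directly under the project root. As a hedged alternative, the sketch below walks upward from a starting directory until it finds a marker directory. The helper names `find_project_root` and `add_to_sys_path`, and the choice of `src` as the marker, are assumptions for illustration and not part of the project.

```python
import os
import sys


def find_project_root(start_dir, marker="src"):
    """Walk upward from start_dir until a directory containing `marker` is found.

    `marker` is a hypothetical choice; any folder that only exists at the
    project root would work. Returns None if the filesystem root is reached.
    """
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isdir(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


def add_to_sys_path(path):
    """Prepend path to sys.path once; do nothing if path is None."""
    if path is not None and path not in sys.path:
        sys.path.insert(0, path)
```

In a notebook this could be used as `add_to_sys_path(find_project_root(os.getcwd()))`, which keeps working if the notebook is moved into a deeper subfolder.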

#### 3. Run the Ingestion Pipeline
The cell below will execute the full data ingestion process:
1. Load PDFs from `data/raw`
2. Split them into chunks
3. Create embeddings and persist the vector store to `data/processed/chroma_db`

In [3]:
from src.data_processing.ingestion import run_ingestion_pipeline
run_ingestion_pipeline()

2025-08-29 02:03:32,338 - src.data_processing.ingestion - INFO - Starting data ingestion pipeline...
2025-08-29 02:03:32,339 - src.data_processing.ingestion - INFO - Loading documents from /workspaces/scholar-agent/data/raw...
2025-08-29 02:04:03,105 - src.data_processing.ingestion - INFO - Total documents loaded: 415
2025-08-29 02:04:03,105 - src.data_processing.ingestion - INFO - Splitting documents into chunks...
2025-08-29 02:04:03,151 - src.data_processing.ingestion - INFO - Created 1707 chunks.
2025-08-29 02:04:03,152 - src.data_processing.ingestion - INFO - Creating vector store...
  embedding_model = SentenceTransformerEmbeddings(model_name=settings.EMBEDDING_MODEL_NAME)
  from .autonotebook import tqdm as notebook_tqdm
2025-08-29 02:06:25,916 - src.data_processing.ingestion - INFO - Vector store created and persisted at /workspaces/scholar-agent/data/processed/chroma_db
2025-08-29 02:06:25,917 - src.data_processing.ingestion - INFO - Data ingestion pipeline finished successful
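
The log above reports 415 loaded documents split into 1707 chunks. The splitter and its parameters are not shown in this notebook, so as a rough illustration only, here is a minimal fixed-size splitter with overlap. The function name `split_text` and the `chunk_size` and `chunk_overlap` defaults are assumptions, not the settings used by `run_ingestion_pipeline`.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content near a
    boundary is partly repeated at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij"
print(split_text(sample, chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Character windows are the simplest possible scheme. A real pipeline would more likely split on paragraph or sentence boundaries first, which this sketch does not attempt.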