A full, experiment-friendly RAG pipeline that answers questions from a Group Member Life Insurance Policy PDF. It includes three layers:
- Embedding Layer: PDF parsing, text cleaning, multiple chunking strategies, and a choice of embedding model (OpenAI or SentenceTransformers).
- Search Layer: ChromaDB vector store with caching and cross-encoder re-ranking.
- Generation Layer: A robust prompt with optional few-shot examples, sent to a configurable OpenAI Chat model.
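
To illustrate what "multiple chunking strategies" in the Embedding Layer could look like, here is a minimal sketch of two common approaches: fixed-size character windows with overlap, and sentence-based packing. The function names, parameters, and default sizes are hypothetical and chosen for illustration; the strategies the pipeline actually uses may differ.

```python
import re
from typing import List


def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of `size`; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most `max_chars` (a single long sentence is kept intact)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "The policy covers members. Benefits are paid on death. Claims need proof."
    print(chunk_fixed("abcdefghij", size=4, overlap=2))
    print(chunk_sentences(sample, max_chars=60))
```

Overlapping windows reduce the chance that a clause is split across a chunk boundary, while sentence packing keeps each chunk readable for the re-ranker and the generator. Comparing both on the same queries is one way to use the pipeline's experiment-friendly setup.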
It also automatically produces six screenshots:
- 3 images showing top-3 retrieved chunks (one per query).
- 3 images showing the final generated answers (one per query).