This project implements a Retrieval Augmented Generation (RAG) pipeline that allows you to ask questions about PDF documents.
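To make the idea concrete, here is a toy sketch of the retrieve-then-prompt flow behind RAG. It is not the code in `rag_pipeline.py`: it swaps real embeddings and ChromaDB for a simple bag-of-words cosine similarity, and the function names (`retrieve`, `build_prompt`) and the prompt wording are hypothetical. The final step, sending the prompt to the model, is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (toy stand-in for a vector search)."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the chunks would come from the PDF, the similarity search would run against the vector database, and the prompt would be passed to the language model.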
- rag_pipeline.py - Python script implementing the RAG pipeline
- rag_pipeline.ipynb - Jupyter Notebook version of the RAG pipeline
- rag_api_helper.py - Helper script for the API
- api/ - Express API server that exposes the RAG pipeline via HTTP
- chroma_db_meta/ - Directory containing the ChromaDB vector database (created after running the pipeline)
- Python 3.10+ with pip
- Node.js 16+ with npm
- Ollama installed and running with the mistral model
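One way to confirm the Ollama prerequisite is to ask the local Ollama server which models it has. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint. The helper names `has_model` and `ollama_has_model` are made up for this example and are not part of the project.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def has_model(tags_payload, name="mistral"):
    """Return True if a model called `name` (with any tag) appears in an /api/tags payload."""
    models = tags_payload.get("models", [])
    return any(m.get("name", "").split(":")[0] == name for m in models)


def ollama_has_model(name="mistral", url=OLLAMA_TAGS_URL, timeout=5):
    """Query the local Ollama server; return False if it is unreachable or replies with invalid JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return has_model(json.load(resp), name)
    except (OSError, ValueError):
        return False
```

Calling `ollama_has_model()` before running the pipeline gives a quick yes/no answer. If it returns `False`, start Ollama and pull the model with `ollama pull mistral`.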
- Create a Python virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install langchain langchain-community langchain-classic chromadb pypdf ollama ipywidgets
- Make sure you have a PDF document named ai.pdf in the project root directory.
- Run the Jupyter notebook to build the RAG pipeline and create the ChromaDB vector database:
jupyter notebook rag_pipeline.ipynb
- Run all cells in the notebook
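The setup steps leave two things on disk: the input `ai.pdf` and, after the notebook has run, the `chroma_db_meta/` directory. A small check like the following can confirm both. It is only a sketch: `check_setup` is a hypothetical helper, not part of the repository, and it assumes the database directory counts as built once it exists and is non-empty.

```python
from pathlib import Path


def check_setup(root="."):
    """Report whether the input PDF and the vector database directory are in place."""
    root = Path(root)
    db_dir = root / "chroma_db_meta"
    return {
        # The PDF must be present before the notebook is run.
        "pdf_present": (root / "ai.pdf").is_file(),
        # Assumption: a non-empty directory means the notebook has built the database.
        "db_built": db_dir.is_dir() and any(db_dir.iterdir()),
    }
```

Run `check_setup()` from the project root. If `pdf_present` is `False`, add the PDF; if `db_built` is `False`, the notebook has probably not been run to completion.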
There are three ways to use the RAG pipeline:
Open rag_pipeline.ipynb and use the interactive Q&A cells at the bottom of the notebook.
Run the Python script from the command line:
python rag_pipeline.py
Start the Express API server:
cd api
npm install
npm start
Then query the API:
curl -X POST http://localhost:3000/ask -H "Content-Type: application/json" -d '{"question":"What is the main concept of the document?"}'
Or use the test script:
node api/test-api.js "Your question here"
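The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above (same URL, header, and JSON body). The names `build_request` and `ask` are illustrative, and since the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request

# Same endpoint as the curl example; assumes the Express server is running locally.
API_URL = "http://localhost:3000/ask"


def build_request(question, url=API_URL):
    """Build a POST request carrying the question as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, url=API_URL, timeout=120):
    """Send the question to the API and return the raw response text."""
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(ask("What is the main concept of the document?"))` prints whatever the server returns. The generous timeout is a guess, since local model inference can be slow.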