muthutm1701/pdf-qa-rag-langchain

RAG Project

This project implements a Retrieval-Augmented Generation (RAG) pipeline that lets you ask questions about PDF documents.
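As a rough illustration of the retrieve-then-generate idea, here is a minimal, standard-library sketch. It is not the code in rag_pipeline.py: the bag-of-words similarity stands in for real embeddings and a vector database, and the names tokenize, retrieve, and build_prompt are hypothetical helpers invented for this example. In a real pipeline the resulting prompt would be sent to the language model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (toy retriever)."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(q_vec, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    joined = "\n\n".join(context)
    return (
        "Answer using only this context:\n\n"
        f"{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Stand-in text chunks; a real pipeline would extract these from a PDF.
    chunks = [
        "Neural networks learn weights from data.",
        "Paris is the capital of France.",
        "Gradient descent updates weights step by step.",
    ]
    question = "What is the capital of France?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

The retrieval step narrows the document down to the passages most relevant to the question, so the model answers from the PDF's content rather than from memory alone.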

Project Structure

  • rag_pipeline.py - Python script implementing the RAG pipeline
  • rag_pipeline.ipynb - Jupyter Notebook version of the RAG pipeline
  • rag_api_helper.py - Helper script for the API
  • api/ - Express API server that exposes the RAG pipeline via HTTP
  • chroma_db_meta/ - Directory containing the ChromaDB vector database (created after running the pipeline)

Getting Started

Prerequisites

  1. Python 3.10+ with pip
  2. Node.js 16+ with npm
  3. Ollama installed and running, with the mistral model pulled (ollama pull mistral)
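The prerequisites above can be checked with a short script before setup. This is only a sketch: the function names are hypothetical, and the Ollama address assumes the default local port 11434, which may differ on your machine. It does not verify that the mistral model has been pulled.

```python
import http.client
import shutil
import sys
import urllib.request


def python_ok(min_version: tuple[int, int] = (3, 10)) -> bool:
    """True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def tool_on_path(name: str) -> bool:
    """True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


def ollama_running(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """True if an HTTP server answers with 200 at the assumed Ollama address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed response.
        return False


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("node on PATH:", tool_on_path("node"))
    print("npm on PATH:", tool_on_path("npm"))
    print("Ollama reachable:", ollama_running())
```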

Setup

  1. Create a Python virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install langchain langchain-community langchain-classic chromadb pypdf ollama ipywidgets
  2. Make sure you have a PDF document named ai.pdf in the project root directory.

  3. Run the Jupyter notebook to build the RAG pipeline and create the ChromaDB vector database:

jupyter notebook rag_pipeline.ipynb
  • Run all cells in the notebook
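After the notebook has run, the chroma_db_meta/ directory should exist in the project root. A small check like the following can confirm that before you start the script or the API. It is a sketch only: vector_store_ready is a hypothetical helper, and treating a non-empty directory as "built" is an assumption, not a guarantee that the database is valid.

```python
from pathlib import Path


def vector_store_ready(project_root: str = ".", dirname: str = "chroma_db_meta") -> bool:
    """True if the vector database directory exists and is not empty."""
    db_dir = Path(project_root) / dirname
    return db_dir.is_dir() and any(db_dir.iterdir())


if __name__ == "__main__":
    if vector_store_ready():
        print("chroma_db_meta/ found.")
    else:
        print("chroma_db_meta/ is missing or empty; run the notebook first.")
```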

Using the RAG Pipeline

There are three ways to use the RAG pipeline:

1. Jupyter Notebook

Open rag_pipeline.ipynb and use the interactive Q&A cells at the bottom of the notebook.

2. Python Script

Run the Python script from the command line:

python rag_pipeline.py

3. API Server

Start the Express API server:

cd api
npm install
npm start

Then query the API:

curl -X POST http://localhost:3000/ask -H "Content-Type: application/json" -d '{"question":"What is the main concept of the document?"}'

Or use the test script (run from the project root):

node api/test-api.js "Your question here"
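The same request as the curl example can be sent from Python using only the standard library. The URL, headers, and JSON body are taken from the curl command above; the helper names and the timeout value are assumptions. Because the response format is not documented here, the sketch returns the raw response body instead of parsing it.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the server runs elsewhere.
API_URL = "http://localhost:3000/ask"


def build_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with a JSON body of the form {"question": ...}."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = API_URL, timeout: float = 120.0) -> str:
    """Send the question and return the raw response body as text."""
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires the API server to be running):
# print(ask("What is the main concept of the document?"))
```

The timeout is generous because a local model may take a while to answer; lower it if you would rather fail fast.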
